<mods:mods version="3.3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><mods:titleInfo><mods:title>NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data</mods:title></mods:titleInfo>
<mods:name type="personal"><mods:namePart type="given">M A V S K</mods:namePart><mods:namePart type="family">Katta</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">A W</mods:namePart><mods:namePart type="family">Khan</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">D</mods:namePart><mods:namePart type="family">Doddamani</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">M</mods:namePart><mods:namePart type="family">Thudi</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">R K</mods:namePart><mods:namePart type="family">Varshney</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:abstract>The rapid popularity and adoption of next generation sequencing (NGS) approaches have
generated huge volumes of data. High-throughput platforms such as the Illumina HiSeq produce
terabytes of raw data that require quick processing. Quality control of the data is an
important step prior to downstream analyses. To address these issues, we have
developed a quality control pipeline, NGS-QCbox, that scales up to process hundreds or
thousands of samples. Raspberry is an in-house tool, developed in C using
HTSlib (v1.2.1) (http://htslib.org), for computing read- and base-level statistics. It can be used as
a stand-alone application and can process both compressed and uncompressed FASTQ
files. NGS-QCbox integrates Raspberry with other open-source tools for alignment
(Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) to analyze raw NGS
data efficiently and in a high-throughput manner. The pipeline implements parallel batch processing
of jobs using Bpipe (https://github.com/ssadedin/bpipe) and, internally,
fine-grained task parallelization using OpenMP. It reports read and base statistics along
with genome coverage and variants in a user-friendly format. The pipeline presents
a simple menu-driven interface and can be used in either quick or complete mode. In
addition, in quick mode the pipeline is faster than other similar existing QC
pipelines/tools. The NGS-QCbox pipeline, the Raspberry tool and the associated scripts are made
available at https://github.com/CEG-ICRISAT/NGS-QCbox and https://github.com/CEG-ICRISAT/Raspberry
for rapid quality control analysis of large-scale next generation
sequencing (Illumina) data.</mods:abstract><mods:classification authority="lcc">Agriculture-Farming, Production, Technology, Economics</mods:classification><mods:originInfo><mods:dateIssued encoding="iso8601">2015-10</mods:dateIssued></mods:originInfo><mods:originInfo><mods:publisher>Public Library of Science</mods:publisher></mods:originInfo><mods:genre>Article</mods:genre></mods:mods>