<mets:mets OBJID="eprint_9623" LABEL="Eprints Item" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd" xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><mets:metsHdr CREATEDATE="2023-07-04T23:20:58Z"><mets:agent ROLE="CUSTODIAN" TYPE="ORGANIZATION"><mets:name>OAR@ICRISAT</mets:name></mets:agent></mets:metsHdr><mets:dmdSec ID="DMD_eprint_9623_mods"><mets:mdWrap MDTYPE="MODS"><mets:xmlData><mods:titleInfo><mods:title>NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data</mods:title></mods:titleInfo><mods:name type="personal"><mods:namePart type="given">M A V S K</mods:namePart><mods:namePart type="family">Katta</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">A W</mods:namePart><mods:namePart type="family">Khan</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">D</mods:namePart><mods:namePart type="family">Doddamani</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">M</mods:namePart><mods:namePart type="family">Thudi</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:name type="personal"><mods:namePart type="given">R K</mods:namePart><mods:namePart type="family">Varshney</mods:namePart><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role></mods:name><mods:abstract>The rapid popularity and adoption of next generation sequencing (NGS) approaches have
generated huge volumes of data. High-throughput platforms such as Illumina HiSeq produce
terabytes of raw data that require quick processing. Quality control of these data is an
important step prior to downstream analyses. To address these issues, we have
developed a quality control pipeline, NGS-QCbox, that scales up to process hundreds or
thousands of samples. Raspberry is an in-house tool, written in C using
HTSlib (v1.2.1) (http://htslib.org), for computing read- and base-level statistics. It can be used as a
stand-alone application and can process both compressed and uncompressed FASTQ
files. NGS-QCbox integrates Raspberry with other open-source tools for alignment
(Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) to analyze raw NGS
data efficiently and in a high-throughput manner. The pipeline runs batches of jobs
in parallel using Bpipe (https://github.com/ssadedin/bpipe) and, internally, applies
fine-grained task parallelization using OpenMP. It reports read and base statistics along
with genome coverage and variants in a user-friendly format. The pipeline presents
a simple menu-driven interface and can be run in either quick or complete mode. In
addition, in quick mode the pipeline is faster than other similar existing QC
pipelines and tools. The NGS-QCbox pipeline, the Raspberry tool and associated scripts are
available at https://github.com/CEG-ICRISAT/NGS-QCbox and https://github.com/CEG-ICRISAT/Raspberry
for rapid quality control analysis of large-scale next generation
sequencing (Illumina) data.</mods:abstract><mods:classification authority="lcc">Agriculture-Farming, Production, Technology, Economics</mods:classification><mods:originInfo><mods:dateIssued encoding="iso8601">2015-10</mods:dateIssued></mods:originInfo><mods:originInfo><mods:publisher>Public Library of Science</mods:publisher></mods:originInfo><mods:genre>Article</mods:genre></mets:xmlData></mets:mdWrap></mets:dmdSec><mets:amdSec ID="TMD_eprint_9623"><mets:rightsMD ID="rights_eprint_9623_mods"><mets:mdWrap MDTYPE="MODS"><mets:xmlData><mods:useAndReproduction>
<p xmlns="http://www.w3.org/1999/xhtml"><strong>For work being deposited by its own author:</strong> 
In self-archiving this collection of files and associated bibliographic 
metadata, I grant OAR@ICRISAT the right to store 
them and to make them permanently available publicly for free on-line. 
I declare that this material is my own intellectual property and I 
understand that OAR@ICRISAT does not assume any 
responsibility if there is any breach of copyright in distributing these 
files or metadata. (All authors are urged to prominently assert their 
copyright on the title page of their work.)</p>

<p xmlns="http://www.w3.org/1999/xhtml"><strong>For work being deposited by someone other than its 
author:</strong> I hereby declare that the collection of files and 
associated bibliographic metadata that I am archiving at 
OAR@ICRISAT is in the public domain. If this is 
not the case, I accept full responsibility for any breach of copyright 
that distributing these files or metadata may entail.</p>

<p xmlns="http://www.w3.org/1999/xhtml">Clicking on the deposit button indicates your agreement to these 
terms.</p>
    </mods:useAndReproduction></mets:xmlData></mets:mdWrap></mets:rightsMD></mets:amdSec><mets:fileSec><mets:fileGrp USE="reference"><mets:file ID="eprint_9623_44541_1" SIZE="1101712" OWNERID="http://oar.icrisat.org/9623/1/PLOS-2015.pdf" MIMETYPE="application/pdf"><mets:FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="http://oar.icrisat.org/9623/1/PLOS-2015.pdf"></mets:FLocat></mets:file></mets:fileGrp></mets:fileSec><mets:structMap><mets:div DMDID="DMD_eprint_9623_mods" ADMID="TMD_eprint_9623"><mets:fptr FILEID="eprint_9623_44541_1"></mets:fptr></mets:div></mets:structMap></mets:mets>