Research Project: Genomic Analyses and Management of Agricultural and Industrial Microbial Genetic Resources and Associated Information

Location: Mycotoxin Prevention and Applied Microbiology Research

Title: in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies

Author
ZHOU, XIAOFAN - Vanderbilt University
PERSIS, DAVID - University of Wisconsin
KOMINEK, JACEK - University of Wisconsin
Kurtzman, Cletus
HITTINGER, CHRIS - University of Wisconsin
ROKAS, ANTONIS - Vanderbilt University

Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/17/2016
Publication Date: 11/1/2016
Citation: Zhou, X., Persis, D., Kominek, J., Kurtzman, C.P., Hittinger, C.T., Rokas, A. 2016. in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies. G3, Genes/Genomes/Genetics. 6(11):3655-3662.

Interpretive Summary: Recent progress in genomics has enabled decoding of the genome of virtually any organism, greatly expanding the potential for understanding the biology and evolution of all organisms. To reduce the costs and challenges of de novo genome sequencing projects and to streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategies and assembly processes. iWGS integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. Various genome sequencing projects are underway in ARS and elsewhere, and applying iWGS to these studies will improve data quality and reduce the time needed for data analysis and for subsequent application to problems in agriculture, biotechnology, and medicine.

Technical Abstract: The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding the potential of genomics for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in non-model organisms. To reduce the costs and challenges of de novo genome sequencing projects and to streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategies and assembly protocols. iWGS integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and it supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
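
To make the first and last of the four steps concrete, the following is a minimal Python sketch, not iWGS code. It assumes a toy, error-free read simulator that samples uniformly from a reference sequence, and it uses N50 as the evaluation statistic; N50 is a common contiguity measure, but the abstract does not say which metrics iWGS reports. The function names `reads_needed`, `simulate_reads`, and `n50` are hypothetical and chosen only for illustration.

```python
import math
import random


def reads_needed(genome_size, coverage, read_length):
    """Reads required so that reads * read_length >= coverage * genome_size."""
    return math.ceil(genome_size * coverage / read_length)


def simulate_reads(genome, coverage, read_length, seed=0):
    """Toy simulator (an assumption, not the iWGS method): sample
    error-free, fixed-length reads uniformly from a reference sequence."""
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    rng = random.Random(seed)
    n_reads = reads_needed(len(genome), coverage, read_length)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append(genome[start:start + read_length])
    return reads


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L together
    cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

Under these assumptions, `reads_needed(12_000_000, 50, 150)` gives the number of 150-bp reads for 50x coverage of a 12-Mb genome, and `n50` can be applied to the contig lengths produced by any assembler to compare candidate sequencing strategies.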