Agricultural Research Service United States Department of Agriculture

Research Project: DEVELOPMENT OF AN INTERNATIONAL MARKER ASSISTED SELECTION PROGRAM FOR CACAO

Location: Subtropical Horticulture Research

Title: Evaluation of Methods for de novo Genome assembly from High-throughput Sequencing Reads Reveals Dependencies that Affect the Quality of the Results

Authors
Haiminen, Niina
Kuhn, David
Parida, Laxmi
Rigoutsos, Isidore

Submitted to: PLoS One
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: August 1, 2011
Publication Date: September 7, 2011
Repository URL: http://doi:10.1371/journal.pone.0024182
Citation: Haiminen, N., Kuhn, D.N., Parida, L., Rigoutsos, I. 2011. Evaluation of Methods for de novo Genome assembly from High-throughput Sequencing Reads Reveals Dependencies that Affect the Quality of the Results. PLoS One. 6(9): e24182.

Interpretive Summary: Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. For plants that do not yet have a complete genome sequence, however, it is difficult to determine what kind of sequence data, and how much of it, must be collected to assemble the genome correctly. We created synthetic datasets of short reads (~100 nt) from already sequenced genomes of different sizes and with different amounts of repetitive sequence, and used them to test publicly available assembly programs. Our benchmarks can be used to roughly estimate the amount and type of sequencing coverage necessary to assemble a genome and, hence, the cost of a genome sequencing project.
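As an illustration of the kind of synthetic dataset described in the summary, the following is a minimal Python sketch, not the authors' actual procedure. It assumes reads are sampled uniformly at random from a known genome string and that sequencing errors are independent single-base substitutions. The function name `simulate_reads` and all of its parameters are hypothetical.

```python
import random

def simulate_reads(genome, read_length=100, coverage=10.0, error_rate=0.01, seed=0):
    """Sample fixed-length reads uniformly from `genome` (hypothetical helper).

    Assumptions for illustration only: uniform start positions, forward strand
    only, and independent substitution errors at rate `error_rate`.
    """
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    rng = random.Random(seed)
    # Number of reads needed so that total bases ~= coverage * genome length.
    n_reads = int(coverage * len(genome) / read_length)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        read = list(genome[start:start + read_length])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitute with a different nucleotide.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads
```

Reads produced this way could be written to a file and passed to an assembler, and the resulting assembly compared against the known source genome.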

Technical Abstract: Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality, and the production of increasingly large numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤ 100 nucleotides) through a detailed study involving genomic sequences of various lengths in conjunction with several of the currently available assembly programs. Our analysis indicates that the choice of assembler can have a significant effect on the quality of assembly results. Our empirical computational analysis shows that one is, in principle, able to determine which sequencing coverage will provide the best assembly in terms of size and correctness, if the attributes of the target genome, the assembly program, the expected read length, and the error rate are known.
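For readers unfamiliar with the term, average sequencing coverage is conventionally defined as C = N × L / G, where N is the number of reads, L the read length, and G the genome size. The short Python sketch below applies that standard definition. It is not taken from the paper and does not predict assembly quality; the function names and the example numbers are hypothetical.

```python
import math

def coverage(n_reads, read_length, genome_size):
    """Average coverage C = N * L / G."""
    return n_reads * read_length / genome_size

def reads_needed(target_coverage, genome_size, read_length):
    """Smallest number of reads whose total length reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

# Hypothetical example: a 1,000,000 nt genome sequenced with 100 nt reads.
n = reads_needed(30, 1_000_000, 100)
print(n, coverage(n, 100, 1_000_000))
```

Multiplying the read count by a per-read sequencing price (an input the user would have to supply) gives the kind of rough cost estimate the summary mentions.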

   

 
Project Team
Gutierrez, Osman
Kuhn, David
 
Related National Programs
  Plant Genetic Resources, Genomics and Genetic Improvement (301)
 
 
Last Modified: 05/23/2013