
Research Project: MaizeGDB: Enabling Access to Basic, Translational, and Applied Research Information

Location: Corn Insects and Crop Genetics Research

Title: Effect of sequence depth and length in long-read assembly of the maize inbred NC358

Author
item OU, SHUJUN - Iowa State University
item LIU, JIANING - University of Georgia
item CHOUGULE, KAPEEL - Cold Spring Harbor Laboratory
item FUNGTAMMASAN, ARKARACHAI - DNAnexus
item SEETHARAM, ARUN - Iowa State University
item STEIN, JOSHUA - Cold Spring Harbor Laboratory
item LLACA, VICTOR - Corteva Agriscience
item MANCHANDA, NANCY - Iowa State University
item GILBERT, AMANDA - University of Minnesota
item Woodhouse, Margaret

Submitted to: Nature Biotechnology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/5/2020
Publication Date: 5/8/2020
Citation: Ou, S., Liu, J., Chougule, K., Fungtammasan, A., Seetharam, A., Stein, J., Llaca, V., Manchanda, N., Gilbert, A., Woodhouse, M.H. 2020. Effect of sequence depth and length in long-read assembly of the maize inbred NC358. Nature Communications. 11:2288. https://doi.org/10.1038/s41467-020-16037-7.
DOI: https://doi.org/10.1038/s41467-020-16037-7

Interpretive Summary: Recent improvements in the quality and yield of genome sequencing and assembly technology have made it possible to rapidly generate reference-quality assemblies for complex genomes. Still, generating these assemblies is costly, and knowing the critical sequence depth and read length needed to obtain a high-quality assembly is important for allocating limited resources. To this end, we generated eight independent assemblies for the complex genome of maize inbred line NC358 from sequence data of varying depth and read length, and identified the critical levels an assembly must reach in order to represent important features of genome structure.

Technical Abstract: Recent improvements in the quality and yield of long-read data and scaffolding technology have made it possible to rapidly generate reference-quality assemblies for complex genomes. Still, generating these assemblies is costly, and assessing the critical sequence depth and read length needed to obtain high-quality assemblies is important for allocating limited resources. To this end, we generated eight independent assemblies for the complex genome of maize inbred line NC358 using PacBio datasets ranging from 20x to 75x genomic depth, with N50 read lengths of 11 to 21 kb. Assemblies with 30x depth or less and an N50 read length of 11 kb were highly fragmented. The critical depth for the gene space (including tandem gene arrays) and for the transposable element (TE) space was 40x. Distinct critical points were observed for other, non-TE repeat features of the genome. This study provides a useful resource-allocation reference for the community as long-read technologies continue to mature.
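The two quantities used above to describe each PacBio dataset, genomic depth (e.g. 40x) and read-length N50 (e.g. 11 kb), can be computed from a list of read lengths. The minimal Python sketch below assumes depth is the total number of sequenced bases divided by an estimated genome size, and uses the standard N50 definition; the function names and toy numbers are illustrative only and are not taken from the study's pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that reads (or contigs) of
    length >= L together contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # crossed the 50% mark of total bases
            return length
    raise ValueError("n50 requires at least one length")


def depth(lengths: Iterable[int], genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size.
    genome_size is an assumed estimate supplied by the caller."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


if __name__ == "__main__":
    # Toy read lengths (hypothetical), totaling 20 bases.
    reads = [2, 3, 4, 5, 6]
    print(n50(reads))         # 6 + 5 = 11 >= 10, so N50 is 5
    print(depth(reads, 10))   # 20 bases over a 10-base genome gives 2.0
```

Because N50 is weighted by bases rather than by read count, a few very long reads can raise it substantially, which is why depth and N50 are reported as separate axes of data quality.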