
Research Project: Developmental Genomics and Metabolomics Influencing Temperate Tree Fruit Quality

Location: Physiology and Pathology of Tree Fruits Research

Title: Selecting superior de novo transcriptome assemblies: lessons learned by leveraging the best plant genome [abstract]

Honaas, Loren
WAFULA, ERIC - Pennsylvania State University
WICKETT, NORMAN - Pennsylvania State University
DER, JOSHUA - Pennsylvania State University
ZHANG, YETING - Pennsylvania State University
EDGER, PATRICK - University Of Missouri
ALTMAN, NAOMI - Pennsylvania State University
PIRES, CHRIS - University Of Missouri
LEEBENS-MACK, JAMES - University Of Georgia
DEPAMPHILIS, CLAUDE - Pennsylvania State University

Submitted to: Plant and Animal Genome Conference
Publication Type: Abstract Only
Publication Acceptance Date: 11/25/2016
Publication Date: 1/16/2016
Citation: Honaas, L.A., Wafula, E.K., Wickett, N.J., Der, J.P., Zhang, Y., Edger, P.P., Altman, N.S., Pires, C., Leebens-Mack, J.H., dePamphilis, C.W. 2016. Selecting superior de novo transcriptome assemblies: lessons learned by leveraging the best plant genome [abstract]. Plant and Animal Genome Conference. p. 12.


Technical Abstract: Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies generated with six de novo assemblers: CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo assemblies of the Arabidopsis thaliana and Oryza sativa transcriptomes provide new insights into the strengths and limitations of transcriptome assembly strategies. We find that the leading assemblers generate reassuringly accurate assemblies for the majority of transcripts. At the same time, assemblers tend to fail to fully assemble highly expressed genes. Surprisingly, the incidence of true chimeric assemblies is very low for all assemblers. Normalized libraries contain fewer reads from highly abundant transcripts, but they also lack thousands of low-abundance transcripts. We conclude that the quality of a de novo transcriptome assembly is best assessed by considering a combination of metrics: 1) the proportion of reads mapping to the assembly, 2) the recovery of conserved, widely expressed genes, 3) N50 length statistics, and 4) the total number of unigenes. We provide benchmark Illumina transcriptome data and introduce SCERNA, a broadly applicable modular protocol for improving de novo assemblies. Finally, our de novo assembly of the Arabidopsis leaf transcriptome revealed ~20 putative Arabidopsis genes missing from the current annotation.
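The four recommended metrics can be computed side by side with a short script. The Python sketch below is illustrative only and is not the authors' pipeline or part of SCERNA. It assumes that the count of reads mapping back to the assembly and the set of conserved reference genes recovered by the assembly have already been obtained with external tools (for example, a read aligner and a homology search). The names `n50`, `summarize` and `AssemblySummary` are hypothetical, and each contig is counted as one unigene.

```python
from dataclasses import dataclass
from typing import Sequence, Set


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # The first contig that brings the running total to >= 50% defines N50.
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input with non-negative lengths


@dataclass
class AssemblySummary:
    unigenes: int              # total number of contigs (assumed: 1 contig = 1 unigene)
    n50: int                   # N50 contig length in bp
    mapped_fraction: float     # proportion of input reads mapping to the assembly
    conserved_recovery: float  # fraction of a conserved gene set found in the assembly


def summarize(
    contig_lengths: Sequence[int],
    mapped_reads: int,
    total_reads: int,
    recovered_genes: Set[str],
    conserved_genes: Set[str],
) -> AssemblySummary:
    """Compute the four metrics; inputs are assumed to come from external tools."""
    mapped = mapped_reads / total_reads if total_reads else 0.0
    # Only count recovered IDs that belong to the conserved reference set.
    hits = len(recovered_genes & conserved_genes)
    recovery = hits / len(conserved_genes) if conserved_genes else 0.0
    return AssemblySummary(
        unigenes=len(contig_lengths),
        n50=n50(contig_lengths),
        mapped_fraction=mapped,
        conserved_recovery=recovery,
    )


if __name__ == "__main__":
    # Toy values for illustration; not data from the study.
    summary = summarize(
        contig_lengths=[100, 200, 300, 400, 500],
        mapped_reads=850,
        total_reads=1000,
        recovered_genes={"g1", "g2", "g3", "x"},
        conserved_genes={"g1", "g2", "g3", "g4"},
    )
    print(summary)
```

The metrics are reported together rather than folded into a single score, in keeping with the abstract's conclusion that assembly quality is best judged from a combination of measures; how the authors weighed them against one another is not stated here.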