Location: Corn Insects and Crop Genetics Research
Title: De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity
Submitted to: Nature Protocols
Publication Type: Peer reviewed journal
Publication Acceptance Date: 4/21/2013
Publication Date: 7/11/2013
Citation: Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., Weeks, N.T. 2013. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nature Protocols. 8(8):1494-1512.
Interpretive Summary: While the cost of whole-genome sequencing has fallen in recent years, whole-genome assembly is still expensive and time-consuming, and it requires a high level of expertise. Transcriptome sequencing with RNA-Seq costs much less than whole-genome sequencing, allowing researchers on a limited budget to identify and characterize genes for organisms that lack assembled genome sequences. De novo transcriptome assembly software is required to identify such genic sequences from RNA-Seq data. Trinity is one of the most popular de novo transcriptome assembly pipelines. Trinity also facilitates downstream analysis; for example, researchers can provide RNA-Seq data for multiple samples (e.g., different tissues or environmental conditions) and use components to analyze differentially expressed genes.
Technical Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than five hours.