|LI, WEIZHONG - J Craig Venter Institute|
|RICHTER, ALEXANDER - J Craig Venter Institute|
|JUNG, YUNSUP - J Craig Venter Institute|
Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/23/2016
Publication Date: 9/27/2016
Citation: Li, W., Richter, A.R., Jung, Y., Li, R.W. 2016. Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species. Biomed Central (BMC) Genomics. 17:761.
Interpretive Summary: Sequencing the steady-state RNA molecules in a biological sample (RNA-seq) is widely used in biomedical research. RNA-seq overcomes many limitations of earlier technologies such as microarrays and real-time PCR. Most importantly, it has revealed previously inaccessible complexities of the transcriptome, such as allele-specific expression and novel promoters and isoforms, and it supports gene expression measurement (abundance estimation) and the detection of alternative splicing, RNA editing, and novel transcripts. However, the difficulty of building these complicated computational pipelines, installing and maintaining software packages, and obtaining sufficient computational resources tends to deter bench biologists from analyzing their own RNA-seq data. In this study, we developed a web portal offering integrated workflows that cover all essential steps of RNA-seq analysis for agricultural animal scientists.
Technical Abstract: Remarkable advances in next-generation sequencing (NGS) technologies, bioinformatics algorithms, and computational technologies have significantly accelerated genomic research. However, complicated NGS data analysis remains a major bottleneck. RNA-seq, one of the major areas in the NGS field, also faces great challenges in data analysis. To address these challenges, we developed a web portal that offers three integrated workflows for end-to-end computation and analysis, including sequence quality control, read mapping, transcriptome assembly, reconstruction and quantification, and differential analysis. The first workflow uses the Tuxedo suite of tools (TopHat, Cufflinks, Cuffmerge, and Cuffdiff). The second workflow uses Trinity for de novo assembly, RSEM for transcript quantification, and edgeR for differential analysis. The third combines STAR, RSEM, and edgeR. All three workflows support multiple samples and multiple groups of samples, and perform differential analysis between groups in a single workflow job submission. The calculated results are available for download and post-analysis. The supported animal species include chicken, cow, duck, goat, pig, horse, rabbit, sheep, and turkey, as well as several other model organisms including yeast, C. elegans, Drosophila, and human, with genomic sequences and annotations obtained from Ensembl. The RNA-seq portal is freely available from http://weizhongli-lab.org/RNA-seq. In conclusion, the web portal offers not only bioinformatics software, workflows, computation, and reference data, but also an integrated environment for complex RNA-seq data analysis for agricultural animal species. In this project, our aim is not to develop new RNA-seq tools, but to build web workflows around popular existing RNA-seq methods and make these tools more accessible to the community.
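The three workflows and the single-submission, multi-group design described above can be pictured with a minimal Python sketch. It is an illustration only and not the portal's implementation: the dictionary keys, the `plan_job` function, and the sample names are hypothetical, and the sketch assumes that differential analysis is run for every pair of groups, which the abstract does not specify. Only the tool names come from the abstract.

```python
from itertools import combinations

# Tool order per workflow, as named in the abstract.
# The dictionary keys are hypothetical labels chosen for this sketch.
WORKFLOWS = {
    "tuxedo": ("TopHat", "Cufflinks", "Cuffmerge", "Cuffdiff"),
    "trinity": ("Trinity", "RSEM", "edgeR"),
    "star": ("STAR", "RSEM", "edgeR"),
}


def plan_job(workflow, groups):
    """Describe one job submission: tools to run and group comparisons.

    `groups` maps a group name to a list of sample names.
    Assumption: every pair of groups is compared.
    """
    if workflow not in WORKFLOWS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    if len(groups) < 2:
        raise ValueError("differential analysis needs at least two groups")
    return {
        "tools": WORKFLOWS[workflow],
        "samples": sum(len(samples) for samples in groups.values()),
        "comparisons": list(combinations(sorted(groups), 2)),
    }


if __name__ == "__main__":
    plan = plan_job(
        "star",
        {"control": ["c1", "c2"], "treated": ["t1", "t2"], "recovered": ["r1"]},
    )
    print(plan["tools"])
    for left, right in plan["comparisons"]:
        print(f"compare {left} vs {right}")
```

Listing each workflow as an ordered tuple keeps the sketch close to the abstract's wording; a real submission would also need the species, reference annotation, and read files, which are left out here.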