Location: Plant, Soil and Nutrition Research
Title: Improved RNA-seq workflows using CyVerse cyberinfrastructure
CHOUGULE, KAPEE - Cold Spring Harbor Laboratory
WANG, LIYA - Cold Spring Harbor Laboratory
STEIN, JOSHUA - Cold Spring Harbor Laboratory
WANG, XIAOFEI - Cold Spring Harbor Laboratory
DEVISETTY, UPENDRA KUMAR - University Of Arizona
Klein, Robert - Bob
Submitted to: Current Protocols in Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/25/2018
Publication Date: 9/12/2018
Citation: Chougule, K., Wang, L., Stein, J., Wang, X., Devisetty, U., Klein, R.R., Ware, D. 2018. Improved RNA-seq workflows using CyVerse cyberinfrastructure. Current Protocols in Bioinformatics. 63(1):e53. https://doi.org/10.1002/cpbi.53.
Interpretive Summary: Major advancements in science hinge on understanding the regulation of genes controlling traits that are critically important to agriculture. Genes are tiny packets of genetic blueprint material found inside the cells of all plants and animals; they control all of the physical characteristics of these organisms. Our work focuses on providing a user-friendly workflow of bioinformatic tools for analyzing the expression of all the genes within an organism. This study details our efforts to improve existing pipelines of bioinformatic tools for researchers who lack command-line experience, and to provide web-based graphical user interfaces that are scalable, accurate, and easy to use. This refinement of our bioinformatic pipeline represents a new resource for the entire plant community, and will allow scientists to understand the key features of the genetic blueprint that control how a given plant performs. The information will be used primarily by fellow scientists, but the work should ultimately result in better-adapted, higher-producing crop varieties available to American farmers.
Technical Abstract: RNA-seq is a vital tool for understanding gene structure and expression patterns. Typical RNA-seq analysis protocols take short reads through quality control, alignment to the reference genome, and assembly. The resultant transcripts are quantified and used for differential expression and visualization. Existing tools and protocols for RNA-seq are vast and diverse; given their differences in performance, it is critical to select an analysis protocol that is scalable, accurate, and easy to use. Tuxedo, a popular alignment-based protocol for RNA-seq analysis, has been updated with HISAT2, StringTie, StringTie-merge, and Ballgown, and the updated protocol outperforms its predecessor. Similarly, new pseudoalignment-based protocols like Kallisto and Sleuth reduce runtime and improve performance. However, use of these tools is challenging for researchers without command-line experience. Here, we describe two new RNA-seq protocols, in which all apps are deployed on CyVerse Cyberinfrastructure with user-friendly graphical user interfaces, and validate their performance using plant RNA-seq data.
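The two tool chains named in the abstract can be written down as a small data structure, which may help readers see how the alignment-based and pseudoalignment-based routes differ in length. This is only an illustrative sketch: it assumes the two protocols correspond to the tool chains listed above, the stage descriptions are annotations added here rather than text from the publication, and the names `PROTOCOLS` and `describe` are hypothetical. It does not invoke any of the tools or reflect how the apps are configured on CyVerse.

```python
# Tool names come from the abstract; the stage descriptions are this
# sketch's own annotations (an assumption, not text from the paper).
PROTOCOLS = {
    "alignment-based": [
        ("HISAT2", "align reads to the reference genome"),
        ("StringTie", "assemble and quantify transcripts"),
        ("StringTie-merge", "merge transcript assemblies across samples"),
        ("Ballgown", "differential expression analysis"),
    ],
    "pseudoalignment-based": [
        ("Kallisto", "pseudoalign reads and quantify transcripts"),
        ("Sleuth", "differential expression analysis"),
    ],
}


def describe(protocol: str) -> str:
    """Return the tools of one protocol, in order, joined by arrows."""
    return " -> ".join(tool for tool, _stage in PROTOCOLS[protocol])


if __name__ == "__main__":
    for name in PROTOCOLS:
        print(f"{name}: {describe(name)}")
```

In this sketch the pseudoalignment-based route has two stages where the alignment-based route has four, which is consistent with the abstract's remark about reduced runtime, although stage count alone does not determine runtime.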