
Title: Visual mining methods for RNA-Seq data: data structure, dispersion estimation and significance testing

Author
Yin, Tengfei - Iowa State University
Majumder, Mahbubul - Iowa State University
Chowdhury, Niladri Roy - Iowa State University
Cook, Dianne - Iowa State University
Shoemaker, Randy
Graham, Michelle

Submitted to: Journal of Data Mining in Genomics & Proteomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/19/2013
Publication Date: 8/28/2013
Citation: Yin, T., Majumder, M., Chowdhury, N., Cook, D., Shoemaker, R.C., Graham, M.A. 2013. Visual mining methods for RNA-Seq data: data structure, dispersion estimation and significance testing. Journal of Data Mining in Genomics & Proteomics. 4:139. DOI 10.4172/2153-0602.1000139.

Interpretive Summary: Analysis of RNA-Seq data sets is often complex. Scientists rely on computer programs to perform statistical tests and identify genes whose activity changes in response to an experimental treatment. Understanding how these programs work is as important as carefully planning an experiment. In this manuscript, we examine two different statistical methods for analyzing the same data set. The assumptions made by the two methods drastically alter the results. By creating tools to visualize the raw data, we can help determine which computer program is best suited to examining a given data set.

Technical Abstract: In an analysis of RNA-Seq data from soybeans, initial significance testing using one software package produced a very different gene list from that produced by another. How can this happen? This paper shows how the disparities between the results were investigated and how they can be explained. This type of contradiction can occur more generally in high-throughput analyses. To explore the model fitting and hypothesis testing, we implemented an interactive graphic for exploring the effect of dispersion estimation on the overall estimation of variance and on differential expression tests. In addition, we propose a new procedure to test for the presence of any structure in biological data.
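
As a rough illustration of why the dispersion estimate matters, the sketch below assumes a negative binomial count model, in which the variance is mean + dispersion × mean². The abstract does not name the model or the software packages compared, so this model, the function name `nb_variance`, and the example values are assumptions made purely for illustration.

```python
def nb_variance(mean, dispersion):
    """Variance of a count under an assumed negative binomial model.

    With dispersion = 0 this reduces to the Poisson variance (equal to the mean).
    """
    return mean + dispersion * mean ** 2


# Hypothetical gene with an average of 100 reads; the dispersion values are made up.
mean_count = 100.0
for dispersion in (0.0, 0.01, 0.1, 0.5):
    variance = nb_variance(mean_count, dispersion)
    print(f"dispersion={dispersion:<5} variance={variance:.1f}")
```

Under this assumption, two programs that estimate different dispersions for the same gene would assign it very different variances, which could plausibly move the gene on or off a list of significant results.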