Author
YIN, TENGFEI - Iowa State University
MAJUMDER, MAHBUBUL - Iowa State University
CHOWDHURY, NILADRI ROY - Iowa State University
COOK, DIANNE - Iowa State University
Shoemaker, Randy
Graham, Michelle
Submitted to: Journal of Data Mining in Genomics & Proteomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/19/2013
Publication Date: 8/28/2013
Citation: Yin, T., Majumder, M., Chowdhury, N., Cook, D., Shoemaker, R.C., Graham, M.A. 2013. Visual mining methods for RNA-Seq data: data structure, dispersion estimation and significance testing. Journal of Data Mining in Genomics & Proteomics. 4:139. DOI 10.4172/2153-0602.1000139.
Interpretive Summary: Analysis of RNA-Seq data sets is often complex. Scientists rely on computer programs to perform statistical tests and identify genes whose activity changes in response to an experimental treatment. Understanding how these programs work is as important as carefully planning an experiment. In this manuscript, we examine two different statistical methods for analyzing the same data set. The assumptions made by the two methods drastically alter the results. By creating tools to visualize the raw data, we can help determine which computer program is best suited to a given data set.
Technical Abstract: In an analysis of RNA-Seq data from soybeans, initial significance testing using one software package produced a very different gene list from the one yielded by another package. How can this happen? This paper demonstrates how the disparities between the results were investigated and how they can be explained. This type of contradiction can occur more generally in high-throughput analyses. To explore the model fitting and hypothesis testing, we implemented an interactive graphic that allows the user to explore the effect of dispersion estimation on the overall estimation of variance and on differential expression tests. In addition, we propose a new procedure to test for the presence of any structure in biological data.