Submitted to: Journal of Agricultural, Biological, and Environmental Statistics
Publication Type: Peer reviewed journal
Publication Acceptance Date: 3/15/2006
Publication Date: 9/1/2006
Citation: Nettleton, D., Hwang, G., Caldo, R.A., Wise, R.P. 2006. Estimating the number of true null hypotheses from a histogram of p-values. Journal of Agricultural, Biological, and Environmental Statistics. 11:337-356.
Interpretive Summary: Microarray analysis commonly takes a data-driven exploratory approach that relies on searching for differentially expressed or co-regulated genes. The investigator is then left with the painstaking task of finding connections among the genes that showed interesting activity by searching through annotation from BLAST hits, specialized genome databases, protein information, and pathway links. This problem is becoming increasingly important because the analysis of many genome-based experiments involves follow-up testing of hundreds or thousands of genes. When mapping quantitative trait loci, for example, each of hundreds of genetic loci is tested for association with a quantitative trait of interest. This manuscript describes a novel algorithm to estimate the true number of differentially expressed genes associated with a particular treatment in microarray-based experiments. It will have broad applicability for biologists and statisticians involved in microarray-based studies who need to decide how many genes to pursue in follow-up study.
Technical Abstract: Mosig et al. (2001, Genetics 157, 1683-1698) proposed an intuitively appealing method for estimating the number of true null hypotheses in a multiple test situation. They presented an iterative algorithm that relies on a histogram of observed p-values to obtain their estimator. We characterize the limit of their iterative algorithm and show that their estimator can be computed directly without iteration. We compare the performance of the histogram-based estimator with other procedures for estimating the number of true null hypotheses from a collection of observed p-values and find that the histogram-based estimator performs well in settings similar to those encountered in microarray data analysis. We demonstrate the approach using p-values from a large microarray experiment aimed at uncovering molecular mechanisms of barley resistance to a fungal pathogen.
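Illustrative sketch: the abstract does not spell out the estimator, so the Python code below is only one plausible reading of a non-iterative, histogram-based calculation and should not be taken as the authors' exact procedure. It assumes equal-width bins on [0, 1], finds the first bin whose count does not exceed the average count of that bin and all bins to its right, and scales that average by the number of bins. The function name estimate_true_nulls and the default of 20 bins are hypothetical choices made for this example.

```python
import numpy as np


def estimate_true_nulls(p_values, n_bins=20):
    """Histogram-based estimate of the number of true null hypotheses.

    Assumption (not stated in the abstract): null p-values are uniform on
    [0, 1], so the bins to the right of the excess near zero should have
    roughly equal counts.
    """
    p = np.asarray(p_values, dtype=float)
    counts, _ = np.histogram(p, bins=n_bins, range=(0.0, 1.0))

    # tail_means[i] = average count over bins i, i+1, ..., n_bins-1
    tail_sums = np.cumsum(counts[::-1])[::-1]
    tail_sizes = np.arange(n_bins, 0, -1)
    tail_means = tail_sums / tail_sizes

    # First bin whose count is no larger than its tail average.
    # The last bin always qualifies, so a match is guaranteed.
    first_flat = int(np.argmax(counts <= tail_means))

    # Scale the per-bin null count up to all bins.
    return float(n_bins * tail_means[first_flat])


if __name__ == "__main__":
    # 1000 evenly spread "null" p-values plus 200 very small ones.
    nulls = (np.arange(1000) + 0.5) / 1000
    alternatives = np.full(200, 0.001)
    print(estimate_true_nulls(np.concatenate([nulls, alternatives])))
```

In this toy input the 200 small p-values all fall in the first bin, so the estimate is driven by the remaining, flat part of the histogram. With real data the result depends on the bin count, which is a tuning choice this sketch does not address.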