Dardick, Christopher - Chris
Submitted to: BMC Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/19/2008
Publication Date: 7/19/2008
Citation: Lu, R., Lee, G., Shultz, M., Dardick, C.D., Jung, K., Phetsom, J., Jia, Y., Rice, R.H., Goldberg, Z., Schnable, P.S., Ronald, P., Rocke, D.M. 2008. Assessing probe-specific dye and slide biases in two-color microarray data. BMC Bioinformatics. 9:314. doi:10.1186/1471-2105-9-314.
Interpretive Summary: Microarray technology is used to monitor the activity of thousands of genes simultaneously. In some versions of the technology, two biological tissue samples (for example, diseased and healthy) are combined and compared to identify which genes are turned on or off between the two states. To distinguish the samples, a different fluorescent dye is attached to each one. The difference in the amount of fluorescence from each dye indicates the differences in gene activity. One problem commonly associated with this technology is dye bias, where one dye is more abundant than the other as a result of an inherent lack of technical precision. When the data are analyzed, dye bias can skew the results and lead to erroneous conclusions. A number of analytical methods have previously been developed that minimize dye bias but do not eliminate it. Here, we report a method to quantify the amount of dye bias in any experiment. This method serves both as a diagnostic tool to assess the reliability of an experiment and as a way to improve data sets by eliminating samples with unacceptable levels of dye bias. The method provides a substantial improvement to the analysis of microarray data.
Technical Abstract: A primary reason for using two-color microarrays is that placing two samples, labeled with different dyes, on the same slide, where they bind to probes on the same spot, is supposed to adjust for many factors that introduce noise and errors into the analysis. Most users assume that any differences between the dyes can be adjusted out by standard methods of normalization, so that measures such as log ratios on the same slide are reliable measures of comparative expression. However, even after normalization, probe-specific dye and slide variation remains in the data. We develop a method to quantify the amount of dye-by-probe and slide-by-probe interaction. This serves as a diagnostic, both visual and numeric, of the existence of probe-specific dye bias. We show how this improved the performance of two-color array analysis for genomic arrays of biological samples ranging from rice to human tissue. The procedure quantifies the extent of probe-specific dye and slide bias in two-color microarrays; its primary output is a graphical diagnostic of the extent of the bias, an empirical cumulative distribution function (ECDF) plot, though numerical results are also obtained. We show that the dye and slide biases were high for human and rice genomic arrays in two gene expression facilities, even after the standard intensity-based normalization, and describe how this diagnostic allowed the problems causing the probe-specific bias to be addressed, resulting in important improvements in performance. The R package LMGene, which contains the method described in this paper, is available for download from Bioconductor.
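As a rough illustration of the kind of diagnostic described in the abstract, the following Python sketch estimates a per-probe dye effect from a dye-swap pair of slides and summarizes it with an ECDF. It is not the LMGene implementation or the paper's procedure. The dye-swap design, the simulated data, the function names, and the 0.5 threshold are all assumptions made for this example. It relies on the standard dye-swap identity: if the forward slide gives log ratio M1 = (A - B) + d and the swapped slide gives M2 = (B - A) + d, then the probe's dye effect is d = (M1 + M2) / 2.

```python
import numpy as np


def dye_bias_from_dye_swap(m_forward, m_swapped):
    """Per-probe dye effect from one dye-swap pair of log ratios.

    Hypothetical helper: assumes M1 = (A - B) + d on the forward slide
    and M2 = (B - A) + d on the swapped slide, so d = (M1 + M2) / 2.
    """
    m_forward = np.asarray(m_forward, dtype=float)
    m_swapped = np.asarray(m_swapped, dtype=float)
    return (m_forward + m_swapped) / 2.0


def ecdf(values):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def fraction_exceeding(values, threshold):
    """Numeric summary: share of probes whose |dye effect| exceeds threshold."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(np.abs(values) > threshold))


if __name__ == "__main__":
    # Simulated data (an assumption, not results from the paper).
    rng = np.random.default_rng(0)
    n_probes = 1000
    true_diff = rng.normal(0.0, 1.0, n_probes)   # A - B per probe
    dye_effect = rng.normal(0.0, 0.3, n_probes)  # probe-specific dye bias
    m1 = true_diff + dye_effect + rng.normal(0.0, 0.1, n_probes)
    m2 = -true_diff + dye_effect + rng.normal(0.0, 0.1, n_probes)

    estimated = dye_bias_from_dye_swap(m1, m2)
    x, y = ecdf(np.abs(estimated))
    # (x, y) can be passed to a plotting library to draw the ECDF curve.
    print("Share of probes with |dye effect| > 0.5:",
          fraction_exceeding(estimated, 0.5))
```

An ECDF of the absolute dye effect that rises steeply near zero suggests little probe-specific bias, while a curve with a long right tail suggests that many probes are affected. A single dye-swap pair cannot separate dye effects from slide effects, which is one reason the full method models both interactions.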