YU, DANNI - Purdue University
DANKU, JOHN - Purdue University
KIM, SUNGJIN - Cornell University - New York
VATAMANIUK, OLENA - Cornell University - New York
SALT, DAVID - Purdue University
VITEK, OLGA - Purdue University
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/8/2011
Publication Date: N/A
Interpretive Summary: Ionomics is the study of elemental accumulation in living systems using high-throughput elemental profiling. This technique rapidly generates large quantities of data on thousands of samples, which makes it possible to profile large collections of lines in which single genes have been disrupted. However, the elemental composition of living organisms varies considerably in response to small variations in the growth environment, and these variations are virtually impossible to eliminate. It is therefore necessary to design experiments and develop statistical methods that enable researchers to distinguish changes due to altered genetics from those due to environmental variation. We have conducted two large ionomic experiments on collections of brewer's yeast and developed statistical methods that reduce the confounding effects of environmental variation when interpreting the data.
Technical Abstract: Motivation: Accurate interpretation of perturbation screens is essential for a successful functional investigation. However, the screened phenotypes are often distorted by noise, and their analysis requires specialized statistical tools. The number and scope of statistical methods available for this task are currently limited. In particular, many normalization methods make the restrictive assumption that the perturbations affect only a minority of the phenotypes, and many testing procedures underestimate the overall variation in the system. We argue that the available methods are insufficient for the analysis of modern large-scale genetic perturbation screens. Results: We propose an experimental design that involves four control strains, together with a 2-step normalization and estimation procedure based on linear mixed-effects modeling, which yield an accurate identification of hits. We evaluated the proposed approach using two comprehensive screens of S. cerevisiae, which involve 4965 single-gene knock-out mutants and 5825 single-gene over-expression mutants. We found that the proposed approach (1) enables a sensitive discovery of biologically meaningful changes, (2) enables a practical experimental design, (3) allows extensions to alternative experimental workflows, and (4) strongly outperforms the B-score for normalization and the moderated t-statistic for testing in cases where a large fraction of mutations affect the phenotype. Availability: All experimental data sets are publicly available at www.ionomicshub.org. The source code is available from the authors upon request.