|LIU, XIAOLEI - Huazhong Agricultural University|
|HUANG, MENG - Washington State University|
|FAN, BIN - Huazhong Agricultural University|
|Buckler, Edward - Ed|
|ZHANG, ZHIWU - Washington State University|
Submitted to: PLoS Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/3/2015
Publication Date: 2/1/2016
Citation: Liu, X., Huang, M., Fan, B., Buckler IV, E.S., Zhang, Z. 2016. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genetics. 12(3):e1005957.
Interpretive Summary: The natural variation in traits seen across a species is the product of thousands of genes working together, encoded by billions of base pairs of DNA. Identifying the functional variants requires genetic mapping and sophisticated mathematical approaches that relate trait variation to DNA variation. This study combined approaches that are powerful at modeling numerous small-effect variants with approaches suited to modeling large-effect genetic variants. The resulting approach shows substantial promise for identifying functional natural DNA variation while also being computationally efficient. This algorithm and software will be widely used to dissect traits across numerous species.
Technical Abstract: False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a Mixed Linear Model (MLM) with fixed and random effects that incorporates population structure and kinship among individuals to adjust association tests on markers; however, this adjustment also compromises true positives. A modified MLM method, the Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and used them iteratively. FEM contains the testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid over-fitting in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of the testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Analyses of both real and simulated data demonstrated that FarmCPU improves statistical power compared with current methods. An additional benefit is a computing time that is linear in both the number of individuals and the number of markers: a dataset with half a million individuals and half a million markers can now be analyzed within three days.
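To make the FEM/iteration structure described in the abstract concrete, the following is a minimal sketch under stated assumptions, not the published FarmCPU implementation. The FEM step follows the abstract: each marker is tested one at a time by ordinary least squares with the current associated markers (pseudo-QTNs) as covariates. The REM step is NOT reproduced. The published method selects associated markers with a random effect model whose kinship is defined by those markers. Here that step is replaced by a hypothetical, much simpler greedy filter that keeps the most significant marker per genomic bin below a Bonferroni-style cutoff. P-value unification is also omitted. All function names (`fem_scan`, `select_pseudo_qtns`, `farmcpu_sketch`), the bin size, the cutoff, and the normal approximation to the t test are illustrative assumptions.

```python
import math
import numpy as np


def fem_scan(y, X, pseudo_qtns):
    """Fixed-effect scan: test each marker with current pseudo-QTNs as covariates.

    Returns one two-sided p-value per column of X. Uses a normal
    approximation to the t distribution (assumption; reasonable for large n).
    """
    n, m = X.shape
    pvals = np.ones(m)
    for j in range(m):
        if np.var(X[:, j]) == 0:
            continue  # monomorphic marker: leave p = 1
        # Drop the tested marker from the covariates so the design stays full rank.
        covs = [X[:, k] for k in pseudo_qtns if k != j]
        Z = np.column_stack([np.ones(n)] + covs + [X[:, j]])
        beta, _, rank, _ = np.linalg.lstsq(Z, y, rcond=None)
        if rank < Z.shape[1]:
            continue  # marker collinear with covariates: leave p = 1
        resid = y - Z @ beta
        sigma2 = resid @ resid / (n - Z.shape[1])  # residual variance
        se = math.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[-1, -1])
        pvals[j] = math.erfc(abs(beta[-1] / se) / math.sqrt(2))
    return pvals


def select_pseudo_qtns(pvals, positions, bin_size, threshold, max_qtns):
    """Hypothetical stand-in for the REM step: keep the most significant
    marker in each genomic bin among markers passing the threshold."""
    best = {}
    for j, (p, pos) in enumerate(zip(pvals, positions)):
        if p >= threshold:
            continue
        b = int(pos) // bin_size
        if b not in best or p < pvals[best[b]]:
            best[b] = j
    chosen = sorted(best.values(), key=lambda j: pvals[j])[:max_qtns]
    return sorted(chosen)


def farmcpu_sketch(y, X, positions, bin_size=10_000, alpha=0.01,
                   max_qtns=10, max_iter=10):
    """Alternate the FEM scan and pseudo-QTN selection until the set stabilizes."""
    threshold = alpha / X.shape[1]  # Bonferroni-style cutoff (assumption)
    pseudo = []
    for _ in range(max_iter):
        pvals = fem_scan(y, X, pseudo)
        new = select_pseudo_qtns(pvals, positions, bin_size, threshold, max_qtns)
        if new == pseudo:
            break  # converged: same associated markers as the previous iteration
        pseudo = new
    return pvals, pseudo


# Usage on simulated data: two causal markers (columns 10 and 120).
rng = np.random.default_rng(1)
n, m = 400, 200
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = 1.5 * X[:, 10] + 1.0 * X[:, 120] + rng.normal(size=n)
pvals, qtns = farmcpu_sketch(y, X, positions=np.arange(m) * 1000)
print(qtns)
```

Because each FEM pass is one least-squares fit per marker, the cost of a pass in this sketch grows linearly with the number of markers. The sketch makes no attempt to reproduce the published method's kinship-based REM selection or its reported runtimes.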