|BIAN, YANG - North Carolina State University|
|Holland, Jim - Jim|
Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/9/2015
Publication Date: 10/30/2015
Citation: Bian, Y., Holland, J.B. 2015. Ensemble learning of QTL models improves prediction of complex traits. G3, Genes/Genomes/Genetics. 5:2073-2084.
Interpretive Summary: Genetic markers have been used to identify genomic regions containing genes that affect complex, quantitatively measured traits, such as plant height and flowering time in corn. Modern genomics techniques allow scientists to tag plant genomes with thousands of markers, which is expected to help the search for quantitative trait loci (QTL). However, a statistical difficulty occurs when many more genetic markers are used than there are individuals in the testing population, and the predictive ability of the marker-trait associations is then poor. We created a method to avoid the problem of ‘overfitting’ too many markers in genetic prediction models. The method is related to the general approach of ‘ensemble’ prediction models, but we take advantage of some specific known characteristics of genome maps to optimize the approach for predicting quantitative traits. We show that our new method (called ‘TAGGING’) outperforms standard QTL mapping methods for prediction of complex traits in maize.
Technical Abstract: Quantitative trait locus (QTL) models can provide useful insights into trait genetic architecture because of their straightforward interpretability, but are less useful for genetic prediction because it is difficult to include the effects of numerous small-effect loci without overfitting. Tight linkage between markers introduces near collinearity among marker genotypes, complicating detection of QTL and estimation of QTL effects in linkage mapping, and this problem is exacerbated by very high-density linkage maps. Here we developed a thinning and aggregating (TAGGING) method as a new ensemble learning approach to QTL mapping. TAGGING reduces collinearity problems by thinning dense linkage maps, maintains aspects of marker selection that characterize standard QTL mapping, and, by ensembling, incorporates information from many more marker-trait associations than traditional QTL mapping. The objective of TAGGING was to improve prediction power compared to QTL mapping while also providing more specific insights into genetic architecture than genome-wide prediction models. TAGGING was compared to standard QTL mapping using cross validation of empirical data from the maize (Zea mays L.) nested association mapping population. TAGGING-assisted QTL mapping substantially improved prediction ability for both biparental and multi-family populations by reducing both the variance and bias in prediction. Furthermore, an ensemble model combining predictions from TAGGING-assisted QTL and infinitesimal models improved prediction abilities over the component models, indicating some complementarity between model assumptions and suggesting that some trait genetic architectures involve a mixture of a few major QTL and polygenic effects.
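The thin-then-aggregate idea described in the abstract can be illustrated with a minimal Python sketch. It is not the published implementation: the abstract does not specify how maps are thinned or how QTL are selected, so the interleaved thinning scheme, the greedy forward selection used as a stand-in for QTL mapping, and the names `thin_maps`, `forward_select`, `tagging_predict`, `k`, and `max_terms` are all assumptions made for illustration.

```python
import numpy as np

def thin_maps(n_markers, k):
    """Split marker indices into k interleaved subsets (assumed thinning scheme)."""
    return [np.arange(start, n_markers, k) for start in range(k)]

def forward_select(X, y, max_terms=5):
    """Greedy forward selection by residual sum of squares.

    A simple stand-in for QTL model selection, not the procedure used in the paper.
    Returns the chosen column indices and the fitted coefficients (intercept first).
    """
    n = X.shape[0]
    chosen = []
    for _ in range(min(max_terms, X.shape[1])):
        best, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            if rss < best_rss:
                best, best_rss = j, rss
        chosen.append(best)
    A = np.column_stack([np.ones(n), X[:, chosen]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return chosen, beta

def tagging_predict(X_train, y_train, X_test, k=4, max_terms=5):
    """Fit one sparse model per thinned map and average their predictions."""
    preds = []
    for idx in thin_maps(X_train.shape[1], k):
        chosen, beta = forward_select(X_train[:, idx], y_train, max_terms)
        A = np.column_stack([np.ones(X_test.shape[0]), X_test[:, idx][:, chosen]])
        preds.append(A @ beta)
    # Aggregating step: the ensemble prediction is the mean over thinned maps.
    return np.mean(preds, axis=0)
```

Each thinned map contains only every k-th marker, so adjacent, tightly linked markers never compete within the same model, while averaging across the k maps lets the ensemble draw on marker-trait associations from the full map.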