Author
AKDEMIR, DENIZ - Cornell University
SANCHEZ, JULIO ISODRO - Cornell University
Jannink, Jean-Luc
Submitted to: Genetics Selection Evolution
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/30/2015
Publication Date: 5/6/2015
Publication URL: https://doi.org/10.1186/s12711-015-0116-6
Citation: Akdemir, D., Sanchez, J., Jannink, J. 2015. Optimization of genomic selection training populations with a genetic algorithm. Genetics Selection Evolution. 47:38.

Interpretive Summary: In genomic selection, the genotypes of the selection candidates whose value we need to predict are known, so one potential way to improve prediction accuracy is to select the most appropriate training population. We derived a computationally efficient statistic to estimate this accuracy. To use that statistic for training population design, we adopted a genetic algorithm that selects a subset from a larger set of individuals with genotypes and phenotypes. Our results show that, compared to a random sample of the same size, our method generates models with better accuracies. We implemented the proposed training population design method on four datasets from Arabidopsis, wheat, rice and maize. Validation in those datasets showed improved performance of genomic selection models.

Technical Abstract: In this article, we derive a computationally efficient statistic to measure the reliability of estimates of genetic breeding values for a fixed set of genotypes based on a given training set of genotypes and phenotypes. We adopt a genetic algorithm scheme to find a training set of a given size, drawn from a larger set of candidate genotypes, that optimizes this reliability measure. Our results show that, compared to a random sample of the same size, phenotyping individuals selected by our method results in models with better accuracies. We implement the proposed training selection methodology on four data sets from Arabidopsis, wheat, rice and maize. A dynamic model building process that takes genotypes of the individuals in the test sample into account while selecting the training individuals improves the performance of genomic selection models.
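
Illustrative sketch: the selection scheme described in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the reliability statistic, so the code substitutes a stand-in criterion: the mean prediction error variance (up to a constant) of the test genotypes under ridge regression on marker scores, where lower values are taken to mean higher reliability. The function names `pev_mean` and `select_training_set`, the shrinkage parameter `lam`, and all genetic algorithm settings (population size, elite count, mutation rate) are hypothetical choices made for illustration.

```python
import numpy as np

def pev_mean(train_idx, X, test_idx, lam=1.0):
    """Stand-in criterion (an assumption, not the paper's statistic):
    mean of diag(X_test (X_train' X_train + lam*I)^-1 X_test')."""
    Xtr, Xte = X[train_idx], X[test_idx]
    A = Xtr.T @ Xtr + lam * np.eye(X.shape[1])
    sol = np.linalg.solve(A, Xte.T)              # A^-1 X_test'
    return float(np.mean(np.sum(Xte.T * sol, axis=0)))

def select_training_set(X, candidates, test_idx, n_train, pop_size=30,
                        n_gen=50, n_elite=5, mut_rate=0.1, lam=1.0, seed=0):
    """Simple elitist genetic algorithm over fixed-size subsets of candidates."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    # Initial population: random training sets of the requested size.
    pop = [rng.choice(candidates, size=n_train, replace=False)
           for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = np.array([pev_mean(s, X, test_idx, lam) for s in pop])
        elite = [pop[i] for i in np.argsort(scores)[:n_elite]]
        children = []
        while len(children) < pop_size - n_elite:
            # Crossover: sample a child from the union of two elite parents.
            i, j = rng.choice(n_elite, size=2, replace=False)
            pool = np.union1d(elite[i], elite[j])
            child = rng.choice(pool, size=n_train, replace=False)
            # Mutation: swap one member for a candidate not yet in the child.
            if rng.random() < mut_rate:
                outside = np.setdiff1d(candidates, child)
                if outside.size:
                    child[rng.integers(n_train)] = rng.choice(outside)
            children.append(child)
        pop = elite + children                   # elites survive unchanged
    scores = np.array([pev_mean(s, X, test_idx, lam) for s in pop])
    best = int(np.argmin(scores))
    return np.sort(pop[best]), float(scores[best])

# Usage on simulated marker data (80 individuals, 10 markers).
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 10))
candidates, test_idx = np.arange(60), np.arange(60, 80)
chosen, score = select_training_set(X, candidates, test_idx, n_train=15)
```

Because the criterion is evaluated on the test genotypes, the selected training set depends on which individuals are to be predicted, which mirrors the dynamic model building idea in the abstract. The elitist design guarantees only that the best score never worsens between generations; it does not guarantee a global optimum.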