HESLOT, NICOLAS - Cornell University
SORRELLS, MARK - Cornell University
Submitted to: Crop Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/12/2012
Publication Date: 4/16/2013
Citation: Heslot, N., Jannink, J., Sorrells, M. 2013. Using genomic prediction to characterize environments and optimize prediction accuracy in applied breeding data. Crop Science. 53(3):921-933.
Interpretive Summary: Genomic selection (GS) methods estimate marker effects in order to build a model that predicts phenotypes. Considering the replication of markers across experiments, rather than the replication of breeding lines, gives a new way to cope with unbalanced trials in which not all breeding lines are evaluated in all environments. Using a two-row elite barley population tested for grain yield across Europe from 2007 to 2010, we compared the genome-wide marker by environment interaction with the breeding line by environment interaction (G*E). These two interactions were significantly related. The analyses enabled the identification of outlier environments that were atypical of the overall set of environments in which experiments were conducted. A new method was developed to optimize the set of environments used to develop the genomic selection prediction model. This method identifies less predictive environments and removes them from the set. Using this approach with the barley dataset, prediction accuracy increased from 0.54 to 0.61 while controlling overfitting and focusing the prediction on the environments that the breeder is targeting.
Technical Abstract: Simulation and empirical studies of genomic selection (GS) show accuracies sufficient to generate rapid annual genetic gains. Genomic selection also shifts the focus from the evaluation of lines to the evaluation of alleles. Consequently, new methods should be developed to optimize the use of large historic multi-environment trials for genomic selection. Considering the replication of alleles rather than of lines also provides a new way to cope with unbalanced phenotypic datasets. Using a two-row elite barley population tested for grain yield across Europe from 2007 to 2010, we compared the genome-wide allele by environment interaction with the genotype by environment interaction (G*E). We characterized allele effect estimates at each test location and used them to identify outlier environments. The prediction accuracy between environments was significantly correlated with the genetic correlations between environments, suggesting that the genotype by environment interaction is of the same nature for genomic and phenotypic selection and thus calls for the same solutions. Allele effects were as stable across environments as line performance, suggesting that G*E is not a greater issue when shifting from phenotypic to genomic selection. A new method was developed to optimize the composition of the training population for prediction in the target population of environments (TPE). This method does not search for mega-environments, but instead identifies less predictive environments and removes them from the set of environments used to train the model. Using this approach with the barley dataset, cross-validated accuracy increased from 0.54 to 0.61 while controlling overfitting and focusing the prediction on the TPE.
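
Illustrative sketch: the abstract does not give the details of the environment-removal procedure, so the Python code below is only one plausible reading of the idea, not the authors' method. It assumes ridge regression as a stand-in for the genomic prediction model, a greedy backward elimination of training environments, and simulated marker and phenotype data. All function names (ridge_effects, tpe_accuracy, prune_environments, simulate), the min_gain threshold, and the simulated environments are hypothetical.

```python
import numpy as np

def ridge_effects(X, y, lam=1.0):
    """Ridge-regression marker effects (assumed stand-in for a GS model)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def tpe_accuracy(train_envs, target_envs, data, lam=1.0):
    """Mean predictive correlation over the target environments.

    Each target is predicted from the training environments other than
    itself, so no environment is scored on its own phenotypes.
    """
    cors = []
    for t in target_envs:
        rest = [e for e in train_envs if e != t]
        X = np.vstack([data[e][0] for e in rest])
        # Center phenotypes within each environment to remove environment means.
        y = np.concatenate([data[e][1] - data[e][1].mean() for e in rest])
        pred = data[t][0] @ ridge_effects(X, y, lam)
        cors.append(np.corrcoef(pred, data[t][1])[0, 1])
    return float(np.mean(cors))

def prune_environments(candidates, target_envs, data, min_gain=0.01):
    """Greedy backward elimination (an assumption, not the published algorithm).

    At each step, drop the training environment whose removal raises accuracy
    the most; stop when the best gain is below min_gain.
    """
    kept = list(candidates)
    best = tpe_accuracy(kept, target_envs, data)
    while len(kept) > 2:  # keep at least two environments for training
        trials = [(tpe_accuracy([e for e in kept if e != d], target_envs, data), d)
                  for d in kept]
        acc, drop = max(trials)
        if acc - best < min_gain:
            break
        kept.remove(drop)
        best = acc
    return kept, best

def simulate(seed=0, n_lines=60, n_markers=20):
    """Simulated data: five environments share marker effects, one does not."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=n_markers)
    data = {}
    for name in ["env1", "env2", "env3", "env4", "env5", "outlier"]:
        # Different lines in each environment, i.e. an unbalanced design.
        X = rng.integers(-1, 2, size=(n_lines, n_markers)).astype(float)
        effects = 3.0 * rng.normal(size=n_markers) if name == "outlier" else beta
        data[name] = (X, X @ effects + rng.normal(size=n_lines))
    return data

data = simulate()
targets = ["env1", "env2", "env3", "env4", "env5"]  # hypothetical TPE
candidates = targets + ["outlier"]
acc_before = tpe_accuracy(candidates, targets, data)
kept, acc_after = prune_environments(candidates, targets, data)
print(kept, round(acc_before, 2), round(acc_after, 2))
```

In this sketch the target environments stay fixed while the training set shrinks, so accuracies before and after pruning are measured on the same targets. The min_gain threshold is a crude guard against removing environments on the basis of noise; the abstract's reference to controlling overfitting suggests the published method handles this more carefully.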