EVALUATING GENOMIC SELECTION FOR APPLIED PLANT BREEDING
Plant, Soil and Nutrition Research
2011 Annual Report
1a. Objectives (from AD-416)
1. Select progeny based on genomic selection (GS) and phenotypic selection and compare their performance in subsequent field trials.
2. Assess the ability of GS to predict the true breeding value of a parent.
3. Determine whether a trained GS model maintains accuracy over cycles of breeding.
4. Use simulations to assess scenarios for the introduction and implementation of GS in a breeding program to optimize short- and long-term success over cycles of GS.
1b. Approach (from AD-416)
Cooperating PIs will run field experiments on sets of progeny to obtain phenotypic data; they will also extract DNA and obtain marker data. We will analyze these data using several genomic selection methods to generate predictions of breeding and genetic values based on marker data. In parallel, we will use these data, together with data available from historical trials, to estimate breeding and genetic values directly from phenotype and pedigree data. We will then correlate the marker-based predictions with these directly estimated values to evaluate the accuracy of genomic selection.
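The validation step described above can be illustrated with a minimal sketch. It is not the project's analysis pipeline: it assumes simulated marker genotypes coded -1/0/1, a plain ridge regression on marker effects as a stand-in for the GS methods, an arbitrary penalty (lam=1.0), and a single train/validation split. All function names and parameter values are hypothetical. Accuracy is taken as the Pearson correlation between marker-based predictions and observed values in the validation set.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_marker_effects(X, y, lam):
    """Ridge estimates of marker effects: solve (X'X + lam*I) beta = X'y on centered data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    xtx = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(xtx, xty), col_means, y_mean

def predict(X, beta, col_means, y_mean):
    """Predicted value = training mean + sum of centered genotype * marker effect."""
    return [y_mean + sum((row[j] - col_means[j]) * beta[j] for j in range(len(beta)))
            for row in X]

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

# Simulated data (assumption): 120 lines, 20 markers, additive effects plus noise.
rng = random.Random(1)
n_lines, n_markers = 120, 20
true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]
genotypes = [[rng.choice((-1, 0, 1)) for _ in range(n_markers)] for _ in range(n_lines)]
phenotypes = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0.0, 2.0)
              for row in genotypes]

# Train on the first 80 lines, validate on the remaining 40.
X_train, y_train = genotypes[:80], phenotypes[:80]
X_val, y_val = genotypes[80:], phenotypes[80:]

beta, col_means, y_mean = ridge_marker_effects(X_train, y_train, lam=1.0)
predicted = predict(X_val, beta, col_means, y_mean)
accuracy = pearson(predicted, y_val)
print(f"prediction accuracy (r) = {accuracy:.2f}")
```

In practice the single split would typically be replaced by repeated cross-validation, and the penalty would be estimated from the data rather than fixed.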
Simulation and empirical studies of genomic selection (GS) show accuracies sufficient to generate rapid genetic gains. However, as GS approaches have grown in popularity, numerous models have been proposed, and no comparative analysis has been available to identify the most promising ones. Using eight datasets from wheat, barley, Arabidopsis, and maize, we evaluated the predictive ability of currently available GS models along with several machine learning methods. Many models reached a similar level of accuracy, although computation times differed considerably and effect estimates often differed as well. Our comparisons suggested that GS in plant breeding programs could be based on a small set of models such as the Bayesian Lasso, weighted Bayesian shrinkage regression, reproducing kernel Hilbert spaces regression, and random forest (a machine learning method that could capture non-additive effects). Our attempts to combine different models did not improve accuracy. We found large differences in accuracy between subpopulations within datasets that were not easy to explain. The broad diversity of empirical datasets tested added evidence that GS could increase genetic gain per unit of time and cost.
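Two of the comparisons mentioned above, combining models and checking accuracy within subpopulations, can be sketched as follows. This is an illustration only: the numbers are made-up toy values, not results from the eight datasets, the "combination" is assumed to be a simple unweighted average of two models' predictions, and all names are hypothetical.

```python
from statistics import mean

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def average_predictions(*prediction_sets):
    """Combine models by taking the unweighted mean prediction for each line."""
    return [mean(values) for values in zip(*prediction_sets)]

def accuracy_by_group(predicted, observed, groups):
    """Prediction accuracy computed separately within each subpopulation label."""
    result = {}
    for label in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == label]
        result[label] = pearson([predicted[i] for i in idx],
                                [observed[i] for i in idx])
    return result

# Toy values (assumption): observed phenotypes, predictions from two
# hypothetical models, and a subpopulation label for each line.
observed = [4.1, 5.0, 3.2, 6.3, 5.5, 2.9, 4.8, 6.0]
model_a = [4.0, 4.6, 3.9, 5.8, 5.1, 3.5, 4.4, 5.2]
model_b = [4.5, 5.2, 3.0, 5.5, 5.9, 3.8, 4.1, 5.6]
subpop = ["A", "A", "A", "A", "B", "B", "B", "B"]

combined = average_predictions(model_a, model_b)

# Overall accuracy of each model and of the combination.
for name, pred in (("model_a", model_a), ("model_b", model_b), ("combined", combined)):
    print(name, round(pearson(pred, observed), 3))

# Accuracy of the combined predictions within each subpopulation.
within = accuracy_by_group(combined, observed, subpop)
print(within)
```

Reporting accuracy both overall and within subpopulations makes it easier to see whether an apparently good overall correlation is driven mainly by differences between subpopulation means.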