Location: Plant, Soil and Nutrition Research
Project Number: 8062-21000-038-00-D
Project Type: In-House Appropriated
Start Date: Apr 1, 2013
End Date: Mar 31, 2018
1: Develop improved methods to predict per se and progeny performance based on DNA marker data.
1.A. Develop approaches to analyze historical, multi-year, unbalanced data that estimate and maximize future prediction accuracy.
1.B. Develop methods to simultaneously estimate the effects of 100,000 markers or more.
1.C. Develop methods to use genetic data to prioritize lines to be phenotyped in view of maximizing future prediction accuracy.
2: Develop new breeding schemes that leverage genomic data to optimally balance short- and long-term genetic gain.
2.A. Develop methods to optimize the tradeoff between family number and family size in a genomic selection pedigree breeding program.
2.B. Develop methods to maximize the capture of beneficial alleles and minimize the retention of deleterious alleles during introgression using genomic selection.
2.C. Develop and simulate methods that optimize gain from selection and maintenance of favorable diversity in the population.
3: Develop improved tools accessible to public sector breeding programs that facilitate the use of genomic selection methods.
3.A. Develop an R package providing a unified interface for several important genomic prediction methods including additive and non-additive predictors.
3.B. Integrate components of genomic prediction (phenotype analysis; marker imputation; genomic prediction) into an online data management and analysis tool.
4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.
Develop Bayesian analogues of the factor analytic models and extend them for genomic prediction. The Bayesian formulations will enable model averaging, thereby minimizing required user input. Analysis output will be processed to increase its interpretability. Modify these models to fit random marker effects, including for cases when many marker effects are missing. Test hypotheses about the value, for genomic prediction accuracy of selection candidates, of the generalized coefficient of determination (GCD) of the mean contrast between the parents of those candidates and the training population mean. Develop an algorithm to estimate the GCD of the expected mean contrast between the selection candidates themselves and the training population mean, using the distribution of possible candidate genotypes constructed from knowledge of their parental genotypes. Extend the GCD approach to cases where marker effects are not all assumed to come from the same normal distribution.
Develop new breeding schemes that leverage genomic data to optimally balance short- and long-term genetic gain. Develop optimization algorithms that use genetic variance components, economic costs and budgets, and logistical constraints to compute a plan specifying the number of crosses to be made, the number of lines to develop per cross, and the fraction of lines to genotype, so as to maximize gain from selection in a multitrait or polygenic trait context. Test hypotheses about the value of whole-population marker imputation accuracy as a predictor of whether multi- or specific-population training should be used to predict genotypic or breeding values of progeny admixed by introgression of exotic germplasm. Test two methods to retain diversity at the breeding program level during genomic selection: one that works to minimize relatedness in the selected set and one that incorporates an estimate of the genetic potential of the selected set. In conjunction with these methods, estimate chromosome segment-specific levels of repulsion phase linkage disequilibrium to select individuals carrying effective recombination events.
Assemble these methods of genomic prediction into a single package in the free, open-source statistical language R to facilitate user access to the methods. Integrate the methods into a database constructed specifically to house breeding line performance and genotype data. Develop download methods and visualizations for analysis results.
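The idea of retaining diversity by limiting relatedness in the selected set can be illustrated with a small sketch. This is not the project's method: the greedy heuristic, the linear penalty, the function names (`mean_coancestry`, `greedy_select`), the `penalty` parameter, and the toy data are all assumptions made for illustration. The sketch trades the mean predicted value of the selected set against its average pairwise relationship.

```python
from typing import List, Sequence


def mean_coancestry(selected: Sequence[int],
                    kinship: Sequence[Sequence[float]]) -> float:
    """Average of all pairwise relationship coefficients in the set,
    self-pairs included."""
    n = len(selected)
    total = sum(kinship[i][j] for i in selected for j in selected)
    return total / (n * n)


def greedy_select(values: Sequence[float],
                  kinship: Sequence[Sequence[float]],
                  n_select: int,
                  penalty: float) -> List[int]:
    """Greedily build a selected set (hypothetical heuristic).

    At each step, add the candidate that maximizes
        mean(predicted value) - penalty * mean coancestry
    of the resulting set. penalty = 0 reduces to truncation selection.
    """
    selected: List[int] = []
    remaining = set(range(len(values)))
    while len(selected) < n_select and remaining:
        def score(candidate: int) -> float:
            trial = selected + [candidate]
            mean_value = sum(values[i] for i in trial) / len(trial)
            return mean_value - penalty * mean_coancestry(trial, kinship)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented): candidates 0 and 1 are close relatives with the
# highest predicted values; candidates 2 and 3 are unrelated to everyone.
toy_values = [2.0, 1.9, 1.5, 1.0]
toy_kinship = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

no_penalty = greedy_select(toy_values, toy_kinship, n_select=2, penalty=0.0)
with_penalty = greedy_select(toy_values, toy_kinship, n_select=2, penalty=2.0)
print(no_penalty, with_penalty)
```

With no penalty the two related top candidates are chosen; with a sufficiently large penalty the second pick shifts to an unrelated candidate at some cost in mean predicted value. A greedy search does not guarantee an optimal set, and the appropriate weight on relatedness would have to be determined by simulation or by the breeding program's goals.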