
Title: Optimal designs for genomic selection in hybrid crops

Authors:
GUO, TINGTING - Iowa State University
YU, XIAOQING - Iowa State University
LI, XIANRAN - Iowa State University
ZHANG, HAOZHE - Iowa State University
ZHU, CHENGSONG - Iowa State University
Flint-Garcia, Sherry
McMullen, Michael
Holland, Jim
Szalma, Stephen
WISSER, RANDALL - University of Delaware
YU, JIANMING - Iowa State University

Submitted to: Molecular Plant
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/24/2018
Publication Date: 3/1/2019
Citation: Guo, T., Yu, X., Li, X., Zhang, H., Zhu, C., Flint-Garcia, S.A., McMullen, M.D., Holland, J.B., Szalma, S.J., Wisser, R., Yu, J. 2019. Optimal designs for genomic selection in hybrid crops. Molecular Plant. 12(3):390-401.

Interpretive Summary: Plant breeders select the best materials to advance in their breeding programs using a variety of tools and strategies, ranging from selection based on the phenotype (traits of interest) alone to the use of statistical models that predict the phenotype based only on genetic data. One such tool is referred to as "genomic selection." In genomic selection, a population of plants that have both genetic information and measured phenotypes, referred to as a "training population," is used to "train" or test a statistical model to generate reliable predictions. The trained statistical model is then used to predict the phenotypes of individuals in a breeding population based only on the available genetic information, without waiting for the plants to exhibit the phenotype. This speeds up the breeding process because the plants do not need to be evaluated in multi-year field trials. The individuals with the best predicted phenotypes are advanced in the breeding program. The accuracy of genomic selection depends on a number of factors, including the choice of the training population. In this study, three different mathematical methods for choosing the training population were compared to the current practice of sampling plants at random. The mathematical methods were applied to a novel dataset generated for corn, and to existing datasets for wheat and rice. All three mathematical methods improved the accuracy of genomic selection over random sampling and required smaller training sets, both of which save money and resources in a breeding program. The results of this study will be useful to plant breeding programs in both the private and public sectors as they seek to make crop improvement more efficient.

Technical Abstract: Novel strategies and effective tools in crop improvement are essential for sustainable food production. Genomic selection is enabled by the improved capacity in genomics and biotechnology. However, whether data mining can improve prediction efficiency and offer new insights into breeding program design has not been examined in detail. Here we show that representative subset selection can be applied to training set design to enhance performance prediction of hybrids. Specifically, maximization of connectedness and diversity (MaxCD) was developed from the genetic mating scheme perspective by exploring patterns in genomic relationships and phenotypic variation. Partitioning around medoids (PAM) and fast and unique representative subset selection (FURS) were introduced from cluster analysis and graphic network analysis to the genomic prediction context. These three training set designs outperformed random sampling in prediction accuracy across three traits evaluated for a set of 276 maize hybrids. Similarly, analyses with 2,556 wheat hybrids from an early-stage hybrid breeding system and 1,439 rice hybrids from an established hybrid breeding system validated the advantages of the new methods. With representative subset selection, effective genomic prediction models can be established with a training set that is 2% to 13% of the size of the whole set. Two criteria, connectedness and diversity, were quantified to explain performance comparisons with random sampling. Enhanced by design concepts, genomic selection may reshape the plant breeding pipeline by enabling the efficient exploration of the enormous inference space of hybrid combinations. Research in data mining and design optimization can offer additional guidelines to streamline the plant breeding process.
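
Illustrative sketch: The abstract names partitioning around medoids (PAM) as one way to choose a representative training set. The Python below is a minimal sketch of that general idea, not the authors' implementation. It runs a simple swap-based k-medoids search (in the spirit of PAM, without its usual BUILD initialization) over a pairwise distance matrix and returns the chosen medoids as the training set. The function name `pam_training_set`, its parameters, and the toy distance matrix are all hypothetical; in practice the distances would presumably be derived from genomic relationships among hybrids, which is an assumption here.

```python
import random


def pam_training_set(dist, k, seed=0, max_iter=100):
    """Choose k medoids from a symmetric distance matrix (list of lists).

    Hypothetical helper: the medoids stand in for the individuals that
    would be phenotyped as the training set.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of individuals")

    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # random start (assumption, not PAM's BUILD step)

    def total_cost(meds):
        # Sum over all individuals of the distance to their nearest medoid.
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    best = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try swapping one medoid for a non-medoid candidate.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(trial)
                if cost < best:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
        if not improved:
            break  # no single swap lowers the cost: local optimum
    return sorted(medoids), best


# Toy data: six individuals in two well-separated groups on a line.
points = [0, 1, 2, 50, 51, 52]
dist = [[abs(a - b) for b in points] for a in points]

training_ids, cost = pam_training_set(dist, k=2)
print(training_ids, cost)
```

On this toy input the search settles on the central individual of each group (indices 1 and 4, total cost 4), which is the sense in which a medoid-based training set is "representative." Random sampling, by contrast, could draw both individuals from the same group.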