Location: Plant Genetics Research
Title: Yield prediction through integration of genetic, environment, and management data through deep learning
WALLACE, JASON - University Of Georgia
SCHNABLE, JAMES - University Of Nebraska
KOLKMANN, JUDITH - Cornell University
ALACA, BORIS - Goettingen University
BEISSINGER, TIMOTHY - Goettingen University
ERTL, DAVID - Iowa Corn Promotion Board
GAGE, JOSEPH - North Carolina State University
HIRSCH, CANDICE - University Of Minnesota
Knoll, Joseph - Joe
DE LEON, NATALIA - University Of Wisconsin
LIMA, DAYANE - University Of Wisconsin
MORETA, DANILO - Cornell University
SINGH, MANINDER - Michigan State University
THOMPSON, ADDIE - Michigan State University
WELDEKIDAN, TECHLEMARIAM - University Of Delaware
Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/22/2022
Publication Date: 4/1/2023
Citation: Kick, D.R., Wallace, J.G., Schnable, J.C., Kolkmann, J.M., Alaca, B., Beissinger, T.M., Edwards, J.W., Ertl, D., Flint-Garcia, S.A., Gage, J.L., Hirsch, C.N., Knoll, J.E., de Leon, N., Lima, D.C., Moreta, D., Singh, M.P., Thompson, A., Weldekidan, T., Washburn, J.D. 2023. Yield prediction through integration of genetic, environment, and management data through deep learning. G3, Genes/Genomes/Genetics. 13(4). Article jkad006. https://doi.org/10.1093/g3journal/jkad006.
Interpretive Summary: Predicting crop yield for a given cultivar and location is impeded by interaction effects: cultivars do not always behave the same way in every environment. More accurate predictions could shorten cultivar development by helping genomic selection identify cultivars well suited to specific environments. Here we improve maize yield prediction accuracy by combining genetic, environmental, and management data, along with the interactions between these data types, in a deep learning model. On average, this model is more accurate than the other models tested, which included both machine learning models and linear models. Interactions between data types are key to the model's performance and change the relative importance of variables in the data. Additionally, we detail the model development process to help others build models for their crop of interest or improve upon this one.
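The idea of giving a model access to interactions between data types can be illustrated with explicit cross-type product features. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name `interaction_features`, the `G`/`E`/`M` prefixes, and the encoding of each data type as a flat list of numbers are all hypothetical. It keeps every raw feature as a main effect and adds the product of each pair of features drawn from different data types (genetic × environment, genetic × management, environment × management).

```python
from itertools import product


def interaction_features(genetic, environment, management):
    """Return main effects plus pairwise products across data types.

    Hypothetical helper: each argument is a flat list of numbers encoding
    one data type (G = genetic, E = environment, M = management).
    """
    groups = {"G": genetic, "E": environment, "M": management}
    features = {}
    # Main effects: every raw feature keeps its own named slot.
    for name, values in groups.items():
        for i, value in enumerate(values):
            features[f"{name}{i}"] = value
    # Interactions: products between features from *different* data types only.
    for a, b in [("G", "E"), ("G", "M"), ("E", "M")]:
        for (i, x), (j, y) in product(enumerate(groups[a]), enumerate(groups[b])):
            features[f"{a}{i}x{b}{j}"] = x * y
    return features
```

For example, `interaction_features([1.0, 0.0], [0.5], [2.0])` returns 9 features: 4 main effects plus 2 G×E, 2 G×M, and 1 E×M products. Explicit products like these are how a linear model can represent interactions; a deep network can instead learn them implicitly through its nonlinear layers.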
Technical Abstract: Accurate prediction of the phenotypic outcomes produced by different combinations of genotypes, environments, and management interventions remains a key goal in biology, with direct applications to agriculture, research, and conservation. The past decades have seen an expansion of new methods applied toward this goal. Here we predict maize yield using deep neural networks, compare the efficacy of two model development methods, and contextualize model performance using conventional linear and machine learning models. We examine the usefulness of incorporating interactions between disparate data types. We find that deep learning and best linear unbiased predictor (BLUP) models with interactions had the best overall performance. BLUP models achieved the lowest average error, but deep learning models performed more consistently with similar average error. Optimizing deep neural network submodules for each data type improved model performance relative to optimizing the whole model for all data types at once. Examining the effect of interactions in the best-performing model revealed that including interactions altered the model's sensitivity to weather and management features, including a reduction in the importance scores for timepoints expected to have a limited physiological basis for influencing yield: those at the extreme end of the season, nearly 200 days after planting. Based on these results, deep learning provides a promising avenue for the phenotypic prediction of complex traits in complex environments and a potential mechanism to better understand the influence of environmental and genetic factors.
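The comparison between optimizing a submodule for each data type and optimizing the whole model at once presumes a network with separate input branches. The sketch below shows one hypothetical way to wire such a network in plain Python, with assumptions throughout: the class `MultiBranchNet`, the `G`/`W`/`M` branch names (genetic, weather, management), the single ReLU layer per branch, the layer widths, and the random initialization are illustrative choices, not the architecture reported in the paper, and no training loop is included. Each branch encodes its own data type; the shared head then combines all branch outputs through a nonlinear hidden layer, which is where cross-type interactions can be represented.

```python
import random


def dense(x, weights, bias, relu=True):
    """Fully connected layer: y_j = sum_i x_i * W[j][i] + b_j, optionally ReLU."""
    out = []
    for w_row, b in zip(weights, bias):
        z = sum(xi * wi for xi, wi in zip(x, w_row)) + b
        out.append(max(0.0, z) if relu else z)
    return out


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases (illustrative choice)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiBranchNet:
    """Hypothetical sketch: one submodule per data type, then a shared head."""

    def __init__(self, sizes, hidden=4, seed=0):
        rng = random.Random(seed)
        # sizes maps a data-type name (e.g. "G", "W", "M") to its input width.
        self.order = sorted(sizes)
        self.branches = {name: init_layer(sizes[name], hidden, rng) for name in self.order}
        # The head sees every branch output at once, so its nonlinear hidden
        # layer can combine information across data types.
        self.head_hidden = init_layer(hidden * len(sizes), hidden, rng)
        self.head_out = init_layer(hidden, 1, rng)

    def forward(self, inputs):
        """Encode each data type in its own branch, then predict one value."""
        embeddings = []
        for name in self.order:
            weights, bias = self.branches[name]
            embeddings.extend(dense(inputs[name], weights, bias))
        h = dense(embeddings, *self.head_hidden)
        # Linear output layer (no ReLU) for a continuous target such as yield.
        return dense(h, *self.head_out, relu=False)[0]
```

Because each branch has its own parameters, its width or depth could in principle be tuned independently, which is the kind of per-data-type optimization contrasted with tuning the whole model at once. For brevity, this sketch gives every branch the same hidden width.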