Research Project: Genetic and Physiological Mechanisms Underlying Complex Agronomic Traits in Grain Crops

Location: Plant Genetics Research

Title: Yield prediction through integration of genetic, environment, and management data through deep learning

Authors:
Kick, Daniel
WALLACE, JASON - University Of Georgia
SCHNABLE, JAMES - University Of Nebraska
KOLKMANN, JUDITH - Cornell University
BORIS, ALACA - Goettingen University
BEISSINGER, TIMOTHY - Goettingen University
IRTL, DAVID - Iowa Corn Promotion Board
Flint-Garcia, Sherry
GAGE, JOSEPH - North Carolina State University
HIRSCH, CANDICE - University Of Minnesota
Knoll, Joseph - Joe
DE LEON, NATALIA - University Of Wisconsin
LIMA, DAYANE - University Of Wisconsin
MORETA, DANILO - Cornell University
SINGH, MANINDER - Michigan State University
WELDEKIDAN, TECHLEMARIAN - University Of Delaware
Washburn, Jacob

Submitted to: bioRxiv
Publication Type: Pre-print Publication
Publication Acceptance Date: 7/30/2022
Publication Date: 7/30/2022
Citation: Kick, D.R., Wallace, J.G., Schnable, J.C., Kolkmann, J.M., Boris, A., Beissinger, T.M., Irtl, D., Flint Garcia, S.A., Gage, J.L., Hirsch, C.N., Knoll, J.E., De Leon, N., Lima, D.C., Moreta, D., Singh, M.P., Weldekidan, T., Washburn, J.D. 2022. Yield prediction through integration of genetic, environment, and management data through deep learning. bioRxiv. Article bioRxiv 2022.07.29.502051.

Interpretive Summary: Predicting crop yield for a given cultivar and location is impeded by interaction effects: cultivars don't always behave the same in each environment. More accurate predictions could reduce the time needed to develop new cultivars through genomic selection and help identify cultivars well suited to specific environments. Here we improve maize yield prediction accuracy by using genetic, environmental, and management data, along with interactions between these data types, in a deep learning model. On average, this model is more accurate than the other models tested, which included both machine learning models and linear models. Interactions between data types are key to this model's performance and change the importance of variables in the data. Additionally, we detail the process of model development to aid others in creating models for their crop of interest or improving upon this model.

Technical Abstract: Accurate prediction of an organism’s phenotype for combinations of genotypes, environments, and management interventions remains a key goal in biology, with direct applications to agriculture, research, and conservation. The past decade has seen an expansion of the methods applied towards this aim. Here we predict maize yield using deep neural networks, compare the efficacy of two model development methods, contextualize model performance using linear and machine learning models, and examine the usefulness of incorporating interactions between disparate data types. From the best-performing model, we discuss the influence of interactions between data types on the salience of features in the data set.
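
To make the idea of "interactions between disparate data types" concrete, the following is a minimal sketch of one common way such a network can be laid out: a small sub-network per data type, followed by shared layers that see all three representations at once. It is not the architecture used in the pre-print. The layer sizes, the function names (`dense`, `init_layer`, `predict_yield`), the random inputs, and the untrained weights are all assumptions made for illustration.

```python
import random


def dense(x, weights, biases, relu=True):
    """Fully connected layer. x: list of floats; weights: n_in x n_out lists."""
    out = []
    for j, b in enumerate(biases):
        z = b + sum(xi * row[j] for xi, row in zip(x, weights))
        out.append(max(0.0, z) if relu else z)
    return out


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def predict_yield(g, e, m, params):
    """Forward pass for one observation (hypothetical layout)."""
    # One sub-network per data type.
    h_g = dense(g, *params["genetic"])
    h_e = dense(e, *params["environment"])
    h_m = dense(m, *params["management"])
    # Concatenate the three representations; the shared layer that follows
    # is where cross-type interactions could be learned.
    joint = h_g + h_e + h_m
    h = dense(joint, *params["interaction"])
    # Linear output: a single predicted yield value.
    return dense(h, *params["output"], relu=False)[0]


rng = random.Random(0)
n_g, n_e, n_m, n_hidden = 20, 6, 3, 8  # assumed feature counts

params = {
    "genetic": init_layer(rng, n_g, n_hidden),
    "environment": init_layer(rng, n_e, n_hidden),
    "management": init_layer(rng, n_m, n_hidden),
    "interaction": init_layer(rng, 3 * n_hidden, n_hidden),
    "output": init_layer(rng, n_hidden, 1),
}

# Synthetic inputs: marker-dosage-like genotype codes, plus standardized
# environment and management covariates. None of this is real data.
g = [rng.choice([0.0, 1.0, 2.0]) for _ in range(n_g)]
e = [rng.gauss(0.0, 1.0) for _ in range(n_e)]
m = [rng.gauss(0.0, 1.0) for _ in range(n_m)]

yhat = predict_yield(g, e, m, params)
print(type(yhat).__name__)
```

Because the weights are random and untrained, the predicted value itself carries no meaning; the sketch only shows where the three data types meet. A purely additive alternative would sum three separate per-type predictions and could not represent a cultivar responding differently across environments.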