
Research Project: Genetic and Physiological Mechanisms Underlying Complex Agronomic Traits in Grain Crops

Location: Plant Genetics Research

Title: Ensemble of BLUP, machine learning, and deep learning models predict maize yield better than each model alone

Authors: Kick, Daniel; Washburn, Jacob

Submitted to: bioRxiv
Publication Type: Pre-print Publication
Publication Acceptance Date: 4/2/2023
Publication Date: 4/2/2023
Citation: Kick, D.R., Washburn, J.D. 2023. Ensemble of BLUP, machine learning, and deep learning models predict maize yield better than each model alone. bioRxiv. Article bioRxiv 2023.03.30.532932.

Interpretive Summary: More accurately predicting how a crop variety will perform in an environment enables faster development of new crop varieties with favorable characteristics such as drought resistance or higher yield. Much focus has been given to what type of predictive model best represents the complex interactions between genes and environmental conditions. We show that when multiple models are used together, the predictions are often better than those of the separate models. This work supports efforts to increase the quality and quantity of agricultural products by increasing the speed and accuracy of crop improvement.

Technical Abstract: Predicting phenotypes accurately from genomic, environmental, and management factors is key to accelerating the development of novel cultivars with desirable traits. Furthermore, inclusion of management and environmental factors enables in silico studies to predict the effect of specific management interventions or future climates. Despite the value such models would confer, much work remains to improve the accuracy of phenotypic predictions. Rather than advocate for a specific modeling strategy, here we demonstrate that combining predictions from disparate models using simple ensemble approaches can result in better accuracy than the models achieve on their own. Using published models containing genomic, environmental, and management effects to predict maize yield, we investigate different strategies to create model ensembles. We find that ensembling generally improves performance, even when using only two models. The number and type of models included alter accuracy, with improvements diminishing as the number of model replicates increases. We find that the best-performing ensemble was an average of predictions, weighted by the inverse of each model’s expected error, drawn from best linear unbiased predictors, linear fixed effects models, deep learning models, and select machine learning models.
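
The weighting scheme named in the abstract, an average of predictions weighted by the inverse of each model's expected error, can be illustrated with a minimal Python sketch. This is not the authors' code. The function name, the array layout, and the example numbers are hypothetical, and the sketch assumes that "expected error" is a single positive value per model (for instance, an RMSE estimated on held-out data), which the abstract does not specify.

```python
import numpy as np


def inverse_error_weighted_average(predictions, expected_errors):
    """Combine model predictions, weighting each model by 1 / expected error.

    predictions:     array-like, shape (n_models, n_samples)
    expected_errors: array-like, shape (n_models,), strictly positive
                     (assumed here to be something like a held-out RMSE)
    """
    preds = np.asarray(predictions, dtype=float)
    errors = np.asarray(expected_errors, dtype=float)
    if preds.ndim != 2 or errors.shape != (preds.shape[0],):
        raise ValueError("need predictions of shape (n_models, n_samples) "
                         "and one expected error per model")
    if np.any(errors <= 0):
        raise ValueError("expected errors must be strictly positive")

    # Models with smaller expected error receive larger weights.
    weights = 1.0 / errors
    # Normalize so the weights sum to 1 and the result stays on the yield scale.
    weights /= weights.sum()
    # Weighted sum across models for every sample.
    return weights @ preds


# Hypothetical example: two models predicting yield for two plots.
example_predictions = [[10.0, 12.0],   # model A
                       [14.0, 16.0]]   # model B
example_errors = [1.0, 3.0]            # model A is assumed more accurate
ensemble = inverse_error_weighted_average(example_predictions, example_errors)
```

With these made-up inputs the normalized weights are 0.75 and 0.25, so the ensemble prediction sits closer to model A. When all expected errors are equal, the function reduces to a plain unweighted mean of the model predictions.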