Submitted to: Agronomy Journal
Publication Type: Peer reviewed journal
Publication Acceptance Date: 10/24/2006
Publication Date: 4/1/2007
Citation: White, J.W., Boote, K.J., Hoogenboom, G., Jones, P.G. 2007. Regression-based evaluation of ecophysiological models. Agronomy Journal 99: 419-427.
Interpretive Summary: Ecophysiological models of crops are increasingly used as tools both for research and to guide decisions by growers. Models find use in irrigation management, designing better cultivars, predicting potential impacts of global change, and numerous other fields. It is often difficult to know whether a model is suitable for a particular use, even when large sets of field data are available for model testing. The most widely used statistical tests for judging models have problems and are often misapplied. This paper explains how slightly more complex analyses, which use a technique called “multiple regression” to account for effects of experiments, cultivars, or other sources of variation, can provide researchers with more reliable estimates of the utility of a model and, furthermore, suggest priorities for model improvement. In this study, the CSM-CROPGRO-Soybean model was evaluated, testing for effects of environment, cultivars, and environmental conditions such as temperature or daylength. When applied to 28 data sets for soybean, representing 11 locations and 17 cultivars (giving a total of 113 treatment combinations), the regressions showed that the model simulated days to anthesis and grain yield well for a wide range of environmental conditions and associated yield levels. Differences among environments represented a larger portion of unexplained variation than did differences among cultivars. Thus, the model might be improved by examining how it describes soybean response to environment rather than how it describes cultivar differences. Alternatively, there may be problems with the input data for soil or weather conditions during the experiments.
Use of multiple regression should lead to better matching of models to practical applications and help identify aspects of models that require improvement. Ultimately, growers, policy makers and other stakeholders will benefit from more reliable predictions of how crops respond to management or to external factors such as changing climate.
Technical Abstract: Ecophysiological models of crops are increasingly used as research and decision support tools for topics ranging from precision agriculture to global change. It is often difficult to assess how suitable a model is for a particular application, even when large sets of field data are readily available for testing the model. Basic procedures for model evaluation have been widely published but have deficiencies. Many techniques rely on bivariate linear regressions between observed and simulated values of a trait. Bivariate regressions assume statistical independence among all observed values, but field observations often have dependencies if they originate from series of experiments or involve experiments using nested designs (e.g., with split plots). By representing effects of experiments, cultivars, or other sources of variation as factors, linear regression models can specify expected dependencies, permitting analyses that are statistically more rigorous and that provide more insight into model performance. The goal of this study was to evaluate the CSM-CROPGRO-Soybean model using regressions that included environment and cultivars as factors as well as continuous variables such as temperature or daylength. When applied to 28 data sets for soybean (Glycine max (L.) Merr.), representing 11 locations and 17 cultivars (giving a total of 113 treatment combinations), the regressions showed that the model simulated days to anthesis and grain yield well for a wide range of environmental conditions and associated yield levels. Differences among environments represented a larger portion of unexplained variation than did differences among cultivars. This suggests that further improvements in the model should be sought in crop response to environment rather than in representing cultivars, or alternatively, that descriptions of environments, such as soil profiles or daily weather, are more problematic than characterizations of cultivars.
The sub-model for photosynthesis that scales leaf-level values to the canopy resulted in more accurate simulations of grain yield than the simpler canopy-level sub-model. Multiple regression is a valuable tool for analyzing crop models, allowing diverse tests that are much more informative than bivariate comparisons of observed and simulated data.
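
Illustrative sketch (not from the paper): the idea of adding environment and cultivar as factors to an observed-versus-simulated regression can be shown with a small, entirely synthetic data set. The Python sketch below fits three nested least-squares models (simulated yield only, plus environment, plus cultivar) and compares how much residual variation each factor removes. The environment and cultivar labels, bias values, units (Mg/ha), and the sequential order in which factors are added are all assumptions made for illustration; they do not reproduce the authors' data or their exact statistical procedure.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def residual_ss(X, y):
    """Fit y = X beta by ordinary least squares; return the residual sum of squares."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def dummies(levels, value):
    """Indicator columns for a factor; the first level is the reference."""
    return [1.0 if value == lv else 0.0 for lv in levels[1:]]

# Synthetic data (hypothetical): 4 environments x 3 cultivars.
# Simulated yield is drawn at random; observed yield adds a large
# environment-specific bias, a small cultivar-specific bias, and noise.
random.seed(1)
envs = ["E1", "E2", "E3", "E4"]
cults = ["C1", "C2", "C3"]
env_bias = {"E1": -0.3, "E2": 0.0, "E3": 0.2, "E4": 0.4}
cult_bias = {"C1": -0.02, "C2": 0.0, "C3": 0.02}
rows = []
for e in envs:
    for c in cults:
        sim = random.uniform(2.0, 4.0)
        obs = sim + env_bias[e] + cult_bias[c] + random.gauss(0.0, 0.03)
        rows.append((e, c, sim, obs))

def design(rows, use_env, use_cult):
    """Design matrix: intercept, simulated yield, optional factor dummies."""
    X = []
    for e, c, sim, _ in rows:
        x = [1.0, sim]
        if use_env:
            x += dummies(envs, e)
        if use_cult:
            x += dummies(cults, c)
        X.append(x)
    return X

y = [r[3] for r in rows]
rss_bivariate = residual_ss(design(rows, False, False), y)  # observed ~ simulated
rss_env = residual_ss(design(rows, True, False), y)         # + environment
rss_env_cult = residual_ss(design(rows, True, True), y)     # + cultivar

# Sequential sums of squares attributed to each factor.
ss_env = rss_bivariate - rss_env
ss_cult = rss_env - rss_env_cult
print("SS removed by environment:", ss_env)
print("SS removed by cultivar:   ", ss_cult)
```

In this synthetic case the environment factor removes far more residual variation than the cultivar factor, which mirrors the kind of comparison the abstract describes. With real, unbalanced field data the attribution depends on the order in which factors enter the model, so a statistical package with proper tests for each term would be the more appropriate tool.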