Submitted to: Journal of Agricultural Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/25/2006
Publication Date: 12/31/2006
Citation: Weiss, A., Wilhelm, W.W. 2006. The circuitous path to the comparison of simulated values from crop models with field observations. Journal of Agricultural Science 144:475-488.
Interpretive Summary: Computer simulations (models) are valuable tools for organizing understanding and ideas about how biological systems function and react to genetic and environmental changes. The algorithms (equations) in a model can be changed as understanding improves, so that the model better describes the natural system. In addition, new concepts can be incorporated and the modified model evaluated for improved representation of the system under investigation. When models are used, observed (ground-truth) and predicted (model output) values may agree or disagree for a number of reasons. One possible reason for agreement is that the model accurately describes the natural system. However, errors in the ground-truth data and in the model structure can also cause agreement (when the ground truth and predictions should in fact differ) or disagreement (when they should agree). These errors fall into several obvious classes: errors in collecting and reporting model input data; errors in observed data; and errors in constructing the algorithms used in the model. However, there are more subtle errors that may be far more difficult to detect. One example is the computation of day length, which is critical for describing the photoperiod response of crops. There are several official ways to compute day length, and the results can differ by as much as 1 hour. If model developers do not state clearly which method they used, users may assume a different one. Another example is the difference in water content between observed dry matter and simulation results.
Traditionally, simulation output is reported as dry matter (no water), but observed grain data may be reported at the market standard water content (e.g., 15.5% water for corn grain). Even when observed data are not adjusted to a standard water content, dried grain retains a small amount of water that is absent from simulated grain. We also generally assume that errors result in disagreement between simulated and observed data. However, it is possible, and indeed quite probable, that errors cause agreement between simulated and observed data when the two should in fact disagree widely. The result of this less widely recognized type of error is a false sense of understanding. It is critical that model developers and users be aware of both the obvious and the subtle reasons why model output and observed data agree or disagree, and that they work diligently to minimize these problems in order to make the greatest use of simulation modeling as a tool.
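A minimal Python sketch of the moisture bookkeeping described above, assuming moisture is expressed as a wet-basis fraction (water mass divided by total mass). The function names (`to_dry_matter`, `at_moisture`, `bias`) and the yield values are hypothetical and chosen only for illustration, and the sketch ignores the small residual water left in dried grain.

```python
def to_dry_matter(mass: float, moisture: float) -> float:
    """Remove the water from a mass reported at a wet-basis moisture fraction."""
    if not 0.0 <= moisture < 1.0:
        raise ValueError("moisture must be a wet-basis fraction in [0, 1)")
    return mass * (1.0 - moisture)


def at_moisture(dry_mass: float, moisture: float) -> float:
    """Express a dry-matter mass at a wet-basis moisture fraction (e.g. 0.155)."""
    if not 0.0 <= moisture < 1.0:
        raise ValueError("moisture must be a wet-basis fraction in [0, 1)")
    return dry_mass / (1.0 - moisture)


def bias(simulated: float, observed: float) -> float:
    """Bias taken as simulated minus observed."""
    return simulated - observed


if __name__ == "__main__":
    simulated_dm = 10_000.0  # kg/ha, hypothetical simulated yield (dry matter)
    observed_std = 11_600.0  # kg/ha, hypothetical observed yield at 15.5% moisture

    naive = bias(simulated_dm, observed_std)  # mixes two moisture bases
    consistent = bias(simulated_dm, to_dry_matter(observed_std, 0.155))
    print(f"naive bias:            {naive:+.0f} kg/ha")
    print(f"dry-matter-basis bias: {consistent:+.0f} kg/ha")
```

With these hypothetical numbers, the naive comparison suggests the model under-predicts by 1,600 kg/ha, whereas on a consistent dry-matter basis (11,600 × 0.845 = 9,802 kg/ha observed) it over-predicts by about 198 kg/ha, so even the sign of the bias reverses.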
Technical Abstract: To quantify the performance of a crop simulation model, model outputs are compared with observed values using statistical measures of bias, i.e., the difference between simulated and observed values. While applying these statistical measures is unambiguous for the experienced user, the same cannot always be said of determining the observed or simulated values themselves. For example, differences in assessing crop development can arise from the subjectivity of an observer or from a definition that is difficult to apply in the field. Methods of determining kernel number, kernel mass, and yield can vary among researchers, adding error to comparisons between experimental results and simulated values. If kernel moisture is not carefully determined and reported, it can add error to values of grain yield and kernels per unit area, regardless of the protocol used to collect these data. Inaccurate determination of kernel moisture will also affect the computation of grain protein or oil content. Problems can also be associated with the input data for simulation models. Underreporting of precipitation by tipping-bucket rain gages, commonly found on automated weather stations, can introduce errors into the results of crop simulation models. Using weather data collected too far from an experimental site may compound problems with input data. The importance of accurate soil and weather input data increases as the environment becomes more limiting. Problems can also arise from the algorithms that calculate important parameters in a model, such as day length, which is used to determine a photoperiod response. Errors in calculating photoperiod can stem from the definition of sunrise and sunset, from the inclusion or exclusion of civil twilight, or from an incorrect calculation of solar declination. Even the simple calculation of daily mean air temperature can affect the results of a non-linear algorithm.
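The effect of averaging temperature before applying a non-linear function can be sketched in Python. This is an illustration under stated assumptions, not the authors' algorithm: the thermal-time rate (zero below a hypothetical 10 °C base, no further increase above a hypothetical 30 °C cap) and the sine-shaped diurnal temperature curve are both assumed, and all names are hypothetical.

```python
import math

T_BASE = 10.0  # degC, hypothetical base temperature
T_CAP = 30.0   # degC, hypothetical temperature above which the rate stops rising


def rate(t: float) -> float:
    """Piecewise-linear thermal-time rate (degC): 0 below T_BASE, flat above T_CAP."""
    return max(0.0, min(t, T_CAP) - T_BASE)


def hourly_sine(tmin: float, tmax: float) -> list[float]:
    """24 hourly temperatures from an assumed sine-shaped diurnal cycle.

    Minimum at 03:00 and maximum at 15:00; real diurnal curves are asymmetric.
    """
    mean, amp = (tmax + tmin) / 2.0, (tmax - tmin) / 2.0
    return [mean + amp * math.sin(2.0 * math.pi * (h - 9) / 24.0) for h in range(24)]


def thermal_time_from_mean(tmin: float, tmax: float) -> float:
    """Average first, then apply the non-linear rate (one value per day)."""
    return rate((tmin + tmax) / 2.0)


def thermal_time_hourly(tmin: float, tmax: float) -> float:
    """Apply the rate to each hour, then average (degC d per day)."""
    temps = hourly_sine(tmin, tmax)
    return sum(rate(t) for t in temps) / len(temps)


if __name__ == "__main__":
    for tmin, tmax in [(2.0, 18.0), (12.0, 28.0), (22.0, 38.0)]:  # hypothetical days
        print(
            f"Tmin {tmin:4.1f}  Tmax {tmax:4.1f}  "
            f"daily-mean: {thermal_time_from_mean(tmin, tmax):5.2f}  "
            f"hourly: {thermal_time_hourly(tmin, tmax):5.2f}"
        )
```

On the cool day (2 to 18 °C) the daily-mean method accumulates no thermal time while the hourly method accumulates some; on the hot day the daily-mean method over-estimates. Only when every hourly temperature falls within the linear part of the rate function (the 12 to 28 °C day) do the two methods agree.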
During a period when crop simulation modeling is moving in the difficult direction of incorporating genomic-based inputs, we cannot forget the critical importance of careful and accurate collection and reporting of field data, or the need to develop robust algorithms that accommodate readily available or easily acquired input data. As scientists, we have an obligation to provide the best knowledge and understanding possible. Avoiding these potential pitfalls will help us as we develop new knowledge and understanding.
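The day-length pitfalls named in the abstract can be sketched in Python with the standard sunrise equation, cos(w0) = (sin(h0) - sin(lat) sin(decl)) / (cos(lat) cos(decl)), where h0 is the solar elevation that defines sunrise and sunset. The sketch assumes Cooper's (1969) approximation for solar declination, one of several in use, and compares three common definitions: the sun's centre on the horizon (0°), the upper limb on the horizon with standard refraction (-0.833°), and the end of civil twilight (-6°). The function names and the example site are hypothetical.

```python
import math

# Solar elevation (degrees) of the sun's centre that defines sunrise/sunset.
DEFINITIONS = {
    "centre on horizon": 0.0,
    "upper limb + refraction": -0.833,
    "civil twilight": -6.0,
}


def declination_deg(doy: int) -> float:
    """Solar declination in degrees (Cooper 1969 approximation, assumed here).

    Note the degrees-to-radians conversion: passing degrees straight to
    math.sin is a common source of wrong declination values.
    """
    return 23.45 * math.sin(math.radians(360.0 * (284 + doy) / 365.0))


def day_length_h(lat_deg: float, doy: int, sun_elev_deg: float) -> float:
    """Hours during which the sun's centre is above `sun_elev_deg`."""
    phi = math.radians(lat_deg)
    delta = math.radians(declination_deg(doy))
    h0 = math.radians(sun_elev_deg)
    cos_w0 = (math.sin(h0) - math.sin(phi) * math.sin(delta)) / (
        math.cos(phi) * math.cos(delta)
    )
    # Clamp so polar day (24 h) and polar night (0 h) do not raise in acos.
    cos_w0 = max(-1.0, min(1.0, cos_w0))
    # Sunrise-to-sunset arc in degrees; the hour angle advances 15 degrees per hour.
    return 2.0 * math.degrees(math.acos(cos_w0)) / 15.0


if __name__ == "__main__":
    lat, doy = 45.0, 172  # hypothetical site latitude and date (about 21 June)
    for name, elev in DEFINITIONS.items():
        print(f"{name:<24s} {day_length_h(lat, doy, elev):5.2f} h")
```

Because only the elevation threshold changes between definitions, a model description that omits which threshold (and which declination formula) was used leaves users unable to reproduce the photoperiod the model actually saw.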