Research Project: Developing Precision Management Strategies to Enhance Productivity, Biodiversity, and Climate Resilience in Rangeland Social-ecological Systems

Location: Rangeland Resources & Systems Research

Title: Bringing cross-validation into the real world to evaluate transferability of satellite-based vegetation models

Author
Kearney, Sean
Augustine, David
Porensky, Lauren
Peirce, Erika
Hiestand, Mikael
Derner, Justin

Submitted to: Scientific Reports
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/9/2026
Publication Date: 2/17/2026
Citation: Kearney, S.P., Augustine, D.J., Porensky, L.M., Peirce, E.S., Hiestand, M.P., Derner, J.D. 2026. Bringing cross-validation into the real world to evaluate transferability of satellite-based vegetation models. Scientific Reports. 16. Article e9383. https://doi.org/10.1038/s41598-026-39866-w.
DOI: https://doi.org/10.1038/s41598-026-39866-w

Interpretive Summary: Using imagery obtained from satellites to map vegetation has become increasingly common. In rangelands, satellite imagery is often used to estimate the standing crop of forage to help make management decisions for livestock operations. Because very large datasets are often available from satellites, many researchers have been using complex machine-learning algorithms to train models that predict vegetation conditions, such as forage availability. However, it is not clear whether these complex algorithms perform better than simpler models when used to make predictions on new imagery. We used a very large dataset of more than 10,000 ground-based measurements of forage in shortgrass rangeland of Colorado to compare predictions generated by several types of simple and complex machine-learning algorithms. In particular, we compared their ability to predict forage availability in a new year that was not used to train the algorithm. We found that less complex approaches, in particular partial least squares regression, predicted conditions in a new year more consistently and accurately than complex approaches such as Histogram Gradient Boosted Regression and Random Forest. We recommend that algorithms developed to predict vegetation conditions from satellite imagery be validated with methods that include predictions for new locations and new time periods outside the calibration dataset.

Technical Abstract: Near-real-time mapping of vegetation using satellite imagery is becoming increasingly common and valuable across a wide range of ecosystems. The availability of large datasets has led many researchers to use complex machine learning algorithms (MLAs) to train satellite-based models. However, complex MLAs may underperform in the inherently extrapolative applications required for real-world vegetation monitoring. We used a dataset of nearly 10,000 training samples of standing herbaceous grazingland biomass collected over ten years to train progressively more complex MLAs, test them across progressively more extrapolative cross-validation (CV) groupings, and evaluate their performance and consistency. The performance of all MLAs decreased substantially when tested against more extrapolative CV groupings. The commonly used approach of random k-fold CV produced overly optimistic performance (R²: 0.71-0.78) compared to the more realistic task of predicting for an unseen year (R²: 0.49-0.54). Simpler MLAs, such as partial least squares regression, were more consistent and outperformed complex MLAs on the most extrapolative tasks, and their performance was less sensitive to the distinctness of unseen test data. We conclude that random k-fold CV likely produces unrealistically optimistic expectations for real-world applications of satellite vegetation models and could be associated with major prediction misses when models are used in novel environmental conditions.
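
The difference between random k-fold CV and predicting for an unseen year comes down to how samples are assigned to test folds. The Python sketch below is illustrative only and is not the authors' code or data. It uses only the standard library, a synthetic ten-year dataset with a hypothetical vegetation index (`ndvi`), made-up year labels (2014 to 2023), and a simple one-variable least-squares line standing in for a real model. All function names are assumptions introduced for this example.

```python
import random
from collections import defaultdict


def random_kfold(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold CV."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def leave_one_year_out(years):
    """Yield (train_idx, test_idx) pairs, holding out one whole year per fold."""
    by_year = defaultdict(list)
    for i, yr in enumerate(years):
        by_year[yr].append(i)
    for yr in sorted(by_year):
        train = [i for i, y in enumerate(years) if y != yr]
        yield train, by_year[yr]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2(y_true, y_pred):
    """Coefficient of determination computed on held-out samples."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def cv_scores(x, y, splits):
    """Fit on each training fold and score R2 on the matching test fold."""
    scores = []
    for train, test in splits:
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [a + b * x[i] for i in test]
        scores.append(r2([y[i] for i in test], preds))
    return scores


# Synthetic data (an assumption for illustration, not the paper's dataset):
# each year gets its own offset, mimicking year-to-year shifts in conditions.
rng = random.Random(42)
years, ndvi, biomass = [], [], []
for yr in range(2014, 2024):
    offset = rng.gauss(0, 150)
    for _ in range(50):
        v = rng.uniform(0.1, 0.6)
        years.append(yr)
        ndvi.append(v)
        biomass.append(200 + 2000 * v + offset + rng.gauss(0, 100))

kfold_r2 = cv_scores(ndvi, biomass, random_kfold(len(years), k=10))
loyo_r2 = cv_scores(ndvi, biomass, leave_one_year_out(years))
print(f"random k-fold mean R2:      {sum(kfold_r2) / len(kfold_r2):.2f}")
print(f"leave-one-year-out mean R2: {sum(loyo_r2) / len(loyo_r2):.2f}")
```

In the random split, every test fold mixes samples from years the model has already seen during training, so the evaluation is largely interpolative. In the leave-one-year-out split, the held-out year contributes nothing to training, which is closer to the real-world task of predicting for a new season.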