Research Project: Developing Multivariate Machine Learning Models for Early-Season Cotton Yield Prediction Across the Conterminous United States

Location: Coastal Plain Soil, Water and Plant Conservation Research

Project Number: 6082-13000-011-006-R
Project Type: Reimbursable Cooperative Agreement

Start Date: Jan 1, 2026
End Date: Dec 31, 2026

Objective:
Develop a multivariate machine learning modeling framework for robust early-season cotton yield prediction for cotton-producing states/regions of the United States.

Approach:
Through this research, we will address cotton yield anomalies in 17 cotton-producing states across the conterminous US: California, Arizona, New Mexico, Oklahoma, Texas, Kansas, Louisiana, Missouri, Arkansas, Mississippi, Tennessee, Alabama, Florida, North Carolina, Georgia, Virginia, and South Carolina. We will retrieve and use county-level time series of cotton yield, remote sensing, soil, and climate data covering the last thirty years. Specifically, the dependent variable will be the county-level cotton yield time series, and the explanatory variables will include county-level El Niño-Southern Oscillation (ENSO) signals, drought indices, soil characteristics (e.g., soil water capacity), monthly precipitation totals, monthly number of precipitation events, monthly evapotranspiration (ET), monthly cumulative solar radiation, monthly cumulative temperature, monthly normalized difference vegetation index (NDVI), monthly leaf area index (LAI), monthly land liquid water equivalent thickness anomaly (LWE), monthly soil moisture from Soil Moisture Active Passive (SMAP), irrigation rates, tile drainage rates, fertilizer use, and cotton genotype features (e.g., cycle length). Variable selection and transformation algorithms will be used to eliminate spatial redundancy and superfluous signals. Several machine learning (ML) approaches, including recurrent neural networks, random forest, k-nearest neighbors (KNN), support vector machine (SVM) regression, logistic models, cluster analyses, and spatial regionalization, will be used to develop a multivariate framework for robust early-season cotton yield prediction.
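
As a rough illustration of the workflow described above, the following Python sketch walks one hypothetical county through three steps: computing yield anomalies as residuals from a linear trend, dropping a redundant explanatory variable with a simple correlation filter, and predicting the anomaly for a held-out year with KNN regression. It is a minimal sketch under stated assumptions, not the project's method. All numbers, the feature names (ndvi_june, lai_june, precip_june), the 0.95 correlation threshold, the choice of linear detrending, and k=3 are invented for illustration.

```python
import math

def linear_detrend(years, yields):
    """Yield anomalies: residuals from an ordinary least-squares linear trend."""
    n = len(years)
    mean_t, mean_y = sum(years) / n, sum(yields) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, yields))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return [y - (intercept + slope * t) for t, y in zip(years, yields)]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def drop_redundant(features, threshold=0.95):
    """Greedy filter: keep a feature only if its |r| with every
    already-kept feature is below the threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]

def knn_predict(train_X, train_y, query, k=3):
    """KNN regression: mean target of the k nearest training points (Euclidean)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical data for a single county (made-up values, not project data).
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
yields = [800, 820, 790, 860, 880, 850, 920, 940]  # assumed units: lb/acre
features = {
    "ndvi_june": [0.50, 0.55, 0.45, 0.60, 0.62, 0.52, 0.66, 0.68],
    # Constructed as an affine function of NDVI, so it is fully redundant here.
    "lai_june": [4 * v + 0.1 for v in [0.50, 0.55, 0.45, 0.60, 0.62, 0.52, 0.66, 0.68]],
    "precip_june": [90, 120, 60, 110, 100, 70, 130, 95],  # assumed units: mm
}

anomalies = linear_detrend(years, yields)
kept = drop_redundant(features)

# One standardized feature vector per year, using only the retained variables.
X = list(zip(*(zscore(features[name]) for name in kept)))

# Hold out the last year and predict its anomaly from the earlier years.
prediction = knn_predict(X[:-1], anomalies[:-1], X[-1], k=3)
print("retained features:", kept)
print("predicted anomaly for", years[-1], "=", round(prediction, 1))
```

In a multi-county setting the same steps would presumably be applied per county or per region, and any of the other models named above could stand in for the KNN step.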