Location: Sugarcane Research
Title: Data mining sugarcane breeding yield data for ratoon yield prediction
|WAGUESPACK, HERMAN - American Sugar Cane League|
|KIMBENG, COLLINS - LSU Agcenter|
|PONTIF, MICHAEL - LSU Agcenter|
|Boykin, Deborah - Debbie|
Submitted to: Euphytica
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/11/2021
Publication Date: 3/5/2021
Citation: Todd, J.R., Dufrene Jr, E.O., Waguespack, H., Kimbeng, C., Pontif, M., Boykin, D.L. 2021. Data mining sugarcane breeding yield data for ratoon yield prediction. Euphytica. 217:54. https://doi.org/10.1007/s10681-021-02786-z.
Interpretive Summary: New sugarcane varieties are important for the continued profitability of sugarcane in Louisiana. Because of labor, time, and space limitations, sugarcane breeders are sometimes required to make breeding decisions without yield data. Yield prediction could be useful when harvest data is not attainable or when selection is necessary before harvest. If harvest yield data is unavailable, breeders rely on previous yield data, field measurements, and ratings to make selection decisions. Machine learning techniques can be used to build models that use a set of predictors to estimate a target variable. These models can then be applied to data that lacks the target variable to produce estimates. Models of third ratoon cane yield were created from previous data. Predictors of third ratoon yield developed from models using plant cane through second ratoon yield data had less experimental error relative to measured third ratoon data than second ratoon yield data alone. Predictions like these will enable breeders to make better decisions when complete yield data is not available, leading to increased efficiency of the breeding program and the release of more high-yielding cultivars.
Technical Abstract: Ratooning ability is an important trait of sugarcane (Saccharum spp.) because it increases profits and reduces costs by reducing the number of plantings. In the Louisiana sugarcane variety development program, selection decisions are often made prior to measuring third ratoon yields. Machine learning (ML) techniques can use yield variables, such as cane tonnage and sucrose content, to create predictors. Yield variables from 11 test locations, 24 genotypes, and 22 years were used to create a model to predict third ratoon cane yield using ML techniques including Linear Regression, Random Forest, AdaBoost, Stochastic Gradient, Neural Network, Support Vector Machines, and k-Nearest Neighbors. With only prior yield data as input, prediction error was measured as the difference between predicted third ratoon yield and measured third ratoon yield. The AdaBoost ML predictors of third ratoon yield had lower experimental error than second ratoon yield used as a predictor of third ratoon yield. However, because of the effect of crop cycle, the error of the ML predictions was not lower than that of the second ratoon predictor in every crop cycle. Using a model that partitioned overall prediction error into sources of variance, the results also indicated that location within cycle (52%) followed by genotype by location within cycle (10%) were the largest sources of error for predicting ratooning ability. Ratoon stalk number, sucrose, and cane yield ranked highly as contributors to the predictions. These results demonstrate the potential of ML techniques to improve selection for ratooning ability.
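The comparison described in the abstract, prediction error of a fitted model versus second ratoon yield used directly as the predictor, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the yield values are made up, the helper names (fit_line, mean_abs_error, prior_mean) are hypothetical, and a one-variable least-squares fit stands in for the AdaBoost and other ML models named above. It is not the study's method or data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def mean_abs_error(predicted, measured):
    """Average absolute difference between predicted and measured yield."""
    return mean(abs(p - m) for p, m in zip(predicted, measured))

# Made-up cane yields per plot, for illustration only:
# (plant cane, first ratoon, second ratoon, third ratoon)
train = [
    (95.0, 90.0, 80.0, 72.0),
    (88.0, 85.0, 78.0, 69.0),
    (102.0, 94.0, 86.0, 77.0),
    (80.0, 76.0, 70.0, 61.0),
    (91.0, 89.0, 75.0, 70.0),
]
test = [
    (98.0, 92.0, 84.0, 75.0),
    (84.0, 80.0, 71.0, 64.0),
]

def prior_mean(row):
    """Single predictor: mean yield of plant cane through second ratoon."""
    return mean(row[:3])

# Fit on the training plots only, then predict the held-out plots.
intercept, slope = fit_line(
    [prior_mean(r) for r in train],
    [r[3] for r in train],
)

measured = [r[3] for r in test]
model_pred = [intercept + slope * prior_mean(r) for r in test]
baseline_pred = [r[2] for r in test]  # second ratoon yield used as-is

model_mae = mean_abs_error(model_pred, measured)
baseline_mae = mean_abs_error(baseline_pred, measured)

print(f"model error:    {model_mae:.2f}")
print(f"baseline error: {baseline_mae:.2f}")
```

In this toy data the fitted model has the smaller error, but that outcome comes from the invented numbers and says nothing about the published results. Mean absolute error is used here only as one simple way to summarize the difference between predicted and measured third ratoon yield.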