
Research Project: Genetic Improvement of Small Grains for Biotic and Abiotic Stress Tolerance and Characterization of Pathogen Populations

Location: Plant Science Research

Title: Predicting pre-planting risk of Stagonospora nodorum blotch in winter wheat using machine learning models

Author
MEHRA, LUCKY - North Carolina State University
Cowger, Christina
GROSS, KEVIN - North Carolina State University
OJIAMBO, PETER - North Carolina State University

Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/8/2016
Publication Date: 3/30/2016
Citation: Mehra, L., Cowger, C., Gross, K., Ojiambo, P. 2016. Predicting pre-planting risk of Stagonospora nodorum blotch in winter wheat using machine learning models. Frontiers in Plant Science. http://dx.doi.org/10.3389/fpls.2016.00390.

Interpretive Summary: Factors that are known before planting wheat can help predict the later severity of the fungal disease Stagonospora nodorum blotch (SNB), which infects the leaves, stems, and heads of wheat (Triticum aestivum). We studied whether SNB severity could be predicted using the pre-planting factors of variety resistance, latitude, longitude, previous crop, seeding rate, seed treatment, tillage type, and/or wheat residue on the soil surface. We compared the performance of four modeling techniques in predicting SNB severity. Those techniques were multiple regression (MR) and three machine-learning algorithms (artificial neural networks, categorical and regression trees, and random forests). With these techniques, models were developed using 431 cases where disease was observed in field experiments conducted from 2012 to 2014. A strong relationship was observed between late-season severity of SNB and the pre-planting predictors of latitude, longitude, wheat residue, and cultivar resistance. Models were evaluated based on several criteria. The MR model explained 33% of variability in the data, while machine learning models explained 47 to 79% of the total variability. Similarly, the MR model correctly classified 74% of the disease cases, while machine learning models correctly classified 81 to 83% of these cases. The random forest (RF) algorithm, which explained 79% of the variability in the data, was the most accurate in predicting the risk of SNB, with an accuracy rate of 93%. The RF algorithm could allow early assessment of SNB risk, facilitating sound disease management decisions prior to planting of wheat.

Technical Abstract: Pre-planting factors have been associated with the late-season severity of Stagonospora nodorum blotch (SNB), caused by the fungal pathogen Parastagonospora nodorum, in winter wheat (Triticum aestivum). The relative importance of these factors in the risk of SNB has not been determined, although this knowledge could facilitate disease management decisions prior to planting of the wheat crop. In this study, we examined the performance of multiple regression (MR) and three machine learning algorithms (artificial neural networks, categorical and regression trees, and random forests) in predicting the pre-planting risk of SNB in wheat. Pre-planting factors tested as potential predictor variables were cultivar resistance, latitude, longitude, previous crop, seeding rate, seed treatment, tillage type, and wheat residue. Disease severity assessed at the end of the growing season was used as the response variable. The models were developed using 431 disease cases (unique combinations of predictors) collected from 2012 to 2014, and these cases were randomly divided into training, validation, and test datasets. Models were evaluated based on the regression of observed against predicted severity values of SNB, sensitivity-specificity ROC analysis, and the Kappa statistic. A strong relationship was observed between late-season severity of SNB and the pre-planting predictors of latitude, longitude, wheat residue, and cultivar resistance. The MR model explained 33% of variability in the data, while machine learning models explained 47 to 79% of the total variability. Similarly, the MR model correctly classified 74% of the disease cases, while machine learning models correctly classified 81 to 83% of these cases. Results show that the random forest (RF) algorithm, which explained 79% of the variability within the data, was the most accurate in predicting the risk of SNB, with an accuracy rate of 93%. The RF algorithm could allow early assessment of the risk of SNB, facilitating sound disease management decisions prior to planting of wheat.
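
The evaluation criteria named in the abstract (variability explained by regressing observed on predicted severity, the share of cases correctly classified, and the Kappa statistic) can be illustrated with a minimal Python sketch. This is not the authors' code. The severity values and the 20% threshold separating low-risk from high-risk cases are invented purely for illustration, and the function names are hypothetical.

```python
from collections import Counter


def regression_r2(observed, predicted):
    """R^2 of a simple linear regression of observed on predicted values
    (equal to the squared Pearson correlation between the two)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sxy = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    sxx = sum((p - mean_pred) ** 2 for p in predicted)
    syy = sum((o - mean_obs) ** 2 for o in observed)
    return (sxy * sxy) / (sxx * syy)


def accuracy_and_kappa(actual, predicted):
    """Proportion of cases classified correctly and Cohen's Kappa,
    which corrects that proportion for agreement expected by chance."""
    n = len(actual)
    p_observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Invented severity values (%), NOT data from the study.
observed_severity = [5, 12, 30, 45, 8, 60, 22, 3]
predicted_severity = [7, 10, 25, 50, 15, 55, 18, 4]

# Hypothetical threshold used only to turn severity into a risk class.
THRESHOLD = 20
actual_class = [int(s >= THRESHOLD) for s in observed_severity]
predicted_class = [int(s >= THRESHOLD) for s in predicted_severity]

r2 = regression_r2(observed_severity, predicted_severity)
accuracy, kappa = accuracy_and_kappa(actual_class, predicted_class)
print(f"R^2 = {r2:.2f}, accuracy = {accuracy:.3f}, kappa = {kappa:.2f}")
```

In this toy data one of eight cases is misclassified, so accuracy is 0.875 while Kappa is 0.75. The gap shows why a chance-corrected statistic is reported alongside raw accuracy.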