Publication #420812

Research Project: Development of Pathogen- and Plant-Based Genetic Tools and Disease Mitigation Methods for Tropical Perennial Crops

Location: Sustainable Perennial Crops Laboratory

Title: Seed quality drives grain yield in Ethiopian and Senegalese sorghum: Insights from machine learning

Authors
Ahn, Ezekiel
Prom, Louis
Jang, Jae Hee
Baek, Insuck
Tukuli, Adama - ORISE Fellow
Lim, Seunghyun - ORISE Fellow
Hong, Seok - Ulsan National Institute of Science and Technology (UNIST)
Lee, Yoonjung - University of Minnesota Crookston
Kim, Moon
Meinhardt, Lyndel
Park, Sunchung
Magill, Clint - Texas A&M University

Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/15/2025
Publication Date: 8/14/2025
Citation: Ahn, E.J., Prom, L.K., Jang, J., Baek, I., Tukuli, A.R., Lim, S., Hong, S.M., Lee, Y., Kim, M.S., Meinhardt, L.W., Park, S., Magill, C. 2025. Seed quality drives grain yield in Ethiopian and Senegalese sorghum: Insights from machine learning. PLOS ONE. https://doi.org/10.1371/journal.pone.0329366.
DOI: https://doi.org/10.1371/journal.pone.0329366

Interpretive Summary: Sorghum is a crucial cereal crop, especially in Africa and Asia, where it provides food and fodder. This study explored the genetic diversity of sorghum varieties from Ethiopia and Senegal, aiming to group these varieties by their traits and to predict their grain yield using machine learning. The study found that machine learning can effectively group sorghum varieties by their traits, which could help breeders choose suitable parents for crossing to create improved hybrids. The study also predicted grain yield successfully and identified seed weight and germination rate as the most important factors determining yield potential. This information can help breeders develop higher-yielding sorghum varieties, contributing to food security and agricultural sustainability in the regions where sorghum is grown.

Technical Abstract: This study evaluated machine learning for clustering and grain yield prediction in a collection of 179 sorghum accessions from Ethiopia and Senegal. Several models, including a Bagging classifier, DBSCAN, a Gaussian Mixture Model, K-Nearest Neighbors, Random Forest, and Support Vector Machines, were applied to phenotypic data comprising nine key traits: grain yield, seed weight, flowering time, germination rate, panicle height, panicle length, and resistance to anthracnose, grain mold, and rust. Clustering analysis revealed distinct groupings among the accessions, indicating inherent subtypes within this germplasm that could be valuable for breeding programs. A Boosted Tree model predicted grain yield with high accuracy and identified seed weight, germination rate, and flowering time as key determinants of yield potential. This research underscores the potential of machine learning for sorghum germplasm characterization and for yield optimization in breeding strategies.
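The general idea behind the yield-prediction step (fit a boosted tree ensemble to trait data, then rank traits by how much the model depends on them) can be illustrated with a minimal sketch. This is not the authors' model, software, or data: the abstract does not give the Boosted Tree configuration, so the example below is a from-scratch gradient-boosted ensemble of depth-1 trees (stumps) under squared loss, with permutation importance swapped in as one common way to rank predictors. The trait subset, the SYNTHETIC data and its arbitrary coefficients, the 140/39 train/test split, and the hyperparameters (`n_rounds`, `learning_rate`) are all hypothetical. In practice a library such as scikit-learn's `GradientBoostingRegressor` would normally be used instead.

```python
import random

# Hypothetical trait subset; the data below are SYNTHETIC, not the study's measurements.
TRAITS = ["seed_weight", "germination", "flowering_time", "panicle_length", "rust_score"]


def make_synthetic_data(n, rng):
    """Standardized random traits; yield is built (arbitrarily) from three of them plus noise."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in TRAITS]
        sw, germ, flow = row[0], row[1], row[2]
        y.append(2.0 * sw + 1.5 * germ - 0.5 * flow + rng.gauss(0, 1))
        X.append(row)
    return X, y


def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) minimizing squared error on residuals r."""
    n, total = len(X), sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += r[order[k]]
            lo, hi = X[order[k]][j], X[order[k + 1]][j]
            if lo == hi:  # cannot split between tied values
                continue
            n_left, right_sum = k + 1, total - left_sum
            # Maximizing this score is equivalent to minimizing the post-split SSE.
            score = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2, left_sum / n_left, right_sum / (n - n_left))
    if best is None:  # every feature constant: fall back to a constant stump
        return 0, float("inf"), total / n, total / n
    return best[1:]


def stump_predict(stump, x):
    j, thr, left_val, right_val = stump
    return left_val if x[j] <= thr else right_val


def fit_boosted(X, y, n_rounds=200, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fit to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        stump = fit_stump(X, resid)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_predict(stump, x) for pi, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def r_squared(model, X, y):
    mean = sum(y) / len(y)
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    ss_res = sum((yi - predict(model, x)) ** 2 for x, yi in zip(X, y))
    return 1 - ss_res / ss_tot


def permutation_importance(model, X, y, rng):
    """Drop in held-out R^2 when one trait column is shuffled (larger = more important)."""
    baseline = r_squared(model, X, y)
    importance = {}
    for j, name in enumerate(TRAITS):
        col = [x[j] for x in X]
        rng.shuffle(col)
        X_perm = [x[:j] + [c] + x[j + 1:] for x, c in zip(X, col)]
        importance[name] = baseline - r_squared(model, X_perm, y)
    return importance


rng = random.Random(0)
X, y = make_synthetic_data(179, rng)
X_train, y_train, X_test, y_test = X[:140], y[:140], X[140:], y[140:]
model = fit_boosted(X_train, y_train)
test_r2 = r_squared(model, X_test, y_test)
importance = permutation_importance(model, X_test, y_test, rng)
print(f"held-out R^2 = {test_r2:.2f}")
for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {drop:+.3f}")
```

Stumps keep the sketch short while still being a genuine boosting loop, and computing importance on held-out rows avoids crediting splits that only fit training noise. Because the synthetic yield is built mostly from `seed_weight` and `germination`, those traits should rank near the top, while the unused `rust_score` and `panicle_length` columns should show importance close to zero.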