Publication #429863, Western Regional Research Center, Albany, California (Pacific West Area)

Research Project: GrainGenes - A Global Data Repository for Small Grains

Location: Crop Improvement and Genetics Research

Title: Delivering AI-ready genomics with MaizeGDB

Authors
Haley, Olivia - Oak Ridge Institute for Science and Education (ORISE)
Tibbs-Cortes, Laura - Oak Ridge Institute for Science and Education (ORISE)
Harding, Stephen - Oak Ridge Institute for Science and Education (ORISE)
Poretsky, Elly - Oak Ridge Institute for Science and Education (ORISE)
Cannon, Ethalinda
Portwood II, John
Gardiner, Jack - University of Missouri
Sen, Taner
Kim, Hye-Seon
Woodhouse, Margaret
Andorf, Carson

Submitted to: Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/16/2025
Publication Date: 1/13/2026
Citation: Haley, O.C., Tibbs-Cortes, L., Harding, S., Poretsky, E., Cannon, E.K., Portwood II, J.L., Gardiner, J.M., Sen, T.Z., Kim, H., Woodhouse, M.H., Andorf, C.M. 2026. Delivering AI-ready genomics with MaizeGDB. Genetics. iyag005. https://doi.org/10.1093/genetics/iyag005.
DOI: https://doi.org/10.1093/genetics/iyag005

Interpretive Summary: Corn researchers have collected large amounts of genetic and protein data, but this information is difficult to combine and use, which slows progress in developing improved crops. The ARS Maize Genetics and Genomics Database (MaizeGDB) addresses this problem by organizing data into consistent formats, preparing it in advance for artificial intelligence (AI) analysis, and creating clear, repeatable workflows that others can use. The project also provides AI-based scores that estimate whether a DNA change is likely to affect the plant, and simple genome views that show which parts of the DNA are likely to be important. Together, these steps turn scattered research data into an AI-ready resource that speeds up the identification of gene function and the interpretation of genetic changes. This benefits the public and American farmers by enabling faster development of corn varieties that better resist disease and tolerate challenging growing conditions, which improves yields and supports stable, affordable food supplies.

Technical Abstract: The integration of artificial intelligence (AI) and machine learning (ML) is changing biological research, particularly in agriculture, where large and complex datasets offer opportunities for discovery and crop improvement. Maize (Zea mays), a globally critical crop with extensive genomic, genetic, and proteomic resources, stands to benefit from AI integration. The Maize Genetics and Genomics Database (MaizeGDB) is proactively building an AI-ready infrastructure by standardizing datasets, pre-computing complex features, developing novel interactive tools, and providing reproducible workflows. This paper details MaizeGDB's strategic initiatives to create a foundation of AI-ready data in standardized formats and to generate pre-computed embeddings from state-of-the-art DNA and protein language models. We introduce new features, including zero-shot variant effect scoring and genome browser tracks that visualize nucleotide importance. We also provide resources for assembling custom datasets and reproducible workflows via GitHub. By organizing maize data and making it accessible, MaizeGDB enables the maize research and breeding community to use AI to accelerate the discovery of gene function, the interpretation of variants, and the development of improved maize varieties.
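The abstract does not say how MaizeGDB computes its zero-shot variant effect scores or its nucleotide importance tracks. As an illustration only, a common zero-shot formulation scores a single-nucleotide variant as the log-likelihood ratio log P(alt) - log P(ref) that a DNA language model assigns at that position. The sketch below assumes per-position A/C/G/T probabilities are already available; the toy probabilities, the function names `variant_score` and `importance_track`, and the choice of "negated mean score over the three possible substitutions" as an importance value are all hypothetical. The output uses the standard bedGraph layout (chrom, 0-based start, end, value) that genome browsers can display as a track.

```python
import math
from typing import Dict, List

# Probabilities over A/C/G/T at one position. HYPOTHETICAL: in practice these
# would come from a DNA language model; here they are toy values.
PositionProbs = Dict[str, float]


def variant_score(probs: List[PositionProbs], pos: int, ref: str, alt: str) -> float:
    """Zero-shot score as the log-likelihood ratio log P(alt) - log P(ref).

    Negative values mean the model finds the alternate base less likely than
    the reference, which is often read as a sign of a more disruptive change.
    """
    p = probs[pos]
    return math.log(p[alt]) - math.log(p[ref])


def importance_track(chrom: str, start: int, ref_seq: str,
                     probs: List[PositionProbs]) -> List[str]:
    """Return bedGraph lines (0-based, half-open), one value per reference base.

    HYPOTHETICAL importance definition: the negated mean score of the three
    possible substitutions, so bases where every change looks unlikely score high.
    """
    lines = []
    for i, ref in enumerate(ref_seq):
        alts = [b for b in "ACGT" if b != ref]
        mean_score = sum(variant_score(probs, i, ref, a) for a in alts) / len(alts)
        pos = start + i
        lines.append(f"{chrom}\t{pos}\t{pos + 1}\t{-mean_score:.4f}")
    return lines


if __name__ == "__main__":
    toy_probs = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},      # model strongly prefers A
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # model has no preference
    ]
    print(variant_score(toy_probs, 0, "A", "G"))
    print("\n".join(importance_track("chr1", 1000, "AC", toy_probs)))
```

Under these assumptions the first toy position, where the model strongly favors the reference base, gets an importance near 1.95, while the second, where the model is indifferent, gets a value near zero. A real pipeline would replace the toy table with model outputs computed over the genome.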