
Research Project: MaizeGDB - Database and Computational Resources for Maize Genetics, Genomics, and Breeding Research

Location: Corn Insects and Crop Genetics Research

2024 Annual Report


Objectives
Objective 1: Improve maize trait analysis (e.g., drought and cold tolerance, disease and pest resistance), germplasm development, genetic studies, and breeding through stewardship of maize genomes, pan-genomes, genetic data, and phenotype data.
Goal 1.A: Bring in reference-quality genome assemblies of domesticated maize outgroups that include stress-resilient varieties and connect gene-model and genome-browser pan-gene relationships between these genomes and domesticated maize.
Goal 1.B: Represent and integrate maize diversity through hosting maize genomes, pan-genomes, graph information, and whole-genome sequencing data.
Objective 2: Identify and curate key datasets (e.g., 3-D protein structure, pan-genome gene functions) that will serve to enhance maize functional genome annotation with an emphasis on the targeted curation of traits related to abiotic and biotic stress and climate change.
Goal 2.A: Integrate maize stress-response expression and trait data with MaizeGDB genomes and functional genome annotation tools.
Goal 2.B: Integrate 3-D gene model protein structures across maize genomes, compare them within a pan-gene framework, and create gene function predictions based on protein structure similarity.
Objective 3: Develop infrastructure to integrate, add value to, and visualize multi-omics data sets, enable comparative genomics, facilitate genome-to-phenome knowledge discovery, and provide analysis through artificial intelligence approaches and genomic discovery tools.
Goal 3.A: Provide comparative and pan-genome resources to understand diversity and organize genes and develop artificial intelligence approaches to facilitate exploration of the complex relationship between phenotype and genotype.
Objective 4: Provide community support services, build strategic partnerships, and provide database training and outreach activities for user communities and stakeholders.
Goal 4.A: Facilitate communication among maize researchers to support research community needs and create and leverage synergistic activities with other databases and plant research communities.


Approach
The Maize Genetics and Genomics Database (MaizeGDB – http://www.maizegdb.org) is the model organism database for maize. MaizeGDB’s overall aim is to provide long-term storage, support, and stability to the maize research community’s data and to provide informatics services for access, integration, visualization, and knowledge discovery. The MaizeGDB website, database, and underlying resources allow plant researchers to understand basic plant biology, make genetic enhancements, facilitate breeding efforts, and translate those findings into products that increase crop quality and production. To accelerate research and breeding progress, generated data must be made freely and easily accessible. Curation of high-quality and high-impact datasets has been the foundation of the MaizeGDB project since its inception over 25 years ago. MaizeGDB serves as a two-way conduit for getting maize research data to and from our stakeholders. The maize research community uses data at MaizeGDB to facilitate their research, and in return, their published data are curated at MaizeGDB. The information and data provided at MaizeGDB and facilitated through outreach have been used directly in research that has had broad commercial, social, and academic impacts. The MaizeGDB team will make accessible high-quality, actively curated, and reliable genetic, genomic, and phenotypic description datasets. At the root of high-quality genome annotation lie well-supported assemblies and annotations. For this reason, we focus our efforts on benefitting researchers by developing a system to ensure long-term stewardship of both a representative reference genome sequence assembly with associated structural and functional annotations and additional reference-quality genomes that help represent the diversity of maize. In addition, we will enable researchers to access data in a customized and flexible manner by deploying tools that enable direct interaction with the MaizeGDB database.
Continued efforts to engage in education, outreach, and organizational needs of the maize research community will involve the creation and deployment of video and one-on-one tutorials, updating maize Cooperators on developments of interest to the community, and supporting the information technology needs of the Maize Genetics Executive Committee and Annual Maize Genetics Conference Steering Committee.


Progress Report
ARS scientists from the Maize Genetics and Genomics Database (MaizeGDB) project in Ames, Iowa, provide valuable tools and resources for investigative research and crop improvement by leveraging maize genetics, genomics, and breeding data. In line with Objective 1, MaizeGDB continues to enhance its stewardship efforts to encompass a wide range of high-quality genome sequences, including genome assemblies and annotations from closely related species. This collection enables researchers to explore the rich diversity of maize genetics and genomics. The team has expanded its collection of high-quality genome sequences, including those from maize's wild relatives, to over 100 genomes and 1,400 supporting datasets. New features include tools linking gene data, genomic markers, and protein data across various maize varieties. A new data center focused on pan-genes (genes shared across multiple species within a genus, allowing for the study of their evolution and variation) offers interactive visualization tools, such as sequence alignments and gene trees, to investigate the evolution of genes across the Zea genus. These advancements will enhance the ability to analyze and utilize data from different sources, ultimately facilitating targeted crop improvement and addressing global food security challenges.
As part of Objective 2, MaizeGDB curates and hosts important datasets that contribute to functional annotation of the maize genome. Recent work has focused on datasets related to stress tolerance and climate change, including how maize responds to challenges like drought, heat, pathogens, and pests. A new gold-standard dataset is now available, featuring information on approximately 3,000 genes from 25 studies. This dataset reveals how genes are expressed differently under various stress conditions, providing valuable insights into maize's genetic resilience. Additionally, MaizeGDB has integrated other valuable datasets, including AI-derived gene annotations, a comprehensive protein atlas, and an updated genome-wide association study atlas. These resources enable researchers to explore the genetic basis of stress tolerance and climate resilience in maize, ultimately supporting the development of more resilient and sustainable crops.
Regarding Objective 3, MaizeGDB is building a robust infrastructure to handle large-scale datasets and facilitate knowledge discovery. This updated infrastructure will enable researchers to integrate and visualize diverse types of genetic and genomic data, leading to new insights into the genes underlying agronomically important traits. Two new tools were developed: PanEffect, an AI-powered workflow that predicts and visualizes the effects of hundreds of millions of potential genetic variations in maize, and SNPversity 2.0, a tool that analyzes whole-genome sequencing data from over 1,500 maize lines and wild relatives. These two tools are connected so that researchers can easily filter, visualize, and download variations in the maize genome based on specific locations and accessions and quickly see the possible functional impact of those variations at the protein level. These enhancements will streamline data analysis, accelerate scientific breakthroughs, and ultimately improve our understanding of maize genetics and genomics.
Objective 4 focuses on MaizeGDB's role as the central hub for maize research, facilitating communication and collaboration among researchers worldwide. The MaizeGDB team actively engages with the maize research community to identify their needs and priorities, enabling tailored resources and services to better serve their requirements. Additionally, strategic partnerships have been formed with dozens of agricultural biological databases, collaborating on data standards and interoperability. Training and outreach activities are provided to empower user communities and stakeholders, ensuring they can maximize the benefits of MaizeGDB.
Through stewardship efforts, infrastructure development, curation of key datasets, and community support services, MaizeGDB continues to enhance the landscape of maize genetics, genomics, and breeding research.


Accomplishments
1. Unlocking maize diversity with pan-genomics. Pan-genomes encompass all the genetic sequences found in a collection of genomes, and are therefore more valuable than single-reference genomes for studying species diversity. This is especially true for a species like maize, which has a remarkably diverse and complex genome. Presenting maize pan-genome data, analyses, and visualization is further complicated by the extensive gene functional annotations present at the Maize Genetics and Genomics Database (MaizeGDB). ARS scientists in Ames, Iowa, have enhanced the MaizeGDB database to include genetic information from the entire Zea genus, covering both maize and its wild relative, teosinte. This advancement in maize pan-genomics can help transform our ability to harness the full genetic potential of the Zea genus. The work enables researchers to unlock new insights into maize diversity and gene function, which are essential for developing more resilient and productive crop varieties to meet global food security challenges.


Review Publications
Woodhouse, M.R., Cannon, E.K., Portwood II, J.L., Harper, L.C., Gardiner, J.M., Schaeffer, M.L., Andorf, C.M. 2021. A pan-genomic approach to genome databases using maize as a model system. BMC Plant Biology. 21. Article 385. https://doi.org/10.1186/s12870-021-03173-5.
Sen, S., Woodhouse, M.R., Portwood II, J.L., Andorf, C.M. 2023. Maize feature store: a centralized resource to manage and analyze curated maize multi-omics features for machine learning applications. Database: The Journal of Biological Databases and Curation. 2023. Article baad078. https://doi.org/10.1093/database/baad078.
Poretsky, E., Andorf, C.M., Sen, T.Z. 2024. PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models. Plant Direct. 7(12). Article e554. https://doi.org/10.1002/pld3.554.
Poretsky, E., Cagirici, H.B., Andorf, C.M., Sen, T.Z. 2024. Harnessing the predicted maize pan-interactome for putative gene function prediction and prioritization of candidate genes for important traits. G3: Genes, Genomes, Genetics. 14(5). Article jkae059. https://doi.org/10.1093/g3journal/jkae059.
Andorf, C.M., Haley, O., Hayford, R.K., Portwood II, J.L., Harding, S.F., Sen, S., Cannon, E.K., Gardiner, J.M., Kim, H., Woodhouse, M.R. 2024. PanEffect: a pan-genome visualization tool for variant effects in maize. Bioinformatics. 40(2). Article btae073. https://doi.org/10.1093/bioinformatics/btae073.
Cannon, E.K., Portwood II, J.L., Hayford, R.K., Haley, O.C., Gardiner, J.M., Andorf, C.M., Woodhouse, M.R. 2024. Enhanced pan-genomic resources at the maize genetics and genomics database. Genetics. 227(1). Article iyae036. https://doi.org/10.1093/genetics/iyae036.