Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

2023 Annual Report

Objective 1: Apply comparative genomic, genetic, and molecular approaches to the dissection of complex traits and the understanding of genome functions; develop and implement new standards for the management and analysis of plant genomic, genetic, and phenotypic information; and dissect gene networks associated with programming crop plant development and adaptation to environment (GxE).
Sub-objective 1.A: Reference genomic resources will be generated to support four target crop communities. The achievement of this objective will generate information management resources for maize (Zea mays), sorghum (Sorghum bicolor), grapevine (Vitis vinifera), and rice (Oryza sativa).
Sub-objective 1.B: Develop functional and comparative genomics resources for plant reference genomes. The achievement of this objective will expand the Gramene databases to encompass reference genomes of at least 75 unique plant species.
Sub-objective 1.C: Develop functional networks for crop and model species. Through achieving this objective, integrated genetics, transcriptomics, and molecular interaction data will be generated to define regulatory networks that influence plant traits through effects on developmental morphology (architecture) and response to environment.
Objective 2: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance.

The future of crop breeding will increasingly rely on strategies that combine genetic resources with rapidly advancing tools and knowledge in genomics, trait mapping, high-throughput phenotyping, and genome-editing. Yet, major challenges remain in translating vast amounts of data into useable biological models and building scalable information systems to enable researchers and breeders to contribute to and exploit these future technologies. To meet these challenges, this project will engage several strategic initiatives and collaborations that produce new genomics data, cyberinfrastructure, and hypothesis-based research. The first objective will generate new genomics datasets among four crop groups: sorghum, maize, rice and grapevine. Objective 1.A will produce a minimum of 30 high-quality reference genome assemblies, transcriptomes, and corresponding annotations. In maize and sorghum, we will also generate ENCODE-type molecular data sets to study the relationships between chromatin structure, gene expression, and phenotype. In sorghum and grapevine, we will sequence disease resistance genes across key germplasms that target critically important pests/pathogens. To enable sharing of reproducible workflows and promote interoperability, computational work will be performed using the recently developed SciApps cyberinfrastructure. In Objective 1.B, genomics data will be further disseminated via Gramene/Ensembl to support genome stewardship, comparative and pan-genomics analysis (in 2-3 crop groups), and display of ENCODE-type and publicly archived variation/genotype data. This platform will enable researchers to evaluate structural variation within crop clades and use conservation profiles to evaluate candidate genes. In Objective 1.C, we will continue several hypothesis-based studies of gene regulatory networks that underlie yield components influenced by morphological development and nutrient and stress response/adaptation. 
These projects combine forward and reverse genetics with transcriptional profiling, fluorescence in situ sequencing, and yeast-based molecular interaction assays to elucidate regulatory pathways that control plant traits. We will continue to use sorghum EMS-mutagenized lines to dissect pathways underlying inflorescence architecture and the multi-seeded trait. Research in nitrogen use efficiency will be continued using maize and Arabidopsis as models. Objective 2 focuses on development of the new sorghum genomics and genetics portal to serve scientists and breeders working on grain sorghum improvement. Goals include initial release of SorghumBase as a comparative functional genomics resource, with future development of infrastructure to support phenotypic data and genomics-enabled germplasm improvement. A critical component of this plan is sorghum community engagement. The products of these two objectives will include well-characterized germplasm and the associated genotypic and phenotypic characterization of complex agronomic traits, which will enable genomics-assisted breeding and novel approaches for understanding the genetic architecture of traits critical to US agriculture.

Progress Report
This is the final report for Project 8062-21000-044-000D, which ended April 3, 2023. During this period, we have continued to improve the content and function of the SorghumBase and Gramene sites. SorghumBase (SB) had three releases (R4, R5, R6). The SB site now hosts 29 sorghum genomes and genetic variation data capturing more than 1.6 million loss-of-function mutations (403356). Additionally, this year we have added 10 new sorghum accession parents of the CP-NAM population (398830), updated the assembly and annotations for BTx623 V5, and added novel EMS data sets (8.6 million SNPs from an 890-member EMS population; 404620) and natural variants with 32.5 million SNPs from the SAP panel. Complementing the genomic data sets are updates to the publication database (over 500 abstracts), weekly community news, and events. We have improved the search function with the addition of a publication tab, which has a table of PubMed-linked papers associated with the curated gene function from NCBI GeneRIFs and Rice RAP-DB. Trait Ontology and Plant Ontology terms are indexed for the curated genes and are searchable. While we have robust infrastructure for genomics, there are limited resources for phenotypic and breeding data. Therefore, we are working with TAMU and Cornell to adopt the Breedbase data model. In the last year, we hosted a working version of Sorghum Breedbase and identified challenges associated with data collection, standards, scaling, and access control. The main Gramene site (G) featured one major release (R66). R66 includes 128 reference crop and model genomes. In June 2023, Oryza CLIMtools debuted in collaboration with Penn State University, bringing interactive views of environment-by-genome associations for 413 climate variables in ~940 Oryza landraces, and correlations between the local environment. To support releases, Cold Spring Harbor Laboratory (CSHL) coordinates on the mirror integration of EnsemblGenomes/Plants (398502), EBI Atlas, and Plant Reactome (382815, 382817).
CSHL supported the biocuration of genomic data for EnsemblPlants and expression data for EBI Atlas and Plant Reactome. Gramene currently hosts 3 species-specific pansites with a total of 3 releases: maize (R3), rice (R5, R6), and grapevine (R3, no release last year). Updated rice content includes new synteny maps and updated genetic variation calls for the Nipponbare, MH63, IR64, and Azucena accessions (R6) (406689, 406687) and indexed links to publications for more than 4000 rice genes. Maize R3 includes 8 new maize genomes for a total of 43 genomes (402118) in coordination with MaizeGDB. The grapevine site provides access to 17 grapevine genomes and had no major data updates this year, but efforts included collaboration on updated PN40024 gene structural annotations (400981). During this period, we annotated 10 sorghum parent genomes from the carbon-partitioning nested association mapping (CP-NAM) population with UNCC (398830) and the medicinal plant S. Nigrium with South Korean collaboration (402028), and updated the PN40024 reference using full-length cDNA isolation from developmental tissue in collaboration with the European Union (EU) grapevine community (404552). In coordination with the rice community, our focus during this period was on providing access to structural variation associated with inversions (398836) and mapping the 3K rice panel using GATK in the context of several rice accessions (406689). A major community outreach effort has been the coordination of a Sorghum Community Marker Panel focused on domestic breeding efforts. A US commercial provider was identified, and more than 3000 probes were submitted for panel design validation. We are coordinating with GRIN-Global on the selection of the germplasm for assay validation with input from government, academic, and industry stakeholders. Despite lower sequencing costs, challenges remain in annotating reference genomes.
This year we are in the process of benchmarking a pangene annotation workflow, which remaps a pangene index built from gene trees to a reference assembly and refines the models using accession-specific evidence. Preliminary analyses suggest that the workflow needs fewer compute resources and improves sensitivity. We have been coordinating with peer infrastructure initiatives such as NSF Pan Oryza projects, EU Grapedia, and MaizeGDB to curate gene structure sets. We have coordinated with RAP-DB to identify missing genes and annotate them for incorporation in their next release. We have provided peer resources with access to and training for the Gramene gene tree tool (grapevine, rice, sorghum, maize) and Apollo instances (sorghum, maize), and are actively curating models to support the pangene index and functional catalogs. We are contributing to AgBio Consortium activities, specifically 4 working groups (399297, 402109, 402154, 404552), and have a representative on the AgBio Steering Committee. In addition to the delivery of the G and SB portals, we are actively engaging the community through various channels. Virtual and in-person meetings provide opportunities for training, feedback, and suggestions for new functionality. We work directly with sorghum, maize, rice, and grapevine stakeholders to coordinate nomenclature standards and priority datasets for integration, and held more than 30 individual meetings. Major community events in the last year include: TriSociety meeting (402304), PAG 2023 (402103, 402106, 402108, 402154), Maize Genetics 2023 (402115, 402118, 402116, 402099, 402150), ASPB North East section (404473, 404472, 404478), and Sorghum in 21st Century 2023 (403355, 403356). We also had a virtual presence at the rice IRGSF 2022 (402302) and Plant Cell Atlas 2022 (399297) and are scheduled to participate in ASPB 2023 (399297, 403567, 406547, 406548, 406549). This year we held a 2-hour virtual workshop on SB Webinars.
In addition to reference genomes, we are continuing to dissect gene networks associated with plant development and environmental response. Plant Development: Flower development is an important contributor to plant yield. We have begun to prototype single nuclei sequencing (snRNA-seq) for sorghum inflorescence meristems, which will complement partially completed work on bulk RNA-seq of meristems across different developmental timepoints. This will help understand conserved and divergent gene regulatory networks (GRNs) that contribute to panicle architectural diversity within sorghum accessions. Currently three germplasms (BTx623, Rio, and Pink Kafir) have been screened across three stages of meristematic development and have shown shared and divergent GRNs. This initial trial provided sufficient insight into the depth and breadth of developmental stages required for categorizing floral architectural GRNs. To complement this, DAP-seq experiments are being conducted to identify the transcription factor (TF) binding sites of orthologous TFs known to play a role in inflorescence development and structure in maize, rice, or Arabidopsis. Environmental Response: Macro and micronutrients are critical to plant growth since their limitation as well as excess can have fitness impacts. We continue our work to characterize associated GRNs. Thus, we are extending our DAP-seq pipeline to profile the under-characterized GRAS family of TFs, which are important for numerous aspects of plant growth and development, but specifically root development and signaling. We have successfully identified gene targets for three TFs in different GRAS family clades within sorghum and are in the process of completing an analysis for publication. These TF binding profiles also provide resources for identifying conserved and divergent promoter and enhancer motifs across plant species and sorghum accessions. 
Nitrogen Use Efficiency: We finished processing sorghum RNA-seq data under limiting and sufficient nitrogen, processing the whole plant for phenotypes. To support the comparison of previous maize data on the promoters of a yeast one-hybrid network, these data were propagated from maize V3 to V5 and the RNA-seq was reanalyzed on maize V5. Preliminary data suggested higher conservation of regulatory modules associated with nitrogen assimilation as compared to transport. We have identified a candidate monocot bZIP transcription factor for potential follow-up. A manuscript is in progress (402150, 04473). Phosphorus: Efficient use of available phosphorus is crucial for plant growth, development, and yield. In the last year, we made use of the Arabidopsis yeast one-hybrid miRNA transcription factor regulatory network. Using published information on the role of miRNA in response to phosphorus, we extracted a subnetwork from a miRNA yeast one-hybrid network, combined this with published Arabidopsis expression data, and identified several TF candidates, including six which had been previously validated by others. We were able to obtain homozygous T-DNA insertions for nine TFs, two of which showed a root length phenotype in a plate assay. Zn and Fe: We continue the work on genes related to Zn with collaborators from Brookhaven National Laboratory and have completed genetic analyses and functional validation of a predicted zinc chaperone using Arabidopsis T-DNA mutants, and prepared and submitted the manuscript (405909). We generated Arabidopsis CRISPR knockouts of a predicted heme-related protein and validated their impact under limiting Zn; a draft manuscript is in preparation (406357). To understand the plant response to Zn and Fe, we grew sorghum BTx623 hydroponically under high, normal, and low Zn and Fe, and phenotyped the seedlings for mineral composition (using ICP-MS), dry weight, whole-plant phenotypes, and transcriptome responses.
The initial analyses are complete, and we are currently working on a manuscript (403912).

1. SorghumBase website release and updates. SorghumBase is a public web-based database that houses sorghum genetic and genomic data useful to the sorghum research community. Since the initial release in 2021, the site has seen updates to data, software, and usability. Data updates include more genomes, more variant information, and improved functional information on the gene models. These are important to the community because they provide valuable insight into genetic variation that can contribute to differences in physical traits between sorghum lines. This was accomplished by continually adding newly published genomes to the SorghumBase interface while iteratively improving gene models and phylogenetic trees through automated and community curation, and extending the functionality of the site. Stewardship of plant genomic data is critical to support access to and reuse of the data to address sustainable agriculture.

2. Ten new high quality genome assemblies for diverse bioenergy sorghum genotypes. This work successfully generated high-quality reference genome assemblies for 10 sorghum genomes that are important to breeders and molecular biologists, particularly in characterizing the functional space including genes, regulatory regions, and structural variation. The data are available through SorghumBase. This research is important to breeders and researchers because it supports the identification of genetic regions that differ between these accessions and are possible targets for incorporation into breeding programs or biochemical analysis.

3. An improved reference of the grapevine genome reasserts the origin of the PN40024 highly homozygous genotype. Long-read sequencing of high-quality grape DNA followed by assembly and annotation resulted in a superior, contiguous reference genome and gene annotations. The improved reference assembly created here is valuable for better understanding the metabolic and regulatory pathways that contribute to desirable agronomic traits.

4. Working toward community standards with the AgBio Consortium. Having agreed-upon standards for measurements and descriptors within the life sciences is crucial for making scientific research universally approachable. The AgBio Consortium is a communal effort by dozens of scientists across the globe to understand current standards, or the lack thereof, across various aspects of genetic, genomic, and phenotypic experimentation. This includes descriptive terminology, file structure, and interoperability between wet bench and field researchers, and how that work is ultimately incorporated into databases for the larger public. Our group participates in multiple working groups within the AgBio Consortium, and recently published work that aims to coalesce standards for describing genotypes and phenotypes.

Review Publications
Voelker, W.G., Krishnan, K., Chougule, K., Alexander, L.C., Lu, Z., Olson, A., Ware, D., Songsomboon, K., Ponce, C., Brenton, Z.W., Boatwright, L., Cooper, E.A. 2023. Ten new high quality genome assemblies for diverse bioenergy sorghum genotypes. Frontiers in Plant Science. 13:1040909.
Gladman, N.P., Hufnagel, B., Regulski, M., Liu, Z., Wang, X., Chougule, K., Kochian, L., Magalhaes, J., Ware, D. 2022. Sorghum root epigenetic landscape during limiting phosphorus conditions. Plant Direct. 6(5):e393.
Zhou, Y., Yu, Z., Chebotarov, D., Chougule, K., Lu, Z., Rivera, L.F., Kathiresan, N., Al-Bader, N., Mohammed, N., Alsantely, A., Mussurova, S., Santos, J., Thimma, M., Troukhan, M., Fornasiero, A., Green, C., Copetti, D., Kudrna, D., Llaca, V., Lorieux, M., Zuccolo, A., Ware, D., McNally, K., Zhang, J., Wing, R. 2023. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian Rice (Oryza sativa). Nature Communications. 14:1567.