Robert W. Holley Center for Agriculture & Health, Ithaca, New York (Research Project #434556)

Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

2022 Annual Report

Objectives
Objective 1: Apply comparative genomic, genetic, and molecular approaches to the dissection of complex traits and the understanding of genome functions; develop and implement new standards for the management and analysis of plant genomic, genetic, and phenotypic information; and dissect gene networks associated with programming crop plant development and adaptation to environment (GxE).
Sub-objective 1.A: Reference genomic resources will be generated to support four target crop communities. Achieving this objective will generate information management resources for maize (Zea mays), sorghum (Sorghum bicolor), grapevine (Vitis vinifera), and rice (Oryza sativa).
Sub-objective 1.B: Develop functional and comparative genomics resources for plant reference genomes. Achieving this objective will expand the Gramene databases to encompass reference genomes of at least 75 unique plant species.
Sub-objective 1.C: Develop functional networks for crop and model species. Through this objective, integrated genetic, transcriptomic, and molecular interaction data will be generated to define regulatory networks that influence plant traits through effects on developmental morphology (architecture) and response to environment.
Objective 2: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance.

Approach
The future of crop breeding will increasingly rely on strategies that combine genetic resources with rapidly advancing tools and knowledge in genomics, trait mapping, high-throughput phenotyping, and genome editing. Yet major challenges remain in translating vast amounts of data into usable biological models and in building scalable information systems that enable researchers and breeders to contribute to and exploit these technologies. To meet these challenges, this project will pursue several strategic initiatives and collaborations that produce new genomics data, cyberinfrastructure, and hypothesis-based research. The first objective will generate new genomics datasets for four crop groups: sorghum, maize, rice, and grapevine. Objective 1.A will produce a minimum of 30 high-quality reference genome assemblies, transcriptomes, and corresponding annotations. In maize and sorghum, we will also generate ENCODE-type molecular datasets to study the relationships among chromatin structure, gene expression, and phenotype. In sorghum and grapevine, we will sequence disease resistance genes across key germplasm to target critically important pests and pathogens. To enable sharing of reproducible workflows and promote interoperability, computational work will be performed using the recently developed SciApps cyberinfrastructure. In Objective 1.B, genomics data will be further disseminated via Gramene/Ensembl to support genome stewardship, comparative and pan-genomics analysis (in two to three crop groups), and display of ENCODE-type and publicly archived variation/genotype data. This platform will enable researchers to evaluate structural variation within crop clades and use conservation profiles to evaluate candidate genes. In Objective 1.C, we will continue several hypothesis-based studies of gene regulatory networks that underlie yield components influenced by morphological development and by nutrient and stress response and adaptation.
These projects combine forward and reverse genetics with transcriptional profiling, fluorescence in situ sequencing, and yeast-based molecular interaction assays to elucidate regulatory pathways that control plant traits. We will continue to use EMS-mutagenized sorghum lines to dissect pathways underlying inflorescence architecture and the multi-seeded trait. Research on nitrogen use efficiency will continue using maize and Arabidopsis as models. Objective 2 focuses on developing a new sorghum genomics and genetics portal to serve scientists and breeders working on grain sorghum improvement. Goals include the initial release of SorghumBase as a comparative functional genomics resource, with future development of infrastructure to support phenotypic data and genomics-enabled germplasm improvement. Engagement with the sorghum community is a critical component of this plan. The products of these two objectives will include well-characterized germplasm and the associated genotypic and phenotypic characterization of complex agronomic traits, which will enable genomics-assisted breeding and novel approaches for understanding the genetic architecture of traits critical to US agriculture.

Progress Report
Progress was made toward all objectives. In the last year, we continued developing genomic resources for the maize, rice, grapevine, and sorghum communities as part of Objectives 1.A, 1.B, and 2. We completed two Gramene releases, staged data for five pan-genome site releases, developed structural annotations for 20 rice reference assemblies and 1 grapevine reference assembly, staged two updates of the SorghumBase portal, and continued community outreach. We also completed research projects on sorghum root system architecture under limiting phosphorus and on the genetic control of biomass in sugarcane. We have continued to increase content and add new capabilities at Gramene (Log 382815, 395771). Gramene had two releases (R64, R65). The resource is a collaborative effort with the European Bioinformatics Institute (EBI), Oregon State University (OSU), and the Ontario Institute for Cancer Research (OICR). A new feature in R64 was Arabidopsis CLIMtools, developed in collaboration with Pennsylvania State University, which provides interactive web-based views of environment-by-genome associations for 473 climate variables in 2,999 Arabidopsis accessions, RiboSNitch predictions, and correlations between the local environment and a pool of curated phenotypes (Log 397011). To support releases, Cold Spring Harbor Laboratory (CSHL) coordinates the mirror integration of Ensembl Genomes (Log 398502), the EBI Expression Atlas, and Plant Reactome (Log 382815, 382817). We supported biocuration of genomic data for Ensembl Plants and of expression data for EBI Expression Atlas and Plant Reactome releases. R65 includes 118 reference genomes, 70 of which are from agronomically important species, and provides a genome browser for each genome. These browsers support interoperability between resources and species and include functionality to upload custom datasets and view gene trees. The Gramene pan-genome subsites had five updates this year.
All pan-genome sites share a common search interface that simplifies site-specific customization and deployment, and they are built on infrastructure similar to Gramene's. We developed a web service to support sequence similarity searches, implemented incremental addition of reference genomes to existing gene trees, and used virtual machines to reduce staging time for releases. Gramene currently hosts pan-genome sites for three species (maize, rice, and grapevine); sorghum is hosted at SorghumBase (SB), described below. Updates to these sites included staging of 17 rice genomes (R3, R4) and 11 grapevine genomes (R2, R3). For maize (R2), data tracks from the nested association mapping (NAM) population were updated (Log 381179) in coordination with MaizeGDB. The rice pan-genome site provides access to the Asian cultivated MAGIC16 rice lines. The grapevine site provides access to 17 grapevine genomes (Log 395781). The first public release of the sorghum knowledge base portal SorghumBase (SB) was completed in late summer 2021, followed by two major data updates in the last year (R2, R3), and a manuscript describing the database has been published (Log 389929). SB R2 provided access to 13 new sorghum varieties, for a total of 18 reference genomes. R3 focused on improving access to population data, including over 17 million new genetic variants: nearly 13 million naturally occurring SNPs in 499 sorghum accessions and more than 3 million SNPs from two independent chemically induced point mutation populations. The major data releases are complemented by weekly postings of community news and events. In January 2022, we introduced research highlights that summarize recent scientific papers with quotes from authors and examples of data views from SB. The publications database now hosts over 400 abstracts, and news items and publication abstracts are indexed and searchable from the site by author and keyword.
We held more than 40 individual meetings with stakeholders to support data integration, improve site usability, and gather community feedback. We held three webinars and participated in three conferences, which provided opportunities for in-person meetings with researchers (Log 397007). We have also begun evaluating Breeding Insight resources for application to sorghum. Beyond delivering the Gramene and SB portals, we are actively engaging the community through various channels. Virtual and in-person meetings provide opportunities for training, community feedback, and suggestions for new functionality. We work directly with stakeholders for sorghum, maize, rice, and grapevine to coordinate on nomenclature standards and priority datasets for integration. Major community events in the last year included Pan Genome Evolution, the American Society of Plant Biologists 2022 meeting (Log 397005), Maize Genetics 2022, and Biology of Genomes 2022, where we presented talks, posters, and/or demonstrations, as well as three SB webinars. A major outreach focus this year has been coordinating support for community curation of gene structures using the Gramene gene tree tool and Apollo; we have worked closely with the grapevine community through two virtual workshops and have begun engaging the rice community as well. In addition to working with individual researchers on standards, we contribute to four AgBio Consortium working groups and have a representative on the AgBio Steering Committee. While new sequencing technologies have greatly lowered the cost and improved the quality of reference genomes, challenges remain in annotating these genomes consistently and in developing a pangene index. With collaborators in Brazil, we revised our prior work on understanding sugarcane biomass, and it was accepted for publication (Log 387126). In the last year, we continued to update gene annotations for the current sugarcane SP80-3280 assembly and characterized 73,946 novel transcripts.
We studied changes in the expression of genes involved in amino acid, lipid, and carbohydrate metabolism among genotypes contrasting in biomass production and have submitted a manuscript on this work (Log 397021). We continue to work with the rice community on structural annotations for the MAGIC16 Oryza lines, which has improved transcript lengths, UTRs, and exon numbers. We contributed to a manuscript on the characterization of structural variation in rice (Log 397024). With the grapevine community, we sequenced full-length cDNA from developmental tissues and annotated the PN40024 reference using an improved, benchmarked PacBio annotation pipeline; these data have also been used in the community curation workshops described above. In addition to developing reference genomes, we are continuing to dissect gene networks associated with plant development and environmental response. Flower development is an important contributor to plant yield. To understand the genes that control flower development, collaborators at CSHL generated a high-resolution gene expression atlas of the maize floral meristem using single-cell RNA sequencing (scRNA-seq) and showed that distinct sets of genes govern stem cell regulation and identity. These integrated transcriptomic networks can better predict genetic redundancy and identify candidate genes associated with beneficial crop yield traits. The manuscript on this work was revised and accepted (Log 382496). We have begun to prototype scRNA-seq for sorghum inflorescence meristems. Roots play a major role in plant growth through the uptake of water, macronutrients, and micronutrients. Based on the previously established transcription factor (TF) regulatory network in the model plant Arabidopsis, we have prioritized 14 TF families for functional analysis of their impact on primary root development. We established a plate assay for primary root growth at 7 days and screened 80 loss-of-function TF candidates.
Preliminary data revealed candidates associated with shorter and longer roots; a further understanding of the underlying gene network will provide candidate genes for modulating plant root system architecture (Log 386838). We completed revisions of previous work characterizing the role of zinc finger homeodomain transcription factors in plant architecture, and the manuscript has been accepted. We continued work on nitrogen use efficiency. In the last year, we established a local hydroponics system, which we have used to grow sorghum under limiting and sufficient nitrogen; we collected whole-plant phenotypes and tissues and are evaluating gene expression (Log 397009, 397010). We have also grown sorghum BTx623 under limiting and excess zinc (Zn) and iron (Fe) and phenotyped the seedlings for mineral composition using ICP-MS. Efficient use of available phosphorus is crucial for plant growth, development, and yield. In the last year, with collaborators from Brazil and Canada, we completed analyses of sorghum root system architecture (RSA) modulation, gene expression, and DNA and histone modifications in response to limiting phosphorus (LP) conditions. We show that under LP conditions the sorghum RSA expands laterally and vertically, and global levels of 5-methylcytosine and of H3K4 and H3K27 trimethylation decrease in the RSA. Changes in gene expression under LP are weakly to moderately correlated with the H3K4me3 histone modification in genic and promoter regions, and lateral root regions display the most differential gene expression within the RSA (Log 386841, 386839). Subsequent studies will examine the molecular roles of candidate promoter, enhancer, and cryptic regulatory sequences in enhancing P use efficiency, which may lead to new approaches for improving crop adaptation to LP soils worldwide.

Accomplishments
1. Pan-genome resources for rice, sorghum, maize, and grapevine. ARS researchers in Ithaca, New York, worked with academic partners at Cold Spring Harbor Laboratory and stakeholders from each commodity group to deliver five updated releases of the four pan-genome portals. The species-specific portals provide access to 26 maize, 18 sorghum, 17 rice, and 11 grapevine reference sequence assemblies in a consistent format, using infrastructure developed for the human genome project. Each portal also includes five other plant species, including the model plant Arabidopsis, as well as the fruit fly (Drosophila melanogaster), to construct protein-based gene trees. The gene trees are used to assess the quality of gene structure annotations. The trees are also used to transfer information between species, because many crop species are poorly annotated for gene function. The gene trees support insights into gene loss and gain within and between species. In plants, one way to change gene expression is to change the copy number of genes. Changes in copy number within individuals provide a pool of genetic variation; in specific environments, this variation may affect fitness, acting as a potential reserve for adaptive responses within a species. Access to these gains and losses, in the context of other genomes within and across species, can provide geneticists, biologists, and breeders interested in crop improvement with insights into genes associated with agronomically important traits.

2. Root system changes in sorghum under limiting phosphorus conditions. Phosphorus is a limiting macronutrient critical for plant development, and field application of phosphorus can increase yields. Understanding how plants respond to limited phosphorus may reveal how to improve plant resilience to low phosphate inputs. ARS researchers in Ithaca, New York, and collaborators at Cold Spring Harbor Laboratory, New York, and in Brazil and Canada published work on the effects of low phosphorus (LP) on the sorghum root system. The work revealed that the sorghum root system expands under LP, a response uncommon among many other crops, producing a large root surface area that likely improves sorghum's overall ability to take up phosphorus from the environment.

Review Publications
Olson, A., Ware, D. 2021. Ranked choice voting for representative transcripts with TRaCE. Bioinformatics. btab542.
Tello-Ruiz, M.K., Jaiswal, P., Ware, D. 2022. Gramene: a resource for comparative analysis of plants genomes and pathways. In: Edwards, D., editor. Plant Bioinformatics: Methods and Protocols. 3rd edition. Hertfordshire, UK: Humana Press. p. 101-131.
Yates, A.D., Allen, J., Amode, R.M., Azov, A.G., Barba, M., Becerra, A., Bhai, J., Campbell, L.I., Carbajo Manuel, M., Chakiachvili, M., Chougule, K., Christensen, M., Contreras-Moreira, B., Cuzick, A., Fioretto, L., Davis, P., De Silva, N.H., Diamantakis, S., Dyer, S., Elster, J., Filippi, C.V., Gall, A., Grigoriadis, D., Guijarro-Clarke, C., Gupta, P., Hammond-Kosack, K.E., Howe, K.L., Jaiswal, P., Kaikala, V., Kumar, V., Kumari, S., Langridge, N., Le, T., Luypaert, M., Maslen, G.L., Maurel, T., Moore, B., Muffato, M., Mushtaq, A., Naamati, G., Naithani, S., Olson, A., Parker, A., Paulini, M., Pedro, H., Perry, E., Preece, J., Quinton-Tulloch, M., Rodgers, F., Rosello, M., Ruffier, M., Seager, J., Sitnik, V., Szpak, M., Tate, J., Tello-Ruiz, M.K., Trevanion, S.J., Urban, M., Ware, D., Wei, S., Williams, G., Winterbottom, A., Zarowiecki, M., Finn, R.D., Flicek, P. 2021. Ensembl Genomes 2022: an expanding genome resource for non-vertebrates. Nucleic Acids Research. 50(D1):D996-D1003.
Kumari, S., Kumar, V., Beilsmith, K., Seaver, S., Canon, S., Dehal, P., Gu, T., Joachimiak, M., Lerma-Ortiz, C., Liu, F., Lu, Z., Pearson, E., Ranjan, P., Riel, W., Henry, C.S., Arkin, A.P., Ware, D. 2021. A KBase case study on genome-wide transcriptomics and plant primary metabolism in response to drought stress in sorghum. Current Plant Biology.
Gladman, N., Olson, A., Wei, S., Chougule, K., Lu, Z., Tello-Ruiz, M., Meijs, I., Van Buren, P., Jiao, Y., Wang, B., Kumar, V., Kumari, S., Zhang, L., Burke, J.J., Chen, J., Burow, G.B., Hayes, C.M., Emendack, Y., Xin, Z., Ware, D. 2022. SorghumBase: a web-based portal for sorghum genetic information and community advancement. Planta. 255:35.
Ferrero-Serrano, A., Sylvia, M.M., Forstmeier, P.C., Olson, A.J., Ware, D., Bevilacqua, P.C., Assmann, S.M. 2022. Experimental demonstration and pan-structurome prediction of climate-associated riboSNitches in Arabidopsis. Genome Biology. 23:101.