Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

2021 Annual Report

Objective 1: Apply comparative genomic, genetic, and molecular approaches to the dissection of complex traits and the understanding of genome functions; develop and implement new standards for the management and analysis of plant genomic, genetic, and phenotypic information; and dissect gene networks associated with programming crop plant development and adaptation to environment (GxE).
Sub-objective 1.A: Reference genomic resources will be generated to support four target crop communities. The achievement of this objective will generate information management resources for maize (Zea mays), sorghum (Sorghum bicolor), grapevine (Vitis vinifera), and rice (Oryza sativa).
Sub-objective 1.B: Develop functional and comparative genomics resources for plant reference genomes. The achievement of this objective will expand the Gramene databases to encompass reference genomes of at least 75 unique plant species.
Sub-objective 1.C: Develop functional networks for crop and model species. Through achieving this objective, integrated genetics, transcriptomics, and molecular interaction data will be generated to define regulatory networks that influence plant traits through effects on developmental morphology (architecture) and response to environment.
Objective 2: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance.

The future of crop breeding will increasingly rely on strategies that combine genetic resources with rapidly advancing tools and knowledge in genomics, trait mapping, high-throughput phenotyping, and genome-editing. Yet, major challenges remain in translating vast amounts of data into useable biological models and building scalable information systems to enable researchers and breeders to contribute to and exploit these future technologies. To meet these challenges, this project will engage several strategic initiatives and collaborations that produce new genomics data, cyberinfrastructure, and hypothesis-based research. The first objective will generate new genomics datasets among four crop groups: sorghum, maize, rice and grapevine. Objective 1.A will produce a minimum of 30 high-quality reference genome assemblies, transcriptomes, and corresponding annotations. In maize and sorghum, we will also generate ENCODE-type molecular data sets to study the relationships between chromatin structure, gene expression, and phenotype. In sorghum and grapevine, we will sequence disease resistance genes across key germplasms that target critically important pests/pathogens. To enable sharing of reproducible workflows and promote interoperability, computational work will be performed using the recently developed SciApps cyberinfrastructure. In Objective 1.B, genomics data will be further disseminated via Gramene/Ensembl to support genome stewardship, comparative and pan-genomics analysis (in 2-3 crop groups), and display of ENCODE-type and publicly archived variation/genotype data. This platform will enable researchers to evaluate structural variation within crop clades and use conservation profiles to evaluate candidate genes. In Objective 1.C, we will continue several hypothesis-based studies of gene regulatory networks that underlie yield components influenced by morphological development and nutrient and stress response/adaptation. 
These projects combine forward and reverse genetics with transcriptional profiling, fluorescence in situ sequencing, and yeast-based molecular interaction assays to elucidate regulatory pathways that control plant traits. We will continue to use sorghum EMS-mutagenized lines to dissect pathways underlying inflorescence architecture and the multi-seeded trait. Research in nitrogen use efficiency will be continued using maize and Arabidopsis as models. Objective 2 focuses on development of the new sorghum genomics and genetics portal to serve scientists and breeders working on grain sorghum improvement. Goals include initial release of SorghumBase as a comparative functional genomics resource, with future development of infrastructure to support phenotypic data and genomics-enabled germplasm improvement. A critical component of this plan is sorghum community engagement. The products of these two objectives will include well-characterized germplasm and the associated genotypic and phenotypic characterization of complex agronomic traits, which will enable genomic-assisted breeding and novel approaches for understanding the genetic architecture of traits critical to US agriculture.

Progress Report
In the last year, we continued the development of genomic resources for the maize, rice, grapevine, and sorghum communities. We improved and finalized the gene structural annotation pipeline and applied it to 26 maize and 5 sorghum genomes. Each maize genome contained an average of 40,621 protein-coding and 4,998 non-coding gene models, with over a million independent gene models generated across the 26 lines. Phylogenetic dating revealed that most genes are shared with species in the Andropogoneae tribe and grass family [Log 381179, 381178]. We developed and improved TRaCE, a technique for assigning the most represented/common transcript using a ranked-choice voting algorithm that incorporates domain coverage, protein length, and similarity to expressed transcripts [Log 382631]. In sorghum, using optical map technology, we identified several errors in the contig order of the current reference Tx623. We characterized structural variations (SVs) using Tx2783 as a reference and completed a genome-wide scan for disease resistance (R) genes, both of which revealed high levels of diversity among the five sorghum accessions. Tx2783, the sugarcane aphid (SCA) tolerant line, contained an SV of 191 kb at the top of chromosome 6, where the SCA locus had been previously mapped, containing a cluster of R genes [Log 377837]. In addition to developing the reference genomes, we dissected gene networks associated with plant development and response to the environment. Flower development is an important contributor to plant yield. To understand the genes that control flower development, collaborators at Cold Spring Harbor Laboratory (CSHL) generated a high-resolution gene expression atlas of the maize floral meristem using a single-cell sequencing approach (scRNA-seq) and showed that distinct sets of genes govern the regulation and identity of stem cells. 
We combined their scRNA-seq and transcription factor (TF) binding experiments (ChIP-seq) and were able to predict and validate directly modulated targets of TFs, including networks controlling inflorescence development and epidermal differentiation. These transcriptomic networks can better predict genetic redundancy and identify candidate genes associated with beneficial crop yield traits [Log 382496]. We have begun to prototype scRNA-seq for the sorghum inflorescence meristem. Access to data from maize and sorghum will allow us to compare and contrast modules between these two important grass species. Roots play a major role in plant growth through uptake of water, macronutrients, and micronutrients. Based on the previously established transcription factor regulatory network in the model plant Arabidopsis, we have prioritized 14 TF gene families for functional analyses of their impact on primary root development. We have established a plate assay for primary root growth at 7 days. We are currently screening 40 loss-of-function TF candidates. Preliminary data have identified candidates associated with shorter and longer roots, and a further understanding of the underlying gene network will provide candidate genes for modulating plant root system architecture [Log 386838]. Efficient use of available phosphorus is crucial for plant growth, development, and yield. In the last year, with collaborators from Brazil and Canada, we completed the analyses of sorghum root system architecture (RSA) modulation, gene expression, and DNA and histone modification in response to limiting phosphorus (LP) conditions. We show that sorghum RSA expands laterally and vertically under LP conditions, and that global 5-methylcytosine and H3K4 and H3K27 trimethylation levels decrease in the RSA. 
Change in gene expression under LP is weakly to moderately correlated with H3K4me3 histone modification at genic and promoter regions, and lateral root regions display the most disparate amount of differential gene expression within the RSA [Log 386841, 386839]. Subsequent studies will look at the molecular basis of the role of promoter, enhancer, and cryptic sequence regulatory candidates for enhanced P usage efficiency, leading to newer approaches for improved crop adaptation to low P soils worldwide. Understanding the biological modules associated with drought response helps support improved germplasm. Collaborators in Brazil are working on a comprehensive picture of drought response recovery in the leaves of sugarcane. We have worked with the collaborators to improve the genome annotation for sugarcane and the interpretation of the drought recovery data. Studies over multiple years have identified gene co-expression modules enriched in photosynthesis, small molecule metabolism, alpha-amino acid metabolism, trehalose biosynthesis, serine family amino acid metabolism, and carbohydrate transport. Together, these findings suggest that carbohydrate metabolism is coordinated with the degradation of amino acids to provide carbon skeletons to the tricarboxylic acid cycle. This coordination may help maintain energetic balance during drought stress adaptation, facilitating recovery after the stress is alleviated. Our results shed light on candidate regulatory elements and can help devise biotechnology strategies for developing drought-tolerant sugarcane plants [Log 387126]. We continue to support cyberinfrastructure for improved genome data management of agronomically important species and their models. We have continued development of SciApps, a lightweight bioinformatics workflow system, and submitted a book chapter on the work [Log 383732]. In the last year, we completed two releases of the main Gramene site. 
The Gramene project is an online reference resource for plant genomes and curated pathways to aid functional genomics research in crops and model plant species. The resource is a collaborative effort with the European Bioinformatics Institute (EBI), Oregon State University (OSU), and the Ontario Institute for Cancer Research (OICR). The current release, Gramene 63, includes 93 plant genomes, an increase of 26 genomes from last year. These genomes provide the basis for the comparative analysis, including 122,947 gene families that provide information on paralogs (similar genes within a species) and orthologs (similar genes between species). These protein-based gene families provide the input for the 83 synteny maps, which are projections of conservation of gene order between species. They also serve as the foundation for plant pathway projection based on the rice pathways manually curated from the Plant Reactome [Log 382815, 382817]. In addition to the main site described above, we have invested major efforts in developing four species pan-genome subsites for the maize, sorghum, rice, and grapevine communities. Each subsite includes a common set of seven anchor outgroup species. In the last year, we have focused on expanding the number of species-specific genomes to 27 in maize, 25 in rice, 10 in grapevine, and 5 in sorghum as part of the SorghumBase effort below. We have worked with EBI on the curation of expression data, expanding the data sets available for these species. The Gramene project is actively engaging the community through various channels, including webinars, presentations, talks, posters, and demonstrations during major community events such as TriSociety 2020 [Log 386831], American Society of Plant Biology 2021, Maize Genetics 2021 [Log 386835], and Biology of Genomes 2021, providing training and gathering the community's feedback on our current tools and user suggestions for new functionality. 
This year, a major outreach effort has been coordinating support for community curation of gene structures for the maize, sorghum, and grapevine communities. For maize and sorghum, we continued to support the network of researchers and teaching faculty working on Course-based Undergraduate Research Experiences (CUREs). This year we initiated grapevine community efforts with collaborators from Europe to support curation of the reference grapevine catalog. The SorghumBase portal was released in July 2021 and includes five sorghum reference genomes: Sorghum bicolor (L.) Moench subsp. bicolor BTx623, RTx430, RTx436, Tx2783, and Rio. In addition to the sorghum genomes, the site hosts six plant outgroup species (rice, maize, Arabidopsis thaliana, grapevine, a vascular plant, and a single-celled green alga) and Drosophila melanogaster, which are used to build 21,429 protein-coding gene family trees, and a total of five pairwise DNA alignments of the sorghum genomes to rice. The site also contains genetic variation, gene expression, and orthology-based pathway projections for the reference genome S. bicolor BTx623, and phenotypes from quantitative trait locus (QTL) data developed by collaborators in Australia. The QTLs have been curated by our collaborators at Oregon State University and assigned 147 Trait Ontology (TO) and 23 Crop Ontology (CO) terms, for a total of 232 traits from 146 publications. During the past year, we continued to meet with stakeholders to review their needs from resources and provided updates on the development of the SorghumBase portal. We presented at two meetings, held two webinars, and have had more than 25 meetings with collaborators. The data integration has focused on genomes, genetics, and phenotypic variation. 
Working with Australian collaborators, we have staged the reference assemblies of 10 additional sorghum accessions and are working on 10 additional reference genome parents of the sorghum nested association panels with Clemson University and the University of North Carolina. We have also curated new genotype data from published resequencing data for the natural populations and a new ethyl methanesulfonate (EMS) population to release in the next.
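
The ranked-choice idea behind TRaCE, as described in the progress report above, can be illustrated with a minimal sketch. This is not the TRaCE implementation: it assumes each of the three criteria named in the report (domain coverage, protein length, similarity to expressed transcripts) acts as one "voter" that ranks a gene's transcripts from best to worst, and it resolves the vote with a simple instant-runoff rule. The field names, the example values, and the tie-breaking by transcript ID are all hypothetical.

```python
from collections import Counter

# Criteria named in the report; the dictionary keys are hypothetical labels.
CRITERIA = ("domain_coverage", "protein_length", "expression_similarity")


def rank_by(transcripts, key):
    """Return transcript IDs ordered best-first for one criterion (higher is better)."""
    ordered = sorted(transcripts, key=lambda t: (-t[key], t["id"]))
    return [t["id"] for t in ordered]


def instant_runoff(ballots):
    """Elect one candidate from ranked ballots using instant-runoff voting."""
    candidates = set(ballots[0])
    while True:
        # Each ballot votes for its highest-ranked candidate still in the race.
        firsts = Counter(
            next(c for c in ballot if c in candidates) for ballot in ballots
        )
        leader, votes = min(firsts.items(), key=lambda kv: (-kv[1], kv[0]))
        if 2 * votes > len(ballots):  # strict majority of ballots
            return leader
        # Otherwise drop the candidate with the fewest first-choice votes
        # (ties broken by ID, an arbitrary assumption) and recount.
        loser = min(candidates, key=lambda c: (firsts.get(c, 0), c))
        candidates.remove(loser)


def select_representative(transcripts, criteria=CRITERIA):
    """Pick one transcript per gene by letting each criterion cast a ranked ballot."""
    if not transcripts:
        raise ValueError("at least one transcript is required")
    ballots = [rank_by(transcripts, key) for key in criteria]
    return instant_runoff(ballots)


# Made-up transcripts for a single gene, for illustration only.
example_transcripts = [
    {"id": "T1", "domain_coverage": 0.95, "protein_length": 410, "expression_similarity": 0.80},
    {"id": "T2", "domain_coverage": 0.90, "protein_length": 450, "expression_similarity": 0.92},
    {"id": "T3", "domain_coverage": 0.60, "protein_length": 300, "expression_similarity": 0.50},
]

print(select_representative(example_transcripts))
```

In this toy case two of the three criteria rank T2 first, so it is elected in the first round; when no transcript has a majority, the weakest one is eliminated and the remaining ballots are recounted.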

1. Gene structural annotations for 26 reference genomes of corn. A reference genome contains the blueprint for the parts list (the genes) and the instructions on when and where these genes should be expressed (regulatory regions) in an organism. While a single reference for a species is a good starting point, it does not capture the full complement of genes within a species. In the last year, an ARS researcher in Ithaca, New York, worked with academic partners in Cold Spring Harbor, New York; Athens, Georgia; St. Paul, Minnesota; and Ames, Iowa, and refined methods to predict maize genes using information on gene transcripts and proteins from other species. The group applied this method to 26 reference assemblies that are the parents of the maize nested association panel, which contains a large portion of the variation segregating in maize and has been phenotyped by researchers across the globe for agronomic traits. Across the 26 genomes, a total of 103,033 pan-genes were identified, an increase of 40K genes over the previous estimate, which was based on transcriptome assemblies of seedling RNA-seq reads from 500 individuals. The analyses suggest there are ~32K genes in the core/near-core portion of the pan-genome and 70,981 genes in the dispensable/private portion. The dispensable and private portion genes provide access to genes segregating within the species and associated with agronomic.
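
The core/near-core versus dispensable/private split described above can be sketched as a count of how many of the 26 genomes carry each pan-gene. The cut-offs used here (core: all genomes; near-core: 24 or 25; dispensable: 2 to 23; private: 1) are an assumption for illustration and are not stated in this report, and the pan-gene and genome names in the example are invented.

```python
from collections import Counter


def classify_pan_gene(n_present, n_genomes=26, near_core_min=24):
    """Assign a pan-genome category from the number of genomes carrying a gene.

    The thresholds are illustrative assumptions, not values from the report.
    """
    if not 1 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"
    if n_present >= near_core_min:
        return "near-core"
    if n_present == 1:
        return "private"
    return "dispensable"


def summarize(presence, n_genomes=26, near_core_min=24):
    """Count pan-genes per category.

    `presence` maps a pan-gene ID to the set of genomes in which it was annotated.
    """
    return Counter(
        classify_pan_gene(len(genomes), n_genomes, near_core_min)
        for genomes in presence.values()
    )


# Toy presence/absence data with hypothetical names, using 4 genomes
# so the example stays readable (near-core means 3 of 4 here).
toy_presence = {
    "pan_gene_1": {"G1", "G2", "G3", "G4"},
    "pan_gene_2": {"G1", "G2", "G3"},
    "pan_gene_3": {"G2", "G4"},
    "pan_gene_4": {"G3"},
}

print(summarize(toy_presence, n_genomes=4, near_core_min=3))
```

Grouping the first two categories and the last two gives the two portions reported above.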

2. Release of two new high-quality reference genomes and the SorghumBase portal. Sorghum yield advancement has been relatively stagnant since the 1970s, and recently the sugarcane aphid has arisen as a major challenge for the sorghum industry. This pest and lower yields compared to maize are the major challenges to the sorghum community, which is limited in size and funding compared to maize. To break the yield plateau in grain sorghum and provide resources to address the threat of sugarcane aphid and other emerging pests, an integrated sorghum community-centric approach is necessary. ARS researchers in Ithaca, New York, and Lubbock, Texas, worked with academic partners in Cold Spring Harbor and commercial collaborators in Iowa to develop two new high-quality reference genomes: Tx2783, the sugarcane aphid tolerant line, and RTx436, a male parent inbred. These reference genomes and three publicly available accessions are included in the first release of the portal. Improved access to and management of these genomes will enable researchers to exploit the relevant biological knowledge and genetic resources to address the needs of the sorghum research community and increase our ability to understand complex biological traits that are critical to U.S. agriculture.

Review Publications
Tello-Ruiz, M.K., Naithani, S., Gupta, P., Olson, A., Wei, S., Preece, J., Jiao, Y., Wang, B., Chougule, K., Garg, P., Elser, J., Kumari, S., Kumar, V., Contreras-Moreira, B., Naamati, G., George, N., Cook, J., Bolser, D.M., D'Eustachio, P., Stein, L.D., Gupta, A., Xu, W., Regala, J., Papatheodorou, I., Kersey, P.J., Flicek, P., Taylor, C., Jaiswal, P., Ware, D. 2020. Gramene 2021: harnessing the power of comparative genomics and pathways for plant research. Nucleic Acids Research. 49:D1452-D1463.
Wang, L., Lu, Z., Regulski, M., Jiao, Y., Chen, J., Ware, D., Xin, Z. 2021. BSAseq: An interactive and integrated web-based workflow for identification of causal mutations in bulked F2 populations. Bioinformatics. 37(3):382-387.
Vaughn, J.N., Korani, W., Stein, J.C., Edwards, J., Peterson, D.G., Simpson, S.A., Youngblood, R.C., Grimwood, J., Ware, D., McClung, A.M., Scheffler, B.E. 2021. Gene disruption by structural mutations drives selection in US rice breeding over the last century. PLoS Genetics. 17(3):e1009389.
Ou, S., Liu, J., Chougule, K., Fungtammasan, A., Seetharam, A., Stein, J., Llaca, V., Manchanda, N., Gilbert, A., Wei, S., Ware, D., Woodhouse, M.H., et al. 2020. Effect of sequence depth and length in long-read assembly of the maize inbred NC358. Nature Communications. 11.
Parry, G., Provart, N.J., Brady, S.M., Uzilday, B., Adams, K., Araujo, W., Aubourg, S., Baginsky, S., Bakker, E., Barenfaller, K., Batley, J., Beale, M., Beilstein, M., Belkhadir, Y., Berardini, T., Bergelson, J., Blanco-Herrera, F., Brady, S., Braun, H., Briggs, S., Brownfield, L., Cardarelli, M., Castellanos-Uribe, M., Coruzzi, G., Dassanayake, M., De Jaeger, G., Dilkes, B., Doherty, C., Ecker, J., Edger, P., Edwards, D., El Kasmi, F., Eriksson, M., Exposito-Alonso, M., Falter-Braun, P., Fernie, A., Ferro, M., Fiehn, O., Friesner, J., Greenham, K., Guo, Y., Hamann, T., Hancock, A., Hauser, M., Heazlewood, J., Ho, C., Horak, H., Huala, E., Hwang, I., Iuchi, S., Jaiswal, P., Jakobson, L., Jiang, Y., Jiao, Y., Jones, A., Kadota, Y., Khurana, J., Kliebenstein, D., Knee, E., Kobayashi, M., Koch, M., Krouk, G., Larson, T., Last, R., Lepiniec, L., Li, S., Lurin, C., Lysak, M., Maere, S., Malinowski, R., Maumus, F., May, S., Mayer, K., Mendoza-Cozatl, D., Mendoza-Poudereux, I., Meyers, B., Micol, J., Millar, H., Mock, H., Mukhtar, K., Mukhtar, S., Murcha, M., Nakagami, H., Nakamura, Y., Nicolov, L., Nikolau, B., Nowack, M., Nunes-Nesi, A., Palmgren, M., Parry, G., Patron, N., Peck, S., Pedmale, U., Perrot-Rechenmann, C., Pieruschka, R., Pio-Beltran, J., Pires, J., Provart, N., Rajjou, L., Reiser, L., Reumann, S., Rhee, S., Rigas, S., Ware, D. 2020. Current status of the multinational Arabidopsis community. Plant Direct. 00:1-9.
Xu, X., Crow, M., Rice, B.R., Li, F., Harris, B., Liu, L., Arevalo, E.D., Lu, Z., Jackson, D., Ware, D., Wang, L., Fox, N., Wang, X., Drenkow, J., Luo, A., Char, S., Yang, B., Sylvester, A., Gingeras, T., Schmitz, R., Lipka, A., Gillis, J. 2021. Single-cell RNA sequencing of developing ears facilitates functional analysis and trait gene discovery in maize. Developmental Cell. 56:557-568.
Hufford, M.B., Seetharam, A.S., Woodhouse, M.H., Chougule, K.M., Ou, S., Liu, J., Ricci, W.A., Guo, T., Olson, A., Qiu, Y., Portwood II, J.L., Cannon, E.K., Andorf, C.M., Ware, D., Dawe, K.R., et al. 2021. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science. 373(6555):655-662.
Diniz, A., Da Silva, D., Lembke, C.A., Costa, M.B., Ten-Caten, F., Li, F., Vilela, R., Menossi, M., Ware, D., Engres, L., Souza, G. 2020. Amino acid and carbohydrate metabolism are coordinated to maintain energetic balance during drought in sugarcane. International Journal of Molecular Sciences. 21(23):9124.