Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

2020 Annual Report

Objective 1: Apply comparative genomic, genetic, and molecular approaches to the dissection of complex traits and the understanding of genome functions; develop and implement new standards for the management and analysis of plant genomic, genetic, and phenotypic information; and dissect gene networks associated with programming crop plant development and adaptation to environment (GxE).
Sub-objective 1.A: Generate reference genomic resources to support four target crop communities. Achieving this objective will generate information management resources for maize (Zea mays), sorghum (Sorghum bicolor), grapevine (Vitis vinifera), and rice (Oryza sativa).
Sub-objective 1.B: Develop functional and comparative genomics resources for plant reference genomes. Achieving this objective will expand the Gramene databases to encompass reference genomes of at least 75 unique plant species.
Sub-objective 1.C: Develop functional networks for crop and model species. Achieving this objective will generate integrated genetics, transcriptomics, and molecular interaction data to define regulatory networks that influence plant traits through effects on developmental morphology (architecture) and response to environment.
Objective 2: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance.

The future of crop breeding will increasingly rely on strategies that combine genetic resources with rapidly advancing tools and knowledge in genomics, trait mapping, high-throughput phenotyping, and genome-editing. Yet, major challenges remain in translating vast amounts of data into useable biological models and building scalable information systems to enable researchers and breeders to contribute to and exploit these future technologies. To meet these challenges, this project will engage several strategic initiatives and collaborations that produce new genomics data, cyberinfrastructure, and hypothesis-based research. The first objective will generate new genomics datasets among four crop groups: sorghum, maize, rice and grapevine. Objective 1.A will produce a minimum of 30 high-quality reference genome assemblies, transcriptomes, and corresponding annotations. In maize and sorghum, we will also generate ENCODE-type molecular data sets to study the relationships between chromatin structure, gene expression, and phenotype. In sorghum and grapevine, we will sequence disease resistance genes across key germplasms that target critically important pests/pathogens. To enable sharing of reproducible workflows and promote interoperability, computational work will be performed using the recently developed SciApps cyberinfrastructure. In Objective 1.B, genomics data will be further disseminated via Gramene/Ensembl to support genome stewardship, comparative and pan-genomics analysis (in 2-3 crop groups), and display of ENCODE-type and publicly archived variation/genotype data. This platform will enable researchers to evaluate structural variation within crop clades and use conservation profiles to evaluate candidate genes. In Objective 1.C, we will continue several hypothesis-based studies of gene regulatory networks that underlie yield components influenced by morphological development and nutrient and stress response/adaptation. 
These projects combine forward and reverse genetics with transcriptional profiling, fluorescence in situ sequencing, and yeast-based molecular interaction assays to elucidate regulatory pathways that control plant traits. We will continue to use sorghum EMS-mutagenized lines to dissect pathways underlying inflorescence architecture and the multi-seeded trait. Research in nitrogen use efficiency will be continued using maize and Arabidopsis as models. Objective 2 focuses on development of the new sorghum genomics and genetics portal to serve scientists and breeders working on grain sorghum improvement. Goals include the initial release of SorghumBase as a comparative functional genomics resource, with future development of infrastructure to support phenotypic data and genomics-enabled germplasm improvement. A critical component of this plan is sorghum community engagement. The products of these two objectives will include well-characterized germplasm and the associated genotypic and phenotypic characterization of complex agronomic traits, which will enable genomic-assisted breeding and novel approaches for understanding the genetic architecture of traits critical to US agriculture.

Progress Report
In the last year, we pressed forward with long single molecule (LSM) sequencing to support reference genome assemblies, while optimizing the parameters (such as genome coverage and read length of PacBio LSM sequencing and Bionano LSM optical maps) for assemblies. We continued benchmarking assembly tools with academic, government, and industry partners. Draft assemblies for 26 maize accessions representing the parents of the maize nested-association-mapping (NAM) panel, recurrent parent B73 and B73 Abnormal, were released to the community. The contig N50 for the B73_v5 reference is 52 Mb, approximately 50 times greater than the 1 Mb contig N50 of the previous B73_v4 reference. Two sorghum genomes, TX2783 (sugarcane aphid resistant) and TX436 (a high-food-quality male parental line), were assembled. The final assembly of TX2783 consists of 19 scaffolds with a contig N50 of 25.6 Mb, and that of TX436 consists of 18 scaffolds with a contig N50 of 20.3 Mb. We generated reference transcriptomes for the two sorghum accessions, TX2783 and TX436, using RNA-seq. We continue to improve and extend annotation pipelines. For transposable element (TE) annotation, we benchmarked several programs using curated libraries with collaborators. The Extensive de novo TE Annotator (EDTA) was shown to be robust across both plant and animal species. For gene annotation, we continued our evaluation of different methods and finalized a workflow that generates both evidence-based and ab initio models. The workflow was applied to 27 maize and 4 sorghum accessions. Initial review of the maize annotations showed improved annotation edit distance (AED) scores. We continued development of SciApps, a lightweight bioinformatics workflow system. This year we improved its robustness and usability by working with collaborators in maize, and added new workflows for Bulked Segregant Analysis (BSA) with sorghum.
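For readers less familiar with the contig N50 values quoted above, the following is a minimal sketch of how the statistic is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name and the toy contig lengths are illustrative assumptions and are not taken from the project's assembly pipeline.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first contig that crosses the halfway point
            return length


# Hypothetical toy assembly (lengths in Mb); not real maize or sorghum data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(contig_n50(toy_contigs))  # 70

# Fold change between the contig N50 values reported for B73_v5 and B73_v4 (Mb).
print(52 / 1)  # 52.0, i.e. "approximately 50 times"
```

A higher contig N50 indicates that more of the assembly is contained in long, uninterrupted contigs, which is why it is used here to compare successive reference versions.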
In addition to the genomic resources and software development, we continued to deliver webinars, workshops, and training to support the scientific community via the Gramene, SciApps, and KBase infrastructure, while participating in more than five international and domestic conferences. We continue to participate in the AgBioData consortium for standards for sustainable genomics and genetics databases for agriculture. We also continued to deliver the Gramene portal. The portal consists of comparative genomics databases in collaboration with the Ensembl Genomes project at the European Bioinformatics Institute (EMBL-EBI), Plant Reactome, and with the EBI’s Expression Atlas project to provide manually curated, quality-controlled genome pathways and analyzed transcriptomic data. The genome data and pathways are made accessible using FAIR principles (findable, accessible, interoperable, and reusable), and adhere to the open standards for agricultural data management and stewardship. During this reporting period, we made two releases of the portal. The resource now stands at 67 distinct species spanning major crops, models, and lower plants. Together with collaborators, we submitted a proposal for additional funding for Gramene to NSF; it is under review. We have prototyped three species: rice, grape, and maize. This year’s efforts focused on improving scalability. To this end, we established shared core outgroups and a path forward for incremental builds of the gene trees. We continued work on maize community curation and contributed to a virtual workshop targeting the curation of the new maize B73_v5 gene models. The workshop outcomes included review of new v5 models; curated gene models; development of standard benchmarking data sets; recommendations for improving the usability of the tools; and recommendations for changes to the maize community nomenclature.
A manuscript on core markers for grapevine, submitted last year by collaborators, was updated and accepted. We continue to work with the rice community, with a draft manuscript on US rice germplasm in progress. In developing functional networks for crop and model species, integrated genetics, transcriptomics, and molecular interaction data will be generated to define regulatory networks that influence plant traits through effects on developmental morphology (architecture) and response to environment. Biological models provide insights for engineering and improving germplasm through marker-assisted breeding or directed mutagenesis with tools such as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR). We are continuing to dissect gene networks on development: grain number, shoot apical meristems, male sterility, vegetative branching, root system architecture (RSA), response to environment, nitrogen use efficiency (NUE), phosphorus use efficiency (PUE), water use efficiency (WUE), and disease resistance. In the last year, we have contributed to two manuscript submissions on grain number. Using bulked segregant analysis (BSA) of F2 sorghum individuals, we previously showed that the multi-seeded (msd) mutations msd1 lie within a gene encoding a TCP transcription factor and the msd2 mutations within a gene encoding a lipoxygenase (LOX) enzyme. In the last year, we confirmed msd3 mutations in a gene encoding a fatty acid desaturase (FAD) enzyme. We finalized the manuscript on MSD2 and submitted the manuscript on MSD3. With collaborators at Cold Spring Harbor, we are characterizing five maize transcription factor (TF) targets using ChIP-Seq in the maize shoot apical meristem (SAM). The plant’s growing shoot tip orchestrates the balance between stem cell renewal and organ initiation essential for post-embryonic growth. Libraries have been completed, and preliminary analyses of the initial ChIP-Seq experiments are under way.
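The bulked segregant analysis mentioned above can be illustrated with a small sketch. One statistic commonly used in BSA of mutagenized populations is the SNP-index (the fraction of reads carrying the mutant allele at a site) and its difference between the mutant-phenotype and wild-type-phenotype bulks. This report does not state which statistic or software the project used, so the approach, the function names, and the read counts below are all assumptions for illustration only.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at one site that carry the alternate (mutant) allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def delta_snp_index(mutant_bulk, wildtype_bulk):
    """SNP-index of the mutant bulk minus that of the wild-type bulk.

    Each argument is an (alt_reads, total_reads) pair for the same site.
    """
    return snp_index(*mutant_bulk) - snp_index(*wildtype_bulk)


def rank_sites(sites):
    """Return site names sorted by delta SNP-index, largest first."""
    return sorted(sites, key=lambda name: delta_snp_index(*sites[name]), reverse=True)


# Hypothetical read counts, not project data:
# site -> ((alt, total) in mutant bulk, (alt, total) in wild-type bulk)
example_sites = {
    "site_unlinked": ((24, 48), (26, 52)),
    "site_linked": ((50, 50), (25, 50)),
    "site_partial": ((40, 50), (20, 50)),
}

for name in rank_sites(example_sites):
    print(name, round(delta_snp_index(*example_sites[name]), 2))
```

Under these assumptions, sites tightly linked to a recessive causal mutation approach a SNP-index of 1 in the mutant bulk while remaining lower in the wild-type bulk, so they rise to the top of the ranking; unlinked sites give differences near zero.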
In root architecture, we identified 30 putative Arabidopsis TF candidates; T-DNA insertion mutants have been obtained and genotyped, and genetic crosses for approximately 50 double mutants from 15 TF families are in progress. To support functional characterization of crop genes informed by our previous Arabidopsis NUE gene regulatory network (GRN), we have obtained 40 available Mu-insertion lines and approximately 10 sorghum EMS candidates and will be backcrossing these this summer. Arabidopsis T-DNA mutants for five TFs predicted to have a fitness impact under low phosphorus were obtained and validated, and seeds were bulked. Preliminary screens suggest at least one candidate exhibited a change in root growth under low phosphorus. With collaborators from Brazil and Canada, we previously generated transcript profiles, open chromatin states, and histone marks for three sorghum genotypes under low and normal phosphorus, and analyses are currently underway. In the past year, we have continued to meet with stakeholders to review their needs from resources and provided updates on the development of the SorghumBase portal. Based on recommendations from the working group, the portal release was moved from Winter of 2019 to Fall of 2020 to allow for the integration of additional genomes. We are finalizing two reference genomes, TX436 and TX2783, and have curated three additional genomes: BTX623, Tx430, and Rio. We have coordinated a standard sorghum nomenclature for assigning gene IDs, using the nomenclature utilized for BTX623. We have obtained access to the genotype data for the Bioenergy Association Population (BAP), begun coordinating access to the Sorghum Association Panel (SAP) genotypes, and obtained a second EMS population. We established collaborations with four research groups to support the integration of an additional 20 reference assemblies in the next 18 months.
We have initiated integration of phenotype data utilizing the existing Ensembl infrastructure, have modeled QTLs from an existing sorghum QTL, and have assigned 147 Trait Ontology (TO) and 23 Crop Ontology (CO) terms for a total of 232 traits from 146 publications. We held two webinars and more than 20 meetings with individual collaborators to support coordination, standards, and access to the sorghum germplasm, genome, genotype, and phenotype data.

1. Full reference genomes for corn. Accurate reference assemblies are critical resources providing the blueprint for an individual organism. Corn genomes are complex, highly repetitive, and fluid, and a single reference genome may account for only about 50 percent of the gene content in the species. Ever-evolving sequencing technologies allow us to design and optimize assembly algorithms and workflows and to fine-tune assembly parameters. An ARS researcher in Ithaca, New York, worked with academic and industry partners in Cold Spring Harbor, New York; Athens, Georgia; Ames, Iowa; and Mountain View, California, on benchmarking sequencing and assembly approaches in corn. The study led to recommendations for the amount and quality of sequence, and the analysis programs, needed to support a reference assembly in corn and other plant genomes.

2. Identification of genes controlling sorghum grain number. Flower development, which determines the number of grains a plant will produce, is one component of the complex trait of yield. ARS researchers in Ithaca, New York, and Lubbock, Texas, worked with academic partners in Cold Spring Harbor, New York, to identify two genes in a plant hormone pathway that can lead to an increase in fertile flowers and an increase in grain number. This provides valuable insights for engineering approaches to modulate grain number in sorghum and also has the potential to inform work in other closely related crops.

Review Publications
Gladman, N.P., Jiao, Y., Lee, Y., Zhang, L., Chopra, R., Regulski, M., Burow, G.B., Hayes, C.M., Christensen, S.A., Dampanaboina, L., Chen, J., Burke, J.J., Ware, D., Xin, Z. 2019. Fertility of pedicellate spikelets in sorghum is controlled by a jasmonic acid regulatory module. Nature Plants. 20(19).
Wang, B., Kumar, V., Olson, A., Ware, D. 2019. Reviving the Transcriptome Studies: An insight into the emergence of Single-Molecule Transcriptome Sequencing. Frontiers in Genetics. 10:384.
Chen, J., Jiao, Y., Echevarria Laza, H.J., Payton, P.R., Ware, D., Xin, Z. 2019. Identification of the first nuclear male sterility gene (Male-sterile 9) in sorghum. The Plant Genome. 12.
Gaudinier, A., Rodrigues-Medina, J., Zhang, L., Olson, A., Liseron-Monfils, C., Ware, D. 2018. Transcriptional regulation of nitrogen and nitrogen-related metabolism in Arabidopsis. Nature. 563:259-264.
Wittmeyer, K., Cui, J., Chaterjee, D., Lee, T., Tan, Q., Jiao, Y., Wang, P., Gaffoora, I., Ware, D., Meyers, B., Chopra, S. 2018. The dominant and poorly penetrant phenotypes of maize unstable factor for orange1 are caused by DNA methylation changes at a linked transposon. The Plant Cell.
Dampanaboina, L., Jiao, Y., Chen, J., Gladman, N.P., Burow, G.B., Hayes, C.M., Christensen, S.A., Burke, J.J., Ware, D., Xin, Z. 2019. Sorghum MSD3 encodes omega-3 fatty acid desaturase that regulates grain number by reducing jasmonic acid levels. International Journal of Molecular Sciences. 20(21):5359.
Ou, S., Su, W., Liao, Y., Chougule, K.M., Agda, J.R., Hellinga, A.J., Lugo, C., Elliott, T.A., Ware, D., Peterson, T., Jiang, N., Hirsch, C.N., Hufford, M.B. 2019. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biology. 20(1):275.
Wang, L., Lu, Z., Delabastide, M., Van Buren, P., Wang, X., Ghiban, C., Regulski, M., Drenkow, J., Ware, D., Gingeras, T., Xu, X., Ramirez, C., Fernandez Marco, C., Williams, J., Dobin, A., Birnbaum, K., Jackson, D., Martienssen, R., Mccombie, R.W., Micklos, D., Schatz, M. 2020. Management, analyses, and distribution of the MaizeCODE Data on the Cloud. Frontiers in Plant Science. 31.
Wang, B., Tseng, E., Baybayan, P., Eng, K., Regulski, M., Jiao, Y., Wang, L., Olson, A., Chougule, K., Van Buren, P., Ware, D. 2020. Variant phasing and haplotypic expression from long-read sequencing in maize. Communications Biology. 3(78).
Knauer, S., Javell, M., Li, L., Li, X., Ma, X., Wimalanathan, K., Kumari, S., Johnston, R., Leiboff, S., Meeley, R., Schnable, P.S., Ware, D., Lawrence-Dill, C., Yu, J., Muehlbauer, G.J., Scanlon, M.J., Timmermans, M.C. 2019. A high-resolution gene expression atlas links dedicated meristem genes to key architectural traits. Genome Research. 29(12):1962-1973.
Liu, J., Seetharam, A.S., Chougule, K.M., Ou, S., Swentowsky, K.W., Gent, J.I., Llaca, V., Woodhouse, M.H., Manchanda, N., Presting, G.G., Kurdna, D.A., Alabady, M., Hirsch, C.N., Fengler, K.A., Ware, D., Michael, T.P., Hufford, M.B., Dawe, R.K. 2020. Gapless assembly of maize chromosomes using long-read technologies. Genome Biology. 21.
Zou, C., Karn, A., Reisch, B., Nguyen, A., Sun, Y., Bao, Y., Campbell, M.S., Church, D., Williams, S., Xu, X., Ledbetter, C.A., Patel, S., Fennell, A., Glaubitz, J., Clark, M., Ware, D., Londo, J.P., Sun, Q., Cadle Davidson, L.E. 2020. Haplotyping the Vitis collinear core genome with rhAmpSeq improves marker transferability in a diverse genus. Nature Communications.
Tello-Ruiz, M.K., Marco, C.F., Hsu, F., Khangura, R.S., Qiao, P., Sapkota, S., Stiszer, M.C., Wasikowski, R., Wu, H., Junpeng, Z., Chougule, K., Barone, L.C., Ghiban, C., Muna, D., Olson, A.C., Wang, L.C., Ware, D., Micklos, D.A. 2019. Double triage to identify poorly annotated genes in maize: The missing link in community curation. PLoS Computational Biology. 14(10).
Naithani, S., Gupta, P., Preece, J., D'Eustachio, P., Elser, J.L., Garg, P., Dikeman, D.A., Kiff, J., Cook, J., Olson, A., Wei, S., Tello-Ruiz, M.K., Mundo, A.F., Munoz-Pomer, A., Mohammed, S., Cheng, T., Bolton, E., Papatheodorou, I., Stein, L., Ware, D., Jaiswal, P. 2020. Plant Reactome: a knowledgebase and resource for comparative pathway analysis. Nucleic Acids Research. 48(D1):D1093-D1103.
Sun, S., Zhou, Y., Chen, J., Shi, J., Zhao, H., Zhao, H., Song, W., Zhang, M., Cui, Y., Dong, X., Liu, H., Ma, X., Yinping, J., Bo, W., Wei, X., Stein, J., Glaubitz, J., Lu, F., Yu, G., Liang, C., Fengler, K., Li, B., Rafalski, A., Schnable, P., Ware, D., Buckler IV, E.S., Lai, J. 2018. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nature Genetics.
Friesner, J., Assmann, S., Bastow, R., Bailey-Serres, J., Beynon, J., Brendel, V., Buell, R., Bucksch, A., Demura, T., Dinneny, J., Doherty, C., Eveland, A., Falter-Braun, P., Gehan, M., Gonzales, M., Grotewold, E., Gutierrez, R., Kramer, U., Krouk, G., Ma, S., Markelz, R., Megraw, M., Meyers, B., Murray, J., Provart, N., Rhee, S., Smith, R., Spalding, E., Teal, T., Torii, K., Town, C., Vaughn, M., Vierstra, R., Ware, D., Wilkins, O., Williams, C., Brady, S. 2017. The next generation of training for Arabidopsis researchers: bioinformatics and quantitative biology. Plant Physiology. 175:1499-1509.
Kersey, P., Allen, J., Allot, A., Barba, M., Boddu, S., Bolt, B., Carvalho-Silva, D., Christensen, M., Davis, P., Grabmueller, C., Kumar, N., Liu, Z., Maurel, T., Moore, B., Mcdowall, M., Maheswari, U., Naamati, G., Newman, V., Ong, C., Paulini, M., Pedro, H., Perry, E., Russell, M., Sparrow, H., Tapanari, E., Taylor, K., Vullo, A., Williams, G., Zadissia, A., Olson, A., Stein, J., Wei, S., Tello-Ruiz, M., Ware, D., Luciani, A., Potter, S., Finn, R., Urban, M., Hammond-Kosack, K., Bolser, D., Nishadi, D., Howe, K., Langridge, N., Maslen, G., Staines, D., Yates, A. 2018. Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. Nucleic Acids Research. 46(D1):D802-D808. doi: 10.1093/nar/gkx1011.
Harper, E.C., Campbell, J., Cannon, E.K., Jung, S., Main, D., Poelchau, M.F., Walls, R.L., Andorf, C.M., Arnaud, E., Berardini, T.Z., Birkett, C.L., Cannon, S.B., Carson, J., Condon, B., Cooper, L., Dunn, N., Elsik, C., Farmer, A., Ficklin, S., Grant, D.M., Grau, E., Hendon, N., Hu, Z., Humann, J., Jaiswal, P., Jonquet, C., Laporte, M., Larmande, P., Lazo, G.R., McCarthy, F., Menda, N., Mungall, C., Munoz-Torres, M., Naithani, S., Nelson, R., Nesdill, D., Park, C., Reecy, J., Reiser, L., Sanderson, L., Sen, T.Z., Staton, M., Subramaniam, S., Karey, T., Unda, V., Unni, D., Wang, L., Ware, D., Wegrzyn, J., Williams, J., Woodhouse, M. 2018. AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database: The Journal of Biological Databases and Curation. 2018(1):1-32.
Howe, K.L., Contreras-Moreira, B., De Silva, N., Maslen, G., Akanni, W., Allen, J., Alvarez-Jarreta, J., Barba, M., Bolser, D.M., Cambell, L., Carbajo, M., Chakiachvili, M., Christensen, M., Cummins, C., Cuzick, A., Davis, P., Fexova, S., Gall, A., George, N., Gil, L., Gupta, P., Hammond-Kosack, K.E., Haskell, E., Hunt, S.E., Jaiswal, P., Janacek, S.H., Kersey, P.J., Langridge, N., Maheswari, U., Maurel, T., Mcdowall, M.D., Moore, B., Muffato, M., Naamati, G., Naithani, S., Olson, A., Papatheodorou, I., Patricio, M., Paulini, M., Pedro, H., Perry, E., Preece, J., Rosello, M., Russell, M., Sitnik, V., Staines, D.M., Stein, J., Tello-Ruiz, M.K., Trevanion, S.J., Urban, M., Wei, S., Ware, D., Williams, G., Yates, A.D., Flicek, P. 2020. Ensembl Genomes 2020-enabling non-vertebrate genomic research. Nucleic Acids Research. 48(D1,8):D689-D695.