Research Project: Enhancing Plant Genome Function Maps Through Genomic, Genetic, Computational and Collaborative Research

Location: Plant, Soil and Nutrition Research

2018 Annual Report

1: Apply computational, genomic, genetic and/or systems biology approaches to develop new models for plant genome structure and organization that advance our understanding of plant evolution and diversity.
1.1: Establish an integrated reference genome resource for plant genomes.
1.2: Analysis and visualization of genotypic, epigenomic, and functionally phenotypic diversity.
1.3: Comparative genomics: analysis of plant genomes (stewardship of reference resource) and visualization informed by evolutionary histories.
2: Analyze and develop genome-level regulatory network models that focus on and integrate the processes underlying plant development and responses to environmental change.
2.1: Develop genome-wide functional networks for the model plant genome Arabidopsis.
2.2: Crop gene regulatory networks (GRNs) to support functional prediction for agriculturally relevant phenotypes.
3: Collaborate, develop and implement new standards for the management and analysis of plant genomic, genetic and phenotypic information to facilitate integration and interoperability between biological databases.
4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.
5: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance.

We propose to leverage emerging and standard computational and experimental approaches, building on existing and newly developed resources to support stewardship of plant genome reference sequences, genome annotations and gene networks. This will support development of a common standard platform for comparative genomic analysis and visualization. The enriched genome annotations will include controlled vocabularies to describe metadata and primary data associated with comparative phylogenomics, epigenetics, and population-based phenotypes. The proposed research in gene networks is directed at the development and validation of gene regulatory networks (GRNs). The network view of the underlying molecular processes will enhance the fundamental biological understanding of development and abiotic stress responses and their relationship to agronomic traits. The computationally predicted and experimentally verified sub-networks, combined with the prioritized regulatory gene targets, will provide focal points for further research at the gene-by-gene level. They will be integrated with the suite of genetic resources obtained from Objective 1, including SNPs and orthology mapping, and thus will be a resource for breeders and researchers engaged in molecular breeding approaches and segregation analysis. Genome-wide network reconstructions will help quantify and characterize genotype-to-phenotype relationships. We propose to leverage and build upon existing infrastructure to manage and analyze plant genomic, genetic, and phenotypic data. The resources will focus on the delivery of anticipated products from Objectives 1 and 2 with a focus on plant datasets, but much of the software will be species-agnostic, making the resources developed from the project usable by a broader audience working on animals, insects, and fish relevant to agriculture, human health, and a sustainable environment.

Progress Report
This is the final report for project 8062-21000-041-00D, which terminated on April 3, 2018. Progress was made on all five objectives; the last two objectives were added later in the project.
Objective 1: Over the course of this project, we have supported standard representation of plant reference genomes and comparative analyses in Gramene and Ensembl Plants using the Ensembl infrastructure. The outcome of the collaboration is a scalable solution for managing reference genomes, their annotations, and reference gene homology relationships. Developed jointly by the Cold Spring Harbor Laboratory (CSHL) and European Bioinformatics Institute (EBI) teams, this Gramene/Ensembl Plants resource now stands at 53 distinct species. Each browser provides community-supported gene structural annotations, functional domains, and gene ontologies. For rice, maize and Arabidopsis, which are the targeted species in this project, the browsers also display curated QTL studies and aligned transcriptomic and epigenetic data. Single nucleotide polymorphism (SNP) and structural variation data are provided for 12 species. Variant sites are annotated for predicted functional consequences, and individual genotypes are provided across germplasm accessions. The Ensembl data model also accommodates phenotypic data, including genome-wide association studies (GWAS) in Arabidopsis and other species. With the genomes and community annotations as input, reference gene trees have been produced using the Ensembl Compara gene tree pipeline, which performs an all-vs-all phylogenetic analysis of protein-coding genes. We also compare genomes using whole-genome alignment and gene-level synteny over ancestrally conserved regions. Taken together, these maps allow users to visualize structural differences and their impact on gene movement, duplication, and loss. Ortholog assignments in gene trees enable cross-species comparison of gene function.
We have worked with collaborators this year to renew the Gramene project grant. We continue to evaluate emerging sequencing technologies to support the reference plant genomes, characterize genetic variation and improve gene structure and functional annotations. We published a new reference genome for maize B73, generated using a hybrid approach. On par with the quality of the human genome, the revised assembly decreases the proportion of genes affected by gaps in their 3 kb flanking sequence from 20% to less than 1%, thus substantially improving the annotation of core promoter elements, and has nearly complete centromeres and telomeres. We have also contributed to the publication of 2 additional maize accessions (W22 and Mo17) and 13 rice genomes, using both short-read and long-read sequencing technologies. Using these genomes, we were able to identify structural variation within species and, in the case of rice, identified specific genes that provide genetic resistance to pathogens. We further demonstrated that some of these genes were present in rice cultivated in Asia and Africa thousands of years ago. We published an expanded catalog of maize and sorghum transcriptomes, improving the number of alternate transcripts from 1.6 to 3.3 per gene. Using these data sets, we performed an evolutionary analysis and found young genes were likely to be generated in reproductive tissues, and usually had fewer isoforms than old genes. In the last two years, we have worked with collaborators to identify funds to support PacBio assemblies and annotation of all 25 maize parents of the nested association mapping (NAM) population and 11 rice accessions including Carolina Rice. While long-read sequencing provides the most robust reference assemblies, it is cost prohibitive at this time for producing more than a few reference assemblies.
We continue to explore other effective measures and have established collaborations to utilize linked-read technology (10x Genomics) to produce draft assemblies of grape, sorghum, and maize. It has long been recognized that a single reference genome is not sufficient to represent the genetic diversity within a species, but only in the last few years has it become possible to generate the data at scale. With the ability to generate many reference genomes for each species, new approaches will be required to manage pan-genome models. As part of our work to explore pan-genome models, we contributed to the development of ACE and ACE+ (Accessing Changes to Exons), a software tool to generate structural gene annotations for each sequenced individual in a population. We have built a prototype pan-genome browser for maize (4 genomes), rice (13 genomes), and grape (3 genomes), utilizing the Ensembl infrastructure. This effort focuses on comparative and phylogenetic analysis to support storing and accessing gene-based structural variation, with satellite browsers for visualization.
Objective 2: The goal of this work is to integrate genetics and genomics data sets to identify molecular networks that influence the morphology (architecture) of plants (root, stem, and flower tissues) and their response to environmental stress (such as low nitrogen and phosphorus). Over the course of this project, we have generated genomic, genetic, and molecular resources in the model plant Arabidopsis, maize, and sorghum. A major focus of this objective has been the development of molecular networks in the model plant Arabidopsis, and the use of both forward and reverse genetics to exploit a sorghum ethyl methanesulfonate (EMS) population developed by collaborators in Lubbock, Texas. Over the course of this project, we have published models on genes contributing to flower architecture, cell wall biogenesis, grain number, and cuticle wax.
These models have been used to identify candidate genes to move forward in germplasm trials. We continue to dissect gene networks for nitrogen use efficiency (NUE), phosphorus use efficiency, water use efficiency, vegetative branching, root architecture and disease resistance, and anticipate publications on these in the near future.
Objective 3: Over the course of this research with our colleagues in major cyberinfrastructure projects, we have contributed to the development and adoption of national high performance computing (HPC) and cloud resources to support the storage and analysis of genome and phenotype data. Such initiatives include collaboration with the Department of Energy (DOE) Systems Biology Knowledgebase (KBase), the National Science Foundation (NSF) CyVerse (formerly iPlant), NSF Gramene, and EBI EnsemblGenomes. We have contributed to publications on all four platforms and focused on development, implementation and improvement of workflows for genome assembly, annotation, transcript and epigenetic profiling, and comparative analyses. We have invested effort in the KBase platform to expand and improve support for gene and transcript expression profiling, functional enrichment of gene sets, and genome annotation of metabolic pathways using PlantSEED. We have invested effort in CyVerse to improve access to software that supports gene structure and functional annotation, transcript expression profiling, and data management and sharing, contributing to the development of the DataCommons and GenBank submission services. A major challenge in data analysis is integrating data and analyses. Key to this is capturing metadata associated with the sample and the analysis workflow. A more recent effort has been the development of SciApps, a lightweight bioinformatics workflow system powered by the CyVerse infrastructure that uses the Agave Science API (application programming interface) to manage jobs and workflows using XSEDE HPC and the CyVerse Data Store.
SciApps gives scientists with limited programming skills access to these resources through a graphical user interface for job submission, workflow creation, and management of both jobs and workflows. In addition to the software development, we provided continued delivery of webinars, workshops, and training to support the scientific community, via the Gramene, KBase and CyVerse infrastructure. Over the course of this project, we have participated in outreach and training activities at more than 20 international and domestic conferences. We have worked directly with commodity stakeholders in maize, sorghum, rice and grape and are working with collaborators to secure renewal of funding for these projects.
Objective 4: In addition to the infrastructure outlined in Objective 3, an ARS scientist has actively contributed to the ARS Big Data initiative, including the development and adoption of the SCINet platform. After serving as acting ARS Chief Scientific Information Officer (CSIO) for three years, the scientist supported the transition of this role to another ARS scientist and continues to work closely with the acting CSIO, Associate Director, Chief Information Officer and Chief Technology Officer to support SCINet platform development.
Objective 5: This is a new objective, added in the last year, targeting the development of genomic resources and data management for sorghum with an initial emphasis on sugarcane aphid (SCA) resistance. During the last year, we have met with stakeholders to review their needs for the resource and provide updates on its development, architected the initial site, and established a domain name. We have generated draft assemblies of 4 accessions, including the SCA-tolerant line, utilizing linked-read technology (10x Genomics), and have contracted for long-read sequences to improve the scaffolds and fill in the gaps.
We generated full-length cDNA from 11 tissues of BTx623, using PacBio Iso-Seq, and are in the process of improving the transcript annotation for the reference assembly. We have worked with collaborators in the UK to develop capture arrays for grass disease resistance genes, and will evaluate these using eight sorghum lines.

1. Characterization of transcriptional regulation of nitrogen-associated metabolism and growth. Nitrogen is an essential nutrient for plant growth and metabolic processes. Plants perceive nitrogen deficiency as a stress, resulting in altered plant development and a reduction in growth and yield. Application of nitrogen fertilizers increases plant growth and played a role in the green revolution. Ecologically, however, excess fertilizer has a negative impact on the environment. A better understanding of how plants regulate nitrogen metabolism is critical to improving agricultural productivity. An ARS researcher in Ithaca, New York, worked with academic and industry partners in Cold Spring Harbor, New York; Davis, California; and Johnston, Iowa, to map the transcriptional regulatory network of the model plant Arabidopsis, and identified 23 novel transcription factors that regulate root and shoot architecture under nitrogen deprivation. Genetic perturbation of the genes in the network identified coordinated control of nitrogen metabolism genes. This work has added to previous studies of nitrogen metabolism by identifying a core set of nitrogen metabolism genes and their upstream regulators. This knowledge provides insights for engineering approaches to improve nitrogen use efficiency in crop plants.

2. Isoform profiling of transcripts provides clues to morphological and functional differences between maize and sorghum. Plant development, growth, and response to the environment are the direct outcome of gene products encoded in the genome. For each gene locus there can be one or more transcript products, and these different isoforms can result in differences in plant growth. Characterizing these different transcripts, where and when they are expressed, and how they change over evolutionary time can provide insights into genes involved in morphology and adaptive responses. An ARS researcher in Ithaca, New York, worked with academic and industry partners in Cold Spring Harbor, New York, and Menlo Park, California, to improve the annotation of transcriptional isoforms of maize and sorghum across 11 different tissues.
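
For readers unfamiliar with the isoform metric used in this report (for example, the reported increase from 1.6 to 3.3 alternate transcripts per gene), the following minimal Python sketch shows one way such a per-gene average could be computed from a transcript-to-gene mapping. It is illustrative only and is not the project's pipeline; the function name, the input layout, and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def mean_isoforms_per_gene(transcript_to_gene):
    """Average number of distinct transcripts (isoforms) per gene.

    transcript_to_gene: iterable of (transcript_id, gene_id) pairs.
    This input layout is an assumption made for illustration.
    """
    isoforms = defaultdict(set)
    for transcript_id, gene_id in transcript_to_gene:
        # A set ignores transcripts that are listed more than once.
        isoforms[gene_id].add(transcript_id)
    if not isoforms:
        raise ValueError("no transcripts supplied")
    return sum(len(ids) for ids in isoforms.values()) / len(isoforms)


# Hypothetical toy annotation; these identifiers are made up.
toy_annotation = [
    ("geneA_T001", "geneA"),
    ("geneA_T002", "geneA"),
    ("geneA_T003", "geneA"),
    ("geneB_T001", "geneB"),
]

print(mean_isoforms_per_gene(toy_annotation))  # 4 transcripts / 2 genes = 2.0
```

In practice the pairs would be read from a genome annotation file rather than typed in, and genes with no annotated transcript would need a separate decision about whether they count toward the denominator.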

Review Publications
Friesner, J., Assmann, S., Bastow, R., Bailey-Serres, J., Beynon, J., Brendel, V., Buell, R., Bucksch, A., Demura, T., Dinneny, J., Doherty, C., Eveland, A., Falter-Braun, P., Gehan, M., Gonzales, M., Grotewold, E., Gutierrez, R., Kramer, U., Krouk, G., Ma, S., Markelz, R., Megraw, M., Meyers, B., Murray, J., Provart, N., Rhee, S., Smith, R., Spalding, E., Teal, T., Torii, K., Town, C., Vaughn, M., Vierstra, R., Ware, D., Wilkins, O., Williams, C., Brady, S. 2017. The next generation of training for Arabidopsis researchers: bioinformatics and quantitative biology. Plant Physiology. 175:1499-1509.
Wang, L., Lu, Z., Van Buren, P., Ware, D. 2018. SciApps: A cloud-based platform for reproducible bioinformatics workflows. Bioinformatics.
Majsec, K., Bhuiyan, N., Sun, Q., Kumari, S., Kumar, V., Ware, D., Van Wijk, K. 2017. The plastid and mitochondrial peptidase network in Arabidopsis thaliana: a foundation for testing genetic interactions and functions in organellar proteostasis. The Plant Cell.
Wang, B., Regulski, M., Tseng, E., Olson, A., Goodwin, S., McCombie, R., Ware, D. 2018. A comparative transcriptional landscape of maize and sorghum obtained by single-molecule sequencing. Genome Research. doi:10.1101/gr.227462.117.
Jiao, Y., Burow, G.B., Gladman, N., Acosta Martinez, V., Chen, J., Burke, J.J., Ware, D., Xin, Z. 2018. Efficient identification of causal mutations through sequencing of bulked F2 from two allelic bloomless mutants. Frontiers in Plant Science.
Jiao, Y., Lee, Y., Gladman, N., Chopra, R., Christensen, S.A., Burow, G.B., Hayes, C.M., Burke, J.J., Ware, D., Xin, Z., Regulski, M. 2018. MSD1 regulates pedicellate spikelet fertility through the jasmonic acid pathway in sorghum. Nature Communications. 9(1):822.
Jiao, Y., Peluso, P., Liang, T., Shi, J., Stitzer, M., Wang, B., Campbell, M., Stein, J., Wei, X., Chin, J., Guill, K.E., Regulski, M., Sunita, K., Olson, A., Gent, J., Schneider, K., Wolfgruber, T., May, M., Springer, N., Antoniou, E., McCombie, R., Presting, G., McMullen, M.D., Ross-Ibarra, J., Kelly, D., Hastie, A., Rank, D., Ware, D. 2017. Improved maize reference genome with single-molecule technologies. Nature. doi:10.1038/nature22971.