
Research Project: Enhancing Plant Genome Function Maps Through Genomic, Genetic, Computational and Collaborative Research

Location: Plant, Soil and Nutrition Research

2016 Annual Report

1: Apply computational, genomic, genetic and/or systems biology approaches to develop new models for plant genome structure and organization that advance our understanding of plant evolution and diversity.
1.1: Establish an integrated reference genome resource for plant genomes.
1.2: Analysis and visualization of genotypic, epigenomic, and functionally phenotypic diversity.
1.3: Comparative genomics: analysis of plant genomes (stewardship of reference resource) and visualization informed by evolutionary histories.
2: Analyze and develop genome level regulatory network models that focus on and integrate the processes underlying plant development and responses to environmental change.
2.1: Develop genome-wide functional networks for the model plant genome Arabidopsis.
2.2: Crop GRNs to support functional prediction for agriculturally relevant phenotypes.
3: Collaborate, develop and implement new standards for the management and analysis of plant genomic, genetic and phenotypic information to facilitate integration and interoperability between biological databases.
4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.

We propose to leverage emerging and standard computational and experimental approaches, building on existing and newly developed resources to support stewardship of plant genome reference sequences, genome annotations and gene networks. This will support development of a common standard platform for comparative genomic analysis and visualization. The enriched genome annotations will include controlled vocabularies to describe metadata and primary data associated with comparative phylogenomics, epigenetics, and population-based phenotypes. The proposed research in gene networks is directed at the development and validation of gene regulatory networks (GRNs). The network view of the underlying molecular processes will enhance the fundamental biological understanding of development and abiotic stress responses and their relationship to agronomic traits. The computationally predicted and experimentally verified sub-networks, combined with the prioritized regulatory gene targets, will provide focal points for further research at the gene-by-gene level. They will be integrated with the suite of genetic resources obtained from Objective 1, including SNPs and orthology mapping, and thus will be a resource for breeders and researchers engaged in molecular breeding approaches and segregation analysis. Genome-wide network reconstructions will be useful in quantifying and characterizing genotype-to-phenotype relationships. We propose to leverage and build upon existing infrastructure to manage and analyze plant genomic, genetic, and phenotypic data. The resources will focus on the delivery of anticipated products from Objectives 1 and 2 with a focus on plant datasets, but much of the software will be species-agnostic, making the resources developed from the project usable by a broader audience, including those working on animals, insects, and fish relevant to agriculture, human health, and a sustainable environment.

Progress Report
Objective 1: We have continued to support standard representation of plant reference genomes and comparative analyses in Gramene and Ensembl Plants using the Ensembl infrastructure, and we continue to evaluate methodology to support the reference plant genomes utilizing newly emerging sequencing technologies. This year we added two new reference genomes, for a total of 39 genomes in the release. Natural genetic variation data were also updated for two genomes and introduced for one. Gramene announced five major releases that included software updates and primarily targeted the comparative analysis pipelines to support whole-genome alignments and protein-based gene trees. An extension of the work in Gramene has been targeted research on the Oryza genus. Having emerged approximately 15 million years ago, the Oryza genus now includes two cultivated species, Asian rice and African rice, and 21 wild species adapted to a broad range of tropical and subtropical habitats. In collaboration with the Oryza Genome Evolutionary project, we have built a dedicated resource to facilitate comparative and functional genomics research within the Oryza genus. The site features chromosome-level assemblies of eleven genomes and partial reference assemblies for an additional four Oryza species. To compare these genomes, we have applied systematic annotation and phylogenetic analysis of protein-coding genes. These results are yielding new insights into the taxonomic origin of genes and patterns of duplication, movement, and loss influenced by genome architecture. Sequencing technology has progressed rapidly, so the opportunities to use it for new reference genomes and diversity profiling are constantly evolving. In the last year, we continued to evaluate hybrid approaches to support reference genome assembly, improve transcript models, and survey a sorghum EMS population.
We continued our collaboration with Pacific Biosciences to support genome assembly and transcript profiling of the maize B73 line using the P6-C4 chemistry, and established a new partnership with BioNano to evaluate high-resolution optical maps. We completed three de novo assemblies using 65X coverage of the genome. We compared these to the BioNano map, which allowed identification and correction of chimeras in both data sets. The resulting assembly consisted of 2,958 contigs with an N50 of 1.18 Mb. The hybrid assembly of the contigs with the genome map consisted of 354 scaffolds with a scaffold N50 of 9.6 Mb; 268 PacBio contigs were left unscaffolded. The final product is comparable in quality to the current draft human sequence. We have also used single-molecule sequencing to improve the quality and diversity of transcripts. Using PacBio Iso-Seq, we have characterized 111,151 transcripts from 6 maize tissues, capturing 70% of the genes annotated in maize. A large proportion of transcripts, 57%, represent novel, sometimes tissue-specific, isoforms of known genes, and 3% correspond to novel gene loci.
Objective 2: The goal of this work is to integrate genetics and genomics data sets to find molecular networks that influence the morphology (architecture) of plants (roots, stems, and flowers) and their response to the environment (low nitrogen and phosphorus). Because roots are responsible for the uptake of water and minerals, and inflorescences (flowers) bear the fruits and grains that we eat, the genetic and regulatory factors that govern their formation are clearly relevant to important agronomic traits such as nitrogen use efficiency, grain yield and harvesting ability.
In the last year, we continued to develop a model to integrate expression data and network topology to support characterization of candidate genes associated with root system architecture, utilizing our Arabidopsis miRNA Gene Regulatory Network (GRN), which contains 5,376 protein-DNA interactions (PDIs). We are using this information to prioritize candidates for evaluation. We have continued our work on the ZF-HD transcription factors (TFs), which were identified as hub genes within the Arabidopsis miRNA network. In the last year, we generated multiple loss-of-function mutants using different approaches, including genetic crosses among single loss-of-function mutants, generation of knockdown artificial miRNA (amiRNA) lines, and independent repressor lines. With multiple loss-of-function mutations combined in a single Arabidopsis background, we observed several phenotypes, including altered flower structures and increased vegetative branching. To rapidly evaluate the functional conservation of this TF family in Arabidopsis and in the moss Physcomitrella patens, we have performed tobacco transient expression assays, selecting targets strategically from multispecies phylogenetic gene trees of the ZF-HD TF family. We hypothesize that these TFs act as developmental regulators, controlling vegetative plant architecture, including branching, and flower architecture. To extend the utilization of the ZF-HD TF genes to crops, we are screening the sorghum ethyl methanesulfonate (EMS) mutant collection in collaboration with USDA-ARS in Lubbock, Texas. We have continued our ongoing collaboration with the University of California, Davis and Pioneer to study the GRN of Nitrogen Use Efficiency (NUE). Based on our previous GRN study of NUE in Arabidopsis, plus transcriptome analysis of genes that are differentially expressed in maize root and shoot tissues under nitrogen-limiting conditions (30 libraries), we selected 45 maize genes involved in the nitrogen metabolic pathway, transport, signal transduction, degradation, and transcriptional regulation, and are screening these against a prioritized library of 450 maize TFs. In addition to maize, we have begun work on sorghum, an important emergent bioenergy crop that is also used for human consumption in sub-Saharan Africa. We are using next-generation sequencing approaches to identify single-nucleotide mutations associated with an increase in seed number and yield, in collaboration with researchers in Lubbock, Texas. Using this approach, we have identified two genes that can change the structure of flowers and generate more seeds. This year we have embarked on a collaboration with Embrapa in Brazil to characterize epigenetic marks related to root system morphology and abiotic stress tolerance (low phosphate) in sorghum. We have selected four sorghum lines based on contrasting root phenotypes and other traits related to abiotic stress tolerance assessed by Embrapa in Brazil, and generated a total of more than 120 libraries.
Objective 3: Breakthroughs in imaging and sequencing technologies have led to new opportunities to generate reference genome sequences for a majority of species. They have also resulted in massive challenges to manage, analyze, share and draw insights from the thousands of trillions of data points that are being generated. “Big Data” in biology will require a paradigm shift. Data is no longer sparse. Cultural changes will be required to shift resources from the generation of data to its management and sharing. With colleagues, we are evaluating and developing national high-performance computing resources to support the storage and analysis of plant genome and phenotype data. Such initiatives include collaboration with the Department of Energy (DOE) Systems Biology Knowledgebase (KBase) and the National Science Foundation (NSF) iPlant Collaborative (CyVerse/iPlant).
KBase is an open-source, open-architecture framework for reproducible and collaborative computational systems biology. One of the primary objectives of KBase is to enable more accurate models of dynamic cellular systems for plants and microbes. In the last year, we have contributed to a variety of analytical workflows that integrate gene expression profiles and metabolic networks, largely supported by the RNA-seq pipeline. In the last year, the iPlant project was rebranded as CyVerse based on a request from NSF to be more inclusive of research communities. The CyVerse platforms offer an open-source, comprehensive and foundational infrastructure to support plant biology research. In the last year, work has continued to support improvements associated with the Data Store, Discovery Environment (DE), and Atmosphere, and we have a successful first federated prototype of the CyVerse system. In addition to the software development, we continued to deliver webinars, workshops, and training to support the scientific community via the Gramene, KBase and CyVerse infrastructure. In the last year, we performed outreach and training at more than five international and domestic conferences. We have worked directly with commodity stakeholders in maize, sorghum, rice and grape.
Objective 4: In the last year, an ARS scientist has continued their role as Chief Scientific Information Officer, in support of the ARS Big Data initiative. In this role, they have worked closely with the Associate Director, Chief Information Officer and Chief Technology Officer to support the SCINet platform development. The SCINet platform consists of the Science data highway deployed among six locations (DMZ), High Performance Computing (HPC) resources (Ames, Iowa), development of the Virtual Resource Support Core (VRSC), and support for science-focused workshops.

1. Improving the maize reference genome using single-molecule sequencing technology. Complete and accurate reference genomes and gene annotations are necessary to characterize genetic and epigenetic variation, the basis of trait variation in crops. The current maize reference sequence, which was based on technology from 2007, is fragmented and missing complex repeat regions. In the last two years, ARS scientists in Ithaca, New York, have worked with collaborators to generate a new reference genome assembly and gene annotation utilizing PacBio Single Molecule Real-Time (SMRT) sequencing and a high-resolution whole-genome restriction map. The new genome map represents a 66-fold improvement in contiguity and doubles the number of characterized proteins. Comparison with a high-resolution restriction map of another maize inbred line also revealed a high level of structural variation in maize. The application of these new technologies now makes it possible to support development of high-quality reference genomes for most species.
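
The contig N50 (1.18 Mb) and scaffold N50 (9.6 Mb) quoted in the Progress Report are contiguity statistics: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The Python sketch below illustrates that standard definition only. It is not the project's pipeline, and the function name and the toy lengths are hypothetical values chosen for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Standard definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the
    combined length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), not project data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs (80 + 70) reach half of 300
```

Because the statistic depends only on the sorted length distribution, joining contigs into scaffolds raises the N50 even when no new sequence is added, which is why the report gives the contig and scaffold values separately.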


Review Publications
Kersey, P.J., Allen, J., Armean, I., Boddu, S., Bolt, B.J., Carvalho-Silva, D., Christensen, M., Davis, P., Falin, L.J., Grabmueller, C., Humphrey, J., Kerhornou, A., Khobova, J., Aranganathan, N.K., Langridge, N., Lowy, E., Mcdowall, M.D., Maheswari, U., Nuhn, M., Ong, C., Overduin, B., Paulini, M., Pedro, H., Perry, E., Spudich, G., Tapanari, E., Walts, B., Williams, G., Marcela-Tello, M., Stein, J., Wei, S., Ware, D., Bolser, D.M., Howe, K.L., Kulesha, E., Lawson, D., Maslen, G., Staines, D.M. 2016. Ensembl genomes 2016: more genomes, more complexity. Nucleic Acids Research. 44:D574-D580.
Liseron-Monfils, C., Ware, D. 2015. Revealing gene regulation and association through biological networks. Current Plant Biology. 3-4:30-39.
Tello-Ruiz, M., Stein, J., Wei, S., Preece, J., Olson, A., Naithani, S., Amarashinghe, V., Dharmawardhana, P., Jiao, Y., Mulvaney, J., Kumari, S., Chougule, K., Elser, J., Wang, B., Thomason, J., Bolser, D., Kerhornou, A., Walts, B., Fonseca, N., Huerta, L., Keays, M., Tang, Y., Parkinson, H., Fabregat, A., Mckay, S., Weiser, J., D'Eustachio, P., Stein, L., Petryszak, R., Kersey, P., Jaiswal, P., Ware, D. 2015. Gramene 2016: comparative plant genomics and pathway resources. Nucleic Acids Research. doi: 10.1093/nar/gkv1179.
Merchant, N., Lyons, E., Goff, S., Vaughn, M., Ware, D., Micklos, D., Antin, P. 2016. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biology. 14(1):e1002342.
Liu, H., Niu, Y., Gonzales-Portillo, P.J., Zhou, H., Wang, L., Ware, D. 2015. An ultra-high-density map as a community resource for discerning the genetic basis of quantitative traits in maize. Biomed Central (BMC) Genomics. 16:1078.