Location: Plant, Soil and Nutrition Research
2017 Annual Report
1: Apply computational, genomic, genetic, and/or systems biology approaches to develop new models for plant genome structure and organization that advance our understanding of plant evolution and diversity.
1.1: Establish an integrated reference genome resource for plant genomes.
1.2: Analysis and visualization of genotypic, epigenomic, and functionally phenotypic diversity.
1.3: Comparative genomics: analysis of plant genomes (stewardship of reference resource) and visualization informed by evolutionary histories.
2: Analyze and develop genome-level regulatory network models that focus on and integrate the processes underlying plant development and responses to environmental change.
2.1: Develop genome-wide functional networks for the model plant genome Arabidopsis.
2.2: Crop gene regulatory networks (GRNs) to support functional prediction for agriculturally relevant phenotypes.
3: Collaborate, develop, and implement new standards for the management and analysis of plant genomic, genetic, and phenotypic information to facilitate integration and interoperability between biological databases.
4: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources.
5: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance.
We propose to leverage emerging and standard computational and experimental approaches, building on existing and newly developed resources, to support stewardship of plant genome reference sequences, genome annotations, and gene networks. This will support development of a common standard platform for comparative genomic analysis and visualization. The enriched genome annotations will include controlled vocabularies to describe metadata and primary data associated with comparative phylogenomics, epigenetics, and population-based phenotypes. The proposed research in gene networks is directed at the development and validation of gene regulatory networks (GRNs). A network view of the underlying molecular processes will enhance fundamental biological understanding of development and abiotic stress responses and of their relationship to agronomic traits. The computationally predicted and experimentally verified sub-networks, combined with the prioritized regulatory gene targets, will provide focal points for further research at the gene-by-gene level. They will be integrated with the suite of genetic resources obtained from Objective 1, including SNPs and orthology mapping, and will thus serve as a resource for breeders and researchers engaged in molecular breeding approaches and segregation analysis. Genome-wide network reconstructions will be useful for quantifying and characterizing genotype-to-phenotype relationships. We propose to leverage and build upon existing infrastructure to manage and analyze plant genomic, genetic, and phenotypic data. The resources will focus on delivering the anticipated products of Objectives 1 and 2, primarily plant datasets, but much of the software will be species-agnostic, making the resources usable by a broader community working on animals, insects, and fish relevant to agriculture, human health, and a sustainable environment.
Objective 1: We have continued to support standard representation of plant reference genomes and comparative analyses in Gramene and Ensembl Plants using the Ensembl infrastructure, and we continue to evaluate and improve our methods for supporting reference plant genomes built with newly emerging sequencing technologies. In the last year, the Gramene comparative genomics and pathways database (Release 53) grew to include 44 plant reference genome assemblies, including five new genomes (sugar beet, oilseed rape, red clover, and two red algae species) and updates to four existing genomes (maize, bread wheat, sorghum, and Arabidopsis thaliana). As part of a major new initiative within EMBL-EBI, the Track Hub Registry now displays over 900 public RNA-Seq studies (16,000 tracks across 35 plant species) on the Gramene site. To support genetic variation, we added data from the rice 3000 genomes project and co-developed ACE (Assessing Changes to Exons), a new software tool that generates structural gene annotations for each sequenced individual in a population. We also built a prototype pan-genome browser for maize, complete with a satellite browser (maizev4.ensembl.org) for visualization; it provides focused comparative and phylogenetic analysis of four maize accessions (B73, W22, PH207, and teosinte TIL11) to support storing and accessing gene-based structural variation in maize. We have also continued to extend oge.gramene.org, our satellite browser resource focused on rice evolution, by adding two new reference genomes for the rice varieties IR8 and N22, bringing the total to thirteen reference assemblies in this resource. We are working with collaborators this year to renew the Gramene project grant. We have continued to support the maize reference genome and annotation, and we published papers on the maize reference transcriptome and on the B73 V4 reference assembly and annotations.
The Iso-Seq transcriptome data increased the number of alternative transcripts per gene from 1.6 to 3.3, and the improved assembly reduced the proportion of genes with gaps in their 3 kb flanking sequence from 20% to less than 1%, substantially improving the annotation of core promoter elements. We are currently working on PacBio assemblies and annotations of three additional accessions: Ki11 and NC350, both parents of the reference NAM (nested association mapping) population, and Mo17. We are working with collaborators to identify funds to support assembly and annotation of all 25 NAM parental lines. We continue to explore emerging sequencing technologies for the grape, sorghum, and maize genomes and have established a collaboration with 10X Genomics to use linked-read technologies. Preliminary results in grape suggest that linked-read assemblies capture the gene space as effectively as long-read assemblies from PacBio single-molecule technology but fail to capture the full complement of repetitive elements.

Objective 2: The goal of this work is to integrate genetic and genomic data sets to identify molecular networks that influence plant morphology (the architecture of roots, stems, and flowers) and responses to environmental stress (such as low nitrogen and phosphorus). We have prepared, and will submit, a manuscript on the Arabidopsis ZF-HD hubs in the miRNA network, which affect vegetative branching and floral architecture. We have contributed to a manuscript on Arabidopsis genes involved in cell wall biogenesis. We have also identified a sorghum transcription factor controlling panicle seed number and are preparing a manuscript describing that finding. We continue to dissect nitrogen use efficiency (NUE) networks in Arabidopsis and maize.
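The flanking-gap statistic quoted above for the improved maize assembly can be computed along the following lines. This is a minimal sketch, not the project's pipeline: it assumes the genome is held in memory as a dictionary of chromosome sequences, that assembly gaps are encoded as runs of `N`, and that genes are given as 0-based, half-open `(chromosome, start, end)` tuples. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gene record: (chromosome, start, end), 0-based half-open coordinates.
Gene = Tuple[str, int, int]


def flank_has_gap(seq: str, start: int, end: int, flank: int = 3000) -> bool:
    """Return True if the upstream or downstream flank contains a gap base ('N')."""
    left = seq[max(0, start - flank):start]   # up to `flank` bp before the gene
    right = seq[end:end + flank]              # up to `flank` bp after the gene
    return "N" in left.upper() or "N" in right.upper()


def fraction_genes_with_flank_gaps(genome: Dict[str, str],
                                   genes: List[Gene],
                                   flank: int = 3000) -> float:
    """Fraction of genes with at least one gap base within `flank` bp of either end."""
    if not genes:
        return 0.0
    hits = sum(flank_has_gap(genome[chrom], start, end, flank)
               for chrom, start, end in genes)
    return hits / len(genes)


if __name__ == "__main__":
    # Toy chromosome with a 4 bp gap at positions 8000-8003.
    genome = {"chr1": "ACGT" * 2000 + "NNNN" + "ACGT" * 2000}
    genes = [("chr1", 1000, 2000), ("chr1", 7000, 7500), ("chr1", 12000, 13000)]
    print(fraction_genes_with_flank_gaps(genome, genes))  # only the middle gene is affected
```

On a real assembly the same comparison would be run once per assembly version (for example, before and after gap filling) against the same gene set, so that the two fractions are directly comparable.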
This year, with our collaborators at the University of California, Davis and Pioneer, we finalized an Arabidopsis NUE network, integrated the project and public expression data, and ranked transcription factors (TFs) using our NetCorr pipeline; this ranking was used to prioritize germplasm for evaluation under altered nitrogen metabolism. We produced a draft maize NUE network based on 100 promoters and 450 TFs and have characterized conserved gene edges. We have continued evaluating the RNA-seq and ChIP-seq data from the four sorghum lines selected for contrasting root phenotypes and yield responses to low phosphorus.

Objective 3: With colleagues, we are evaluating and developing national high-performance computing and cloud resources to support the storage and analysis of plant genome and phenotype data. These initiatives include collaborations with the Department of Energy (DOE) Systems Biology Knowledgebase (KBase), the National Science Foundation (NSF) iPlant Collaborative (CyVerse/iPlant), and NSF Gramene. In the last year, the focus has been on updating and extending workflows for genome assembly, annotation, and transcript and epigenetic profiling on all three platforms. In addition to software development, we continued to deliver webinars, workshops, and training to support the scientific community via the Gramene, KBase, and CyVerse infrastructure. In the last year, we performed outreach and training at more than five international and domestic conferences. We have worked directly with commodity stakeholders in maize, sorghum, rice, and grape and are working with collaborators to secure renewal of funding for these projects.

Objective 4: In the last year, an ARS scientist served as ARS Chief Scientific Information Officer in support of the ARS Big Data initiative and supported the transition of this role to another ARS scientist.
In this role, we have worked closely with the Associate Director, Chief Information Officer, and Chief Technology Officer to support development of the SCINet platform. In the last year, a Research Support Agreement was established with Iowa State University to support the Virtual Resource Support Core (VRSC). We are currently working on two additional agreements: an interagency collaboration with NIH to support genome assembly and annotation, and an agreement with Software Carpentry and Data Carpentry to support training in data literacy.
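For the Objective 2 network work described above (ranking TFs from expression data and characterizing conserved gene edges between Arabidopsis and maize), a generic version of both steps might look like the following. This is a hypothetical sketch: it is not the NetCorr pipeline, whose method is not described in this report, and the correlation threshold, data layout, and gene identifiers are all assumptions for illustration.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical (TF, target) network edge


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def rank_tfs(expr: Dict[str, List[float]], tfs: List[str], targets: List[str],
             min_abs_r: float = 0.8) -> List[Tuple[str, int]]:
    """Rank TFs by how many targets they are strongly co-expressed with (|r| >= min_abs_r)."""
    scores = []
    for tf in tfs:
        n = sum(1 for g in targets
                if g != tf and abs(pearson(expr[tf], expr[g])) >= min_abs_r)
        scores.append((tf, n))
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


def conserved_edges(edges_a: Set[Edge], edges_b: Set[Edge],
                    orthologs: Dict[str, Set[str]]) -> Set[Edge]:
    """Edges in species A whose TF and target have orthologs forming an edge in species B."""
    kept = set()
    for tf, tgt in edges_a:
        for tf_b in orthologs.get(tf, set()):
            if any((tf_b, tgt_b) in edges_b for tgt_b in orthologs.get(tgt, set())):
                kept.add((tf, tgt))
                break
    return kept


if __name__ == "__main__":
    expr = {"TF1": [1, 2, 3, 4], "TF2": [4, 1, 3, 2],
            "G1": [2, 4, 6, 8], "G2": [8, 6, 4, 2], "G3": [1, 1, 2, 1]}
    print(rank_tfs(expr, ["TF1", "TF2"], ["G1", "G2", "G3"]))
    at_edges = {("AtTF1", "AtG1"), ("AtTF1", "AtG2")}
    zm_edges = {("ZmTF1", "ZmG1")}
    orth = {"AtTF1": {"ZmTF1"}, "AtG1": {"ZmG1"}, "AtG2": {"ZmG2"}}
    print(conserved_edges(at_edges, zm_edges, orth))
```

Counting strongly correlated targets is only one possible ranking criterion; weighted scores or promoter-binding evidence (for example, from ChIP-seq) could be substituted without changing the overall structure of the sketch.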
1. Genetic resistance to disease pathogens discovered in wild rice. Disease pathogens severely reduce crop production, necessitate the use of costly and hazardous chemicals, and pose an increasing threat as climate change expands the geographic range of pests. ARS researchers in Ithaca, New York, worked with domestic and international partners to analyze the genomes of 13 wild rice species and, in doing so, identified specific genes that provide genetic resistance to disease pathogens. They demonstrated that some of these genes were present in rice cultivated in Asia and Africa thousands of years ago. This knowledge holds promise for finding additional disease resistance genes that will help breeders develop novel and durable resistance to existing and emerging plant pathogens that might otherwise limit crop performance.
2. Interpreting and predicting DNA sequence accurately. Maize (corn) is the most important crop grown in the U.S. in terms of value, with wide uses in human food, animal feed, and bioenergy. ARS researchers in Ithaca, New York, developed maize gene sequence information that enables more accurate prediction and definition of maize genes. More accurate gene sequences for this highly important U.S. crop will provide robust input for downstream applications, including efforts to breed improved maize varieties.
Majoros, W.H., Campbell, M.S., Holt, C., Denardo, E., Ware, D., Allen, A.S., Yandell, M., Reddy, T. 2016. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE. Bioinformatics. doi: 10.1093/bioinformatics/btw799.
Gupta, P., Naithani, S., Tello-Ruiz, M., Chougule, K., D'Eustachio, P., Fabregat, A., Jiao, J., Keays, M., Lee, Y., Kumari, S., Mulvaney, J., Olson, A., Preece, J., Stein, J., Wei, S., Weiser, J., Huerta, L., Petryszak, R., Kersey, P., Stein, L., Ware, D., Jaiswal, P. 2016. Gramene database: navigating plant comparative genomics resources. Current Plant Biology. doi: 10.1016/j.cpb.2016.12.005.
Adam-Blondon, A., Alaux, M., Pommier, C., Cantu, D., Cheng, Z., Cramer, G.R., Davies, C., Delrot, S., Deluc, L., Di Gaspero, G., Grimplet, J., Fennell, A., Londo, J.P., Kersey, P., Mattivi, F., Naithani, S., Neveu, P., Nikolski, M., Pezzotti, M., Reisch, B., Topfer, R., Vivier, M., Ware, D., Quesneville, H. 2016. Towards an open grapevine information system. Horticulture Research. 3:16056. doi: 10.1038/hortres.2016.56.
Smedley, D., Haider, S., Spooner, W., Ware, D., Youens-Clark, K., Kasprzyk, A. 2015. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Research. 43(W1):W589-598.
Taylor-Teeples, M., Lin, L., De Lucas, M., Zhang, L., Ware, D., Brady, S. 2014. An arabidopsis gene regulatory network for secondary cell wall synthesis. Nature. 517:571-575.
He, F., Yoo, S., Wang, D., Kumari, S., Gerstein, M., Ware, D., Maslov, S. 2016. Large-scale atlas of microarray data reveals biological landscape of gene expression in Arabidopsis. Plant Journal. 86(6):472-480.
Jiao, Y., Burke, J.J., Chopra, R., Burow, G.B., Chen, J., Wang, B., Hayes, C.M., Emendack, Y., Ware, D., Xin, Z. 2016. A sorghum mutant resource as an efficient platform for gene discovery in grasses. The Plant Cell. 28:1551-1562.
Wang, B., Tseng, E., Regulski, M., Clark, T., Hon, T., Jiao, Y., Lu, Z., Olson, A., Stein, J., Ware, D. 2016. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nature Communications. 7:11708 doi: 10.1038/ncomms11708.
Law, M., Childs, K.L., Campbell, M.S., Stein, J.C., Olson, A.J., Holt, C., Panchy, N., Lei, J., Jiao, D., Andorf, C.M., Lawrence, C.J., Ware, D., Shiu, S., Sun, Y., Jiang, N., Yandell, M. 2015. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes. Plant Physiology. 167(1):25-39.
Seaver, S.M., Gerdes, S., Frelin, O., Lerma-Ortiz, C., Bradbury, L.M., Zallot, R., Hasnain, G., Niehaus, T.D., El Yacoubi, B., Pasternak, S., Olson, R., Pusch, G., Overbeek, R., Stevens, R., De Crecy-Lagard, V., Ware, D., Hanson, A.D., Henry, C.S. 2014. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource. Proceedings of the National Academy of Sciences. 111(26):9645-9650.