Research Project #434556: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

2019 Annual Report

Objectives
Objective 1: Apply comparative genomic, genetic, and molecular approaches to the dissection of complex traits and the understanding of genome functions; develop and implement new standards for the management and analysis of plant genomic, genetic, and phenotypic information; and dissect gene networks associated with programming crop plant development and adaptation to environment (GxE).

Sub-objective 1.A: Reference genomic resources will be generated to support four target crop communities. The achievement of this objective will generate information management resources for maize (Zea mays), sorghum (Sorghum bicolor), grapevine (Vitis vinifera), and rice (Oryza sativa).

Sub-objective 1.B: Develop functional and comparative genomics resources for plant reference genomes. The achievement of this objective will expand the Gramene databases to encompass reference genomes of at least 75 unique plant species.

Sub-objective 1.C: Develop functional networks for crop and model species. Through achieving this objective, integrated genetics, transcriptomics, and molecular interaction data will be generated to define regulatory networks that influence plant traits through effects on developmental morphology (architecture) and response to environment.

Objective 2: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance.

Approach
The future of crop breeding will increasingly rely on strategies that combine genetic resources with rapidly advancing tools and knowledge in genomics, trait mapping, high-throughput phenotyping, and genome editing. Yet major challenges remain in translating vast amounts of data into usable biological models and in building scalable information systems that enable researchers and breeders to contribute to and exploit these future technologies. To meet these challenges, this project will engage in several strategic initiatives and collaborations that produce new genomics data, cyberinfrastructure, and hypothesis-based research.

The first objective will generate new genomics datasets among four crop groups: sorghum, maize, rice, and grapevine. Objective 1.A will produce a minimum of 30 high-quality reference genome assemblies, transcriptomes, and corresponding annotations. In maize and sorghum, we will also generate ENCODE-type molecular data sets to study the relationships between chromatin structure, gene expression, and phenotype. In sorghum and grapevine, we will sequence disease resistance genes across key germplasms that target critically important pests/pathogens. To enable sharing of reproducible workflows and promote interoperability, computational work will be performed using the recently developed SciApps cyberinfrastructure.

In Objective 1.B, genomics data will be further disseminated via Gramene/Ensembl to support genome stewardship, comparative and pan-genomics analysis (in 2-3 crop groups), and display of ENCODE-type and publicly archived variation/genotype data. This platform will enable researchers to evaluate structural variation within crop clades and use conservation profiles to evaluate candidate genes.

In Objective 1.C, we will continue several hypothesis-based studies of gene regulatory networks that underlie yield components influenced by morphological development and nutrient and stress response/adaptation.
These projects combine forward and reverse genetics with transcriptional profiling, fluorescence in situ sequencing, and yeast-based molecular interaction assays to elucidate regulatory pathways that control plant traits. We will continue to use sorghum EMS-mutagenized lines to dissect pathways underlying inflorescence architecture and the multi-seeded trait. Research in nitrogen use efficiency will be continued using maize and Arabidopsis as models.

Objective 2 focuses on development of the new sorghum genomics and genetics portal to serve scientists and breeders working on grain sorghum improvement. Goals include the initial release of SorghumBase as a comparative functional genomics resource, with future development of infrastructure to support phenotypic data and genomics-enabled germplasm improvement. A critical component of this plan is sorghum community engagement. The products of these two objectives will include well-characterized germplasm and the associated genotypic and phenotypic characterization of complex agronomic traits, which will enable genomic-assisted breeding and novel approaches for understanding the genetic architecture of traits critical to US agriculture.

Progress Report
Objective 1: Sub-objective 1.A: Reference genomes and sequence assemblies: We continue to evaluate emerging sequencing technologies. This year we continued to advance long single-molecule (LSM) sequencing and benchmarked assembly tools with academic, government, and industry partners. Draft assemblies were generated for 27 maize accessions, the parents of the maize nested-association-mapping (NAM) panel, and for 2 sorghum genomes: TX2783 (sugarcane aphid resistant) and TX436 (a male parental line with high food quality). The new B73v5 has an improved sequence N50 of 48 Mb, up from 1 Mb in v4. Further, our collaborators developed core markers for grapevine and submitted a manuscript on this work (362048). Our collaborators also submitted a manuscript on the characterization of the maize mutant UFO (unstable factor for orange1), a novel protein in Poaceae, which triggers widespread phenotypic and molecular dysregulation (358338).

Reference Transcriptomes: During the last year, we made reciprocal crosses between maize Ki11 and B73, generated Iso-seq libraries, demonstrated phasing of the transcripts, and submitted a manuscript on the work (335965) (Accomplishment 1). In addition, we submitted a review manuscript on using LSM sequencing for transcript profiling (365053).

Baseline annotations: This year we are improving and extending our existing gene annotation pipeline. In a review of our existing workflow, we identified instances where we were under-calling gene loci. We improved these calls by curating the repeat libraries that contained gene fragments and by adding further ab initio gene callers.

Ren-Seq: We have contributed to the design of oligo capture arrays for disease resistance genes in both grape (collaborator: Geneva, NY) and grasses (collaborator: UC Berkeley). The capture array for grasses was designed, and a 90K oligo capture array was generated. A pilot study of 8 sorghum germplasm captures included 8 tolerant and resistant lines.
We are awaiting the sequencing results from the capture.

Cyberinfrastructure to support reproducible workflows: We have continued development of SciApps, a lightweight bioinformatics workflow system (344693, 354685). This year, we improved its robustness and usability and added new workflows for RAMPAGE (RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression). We continued to deliver webinars, workshops, and training to support the scientific community via Gramene (327938, 337632), SciApps (344693, 354685), and the KBase infrastructure (337646), while participating in more than 8 international/domestic conferences. We have worked directly with commodity stakeholders in maize, sorghum, rice, and grape. We contributed to the Agbio consortium manuscripts (352286) on recommendations for standards for sustainable genomics and genetics databases for agriculture.

Sub-objective 1.B: Develop functional and comparative genomics resources for plant reference genomes. The achievement of this objective will expand the Gramene databases to encompass reference genomes of at least 75 unique plant species.

Gramene: We continued our collaboration to deliver the Gramene portal (327938, 337632). Gramene delivers its comparative genomics databases in collaboration with the Ensembl Genomes (326566) project at the European Bioinformatics Institute (EMBL-EBI), and collaborates closely with the EBI’s Expression Atlas project to provide manually curated, quality-controlled, and analyzed transcriptomic data. The genomes and pathways are made accessible following the FAIR principles (findable, accessible, interoperable, and reusable) and adhere to the open standards for agricultural data management and stewardship. This year, Gramene made three releases. The resource now stands at 61 distinct species, spanning major crops, models, and lower plants.
With collaborators, we submitted a proposal to NSF; it was declined, and a supplement was received together with a recommendation from the NSF program manager to resubmit.

Gramene PanGenome subsites: In the last two years, we have used the Ensembl data model to generate pan-genome subsites for rice, grape, and maize. Using the Ensembl Compara gene tree pipeline, we perform an all-vs-all phylogenetic analysis of protein-coding genes. In the last year we updated the grape and maize beta sites.

Sub-objective 1.C: Develop functional networks for crop and model species. Through achieving this objective, integrated genetics, transcriptomics, and molecular interaction data will be generated to define regulatory networks that influence plant traits through effects on developmental morphology (architecture) and response to environment. Biological models provide insights for engineering improved germplasm through marker-assisted breeding or directed mutagenesis, such as CRISPR. The novel genetic variation in sorghum can be used for functional screening and as a direct input into breeding programs. We are continuing to dissect gene networks for grain number (344217, 365033), nitrogen use efficiency (NUE) (349554), phosphorus use efficiency (PUE), water use efficiency (WUE), vegetative branching (345677), root architecture, and disease resistance. We have contributed to manuscript submissions on grain number (365033) and on the characterization of the gene locus Male sterile 9 (encoding a PHD-finger protein required for pollen development) with collaborators in Lubbock (359143), and we characterized the cis-regulatory motifs associated with the molecular circuit underlying the maize shoot apical meristem (365059).

Grain number: In our previous work, we determined that MSD1 is a TCP transcription factor (344217). RNA-seq results suggested that MSD1 may impact programmed cell death signaling in the infertile, pedicellate spikelets of wild-type plants by modulating hormone pathways.
This year we identified the gene for MSD2 as a lipoxygenase (LOX). LOX catalyzes the first step in the biosynthesis of the plant hormone jasmonic acid (365033). We demonstrated in transient assays that MSD1 binds its own promoter and the MSD2 promoter, and that it forms homodimers as well as dimers with a close sorghum paralog. We used DNA Affinity Purification Sequencing (DAP-seq) to generate genome-wide binding maps of MSD1 and demonstrated that MSD1 binds near the transcriptional start sites of other putatively regulatory genes. (Accomplishment 2)

Nitrogen use efficiency: In our previous work, we generated an Arabidopsis NUE gene regulatory network (GRN) (349554) and were able to prioritize candidate genes for phenotyping using the NECorr pipeline (365046). This year, we extended the Arabidopsis NUE GRN by including the public Arabidopsis DAP-seq genome-wide transcription factor (TF) binding maps. We found 255 TFs that have peaks within the promoter regions of 462 genes in the Y1H NUE network, 169 of which are correlated with target gene expression levels from the N gene expression samples. We shifted the validation of targets in grasses from maize to sorghum, leveraging the sorghum EMS population. We initiated a screen for 80 of the candidate TFs (see Objective 2, IDT evaluation).

Shoot apical meristems: The shoot apical meristem (SAM) at the plant’s growing shoot tip orchestrates the balance between stem cell renewal and organ initiation essential for post-embryonic growth. In work with collaborators in Tübingen, Germany, we supported characterization of the molecular signatures, based on transcript profiling of laser-captured cells from 10 domains in the maize SAM. Our contributions to the work included examination of the cis-regulatory motifs and their enrichment in the central zone.
(Accomplishment 3)

Objective 2: Accelerate sorghum trait analysis, germplasm analysis, genetic studies, and breeding by acquiring, integrating, and providing open access to sorghum genome sequences and annotations, germplasm diversity information, trait mapping information, and phenotype information in a sorghum crop genome database system, with an initial emphasis on sugarcane aphid resistance. In the past year, we have continued to meet with stakeholders to review their needs for the resource and to provide updates on the development of the SorghumBase portal. The portal is in beta (www.sorghumbase.org), with a targeted release in December 2019. This year, the project was represented at an annual field day in Lubbock, TX (2018), at PAG 2019 (365028), and at the DOE Genomics Contractors meeting. We held 4 webinars, provided training on the content management system, and hosted 2 researchers on site. The outcomes of the visits are the incorporation of the EMS workflow in SciApps, based on the methods used to characterize MSD1 (344217), and an evaluation of the adoption of the fieldbook, which is also under evaluation by the Breeding Insight project. We coordinated the incorporation of 200 U.S. Grain Sorghum markers in 2000K sets of core markers for sorghum, and the development of a 220-probe set of markers for novel variation in the EMS population using rhAmpSeq (RNase H2-dependent amplicon sequencing) technology with a commercial company. The 220 designs were tested, and the initial results are promising: we were able to confirm around 48% of the targets when sampling 8 plants from each pool. Unfortunately, the 2000K probe design had many off-targets, and the collaborators are exploring other technology.
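
The sequence N50 quoted above for B73v5 (48 Mb, up from 1 Mb in v4) is a length-weighted summary of assembly contiguity: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below shows that calculation under stated assumptions. The function name and the toy lengths are illustrative only and are not taken from the project's pipelines or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence (contig/scaffold) lengths.

    Sort lengths from longest to shortest, accumulate them, and report the
    length at which the running total first reaches half of the assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (made-up lengths, arbitrary units): total = 300, half = 150.
# The two longest sequences (80 + 70) reach 150, so the N50 is 70.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70
```

Because the statistic is weighted by length, joining a few large sequences raises N50 far more than adding many small ones, which is why it is commonly reported when comparing successive versions of a reference assembly.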

Accomplishments
1. Long-read single-molecule sequencing provides an opportunity for phasing of genetic variants. Haplotype phasing is important for the interpretation of genome function, and the characterization of haplotype expression will contribute to an understanding of heterosis, or hybrid vigor. Hybrid vigor is the improvement in a trait that results from a cross between two individuals in a species or between species. An ARS researcher in Ithaca, New York, worked with academic partners at Cold Spring Harbor and industry partners in California to characterize isoform-level phasing of full-length gene transcripts in two inbred maize lines, B73 and Ki11, as well as their reciprocal crosses. We developed a tool to analyze the full-length transcript data and were able to phase genes in three different tissues. We identified parental-origin transcript isoforms and novel isoforms that differ between maize parent and hybrid lines, and provided measures of haplotypic expression that increase power and accuracy in studies of allelic expression. This is the first study of phased full-length isoforms in maize, as well as in plants. The methods developed support allele-specific characterization of full-length transcripts and offer opportunities to gain insights into heterosis.

2. Loss of function of the metabolic enzyme lipoxygenase (LOX) increases seed number in sorghum. Agronomically important traits like yield can be broken down into smaller traits, one of which is grain number. In sorghum there are two types of flowers on the stem; one type is normally infertile and does not produce seeds. ARS researchers in New York, Texas, and Florida worked with academic collaborators at Cold Spring Harbor to characterize a novel mutation in sorghum responsible for an increase in grain number. The gene encodes the first enzyme in a metabolic pathway that generates a plant hormone. The reduction of the plant hormone results in both flower types being fertile and an increase in grain number. The knowledge of these genes and the germplasm can be used to increase grain number in sorghum, and perhaps in other grasses.

3. Expression map of the corn shoot stem cell niche reveals that complex coordination of genes is required to specify cell fate. In plants, the shoot apical meristem (SAM) at the plant’s growing shoot tip orchestrates the balance between stem cell renewal and organ initiation essential for post-embryonic growth. An ARS researcher in Ithaca, New York, worked with academic partners in Cold Spring Harbor and Tübingen, Germany, to support characterization of molecular signatures, based on transcript profiling of 10 domains in the maize SAM. A key outcome is that sub-functionalization and combinatorial transcription factor activity emerge as fundamental features underlying cell fate specification, and that allelic diversity present near dynamically expressed genes and the transcription factors that regulate them links these molecular circuitries to morphological traits. These genes and their functions may be candidates for new agronomic targets.
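
To illustrate the isoform-level phasing idea described in Accomplishment 1, the following is a minimal sketch of how a full-length transcript read from a hybrid might be assigned to a parent by voting over known parent-distinguishing SNPs. It is not the tool developed in the project. The SNP table, positions, bases, function names, and the majority-vote rule are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical SNP table for one gene:
# transcript position -> (B73 base, Ki11 base). Not real data.
SNPS = {120: ("A", "G"), 455: ("C", "T"), 910: ("G", "A")}


def assign_parent(read_bases, snps=SNPS, min_margin=1):
    """Assign one read to 'B73', 'Ki11', or 'unassigned'.

    read_bases maps transcript positions to the base observed in the read.
    Each informative SNP casts one vote; the read is assigned only if one
    parent leads by at least min_margin votes.
    """
    votes = Counter()
    for pos, base in read_bases.items():
        if pos not in snps:
            continue  # position does not distinguish the parents
        b73_base, ki11_base = snps[pos]
        if base == b73_base:
            votes["B73"] += 1
        elif base == ki11_base:
            votes["Ki11"] += 1
    margin = votes["B73"] - votes["Ki11"]
    if margin >= min_margin:
        return "B73"
    if margin <= -min_margin:
        return "Ki11"
    return "unassigned"


def b73_fraction(reads, snps=SNPS):
    """Fraction of phased reads assigned to B73, or None if none are phased."""
    counts = Counter(assign_parent(r, snps) for r in reads)
    phased = counts["B73"] + counts["Ki11"]
    return counts["B73"] / phased if phased else None


# Toy reads from a hypothetical B73 x Ki11 hybrid.
toy_reads = [
    {120: "A", 455: "C"},  # two B73 votes
    {120: "G", 910: "A"},  # two Ki11 votes
    {455: "C", 910: "G"},  # two B73 votes
    {120: "A", 455: "T"},  # one vote each: tie, left unassigned
    {300: "T"},            # no informative SNP: unassigned
]
print([assign_parent(r) for r in toy_reads])
print(b73_fraction(toy_reads))
```

Because a full-length read spans the whole transcript, every informative SNP on that molecule votes together, so the isoform structure and its parental origin are read from the same molecule. A real analysis would also need to handle sequencing errors, alignment, and SNP discovery, which this sketch omits.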

Review Publications
Majoros, W., Holt, C., Campbell, M., Ware, D., Yandell, M., Reddy, T. 2018. Predicting gene structure changes resulting from genetic variants via exon definition features. Bioinformatics. 34(21):3616-3623.
Arkin, A., Stevens, R., Cottingham, R., Maslov, S., Henry, C., Dehal, P., Ware, D., Perez, F., Harris, N., Canon, S., Sneddon, M., Henderson, M., Riehl, W., Gunter, D., Murphy-Olson, D., Chen, S., Kamimura, R., Brettin, T., Meyer, F., Chivian, D., Weston, D., Glass, E., Davison, B., Kumari, S., Allen, B., Baumohl, J., Best, A., Bowen, B., Brenner, S., Bun, C., Chandonia, J., Chia, J., Colasanti, R., Conrad, N., David, J., Dejongh, M., Devoid, S., Dietrich, E., Drake, M., Dubchak, I., Edirisinghe, J., Fang, G., Faria, J., Frybarger, P., Gerlach, W., Gerstein, M., Gurtowski, J., Haun, H., He, F., Jain, R., Joachimiak, M., Keegan, K., Kondo, S., Kumar, V., Land, M., Mills, M., Novichkov, P., Oh, T., Olson, G., Olson, B., Parrello, B., Pasternak, S., Pearson, E., Poon, S., Price, G., Ramakrishnan, S., Ranjan, P., Ronald, P., Schatz, M., Seaver, S., Shukla, M., Sutormin, R., Syed, M., Thomason, J., Tintle, N., Wang, D., Xia, F., Yoo, H., Yoo, S. 2018. KBase: the United States Department of Energy systems biology knowledgebase. Nature Biotechnology. 36:566-569.
Chougule, K., Wang, L., Stein, J., Wang, X., Devisetty, U., Klein, R.R., Ware, D. 2018. Improved RNA-seq workflows using CyVerse cyberinfrastructure. Current Protocols in Bioinformatics. 63(1):e53. https://doi.org/10.1002/cpbi.53.
Bukowski, R., Guo, X., Lu, Y., Zou, C., He, B., Rong, Z., Yang, B., Wang, B., Xu, D., Xie, C., Fan, L., Gao, S., Xy, X., Zhang, G., Li, Y., Jiao, Y., Doebley, J., Ross-Ibarra, J., Buffalo, V., Romay, C., Buckler IV, E.S., Wu, Y., Lai, J., Ware, D., Sun, Q. 2018. Construction of the third generation Zea mays haplotype map. GigaScience. 7(4):1-12.
