Location: Plant, Soil and Nutrition Research
2011 Annual Report
1a. Objectives (from AD-416)
Biological research benefits extraordinarily from the integration of many different types of data, both within and between species. The first specific objective of the proposal builds on existing and emerging data sets, providing resources to characterize, track and ultimately identify sequences associated with agronomically important traits. The second objective addresses infrastructure to manage, visualize and distribute complex data sets. The research makes use of four methodologies: data integration, software development, genome annotation, and evolutionary analysis. Throughout the proposal the objectives build upon one another. Combined, they hold greater potential for providing a knowledge base for improving agricultural varieties.
1. Enhance our knowledge of plant genome structure, organization and evolution through computational and experimental approaches.
2. Develop and implement standards for plant genome databases. This includes development of vocabulary, methods, database structures and visualization software to facilitate data integration and interoperability.
1b. Approach (from AD-416)
We propose to leverage computational and experimental approaches, building on existing and newly developed resources to create standardized baseline comparative maps and genome annotations across plant genomes, with an emphasis on crop grasses and other agriculturally important species as well as model genomes. As part of this work we will build upon existing infrastructure to deliver data management and visualization tools for sequence, map, diversity, and phenotype data sets.
3. Progress Report
Significant progress was made on research for both project objectives, which deal with plant genome structure and with the development and improvement of plant genome databases. All of this research is under National Program 301, specifically within Component 2: Crop Informatics, Genomics, and Genetic Analyses. Computational resources were developed to support crop informatics, focusing on structural and functional comparisons and on characterizing genetic diversity. In support of this work, methodology was developed to identify sequence variation, regulatory sequences and epigenetic variation, and to perform transcript profiling, using recently available DNA sequencing technologies. Improvements in the computational resources included updates to existing software, evaluation of new software, new development, improvements to visualization and performance, review of existing standards and data sets, and continued as well as extended collaborations with other bioinformatics projects, including Ensembl, Plant Ensembl, MaizeGDB, GrainGenes and iPlant.
To support the baseline annotation objectives, gene trees were generated for 10 plant genomes as well as 4 model species (human, Drosophila, C. elegans, and yeast). The gene trees serve as a foundation for comparative maps between species and evolutionary analyses within species. This framework has provided the foundation for projecting functions onto similar genes in different plant species. Whole genome alignments between a reference dicot (Arabidopsis) and a reference monocot (rice) serve as additional support that complements protein-based comparisons. Primary structural and functional annotations were generated for the updated RefGenV2 assembly of maize. More recent work has focused on evaluating de novo assembly tools to support plant genome assembly from next-generation sequencing reads. This work is part of the NSF Gramene project and the USDA/DOE/NSF Maize Sequencing project.
The primary annotations developed have contributed to understanding natural variation and genome stability, specifically how genes are maintained and lost to generate the current patterns of maize variation. In the last year, we also studied within-species variation in grape and maize. Maize variation data were generated from 100 inbred lines, and 55 million high-confidence DNA sequence variations were identified. This work was done in collaboration with the NSF-sponsored Gramene and Maize Diversity projects, EBI Ensembl Genomes, and USDA ARS researchers. Resources were also developed to support the identification of the promoter sequences where transcription factors bind to regulate gene expression, and to identify regulatory networks associated with plant development and response to abiotic stresses. Using computational methods, known core promoter motifs from eukaryotes were evaluated globally and then characterized in 6 plant genomes. Predicted binding sites for more than 90 promoters have been identified across the 6 plant genomes. In addition, gene targets for regulation by maize microRNAs were predicted.
1. Characterize genetic variation in maize. Maize is the world’s largest production crop, and its diversity has allowed it to adapt to the tropics, mountains, and temperate locations within a few thousand years. ARS researchers at the Robert W. Holley Center for Agriculture and Health at Ithaca, NY, are the primary leaders in using next-generation sequencing technology to characterize this diversity at an unprecedented level in the world’s key breeding lines and their wild relatives. More than 55 million high-confidence small genetic variations were identified. In addition to small genetic variation, surprisingly, nearly 90% of the genome is associated with large changes in sequence, and this variation is responsible for a substantial portion of trait variation in maize. Despite the incredible differences between individual maize plants, there is tremendous similarity in key gene content, even with the relatives of modern maize. This suggests that adaptations found among maize’s relatives (perennialism, frost and drought tolerance, etc.) can likely be integrated into maize.
2. Characterization of model plant root regulatory network. How roots grow and how they respond to changes in the environment are often determined in a cell type-specific manner. ARS researchers at the Robert W. Holley Center for Agriculture and Health at Ithaca, NY, have established a resource that allows for high-throughput identification of genes that control gene expression in roots of the model plant Arabidopsis. Using this resource, they have developed a regulatory network containing more than 60 genes that are involved in root architecture and response to the environment. The genes in this network have been released and now serve as a tool to identify similar genes in crop plants.
Schnable, P., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Wing, R., Wilson, R., Zhang, L., Chia, J., Narechania, A. 2009. The B73 maize genome: complexity, diversity, dynamics. Science. 326(5956):1112-1115.