Location: Plant, Soil and Nutrition Research
2008 Annual Report
1a. Objectives (from AD-416)
Biological research benefits greatly from integrating many different types of data, both within and between species. The first specific objective of the proposal builds on existing and emerging data sets, providing resources to characterize, track and ultimately identify sequences associated with agronomically important traits. The second objective addresses the infrastructure needed to manage, visualize and distribute complex data sets. The research uses four methodologies: data integration, software development, genome annotation and evolutionary analysis. Throughout the proposal the objectives build upon one another; together they offer greater potential for providing a knowledge base for improving agricultural varieties.
1. Enhance our knowledge of plant genome structure, organization and evolution through computational and experimental approaches.
2. Develop and implement standards for plant genome databases. This includes development of vocabulary, methods, database structures and visualization software to facilitate data integration and interoperability.
1b. Approach (from AD-416)
We propose to combine computational and experimental approaches, building on existing and newly developed resources, to create standardized baseline comparative maps and genome annotations across plant genomes, with an emphasis on crop grasses, other agriculturally important species and model genomes. As part of this work we will build on existing infrastructure to deliver data management and visualization tools for sequence, map, diversity and phenotype data sets.
3. Progress Report
This report covers activities as of May 2008. Over the past three months we have been reviewing the results of the gene trees and exploring methods that would allow us to use them to build synteny maps. The group is testing the whole-genome nucleotide alignment pipeline available as part of the Ensembl project. Alignments have been produced among the monocots rice, sorghum and maize, and among the dicots Arabidopsis, poplar and grape; we are now evaluating the results. One area of particular interest within the group is functional regulatory sequences and networks. Transcription factors bind to specific DNA motifs and control the expression of genes. To identify such motifs computationally, we are evaluating CREAD (Comprehensive Regulatory Element Analysis and Discovery), which was developed in Michael Zhang's group at CSHL. This objective is directly related to NP 301 Component 2.

In the last three months we have continued to use the Ensembl infrastructure to store and visualize genomic data sets, and we are now using the most recent release, 49. We have continued to use and support the development of the Plant Ontology, including updates to the software, term descriptions and associations to terms. We have recently made web services available through the Distributed Annotation System (DAS) server native to Ensembl, and we participated in the development and deployment of semantic services as part of the Virtual Plant Information Network (VPIN). More recent work in the lab has focused on handling and interpreting short sequence reads from the new sequencing technologies. The group has been evaluating methods for aligning these reads to reference genomes and using the alignments to identify single nucleotide variation. This objective is directly related to NP 301 Component 3.
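The short-read work described above, aligning reads to a reference and then identifying single nucleotide variation, can be sketched as a naive pileup-based caller. This is an illustration under stated assumptions, not the group's method: reads are assumed to be already placed by an aligner as ungapped, forward-strand hits given as (start, sequence) pairs, and the `min_depth` and `min_fraction` thresholds are hypothetical. Production callers also weigh base and mapping qualities, handle insertions and deletions, and model heterozygous sites.

```python
from collections import Counter


def pileup(reference, alignments):
    """Count the read bases observed at each reference position.

    `alignments` is assumed to hold (start, read_sequence) pairs describing
    ungapped, forward-strand placements already produced by a read aligner.
    """
    columns = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):  # ignore bases hanging off either end
                columns[pos][base] += 1
    return columns


def call_snvs(reference, alignments, min_depth=3, min_fraction=0.8):
    """Return (position, ref_base, alt_base, depth) for well-supported SNVs.

    A site is reported when its read depth reaches `min_depth` and the most
    frequent non-reference base accounts for at least `min_fraction` of the
    reads; both thresholds are illustrative, not tuned values.
    """
    calls = []
    for pos, counts in enumerate(pileup(reference, alignments)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos]
        # Ignore reference-matching bases and ambiguous N calls.
        alt_counts = {b: n for b, n in counts.items() if b not in (ref_base, "N")}
        if not alt_counts:
            continue
        alt_base = max(alt_counts, key=alt_counts.get)
        if alt_counts[alt_base] / depth >= min_fraction:
            calls.append((pos, ref_base, alt_base, depth))
    return calls
```

With reference `ACGTACGTAC` and reads `(0, "ACGTTCG")`, `(2, "GTTCGTA")` and `(4, "TCGTAC")`, `call_snvs` returns `[(4, 'A', 'T', 3)]`: all three reads covering position 4 show T where the reference has A.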
5. Significant Activities that Support Special Target Populations