Location: Plant, Soil and Nutrition Research
2009 Annual Report
1a. Objectives (from AD-416)
Biological research benefits extraordinarily from the integration of many different types of data, both within and between species. The first specific objective of the proposal builds on existing and emerging data sets, providing resources to characterize, track, and ultimately identify sequences associated with agronomically important traits. The second objective addresses infrastructure to manage, visualize, and distribute complex data sets. The research makes use of four methodologies: data integration, software development, genome annotation, and evolutionary analysis. Throughout the proposal, the objectives build upon each other. Combined, they hold greater potential for providing a knowledge base for improving agricultural varieties.
1. Enhance our knowledge of plant genome structure, organization, and evolution through computational and experimental approaches.
2. Develop and implement standards for plant genome databases. This includes development of vocabulary, methods, database structures, and visualization software to facilitate data integration and interoperability.
1b. Approach (from AD-416)
We propose to leverage computational and experimental approaches, building on existing and newly developed resources, to create standardized baseline comparative maps and genome annotations across plant genomes, with an emphasis on crop grasses and other agriculturally important species as well as model genomes. As part of this work, we will build upon existing infrastructure to deliver data management and visualization tools for sequence, map, diversity, and phenotype data sets.
3. Progress Report
Tools were developed to enhance the existing genome annotation infrastructure, and methodology was developed to identify sequence variation and regulatory sequences using recently available short-read sequencing technology. Improvements in computational resources included updates to existing software, evaluation of new software, new development, improvements in visualization and performance, review of existing standards and data sets, and continued and extended collaborations with other bioinformatics projects, including Ensembl, Plant Ensembl, GMOD, MaizeGDB, and GrainGenes. Baseline annotations were developed for several plant genomes as part of the NSF Gramene project, the USDA, DOE, and NSF Maize Sequencing project, and the USDA Grape Variation project, which is part of the larger USDA Genetic Trait Index. For the complete draft assembly of maize B73, the baseline annotation identified 82% of the genome as repetitive, ~30,000 high-confidence protein-coding genes, and ~130 microRNA genes from 26 families. Sequence variation was characterized in 27 maize accessions (~3 million variations) and 12 grape accessions (473,000 variations) using short-read sequencing approaches. The primary annotations developed as part of this project have contributed to the understanding of genome stability (specifically, how genes are maintained and lost), current patterns of variation in maize, and hybrid vigor. Grape variation data were used to produce a genotyping array that is currently being used to genotype a large portion of the USDA European and American grape collection. Analysis of the variation data from grape and maize has led to insights into the utility and challenges of using existing platforms, as well as short-read sequencing, for genotyping and genome-wide association analysis in plants. This information can be used to make recommendations for future investments in these and other crop species.
1. Characterized the maize B73 genome. Sequencing a genome is useful in itself, but integrating additional biological information associated with the genome makes each individual data set considerably more valuable. In the last year, we contributed to the assembly and annotation of the recent draft of the maize B73 genome. As part of the annotation work, the group computationally predicted repeats, protein-coding genes, and microRNA genes. Through a collaborative effort, more than 3 million variations were identified between the B73 accession and 25 additional maize accessions that are part of the maize nested association mapping panel. These annotations will serve as the baseline for developing resources to genotype and phenotype maize germplasm and to identify functionally significant alleles useful for improving hardiness, yield, and nutritional content, to the benefit of the US and international communities.
2. Used short-read sequencing to predict natural variation in grapevine. The recent emergence of short-read sequencing technologies promises to dramatically accelerate the use of genetic information for crop improvement. In a collaborative study, we used this technology for large-scale polymorphism discovery and a subsequent genome-wide assessment of population structure and the pattern of linkage disequilibrium (LD) in grapevine (genus Vitis), the world’s most economically important fruit crop. Based on 10 cultivated Vitis vinifera and 7 wild Vitis species, we produced reduced representations of their genomes using short sequence reads, yielding 2.6 Gb of DNA sequence. We developed methodology that was used to identify 469,470 putative single nucleotide polymorphisms (SNPs) and 71,397 high-quality SNPs. Although high levels of genetic diversity made the design of a custom genotyping array difficult, 8,898 SNPs were used to develop a custom Infinium genotyping array (the Vitis9KSNP array). The project demonstrated that cultivated grapevine has low LD even at short ranges, but that LD persists above background levels to 3 kb. While genotyping arrays are useful for assessing population structure and the decay of LD across large numbers of samples, the study suggests that whole-genome sequencing will become the genotyping method of choice for genome-wide genetic mapping studies in high-diversity plant species.
Lawrence, C.J., Ware, D. 2009. Databases and data mining. In: Bennetzen, J.L., Hake, S., editors. Handbook of Maize. Vol. 2, Genetics and Genomics. 1st edition. New York, NY: Springer Science and Business Media, LLC. p. 659-672.