
Research Project: Small Grains Database and Bioinformatics Resources

Location: Crop Improvement and Genetics Research

2018 Annual Report

Objectives
Over the next 5 years the project will focus on the following specific objectives as part of the long-term purpose to synthesize, display, and provide access to small grains genomics and genetics data for the research community and applied users.

Objective 1: Annotate wheat, barley, and oat whole genome sequences in collaboration with the crop research communities, and integrate them with genetic, physical, and trait maps.
• Sub-objective 1.A. - Contribute to wheat genome annotation and incorporate small grains annotations into GrainGenes.
• Sub-objective 1.B. - Collaborate in integrating small grains genetic, physical, and trait maps.
• Sub-objective 1.C. - Modify GrainGenes with enhanced user tools for accessing genomic and mapping data.

Objective 2: Integrate genotyping and phenotyping results from the Triticeae Coordinated Agricultural Project (T-CAP) and its T3 database, the National Small Grains Collection and GRIN database, and Gramene, to enhance support for trait analysis by association mapping and trait improvement by genomic selection.
• Sub-objective 2.A. - Collaborate in developing common standards describing phenotypes and traits across species.
• Sub-objective 2.B. - Convert data from GRIN, ARS Genotyping Laboratories, and the small grains Regional Field Nurseries to GrainGenes database formats.
• Sub-objective 2.C. - Modify the GrainGenes schema to accommodate increased data volume and utilization.

Objective 3: Collate, analyze, and present trait data from the wheat, barley, and oat communities to facilitate the genetic improvement of target traits and trait gene isolation.
• Sub-objective 3.A. - Collate data on target traits.
• Sub-objective 3.B. - Implement tools and interfaces for map displays.

Objective 4: Maintain existing and develop new user community outreach.
• Sub-objective 4.A. - Solicit user community input.
• Sub-objective 4.B. - Provide training and education in the use of GrainGenes resources.

Objective 5: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources [NP301, C2, PS2A].

Approach
1) Contribute to the annotation of whole genome sequences of wheat, barley, and oat in collaboration with the research community and with other national and international small grains genomics efforts.
2) Incorporate genomic sequences and maps (genetic, physical, trait) into GrainGenes, including the integration of maps from multiple sources with related data sets already represented within GrainGenes.
3) Integrate genotyping and phenotyping data into GrainGenes, in collaboration with GRIN, Gramene, and the Triticeae T-CAP project.
4) Modify the GrainGenes website with enhanced user tools for accessing data, implement tools and interfaces for enhanced map displays, and modify the GrainGenes database schema to accommodate larger data sets. This includes a complete rewrite and redesign of the GrainGenes website and databases.
5) Enhance research community outreach through regular solicitation of user community input, development of social media tools for data access and user training, and development of formal training manuals for GrainGenes users.

Progress Report
This is the final report for this project, which has been replaced by 2030-21000-024-00D, “GrainGenes: Enabling Data Access and Sustainability for Small Grains Researchers”. For additional information, see the report for the new project.

The availability of high-quality reference genome sequences for plants has had a significant impact on organizing the genetic information and maps of crop plants. Genomic sequence data are increasingly available for wheat, barley, rye, and oat. Wheat data from a large 90K single nucleotide polymorphism (SNP) marker project were added to build bridges between mapping populations and the International Wheat Genome Sequencing Consortium (IWGSC) reference maps. Probe data and maps were made available with links to the scaffolds provided by IWGSC. In parallel, mapping studies concentrating on rust resistance loci were added to GrainGenes. These resources include markers and map locations for genes essential to addressing the international threat to food security posed by the Ug99 race of the stem rust fungus. In addition to the IWGSC reference genome of hexaploid wheat (Triticum aestivum, ABD genome), data were added from the sequencing and mapping of a progenitor species related to the ancestor of the wheat D genome; these data contained 6,732 molecular markers arrayed on the physical map.

Barley data were added from a reference map that is a consensus of detailed genetic maps published in 2012. This map is being used to close the gaps in the barley genome sequence. As a start, the GrainGenes project prepared links and maps for the 15,718 reference genes. Work is in progress to connect the genetic and physical maps to BAC clones from the cultivars ‘Morex’, ‘Barke’, and ‘Bowman’. Mapping data for barley yield quantitative trait loci (QTLs) were also added to the database.

Oat data from the molecular markers used to build the first physically anchored hexaploid oat map were added.
These markers were discovered in the six mapping populations used to build the first complete genetic reference maps. Data were also added from a mapping study that employed probes from a 6,000-bead microarray. This new array is the platform the oat community is using to build new maps with additional populations from crosses between breeding lines currently used for crop improvement.

Project scientists added to GrainGenes genetic maps published in 2009 for rye and in 2012 for Leymus spp. These serve as interim references and as templates for forthcoming genetic maps featuring the newer molecular markers being generated with next-generation sequencing technology.

The website was also modified. GrainGenes 3.0 uses a content management system (CMS) based on Drupal that expedites routine administrative tasks such as updating events and news. The CMS is now built around the relational database structure that underlies GrainGenes. Tests are underway to use and improve upon relational and visualization platforms developed within the Generic Model Organism Database (GMOD) initiative (CMap, Chado, Tripal, and JBrowse) and to apply them to the GrainGenes environment.

Several tools for analysis of genomic and genetic data are hosted on the GrainGenes website. New tools developed by the project are NetVenn, which uses protein sequence data to facilitate genome-wide comparison of orthologous clusters across multiple species; the Arabidopsis interactome module (AIM), a plant protein interaction database that can provide insight into the function of a protein of interest; and OrthoVenn, an interactive web application for evolutionary and functional comparison of genes from different plant species. Also added to GrainGenes is WheatExp, a tool developed by a stakeholder group at the University of California, Davis, for analyzing a wheat transcriptome (expression) database.
Such tools allow researchers to discriminate among individual genes and orthologues when studying the expression of genes that underlie traits.

Over the last year, the GrainGenes website, database, and all other hosted websites were moved to a new hardware platform. The new hardware provides substantially more processing power, storage capacity, and memory, so data can be accessed more quickly and larger, more complex data sets can be hosted. The new hardware should also be more reliable and require less maintenance.

Toward the goal of expanding vocabularies to improve database interoperability, disease resistance genes previously annotated and curated in other species were used to find best sequence matches in the wheat reference genome. These matches were added as an annotation track to the genome browser and will serve as leads to candidate resistance genes for wheat breeders.

Two new tools were added to the GrainGenes platform. One displays wheat gene expression data broken down by tissue type and developmental stage; it is useful to molecular biologists seeking candidate genes that control aspects of wheat growth, yield, and adaptation to environmental stress. The other facilitates the design of DNA primers that can distinguish among copies of a gene residing on different genomes (homeologs) in polyploids such as wheat and cotton; it is applicable to breeders designing molecular markers to follow traits in breeding programs. Both tools were developed in collaboration between stakeholders and GrainGenes staff, under the framework of Objective 3.

Reference genomes for small grains were collected and loaded onto the GrainGenes website, with plans to highlight new research data by generating new visualization tracks. Improved display of linkage disequilibrium (LD) data was incorporated using the JBrowse genome visualization tool.
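The report does not describe how the homeolog-specific primer design tool works. As a hedged illustration of the general idea only (not the GrainGenes tool's actual algorithm), the Python sketch below assumes the homeologous copies are already aligned without gaps and proposes forward primers whose 3' end falls on a base unique to the target genome, since a mismatch at a primer's 3' end tends to reduce amplification of the non-target copies. The function names, the primer length, and the toy A/B/D sequences are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy alignment of three homeologous copies (A, B, D genomes).
EXAMPLE_ALIGNMENT: Dict[str, str] = {
    "A": "ACGTACGTACGTAAGT",
    "B": "ACGTACGTACGTACGT",
    "D": "ACGTACCTACGTACGT",
}


def diagnostic_positions(aligned: Dict[str, str], target: str) -> List[int]:
    """Columns where the target copy's base differs from every other copy.

    Assumes a gap-free alignment in which all sequences have equal length.
    """
    seq = aligned[target]
    others = [s for name, s in aligned.items() if name != target]
    return [i for i, base in enumerate(seq)
            if all(other[i] != base for other in others)]


def candidate_primers(aligned: Dict[str, str], target: str,
                      length: int = 20) -> List[Tuple[int, str, int]]:
    """Forward primers whose 3' end sits on a target-specific base.

    Returns (end_column, primer, fewest_mismatches_to_any_other_copy),
    sorted so primers that differ most from the other copies come first.
    """
    seq = aligned[target]
    others = [s for name, s in aligned.items() if name != target]
    if not others:
        raise ValueError("need at least one non-target copy to compare against")
    candidates = []
    for end in diagnostic_positions(aligned, target):
        start = end - length + 1
        if start < 0:
            continue  # not enough upstream sequence for a full-length primer
        primer = seq[start:end + 1]
        fewest = min(sum(a != b for a, b in zip(primer, other[start:end + 1]))
                     for other in others)
        candidates.append((end, primer, fewest))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates


if __name__ == "__main__":
    print(candidate_primers(EXAMPLE_ALIGNMENT, "A", length=5))
    # -> [(13, 'CGTAA', 1)]
```

A practical tool would also need to handle gapped multiple alignments, check primer melting temperature and secondary structure, and design the reverse primer; those steps are deliberately left out of this sketch.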

The ARS SciNet initiative brings together physical infrastructure, connectivity, and support staff so that ARS scientists can derive useful information from the very large datasets that increasingly result from their research. This year, a member of the GrainGenes research team led the establishment of the first functional Internet2 link between the Albany, California, location and the ARS high-performance computing center in Ames, Iowa.

In FY17, GrainGenes collaborated with the International Wild Emmer Wheat Genome Sequencing Consortium to visualize the sequence and annotations of wild emmer wheat (‘Zavitan’), placing the first downloadable complete genome sequence and an associated genome browser on GrainGenes. Annotated releases of DNA sequence information from reference genomes for wheat, barley, and rye were loaded into the JBrowse visualization tool.

Also in FY17, several high-impact map sets were added to the GrainGenes database, and many other small grains papers were curated, adding to the genetic map, QTL, locus, gene, sequence, germplasm, and related data stored in GrainGenes. These curation activities covered: 1) Marker Assisted Selection (MAS) wheat data, 2) National Small Grains Collection durum wheat stem rust data, 3) durum wheat pre-harvest sprouting data, 4) hexaploid oat consensus map markers, 5) barley 9K iSelect SNP data, 6) tetraploid wheat consensus map markers, and 7) wheat landrace consensus map.

The GrainGenes team made extensive changes to the GrainGenes home page to improve content, ensure consistent visual communication, and provide means for user feedback and interaction.
Changes include adding links for data download, a GrainGenes mailing list, and tutorials; increasing the number of “Species Portals”; moving the links for the Annual Wheat Newsletter, Barley Genetics Newsletter, and Oat Newsletter to the front page to give them more visibility; itemizing and dating “GrainGenes Updates” entries; creating “Quick Links”; and placing a “Feedback” button in the header of each page to open a direct communication channel with users, backed by a 24-hour response time to assure users that their feedback is valued. GrainGenes has housed sequence databases for many projects, and access to over sixty databases has been organized into a single navigable page.

Several additional genetic maps were uploaded to GrainGenes, including the hexaploid oat consensus map, the tetraploid wheat consensus map, and the wheat landrace consensus map. The details are as follows: 1) the hexaploid oat consensus map was created from 12 biparental recombinant inbred line (RIL) populations; 2) the tetraploid wheat consensus map was assembled using genotypic data from 13 tetraploid wheat mapping populations; 3) the wheat landrace consensus map, viewable in CMap, contains 21 linkage groups corresponding to the chromosomes of hexaploid wheat and includes over 2,400 genetic loci; and 4) in addition, 85,545 SNP markers from a 600K SNP genotyping array were positioned into the reference scaffolds under development for the rye genome.

For public outreach, an area was created on the front page for updates, to 1) increase the reach of GrainGenes updates, 2) attract new users, 3) inform users about available tools, and 4) broadcast new data curated at GrainGenes. GrainGenes also shared scientific articles of interest to the small grains community, as well as job postings. Such outreach efforts help GrainGenes remain a hub for small grains communication and a knowledge resource.
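The report does not say how the 85,545 rye SNP markers were positioned on the reference scaffolds. One common approach, shown here only as a hedged Python sketch under that assumption, aligns each marker's flanking sequence to the scaffolds with BLAST (tabular `-outfmt 6` output, whose 9th, 10th, and 12th columns are subject start, subject end, and bit score) and keeps a marker only when its best hit clearly outscores every other hit, which filters out markers that match several similar regions. The 10-bit margin, the function names, and the toy hits are illustrative assumptions, not values from the project.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical BLAST -outfmt 6 lines (12 tab-separated columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE_HITS = [
    "m1\tscafA\t100\t60\t0\t0\t1\t60\t500\t559\t1e-25\t111.0",
    "m1\tscafB\t90\t60\t6\t0\t1\t60\t900\t841\t1e-10\t70.0",
    "m2\tscafA\t100\t60\t0\t0\t1\t60\t2000\t2059\t1e-25\t111.0",
    "m2\tscafC\t99\t60\t1\t0\t1\t60\t30\t89\t1e-24\t105.0",
    "m3\tscafA\t100\t60\t0\t0\t1\t60\t150\t91\t1e-25\t111.0",
]

Hit = Tuple[str, str, int, float]  # (marker, scaffold, leftmost position, bitscore)


def parse_blast_tab(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular lines into (marker, scaffold, position, bitscore)."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        sstart, send = int(cols[8]), int(cols[9])
        # Reverse-strand hits have sstart > send; keep the leftmost coordinate.
        hits.append((cols[0], cols[1], min(sstart, send), float(cols[11])))
    return hits


def place_markers(hits: Iterable[Hit],
                  min_margin: float = 10.0) -> Dict[str, Tuple[str, int]]:
    """Keep a marker only if its best hit beats every other hit by min_margin bits.

    A second hit on the same scaffold also counts as a competitor.
    """
    by_marker: Dict[str, List[Tuple[float, str, int]]] = defaultdict(list)
    for marker, scaffold, pos, score in hits:
        by_marker[marker].append((score, scaffold, pos))
    placed: Dict[str, Tuple[str, int]] = {}
    for marker, marker_hits in by_marker.items():
        marker_hits.sort(reverse=True)  # highest bitscore first
        best_score, scaffold, pos = marker_hits[0]
        # A single hit has no competitor, so it always passes the margin test.
        runner_up = marker_hits[1][0] if len(marker_hits) > 1 else float("-inf")
        if best_score - runner_up >= min_margin:
            placed[marker] = (scaffold, pos)
    return placed


def markers_by_scaffold(
        placed: Dict[str, Tuple[str, int]]) -> Dict[str, List[Tuple[int, str]]]:
    """Group placed markers by scaffold, ordered by position."""
    grouped: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for marker, (scaffold, pos) in placed.items():
        grouped[scaffold].append((pos, marker))
    return {scaffold: sorted(items) for scaffold, items in grouped.items()}


if __name__ == "__main__":
    placed = place_markers(parse_blast_tab(EXAMPLE_HITS))
    print(markers_by_scaffold(placed))
    # -> {'scafA': [(91, 'm3'), (500, 'm1')]}
```

In this toy data, marker m2 is left unplaced because its two hits differ by only 6 bits; lowering `min_margin` trades placement confidence for coverage.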


Review Publications
Horvath, D.P., Patel, S., Dogramaci, M., Chao, W.S., Anderson, J.V., Foley, M.E., Scheffler, B., Lazo, G., Dorn, K., Yan, C., Childers, A., Schatz, M., Marcus, S. 2018. Gene space and transcriptome assemblies of leafy spurge (Euphorbia esula) identify promoter sequences, repetitive elements, high-quality markers, and a full-length chloroplast genome. Weed Science. 66(3):355-367.
Odell, S.G., Lazo, G.R., Woodhouse, M.R., Hane, D.L., Sen, T.Z. 2017. The art of curation at a biological database: principles and application. Current Plant Biology. 11-12:2-11.