
Research Project: Small Grains Database and Bioinformatics Resources

Location: Crop Improvement and Genetics Research

2016 Annual Report

Over the next 5 years the project will focus on the following specific objectives as part of the long-term purpose to synthesize, display, and provide access to small grains genomics and genetics data for the research community and applied users.
Objective 1: Annotate wheat, barley and oat whole genome sequences in collaboration with the crop research communities and integrate with genetic, physical, and trait maps.
• Sub-objective 1.A. - Contribute to wheat genome annotations and incorporation of small grains annotations into GrainGenes.
• Sub-objective 1.B. - Collaborate in integrating small grains genetic, physical, and trait maps.
• Sub-objective 1.C. - Modifying GrainGenes with enhanced user tools in accessing genomic and mapping data.
Objective 2: Integrate genotyping and phenotyping results from the Triticeae Coordinated Agricultural Project (T-CAP) including the T3 database, the National Small Grains Collection and GRIN database, and Gramene, to enhance support for trait analysis by association mapping and trait improvement by genomic selection.
• Sub-objective 2.A. - Collaborate in developing common standards describing phenotypes and traits across species.
• Sub-objective 2.B. - Convert data from GRIN, ARS Genotyping Laboratories, and the small grains Regional Field Nurseries to GrainGenes database formats.
• Sub-objective 2.C. - Modify the GrainGenes schema to accommodate increased data volume and utilization.
Objective 3: Collate, analyze, and present trait data from wheat, barley and oat communities to facilitate the genetic improvement of target traits and trait gene isolation.
• Sub-objective 3.A. - Collate data on target traits.
• Sub-objective 3.B. - Implement tools and interfaces for map displays.
Objective 4: Maintain existing and develop new user community outreach.
• Sub-objective 4.A. - Solicitation of user community input.
• Sub-objective 4.B. - Training and education for use of GrainGenes resources.
Objective 5: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources [NP301, C2, PS2A].

1) Contribute to the annotation of whole genome sequences of wheat, barley, and oats in collaboration with the research community and with other national and international small grains genomics efforts.
2) Incorporate genomic sequences and maps (genetic, physical, trait) into GrainGenes, including integration of maps from multiple sources and related data sets already represented within GrainGenes.
3) Integrate genotyping and phenotyping data into GrainGenes, including collaboration with GRIN, Gramene, and the T-CAP project.
4) Modify the GrainGenes web site with enhanced user tools for accessing data, implement tools and interfaces for enhanced map displays, and modify the GrainGenes database schema to accommodate larger data sets. This includes a complete rewrite and redesign of the GrainGenes web site and databases.
5) Enhance research community outreach through regular solicitation of user community input, development of social media tools for data access and user training, and development of formal training manuals for GrainGenes users.

Progress Report
The availability of deoxyribonucleic acid (DNA) sequences as high-quality reference genomes for maize, sorghum and rice has had a significant impact on organizing the genetic information and maps of these crop plants. Genomic sequence data are increasingly available for wheat, barley and oat, but complete reference genomes for these crops have not yet been published. Nevertheless, to meet Objective 1, the available DNA sequences have been used to build visual annotation tracks for navigating the content of the wheat and barley genomes on the GrainGenes Genome Browser, and these have been added to the GrainGenes database resource. As the genomic sequences become more refined with each update, new data will be incorporated into the GrainGenes platform.
Towards the goal of expanding vocabularies to improve database interoperability, previously annotated disease resistance genes curated in other species were used to find best sequence matches in the wheat reference genome. These matches were added as an annotation track to the genome browser and will serve as leads to candidate resistance genes for wheat breeders.
The database continues to evolve in form and function with refinements in the interface between users and the content management system (CMS). The sorting and listing of large output data tables was improved, and more functional modules were added to the interface to extend the capabilities of the database. To meet Sub-objective 2.C, the CMS is now built around the GrainGenes relational database. Tests are underway to utilize and improve upon relational and visualization platforms developed within the GMOD (Generic Model Organism Database) project (CMap, Chado, Tripal, and JBrowse), and to incorporate the most useful of them into the GrainGenes environment.
Two new tools were added to the GrainGenes platform, both of which are accessible from the GrainGenes front page. One displays wheat gene expression data broken down by tissue type and stage of development. This will be of use to molecular biologists seeking to identify candidate genes that control various aspects of wheat growth, yield, and adaptation to environmental stress. The other tool facilitates the design of DNA primers that can distinguish among genes residing on different genomes in polyploids such as wheat and cotton. Breeders can use this tool to design molecular markers to follow traits in breeding programs. The added tools were developed in collaborations between stakeholders and GrainGenes staff, under the framework of Objective 3.
Strategic discussions are ongoing through weekly conference calls with members of the Triticeae Toolbox (T3) database, which houses genotypic and phenotypic data from the Triticeae Coordinated Agricultural Project (T-CAP). Under consideration are ways of integrating data and data identifiers to best serve the small grains research community. A need to match germplasm designations in the two databases was identified as a low-effort, high-return initiative. Data on variation within the populations studied by investigators in the T-CAP project will be connected to gene annotation data in GrainGenes to generate hypotheses about the functions of those genes as manifested in traits of interest to wheat and barley breeders.
To further address Objective 3, reference genomes were collected and populated on the GrainGenes website, with plans to highlight new research data by generating new visualization tracks. The display of linkage disequilibrium (LD) data was improved by using the JBrowse genome visualization tool. Reduced staffing precluded major progress in meeting the goals of Objective 4; however, user community input was solicited at conferences attended by GrainGenes personnel.
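As an illustration of the germplasm designation matching identified above as a shared need between GrainGenes and T3, the following is a minimal sketch only. It assumes that designations differ mainly in letter case and in separators such as spaces, hyphens, underscores, and periods; the report does not describe how the matching is actually done. The function names and the example accession lists are hypothetical and are not taken from either database.

```python
import re
from collections import defaultdict


def normalize_designation(name: str) -> str:
    """Reduce a germplasm designation to a comparison key.

    Assumption: case and separator characters carry no meaning, so
    'Chinese Spring', 'CHINESE-SPRING' and 'chinese_spring' map to one key.
    """
    return re.sub(r"[\s\-_.]+", "", name.strip().lower())


def match_designations(graingenes_names, t3_names):
    """Pair designations from two lists that share a normalized key.

    Returns (matches, unmatched): a dict mapping each first-list name to
    the second-list names with the same key, and the first-list names
    that had no counterpart.
    """
    t3_by_key = defaultdict(list)
    for name in t3_names:
        t3_by_key[normalize_designation(name)].append(name)

    matches = {}
    unmatched = []
    for name in graingenes_names:
        hits = t3_by_key.get(normalize_designation(name))
        if hits:
            matches[name] = hits
        else:
            unmatched.append(name)
    return matches, unmatched


if __name__ == "__main__":
    # Hypothetical example lists, for illustration only.
    gg = ["Chinese Spring", "Jagger", "Morex"]
    t3 = ["CHINESE-SPRING", "jagger", "Kanota"]
    matched, missing = match_designations(gg, t3)
    print(matched)
    print(missing)
```

Names left in the unmatched list would still need manual curation, since synonyms and accession numbers cannot be reconciled by string normalization alone.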
The ARS SciNet initiative aims to bring together physical infrastructure, connectivity and support staff to allow ARS scientists to derive useful information from the very large datasets that are increasingly common results of their research. This year, a member of the GrainGenes research team led the establishment of the first functional Internet2 link between the Albany, California location and the ARS high-performance computing center in Ames, Iowa. With further improvements in accessibility, this link will provide Albany location scientists with the resources they need to translate their data into knowledge and applications. In particular, efficient access to these remote computing resources will aid genome studies.

1. A useful tool for molecular marker development in polyploid crops. Many of our important crop plants, including wheat and cotton, were formed by the natural hybridization of two or more ancestral plants that each contained two copies of each chromosome. Such plants contain more than one genome and are said to be polyploid. Because of the multiple copies, it is difficult to use gene amplification or polymerase chain reaction (PCR) technology to identify which specific versions of each gene are present in any given variety of a polyploid crop plant. To address this need, ARS scientists in Albany, California, developed a user-friendly web-based tool for easy and effective design of PCR primers that can distinguish sequences from the different genomes within a polyploid species. The tool is a new resource for crop improvement researchers who employ PCR-based technology, such as the molecular markers that breeders use to follow different gene versions in genetic crosses. It is freely available to the community at
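The idea behind genome-specific primers can be sketched in a few lines. The example below is a simplified illustration, not the algorithm used by the tool described above. It assumes the gene copies from each genome have already been aligned to equal length (gaps written as "-"), finds alignment columns where the target genome carries a base that no other genome shares, and proposes forward primers whose 3' end sits on such a base. The function names, the genome labels A, B, and D, and the toy sequences are all hypothetical.

```python
def genome_specific_sites(aligned, target):
    """Return 0-based alignment columns where `target` differs from every other genome.

    `aligned` maps a genome label to its aligned sequence; all sequences
    must have the same length. Columns where the target has a gap are skipped.
    """
    target_seq = aligned[target]
    others = [seq for name, seq in aligned.items() if name != target]
    if any(len(seq) != len(target_seq) for seq in others):
        raise ValueError("sequences must be aligned to the same length")

    sites = []
    for i, base in enumerate(target_seq):
        if base == "-":
            continue
        # A gap in another genome also counts as a difference here.
        if all(seq[i] != base for seq in others):
            sites.append(i)
    return sites


def candidate_forward_primers(aligned, target, length=20):
    """Return (site, primer) pairs whose 3' end is a genome-specific base.

    Windows that run off the start of the alignment or contain gaps
    in the target sequence are discarded.
    """
    target_seq = aligned[target]
    primers = []
    for site in genome_specific_sites(aligned, target):
        start = site - length + 1
        if start < 0:
            continue
        window = target_seq[start:site + 1]
        if "-" in window:
            continue
        primers.append((site, window))
    return primers


if __name__ == "__main__":
    # Toy aligned copies of one gene from three genomes (illustrative only).
    toy = {
        "A": "ACGTACGTAC",
        "B": "ACGTTCGTAC",
        "D": "ACGTTCGTAG",
    }
    print(genome_specific_sites(toy, "A"))
    print(candidate_forward_primers(toy, "D", length=5))
```

A practical design would also have to check melting temperature, primer length, and product size, none of which this sketch attempts.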


Review Publications
Pearce, S., Vazquez-Gross, H., Herin, S.Y., Hane, D., Wang, Y., Gu, Y.Q., Dubcovsky, J. 2015. WheatExp: an RNA-seq expression database for polyploid wheat. BMC Plant Biology. 15:299.
Wang, Y., Tiwari, V.K., Rawat, N., Gill, B.S., Huo, N., You, F.M., Coleman-Derr, D.A., Gu, Y.Q. 2016. GSP: a web-based platform for designing genome-specific primers in polyploids. Bioinformatics. 32(15):2382-2383.