
Research Project: Small Grains Database and Bioinformatics Resources

Location: Crop Improvement and Genetics Research

2014 Annual Report


Objectives
Over the next 5 years the project will focus on the following specific objectives as part of the long-term purpose to synthesize, display, and provide access to small grains genomics and genetics data for the research community and applied users.
Objective 1: Annotate wheat, barley and oat whole genome sequences in collaboration with the crop research communities and integrate with genetic, physical, and trait maps.
• Sub-objective 1.A. - Contribute to wheat genome annotations and incorporation of small grains annotations into GrainGenes.
• Sub-objective 1.B. - Collaborate in integrating small grains genetic, physical, and trait maps.
• Sub-objective 1.C. - Modifying GrainGenes with enhanced user tools in accessing genomic and mapping data.
Objective 2: Integrate genotyping and phenotyping results from the Triticeae Coordinated Agricultural Project (T-CAP) including the T3 database, the National Small Grains Collection and GRIN database, and Gramene, to enhance support for trait analysis by association mapping and trait improvement by genomic selection.
• Sub-objective 2.A. - Collaborate in developing common standards describing phenotypes and traits across species.
• Sub-objective 2.B. - Convert data from GRIN, ARS Genotyping Laboratories, and the small grains Regional Field Nurseries to GrainGenes database formats.
• Sub-objective 2.C. - Modify the GrainGenes schema to accommodate increased data volume and utilization.
Objective 3: Collate, analyze, and present trait data from wheat, barley and oat communities to facilitate the genetic improvement of target traits and trait gene isolation.
• Sub-objective 3.A. - Collate data on target traits.
• Sub-objective 3.B. - Implement tools and interfaces for map displays.
Objective 4: Maintain existing and develop new user community outreach.
• Sub-objective 4.A. - Solicitation of user community input.
• Sub-objective 4.B. - Training and education for use of GrainGenes resources.
Objective 5: Facilitate the use of genomic and genetic data, information, and tools for germplasm improvement, thus empowering ARS scientists and partners to use a new generation of computational tools and resources [NP301, C2, PS2A].


Approach
1) Contribute to the annotation of whole genome sequences of wheat, barley, and oats in collaboration with the research community along with other national and international small grains genomics efforts.
2) Incorporate genomic sequences and maps (genetic, physical, trait) into GrainGenes, including integration of maps from multiple sources and related data sets already represented within GrainGenes.
3) Integrate genotyping and phenotyping data into GrainGenes, including collaboration with GRIN, Gramene, and the Triticeae Coordinated Agricultural Project (T-CAP).
4) Modify the GrainGenes web site with enhanced user tools for accessing data, implement tools and interfaces for enhanced map displays, and modify the GrainGenes database schema to accommodate larger data sets. This includes a complete rewrite and redesign of the GrainGenes web site and databases.
5) Enhance research community outreach through regular solicitation of user community input, development of social media tools for data access and user training, and development of formal training manuals for GrainGenes users.


Progress Report
The GrainGenes project advanced in many small areas as data and opportunity presented themselves, as described below. However, the three existing deficiencies of greatest concern were:
1. handling large quantities of genotyping data, i.e. alleles of many markers for many germplasm lines;
2. handling large quantities of phenotype data for many germplasm lines; and
3. integration of genome sequence data anchored to genetic maps.
The third was not directly addressed this year, except for planning and discussion. For the first two, it was recognized that large-scale genotyping and phenotyping data require a paradigm shift in the user interface for accessing them, relative to the existing GrainGenes interface.
Briefly, the traditional GrainGenes approach is designed for direct human viewing, delivering a single record at a time. The record is usually complex and contains links to all directly relevant additional records. This "browse" interface works very well for some purposes but runs counter to the usage requirements of genotype and phenotype data, which cannot be interpreted directly by a human reader and must first be loaded into statistical analysis software to detect the patterns. For example, imagine a Germplasm record in GrainGenes that contained a field "Genotype_Data" with the alleles for, say, 50,000 markers and a field "Phenotype_Data" with the values for thirty traits in ten trials. To be useful, such a record would, at minimum, need to be combined with many other Germplasm records and its contents heavily re-structured for analysis. Rather than trying to adapt GrainGenes to meet this need within the existing constraints, we have taken the longer-term approach of developing the best system possible, designed for this purpose from the ground up: The Triticeae Toolbox (T3).
By doing so, in collaboration with an active group of users who depend on the system for their daily work, we have learned much about the data storage and delivery requirements and optimizations that we could never have foreseen a priori. For example, single-record viewing is not part of the primary workflow, whereas querying simultaneously on the attributes of up to three "key" entity types (e.g. germplasm, trait, trial) is. In addition, built-in tools for visualization and preliminary analysis are necessary components for satisfactory usage. A second, future phase of integrating T3 with GrainGenes will involve trying to merge the two interface paradigms without sacrificing the strengths of either. These activities relate the subordinate project 5325-21000-021-03R, "Improving Barley and Wheat Germplasm for Changing Environments" to Objective 2: "Integrate genotyping and phenotyping results from the Triticeae Coordinated Agricultural Project (T-CAP) including the T3 database..."
A new area of focus this year was work toward development of an Application Programming Interface (API) for exchange of genotype and phenotype data among databases like GrainGenes and T3, and with external software tools to analyze the data. Encouraged by National Program Staff, we saw this direction as an appropriate activity under Sub-objective 2.A, "Collaborate in developing common standards describing phenotypes and traits across species." Therefore we accepted an invitation from the Bill and Melinda Gates Foundation to a two-day work session with international cooperators to create a "Breeding API" in support of crop breeders in Sub-Saharan Africa and South Asia.
The results of this collaboration are being made public as they emerge at http://docs.breeding.apiary.io/.
Subordinate project 5325-21000-021-01T, "The North American Collaborative Oat Research Enterprise (CORE)" is an activity under Objective 3, "Collate, analyze, and present trait data from wheat, barley and oat communities..." Progress on this collaboration included:
1. Presenting T3 as a possible repository for existing and future oat genotype and phenotype data at a planning workshop in Winnipeg, a proposal which was enthusiastically approved.
2. Creating a web portal, "Oat Global", to support coordination of future research: http://wheat.pw.usda.gov/OG/.
Another major step forward under Objective 3 was the establishment of The Breeders Datafarm as the official phenotype and genotype database of the US Wheat and Barley Scab Initiative. Using the T3 software and schema, this database will hold not only current and future results from the annual Fusarium head blight nurseries but also the accumulated historical data. It is online at http://malt.pw.usda.gov/t3/bd/ and is being populated with data by the Initiative's participants with help from us as needed.
Communications with our users under Objective 4, "Maintain existing and develop new user community outreach", were greatly advanced by creating a GrainGenes Liaison Committee. This committee subsequently met with us, heard our reports on the current status, and produced a very useful set of recommendations. In addition, the T3 project holds teleconferences with its User Group at two-month intervals, and invites particular stakeholders to its weekly staff teleconference when relevant.
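As an illustration of the re-structuring discussed in this progress report, the following minimal Python sketch shows how record-at-a-time germplasm data might be combined across lines into a marker matrix, and how phenotype values might be selected on up to three key entity types (germplasm, trait, trial). It is not the T3 or GrainGenes implementation. The record layout, the sample values, and the function names genotype_matrix and select_phenotypes are all hypothetical and were chosen only for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical "browse"-style records: one nested record per germplasm line.
# Names, markers, and values are invented for illustration only.
records: List[Dict] = [
    {
        "name": "LineA",
        "genotype": {"m1": "A", "m2": "B"},
        "phenotype": {("yield", "t1"): 5.1, ("height", "t1"): 90.0},
    },
    {
        "name": "LineB",
        "genotype": {"m1": "B", "m3": "A"},
        "phenotype": {("yield", "t1"): 4.7, ("yield", "t2"): 5.3},
    },
]


def genotype_matrix(
    recs: List[Dict],
) -> Tuple[List[str], List[str], List[List[Optional[str]]]]:
    """Combine per-line genotype fields into a lines-by-markers matrix.

    Returns (line names, sorted marker names, rows). A marker that was
    not scored for a line is represented as None.
    """
    markers = sorted({m for r in recs for m in r["genotype"]})
    names = [r["name"] for r in recs]
    rows = [[r["genotype"].get(m) for m in markers] for r in recs]
    return names, markers, rows


def select_phenotypes(
    recs: List[Dict],
    lines: Optional[Set[str]] = None,
    traits: Optional[Set[str]] = None,
    trials: Optional[Set[str]] = None,
) -> List[Tuple[str, str, str, float]]:
    """Select phenotype values by germplasm line, trait, and trial.

    Each filter is optional; None means "no restriction on this key".
    """
    out = []
    for r in recs:
        if lines is not None and r["name"] not in lines:
            continue
        for (trait, trial), value in sorted(r["phenotype"].items()):
            if traits is not None and trait not in traits:
                continue
            if trials is not None and trial not in trials:
                continue
            out.append((r["name"], trait, trial, value))
    return out


names, markers, rows = genotype_matrix(records)
yield_t1 = select_phenotypes(records, traits={"yield"}, trials={"t1"})
```

The matrix form and the flat (line, trait, trial, value) tuples are the kind of output that statistical software can load directly, in contrast to a single complex record intended for human reading.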


Accomplishments
1. Creation of the Breeders Datafarm. The Breeders Datafarm is the database of phenotype and genotype (alleles of genes and trait-linked markers) data derived from elite wheat and barley germplasm being used as breeding material by participants in the US Wheat and Barley Scab Initiative (USWBSI). Although the Initiative has performed well-organized uniform nursery trials for over 15 years, it previously had no database in which to store the results, and archived them in annual reports as spreadsheet files. The ARS-funded USWBSI, headquartered in East Lansing, Michigan, provisionally adopted this database at its annual meeting in December 2013. The database and its built-in software tools, developed initially for the Triticeae Coordinated Agricultural Project with substantial effort from GrainGenes, will greatly assist wheat and barley breeding programs using genomics-integrated, marker-assisted selection to produce new improved germplasm and released varieties. ARS scientists in Albany, California assisted in planning the project and in providing computer, network, security and software infrastructure and ongoing maintenance.


Review Publications
Wang, Y., Thilmony, R.L., Gu, Y.Q. 2014. Net Venn - An integrated network analysis web platform for gene lists. Nucleic Acids Research. DOI: 10.1093/nar/gku331.
Noyszewski, A., Ghavami, F., Al-Nimer, L., Soltani, A., Gu, Y.Q., Huo, N., Meinhardt, S., Kianian, P., Kianian, S. 2014. Accelerated evolution of the mitochondrial genome in an alloplasmic line of durum wheat. Biomed Central (BMC) Genomics. 15:67.
Fox, S.E., Geniza, M., Hanumappa, M., Naithani, S., Sullivan, C., Preece, J., Tiwari, V.K., Elser, J., Leonard, J.M., Sage, A., Gresham, C., Kerhornou, A., Bolser, D., McCarthy, F., Kersey, P., Lazo, G.R., Jaiswal, P. 2014. De Novo Transcriptome Assembly and Analyses of Gene Expression during Photomorphogenesis in Diploid Wheat Triticum monococcum. PLoS One. 9:e96855.
Nigro, D., Gu, Y.Q., Huo, N., Marcotuli, I., Blanco, A., Gadaleta, A., Anderson, O.D. 2013. Structural analysis of the wheat genes encoding NADH-dependent glutamine-2-oxoglutarate amidotransferases genes and correlation with grain protein content. PLoS One. 8(9):e73751.