
Research Project: GrainGenes- A Global Data Repository for Small Grains

Location: Crop Improvement and Genetics Research

2024 Annual Report


Objectives
GrainGenes is an international, centralized database and information portal for peer-reviewed small grains data, serving the small grains research and breeding communities (wheat, barley, oat, and rye). The GrainGenes project ensures long-term data curation, accessibility, and sustainability so that small grains researchers can develop new cultivars that are more nutritious, disease- and pest-resistant, and high-yielding.
Objective 1: Accelerate the analysis of small grains (wheat, oats, barley, and rye) trait, germplasm, genetic, genomic, and breeding data and information by curating small grains genome sequences, germplasm diversity information, pangenomes, trait mapping information, and phenotype data into GrainGenes. Sub-objective 1.A: Integrate small grains genome assemblies, pangenomes, and annotations into GrainGenes. Sub-objective 1.B: Integrate genetic, diversity, functional, and phenotypic data into GrainGenes with a pangenome-centric focus.
Objective 2: Develop computational and visualization tools to curate, integrate, and query the genetic, genomic, and phenotypic relationships in small grains germplasm, and deploy machine learning and artificial intelligence approaches to enhance functional annotations and discover biological interactions. Sub-objective 2.A: Develop methods and pipelines to link genetic, genomic, functional, and phenotypic information and to strengthen the pangenome-centric focus. Sub-objective 2.B: Implement web-based and computational tools to integrate and visualize genomic data linked with genetic, expression, functional, and diversity data.
Objective 3: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability. Sub-objective 3.A: Collaborate with data and germplasm repositories and organizations to facilitate the curation, sharing, and linking of data.
Objective 4: Provide community support and training for small grains researchers through workshops, webinars, and other outreach activities. Sub-objective 4.A: Facilitate communication and information sharing among the small grain communities and GrainGenes to support research needs.


Approach
As a service project, the GrainGenes team does not perform hypothesis-driven research; rather, it fulfills its long-term objectives by adding value to peer-reviewed data generated by others. It provides data curation, management, and integration, long-term sustainability, and digital platforms as needed. Driven by stakeholder input, GrainGenes will maintain a central location for curated genomic, genetic, functional, and phenotypic data sets, downloadable in standardized formats and enhanced by intuitive query and visualization tools. For Objective 1: our approach will be to (a) curate genomic, pangenomic, and diversity data into the GrainGenes database; (b) create new genome browsers and gene model pages to aggregate and link genomic and genetic data at GrainGenes; (c) curate high-impact, peer-reviewed genetic, trait, and phenotypic data into GrainGenes; (d) visualize more accurate genetic maps at GrainGenes; and (e) curate functional and structural annotations (gene ontology, enzymatic functions, protein structure). For Objective 2: we will (a) create better search indexing and linking for data discovery at GrainGenes; (b) implement computational pipelines to link and align genomic and genetic features between different genome assemblies and GrainGenes pages; (c) implement pipelines to facilitate data curation into the GrainGenes database; (d) implement and maintain genome browsers that allow comparative viewing using JBrowse2; (e) implement and maintain genome browsers to display tracks for multiple genome assemblies; and (f) create a BLAST plug-in that can be easily installed in JBrowse instances so that users can align their sequences against small grains genome assemblies from within JBrowse.
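The Objective 2 pipelines that link and align features between genome assemblies can be built on several techniques. One common one is coordinate liftover through ungapped alignment blocks, the idea behind UCSC chain files. The sketch below is a minimal, hypothetical illustration of that technique, not the GrainGenes pipeline: the `Block` and `Lifter` names, 0-based coordinates, and same-strand, non-overlapping blocks are all assumptions.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One ungapped aligned block between two assemblies (same strand ASSUMED)."""
    src_chrom: str
    src_start: int  # 0-based start on the source assembly
    tgt_chrom: str
    tgt_start: int  # 0-based start on the target assembly
    length: int


class Lifter:
    """Hypothetical helper: map single positions from a source to a target assembly."""

    def __init__(self, blocks):
        self._blocks = {}
        for b in blocks:
            self._blocks.setdefault(b.src_chrom, []).append(b)
        # Sort each chromosome's blocks by source start so they can be binary-searched.
        for chrom_blocks in self._blocks.values():
            chrom_blocks.sort(key=lambda b: b.src_start)
        self._starts = {c: [b.src_start for b in bs] for c, bs in self._blocks.items()}

    def lift(self, chrom: str, pos: int) -> Optional[Tuple[str, int]]:
        blocks = self._blocks.get(chrom)
        if not blocks:
            return None  # chromosome not covered by any block
        # Rightmost block whose start is <= pos (blocks ASSUMED non-overlapping).
        i = bisect_right(self._starts[chrom], pos) - 1
        if i < 0:
            return None
        b = blocks[i]
        offset = pos - b.src_start
        if offset >= b.length:
            return None  # position falls in an unaligned gap
        return b.tgt_chrom, b.tgt_start + offset


lifter = Lifter([
    Block("chr1A", 0, "chr1A", 50, 100),
    Block("chr1A", 200, "chr1A", 400, 100),
])
print(lifter.lift("chr1A", 10))   # ('chr1A', 60)
print(lifter.lift("chr1A", 150))  # None: between blocks
```

Positions that fall between blocks return `None` rather than a guessed coordinate, which keeps unreliable links out of the database; reverse-strand blocks and features that span gaps are deliberately left out of this sketch.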
For Objective 3: we will (a) enhance links and data sharing between GrainGenes and the Triticeae Toolbox for small grains data; (b) collaborate with other data and germplasm repositories, groups, and organizations to facilitate the curation, sharing, and linking of data; (c) improve data interoperability and data sharing with WheatIS; (d) coordinate with the ARS databases MaizeGDB and The Triticeae Toolbox to establish a distributed infrastructure that serves users faster and more reliably; and (e) actively participate in the AgBioData Consortium. For Objective 4: we will (a) present GrainGenes tools and resources at conferences and during site visits; (b) create training videos that teach users how to use GrainGenes more efficiently; (c) organize annual meetings between GrainGenes and the GrainGenes Liaison Committee to receive community feedback; (d) maintain the GrainGenes and OatMail e-mail lists to facilitate communication among members of the small grains communities; and (e) maintain and provide digital platforms to small grains researchers as needed.


Progress Report
This report documents progress for project 2030-21000-056-000D, titled “GrainGenes- A Global Data Repository for Small Grains,” which started in April 2023. In support of Sub-objective 1A, ARS researchers in Albany, California, continued developing genetic marker tracks for genome browsers shared with the Triticeae Toolbox database. A total of 4,708 new quantitative trait loci mapped with 2,525 single nucleotide polymorphism (SNP) markers were curated into GrainGenes from Cheng et al. (Nature, 2024), a study of the Watkins Collection. In addition, three new genome browser tracks were added to the IWGSC Chinese Spring v1 genome browser, aligning 1,239 quantitative trait loci and significant markers for salinity, pathology, and agronomic traits. All of these quantitative trait loci now have new records in the GrainGenes database, with reciprocal links to and from the browser. For Sub-objective 1B, research continued on collecting, curating, and displaying genetic, functional, phenotypic, germplasm, and trait data from small grains papers and the Wheat Gene Catalogue in GrainGenes. Specifically, the collection of KASP (Kompetitive Allele-Specific PCR) primer sets developed to distinguish allele states for important traits in small grains was expanded: 878 KASP datasets from 25 journal articles were curated along with their associated quantitative trait loci. Under Sub-objective 2A, ARS researchers continued creating links from genome browsers to external databases. Specifically, AlphaFold-linked protein tracks were created for selected genome browsers, linking proteins to their predicted 3D structures on the AlphaFold site (Jumper et al., Nature, 2021). These tracks were created for the following genome browsers: Chinese Spring IWGSC v1, Durum Wheat, Aegilops v4, Triticum urartu, Barley Morex v3, and Oat Sang. Additional external links were created to electronic Fluorescent Pictograph (eFP) browsers for plant gene expression profiles.
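The AlphaFold-linked tracks described under Sub-objective 2A can be pictured as ordinary annotation features that carry an outbound link. The sketch below writes one such feature as a GFF3 line. It is an illustration under assumptions, not the GrainGenes implementation: the `alphafold_url` attribute name, the gene-to-UniProt mapping, and the example IDs are hypothetical, and whether a given JBrowse instance renders such an attribute as a clickable link depends on its configuration. The `https://alphafold.ebi.ac.uk/entry/<UniProt accession>` pattern is the AlphaFold Protein Structure Database's public entry URL.

```python
ALPHAFOLD_ENTRY = "https://alphafold.ebi.ac.uk/entry/{acc}"

# GFF3 reserves these characters in column 9; '%' is included so that
# literal percent signs are not mistaken for escapes.
_RESERVED = set("%;=&,\t\n\r")


def gff3_escape(value: str) -> str:
    """Percent-encode characters that are reserved in GFF3 attribute values."""
    return "".join(f"%{ord(ch):02X}" if ch in _RESERVED else ch for ch in value)


def alphafold_feature(seqid, start, end, strand, gene_id, uniprot_acc,
                      source="GrainGenes"):
    """Return one GFF3 line (1-based, closed coordinates) linking a gene model
    to its AlphaFold DB entry via a HYPOTHETICAL `alphafold_url` attribute."""
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates must satisfy 1 <= start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {strand!r}")
    attrs = {
        "ID": gene_id,
        "Dbxref": f"UniProtKB:{uniprot_acc}",
        "alphafold_url": ALPHAFOLD_ENTRY.format(acc=uniprot_acc),
    }
    col9 = ";".join(f"{k}={gff3_escape(v)}" for k, v in attrs.items())
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join([seqid, source, "gene", str(start), str(end),
                      ".", strand, ".", col9])


print("##gff-version 3")
# Made-up gene ID and accession, for illustration only.
print(alphafold_feature("chr1A", 1001, 2500, "+", "example_gene_1", "EXAMPLE1"))
```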
Supporting Sub-objective 2B, research continued on creating and releasing a JBrowse2 instance for the IWGSC Chinese Spring wheat v1 assembly. A wheat pangenome workspace was prepared to integrate the genomes of 29 wheat varieties. This workspace pairs the JBrowse2 genome browser with AccuSyn to support syntenic and comparative views within the germplasm matrix. Similar set-ups were prepared for the A- and D-genome progenitor species. Basic Local Alignment Search Tool (BLAST) searches were optimized to run efficiently across the pangenome and to reflect pangenome alignments. A dashboard environment was added to display customized statistical views of the collection, and precomputed BLAST results provide tables and figures that help visualize presence/absence variation from a pangenomic point of view. For Sub-objective 3A, ARS researchers in Albany, California, continued contributing to the creation and publication of an AgBioData survey on data sharing. Specifically, two AgBioData working groups, focused on Data Sharing and Ontologies, conducted a Consortium-wide survey to assess members' current status and future needs in those areas. A total of 33 researchers responded to the survey, representing 37 databases. The results suggest that data-sharing practices among AgBioData databases are in a fairly healthy state, although it is not clear whether this holds for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017 (Clarke et al., Database, 2023). In support of Sub-objective 4A, two new training videos on how to use GrainGenes were created in response to user needs. Both videos are available on YouTube, and links are available at https://wheat.pw.usda.gov/GG3/tutorials.
The first tutorial, titled “GrainGenes: a Centralized Nexus for Small Grains Data and Communities (2024),” is based on a webinar given to the International Wheat Genome Sequencing Consortium describing GrainGenes tools and resources. The second tutorial, titled “Search Browsers for Gene/Transcript ID,” gives an overview of a new GrainGenes feature for searching genomic features.
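The Sub-objective 2B presence/absence views built from precomputed BLAST results rest on a simple idea: a query gene is called present in a genome if it has a sufficiently good hit there. The sketch below shows that idea under stated assumptions. The identity and coverage thresholds, the `<genome>|<contig>` subject-ID convention, and the `pav_matrix` name are hypothetical, not the criteria GrainGenes uses. It expects BLAST+ tabular output with the custom columns requested by `-outfmt "6 qseqid sseqid pident qstart qend qlen"`.

```python
def pav_matrix(blast_lines, queries, genomes, min_pident=90.0, min_cov=0.8):
    """Return {query: {genome: 1 or 0}} from BLAST+ tabular hits.

    Assumed columns: qseqid sseqid pident qstart qend qlen (tab-separated).
    Assumed subject IDs: "<genome>|<contig>" (a HYPOTHETICAL naming convention).
    """
    present = {q: set() for q in queries}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid, pident, qstart, qend, qlen = line.split("\t")
        if qseqid not in present:
            continue
        genome = sseqid.split("|", 1)[0]
        lo, hi = sorted((int(qstart), int(qend)))
        coverage = (hi - lo + 1) / int(qlen)  # fraction of the query in this HSP
        if float(pident) >= min_pident and coverage >= min_cov:
            present[qseqid].add(genome)
    return {q: {g: int(g in present[q]) for g in genomes} for q in queries}


hits = [
    "geneA\tG1|chr1A\t99.5\t1\t1000\t1000",  # full-length, high identity
    "geneA\tG2|chr1A\t85.0\t1\t1000\t1000",  # identity below threshold
    "geneB\tG2|chr2B\t97.0\t20\t980\t1000",  # 96.1% query coverage
    "geneB\tG1|chr2B\t98.0\t1\t300\t1000",   # only 30% coverage
]
matrix = pav_matrix(hits, ["geneA", "geneB", "geneC"], ["G1", "G2"])
for gene, row in matrix.items():
    print(gene, row)
```

Coverage here comes from a single HSP, so a gene split across several shorter hits may be called absent; merging HSPs per query and genome would be a natural refinement.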


Accomplishments
1. Annual site visitors to GrainGenes reached 41,994. GrainGenes (https://wheat.pw.usda.gov) is the ARS flagship database for small grains data, including wheat, barley, rye, and oat. GrainGenes users are distributed across six continents, and more than half of them are located in the United States, China, and India. Based on unique internet protocol (IP) addresses, the number of GrainGenes site visitors this year reached 41,994.
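The visitor figure above is based on unique IP addresses. A minimal sketch of that kind of count follows, assuming web-server access logs in the Common or Combined Log Format, where the client address is the first space-separated field. The log format and the `count_unique_ips` name are assumptions, since the report does not say how GrainGenes tallies visitors; the sample addresses come from documentation-reserved ranges.

```python
import ipaddress


def count_unique_ips(log_lines):
    """Count distinct client IPs, ASSUMING the first space-separated field
    of each line is the client address (Common/Combined Log Format)."""
    seen = set()
    for line in log_lines:
        first = line.split(" ", 1)[0]
        try:
            # ip_address() normalizes equivalent IPv6 spellings.
            seen.add(ipaddress.ip_address(first))
        except ValueError:
            continue  # skip malformed or empty lines
    return len(seen)


log = [
    '203.0.113.7 - - [01/Jan/2024:10:00:00 +0000] "GET /GG3/ HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2024:10:05:00 +0000] "GET /GG3/tutorials HTTP/1.1" 200 734',
    '2001:db8::1 - - [02/Jan/2024:09:00:00 +0000] "GET /GG3/ HTTP/1.1" 200 512',
    '2001:0db8:0000::0001 - - [02/Jan/2024:09:01:00 +0000] "GET / HTTP/1.1" 200 98',
]
print(count_unique_ips(log))  # 2
```

Unique IP addresses only approximate unique people: several users behind one network address translation (NAT) gateway share an address, and one user on a dynamic connection may appear under several.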

2. The predicted maize pan-interactome was harnessed for putative gene function prediction and prioritization of candidate genes for important traits. Maize, a crop of major agricultural importance, has been the subject of extensive research, resulting in a wealth of genomic and phenotypic data. The recent release of genome assemblies and annotations for the 26 maize inbred lines has enabled large-scale pan-genomic comparative studies. A study conducted by ARS researchers in Albany, California, not only provides a comprehensive resource of predicted protein-protein interaction networks for all 26 maize genomes, but also offers a means to predict protein functions and prioritize candidate genes through the analysis of interactome clusters. The study advances understanding of genotype-phenotype relationships and helps breeding efforts to develop plants with desired traits.
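The pan-interactome study predicts protein functions from interactome clusters. The general guilt-by-association idea behind such predictions can be sketched as majority voting within clusters: an unannotated protein inherits the terms shared by enough annotated members of its cluster. This is a generic illustration, not the published method. The `min_fraction` threshold, the term labels, and the assumption that clusters have already been computed (for example, by a community-detection algorithm) are all hypothetical.

```python
from collections import Counter


def predict_from_clusters(clusters, annotations, min_fraction=0.5):
    """Generic guilt-by-association sketch (not the published method).

    clusters:    iterable of lists of protein IDs (precomputed interactome clusters)
    annotations: dict mapping protein ID -> set of function terms
    Returns {unannotated protein: sorted list of predicted terms}.
    """
    predictions = {}
    for members in clusters:
        annotated = [m for m in members if annotations.get(m)]
        if not annotated:
            continue  # no annotated members to transfer terms from
        # Count each term at most once per annotated member.
        counts = Counter(t for m in annotated for t in set(annotations[m]))
        shared = sorted(t for t, c in counts.items()
                        if c / len(annotated) >= min_fraction)
        if not shared:
            continue
        for m in members:
            if not annotations.get(m):
                predictions[m] = shared
    return predictions


clusters = [["p1", "p2", "p3", "p4"], ["p5", "p6"]]
annotations = {"p1": {"term_A", "term_B"}, "p2": {"term_A"}, "p3": {"term_C"}}
print(predict_from_clusters(clusters, annotations))  # {'p4': ['term_A']}
```

In this toy example, `term_A` is carried by two of the three annotated members of the first cluster, so it passes the 0.5 threshold, while `term_B` and `term_C` do not; the second cluster has no annotated members and yields no predictions.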


Review Publications
Poretsky, E., Andorf, C.M., Sen, T.Z. 2024. PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models. Plant Direct. 7(12). Article e554. https://doi.org/10.1002/pld3.554.
Alaux, M., Dyer, S., Sen, T.Z. 2023. Wheat data integration and FAIRification: IWGSC, GrainGenes, Ensembl and other data repositories. In: Appels, R., Eversole, K., Feuillet, C., Gallagher, D., editors. The Wheat Genome. Cham, CH: Springer. p. 13-25. https://doi.org/10.1007/978-3-031-38294-9_2.
Poretsky, E., Cagirici, H.B., Andorf, C.M., Sen, T.Z. 2024. Harnessing the predicted maize pan-interactome for putative gene function prediction and prioritization of candidate genes for important traits. G3: Genes|Genomes|Genetics. 14(5). Article jkae059. https://doi.org/10.1093/g3journal/jkae059.
Wight, C.P., Blake, V.C., Jellen, E.N., Yao, E., Sen, T.Z., Tinker, N.A. 2024. One hundred years of comparative genetic and physical mapping in cultivated oat (Avena sativa). Crop and Pasture Science. 75(2). Article CP23246. https://doi.org/10.1071/CP23246.
Grewal, S., Yang, C., Scholefield, D., Ashling, S., Ghosh, S., Swarbreck, D., Collins, J., Yao, E., Sen, T.Z., Wilson, M., Yant, L., King, I., King, J. 2024. Chromosome-scale genome assembly of bread wheat’s wild relative Triticum timopheevii. Scientific Data. 11. Article 420. https://doi.org/10.1038/s41597-024-03260-w.
Jellen, E.N., Wight, C.P., Spannagl, M., Blake, V.C., Chong, J., Herrmann, M., Howarth, C.N., Huang, Y., Juqing, J., Katsiotis, A., Langdon, T., Li, C., Park, R., Tinker, N.A., Sen, T.Z. 2024. A uniform gene and chromosome nomenclature system for oat (Avena spp.). Crop and Pasture Science. 75(1). Article CP23247. https://doi.org/10.1071/CP23247.