
Research Project: Championing Improvement of Sorghum and Other Agriculturally Important Species through Data Stewardship and Functional Dissection of Complex Traits

Location: Plant, Soil and Nutrition Research

Title: Propagating rsIDs Across Crop Pan-Genomes in Gramene Platform Using the Ensembl Variant Remapping Pipeline

Author
CHOUGULE, KAPEEL - Cold Spring Harbor Laboratory
KIM, SUYAN - Cold Spring Harbor Laboratory
WEI, SHARON - Cold Spring Harbor Laboratory
OLSON, ANDREW - Cold Spring Harbor Laboratory
LU, ZHENYUAN - Cold Spring Harbor Laboratory
TELLO-RUIZ, MARCELA - Cold Spring Harbor Laboratory
Ware, Doreen

Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 11/5/2025
Publication Date: 11/5/2025
Citation: Chougule, K., Kim, S., Wei, S., Olson, A., Lu, Z., Tello-Ruiz, M., Ware, D. 2025. Propagating rsIDs Across Crop Pan-Genomes in Gramene Platform Using the Ensembl Variant Remapping Pipeline. Meeting Abstract. Genome Informatics Conference.

Interpretive Summary:

Technical Abstract: The Reference SNP cluster ID (rsID) has long been the standard identifier for genetic variation in human genomics, enabling stable cross-referencing across databases, assemblies, and studies. Its persistence across reference versions has transformed population genetics, medical research, and clinical applications. Following this success, rsIDs are increasingly being adopted in plant genomics through the European Variation Archive (EVA), which has assigned hundreds of millions to billions of identifiers to crop genomes. This adoption ensures that variants are referenced independently of any single genome build, simplifying integration, promoting FAIR data stewardship, and enabling reproducible, trait-driven analyses. Gramene has adopted rsIDs as a unifying framework to consolidate genetic variation knowledge across species, improve phenotype prediction, and enhance trait-based marker discovery. By integrating EVA-assigned rsIDs into its variation module, Gramene provides consistent identifiers for millions of variants, decoupling genetic variation data from specific assemblies and supporting pan-genome-scale interoperability. Currently, rsIDs have been integrated into four major crop genomes: sorghum (41M), rice (67M), maize (78M), and grape (0.3M). Together, this represents more than 193 million rsIDs standardized across crop species in Gramene and its pan-sites. As additional pan-genomes and breeding lines are sequenced, propagating rsIDs from reference genomes to new assemblies provides an efficient alternative to re-calling variants for each accession. Using EVA’s Ensembl Variant Remapping pipeline, rsIDs are mapped with high success, achieving ~98% accuracy between reference assembly versions and ~87% across pan-genomes. These stable identifiers are directly accessible through Gramene’s genome browser, where remapped rsIDs are made available as searchable variant tracks and gene-level annotations.
This integration allows researchers to link variants with gene function, trait associations, and orthologous loci across different accessions and assemblies within a species, while maintaining consistency with EVA’s species-specific rsID assignments. The adoption and propagation of rsIDs provide a stable framework for managing plant genetic variation across assemblies and accessions, ensuring that resources remain interoperable, FAIR, and directly useful for breeding and translational research. Support for this work is provided by USDA-ARS grant 8062-21000-051-000D.
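
As an illustration of what decoupling variant identifiers from a specific assembly can look like in practice, the following is a minimal Python sketch and not the EVA or Gramene implementation. It assumes a hypothetical in-memory record in which each rsID carries a location per assembly, and it computes the fraction of variants successfully carried from one assembly to another. The class `Variant`, the functions `locate` and `remap_rate`, the assembly names, and all identifiers and coordinates are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical record: one stable rsID, many assembly-specific locations.
@dataclass
class Variant:
    rsid: str
    # assembly name -> (chromosome, 1-based position);
    # a missing key means the variant has no placement on that assembly.
    locations: Dict[str, Tuple[str, int]] = field(default_factory=dict)

def locate(variants: List[Variant], rsid: str, assembly: str) -> Optional[Tuple[str, int]]:
    """Return the location of `rsid` on `assembly`, or None if it is not placed there."""
    for v in variants:
        if v.rsid == rsid:
            return v.locations.get(assembly)
    return None

def remap_rate(variants: List[Variant], source: str, target: str) -> float:
    """Fraction of variants placed on `source` that also have a placement on `target`."""
    placed = [v for v in variants if source in v.locations]
    if not placed:
        return 0.0
    remapped = sum(1 for v in placed if target in v.locations)
    return remapped / len(placed)

# Invented toy data: assembly names and rsIDs are placeholders, not real records.
variants = [
    Variant("rs_demo_1", {"ref_v1": ("1", 1000), "ref_v2": ("1", 1012), "accession_A": ("1", 998)}),
    Variant("rs_demo_2", {"ref_v1": ("1", 2000), "ref_v2": ("1", 2012)}),
    Variant("rs_demo_3", {"ref_v1": ("2", 500), "ref_v2": ("2", 500), "accession_A": ("2", 640)}),
    Variant("rs_demo_4", {"ref_v1": ("2", 900)}),
]

if __name__ == "__main__":
    print(locate(variants, "rs_demo_1", "accession_A"))
    print(remap_rate(variants, "ref_v1", "ref_v2"))
    print(remap_rate(variants, "ref_v1", "accession_A"))
```

In this toy layout, the same rsID can be looked up on any assembly where it has a placement, which mirrors the idea of one identifier spanning reference versions and accessions. The rate is computed per source and target pair, in the same spirit as the separate figures reported for reference-version and pan-genome remapping.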