Research Project: Championing Improvement of Sorghum and Other Agriculturally Important Species through Data Stewardship and Functional Dissection of Complex Traits

Location: Plant, Soil and Nutrition Research

Title: Standardizing biocuration of genetic variation data to promote FAIRification

TELLO-RUIZ, MARCELA - Cold Spring Harbor Laboratory
ALI, KAZIM - University Of Karachi
Ali, Gul - Shad
Bassil, Nahla
BEIER, SEBASTIAN - Ibg-4 Bioinformatics
Bushakra, Jill
COBO-SIMON, IRENE - Instituto Nacional De Investigacion Y Technologia Agraria Y Alimentaria
Ware, Doreen
WEI, SHARON - Cold Spring Harbor Laboratory
CEZARD, TIMOTHEE - Embl-Ebi
DYER, SARAH - Embl-Ebi
Gutierrez, Osman
Harrison, Melanie
HUMANN, JODI - Washington State University
KUMAR, VIVEK - Cold Spring Harbor Laboratory
Nelson, Rex
SALAVATI, MAZDAK - Roslin Institute
SHEEHAN, MOIRA - Cornell University

Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 1/12/2024
Publication Date: N/A
Citation: N/A

Interpretive Summary:

Technical Abstract: The Standards for Genetic Variation Data Working Group of the AgBioData Consortium brings together a community of biocurators, data providers, bioinformaticians, and computer scientists engaged in agricultural research. Late this year, the Public Genetic Resources Working Group merged with our group. Our working group’s primary tasks have evolved into the harmonization and adoption of standards for genotypic and phenotypic variation data across diverse platforms in the plant and animal kingdoms. Additionally, the group aims to promote interoperability and facilitate access to these datasets for researchers and breeders. Thanks to the FAANG (Functional Annotation of ANimal Genomes) project, there has been considerable progress in the adoption and dissemination of metadata standards for animal genetic variants. In plants, the first guidelines for findable, accessible, interoperable, and reusable (FAIR) handling of genetic variants were published in 2022. This involved direct collaboration with EMBL-EBI, one of the International Nucleotide Sequence Database Collaboration (INSDC) pillars, to support data submission to BioSamples and the European Variation Archive (EVA) global repository. A preliminary checklist was provided to classify and validate data and metadata, making significant progress in enhancing data availability. The Standards for Genetic Variation Data Working Group has broadened these guidelines with recommendations to crosslink sample identifiers with agricultural resources, specifically germplasm repositories such as USDA-ARS GRIN (Germplasm Resources Information Network)-Global. The group also suggests including synonyms for common sample names and traceable population panel associations. We surveyed the AgBioData community, namely species-specific and clade-wide databases, germplasm repositories, and independent data producers.
The goal was to gather information on existing and anticipated genetic variation data sets to facilitate the adoption of standards and promote interoperability between resources. In addition, we identified new challenges, such as the lack of reference genome assemblies in an INSDC repository or genetic variation data not publicly available in a standard format (e.g., a VCF file), and discussed potential solutions and sustainability workflows. These include adapting and further developing tools used to address similar problems encountered previously with human data. We will showcase how such challenges are being addressed. Progress towards the above objectives, along with training for data generators who submit data to public repositories, is critical to making genetic variation data more FAIR for agroscience. Future plans aim to link variation data sets with phenotypic data to support association studies and advance breeding approaches.
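
As a rough illustration of the recommendations above (crosslinking sample identifiers to a germplasm repository, recording synonyms, and noting population panel membership), a sample metadata record and a simple completeness check might be sketched as follows. The field names, the required-field list, and the example identifiers are all hypothetical; the actual checklist is not reproduced in this abstract and may differ.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical required fields; the working group's real checklist may differ.
REQUIRED_FIELDS = ("sample_id", "species", "germplasm_accession")


@dataclass
class SampleRecord:
    sample_id: str
    species: str
    germplasm_accession: str = ""  # crosslink to a germplasm repository entry
    synonyms: List[str] = field(default_factory=list)  # alternative sample names
    population_panel: str = ""  # panel the sample belongs to, if any


def missing_fields(record: SampleRecord) -> List[str]:
    """Return the names of required fields that are empty or whitespace."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name).strip()]


def find_by_name(records: List[SampleRecord], name: str) -> List[SampleRecord]:
    """Find records whose primary ID or any synonym matches, ignoring case."""
    key = name.strip().lower()
    return [
        r
        for r in records
        if key == r.sample_id.lower() or key in (s.lower() for s in r.synonyms)
    ]
```

Under these assumptions, a record without a germplasm accession would be flagged by `missing_fields`, and `find_by_name` would resolve a sample from any of its recorded synonyms.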