Location: Plant, Soil and Nutrition Research
Title: Creating a FAIR data ecosystem for incorporating single cell genomics data into agricultural G2P research
Author
KAPOOR, MUSKAN - Iowa State University
SOKOLOV, ALEXEY - EMBL-EBI
VENTURA, ENRIQUE SEPENA - EMBL-EBI
YORDANOVA, GALABINA - EMBL-EBI
PROVART, NICHOLAS - University of Toronto
PAPATHEODOROU, IRENE - EMBL-EBI
GEORGE, NANCY - EMBL-EBI
Ware, Doreen
KUMARI, SUNITA - Cold Spring Harbor Laboratory
TICKLE, TIMOTHY - Massachusetts Institute of Technology
COLE, BENJAMIN - Lawrence Berkeley National Laboratory
BURDETT, TONY - EMBL-EBI
HARRISON, PETER - EMBL-EBI
TUGGLE, CHRISTOPHER - Iowa State University
Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 10/14/2022
Publication Date: N/A
Citation: N/A
Interpretive Summary:
Technical Abstract: The agricultural genomics community has numerous data submission standards available, but little experience in describing and storing single cell (e.g. scRNAseq) data. Other single cell genomics infrastructure efforts, such as the Human Cell Atlas Data Coordination Platform (HCA DCP), have resources that could benefit our community. For example, the HCA DCP is integrated with Terra, a cloud-native workbench for computational biology developed by Broad, Verily and Microsoft that houses tools for scGenomics analysis at scale. We will describe a pilot-scale project to determine whether our current metadata standards for livestock and crops can be used to ingest scRNAseq datasets in a manner consistent with HCA DCP standards, and whether established resources (e.g. Terra) can be used to analyze the ingested data. Annotare, at the European Bioinformatics Institute, is currently the most comprehensive data ingestion portal for high throughput sequencing datasets from plants, fungi, protists and animals (including human); it ensures that sufficient metadata are collected to enable re-analysis and dissemination via the Single Cell Expression Atlas knowledgebase (SCEA). To support the use of controlled vocabularies, Annotare provides an ontology auto-complete function that allows users to search for and use the appropriate terms from many ontologies. Submitted single cell data can then readily be processed and searched via the SCEA and transferred to the Galaxy analysis space for further analysis. All experiments submitted to ArrayExpress via Annotare are manually curated by bioinformaticians. Another portal, the FAANG portal, is limited to animal single cell datasets and provides access to bulk and scRNAseq data. scRNAseq data and metadata can be submitted to the FAANG portal using a semi-automated process in which files can be validated using the HCA DCP metadata and data validation service. Once incorporated, datasets are used to augment this resource for use by the scientific community. These files are also incorporated using EMBL-EBI’s HCA DCP ingestion service, and then transferred to Terra for further analysis. We intend to build upon these existing tools to construct a scientist-friendly data resource and analytical ecosystem to facilitate single cell-level genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.