Publication : USDA ARS

Publication #399297

Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

Title: Creating a FAIR data ecosystem for incorporating single cell genomics data into agricultural G2P research

Author

	KAPOOR, MUSKAN - Iowa State University
	SOKOLOV, ALEXEY - EMBL-EBI
	VENTURA, ENRIQUE SEPENA - EMBL-EBI
	YORDANOVA, GALABINA - EMBL-EBI
	PROVART, NICHOLAS - University of Toronto
	PAPATHEODOROU, IRENE - EMBL-EBI
	GEORGE, NANCY - EMBL-EBI
	Ware, Doreen
	KUMARI, SUNITA - Cold Spring Harbor Laboratory
	TICKLE, TIMOTHY - Massachusetts Institute of Technology
	COLE, BENJAMIN - Lawrence Berkeley National Laboratory
	BURDETT, TONY - EMBL-EBI
	HARRISON, PETER - EMBL-EBI
	TUGGLE, CHRISTOPHER - Iowa State University

Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 10/14/2022
Publication Date: N/A
Citation: N/A

Interpretive Summary:

Technical Abstract: The agricultural genomics community has numerous data submission standards available, but little experience in describing and storing single cell (e.g. scRNAseq) data. Other single cell genomics infrastructure efforts, such as the Human Cell Atlas Data Coordination Platform (HCA DCP), have resources that could benefit our community. For example, the HCA DCP is integrated with Terra, a cloud-native workbench for computational biology developed by the Broad Institute, Verily and Microsoft that houses tools for single cell genomics analysis at scale. We will describe a pilot-scale project to determine whether our current metadata standards for livestock and crops can be used to ingest scRNAseq datasets in a manner consistent with HCA DCP standards, and whether established resources (e.g. Terra) can be used to analyze the ingested data. Currently, Annotare, the most comprehensive data ingestion portal at the European Bioinformatics Institute (EMBL-EBI) for high throughput sequencing datasets from plants, fungi, protists and animals (including human), ensures that sufficient metadata are collected to enable re-analysis and dissemination via the Single Cell Expression Atlas knowledgebase (SCEA). To support the use of controlled vocabularies, Annotare provides an ontology auto-complete function that lets users search for and apply appropriate terms from many ontologies. Data submitted this way can readily be processed and searched via the SCEA and transferred to the Galaxy analysis space for further analysis. All experiments submitted to ArrayExpress via Annotare are manually curated by bioinformaticians. A second portal, the FAANG portal, is limited to animal datasets and provides access to both bulk and scRNAseq data. scRNAseq data and metadata can be submitted to the FAANG portal using a semi-automated process in which files can be validated using the HCA DCP metadata and data validation service.
Once incorporated, datasets are used to augment this resource for use by the scientific community. These files are also incorporated using EMBL-EBI’s HCA DCP ingestion service, and then transferred to Terra for further analysis. We intend to build upon these existing tools to construct a scientist-friendly data resource and analytical ecosystem to facilitate single cell-level genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.

Plant, Soil and Nutrition Research: Ithaca, NY