Research Project: Championing Improvement of Sorghum and Other Agriculturally Important Species through Data Stewardship and Functional Dissection of Complex Traits

Location: Plant, Soil and Nutrition Research

Title: Building a FAIR data ecosystem for incorporating single-cell transcriptomics data into agricultural genome to phenome research

Author
KAPOOR, MUSKAN - Iowa State University
VENTURA, ENRIQUE SAPENA - EMBL-EBI
WALSH, AMY - University of Missouri
SOKOLOV, ALEXEY - EMBL-EBI
GEORGE, NANCY - EMBL-EBI
KUMARI, SUNITA - Cold Spring Harbor Laboratory
PROVART, NICHOLAS - University of Toronto
COLE, BENJAMIN - Lawrence Berkeley National Laboratory
LIBAULT, MARC - University of Missouri
TUGGLE, CHRISTOPHER - Iowa State University
TICKLE, TIMOTHY - Broad Institute of MIT and Harvard
WARREN, WESLEY - University of Missouri
KOLTES, JAMES - Iowa State University
PAPATHEODOROU, IRENE - Earlham Institute
Ware, Doreen
HARRISON, PETER - EMBL-EBI
ELSIK, CHRISTINE - University of Missouri
YORDANOVA, GALABINA - EMBL-EBI
BURDETT, TONY - EMBL-EBI

Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/13/2024
Publication Date: 11/28/2024
Citation: Kapoor, M., Ventura, E., Walsh, A., Sokolov, A., George, N., Kumari, S., Provart, N.J., Cole, B., Libault, M., Tuggle, C.K., Tickle, T., Warren, W.C., Koltes, J., Papatheodorou, I., Ware, D., Harrison, P., Elsik, C., Yordanova, G., Burdett, T. 2024. Building a FAIR data ecosystem for incorporating single-cell transcriptomics data into agricultural genome to phenome research. Frontiers in Genetics. Vol. 15. https://doi.org/10.3389/fgene.2024.1460351.
DOI: https://doi.org/10.3389/fgene.2024.1460351

Interpretive Summary: The agricultural genomics community has numerous data submission standards available, but standards for describing and storing single-cell RNA-seq (scRNA-seq) data are not well defined. By contrast, the Human Cell Atlas and the Single Cell Expression Atlas at EMBL-EBI offer comprehensive data ingestion portals for high-throughput sequencing datasets from plants, protists, and animals. The FAANG data portal at EMBL-EBI emphasizes rich metadata and highly accurate, reliable annotation of farmed animals, but it is not computationally linked to either of these resources. Here, we identify the gaps and challenges of the different scRNA-seq portals and assess whether the current FAANG metadata standards for livestock can be used to ingest scRNA-seq datasets in a manner consistent with Human Cell Atlas standards.

Technical Abstract: The agricultural genomics community has numerous data submission standards available, but the standards for describing and storing single-cell (SC, e.g., scRNA-seq) data are comparatively underdeveloped. To bridge this gap, we leveraged recent advances in human genomics infrastructure, such as the integration of the Human Cell Atlas (HCA) Data Portal with Terra. Terra is a secure, scalable, open-source platform, co-developed by the Broad Institute of MIT and Harvard, Microsoft, and Verily, that lets biomedical researchers access data, run analysis tools, and collaborate. In parallel, the Single Cell Expression Atlas at EMBL-EBI offers a comprehensive data ingestion portal for high-throughput sequencing datasets from plants, protists, and animals (including humans). Data tools connecting these resources would offer significant advantages to the agricultural genomics community. The FAANG data portal at EMBL-EBI emphasizes rich metadata and highly accurate, reliable annotation of farmed animals, but it is not computationally linked to either of these resources. Here, we describe a pilot-scale project that tests whether the current FAANG metadata standards for livestock can be used to ingest scRNA-seq datasets into Terra in a manner consistent with HCA Data Portal standards. Importantly, rich scRNA-seq metadata can now be brokered through the FAANG data portal using a semi-automated process, avoiding the need for substantial expert curation. We further extended this brokering tool so that SC files validated and ingested into the HCA Data Portal are transferred to Terra for downstream analysis. We also verified data ingestion into the Azure-hosted instance of Terra and demonstrated a workflow that analyzes the first ingested porcine scRNA-seq dataset.
In addition, we developed prototype tools that display the output of scRNA-seq analyses in genome browsers, so that gene expression patterns can be compared across tissues and cell populations. The prototype JBrowse instance includes separate tracks showing peripheral blood mononuclear cell (PBMC) scRNA-seq data alongside two bulk RNA-seq experiments. We intend to build on these tools to construct a scientist-friendly data resource and analytical ecosystem that follows Findable, Accessible, Interoperable, and Reusable (FAIR) principles for SC data. This ecosystem would support SC-level genomic analysis through data ingestion, storage, retrieval, reuse, visualization, and comparative annotation across agricultural species.