Author
CLARKE, JENNIFER - University Of Nebraska
COOPER, LAURA - Oregon State University
Poelchau, Monica
BERARDINI, TANYA - Phoenix Bioinformatics
ELSER, JUSTIN - Oregon State University
FARMER, ANDREW - National Center For Genome Resources
FICKLIN, STEPHEN - Washington State University
KUMARI, SUNITA - Cold Spring Harbor Laboratory
LAPORTE, MARIE-ANGELIQUE - Bioversity International
Nelson, Rex
SADOHARA, RIE - Michigan State University
SELBY, PETER - Cornell University
THESSEN, ANNE - University Of Colorado
WHITEHEAD, BRANDON - Manaaki Whenua Landcare Research
Sen, Taner
Submitted to: ArXiv
Publication Type: Pre-print Publication
Publication Acceptance Date: 7/18/2023
Publication Date: 7/18/2023
Citation: Clarke, J., Cooper, L., Poelchau, M.F., Berardini, T., Elser, J., Farmer, A., Ficklin, S., Kumari, S., Laporte, M., Nelson, R., Sadohara, R., Selby, P., Thessen, A.E., Whitehead, B., Sen, T.Z. 2023. Data Sharing and Ontology use among Agricultural Genetics, Genomics and Breeding Databases and Resources of the AgBioData Consortium. ArXiv. arXiv:2307.08958. https://doi.org/10.48550/arXiv.2307.08958.
DOI: https://doi.org/10.48550/arXiv.2307.08958
Interpretive Summary: Agricultural databases are important resources where scientists find up-to-date, accurate information and data on genetics, genomics and breeding. Access to this information helps scientists produce better research, which in turn helps farmers. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 agricultural databases and resources. In this article, we describe how well AgBioData databases perform in two areas: data sharing and the use of ontologies, which are a way of structuring data so that it is meaningful to computer programs. We developed an assessment of AgBioData databases to measure this. The results suggest that AgBioData databases share data in meaningful ways, but there may be room for improvement in sharing descriptions of the data (metadata). We also found that ontology use has not changed since a similar survey was conducted in 2017. We recommend improvements in both areas by 1) providing training resources; 2) further studying how best to share descriptions of the data (metadata); 3) teaching customers why data sharing and ontologies are important; 4) working on how to share and describe phenotypic data (data that describes physical attributes of crops or animals); and 5) finding ways to sustain databases and developing standards for data types. These recommendations should help databases improve data sharing and ontology use.
Technical Abstract: Over the last couple of decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model and crop plants, animals, ontologies and breeding platforms. One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of data sets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies respectively, conducted a consortium-wide survey to assess the current status and future needs of the members in those areas. Most of the 33 respondents from 37 databases represented plant databases (72.7%), while others represented various livestock animal, fish, insect or model organism databases. Results suggest that data sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is possible for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. We recommend improvements in both areas by 1) providing training resources for specific data sharing techniques, as well as ontology use, for database personnel; 2) further study of what metadata is shared, and how well it is shared among databases; 3) promoting an understanding of data sharing and ontologies in the user/stakeholder community; 4) prioritizing data sharing improvements for specific phenotypic data types and formats; and 5) lowering specific barriers to data sharing and ontology use by identifying sustainability solutions and by identifying, promoting, or developing data standards. Combined, these improvements are likely to help databases increase development efforts towards greater ontology use and data sharing via programmatic means.