Submitted to: BioMed Central (BMC) BioData Mining
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: September 23, 2009
Publication Date: September 23, 2009
Citation: Nelson, R., Avraham, S., Shoemaker, R.C., May, G., Ware, D., Gessler, D.D. 2009. Applications and Methods Utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for Bioinformatics Resource Discovery and Disparate Data and Service Integration. BioMed Central (BMC) BioData Mining. 10:309.
Interpretive Summary: Many databases have been developed throughout the world. They house vast amounts of genetic and biological data and often provide unique analysis tools, but knowing where to find specific types of data or specific analysis tools is difficult. In this manuscript, staff of the biological databases SoyBase, Gramene, and the Legume Information System report the use of a semantic web protocol to make data and services more easily discoverable. The protocol will make the data in each database easier to transfer to researchers, and it will make the computational resources hosted by each database easier for public, private, and university researchers to discover. Using a published semantic protocol will help extend the resources built up by the participating databases for soybean and grass genetics to other important legume and grain species. It will also make it easier to identify data that might be relevant to species other than legumes and grasses.
Technical Abstract: Scientific data integration and computational service discovery are challenges for the bioinformatics community. Both are made harder by the separate and independent construction of biological databases, which makes exchanging scientific data between information resources difficult and labor intensive. A recently described semantic web protocol, the Simple Semantic Web Architecture and Protocol (SSWAP; pronounced “swap”), offers the ability to describe data and services in a semantically meaningful way. We report here how three major information resources (Gramene, SoyBase, and the Legume Information System [LIS]) used SSWAP to semantically enable their data and web services. We selected high-priority Quantitative Trait Locus (QTL) data, genomic mapping data, trait and phenotypic data, and sequence data, along with associated services such as BLAST, for publication, data retrieval, and service invocation via semantic web services. The data and services were mapped to concepts and categories as implemented in legacy and de novo community ontologies. We then used SSWAP to express these offerings in OWL RDF/XML documents appropriate for their semantic discovery and retrieval. We implemented SSWAP services to respond to web queries and return data as requested. These services are registered with the SSWAP Discovery Server and are available for semantic discovery and engagement at http://sswap.info. A total of 10 services delivering QTL information from Gramene were created. Six services delivering information about soybean QTLs and seven services delivering information from SoyBase were also created. For LIS, we constructed three services, two of which allow the retrieval of DNA and RNA FASTA sequences, with the third service providing nucleic acid sequence comparison capability (BLAST).
Our implementation of approximately two dozen such services means that biological data at three large information resources (Gramene, SoyBase, LIS) is available for programmatic access, semantic searching, and enhanced interaction between the separate missions of these three resources.
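As a rough illustration of what expressing a service offering as an RDF/XML document can look like, the following minimal sketch builds a small description with Python's standard library. It is not the SSWAP vocabulary or the authors' implementation: the `ex:` namespace, the `Service`, `accepts`, and `returns` terms, and the example service URI are all hypothetical names chosen for this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical vocabulary for illustration only; NOT the SSWAP ontology.
EX_NS = "http://example.org/sketch#"


def describe_service(uri, label, input_class, output_class):
    """Return an RDF/XML string describing one service and its input/output types."""
    # Register readable prefixes so the serialized XML uses rdf:, rdfs:, ex:.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rdfs", RDFS_NS)
    ET.register_namespace("ex", EX_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # A typed node: the service is identified by its URI via rdf:about.
    svc = ET.SubElement(root, f"{{{EX_NS}}}Service", {f"{{{RDF_NS}}}about": uri})
    ET.SubElement(svc, f"{{{RDFS_NS}}}label").text = label
    # Point at ontology classes for what the service accepts and returns.
    ET.SubElement(svc, f"{{{EX_NS}}}accepts", {f"{{{RDF_NS}}}resource": input_class})
    ET.SubElement(svc, f"{{{EX_NS}}}returns", {f"{{{RDF_NS}}}resource": output_class})
    return ET.tostring(root, encoding="unicode")


# Hypothetical QTL lookup service, loosely modeled on the kind of offering described above.
doc = describe_service(
    uri="http://example.org/services/qtl-lookup",
    label="QTL lookup by trait name (illustrative)",
    input_class=EX_NS + "TraitName",
    output_class=EX_NS + "QTL",
)
print(doc)
```

A discovery server could, in principle, index documents of this kind by the classes a service accepts and returns, which is what makes a search by data type possible rather than a search by service name.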