Submitted to: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Type: Peer reviewed journal
Publication Acceptance Date: 4/9/2008
Publication Date: 1/1/2010
Citation: Lushbough, C., Bergman, M.K., Lawrence, C.J., Jennewein, D., Brendel, V. 2008. BioExtract Server - An Integrated Workflow-enabling System to Access and Analyze Heterogeneous, Distributed Biomolecular Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics. http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.98.
Interpretive Summary: Information must be made accessible before it can be put to practical use. For data sets that are large and that derive from multiple places, interfaces are needed that allow the diverse data to be accessed, searched, and analyzed as a single data set. The BioExtract Server enables researchers to access data from multiple sources and to analyze it with analytic tools at the same time. This enables researchers to spend less time analyzing existing data and more time testing derived hypotheses.
Technical Abstract: Many computational workflows in bioinformatics require access to multiple, distributed data sources and analytic tools. The requisite data sources may include large public data repositories, community databases, and project databases for use in domain-specific research. Because different data sources frequently use distinct query languages and return results in unique formats, researchers must either rely upon a small number of primary data sources or become familiar with multiple query languages and formats. Similarly, the associated analytic tools often require specific input formats and produce unique outputs, which makes it difficult to use the output from one tool as input to another. The BioExtract Server (http://bioextract.org) is a distributed service designed to consolidate, analyze, and serve data from heterogeneous biomolecular databases. The basic operations of the BioExtract Server allow researchers, via their Web browsers, to: specify data sources; flexibly query data sources with a range of relational operators; apply analytic tools; download result sets; and store query results for later reuse. As researchers work with the system, their “steps” are recorded in the background. At any time these steps can be saved as a workflow simply by providing a name and description. Once saved, these workflows can be executed and/or modified.
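Illustrative Sketch: The step-recording behavior described in the abstract can be pictured with a minimal Python sketch. It is not the BioExtract Server's implementation or API. The names Step, Session, Workflow, do, and save_workflow are hypothetical, and the three records are made-up toy data standing in for a remote data source. The sketch only shows the general idea of recording each action as it is performed and later replaying the recorded list as a named workflow.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Step:
    """One recorded user action (hypothetical structure)."""
    label: str
    action: Callable[[Any], Any]


@dataclass
class Workflow:
    """A named, replayable list of steps."""
    name: str
    description: str
    steps: List[Step]

    def run(self, data: Any = None) -> Any:
        # Feed the output of each step into the next one.
        for step in self.steps:
            data = step.action(data)
        return data


@dataclass
class Session:
    """Performs actions and records them in the background."""
    history: List[Step] = field(default_factory=list)
    state: Any = None

    def do(self, label: str, action: Callable[[Any], Any]) -> Any:
        self.state = action(self.state)
        self.history.append(Step(label, action))
        return self.state

    def save_workflow(self, name: str, description: str) -> Workflow:
        # Copy the history so later actions do not alter the saved workflow.
        return Workflow(name, description, list(self.history))


# Made-up toy records; a real data source would be queried remotely.
RECORDS = [
    {"id": "seq1", "organism": "Zea mays", "length": 1200},
    {"id": "seq2", "organism": "Oryza sativa", "length": 800},
    {"id": "seq3", "organism": "Zea mays", "length": 300},
]

session = Session()
session.do("specify data source", lambda _: RECORDS)
session.do(
    "query: organism = Zea mays AND length >= 500",
    lambda rows: [
        r for r in rows if r["organism"] == "Zea mays" and r["length"] >= 500
    ],
)
session.do("extract identifiers", lambda rows: [r["id"] for r in rows])

workflow = session.save_workflow("long-maize", "Maize sequences of 500+ bases")
result = workflow.run()
```

Because the saved workflow holds a copy of the recorded steps, it can be re-executed later or edited (for example, by replacing the query step) without affecting the session that produced it.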