Coe Jr, Edward
Submitted to: Omics - A Journal Of Integrative Biology
Publication Type: Peer reviewed journal
Publication Acceptance Date: 4/10/2003
Publication Date: 5/1/2003
Citation: KAZIC, T., COE JR, E.H., POLACCO, M.L., SHYU, C. WHITHER BIOLOGICAL DATABASE RESEARCH? OMICS - A JOURNAL OF INTEGRATIVE BIOLOGY. 2003. v. 7. p. 61-65.
Interpretive Summary: Modern biological databases are increasingly important to research scientists and to the public, who want them to be complete, accurate, current, fast, easy to use, and permanent. The difficulty in meeting these desires is scale. Today, any database that displayed all these properties would require very large curatorial and programming staffs; complete control over its vocabularies, usage, and design; coordinated access to any other databases with which it interacts; and an inexhaustible supply of money. One might trade smaller staffs and less money for narrower coverage, less accuracy and currency, more confusion over terminology, less automation, and shorter database lifetimes. None of these trades answers the very real needs of modern biological researchers or of the public that wishes to access the information. All result in more work for scientists, curators, technical staff, and program directors. This paper identifies current trends in biological information handling that can be expected to influence progress (and even bring about simplification), and we propose several areas in which research will improve the accessibility and content of databases. These areas of research include improved methods for compiling and maintaining databases, and ways to overcome obstacles to communication among databases. This information will be important to biological researchers in their attempts to mine data for details used to design more efficient crops.
Technical Abstract: We consider how the landscape of biological databases may evolve in the future, and what research is needed to realize this evolution. We suggest that today's dispersal of diverse resources will only increase as the number and size of those resources grow, driving the need for semantic interoperability even more strongly. Because the complexity of the questions biologists want answered continues to escalate rapidly, we will need to draw upon high-performance computing resources such as the GRID to process complex queries. Finally, we still need data, and our ways of acquiring and curating data must improve by orders of magnitude. Research is advocated into (1) improved methods for database population and maintenance; (2) semantic interoperability of databases; (3) deep domain content; (4) complex queries; (5) mobilizing more powerful machinery; (6) complex and combined models; (7) use of multidimensional, multivariate, and multistructured data; and (8) understanding the mathematics and statistics of biology.