Location: Genomics and Bioinformatics Research
Project Number: 6066-21310-004-13-S
Project Type: Specific Cooperative Agreement
Start Date: Sep 15, 2011
End Date: Sep 15, 2016
Advances in biotechnology have led to tremendous increases in biomolecular data. For example, over the last thirty years the number of nucleotides in GenBank, an online DNA/protein sequence repository, has doubled approximately every 18 months. Analysis and utilization of exponentially increasing quantities of biomolecular data have required a closer association of biology with high-performance computing. The single-processor bioinformatics tools written in the last few years are already proving inadequate for deriving biological information from large data sets in a timely fashion. Moreover, such huge volumes of data have created a need for more powerful visualization tools that can translate digital data into intuitive graphical formats. We will generate new data analysis/visualization tools specifically designed for use on cluster supercomputers. Parallelized programs provide the built-in scalability required by the rapidly growing computational biology community.
We will develop high-throughput analysis pipelines for rapidly and accurately integrating genomic, transcriptomic, proteomic, metabolomic, and phenotypic data for species of importance to U.S. agriculture. Research will focus on expediting the association of genotype with phenotype while defining the biomolecular interactions that link the two. Unlike most existing bioinformatics tools, our algorithms and pipelines will employ parallel processing and other high-performance computing (HPC) principles from their inception, thus permitting computing resources to be scaled to meet the storage and memory needs of a wide array of projects. In addition to de novo tool development, we will work to upgrade existing tools using HPC concepts. An important component of our work will be the development of effective ways to visualize complex relationships among diverse data sets. To make our analyzed data as accessible and understandable as possible, we will use gene ontology (GO) techniques to annotate and “cross-link” molecular data.