Submitted to: Database: The Journal of Biological Databases and Curation
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/2/2016
Publication Date: 3/29/2016
Citation: Manter, D.K., Korsa, M.S., Tebbe, C.R., Delgado, J.A. 2016. myPhyloDB: a local web server for the storage and analysis of metagenomics data. Database: The Journal of Biological Databases and Curation. doi:10.1093/database/baw37.
Interpretive Summary: myPhyloDB is a user-friendly personal database with a browser interface designed to facilitate the storage, processing, analysis, and distribution of metagenomics data. myPhyloDB archives raw sequencing files and lets users easily select any combination of projects and samples from all data available in the database. The advent of next-generation sequencing has driven rapid growth in metagenomic studies across a wide range of ecosystems. However, this increased data-generation capacity has created a need for appropriate data stewardship procedures. While considerable effort has focused on archiving data in public databases (e.g., SRA), these raw data are not in an easy-to-use format, nor are they designed to facilitate comparative studies. Furthermore, although the number of software tools available to process and analyze metagenomics data has increased dramatically (e.g., Mothur, QIIME, METAGENassist), none of them focus on the storage and/or retrieval of multi-project datasets for large-scale comparative metagenomics. Comparative and/or cross-location studies can provide unique insights into the relationships driving microbial community structure and function. For example, national networks have been established in a variety of natural (e.g., LTER, NEON) and managed (e.g., GRACEnet, LTAR) systems; however, the tools needed to compare microbial distribution and abundance across these networks do not currently exist, and myPhyloDB represents one of the first national efforts in this direction. We therefore envision myPhyloDB as a building block for developing both small-scale private and public web services to help achieve this goal.
Technical Abstract: myPhyloDB is a user-friendly personal database with a browser interface designed to facilitate the storage, processing, analysis, and distribution of metagenomics data. myPhyloDB archives raw sequencing files and lets users easily select any combination of projects and samples from all data available in the database. Its data processing is also flexible: users can either upload and store pre-processed data or use the built-in Mothur pipeline to automate the processing of raw sequencing data. myPhyloDB provides several analytical (e.g., ANOVA, t-tests, linear regression, DESeq2, and PCoA) and normalization (rarefaction, DESeq2, proportion) tools for the comparative analysis of taxonomic abundance, species richness, and species diversity at any desired taxonomic level(s), for projects of various types (e.g., Human-associated, Human gut microbiome, Air, Soil, and Water). Finally, because myPhyloDB is a local web server, users can quickly distribute data to colleagues and end users simply by granting them access to their personal myPhyloDB database. myPhyloDB is open-source software and can be obtained from: http://www.ars.usda.gov/services/software/download.htm?softwareid=xxxx.
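To make the normalization and diversity terms above concrete, the following is a minimal, generic Python sketch of proportion normalization, rarefaction (random subsampling of reads without replacement to a common depth), species richness, and the Shannon diversity index applied to one sample's taxon counts. It is not myPhyloDB's implementation: the function names, the `seed` parameter, the use of the natural logarithm for Shannon diversity, and the genus counts in the usage example are all illustrative assumptions. DESeq2 normalization is not shown, since DESeq2 is a separate R/Bioconductor package.

```python
import math
import random
from collections import Counter


def proportions(counts):
    """Proportion normalization: divide each taxon count by the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def rarefy(counts, depth, seed=0):
    """Rarefaction: randomly subsample `depth` reads without replacement.

    `seed` is a hypothetical parameter added here so results are reproducible.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("rarefaction depth exceeds the number of reads in the sample")
    # Expand counts into one entry per read, then draw reads without replacement.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def richness(counts):
    """Species richness: number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), using the natural log (an assumption)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Hypothetical genus-level counts for a single sample.
    sample = {"Bacillus": 50, "Pseudomonas": 30, "Streptomyces": 20}
    print(proportions(sample))          # relative abundances summing to 1
    print(rarefy(sample, 40, seed=1))   # 40 reads drawn at random
    print(richness(sample), round(shannon(sample), 4))
```

Rarefying every sample to the same depth before computing richness or diversity is what makes those metrics comparable across samples with different sequencing effort. Proportion normalization keeps all reads but expresses abundances on a common relative scale.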