Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: March 22, 2012
Publication Date: June 19, 2012
Citation: Allen, H.K., Bunge, J., Foster, J.A., Stanton, T.B. 2012. Estimating richness from phage metagenomes [abstract]. American Society for Microbiology General Meeting, June 16-19, 2012, San Francisco, California. Paper No. 2462.
Technical Abstract: Bacteriophages are important drivers of ecosystem functions, yet little is known about the vast majority of phages. Phage metagenomics, or the study of the collective genome of an assemblage of phages, enables the investigation of broad ecological questions in phage communities. One ecological characteristic is species richness, or the number of different species in a community. Phages do not have a phylogenetic marker analogous to the bacterial 16S rRNA gene with which to estimate richness, and so contig spectra are employed instead. A contig spectrum is generated from a phage metagenome by assembling the random sequence reads into contigs (groups of overlapping sequences) and counting the number of sequences that group within each contig. Current tools for estimating richness from contig spectra, such as those employed by PHACCS, are limited by their reliance on rank-abundance data. Here we present improvements to estimating phage species richness based on contig spectra. The program Catchall (http://www.northeastern.edu/catchall/) was implemented to analyze a contig spectrum in terms of frequency count data rather than rank-abundance data, thus enabling statistical analyses. We also reasoned that phage metagenomic contig spectra contain a large number of potentially spurious singletons, due to both biological phenomena (generalized transduction) and technical ones (the lack of a process to remove sequencing errors). Catchall was therefore modified to implement an optional statistical procedure that discounts the low-frequency observations. The results show greater estimates of phage species richness than those reported by PHACCS in nearly all environments analyzed, including swine feces and reclaimed fresh water. The discounting procedure possibly yielded more biologically relevant data, with phage richness estimates in the thousands per sample, compared to hundreds of thousands per sample without discounting. Moreover, the standard errors associated with the estimates decreased dramatically when the discounting procedure was used. Using Catchall to analyze contig spectra will improve calculations of phage species richness from metagenomes, particularly from large datasets.
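To make the frequency-count view of a contig spectrum concrete, the following minimal Python sketch tallies how many contigs contain 1, 2, 3, ... reads and then applies the classical bias-corrected Chao1 estimator to those counts. It is an illustration only: the function names, the toy read-to-contig mapping, and the choice of Chao1 are assumptions made for this example. It does not reproduce the models fitted by Catchall, the discounting procedure, or the PHACCS approach.

```python
from collections import Counter


def contig_spectrum(read_to_contig):
    """Return {contig size: number of contigs of that size}.

    `read_to_contig` is a hypothetical mapping from read ID to contig ID;
    a value of None marks a read that assembled with no other read and is
    counted as a singleton (a contig of size 1).
    """
    reads_per_contig = Counter()
    unassembled = 0
    for contig in read_to_contig.values():
        if contig is None:
            unassembled += 1
        else:
            reads_per_contig[contig] += 1
    # Frequency counts: how many contigs hold exactly k reads.
    spectrum = Counter(reads_per_contig.values())
    if unassembled:
        spectrum[1] += unassembled
    return dict(sorted(spectrum.items()))


def chao1(spectrum):
    """Bias-corrected Chao1 richness estimate from frequency counts.

    Used here as a simple stand-in estimator, not as Catchall's method.
    """
    observed = sum(spectrum.values())  # number of distinct contigs seen
    f1 = spectrum.get(1, 0)            # singletons
    f2 = spectrum.get(2, 0)            # doubletons
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy data (invented for illustration): 10 reads, 3 contigs, 3 singletons.
example_reads = {
    "r1": "c1", "r2": "c1", "r3": "c1",
    "r4": "c2", "r5": "c2",
    "r6": "c3", "r7": "c3",
    "r8": None, "r9": None, "r10": None,
}
example_spectrum = contig_spectrum(example_reads)
example_estimate = chao1(example_spectrum)
print(example_spectrum)  # {1: 3, 2: 2, 3: 1}
print(example_estimate)  # 7.0
```

Because the estimate depends on the singleton count f1, a spectrum inflated with spurious singletons pushes the estimate upward, which is the motivation the abstract gives for discounting low-frequency observations.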