Title: Estimation of viral richness from shotgun metagenomes using a frequency count approach

Authors:
Allen, Heather
Bunge, John - Cornell University
Foster, James - University of Idaho
Bayles, Darrell
Stanton, Thaddeus

Submitted to: Microbiome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/2/2012
Publication Date: 2/4/2013
Citation: Allen, H.K., Bunge, J., Foster, J.A., Bayles, D.O., Stanton, T.B. 2013. Estimation of viral richness from shotgun metagenomes using a frequency count approach. BMC Microbiome. 1(5). Available:

Interpretive Summary: Phages, the viruses that infect bacteria, are the most abundant biological entities on earth. Studying them relies on the culturability of their host bacteria, but most bacteria are not readily culturable by standard laboratory techniques. The study of the collective genome of an assemblage of phages, which is called phage metagenomics, is one way to access the vast diversity of phages. The number of phage species in an environment (phage richness) is one important aspect of phage diversity. However, previous tools to study the richness of phage metagenomes are limited by their underlying mathematical principles. To improve calculations of phage richness from metagenomes, we recently developed a program called Catchall to analyze the richness of any population data. In this study we apply Catchall to the analysis of phage metagenomic data. The results show much greater estimates of phage species richness than previously reported in all environments analyzed, including swine feces and fresh water. For example, Catchall revealed roughly 1000 to 100,000 phage species per swine fecal phage metagenome, whereas previous tools estimated fewer than 1000 species. Using Catchall to analyze phage metagenomic data will improve estimations of species richness, particularly from large datasets.

Technical Abstract: Bacteriophages are important drivers of ecosystem functions, yet little is known about the vast majority of phages. Phage metagenomics enables the investigation of broad ecological questions in phage communities. One ecological characteristic is species richness, or the number of different species in a community. Phages do not have a phylogenetic marker analogous to the bacterial 16S rRNA gene with which to estimate richness, and so contig spectra are employed. A contig spectrum is generated from a phage metagenome by assembling the random sequence reads into groups of overlapping sequences (contigs) and counting the number of sequences that group within each contig. Current tools for estimating richness from contig spectra are limited by their reliance on rank-abundance data. Here we present improvements to estimating phage species richness from contig spectra. The program Catchall was implemented to analyze contig spectra in terms of frequency count data rather than rank-abundance data, thus enabling statistical analyses. In addition, a statistical discounting procedure was employed to deflate the influence of potentially spurious low-frequency counts on richness estimates. The results show greater estimates of phage species richness than those reported by PHACCS in nearly all environments analyzed, including swine feces and reclaimed fresh water. Using Catchall to analyze contig spectra will improve calculations of species richness from phage metagenomes, particularly from large datasets.
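
As an illustration of the contig spectrum described above, the following minimal Python sketch tallies how many contigs contain exactly k reads, which is the frequency count form of the data. The function names and the input list are hypothetical and are not taken from Catchall or PHACCS. The bias-corrected Chao1 estimator is included only as a familiar nonparametric example of a richness estimate computed from frequency counts; it is a stand-in for illustration and does not reproduce the model fitting or the discounting procedure used in the study.

```python
from collections import Counter


def contig_spectrum(reads_per_contig):
    """Return {k: number of contigs that contain exactly k reads}.

    `reads_per_contig` is a hypothetical input: one integer per contig,
    with unassembled reads (singletons) recorded as contigs of size 1.
    """
    counts = Counter(reads_per_contig)
    return dict(sorted(counts.items()))


def chao1_bias_corrected(spectrum):
    """Bias-corrected Chao1 richness estimate from frequency counts.

    Illustrative stand-in only; this is not the estimator selection
    that Catchall performs.
    """
    observed = sum(spectrum.values())   # number of contigs observed
    f1 = spectrum.get(1, 0)             # contigs seen in exactly 1 read
    f2 = spectrum.get(2, 0)             # contigs seen in exactly 2 reads
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


# Made-up example: ten contigs and the number of reads in each.
reads_per_contig = [1, 1, 1, 1, 1, 1, 2, 2, 3, 5]
spectrum = contig_spectrum(reads_per_contig)
estimate = chao1_bias_corrected(spectrum)

print(spectrum)   # {1: 6, 2: 2, 3: 1, 5: 1}
print(estimate)   # 15.0
```

In this toy example the ten observed contigs yield an estimate of 15, because the many singletons suggest that additional genotypes went unsampled. Low-frequency counts drive the estimate strongly, which is why spurious singletons can inflate richness estimates.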