Agricultural Research Service United States Department of Agriculture

Research Project: ANIMAL INTESTINAL MICROBIOMES, FOODBORNE PATHOGENS, AND ANTIMICROBIALS

Location: Food Safety and Enteric Pathogens Research Unit

Title: Estimating population diversity with CatchAll

Authors
Bunge, John
Woodard, Linda
Bohning, Dankmar
Foster, James
Connolly, Sean
Allen, Heather

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: February 6, 2012
Publication Date: February 13, 2012
Citation: Bunge, J., Woodard, L., Bohning, D., Foster, J.A., Connolly, S., Allen, H.K. 2012. Estimating population diversity with CatchAll. Bioinformatics. 28(7):1045-1047. Available: http://bioinformatics.oxfordjournals.org/content/28/7/1045.long.

Interpretive Summary: Next-generation sequencing has produced massive quantities of data that require robust statistical analysis tools. In particular, estimating total population sizes from sample sequences remains a challenge. We present a program called CatchAll that estimates total population sizes quickly and easily. Frequency count data from any population can be analyzed, including bacterial and phage diversity counts. The program uses modern statistical approaches, applying up to 12 different methods and models and providing the user with the best overall output. Importantly, CatchAll also offers a unique mathematical approach to discounting potential outliers in a dataset. Outputs can be displayed graphically in an accompanying Excel-based spreadsheet program.
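
For readers unfamiliar with the term, "frequency count" data record how many taxa were observed exactly once, exactly twice, and so on. The following is a minimal sketch of how such counts could be tabulated from a list of per-read taxon labels. The function name `frequency_counts` and the sample labels are hypothetical, chosen for illustration only; this is not CatchAll's code or its input file format.

```python
from collections import Counter

def frequency_counts(labels):
    """Return {j: f_j}, where f_j is the number of taxa observed exactly j times.

    `labels` is any iterable of taxon identifiers, one per observation
    (hypothetical input format, for illustration only).
    """
    abundance = Counter(labels)          # taxon -> number of observations
    freq = Counter(abundance.values())   # j -> number of taxa seen j times
    return dict(sorted(freq.items()))

# Made-up sample: taxon A seen 3 times, B twice, C, D and E once each.
sample = ["A", "B", "A", "C", "D", "A", "B", "E"]
counts = frequency_counts(sample)
print(counts)
```

The singleton and doubleton counts (keys 1 and 2) are the low-frequency classes that richness estimators typically rely on most heavily.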

Technical Abstract: The massive quantity of data produced by next-generation sequencing has created a pressing need for advanced statistical tools, in particular for analysis of bacterial and phage communities. Here we address estimating the total diversity in a population, that is, the species richness. This is an important statistical problem with a rich literature, but to date only relatively simple methods have been implemented in readily available software. There is a need for a software package employing modern, computationally intensive statistical procedures, with error terms, goodness-of-fit assessments, and robustness comparisons. The same methods also apply to estimating the total size of a population. We present CatchAll, a fast, easy-to-use, platform-independent software package that uses optimized numerical searching to compute maximum likelihood estimates for finite-mixture models, linear regression-based models with non-diagonal weight matrices, and all existing coverage-based nonparametric methods, while accounting for outlier detection/deletion and other data-analytic considerations. Given sample “frequency count” data, CatchAll computes 12 different diversity estimates and compares the results via a model-selection algorithm, providing the user with a best overall choice. In addition, CatchAll derives model-based discounted estimates of total diversity to adjust for possibly uncertain low-frequency counts. It is accompanied by an Excel-based graphical display spreadsheet program.
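
To make the idea of a nonparametric richness estimate concrete, here is a minimal sketch of two standard textbook estimators computed from frequency count data: the Good-Turing estimate, which divides observed richness by estimated sample coverage, and the bias-corrected Chao1 estimate. This is an illustration under stated assumptions, not CatchAll's implementation. The function names and the example counts are hypothetical, and the sketch omits the standard errors, goodness-of-fit assessment, and model selection that the abstract describes.

```python
def sample_size(freq):
    """Total number of observations n = sum over j of j * f_j."""
    return sum(j * f for j, f in freq.items())

def good_coverage(freq):
    """Good's estimate of sample coverage: C = 1 - f_1 / n."""
    return 1 - freq.get(1, 0) / sample_size(freq)

def good_turing_richness(freq):
    """Coverage-based richness estimate: observed taxa divided by coverage."""
    s_obs = sum(freq.values())
    return s_obs / good_coverage(freq)

def chao1_bias_corrected(freq):
    """Bias-corrected Chao1: S_obs + f_1 (f_1 - 1) / (2 (f_2 + 1)).

    This form stays defined when there are no doubletons (f_2 = 0).
    """
    s_obs = sum(freq.values())
    f1 = freq.get(1, 0)
    f2 = freq.get(2, 0)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Made-up frequency counts: 10 singletons, 4 doubletons, 2 taxa seen
# three times, 1 taxon seen five times (17 observed taxa, 29 observations).
freq = {1: 10, 2: 4, 3: 2, 5: 1}

print("observed taxa:", sum(freq.values()))
print("coverage:", round(good_coverage(freq), 4))
print("Good-Turing richness:", round(good_turing_richness(freq), 2))
print("Chao1 (bias-corrected):", chao1_bias_corrected(freq))
```

Both estimators use only the lowest frequency classes, which is why uncertain singleton and doubleton counts can shift the result substantially. Note that the Good-Turing estimate is undefined when every observation is a singleton, since the estimated coverage is then zero.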

   

 
Project Team
Stanton, Thaddeus
Allen, Heather
Casey, Thomas
 
Related National Programs
  Food Safety, (animal and plant products) (108)
 
Related Projects
   MEMORANDUM OF UNDERSTANDING: PROMOTE ANIMAL HEALTH INITIATIVES WITH IOWA STATE UNIVERSITY
 
 
Last Modified: 06/19/2013