Author
Manter, Daniel
Bakker, Matthew
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/25/2015
Publication Date: 7/1/2015
Citation: Manter, D.K., Bakker, M.G. 2015. Estimating beta diversity for under-sampled communities using the variably weighted Odum dissimilarity index and OTUshuff. Bioinformatics. 31(21):3451-3459. doi: 10.1093/bioinformatics/btv394

Interpretive Summary: The size and diversity of microbial populations pose challenges for characterizing and contrasting communities. For instance, sequence-based analyses of soil bacteria frequently suggest that a single gram of soil holds more than 10⁶ to 10⁹ individuals distributed across thousands of distinct taxa. Although high-throughput sequencing has increased the depth at which we are able to census microbial taxa, in most cases sampling still detects only a fraction of the diversity present at a given site. This incomplete sampling leads to erroneous estimates of the true difference between sites and/or treatments (i.e., pseudo β-diversity). In this paper, we present a new analytical method, OTUshuff, for the statistical comparison of two samples that is insensitive to pseudo β-diversity. We also propose a new suite of distance measures, the weighted Odum distance (DwOdum). The DwOdum distance score is flexible, allowing either abundant or rare OTUs to be down-weighted depending on the user's interest. Based on both simulations and actual data, we show that down-weighting rare OTUs results in more accurate estimates of β-diversity, particularly when populations are under-sampled. Conversely, down-weighting abundant taxa can increase sensitivity in hypothesis testing, that is, the ability to determine whether two samples are significantly different.

Technical Abstract: Characterization of complex microbial communities by DNA sequencing has become a standard technique in microbial ecology.
Yet particular features of this approach render traditional methods of community comparison problematic. In particular, only a very low proportion of community members is typically sampled, and spurious taxa in datasets (e.g., those resulting from sequencing errors) can generate varying levels of pseudo β-diversity. A robust measure of β-diversity should minimize such errors. We present a new analytical method, OTUshuff, for the statistical comparison of two samples that is insensitive to pseudo β-diversity. We also propose a new suite of distance measures, the weighted Odum distance (DwOdum). The DwOdum distance score is flexible, allowing either abundant or rare OTUs to be down-weighted depending on the user's interest. We are especially interested in down-weighting rare OTUs as a means to minimize pseudo β-diversity arising from incomplete sampling. We illustrate the utility of OTUshuff, DwOdum, and their combination using simulated data. In addition, we use actual bacterial 16S pyrosequencing data from a set of diverse agricultural and forest sites to evaluate samples for the presence of pseudo β-diversity and to improve the accuracy of β-diversity estimates, even in severely under-sampled communities. Based on both simulations and actual data, we show that down-weighting rare OTUs results in more accurate estimates of β-diversity, particularly when populations are under-sampled. Conversely, down-weighting abundant taxa can increase sensitivity in hypothesis testing, that is, the ability to determine whether two samples are significantly different.
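The abstract does not give the DwOdum formula, so the following minimal Python sketch uses a hypothetical stand-in to illustrate the two ideas at play: pseudo β-diversity from under-sampling, and tunable down-weighting of rare or abundant OTUs. The function `weighted_dissimilarity` and its exponent `a` are assumptions for illustration, not the authors' index. Each OTU is weighted by its mean relative abundance raised to the power `a`, so `a > 0` down-weights rare OTUs, `a < 0` down-weights abundant ones, and `a = 0` reduces to Bray-Curtis on relative abundances. The demo draws two read sets from the same simulated (hypothetical, Zipf-like) community, so any nonzero dissimilarity between them is pseudo β-diversity.

```python
import random


def relative_abundances(counts):
    """Convert raw read counts per OTU into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def weighted_dissimilarity(x_counts, y_counts, a=0.0):
    """Hypothetical weighted Bray-Curtis-style dissimilarity (NOT the DwOdum index).

    Each OTU i is weighted by m_i ** a, where m_i is its mean relative abundance
    in the two samples: a > 0 down-weights rare OTUs, a < 0 down-weights
    abundant OTUs, and a = 0 gives plain Bray-Curtis on relative abundances.
    The result lies in [0, 1].
    """
    x = relative_abundances(x_counts)
    y = relative_abundances(y_counts)
    num = den = 0.0
    for xi, yi in zip(x, y):
        m = (xi + yi) / 2
        if m == 0:
            continue  # OTU absent from both samples: skip (also avoids 0 ** negative)
        w = m ** a
        num += w * abs(xi - yi)
        den += w * (xi + yi)
    return num / den


def sample_reads(community, n_reads, rng):
    """Draw n_reads reads with replacement from a community of OTU proportions."""
    counts = [0] * len(community)
    for otu in rng.choices(range(len(community)), weights=community, k=n_reads):
        counts[otu] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical Zipf-like community: 2000 OTUs, abundance proportional to 1/rank
    raw = [1 / rank for rank in range(1, 2001)]
    total = sum(raw)
    community = [v / total for v in raw]
    # Two shallow read sets drawn from the SAME community: any nonzero
    # dissimilarity between them is pseudo beta-diversity from under-sampling
    s1 = sample_reads(community, 1000, rng)
    s2 = sample_reads(community, 1000, rng)
    for a in (-0.5, 0.0, 0.5, 1.0):
        print(f"a = {a:+.1f}: dissimilarity = {weighted_dissimilarity(s1, s2, a):.3f}")
```

Comparing the printed values across `a` shows how much of the apparent difference between two under-sampled read sets from one community is carried by rare versus abundant OTUs; the exact numbers depend on the simulated community and the read depth.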
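OTUshuff is described here only as a two-sample comparison that is insensitive to pseudo β-diversity. One way such a test can work, sketched below as an assumption and not as the published OTUshuff procedure, is to pool the reads of both samples, repeatedly reshuffle them into two pseudo-samples of the original sizes, and compare the observed dissimilarity with this null distribution. Because each shuffled pair is itself under-sampled, the null already contains the pseudo β-diversity expected from sampling alone. The names `shuffle_test` and `n_perm`, and the use of Bray-Curtis as the dissimilarity, are hypothetical choices.

```python
import random


def bray_curtis(x_counts, y_counts):
    """Bray-Curtis dissimilarity on relative abundances (hypothetical stand-in for DwOdum)."""
    tx, ty = sum(x_counts), sum(y_counts)
    return sum(abs(a / tx - b / ty) for a, b in zip(x_counts, y_counts)) / 2


def shuffle_test(x_counts, y_counts, n_perm=999, seed=0):
    """Read-level permutation test (a sketch, not the published OTUshuff).

    Pools the reads of both samples, reshuffles them into two pseudo-samples
    of the original sizes, and returns (observed dissimilarity, p-value), where
    the p-value is the fraction of shuffles at least as dissimilar as observed.
    """
    rng = random.Random(seed)
    observed = bray_curtis(x_counts, y_counts)
    # Expand counts into one OTU label per read, then pool both samples
    pooled = [otu for otu, c in enumerate(x_counts) for _ in range(c)]
    pooled += [otu for otu, c in enumerate(y_counts) for _ in range(c)]
    n_x, n_otus = sum(x_counts), len(x_counts)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = [0] * n_otus, [0] * n_otus
        for otu in pooled[:n_x]:  # first n_x reads form pseudo-sample 1
            px[otu] += 1
        for otu in pooled[n_x:]:  # remaining reads form pseudo-sample 2
            py[otu] += 1
        if bray_curtis(px, py) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

For example, `shuffle_test([50, 0, 0], [0, 0, 50])` returns an observed dissimilarity of 1.0 with a small p-value, whereas two identical count vectors give 0.0 and a p-value of 1.0. Any other dissimilarity, including a weighted one, could be swapped in for `bray_curtis` without changing the shuffling loop.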