Submitted to: Plant Ecology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: July 8, 2009
Publication Date: February 20, 2010
Citation: Goslee, S.C. 2010. Correlation analysis of dissimilarity matrices. Plant Ecology. 206:279-286.
Interpretive Summary: One group of widely used statistical methods requires dissimilarity matrices to be calculated from the raw data. In ecology, the most widely used examples are ordination, cluster analysis, and the Mantel test. Questions have been raised about the relationship between the multivariate dissimilarity and the original data, an issue that affects the correct use of these statistical techniques. This paper uses simulated data to demonstrate a clear relationship between the raw data and the multivariate dissimilarities for data that meet certain criteria. The relationship also explains how and why the Mantel test coefficient does not in practice behave like the correlation coefficient from which it is derived. This novel statistical understanding will be highly useful for ecologists, geneticists, and others using dissimilarity-based statistical methods.
Technical Abstract: Distance-based methods have been a valuable tool for ecologists for decades. Ordination and cluster analysis in particular have been widely practiced because they allow the visualization of a multivariate dataset in a few dimensions. The Mantel test and its relatives add hypothesis testing to the distance-based toolbox, but no information is available on when to combine data vectors into a single multivariate dissimilarity or when to treat them independently. For Euclidean distances on scaled data, the correlation of a pair of multivariate distance matrices can be calculated from the correlations between the two sets of individual distance matrices, demonstrating a clear link between univariate and multivariate distances. This relationship also provides a means for understanding the maximum possible value of the Mantel statistic, which can be considerably less than 1 for a given analysis. Scaling of the data, whether by standardization or by other means, is an essential component of any dissimilarity analysis in which multiple variables are used to calculate one coefficient.
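
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's code and does not reproduce the paper's formula. It shows only two things: the Mantel statistic computed as the Pearson correlation between the unfolded pairwise distances of two matrices, and the fact that on Euclidean distances the squared multivariate distance equals the sum of the squared single-variable distances. The function names (standardize, pairwise_distances, pearson, mantel_r) and the random data are assumptions made for illustration, and the permutation step used for significance testing is omitted.

```python
import math
import random


def standardize(column):
    """Scale one variable to mean 0 and unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def pairwise_distances(columns):
    """Euclidean distance for every pair of samples (i < j), as a flat list.

    `columns` is a list of variables; each variable is a list with one
    value per sample.
    """
    n = len(columns[0])
    return [
        math.sqrt(sum((col[i] - col[j]) ** 2 for col in columns))
        for i in range(n)
        for j in range(i + 1, n)
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_r(columns_a, columns_b):
    """Mantel statistic: correlation between two sets of pairwise distances."""
    return pearson(pairwise_distances(columns_a), pairwise_distances(columns_b))


# Hypothetical data: two standardized variables measured on 20 samples.
random.seed(1)
n_samples = 20
x1 = standardize([random.gauss(0, 1) for _ in range(n_samples)])
x2 = standardize([random.gauss(0, 1) for _ in range(n_samples)])

d1 = pairwise_distances([x1])        # univariate distances for x1
d2 = pairwise_distances([x2])        # univariate distances for x2
d12 = pairwise_distances([x1, x2])   # multivariate distances for (x1, x2)

# Squared multivariate distance = sum of squared univariate distances.
max_gap = max(abs(m ** 2 - (a ** 2 + b ** 2)) for m, a, b in zip(d12, d1, d2))
print("largest decomposition error:", max_gap)

# Mantel statistic between the two-variable matrix and one of its components.
print("Mantel r, (x1, x2) vs x1:", mantel_r([x1, x2], [x1]))
```

Because the multivariate distance combines both variables, its correlation with the distance matrix of a single component generally falls below 1, which is consistent with the abstract's point that the attainable Mantel statistic can be well short of 1.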