Problems with Species-Environment Correlation
The following words of caution are from
Influence of noisy environmental data on canonical correspondence analysis
by Bruce McCune
Ecology, Dec 1997.
As the number of variables in the second matrix increases to near the maximum possible (one less than the number of sample units), the species-environment correlation always converges on 1.0. At that point, the LC scores (site scores calculated as a linear combination of the environmental variables) and the WA scores (site scores calculated as a weighted average of the species abundances) are identical. The second matrix then exerts no influence over the results (ter Braak 1994), because so many variables in the second matrix can support any pattern found by the weighted averaging step with the species matrix.
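The convergence described above can be illustrated with a simplified regression analogue. The sketch below is not canonical correspondence analysis: it holds one vector of site scores fixed (a stand-in for WA scores) and fits it by least squares on a growing number of pure-noise variables (the fitted values stand in for LC scores). The names fit_correlation, wa_like and noise_env are hypothetical, and the data are random numbers rather than real samples.

```python
import numpy as np

def fit_correlation(y, X):
    """Correlation between y and its least-squares fit on X (intercept included)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef  # analogue of LC scores
    return float(np.corrcoef(y, fitted)[0, 1])

rng = np.random.default_rng(0)
n_sites = 20
wa_like = rng.normal(size=n_sites)                    # assumed stand-in for WA scores
noise_env = rng.normal(size=(n_sites, n_sites - 1))   # pure-noise "environment"

# Correlation as the number of noise variables grows from 1 to n_sites - 1.
correlations = [fit_correlation(wa_like, noise_env[:, :q])
                for q in range(1, n_sites)]

for q in (1, 5, 10, 15, n_sites - 1):
    print(f"{q:2d} noise variables: correlation = {correlations[q - 1]:.3f}")
```

With n_sites - 1 noise variables plus an intercept, the fit reproduces the scores exactly, so the correlation reaches 1.0 even though the variables carry no information.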
Across a wide variety of real data sets, the species-environment correlation is almost always high. It is usually > 0.6, and often much higher. This seems to be true regardless of other criteria for performance of the ordination, such as interpretability or proportion of variance in the species matrix that is explained.
For these reasons, I conclude that (1) the species-environment correlation is a poor criterion for evaluating the success of an ordination, (2) the species-environment correlation should not be interpreted literally as a measure of the strength of the relationship between species and the environment, and (3) the statistical significance of the species-environment correlation, even when it appears very high, should always be checked with randomization tests.
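A randomization test of the kind recommended in point (3) might look like the following minimal sketch. It again uses a regression analogue rather than a full ordination: the rows of the environmental matrix are shuffled relative to the site scores, the correlation is recomputed each time, and the p-value is the proportion of shuffles that match or exceed the observed value. The function permutation_p_value, the choice of 199 permutations, and the simulated data are all assumptions made for illustration, not the procedure of any particular program.

```python
import numpy as np

def fit_correlation(y, X):
    """Correlation between y and its least-squares fit on X (intercept included)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.corrcoef(y, design @ coef)[0, 1])

def permutation_p_value(y, X, n_perm=199, seed=0):
    """Shuffle rows of X relative to y; count shuffles at least as extreme as observed."""
    rng = np.random.default_rng(seed)
    observed = fit_correlation(y, X)
    exceed = 0
    for _ in range(n_perm):
        shuffled = X[rng.permutation(len(y))]  # breaks any site-to-site link
        if fit_correlation(y, shuffled) >= observed:
            exceed += 1
    # The observed arrangement counts as one of the possible outcomes.
    return observed, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(2)
n_sites = 30
noise_env = rng.normal(size=(n_sites, 10))      # hypothetical environmental matrix
wa_noise = rng.normal(size=n_sites)             # scores unrelated to the matrix
wa_signal = noise_env[:, 0] + 0.1 * rng.normal(size=n_sites)  # scores tied to column 0

r_noise, p_noise = permutation_p_value(wa_noise, noise_env)
r_signal, p_signal = permutation_p_value(wa_signal, noise_env)
print(f"unrelated scores: r = {r_noise:.2f}, p = {p_noise:.3f}")
print(f"related scores:   r = {r_signal:.2f}, p = {p_signal:.3f}")
```

Comparing the two printed lines shows why the test is needed: the correlation for the unrelated scores has to be judged against what shuffled data produce, not against zero.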
The problems caused by large numbers of noisy environmental variables cannot be alleviated by using a stepwise selection procedure, similar to that commonly used in multiple regression. If the environmental data are noisy, stepwise selection will simply pick out the best random variables, and the species-environment correlation will still be misleadingly high. As with multiple regression, the parsimony of the procedure is set by the size of the pool of independent variables, not the number of independent variables selected.
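The effect of the pool size can be sketched with a greedy forward selection over pure-noise variables. As before, this is a regression analogue under stated assumptions, not a stepwise CCA: forward_select, wa_like and noise_pool are hypothetical names, and the pool of 50 random variables is chosen only for illustration.

```python
import numpy as np

def fit_correlation(y, X):
    """Correlation between y and its least-squares fit on X (intercept included)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.corrcoef(y, design @ coef)[0, 1])

def forward_select(y, pool, n_keep):
    """Greedy forward selection: each step adds the column that raises the correlation most."""
    chosen, path = [], []
    for _ in range(n_keep):
        candidates = [j for j in range(pool.shape[1]) if j not in chosen]
        scores = [fit_correlation(y, pool[:, chosen + [j]]) for j in candidates]
        best = int(np.argmax(scores))
        chosen.append(candidates[best])
        path.append(scores[best])
    return chosen, path

rng = np.random.default_rng(1)
n_sites = 20
wa_like = rng.normal(size=n_sites)             # assumed stand-in for site scores
noise_pool = rng.normal(size=(n_sites, 50))    # 50 variables of pure noise

chosen, path = forward_select(wa_like, noise_pool, n_keep=3)
first_three = fit_correlation(wa_like, noise_pool[:, :3])

print(f"3 variables taken without selection: correlation = {first_three:.2f}")
print(f"3 variables selected from pool of 50: correlation = {path[-1]:.2f}")
```

Both fits use three variables, but the selected set has been searched for among 50 candidates, which is the sense in which the pool, not the final count, governs how easily noise can be fitted.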
I recommend that the species-environment correlation not be reported as such. If it is reported at all, a more appropriate name would be the "LC-WA correlation" to avoid misleading the reader. If it is reported, it should be accompanied by a randomization test for statistical significance. A better (but still imperfect) measure of the strength of the relationship between species and environmental matrices is the proportion of variance in the species data that is captured by the environmental data.