What is CCA?

Canonical correspondence analysis (CCA; ter Braak 1986, 1994) is an ordination method in which the ordination of the biological (main) matrix by correspondence analysis or reciprocal averaging is constrained by a multiple regression on the variables included in the environmental matrix.
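The definition above can be made concrete with a small numerical sketch. The code below is a minimal illustration, not the implementation used by any particular package: it assumes the common formulation in which the chi-squared-standardised community matrix is regressed (with site weights) on the environmental variables and the fitted values are then decomposed by singular value decomposition. The function name cca_sketch and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def cca_sketch(Y, X):
    """Illustrative CCA: weighted regression of CA-standardised data on X, then SVD.

    Y: sites x species abundances (all row and column totals must be > 0).
    X: sites x environmental variables.
    Returns the canonical eigenvalues and the total inertia.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = Y / Y.sum()                      # relative abundances
    r = P.sum(axis=1)                    # site (row) weights
    c = P.sum(axis=0)                    # species (column) weights
    expected = np.outer(r, c)
    Q = (P - expected) / np.sqrt(expected)   # the matrix that CA decomposes
    Xc = X - r @ X                       # centre X on its weighted means
    Xw = np.sqrt(r)[:, None] * Xc        # apply site weights
    B, *_ = np.linalg.lstsq(Xw, Q, rcond=None)   # the multiple regression step
    Q_fit = Xw @ B                       # part of Q explained by X
    s = np.linalg.svd(Q_fit, compute_uv=False)
    return s ** 2, (Q ** 2).sum()

# Simulated example: five species with unimodal responses along one gradient,
# plus one environmental variable that is pure noise.
rng = np.random.default_rng(0)
gradient = np.linspace(0, 10, 12)
optima = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
Y = 50 * np.exp(-0.5 * ((gradient[:, None] - optima[None, :]) / 1.5) ** 2) + 1
X = np.column_stack([gradient, rng.normal(size=12)])

eig, total_inertia = cca_sketch(Y, X)
print("canonical eigenvalues:", eig[:2])
print("fraction of inertia constrained:", eig.sum() / total_inertia)
```

With two environmental variables there can be at most two non-zero canonical eigenvalues, and their sum cannot exceed the total inertia of the unconstrained correspondence analysis.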

In ecological terms, the ordination of sites and species is constrained by their relationships to environmental variables. If the environmental variables included are major determinants of community structure, and species abundances change along these environmental gradients, then this technique aids the interpretation of community structure and the identification of the features that mould it.

The alternative approach is to use an ordination technique such as Correspondence Analysis or Principal Components Analysis, which ordinates the community (sites-by-species) data alone, and then to undertake an auxiliary analysis to identify the environmental variables that correlate most strongly with the ordination axes.

CCA excels at representing community data sets where: (1) species responses are unimodal (hump-shaped), and (2) the important underlying environmental variables have been measured. Note that condition 1 causes problems for methods that assume linear response curves (such as PCA) but causes no problems for CCA, according to ter Braak (1986, 1994). Condition 2 arises because the environmental matrix is used to constrain the ordination results, unlike any other ordination technique apart from Canonical Correlation. For this reason, CCA has been called a method for "direct gradient analysis" (ter Braak 1986).

CCA is currently one of the most popular ordination techniques in community ecology. It is, however, one of the most dangerous in the hands of people who do not take the time to understand this relatively complex method. The dangers lie in several areas:

(1) Because it includes multiple regression of community gradients on environmental variables, it is subject to all of the hazards of multiple regression. Multicollinearity is a particular problem, and it is easy to believe that a relatively high coefficient of multiple correlation implies a highly significant result when it may not. Further, the method uses linear regression, yet the response of the community to changes in an environmental variable may well not be linear.

(2) As the number of environmental variables increases relative to the number of observations, the results become increasingly dubious, because the appearance of very strong relationships becomes inevitable.

(3) Statistics indicating the "percentage of variance explained" can be calculated in several ways, each answering a different question, but users frequently confuse these statistics when reporting their results.
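Danger (2) is easy to demonstrate with ordinary multiple regression. The sketch below is an illustration only: it regresses a purely random response on an increasing number of purely random predictors, so any apparent relationship is spurious by construction. The helper name r_squared, the sample size of 20 and the chosen numbers of predictors are assumptions made for the example.

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination for a least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(42)
n = 20                                   # number of observations (sites)
y = rng.normal(size=n)                   # random "community gradient"
noise = rng.normal(size=(n, n - 1))      # random "environmental variables"

# Fit nested models using the first k noise variables.
r2_by_k = {k: r_squared(noise[:, :k], y) for k in (1, 5, 10, 15, 19)}
for k, r2 in r2_by_k.items():
    print(f"{k:2d} random predictors: R^2 = {r2:.3f}")
```

Because the models are nested, R-squared can only rise as predictors are added, and once the number of predictors plus the intercept equals the number of observations the fit is perfect even though no real relationship exists.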

CCA does not explicitly calculate a distance matrix. However, CCA, like CA, is implicitly based on the chi-squared distance measure, in which samples are weighted according to their totals (Chardy et al. 1976; Minchin 1987a); PCA, by contrast, is implicitly based on Euclidean distance. The chi-squared distance gives high weight to species whose total abundance in the data matrix is low, thus exaggerating the distinctiveness of samples containing several rare species (Faith et al. 1987; Minchin 1987a).
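The effect of the chi-squared distance on rare species can be seen by computing it directly. The sketch below assumes the usual definition, in which each sample is converted to a profile (abundances divided by the sample total) and each squared species difference is divided by that species' share of the grand total. The function name chi_squared_distances and the small data matrix are hypothetical.

```python
import numpy as np

def chi_squared_distances(Y):
    """Pairwise chi-squared distances between the rows (samples) of Y."""
    Y = np.asarray(Y, dtype=float)
    profiles = Y / Y.sum(axis=1, keepdims=True)   # sample profiles
    col_mass = Y.sum(axis=0) / Y.sum()            # each species' share of the grand total
    scaled = profiles / np.sqrt(col_mass)         # rare species are inflated here
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Three samples, three species; the third species is rare and occurs in one sample.
Y = np.array([[10, 5, 0],
              [20, 10, 0],
              [10, 5, 2]])

D = chi_squared_distances(Y)
species_weights = 1 / (Y.sum(axis=0) / Y.sum())   # implicit weight on each species

print(D)
print(species_weights)
```

Samples 1 and 2 differ only in total abundance, so their profiles are identical and the distance between them is zero. Sample 3 differs from sample 1 only by two individuals of the rare species, but that species carries by far the largest implicit weight.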