Worked Example - Cicada Song


Demonstration data set: cicada.csv.


Reference: Ohya, E., 2004. Identification of Tibicen cicada species by a Principal Components Analysis of their songs. Anais da Academia Brasileira de Ciências 76: 441-444.


Ohya (2004) used recordings of cicadas to demonstrate that the songs of different species could be differentiated using PCA. This example shows how PCA can be used to compare features extracted from time series, in this case recorded songs. It also shows that standard measurements for known types, here species, can be included in the data set so that samples can be assigned to groups by their proximity to these standards within the ordination space.


Preliminary data examination and transformation

The data set comprised observations of peak frequency (Hz), mean frequency (Hz) and number of pulses per 0.2 s. Recordings were made of 12 individuals of unknown species, together with 3 standard sets for the species Tibicen japonicus, T. flammatus and T. bihamatus. No transformations were applied.


The use of the correlation matrix

PCA was undertaken on the correlation matrix. This was essential if the ordination was not to be dominated by the frequency measurements, which ranged between 5000 and 7000 Hz, while the pulse rate ranged only between 8 and 20 pulses per 0.2 s. It would also have been possible to rescale frequency to kHz and use the variance-covariance matrix.
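A minimal sketch of why the choice of matrix matters, using synthetic numbers drawn in roughly the ranges quoted above (these are NOT the values in cicada.csv). The helper name `pca_eigen` is hypothetical. With the variance-covariance matrix the first axis is almost pure frequency, because frequency variances are tens of thousands of times larger than the pulse-rate variance; the correlation matrix first puts every variable on unit variance.

```python
import numpy as np

def pca_eigen(X, use_correlation=True):
    """Eigenvalues/eigenvectors of the correlation or covariance matrix of X.

    X has one row per sample and one column per variable.
    Eigenvalues are returned largest first; eigenvectors are the columns.
    (Hypothetical helper for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    if use_correlation:
        m = np.corrcoef(X, rowvar=False)  # every variable rescaled to unit variance
    else:
        m = np.cov(X, rowvar=False)       # variables keep their original units
    vals, vecs = np.linalg.eigh(m)        # eigh: the matrix is symmetric
    order = np.argsort(vals)[::-1]        # largest eigenvalue first
    return vals[order], vecs[:, order]

# Synthetic values only, NOT cicada.csv: 15 rows with columns
# peak frequency (Hz), mean frequency (Hz), pulses per 0.2 s.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(5000, 7000, 15),
    rng.uniform(5000, 7000, 15),
    rng.uniform(8, 20, 15),
])

cov_vals, cov_vecs = pca_eigen(X, use_correlation=False)
cor_vals, cor_vecs = pca_eigen(X, use_correlation=True)

# Loading of the pulse-rate variable (column 2) on the first axis.
print("covariance PCA, pulse loading on axis 1:", abs(cov_vecs[2, 0]))
print("correlation PCA, pulse loading on axis 1:", abs(cor_vecs[2, 0]))
```

PCA on the correlation matrix gives the same eigenvalues as PCA on the covariance matrix of z-scored variables, which is why it is the usual remedy for variables measured on very different scales.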



As shown in the table below, the first 2 axes explained about 98% of the total variability in the data set, demonstrating that a 2D graph can show the relationship between the cicada species. The sum of all the eigenvalues, which is a measure of the total variability, is 3; because the correlation matrix was used, this is simply the number of variables in the analysis. Only the first axis has an eigenvalue greater than 1, but the second axis is required to distinguish between T. japonicus and T. flammatus.
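The arithmetic behind these statements can be sketched as follows. Every diagonal element of a correlation matrix is 1, so its trace, and therefore the sum of its eigenvalues, equals the number of variables. The matrix `R` below is a hypothetical example for three song variables, NOT the correlation matrix of cicada.csv, so its percentages will not match the table.

```python
import numpy as np

# Hypothetical correlation matrix for peak frequency, mean frequency and
# pulses per 0.2 s; illustrative values only, NOT from cicada.csv.
R = np.array([
    [1.00, 0.95, -0.60],
    [0.95, 1.00, -0.55],
    [-0.60, -0.55, 1.00],
])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first

total = eigvals.sum()                   # equals trace(R) = number of variables = 3
percent = 100 * eigvals / total         # percentage of total variance per axis
cumulative = np.cumsum(percent)         # cumulative percentage, ends at 100
n_above_one = int((eigvals > 1).sum())  # axes with eigenvalue > 1

for axis, (v, c) in enumerate(zip(eigvals, cumulative), start=1):
    print(f"Axis {axis}: eigenvalue {v:.3f}, cumulative {c:.1f}%")
print("sum of eigenvalues:", round(total, 6))
print("eigenvalues > 1:", n_above_one)
```

The "eigenvalue > 1" count is the Kaiser criterion; as the cicada example shows, it is a rule of thumb, and an axis below the threshold can still carry the distinction that matters.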




Table: Cumulative percentage of the total variance








Figure 1 is a biplot of the eigenvectors and the sample scores. Both can be shown clearly on the same graph because the study includes only a small number of variables and samples. The 3 standards for the known species are marked as large squares and labelled with the species name.

The 2D plot shows a clear separation between the species. The unknown samples cluster around the T. bihamatus standard, indicating that all of them except S1 can be assigned to this species; S1 has a song very similar to that of T. japonicus.
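Assigning samples by proximity to the standards can be sketched as below: run PCA on the correlation matrix with the standards included in the data set, take the scores on the first 2 axes, and give each unknown the species of the nearest standard by Euclidean distance. The measurements, the sample names S1 to S3 and the helper `nearest_standard` are hypothetical, NOT the cicada.csv data, and nearest-standard assignment is one simple reading of "proximity", not necessarily the procedure used to produce Figure 1.

```python
import numpy as np

# Hypothetical song measurements, NOT the cicada.csv values:
# peak frequency (Hz), mean frequency (Hz), pulses per 0.2 s.
standards = {
    "T. japonicus": [5200, 5100, 18.0],
    "T. flammatus": [6000, 5900, 9.0],
    "T. bihamatus": [6800, 6600, 14.0],
}
unknowns = {
    "S1": [5250, 5150, 17.5],
    "S2": [6750, 6550, 14.5],
    "S3": [6850, 6650, 13.0],
}

names = list(standards) + list(unknowns)
X = np.array(list(standards.values()) + list(unknowns.values()), dtype=float)

# PCA on the correlation matrix is equivalent to PCA on z-scored variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
vecs = vecs[:, np.argsort(vals)[::-1]]   # axes ordered by eigenvalue, largest first
scores = Z @ vecs[:, :2]                 # sample scores on axes 1 and 2

score_of = dict(zip(names, scores))

def nearest_standard(sample):
    """Name of the standard closest to `sample` in the 2D ordination space."""
    return min(standards,
               key=lambda sp: np.linalg.norm(score_of[sample] - score_of[sp]))

assignments = {s: nearest_standard(s) for s in unknowns}
print(assignments)
```

Because the standards are part of the data set, they influence the axes as well as serving as reference points, which matches the approach described at the start of this example.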


Fig 1: PCA plot of the song of 3 species of cicada.



As the author stated, “The cluster analysis of the PCA scores clearly separated T. japonicus, T. flammatus and T. bihamatus from each other and allocated the samples as expected.” He did include a warning: “However, one should collect real specimens with each sound recording in order to check the result of this method.”
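A cluster analysis of PCA scores can be sketched with a small agglomerative routine. The linkage method Ohya used is not stated here; single linkage (merge the two groups whose closest members are nearest) is an assumption chosen for brevity, and the helper `agglomerate` and the 2D scores are hypothetical, NOT computed from cicada.csv.

```python
import numpy as np

def agglomerate(scores, n_clusters):
    """Single-linkage agglomerative clustering, stopped at n_clusters groups.

    scores: (n_samples, n_axes) array of PCA scores.
    Returns a list of clusters, each a sorted list of row indices.
    (Hypothetical helper; the linkage choice is an assumption.)
    """
    scores = np.asarray(scores, dtype=float)
    clusters = [[i] for i in range(len(scores))]
    # Pairwise Euclidean distances between samples in ordination space.
    d = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                link = d[np.ix_(clusters[a], clusters[b])].min()
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])  # merge b into a
        del clusters[b]                                  # b > a, so index a is unaffected
    return clusters

# Hypothetical 2D PCA scores, NOT computed from cicada.csv.
labels = ["T. japonicus", "T. flammatus", "T. bihamatus", "S1", "S2", "S3"]
scores = np.array([
    [-2.0, 0.5],
    [0.3, -1.8],
    [1.2, 0.4],
    [-1.9, 0.6],
    [1.1, 0.55],
    [1.3, 0.2],
])

clusters = agglomerate(scores, n_clusters=3)
for members in clusters:
    print([labels[i] for i in members])
```

Stopping at as many clusters as there are standards makes the allocation easy to read: each unknown is allocated to whichever species standard ends up in its cluster.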


Alternative approaches

For a small set of continuous measurements recorded on very different scales, such as these song features, no other method works as well as a PCA using the correlation matrix.