Demonstration data set: Petrology.csv
Reference: P. C. Ragland, J. F. Conley, W. C. Parker, and J. A. Van Orman, 1997, Use of principal components analysis in petrology: an example from the Martinsville igneous complex, Virginia, U.S.A. Mineralogy and Petrology 60:165-184.
This example is based on the study by Ragland et al. (1997). This paper examined the utility of PCA for the analysis of the relationship between geological structures using a chemical dataset for the Martinsville igneous complex (MIC), Virginia, USA. The study sought to answer 4 main questions:
Preliminary data examination and transformation
The data set comprised the percentage weight of the oxides of 10 major elements and the concentration in parts per million of 3 trace elements (Rb, Sr and Zr). The authors checked the variables for normality, found that MgO was not normally distributed, and log-transformed this variable. In fact, this transformation makes no difference to the ordination or the resulting conclusions.
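The source does not say which normality check the authors used. As a rough illustration only, the sketch below compares the skewness of a right-skewed variable before and after a log transformation. The data are synthetic (they are not the values in Petrology.csv), and the names `skewness` and `mgo` are assumptions made for this example.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# Hypothetical stand-in for a right-skewed oxide column (NOT the MIC data).
rng = np.random.default_rng(0)
mgo = rng.lognormal(mean=0.5, sigma=1.0, size=200)

skew_raw = skewness(mgo)             # strongly positive for right-skewed data
skew_log = skewness(np.log10(mgo))   # much closer to zero after the transformation

print(f"skewness raw: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

A skewness near zero is only a crude indicator of symmetry; a formal normality test or a probability plot would normally accompany it.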
The use of the correlation matrix
PCA was undertaken on the correlation matrix; this was essential if the ordination was not to be dominated by the three variables with the largest variance because of their magnitude (Zr, Sr and Mg). This is the correct choice if it is believed that all elements can potentially contribute equally to the study of the relationships between the rocks.
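The effect of variable magnitude can be seen in a small sketch. It assumes synthetic data in which one column is on a much larger scale than the others (a stand-in for a ppm variable beside wt % variables), and the helper name `pca` is hypothetical. It is not the authors' procedure, only an illustration of why the choice of matrix matters.

```python
import numpy as np

def pca(X, use_correlation=True):
    """Eigen-decompose the correlation (or covariance) matrix of X.

    X has samples in rows and variables in columns. Returns the eigenvalues
    (largest first) and the matching eigenvectors as columns.
    """
    if use_correlation:
        M = np.corrcoef(X, rowvar=False)
    else:
        M = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigh because M is symmetric
    order = np.argsort(eigvals)[::-1]      # sort from largest to smallest
    return eigvals[order], eigvecs[:, order]

# Synthetic data (NOT the MIC data): three small-scale columns and one
# large-scale column, all statistically independent.
rng = np.random.default_rng(1)
scale = np.array([1.0, 1.0, 1.0, 100.0])
offset = np.array([50.0, 10.0, 5.0, 300.0])
X = rng.normal(size=(60, 4)) * scale + offset

cov_vals, cov_vecs = pca(X, use_correlation=False)
cor_vals, cor_vecs = pca(X, use_correlation=True)

cov_share = cov_vals[0] / cov_vals.sum()   # axis 1 share, covariance matrix
cor_share = cor_vals[0] / cor_vals.sum()   # axis 1 share, correlation matrix

print("axis 1 loadings (covariance):", np.round(np.abs(cov_vecs[:, 0]), 3))
print("axis 1 share (covariance):   ", round(cov_share, 3))
print("axis 1 share (correlation):  ", round(cor_share, 3))
```

With the covariance matrix, the first axis is almost entirely the large-scale column; with the correlation matrix, every variable has unit variance and no single column dominates merely because of its units.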
Results
As shown in the table below, the first 2 axes explained about 72.9% of the total variability in the data set. The sum of all the eigenvalues, which is a measure of the total variability, is 14, which is simply the number of variables used in the analysis, because the correlation matrix was used. Therefore the percentage variability explained by the largest eigenvalue is 7.28/14 x 100 = 52.01%. The first 3 dimensions are probably meaningful (eigenvalues > 1).
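The arithmetic above can be checked with a short sketch. It uses synthetic data with 14 variables (not the values in Petrology.csv) to show that the eigenvalues of a correlation matrix sum to the number of variables, and then repeats the percentage calculation with the two figures quoted in the text. All variable names are assumptions made for this example.

```python
import numpy as np

# Synthetic data with 14 variables, as in the text; the values are invented.
rng = np.random.default_rng(2)
n_samples, n_vars = 80, 14
X = rng.normal(size=(n_samples, n_vars))
X[:, 1] += 0.8 * X[:, 0]   # induce some correlation between two columns

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first

total = eigvals.sum()                 # equals trace(R), i.e. the 14 unit diagonals
percent = 100 * eigvals / total       # percentage variability per axis
cumulative = np.cumsum(percent)       # running total across axes
n_kaiser = int((eigvals > 1).sum())   # axes passing the "eigenvalue > 1" rule

# The calculation quoted in the text, using the rounded eigenvalue 7.28.
pct_axis1 = 7.28 / 14 * 100

print("sum of eigenvalues:", round(total, 6))
print("axes with eigenvalue > 1:", n_kaiser)
print("7.28 / 14 x 100 =", round(pct_axis1, 2))
```

Using the rounded eigenvalue 7.28 gives 52.0; the 52.01% quoted above presumably reflects the unrounded eigenvalue.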
The authors reported a higher percentage of the variability explained by the first two axes probably because they combined the percentage composition for the two iron oxides into a single variable. However, this makes little difference to the ordination produced.
These results show that much of the variability in chemical composition can be expressed in 2 dimensions.
The plot of the eigenvectors (Fig 1) shows that Principal Axis 1 arranges the samples so that those with the highest concentrations of Ca, Fe, Al, Mn, Sr, P and Ti are towards the left (negative) and those with the highest concentrations of Si, Rb and K are towards the right (positive). Axis 2 is a measure of Zr and Na concentration, with the greatest concentrations at the bottom (negative direction) of the axis. The authors recognised 5 groups of eigenvectors: 1) Si, Rb, K; 2) Ca, Mg; 3) Fe, Al, Sr, Mn; 4) P, Ti; and 5) Na, Zr. When grouping eigenvectors you must consider the angle between them, not the length of the vector. The present results would suggest that P and Ti make a poor group, but they can be viewed as intermediate between {Zr, Na} and {Fe, Al, Sr, Mn}.
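The point about angle versus length can be illustrated with a minimal sketch. The loading values below are invented for the example and are not read from Fig 1; the function name `angle_deg` is likewise an assumption.

```python
import numpy as np

def angle_deg(u, v):
    """Angle in degrees between two 2-D vectors, independent of their lengths."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    # arctan2 of |cross| and dot is numerically stable for small and large angles
    return float(np.degrees(np.arctan2(abs(cross), dot)))

# Hypothetical loadings on axes 1 and 2 (illustrative only).
long_vec = (0.30, 0.05)
short_vec = (0.06, 0.01)    # same direction as long_vec, one fifth the length
other_vec = (-0.05, 0.30)   # perpendicular to long_vec

same = angle_deg(long_vec, short_vec)
perp = angle_deg(long_vec, other_vec)

print(f"long vs short: {same:.1f} degrees")
print(f"long vs other: {perp:.1f} degrees")
```

The two vectors that differ only in length are separated by essentially zero degrees and would belong to the same group, whereas the perpendicular vector would not, whatever its length.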
An examination of the 2D plots (Fig 2) shows a clear clustering of the rock samples. When the samples are grouped according to their mineralogy it is clear that the PCA ordination based on chemistry produces a similar classification. For example, the syenodiorite can best be distinguished by its relatively high Na and Zr contents, the granites are characterized by relatively high Si, Rb, and K, and the Rich Acres gabbros are relatively enriched in Mg, Ca, and Fe. The authors do not consider these findings “particularly a surprise” and if the PCA “only confirmed the mineralogical groupings and chemical differences easily apparent, they would be of limited value.” The particular value of the PCA is in showing relationships and hybrids. For instance, the hybrid Leatherwood rocks (blue squares) are intermediate in composition between the granites (yellow squares) and the diorites (red squares).
Fig 1: A plot of the eigenvectors.
Fig 2: PCA of the rock samples.
Conclusions
Ragland et al. (1997) concluded that Principal Component Analysis was useful: “... PCA is an insightful tool in petrology and geochemistry and is recommended as a first-step, exploratory technique for a dataset of chemical analyses. It allows the researcher to determine which of the original variables may be the most useful in characterizing the dataset.” Further, it is capable of identifying possible relationships and hybrids and thus can be used as an aid when generating hypotheses about relationships between and origins of rocks.
Alternative approaches
Ragland et al. (1997) used the correlation matrix for the PCA, which had the effect of giving equal weighting to every element. The plot below shows the ordination of the samples using the variance-covariance matrix calculated with all variables log-transformed. It is interesting to note that essentially the same clusters are formed, but the eigenvectors show a number of tight pairs: {Zr, Na}, {K, Rb}, {Mn, Fe} and {Ca, Mg}. This plot also shows that it is possible to place the samples and the variable eigenvectors on the same plot. Some authors, including Ragland et al. (1997), plot only the apex of the eigenvectors. This practice is best avoided; the relationship between eigenvectors given by their angular difference is more easily studied if they are plotted as vectors.
Fig 3: PCA for petrology example based on the covariance matrix.
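The alternative analysis can be sketched as follows. This is a minimal illustration that assumes synthetic positive data (not the values in Petrology.csv) and base-10 logarithms; the source does not state which log base was used, and all variable names are hypothetical. It computes the two sets of coordinates needed to draw samples and variable eigenvectors on one plot.

```python
import numpy as np

# Synthetic positive "concentrations": 40 samples x 5 variables (NOT the MIC data).
rng = np.random.default_rng(3)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(40, 5))

logged = np.log10(X)                       # log-transform every variable
centred = logged - logged.mean(axis=0)     # centre, but do not standardise

C = np.cov(centred, rowvar=False)          # variance-covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = centred @ eigvecs[:, :2]   # sample coordinates on axes 1 and 2 (points)
loadings = eigvecs[:, :2]           # variable components on axes 1 and 2 (vectors)

# With the covariance matrix the eigenvalues sum to the total variance of the
# logged data, not to the number of variables.
total_variance = np.trace(C)

print("total variance:", round(total_variance, 4))
print("sum of eigenvalues:", round(eigvals.sum(), 4))
```

To follow the recommendation above, each row of `loadings` would be drawn as a line from the origin to its coordinates, usually rescaled by a common factor so the vectors are visible against the spread of the `scores`.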