Data set statistics 
Summary statistics for both the raw and working data sets are displayed by clicking on the Summary tab.
You can choose summary statistics for either the raw or working data sheets. Statistics can be generated for rows, columns and the user-defined groups in your data, along with general summary statistics for the whole data set. These statistics are particularly useful before undertaking many analyses. For example, PCA should not be undertaken on a data set that contains a large number of zeros. Use Summary statistics to determine the number of zero elements in your data. The group statistics are also useful when summarising the results of a Linear Discriminant Analysis (also called Canonical Variate Analysis).
When first activated, the data grid will display the following general statistics for the working data:
No. of Variables (rows)  This is the number of rows of data in the data set.
No. of Samples (cols)  This is the number of columns in the data set.
No. of zero cells  This is the number of zero entries in the data matrix.
No. of nonzero cells  This is the number of nonzero entries in the data matrix.
% of zero cells  This is the number of zero entries divided by the number of cells in the data matrix, expressed as a percentage.
Maximum value  This is the maximum value in the data matrix.
Minimum value  This is the minimum value in the data matrix.
Range  This is the difference between the maximum and minimum values.
Mean  This is the mean of all the values in the data matrix.
Standard deviation  This is the standard deviation of all the values in the data matrix.
Median  This is the median of all the values in the data matrix.
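The general statistics listed above can be reproduced outside the program to check a data set by hand. The following is a minimal sketch in Python, not the program's own code. The function name general_statistics and the example matrix are hypothetical, and the sample (n - 1) form of the standard deviation is an assumption, since this page does not state which form is used.

```python
import statistics


def general_statistics(matrix):
    """Summary statistics over every cell of a variables-by-samples matrix.

    `matrix` is a list of rows (variables); each row holds one value
    per column (sample).
    """
    cells = [value for row in matrix for value in row]
    zeros = sum(1 for value in cells if value == 0)
    return {
        "variables": len(matrix),
        "samples": len(matrix[0]) if matrix else 0,
        "zero_cells": zeros,
        "nonzero_cells": len(cells) - zeros,
        "pct_zero_cells": 100.0 * zeros / len(cells),
        "maximum": max(cells),
        "minimum": min(cells),
        "range": max(cells) - min(cells),
        "mean": statistics.mean(cells),
        # Assumption: sample standard deviation (divides by n - 1).
        "std_dev": statistics.stdev(cells),
        "median": statistics.median(cells),
    }


# Hypothetical data: 3 variables (rows) by 4 samples (columns).
example = [
    [0, 2, 0, 4],
    [1, 0, 3, 0],
    [5, 0, 0, 1],
]
stats = general_statistics(example)
for name, value in stats.items():
    print(f"{name}: {value}")
```

A high value of pct_zero_cells in such a check is the kind of warning sign to look for before running PCA.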
To obtain general statistics on the raw data, click the Raw Data radio button in the 'Use' panel situated below the grid.
Statistics for the individual rows, columns and groups of either the raw or working data matrix are selected using the Statistics radio buttons situated below the grid.
Values for each row or column are shown in the order they appear in the original data matrix.
The statistics calculated for rows and columns are as follows:
Mean  This is the mean of all the values in each row or column in the data matrix.
Median  This is the median of all the values in each row or column in the data matrix.
Max  This is the maximum value in each row or column in the data matrix.
Min  This is the minimum value in each row or column in the data matrix.
Zeros  This is the number of zero entries in each row or column in the data matrix.
Nonzeros  This is the number of nonzero entries in each row or column in the data matrix.
% Zeros  (Number of zeros/Total number of cells in the row or column) * 100
Sum  This is the sum of all the values in each row or column in the data matrix.
Sum Sqr  This is the sum of squares of all the values in each row or column in the data matrix.
Variance  This is the variance of all the values in each row or column in the data matrix.
Skewness  This is the skewness of all the values in each row or column in the data matrix.
Kurtosis  This is the kurtosis of all the values in each row or column in the data matrix.
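The row and column statistics can be sketched in the same way. The example below is illustrative only: the names vector_statistics and row_and_column_statistics are hypothetical, and the formulas for variance (sample, n - 1), skewness (third central moment divided by the 1.5 power of the second) and kurtosis (excess kurtosis, fourth central moment divided by the squared second, minus 3) are assumptions, because this page does not define which variants the program uses.

```python
import statistics


def vector_statistics(values):
    """Statistics for a single row or column of the data matrix."""
    n = len(values)
    zeros = sum(1 for v in values if v == 0)
    mean = sum(values) / n
    # Central moments in population form (divide by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        "zeros": zeros,
        "nonzeros": n - zeros,
        "pct_zeros": 100.0 * zeros / n,
        "sum": sum(values),
        "sum_sqr": sum(v * v for v in values),
        # Assumption: sample variance (divides by n - 1).
        "variance": statistics.variance(values) if n > 1 else 0.0,
        # Assumption: moment-based skewness and excess kurtosis;
        # both are reported as 0.0 when all values are identical.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }


def row_and_column_statistics(matrix):
    """Return (row_stats, column_stats) in original matrix order."""
    row_stats = [vector_statistics(list(row)) for row in matrix]
    column_stats = [vector_statistics(list(col)) for col in zip(*matrix)]
    return row_stats, column_stats


# Hypothetical data: 3 variables (rows) by 4 samples (columns).
example = [
    [0, 2, 0, 4],
    [1, 0, 3, 0],
    [5, 0, 0, 1],
]
rows, cols = row_and_column_statistics(example)
print(rows[0])
print(cols[0])
```

Note that each statistic here is computed over a single row or column, not over the whole matrix, so % Zeros uses the length of that row or column as its denominator.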
For groups, the arithmetic mean and variance of each variable within each group are presented. For example, a user-defined group of samples called Ashley Rails had a mean Aluminium percentage of 17.32%. Group statistics are only available with Raw Data.
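As an illustration of the group statistics, the sketch below computes the mean and variance of each variable within each user-defined group of samples. It is not the program's implementation: the function name group_statistics, the variable names, the group labels and all data values are hypothetical, and the use of the sample (n - 1) variance is an assumption.

```python
import statistics
from collections import defaultdict


def group_statistics(matrix, variable_names, sample_groups):
    """Mean and variance of each variable within each group of samples.

    `matrix` has one row per variable and one column per sample;
    `sample_groups` gives the group label of each column.
    """
    columns_by_group = defaultdict(list)
    for column, group in enumerate(sample_groups):
        columns_by_group[group].append(column)

    result = {}
    for group, columns in columns_by_group.items():
        result[group] = {}
        for name, row in zip(variable_names, matrix):
            values = [row[c] for c in columns]
            result[group][name] = {
                "mean": statistics.mean(values),
                # Assumption: sample variance (divides by n - 1).
                "variance": (
                    statistics.variance(values) if len(values) > 1 else 0.0
                ),
            }
    return result


# Hypothetical raw data: 2 variables by 4 samples in 2 groups.
variables = ["Al", "Fe"]
groups = ["Group A", "Group A", "Group B", "Group B"]
raw = [
    [10.0, 14.0, 20.0, 22.0],  # Al
    [1.0, 3.0, 5.0, 5.0],      # Fe
]
by_group = group_statistics(raw, variables, groups)
for group, per_variable in by_group.items():
    print(group, per_variable)
```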
