Data set statistics

Summary statistics for both the raw and working data sets are displayed by clicking on the Summary tab.

 

summary tab

You can choose summary statistics for either the raw or the working data sheet. Statistics can be generated for the data set as a whole (general summary statistics), for individual rows or columns, and for the user-defined groups in your data. These statistics are particularly useful before undertaking many analyses. For example, PCA should not be undertaken on a data set that comprises large numbers of zeros; use Summary statistics to determine the number of zero elements in your data. The group statistics are also useful when summarising the results of a Linear Discriminant Analysis (also called Canonical Variate Analysis).

 

When the Summary tab is first activated, the data grid displays the following general statistics for the working data:

 

summary2

 

No. of Variables (rows) - This is the number of rows of data in the data set.

No. of Samples (cols) - This is the number of columns in the data set.

No. of zero cells - This is the number of zero entries in the data matrix.

No. of non-zero cells - This is the number of non-zero entries in the data matrix.

% of zero cells - This is the number of zero entries divided by the total number of cells in the data matrix, expressed as a percentage.

Maximum value - This is the maximum value in the data matrix.

Minimum value - This is the minimum value in the data matrix.

Range - This is the difference between the maximum and minimum values.

Mean - This is the mean of all the values in the data matrix.

Standard deviation - This is the standard deviation of all the values in the data matrix.

Median - This is the median of all the values in the data matrix.
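
The general statistics listed above can be reproduced outside the program as a cross-check. The following is a minimal Python sketch, not the program's own code. The function name general_statistics and the example matrix are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since this page does not state which form is used.

```python
import statistics


def general_statistics(matrix):
    """Summarise a data matrix held as a list of rows (variables),
    each row holding one value per column (sample)."""
    values = [v for row in matrix for v in row]  # flatten every cell
    n_cells = len(values)
    n_zero = sum(1 for v in values if v == 0)
    return {
        "variables_rows": len(matrix),
        "samples_cols": len(matrix[0]) if matrix else 0,
        "zero_cells": n_zero,
        "non_zero_cells": n_cells - n_zero,
        "pct_zero_cells": 100.0 * n_zero / n_cells,
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "standard_deviation": statistics.stdev(values),
        "median": statistics.median(values),
    }


# Hypothetical matrix: 3 variables (rows) by 4 samples (columns).
data = [
    [0, 2, 0, 4],
    [1, 0, 0, 3],
    [0, 0, 5, 0],
]
stats = general_statistics(data)
print(stats["zero_cells"], stats["non_zero_cells"], stats["mean"])
```

A high value of pct_zero_cells is the kind of warning sign mentioned earlier for methods such as PCA.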

 

To obtain general statistics on the raw data, select the Raw Data radio button in the 'Use' panel situated below the grid.

 

Summary options2

 

Statistics for the individual rows, columns and groups of either the raw or working data matrix are selected using the Statistics radio buttons situated below the grid.

 

Values for each row or column are shown in the order they appear in the original data matrix:

 

summary data row2

 

The statistics for rows and columns calculated are as follows:

Mean - This is the mean of all the values in each row or column in the data matrix.

Median - This is the median of all the values in each row or column in the data matrix.

Max - This is the maximum value in each row or column in the data matrix.

Min - This is the minimum value in each row or column in the data matrix.

Zeros - This is the number of zero entries in each row or column in the data matrix.

Non-zeros - This is the number of non-zero entries in each row or column in the data matrix.

% Zeros - (Number of zeros in the row or column / Total number of cells in that row or column) * 100

Sum - This is the sum of all the values in each row or column in the data matrix.

Sum Sqr - This is the sum of squares of all the values in each row or column in the data matrix.

Variance - This is the variance of all the values in each row or column in the data matrix.

Skewness - This is the skewness of all the values in each row or column in the data matrix.

Kurtosis - This is the kurtosis of all the values in each row or column in the data matrix.
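
As an illustration of the row and column statistics listed above, the Python sketch below computes them for a single row or column. It is a sketch under stated assumptions, not the program's implementation: the function name line_statistics is hypothetical, Sum Sqr is taken to be the sum of the squared raw values, the variance uses the n - 1 denominator, and skewness and kurtosis use the simple moment-based forms (with kurtosis reported as excess kurtosis). The program may use different conventions.

```python
import math
import statistics


def line_statistics(values):
    """Statistics for one row or one column of the data matrix."""
    n = len(values)
    n_zero = sum(1 for v in values if v == 0)
    mean = sum(values) / n
    # Central moments about the mean (population form, denominator n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        "zeros": n_zero,
        "non_zeros": n - n_zero,
        "pct_zeros": 100.0 * n_zero / n,
        "sum": sum(values),
        # Assumption: sum of squared raw values, not squared deviations.
        "sum_sqr": sum(v * v for v in values),
        # Assumption: sample variance (n - 1 denominator).
        "variance": statistics.variance(values),
        # Assumption: moment-based skewness and excess kurtosis.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else math.nan,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else math.nan,
    }


# Hypothetical matrix: rows are variables, columns are samples.
data = [
    [0, 2, 0, 4],
    [1, 2, 3, 0],
]
row_stats = [line_statistics(row) for row in data]
# zip(*data) transposes the matrix so each tuple is one column.
col_stats = [line_statistics(list(col)) for col in zip(*data)]
print(row_stats[0]["sum"], row_stats[0]["sum_sqr"], col_stats[0]["zeros"])
```

Results are returned in the same order as the rows or columns of the input, matching the ordering described for the grid.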

 

For groups, the arithmetic mean and variance of each variable within each group are presented. In the example below, a user-defined group of samples called Ashley Rails had a mean Aluminium percentage of 17.32%. Group statistics are only available with Raw Data.

 

summary groups2
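
The group statistics described above amount to splitting each variable's values by the group its sample belongs to, then taking the mean and variance within each group. A minimal Python sketch follows. The function name group_statistics, the group labels and the data values are all hypothetical (they are not taken from the example data set), and the n - 1 variance denominator is an assumption.

```python
import math
import statistics
from collections import defaultdict


def group_statistics(matrix, variable_names, sample_groups):
    """Mean and variance of every variable within every group.

    matrix         -- list of rows; one row per variable, one value per sample
    variable_names -- one name per row
    sample_groups  -- one group label per column (sample)
    """
    result = {}
    for name, row in zip(variable_names, matrix):
        by_group = defaultdict(list)
        for value, group in zip(row, sample_groups):
            by_group[group].append(value)
        for group, vals in by_group.items():
            result[(group, name)] = {
                "mean": statistics.mean(vals),
                # Assumption: sample variance; undefined for one sample.
                "variance": statistics.variance(vals) if len(vals) > 1 else math.nan,
            }
    return result


# Hypothetical raw data: 2 variables measured on 5 samples in 2 groups.
names = ["Al", "Fe"]
raw = [
    [10.0, 12.0, 20.0, 22.0, 24.0],
    [1.0, 3.0, 5.0, 5.0, 5.0],
]
groups = ["Group A", "Group A", "Group B", "Group B", "Group B"]
summary = group_statistics(raw, names, groups)
print(summary[("Group A", "Al")])
```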