Maximum size of the data set and computation speed

In CAP 5, array handling technology is used to enable the analysis of very large data sets.

In theory, the size of the input data matrix is unlimited, but in practice the memory available on the PC is usually the limiting factor. In tests, we have run CAP on a data array of 5000 variables × 5000 samples without difficulty.
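
To give a feel for why memory becomes the limit, the short sketch below estimates the space needed to hold a dense matrix of the size mentioned above. It assumes that each value is stored as an 8-byte double-precision number. This is an illustrative assumption, not a description of CAP's internal storage. It also counts a full sample-by-sample similarity matrix, which distance-based methods such as cluster analysis may build from the data.

```python
# Rough memory estimate for a dense matrix, ASSUMING each value is stored
# as an 8-byte double-precision float (CAP's own storage may differ).

BYTES_PER_VALUE = 8  # assumption: 64-bit floating point


def matrix_bytes(rows: int, cols: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Return the bytes needed to hold a dense rows x cols matrix."""
    return rows * cols * bytes_per_value


def to_megabytes(n_bytes: int) -> float:
    """Convert bytes to megabytes (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000


# The 5000 variables x 5000 samples data array mentioned in the text.
data = matrix_bytes(5000, 5000)
# A full 5000 x 5000 sample-by-sample similarity matrix is the same size again.
similarity = matrix_bytes(5000, 5000)

print(f"data matrix:       {to_megabytes(data):.0f} MB")
print(f"similarity matrix: {to_megabytes(similarity):.0f} MB")
```

Under this assumption, each matrix needs about 200 MB. That is comfortable on a modern PC, but it shows how quickly memory use grows as both dimensions increase.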

Because of its unusual computational requirements, which involve generating new data items in the form of pseudospecies, TWINSPAN uses more memory than the other methods. The maximum data set size for TWINSPAN is therefore likely to be smaller than for the other multivariate methods, although it should still be ample for most users' needs.
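
The extra memory demand comes from the pseudospecies step. The minimal sketch below shows how each species column is split into one presence/absence column per abundance cut level. It assumes five cut levels (0, 2, 5, 10, 20), a commonly quoted TWINSPAN default. The cut levels CAP actually uses may differ or may be set by the user, and the function name is hypothetical.

```python
from typing import List

# ASSUMPTION: five cut levels, a commonly quoted TWINSPAN default;
# CAP's own settings may differ.
CUT_LEVELS = [0, 2, 5, 10, 20]


def expand_to_pseudospecies(abundances: List[List[float]],
                            cut_levels: List[float] = CUT_LEVELS) -> List[List[int]]:
    """Hypothetical helper: turn a samples x species abundance table into a
    presence/absence table with one column per (species, cut level) pair.

    Pseudospecies k of a species is present in a sample when the species'
    abundance is greater than 0 and at least cut_levels[k].
    """
    expanded = []
    for sample in abundances:
        row = []
        for value in sample:
            for cut in cut_levels:
                row.append(1 if value > 0 and value >= cut else 0)
        expanded.append(row)
    return expanded


# Two samples, three species.
table = [[7, 0, 25],
         [1, 3, 0]]
pseudo = expand_to_pseudospecies(table)

print(len(table[0]), "species ->", len(pseudo[0]), "pseudospecies")
print(pseudo[0][:5])  # species 1 in sample 1 (abundance 7)
```

With five cut levels, the 3 species become 15 pseudospecies. In general, the working table is several times wider than the original data, which is why TWINSPAN reaches the memory limit sooner.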

Our approach to data handling means that, on a reasonably modern PC, most computations, even on very large data sets, are completed within a few seconds. Computations on data sets of up to 100 samples by 100 variables are usually almost instantaneous.
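
One reason small data sets run so quickly, and large ones take longer, is that methods based on a similarity or distance matrix must compare every pair of samples. The sketch below counts those pairs. It is meant only to show how the workload grows with the number of samples and is not a model of CAP's actual run times.

```python
def pair_count(n_samples: int) -> int:
    """Distinct unordered pairs among n_samples: n(n - 1) / 2."""
    return n_samples * (n_samples - 1) // 2


# Illustrative sizes only; each pair comparison also visits every variable.
for n in (100, 1000, 5000):
    print(f"{n:>5} samples: {pair_count(n):>12,} pairs")
```

For 100 samples there are 4,950 pairs. With 100 variables, that is about half a million per-variable comparisons, which a modern processor handles in a negligible time. At 5000 samples the pair count rises to about 12.5 million.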

If you run into speed problems with very large data sets, closing any other Windows applications that are running will generally help.