Principal Component Analysis (PCA) is a widely used statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (sometimes called principal modes of variation). PCA can be computed in two ways:
- Eigenvalue decomposition of the data covariance (or correlation) matrix
- Singular value decomposition (SVD) of the data matrix
The data matrix is usually normalized first by centering each attribute on its mean.
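The two routes yield the same principal directions. The short derivation below uses notation introduced here for illustration only, not taken from the Harp-DAAL or Intel DAAL documentation: $X_c$ is the centered $n \times p$ data matrix with $n \ge p$, $C$ is the sample covariance matrix, and $X_c = U \Sigma V^\top$ is a thin SVD with $\Sigma$ a $p \times p$ diagonal matrix of singular values $\sigma_i$.

$$
C = \frac{1}{n-1} X_c^\top X_c
  = \frac{1}{n-1} V \Sigma U^\top U \Sigma V^\top
  = V \, \frac{\Sigma^2}{n-1} \, V^\top
$$

Under these assumptions the columns of $V$ (the right singular vectors of $X_c$) are the eigenvectors of $C$, and each eigenvalue is $\lambda_i = \sigma_i^2 / (n-1)$.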
Harp-DAAL provides two methods for computing PCA:
SVD Based PCA
The input is a set of $p$-dimensional dense vectors. The DAAL kernel invokes an SVD to find the $p_r$ principal directions (eigenvectors). Details of the SVD kernel are given in the Intel DAAL documentation.
Harp-DAAL provides a distributed mode for SVD based PCA on dense input datasets.
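To make the SVD route concrete, here is a minimal single-machine sketch in plain Python. It is not the Harp-DAAL or Intel DAAL API, and it does not use DAAL's algorithm: it swaps in power iteration to approximate only the leading right singular vector of the centered data. The function names (`center_columns`, `leading_direction`) and the toy data are hypothetical.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so every attribute has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in rows]

def leading_direction(rows, iters=200):
    """Approximate the top right singular vector of the centered data.

    Power iteration is an illustrative stand-in for a full SVD.
    Assumes the data are not constant and that the start vector is
    not orthogonal to the leading direction.
    """
    xc = center_columns(rows)
    n, p = len(xc), len(xc[0])
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length start vector
    for _ in range(iters):
        # w = Xc^T (Xc v), computed without forming the p x p matrix
        proj = [sum(r[j] * v[j] for j in range(p)) for r in xc]
        w = [sum(xc[i][j] * proj[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The leading singular value is the length of Xc v
    sigma = math.sqrt(sum(sum(r[j] * v[j] for j in range(p)) ** 2 for r in xc))
    return v, sigma

# Toy data: five observations of two strongly correlated attributes
data = [[2.0, 1.9], [0.0, 0.1], [-2.0, -2.0], [1.0, 1.1], [-1.0, -1.1]]
direction, sigma = leading_direction(data)
print(direction, sigma)
```

For this toy data the returned direction lies close to the diagonal of the plane, since the two attributes rise and fall together. A production SVD would return all $p_r$ directions at once rather than one at a time.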
Correlation Based PCA
The input is a $p \times p$ correlation matrix, and the DAAL PCA kernel finds the $p_r$ principal directions [1]. Details are given in the Intel DAAL documentation.
Harp-DAAL provides a distributed mode for Correlation based PCA on both dense and sparse (CSR format) input datasets.
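The correlation route can be sketched the same way in plain Python. This is an illustration under stated assumptions, not the Harp-DAAL or Intel DAAL API: `correlation_matrix` and `top_eigenpairs` are hypothetical helper names, the eigenvectors are found by power iteration with deflation rather than by DAAL's kernel, and the sign rule (largest-magnitude component made positive) is one simple convention for the sign ambiguity of eigenvectors, not the method of the cited report.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix (p x p) of an n x p list of rows.

    Assumes no attribute is constant (non-zero variance everywhere).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cen = [[r[j] - means[j] for j in range(p)] for r in rows]
    scale = [math.sqrt(sum(c[j] ** 2 for c in cen)) for j in range(p)]
    return [[sum(c[a] * c[b] for c in cen) / (scale[a] * scale[b])
             for b in range(p)] for a in range(p)]

def top_eigenpairs(mat, p_r, iters=500):
    """Return the p_r largest (eigenvalue, eigenvector) pairs of a
    symmetric positive semi-definite matrix, via power iteration
    followed by deflation. Assumes distinct leading eigenvalues."""
    p = len(mat)
    work = [row[:] for row in mat]
    pairs = []
    for _ in range(p_r):
        v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary start vector
        for _ in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v gives the eigenvalue for unit v
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(p))
                  for a in range(p))
        # Simple sign convention: largest-magnitude component is positive
        if max(v, key=abs) < 0:
            v = [-x for x in v]
        pairs.append((lam, v))
        # Deflate: remove the component just found
        work = [[work[a][b] - lam * v[a] * v[b] for b in range(p)]
                for a in range(p)]
    return pairs

# Toy data whose two attributes have correlation 0.5
data = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
corr = correlation_matrix(data)
pairs = top_eigenpairs(corr, p_r=2)
for lam, vec in pairs:
    print(lam, vec)
```

The toy correlation matrix has 1 on the diagonal and 0.5 off the diagonal, so its eigenvalues are 1.5 and 0.5. Working from the correlation matrix rather than the covariance matrix gives every attribute unit variance, which matters when attributes are measured on different scales.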
1. Bro, R.; Acar, E.; Kolda, T. Resolving the sign ambiguity in the singular value decomposition. Sandia Report SAND2007-6422, Unlimited Release, October 2007.