## princomp

Principal component analysis (PCA) on data

`[COEFF,SCORE] = princomp(X)`

`[COEFF,SCORE,latent] = princomp(X)`

`[COEFF,SCORE,latent,tsquare] = princomp(X)`

`[...] = princomp(X,'econ')`

`COEFF = princomp(X)` performs principal components
analysis (PCA) on the *n*-by-*p* data
matrix `X`, and returns the principal component coefficients,
also known as loadings. Rows of `X` correspond to
observations, columns to variables. `COEFF` is a *p*-by-*p* matrix,
each column containing coefficients for one principal component. The
columns are in order of decreasing component variance.

`princomp` centers `X` by
subtracting off column means, but does not rescale the columns of `X`.
To perform principal components analysis with standardized variables,
that is, based on correlations, use `princomp(zscore(X))`.
To perform principal components analysis directly on a covariance
or correlation matrix, use `pcacov`.
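The difference between centering and standardizing can be sketched with base MATLAB functions only. The data below are hypothetical, and the block illustrates the preprocessing described above rather than the internals of `princomp`.

```matlab
% Hypothetical data: 50 observations of 3 variables on very different scales.
rng(1);
X = bsxfun(@times, randn(50,3), [100 1 0.01]);

% Centering only, as described for princomp: subtract the column means.
Xc = bsxfun(@minus, X, mean(X,1));

% Standardizing as well (the effect of zscore): divide by column standard deviations.
Z = bsxfun(@rdivide, Xc, std(X,0,1));

% Component variances from the covariance matrix (centered data)
% and from the correlation matrix (standardized data).
latentCov  = sort(eig(cov(X)), 'descend');
latentCorr = sort(eig(corrcoef(X)), 'descend');
```

With centering alone, the first component is dominated by the variable with the largest scale. After standardization every variable has unit variance, so the component variances sum to the number of variables.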

`[COEFF,SCORE] = princomp(X)` returns `SCORE`,
the principal component scores; that is, the representation of `X` in
the principal component space. Rows of `SCORE` correspond
to observations, columns to components.
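As a minimal sketch of the relationship between coefficients and scores, the scores can be reproduced by projecting the centered data onto the coefficient matrix. The example obtains the coefficients from a singular value decomposition of the centered data; treating that as equivalent to the output of `princomp` is an assumption, and the data are hypothetical.

```matlab
rng(2);
X  = randn(30,4) * randn(4,4);            % hypothetical data, n = 30, p = 4
Xc = bsxfun(@minus, X, mean(X,1));        % center the columns

% Right singular vectors of the centered data, ordered by decreasing
% singular value, play the role of COEFF here.
[~, ~, coeff] = svd(Xc, 'econ');

% Scores: the centered data expressed in the principal component basis.
score = Xc * coeff;

% Because coeff is orthogonal, the centered data can be recovered exactly.
Xrec = score * coeff';
```

The columns of `score` are uncorrelated and their variances decrease from left to right.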

`[COEFF,SCORE,latent] = princomp(X)` returns `latent`,
a vector containing the eigenvalues of the covariance matrix of `X`.
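The eigenvalue interpretation of `latent` can be checked without the toolbox. This sketch, on hypothetical data, computes the eigenvalues of the sample covariance matrix directly and compares them with the squared singular values of the centered data divided by `n-1`.

```matlab
rng(3);
X  = randn(40,3) * [3 0 0; 1 2 0; 0 1 0.5];   % hypothetical data
n  = size(X,1);
Xc = bsxfun(@minus, X, mean(X,1));

% Eigenvalues of the covariance matrix, largest first.
latentEig = sort(eig(cov(X)), 'descend');

% Equivalent values from the singular values of the centered data.
s = svd(Xc);
latentSvd = s.^2 / (n-1);
```

The two vectors agree to rounding error, and their sum equals the total variance `trace(cov(X))`.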

`[COEFF,SCORE,latent,tsquare] = princomp(X)` returns `tsquare`,
which contains Hotelling's T² statistic
for each data point.

The scores are the data formed by transforming the original
data into the space of the principal components. The values of the
vector `latent` are the variances of the columns of `SCORE`.
Hotelling's T² is a measure of the multivariate
distance of each observation from the center of the data set.
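One way to make these statements concrete, assuming full-rank data, is to compute the T² statistic as the sum of squared scores, each scaled by its component variance, and to compare it with the squared Mahalanobis distance from the column means. The data and variable names are hypothetical, and this is a sketch rather than the toolbox's own computation.

```matlab
rng(4);
X  = randn(25,3) * [2 0 0; 1 1 0; 0 0.5 0.3];  % hypothetical full-rank data
n  = size(X,1);
Xc = bsxfun(@minus, X, mean(X,1));

[~, S, coeff] = svd(Xc, 'econ');
score  = Xc * coeff;
latent = diag(S).^2 / (n-1);                   % variances of the score columns

% T-squared: squared scores divided by the matching component variance,
% summed across components for each observation.
tsq = sum(bsxfun(@rdivide, score.^2, latent'), 2);

% The same quantity as a squared Mahalanobis distance from the mean.
tsqMahal = sum((Xc / cov(X)) .* Xc, 2);
```

Observations far from the center in any direction, measured relative to the spread in that direction, receive large values.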

When `n <= p`, `SCORE(:,n:p)` and `latent(n:p)` are
necessarily zero, and the columns of `COEFF(:,n:p)` define
directions that are orthogonal to the centered `X`.
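The reason is that centering removes one degree of freedom, so the centered data have rank at most `n-1`. The following sketch, on hypothetical data with `n = 4` and `p = 6`, builds the components from an eigendecomposition of the covariance matrix (a stand-in for `princomp`, not its actual implementation) and shows the trailing variances and scores vanishing.

```matlab
rng(5);
n = 4; p = 6;                              % fewer observations than variables
X  = randn(n,p);
Xc = bsxfun(@minus, X, mean(X,1));

% Eigendecomposition of the (symmetrized) sample covariance matrix.
C = cov(X);
[V, D] = eig((C + C') / 2);
[latent, order] = sort(diag(D), 'descend');
coeff = V(:, order);
score = Xc * coeff;

% Rank of the centered data is n-1, so components n..p carry no variance.
r = rank(Xc);
```

Here `r` is 3, and `latent(4:6)` and `score(:,4:6)` are zero up to rounding error.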

`[...] = princomp(X,'econ')` returns only
the elements of `latent` that are not necessarily
zero, and the corresponding columns of `COEFF` and `SCORE`,
that is, when `n <= p`, only the first `n-1`.
This can be significantly faster when `p` is much
larger than `n`.
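The saving can be illustrated with an economy-size SVD, which avoids forming any `p`-by-`p` matrix. Whether `princomp` uses exactly this route internally is an assumption; the sketch only reproduces the shape of the `'econ'` output on hypothetical data.

```matlab
rng(6);
n = 5; p = 200;                            % p much larger than n
X  = randn(n,p);
Xc = bsxfun(@minus, X, mean(X,1));

% Economy-size SVD: V is p-by-n instead of p-by-p.
[~, S, V] = svd(Xc, 'econ');

% Keep only the n-1 components whose variance is not necessarily zero.
k      = n - 1;
coeff  = V(:, 1:k);
latent = diag(S(1:k,1:k)).^2 / (n-1);
score  = Xc * coeff;
```

The retained `n-1` components still reconstruct the centered data, since the discarded directions carry no variance.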

Compute principal components for the `ingredients` data
in the Hald data set, and the variance accounted for by each component.

    load hald;
    [pc,score,latent,tsquare] = princomp(ingredients);
    pc,latent

    pc =
       -0.0678   -0.6460    0.5673    0.5062
       -0.6785   -0.0200   -0.5440    0.4933
        0.0290    0.7553    0.4036    0.5156
        0.7309   -0.1085   -0.4684    0.4844

    latent =
      517.7969
       67.4964
       12.4054
        0.2372

The following command and plot show that two components account for 98% of the variance:

    cumsum(latent)./sum(latent)

    ans =
        0.86597
        0.97886
        0.9996
        1

    biplot(pc(:,1:2),'Scores',score(:,1:2),'VarLabels',...
        {'X1' 'X2' 'X3' 'X4'})
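The same cumulative proportions can be used to pick the number of components programmatically. This sketch reuses the `latent` values printed above. The 95% cutoff and the variable names `explained`, `threshold`, and `k` are illustrative choices, not part of `princomp`.

```matlab
% Component variances from the Hald example above.
latent = [517.7969; 67.4964; 12.4054; 0.2372];

% Cumulative proportion of total variance explained.
explained = cumsum(latent) ./ sum(latent);

% Smallest number of components reaching a chosen cutoff (hypothetical 95%).
threshold = 0.95;
k = find(explained >= threshold, 1, 'first');
```

For these values `k` is 2, matching the observation that two components account for about 98% of the variance.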

For a more detailed example and explanation of this analysis method, see Principal Component Analysis (PCA).


`barttest` | `biplot` | `canoncorr` | `factoran` | `pca` | `pcacov` | `pcares` | `rotatefactors`
