principal component analysis
1. relation to svd
- Taken mostly from this stack overflow answer
- Let \(X\) be an \(n \times p\) matrix of data: \(n\) is the number of samples, and \(p\) is the number of features.
- Assume that \(X\) is centered: each feature has 0 mean.
- Then \(C = X^TX/(n-1)\) is the sample covariance matrix (usually it would be \((X-\bar{X})^T(X - \bar{X})/(n-1)\), but the mean is 0). It has size \(p \times p\) and is symmetric (covariance matrices always are).
- Because \(C\) is real and symmetric, it is orthogonally diagonalizable (see diagonalizable matrix).
- So we can write \(C = VLV^T\), with \(V\) orthogonal and \(L\) diagonal. This factorization isn't unique, but in any factorization of this form the columns of \(V\) are eigenvectors of \(C\) and the diagonal entries of \(L\) are the corresponding eigenvalues (see diagonalizable matrix).
- The columns of \(V\) (the eigenvectors) are called the principal directions.
- \(XV\) gives the coordinates of the data in the basis formed by the principal directions (the principal components, or scores). Think of taking a row of \(X\), one data point, and taking its dot product with each of the directions. A numpy sketch of this appears after the list.
- How does this relate to the SVD?
- Take the singular value decomposition of \(X\): \(X = U\Sigma V^T\). Then \(C = X^TX/(n-1) = V \Sigma U^T U \Sigma V^T/(n-1) = V \frac{\Sigma^2}{n-1} V^T\), since the columns of \(U\) are orthonormal, so \(U^TU = I\).
- Any factorization of this form means that \(V\) holds eigenvectors of \(C\), so the \(V\) from the SVD gives the principal directions, and the eigenvalues of \(C\) are \(\lambda_i = \sigma_i^2/(n-1)\). Also, \(XV = U\Sigma V^T V = U\Sigma\), so the principal components can be read directly off \(U\Sigma\). The second sketch below checks this numerically.
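A minimal numpy sketch of the eigendecomposition route above. The random data, the sizes, and the variable names (they just mirror the note: \(X\), \(C\), \(V\), \(L\)) are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                       # n samples, p features (arbitrary choices)
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)              # center: each feature has 0 mean

C = X.T @ X / (n - 1)               # p x p sample covariance matrix

# C is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
L, V = np.linalg.eigh(C)            # columns of V are the principal directions
order = np.argsort(L)[::-1]         # sort by decreasing variance
L, V = L[order], V[:, order]

scores = X @ V                      # coordinates of the data in the principal basis
print(L)                            # variance along each principal direction
```

The sort is needed because `eigh` returns eigenvalues in ascending order, while PCA conventionally lists directions by decreasing variance.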
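And a sketch of the SVD route, checking numerically that the SVD's \(V\) and \(\Sigma^2/(n-1)\) diagonalize \(C\) and that \(XV = U\Sigma\). Again the data is random and illustrative; the setup lines repeat the previous sketch only so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
C = X.T @ X / (n - 1)

# Thin SVD: X = U Sigma V^T, so C = V (Sigma^2 / (n-1)) V^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                            # principal directions
lam = s**2 / (n - 1)                # eigenvalues of C

print(np.allclose(V @ np.diag(lam) @ V.T, C))   # V and lam diagonalize C
print(np.allclose(X @ V, U * s))                # scores: X V equals U Sigma
```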