Let us consider four two-dimensional data points x1 = (1, 1), x2 = (1, -1), x3 = (-1, -1), and x4 = (-1, 1). Use PCA to project these four data points into a one-dimensional space.
In [1]:
import numpy
Let's represent the data points as columns of a matrix X.
In [2]:
X = numpy.zeros((2, 4), dtype=float)
X[:, 0] = numpy.array([1, 1])
X[:, 1] = numpy.array([1, -1])
X[:, 2] = numpy.array([-1, -1])
X[:, 3] = numpy.array([-1, 1])
In [3]:
print(X)
[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]
Now let's compute the covariance matrix S = (1/N) * sum_i (x_i - m)(x_i - m)^T, where m is the mean of the data points and N = 4. First, the mean:
In [19]:
mean = numpy.mean(X, axis=1)  # per-row mean, i.e. the mean data point
In [20]:
mean
Out[20]:
array([0., 0.])
In [23]:
Y = numpy.zeros((2, 4), dtype=float)
for i in range(4):
    Y[:, i] = X[:, i] - mean  # center each data point by subtracting the mean
In [24]:
print(Y)
[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]
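As an aside, the centering loop can be replaced by a single broadcasted subtraction (a minimal sketch; it produces the same Y):
In [ ]:
Y = X - mean[:, numpy.newaxis]  # subtract the mean column vector from every column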
In [44]:
S = numpy.zeros((2, 2), dtype=float)
for i in range(4):
    # accumulate the outer product of each centered point with itself
    S = S + numpy.outer(Y[:, i], Y[:, i])
In [45]:
S
Out[45]:
array([[4., 0.],
       [0., 4.]])
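Equivalently, the sum of outer products collapses into one matrix product (a sketch only; the cells below continue with the loop result):
In [ ]:
S = numpy.dot(Y, Y.T)  # same 2x2 matrix as the accumulation loop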
In [47]:
S = 0.25 * S  # divide by the number of data points, N = 4
In [48]:
S
Out[48]:
array([[1., 0.],
       [0., 1.]])
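The same matrix can be obtained directly with numpy.cov; passing bias=True makes it divide by N rather than N - 1, matching the 0.25 factor above:
In [ ]:
numpy.cov(X, bias=True)  # rows are variables, columns are observations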
In [50]:
numpy.linalg.eig(S)
Out[50]:
(array([1., 1.]),
 array([[1., 0.],
        [0., 1.]]))
We get the eigenvalue 1 twice, with eigenvectors (1, 0) and (0, 1). Since there is no unique maximum eigenvalue, we can select either one. Let's select the first eigenvector p = (1, 0); the projections are then the inner products of p with each of x1, x2, x3, and x4, computed below.
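As an aside, the leading eigenvector can be selected programmatically rather than by hand (a minimal sketch; the tie between the two equal eigenvalues here is broken arbitrarily by argmax):
In [ ]:
values, vectors = numpy.linalg.eig(S)
p = vectors[:, numpy.argmax(values)]  # eigenvector column with the largest eigenvalue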
In [52]:
p = numpy.array([1, 0])
Z = numpy.dot(X.T, p)  # inner product of p with every column of X
In [53]:
print(Z)
[ 1.  1. -1. -1.]
Therefore, x1 and x2 are projected to 1 and x3 and x4 are projected to -1; in the original two-dimensional space these correspond to the points (1, 0) and (-1, 0) on the line spanned by p.
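As a sanity check, the same projection can be computed with scikit-learn's PCA (assuming scikit-learn is installed; it expects samples as rows, and the sign of the recovered component may be flipped):
In [ ]:
from sklearn.decomposition import PCA

pca = PCA(n_components=1)
Z_sk = pca.fit_transform(X.T)  # samples as rows; returns a (4, 1) array
print(Z_sk.ravel())            # matches Z up to an overall sign flip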