Let us consider four two-dimensional data points x1 = (1, 1), x2 = (1, -1), x3 = (-1, -1), and x4 = (-1, 1). Use PCA to project these four data points into a one-dimensional space.
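
In outline, PCA proceeds as follows: centre the data, compute the covariance matrix of the centred points, find the eigenvector with the largest eigenvalue, and project the data onto that eigenvector. The cells below carry out these steps.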


In [1]:
import numpy

Let's represent the data points as the columns of a matrix X.


In [2]:
X = numpy.zeros((2, 4), dtype=float)
X[:, 0] = numpy.array([1, 1])
X[:, 1] = numpy.array([1, -1])
X[:, 2] = numpy.array([-1, -1])
X[:, 3] = numpy.array([-1, 1])
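
(Equivalently, the whole matrix could be built in one step with numpy.array([[1, 1, -1, -1], [1, -1, -1, 1]], dtype=float).)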

In [3]:
print(X)


[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]

Now let's compute the covariance matrix S, starting with the sample mean.
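
Recall the (biased) sample covariance of N points: S = (1/N) * sum_i (x_i - mean)(x_i - mean)^T, with N = 4 here.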


In [19]:
mean = numpy.mean(X, axis=1)  # mean of each row, i.e. the mean data point

In [20]:
mean


Out[20]:
array([ 0.,  0.])

In [23]:
Y = numpy.zeros((2, 4), dtype=float)
for i in range(4):
    # Centre each data point by subtracting the mean
    Y[:, i] = X[:, i] - mean
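
(The loop is written out for clarity; broadcasting would do the same in one line, e.g. Y = X - mean[:, numpy.newaxis].)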

In [24]:
print(Y)


[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]
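
Since the mean is the zero vector, the centred matrix Y is identical to X.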

In [44]:
S = numpy.zeros((2, 2), dtype=float)
for i in range(4):
    # Accumulate the outer product of each centred point with itself
    S = S + numpy.outer(Y[:, i], Y[:, i])

In [45]:
S


Out[45]:
array([[ 4.,  0.],
       [ 0.,  4.]])

In [47]:
S = 0.25 * S  # normalise by the number of data points, N = 4

In [48]:
S


Out[48]:
array([[ 1.,  0.],
       [ 0.,  1.]])
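
As a sanity check, numpy.cov reproduces this matrix directly when told to treat the columns of X as observations (the default) and to divide by N rather than N - 1:


In [ ]:
# bias=True divides by N (= 4) instead of N - 1.
print(numpy.cov(X, bias=True))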

In [50]:
numpy.linalg.eig(S)


Out[50]:
(array([ 1.,  1.]), array([[ 1.,  0.],
        [ 0.,  1.]]))
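
(Since S is symmetric, numpy.linalg.eigh would also work here and is generally preferred for symmetric matrices; it returns the eigenvalues in ascending order together with orthonormal eigenvectors.)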

We obtain two eigenvectors with eigenvalues 1 and 1. Because there is no unique maximum eigenvalue, either eigenvector may be selected; let's take the first one, p = (1, 0). The projected points are given by the inner product of p with each of x1, x2, x3, and x4, as follows.


In [52]:
p = numpy.array([1, 0])
# Inner product of p with every data point: Z[i] = p . x_i
Z = numpy.dot(X.T, p)

In [53]:
print(Z)


[ 1.  1. -1. -1.]

Therefore, x1 and x2 are both projected to 1 and x3 and x4 are both projected to -1 in the one-dimensional space; mapped back into the original space, these correspond to the points (1, 0) and (-1, 0) respectively.
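
To verify, each scalar projection z_i can be mapped back into the original space as z_i * p; a quick sketch using numpy.outer:


In [ ]:
# Column i of the result is Z[i] * p, the reconstruction of x_i.
print(numpy.outer(p, Z))

Each column of the result is the reconstructed point: (1, 0), (1, 0), (-1, 0), (-1, 0).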

