Let us consider four two-dimensional data points x1 = (1, 1), x2 = (1, -1), x3 = (-1, -1), and x4 = (-1, 1). Use PCA to project these four data points into a one-dimensional space.

``````

In [1]:

import numpy

``````

Let's represent the data points as columns of a matrix X.

``````

In [2]:

X = numpy.zeros((2, 4), dtype=float)
X[:, 0] = numpy.array([1, 1])
X[:, 1] = numpy.array([1, -1])
X[:, 2] = numpy.array([-1, -1])
X[:, 3] = numpy.array([-1, 1])

``````
``````

In [3]:

print(X)

``````
``````

[[ 1.  1. -1. -1.]
[ 1. -1. -1.  1.]]

``````

Now let's compute the covariance matrix S, starting from the mean of the data points.

``````

In [19]:

mean = numpy.mean(X, axis=1)  # per-dimension mean over the four points

``````
``````

In [20]:

mean

``````
``````

Out[20]:

array([ 0.,  0.])

``````
``````

In [23]:

Y = numpy.zeros((2,4), dtype=float)
for i in range(4):
    Y[:, i] = X[:, i] - mean  # center each data point

``````
``````

In [24]:

print(Y)

``````
``````

[[ 1.  1. -1. -1.]
[ 1. -1. -1.  1.]]

``````
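As a side note, the centering loop can also be written with NumPy broadcasting, avoiding the explicit loop (a sketch; for this data the mean is zero, so Y equals X):

```python
import numpy

X = numpy.array([[1., 1., -1., -1.],
                 [1., -1., -1., 1.]])
mean = numpy.mean(X, axis=1)
# Broadcasting subtracts the per-dimension mean from every column at once.
Y = X - mean[:, numpy.newaxis]
print(Y)
```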
``````

In [44]:

S = numpy.zeros((2,2), dtype=float)
for i in range(4):
    S = S + numpy.outer(Y[:, i], Y[:, i])  # accumulate outer products

``````
``````

In [45]:

S

``````
``````

Out[45]:

array([[ 4.,  0.],
[ 0.,  4.]])

``````
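The outer-product sum above is just the matrix product Y Y^T, so the loop can be replaced by a single line (a sketch using the same centered data):

```python
import numpy

Y = numpy.array([[1., 1., -1., -1.],
                 [1., -1., -1., 1.]])
# The sum of outer products of the columns of Y equals Y @ Y.T.
S = numpy.dot(Y, Y.T)
print(S)
```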
``````

In [47]:

S = 0.25 * S  # divide by the number of samples, N = 4

``````
``````

In [48]:

S

``````
``````

Out[48]:

array([[ 1.,  0.],
[ 0.,  1.]])

``````
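As a sanity check, numpy.cov with bias=True (divide by N rather than N - 1) reproduces the same covariance matrix directly:

```python
import numpy

X = numpy.array([[1., 1., -1., -1.],
                 [1., -1., -1., 1.]])
# bias=True divides by N (= 4), matching the 0.25 factor used above.
S = numpy.cov(X, bias=True)
print(S)
```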
``````

In [50]:

numpy.linalg.eig(S)

``````
``````

Out[50]:

(array([ 1.,  1.]), array([[ 1.,  0.],
[ 0.,  1.]]))

``````

We have two eigenvectors, each with eigenvalue 1. Since there is no unique maximum eigenvalue, we may select either one. Let's select the first eigenvector p = (1, 0). The projections are given by the inner product of p with each of x1, x2, x3, and x4 as follows.

``````

In [52]:

p = numpy.array([1, 0])
Z = numpy.dot(X.T, p)

``````
``````

In [53]:

print(Z)

``````
``````

[ 1.  1. -1. -1.]

``````

Therefore, x1 and x2 are projected to 1, and x3 and x4 to -1 (equivalently, to the points (1, 0) and (-1, 0) along the direction p in the original space).
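The whole computation can be condensed into a short pipeline (a sketch; since both eigenvalues are 1, any unit vector is a valid principal direction, so the projected coordinates may differ from those above by a rotation or sign flip):

```python
import numpy

X = numpy.array([[1., 1., -1., -1.],
                 [1., -1., -1., 1.]])
Y = X - X.mean(axis=1, keepdims=True)  # center the data
S = numpy.dot(Y, Y.T) / X.shape[1]     # covariance matrix
vals, vecs = numpy.linalg.eigh(S)      # eigenvalues in ascending order
p = vecs[:, numpy.argmax(vals)]        # eigenvector of the largest eigenvalue
Z = numpy.dot(X.T, p)                  # 1-D projection of each point
print(Z)
```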
