Let us consider four two-dimensional data points x1 = (1, 1), x2 = (1, -1), x3 = (-1, -1), and x4 = (-1, 1). Use PCA to project these four data points into a one-dimensional space.
In [1]:
import numpy
Let's represent the data points as columns of a matrix X.
In [2]:
X = numpy.zeros((2, 4), dtype=float)
X[:, 0] = numpy.array([1, 1])
X[:, 1] = numpy.array([1, -1])
X[:, 2] = numpy.array([-1, -1])
X[:, 3] = numpy.array([-1, 1])
In [3]:
print(X)
[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]
Now let's compute the covariance matrix S = (1/N) * sum_i (x_i - m)(x_i - m)^T, where m is the mean of the data points and N = 4. First, the mean:
In [19]:
mean = numpy.mean(X, axis=1)  # per-row mean, i.e. the mean data point
In [20]:
mean
Out[20]:
array([0., 0.])
In [23]:
Y = numpy.zeros((2, 4), dtype=float)
for i in range(4):
    Y[:, i] = X[:, i] - mean  # center each data point by subtracting the mean
In [24]:
print(Y)
[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]
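As an aside, the centering loop can be replaced by a single broadcasted subtraction (a minimal sketch; it produces the same Y):
In [ ]:
Y = X - mean[:, numpy.newaxis]  # subtract the mean column vector from every column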
In [44]:
S = numpy.zeros((2, 2), dtype=float)
for i in range(4):
    # accumulate the outer product of each centered point with itself
    S = S + numpy.outer(Y[:, i], Y[:, i])
In [45]:
S
Out[45]:
array([[4., 0.],
       [0., 4.]])
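Equivalently, the sum of outer products collapses into one matrix product (a sketch only; the cells below continue with the loop result):
In [ ]:
S = numpy.dot(Y, Y.T)  # same 2x2 matrix as the accumulation loop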
In [47]:
S = 0.25 * S  # divide by the number of data points, N = 4
In [48]:
S
Out[48]:
array([[1., 0.],
       [0., 1.]])
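The same matrix can be obtained directly with numpy.cov; passing bias=True makes it divide by N rather than N - 1, matching the 0.25 factor above:
In [ ]:
numpy.cov(X, bias=True)  # rows are variables, columns are observations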
In [50]:
numpy.linalg.eig(S)
Out[50]:
(array([1., 1.]),
 array([[1., 0.],
        [0., 1.]]))
We get the eigenvalue 1 twice, with eigenvectors (1, 0) and (0, 1). Since there is no unique maximum eigenvalue, we can select either one. Let's select the first eigenvector p = (1, 0); the projections are then the inner products of p with each of x1, x2, x3, and x4, computed below.
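As an aside, the leading eigenvector can be selected programmatically rather than by hand (a minimal sketch; the tie between the two equal eigenvalues here is broken arbitrarily by argmax):
In [ ]:
values, vectors = numpy.linalg.eig(S)
p = vectors[:, numpy.argmax(values)]  # eigenvector column with the largest eigenvalue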
In [52]:
p = numpy.array([1, 0])
Z = numpy.dot(X.T, p)  # inner product of p with every column of X
In [53]:
print(Z)
[ 1.  1. -1. -1.]
Therefore, x1 and x2 are projected to 1 and x3 and x4 are projected to -1; in the original two-dimensional space these correspond to the points (1, 0) and (-1, 0) on the line spanned by p.
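As a sanity check, the same projection can be computed with scikit-learn's PCA (assuming scikit-learn is installed; it expects samples as rows, and the sign of the recovered component may be flipped):
In [ ]:
from sklearn.decomposition import PCA

pca = PCA(n_components=1)
Z_sk = pca.fit_transform(X.T)  # samples as rows; returns a (4, 1) array
print(Z_sk.ravel())            # matches Z up to an overall sign flip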