Linear Classification

Lecture 2 on Linear Classification

Making use of MNIST dataset again.



In [1]:

    
import numpy as np
import matplotlib.pylab as plt
import math
from scipy.stats import mode

%matplotlib inline



In [2]:

    
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
mnist = fetch_mldata('MNIST original', data_home='../data')



In [3]:

    
mnist.data.shape









    Out[3]:





(70000, 784)



In [4]:

    
X = np.append(np.ones((mnist.data.shape[0],1)), mnist.data, axis = 1)



In [5]:

    
Y = mnist.target



In [6]:

    
def display(x, label):
    pixels = x.reshape((28, 28))
    plt.title('{label}'.format(label=label))
    plt.imshow(pixels, cmap='gray')
    plt.show()



In [7]:

    
display(X[1][1:785], 'Y[0]')



In [8]:

    
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=42)



In [9]:

    
w = np.random.rand(10, X_train.shape[1])
w_orig = w



In [10]:

    
w.dot(X_train[30000])









    Out[10]:





array([ 5434.4880896 ,  6107.67234256,  5959.90920417,  6315.93616878,
        7493.96525192,  5884.83252161,  6745.33324595,  5506.82711   ,
        6466.48406735,  5814.33083287])

Loss Function

Manhattan Distance

Also known as L1 distance.



In [11]:

    
def manhattan_distance(s1, s2):
    return np.sum(np.abs(s1 - s2), axis=1)

Nearest Neighbour Classification

k-nearest neighbour

Method makes use of distance or loss measure. Training is O(1) and Test is O(n). At test time, we pick the k training examples which have least distance to test point. The mode of the k points is picked as the class for test point.



In [12]:

    
class NearestNeighbour:
    
    def __init__(self, k, loss = manhattan_distance):
        self.k = k
        self.loss = loss
        
    
    def train(self, X, Y):
        self.X = X
        self.Y = Y
        
        
    def test(self, X):
        losses = self.loss(self.X, X)
        return mode(self.Y[losses.argsort()[self.k]])[0][0]



In [13]:

    
n = NearestNeighbour(1)
n.train(X_train, Y_train)



In [14]:

    
n.test(X_test[0])









    Out[14]:





7.0



In [15]:

    
display(X_test[0][1:785], "%s"%Y_test[0])



In [25]:

    
def test(X_test, Y_test):
    count_failed = 0

    for i in range(X_test.shape[0]):
        if Y_test[i] != n.test(X_test[i]):
            count_failed += 1
            
    return (count_failed, num_test)

count_failed, num_tests = test(X_test, Y_test)

print("\n Results:")
print("Total: %s " % num_tests)
print("Failed: %s " % count_failed)
print("Failed: %s " % (1.0 * count_failed / num_tests))









    



 Results:
Total: 23100 
Failed: 1042 
Failed: 0.04510822510822511



In [24]:









    Out[24]:





(23100, 785)



In [ ]: