As one may have noticed in the previous exercise, prediction with a KNN classifier is very slow. Approximation methods exist to speed it up; one in particular is FLANN (Fast Library for Approximate Nearest Neighbors). To install it, we use Homebrew.
$ brew install flann --with-python
$ cd /usr/local/Cellar/flann/1.8.4_1/share/python
$ sudo python setup.py install
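If the installation succeeded, the bindings should now be importable (a quick sanity check):
$ python -c "import pyflann"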
Let's give it a whirl.
In [1]:
%matplotlib inline
import numpy as np
import idx2numpy
import pyflann
import mnist
import matplotlib.pyplot as plt
# Load the MNIST labels and images from the IDX files.
train_image_labels = idx2numpy.convert_from_file('train-labels-idx1-ubyte')
train_images = idx2numpy.convert_from_file('train-images-idx3-ubyte')
test_image_labels = idx2numpy.convert_from_file('t10k-labels-idx1-ubyte')
test_images = idx2numpy.convert_from_file('t10k-images-idx3-ubyte')

# Flatten each 28x28 image into a 784-dimensional row vector,
# since FLANN expects a 2-D (samples x features) array.
flattened_train_images = train_images.reshape((train_images.shape[0],
                                               train_images.shape[1]*train_images.shape[2]),
                                              order='C')
flattened_test_images = test_images.reshape((test_images.shape[0],
                                             test_images.shape[1]*test_images.shape[2]),
                                            order='C')
The first step is to build the index.
In [2]:
# Use the L2 (Euclidean) distance and build a forest of 10 randomized
# kd-trees over the flattened training images.
pyflann.set_distance_type('minkowski', order=2)
flann = pyflann.FLANN()
index_params = flann.build_index(flattened_train_images, log_level='info',
                                 algorithm='kdtree', trees=10)
index_params
Out[2]:
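The returned index_params is an ordinary dictionary of the search parameters FLANN settled on. In particular, index_params['checks'] records how many tree leaves to examine per query; we reuse it in the search call below, and larger values trade speed for accuracy.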
Let's take a look at our test data. Plotting routines can be found on GitHub.
In [3]:
fig = plt.figure(figsize=(10,5))
for i in range(8):
    fig.add_subplot(2, 4, i + 1)
    mnist.plot_image(test_images[i], test_image_labels[i])
fig.suptitle("Sample Test MNIST Digits")
plt.show()
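For completeness, here is a minimal sketch of what such a plotting helper might look like (the real mnist.plot_image lives in the GitHub repository; this stand-in is an assumption):

def plot_image(image, label):
    # Render one 28x28 digit in grayscale, using its label as the title.
    plt.imshow(image, cmap='gray')
    plt.title(label)
    plt.axis('off')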
Let us now try to make predictions.
In [4]:
# Query the approximate nearest neighbor of each of the first 1,000 test images.
neighbor, dist = flann.nn_index(flattened_test_images[:1000], num_neighbors=1,
                                checks=index_params['checks'])
# Predict each test image's digit as the label of its nearest training image.
test_image_predictions = train_image_labels[neighbor]
# Error rate over the 1,000 test images.
np.mean(test_image_predictions != test_image_labels[:1000])
Out[4]:
We see that the error rate increases only slightly compared with exact KNN, while the search runs much faster.
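To put a rough number on the speedup, one could time the approximate query against an exact brute-force search, for example with scikit-learn (a sketch only; it assumes scikit-learn is installed, and timings will vary by machine):

import time
from sklearn.neighbors import KNeighborsClassifier

# Exact 1-NN via brute-force search, for comparison.
knn = KNeighborsClassifier(n_neighbors=1, algorithm='brute')
knn.fit(flattened_train_images, train_image_labels)

start = time.time()
knn.predict(flattened_test_images[:1000])
print('brute force: %.2fs' % (time.time() - start))

start = time.time()
flann.nn_index(flattened_test_images[:1000], num_neighbors=1,
               checks=index_params['checks'])
print('FLANN: %.2fs' % (time.time() - start))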