ACKNOWLEDGEMENT
The MNIST data loader is adapted from https://github.com/sorki/python-mnist/blob/master/mnist/loader.py. I also borrowed ideas and code from Sonya Sawtelle's excellent blog at http://sdsawtelle.github.io/blog/output/week4-andrew-ng-machine-learning-with-python.html.
In this notebook we're going to use neural networks from the scikit-learn library to classify two datasets. Instead of implementing a neural network from scratch, we'll use an off-the-shelf one -- but we'll still learn how a neural network works by probing its behavior in various ways.
The first dataset is the one we used in the non-linear logistic regression notebook.
The second dataset is quite well known -- it's a set of handwritten digits, a collection of images from the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/). Let's start with the MNIST dataset.
In [1]:
# Get access to the MNIST class defined in mnist_loader.py
# mnist_loader.py contains methods for loading and displaying the MNIST data
%run mnist_loader.py
In [2]:
# Import our usual libraries
from __future__ import division
import numpy as np
from numpy import random as rnd
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import os
In [3]:
# Initialize the mnist object
mnist = MNIST()
type(mnist)
Out[3]:
In [4]:
# Get the testing images and labels
test_images, test_labels = mnist.load_testing()
In [5]:
type(test_images), type(test_labels)
Out[5]:
In [6]:
mnist.display(test_images[4000])
Out[6]:
In [7]:
# Get the training images and labels
train_images, train_labels = mnist.load_training()
In [8]:
mnist.display(train_images[3390])
Out[8]:
In [9]:
# How many images and labels in the training and test datasets?
[len(d) for d in [train_images, train_labels, test_images, test_labels]]
Out[9]:
In [10]:
# Size of the datasets
[np.array(d).shape for d in [train_images, train_labels, test_images, test_labels]]
Out[10]:
In [11]:
# How many unique digits do we have handwriting examples for?
list(set(test_labels))
Out[11]:
In [12]:
# Finally, extract the data as inputs to the neural network classifier.
# Scale the data and one-hot encode the labels
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
X_train = np.array(train_images)/255 # rescale to values between 0 and 1
# One-hot encode
y_train = enc.fit_transform(np.array(train_labels).reshape(-1,1)).toarray()
X_test = np.array(test_images)/255 # rescale
# One-hot encode
y_test = enc.fit_transform(np.array(test_labels).reshape(-1,1)).toarray()
In [13]:
y_train[2000]
Out[13]:
In [14]:
# Check the one-hot encoding
mnist.display(X_train[2000])
Out[14]:
In [141]:
# We need a multi-class classifier because we have to assign each image to one of 10 classes -- the digits 0 through 9.
from sklearn.neural_network import MLPClassifier
# Hidden layers are specified as follows
# (n1, ) means n1 units and 1 hidden layer
# (n1, n2) means n1 units in the first hidden layer and n2 units in the second hidden layer
# (n1, n2, n3) means n1 units in the first hidden layer, n2 units in the second hidden layer,
# and n3 units in the third hidden layer
# Experiment with max_iter -- set it to 10, 50, 100, 200 to see how the neural network behaves
clf = MLPClassifier(solver='sgd', alpha=1e-5,
hidden_layer_sizes=(25, 25), random_state=1, verbose=False, max_iter=50)
clf.fit(X_train, y_train)
Out[141]:
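To see the effect of max_iter, as the comment in the cell above suggests, here is a rough sketch (not run as part of this notebook) that retrains the same network with different iteration budgets and compares test accuracy:
In [ ]:
# A sketch of the max_iter experiment suggested above: retrain the same
# network with different iteration budgets and compare test accuracy.
# (Each fit takes a while; expect convergence warnings for small max_iter.)
for n_iter in [10, 50, 100, 200]:
    probe = MLPClassifier(solver='sgd', alpha=1e-5, hidden_layer_sizes=(25, 25),
                          random_state=1, max_iter=n_iter)
    probe.fit(X_train, y_train)
    print("max_iter=%3d  test score: %f" % (n_iter, probe.score(X_test, y_test)))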
In [126]:
# How quickly is the classifier learning?
fig, ax = plt.subplots(figsize=(8,5))
plt.plot(clf.loss_curve_)
Out[126]:
In [17]:
# The classifier's predictions for the first 5 images in the test dataset
clf.predict(X_test[0:5])
Out[17]:
When the number of learning iterations is low, the model often makes no prediction at all -- you'll see all zeros in some or most of the arrays above. Sometimes you'll see multiple ones in a single row, which means the classifier is unsure which digit it is looking at and hedges with more than one best guess.
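To make this concrete, here's a small sketch (using only objects defined above) that decodes the one-hot rows returned by clf.predict and flags the empty and ambiguous ones:
In [ ]:
# A minimal sketch: decode the one-hot predictions for the first few test
# images and flag rows with no guess or more than one guess.
for row in clf.predict(X_test[0:5]):
    hits = np.where(row == 1)[0]  # column indices set to 1
    if hits.size == 0:
        print("no prediction")
    elif hits.size > 1:
        print("multiple guesses: %s" % hits)
    else:
        print("predicted digit: %d" % hits[0])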
In [18]:
# Check that the first test image matches the digit read off from
# the first one-hot prediction in the list above
mnist.display(test_images[0])
Out[18]:
In [19]:
mnist.display(test_images[3])
Out[19]:
In [130]:
# Select n_sel random images from the test dataset -- we have 10,000 test images in total (see above)
n_total = 10000
n_sel = 10
test_image_ids = rnd.choice(range(0, n_total), n_sel, replace=False)
# for each of these image ids, get the test image
test_images_sel = [X_test[i] for i in test_image_ids]
# for each of these image ids, get the result from the classifier
clf.predict(test_images_sel)
Out[130]:
In [131]:
# Now show these randomly selected test images and our neural net classifier's prediction based on its training
# Adapted from Sonya Sawtelle
# http://sdsawtelle.github.io/blog/output/week4-andrew-ng-machine-learning-with-python.html
fig, axs = plt.subplots(2, 5, sharex=True, sharey=True, figsize=(10,10))
axs = axs.flatten() # The returned axs is actually a matrix holding the handles to all the subplot axes objects
graymap = plt.get_cmap("gray")
for i, indx in enumerate(test_image_ids):
    im_mat = np.reshape(test_images_sel[i], (28, 28))
    labl = str(np.where(clf.predict(test_images_sel[i].reshape(1, -1))[0] == 1)[0])
    # Plot the image along with the label it is assigned by the fitted model.
    axs[i].imshow(im_mat, cmap=graymap, interpolation="None")
    axs[i].annotate(labl, xy=(0.05, 0.85), xycoords="axes fraction", color="red", fontsize=18)
    axs[i].xaxis.set_visible(False) # Hide the axes labels for clarity
    axs[i].yaxis.set_visible(False)
That's close to perfect, though it's only a small sample of the test dataset -- not bad for a fairly basic neural network. How does the classifier do over the entire dataset?
In [22]:
# How well does the classifier perform?
print("Training set score: %f" % clf.score(X_train, y_train))
print("Test set score: %f" % clf.score(X_test, y_test))
In [23]:
# Misclassification of the test dataset
# Get all the results in terms of success or failure
results = [(clf.predict(X_test[i].reshape(1,-1)) == y_test[i]).all() for i in range(len(X_test))]
# Get the index numbers of all the failures
idx_failures = [i for i, x in enumerate(results) if not x]
# At the same time, get the index numbers of all the successful predictions
idx_successes = [i for i, x in enumerate(results) if x]
In [24]:
# Quick check on test images that are not classified correctly
idx_failures[0:5]
Out[24]:
In [122]:
# Let's check on the misclassification for the first few failures
for i in idx_failures[0:3]:
    print(clf.predict(X_test[i].reshape(1,-1)))
    plt.figure()
    mnist.display(X_test[i])
We can see the kinds of mistakes the classifier is making. Let's be systematic and see whether the classifier is prone to particular kinds of mistakes. One way to investigate this is to look at every digit in the test set and see how the classifier fails to identify that digit.
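One compact way to see these patterns -- a sketch using scikit-learn's confusion_matrix as an alternative to the hand-rolled bookkeeping below -- is to decode both the labels and the most probable predictions and tabulate them:
In [ ]:
# A sketch (an alternative to the bookkeeping below): summarize the failure
# patterns with a confusion matrix. Rows are actual digits, columns are the
# most probable predicted digit (taken from predict_proba).
from sklearn.metrics import confusion_matrix
true_digits = np.argmax(y_test, axis=1)
pred_digits = np.argmax(clf.predict_proba(X_test), axis=1)
print(confusion_matrix(true_digits, pred_digits))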
In [26]:
# The classifier's failure patterns
combined_errors_labels = []
for i in idx_failures:
    pred = clf.predict(X_test[i].reshape(1,-1))
    pred_value = np.where(pred[0] == 1)[0]
    if pred_value.size == 0:
        pred_value = [10] # NOTE: 10 means no prediction is made
    label = y_test[i]
    label_value = np.squeeze(np.where(label == 1)[0]) # np.squeeze to make it an integer
    combined_values = [pred_value, label_value]
    combined_errors_labels.append(combined_values)
In [27]:
# The classifier's success patterns
combined_success_labels = []
for i in idx_successes:
    pred = clf.predict(X_test[i].reshape(1,-1))
    pred_value = np.where(pred[0] == 1)[0]
    # This if clause will never fire...
    if pred_value.size == 0:
        pred_value = [10] # NOTE: 10 means no prediction is made
    label = y_test[i]
    label_value = np.squeeze(np.where(label == 1)[0]) # np.squeeze to make it an integer
    combined_values = [pred_value, label_value]
    combined_success_labels.append(combined_values)
In [28]:
combined_success_labels[0:5]
Out[28]:
In [29]:
# Get the count of successes by digit
# NOTE: This would be much easier if the data were in a Pandas dataframe (see the sketch after this cell). Alas...
# Create a list of indexed variables
correct_classified_0 = []
correct_classified_1 = []
correct_classified_2 = []
correct_classified_3 = []
correct_classified_4 = []
correct_classified_5 = []
correct_classified_6 = []
correct_classified_7 = []
correct_classified_8 = []
correct_classified_9 = []
outcomes = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
for i in range(len(outcomes)):
    eval('correct_classified_'+ outcomes[i]).append(combined_success_labels.count([[i], [i]]))
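As the NOTE above hints, the same per-digit counts fall out of a short pandas computation -- a sketch, assuming the arrays defined earlier:
In [ ]:
# A sketch of the pandas alternative mentioned in the NOTE above: decode the
# one-hot labels, mark which test images were predicted exactly right, and
# count successes per digit.
true_digits = np.argmax(y_test, axis=1)
exact_match = (clf.predict(X_test) == y_test).all(axis=1)
df = pd.DataFrame({'digit': true_digits, 'correct': exact_match})
print(df.groupby('digit')['correct'].agg(['sum', 'count', 'mean']))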
In [30]:
correct_classified_0
Out[30]:
In [31]:
combined_errors_labels[0:5]
Out[31]:
In [32]:
# NOTE: 10 means no prediction is made
from tabulate import tabulate
headers = ['Classifier Predicts', 'Actual Value']
print(tabulate(combined_errors_labels[0:10], tablefmt='grid', headers=headers))
In [33]:
# When the actual value is 0, 1, 2, ..., 9 what are the different ways in which the classifier misclassifies?
# Create a list of indexed variables
misclassified_0 = []
misclassified_1 = []
misclassified_2 = []
misclassified_3 = []
misclassified_4 = []
misclassified_5 = []
misclassified_6 = []
misclassified_7 = []
misclassified_8 = []
misclassified_9 = []
# error_sets = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# Use outcomes from above instead
# NOTE: eval is considered dangerous - REFACTOR to eliminate (one eval-free alternative is sketched after this cell)
for i in range(len(outcomes)):
    for j in range(len(combined_errors_labels)):
        if combined_errors_labels[j][1] == i:
            eval('misclassified_'+ outcomes[i]).append(combined_errors_labels[j][0])
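The eval-based bookkeeping works, but as the NOTE says it's worth refactoring. One eval-free sketch keeps the per-digit lists in a dictionary keyed by the digit:
In [ ]:
# A sketch of the eval-free refactor suggested in the NOTE above: hold the
# per-digit misclassification lists in a dict keyed by the actual digit.
misclassified_by_digit = {d: [] for d in range(10)}
for pred_value, label_value in combined_errors_labels:
    misclassified_by_digit[int(label_value)].append(pred_value)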
In [34]:
# Calculate the labeling success rate by digit
for i in range(len(outcomes)):
    n_correctly_classified = eval('correct_classified_'+ outcomes[i])[0]
    n_misclassified = len(eval('misclassified_'+ outcomes[i]))
    success_rate = n_correctly_classified/(n_correctly_classified + n_misclassified)
    print("Success rate for labeling digit %i is %f" %(i, success_rate))
In [35]:
# Histogram of how a digit is misclassified
digit = misclassified_6
hist_data = [item for sublist in digit for item in sublist]
plt.hist(hist_data)
plt.xlabel('Identified As')
plt.ylabel('Number of Mistakes')
plt.title('Actual Digit = 6')
#plt.xticks(range(0,11,1))
Out[35]:
In [36]:
# Making it easier using Sonya Sawtelle's code
misclassified = [misclassified_0, misclassified_1, misclassified_2,
misclassified_3, misclassified_4, misclassified_5,
misclassified_6, misclassified_7, misclassified_8,
misclassified_9]
In [37]:
len(misclassified)
Out[37]:
In [38]:
# Code from Sonya Sawtelle
fig, axs = plt.subplots(5, 2, sharex=False, sharey=True, figsize=(12,12))
fig.suptitle("Misclassifications For Each Digit", fontsize=14)
axs = axs.flatten()
for i in range(len(misclassified)):
    ax = axs[i]  # grab the handle to this digit's subplot axes
    digit = misclassified[i]
    hist_data = [item for sublist in digit for item in sublist]
    ax.hist(hist_data, label=("digit %i" %i), bins=np.arange(1, 11, 1)+0.5) # Shift the bins to get labels aligned
    ax.set_xlim([0, 11])
    #ax.xticks(range(0,11))
    ax.legend(loc="upper left")
    ax.yaxis.set_visible(True)
Note: when a digit appears to be misidentified as itself, it means the classifier was unsure and made more than one guess, one of which was the correct digit. Because the classifier was not sure, we count this as a misclassification.
The digits 0 and 1 are the easiest to recognize while 5 and 8 seem to be confused with a wide range of other digits. 9 is also hard to discern correctly.
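We can check how often that "unsure" behavior actually happens -- a quick sketch counting, over the whole test set, how many predictions contain zero, one, or several ones:
In [ ]:
# A quick sketch: count how many test predictions contain no guess, exactly
# one guess, or multiple guesses.
ones_per_row = clf.predict(X_test).sum(axis=1)
print("no guess:          %d images" % (ones_per_row == 0).sum())
print("exactly one guess: %d images" % (ones_per_row == 1).sum())
print("multiple guesses:  %d images" % (ones_per_row > 1).sum())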
In [40]:
[clf.score(X_train, y_train), clf.score(X_test, y_test)]
Out[40]:
In [41]:
clf.loss_
Out[41]:
In [42]:
clf.coefs_[2].shape
Out[42]:
In [140]:
# Pull the incoming weights of a single neuron in a hidden layer --
# here, neuron 2 of the second hidden layer (column 2 of clf.coefs_[1])
hidden_2 = np.transpose(clf.coefs_[1])[2]
fig, ax = plt.subplots(1, figsize=(5,5))
ax.imshow(np.reshape(hidden_2, (5,5)), cmap=plt.get_cmap("Greys"), aspect="auto", interpolation="nearest")
Out[140]:
In [44]:
clf.coefs_[1].shape
Out[44]:
In [45]:
np.matrix(X_test[0]).shape
Out[45]:
In [46]:
# The weighted inputs to the first hidden layer (before the bias and activation function are applied).
# Note that the index on X_test can be any value -- it picks an image from the test dataset
layer1_act_vals = (np.matrix(X_test[500]) * clf.coefs_[0]).T
In [47]:
plt.imshow(layer1_act_vals.reshape(5,5))
Out[47]:
In [48]:
clf.intercepts_
Out[48]:
In [49]:
# How the input layer influences the first hidden layer
fig, ax = plt.subplots(1, 1, figsize=(12,6))
ax.imshow(np.transpose(clf.coefs_[0]), cmap=plt.get_cmap("gray"), aspect="auto")
plt.xlabel("Neuron in input layer")
plt.ylabel("Neuron in first hidden layer")
plt.title("Weights $\Theta^{(1)}$")
Out[49]:
In [50]:
# How the first hidden layer influences the second hidden layer
fig, ax = plt.subplots(1, 1, figsize=(12,6))
ax.imshow(np.transpose(clf.coefs_[1]), cmap=plt.get_cmap("gray"), aspect="auto")
plt.xlabel("Neuron in first hidden layer")
plt.ylabel("Neuron in second hidden layer")
plt.title("Weights $\Theta^{(2)}$")
Out[50]:
In [51]:
# How the second hidden layer influences the output layer
fig, ax = plt.subplots(1, 1, figsize=(12,6))
ax.imshow(np.transpose(clf.coefs_[2]), cmap=plt.get_cmap("gray"), aspect="auto")
plt.xlabel("Neuron in second hidden layer")
plt.ylabel("Neuron in the output layer")
plt.title("Weights $\Theta^{(3)}$")
Out[51]:
In [52]:
# Unpickle the file
import cPickle as pickle
pkl_file_path = os.getcwd() + '/Data/ex2data2.pkl'
X2_train, X2_test, y2_train, y2_test = pickle.load( open( pkl_file_path, "rb" ) )
In [53]:
X2_train[0:3,:]
Out[53]:
In [54]:
y2_train[0:3]
Out[54]:
This dataset has two input measurements (suitably normalized) and one binary output, so it's a binary classifier that we need. We'll set up our neural network to reflect this structure. The nice thing is that we can reuse the same hidden-layer structure as before -- the classifier recognizes and adjusts for the different input and output dimensions on its own.
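As a quick sanity check (a sketch to run after clf2 is fitted below), the shapes of the weight matrices in coefs_ show how the same (25, 25) hidden-layer specification adapts to each dataset's input and output dimensions:
In [ ]:
# A sketch (run after clf2 is fitted below): the hidden layers are (25, 25)
# in both networks, but the first and last weight matrices adapt to the data.
print([w.shape for w in clf.coefs_])   # e.g. [(784, 25), (25, 25), (25, 10)] for MNIST
print([w.shape for w in clf2.coefs_])  # e.g. [(2, 25), (25, 25), (25, 1)] for the two-feature dataset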
In [75]:
# We need a binary classifier because we have to classify results into 1 of 2 categories -- Accepted or Rejected.
# As a reminder, here is the classifier we've been using for the MNIST dataset
# **********************
#from sklearn.neural_network import MLPClassifier
# Hidden layers are specified as follows
# (n1, ) means n1 units and 1 hidden layer
# (n1, n2) means n1 units in the first hidden layer and n2 units in the second hidden layer
# (n1, n2, n3) means n1 units in the first hidden layer, n2 units in the second hidden layer,
# and n3 units in the third hidden layer
# Experiment with max_iter -- set it to 10, 50, 100, 200 to see how the neural network behaves
# Here is our modified neural network classifier for our new dataset
# Note that the solver is now lbfgs instead of sgd, alpha is larger (1e-3), the activation is logistic,
# and max_iter is 100; the hidden-layer structure is the same
clf2 = MLPClassifier(solver='lbfgs', alpha=1e-3,
hidden_layer_sizes=(25, 25), activation='logistic', random_state=1, verbose=False, max_iter=100)
# ***********************
# We'll now fit the same classifier to the new dataset
clf2.fit(X2_train, y2_train)
Out[75]:
In [76]:
# How quickly is the classifier learning?
# NOTE: scikit-learn does not record a loss_curve_ for the lbfgs solver, so there is nothing to plot here.
# The two lines below would raise an AttributeError and are left commented out.
# fig, ax = plt.subplots(figsize=(8,5))
# plt.plot(clf2.loss_curve_)
In [77]:
# How well does the classifier perform?
print("Training set score: %f" % clf2.score(X2_train, y2_train))
print("Test set score: %f" % clf2.score(X2_test, y2_test))
In [78]:
# Get the predictions for the test set
preds = clf2.predict(X2_test)
In [79]:
actual_vals = y2_test
In [142]:
clf2.coefs_[0]
Out[142]:
In [143]:
clf2.intercepts_[0]
Out[143]:
We've seen that the classifier does OK -- about 80% accuracy. Let's visualize the classifier's decision boundary.
In [81]:
# Contour plot of the decision boundary
# Make grid values
xx1, xx2 = np.mgrid[-1:1.5:.02, -1:1.5:.02]
In [82]:
# Create the grid
grid = np.c_[xx1.ravel(), xx2.ravel()]
grid.shape
Out[82]:
In [91]:
grid[0:4,:]
Out[91]:
In [83]:
# Get the prediction for each grid value
preds = clf2.predict(grid)
In [93]:
# number of ones predicted
pred_ones = (preds == 1).sum()
# number of zeros predicted
pred_zeros = (preds == 0).sum()
pred_ones, pred_zeros
Out[93]:
In [96]:
# For the test dataset, segment the Accepted from the Rejected
accepted = []
rejected = []
for i in range(len(X2_test)):
    if y2_test[i] == 1:
        accepted.append(X2_test[i])
    else:
        rejected.append(X2_test[i])
In [118]:
np.array(accepted)[:,0]
Out[118]:
In [120]:
fig, ax = plt.subplots(figsize=(8,5))
ax.scatter(np.array(accepted)[:,0], np.array(accepted)[:,1], s=30, c='b', marker='s', label='Accepted')
ax.scatter(np.array(rejected)[:,0], np.array(rejected)[:,1], s=30, c='r', marker='x', label='Rejected')
ax.legend()
ax.set_xlabel('Test 1 Score')
ax.set_ylabel('Test 2 Score')
ax.set_title('Test Dataset')
plt.contour(xx1,xx2,preds.reshape(xx1.shape), colors='y', linewidths=0.5)
Out[120]:
We see that the neural network has produced a rather complex boundary even without us manufacturing any polynomial features. Indeed, neural networks shine when you have a complicated non-linear boundary to draw in order to classify your dataset. But you need a lot more data for the network to become good at finding the right boundary -- the one that gives the best classification results.
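One way to see the effect of training-set size -- a rough sketch, refitting the same network on growing subsets of the training data:
In [ ]:
# A rough sketch: refit the same network on growing subsets of the training
# data and watch how the test score changes with the amount of data.
for frac in [0.25, 0.5, 0.75, 1.0]:
    n = int(len(X2_train) * frac)
    probe = MLPClassifier(solver='lbfgs', alpha=1e-3, hidden_layer_sizes=(25, 25),
                          activation='logistic', random_state=1, max_iter=100)
    probe.fit(X2_train[:n], y2_train[:n])
    print("%4d training examples  test score: %f" % (n, probe.score(X2_test, y2_test)))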
In [ ]: