conda install keras
jupyter notebook
If you cannot see line numbers, press Shift+L to switch them on or check the View menu.
In [9]:
# The %... is an IPython thing, and is not part of the Python language.
# In this case we're just telling the plotting library to draw things in
# the notebook, instead of in a separate window.
%matplotlib inline
# the import statements load different Python packages that we need for the tutorial
# See all the "as ..." constructs? They're just aliasing the package names.
# That way we can call methods like plt.plot() instead of matplotlib.pyplot.plot().
# packages for scientific computing and visualization
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
import time
# configuration of the notebook
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("notebook")
# machine learning library imports
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils
In this example, we will rely on the MNIST data set, a data set for the recognition of hand-written digits. MNIST goes back to NIST, the institution behind evaluation campaigns such as the TREC campaign discussed earlier.
The following script will display some sample digits to give an example of the contents of the data set.
In [70]:
# load (download if needed) the MNIST dataset of handwritten numbers
# we will get a training and test set consisting of bitmaps
# in the X_* arrays and the associated labels in the y_* arrays
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# plot the first 4 images as gray scale images in a 2x2 grid of subplots without axis labels
for i in range(4):
    plt.subplot(2, 2, i + 1)
    plt.axis('off')
    # multiplying by -1 inverts the gray scale for aesthetic reasons
    plt.imshow(X_train[i] * -1, cmap=plt.get_cmap('gray'))
# show the plot
#plt.savefig("test.pdf",format="pdf")
plt.show()
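Before moving on, it helps to check the shapes of the arrays we just loaded; the values in the comments assume the standard MNIST split of 60,000 training and 10,000 test images.
In [ ]:
# sanity check: the bitmaps are 28x28 pixels, the labels are single digits
print(X_train.shape) # (60000, 28, 28)
print(X_test.shape)  # (10000, 28, 28)
print(y_train[:4])   # the labels of the four digits plotted above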
Next, we define our machine learning model with its different layers. Roughly speaking, the function baseline_model() defines what the neural network looks like. For more details, see the documentation.
In [72]:
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model, use logarithmic loss for evaluation
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# flatten the 28*28 images from the MNIST data set to a 784-element vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
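# e.g., the label 3 becomes the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]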
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
# build the model
model = baseline_model()
# fit the model, i.e., start the actual learning
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
# print the error rate of the algorithm
print("Baseline Error: %.2f%%" % (100-scores[1]*100))
In [4]:
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# each entry in steps is the number of training samples used in one training run
steps=[18,100,1000,5000,10000,20000,30000,40000,50000]
# this dict (basically a hashmap) holds the error rate for each training set size
errorPerStep=dict()
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
for step in steps:
    # load data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    # limit the training data size to the current step, the : means "from 0 to step"
    X_train=X_train[0:step]
    y_train=y_train[0:step]
    # flatten 28*28 images to a 784-element vector for each image
    num_pixels = X_train.shape[1] * X_train.shape[2]
    X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
    # normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255
    # one hot encode outputs
    y_train = np_utils.to_categorical(y_train)
    y_test = np_utils.to_categorical(y_test)
    num_classes = y_test.shape[1]
    # build the model
    model = baseline_model()
    # Fit the model
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
    # Final evaluation of the model
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))
    errorPerStep[step]=(100-scores[1]*100)
Next, we will illustrate our results.
In [159]:
print(errorPerStep)
x=[]
y=[]
for e in errorPerStep:
    x.append(e)
    y.append(errorPerStep[e])
plt.xlabel("Training Samples")
plt.ylabel("Baseline Error (%)")
plt.plot(x,y,'o-')
plt.savefig("test.pdf",format="pdf")
The graph clearly indicates that the baseline error decreases as the amount of training data increases. In other words, the more data the learning algorithm has seen, the less it suffers from overfitting.
To end the example, we will check how well the model can predict new input.
In [160]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# choose a sample from the training data as our test image
test_im = X_train[25]
# display the image (inverted for aesthetic reasons, as above)
plt.imshow(test_im.reshape(28,28)*-1, cmap=plt.get_cmap('gray'), interpolation='none')
plt.axis('off')
num_pixels = X_train.shape[1] * X_train.shape[2]
# as we are dealing with only one image, we have to reshape the array to 1 x 784
test_im = test_im.reshape(1, num_pixels).astype('float32')
# normalize the input from 0-255 to 0-1, just like the training data
test_im = test_im / 255
# let the model predict the image; the result is a vector of ten class probabilities
r=model.predict(test_im)
# the predicted digit is the class with the highest probability
itemindex = np.argmax(r[0])
print("The model predicts: %i for the following image:"%itemindex)
The next cell illustrates how accuracy changes with respect to different distributions between two classes if the model always predicts that an element belongs to class A. $$ Accuracy=\frac{|tp|+|tn|}{|tp|+|tn|+|fp|+|fn|}\equiv\frac{|\mbox{correct predictions}|}{|\mbox{predictions}|} $$ For example, with 90 samples in A and 10 in B, the always-A predictor scores $|tp|=90$, $|tn|=0$, $|fp|=10$, $|fn|=0$, and thus an accuracy of 0.90 without ever recognizing a single element of class B.
In [158]:
# arrays for plotting
x=[] # samples in A
y=[] # samples in B
accuracies=[] # calculated accuracies for each distribution
# distributions between class A and B, first entry means 90% in A, 10% in B
distributions=[[90,10],[55,45],[70,30],[50,50],[20,80]]
for distribution in distributions:
    x.append(distribution[0])
    y.append(distribution[1])
    samplesA=np.ones((1,distribution[0])) # membership of class A is encoded as 1
    samplesB=np.zeros((1,distribution[1])) # membership of class B is encoded as 0
    # combine both arrays
    reality=np.concatenate((samplesA,samplesB),axis=None)
    # as said above, our model always associates the elements with class A (encoded by 1)
    prediction=np.ones((1,100))
    # count the true positives; an always-A predictor produces no true negatives
    tpCount=0
    for (i,val) in enumerate(prediction[0]):
        if reality[i]==val:
            tpCount+=1
    # calculate the accuracy and add it to the accuracies array for later visualization
    acc=float(tpCount)/100.0
    accuracies.append(acc*1000) # the multiplication by 1000 scales the bubble sizes for the plot
    print("Accuracy: %.2f"%(acc))
# plot the results as a bubble chart
plt.xlim(0,100)
plt.ylim(0,100)
plt.xlabel("Samples in A")
plt.ylabel("Samples in B")
plt.title("Accuracy of a Always-A Predictor")
plt.scatter(x, y, s=accuracies*100000,alpha=0.5)
#plt.savefig("test.png",format="png")
plt.show()
The $\mbox{Logarithmic Loss}=\frac{-1}{N}\sum_{i=1}^N\sum_{j=1}^M y_{ij}\log(p_{ij}) \rightarrow [0,\infty)$ penalizes wrong predictions. For the sake of simplicity, we simply use the function provided by sklearn, a machine-learning toolkit for Python.
The manual will give you more details.
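As a quick cross-check of the formula, the loss of a single prediction can be computed by hand: for one sample whose true class is predicted with probability 0.9, the double sum reduces to $-\log(0.9)$.
In [ ]:
# sklearn's log_loss uses the natural logarithm, as does np.log
print(-np.log(0.9)) # roughly 0.105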
In [77]:
from sklearn.metrics import log_loss
# the correct class for each sample, i.e., the first sample belongs to class 0
y_true = [0, 0, 1, 1,2]
# the predictions: 1st sample is 90% predicted to be in class 0
y_pred = [[.9, .1,.0], [.8, .2,.0], [.3, .7,.0], [.01, .99,.0],[.0,.0,1.0]]
print(log_loss(y_true, y_pred))
# perfect prediction
y_perfect = [[1.0, .0,.0], [1.0, .0,.0], [.0, 1.0,.0], [0, 1.0,.0],[.0,.0,1.0]]
print(log_loss(y_true, y_perfect))
x=[]
y=[]
# the for loop varies the predicted probability that the first sample belongs to its
# correct class 0 from 0.1 to 1.0, i.e., from a mostly wrong to a correct prediction
for i in range(1,11):
    # copy the perfect prediction so we do not modify y_perfect in place
    r2=[list(row) for row in y_perfect]
    p=i/10.0
    r2[0][0]=p
    # shift the remaining probability mass to a wrong class so the row still sums to 1
    r2[0][1]=1.0-p
    x.append(p)
    y.append(log_loss(y_true,r2))
# plot the result
plt.xlabel("Predicted Probability")
plt.ylabel("Logarithmic Loss")
plt.title("Does an object of class X belong do class X?")
plt.plot(x,y,'o-')
#plt.savefig("test.pdf",format="pdf")
Using an extensive example that employs a naive Bayes classifier to determine whether a Rotten Tomatoes critic's review is positive or negative, you will see how cross-validation works in practice.
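To give you an idea of what to expect, here is a minimal sketch of k-fold cross-validation with a naive Bayes classifier in sklearn; the toy reviews and labels are made up for illustration and are not the Rotten Tomatoes data of the use case.
In [ ]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score
# ten made-up snippets standing in for critic reviews, 1 = positive, 0 = negative
docs = ["a wonderful film", "simply terrible", "great acting", "boring and bad",
        "a real delight", "a total mess", "brilliant and moving", "plain awful",
        "superb direction", "dreadful pacing"]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
# turn the review texts into word-count vectors
X = CountVectorizer().fit_transform(docs)
# 5-fold cross-validation: train on four folds, evaluate on the held-out fold, repeat
scores = cross_val_score(MultinomialNB(), X, labels, cv=5)
print("Accuracy per fold:", scores)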