In [1]:
import numpy as np
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
We're going to use some examples from https://github.com/fchollet/keras/tree/master/examples. There are tons more and you should check them out! We'll use these examples to learn about some different sorts of layers, and strategies for our activation functions, loss functions, optimizers, etc.
This example is from https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py. It's a good one to start with because it's not much more complex than what we have seen, but it uses real data!
In [2]:
import keras
from keras.datasets import mnist # load up the training data!
from keras.models import Sequential # our model
from keras.layers import Dense, Dropout # Dropout layers?!
from keras.optimizers import RMSprop # our optimizer
Typically it's good practice to specify your hyperparameters together in one place.
In [3]:
batch_size = 128
num_classes = 10
epochs = 10 # this is too low
Now get the data. It's nicely split up between training and testing data which we'll see can be useful. We'll also see that this data treats the images as matrices (row is an observation, column is a pixel). However, the input data doesn't need to be a matrix.
In [4]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
In [5]:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
The tutorial then makes a few changes to the data.
First, it reshapes the arrays -- to make sure that the rows and columns are what we expect them to be (one row per observation, one column per pixel).
Then, it divides by 255 so that the values run from 0 to 1.
Such scaling is typically a good idea.
It also casts the $X$ values to float32, which you don't have to worry about too much, but it makes computation a bit faster (at the expense of non-critical numerical precision).
In [6]:
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
As before, we use the to_categorical() function to one-hot encode the integer labels.
In [7]:
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
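As a sanity check, here is what the one-hot encoding does, sketched with plain NumPy (the np.eye indexing trick is just this illustration's shortcut, not necessarily what Keras does internally):

```python
import numpy as np

labels = np.array([0, 3, 9])   # three example digit labels
one_hot = np.eye(10)[labels]   # row i is all zeros except a 1 in column labels[i]
print(one_hot[1])              # label 3 -> 1.0 in position 3, zeros elsewhere
```

Each label becomes a length-10 vector, matching the 10 output units of the model we define next.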
Now define our model
In [8]:
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax')) # remember y has 10 categories!
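The softmax activation on that last layer turns the 10 raw outputs into a probability distribution over the digit classes. A quick NumPy sketch of the computation, with made-up scores (three classes instead of ten, for brevity):

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])     # hypothetical raw outputs (logits)
probs = np.exp(scores - scores.max())  # subtract the max for numerical stability
probs /= probs.sum()                   # normalize so the entries sum to 1
```

The largest score gets the largest probability, and the entries always sum to 1 (up to floating point), which is what lets us read the output as "the probability this image is digit $k$".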
In [9]:
# comment this line if you don't have graphviz installed
SVG(model_to_dot(model).create(prog='dot', format='svg'))
Out[9]:
What is a "dropout layer"? See Quora:
Using "dropout", you randomly deactivate certain units (neurons) in a layer with a certain probability $p$. So, if you set half of the activations of a layer to zero, the neural network won't be able to rely on particular activations in a given feed-forward pass during training. As a consequence, the neural network will learn different, redundant representations; the network can't rely on the particular neurons and the combination (or interaction) of these to be present. Another nice side effect is that training will be faster.
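A minimal NumPy sketch of one dropout pass (the activations and random seed are invented for illustration; Keras additionally rescales the kept units by $1/(1-\text{rate})$ during training so the expected activation is unchanged, which the sketch mimics):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.random(8)                   # hypothetical layer activations
rate = 0.2                                    # same rate as Dropout(0.2) above
mask = rng.random(activations.shape) >= rate  # keep each unit with probability 1 - rate
dropped = activations * mask / (1.0 - rate)   # zero out dropped units, rescale the rest
```

At evaluation time dropout is turned off entirely; Keras handles that switch for you.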
We can use the summary() method to inspect our model instead of the plot -- this works even without graphviz installed.
In [10]:
model.summary()
In [11]:
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
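The categorical_crossentropy loss is just the negative log of the probability the model assigned to the true class. A NumPy sketch for a single observation, with invented numbers:

```python
import numpy as np

y_true = np.array([0.0, 0.0, 1.0])       # one-hot target: the true class is class 2
y_pred = np.array([0.1, 0.2, 0.7])       # hypothetical softmax output
loss = -np.sum(y_true * np.log(y_pred))  # equals -log(0.7), about 0.357
```

Only the predicted probability of the true class matters; the closer it is to 1, the closer the loss is to 0.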
Now let's run our model.
Note that by giving it a name (history = model.fit(...)) we'll be able to access some of its outputs.
We also use the validation_data argument to make it print out the model performance on validation data (which is not used for fitting the model/calculating the back-propagation).
The verbose=1 makes the model talk to us as it fits -- set verbose=0 to make it run silently.
In [12]:
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
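Once fitting finishes, history.history is a plain dict mapping metric names to per-epoch lists. The exact key names vary across Keras versions ('acc' vs 'accuracy'), so the dict below is a hand-written stand-in just to show the access pattern:

```python
# hand-written stand-in for history.history (real values come from model.fit)
fake_history = {
    'loss':     [0.25, 0.10, 0.07],
    'val_loss': [0.12, 0.09, 0.08],
}
# training loss should fall each epoch; a val_loss that flattens out while
# training loss keeps falling is a classic sign of overfitting
final_gap = fake_history['val_loss'][-1] - fake_history['loss'][-1]
```

Plotting these lists per epoch (e.g. with matplotlib) is the usual way to decide whether epochs = 10 really is too low.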
Now we can score our model on the test set.
In [17]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
It's nice to see how our model performs on validation data. This gives us a useful benchmark of how well the model generalizes to data it hasn't seen during training. However, there are some limitations.