In [1]:
import keras
print(f"Keras Version: {keras.__version__}")
import tensorflow as tf
print(f"Tensorflow Version {tf.__version__}")


Using TensorFlow backend.
Keras Version: 2.0.4
Tensorflow Version 1.1.0

Keras is a high-level wrapper (API) for TensorFlow and Theano which aims to make them easier to use. TensorFlow gets quite verbose and there is a lot of detail to handle, which Keras tries to abstract away to sane defaults, while still allowing you to tinker with the tensors where wanted.
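Just to give a feel for what that abstraction buys you, here's a rough sketch (not run in this notebook) of a single-layer softmax classifier written against the raw TensorFlow 1.x API and then in Keras - the raw version would still need its own loss op, optimizer, session and training loop on top:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

# raw TensorFlow 1.x: wire up placeholders, variables and ops by hand
x = tf.placeholder(tf.float32, shape=[None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Keras: the same classifier, with compile() and fit() handling the rest
model = Sequential()
model.add(Dense(10, activation='softmax', input_shape=(784,)))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])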

the data

To get a feel for Keras, I'm seeing how it goes with MNIST.

Keras already has some datasets included, so using the ever-popular MNIST:

MNIST database of handwritten digits

Dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.


In [2]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz

Checking the data:


In [3]:
f"Shapes x_train: {x_train.shape}, y_train: {y_train.shape}, x_test: {x_test.shape}, y_test: {y_test.shape}"


Out[3]:
'Shapes x_train: (60000, 28, 28), y_train: (60000,), x_test: (10000, 28, 28), y_test: (10000,)'

The train and test images are 28x28, which we need to reshape into 1D vectors of length 784 so our super simple NN can deal with them.

Now, it's a good idea to always eyeball the data, so here goes:


In [33]:
# min to max values in x_train
x_train.min(), x_train.max()


Out[33]:
(0, 255)

In [128]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2,5, figsize=(10,4))

for i, ax in enumerate(axes.flatten()):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(f"Class {y_train[i]}")
    ax.set_xticks([])
    ax.set_yticks([])



In [129]:
y_train[:10]


Out[129]:
array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=uint8)

OK, we've seen the data, but we still need to preprocess it into a neural-net-friendly shape.

preprocessing the data

The image data is 60K 28x28 images, and the image test data is 10K 28x28 images. We want the number of images to stay the same, while the 28x28 should become 784. Since the data is just numpy arrays we can use np.reshape:


In [32]:
X_train = x_train.reshape(-1, 28*28)
X_test = x_test.reshape(-1, 28*28)
x_train.shape, X_train.shape, x_test.shape, X_test.shape


Out[32]:
((60000, 28, 28), (60000, 784), (10000, 28, 28), (10000, 784))

that was easy!

Now, image data is often normalized, here to the 0-1 range:


In [34]:
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train.min(), X_train.max()


Out[34]:
(0.0, 1.0)

Moving on to the image labels:

The image labels are stored as a simple numpy array, with each entry telling us which digit the corresponding drawing is. Since our NN will spit out a prediction of the likelihood of each digit, it will work better with the y data one-hot encoded.


In [35]:
print("Existing image labels")
print(f"y_train: {y_train[:10]} | y_test: {y_test[:10]}")

from keras.utils import np_utils
Y_train = np_utils.to_categorical(y_train)
Y_test = np_utils.to_categorical(y_test)

print(f"Y_Train encoded: {Y_train[0]}")
print(f"Y_test encoded: {Y_test[0]}")


Existing image labels
y_train: [5 0 4 1 9 2 1 3 1 4] | y_test: [7 2 1 0 4 1 4 9 5 9]
Y_Train encoded: [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
Y_test encoded: [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]

so now our data is all ready to go!

A simple neural net


In [235]:
EPOCHS = 20

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
model = Sequential()

from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2, verbose=1)

model.add(Dense(32, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.05))

model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.05))

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# we can either use part of the training set as validation data or provide a validation set
history = model.fit(X_train, Y_train, epochs=EPOCHS, batch_size=128, shuffle=True, 
                    validation_split=0.05, callbacks=[early_stopping])

#model.fit(X_train, Y_train, epochs=10, batch_size=128, shuffle=True, validation_data=(X_test,Y_test))


Train on 57000 samples, validate on 3000 samples
Epoch 1/20
57000/57000 [==============================] - 3s - loss: 0.5108 - acc: 0.8530 - val_loss: 0.1717 - val_acc: 0.9543
Epoch 2/20
57000/57000 [==============================] - 2s - loss: 0.2469 - acc: 0.9255 - val_loss: 0.1275 - val_acc: 0.9667
Epoch 3/20
57000/57000 [==============================] - 3s - loss: 0.1972 - acc: 0.9409 - val_loss: 0.1128 - val_acc: 0.9730
Epoch 4/20
57000/57000 [==============================] - 3s - loss: 0.1701 - acc: 0.9482 - val_loss: 0.1065 - val_acc: 0.9733
Epoch 5/20
57000/57000 [==============================] - 2s - loss: 0.1508 - acc: 0.9544 - val_loss: 0.1009 - val_acc: 0.9733
Epoch 6/20
57000/57000 [==============================] - 3s - loss: 0.1373 - acc: 0.9575 - val_loss: 0.0959 - val_acc: 0.9743
Epoch 7/20
57000/57000 [==============================] - 2s - loss: 0.1266 - acc: 0.9613 - val_loss: 0.0895 - val_acc: 0.9750
Epoch 8/20
57000/57000 [==============================] - 3s - loss: 0.1190 - acc: 0.9635 - val_loss: 0.0815 - val_acc: 0.9760
Epoch 9/20
57000/57000 [==============================] - 3s - loss: 0.1132 - acc: 0.9645 - val_loss: 0.0798 - val_acc: 0.9790
Epoch 10/20
57000/57000 [==============================] - 2s - loss: 0.1084 - acc: 0.9665 - val_loss: 0.0780 - val_acc: 0.9783
Epoch 11/20
57000/57000 [==============================] - 2s - loss: 0.1021 - acc: 0.9683 - val_loss: 0.0820 - val_acc: 0.9797
Epoch 12/20
57000/57000 [==============================] - 2s - loss: 0.0987 - acc: 0.9695 - val_loss: 0.0790 - val_acc: 0.9757
Epoch 13/20
57000/57000 [==============================] - 2s - loss: 0.0927 - acc: 0.9712 - val_loss: 0.0822 - val_acc: 0.9773
Epoch 00012: early stopping

In [236]:
model.evaluate(X_test, Y_test, batch_size=256)


 7936/10000 [======================>.......] - ETA: 0s
Out[236]:
[0.11057286697514356, 0.96889999999999998]

and voilà, this super simple NN gets about 97% accuracy!
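To sanity-check what the model actually predicts (a quick sketch, not run above), model.predict gives the per-class probabilities and np.argmax picks the most likely digit:

import numpy as np

# per-class probabilities for the first few test images
probs = model.predict(X_test[:10])
predicted = np.argmax(probs, axis=1)
print(f"predicted: {predicted}")
print(f"actual:    {y_test[:10]}")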


In [229]:
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_46 (Dense)             (None, 32)                25120     
_________________________________________________________________
activation_46 (Activation)   (None, 32)                0         
_________________________________________________________________
dropout_31 (Dropout)         (None, 32)                0         
_________________________________________________________________
dense_47 (Dense)             (None, 64)                2112      
_________________________________________________________________
activation_47 (Activation)   (None, 64)                0         
_________________________________________________________________
dropout_32 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_48 (Dense)             (None, 10)                650       
_________________________________________________________________
activation_48 (Activation)   (None, 10)                0         
=================================================================
Total params: 27,882
Trainable params: 27,882
Non-trainable params: 0
_________________________________________________________________

In [230]:
history.history.keys()


Out[230]:
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])

In [237]:
fig, axs = plt.subplots(1,2,figsize=(15,5))

acc = axs[0]
acc.plot(history.history['val_acc'])
acc.plot(history.history['acc'])
acc.legend(['val_acc', 'acc'])
acc.set_title('Model Accuracy')
acc.set_ylabel('Accuracy')
acc.set_xlabel('Epoch')

loss = axs[1]
loss.plot(history.history['val_loss'])
loss.plot(history.history['loss'])
loss.legend(['val_loss', 'loss'])
loss.set_title('Model Loss')
loss.set_ylabel('Loss')
loss.set_xlabel('Epoch')
plt.show();


Now to see how different hyperparameters affect the network:


In [238]:
b_size = [32,64,128]
history_runs = []

for b in b_size:
    print(f'Training on batchsize {b}')
    # note: this reuses the already-trained model, so each run continues from the previous weights
    history_runs.append(model.fit(X_train, Y_train, epochs=20, batch_size=b, 
                                  shuffle=True, validation_data=(X_test,Y_test), callbacks=[early_stopping]))


Training on batchsize 32
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
60000/60000 [==============================] - 9s - loss: 0.1069 - acc: 0.9678 - val_loss: 0.1206 - val_acc: 0.9666
Epoch 2/20
60000/60000 [==============================] - 9s - loss: 0.1070 - acc: 0.9684 - val_loss: 0.1223 - val_acc: 0.9683
Epoch 3/20
60000/60000 [==============================] - 8s - loss: 0.1062 - acc: 0.9692 - val_loss: 0.1220 - val_acc: 0.9692
Epoch 4/20
60000/60000 [==============================] - 9s - loss: 0.1030 - acc: 0.9698 - val_loss: 0.1224 - val_acc: 0.9691
Epoch 00003: early stopping
Training on batchsize 64
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
60000/60000 [==============================] - 4s - loss: 0.0954 - acc: 0.9716 - val_loss: 0.1125 - val_acc: 0.9709
Epoch 2/20
60000/60000 [==============================] - 4s - loss: 0.0921 - acc: 0.9732 - val_loss: 0.1077 - val_acc: 0.9709
Epoch 3/20
60000/60000 [==============================] - 4s - loss: 0.0917 - acc: 0.9736 - val_loss: 0.1164 - val_acc: 0.9709
Epoch 4/20
60000/60000 [==============================] - 4s - loss: 0.0884 - acc: 0.9736 - val_loss: 0.1080 - val_acc: 0.9714
Epoch 5/20
60000/60000 [==============================] - 4s - loss: 0.0861 - acc: 0.9747 - val_loss: 0.1116 - val_acc: 0.9721
Epoch 00004: early stopping
Training on batchsize 128
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
60000/60000 [==============================] - 2s - loss: 0.0796 - acc: 0.9760 - val_loss: 0.1085 - val_acc: 0.9721
Epoch 2/20
60000/60000 [==============================] - 2s - loss: 0.0753 - acc: 0.9774 - val_loss: 0.1131 - val_acc: 0.9719
Epoch 3/20
60000/60000 [==============================] - 2s - loss: 0.0740 - acc: 0.9773 - val_loss: 0.1110 - val_acc: 0.9727
Epoch 4/20
60000/60000 [==============================] - 2s - loss: 0.0723 - acc: 0.9776 - val_loss: 0.1042 - val_acc: 0.9739
Epoch 5/20
60000/60000 [==============================] - 2s - loss: 0.0721 - acc: 0.9776 - val_loss: 0.1103 - val_acc: 0.9719
Epoch 6/20
60000/60000 [==============================] - 2s - loss: 0.0698 - acc: 0.9787 - val_loss: 0.1099 - val_acc: 0.9737
Epoch 7/20
60000/60000 [==============================] - 2s - loss: 0.0680 - acc: 0.9787 - val_loss: 0.1142 - val_acc: 0.9710
Epoch 00006: early stopping

In [240]:
fig, axs = plt.subplots(1,2,figsize=(18,5))

acc_legend = []
loss_legend = []

for history, b in zip(history_runs, b_size):
    acc = axs[0]
    acc.plot(history.history['val_acc'], linewidth=1.2, label='val_acc'+ str(b))
    acc.plot(history.history['acc'], linestyle='--', linewidth=2.5, label='acc'+ str(b))
    acc.set_title('Model Accuracy')
    acc.set_ylabel('Accuracy')
    acc.set_xlabel('Epoch')
    
    loss = axs[1]
    loss.plot(history.history['val_loss'], linewidth=1.2, label='val_loss '+ str(b))
    loss.plot(history.history['loss'], linestyle='--', linewidth=2.5, label='loss '+ str(b))
    loss.set_title('Model Loss')
    loss.set_ylabel('Loss')
    loss.set_xlabel('Epoch')

acc.legend(fontsize=14)
loss.legend(fontsize=14)
plt.show();


That makes it easier to see the effect of each specific batch size, as well as to do a quick eyeball check on whether the network is overtraining - which it seems to be, since accuracy is going down and loss is going up at about epoch 6.
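One way to guard against that (a sketch, not something I ran here) is to pair the EarlyStopping callback with a ModelCheckpoint that keeps only the best weights seen so far - 'best_weights.h5' is just an example path:

from keras.callbacks import ModelCheckpoint

# save the weights whenever validation loss improves
checkpoint = ModelCheckpoint('best_weights.h5', monitor='val_loss',
                             save_best_only=True, verbose=1)

model.fit(X_train, Y_train, epochs=EPOCHS, batch_size=128, shuffle=True,
          validation_split=0.05, callbacks=[early_stopping, checkpoint])

# later, reload the best weights instead of the last (possibly overtrained) ones
model.load_weights('best_weights.h5')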

A more complex parameter search would try out combinations of parameters, like other optimizers, different learning rates, and so on, but it depends on the problem and the computing time available.

For a more thorough parameter search there are techniques like grid search and dedicated libraries like hyperas.
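As a rough sketch of what a manual search could look like (nothing below was run for this post; the optimizer names are just the standard Keras strings), the important part is rebuilding the model from scratch for every combination so the runs don't carry over each other's weights:

from itertools import product

def build_model(optimizer):
    # same architecture as above, rebuilt fresh for each run
    m = Sequential()
    m.add(Dense(32, activation='relu', input_shape=(784,)))
    m.add(Dropout(0.05))
    m.add(Dense(64, activation='relu'))
    m.add(Dropout(0.05))
    m.add(Dense(10, activation='softmax'))
    m.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return m

results = {}
for opt, batch in product(['rmsprop', 'adam', 'sgd'], [64, 128]):
    m = build_model(opt)
    h = m.fit(X_train, Y_train, epochs=EPOCHS, batch_size=batch, shuffle=True,
              validation_split=0.05, verbose=0, callbacks=[early_stopping])
    results[(opt, batch)] = min(h.history['val_loss'])

print(results)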