In [1]:
import keras
print(f"Keras Version: {keras.__version__}")
import tensorflow as tf
print(f"Tensorflow Version {tf.__version__}")
Keras is a high-level wrapper (API) for Tensorflow and Theano which aims to make them easier to use. Tensorflow gets quite verbose and there is a lot of detail to handle, which Keras tries to abstract away to sane defaults, while allowing the option to tinker with the tensors where wanted.
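To see the kind of boilerplate Keras hides, here is a rough sketch (not from this notebook, and assuming the TF 1.x graph API of the time) of a single dense layer written by hand in raw Tensorflow:
# rough sketch, assuming the TF 1.x graph API: one dense layer by hand
x = tf.placeholder(tf.float32, shape=(None, 784))  # input batch
W = tf.Variable(tf.random_normal([784, 32]))       # weights, managed manually
b = tf.Variable(tf.zeros([32]))                    # bias, managed manually
hidden = tf.nn.relu(tf.matmul(x, W) + b)           # in Keras: Dense(32, activation='relu')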
To get a feel for Keras, I'm seeing how it goes with MNIST.
Keras already has some datasets included, so using the ever-popular MNIST:
MNIST database of handwritten digits
Dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
In [2]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Checking the data:
In [3]:
f"Shapes x_train: {x_train.shape}, y_train: {y_train.shape}, x_test: {x_test.shape}, y_test: {y_test.shape}"
Out[3]:
'Shapes x_train: (60000, 28, 28), y_train: (60000,), x_test: (10000, 28, 28), y_test: (10000,)'
The train and test images are 28x28 pixels each, which we need to reshape into a 1d vector for our super simple NN to deal with.
Now, it's a good idea to always eyeball the data, so here goes:
In [33]:
# min to max values in x_train
x_train.min(), x_train.max()
Out[33]:
(0, 255)
In [128]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
fig, axes = plt.subplots(2,5, figsize=(10,4))
for i, ax in enumerate(axes.flatten()):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(f"Class {y_train[i]}")
    ax.set_xticks([]), ax.set_yticks([])
In [129]:
y_train[:10]
Out[129]:
array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=uint8)
OK, we've seen the data, but we need to preprocess it into a neural-net-friendly shape.
The training image data is 60K 28x28 images, and the test image data is 10K 28x28 images. We want the number of images to stay the same, while each 28x28 image should be flattened to 784 values. Since the data is just numpy arrays, we can use np.reshape:
In [32]:
X_train = x_train.reshape(-1, 28*28)
X_test = x_test.reshape(-1, 28*28)
x_train.shape, X_train.shape, x_test.shape, X_test.shape
Out[32]:
((60000, 28, 28), (60000, 784), (10000, 28, 28), (10000, 784))
That was easy!
Now, image data is often normalized; here we scale the 0-255 pixel values down to the 0-1 range:
In [34]:
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train.min(), X_train.max()
Out[34]:
(0.0, 1.0)
Moving on to the image labels:
the image labels are stored as a simple numpy array, with each entry telling us which digit the corresponding drawing is. Since our NN will spit out a prediction of the likelihood of each digit, it will work better with the y data one-hot encoded.
In [35]:
print("Existing image labels")
print(f"y_train: {y_train[:10]} | y_test: {y_test[:10]}")
from keras.utils import np_utils
Y_train = np_utils.to_categorical(y_train)
Y_test = np_utils.to_categorical(y_test)
print(f"Y_Train encoded: {Y_train[0]}")
print(f"Y_test encoded: {Y_test[0]}")
In [235]:
EPOCHS = 20
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=2, verbose=1)

model = Sequential()
model.add(Dense(32, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.05))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.05))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# we can either use part of the training set as validation data or provide a validation set
history = model.fit(X_train, Y_train, epochs=EPOCHS, batch_size=128, shuffle=True,
                    validation_split=0.05, callbacks=[early_stopping])
# model.fit(X_train, Y_train, epochs=10, batch_size=128, shuffle=True, validation_data=(X_test, Y_test))
In [236]:
model.evaluate(X_test, Y_test, batch_size=256)
Out[236]:
And voilà, this super simple NN gets 97% accuracy!
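As a quick sanity check beyond the aggregate metric, we can eyeball a few individual predictions against the true labels (a small sketch, not part of the original run):
# compare argmax predictions against the true labels for the first ten test images
preds = model.predict(X_test[:10]).argmax(axis=1)
print(f"predicted: {preds}")
print(f"actual:    {y_test[:10]}")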
In [229]:
model.summary()
In [230]:
history.history.keys()
Out[230]:
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
In [237]:
fig, axs = plt.subplots(1,2,figsize=(15,5))
acc = axs[0]
acc.plot(history.history['val_acc'])
acc.plot(history.history['acc'])
acc.legend(['val_acc', 'acc'])
acc.set_title('Model Accuracy')
acc.set_ylabel('Accuracy')
acc.set_xlabel('Epoch')
loss = axs[1]
loss.plot(history.history['val_loss'])
loss.plot(history.history['loss'])
loss.legend(['val_loss', 'loss'])
loss.set_title('Model Loss')
loss.set_ylabel('Loss')
loss.set_xlabel('Epoch')
plt.show();
Now to see how different hyperparameters affect the network:
In [238]:
b_size = [32, 64, 128]
history_runs = []
for b in b_size:
    print(f'Training on batchsize {b}')
    # note: this keeps fitting the same already-trained model, so each run
    # continues from the previous run's weights rather than starting fresh
    history_runs.append(model.fit(X_train, Y_train, epochs=20, batch_size=b,
                                  shuffle=True, validation_data=(X_test, Y_test),
                                  callbacks=[early_stopping]))
In [240]:
fig, axs = plt.subplots(1, 2, figsize=(18, 5))
for history, b in zip(history_runs, b_size):
    acc = axs[0]
    acc.plot(history.history['val_acc'], linewidth=1.2, label='val_acc ' + str(b))
    acc.plot(history.history['acc'], linestyle='--', linewidth=2.5, label='acc ' + str(b))
    acc.set_title('Model Accuracy')
    acc.set_ylabel('Accuracy')
    acc.set_xlabel('Epoch')
    loss = axs[1]
    loss.plot(history.history['val_loss'], linewidth=1.2, label='val_loss ' + str(b))
    loss.plot(history.history['loss'], linestyle='--', linewidth=2.5, label='loss ' + str(b))
    loss.set_title('Model Loss')
    loss.set_ylabel('Loss')
    loss.set_xlabel('Epoch')
acc.legend(fontsize=14)
loss.legend(fontsize=14)
plt.show();
That makes it easier to see the effect of a specific batch size value, as well as giving a quick eyeball check on whether the network is overtraining - which it seems to be, since validation accuracy is going down and validation loss is going up at about epoch 6.
A more complex parameter search would try out combinations of parameters, like other optimizers, different learning rates, and so on, but it depends on the problem and the computing time available.
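As a rough illustration of what such a search could look like (not from the original run; build_model is a hypothetical helper that returns a fresh, uncompiled copy of the network defined above):
# minimal sketch of a grid over optimizer and learning rate;
# build_model is a hypothetical helper returning a fresh copy of the network above
from itertools import product
from keras.optimizers import RMSprop, Adam, SGD

results = {}
for opt_cls, lr in product([RMSprop, Adam, SGD], [0.01, 0.001, 0.0001]):
    model = build_model()
    model.compile(optimizer=opt_cls(lr=lr),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    h = model.fit(X_train, Y_train, epochs=5, batch_size=128,
                  validation_split=0.05, verbose=0)
    results[(opt_cls.__name__, lr)] = h.history['val_acc'][-1]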
For a more thorough parameter search there are tools like scikit-learn's GridSearchCV and Keras-specific libraries like hyperas.
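For instance, Keras of this era ships a scikit-learn wrapper, so GridSearchCV can drive the search; a minimal sketch, assuming keras.wrappers.scikit_learn is available:
# hedged sketch: wrap the model so scikit-learn's GridSearchCV can tune it
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def make_model():
    # same tiny architecture as above, rebuilt fresh for each grid point
    m = Sequential()
    m.add(Dense(32, input_shape=(784,), activation='relu'))
    m.add(Dense(10, activation='softmax'))
    m.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
    return m

clf = KerasClassifier(build_fn=make_model, verbose=0)
grid = GridSearchCV(clf, param_grid={'epochs': [5, 10], 'batch_size': [64, 128]})
grid.fit(X_train, y_train)  # integer labels; the wrapper one-hot encodes internally
print(grid.best_params_, grid.best_score_)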