CNN from scratch - Keras+TensorFlow

This notebook builds CNN models from scratch using Keras on the TensorFlow backend. First, some preparation work.


In [1]:
from keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten, Activation, add
from keras.layers.core import Dropout
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import GlobalAveragePooling2D
from keras.optimizers import RMSprop
from keras.models import Model, Sequential, load_model
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from keras import backend as ktf
from keras.preprocessing.image import ImageDataGenerator
from lib.data_utils import get_MNIST_data
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook


Using TensorFlow backend.

Read the MNIST data. Notice that we assume the file lives at 'kaggle-DigitRecognizer/data/train.csv', and we use a helper function to read it into a dictionary.


In [2]:
# by default the helper returns 41000 training examples, 1000 test examples
# and 1000 validation examples (carved out of the training set)
data = get_MNIST_data(num_validation=4000)

# see if we get the data correctly
print('image size: ', data['X_train'].shape)


image size:  (41000, 28, 28, 1)
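The `get_MNIST_data` helper lives in the project's `lib.data_utils` and is not shown here. As a rough, hypothetical sketch of what such a helper might do (assuming train.csv stores a label column followed by 784 pixel columns, as in the Kaggle Digit Recognizer format):

```python
import numpy as np

def get_mnist_data_sketch(raw, num_validation=1000, num_test=1000):
    """Split a (N, 785) label+pixels array into train/val/test dicts.

    `raw` stands in for the rows of train.csv: column 0 is the label,
    the remaining 784 columns are pixel intensities in [0, 255].
    This is a hypothetical reimplementation, not the project's helper.
    """
    y = raw[:, 0].astype(np.int64)
    # reshape flat pixels to (N, 28, 28, 1) and scale to [0, 1]
    X = raw[:, 1:].astype(np.float32).reshape(-1, 28, 28, 1) / 255.0
    n_train = len(X) - num_validation - num_test
    return {
        'X_train': X[:n_train],
        'y_train': y[:n_train],
        'X_val': X[n_train:n_train + num_validation],
        'y_val': y[n_train:n_train + num_validation],
        'X_test': X[-num_test:],
        'y_test': y[-num_test:],
    }

# tiny synthetic stand-in for train.csv (100 rows)
raw = np.hstack([np.random.randint(0, 10, (100, 1)),
                 np.random.randint(0, 256, (100, 784))])
data_sketch = get_mnist_data_sketch(raw, num_validation=20, num_test=10)
print(data_sketch['X_train'].shape)  # (70, 28, 28, 1)
```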

Simple CNN model

Build a simple CNN model using Keras and train it from scratch.


In [11]:
# model architecture
# [batchnorm-Conv-Conv-maxpool-dropout]x2 - [dense-dropout] - [softmax]
# new lowest: 1.01 0.79 (0.0121, 0.76, 1974, True, False)
# new lowest: 1.23 0.73 (0.0044, 0.45, 1392, True, False)
simple_CNN = Sequential()
simple_CNN.add(BatchNormalization(input_shape=(28, 28, 1)))
simple_CNN.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
simple_CNN.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
simple_CNN.add(MaxPooling2D((2, 2))) # (14,14,32)
simple_CNN.add(Dropout(0.2))

simple_CNN.add(BatchNormalization())
simple_CNN.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
simple_CNN.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
simple_CNN.add(MaxPooling2D((2, 2))) # (7,7,64)
simple_CNN.add(Dropout(0.2))

simple_CNN.add(Flatten())
simple_CNN.add(Dense(1392, activation='relu'))
simple_CNN.add(Dropout(0.45))
simple_CNN.add(Dense(10, activation='softmax'))

# set loss and optimizer
rmsprop = RMSprop(lr=0.0044, decay=0.99)
simple_CNN.compile(loss='sparse_categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])

# train the model
checkpoint = ModelCheckpoint('../models/simpleCNN_{epoch:02d}-{loss:.4f}.h5',
                             monitor='loss',
                             save_best_only=True)
earlystop = EarlyStopping(min_delta=0.0001, patience=3)

# use test data to monitor early stopping
simple_CNN.fit(data['X_train'], data['y_train'].reshape(-1,1),
               batch_size=64,
               epochs=200,
               validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
               callbacks=[checkpoint, earlystop],
               initial_epoch=0)


Train on 41000 samples, validate on 1000 samples
Epoch 1/200
41000/41000 [==============================] - 87s - loss: 0.5541 - acc: 0.8399 - val_loss: 0.2823 - val_acc: 0.9160
Epoch 2/200
41000/41000 [==============================] - 86s - loss: 0.3203 - acc: 0.9033 - val_loss: 0.2407 - val_acc: 0.9290
Epoch 3/200
41000/41000 [==============================] - 86s - loss: 0.2901 - acc: 0.9116 - val_loss: 0.2225 - val_acc: 0.9340
Epoch 4/200
41000/41000 [==============================] - 87s - loss: 0.2690 - acc: 0.9186 - val_loss: 0.2117 - val_acc: 0.9370
Epoch 5/200
41000/41000 [==============================] - 87s - loss: 0.2545 - acc: 0.9233 - val_loss: 0.2037 - val_acc: 0.9430
Epoch 6/200
41000/41000 [==============================] - 87s - loss: 0.2467 - acc: 0.9255 - val_loss: 0.1984 - val_acc: 0.9440
Epoch 7/200
41000/41000 [==============================] - 87s - loss: 0.2399 - acc: 0.9267 - val_loss: 0.1939 - val_acc: 0.9450
Epoch 8/200
41000/41000 [==============================] - 88s - loss: 0.2354 - acc: 0.9279 - val_loss: 0.1906 - val_acc: 0.9470
Epoch 9/200
41000/41000 [==============================] - 88s - loss: 0.2357 - acc: 0.9293 - val_loss: 0.1871 - val_acc: 0.9490
Epoch 10/200
41000/41000 [==============================] - 88s - loss: 0.2267 - acc: 0.9311 - val_loss: 0.1847 - val_acc: 0.9500
Epoch 11/200
41000/41000 [==============================] - 87s - loss: 0.2280 - acc: 0.9320 - val_loss: 0.1821 - val_acc: 0.9500
Epoch 12/200
41000/41000 [==============================] - 87s - loss: 0.2224 - acc: 0.9323 - val_loss: 0.1801 - val_acc: 0.9510
Epoch 13/200
41000/41000 [==============================] - 89s - loss: 0.2177 - acc: 0.9345 - val_loss: 0.1785 - val_acc: 0.9520
Epoch 14/200
41000/41000 [==============================] - 86s - loss: 0.2168 - acc: 0.9349 - val_loss: 0.1768 - val_acc: 0.9520
Epoch 15/200
41000/41000 [==============================] - 86s - loss: 0.2154 - acc: 0.9339 - val_loss: 0.1754 - val_acc: 0.9520
Epoch 16/200
41000/41000 [==============================] - 86s - loss: 0.2115 - acc: 0.9360 - val_loss: 0.1738 - val_acc: 0.9520
Epoch 17/200
41000/41000 [==============================] - 87s - loss: 0.2094 - acc: 0.9359 - val_loss: 0.1724 - val_acc: 0.9520
Epoch 18/200
41000/41000 [==============================] - 87s - loss: 0.2100 - acc: 0.9366 - val_loss: 0.1711 - val_acc: 0.9520
Epoch 19/200
41000/41000 [==============================] - 85s - loss: 0.2095 - acc: 0.9357 - val_loss: 0.1700 - val_acc: 0.9530
Epoch 20/200
41000/41000 [==============================] - 86s - loss: 0.2048 - acc: 0.9390 - val_loss: 0.1690 - val_acc: 0.9530
Epoch 21/200
41000/41000 [==============================] - 85s - loss: 0.2051 - acc: 0.9393 - val_loss: 0.1679 - val_acc: 0.9530
Epoch 22/200
41000/41000 [==============================] - 85s - loss: 0.2043 - acc: 0.9378 - val_loss: 0.1669 - val_acc: 0.9540
Epoch 23/200
41000/41000 [==============================] - 86s - loss: 0.2014 - acc: 0.9379 - val_loss: 0.1660 - val_acc: 0.9540
Epoch 24/200
41000/41000 [==============================] - 87s - loss: 0.2009 - acc: 0.9391 - val_loss: 0.1652 - val_acc: 0.9530
Epoch 25/200
41000/41000 [==============================] - 86s - loss: 0.1990 - acc: 0.9395 - val_loss: 0.1645 - val_acc: 0.9530
Epoch 26/200
41000/41000 [==============================] - 87s - loss: 0.1987 - acc: 0.9409 - val_loss: 0.1640 - val_acc: 0.9530
Epoch 27/200
41000/41000 [==============================] - 86s - loss: 0.1971 - acc: 0.9408 - val_loss: 0.1634 - val_acc: 0.9540
Epoch 28/200
41000/41000 [==============================] - 86s - loss: 0.1965 - acc: 0.9402 - val_loss: 0.1627 - val_acc: 0.9540
Epoch 29/200
41000/41000 [==============================] - 86s - loss: 0.1934 - acc: 0.9416 - val_loss: 0.1620 - val_acc: 0.9540
Epoch 30/200
41000/41000 [==============================] - 86s - loss: 0.1943 - acc: 0.9419 - val_loss: 0.1615 - val_acc: 0.9540
Epoch 31/200
41000/41000 [==============================] - 86s - loss: 0.1936 - acc: 0.9422 - val_loss: 0.1609 - val_acc: 0.9540
Epoch 32/200
41000/41000 [==============================] - 85s - loss: 0.1925 - acc: 0.9424 - val_loss: 0.1603 - val_acc: 0.9540
Epoch 33/200
41000/41000 [==============================] - 86s - loss: 0.1927 - acc: 0.9412 - val_loss: 0.1597 - val_acc: 0.9540
Epoch 34/200
41000/41000 [==============================] - 86s - loss: 0.1916 - acc: 0.9419 - val_loss: 0.1592 - val_acc: 0.9540
Epoch 35/200
41000/41000 [==============================] - 86s - loss: 0.1906 - acc: 0.9421 - val_loss: 0.1587 - val_acc: 0.9540
Epoch 36/200
41000/41000 [==============================] - 86s - loss: 0.1918 - acc: 0.9413 - val_loss: 0.1582 - val_acc: 0.9540
Epoch 37/200
41000/41000 [==============================] - 87s - loss: 0.1916 - acc: 0.9424 - val_loss: 0.1577 - val_acc: 0.9540
Epoch 38/200
41000/41000 [==============================] - 87s - loss: 0.1928 - acc: 0.9410 - val_loss: 0.1572 - val_acc: 0.9540
Epoch 39/200
41000/41000 [==============================] - 88s - loss: 0.1908 - acc: 0.9423 - val_loss: 0.1568 - val_acc: 0.9540
Epoch 40/200
41000/41000 [==============================] - 87s - loss: 0.1893 - acc: 0.9432 - val_loss: 0.1564 - val_acc: 0.9540
Epoch 41/200
41000/41000 [==============================] - 87s - loss: 0.1873 - acc: 0.9437 - val_loss: 0.1560 - val_acc: 0.9540
Epoch 42/200
41000/41000 [==============================] - 88s - loss: 0.1903 - acc: 0.9426 - val_loss: 0.1556 - val_acc: 0.9540
Epoch 43/200
41000/41000 [==============================] - 88s - loss: 0.1856 - acc: 0.9443 - val_loss: 0.1553 - val_acc: 0.9540
Epoch 44/200
41000/41000 [==============================] - 86s - loss: 0.1879 - acc: 0.9430 - val_loss: 0.1548 - val_acc: 0.9540
Epoch 45/200
41000/41000 [==============================] - 86s - loss: 0.1899 - acc: 0.9425 - val_loss: 0.1545 - val_acc: 0.9550
Epoch 46/200
41000/41000 [==============================] - 87s - loss: 0.1834 - acc: 0.9441 - val_loss: 0.1541 - val_acc: 0.9550
Epoch 47/200
41000/41000 [==============================] - 86s - loss: 0.1870 - acc: 0.9426 - val_loss: 0.1538 - val_acc: 0.9550
Epoch 48/200
41000/41000 [==============================] - 85s - loss: 0.1848 - acc: 0.9456 - val_loss: 0.1535 - val_acc: 0.9550
Epoch 49/200
41000/41000 [==============================] - 85s - loss: 0.1853 - acc: 0.9435 - val_loss: 0.1532 - val_acc: 0.9550
Epoch 50/200
41000/41000 [==============================] - 86s - loss: 0.1842 - acc: 0.9434 - val_loss: 0.1528 - val_acc: 0.9550
Epoch 51/200
41000/41000 [==============================] - 86s - loss: 0.1823 - acc: 0.9456 - val_loss: 0.1525 - val_acc: 0.9550
Epoch 52/200
41000/41000 [==============================] - 86s - loss: 0.1829 - acc: 0.9451 - val_loss: 0.1521 - val_acc: 0.9550
Epoch 53/200
41000/41000 [==============================] - 86s - loss: 0.1826 - acc: 0.9440 - val_loss: 0.1518 - val_acc: 0.9550
Epoch 54/200
41000/41000 [==============================] - 86s - loss: 0.1809 - acc: 0.9450 - val_loss: 0.1516 - val_acc: 0.9550
Epoch 55/200
41000/41000 [==============================] - 87s - loss: 0.1801 - acc: 0.9452 - val_loss: 0.1513 - val_acc: 0.9550
Epoch 56/200
41000/41000 [==============================] - 87s - loss: 0.1818 - acc: 0.9442 - val_loss: 0.1511 - val_acc: 0.9550
Epoch 57/200
41000/41000 [==============================] - 86s - loss: 0.1794 - acc: 0.9455 - val_loss: 0.1509 - val_acc: 0.9550
Epoch 58/200
41000/41000 [==============================] - 86s - loss: 0.1784 - acc: 0.9468 - val_loss: 0.1506 - val_acc: 0.9550
Epoch 59/200
41000/41000 [==============================] - 87s - loss: 0.1774 - acc: 0.9458 - val_loss: 0.1503 - val_acc: 0.9550
Epoch 60/200
41000/41000 [==============================] - 86s - loss: 0.1813 - acc: 0.9441 - val_loss: 0.1501 - val_acc: 0.9550
Epoch 61/200
41000/41000 [==============================] - 88s - loss: 0.1778 - acc: 0.9449 - val_loss: 0.1499 - val_acc: 0.9550
Epoch 62/200
41000/41000 [==============================] - 87s - loss: 0.1788 - acc: 0.9456 - val_loss: 0.1497 - val_acc: 0.9560
Epoch 63/200
41000/41000 [==============================] - 87s - loss: 0.1802 - acc: 0.9447 - val_loss: 0.1495 - val_acc: 0.9550
Epoch 64/200
41000/41000 [==============================] - 88s - loss: 0.1792 - acc: 0.9458 - val_loss: 0.1491 - val_acc: 0.9560
Epoch 65/200
12672/41000 [========>.....................] - ETA: 60s - loss: 0.1696 - acc: 0.9493
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-11-808299f3aad6> in <module>()
     37                validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
     38                callbacks=[checkpoint, earlystop],
---> 39                initial_epoch=0)

/usr/local/lib/python3.5/dist-packages/keras/models.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
    865                               class_weight=class_weight,
    866                               sample_weight=sample_weight,
--> 867                               initial_epoch=initial_epoch)
    868 
    869     def evaluate(self, x, y, batch_size=32, verbose=1,

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1593                               initial_epoch=initial_epoch,
   1594                               steps_per_epoch=steps_per_epoch,
-> 1595                               validation_steps=validation_steps)
   1596 
   1597     def evaluate(self, x, y,

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in _fit_loop(self, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
   1180                     batch_logs['size'] = len(batch_ids)
   1181                     callbacks.on_batch_begin(batch_index, batch_logs)
-> 1182                     outs = f(ins_batch)
   1183                     if not isinstance(outs, list):
   1184                         outs = [outs]

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2268         updated = session.run(self.outputs + [self.updates_op],
   2269                               feed_dict=feed_dict,
-> 2270                               **self.session_kwargs)
   2271         return updated[:len(self.outputs)]
   2272 

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1325   def _do_call(self, fn, *args):
   1326     try:
-> 1327       return fn(*args)
   1328     except errors.OpError as e:
   1329       message = compat.as_text(e.message)

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1304           return tf_session.TF_Run(session, options,
   1305                                    feed_dict, fetch_list, target_list,
-> 1306                                    status, run_metadata)
   1307 
   1308     def _prun_fn(session, handle, feed_dict, fetch_list):

KeyboardInterrupt: 

In [3]:
# resume training
model = load_model('../models/simpleCNN_86-0.0034.h5')

# set the loss and optimizer
rmsprop = RMSprop(lr=0.0000000044)
model.compile(optimizer=rmsprop, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# train the model
checkpoint = ModelCheckpoint('../models/simpleCNN_{epoch:02d}-{loss:.4f}.h5',
                             monitor='loss',
                             save_best_only=True)
earlystop = EarlyStopping(min_delta=0.0001, patience=5)

model.fit(data['X_train'], data['y_train'].reshape(-1,1),
               batch_size=64,
               epochs=200,
               validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
               callbacks=[checkpoint, earlystop],
               initial_epoch=87)


Train on 41000 samples, validate on 1000 samples
Epoch 88/200
41000/41000 [==============================] - 88s - loss: 0.0125 - acc: 0.9979 - val_loss: 0.0090 - val_acc: 0.9980
Epoch 89/200
41000/41000 [==============================] - 89s - loss: 0.0143 - acc: 0.9975 - val_loss: 0.0090 - val_acc: 0.9980
Epoch 90/200
41000/41000 [==============================] - 93s - loss: 0.0135 - acc: 0.9973 - val_loss: 0.0090 - val_acc: 0.9980
Epoch 91/200
39104/41000 [===========================>..] - ETA: 4s - loss: 0.0130 - acc: 0.9976
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-3-e9221608c8bc> in <module>()
     17                validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
     18                callbacks=[checkpoint, earlystop],
---> 19                initial_epoch=87)

KeyboardInterrupt: 

Mini ResNet

Build a small, 22-layer ResNet using Keras and train it from scratch.


In [ ]:
# model architecture
# [Conv-batchnorm-relu]x4 - [residual: [Conv-batchnorm-relu]x2-Conv-batchnorm-add-relu]x6
# stem: 4 conv layers; residual blocks: 6x3 = 18 conv layers; 22 total
inputs = Input(shape=(28, 28, 1))
x = Conv2D(64, (7, 7), padding='same')(inputs)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(64, (1, 1), padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(64, (3, 3), padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(256, (1, 1), padding='same')(x)
x = BatchNormalization()(x)
res = MaxPooling2D((2, 2))(x) # (14, 14, 256)

# repeated residual modules
for i in range(6): # 6x3 = 18
    x = Conv2D(64, (1, 1), padding='same')(res)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(64, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(256, (1, 1), padding='same')(x)
    x = BatchNormalization()(x)
    x = add([x, res])
    res = Activation('relu')(x)

x = GlobalAveragePooling2D(data_format='channels_last')(res) # (256,)
predictions = Dense(10, activation='softmax')(x)

# connect the model
mini_ResNet = Model(inputs=inputs, outputs=predictions)

# set loss and optimizer
rmsprop = RMSprop(lr=0.1, decay=0.9999)
mini_ResNet.compile(loss='sparse_categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])

# train the model
checkpoint = ModelCheckpoint('miniResNet_{epoch:02d}-{acc:.2f}.h5',
                             monitor='acc',
                             save_best_only=True)
# no validation data in this fit call, so monitor the training loss
plateau = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=3, min_lr=0.0001)
mini_ResNet.fit(data['X_train'], data['y_train'].reshape(-1, 1), 
                batch_size=32, epochs=10,
                callbacks=[checkpoint, plateau])

# test the model and see accuracy
score = mini_ResNet.evaluate(data['X_test'], data['y_test'].reshape(-1, 1), batch_size=32)
print(score)

In [ ]:
# save the model (accuracy: 0.903)
mini_ResNet.save('mini_ResNet.h5')
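The saved model can later be reloaded with `load_model` and its predictions written out in Kaggle's submission format. A minimal sketch of the CSV step only (the file names and the one-hot demo input are assumptions; the real probabilities would come from `model.predict` on the Kaggle test set):

```python
import csv
import numpy as np

def write_submission(probs, path='submission.csv'):
    """Convert an (N, 10) array of class probabilities, e.g. the output
    of model.predict, into Kaggle's ImageId,Label submission format."""
    labels = np.argmax(probs, axis=1)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['ImageId', 'Label'])
        for image_id, label in enumerate(labels, start=1):
            writer.writerow([image_id, int(label)])

# demo with fake probabilities for 3 images
demo_probs = np.eye(10)[[7, 0, 3]]   # one-hot rows, so argmax gives 7, 0, 3
write_submission(demo_probs, path='demo_submission.csv')
rows = list(csv.reader(open('demo_submission.csv')))
print(rows)  # [['ImageId', 'Label'], ['1', '7'], ['2', '0'], ['3', '3']]
```

In the real pipeline, `probs = load_model('mini_ResNet.h5').predict(X_kaggle_test)` would replace the demo array.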

Simple CNN with residual connections

Inspired by ResNet, we add residual connections to the simple CNN model above and see whether the performance differs.


In [17]:
# model architecture
# [Conv] - [batchnorm-Conv-Conv-add-maxpool]x2 - [dense]x2 - [softmax]
inputs = Input(shape=(28,28,1))
x = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)

res = BatchNormalization()(x) # (28, 28, 64)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(res)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = add([res, x])
x = MaxPooling2D((2, 2))(x)

res = BatchNormalization()(x) # (14, 14, 64)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(res)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = add([res, x])
x = MaxPooling2D((2, 2))(x)

x = GlobalAveragePooling2D(data_format='channels_last')(x)
predictions = Dense(10, activation='softmax')(x)

simple_resCNN = Model(inputs=inputs,outputs=predictions)

# set loss and optimizer
rmsprop = RMSprop(lr=0.01, decay=0.978)
simple_resCNN.compile(loss='sparse_categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])

# train the model
checkpoint = ModelCheckpoint('../models/simpleResCNN_{epoch:02d}-{loss:.4f}.h5',
                             monitor='loss',
                             save_best_only=True)
earlystop = EarlyStopping(min_delta=0.0001, patience=5)

# use test data to monitor early stopping
simple_resCNN.fit(data['X_train'], data['y_train'].reshape(-1,1),
               batch_size=64,
               epochs=200,
               validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
               callbacks=[checkpoint, earlystop],
               initial_epoch=0)


Train on 41000 samples, validate on 1000 samples
Epoch 1/200
41000/41000 [==============================] - 191s - loss: 1.7076 - acc: 0.5502 - val_loss: 1.3702 - val_acc: 0.7250
Epoch 2/200
41000/41000 [==============================] - 243s - loss: 1.2667 - acc: 0.7288 - val_loss: 1.1595 - val_acc: 0.7870
Epoch 3/200
41000/41000 [==============================] - 248s - loss: 1.1169 - acc: 0.7659 - val_loss: 1.0416 - val_acc: 0.8060
Epoch 4/200
41000/41000 [==============================] - 250s - loss: 1.0223 - acc: 0.7837 - val_loss: 0.9624 - val_acc: 0.8150
Epoch 5/200
41000/41000 [==============================] - 249s - loss: 0.9558 - acc: 0.7957 - val_loss: 0.9050 - val_acc: 0.8240
Epoch 6/200
41000/41000 [==============================] - 248s - loss: 0.9049 - acc: 0.8035 - val_loss: 0.8609 - val_acc: 0.8300
Epoch 7/200
41000/41000 [==============================] - 164s - loss: 0.8647 - acc: 0.8120 - val_loss: 0.8253 - val_acc: 0.8400
Epoch 8/200
 2752/41000 [=>............................] - ETA: 150s - loss: 0.8437 - acc: 0.8081
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-17-2f8bc5017bd8> in <module>()
     37                validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
     38                callbacks=[checkpoint, earlystop],
---> 39                initial_epoch=0)

KeyboardInterrupt: 

In [6]:
# resume training
model = load_model('../models/simpleCNN_29-0.4773.h5')

# set the loss and optimizer
rmsprop = RMSprop(lr=0.00001, decay=0.978)
model.compile(optimizer=rmsprop, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# train the model
checkpoint = ModelCheckpoint('../models/simpleCNN_{epoch:02d}-{loss:.4f}.h5',
                             monitor='loss',
                             save_best_only=True)
earlystop = EarlyStopping(min_delta=0.0001, patience=5)

model.fit(data['X_train'], data['y_train'].reshape(-1,1),
               batch_size=64,
               epochs=200,
               validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
               callbacks=[checkpoint, earlystop],
               initial_epoch=26)


Train on 41000 samples, validate on 1000 samples
Epoch 27/200
13184/41000 [========>.....................] - ETA: 114s - loss: 0.4798 - acc: 0.8984
KeyboardInterrupt: training interrupted manually (traceback omitted)

In [14]:
# sanity-check the residual structure with a quick 1-epoch run
inputs = Input(shape=(28,28,1))
x = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)

res = BatchNormalization()(x) # (28, 28, 64)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(res)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = add([res, x])
x = MaxPooling2D((2, 2))(x)

res = BatchNormalization()(x) # (14, 14, 64)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(res)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = add([res, x])
x = MaxPooling2D((2, 2))(x)

x = GlobalAveragePooling2D(data_format='channels_last')(x)
predictions = Dense(10, activation='softmax')(x)

simple_resCNN = Model(inputs=inputs,outputs=predictions)

# set loss and optimizer
rmsprop = RMSprop(lr=0.001, decay=0.978)
simple_resCNN.compile(loss='sparse_categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])

# quick sanity run: 1 epoch on the small validation split, scored on the test split
simple_resCNN.fit(data['X_val'], data['y_val'].reshape(-1,1),
               batch_size=64,
               epochs=1,
               validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
               initial_epoch=0)


Train on 4000 samples, validate on 1000 samples
Epoch 1/1
4000/4000 [==============================] - 24s - loss: 2.1898 - acc: 0.2667 - val_loss: 3.3601 - val_acc: 0.0890
Out[14]:
<keras.callbacks.History at 0x7ff7842f7978>

Inspect the wrong answers

It's often helpful to inspect the examples a model misclassifies. Here we randomly pick 10 misclassified images from the test set and display each with its (wrong) predicted label.


In [3]:
model = load_model('../models/simpleResCNN.h5')
pred = np.argmax(model.predict(data['X_test']), axis=1)
wrong_idx = [i for i in range(len(pred)) if pred[i] != data['y_test'][i]]

In [12]:
np.random.shuffle(wrong_idx)
fig = plt.figure(figsize=(4, 5))
for i in range(10):  # 2x5 grid of misclassified digits
    idx = wrong_idx.pop()
    ax = fig.add_subplot(2, 5, i + 1)
    ax.imshow(data['X_test'][idx].reshape((28, 28)))
    ax.axis('off')
    ax.set_title(pred[idx])  # the (wrong) predicted label

plt.show()



Hyperparameter finetuning

We finetune the hyperparameters and also the structure of SimpleCNN. We use random search for the following hyperparameters:

  • initial learning rate
  • dropout rate (for the fully-connected layer)
  • dense layer unit size

and manual search for the following structural choices:

  • convolution depth in each module
  • number of modules

In [4]:
# build the model, train it for 1 epoch and return the test loss and accuracy
def simpleCNN_model(lr=0.001, dropout=0.5, dense_dim=1024, drop_conv=True, avgpool=True):
    simple_CNN = Sequential()
    simple_CNN.add(BatchNormalization(input_shape=(28, 28, 1)))
    simple_CNN.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    simple_CNN.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    simple_CNN.add(MaxPooling2D((2, 2))) # (14,14,32)
    if drop_conv:
        simple_CNN.add(Dropout(0.2))

    simple_CNN.add(BatchNormalization())
    simple_CNN.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    simple_CNN.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    simple_CNN.add(MaxPooling2D((2, 2))) # (7,7,64)
    if drop_conv:
        simple_CNN.add(Dropout(0.2))

    if avgpool:
        simple_CNN.add(GlobalAveragePooling2D())
    else:
        simple_CNN.add(Flatten())
        simple_CNN.add(Dense(dense_dim, activation='relu'))
        simple_CNN.add(Dropout(dropout))
    simple_CNN.add(Dense(10, activation='softmax'))

    # set loss and optimizer
    rmsprop = RMSprop(lr=lr, decay=0.999)
    simple_CNN.compile(loss='sparse_categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])

    # single quick epoch on the validation split; the test split supplies val_loss/val_acc
    history = simple_CNN.fit(data['X_val'], data['y_val'].reshape(-1,1),
                   batch_size=64,
                   epochs=1,
                   validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)),
                   initial_epoch=0,
                   verbose=False)
    return history.history['val_loss'][0], history.history['val_acc'][0]

First, the random search. Each run is only 1 epoch on the 4000 validation samples. Hyperparameters are sampled from uniform distributions over different ranges (the hidden unit size is rounded to an integer), and the structural 'switches' are sampled from a binomial distribution (p=0.5).
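One caveat about the uniform draw used below: learning rates span several orders of magnitude, so a log-uniform draw (uniform over the exponent) usually covers the range far better than a linear one, where most samples land near the upper end. This is a sketch of that alternative, not what the notebook does; the helper name `sample_log_uniform` is ours:

```python
import numpy as np

def sample_log_uniform(low, high, size=None, rng=None):
    """Draw samples whose base-10 exponent is uniform over
    [log10(low), log10(high)], so every decade is sampled equally often."""
    rng = rng or np.random.default_rng(0)
    return 10.0 ** rng.uniform(np.log10(low), np.log10(high), size)

# five candidate learning rates between 1e-4 and 1e-1
lrs = sample_log_uniform(1e-4, 1e-1, size=5)
print(lrs)
```

With a linear `np.random.uniform(0.0001, 0.1)`, about 90% of draws exceed 0.01; the log-uniform version spends equal effort on each decade.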


In [7]:
# validation: 4000; 1 epoch
# finetune list: initial learning rate, dropout rate, hidden unit size
best_parameters={'lr': 0.001, 'dropout': 0.5, 'dense_dim': 1024}
lowest_err = 1000
lr_range = (0.0001, 0.1); dropout_range = (0.3, 0.8); dense_range = (512, 2048)
while True:
    lr = np.random.uniform(lr_range[0], lr_range[1])
    dropout = np.random.uniform(dropout_range[0], dropout_range[1])
    dense_dim = int(np.random.uniform(dense_range[0], dense_range[1]))
    drop_conv, avgpool = np.random.binomial(1,0.5,2)
    ktf.clear_session()
    test_err, test_acc = simpleCNN_model(lr, dropout, dense_dim, drop_conv, avgpool)
    if test_err < lowest_err:
        print('new lowest: ', round(test_err,2), round(test_acc,2), 
              (round(lr,4), round(dropout,2), dense_dim, bool(drop_conv), bool(avgpool)))
        lowest_err = test_err
        best_parameters['lr'] = lr
        best_parameters['dropout'] = dropout
        best_parameters['dense_dim'] = dense_dim
        best_parameters['drop_conv'] = bool(drop_conv)
        best_parameters['avgpool'] = bool(avgpool)


new lowest:  14.57 0.1 (0.0577, 0.57, 820, False, True)
new lowest:  2.25 0.13 (0.0076, 0.69, 1985, True, True)
new lowest:  1.42 0.69 (0.0037, 0.4, 1846, False, False)
new lowest:  1.3 0.61 (0.0111, 0.45, 2006, True, False)
new lowest:  1.23 0.73 (0.0044, 0.45, 1392, True, False)
new lowest:  1.01 0.79 (0.0121, 0.76, 1974, True, False)
KeyboardInterrupt: search interrupted manually (traceback omitted)

SimpleCNN with data augmentation

Here we try to reduce the error from the data side. In principle, a classifier learns better with more data, and data augmentation is a cheap way to enlarge the training set: it applies small random transformations to the original images and trains the model on both the original and the transformed examples.
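The generator below uses only one transform: small horizontal shifts (`width_shift_range=0.05`) with `fill_mode='constant'`. As an illustration of what that single transform does to an image, here is a minimal NumPy sketch (the helper name `shift_width` is ours, not a Keras API):

```python
import numpy as np

def shift_width(img, max_frac=0.05, rng=None):
    """Randomly shift an (H, W, C) image left or right by up to max_frac of
    its width, filling the vacated columns with zeros (fill_mode='constant')."""
    rng = rng or np.random.default_rng(0)
    h, w, c = img.shape
    max_shift = int(max_frac * w)
    shift = rng.integers(-max_shift, max_shift + 1)
    out = np.zeros_like(img)
    if shift >= 0:
        out[:, shift:, :] = img[:, :w - shift, :]   # shift right
    else:
        out[:, :w + shift, :] = img[:, -shift:, :]  # shift left
    return out

img = np.arange(28 * 28, dtype=np.float32).reshape(28, 28, 1)
aug = shift_width(img)
print(aug.shape)  # (28, 28, 1): same shape, pixels shifted
```

For 28-pixel-wide MNIST digits, `max_frac=0.05` allows shifts of at most 1 pixel, which preserves the label while still varying the input.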


In [11]:
# set the data generator to transform the data
idg = ImageDataGenerator(width_shift_range=0.05,
                         fill_mode='constant')
# build the model
simple_CNN = Sequential()
simple_CNN.add(BatchNormalization(input_shape=(28, 28, 1)))
simple_CNN.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
simple_CNN.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
simple_CNN.add(MaxPooling2D((2, 2))) # (14,14,32)
simple_CNN.add(Dropout(0.2))

simple_CNN.add(BatchNormalization())
simple_CNN.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
simple_CNN.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
simple_CNN.add(MaxPooling2D((2, 2))) # (7,7,32)
simple_CNN.add(Dropout(0.2))

simple_CNN.add(Flatten())
simple_CNN.add(Dense(1392, activation='relu'))
simple_CNN.add(Dropout(0.45))
simple_CNN.add(Dense(10, activation='softmax'))

# set loss and optimizer
rmsprop = RMSprop(lr=0.0044, decay=0.99)
simple_CNN.compile(loss='sparse_categorical_crossentropy',
                   optimizer=rmsprop,
                   metrics=['accuracy'])

# train the model on an effectively unlimited stream of augmented batches
checkpoint = ModelCheckpoint('../models/simpleCNN_aug_{epoch:02d}-{loss:.4f}.h5',
                             monitor='loss',
                             save_best_only=True)
earlystop = EarlyStopping(min_delta=0.0001, patience=3)

simple_CNN.fit_generator(idg.flow(data['X_train'], 
                                      data['y_train'].reshape(-1, 1), 
                                      batch_size=64),
                         steps_per_epoch=int(np.ceil(len(data['X_train']) / 64)),
                         initial_epoch=0,
                         epochs=100,
                         callbacks=[checkpoint, earlystop],
                         validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)))


Epoch 1/100
641/640 [==============================] - 97s - loss: 0.7405 - acc: 0.7713 - val_loss: 0.3624 - val_acc: 0.9090
Epoch 2/100
 77/640 [==>...........................] - ETA: 84s - loss: 0.5269 - acc: 0.8375
KeyboardInterrupt: training interrupted manually (traceback omitted)

In [6]:
# resume training
model = load_model('../models/simpleCNN_aug_44-0.9536.h5')

# set the loss and optimizer
rmsprop = RMSprop(lr=0.00044,decay=0.99)
model.compile(optimizer=rmsprop, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# train the model
checkpoint = ModelCheckpoint('../models/simpleCNN_aug_{epoch:02d}-{loss:.4f}.h5',
                             monitor='loss',
                             save_best_only=True)
earlystop = EarlyStopping(min_delta=0.0001, patience=5)

model.fit_generator(idg.flow(data['X_train'], 
                                  data['y_train'].reshape(-1, 1), 
                                  batch_size=64),
                         steps_per_epoch=int(np.ceil(len(data['X_train']) / 64)),
                         initial_epoch=45,
                         epochs=100,
                         callbacks=[checkpoint, earlystop],
                         validation_data=(data['X_test'], data['y_test'].reshape(-1, 1)))


Epoch 46/100
641/640 [==============================] - 87s - loss: 0.8927 - acc: 0.7113 - val_loss: 0.3978 - val_acc: 0.8970
Epoch 47/100
641/640 [==============================] - 87s - loss: 0.8764 - acc: 0.7191 - val_loss: 0.3935 - val_acc: 0.8970
Epoch 48/100
641/640 [==============================] - 86s - loss: 0.8735 - acc: 0.7180 - val_loss: 0.3911 - val_acc: 0.8980
Epoch 49/100
641/640 [==============================] - 87s - loss: 0.8671 - acc: 0.7242 - val_loss: 0.3896 - val_acc: 0.8990
Epoch 50/100
641/640 [==============================] - 87s - loss: 0.8573 - acc: 0.7242 - val_loss: 0.3891 - val_acc: 0.8990
Epoch 51/100
641/640 [==============================] - 88s - loss: 0.8628 - acc: 0.7236 - val_loss: 0.3875 - val_acc: 0.8990
Epoch 52/100
641/640 [==============================] - 86s - loss: 0.8583 - acc: 0.7245 - val_loss: 0.3864 - val_acc: 0.8990
Epoch 53/100
641/640 [==============================] - 86s - loss: 0.8540 - acc: 0.7262 - val_loss: 0.3850 - val_acc: 0.8990
Epoch 54/100
641/640 [==============================] - 86s - loss: 0.8476 - acc: 0.7293 - val_loss: 0.3849 - val_acc: 0.8990
Epoch 55/100
641/640 [==============================] - 86s - loss: 0.8514 - acc: 0.7271 - val_loss: 0.3843 - val_acc: 0.9000
Epoch 56/100
641/640 [==============================] - 86s - loss: 0.8557 - acc: 0.7239 - val_loss: 0.3830 - val_acc: 0.8990
Epoch 57/100
641/640 [==============================] - 86s - loss: 0.8517 - acc: 0.7270 - val_loss: 0.3819 - val_acc: 0.9010
Epoch 58/100
641/640 [==============================] - 86s - loss: 0.8572 - acc: 0.7258 - val_loss: 0.3829 - val_acc: 0.9000
Epoch 59/100
641/640 [==============================] - 86s - loss: 0.8484 - acc: 0.7290 - val_loss: 0.3820 - val_acc: 0.9000
Epoch 60/100
641/640 [==============================] - 86s - loss: 0.8519 - acc: 0.7287 - val_loss: 0.3809 - val_acc: 0.9010
Epoch 61/100
641/640 [==============================] - 86s - loss: 0.8458 - acc: 0.7282 - val_loss: 0.3817 - val_acc: 0.9000
Epoch 62/100
641/640 [==============================] - 86s - loss: 0.8500 - acc: 0.7260 - val_loss: 0.3807 - val_acc: 0.9000
Epoch 63/100
641/640 [==============================] - 86s - loss: 0.8471 - acc: 0.7281 - val_loss: 0.3811 - val_acc: 0.9000
Epoch 64/100
641/640 [==============================] - 86s - loss: 0.8507 - acc: 0.7265 - val_loss: 0.3789 - val_acc: 0.9010
Epoch 65/100
641/640 [==============================] - 86s - loss: 0.8446 - acc: 0.7305 - val_loss: 0.3793 - val_acc: 0.9010
Epoch 66/100
641/640 [==============================] - 86s - loss: 0.8452 - acc: 0.7296 - val_loss: 0.3787 - val_acc: 0.9010
Epoch 67/100
641/640 [==============================] - 86s - loss: 0.8485 - acc: 0.7279 - val_loss: 0.3803 - val_acc: 0.9000
Epoch 68/100
641/640 [==============================] - 86s - loss: 0.8438 - acc: 0.7299 - val_loss: 0.3792 - val_acc: 0.9010
Epoch 69/100
641/640 [==============================] - 86s - loss: 0.8456 - acc: 0.7292 - val_loss: 0.3788 - val_acc: 0.9010
Epoch 70/100
641/640 [==============================] - 86s - loss: 0.8420 - acc: 0.7297 - val_loss: 0.3804 - val_acc: 0.9000
Epoch 71/100
641/640 [==============================] - 86s - loss: 0.8432 - acc: 0.7278 - val_loss: 0.3787 - val_acc: 0.9000
Epoch 72/100
641/640 [==============================] - 86s - loss: 0.8387 - acc: 0.7296 - val_loss: 0.3779 - val_acc: 0.9010
Epoch 73/100
641/640 [==============================] - 86s - loss: 0.8428 - acc: 0.7319 - val_loss: 0.3779 - val_acc: 0.9000
Epoch 74/100
641/640 [==============================] - 86s - loss: 0.8499 - acc: 0.7270 - val_loss: 0.3771 - val_acc: 0.9010
Epoch 75/100
641/640 [==============================] - 86s - loss: 0.8471 - acc: 0.7253 - val_loss: 0.3769 - val_acc: 0.9010
Epoch 76/100
641/640 [==============================] - 86s - loss: 0.8469 - acc: 0.7287 - val_loss: 0.3773 - val_acc: 0.9010
Epoch 77/100
641/640 [==============================] - 86s - loss: 0.8439 - acc: 0.7323 - val_loss: 0.3766 - val_acc: 0.9010
Epoch 78/100
641/640 [==============================] - 86s - loss: 0.8469 - acc: 0.7297 - val_loss: 0.3769 - val_acc: 0.9010
Epoch 79/100
641/640 [==============================] - 86s - loss: 0.8492 - acc: 0.7268 - val_loss: 0.3763 - val_acc: 0.9010
Epoch 80/100
641/640 [==============================] - 86s - loss: 0.8409 - acc: 0.7325 - val_loss: 0.3759 - val_acc: 0.9010
Epoch 81/100
641/640 [==============================] - 86s - loss: 0.8450 - acc: 0.7281 - val_loss: 0.3758 - val_acc: 0.9010
Epoch 82/100
641/640 [==============================] - 86s - loss: 0.8373 - acc: 0.7328 - val_loss: 0.3763 - val_acc: 0.9010
Epoch 83/100
641/640 [==============================] - 86s - loss: 0.8471 - acc: 0.7257 - val_loss: 0.3763 - val_acc: 0.9010
Epoch 84/100
641/640 [==============================] - 86s - loss: 0.8421 - acc: 0.7301 - val_loss: 0.3762 - val_acc: 0.9010
Epoch 85/100
641/640 [==============================] - 86s - loss: 0.8443 - acc: 0.7303 - val_loss: 0.3763 - val_acc: 0.9000
Epoch 86/100
641/640 [==============================] - 86s - loss: 0.8395 - acc: 0.7303 - val_loss: 0.3751 - val_acc: 0.9010
Epoch 87/100
641/640 [==============================] - 86s - loss: 0.8403 - acc: 0.7308 - val_loss: 0.3748 - val_acc: 0.9010
Epoch 88/100
641/640 [==============================] - 86s - loss: 0.8371 - acc: 0.7327 - val_loss: 0.3756 - val_acc: 0.9010
Epoch 89/100
641/640 [==============================] - 86s - loss: 0.8460 - acc: 0.7270 - val_loss: 0.3747 - val_acc: 0.9010
Epoch 90/100
641/640 [==============================] - 86s - loss: 0.8429 - acc: 0.7305 - val_loss: 0.3750 - val_acc: 0.9010
Epoch 91/100
641/640 [==============================] - 86s - loss: 0.8312 - acc: 0.7332 - val_loss: 0.3748 - val_acc: 0.9020
Epoch 92/100
641/640 [==============================] - 86s - loss: 0.8423 - acc: 0.7294 - val_loss: 0.3750 - val_acc: 0.9010
Epoch 93/100
641/640 [==============================] - 86s - loss: 0.8404 - acc: 0.7304 - val_loss: 0.3744 - val_acc: 0.9010
Epoch 94/100
516/640 [=======================>......] - ETA: 16s - loss: 0.8351 - acc: 0.7314
KeyboardInterrupt: training interrupted manually (traceback omitted)

Create submissions

Load the saved trained models and produce predictions for submission on Kaggle.


In [4]:
from lib.data_utils import create_submission
from keras.models import load_model

# for simple CNN model
model = load_model('../models/simpleCNN_86-0.0034.h5')
print('Load model successfully.')
create_submission(model, '../data/test.csv', '../submission/submission_simpleCNN_tuned_87.csv', 128)


Load model successfully.
28000/28000 [==============================] - 13s    
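`create_submission` lives in the project's `lib/data_utils` and its internals are not shown here. Purely as an illustration of the output it presumably produces, this is a minimal sketch of the Kaggle Digit Recognizer submission format (a header row, then 1-indexed `ImageId,Label` rows); the helper name `write_submission` is ours:

```python
import csv
import numpy as np

def write_submission(pred_labels, out_path):
    """Write class predictions in the Kaggle Digit Recognizer format:
    header row, then one (ImageId, Label) row per test image, 1-indexed."""
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['ImageId', 'Label'])
        for i, label in enumerate(pred_labels, start=1):
            writer.writerow([i, int(label)])

# labels come from argmax over the 10 softmax outputs of model.predict
probs = np.eye(10)[[3, 1, 4]]          # fake probabilities for 3 images
write_submission(np.argmax(probs, axis=1), 'submission_demo.csv')
```

The batch size passed to `create_submission` (128 above) would only affect how predictions are computed, not the CSV layout.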

In [8]:
history = simpleCNN_model()


Train on 4000 samples, validate on 1000 samples
Epoch 1/1
4000/4000 [==============================] - 18s - loss: 2.2335 - acc: 0.2383 - val_loss: 2.2805 - val_acc: 0.1230

In [10]:
print(history.history)


{'loss': [2.233535957336426], 'acc': [0.23824999999999999], 'val_loss': [2.2805455265045165], 'val_acc': [0.12300000023841857]}
