Chapter 4.4 - Overfitting and underfitting


In [1]:
import numpy as np
from keras.datasets import imdb


Using TensorFlow backend.

In [2]:
# Loading the IMDB data, keeping the 10,000 most frequent words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = 10000)

In [3]:
# Vectorizing the sequences via multi-hot encoding
def vectorize_sequences(sequences, dimension = 10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set specific indices of results[i] to 1s
    return results
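
A quick sanity check (hypothetical toy indices, not part of the original run) shows what the multi-hot encoding produces:

In [ ]:
demo = vectorize_sequences([[0, 3], [1]], dimension = 5)
print(demo)
# Expected output:
# [[1. 0. 0. 1. 0.]
#  [0. 1. 0. 0. 0.]]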

In [4]:
# Vectorizing the input datasets
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

In [5]:
# Converting the labels to float32 arrays
y_train = np.asarray(train_labels)
y_train = y_train.astype('float32')

y_test = np.asarray(test_labels)
y_test = y_test.astype('float32')

Defining models


In [6]:
# Keras imports
from keras.models import Sequential
from keras.layers import Dense

In [7]:
# Defining the original model
def build_original_model():
    model = Sequential()
    model.add(Dense(units = 16, 
                    activation = 'relu', 
                    input_shape = (10000,)))
    model.add(Dense(units = 16, 
                    activation = 'relu'))
    model.add(Dense(units = 1, 
                    activation = 'sigmoid'))
    model.compile(optimizer = 'rmsprop',
                       loss = 'binary_crossentropy',
                       metrics = ['acc'])
    return model

In [8]:
# Defining a small model
def build_small_model():
    model = Sequential()
    model.add(Dense(units = 4, 
                    activation = 'relu', 
                    input_shape = (10000,)))
    model.add(Dense(units = 4, 
                    activation = 'relu'))
    model.add(Dense(units = 1, 
                    activation = 'sigmoid'))
    model.compile(optimizer = 'rmsprop',
                       loss = 'binary_crossentropy',
                       metrics = ['acc'])
    return model

In [9]:
# Defining a big model
def build_big_model():
    model = Sequential()
    model.add(Dense(units = 512, 
                    activation = 'relu', 
                    input_shape = (10000,)))
    model.add(Dense(units = 512, 
                    activation = 'relu'))
    model.add(Dense(units = 1, 
                    activation = 'sigmoid'))
    model.compile(optimizer = 'rmsprop',
                       loss = 'binary_crossentropy',
                       metrics = ['acc'])
    return model
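
The three builders differ only in layer width, which translates directly into model capacity. A quick way to quantify this (a sketch added here, not part of the original run; count_params() is a standard Keras model method) is to compare parameter counts:

In [ ]:
# Comparing the capacity of the three variants by parameter count
for name, builder in [('original', build_original_model),
                      ('small', build_small_model),
                      ('big', build_big_model)]:
    print(name, builder().count_params())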

Comparison between the original and the small network


In [10]:
# Initializing the original network
original_network = build_original_model()

In [11]:
# Training the original network
original_network_history = original_network.fit(x_train, 
                     y_train,
                     epochs = 20,
                     batch_size = 512,
                     validation_data = (x_test, y_test))


Train on 25000 samples, validate on 25000 samples
Epoch 1/20
25000/25000 [==============================] - 7s - loss: 0.4617 - acc: 0.8187 - val_loss: 0.3468 - val_acc: 0.8789
Epoch 2/20
25000/25000 [==============================] - 4s - loss: 0.2687 - acc: 0.9069 - val_loss: 0.2910 - val_acc: 0.8880
Epoch 3/20
25000/25000 [==============================] - 4s - loss: 0.2046 - acc: 0.9288 - val_loss: 0.2809 - val_acc: 0.8888
Epoch 4/20
25000/25000 [==============================] - 4s - loss: 0.1725 - acc: 0.9381 - val_loss: 0.2963 - val_acc: 0.8833
Epoch 5/20
25000/25000 [==============================] - 4s - loss: 0.1459 - acc: 0.9484 - val_loss: 0.3102 - val_acc: 0.8799
Epoch 6/20
25000/25000 [==============================] - 4s - loss: 0.1296 - acc: 0.9558 - val_loss: 0.3287 - val_acc: 0.8779
Epoch 7/20
25000/25000 [==============================] - 4s - loss: 0.1154 - acc: 0.9599 - val_loss: 0.3579 - val_acc: 0.8738
Epoch 8/20
25000/25000 [==============================] - 4s - loss: 0.1014 - acc: 0.9663 - val_loss: 0.3723 - val_acc: 0.8716
Epoch 9/20
25000/25000 [==============================] - 4s - loss: 0.0908 - acc: 0.9698 - val_loss: 0.4124 - val_acc: 0.8670
Epoch 10/20
25000/25000 [==============================] - 4s - loss: 0.0818 - acc: 0.9735 - val_loss: 0.4678 - val_acc: 0.8527
Epoch 11/20
25000/25000 [==============================] - 4s - loss: 0.0730 - acc: 0.9768 - val_loss: 0.4728 - val_acc: 0.8625
Epoch 12/20
25000/25000 [==============================] - 4s - loss: 0.0647 - acc: 0.9796 - val_loss: 0.4802 - val_acc: 0.8619
Epoch 13/20
25000/25000 [==============================] - 4s - loss: 0.0558 - acc: 0.9829 - val_loss: 0.5112 - val_acc: 0.8592
Epoch 14/20
25000/25000 [==============================] - 4s - loss: 0.0472 - acc: 0.9862 - val_loss: 0.6020 - val_acc: 0.8540
Epoch 15/20
25000/25000 [==============================] - 4s - loss: 0.0435 - acc: 0.9872 - val_loss: 0.5830 - val_acc: 0.8531
Epoch 16/20
25000/25000 [==============================] - 4s - loss: 0.0372 - acc: 0.9892 - val_loss: 0.6242 - val_acc: 0.8564
Epoch 17/20
25000/25000 [==============================] - 4s - loss: 0.0305 - acc: 0.9919 - val_loss: 0.6532 - val_acc: 0.8554
Epoch 18/20
25000/25000 [==============================] - 4s - loss: 0.0288 - acc: 0.9916 - val_loss: 0.6988 - val_acc: 0.8531
Epoch 19/20
25000/25000 [==============================] - 4s - loss: 0.0218 - acc: 0.9949 - val_loss: 0.7236 - val_acc: 0.8527
Epoch 20/20
25000/25000 [==============================] - 4s - loss: 0.0206 - acc: 0.9955 - val_loss: 0.7522 - val_acc: 0.8502

In [12]:
small_network = build_small_model()

In [13]:
small_network_history = small_network.fit(x_train,  
                                          y_train,
                                          epochs = 20,
                                          batch_size = 512,
                                          validation_data = (x_test, y_test))


Train on 25000 samples, validate on 25000 samples
Epoch 1/20
25000/25000 [==============================] - 4s - loss: 0.5178 - acc: 0.7892 - val_loss: 0.4157 - val_acc: 0.8693
Epoch 2/20
25000/25000 [==============================] - 4s - loss: 0.3292 - acc: 0.8989 - val_loss: 0.3240 - val_acc: 0.8853
Epoch 3/20
25000/25000 [==============================] - 4s - loss: 0.2468 - acc: 0.9206 - val_loss: 0.2929 - val_acc: 0.8879
Epoch 4/20
25000/25000 [==============================] - 4s - loss: 0.2030 - acc: 0.9337 - val_loss: 0.2824 - val_acc: 0.8887
Epoch 5/20
25000/25000 [==============================] - 4s - loss: 0.1766 - acc: 0.9416 - val_loss: 0.2887 - val_acc: 0.8856
Epoch 6/20
25000/25000 [==============================] - 4s - loss: 0.1563 - acc: 0.9482 - val_loss: 0.2910 - val_acc: 0.8839
Epoch 7/20
25000/25000 [==============================] - 4s - loss: 0.1418 - acc: 0.9534 - val_loss: 0.3010 - val_acc: 0.8819
Epoch 8/20
25000/25000 [==============================] - 5s - loss: 0.1286 - acc: 0.9572 - val_loss: 0.3127 - val_acc: 0.8783
Epoch 9/20
25000/25000 [==============================] - 4s - loss: 0.1182 - acc: 0.9622 - val_loss: 0.3303 - val_acc: 0.8760
Epoch 10/20
25000/25000 [==============================] - 4s - loss: 0.1083 - acc: 0.9664 - val_loss: 0.3400 - val_acc: 0.8742
Epoch 11/20
25000/25000 [==============================] - 4s - loss: 0.0989 - acc: 0.9696 - val_loss: 0.3629 - val_acc: 0.8708
Epoch 12/20
25000/25000 [==============================] - 4s - loss: 0.0916 - acc: 0.9717 - val_loss: 0.3753 - val_acc: 0.8690
Epoch 13/20
25000/25000 [==============================] - 4s - loss: 0.0844 - acc: 0.9748 - val_loss: 0.3927 - val_acc: 0.8685
Epoch 14/20
25000/25000 [==============================] - 4s - loss: 0.0780 - acc: 0.9764 - val_loss: 0.4081 - val_acc: 0.8655
Epoch 15/20
25000/25000 [==============================] - 4s - loss: 0.0719 - acc: 0.9795 - val_loss: 0.4290 - val_acc: 0.8647
Epoch 16/20
25000/25000 [==============================] - 4s - loss: 0.0663 - acc: 0.9813 - val_loss: 0.4460 - val_acc: 0.8628
Epoch 17/20
25000/25000 [==============================] - 4s - loss: 0.0609 - acc: 0.9827 - val_loss: 0.4744 - val_acc: 0.8602
Epoch 18/20
25000/25000 [==============================] - 4s - loss: 0.0563 - acc: 0.9846 - val_loss: 0.5025 - val_acc: 0.8583
Epoch 19/20
25000/25000 [==============================] - 4s - loss: 0.0529 - acc: 0.9868 - val_loss: 0.5121 - val_acc: 0.8584
Epoch 20/20
25000/25000 [==============================] - 4s - loss: 0.0476 - acc: 0.9878 - val_loss: 0.5338 - val_acc: 0.8572

In [14]:
epochs = range(1, 21)
original_val_loss = original_network_history.history['val_loss']
small_model_val_loss = small_network_history.history['val_loss']

In [15]:
import matplotlib.pyplot as plt
plt.figure(figsize = (10, 6))

# Plotting the validation loss of the original network
# b+ is for "blue cross"
plt.plot(epochs, 
         original_val_loss, 
         'b+', 
         label='Original model')
# Plotting the validation loss of the small network
# "bo" is for "blue dot"
plt.plot(epochs, 
         small_model_val_loss, 'bo', label='Small model')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.legend()

plt.show()


Conclusions

The smaller network starts overfitting later than the original one, and once it does, its validation loss degrades more slowly. A quick check (added below) of where each validation-loss curve bottoms out makes this concrete:
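
In [ ]:
# Epoch (1-indexed) at which each model's validation loss bottoms out;
# np.argmin is 0-indexed, hence the + 1
print('Original model:', int(np.argmin(original_val_loss)) + 1)
print('Small model:', int(np.argmin(small_model_val_loss)) + 1)

In the run above the minima land at epochs 3 and 4, respectively. Now let's investigate the opposite scenario: a much bigger network.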

Comparison between the original and the big network


In [16]:
# Initializing the big network
big_network = build_big_model()

In [17]:
big_network_history = big_network.fit(x_train,  
                                      y_train,
                                      epochs = 20,
                                      batch_size = 512,
                                      validation_data = (x_test, y_test))


Train on 25000 samples, validate on 25000 samples
Epoch 1/20
25000/25000 [==============================] - 5s - loss: 0.4550 - acc: 0.7944 - val_loss: 0.3653 - val_acc: 0.8388
Epoch 2/20
25000/25000 [==============================] - 5s - loss: 0.2174 - acc: 0.9133 - val_loss: 0.3120 - val_acc: 0.8735
Epoch 3/20
25000/25000 [==============================] - 7s - loss: 0.1303 - acc: 0.9528 - val_loss: 0.3331 - val_acc: 0.8856
Epoch 4/20
25000/25000 [==============================] - 4s - loss: 0.0612 - acc: 0.9798 - val_loss: 0.3894 - val_acc: 0.8831
Epoch 5/20
25000/25000 [==============================] - 4s - loss: 0.0582 - acc: 0.9880 - val_loss: 0.5048 - val_acc: 0.8826
Epoch 6/20
25000/25000 [==============================] - 4s - loss: 0.0668 - acc: 0.9885 - val_loss: 0.4805 - val_acc: 0.8807
Epoch 7/20
25000/25000 [==============================] - 4s - loss: 0.0014 - acc: 1.0000 - val_loss: 0.6924 - val_acc: 0.8812
Epoch 8/20
25000/25000 [==============================] - 5s - loss: 0.0804 - acc: 0.9895 - val_loss: 0.6798 - val_acc: 0.8670
Epoch 9/20
25000/25000 [==============================] - 4s - loss: 4.0498e-04 - acc: 1.0000 - val_loss: 0.7492 - val_acc: 0.8804
Epoch 10/20
25000/25000 [==============================] - 4s - loss: 0.0621 - acc: 0.9912 - val_loss: 0.6472 - val_acc: 0.8805
Epoch 11/20
25000/25000 [==============================] - 4s - loss: 0.0317 - acc: 0.9941 - val_loss: 0.6362 - val_acc: 0.8742
Epoch 12/20
25000/25000 [==============================] - 5s - loss: 5.1698e-04 - acc: 1.0000 - val_loss: 0.7481 - val_acc: 0.8781
Epoch 13/20
25000/25000 [==============================] - 5s - loss: 0.0591 - acc: 0.9929 - val_loss: 0.7631 - val_acc: 0.8794
Epoch 14/20
25000/25000 [==============================] - 5s - loss: 0.0392 - acc: 0.9941 - val_loss: 0.6872 - val_acc: 0.8776
Epoch 15/20
25000/25000 [==============================] - 5s - loss: 5.7577e-04 - acc: 0.9999 - val_loss: 0.7824 - val_acc: 0.8778
Epoch 16/20
25000/25000 [==============================] - 5s - loss: 0.0400 - acc: 0.9948 - val_loss: 0.8146 - val_acc: 0.8710
Epoch 17/20
25000/25000 [==============================] - 5s - loss: 0.0370 - acc: 0.9952 - val_loss: 0.7615 - val_acc: 0.8741
Epoch 18/20
25000/25000 [==============================] - 5s - loss: 0.0041 - acc: 0.9993 - val_loss: 2.1480 - val_acc: 0.7342
Epoch 19/20
25000/25000 [==============================] - 5s - loss: 0.0233 - acc: 0.9962 - val_loss: 0.7797 - val_acc: 0.8762
Epoch 20/20
25000/25000 [==============================] - 5s - loss: 0.0409 - acc: 0.9945 - val_loss: 0.7715 - val_acc: 0.8748

In [18]:
# Comparison of validation losses
big_model_val_loss = big_network_history.history['val_loss']

plt.figure(figsize = (10, 6))
plt.plot(epochs, 
         original_val_loss, 
         'b+', 
         label = 'Original model')
plt.plot(epochs, 
         big_model_val_loss, 
         'bo', 
         label = 'Big model')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.legend()

plt.show()



In [19]:
# Comparison of training losses
original_train_loss = original_network_history.history['loss']
bigger_model_train_loss = big_network_history.history['loss']

plt.figure(figsize = (10, 6))
plt.plot(epochs, 
         original_train_loss, 
         'b+', 
         label = 'Original model')
plt.plot(epochs, 
         bigger_model_train_loss, 
         'bo', 
         label = 'Big model')
plt.xlabel('Epochs')
plt.ylabel('Training loss')
plt.legend()

plt.show()


Conclusions

The bigger network fits the training data almost immediately (its training loss quickly approaches zero), but it starts overfitting right away, and its validation loss is both worse and much noisier than the original model's.
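
Both experiments suggest stopping training once the validation loss stops improving, rather than running a fixed 20 epochs. A minimal sketch using Keras's EarlyStopping callback (the patience value here is an arbitrary illustrative choice):

In [ ]:
from keras.callbacks import EarlyStopping

# Stop training once val_loss has failed to improve for 2 consecutive epochs
model = build_original_model()
model.fit(x_train,
          y_train,
          epochs = 20,
          batch_size = 512,
          validation_data = (x_test, y_test),
          callbacks = [EarlyStopping(monitor = 'val_loss', patience = 2)])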

Weight regularization


In [20]:
from keras.regularizers import l2

In [21]:
# Building the original network with weight regularization
def build_original_model_with_l2():
    model = Sequential()
    model.add(Dense(units = 16, 
                    activation = 'relu', 
                    # l = regularization factor: each weight w
                    # adds l * w**2 to the total loss
                    kernel_regularizer = l2(l = 0.001),
                    input_shape = (10000,)))
    model.add(Dense(units = 16, 
                    activation = 'relu', 
                    kernel_regularizer = l2(l = 0.001)))
    model.add(Dense(units = 1, 
                    activation = 'sigmoid'))
    model.compile(optimizer = 'rmsprop',
                       loss = 'binary_crossentropy',
                       metrics = ['acc'])
    return model
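
This penalty is added only at training time, so the regularized network's loss is higher during training than at test time. As a sanity check, the penalty can be recomputed by hand from the layer weights (a sketch added here, not part of the original run; get_weights() returns [kernel, bias] for a Dense layer):

In [ ]:
# Recomputing the total L2 penalty of the two regularized layers
m = build_original_model_with_l2()
penalty = sum(0.001 * np.sum(layer.get_weights()[0] ** 2)
              for layer in m.layers[:2])
print('L2 penalty added to the loss:', penalty)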

In [22]:
original_network_with_l2 = build_original_model_with_l2()

In [23]:
original_network_with_l2_history = original_network_with_l2.fit(x_train,
                                                                y_train,
                                                                epochs = 20,
                                                                batch_size = 512,
                                                                validation_data = (x_test, y_test))


Train on 25000 samples, validate on 25000 samples
Epoch 1/20
25000/25000 [==============================] - 5s - loss: 0.4807 - acc: 0.8271 - val_loss: 0.3787 - val_acc: 0.8814
Epoch 2/20
25000/25000 [==============================] - 4s - loss: 0.3150 - acc: 0.9066 - val_loss: 0.3350 - val_acc: 0.8899
Epoch 3/20
25000/25000 [==============================] - 4s - loss: 0.2741 - acc: 0.9187 - val_loss: 0.3310 - val_acc: 0.8899
Epoch 4/20
25000/25000 [==============================] - 4s - loss: 0.2519 - acc: 0.9272 - val_loss: 0.3435 - val_acc: 0.8828
Epoch 5/20
25000/25000 [==============================] - 4s - loss: 0.2409 - acc: 0.9329 - val_loss: 0.3478 - val_acc: 0.8818
Epoch 6/20
25000/25000 [==============================] - 4s - loss: 0.2327 - acc: 0.9367 - val_loss: 0.3572 - val_acc: 0.8781
Epoch 7/20
25000/25000 [==============================] - 4s - loss: 0.2260 - acc: 0.9387 - val_loss: 0.3643 - val_acc: 0.8788
Epoch 8/20
25000/25000 [==============================] - 4s - loss: 0.2224 - acc: 0.9382 - val_loss: 0.3640 - val_acc: 0.8785
Epoch 9/20
25000/25000 [==============================] - 4s - loss: 0.2162 - acc: 0.9432 - val_loss: 0.3713 - val_acc: 0.8761
Epoch 10/20
25000/25000 [==============================] - 5s - loss: 0.2137 - acc: 0.9445 - val_loss: 0.4122 - val_acc: 0.8650
Epoch 11/20
25000/25000 [==============================] - 7s - loss: 0.2103 - acc: 0.9444 - val_loss: 0.3921 - val_acc: 0.8720
Epoch 12/20
25000/25000 [==============================] - 5s - loss: 0.2068 - acc: 0.9443 - val_loss: 0.3884 - val_acc: 0.8738
Epoch 13/20
25000/25000 [==============================] - 4s - loss: 0.2069 - acc: 0.9452 - val_loss: 0.4029 - val_acc: 0.8725
Epoch 14/20
25000/25000 [==============================] - 4s - loss: 0.2016 - acc: 0.9491 - val_loss: 0.4037 - val_acc: 0.8703
Epoch 15/20
25000/25000 [==============================] - 4s - loss: 0.1959 - acc: 0.9505 - val_loss: 0.4918 - val_acc: 0.8524
Epoch 16/20
25000/25000 [==============================] - 4s - loss: 0.2014 - acc: 0.9463 - val_loss: 0.4144 - val_acc: 0.8718
Epoch 17/20
25000/25000 [==============================] - 4s - loss: 0.1958 - acc: 0.9494 - val_loss: 0.4198 - val_acc: 0.8697
Epoch 18/20
25000/25000 [==============================] - 4s - loss: 0.1934 - acc: 0.9520 - val_loss: 0.4378 - val_acc: 0.8670
Epoch 19/20
25000/25000 [==============================] - 4s - loss: 0.1924 - acc: 0.9498 - val_loss: 0.4174 - val_acc: 0.8695
Epoch 20/20
25000/25000 [==============================] - 4s - loss: 0.1909 - acc: 0.9514 - val_loss: 0.4286 - val_acc: 0.8706

In [24]:
original_network_with_l2_val_loss = original_network_with_l2_history.history['val_loss']

plt.figure(figsize = (10, 6))
plt.plot(epochs, 
         original_val_loss, 
         'b+', 
         label = 'Original model')
plt.plot(epochs, 
         original_network_with_l2_val_loss, 
         'bo', 
         label = 'L2-regularized model')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.legend()

plt.show()


Dropout


In [25]:
from keras.layers import Dropout

In [26]:
# Defining the original model with dropout
def build_original_model_with_dropout():
    model = Sequential()
    model.add(Dense(units = 16, 
                    activation = 'relu', 
                    input_shape = (10000,)))
    # Parameter rate -> fraction of the input units to drop
    model.add(Dropout(rate = 0.5))
    model.add(Dense(units = 16, 
                    activation = 'relu'))
    model.add(Dropout(rate = 0.5))
    model.add(Dense(units = 1, 
                    activation = 'sigmoid'))
    model.compile(optimizer = 'rmsprop',
                       loss = 'binary_crossentropy',
                       metrics = ['acc'])
    return model
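
Mechanically, Dropout(rate = 0.5) zeroes out a random half of the layer's outputs at each training step and rescales the survivors so the expected activation sum is unchanged ("inverted" dropout); at test time it does nothing. A NumPy sketch of the idea on a hypothetical batch of activations:

In [ ]:
# Inverted dropout, illustrated outside Keras
layer_output = np.random.rand(4, 16)                          # pretend activations
mask = np.random.binomial(1, 0.5, size = layer_output.shape)  # keep with prob 0.5
layer_output = layer_output * mask / 0.5                      # drop and rescale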

In [27]:
original_model_with_dropout = build_original_model_with_dropout()

In [28]:
original_model_with_dropout_history = original_model_with_dropout.fit(x_train,
                                                                      y_train,
                                                                      epochs = 20,
                                                                      batch_size = 512,
                                                                      validation_data = (x_test, y_test))


Train on 25000 samples, validate on 25000 samples
Epoch 1/20
25000/25000 [==============================] - 4s - loss: 0.5982 - acc: 0.6751 - val_loss: 0.4552 - val_acc: 0.8656
Epoch 2/20
25000/25000 [==============================] - 4s - loss: 0.4562 - acc: 0.8033 - val_loss: 0.3582 - val_acc: 0.8738
Epoch 3/20
25000/25000 [==============================] - 4s - loss: 0.3696 - acc: 0.8579 - val_loss: 0.2955 - val_acc: 0.8886
Epoch 4/20
25000/25000 [==============================] - 4s - loss: 0.3131 - acc: 0.8887 - val_loss: 0.2770 - val_acc: 0.8904
Epoch 5/20
25000/25000 [==============================] - 4s - loss: 0.2728 - acc: 0.9066 - val_loss: 0.2768 - val_acc: 0.8889
Epoch 6/20
25000/25000 [==============================] - 4s - loss: 0.2429 - acc: 0.9198 - val_loss: 0.2849 - val_acc: 0.8878
Epoch 7/20
25000/25000 [==============================] - 4s - loss: 0.2149 - acc: 0.9298 - val_loss: 0.2973 - val_acc: 0.8862
Epoch 8/20
25000/25000 [==============================] - 4s - loss: 0.1973 - acc: 0.9370 - val_loss: 0.3113 - val_acc: 0.8867
Epoch 9/20
25000/25000 [==============================] - 4s - loss: 0.1805 - acc: 0.9422 - val_loss: 0.3369 - val_acc: 0.8848
Epoch 10/20
25000/25000 [==============================] - 4s - loss: 0.1668 - acc: 0.9462 - val_loss: 0.3536 - val_acc: 0.8821
Epoch 11/20
25000/25000 [==============================] - 4s - loss: 0.1588 - acc: 0.9489 - val_loss: 0.3793 - val_acc: 0.8818
Epoch 12/20
25000/25000 [==============================] - 4s - loss: 0.1485 - acc: 0.9517 - val_loss: 0.4112 - val_acc: 0.8807
Epoch 13/20
25000/25000 [==============================] - 4s - loss: 0.1394 - acc: 0.9554 - val_loss: 0.4100 - val_acc: 0.8784
Epoch 14/20
25000/25000 [==============================] - 4s - loss: 0.1338 - acc: 0.9585 - val_loss: 0.4579 - val_acc: 0.8780
Epoch 15/20
25000/25000 [==============================] - 4s - loss: 0.1272 - acc: 0.9586 - val_loss: 0.4584 - val_acc: 0.8786
Epoch 16/20
25000/25000 [==============================] - 4s - loss: 0.1219 - acc: 0.9593 - val_loss: 0.4745 - val_acc: 0.8780
Epoch 17/20
25000/25000 [==============================] - 4s - loss: 0.1207 - acc: 0.9615 - val_loss: 0.4875 - val_acc: 0.8756
Epoch 18/20
25000/25000 [==============================] - 4s - loss: 0.1205 - acc: 0.9618 - val_loss: 0.5011 - val_acc: 0.8754
Epoch 19/20
25000/25000 [==============================] - 4s - loss: 0.1125 - acc: 0.9633 - val_loss: 0.5368 - val_acc: 0.8740
Epoch 20/20
25000/25000 [==============================] - 4s - loss: 0.1161 - acc: 0.9604 - val_loss: 0.5335 - val_acc: 0.8730

In [29]:
original_model_with_dropout_val_loss = original_model_with_dropout_history.history['val_loss']

plt.figure(figsize = (10, 6))
plt.plot(epochs, original_val_loss, 'b+', label='Original model')
plt.plot(epochs, original_model_with_dropout_val_loss, 'bo', label='Dropout-regularized model')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.legend()

plt.show()


Conclusions

The model with dropout starts overfitting later, and its validation loss grows more slowly than the original model's.