In [1]:
%%HTML
<style>
.container { width:100% }
</style>


Building a Neural Network with Keras


In [2]:
import gzip
import pickle
import random
import numpy as np
import keras

from keras.layers import Dense


Using TensorFlow backend.

The following magic command is necessary to prevent the Python kernel from dying because of linkage problems.


In [3]:
%env KMP_DUPLICATE_LIB_OK=TRUE


env: KMP_DUPLICATE_LIB_OK=TRUE

The function $\texttt{vectorized_result}(d)$ takes a digit $d \in \{0,\cdots,9\}$ and returns a NumPy vector $\mathbf{x}$ of shape $(10,)$ such that $$ \mathbf{x}[i] = \left\{ \begin{array}{ll} 1 & \mbox{if $i = d$;} \\ 0 & \mbox{otherwise.} \end{array} \right. $$ This function is used to convert a digit $d$ into the expected output of a neural network that has one output unit for every digit.


In [4]:
def vectorized_result(d):
    e    = np.zeros((10, ), dtype=np.float32)
    e[d] = 1.0
    return e
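
For example, $\texttt{vectorized_result}(3)$ returns the vector $(0, 0, 0, 1, 0, 0, 0, 0, 0, 0)$.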

The function $\texttt{load_data}()$ returns a quadruple of the form $$ (\texttt{X_train}, \texttt{X_test}, \texttt{Y_train}, \texttt{Y_test}) $$ where

  • $\texttt{X_train}$ is a `numpy.ndarray` containing 50,000 rows; each row is a 784-dimensional vector holding one input image of the training set.
  • $\texttt{Y_train}$ is a `numpy.ndarray` containing the corresponding 50,000 one-hot encoded digits; each row is a 10-dimensional vector.
  • $\texttt{X_test}$ and $\texttt{Y_test}$ are the analogous arrays for the 10,000 images of the test set.

The file $\texttt{mnist.pkl.gz}$ additionally contains a validation set of 10,000 images, which is not used in this notebook.

In [5]:
def load_data():
    # The file mnist.pkl.gz contains a training set of 50,000 images, a
    # validation set of 10,000 images, and a test set of 10,000 images.
    # The validation set is read, but not used in this notebook.
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        train, validate, test = pickle.load(f, encoding="latin1")
    # Flatten every 28x28 image into a vector of 784 pixels and one-hot
    # encode the corresponding digits.
    X_train = np.array([np.reshape(x, (784, )) for x in train[0]])
    X_test  = np.array([np.reshape(x, (784, )) for x in test [0]])
    Y_train = np.array([vectorized_result(y) for y in train[1]])
    Y_test  = np.array([vectorized_result(y) for y in test [1]])
    return (X_train, X_test, Y_train, Y_test)

In [6]:
X_train, X_test, Y_train, Y_test = load_data()

Let us see what we have read:


In [7]:
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape


Out[7]:
((50000, 784), (10000, 784), (50000, 10), (10000, 10))

Below, we create a neural network with two hidden layers.

  • The first hidden layer has 60 nodes and uses the ReLU function as activation function.
  • The second hidden layer has 30 nodes and also uses the ReLU function.
  • The output layer uses the softmax function as activation function. For $N$ output nodes, this function is defined as follows: $$ \sigma(\mathbf{z})_i := \frac{e^{z_i}}{\sum\limits_{d=0}^{N-1} e^{z_d}} $$ Here, $z_i$ is the sum of the inputs of the $i$-th output neuron. This function guarantees that the outputs of the 10 output nodes can be interpreted as probabilities, since their sum is equal to $1$.
  • The loss function used is the categorical cross-entropy.
    If the output layer produces the vector $\mathbf{a}$ while the one-hot encoded target is $\mathbf{y}$, the cross-entropy cost is defined as $$ C(\mathbf{a}, \mathbf{y}) := - \sum\limits_{i=0}^{9} y_i \cdot \ln(a_i). $$ A small numerical sketch of both the softmax function and this cost follows after this list.
  • The cost function is minimized using stochastic gradient descent with a learning rate of $0.3$.
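
The following cell is a small NumPy sketch of the softmax function and the categorical cross-entropy. It is only meant for illustration; the model below relies on the implementations built into Keras.


In [ ]:
def softmax(z):
    # Subtracting the maximum keeps the exponentials numerically stable.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(a, y):
    # Categorical cross-entropy of the predicted distribution a with
    # respect to the one-hot encoded target y.
    return -np.sum(y * np.log(a))

z = np.array([1.0, 2.0, 3.0])
a = softmax(z)
print(a, a.sum())                              # the probabilities sum to 1.0
print(cross_entropy(a, vectorized_result(2)))  # cost if the correct digit is 2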

In [8]:
model = keras.models.Sequential()
model.add(keras.layers.Dense( 60, activation='relu', input_dim=784))
model.add(keras.layers.Dense( 30, activation='relu'               ))
model.add(keras.layers.Dense( 10, activation='softmax'            ))
model.compile(loss       = 'categorical_crossentropy', 
              optimizer  = keras.optimizers.SGD(lr=0.3), 
              metrics    = ['accuracy'])
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 60)                47100     
_________________________________________________________________
dense_2 (Dense)              (None, 30)                1830      
_________________________________________________________________
dense_3 (Dense)              (None, 10)                310       
=================================================================
Total params: 49,240
Trainable params: 49,240
Non-trainable params: 0
_________________________________________________________________
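
The parameter counts are easy to verify: the first hidden layer has $784 \cdot 60 + 60 = 47{,}100$ weights and biases, the second $60 \cdot 30 + 30 = 1{,}830$, and the output layer $30 \cdot 10 + 10 = 310$, for a total of $49{,}240$ trainable parameters.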

In [9]:
%%time
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=30, batch_size=100, verbose=1)


Train on 50000 samples, validate on 10000 samples
Epoch 1/30
50000/50000 [==============================] - 3s 58us/step - loss: 0.3712 - acc: 0.8862 - val_loss: 0.1769 - val_acc: 0.9467
Epoch 2/30
50000/50000 [==============================] - 2s 42us/step - loss: 0.1571 - acc: 0.9515 - val_loss: 0.1327 - val_acc: 0.9619
Epoch 3/30
50000/50000 [==============================] - 2s 43us/step - loss: 0.1172 - acc: 0.9644 - val_loss: 0.1298 - val_acc: 0.9603
Epoch 4/30
50000/50000 [==============================] - 2s 40us/step - loss: 0.0929 - acc: 0.9716 - val_loss: 0.1145 - val_acc: 0.9657
Epoch 5/30
50000/50000 [==============================] - 2s 41us/step - loss: 0.0796 - acc: 0.9744 - val_loss: 0.1024 - val_acc: 0.9680
Epoch 6/30
50000/50000 [==============================] - 2s 41us/step - loss: 0.0678 - acc: 0.9793 - val_loss: 0.1031 - val_acc: 0.9678
Epoch 7/30
50000/50000 [==============================] - 2s 40us/step - loss: 0.0595 - acc: 0.9810 - val_loss: 0.0978 - val_acc: 0.9719
Epoch 8/30
50000/50000 [==============================] - 2s 40us/step - loss: 0.0513 - acc: 0.9840 - val_loss: 0.1035 - val_acc: 0.9684
Epoch 9/30
50000/50000 [==============================] - 2s 40us/step - loss: 0.0449 - acc: 0.9857 - val_loss: 0.1008 - val_acc: 0.9699
Epoch 10/30
50000/50000 [==============================] - 2s 39us/step - loss: 0.0405 - acc: 0.9871 - val_loss: 0.1019 - val_acc: 0.9734
Epoch 11/30
50000/50000 [==============================] - 2s 41us/step - loss: 0.0349 - acc: 0.9887 - val_loss: 0.1073 - val_acc: 0.9721
Epoch 12/30
50000/50000 [==============================] - 2s 42us/step - loss: 0.0308 - acc: 0.9903 - val_loss: 0.1002 - val_acc: 0.9738
Epoch 13/30
50000/50000 [==============================] - 2s 44us/step - loss: 0.0272 - acc: 0.9914 - val_loss: 0.1094 - val_acc: 0.9715
Epoch 14/30
50000/50000 [==============================] - 2s 40us/step - loss: 0.0208 - acc: 0.9935 - val_loss: 0.0992 - val_acc: 0.9758
Epoch 15/30
50000/50000 [==============================] - 2s 38us/step - loss: 0.0191 - acc: 0.9941 - val_loss: 0.1051 - val_acc: 0.9733
Epoch 16/30
50000/50000 [==============================] - 2s 39us/step - loss: 0.0177 - acc: 0.9944 - val_loss: 0.1061 - val_acc: 0.9734
Epoch 17/30
50000/50000 [==============================] - 2s 40us/step - loss: 0.0292 - acc: 0.9918 - val_loss: 0.1027 - val_acc: 0.9739
Epoch 18/30
50000/50000 [==============================] - 2s 44us/step - loss: 0.0185 - acc: 0.9936 - val_loss: 0.1075 - val_acc: 0.9759
Epoch 19/30
50000/50000 [==============================] - 2s 39us/step - loss: 0.0161 - acc: 0.9950 - val_loss: 0.1143 - val_acc: 0.9743
Epoch 20/30
50000/50000 [==============================] - 2s 41us/step - loss: 0.0141 - acc: 0.9956 - val_loss: 0.1119 - val_acc: 0.9743
Epoch 21/30
50000/50000 [==============================] - 2s 41us/step - loss: 0.0099 - acc: 0.9970 - val_loss: 0.1114 - val_acc: 0.9762
Epoch 22/30
50000/50000 [==============================] - 2s 42us/step - loss: 0.0078 - acc: 0.9978 - val_loss: 0.1211 - val_acc: 0.9736
Epoch 23/30
50000/50000 [==============================] - 2s 42us/step - loss: 0.0042 - acc: 0.9990 - val_loss: 0.1117 - val_acc: 0.9773
Epoch 24/30
50000/50000 [==============================] - 2s 40us/step - loss: 0.0042 - acc: 0.9991 - val_loss: 0.1102 - val_acc: 0.9762
Epoch 25/30
50000/50000 [==============================] - 2s 39us/step - loss: 0.0023 - acc: 0.9998 - val_loss: 0.1119 - val_acc: 0.9767
Epoch 26/30
50000/50000 [==============================] - 2s 42us/step - loss: 0.0013 - acc: 1.0000 - val_loss: 0.1116 - val_acc: 0.9772
Epoch 27/30
50000/50000 [==============================] - 2s 41us/step - loss: 0.0010 - acc: 1.0000 - val_loss: 0.1120 - val_acc: 0.9770
Epoch 28/30
50000/50000 [==============================] - 2s 41us/step - loss: 8.1927e-04 - acc: 1.0000 - val_loss: 0.1123 - val_acc: 0.9774
Epoch 29/30
50000/50000 [==============================] - 2s 40us/step - loss: 7.1307e-04 - acc: 1.0000 - val_loss: 0.1132 - val_acc: 0.9780
Epoch 30/30
50000/50000 [==============================] - 2s 39us/step - loss: 6.3803e-04 - acc: 1.0000 - val_loss: 0.1142 - val_acc: 0.9782
CPU times: user 1min 32s, sys: 6.81 s, total: 1min 39s
Wall time: 1min 2s
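
The training accuracy reaches $100\%$, while the validation accuracy settles near $97.8\%$, so the network overfits the training data. The object returned by $\texttt{fit}$ records the loss and accuracy of every epoch in its $\texttt{history}$ attribute. The following sketch (assuming $\texttt{matplotlib}$ is installed) plots the two loss curves:


In [ ]:
import matplotlib.pyplot as plt

epochs = range(1, len(history.history['loss']) + 1)
plt.plot(epochs, history.history['loss'],     label='training loss')
plt.plot(epochs, history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()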

In [ ]: