Introduction to Convolutional Neural Nets

========

Version 0.1

By B Nord 2018 Nov 09

This notebook was developed within the Google Colab framework. The original notebook can be run in a web browser; it has been recreated below, though we recommend you run the web-based version.

Install packages on the back end


In [0]:
# install software on the backend, which is located at 
# Google's Super Secret Sky Server in an alternate universe.
# The backend is called a 'hosted runtime' if it is on their server.
# A local runtime would start a colab notebook on your machine locally. 
# Think of google colab as a Google Docs version of Jupyter Notebooks

# remove display of install details
%%capture --no-display 

# pip install
!pip install numpy matplotlib scipy pandas scikit-learn astropy seaborn ipython jupyter #standard install for DSFP
!pip install keras tensorflow  # required for deep learning 
!pip install pycm

In [2]:
# standard-ish imports
import numpy as np
import matplotlib.pyplot as plt
import time
import itertools

# non-standard, but stable package for confusion matrices
from pycm import ConfusionMatrix


# neural network / machine learning packages
from sklearn import metrics
import keras
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Flatten, Activation, Reshape
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, UpSampling2D
from keras import backend as K


Using TensorFlow backend.

Convolutional Neural Networks make the future now!

Learning Objectives

  1. Gain familiarity with
    1. Two standard convolutional neural network (CNN) architectures:
      1. Feed-forward CNN
      2. Convolutional Autoencoder (CAE)
    2. One standard task performed with CNNs: Binary Classification
    3. One new diagnostic of CNNs: Feature maps from the first layer
  2. Experience fundamental considerations, pitfalls, and strategies when training NNs
    1. Data set preparation (never underestimate the time required for this)
    2. CNN layer manipulation and architecture design
    3. Model fitting (the learning process)
    4. Effects of image quality
  3. Apply diagnostics from previous exercises
  4. Apply new diagnostics: look inside the networks with feature maps of the first layer
  5. Continue connecting NN functionality to data set structure and problem of interest

Some of this notebook is very similar to the first one, but we're using a new architecture that has more moving pieces.

I'm still taking bets that we can start a paper with deep nets during the Saturday hack.

Activity 1: Classify Handwritten Digits with Convolutional Neural Networks (CNNs)

Is it a "zero" [0] or a "one" [1]? (ooooh, the suspense; or maybe the suspense has dissipated by now.)

Prepare the Data

Download the data

(ooh look it's all stored on Amazon's AWS!) (pssst, we're in the cloooud)


In [0]:
# import MNIST data
(x_train_temp, y_train_temp), (x_test_temp, y_test_temp) = mnist.load_data()

Look at the data

(always do this so that you know what the structure is.)


In [0]:
# Print the shapes
print("Train Data Shape:", x_train_temp.shape)
print("Test Data Shape:", x_test_temp.shape)
print("Train Label Shape:", y_train_temp.shape)
print("Test Label Shape:", y_test_temp.shape)

Do the shapes of the data and labels (for train and test, respectively) match? If they don't, Keras/TF will kindly yell at you later.


In [0]:
# Print an example
print("Example:")
print("y_train[0] is the label for the 0th image, and it is a", y_train_temp[0])
print("x_train[0] is the image data, and you kind of see the pattern in the array of numbers")
print(x_train_temp[0])

Can you see the pattern of the number in the array?


In [0]:
# Plot the data! 
f = plt.figure()
f.add_subplot(1,2, 1)
plt.imshow(x_train_temp[0])
f.add_subplot(1,2, 2)
plt.imshow(x_train_temp[1])
plt.show(block=True)

Prepare the data

Data often need to be re-shaped and normalized for ingestion into the neural network.

Normalize the data

The images are recast as floats and normalized to the range [0, 1] before entering the network.


In [0]:
print("Before:", np.min(x_train_temp), np.max(x_train_temp))
x_train = x_train_temp.astype('float32')
x_test = x_test_temp.astype('float32')
x_train /= 255
x_test /= 255
y_train = y_train_temp
y_test = y_test_temp
print("After:", np.min(x_train), np.max(x_train))

Reshape the data arrays: set the input shape to be ready for a convolution [NEW]

Unlike the Dense network, the CNN operates on the images directly, so the input arrays need an explicit channel dimension in the ordering the backend expects.


In [0]:
# read the dimensions from one example in the training set
img_rows, img_cols = x_train[0].shape[0], x_train[0].shape[1]

# Different NN libraries (e.g., TF) use different ordering of dimensions
# Here we set the "input shape" so that later the NN knows what shape to expect
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)  
    input_shape = (img_rows, img_cols, 1)

Apply one-hot encoding to the data

  1. Current encoding provides a literal label. For example, the label for "3" is 3.
  2. One-hot encoding places a "1" in an array at the appropriate location for that datum. For example, the label "3" becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

This matches the network's softmax output (one probability per class) and streamlines the matrix algebra during training and evaluation.
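To make the encoding concrete, here is a minimal numpy sketch of what `keras.utils.to_categorical` does (the helper name `one_hot` is ours, not Keras's):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Minimal re-implementation of keras.utils.to_categorical."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

labels = np.array([3, 0, 9])
print(one_hot(labels, 10))   # a "1" in column 3, 0, and 9, respectively
```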


In [0]:
# One-hot encoding
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Design Neural Network Architecture!

Select model format


In [0]:
model = Sequential()

Add layers to the model sequentially [NEW]


In [0]:
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

Things to think about and notice:

  1. How does the "output shape" column change as you go through the network? How does this relate to pictures of CNNs you've seen (or might find on google images, for example)?
  2. What happens when you re-run the layer-adding cell without first re-running the model-definition cell? Why does that happen?
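If you want to check the "output shape" column by hand, the arithmetic is a one-liner. This sketch (our own helper, not a Keras function) reproduces the shapes of the model above:

```python
def conv2d_output_shape(h, w, kernel=3, stride=1, padding='valid'):
    """Spatial output size of a Conv2D or pooling layer with a square kernel."""
    if padding == 'same':
        return (-(-h // stride), -(-w // stride))  # ceil division
    return ((h - kernel) // stride + 1, (w - kernel) // stride + 1)

# The model above: 28x28x1 -> Conv2D(32, 3x3, 'valid') -> 26x26x32 -> Flatten
h, w = conv2d_output_shape(28, 28, kernel=3)
print((h, w), h * w * 32)   # Flatten multiplies the dimensions together
```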

Compile the model

Select three key options

  1. optimizer: the method for optimizing the weights. "Stochastic Gradient Descent (SGD)" is the canonical method.
  2. loss function: the form of the function that encodes the difference between the data's true label and the predicted label.
  3. metric: the function by which the model is evaluated.
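As a sanity check on option 2, categorical cross-entropy is simple to write out in numpy. This is an illustrative sketch, not the exact Keras implementation:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and softmax-like outputs."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true    = np.array([[0., 1., 0.]])          # one-hot label for class 1
confident = np.array([[0.05, 0.9, 0.05]])     # good prediction -> small loss
wrong     = np.array([[0.9, 0.05, 0.05]])     # bad prediction  -> large loss
print(categorical_crossentropy(y_true, confident),
      categorical_crossentropy(y_true, wrong))
```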

In [0]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

Fit (read: Train) the model


In [0]:
# Training parameters
batch_size = 32 # number of images per gradient update
num_epochs = 5 # number of epochs
validation_split = 0.8 # fraction of the training set that is for validation only

In [0]:
# Train the model
history = model.fit(x_train, y_train, 
                    batch_size=batch_size, 
                    epochs=num_epochs, 
                    validation_split=validation_split, 
                    verbose=True)

Things to think about and notice:

  1. How fast is this training compared to the Dense/Fully Connected Networks? What could be causing a difference between these two networks?
  2. Why is it taking a long time at the end of each epoch?

Diagnostics!

Evaluate overall model efficacy

Evaluate model on training and test data and compare. This provides summary values that are equivalent to the final value in the accuracy/loss history plots.


In [0]:
loss_train, acc_train  = model.evaluate(x_train, y_train, verbose=False)
loss_test, acc_test  = model.evaluate(x_test, y_test, verbose=False)
print(f'Train acc/loss: {acc_train:.3}, {loss_train:.3}')
print(f'Test acc/loss: {acc_test:.3}, {loss_test:.3}')

Predict train and test data


In [0]:
y_pred_train = model.predict(x_train, verbose=True)
y_pred_test = model.predict(x_test,verbose=True)

Plot accuracy and loss as a function of epochs (equivalently training time)


In [0]:
# set up figure
f = plt.figure(figsize=(12,5))
f.add_subplot(1,2, 1)

# plot accuracy as a function of epoch
# (older Keras versions log accuracy under 'acc', newer ones under 'accuracy')
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='best')

# plot loss as a function of epoch
f.add_subplot(1,2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='best')
plt.show(block=True)

Things to think about and notice:

  1. How do these curve shapes compare to the initial dense network results?

Confusion Matrix


In [0]:
# Function: Convert from categorical back to numerical value
def convert_to_index(array_categorical):
  array_index = [np.argmax(array_temp) for array_temp in array_categorical]
  return array_index

def plot_confusion_matrix(cm,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    Plot a pycm ConfusionMatrix object.
    Normalization can be applied by setting `normalize=True`.

    Code reference:
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    Derived from the PyCM repository: https://github.com/sepandhaghighi/pycm
    """

    plt_cm = []
    for i in cm.classes :
        row=[]
        for j in cm.classes:
            row.append(cm.table[i][j])
        plt_cm.append(row)
    plt_cm = np.array(plt_cm)
    if normalize:
        plt_cm = plt_cm.astype('float') / plt_cm.sum(axis=1)[:, np.newaxis]     
    plt.imshow(plt_cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(cm.classes))
    plt.xticks(tick_marks, cm.classes, rotation=45)
    plt.yticks(tick_marks, cm.classes)

    fmt = '.2f' if normalize else 'd'
    thresh = plt_cm.max() / 2.
    for i, j in itertools.product(range(plt_cm.shape[0]), range(plt_cm.shape[1])):
        plt.text(j, i, format(plt_cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if plt_cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('Actual')
    plt.xlabel('Predicted')

In [0]:
# apply conversion function to data
y_test_ind = convert_to_index(y_test)
y_pred_test_ind = convert_to_index(y_pred_test)

# compute confusion matrix
cm_test = ConfusionMatrix(y_test_ind, y_pred_test_ind)
np.set_printoptions(precision=2)

# plot confusion matrix result
plt.figure()
plot_confusion_matrix(cm_test,title='cm')

Things to think about and notice:

  1. How does this confusion matrix compare to that from the Dense network?

Problems for the CNNs (I mean ones that Wolf Blitzer can't solve)


Problem 1: There are a lot of moving parts here. A lot of in's and out's

(bonus points if you know the 2000's movie, from which this is a near-quote.)

So, let's reduce the data set size at the beginning of the notebook.

For the rest of the exercises, we'd like to have the flexibility to experiment with larger networks (MOAR PARAMETERS, MOAR), so let's reduce the data set size.

  1. Go to the cell where the MNIST data are loaded, and add a cell after it.
  2. Use array indexing and slicing to create a smaller training set. How about 5000 images?
  3. When we then train the model, we'll want to update the validation fraction so that about 3000 images remain in the training set.
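A sketch of steps 2 and 3, using zero arrays as stand-ins for the real MNIST arrays (the variable names are ours). Note that Keras's `validation_split` holds out the *last* fraction of the training array, so training on ~3000 of 5000 images means a split of 0.4:

```python
import numpy as np

# stand-ins for the real MNIST arrays (same leading dimension as x_train)
x_train_full = np.zeros((60000, 28, 28, 1))
y_train_full = np.zeros((60000, 10))

# step 2: slice down to 5000 examples
n_small = 5000
x_small, y_small = x_train_full[:n_small], y_train_full[:n_small]

# step 3: choose validation_split so that ~3000 images remain for training
validation_split = 1 - 3000 / n_small
print(x_small.shape[0], validation_split, int(n_small * (1 - validation_split)))
```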

Problem 2: Keeeep Learning!

What happens if you run the model-fitting cell again, right after running it the first time? What do you notice about the loss and accuracy compared to the first round of fitting?

Why do you think this is happening?


Problem 3: What happens if you add a maxpooling layer?

Does this change the training speed? Why might this be? Check the model summary output to see what effect the pooling layer has.
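To see why pooling changes the speed, here is a numpy sketch of 2x2, stride-2 max pooling (our own helper, not the Keras implementation): the spatial dimensions halve, so every downstream layer sees a quarter as many values.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2, stride-2 max pooling over a (H, W) array (H and W even)."""
    h, w = feature_map.shape
    # group pixels into 2x2 blocks, then take the max within each block
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(fm)
print(pooled)   # each output pixel is the max of one 2x2 block
```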


Problem 4: How deep can you make the network?

  1. Make a deep network and see how many parameters you can accumulate. Is it trainable in a reasonable amount of time? Try adding Conv layers, but not pooling layers.
  2. What if you want it to be efficient? Try adding a MaxPooling layer after every Conv layer. How many layers can you add now? Keep re-defining and compiling the model until the output shape before the first Dense layer is (None, 1, 1, #PARAMS).

Problem 5: Comparing performance and efficiency between CNNs and Dense Networks

Experiment with the neural network above, and reduce the number of parameters to near that of the Dense network in the first exercise.

Is there a CNN architecture that has the same number of parameters as the Dense network, but can perform better?

Remember to think deeply, to pool your resources. When you're nearing the end it may not be as dense as it looks, but nearly so.


Problem 6: What happens to the training result when you degrade the images?

In this part, we will degrade the images by adding noise, and then by blurring the images, and we'll look at how the network training responds.
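As a preview, both degradations can be sketched with numpy alone (the helper names, noise level, and filter size here are our choices, not the notebook's):

```python
import numpy as np

def add_noise(images, sigma=0.1, seed=0):
    """Add Gaussian pixel noise, keeping values in [0, 1]."""
    rng = np.random.RandomState(seed)
    return np.clip(images + rng.normal(0, sigma, images.shape), 0.0, 1.0)

def blur(image):
    """Crude 3x3 mean-filter blur for a single (H, W) image."""
    padded = np.pad(image, 1, mode='edge')
    # stack the nine shifted copies of the image and average them
    stacked = [padded[i:i + image.shape[0], j:j + image.shape[1]]
               for i in range(3) for j in range(3)]
    return np.mean(stacked, axis=0)

img = np.zeros((28, 28))
img[10:18, 10:18] = 1.0   # toy "digit": a bright square
noisy, blurred = add_noise(img), blur(img)
```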


Problem 7: Let's see if we can look inside the neural networks

Using the FAQ from Keras or any other online resource, like examples from Github, can we make a plot of the feature maps for any of the layers, so we can see what the neural net sees?


Problem 8: Let's progress to Regression.

Consider the labels as real values and modify the network to perform regression instead of classification on those values. You may want to consider the following:

  • normalizing the labels.
  • normalizing the image data.
  • modifying the activations that are used.
  • modifying the loss function that is appropriate for real-valued prediction. (see keras loss )
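A numpy sketch of the first and last bullets: min-max normalizing the labels (digits span 0 to 9) and the mean-squared-error loss commonly used for real-valued prediction. This illustrates the arithmetic only, not the Keras API:

```python
import numpy as np

# digit labels treated as real values, scaled to [0, 1] for regression
labels = np.array([0, 3, 9], dtype='float32')
labels_norm = labels / 9.0

def mse(y_true, y_pred):
    """Mean squared error, a canonical regression loss."""
    return np.mean((y_true - y_pred) ** 2)

perfect = mse(labels_norm, labels_norm)            # zero loss
off_by_a_bit = mse(labels_norm, labels_norm + 0.1) # small constant offset
print(perfect, off_by_a_bit)
```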

Activity 2: Compress Handwritten Digits with a Convolutional Autoencoder (CAE)

Add layers to the model sequentially [NEW]


In [0]:
autoencoder = Sequential()

# Encoder Layers
autoencoder.add(Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=x_train.shape[1:]))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))
autoencoder.add(Conv2D(8, (3, 3), strides=(2,2), activation='relu', padding='same'))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))
autoencoder.add(Conv2D(8, (3, 3), strides=(2,2), activation='relu', padding='same'))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))

# Flatten encoding for visualization
autoencoder.add(Flatten())
autoencoder.add(Reshape((1, 1, 8)))

# Decoder Layers
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(16, (3, 3), activation='relu'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(1, (3, 3), activation='sigmoid', padding='same'))

autoencoder.summary()

Create a separate model that is just the encoder

This will allow us to encode the images and look at what the encoding results in.


In [0]:
from keras.models import Model

# grab the Flatten layer by type, so we don't depend on Keras's auto-generated
# layer name (e.g., 'flatten_8'), which changes each time the model is rebuilt
flatten_layer = next(layer for layer in autoencoder.layers if isinstance(layer, Flatten))
encoder = Model(inputs=autoencoder.input, outputs=flatten_layer.output)
encoder.summary()

Compile and fit the autoencoder


In [0]:
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
num_epochs = 10

# train the autoencoder to reconstruct its own inputs
# (the inputs are also the targets; batch_size here is a typical choice)
autoencoder.fit(x_train, x_train, epochs=num_epochs, batch_size=128,
                validation_split=0.2, verbose=True)

Plot the input, output, and encoded images


In [0]:
# set number of images to visualize
num_images = 10

# select a random subset to visualize
np.random.seed(42)
random_test_images = np.random.randint(x_test.shape[0], size=num_images)

# encode images
encoded_imgs = encoder.predict(x_test)

# encode AND decode the images (full round trip through the autoencoder)
decoded_imgs = autoencoder.predict(x_test)


# plot figure
plt.figure(figsize=(18, 4))

num_rows = 3        # original, encoded, reconstructed
num_pixel_x = 2
num_pixel_y = 4

for i, image_idx in enumerate(random_test_images):
    # plot original image
    ax = plt.subplot(num_rows, num_images, i + 1)
    plt.imshow(x_test[image_idx].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    # plot encoded image
    ax = plt.subplot(num_rows, num_images, num_images + i + 1)

    plt.imshow(encoded_imgs[image_idx].reshape(num_pixel_x, num_pixel_y), interpolation=None, resample=None)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    # plot reconstructed image
    ax = plt.subplot(num_rows, num_images, 2*num_images + i + 1)
    plt.imshow(decoded_imgs[image_idx].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()