Introduction to Neural Networks

========

Version 0.1

By B Nord 2018 Nov 09

This notebook was developed within the Google Colab framework. The original notebook can be run in a web browser. It has been recreated below, though we recommend you run the web-based version.

Install packages on the back end


In [0]:
%%capture --no-display
# the cell magic above removes display of the install details; it must be the first line of the cell

# install software on the backend, which is located at
# Google's Super Secret Sky Server in an alternate universe.
# The backend is called a 'hosted runtime' if it is on Google's server.
# A local runtime would start a colab notebook on your machine locally.
# Think of Google Colab as a Google Docs version of Jupyter Notebooks

# pip install
!pip install numpy matplotlib scipy pandas scikit-learn astropy seaborn ipython jupyter  # standard install for DSFP
!pip install keras tensorflow  # required for deep learning
!pip install pycm  # confusion-matrix package

In [0]:
# standard-ish imports
import numpy as np
import matplotlib.pyplot as plt
import time
import itertools

# non-standard, but stable package for confusion matrices
from pycm import ConfusionMatrix


# neural network / machine learning packages
from sklearn import metrics
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras import backend as K

Let's experiment with neural networks!

Learning Objectives

  1. Gain familiarity with
    1. Standard NN libraries: Keras and Tensorflow
    2. One standard NN architecture: fully connected ('dense') networks
    3. Two standard tasks performed with NNs: Binary Classification, Multi-class Classification
    4. Diagnostics of NNs
      1. History (Loss Function)
      2. Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC)
  2. Experience fundamental considerations, pitfalls, and strategies when training NNs
    1. Data set preparation (never underestimate the time required for this)
    2. Training set size
    3. Training speed and efficiency
    4. Model fitting (training)
  3. Begin connecting NN functionality to data set structure and problem of interest

Topics not covered

  1. Class (im)balance
  2. Training set diversity

Notes

Our main task will be to Classify Handwritten Digits (Is it a "zero" [0] or a "one" [1]?; ooooh, the suspense). This is very useful though, because it's an opportunity to separate yourself from the science of the data. MNIST is a benchmark data set for machine learning. Astronomy doesn't yet have machine learning-specific benchmark data sets.


Prepare the Data

For this MNIST set, this is probably the fastest you will ever prepare a data set. Ye have been warn-ed.

Download the data

(ooh look it's all stored on Amazon's AWS!) (pssst, we're in the cloooud, it's the future!)


In [0]:
# import MNIST data
(x_train_temp, y_train_temp), (x_test_temp, y_test_temp) = mnist.load_data()

Look at the data

(always do this so that you know what the structure is.)


In [0]:
# Print the shapes
print("Train Data Shape:", x_train_temp.shape)
print("Test Data Shape:", x_test_temp.shape)
print("Train Label Shape:", y_train_temp.shape)
print("Test Label Shape:", y_test_temp.shape)

Do the shapes of 'data' and 'label' (for train and test, respectively) match? If they don't now, Keras/TF will kindly yell at you later.


In [0]:
# Print an example
print("Example:")
print("y_train[0] is the label for the 0th image, and it is a", y_train_temp[0])
print("x_train[0] is the image data, and you kind of see the pattern in the array of numbers")
print(x_train_temp[0])

Can you see the pattern of the number in the array?


In [0]:
# Plot the data! 
f = plt.figure()
f.add_subplot(1,2, 1)
plt.imshow(x_train_temp[0])
f.add_subplot(1,2, 2)
plt.imshow(x_train_temp[1])
plt.show(block=True)

Prepare the data

Data often need to be re-shaped and normalized for ingestion into the neural network.

Normalize the data

The images are recast as float and normalized to the range [0, 1] for the network.


In [0]:
print("Before:", np.min(x_train_temp), np.max(x_train_temp))
x_train = x_train_temp.astype('float32')
x_test = x_test_temp.astype('float32')
x_train /= 255
x_test /= 255
y_train = y_train_temp
y_test = y_test_temp
print("After:", np.min(x_train), np.max(x_train))

Reshape the data arrays: flatten the images for ingestion into a Dense layer.

We're going to use a Dense Neural Network architecture, which takes 1D vectors as input, not 2D images. So we need to make the input shape appropriate by 'flattening' each 28×28 image into a 784-element vector.


In [0]:
# Flatten the images
img_rows, img_cols = x_train[0].shape[0], x_train[0].shape[1] 
img_size = img_rows*img_cols
x_train = x_train.reshape(x_train.shape[0], img_size)
x_test = x_test.reshape(x_test.shape[0], img_size)
print("New shape", x_train.shape)

Apply one-hot encoding to the data

  1. Current encoding provides a literal label. For example, the label for "3" is 3.
  2. One-hot encoding places a "1" in an array at the appropriate location for that datum. For example, the label "3" becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

This puts the labels in the same form as the network's softmax output (one entry per class), which makes the matrix algebra during network training and evaluation straightforward and efficient.


In [0]:
# One-hot encoding
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Create a Neural Net and train it!

Decide model format


In [0]:
model = Sequential()

For later: check out the keras documentation to see what other kinds of model formats are available. What might they be useful for?

Add layers to the model sequentially


In [0]:
model.add(Dense(units=32, activation='sigmoid', input_shape=(img_size,))) # dense layer of 32 neurons
model.add(Dense(units=num_classes, activation='softmax')) # dense layer of 'num_classes' neurons, because these are the number of options for the classification
#model.add(BatchNormalization())
model.summary() # look at the network output

  1. Review the model summary, noting the major features --- the meaning of each column, the flow downward, and the number of parameters at the bottom.
  2. What's the math that leads to the number of parameters in each layer? This is a little bit of fun math to play with. In the long run it can help you understand how to debug model compilation errors and design custom networks better. No need to dwell here too long, but give it a try (a worked check appears after this list). The ingredients are
    1. image size
    2. number of units in the layer
    3. number of bias elements per unit
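
As a quick check on that math, here is a minimal worked sketch: each Dense layer has (number of inputs × number of units) weights plus one bias per unit, which for the 28×28 images and the two layers above gives the counts below.


In [0]:
# parameter count for a Dense layer: (number of inputs * number of units) + one bias per unit
hidden_units = 32
params_hidden = img_size * hidden_units + hidden_units        # 784*32 + 32 = 25,120
params_output = hidden_units * num_classes + num_classes      # 32*10 + 10  = 330
print(params_hidden, params_output, params_hidden + params_output)  # compare with model.summary()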

Compile the model

Select three key options

  1. optimizer: the method for optimizing the weights. "Stochastic Gradient Descent (SGD)" is the canonical method.
  2. loss function: the form of the function that encodes the difference between the data's true label and the predicted label.
  3. metric: the function by which the model is evaluated.

In [0]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

The optimizer is an element that I sometimes tune if I want more control over the training. Check the Keras docs for the options available for the optimizer.
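
For example, one way to take that control is to pass an optimizer object instead of the string name; the learning rate and momentum below are arbitrary illustrative values (newer Keras versions call the first argument learning_rate rather than lr).


In [0]:
from keras.optimizers import SGD

# same as optimizer="sgd", but with explicit (illustrative) hyperparameters
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])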

Fit (read: Train) the model


In [0]:
# Training parameters
batch_size = 32 # number of images per gradient-update step (not per epoch)
num_epochs = 5 # number of epochs
validation_split = 0.8 # fraction of the training set that is for validation only

In [0]:
history = model.fit(x_train, y_train, 
                    batch_size=batch_size, 
                    epochs=num_epochs, 
                    validation_split=validation_split, 
                    verbose=True)

"Exerciiiiiiisee the neurons! This Network is cleeeah!"

(bonus points if you know the 90's movie, from which this is a near-quote.)

  • How well does the training go if we don't normalize the data to be between 0 and 1.0?
  • What is the effect of batch_size on how long it takes the network to achieve 80% accuracy?
  • What happens if you increase the validation set fraction (and thus reduce the training set size)? Try going really low on the training set size. Can you still get 80% accuracy?

Diagnostics!

Because neural networks are such complex objects, and because we are still in an era of experimentation, diagnostics (both standard and tailored) are critical for training NN models.

Evaluate overall model efficacy

Evaluate model on training and test data and compare. This provides summary values that are equivalent to the final value in the accuracy/loss history plots.


In [0]:
loss_train, acc_train  = model.evaluate(x_train, y_train, verbose=False)
loss_test, acc_test  = model.evaluate(x_test, y_test, verbose=False)
print(f'Train acc/loss: {acc_train:.3}, {loss_train:.3}')
print(f'Test acc/loss: {acc_test:.3}, {loss_test:.3}')

Predict train and test data


In [0]:
y_pred_train = model.predict(x_train, verbose=True)
y_pred_test = model.predict(x_test,verbose=True)

Plot accuracy and loss as a function of epochs (equivalently training time)


In [0]:
# set up figure
f = plt.figure(figsize=(12,5))
f.add_subplot(1,2, 1)

# plot accuracy as a function of epoch
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='best')

# plot loss as a function of epoch
f.add_subplot(1,2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='best')
plt.show(block=True)

Define Confusion Matrix


In [0]:
# Function: Convert from categorical back to numerical value
def convert_to_index(array_categorical):
  array_index = [np.argmax(array_temp) for array_temp in array_categorical]
  return array_index

def plot_confusion_matrix(cm,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function is modified to plot a pycm ConfusionMatrix object.
    Normalization can be applied by setting `normalize=True`.
    
    Code Reference : 
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    
    This script is derived from PyCM repository: https://github.com/sepandhaghighi/pycm
    
    """

    plt_cm = []
    for i in cm.classes :
        row=[]
        for j in cm.classes:
            row.append(cm.table[i][j])
        plt_cm.append(row)
    plt_cm = np.array(plt_cm)
    if normalize:
        plt_cm = plt_cm.astype('float') / plt_cm.sum(axis=1)[:, np.newaxis]     
    plt.imshow(plt_cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(cm.classes))
    plt.xticks(tick_marks, cm.classes, rotation=45)
    plt.yticks(tick_marks, cm.classes)

    fmt = '.2f' if normalize else 'd'
    thresh = plt_cm.max() / 2.
    for i, j in itertools.product(range(plt_cm.shape[0]), range(plt_cm.shape[1])):
        plt.text(j, i, format(plt_cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if plt_cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('Actual')
    plt.xlabel('Predicted')

Plot Confusion Matrix


In [0]:
# apply conversion function to data
y_test_ind = convert_to_index(y_test)
y_pred_test_ind = convert_to_index(y_pred_test)

# compute confusion matrix
cm_test = ConfusionMatrix(y_test_ind, y_pred_test_ind)
np.set_printoptions(precision=2)

# plot confusion matrix result
plt.figure()
plot_confusion_matrix(cm_test,title='cm')

Problems


Problem 1: Binary data

  • Problem 1a: Generate a binary data set from the MNIST data set.
  • Problem 1b: Prepare the data set in the same way as the multi-class scenario, but appropriately for only two classes.
  • Problem 1c: Re-train the neural network for the binary classification problem.
  • Problem 1d: Perform all the diagnostics again.
  • Problem 1e: Use sklearn to generate a Receiver Operating Characteristic (ROC) curve and the associated AUC.
  • Problem 1f: Create a new diagnostic that shows examples of the images and how they were classified. For a given architecture and training, plot four columns of images, five rows down. Each row is just another example of something that is classified. The four columns are

    • True positive
    • False positive
    • True negative
    • False negative

    So this will be a grid of images that are examples of classifications. This is important to give a sense of how well the network is classifying different types of objects. (A starting-point sketch for 1a, 1b, and 1e appears after this list.)
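
A minimal starting-point sketch for Problems 1a, 1b, and 1e, assuming the *_temp arrays from mnist.load_data() above and a binary model (called model_binary here) that you define and train yourself:


In [0]:
# Problems 1a/1b: keep only the digits 0 and 1, then flatten and normalize as before
mask_train = (y_train_temp == 0) | (y_train_temp == 1)
mask_test = (y_test_temp == 0) | (y_test_temp == 1)
x_train_bin = x_train_temp[mask_train].reshape(-1, 28 * 28).astype('float32') / 255
x_test_bin = x_test_temp[mask_test].reshape(-1, 28 * 28).astype('float32') / 255
y_train_bin = y_train_temp[mask_train]
y_test_bin = y_test_temp[mask_test]

# Problem 1e: ROC curve and AUC with sklearn, using the predicted score for class "1"
# (the indexing depends on how you set up your output layer)
# y_score_test = model_binary.predict(x_test_bin)[:, 1]
# fpr, tpr, thresholds = metrics.roc_curve(y_test_bin, y_score_test)
# print("AUC:", metrics.auc(fpr, tpr))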


Problem 2: How do the following metrics scale with the data set size? (Constructing a learning curve)

  • time required to perform 5 epochs.
  • accuracy and loss at final step

Some advice for this problem:

  • create a function (or functions) that wraps the NN architecture and training, and then outputs the key results; a minimal sketch follows this list.
  • watch out for network initialization: if you don't redefine the network model each time, the weights will be preserved from the most recent training.
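
A minimal sketch of such a wrapper, reusing the x_train, y_train, x_test, y_test, img_size, and num_classes variables defined earlier; the layer size, epochs, and subset sizes are illustrative choices:


In [0]:
# rebuild the model from scratch on every call so the weights are re-initialized each time
def train_on_subset(n_train, num_epochs=5, batch_size=32):
    model = Sequential()
    model.add(Dense(units=32, activation='sigmoid', input_shape=(img_size,)))
    model.add(Dense(units=num_classes, activation='softmax'))
    model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

    t0 = time.time()
    model.fit(x_train[:n_train], y_train[:n_train],
              batch_size=batch_size, epochs=num_epochs,
              validation_split=0.2, verbose=False)
    elapsed = time.time() - t0

    loss_test, acc_test = model.evaluate(x_test, y_test, verbose=False)
    return elapsed, loss_test, acc_test

# example usage for a learning curve (subset sizes are arbitrary)
# results = [train_on_subset(n) for n in [500, 2000, 8000, 32000, 60000]]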

Sub-problems

  1. Try training and evaluating for a number of well-spaced data set sizes and report on the pattern you see. Plot a "learning curve": the loss (or accuracy) as a function of the training set size. This is an important step in judging what you need for your training set, and in assessing the feasibility of the application.
  2. Do you see any change as a function of data set size? If not, go back and train the original network for more epochs. When does the loss plateau? Use that number of epochs for the learning-curve runs.
  3. Does Keras have any model fit parameters or callbacks that automatically decide when training will stop? Try using one (a sketch follows this list).
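
One candidate is Keras's EarlyStopping callback, passed to fit through the callbacks argument; the monitored quantity and patience value below are illustrative choices.


In [0]:
from keras.callbacks import EarlyStopping

# stop training once the validation loss has not improved for 2 consecutive epochs (illustrative)
early_stop = EarlyStopping(monitor='val_loss', patience=2)

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=50,                    # an upper limit; early stopping may end sooner
                    validation_split=0.2,
                    callbacks=[early_stop],
                    verbose=True)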

Problem 3: How do the following factors and metrics scale with the number of neurons in the first layer?

  1. time required to perform 10 epochs
  2. accuracy and loss at final step

When developing an NN model, we think about the balance between accuracy, computation time, and network complexity. This exercise is intended to draw out some of the patterns in the confluence of those aspects. The goal of this problem is to produce plots that show general trends.


Problem 4: Apply this analysis to the Fashion MNIST data set

tip: you can load the data with a procedure very similar to the one you used for MNIST

hint: "fashion_mnist"


Problem 6: Apply this to some mock strong lensing data

We have mock strong lensing data made with Lenspop.

Download from cloud storage to your remote runtime.


In [0]:
!gsutil cp -r gs://lsst-dsfp2018/stronglenses ./
!ls .

Load the numpy binary files into numpy arrays


In [0]:

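The exact file names depend on what the gsutil copy above brings down, so the sketch below simply globs the directory and loads whatever .npy files it finds (adjust to the actual contents):


In [0]:
import glob

# list whatever landed in the stronglenses directory, then load the .npy files
npy_files = sorted(glob.glob('./stronglenses/*.npy'))
print(npy_files)

arrays = {fname: np.load(fname) for fname in npy_files}
for name, arr in arrays.items():
    print(name, arr.shape)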

Problem 7: Toying around: make data that looks like your problem.

MNIST is great for learning and, in some cases, benchmarking, but it's not the best for physics and astronomy problems. In our problems, there's a physical underpinning; there are first principles that underlie the images we make. So, when developing a neural network, it can be very useful to generate some data that morphologically looks like your astro data. This can help in developing the architecture before deploying it on more complicated data, as well as in diagnosing systematics in the more complicated data.
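
For instance, here is a minimal sketch of one way to make such toy data, using round Gaussian blobs as stand-ins for isolated sources and an offset, fainter blob as a stand-in for a companion; all sizes, fluxes, and noise levels are arbitrary illustrative choices.


In [0]:
# toy images: class 0 = single Gaussian blob, class 1 = blob plus a fainter companion
def make_toy_image(has_companion, size=28, noise=0.05):
    yy, xx = np.mgrid[0:size, 0:size]
    img = np.exp(-((xx - size / 2) ** 2 + (yy - size / 2) ** 2) / (2 * 3.0 ** 2))
    if has_companion:
        cx, cy = np.random.uniform(size * 0.2, size * 0.8, 2)  # random companion position
        img += 0.5 * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * 2.0 ** 2))
    return img + noise * np.random.randn(size, size)

n_per_class = 500
labels = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)]).astype(int)
images = np.array([make_toy_image(bool(lab)) for lab in labels])
print(images.shape, labels.shape)  # feed these through the same prep and network as MNIST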