Convolutional Neural Networks


In this notebook, we train a CNN to classify images from the CIFAR-10 dataset.

The images in this dataset are small (32x32) color images that fall into one of ten classes; some example images are pictured below.

Test for CUDA

Since these are larger (32x32x3) color images, it may prove useful to speed up training by using a GPU. CUDA is a parallel computing platform, and CUDA tensors behave just like regular tensors except that they use the GPU for computation.


In [1]:
import torch
import numpy as np

# Check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')


CUDA is available!  Training on GPU ...
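
A slightly more flexible idiom (a sketch, not part of this notebook's original flow) is to create a single torch.device object and move the model and each batch to it with .to(device):

# Alternative device-selection idiom (illustrative sketch)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Usage (assuming `model`, `data`, and `target` are defined as in the cells below):
# model = model.to(device)
# data, target = data.to(device), target.to(device)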

In [2]:
# For this run, override the check above and force training on the CPU
train_on_gpu = False

Load and Augment the Data

Downloading may take a minute. We load in the training and test data, split the training data into a training and validation set, then create DataLoaders for each of these sets of data.

Augmentation

In this cell, we perform some simple data augmentation by randomly flipping and rotating the given image data. We do this by defining a torchvision transform; you can learn about all the transforms used to pre-process and augment data in the torchvision transforms documentation.

TODO: Look at the transformation documentation; add more augmentation transforms, and see how your model performs.

This type of data augmentation should add some positional variety to these images, so that when we train a model on this data, it will be robust in the face of geometric changes (e.g. it will recognize a ship no matter which way it is facing). It's recommended that you choose just one or two transforms; one possible pipeline is sketched below.
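
As a sketch of one possible answer to the TODO above, the pipeline below adds a padded random crop and a mild color jitter on top of the flip and rotation already used; the specific transforms and parameter values are illustrative choices, not the settings used for the results in this notebook (it assumes torchvision.transforms is imported as transforms, as in the next cell):

# Illustrative augmentation pipeline (not the one used below)
augment_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.RandomCrop(32, padding=4),   # pad by 4 pixels, then crop back to 32x32
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])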


In [3]:
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

# Number of subprocesses to use for data loading
num_workers = 0
# How many samples per batch to load
batch_size = 20
# Percentage of training set to use as validation
valid_size = 0.2

# Convert data to a normalized torch.FloatTensor
transform = transforms.Compose([transforms.RandomHorizontalFlip(), # randomly flip and rotate
                                transforms.RandomRotation(degrees=10),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=(0.5, 0.5, 0.5),
                                                     std=(0.5, 0.5, 0.5))])

# choose the training and test datasets
train_data = datasets.CIFAR10(root='data',
                              train=True,
                              download=True,
                              transform=transform)
test_data = datasets.CIFAR10(root='data',
                             train=False,
                             download=True,
                             transform=transform)

# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(indices=train_idx)
valid_sampler = SubsetRandomSampler(indices=valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(dataset=train_data,
                                           batch_size=batch_size,
                                           sampler=train_sampler,
                                           num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(dataset=train_data,
                                           batch_size=batch_size,
                                           sampler=valid_sampler,
                                           num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(dataset=test_data,
                                          batch_size=batch_size,
                                          num_workers=num_workers)

# specify the image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']


Files already downloaded and verified
Files already downloaded and verified
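
With CIFAR-10's 50,000 training and 10,000 test images, a 20% validation split, and a batch size of 20, a quick sanity check of the loaders (a sketch; the exact numbers follow directly from the settings above) looks like this:

# Sanity-check the split and the loaders
print(len(train_sampler), len(valid_sampler))  # 40000 10000
print(len(train_loader), len(valid_loader))    # 2000 500 (batches of 20)
print(len(test_loader))                        # 500 (batches of 20)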

Visualize a Batch of Training Data


In [4]:
import matplotlib.pyplot as plt
%matplotlib inline

# helper function to un-normalize and display an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from Tensor image

In [5]:
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy() # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
# display 20 images
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])


View an Image in More Detail

Here, we look at the normalized red, green, and blue (RGB) color channels as three separate, grayscale intensity images.


In [6]:
rgb_img = np.squeeze(images[3])
channels = ['red channel', 'green channel', 'blue channel']

fig = plt.figure(figsize = (36, 36)) 
for idx in np.arange(rgb_img.shape[0]):
    ax = fig.add_subplot(1, 3, idx + 1)
    img = rgb_img[idx]
    ax.imshow(img, cmap='gray')
    ax.set_title(channels[idx])
    width, height = img.shape
    thresh = img.max()/2.5
    for x in range(width):
        for y in range(height):
            val = round(img[x][y], 2) if img[x][y] != 0 else 0
            ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center', size=8,
                    color='white' if img[x][y]<thresh else 'black')



Define the Network Architecture

This time, you'll define a CNN architecture. Instead of an MLP, which used linear, fully-connected layers, you'll use the following:

  • Convolutional layers, which can be thought of as a stack of filtered images.
  • Maxpooling layers, which reduce the x-y size of an input, keeping only the most active pixels from the previous layer.
  • The usual Linear + Dropout layers to avoid overfitting and produce a 10-dim output.

A network with 2 convolutional layers is shown in the image below and in the code, and you've been given starter code with one convolutional and one maxpooling layer.

TODO: Define a model with multiple convolutional layers, and define the feedforward network behavior.

The more convolutional layers you include, the more complex patterns in color and shape a model can detect. It's suggested that your final model include 2 or 3 convolutional layers as well as linear layers + dropout in between to avoid overfitting.

It's good practice to look at existing research and implementations of related models as a starting point for defining your own model. You may find it useful to look at this PyTorch classification example or this more complex Keras example to help decide on a final structure.

Output volume for a convolutional layer

To compute the output size of a given convolutional layer we can perform the following calculation (taken from Stanford's cs231n course):

We can compute the spatial size of the output volume as a function of the input volume size (W), the kernel/filter size (F), the stride with which the filters are applied (S), and the amount of zero padding used on the border (P). The output width (and height) is given by (W−F+2P)/S+1.

For example, for a 7x7 input and a 3x3 filter with stride 1 and pad 0, we would get a 5x5 output. With stride 2, we would get a 3x3 output.
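
The helper below simply applies that formula, as a sketch, to the layer sizes used in the model defined next (3x3 kernels, stride 1, padding 1, each followed by 2x2 max pooling):

def conv_output_size(W, F, S=1, P=0):
    """Spatial output size of a convolution: (W - F + 2P)/S + 1."""
    return (W - F + 2 * P) // S + 1

# 3x3 kernels with padding 1 preserve the spatial size; each 2x2 max pool halves it
size = 32
for _ in range(3):
    size = conv_output_size(size, F=3, S=1, P=1)  # convolution keeps 32, 16, 8 unchanged
    size = size // 2                              # pooling: 32 -> 16 -> 8 -> 4
print(size)  # 4, so the flattened feature vector has 64 * 4 * 4 = 1024 entries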


In [7]:
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional layer (sees 32x32x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # Convolutional layer (sees 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # Convolutional layer (sees 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # Max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # Linear layer (64 * 4 * 4 -> 500)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        # Linear layer (500 -> 10)
        self.fc2 = nn.Linear(500, 10)
        # Dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # Add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # Flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # Add dropout layer
        x = self.dropout(x)
        # Add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # Add dropout layer
        x = self.dropout(x)
        # Add the final, fully-connected output layer (no activation here, since CrossEntropyLoss applies log-softmax internally)
        x = self.fc2(x)
        return x

# Create a complete CNN
model = Net()
print(model)

# Move tensors to GPU if CUDA is available
if train_on_gpu:
    model.cuda()


Net(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=1024, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=10, bias=True)
  (dropout): Dropout(p=0.25)
)
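
As a quick sanity check (a sketch, not part of the original notebook), you can push a random CIFAR-10-shaped batch through the untrained model and confirm that the output has shape (batch_size, 10):

# Sanity-check the forward pass with a dummy batch
dummy = torch.randn(20, 3, 32, 32)
if train_on_gpu:
    dummy = dummy.cuda()
with torch.no_grad():
    print(model(dummy).shape)  # torch.Size([20, 10])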

Specify Loss Function and Optimizer

Decide on a loss function and an optimizer that are well suited to this classification task. The code examples linked above (the PyTorch classification example and the more complex Keras example) may be a good starting point. Pay close attention to the learning rate, since this value determines how quickly (and whether) your model converges to a small error.

TODO: Define the loss and optimizer and see how these choices change the loss over time.


In [8]:
import torch.optim as optim

# Specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()

# Specify optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
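
If the loss plateaus with plain SGD at this learning rate, two common alternatives to experiment with (illustrative only; they were not used for the results below) are SGD with momentum and Adam:

# Illustrative alternatives (not used for the results below)
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = optim.Adam(model.parameters(), lr=0.001)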

Train the Network

Remember to look at how the training and validation loss decrease over time; if the validation loss starts to increase while the training loss keeps falling, that indicates possible overfitting.
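
One simple way to keep an eye on this is to append the per-epoch losses to two lists inside the training loop and plot them afterwards; a minimal sketch, assuming train_losses and valid_losses are appended to at the end of each epoch in the loop below:

# Sketch: plot the loss curves after training
train_losses, valid_losses = [], []
# ... inside the epoch loop: train_losses.append(train_loss); valid_losses.append(valid_loss)

plt.plot(train_losses, label='Training loss')
plt.plot(valid_losses, label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()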


In [9]:
# Number of epochs to train the model
n_epochs = 30

valid_loss_min = np.inf # track change in validation loss

for epoch in range(1, n_epochs + 1):

    # Keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
    
    # train the model
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # Clear the gradients of all optimized variables
        optimizer.zero_grad()
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # Calculate the batch loss
        loss = criterion(output, target)
        # Backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # Perform a single optimization step (parameter update)
        optimizer.step()
        # Update training loss
        train_loss += loss.item()*data.size(0)
          
    # Validate the model
    model.eval()
    for batch_idx, (data, target) in enumerate(valid_loader):
        # Move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # Calculate the batch loss
        loss = criterion(output, target)
        # Update average validation loss 
        valid_loss += loss.item()*data.size(0)
    
    # Calculate average losses
    train_loss = train_loss/len(train_loader.dataset)
    valid_loss = valid_loss/len(valid_loader.dataset)
        
    # Print training / validation statistics 
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))
    
    # Save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
        valid_loss_min,
        valid_loss))
        torch.save(model.state_dict(), './models/model_augmented.pth')
        valid_loss_min = valid_loss


Epoch: 1 	Training Loss: 1.841536 	Validation Loss: 0.459870
Validation loss decreased (inf --> 0.459870).  Saving model ...
Epoch: 2 	Training Loss: 1.838149 	Validation Loss: 0.458697
Validation loss decreased (0.459870 --> 0.458697).  Saving model ...
Epoch: 3 	Training Loss: 1.831323 	Validation Loss: 0.455757
Validation loss decreased (0.458697 --> 0.455757).  Saving model ...
Epoch: 4 	Training Loss: 1.808367 	Validation Loss: 0.444980
Validation loss decreased (0.455757 --> 0.444980).  Saving model ...
Epoch: 5 	Training Loss: 1.740955 	Validation Loss: 0.422208
Validation loss decreased (0.444980 --> 0.422208).  Saving model ...
Epoch: 6 	Training Loss: 1.674041 	Validation Loss: 0.409740
Validation loss decreased (0.422208 --> 0.409740).  Saving model ...
Epoch: 7 	Training Loss: 1.632229 	Validation Loss: 0.399800
Validation loss decreased (0.409740 --> 0.399800).  Saving model ...
Epoch: 8 	Training Loss: 1.597971 	Validation Loss: 0.390888
Validation loss decreased (0.399800 --> 0.390888).  Saving model ...
Epoch: 9 	Training Loss: 1.562747 	Validation Loss: 0.381608
Validation loss decreased (0.390888 --> 0.381608).  Saving model ...
Epoch: 10 	Training Loss: 1.525217 	Validation Loss: 0.370303
Validation loss decreased (0.381608 --> 0.370303).  Saving model ...
Epoch: 11 	Training Loss: 1.486305 	Validation Loss: 0.359043
Validation loss decreased (0.370303 --> 0.359043).  Saving model ...
Epoch: 12 	Training Loss: 1.446706 	Validation Loss: 0.347437
Validation loss decreased (0.359043 --> 0.347437).  Saving model ...
Epoch: 13 	Training Loss: 1.406326 	Validation Loss: 0.335667
Validation loss decreased (0.347437 --> 0.335667).  Saving model ...
Epoch: 14 	Training Loss: 1.367256 	Validation Loss: 0.325635
Validation loss decreased (0.335667 --> 0.325635).  Saving model ...
Epoch: 15 	Training Loss: 1.334556 	Validation Loss: 0.317403
Validation loss decreased (0.325635 --> 0.317403).  Saving model ...
Epoch: 16 	Training Loss: 1.305848 	Validation Loss: 0.311378
Validation loss decreased (0.317403 --> 0.311378).  Saving model ...
Epoch: 17 	Training Loss: 1.283004 	Validation Loss: 0.306173
Validation loss decreased (0.311378 --> 0.306173).  Saving model ...
Epoch: 18 	Training Loss: 1.262927 	Validation Loss: 0.301529
Validation loss decreased (0.306173 --> 0.301529).  Saving model ...
Epoch: 19 	Training Loss: 1.245384 	Validation Loss: 0.299505
Validation loss decreased (0.301529 --> 0.299505).  Saving model ...
Epoch: 20 	Training Loss: 1.228464 	Validation Loss: 0.293387
Validation loss decreased (0.299505 --> 0.293387).  Saving model ...
Epoch: 21 	Training Loss: 1.215509 	Validation Loss: 0.290346
Validation loss decreased (0.293387 --> 0.290346).  Saving model ...
Epoch: 22 	Training Loss: 1.200476 	Validation Loss: 0.286930
Validation loss decreased (0.290346 --> 0.286930).  Saving model ...
Epoch: 23 	Training Loss: 1.189381 	Validation Loss: 0.284045
Validation loss decreased (0.286930 --> 0.284045).  Saving model ...
Epoch: 24 	Training Loss: 1.176571 	Validation Loss: 0.279836
Validation loss decreased (0.284045 --> 0.279836).  Saving model ...
Epoch: 25 	Training Loss: 1.163998 	Validation Loss: 0.277205
Validation loss decreased (0.279836 --> 0.277205).  Saving model ...
Epoch: 26 	Training Loss: 1.152890 	Validation Loss: 0.274506
Validation loss decreased (0.277205 --> 0.274506).  Saving model ...
Epoch: 27 	Training Loss: 1.142232 	Validation Loss: 0.272191
Validation loss decreased (0.274506 --> 0.272191).  Saving model ...
Epoch: 28 	Training Loss: 1.132959 	Validation Loss: 0.269709
Validation loss decreased (0.272191 --> 0.269709).  Saving model ...
Epoch: 29 	Training Loss: 1.120869 	Validation Loss: 0.266593
Validation loss decreased (0.269709 --> 0.266593).  Saving model ...
Epoch: 30 	Training Loss: 1.108679 	Validation Loss: 0.264410
Validation loss decreased (0.266593 --> 0.264410).  Saving model ...

Load the Model with the Lowest Validation Loss


In [10]:
model.load_state_dict(torch.load('./models/model_augmented.pth'))

Test the Trained Network

Test your trained model on previously unseen data! A "good" result will be a CNN that gets around 70% (or more, try your best!) accuracy on these test images.


In [11]:
# Track test loss
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()
# Iterate over test data
for batch_idx, (data, target) in enumerate(test_loader):
    # Move tensors to GPU if CUDA is available
    if train_on_gpu:
        data, target = data.cuda(), target.cuda()
    # Forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # Calculate the batch loss
    loss = criterion(output, target)
    # Update test loss 
    test_loss += loss.item()*data.size(0)
    # Convert the output scores to a predicted class
    _, pred = torch.max(output, 1)    
    # Compare predictions to true label
    correct_tensor = pred.eq(target.data.view_as(pred))
    correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
    # Calculate test accuracy for each object class
    for i in range(target.size(0)):  # use the actual batch size (the last batch may be smaller)
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# Average test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (classes[i], 100 * class_correct[i] / class_total[i],
                                                         np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (100. * np.sum(class_correct) / np.sum(class_total),
                                                      np.sum(class_correct), np.sum(class_total)))


Test Loss: 1.321644

Test Accuracy of airplane: 53% (538/1000)
Test Accuracy of automobile: 67% (674/1000)
Test Accuracy of  bird: 34% (344/1000)
Test Accuracy of   cat: 30% (302/1000)
Test Accuracy of  deer: 36% (360/1000)
Test Accuracy of   dog: 49% (495/1000)
Test Accuracy of  frog: 71% (711/1000)
Test Accuracy of horse: 60% (600/1000)
Test Accuracy of  ship: 66% (662/1000)
Test Accuracy of truck: 56% (569/1000)

Test Accuracy (Overall): 52% (5255/10000)

Visualize Sample Test Results


In [12]:
# Obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# Move model inputs to cuda, if GPU available
if train_on_gpu:
    images = images.cuda()

# Get sample outputs
output = model(images)
# Convert output scores to predicted class
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.numpy()) if not train_on_gpu else np.squeeze(preds_tensor.cpu().numpy())

# Plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    imshow(images[idx].cpu())
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx].item() else "red"))