Image Classification

In this project, you'll classify images from the CIFAR-10 dataset. The dataset consists of airplanes, dogs, cats, and other objects. You'll preprocess the images, then train a convolutional neural network on all the samples. The images need to be normalized and the labels need to be one-hot encoded. You'll apply what you learned to build convolutional, max pooling, dropout, and fully connected layers. At the end, you'll see your neural network's predictions on the sample images.

Get the Data

Run the following cell to download the CIFAR-10 dataset for Python.


In [22]:
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
from urllib.request import urlretrieve
from os.path import isfile, isdir
from tqdm import tqdm
import problem_unittests as tests
import tarfile

cifar10_dataset_folder_path = 'cifar-10-batches-py'

class DLProgress(tqdm):
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num

if not isfile('cifar-10-python.tar.gz'):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc='CIFAR-10 Dataset') as pbar:
        urlretrieve(
            'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz',
            'cifar-10-python.tar.gz',
            pbar.hook)

if not isdir(cifar10_dataset_folder_path):
    with tarfile.open('cifar-10-python.tar.gz') as tar:
        tar.extractall()
        tar.close()


tests.test_folder_path(cifar10_dataset_folder_path)


All files found!

Explore the Data

The dataset is broken into batches to prevent your machine from running out of memory. The CIFAR-10 dataset consists of 5 batches, named data_batch_1, data_batch_2, etc. Each batch contains labels and images from the following 10 classes:

  • airplane (0)
  • automobile (1)
  • bird (2)
  • cat (3)
  • deer (4)
  • dog (5)
  • frog (6)
  • horse (7)
  • ship (8)
  • truck (9)

  • 10 classes in total

Understanding a dataset is part of making predictions on the data. Play around with the code cell below by changing the batch_id and sample_id. The batch_id is the id for a batch (1-5). The sample_id is the id for an image and label pair in the batch.

Ask yourself "What are all possible labels?", "What is the range of values for the image data?", "Are the labels in order or random?". Answers to questions like these will help you preprocess the data and end up with better predictions.
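
If you want to answer those questions without the helper, you can also inspect a raw batch directly. Below is a minimal sketch (not part of the project code), assuming the standard CIFAR-10 python pickle layout: each batch is a dict with a 'data' array of shape 10000x3072 and a 'labels' list.

In [ ]:
import pickle

# Assumed sketch: load one raw batch and inspect its value and label ranges.
with open(cifar10_dataset_folder_path + '/data_batch_1', mode='rb') as f:
    raw_batch = pickle.load(f, encoding='latin1')

raw_features = raw_batch['data'].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)  # to NHWC
raw_labels = raw_batch['labels']
print(raw_features.shape, raw_features.min(), raw_features.max())  # pixel value range
print(sorted(set(raw_labels)))                                      # all possible label ids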


In [23]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import helper
import numpy as np

# Explore the dataset
batch_id = 1
sample_id = 5
helper.display_stats(cifar10_dataset_folder_path, batch_id, sample_id)


Stats of batch 1:
Samples: 10000
Label Counts: {0: 1005, 1: 974, 2: 1032, 3: 1016, 4: 999, 5: 937, 6: 1030, 7: 1001, 8: 1025, 9: 981}
First 20 Labels: [6, 9, 9, 4, 1, 1, 2, 7, 8, 3, 4, 7, 7, 2, 9, 9, 9, 3, 2, 6]

Example of Image 5:
Image - Min Value: 0 Max Value: 252
Image - Shape: (32, 32, 3)
Label - Label Id: 1 Name: automobile

Implement Preprocess Functions

Normalize

In the cell below, implement the normalize function to take in image data, x, and return it as a normalized Numpy array. The values should be in the range of 0 to 1, inclusive. The return object should be the same shape as x.


In [24]:
def normalize(x):
    """
    Normalize a list of sample image data in the range of 0 to 1
    : x: List of image data.  The image shape is (32, 32, 3)
    : return: Numpy array of normalized data
    """
    # TODO: Implement Function
    ## x has shape (num_images, height, width, channels) with 8-bit pixel values,
    ## so dividing by 255 maps every value into the range [0, 1]
    return x/255


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_normalize(normalize)


Tests Passed

One-hot encode

Just like the previous code cell, you'll be implementing a function for preprocessing. This time, you'll implement the one_hot_encode function. The input, x, is a list of labels. Implement the function to return the list of labels as a one-hot encoded Numpy array. The possible values for labels are 0 to 9. The one-hot encoding function should return the same encoding for each value across calls to one_hot_encode. Make sure to save the map of encodings outside the function.

Hint: Don't reinvent the wheel.


In [25]:
from sklearn import preprocessing  ## LabelBinarizer from sklearn handles the one-hot mapping

def one_hot_encode(x):
    """
    One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
    : x: List of sample Labels
    : return: Numpy array of one-hot encoded labels
    """
    # TODO: Implement Function

    ## LabelBinarizer is also what helper.py uses in display_image_predictions
    ## to map between label ids and one-hot vectors.
    label_binarizer = preprocessing.LabelBinarizer() ## instantiate the one-hot encoder
    n_class = 10 ## total number of classes
    label_binarizer.fit(range(n_class)) ## fit the encoder on the label ids 0..n_class-1
    return label_binarizer.transform(x) ## transform the class labels to one-hot vectors


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_one_hot_encode(one_hot_encode)


Tests Passed

Preprocess all the data and save it

Running the code cell below will preprocess all the CIFAR-10 data and save it to file. The code below also uses 10% of the training data for validation.


In [26]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(cifar10_dataset_folder_path, normalize, one_hot_encode)

Implementation of a CNN with backprop in NumPy


In [27]:
def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1):
    # First figure out what the size of the output should be
    N, C, H, W = x_shape
    assert (H + 2 * padding - field_height) % stride == 0
    assert (W + 2 * padding - field_width) % stride == 0
    out_height = int((H + 2 * padding - field_height) / stride + 1)
    out_width = int((W + 2 * padding - field_width) / stride + 1)

    i0 = np.repeat(np.arange(field_height), field_width)
    i0 = np.tile(i0, C)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(field_width), field_height * C)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)

    k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1)

    return (k.astype(int), i.astype(int), j.astype(int))
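
As a quick sanity check, the index arrays can be generated for a tiny input; a minimal sketch with assumed example values:

In [ ]:
# For a 1x3x4x4 input with 3x3 receptive fields, padding 1 and stride 1,
# there are C*3*3 = 27 patch rows and 4*4 = 16 output positions.
k_demo, i_demo, j_demo = get_im2col_indices((1, 3, 4, 4), field_height=3, field_width=3, padding=1, stride=1)
print(k_demo.shape, i_demo.shape, j_demo.shape)  # expected: (27, 1) (27, 16) (27, 16)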

In [28]:
def im2col_indices(x, field_height, field_width, padding=1, stride=1):
    """ An implementation of im2col based on some fancy indexing """
    # Zero-pad the input
    p = padding
    x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding, stride)

    cols = x_padded[:, k, i, j]
    C = x.shape[1]
    cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1)
    return cols
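
im2col turns every receptive field into one column, so convolution becomes a single matrix multiplication. A small illustrative check with assumed example values:

In [ ]:
x_demo = np.arange(16, dtype=float).reshape(1, 1, 4, 4)  # one 4x4 single-channel image
cols_demo = im2col_indices(x_demo, field_height=3, field_width=3, padding=1, stride=1)
print(cols_demo.shape)  # expected: (9, 16) -- one 9-element column per output position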

In [13]:
def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1,
                   stride=1):
    """ An implementation of col2im based on fancy indexing and np.add.at """
    N, C, H, W = x_shape
    H_padded, W_padded = H + 2 * padding, W + 2 * padding
    x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype)
    k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding, stride)
    cols_reshaped = cols.reshape(C * field_height * field_width, -1, N)
    cols_reshaped = cols_reshaped.transpose(2, 0, 1)
    np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped)
    if padding == 0:
        return x_padded
    return x_padded[:, :, padding:-padding, padding:-padding]
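
col2im_indices is the adjoint (transpose) of im2col_indices rather than its inverse, because overlapping patches are accumulated with np.add.at. A minimal sketch (values assumed) checking the adjoint identity <im2col(x), c> == <x, col2im(c)>:

In [ ]:
x_demo = np.random.randn(2, 3, 4, 4)
cols_demo = im2col_indices(x_demo, 3, 3, padding=1, stride=1)
c_demo = np.random.randn(*cols_demo.shape)
lhs = np.sum(cols_demo * c_demo)
rhs = np.sum(x_demo * col2im_indices(c_demo, x_demo.shape, 3, 3, padding=1, stride=1))
print(np.isclose(lhs, rhs))  # expected: True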

In [29]:
def conv_forward(X, W, b, stride=1, padding=1):
    cache = W, b, stride, padding
    n_filters, d_filter, h_filter, w_filter = W.shape
    n_x, d_x, h_x, w_x = X.shape
    h_out = (h_x - h_filter + 2 * padding) / stride + 1
    w_out = (w_x - w_filter + 2 * padding) / stride + 1

    if not h_out.is_integer() or not w_out.is_integer():
        raise Exception('Invalid output dimension!')

    h_out, w_out = int(h_out), int(w_out)

    X_col = im2col_indices(X, h_filter, w_filter, padding=padding, stride=stride)
    W_col = W.reshape(n_filters, -1)

    out = W_col @ X_col + b
    out = out.reshape(n_filters, h_out, w_out, n_x)
    out = out.transpose(3, 0, 1, 2)

    cache = (X, W, b, stride, padding, X_col)

    return out, cache
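
A quick shape check of the forward pass (example shapes assumed): with stride 1 and padding 1, a 3x3 filter keeps the spatial size, so 2 images of shape 3x8x8 through 4 filters give a 2x4x8x8 output.

In [ ]:
X_demo = np.random.randn(2, 3, 8, 8)
W_demo = np.random.randn(4, 3, 3, 3) * 0.1
b_demo = np.zeros((4, 1))
out_demo, cache_demo = conv_forward(X_demo, W_demo, b_demo, stride=1, padding=1)
print(out_demo.shape)  # expected: (2, 4, 8, 8)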

In [30]:
def conv_backward(dout, cache):
    X, W, b, stride, padding, X_col = cache
    n_filter, d_filter, h_filter, w_filter = W.shape

    db = np.sum(dout, axis=(0, 2, 3))
    db = db.reshape(n_filter, -1)

    dout_reshaped = dout.transpose(1, 2, 3, 0).reshape(n_filter, -1)
    dW = dout_reshaped @ X_col.T
    dW = dW.reshape(W.shape)

    W_reshape = W.reshape(n_filter, -1)
    dX_col = W_reshape.T @ dout_reshaped
    dX = col2im_indices(dX_col, X.shape, h_filter, w_filter, padding=padding, stride=stride)

    return dX, dW, db
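
The backward pass can be sanity-checked against a finite difference. A minimal sketch (values assumed) using loss = sum(out), so that dout is all ones:

In [ ]:
X_demo = np.random.randn(1, 3, 5, 5)
W_demo = np.random.randn(2, 3, 3, 3) * 0.1
b_demo = np.zeros((2, 1))

out_demo, cache_demo = conv_forward(X_demo, W_demo, b_demo)
dX_demo, dW_demo, db_demo = conv_backward(np.ones_like(out_demo), cache_demo)

# Numerical gradient for a single filter weight
eps = 1e-5
W_plus = W_demo.copy();  W_plus[0, 0, 0, 0] += eps
W_minus = W_demo.copy(); W_minus[0, 0, 0, 0] -= eps
num_grad = (np.sum(conv_forward(X_demo, W_plus, b_demo)[0]) -
            np.sum(conv_forward(X_demo, W_minus, b_demo)[0])) / (2 * eps)
print(np.isclose(num_grad, dW_demo[0, 0, 0, 0]))  # expected: True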

In [31]:
# Now it is time to calculate the error using cross entropy
def cross_entropy(y_pred, y_train):
    m = y_pred.shape[0]

    prob = softmax(y_pred)
    log_like = -np.log(prob[range(m), y_train])

    data_loss = np.sum(log_like) / m
    #     reg_loss = regularization(model, reg_type='l2', lam=lam)

    return data_loss # + reg_loss

def dcross_entropy(y_pred, y_train):
    m = y_pred.shape[0]

    grad_y = softmax(y_pred)
    grad_y[range(m), y_train] -= 1.
    grad_y /= m

    return grad_y
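
Note that cross_entropy as written expects raw scores plus integer class ids (it indexes prob[range(m), y_train]), while the training loop further down works with one-hot labels. A tiny usage sketch with assumed example values:

In [ ]:
scores_demo = np.array([[2.0, 1.0, 0.1],
                        [0.5, 2.5, 0.3]])
labels_demo = np.array([0, 1])  # integer class indices, not one-hot vectors
print(cross_entropy(scores_demo, labels_demo))        # average negative log-likelihood
print(dcross_entropy(scores_demo, labels_demo).shape) # expected: (2, 3)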

In [32]:
# Softmax is the multiclass generalization of the sigmoid: both squash raw scores into probability-like outputs
def softmax(X):
    eX = np.exp((X.T - np.max(X, axis=1)).T)
    return (eX.T / eX.sum(axis=1)).T

def dsoftmax(sX): # Jacobian of the softmax, evaluated at its output sX = softmax(X)
    # sX is a 1 x n row vector of softmax outputs; the result is the n x n Jacobian matrix
    grad = np.zeros(shape=(len(sX[0]), len(sX[0])))
    
    # Start filling up the gradient
    for i in range(len(sX[0])): # mat_1xn, n=num_classes, 10 in this case
        for j in range(len(sX[0])):
            if i==j: 
                grad[i, i] = sX[0, i] * (1-sX[0, i])
            else: 
                grad[i, j] = sX[0, j]* -sX[0, i]
    # return the gradient as the derivative of softmax/bwd softmax layer
    return grad

def sigmoid(X):
    return 1. / (1 + np.exp(-X))

def dsigmoid(X):
    return sigmoid(X) * (1-sigmoid(X))
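
Chaining the cross-entropy gradient -(y / p) through the softmax Jacobian should reproduce the familiar shortcut p - y, which is what the backward pass in the training loop relies on. A small check (values assumed), using a single one-hot row:

In [ ]:
p_demo = softmax(np.array([[2.0, 1.0, 0.1, -1.0]]))
y_demo = np.array([[0.0, 1.0, 0.0, 0.0]])
chained = (-(y_demo / p_demo)) @ dsoftmax(p_demo)
print(np.allclose(chained, p_demo - y_demo))  # expected: True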

In [33]:
def squared_loss(y_pred, y_train):
    m = y_pred.shape[0]
    data_loss = (0.5/m) * np.sum((y_pred - y_train)**2) # convex error surface: mean of squared residuals
    return data_loss #+ reg_loss

def dsquared_loss(y_pred, y_train):
    m = y_pred.shape[0]
    grad_y = (y_pred - y_train)/m # gradient of the squared loss; descend along (y_pred - y) on the convex surface
    return grad_y
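
dsquared_loss should match a finite-difference gradient of squared_loss; a minimal check with assumed example values:

In [ ]:
yp_demo = np.array([[0.1, 0.2, 0.7]])
yt_demo = np.array([[0.0, 0.0, 1.0]])
eps = 1e-6
yp_eps = yp_demo.copy(); yp_eps[0, 0] += eps
num_grad = (squared_loss(yp_eps, yt_demo) - squared_loss(yp_demo, yt_demo)) / eps
print(np.isclose(num_grad, dsquared_loss(yp_demo, yt_demo)[0, 0], atol=1e-5))  # expected: True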

In [34]:
from sklearn.utils import shuffle as sklearn_shuffle

def get_minibatch(X, y, minibatch_size, shuffle=True):
    minibatches = []

    if shuffle:
        X, y = sklearn_shuffle(X, y)

    for i in range(0, X.shape[0], minibatch_size):
        X_mini = X[i:i + minibatch_size]
        y_mini = y[i:i + minibatch_size]

        minibatches.append((X_mini, y_mini))

    return minibatches
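
A short usage sketch (shapes assumed): splitting 100 samples into mini-batches of 32 yields 4 batches, the last one smaller.

In [ ]:
X_demo = np.random.randn(100, 4)
y_demo = np.random.randint(0, 10, size=100)
minibatches_demo = get_minibatch(X_demo, y_demo, minibatch_size=32, shuffle=True)
print(len(minibatches_demo), minibatches_demo[0][0].shape)  # expected: 4 (32, 4)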

Check Point

This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.


In [35]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import pickle
import problem_unittests as tests
import helper

# Load the Preprocessed Validation data
valid_features, valid_labels = pickle.load(open('preprocess_validation.p', mode='rb'))

# # Training cycle
#     for epoch in range(num_):
#         # Loop over all batches
#         n_batches = 5
#         for batch_i in range(1, n_batches + 1):
#             for batch_features, batch_labels in helper.load_preprocess_training_batch(batch_i, batch_size):
#                 train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
#             print('Epoch {:>2}, CIFAR-10 Batch {}:  '.format(epoch + 1, batch_i), end='')
#             print_stats(sess, batch_features, batch_labels, cost, accuracy)

This is where the CNN implementation in NumPy starts!


In [36]:
# Displaying an image using matplotlib
# importing the library/package
import matplotlib.pyplot as plot

# Using plot with imshow to show the image (N=5000, H=32, W=32, C=3)
plot.imshow(valid_features[0, :, :, :])


Out[36]:
<matplotlib.image.AxesImage at 0x7f404d681080>

In [63]:
test_y_prob = np.array([[0.1, 0.2, 0.1, 0.3, 0.8]]) # Function(Feature)
test_y      = np.array([[0.0, 0.0, 0.0, 0.0, 1.0]]) # Feedback
test_y, test_y.shape, test_y_prob, test_y_prob.shape


Out[63]:
(array([[ 0.,  0.,  0.,  0.,  1.]]),
 (1, 5),
 array([[ 0.1,  0.2,  0.1,  0.3,  0.8]]),
 (1, 5))

In [78]:
np.max(test_y_prob), np.max(test_y_prob*test_y), np.max(test_y_prob[test_y==1]), 
# if test_y_prob[test_y==1.0] == np.max(test_y_prob): print("yes")
# ((test_y==1.0)==(test_y_prob==np.max(test_y_prob)))
if test_y_prob[test_y==1.0]==test_y_prob[test_y_prob==np.max(test_y_prob)]: print('yes')


yes

In [ ]:
# dataset XY
X=valid_features.transpose(0, 3, 1, 2) # NHWC -> NCHW
Y=valid_labels # one-hot labels, shape (num_samples, num_classes=10)

# Parameters
# Conv layer
h_filter=3
w_filter=3 
c_filter=3
padding=1 
stride=1
num_filters = 20
w1 = np.random.normal(loc=0.0, scale=1.0, size=(num_filters, c_filter, h_filter, w_filter)) # filter bank, NCHW layout
w1 *= 1 / (c_filter * h_filter * w_filter) # scale the initial weights by the filter fan-in
b1 = np.zeros(shape=(num_filters, 1), dtype=float)

# FC layer to the output layer -- its input size is only known once the conv output is flattened,
# so w2 starts with a placeholder shape and is resized on the first iteration of the loop below
w2 = np.random.normal(loc=0.0, scale=1.0, size=[1, Y[0:1].shape[1]]) # resized below to shape (FC_size, num_classes)
b2 = np.zeros(shape=Y[0:1].shape) # the number of output nodes/units equals the number of classes

# Hyper-parameters
num_epochs = 100
batch_size = X.shape[0]//10 # minibatch size (set to X.shape[0] for a full batch); stochasticity comes from shuffling
error_list = [] # to plot the error / learning curve
accuracy_list = [] # training
# momentum = 1.0 # NOT used
# learning_rate= 1.0 # NOT used

# Training loops for epochs and updating params
for epoch in range(num_epochs): # start=0, stop=num_epochs, step=1

    # Initializing/resetting the gradients of the parameters
    dw1 = np.zeros(shape=w1.shape)
    db1 = np.zeros(shape=b1.shape)
    dw2 = np.zeros(shape=w2.shape)
    db2 = np.zeros(shape=b2.shape)
    err = 0 # train
    acc = 0 # train

    # Stochastic minibatch generation (currently commented out): shuffle the batch each epoch for randomness
    #     # Shuffling the entire batch for a minibatch
    #     # Stochastic part for randomizing/shuffling through the dataset in every single epoch
    #     minibatches = get_minibatch(X=X, y=Y, minibatch_size=batch_size, shuffle=True)
    #     X_mini, Y_mini = minibatches[0]
    
    # The loop for learning the gradients
    for t in range(batch_size): # start=0, stop=mini_batch_size/batch_size, step=1
        
        # Each input and output sample in the batch/minibatch for dy and error
        x= X[t:t+1] # mat_nxcxhxw
        y= Y[t:t+1] # mat_txm
        
        # Forward pass
        # 1st layer: conv_layer
        h1_in, h1_cache = conv_forward(X=x, W=w1, b=b1, stride=1, padding=1) # wx+b
        h1_out = np.maximum(0.0, h1_in) # ReLU: activation function

        # The 2nd layer: FC layer to the output
        h1_fc = h1_out.reshape(1, -1)
        # Initializing w2 with the size of FC layer output/ flattened output
        if t==0: w2= np.resize(a=w2, new_shape=(h1_fc.shape[1], y.shape[1]))/ h1_fc.shape[1] # mat_hxm = mat_1xh, mat_1xm
        out = h1_fc @ w2
        out += b2        
        y_prob = softmax(X=out) # Multiclass function

        #         # Mean Square Error: Calculate the error one by one sample from the batch -- Euclidean distance
        #         err += 0.5 * (1/ batch_size) * np.sum((y_prob - y)**2) # convex surface ax2+b
        #         dy   =       (1/ batch_size) * (y_prob - y) # convex surface this way # ignoring the constant coefficies
        
        # Mean Cross Entropy (MCE): np.log is the natural log (ln), i.e. np.log(np.exp(x)) == x
        err += (1/batch_size) * -(np.sum(y* np.log(y_prob))) # mat_1x1==scalar value/variable
        dy   = (1/batch_size) * -(y/y_prob) # lr == (1/batch_size), and mat_1xm
        #dy   *= (batch_size/batch_size) # NOTE: Learning rate (lr) == 1/batch_size as minimum
        
        # Accuracy measurement
#         if (np.max(y_prob)==y_prob[y==1.0]): acc += 1.0 # if (max value in y_preb)==(the y_prob value of 1.0 index in y)
        if y_prob[y==1.0]==y_prob[y_prob==np.max(y_prob)]:
            acc += 1.0
#             print(np.max(y_prob), y_prob[y==1.0])

        # Backward pass
        # output layer/ 2nd layer
        # REMEMBER: softmax output is mat_1xm but dsoftmax output is mat_mxm.
        # REMEMBER: the dsoftmax output is symmetric about the main diagonal, i.e. there is no need to transpose it.
        dout = dy @ dsoftmax(sX=y_prob) # mat_1xm= mat_1xm @ mat_mxm.T 
        if t==0: dw2 = np.resize(a=dw2, new_shape=w2.shape)
        db2 += dout * 1 # mat_1xh
        dw2 += (dout.T @ h1_fc).T # (mat_1xm.T @ mat_1xh).T= mat_hxm
        dh1_fc = dout @ w2.T # mat_1xh = mat_1xm @ mat_hxm.T 
        
        # 1st layer: conv layer
        dh1_out = dh1_fc.reshape(h1_out.shape)
        dh1_out[h1_out <= 0] = 0 # drelu/ relu_bwd
        dx_conv, dw_conv, db_conv = conv_backward(cache=h1_cache, dout=dh1_out)
        dw1 += dw_conv
        db1 += db_conv

    # Updating the parameters using the gradients for descending (gradient descent)
    w1 -= dw1
    b1 -= db1
    w2 -= dw2
    b2 -= db2

    # Printing out the total batch/minibatch error in each epoch
    print("Epoch:", epoch, "Error:", err, "Accuracy", acc)
    error_list.append(err)
    accuracy_list.append(acc)

# Plotting the error list for the learning curve/convergence
plot.plot(error_list)
plot.plot(accuracy_list)
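
After training, the learned parameters can be sanity-checked with the same forward pass. A minimal sketch, assuming w1, b1, w2, b2 and the X/Y arrays from the cell above are still in memory (note that these are samples the network has already seen):

In [ ]:
correct = 0
n_eval = 100
for t in range(n_eval):
    x_eval = X[t:t + 1]
    h1_in_eval, _ = conv_forward(X=x_eval, W=w1, b=b1, stride=1, padding=1)
    h1_fc_eval = np.maximum(0.0, h1_in_eval).reshape(1, -1)  # ReLU + flatten
    probs_eval = softmax(h1_fc_eval @ w2 + b2)
    if np.argmax(probs_eval) == np.argmax(Y[t:t + 1]):
        correct += 1
print('Accuracy on the first', n_eval, 'samples:', correct / n_eval)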

In [72]:
# PReLU for batch_size=X//10 (mini batch), epochs=100, Error function=MCE, and m2_pos as the parameter in ReLU
plot.plot(error_list)


Out[72]:
[<matplotlib.lines.Line2D at 0x7fe2d3b5fb70>]

In [66]:
# PReLU for batch_size=X//1 (full batch), epochs=100, Error function=MCE, and m2_pos as the parameter in ReLU
plot.plot(error_list)


Out[66]:
[<matplotlib.lines.Line2D at 0x7fe2d3e04da0>]

In [63]:
# PReLU for batch_size=X//10, epochs=100, Error function=MCE, and m2_pos as the parameter in ReLU
plot.plot(error_list)


Out[63]:
[<matplotlib.lines.Line2D at 0x7fe2d3f7c550>]

In [82]:
plot.plot(error_list)


Out[82]:
[<matplotlib.lines.Line2D at 0x7f25a39877b8>]

In [39]:
# Applying PLU to batch gradient descent on validation batch/full batch.
plot.plot(error_list)


Out[39]:
[<matplotlib.lines.Line2D at 0x7f25a397cc88>]

In [37]:
# Learning curve for Batch Gradient Descent (BGD) with the convnet using PLU (Parametric Linear Units).
# A "uni" PLU in this case, which equals m*x and is completely linear.
# This one uses a batch size of 1/10 of the full batch, batch_size = len(X)//10.
# The batch used here is the validation batch.
plot.plot(error_list)


Out[37]:
[<matplotlib.lines.Line2D at 0x7f25a3ac4550>]

In [33]:
# Learning curve for dy = 6 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)


Out[33]:
[<matplotlib.lines.Line2D at 0x7f25a3cb5ba8>]

In [31]:
# Learning curve for dy = 5 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)


Out[31]:
[<matplotlib.lines.Line2D at 0x7f25a3e05748>]

In [29]:
# Learning curve for dy = 4 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)


Out[29]:
[<matplotlib.lines.Line2D at 0x7f25a3f52da0>]

In [27]:
# Learning curve for dy = 3 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)


Out[27]:
[<matplotlib.lines.Line2D at 0x7f25a9f44780>]

In [20]:
# Learning curve for dy = 2 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)


Out[20]:
[<matplotlib.lines.Line2D at 0x7f25a448d278>]

In [18]:
# Learning curve for the validation set with MCE/Mean Cross Entropy
# dy = (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)


Out[18]:
[<matplotlib.lines.Line2D at 0x7f25a44fbe48>]

In [83]:
# dy = (1/batch_size) * -(y/ y_prob) for entire validation set not 1/10 of it.
plot.plot(error_list_MCE)


Out[83]:
[<matplotlib.lines.Line2D at 0x11bb93780>]

In [86]:
# dy = (1/batch_size) * (y_prob-y) for entire validation set not 1/10
plot.plot(error_list_MSE)


Out[86]:
[<matplotlib.lines.Line2D at 0x115a9ef98>]