In this project, you'll classify images from the CIFAR-10 dataset. The dataset consists of airplanes, dogs, cats, and other objects. You'll preprocess the images, then train a convolutional neural network on all the samples. The images need to be normalized and the labels need to be one-hot encoded. You'll apply what you've learned to build convolutional, max pooling, dropout, and fully connected layers. At the end, you'll see your neural network's predictions on sample images.
Run the following cell to download the CIFAR-10 dataset for Python.
In [22]:
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
from urllib.request import urlretrieve
from os.path import isfile, isdir
from tqdm import tqdm
import problem_unittests as tests
import tarfile
cifar10_dataset_folder_path = 'cifar-10-batches-py'
class DLProgress(tqdm):
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num

if not isfile('cifar-10-python.tar.gz'):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc='CIFAR-10 Dataset') as pbar:
        urlretrieve(
            'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz',
            'cifar-10-python.tar.gz',
            pbar.hook)

if not isdir(cifar10_dataset_folder_path):
    with tarfile.open('cifar-10-python.tar.gz') as tar:
        tar.extractall()
        tar.close()

tests.test_folder_path(cifar10_dataset_folder_path)
The dataset is broken into batches to prevent your machine from running out of memory. The CIFAR-10 dataset consists of 5 batches, named data_batch_1, data_batch_2, etc. Each batch contains labels and images that belong to one of the following 10 classes:
airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
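If you want to peek at the raw batch files directly, the cell below is a minimal sketch (not part of the original project) that assumes the standard pickle layout of the python version of CIFAR-10: each batch file unpickles to a dict with 'batch_label', 'labels', 'data', and 'filenames' keys.
In [ ]:
import pickle
# Load one raw batch and inspect its layout
with open(cifar10_dataset_folder_path + '/data_batch_1', mode='rb') as file:
    raw_batch = pickle.load(file, encoding='latin1')
print(raw_batch.keys())          # dict_keys(['batch_label', 'labels', 'data', 'filenames'])
print(raw_batch['data'].shape)   # (10000, 3072): 10000 flattened 32x32x3 images
print(raw_batch['labels'][:10])  # the first ten integer labels (0-9)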
Understanding a dataset is part of making predictions on the data. Play around with the code cell below by changing the batch_id and sample_id. The batch_id is the id for a batch (1-5). The sample_id is the id for an image and label pair in the batch.
Ask yourself "What are all possible labels?", "What is the range of values for the image data?", "Are the labels in order or random?". Answers to questions like these will help you preprocess the data and end up with better predictions.
In [23]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import helper
import numpy as np
# Explore the dataset
batch_id = 1
sample_id = 5
helper.display_stats(cifar10_dataset_folder_path, batch_id, sample_id)
In [24]:
def normalize(x):
    """
    Normalize a list of sample image data in the range of 0 to 1
    : x: List of image data.  The image shape is (32, 32, 3)
    : return: Numpy array of normalized data
    """
    # TODO: Implement Function
    # x has shape (num_images, height, width, channels): the list of images in a batch,
    # where each image is height x width x depth/channels. Pixel values are 0-255,
    # so dividing by 255 rescales them into [0, 1].
    return x / 255
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_normalize(normalize)
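A quick sanity check for normalize (an extra cell, not required by the project): pixel values 0, 128, and 255 should map to 0.0, roughly 0.5, and 1.0.
In [ ]:
# Sanity check: normalize should rescale 8-bit pixel values into [0, 1]
sample_pixels = np.array([[[[0, 128, 255]]]], dtype=np.float32)
print(normalize(sample_pixels))  # expected roughly [[[[0.  0.502  1.]]]]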
Just like the previous code cell, you'll be implementing a function for preprocessing. This time, you'll implement the one_hot_encode function. The input, x, is a list of labels. Implement the function to return the list of labels as a one-hot encoded Numpy array. The possible values for labels are 0 to 9. The one-hot encoding function should return the same encoding for each value between each call to one_hot_encode. Make sure to save the map of encodings outside the function.
Hint: Don't reinvent the wheel.
In [25]:
# import helper  ## sklearn.preprocessing is also used in helper.py
from sklearn import preprocessing  # provides LabelBinarizer for one-hot encoding

def one_hot_encode(x):
    """
    One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
    : x: List of sample Labels
    : return: Numpy array of one-hot encoded labels
    """
    # TODO: Implement Function
    # helper.display_image_predictions uses the same approach in reverse:
    #     label_binarizer = LabelBinarizer()
    #     label_binarizer.fit(range(n_classes))
    #     label_ids = label_binarizer.inverse_transform(np.array(labels))
    label_binarizer = preprocessing.LabelBinarizer()  # the one-hot encoder
    n_class = 10                                      # total number of classes
    label_binarizer.fit(range(n_class))               # fit to the class ids 0-9
    return label_binarizer.transform(x)               # class labels -> one-hot vectors
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_one_hot_encode(one_hot_encode)
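Another optional sanity check, assuming labels are integers 0-9 as described above: each label becomes a 10-element vector with a single 1.
In [ ]:
print(one_hot_encode([0, 9, 1]))
# expected:
# [[1 0 0 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 0 1]
#  [0 1 0 0 0 0 0 0 0 0]]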
In [26]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(cifar10_dataset_folder_path, normalize, one_hot_encode)
In [27]:
def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1):
    # First figure out what the size of the output should be
    N, C, H, W = x_shape
    assert (H + 2 * padding - field_height) % stride == 0
    assert (W + 2 * padding - field_width) % stride == 0
    out_height = int((H + 2 * padding - field_height) / stride + 1)
    out_width = int((W + 2 * padding - field_width) / stride + 1)

    i0 = np.repeat(np.arange(field_height), field_width)
    i0 = np.tile(i0, C)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(field_width), field_height * C)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1)

    return (k.astype(int), i.astype(int), j.astype(int))
In [28]:
def im2col_indices(x, field_height, field_width, padding=1, stride=1):
    """ An implementation of im2col based on some fancy indexing """
    # Zero-pad the input
    p = padding
    x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding, stride)

    cols = x_padded[:, k, i, j]
    C = x.shape[1]
    cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1)
    return cols
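A small shape check (an added sketch, not from the original notebook): a single 3-channel 4x4 image with a 3x3 field, padding 1, and stride 1 produces 4*4 = 16 receptive fields, each flattened to 3*3*3 = 27 values.
In [ ]:
x_small = np.random.randn(1, 3, 4, 4)
cols_small = im2col_indices(x_small, field_height=3, field_width=3, padding=1, stride=1)
print(cols_small.shape)  # (27, 16)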
In [13]:
def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1,
                   stride=1):
    """ An implementation of col2im based on fancy indexing and np.add.at """
    N, C, H, W = x_shape
    H_padded, W_padded = H + 2 * padding, W + 2 * padding
    x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype)
    k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding, stride)
    cols_reshaped = cols.reshape(C * field_height * field_width, -1, N)
    cols_reshaped = cols_reshaped.transpose(2, 0, 1)
    np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped)
    if padding == 0:
        return x_padded
    return x_padded[:, :, padding:-padding, padding:-padding]
In [29]:
def conv_forward(X, W, b, stride=1, padding=1):
    n_filters, d_filter, h_filter, w_filter = W.shape
    n_x, d_x, h_x, w_x = X.shape
    h_out = (h_x - h_filter + 2 * padding) / stride + 1
    w_out = (w_x - w_filter + 2 * padding) / stride + 1

    if not h_out.is_integer() or not w_out.is_integer():
        raise Exception('Invalid output dimension!')

    h_out, w_out = int(h_out), int(w_out)

    X_col = im2col_indices(X, h_filter, w_filter, padding=padding, stride=stride)
    W_col = W.reshape(n_filters, -1)

    out = W_col @ X_col + b
    out = out.reshape(n_filters, h_out, w_out, n_x)
    out = out.transpose(3, 0, 1, 2)

    cache = (X, W, b, stride, padding, X_col)

    return out, cache
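Another optional shape check: with stride 1 and padding 1, a 3x3 convolution keeps the 32x32 spatial size, so 2 RGB images through 20 filters come out as (2, 20, 32, 32).
In [ ]:
X_demo = np.random.randn(2, 3, 32, 32)
W_demo = np.random.randn(20, 3, 3, 3)
b_demo = np.zeros((20, 1))
out_demo, cache_demo = conv_forward(X_demo, W_demo, b_demo, stride=1, padding=1)
print(out_demo.shape)  # (2, 20, 32, 32)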
In [30]:
def conv_backward(dout, cache):
    X, W, b, stride, padding, X_col = cache
    n_filter, d_filter, h_filter, w_filter = W.shape

    db = np.sum(dout, axis=(0, 2, 3))
    db = db.reshape(n_filter, -1)

    dout_reshaped = dout.transpose(1, 2, 3, 0).reshape(n_filter, -1)
    dW = dout_reshaped @ X_col.T
    dW = dW.reshape(W.shape)

    W_reshape = W.reshape(n_filter, -1)
    dX_col = W_reshape.T @ dout_reshaped
    dX = col2im_indices(dX_col, X.shape, h_filter, w_filter, padding=padding, stride=stride)

    return dX, dW, db
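And the backward pass should return gradients with the same shapes as the forward inputs; the optional check below reuses the demo tensors from the conv_forward check above.
In [ ]:
dout_demo = np.random.randn(*out_demo.shape)
dX_demo, dW_demo, db_demo = conv_backward(dout_demo, cache_demo)
print(dX_demo.shape, dW_demo.shape, db_demo.shape)  # (2, 3, 32, 32) (20, 3, 3, 3) (20, 1)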
In [31]:
# Now it is time to calculate the error using cross entropy
def cross_entropy(y_pred, y_train):
    m = y_pred.shape[0]
    prob = softmax(y_pred)
    log_like = -np.log(prob[range(m), y_train])
    data_loss = np.sum(log_like) / m
    # reg_loss = regularization(model, reg_type='l2', lam=lam)
    return data_loss  # + reg_loss

def dcross_entropy(y_pred, y_train):
    m = y_pred.shape[0]
    grad_y = softmax(y_pred)
    grad_y[range(m), y_train] -= 1.
    grad_y /= m
    return grad_y
In [32]:
# Softmax is the multiclass generalization of the sigmoid (logistic) function;
# both map raw scores to probabilities for classification.
def softmax(X):
    eX = np.exp((X.T - np.max(X, axis=1)).T)
    return (eX.T / eX.sum(axis=1)).T

def dsoftmax(sX):
    # Derivative (Jacobian) of the softmax.
    # X is the input to the softmax and sX = softmax(X), a 1 x n row vector.
    grad = np.zeros(shape=(len(sX[0]), len(sX[0])))
    # Fill in the Jacobian entry by entry
    for i in range(len(sX[0])):      # n = num_classes, 10 in this case
        for j in range(len(sX[0])):
            if i == j:
                grad[i, i] = sX[0, i] * (1 - sX[0, i])
            else:
                grad[i, j] = -sX[0, i] * sX[0, j]
    # Return the Jacobian used by the backward softmax layer
    return grad

def sigmoid(X):
    return 1. / (1 + np.exp(-X))

def dsigmoid(X):
    return sigmoid(X) * (1 - sigmoid(X))
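A quick numerical check for the two cells above (an added sketch): softmax rows sum to 1 and the dsoftmax Jacobian is symmetric, which is why no transpose is needed in the backward pass later. Also note that cross_entropy/dcross_entropy index prob[range(m), y_train], so they expect integer class ids rather than one-hot vectors.
In [ ]:
z = np.array([[1.0, 2.0, 3.0]])
p = softmax(z)
print(p, p.sum())           # probabilities that sum to 1.0
J = dsoftmax(p)
print(np.allclose(J, J.T))  # True: the Jacobian is symmetric
print(cross_entropy(z, np.array([2])))  # loss for a single sample whose true class id is 2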
In [33]:
def squared_loss(y_pred, y_train):
    m = y_pred.shape[0]
    data_loss = (0.5 / m) * np.sum((y_pred - y_train)**2)  # convex (quadratic) error surface
    return data_loss  # + reg_loss

def dsquared_loss(y_pred, y_train):
    m = y_pred.shape[0]
    grad_y = (y_pred - y_train) / m  # gradient of the quadratic surface, used for descent
    return grad_y
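An optional finite-difference spot check that dsquared_loss is the gradient of squared_loss defined above.
In [ ]:
y_pred_chk = np.array([[0.2, 0.7, 0.1]])
y_true_chk = np.array([[0.0, 1.0, 0.0]])
eps = 1e-6
bumped = y_pred_chk.copy()
bumped[0, 0] += eps
numeric = (squared_loss(bumped, y_true_chk) - squared_loss(y_pred_chk, y_true_chk)) / eps
print(numeric, dsquared_loss(y_pred_chk, y_true_chk)[0, 0])  # both approximately 0.2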
In [34]:
from sklearn.utils import shuffle as sklearn_shuffle

def get_minibatch(X, y, minibatch_size, shuffle=True):
    minibatches = []

    if shuffle:
        X, y = sklearn_shuffle(X, y)

    for i in range(0, X.shape[0], minibatch_size):
        X_mini = X[i:i + minibatch_size]
        y_mini = y[i:i + minibatch_size]
        minibatches.append((X_mini, y_mini))

    return minibatches
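A usage sketch for get_minibatch: 10 samples split into minibatches of 4 give two full minibatches and one smaller remainder.
In [ ]:
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)
for X_mini, y_mini in get_minibatch(X_toy, y_toy, minibatch_size=4, shuffle=False):
    print(X_mini.shape, y_mini.shape)  # (4, 2) (4,)  (4, 2) (4,)  (2, 2) (2,)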
In [35]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import pickle
import problem_unittests as tests
import helper
# Load the Preprocessed Validation data
valid_features, valid_labels = pickle.load(open('preprocess_validation.p', mode='rb'))
# # Training cycle
# for epoch in range(num_epochs):
#     # Loop over all batches
#     n_batches = 5
#     for batch_i in range(1, n_batches + 1):
#         for batch_features, batch_labels in helper.load_preprocess_training_batch(batch_i, batch_size):
#             train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
#         print('Epoch {:>2}, CIFAR-10 Batch {}: '.format(epoch + 1, batch_i), end='')
#         print_stats(sess, batch_features, batch_labels, cost, accuracy)
In [36]:
# Displaying an image using matplotlib
# importing the library/package
import matplotlib.pyplot as plot
# Using plot with imshow to show the image (N=5000, H=32, W=32, C=3)
plot.imshow(valid_features[0, :, :, :])
Out[36]:
In [63]:
test_y_prob = np.array([[0.1, 0.2, 0.1, 0.3, 0.8]]) # Function(Feature)
test_y = np.array([[0.0, 0.0, 0.0, 0.0, 1.0]]) # Feedback
test_y, test_y.shape, test_y_prob, test_y_prob.shape
Out[63]:
In [78]:
np.max(test_y_prob), np.max(test_y_prob * test_y), np.max(test_y_prob[test_y == 1])
# The prediction is correct when the probability at the true label's index
# is the maximum predicted probability:
if test_y_prob[test_y == 1.0] == test_y_prob[test_y_prob == np.max(test_y_prob)]: print('yes')
In [ ]:
# Dataset XY
X = valid_features.transpose(0, 3, 1, 2)  # NHWC -> NCHW
Y = valid_labels                          # one-hot labels, shape (N, 10)

# Parameters
# Conv layer
h_filter = 3
w_filter = 3
c_filter = 3
padding = 1
stride = 1
num_filters = 20
w1 = np.random.normal(loc=0.0, scale=1.0, size=(num_filters, c_filter, h_filter, w_filter))  # NCHW filter bank
w1 *= 1 / (c_filter * h_filter * w_filter)  # scale the initial weights
b1 = np.zeros(shape=(num_filters, 1), dtype=float)
# FC layer to the output layer -- the final size depends on the flattened conv output,
# so w2 is resized on the first iteration of the training loop below
w2 = np.random.normal(loc=0.0, scale=1.0, size=[1, Y[0:1].shape[1]])  # resized to (FC_size, num_classes) at t == 0
b2 = np.zeros(shape=Y[0:1].shape)  # one output node/unit/neuron per class

# Hyper-parameters
num_epochs = 100
batch_size = X.shape[0] // 10  # minibatch with stochasticity/randomness, or the full batch
error_list = []     # training error per epoch, for plotting the learning curve
accuracy_list = []  # training accuracy per epoch
# momentum = 1.0       # NOT used
# learning_rate = 1.0  # NOT used

# Training loop over epochs, updating the parameters
for epoch in range(num_epochs):
    # Reset the gradients of the parameters
    dw1 = np.zeros(shape=w1.shape)
    db1 = np.zeros(shape=b1.shape)
    dw2 = np.zeros(shape=w2.shape)
    db2 = np.zeros(shape=b2.shape)
    err = 0  # training error
    acc = 0  # training accuracy

    # # Stochastic part: reshuffle the dataset into random minibatches every epoch
    # minibatches = get_minibatch(X=X, y=Y, minibatch_size=batch_size, shuffle=True)
    # X_mini, Y_mini = minibatches[0]

    # Accumulate the gradients over the (mini)batch
    for t in range(batch_size):
        # One input/label pair from the batch/minibatch for dy and the error
        x = X[t:t+1]  # shape (1, C, H, W)
        y = Y[t:t+1]  # shape (1, num_classes)

        # Forward pass
        # 1st layer: conv layer
        h1_in, h1_cache = conv_forward(X=x, W=w1, b=b1, stride=1, padding=1)  # wx + b
        h1_out = np.maximum(0.0, h1_in)  # ReLU activation function

        # 2nd layer: FC layer to the output
        h1_fc = h1_out.reshape(1, -1)
        # Resize w2 to the flattened conv output on the first sample
        if t == 0:
            w2 = np.resize(a=w2, new_shape=(h1_fc.shape[1], y.shape[1])) / h1_fc.shape[1]
        out = h1_fc @ w2
        out += b2
        y_prob = softmax(X=out)  # multiclass output probabilities

        # # Mean Squared Error: accumulate the error sample by sample -- Euclidean distance
        # err += 0.5 * (1 / batch_size) * np.sum((y_prob - y)**2)  # convex surface
        # dy = (1 / batch_size) * (y_prob - y)

        # Mean Cross Entropy (MCE); np.log is the natural log (ln)
        err += (1 / batch_size) * -(np.sum(y * np.log(y_prob)))  # scalar
        dy = (1 / batch_size) * -(y / y_prob)  # the 1/batch_size factor acts as the learning rate

        # Accuracy: correct when the probability at the true label's index is the maximum
        # if (np.max(y_prob) == y_prob[y == 1.0]): acc += 1.0
        if y_prob[y == 1.0] == y_prob[y_prob == np.max(y_prob)]:
            acc += 1.0
        # print(np.max(y_prob), y_prob[y == 1.0])

        # Backward pass
        # Output layer (2nd layer)
        # REMEMBER: softmax output is 1 x m, but dsoftmax returns an m x m Jacobian.
        # REMEMBER: the Jacobian is symmetric about the main diagonal, so no transpose is needed.
        dout = dy @ dsoftmax(sX=y_prob)  # (1, m) = (1, m) @ (m, m)
        if t == 0:
            dw2 = np.resize(a=dw2, new_shape=w2.shape)
        db2 += dout * 1
        dw2 += (dout.T @ h1_fc).T  # ((m, 1) @ (1, h)).T = (h, m)
        dh1_fc = dout @ w2.T       # (1, h) = (1, m) @ (m, h)

        # 1st layer: conv layer
        dh1_out = dh1_fc.reshape(h1_out.shape)
        dh1_out[h1_out <= 0] = 0  # ReLU backward
        dx_conv, dw_conv, db_conv = conv_backward(cache=h1_cache, dout=dh1_out)
        dw1 += dw_conv
        db1 += db_conv

    # Update the parameters with the accumulated gradients (gradient descent)
    w1 -= dw1
    b1 -= db1
    w2 -= dw2
    b2 -= db2

    # Print the total batch/minibatch error and accuracy for this epoch
    print("Epoch:", epoch, "Error:", err, "Accuracy:", acc)
    error_list.append(err)
    accuracy_list.append(acc)

# Plot the error and accuracy lists as learning/convergence curves
plot.plot(error_list)
plot.plot(accuracy_list)
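Optionally, the two curves can be labeled so the plot is easier to read (a minimal sketch using the matplotlib alias imported earlier).
In [ ]:
plot.figure()
plot.plot(error_list, label='training error (MCE)')
plot.plot(accuracy_list, label='correct predictions per (mini)batch')
plot.xlabel('epoch')
plot.legend()
plot.show()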
In [72]:
# PReLU for batch_size=X//10 (mini batch), epochs=100, Error function=MCE, and m2_pos as the parameter in ReLU
plot.plot(error_list)
Out[72]:
In [66]:
# PReLU for batch_size=X//1 (full batch), epochs=100, Error function=MCE, and m2_pos as the parameter in ReLU
plot.plot(error_list)
Out[66]:
In [63]:
# PReLU for batch_size=X//10, epochs=100, Error function=MCE, and m2_pos as the parameter in ReLU
plot.plot(error_list)
Out[63]:
In [82]:
plot.plot(error_list)
Out[82]:
In [39]:
# Applying PLU to batch gradient descent on validation batch/full batch.
plot.plot(error_list)
Out[39]:
In [37]:
# Learning curve for Batch Gradient Descent (BGD) with the convnet using PLU (Parametric Linear Units).
# A uni PLU in this case, which equals mx and is completely linear.
# This one is for a batch size of 1/10 of the total batch, batch_size = len(X)/10.
# The batch used here is the validation batch.
plot.plot(error_list)
Out[37]:
In [33]:
# Learning curve for dy = 6 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)
Out[33]:
In [31]:
# Learning curve for dy = 5 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)
Out[31]:
In [29]:
# Learning curve for dy = 4 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)
Out[29]:
In [27]:
# Learning curve for dy = 3 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)
Out[27]:
In [20]:
# Learning curve for dy = 2 * (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)
Out[20]:
In [18]:
# Learning curve for the validation set with MCE/Mean Cross Entropy
# dy = (1/batch_size) * -(y/ y_prob)
plot.plot(error_list)
Out[18]:
In [83]:
# dy = (1/batch_size) * -(y/ y_prob) for entire validation set not 1/10 of it.
plot.plot(error_list_MCE)
Out[83]:
In [86]:
# dy = (1/batch_size) * (y_prob-y) for entire validation set not 1/10
plot.plot(error_list_MSE)
Out[86]: