## Softmax regression

### Building a softmax regression classifier to solve multi-classification CIFAR-10 dataset

• implement a fully-vectorized loss function for the Softmax Regression
• implement the fully-vectorized expression for its analytic gradient
• check implementation using numerical gradient
• use a validation set to tune the learning rate and regularization strength
• optimize the loss function with Batch Gradient Descent and Stochastic Gradient Descent
``````

In [1]:

# Setup code for this notebook
import random
import numpy as np
import matplotlib.pyplot as plt

# This is a bit of magic gto make matplotlib figures appear inline
# in the notebook rather than in a new window
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# Some more magic so that the notebook will reload external python modules;

``````

get_datasets.sh

```# Get CIFAR10
wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar -xzvf cifar-10-python.tar.gz
rm cifar-10-python.tar.gz
```

``````

In [2]:

### Write function to load data in data_utils.py
# Write function to load the cifar-10 data
# The original code is from http://cs231n.github.io/assignment1/
# The function is in data_utils.py file for reusing.
import cPickle as pickle
import numpy as np
import os

""" load single batch of cifar """
with open(filename, 'r') as f:
X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
Y = np.array(Y)
return X, Y

""" load all of cifar """
xs = []
ys = []
for b in range(1,6):
f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
xs.append(X)
ys.append(Y)
Xtr = np.concatenate(xs)
Ytr = np.concatenate(ys)
del X, Y
return Xtr, Ytr, Xte, Yte

``````

## Load data and visualize samples

``````

In [3]:

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

def get_CIFAR10_data(num_training=49000, num_val=1000, num_test=10000, show_sample=True):
"""
Load the CIFAR-10 dataset, and divide the sample into training set, validation set and test set
"""

cifar10_dir = 'datasets/datasets-cifar-10/cifar-10-batches-py/'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# subsample the data for validation set
mask = xrange(num_training, num_training + num_val)
return X_train, y_train, X_val, y_val, X_test, y_test

def visualize_sample(X_train, y_train, classes, samples_per_class=7):
"""visualize some samples in the training datasets """
num_classes = len(classes)
for y, cls in enumerate(classes):
idxs = np.flatnonzero(y_train == y) # get all the indexes of cls
idxs = np.random.choice(idxs, samples_per_class, replace=False)
for i, idx in enumerate(idxs): # plot the image one by one
plt_idx = i * num_classes + y + 1 # i*num_classes and y+1 determine the row and column respectively
plt.subplot(samples_per_class, num_classes, plt_idx)
plt.imshow(X_train[idx].astype('uint8'))
plt.axis('off')
if i == 0:
plt.title(cls)
plt.show()

def preprocessing_CIFAR10_data(X_train, y_train, X_val, y_val, X_test, y_test):

# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1)) # [49000, 3072]
X_val = np.reshape(X_val, (X_val.shape[0], -1)) # [1000, 3072]
X_test = np.reshape(X_test, (X_test.shape[0], -1)) # [10000, 3072]

# Normalize the data: subtract the mean image
mean_image = np.mean(X_train, axis = 0)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

# Add bias dimension and transform into columns
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))]).T
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))]).T
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))]).T
return X_train, y_train, X_val, y_val, X_test, y_test

# Invoke the above functions to get our data
X_train_raw, y_train_raw, X_val_raw, y_val_raw, X_test_raw, y_test_raw = get_CIFAR10_data()
visualize_sample(X_train_raw, y_train_raw, classes)
X_train, y_train, X_val, y_val, X_test, y_test = preprocessing_CIFAR10_data(X_train_raw, y_train_raw, X_val_raw, y_val_raw, X_test_raw, y_test_raw)

# As a sanity check, we print out th size of the training and test data dimenstion
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape

``````
``````

Train data shape:  (3073, 49000)
Train labels shape:  (49000,)
Validation data shape:  (3073, 1000)
Validation labels shape:  (1000,)
Test data shape:  (3073, 10000)
Test labels shape:  (10000,)

``````

## Softmax Regression Classifier

The code is running in the backend, you can find it here, or github

``````

In [4]:

# Test the loss and gradient and compare between two implementations
import time

# generate a rand weights W
W = np.random.randn(10, X_train.shape[0]) * 0.001
tic = time.time()
toc = time.time()
print 'Naive loss: %f, and gradient: computed in %fs' % (loss_naive, toc - tic)

tic = time.time()
toc = time.time()
print 'Vectorized loss: %f, and gradient: computed in %fs' % (loss_vec, toc - tic)

# Compare the gradient, because the gradient is a vector, we canuse the Frobenius norm to compare them
# the Frobenius norm of two matrices is the square root of the squared sum of differences of all elements
# Randomly choose some gradient to check
idxs = np.random.choice(X_train.shape[0], 10, replace=False)
print idxs
print 'Gradient difference between naive and vectorized version is: %f' % diff

``````
``````

Naive loss: 5.101850, and gradient: computed in 5.970861s
Vectorized loss: 5.101850, and gradient: computed in 0.578520s
[  58 1180 1066  114 3041  684 2871  732 2226 2493]
[ 0.96299059 -0.29548397  1.40331858  1.55453912  2.60552279  1.02290388
3.48212671  0.08947331  2.56720662  3.1420192 ]
[ 0.96299059 -0.29548397  1.40331858  1.55453912  2.60552279  1.02290388
3.48212671  0.08947331  2.56720662  3.1420192 ]
Gradient difference between naive and vectorized version is: 0.000000

``````

## Write function to compute gradient numerically to test the analytic gradient

And put the function in algorithms/gradient_check.py file for future use

``````

In [5]:

"""
sample a few random elements and only return numerical
in this dimensions.
"""
h = 1e-5

print x.shape

for i in xrange(num_checks):
ix = tuple([randrange(m) for m in x.shape])
print ix
x[ix] += h # increment by h
fxph = f(x) # evaluate f(x + h)
x[ix] -= 2 * h # increment by h
fxmh = f(x) # evaluate f(x - h)
x[ix] += h # reset

grad_numerical = (fxph - fxmh) / (2 * h)
print 'numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error)

``````
``````

In [6]:

f = lambda w: loss_grad_softmax_vectorized(w, X_train, y_train, 0)[0]

``````
``````

(10, 3073)
(7, 375)
numerical: -3.916883 analytic: -3.916883, relative error: 2.703636e-08
(8, 2263)
numerical: -2.292656 analytic: -2.292656, relative error: 1.424316e-08
(3, 1720)
numerical: 0.717328 analytic: 0.717328, relative error: 3.519292e-08
(5, 2354)
numerical: -3.118907 analytic: -3.118907, relative error: 1.349971e-08
(2, 382)
numerical: 0.168714 analytic: 0.168714, relative error: 1.563022e-07
(7, 2426)
numerical: -0.016456 analytic: -0.016456, relative error: 1.111633e-06
(7, 406)
numerical: -4.598401 analytic: -4.598401, relative error: 1.310813e-08
(2, 1215)
numerical: 0.541616 analytic: 0.541616, relative error: 1.827953e-07
(2, 543)
numerical: 0.990534 analytic: 0.990534, relative error: 6.933231e-09
(2, 3054)
numerical: -0.067773 analytic: -0.067773, relative error: 1.851798e-06

``````
``````

In [7]:

# Training softmax regression classifier using SGD and BGD
from algorithms.classifiers import Softmax

# # using BGD algorithm
# softmax_bgd = Softmax()
# tic = time.time()
# losses_bgd = softmax_bgd.train(X_train, y_train, method='bgd', batch_size=200, learning_rate=1e-6,
#               reg = 1e2, num_iters=1000, verbose=True, vectorized=True)
# toc = time.time()
# print 'Traning time for BGD with vectorized version is %f \n' % (toc - tic)

# # Compute the accuracy of training data and validation data using Softmax.predict function
# y_train_pred_bgd = softmax_bgd.predict(X_train)[0]
# print 'Training accuracy: %f' % (np.mean(y_train == y_train_pred_bgd))
# y_val_pred_bgd = softmax_bgd.predict(X_val)[0]
# print 'Validation accuracy: %f' % (np.mean(y_val == y_val_pred_bgd))

# # using SGD algorithm
softmax_sgd = Softmax()
tic = time.time()
losses_sgd = softmax_sgd.train(X_train, y_train, method='sgd', batch_size=200, learning_rate=1e-6,
reg = 1e5, num_iters=1000, verbose=True, vectorized=True)
toc = time.time()
print 'Traning time for SGD with vectorized version is %f \n' % (toc - tic)

y_train_pred_sgd = softmax_sgd.predict(X_train)[0]
print 'Training accuracy: %f' % (np.mean(y_train == y_train_pred_sgd))
y_val_pred_sgd = softmax_sgd.predict(X_val)[0]
print 'Validation accuracy: %f' % (np.mean(y_val == y_val_pred_sgd))

``````
``````

iteration 0/1000: loss 1531.789596
iteration 100/1000: loss 2.149690
iteration 200/1000: loss 2.154195
iteration 300/1000: loss 2.105349
iteration 400/1000: loss 2.207055
iteration 500/1000: loss 2.178179
iteration 600/1000: loss 2.174367
iteration 700/1000: loss 2.122863
iteration 800/1000: loss 2.151371
iteration 900/1000: loss 2.132403
Traning time for SGD with vectorized version is 10.162502

Training accuracy: 0.309286
Validation accuracy: 0.327000

``````
``````

In [8]:

from ggplot import *
qplot(xrange(len(losses_sgd)), losses_sgd) + labs(x='Iteration number', y='SGD Loss value')

``````
``````

Out[8]:

<ggplot: (398142557)>

``````
``````

In [9]:

# Using validation set to tuen hyperparameters, i.e., learning rate and regularization strength
learning_rates = [1e-5, 1e-8]
regularization_strengths = [10e2, 10e4]

# Result is a dictionary mapping tuples of the form (learning_rate, regularization_strength)
# to tuples of the form (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1
best_softmax = None
# Choose the best hyperparameters by tuning on the validation set
i = 0
interval = 5
for learning_rate in np.linspace(learning_rates[0], learning_rates[1], num=interval):
i += 1
print 'The current iteration is %d/%d' % (i, interval)
for reg in np.linspace(regularization_strengths[0], regularization_strengths[1], num=interval):
softmax = Softmax()
softmax.train(X_train, y_train, method='sgd', batch_size=200, learning_rate=learning_rate,
reg = reg, num_iters=1000, verbose=False, vectorized=True)
y_train_pred = softmax.predict(X_train)[0]
y_val_pred = softmax.predict(X_val)[0]
train_accuracy = np.mean(y_train == y_train_pred)
val_accuracy = np.mean(y_val == y_val_pred)
results[(learning_rate, reg)] = (train_accuracy, val_accuracy)
if val_accuracy > best_val:
best_val = val_accuracy
best_softmax = softmax
else:
pass

# Print out the results
for learning_rate, reg in sorted(results):
train_accuracy,val_accuracy = results[(learning_rate, reg)]
print 'learning rate %e and regularization %e, \n \
the training accuracy is: %f and validation accuracy is: %f.\n' % (learning_rate, reg, train_accuracy, val_accuracy)

``````
``````

The current iteration is 1/5
The current iteration is 2/5
The current iteration is 3/5
The current iteration is 4/5
The current iteration is 5/5
learning rate 1.000000e-08 and regularization 1.000000e+03,
the training accuracy is: 0.134388 and validation accuracy is: 0.123000.

learning rate 1.000000e-08 and regularization 2.575000e+04,
the training accuracy is: 0.138265 and validation accuracy is: 0.129000.

learning rate 1.000000e-08 and regularization 5.050000e+04,
the training accuracy is: 0.159163 and validation accuracy is: 0.162000.

learning rate 1.000000e-08 and regularization 7.525000e+04,
the training accuracy is: 0.166184 and validation accuracy is: 0.163000.

learning rate 1.000000e-08 and regularization 1.000000e+05,
the training accuracy is: 0.160490 and validation accuracy is: 0.164000.

learning rate 2.507500e-06 and regularization 1.000000e+03,
the training accuracy is: 0.395959 and validation accuracy is: 0.395000.

learning rate 2.507500e-06 and regularization 2.575000e+04,
the training accuracy is: 0.323469 and validation accuracy is: 0.331000.

learning rate 2.507500e-06 and regularization 5.050000e+04,
the training accuracy is: 0.267694 and validation accuracy is: 0.271000.

learning rate 2.507500e-06 and regularization 7.525000e+04,
the training accuracy is: 0.275918 and validation accuracy is: 0.289000.

learning rate 2.507500e-06 and regularization 1.000000e+05,
the training accuracy is: 0.259388 and validation accuracy is: 0.266000.

learning rate 5.005000e-06 and regularization 1.000000e+03,
the training accuracy is: 0.375796 and validation accuracy is: 0.379000.

learning rate 5.005000e-06 and regularization 2.575000e+04,
the training accuracy is: 0.232367 and validation accuracy is: 0.228000.

learning rate 5.005000e-06 and regularization 5.050000e+04,
the training accuracy is: 0.208449 and validation accuracy is: 0.221000.

learning rate 5.005000e-06 and regularization 7.525000e+04,
the training accuracy is: 0.184367 and validation accuracy is: 0.195000.

learning rate 5.005000e-06 and regularization 1.000000e+05,
the training accuracy is: 0.154653 and validation accuracy is: 0.146000.

learning rate 7.502500e-06 and regularization 1.000000e+03,
the training accuracy is: 0.299490 and validation accuracy is: 0.300000.

learning rate 7.502500e-06 and regularization 2.575000e+04,
the training accuracy is: 0.185714 and validation accuracy is: 0.202000.

learning rate 7.502500e-06 and regularization 5.050000e+04,
the training accuracy is: 0.189306 and validation accuracy is: 0.181000.

learning rate 7.502500e-06 and regularization 7.525000e+04,
the training accuracy is: 0.174510 and validation accuracy is: 0.178000.

learning rate 7.502500e-06 and regularization 1.000000e+05,
the training accuracy is: 0.109286 and validation accuracy is: 0.127000.

learning rate 1.000000e-05 and regularization 1.000000e+03,
the training accuracy is: 0.302653 and validation accuracy is: 0.313000.

learning rate 1.000000e-05 and regularization 2.575000e+04,
the training accuracy is: 0.146429 and validation accuracy is: 0.131000.

learning rate 1.000000e-05 and regularization 5.050000e+04,
the training accuracy is: 0.104224 and validation accuracy is: 0.098000.

learning rate 1.000000e-05 and regularization 7.525000e+04,
the training accuracy is: 0.095714 and validation accuracy is: 0.101000.

learning rate 1.000000e-05 and regularization 1.000000e+05,
the training accuracy is: 0.103265 and validation accuracy is: 0.110000.

``````

## Test best logistic classifier on test datasets

The trained model works well on the test datasets, which has 39.04% accuracy!

``````

In [10]:

y_test_predict_result = best_softmax.predict(X_test)
y_test_predict = y_test_predict_result[0]
test_accuracy = np.mean(y_test == y_test_predict)
print 'The test accuracy is: %f' % test_accuracy

``````
``````

The test accuracy is: 0.383200

``````