Based on the Stanford CS231n course. The CIFAR-10 dataset can be downloaded from here (make sure you get the Python version).
In [1]:
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
In [4]:
# Subsample the data for more efficient code execution in this exercise
num_training = 5000
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]
num_test = 500
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]
In [5]:
# Reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print X_train.shape,X_test.shape
In [6]:
from cs231n.classifiers import KNearestNeighbor
# Create a kNN classifier instance.
# Remember that training a kNN classifier is a no-op:
# the classifier simply remembers the data and does no further processing
classifier = KNearestNeighbor()
classifier.train(X_train, y_train)
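For intuition, the "training" step can be as small as storing references to the data; a minimal sketch (the real class lives in cs231n/classifiers/k_nearest_neighbor.py) might look like this:

# Sketch: k-NN "training" only memorizes the training set.
class KNNSketch(object):
    def train(self, X, y):
        self.X_train = X   # no parameters are fit; the data itself is the model
        self.y_train = y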
In [7]:
# Open cs231n/classifiers/k_nearest_neighbor.py and implement
# compute_distances_no_loops.
# The fully vectorized implementation is roughly 20x faster than the naive
# two-loop implementation and roughly 10x faster than the slightly more
# efficient one-loop version (see k_nearest_neighbor.py).
dists = classifier.compute_distances_no_loops(X_test)
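The no-loops version relies on the identity $\|a-b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a\cdot b$ together with broadcasting. A sketch of this idea (illustrative only, not the graded implementation in k_nearest_neighbor.py):

# Sketch: pairwise L2 distances between all test and training rows, no loops.
def l2_distances_no_loops(X_te, X_tr):
    te_sq = np.sum(X_te ** 2, axis=1).reshape(-1, 1)   # (num_test, 1)
    tr_sq = np.sum(X_tr ** 2, axis=1).reshape(1, -1)   # (1, num_train)
    cross = X_te.dot(X_tr.T)                           # (num_test, num_train)
    sq = np.maximum(te_sq + tr_sq - 2 * cross, 0)      # clip tiny negative round-off
    return np.sqrt(sq)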
In [8]:
# We can visualize the distance matrix: each row is a single test example and
# its distances to training examples
plt.imshow(dists, interpolation='none')
In [9]:
# Now implement the function predict_labels and run the code below,
# which reports the accuracy for several choices of k
# (k = 1 is the plain nearest-neighbor classifier).
k_choices = [1, 3, 5, 7, 8, 10, 12, 15, 20, 50, 100]
for k in k_choices:
    y_test_pred = classifier.predict_labels(dists, k)
    # Compute and print the fraction of correctly predicted examples
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print 'k=%d : Got %d / %d correct => accuracy: %f' % (k, num_correct, num_test, accuracy)
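One possible shape for predict_labels, sketched only for intuition (the assignment code belongs in k_nearest_neighbor.py): take the labels of the $k$ nearest training points in each row and vote with np.bincount, which breaks ties toward the smaller label.

# Sketch: predict labels from a precomputed distance matrix by majority vote.
def predict_labels_sketch(dists, y_train, k=1):
    y_pred = np.zeros(dists.shape[0], dtype=y_train.dtype)
    for i in range(dists.shape[0]):
        closest_y = y_train[np.argsort(dists[i])[:k]]   # labels of the k nearest neighbors
        y_pred[i] = np.argmax(np.bincount(closest_y))   # majority vote
    return y_pred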
Resources: web demo, slides. The material below is based on the course notes.
$ f(x_i,W,b) = Wx_i + b $
where $y_i \in \{1,\cdots,k\}$ is the label of the $i$'th data point (there are $k$ classes in total), $x_i \in \mathcal{R}^D, \quad i = 1,\cdots , N \quad , D=32 \times 32 \times 3 =3072$, $W$ is a $k \times D$ matrix and $b \in \mathcal{R}^k$. We want to select $W$ and $b$ so that
$ f(x_i,W,b) \approx y_i \quad, i = 1,\cdots, N $
Each row of $W$, along with the corresponding bias term of $b$, constitutes a single per-class classifier. Some interpretations of $W$: each row acts as a template for its class that is matched against the image via an inner product (this is what the weight visualizations below show), and equivalently each row defines a linear decision boundary in the $D$-dimensional pixel space.
In practice we add the bias vector as a (last) column of the matrix $W$ and extend the vector $\mathbf{x}$ with a last entry equal to the constant one. So we write
$f(\mathbf{x},W) = W \mathbf{x}$
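A tiny numeric check of the bias trick (all values here are made up purely for illustration):

# Folding the bias into W gives the same scores as keeping it separate.
W_small = np.array([[1., 2.], [3., 4.]])              # 2 classes, 2 features
b_small = np.array([0.5, -0.5])
x_small = np.array([1., 2.])
scores_separate = W_small.dot(x_small) + b_small      # [ 5.5, 10.5]
W_ext = np.hstack([W_small, b_small.reshape(-1, 1)])  # bias as a last column of W
x_ext = np.append(x_small, 1.0)                       # constant 1 appended to x
scores_folded = W_ext.dot(x_ext)                      # identical: [ 5.5, 10.5]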
On many occasions we will freely name the classifier after its loss function, so we will talk about the SVM or Softmax classifier instead of a linear classifier with the hinge loss or with the cross-entropy loss.
The multiclass SVM loss for the $i$'th example is
$L_i = \sum_{j \ne y_i} \max( 0, f(x_i,W)_j - f(x_i,W)_{y_i} + \Delta )$
That is, for each class $j$ other than the true class $y_i$ we compare the score $f(x_i,W)_j$ to the score $f(x_i,W)_{y_i}$ of the true class. If the true-class score does not exceed the score of class $j$ by at least the margin $\Delta$, a positive contribution is added to the loss; otherwise class $j$ contributes nothing.
$L = \frac{1}{N } \sum_i L_i + \lambda \sum_k \sum_l W_{k,l}^2 = \frac{1}{N } \sum_i L_i + \lambda R(W)$
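As a concrete check of the hinge-loss formula, here it is for a single example with hypothetical scores and $\Delta = 1$ (only an illustration; the graded code belongs in cs231n/classifiers/linear_svm.py):

# Sketch: multiclass SVM loss for one example, delta = 1.
def svm_loss_single(scores, y_i, delta=1.0):
    margins = np.maximum(0, scores - scores[y_i] + delta)
    margins[y_i] = 0                         # the true class never contributes
    return np.sum(margins)

scores = np.array([3.2, 5.1, -1.7])          # hypothetical scores, true class 0
L_i = svm_loss_single(scores, 0)             # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9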
See Andrew Ng's notes (pdf) for more information about the SVM and its max-margin properties.
Practical considerations
The softmax classifier is an extension of the logistic regression classifier to multi-class problems. It is based on the cross-entropy loss:
$L_i = -\log(\frac{e^{f_{y_i}}}{\sum_{j} e^{f_j}}) = -f_{y_i} + \log({\sum_{j} e^{f_j}}) \quad, \quad f_j = f(\mathbf{x_i},W)_j$
The term $\frac{e^{f_{y_i}}}{\sum_{j} e^{f_j}}$ (the softmax function) transforms the unnormalized scores $f(\mathbf{x_i},W)_j$ into numbers in $[0,1]$ that sum to one and can therefore be interpreted as probabilities. Minimizing the loss function
$L = \frac{1}{N } \sum_i L_i + \lambda R(W)$
can be interpreted from two points of view. Probabilistic view: the quantity $\frac{e^{f_{y_i}}}{\sum_{j} e^{f_j}}$ is the probability the model assigns to the correct label of example $i$, so minimizing $L_i$ amounts to maximizing the log-likelihood of the correct class. Information theory view: the softmax classifier minimizes the cross-entropy, which is defined between two distributions $p$ and $q$ as
$H(p,q) = -\sum_x p(x) \log q(x)$
In the above we take $q$ to be the softmax distribution over the classes, $q_j = \frac{e^{f_j}}{\sum_{l} e^{f_l}}$, and $p$ to be the delta distribution $(0,0,\cdots,1,0,\cdots)$, which is 0 everywhere except at position $y_i$, the true label of example $i$, where it is 1. It turns out that this is equivalent to minimizing the Kullback-Leibler divergence, a measure of the discrepancy between $q$ and $p$, so we would like the weights to be such that the predicted distribution puts all of its mass on the correct label.
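Since the cross-entropy decomposes as $H(p,q) = H(p) + D_{KL}(p\,\|\,q)$ and the entropy $H(p)$ of a delta distribution is zero, we have $H(p,q) = D_{KL}(p\,\|\,q) = -\log q_{y_i}$, which is exactly the per-example loss $L_i$ defined above.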
Most work in practice is done with the softmax loss, though some argue that the SVM (hinge) loss performs better in certain settings.
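In code, the only subtlety of the softmax loss is numerical stability: exponentiating large scores overflows, so one first subtracts the maximum score, which leaves the softmax unchanged. A minimal sketch for a single example (the graded implementation belongs in cs231n/classifiers/softmax.py):

# Sketch: numerically stable cross-entropy (softmax) loss for one example.
def softmax_loss_single(scores, y_i):
    shifted = scores - np.max(scores)                      # stability shift
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))  # log softmax
    return -log_probs[y_i]

softmax_loss_single(np.array([3.2, 5.1, -1.7]), 0)         # roughly 2.04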
In [10]:
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    # subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    # Display the mean image
    plt.figure(figsize=(4,4))
    plt.imshow(mean_image.reshape((32,32,3)).astype('uint8'))  # visualize the mean image
    # add bias dimension and transform into columns
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))]).T
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))]).T
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))]).T
    return X_train, y_train, X_val, y_val, X_test, y_test
# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
In [11]:
# Note: Vectorized form is x8 faster for loss computation
# and x3 faster for loss and gradient computation
#
# Gradient checking is important: it is very easy to make an error in the
# gradient computation, so check the analytic gradient against a numerical
# one with the function we provide for you.
# Compute the loss and its gradient at W.
from cs231n.classifiers.linear_svm import svm_loss_naive
from cs231n.classifiers.linear_svm import svm_loss_vectorized
import time
W = np.random.randn(10, 3073) * 0.0001
loss, grad = svm_loss_vectorized(W, X_train, y_train, 0.0)
# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_vectorized(w, X_train, y_train, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
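grad_check_sparse compares the analytic gradient with a numerical one at a handful of randomly chosen coordinates. A sketch of how such a check can be written with centered differences (illustrative only; the provided cs231n.gradient_check is the reference):

# Sketch: sparse numerical gradient check with centered differences.
def grad_check_sketch(f, W, analytic_grad, num_checks=10, h=1e-5):
    for _ in range(num_checks):
        ix = tuple(np.random.randint(d) for d in W.shape)  # random coordinate of W
        old = W[ix]
        W[ix] = old + h
        fxph = f(W)                                        # loss at W + h (at ix)
        W[ix] = old - h
        fxmh = f(W)                                        # loss at W - h (at ix)
        W[ix] = old                                        # restore original value
        num_grad = (fxph - fxmh) / (2 * h)
        ana_grad = analytic_grad[ix]
        rel_err = abs(num_grad - ana_grad) / (abs(num_grad) + abs(ana_grad) + 1e-12)
        print('numerical: %f analytic: %f, relative error: %e' % (num_grad, ana_grad, rel_err))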
In [12]:
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
# It is advisable to select the learning rate and regularization strength
# (e.g. from the candidates below) and to control the stopping criterion more precisely
# learning_rates = [1e-7, 5e-5]
# regularization_strengths = [5e4, 1e5]
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)
# Evaluate the accuracy on both the training and validation sets
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )
# Plot the loss as a function of iteration number
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
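Roughly, a training routine like svm.train boils down to mini-batch stochastic gradient descent: sample a batch, evaluate the loss and gradient on it, and step against the gradient. The sketch below assumes the column-per-example layout produced by get_CIFAR10_data above; the batch size and sampling details are guesses, and the actual code lives in cs231n/classifiers.

# Sketch: mini-batch SGD for a linear classifier (illustrative only).
def sgd_train_sketch(loss_fn, W, X, y, learning_rate=1e-7, reg=5e4,
                     num_iters=1500, batch_size=200):
    loss_history = []
    num_train = X.shape[1]                     # columns are examples in this layout
    for it in range(num_iters):
        idx = np.random.choice(num_train, batch_size)
        loss, grad = loss_fn(W, X[:, idx], y[idx], reg)
        loss_history.append(loss)
        W -= learning_rate * grad              # vanilla gradient descent step
    return W, loss_history

For example, such a loop could be driven by the svm_loss_vectorized function checked above.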
In [13]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = svm.W[:,:-1] # strip out the bias
w = w.reshape(10, 32, 32, 3)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
In [14]:
# Vectorized implementation is x10 faster than naive
from cs231n.classifiers.softmax import softmax_loss_naive
from cs231n.classifiers.softmax import softmax_loss_vectorized
loss, grad = softmax_loss_vectorized(W, X_train, y_train, 0.0)
# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_vectorized(w, X_train, y_train, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
In [15]:
from cs231n.classifiers import Softmax
softm = Softmax()
tic = time.time()
# It is advisable to select the learning rate and regularization strength
# (e.g. from the candidates below) and to control the stopping criterion more precisely
# learning_rates = [1e-7, 5e-7]
# regularization_strengths = [5e4, 1e8]
loss_hist = softm.train(X_train, y_train, learning_rate=4e-7, reg=1e4,
                        num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)
# Evaluate the accuracy on both the training and validation sets
y_train_pred = softm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = softm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )
# Plot the loss as a function of iteration number
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
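The commented-out lists of learning rates and regularization strengths above hint at the intended workflow: train one classifier per (learning rate, regularization) pair and keep the one with the best validation accuracy. A sketch of such a search, using the candidate values from the comments (not tuned results):

# Sketch: pick hyperparameters by validation accuracy.
learning_rates = [1e-7, 5e-7]
regularization_strengths = [5e4, 1e8]
best_val, best_softmax = -1.0, None
for lr in learning_rates:
    for reg in regularization_strengths:
        model = Softmax()
        model.train(X_train, y_train, learning_rate=lr, reg=reg,
                    num_iters=1500, verbose=False)
        val_acc = np.mean(model.predict(X_val) == y_val)
        if val_acc > best_val:
            best_val, best_softmax = val_acc, model
print('best validation accuracy achieved during search: %f' % best_val)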
In [16]:
# Visualize the learned weights for each class
w = softm.W[:,:-1] # strip out the bias
w = w.reshape(10, 32, 32, 3)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
It is rare that image recognition is done on raw pixels; usually we extract features from the images and run the classifier on those features instead. See the slides here.
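To make the idea of a hand-crafted feature concrete, here is a simplified hue histogram in the spirit of color_histogram_hsv (this is not the cs231n.features implementation, just a sketch; it relies on matplotlib's rgb_to_hsv helper):

from matplotlib.colors import rgb_to_hsv

# Sketch: normalized histogram of the hue channel of an RGB image.
def hue_histogram(im, nbin=10):
    hsv = rgb_to_hsv(im / 255.0)                # hue values lie in [0, 1]
    hist, _ = np.histogram(hsv[:, :, 0], bins=nbin, range=(0, 1))
    return hist.astype(float) / (im.shape[0] * im.shape[1])   # fraction of pixels per bin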
In [17]:
# Load the CIFAR10 data
from cs231n.features import color_histogram_hsv, hog_feature
def get_CIFAR10_data1(num_training=49000, num_validation=1000, num_test=1000):
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    return X_train, y_train, X_val, y_val, X_test, y_test
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data1()
print X_train.shape
print y_train.shape
In [18]:
from cs231n.features import *
# Extract features. For each image we will compute a Histogram of Oriented
# Gradients (HOG) as well as a color histogram using the hue channel in HSV
# color space. We form our final feature vector for each image by concatenating
# the HOG and color histogram feature vectors.
#
# Roughly speaking, HOG should capture the texture of the image while ignoring
# color information, and the color histogram represents the color of the input
# image while ignoring texture. As a result, we expect that using both together
# ought to work better than using either alone. Verifying this assumption would
# be a good thing to try for the bonus section.
# The hog_feature and color_histogram_hsv functions both operate on a single
# image and return a feature vector for that image. The extract_features
# function takes a set of images and a list of feature functions and evaluates
# each feature function on each image, storing the results in a matrix where
# each column is the concatenation of all feature vectors for a single image.
num_color_bins = 10 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)
# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=1)
mean_feat = np.expand_dims(mean_feat, axis=1)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat
# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=1)
std_feat = np.expand_dims(std_feat, axis=1)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat
# Preprocessing: Add a bias dimension
X_train_feats = np.vstack([X_train_feats, np.ones((1, X_train_feats.shape[1]))])
X_val_feats = np.vstack([X_val_feats, np.ones((1, X_val_feats.shape[1]))])
X_test_feats = np.vstack([X_test_feats, np.ones((1, X_test_feats.shape[1]))])
In [19]:
from cs231n.classifiers import LinearSVM
svm_f = LinearSVM()
tic = time.time()
loss_hist = svm_f.train(X_train_feats, y_train, learning_rate=5e-8, reg=1e6,
                        num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)
# Evaluate the accuracy on both the training and validation sets
y_train_pred = svm_f.predict(X_train_feats)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm_f.predict(X_val_feats)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )
# Plot the loss as a function of iteration number
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
In [20]:
# Evaluate your classifier on the test set
y_test_pred = svm_f.predict(X_test_feats)
test_accuracy = np.mean(y_test == y_test_pred)
print test_accuracy
In [21]:
# An important way to gain intuition about how an algorithm works is to
# visualize the mistakes that it makes. In this visualization, we show examples
# of images that are misclassified by our current system. The first column
# shows images that our system labeled as "plane" but whose true label is
# something other than "plane".
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    idxs = np.random.choice(idxs, examples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()
In [22]:
from cs231n.classifiers import Softmax
soft_f = Softmax()
tic = time.time()
loss_hist = soft_f.train(X_train_feats, y_train,
                         learning_rate=5e-8, reg=2e6,
                         num_iters=1000, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)
# Evaluate the accuracy on both the training and validation sets
y_train_pred = soft_f.predict(X_train_feats)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = soft_f.predict(X_val_feats)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )
# Plot the loss as a function of iteration number
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
In [23]:
# Evaluate your classifier on the test set
y_test_pred = soft_f.predict(X_test_feats)
test_accuracy = np.mean(y_test == y_test_pred)
print test_accuracy
In [274]:
# An important way to gain intuition about how an algorithm works is to
# visualize the mistakes that it makes. In this visualization, we show examples
# of images that are misclassified by our current system. The first column
# shows images that our system labeled as "plane" but whose true label is
# something other than "plane".
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    idxs = np.random.choice(idxs, examples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()
In [54]:
# Scratch cell: a quick reminder of numpy axis sums and broadcasting,
# the same tricks used in the vectorized distance computation above.
A = np.array([[1,2,3,4],[5,6,7,8]])
b = np.array([1,1,1,1])
c = np.array([[1],[2]])
print c.shape
print A
print np.sum(A**2,axis=0) # sum down each column (one value per column)
print np.sum(A**2,axis=1) # sum across each row (one value per row)
print
print A-b
print A - c
print np.linalg.norm(A - b,axis=1)