Image features exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

We have seen that we can achieve reasonable performance on an image classification task by training a linear classifier on the pixels of the input image. In this exercise we will show that we can improve our classification performance by training linear classifiers not on raw pixels but on features that are computed from the raw pixels.

All of your work for this exercise will be done in this notebook.


In [1]:
# Run some setup code
import numpy as np
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# bool var. to let program show debug info.
debug = True
show_img = True

Load data

Similar to previous exercises, we will load CIFAR-10 data from disk.


In [3]:
import cifar10
# Load the raw CIFAR-10 data
m, m_val, m_dev, m_test = 49000, 1000, 500, 1000
m_spec = (m, m_val, m_dev, m_test)
data = cifar10.load_raw('../cifar-10-batches-py', m_spec, debug = debug)
X, y, X_test, y_test, X_val, y_val, X_dev, y_dev = data


Cifar-10 dataset has been loaded
X shape (50000, 32, 32, 3)
y shape (50000,)
X_test shape (10000, 32, 32, 3)
y_test shape (10000,)
Data has been splited.
X shape (49000, 32, 32, 3)
y shape (49000,)
X_val shape (1000, 32, 32, 3)
y_val shape (1000,)
X_test shape (1000, 32, 32, 3)
y_test shape (1000,)
X_dev shape (500, 32, 32, 3)
y_dev shape (500,)

Extract Features

For each image we will compute a Histogram of Oriented Gradients (HOG) as well as a color histogram using the hue channel in HSV color space. We form our final feature vector for each image by concatenating the HOG and color histogram feature vectors.

Roughly speaking, HOG should capture the texture of the image while ignoring color information, and the color histogram represents the color of the input image while ignoring texture. As a result, we expect that using both together ought to work better than using either alone. Verifying this assumption would be a good thing to try for the bonus section.

The hog_feature and color_histogram_hsv functions both operate on a single image and return a feature vector for that image. The extract_features function takes a set of images and a list of feature functions and evaluates each feature function on each image, storing the results in a matrix where each column is the concatenation of all feature vectors for a single image.


In [5]:
from features import *
num_color_bins = 10 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)

# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat

# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat

# Preprocessing: Add a bias dimension
# X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
# X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
# X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])


Done extracting features for 1000 / 49000 images
Done extracting features for 2000 / 49000 images
Done extracting features for 3000 / 49000 images
Done extracting features for 4000 / 49000 images
Done extracting features for 5000 / 49000 images
Done extracting features for 6000 / 49000 images
Done extracting features for 7000 / 49000 images
Done extracting features for 8000 / 49000 images
Done extracting features for 9000 / 49000 images
Done extracting features for 10000 / 49000 images
Done extracting features for 11000 / 49000 images
Done extracting features for 12000 / 49000 images
Done extracting features for 13000 / 49000 images
Done extracting features for 14000 / 49000 images
Done extracting features for 15000 / 49000 images
Done extracting features for 16000 / 49000 images
Done extracting features for 17000 / 49000 images
Done extracting features for 18000 / 49000 images
Done extracting features for 19000 / 49000 images
Done extracting features for 20000 / 49000 images
Done extracting features for 21000 / 49000 images
Done extracting features for 22000 / 49000 images
Done extracting features for 23000 / 49000 images
Done extracting features for 24000 / 49000 images
Done extracting features for 25000 / 49000 images
Done extracting features for 26000 / 49000 images
Done extracting features for 27000 / 49000 images
Done extracting features for 28000 / 49000 images
Done extracting features for 29000 / 49000 images
Done extracting features for 30000 / 49000 images
Done extracting features for 31000 / 49000 images
Done extracting features for 32000 / 49000 images
Done extracting features for 33000 / 49000 images
Done extracting features for 34000 / 49000 images
Done extracting features for 35000 / 49000 images
Done extracting features for 36000 / 49000 images
Done extracting features for 37000 / 49000 images
Done extracting features for 38000 / 49000 images
Done extracting features for 39000 / 49000 images
Done extracting features for 40000 / 49000 images
Done extracting features for 41000 / 49000 images
Done extracting features for 42000 / 49000 images
Done extracting features for 43000 / 49000 images
Done extracting features for 44000 / 49000 images
Done extracting features for 45000 / 49000 images
Done extracting features for 46000 / 49000 images
Done extracting features for 47000 / 49000 images
Done extracting features for 48000 / 49000 images

Train SVM on features

Using the multiclass SVM code developed earlier in the assignment, train SVMs on top of the features extracted above; this should achieve better results than training SVMs directly on top of raw pixels.


In [12]:
from svm import SVM
n = X_train_feats.shape[1]
K = 10

# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained classifer in best_svm. You might also want to play          #
# with different numbers of bins in the color histogram. If you are careful    #
# you should be able to get accuracy of near 0.44 on the validation set.       #
best_model = None
best_val = -1
alpha, lamda, T, B = 1e-7, 3e4, 1000, 200
for lamda in [3e4]:
    hpara = (alpha, lamda, T, B)
    print hpara
    model = SVM(n, K)
    model.train(X_train_feats, y, hpara, show_img = False, debug = False)
    train_acc = np.mean(model.predict(X_val_feats) == y_val)
    val_acc = np.mean(model.predict(X_val_feats) == y_val)
    print 'train acc.:', train_acc, 'val. acc.:', val_acc
    if val_acc > best_val:
        best_model = model
        best_val = val_acc


(1e-07, 30000.0, 10000, 200)
train acc.: 0.348 val. acc.: 0.348

In [15]:
# Evaluate your trained SVM on the test set
print 'test acc.', np.mean(best_model.predict(X_test_feats) == y_test)


 test acc. 0.342

Neural Network on image features

Earlier in this assigment we saw that training a two-layer neural network on raw pixels achieved better classification performance than linear classifiers on raw pixels. In this notebook we have seen that linear classifiers on image features outperform linear classifiers on raw pixels.

For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.


In [ ]:
from nnet import NNet
best_model = None
best_acc = -1
# TODO: Tune hyperparameters using the validation set. Store your best trained  
# model in best_net.                                                            
#                                                                               
# To help debug your network, it may help to use visualizations similar to the  
# ones we used above; these visualizations will have significant qualitative    
# differences from the ones we saw above for the poorly tuned network.          
#                                                                               
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  
# write code to sweep through possible combinations of hyperparameters          
# automatically like we did on the previous exercises.
n0 = X_train_feats.shape[1]
n1 = 500
n2 = 10

alpha, lamda, T, B, rho = 2e-3, 1e-3, 1000, 200, 0.95
for alpha in [1e-2, 1e-1, 1e0]:
    hpara = (alpha, lamda, T, B, rho)
    print hpara
    model = NNet(n0, n1, n2, std = 1e-1)
    model.train(X_train_feats, y, X_val_feats, y_val, hpara, debug, show_img)
    
    # Predict on the val. set
    val_acc = np.mean(model.predict(X_val_feats) == y_val)
    print 'val. acc.:', val_acc
    print '\n'
    if val_acc > best_acc:
        best_acc = val_acc
        best_model = model


(0.01, 0.001, 1000, 200, 0.95)
iteration 0 / 1000: loss 4.099021
iteration 245 / 1000: loss 2.318138
iteration 490 / 1000: loss 2.019974
iteration 735 / 1000: loss 1.903303

In [ ]:
# Run your neural net classifier on the test set. You should be able to
# get more than 55% accuracy.
print 'test acc.', np.mean(best_model.predict(X_test_feats) == y_test)