Image features exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

We have seen that we can achieve reasonable performance on an image classification task by training a linear classifier on the pixels of the input image. In this exercise we will show that we can improve our classification performance by training linear classifiers not on raw pixels but on features that are computed from the raw pixels.

All of your work for this exercise will be done in this notebook.



In [1]:

    
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

Load data

Similar to previous exercises, we will load CIFAR-10 data from disk.



In [2]:

    
from cs231n.features import color_histogram_hsv, hog_feature

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
  # Load the raw CIFAR-10 data
  cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
  X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
  
  # Subsample the data
  mask = range(num_training, num_training + num_validation)
  X_val = X_train[mask]
  y_val = y_train[mask]
  mask = range(num_training)
  X_train = X_train[mask]
  y_train = y_train[mask]
  mask = range(num_test)
  X_test = X_test[mask]
  y_test = y_test[mask]

  return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()

Extract Features

For each image we will compute a Histogram of Oriented Gradients (HOG) as well as a color histogram using the hue channel in HSV color space. We form our final feature vector for each image by concatenating the HOG and color histogram feature vectors.

Roughly speaking, HOG should capture the texture of the image while ignoring color information, and the color histogram represents the color of the input image while ignoring texture. As a result, we expect that using both together ought to work better than using either alone. Verifying this assumption would be a good thing to try for the bonus section.
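To make the "color without texture" idea concrete, here is a minimal sketch of a hue histogram. It is not the course's color_histogram_hsv implementation: the name hue_histogram_sketch is hypothetical, it uses the standard-library colorsys module for the RGB-to-HSV conversion, and it assumes pixel values in [0, 255].

```python
import colorsys
import numpy as np

def hue_histogram_sketch(img, nbin=10):
    """Normalized histogram of hue for an RGB image with values in [0, 255]."""
    pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3) / 255.0
    # colorsys returns hue in [0, 1); pixel positions are discarded here,
    # which is why this feature ignores texture.
    hues = np.array([colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in pixels])
    counts, _ = np.histogram(hues, bins=nbin, range=(0.0, 1.0))
    return counts / float(counts.sum())

# A uniformly green toy image puts all of its mass in a single hue bin.
green_img = np.zeros((4, 4, 3))
green_img[..., 1] = 255.0
green_hist = hue_histogram_sketch(green_img)
print(green_hist)
```

Shuffling the pixels of an image leaves this histogram unchanged, which illustrates why it complements an edge-based descriptor such as HOG.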

The hog_feature and color_histogram_hsv functions both operate on a single image and return a feature vector for that image. The extract_features function takes a set of images and a list of feature functions and evaluates each feature function on each image, storing the results in a matrix where each row is the concatenation of all feature vectors for a single image.
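The behaviour described above can be sketched roughly as follows. This is an illustrative reimplementation, not the code in cs231n.features: extract_features_sketch and the two toy feature functions are hypothetical, and the sketch assumes one row per image, consistent with the (49000, 155) shape printed later in this notebook.

```python
import numpy as np

def extract_features_sketch(imgs, feature_fns):
    """Apply every feature function to every image; returns one row per image."""
    rows = []
    for img in imgs:
        # Each feature function takes a single image and returns a vector.
        parts = [np.asarray(fn(img), dtype=np.float64).ravel() for fn in feature_fns]
        rows.append(np.concatenate(parts))
    return np.vstack(rows)

# Toy stand-ins for hog_feature and color_histogram_hsv (hypothetical).
def mean_color(img):
    return img.mean(axis=(0, 1))                # 3 values per image

def intensity_range(img):
    return np.array([img.min(), img.max()])     # 2 values per image

toy_imgs = np.random.RandomState(0).rand(4, 32, 32, 3)
toy_feats = extract_features_sketch(toy_imgs, [mean_color, intensity_range])
print(toy_feats.shape)
```

With four images and feature lengths 3 and 2, the result has four rows and five columns.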



In [3]:

    
from cs231n.features import *

num_color_bins = 10 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)

# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat

# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat

# Preprocessing: Add a bias dimension
X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])

Done extracting features for 1000 / 49000 images
Done extracting features for 2000 / 49000 images
Done extracting features for 3000 / 49000 images
Done extracting features for 4000 / 49000 images
Done extracting features for 5000 / 49000 images
Done extracting features for 6000 / 49000 images
Done extracting features for 7000 / 49000 images
Done extracting features for 8000 / 49000 images
Done extracting features for 9000 / 49000 images
Done extracting features for 10000 / 49000 images
Done extracting features for 11000 / 49000 images
Done extracting features for 12000 / 49000 images
Done extracting features for 13000 / 49000 images
Done extracting features for 14000 / 49000 images
Done extracting features for 15000 / 49000 images
Done extracting features for 16000 / 49000 images
Done extracting features for 17000 / 49000 images
Done extracting features for 18000 / 49000 images
Done extracting features for 19000 / 49000 images
Done extracting features for 20000 / 49000 images
Done extracting features for 21000 / 49000 images
Done extracting features for 22000 / 49000 images
Done extracting features for 23000 / 49000 images
Done extracting features for 24000 / 49000 images
Done extracting features for 25000 / 49000 images
Done extracting features for 26000 / 49000 images
Done extracting features for 27000 / 49000 images
Done extracting features for 28000 / 49000 images
Done extracting features for 29000 / 49000 images
Done extracting features for 30000 / 49000 images
Done extracting features for 31000 / 49000 images
Done extracting features for 32000 / 49000 images
Done extracting features for 33000 / 49000 images
Done extracting features for 34000 / 49000 images
Done extracting features for 35000 / 49000 images
Done extracting features for 36000 / 49000 images
Done extracting features for 37000 / 49000 images
Done extracting features for 38000 / 49000 images
Done extracting features for 39000 / 49000 images
Done extracting features for 40000 / 49000 images
Done extracting features for 41000 / 49000 images
Done extracting features for 42000 / 49000 images
Done extracting features for 43000 / 49000 images
Done extracting features for 44000 / 49000 images
Done extracting features for 45000 / 49000 images
Done extracting features for 46000 / 49000 images
Done extracting features for 47000 / 49000 images
Done extracting features for 48000 / 49000 images

Train SVM on features

Using the multiclass SVM code developed earlier in the assignment, train SVMs on top of the features extracted above; this should achieve better results than training SVMs directly on top of raw pixels.



In [4]:

    
# Use the validation set to tune the learning rate and regularization strength

from cs231n.classifiers.linear_classifier import LinearSVM

learning_rates = [1e-9, 1e-8, 1e-7]
regularization_strengths = [1e5, 1e6, 1e7]

results = {}
best_val = -1
best_svm = None

pass
################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained classifier in best_svm. You might also want to play         #
# with different numbers of bins in the color histogram. If you are careful    #
# you should be able to get accuracy of near 0.44 on the validation set.       #
################################################################################
for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train_feats, y_train, learning_rate=lr, reg=reg, 
                              num_iters=1500, verbose=True)
        training_accuracy = np.mean(svm.predict(X_train_feats) == y_train)
        validation_accuracy = np.mean(svm.predict(X_val_feats) == y_val)
        if best_val < validation_accuracy:
            best_val = validation_accuracy
            best_svm = svm
        results[(lr, reg)] = (training_accuracy, validation_accuracy)
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)

iteration 0 / 1500: loss 84.387940
iteration 100 / 1500: loss 82.904461
iteration 200 / 1500: loss 81.424644
iteration 300 / 1500: loss 79.988033
iteration 400 / 1500: loss 78.585195
iteration 500 / 1500: loss 77.207230
iteration 600 / 1500: loss 75.867593
iteration 700 / 1500: loss 74.523469
iteration 800 / 1500: loss 73.247620
iteration 900 / 1500: loss 71.969283
iteration 1000 / 1500: loss 70.707590
iteration 1100 / 1500: loss 69.484479
iteration 1200 / 1500: loss 68.302606
iteration 1300 / 1500: loss 67.134991
iteration 1400 / 1500: loss 65.974067
iteration 0 / 1500: loss 753.093695
iteration 100 / 1500: loss 618.143916
iteration 200 / 1500: loss 507.686104
iteration 300 / 1500: loss 417.254881
iteration 400 / 1500: loss 343.212321
iteration 500 / 1500: loss 282.604954
iteration 600 / 1500: loss 232.984782
iteration 700 / 1500: loss 192.359161
iteration 800 / 1500: loss 159.104490
iteration 900 / 1500: loss 131.888235
iteration 1000 / 1500: loss 109.595637
iteration 1100 / 1500: loss 91.355281
iteration 1200 / 1500: loss 76.422726
iteration 1300 / 1500: loss 64.193387
iteration 1400 / 1500: loss 54.187253
iteration 0 / 1500: loss 7687.012173
iteration 100 / 1500: loss 1037.699141
iteration 200 / 1500: loss 146.824921
iteration 300 / 1500: loss 27.465564
iteration 400 / 1500: loss 11.474116
iteration 500 / 1500: loss 9.331455
iteration 600 / 1500: loss 9.044413
iteration 700 / 1500: loss 9.005940
iteration 800 / 1500: loss 9.000791
iteration 900 / 1500: loss 9.000103
iteration 1000 / 1500: loss 9.000009
iteration 1100 / 1500: loss 8.999998
iteration 1200 / 1500: loss 8.999997
iteration 1300 / 1500: loss 8.999996
iteration 1400 / 1500: loss 8.999996
iteration 0 / 1500: loss 85.509344
iteration 100 / 1500: loss 71.625137
iteration 200 / 1500: loss 60.262009
iteration 300 / 1500: loss 50.971534
iteration 400 / 1500: loss 43.370086
iteration 500 / 1500: loss 37.132922
iteration 600 / 1500: loss 32.025690
iteration 700 / 1500: loss 27.847041
iteration 800 / 1500: loss 24.434127
iteration 900 / 1500: loss 21.635262
iteration 1000 / 1500: loss 19.341771
iteration 1100 / 1500: loss 17.465640
iteration 1200 / 1500: loss 15.930648
iteration 1300 / 1500: loss 14.676052
iteration 1400 / 1500: loss 13.648625
iteration 0 / 1500: loss 756.187507
iteration 100 / 1500: loss 109.109797
iteration 200 / 1500: loss 22.413042
iteration 300 / 1500: loss 10.796904
iteration 400 / 1500: loss 9.240618
iteration 500 / 1500: loss 9.032180
iteration 600 / 1500: loss 9.004278
iteration 700 / 1500: loss 9.000543
iteration 800 / 1500: loss 9.000047
iteration 900 / 1500: loss 8.999973
iteration 1000 / 1500: loss 8.999969
iteration 1100 / 1500: loss 8.999979
iteration 1200 / 1500: loss 8.999959
iteration 1300 / 1500: loss 8.999963
iteration 1400 / 1500: loss 8.999961
iteration 0 / 1500: loss 7900.259513
iteration 100 / 1500: loss 9.000003
iteration 200 / 1500: loss 8.999996
iteration 300 / 1500: loss 8.999997
iteration 400 / 1500: loss 8.999997
iteration 500 / 1500: loss 8.999996
iteration 600 / 1500: loss 8.999997
iteration 700 / 1500: loss 8.999997
iteration 800 / 1500: loss 8.999997
iteration 900 / 1500: loss 8.999997
iteration 1000 / 1500: loss 8.999997
iteration 1100 / 1500: loss 8.999997
iteration 1200 / 1500: loss 8.999997
iteration 1300 / 1500: loss 8.999996
iteration 1400 / 1500: loss 8.999997
iteration 0 / 1500: loss 84.310527
iteration 100 / 1500: loss 19.085035
iteration 200 / 1500: loss 10.352013
iteration 300 / 1500: loss 9.181533
iteration 400 / 1500: loss 9.023823
iteration 500 / 1500: loss 9.002835
iteration 600 / 1500: loss 9.000134
iteration 700 / 1500: loss 8.999813
iteration 800 / 1500: loss 8.999676
iteration 900 / 1500: loss 8.999624
iteration 1000 / 1500: loss 8.999651
iteration 1100 / 1500: loss 8.999603
iteration 1200 / 1500: loss 8.999688
iteration 1300 / 1500: loss 8.999647
iteration 1400 / 1500: loss 8.999631
iteration 0 / 1500: loss 797.927980
iteration 100 / 1500: loss 8.999977
iteration 200 / 1500: loss 8.999955
iteration 300 / 1500: loss 8.999964
iteration 400 / 1500: loss 8.999974
iteration 500 / 1500: loss 8.999968
iteration 600 / 1500: loss 8.999967
iteration 700 / 1500: loss 8.999975
iteration 800 / 1500: loss 8.999970
iteration 900 / 1500: loss 8.999962
iteration 1000 / 1500: loss 8.999961
iteration 1100 / 1500: loss 8.999961
iteration 1200 / 1500: loss 8.999966
iteration 1300 / 1500: loss 8.999964
iteration 1400 / 1500: loss 8.999962
iteration 0 / 1500: loss 7743.681769
iteration 100 / 1500: loss 9.000000
iteration 200 / 1500: loss 9.000000
iteration 300 / 1500: loss 8.999999
iteration 400 / 1500: loss 8.999999
iteration 500 / 1500: loss 9.000000
iteration 600 / 1500: loss 9.000001
iteration 700 / 1500: loss 9.000000
iteration 800 / 1500: loss 9.000000
iteration 900 / 1500: loss 8.999999
iteration 1000 / 1500: loss 9.000001
iteration 1100 / 1500: loss 9.000001
iteration 1200 / 1500: loss 9.000000
iteration 1300 / 1500: loss 9.000000
iteration 1400 / 1500: loss 9.000001
lr 1.000000e-09 reg 1.000000e+05 train accuracy: 0.091490 val accuracy: 0.091000
lr 1.000000e-09 reg 1.000000e+06 train accuracy: 0.080469 val accuracy: 0.083000
lr 1.000000e-09 reg 1.000000e+07 train accuracy: 0.416469 val accuracy: 0.417000
lr 1.000000e-08 reg 1.000000e+05 train accuracy: 0.118449 val accuracy: 0.111000
lr 1.000000e-08 reg 1.000000e+06 train accuracy: 0.415122 val accuracy: 0.418000
lr 1.000000e-08 reg 1.000000e+07 train accuracy: 0.405755 val accuracy: 0.405000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.411776 val accuracy: 0.419000
lr 1.000000e-07 reg 1.000000e+06 train accuracy: 0.404939 val accuracy: 0.393000
lr 1.000000e-07 reg 1.000000e+07 train accuracy: 0.301898 val accuracy: 0.279000
best validation accuracy achieved during cross-validation: 0.419000
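Several runs in the log above settle at a loss of about 9.0. One plausible reading is that strong regularization has shrunk the weights toward zero: with all scores near zero, each of the 9 incorrect classes contributes a margin of about 1. The sketch below checks that arithmetic on random toy data. It is not the assignment's LinearSVM; svm_loss_sketch and the 0.5 * reg * sum(W * W) penalty are assumptions made for illustration.

```python
import numpy as np

def svm_loss_sketch(W, X, y, reg):
    """Average multiclass hinge loss (margin 1) plus an L2 penalty on W."""
    n = X.shape[0]
    scores = X.dot(W)
    correct = scores[np.arange(n), y][:, None]
    margins = np.maximum(0.0, scores - correct + 1.0)
    margins[np.arange(n), y] = 0.0              # the true class adds no loss
    return margins.sum() / n + 0.5 * reg * np.sum(W * W)

rng = np.random.RandomState(0)
X_toy = rng.randn(20, 155)                      # same feature width as this notebook
y_toy = rng.randint(0, 10, size=20)
W_zero = np.zeros((155, 10))

# With zero weights the penalty vanishes and the loss is (num_classes - 1).
loss_at_zero = svm_loss_sketch(W_zero, X_toy, y_toy, reg=1e7)
print(loss_at_zero)
```

A loss slightly below 9.0 with a validation accuracy near 0.41 suggests the weights are tiny but still rank the classes usefully.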



In [5]:

    
# Evaluate your trained SVM on the test set
y_test_pred = best_svm.predict(X_test_feats)
test_accuracy = np.mean(y_test == y_test_pred)
print(test_accuracy)



In [6]:

    
# An important way to gain intuition about how an algorithm works is to
# visualize the mistakes that it makes. In this visualization, we show examples
# of images that are misclassified by our current system. The first column
# shows images that our system labeled as "plane" but whose true label is
# something other than "plane".

examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    # Guard against classes with fewer false positives than examples_per_class.
    idxs = np.random.choice(idxs, min(examples_per_class, len(idxs)), replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()

Inline question 1:

Describe the misclassification results that you see. Do they make sense?

Answer: The mistakes make sense given the features we are using. The color histogram feature biases the classifier toward assigning images with similar backgrounds to the same class; for example, deer and other animals photographed against a green or brown background are confused with one another. Similarly, the HOG features tend to group images with similar edge structure. Cars, trucks, planes, and ships all have similar hard edges, so confusions among them are expected.
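A confusion matrix is one way to check an answer like this numerically rather than by eye. The sketch below is a hedged illustration on toy labels; confusion_matrix_sketch is a hypothetical helper, and in this notebook it would be called with y_test, y_test_pred, and 10 classes.

```python
import numpy as np

def confusion_matrix_sketch(y_true, y_pred, num_classes):
    """cm[i, j] counts examples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 0, 2])
cm_toy = confusion_matrix_sketch(y_true_toy, y_pred_toy, 3)
print(cm_toy)
```

Large off-diagonal entries between, say, the car and truck rows would support the edge-similarity explanation.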

Neural Network on image features

Earlier in this assignment we saw that training a two-layer neural network on raw pixels achieved better classification performance than linear classifiers on raw pixels. In this notebook we have seen that linear classifiers on image features outperform linear classifiers on raw pixels.

For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.
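When searching for hyperparameters that reach this accuracy, a common approach is to sample the learning rate and regularization strength on a log scale instead of a fixed grid. The sketch below only generates candidate values; sample_log_uniform and the ranges shown are assumptions for illustration, not values recommended by the assignment.

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    """Draw values whose base-10 logarithm is uniform between low and high."""
    return 10.0 ** rng.uniform(np.log10(low), np.log10(high), size)

rng = np.random.RandomState(0)
candidate_lrs = sample_log_uniform(1e-3, 1e0, 5, rng)    # hypothetical range
candidate_regs = sample_log_uniform(1e-5, 1e-1, 5, rng)  # hypothetical range

for lr, reg in zip(candidate_lrs, candidate_regs):
    # Each pair would be passed to a fresh network's train call and scored
    # on the validation set, never on the test set.
    print('lr %.2e reg %.2e' % (lr, reg))
```

Sampling in log space spreads the trials evenly across orders of magnitude, which matters because both parameters act multiplicatively.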



In [7]:

    
print(X_train_feats.shape)

(49000, 155)



In [8]:

    
from cs231n.classifiers.neural_net import TwoLayerNet

input_dim = X_train_feats.shape[1]
hidden_dim = 50
num_classes = 10

net = TwoLayerNet(input_dim, hidden_dim, num_classes)
best_net = None

################################################################################
# TODO: Train a two-layer neural network on image features. You may want to    #
# cross-validate various parameters as in previous sections. Store your best   #
# model in the best_net variable.                                              #
################################################################################
## Identical to visualization code above
def visualize(stats):
    plt.subplot(2, 1, 1)
    plt.plot(stats['loss_history'])
    plt.title('Loss history')
    plt.xlabel('Iteration')
    plt.ylabel('Loss')

    plt.subplot(2, 1, 2)
    plt.plot(stats['train_acc_history'], label='train')
    plt.plot(stats['val_acc_history'], label='val')
    plt.title('Classification accuracy history')
    plt.xlabel('Epoch')
    plt.ylabel('Classification accuracy')
    plt.legend()  # show the 'train' and 'val' labels set above
    plt.show()

## Train the network
stats = net.train(X_train_feats, y_train, X_val_feats, y_val,
            num_iters=15000, batch_size=200,
            learning_rate=1e-3, learning_rate_decay=0.91,
            reg=0.028, verbose=True)

best_net = net

## Best accuracy on the validation set
stats['best_val_acc'] = max(stats['val_acc_history'])

visualize(stats)
    
test_acc = np.mean(net.predict(X_test_feats) == y_test)
print('Validation accuracy: %f Test accuracy: %f' % (stats['best_val_acc'], test_acc))
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-e75a0bbd57a5> in <module>()
     33             num_iters=15000, batch_size=200,
     34             learning_rate=1e-3, learning_rate_decay=0.91,
---> 35             reg=0.028, verbose=True)
     36 
     37 best_net = net

/home/lenovo/Desktop/ass/CS231n/assignment1/cs231n/classifiers/neural_net.pyc in train(self, X, y, X_val, y_val, learning_rate, learning_rate_decay, reg, num_iters, batch_size, verbose)
    192 
    193       # Compute loss and gradients using the current minibatch
--> 194       loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
    195       loss_history.append(loss)
    196 

/home/lenovo/Desktop/ass/CS231n/assignment1/cs231n/classifiers/neural_net.pyc in loss(self, X, y, reg)
     66     W1, b1 = self.params['W1'], self.params['b1']
     67     W2, b2 = self.params['W2'], self.params['b2']
---> 68     N, D = X.shape
     69 
     70     # Compute the forward pass

ValueError: too many values to unpack



In [ ]:

    
# Run your neural net classifier on the test set. You should be able to
# get more than 55% accuracy.

test_acc = (net.predict(X_test_feats) == y_test).mean()
print(test_acc)

Bonus: Design your own features!

You have seen that simple image features can improve classification performance. So far we have tried HOG and color histograms, but other types of features may be able to achieve even better classification performance.

For bonus points, design and implement a new type of feature and use it for image classification on CIFAR-10. Explain how your feature works and why you expect it to be useful for image classification. Implement it in this notebook, cross-validate any hyperparameters, and compare its performance to the HOG + Color histogram baseline.
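As one possible starting point, here is a sketch of a very simple extra feature: the mean grayscale intensity in each cell of a coarse grid, which keeps rough spatial layout that a global color histogram discards. The name coarse_gray_feature and the 4 x 4 grid are hypothetical choices, and whether this feature helps on CIFAR-10 is untested here.

```python
import numpy as np

def coarse_gray_feature(img, cells=4):
    """Mean grayscale intensity in each cell of a cells x cells grid."""
    # Plain channel average, not a perceptually weighted luminance.
    gray = np.asarray(img, dtype=np.float64).mean(axis=2)
    h, w = gray.shape
    ch, cw = h // cells, w // cells
    gray = gray[:ch * cells, :cw * cells]       # drop any ragged border
    blocks = gray.reshape(cells, ch, cells, cw)
    return blocks.mean(axis=(1, 3)).ravel()

# Toy 32 x 32 image with a white top-left quadrant.
toy_img = np.zeros((32, 32, 3))
toy_img[:16, :16, :] = 255.0
coarse_feat = coarse_gray_feature(toy_img)
print(coarse_feat.reshape(4, 4))
```

Because it takes a single image and returns a vector, a function like this could be appended to the feature_fns list used earlier and compared against the HOG + color histogram baseline on the validation set.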

Bonus: Do something extra!

Use the material and code we have presented in this assignment to do something interesting. Was there another question we should have asked? Did any cool ideas pop into your head as you were working on the assignment? This is your chance to show off!