In [1]:
```python
# A bit of setup
import numpy as np
import matplotlib.pyplot as plt
from time import time

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
```

The TinyImageNet dataset is a subset of the ILSVRC-2012 classification dataset. It consists of 200 object classes, and for each object class it provides 500 training images, 50 validation images, and 50 test images. All images have been downsampled to 64x64 pixels. We have provided the labels for all training and validation images, but have withheld the labels for the test images.

We have further split the full TinyImageNet dataset into two equal pieces, each with 100 object classes. We refer to these datasets as TinyImageNet-100-A and TinyImageNet-100-B.

To download the data, go into the `cs231n/datasets` directory and run the script `get_tiny_imagenet_splits.sh`. Then run the following code to load the TinyImageNet-100-A dataset into memory.

NOTE: The full TinyImageNet dataset will take up about 490MB of disk space, and loading the full TinyImageNet-100-A dataset into memory will use about 2.8GB of memory.
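The memory figure can be sanity-checked with a little arithmetic. The sketch below assumes three color channels and 32-bit float storage, neither of which is stated above; under those assumptions the estimate lands close to the quoted 2.8GB.

```python
# Back-of-the-envelope memory estimate for TinyImageNet-100-A.
# Assumptions (not stated in the text): 3 color channels, float32 pixels.
num_classes = 100
images_per_class = 500 + 50 + 50        # train + val + test images per class
values_per_image = 3 * 64 * 64          # channels x height x width
bytes_per_value = 4                     # float32 (assumed)

total_images = num_classes * images_per_class
total_bytes = total_images * values_per_image * bytes_per_value
total_gib = total_bytes / float(2 ** 30)

print('%d images, about %.2f GiB' % (total_images, total_gib))
# prints: 60000 images, about 2.75 GiB
```

If the loader stored pixels as 64-bit floats instead, the same calculation would give twice this amount.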

In [2]:
```python
from cs231n.data_utils import load_tiny_imagenet

tiny_imagenet_a = 'cs231n/datasets/tiny-imagenet-100-A'
class_names, X_train, y_train, X_val, y_val, X_test, y_test = load_tiny_imagenet(tiny_imagenet_a)

# Zero-mean the data using the mean image of the training set
mean_img = np.mean(X_train, axis=0)
X_train -= mean_img
X_val -= mean_img
X_test -= mean_img
```
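The zero-mean step can be illustrated on small synthetic arrays. This is a minimal sketch: the `*_toy` names and shapes are made up for illustration, and the `(N, C, H, W)` layout is an assumption inferred from the `transpose(1, 2, 0)` calls used elsewhere in this notebook.

```python
import numpy as np

# Synthetic stand-ins for the real arrays, in an assumed (N, C, H, W) layout.
rng = np.random.RandomState(0)
X_train_toy = rng.uniform(0, 255, size=(8, 3, 4, 4))
X_val_toy = rng.uniform(0, 255, size=(2, 3, 4, 4))

# The mean image is computed from the training split only ...
mean_img_toy = np.mean(X_train_toy, axis=0)   # shape (3, 4, 4)

# ... and subtracted (via broadcasting) from every split.
X_train_toy -= mean_img_toy
X_val_toy -= mean_img_toy

print(mean_img_toy.shape)
print(np.allclose(X_train_toy.mean(axis=0), 0.0))
```

After the subtraction the training split has a per-pixel mean of zero, while the validation split is only approximately centered because it was shifted by the training mean rather than its own.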

Since ImageNet is based on the WordNet ontology, each class in ImageNet (and TinyImageNet) actually has several different names. For example, "pop bottle" and "soda bottle" are both valid names for the same class. Run the following to see a list of all classes in TinyImageNet-100-A:

In [3]:
```python
for names in class_names:
    print ' '.join('"%s"' % name for name in names)
```

In [4]:
```python
# Visualize some examples of the training data
classes_to_show = 7
examples_per_class = 5

class_idxs = np.random.choice(len(class_names), size=classes_to_show, replace=False)
for i, class_idx in enumerate(class_idxs):
    train_idxs, = np.nonzero(y_train == class_idx)
    train_idxs = np.random.choice(train_idxs, size=examples_per_class, replace=False)
    for j, train_idx in enumerate(train_idxs):
        # Undo the mean subtraction and move channels last for display
        img = X_train[train_idx] + mean_img
        img = img.transpose(1, 2, 0).astype('uint8')
        plt.subplot(examples_per_class, classes_to_show, 1 + i + classes_to_show * j)
        if j == 0:
            plt.title(class_names[class_idx][0])
        plt.imshow(img)
        plt.gca().axis('off')
plt.show()
```
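The subplot index `1 + i + classes_to_show * j` in the cell above places class `i` in column `i` and example `j` in row `j`, since `plt.subplot` numbers grid positions from 1 in row-major order. A small sketch that prints the resulting index grid (the `grid` name is introduced here only for illustration):

```python
classes_to_show = 7
examples_per_class = 5

# grid[j][i] holds the 1-based subplot index used for example j of class i.
grid = [[1 + i + classes_to_show * j for i in range(classes_to_show)]
        for j in range(examples_per_class)]

for row in grid:
    print(' '.join('%2d' % idx for idx in row))
```

Each column of the printed grid corresponds to one class, which is why only the top row (`j == 0`) receives a title.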

Run the following to test your own classification performance on the TinyImageNet-100-A dataset.

You can run several times in `'train'` mode to get familiar with the task; once you are ready to test yourself, switch the mode to `'val'`.

You won't be penalized if you don't correctly classify all the images, but you should still try your best.

In [ ]:
```python
mode = 'train'

# Map every (lowercased) class name to its integer label
name_to_label = {n.lower(): i for i, ns in enumerate(class_names) for n in ns}

if mode == 'train':
    X, y = X_train, y_train
elif mode == 'val':
    X, y = X_val, y_val

num_correct = 0
num_images = 10
for i in xrange(num_images):
    idx = np.random.randint(X.shape[0])
    img = (X[idx] + mean_img).transpose(1, 2, 0).astype('uint8')
    plt.imshow(img)
    plt.gca().axis('off')
    plt.gcf().set_size_inches((2, 2))
    plt.show()

    # Keep asking until the user enters a valid class name
    got_name = False
    while not got_name:
        name = raw_input('Guess the class for the above image (%d / %d) : ' % (i + 1, num_images))
        name = name.lower()
        got_name = name in name_to_label
        if not got_name:
            print 'That is not a valid class name; try again'

    guess = name_to_label[name]
    if guess == y[idx]:
        num_correct += 1
        print 'Correct!'
    else:
        print 'Incorrect; it was actually %r' % class_names[y[idx]]

acc = float(num_correct) / num_images
print 'You got %d / %d correct for an accuracy of %f' % (num_correct, num_images, acc)
```
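The `name_to_label` dictionary above is what lets any synonym of a class count as a correct guess. A minimal sketch on a made-up class list (`toy_class_names` is hypothetical and only mirrors the list-of-name-lists format of `class_names`):

```python
# Hypothetical class list: each entry holds all valid names for one class.
toy_class_names = [
    ['pop bottle', 'soda bottle'],
    ['Egyptian cat'],
    ['school bus'],
]

# Every lowercased name maps to the index of its class.
toy_name_to_label = {n.lower(): i for i, ns in enumerate(toy_class_names) for n in ns}

print(toy_name_to_label['soda bottle'])       # 0
print(toy_name_to_label['egyptian cat'])      # 1
print('Egyptian cat' in toy_name_to_label)    # False: keys are lowercased
```

Because the keys are lowercased, the guessing loop also lowercases the user's input before looking it up.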

We have provided 10 pretrained ConvNets for the TinyImageNet-100-A dataset. Each of these models is a five-layer ConvNet with the architecture

[conv - relu - pool] x 3 - affine - relu - affine - softmax

All convolutional layers are 3x3 with stride 1 and all pooling layers are 2x2 with stride 2. The first two convolutional layers have 32 filters each, and the third convolutional layer has 64 filters. The hidden affine layer has 512 neurons. You can run the forward and backward pass for these five-layer ConvNets using the function `five_layer_convnet` in the file `cs231n/classifiers/convnet.py`.
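To get a feel for the size of this architecture, the sketch below walks the layer shapes and counts parameters. It relies on assumptions that the text does not state: the convolutions are padded so that they preserve spatial size, the input has 3 channels, and every filter and affine unit has one bias term. The helper names are introduced here for illustration only.

```python
def conv_params(in_channels, num_filters, size=3):
    # Weights plus one bias per filter (bias is an assumption).
    return num_filters * in_channels * size * size + num_filters

def affine_params(in_dim, out_dim):
    # Weight matrix plus one bias per output unit (bias is an assumption).
    return in_dim * out_dim + out_dim

channels, spatial = 3, 64          # assumed 3-channel 64x64 input
total = 0
for num_filters in (32, 32, 64):   # [conv - relu - pool] x 3
    total += conv_params(channels, num_filters)
    channels = num_filters
    spatial //= 2                  # 2x2 pooling with stride 2 halves each side

flat_dim = channels * spatial * spatial
total += affine_params(flat_dim, 512)   # hidden affine layer
total += affine_params(512, 100)        # output layer, one score per class

print('flattened features: %d' % flat_dim)
print('total parameters: %d' % total)
```

Under these assumptions the spatial size goes 64, 32, 16, 8, and almost all of the parameters sit in the first affine layer.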

Each of these models was trained for 25 epochs over the TinyImageNet-100-A training data with a batch size of 50 and with dropout on the hidden affine layer. Each model was trained using slightly different values for the learning rate, regularization, and dropout probability.

To download the pretrained models, go into the `cs231n/datasets` directory and run the `get_pretrained_models.sh` script. Once you have done so, run the following to load the pretrained models into memory.

NOTE: The pretrained models will take about 245MB of disk space.

In [5]:
```python
from cs231n.data_utils import load_models

models_dir = 'cs231n/datasets/tiny-100-A-pretrained'

# models is a dictionary mapping model names to models.
# Like the previous assignment, each model is a dictionary mapping parameter
# names to parameter values.
models = load_models(models_dir)
```

In [6]:
```python
from cs231n.classifiers.convnet import five_layer_convnet

# Dictionary mapping model names to their predicted class probabilities on the
# validation set. model_to_probs[model_name] is an array of shape (N_val, 100)
# where model_to_probs[model_name][i, j] = p indicates that models[model_name]
# predicts that X_val[i] has class j with probability p.
model_to_probs = {}

################################################################################
# TODO: Use each model to predict classification probabilities for all images #
# in the validation set. Store the predicted probabilities in the #
# model_to_probs dictionary as above. To compute forward passes and compute #
# probabilities, use the function five_layer_convnet in the file #
# cs231n/classifiers/convnet.py. #
# #
# HINT: Trying to predict on the entire validation set all at once will use a #
# ton of memory, so you should break the validation set into batches and run #
# each batch through each model separately. #
################################################################################
import math

batch_size = 100
# Use float division so that a final partial batch is not dropped.
num_batches = int(math.ceil(X_val.shape[0] / float(batch_size)))

for model_name, model in models.items():
    batch_probs = []
    for i in range(num_batches):
        X_batch = X_val[i * batch_size:(i + 1) * batch_size]
        # Labels are not needed to compute probabilities, so pass None for y.
        batch_probs.append(five_layer_convnet(X_batch, model, None, return_probs=True))
    # Stack the per-batch outputs back into one (N_val, 100) array.
    model_to_probs[model_name] = np.concatenate(batch_probs, axis=0)
################################################################################
# END OF YOUR CODE #
################################################################################

# Compute and print the accuracy for each model.
for model_name, probs in model_to_probs.iteritems():
    acc = np.mean(np.argmax(probs, axis=1) == y_val)
    print '%s got accuracy %f' % (model_name, acc)
```
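The batching idea from the hint can be checked in isolation. The sketch below uses a made-up stand-in for a network (`toy_predict`, a random linear layer followed by a softmax) rather than `five_layer_convnet`, and `predict_in_batches` is a hypothetical helper, not part of the `cs231n` package. It shows that concatenating per-batch outputs reproduces the full-set result, including when the last batch is smaller than the others.

```python
import numpy as np

def predict_in_batches(predict_fn, X, batch_size=100):
    """Apply predict_fn to X in chunks and stack the per-batch outputs."""
    outputs = []
    for start in range(0, X.shape[0], batch_size):
        outputs.append(predict_fn(X[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)

# Hypothetical stand-in for a network: random linear scores plus softmax.
rng = np.random.RandomState(0)
W = rng.randn(12, 5)

def toy_predict(X):
    scores = X.reshape(X.shape[0], -1).dot(W)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

X_toy = rng.randn(250, 3, 2, 2)   # 250 is not a multiple of the batch size
probs_batched = predict_in_batches(toy_predict, X_toy, batch_size=100)
probs_full = toy_predict(X_toy)

print(probs_batched.shape)
print(np.allclose(probs_batched, probs_full))
```

Stepping through `range(0, N, batch_size)` handles the trailing partial batch automatically, because slicing past the end of an array simply returns the remaining rows.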

A simple way to implement an ensemble of models is to average the predicted probabilities for each model in the ensemble.

More concretely, suppose we have $k$ models $m_1,\ldots,m_k$ and we want to combine them into an ensemble. If $p(x=y_i \mid m_j)$ is the probability that the input $x$ is classified as $y_i$ under model $m_j$, then the ensemble predicts

$$p(x=y_i \mid \{m_1,\ldots,m_k\}) = \frac1k\sum_{j=1}^kp(x=y_i\mid m_j)$$

In the cell below, implement this simple ensemble method by filling in the `compute_ensemble_preds` function. The ensemble of all 10 models should perform much better than the best individual model.
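As a small worked instance of the averaging formula, consider three models, two data points, and three classes. All numbers below are invented for illustration.

```python
import numpy as np

# probs_mj[i, c]: probability that model j assigns class c to data point i.
probs_m1 = np.array([[0.6, 0.3, 0.1],
                     [0.5, 0.4, 0.1]])
probs_m2 = np.array([[0.2, 0.7, 0.1],
                     [0.1, 0.8, 0.1]])
probs_m3 = np.array([[0.5, 0.4, 0.1],
                     [0.3, 0.6, 0.1]])

# Average over the model axis, then pick the most probable class per point.
ensemble_probs = np.mean([probs_m1, probs_m2, probs_m3], axis=0)
ensemble_pred = np.argmax(ensemble_probs, axis=1)

print(ensemble_probs)
print(ensemble_pred)   # [1 1]
```

For the first data point two of the three models individually prefer class 0, yet the averaged probabilities favor class 1 (about 0.47 versus 0.43) because the second model is much more confident. Averaging probabilities therefore weighs confidence, which a plain majority vote over the individual predictions would not.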

In [7]:
```python
def compute_ensemble_preds(probs_list):
    """
    Use the predicted class probabilities from different models to implement
    the ensembling method described above.

    Inputs:
    - probs_list: A list of numpy arrays, where each gives the predicted class
      probabilities under some model. In other words,
      probs_list[j][i, c] = p means that the jth model in the ensemble thinks
      that the ith data point has class c with probability p.

    Returns:
    An array y_pred_ensemble of ensembled predictions, such that
    y_pred_ensemble[i] = c means that the ensemble predicts that the ith data
    point has class c.
    """
    y_pred_ensemble = None
    ############################################################################
    # TODO: Implement this function. Store the ensemble predictions in #
    # y_pred_ensemble. #
    ############################################################################
    # Average the per-model probabilities over the model axis, then take the
    # most probable class for each data point.
    mean_probs = np.mean(probs_list, axis=0)
    y_pred_ensemble = np.argmax(mean_probs, axis=1)
    ############################################################################
    # END OF YOUR CODE #
    ############################################################################
    return y_pred_ensemble

# Combine all models into an ensemble and make predictions on the validation set.
# This should be significantly better than the best individual model.
print np.mean(compute_ensemble_preds(model_to_probs.values()) == y_val)
```

Using our 10 pretrained models, we can form many different ensembles of different sizes. More precisely, if we have $n$ models and we want to form an ensemble of $k$ models, then there are $\binom{n}{k}$ possible ensembles that we can form, where

$$\binom{n}{k} = \frac{n!}{(n-k)!k!}$$

We can use these different possible ensembles to study the effect of ensemble size on ensemble performance.

In the cell below, compute the validation set accuracy of all possible ensembles of our 10 pretrained models. Produce a scatter plot with "ensemble size" on the horizontal axis and "validation set accuracy" on the vertical axis. Your plot should have a total of

$$\sum_{k=1}^{10} \binom{10}{k}$$

points corresponding to all possible ensembles of the 10 pretrained models.

You should be able to compute the validation set predictions of these ensembles without computing any more forward passes through any of the networks.
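The number of points in the plot can be worked out ahead of time. The sketch below counts the subsets of each size with `itertools.combinations` and compares the total against $2^{10} - 1$, the number of non-empty subsets of a 10-element set.

```python
import itertools

n = 10
# counts[k - 1] is the number of distinct ensembles of size k.
counts = [sum(1 for _ in itertools.combinations(range(n), k))
          for k in range(1, n + 1)]
total = sum(counts)

print(counts)   # [10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print('total ensembles: %d (2**n - 1 = %d)' % (total, 2 ** n - 1))
```

With only 1023 ensembles, and the per-model probabilities already cached, evaluating every one of them amounts to averaging and comparing small arrays.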

In [8]:
```python
################################################################################
# TODO: Create a plot comparing ensemble size with ensemble performance as #
# described above. #
# #
# HINT: Look up the function itertools.combinations. #
################################################################################
import itertools

ensemble_sizes = []
val_accs = []
all_probs = model_to_probs.values()
# Reuse the cached probabilities: no further forward passes are needed.
for k in range(1, len(all_probs) + 1):
    for combination in itertools.combinations(all_probs, k):
        ensemble_sizes.append(k)
        y_pred_ensemble = compute_ensemble_preds(combination)
        val_accs.append(np.mean(y_pred_ensemble == y_val))

plt.scatter(ensemble_sizes, val_accs)
plt.title('Ensemble size vs Performance')
plt.xlabel('ensemble size')
plt.ylabel('validation set accuracy')
################################################################################
# END OF YOUR CODE #
################################################################################
```
