After running our Pylearn2 models, it's probably not best to compare them on the score they get on the validation set, as that set is used in the training process and so could be a victim of overfitting. It would be better to run the model over the test set, which is supposed to be a holdout set used to compare models. We could rerun all our models with a monitor on this value, but for models we've already run, it would be more useful to be able to pull this value out of the existing pickle.
This is likely to be wasted effort, because it seems like the kind of thing that should already exist in Pylearn2. Unfortunately, since I can't find it and it seems fairly simple to implement, I'm just going to go ahead and write it.
Hopefully, this will also help us figure out what's going wrong with some submissions that turn out to be incredibly bad: for example, those using augmentation.
In [1]:
import pylearn2.utils
import pylearn2.config
import theano
import neukrill_net.dense_dataset
import neukrill_net.utils
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import holoviews as hl
%load_ext holoviews.ipython
In [2]:
cd ..
In [232]:
settings = neukrill_net.utils.Settings("settings.json")
run_settings = neukrill_net.utils.load_run_settings(
"run_settings/alexnet_based_norm_global_8aug.json", settings, force=True)
In [233]:
%%time
# loading the model
model = pylearn2.utils.serial.load(run_settings['pickle abspath'])
In [234]:
reload(neukrill_net.dense_dataset)
Out[234]:
In [235]:
%%time
# loading the data
dataset = neukrill_net.dense_dataset.DensePNGDataset(settings_path=run_settings['settings_path'],
    run_settings=run_settings['run_settings_path'],
    train_or_predict='train',
    training_set_mode='test', force=True)
In [236]:
# find allowed batch size over 1000 (want big batches)
# (Theano has to have fixed batch size and we don't want leftover)
batch_size = 1000
while dataset.X.shape[0] % batch_size != 0:
    batch_size += 1
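An alternative, which we don't use here, would be to keep the round batch size and pad the final batch with zeros instead, trimming the padding off the predictions afterwards. A minimal sketch (the pad_to_multiple helper is just for illustration, it isn't in our codebase):

def pad_to_multiple(X, batch_size=1000):
    # pad X with zero rows so its length divides exactly into batches;
    # return the real row count so predictions can be trimmed afterwards
    n_real = X.shape[0]
    leftover = n_real % batch_size
    if leftover:
        pad = np.zeros((batch_size - leftover,) + X.shape[1:], dtype=X.dtype)
        X = np.vstack([X, pad])
    return X, n_real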
In [237]:
n_batches = int(dataset.X.shape[0]/batch_size)
In [238]:
%%time
# set this batch size
model.set_batch_size(batch_size)
# compile Theano function
X = model.get_input_space().make_batch_theano()
Y = model.fprop(X)
f = theano.function([X],Y)
The following is the same as the code in test.py that applies the model to the data in batches.
In [239]:
%%time
y = np.zeros((dataset.X.shape[0], len(settings.classes)))
for i in xrange(n_batches):
    print("Batch {0} of {1}".format(i+1, n_batches))
    # slice out this batch of flat examples
    x_arg = dataset.X[i*batch_size:(i+1)*batch_size, :]
    # convolutional input spaces expect a topological view, not a flat design matrix
    if X.ndim > 2:
        x_arg = dataset.get_topological_view(x_arg)
    y[i*batch_size:(i+1)*batch_size, :] = f(x_arg.astype(X.dtype).T)
In [240]:
# plot the locations of exact zeros in the predicted probabilities
plt.scatter(np.where(y == 0)[1], np.where(y == 0)[0])
Out[240]:
In [241]:
import sklearn.metrics
In [242]:
sklearn.metrics.log_loss(dataset.y,y)
Out[242]:
In test.py we take the least intelligent approach to averaging over the different augmented versions. Basically, we just assume that, whatever the augmentation factor is, the labels repeat with that step size, so we can collapse each block of af predictions into a single row of probabilities.
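To make that concrete, here's a toy sketch with an augmentation factor of 2: each image's predictions sit in a consecutive block of rows, and we average each block.

# two images, two augmented copies each, three classes
af_toy = 2
y_toy = np.array([[0.8, 0.1, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.4, 0.1, 0.5]])
collapsed = np.array([y_toy[low:low+af_toy].mean(axis=0)
                      for low in range(0, y_toy.shape[0], af_toy)])
# collapsed is [[0.7, 0.2, 0.1], [0.3, 0.15, 0.55]]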
First, we should check that assumption:
In [245]:
# augmentation factor
af = 8
In [246]:
# check the labels really do repeat in blocks of af
for low, high in zip(range(0, dataset.y.shape[0], af),
                     range(af, dataset.y.shape[0]+af, af)):
    first = dataset.y[low][0]
    if any(first != i for i in dataset.y[low:high].ravel()):
        print("Labels do not match at:", (low, high))
        break
In [247]:
y_collapsed = np.zeros((int(dataset.X.shape[0]/af), len(settings.classes)))
for i, (low, high) in enumerate(zip(range(0, dataset.y.shape[0], af),
                                    range(af, dataset.y.shape[0]+af, af))):
    # average the predictions over the af augmented copies of each image
    y_collapsed[i, :] = np.mean(y[low:high, :], axis=0)
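Since the rows come in consecutive blocks of af, the loop above should be equivalent to a single reshape-and-mean:

y_collapsed_vectorised = y.reshape(-1, af, len(settings.classes)).mean(axis=1)
assert np.allclose(y_collapsed, y_collapsed_vectorised)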
In [248]:
# plot the locations of exact zeros again, after collapsing
plt.scatter(np.where(y_collapsed == 0)[1], np.where(y_collapsed == 0)[0])
Out[248]:
There are no zeros in there now! That matters because exact zeros on the true class are what the log loss penalises most heavily.
In [249]:
# take one label per image: every af-th row
labels_collapsed = dataset.y[range(0, dataset.y.shape[0], af)]
In [250]:
labels_collapsed.shape
Out[250]:
In [251]:
sklearn.metrics.log_loss(labels_collapsed,y_collapsed)
Out[251]:
That's pretty much exactly what we got on the leaderboard.
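For future models, the whole procedure could be wrapped up into a helper function. A minimal sketch, assuming the same consecutive-block layout of augmented copies (the name holdout_log_loss and its signature are ours, not part of Pylearn2 or our codebase):

def holdout_log_loss(pickle_path, dataset, n_classes, af=1):
    # load the pickled model and find a batch size that divides the data
    model = pylearn2.utils.serial.load(pickle_path)
    batch_size = 1000
    while dataset.X.shape[0] % batch_size != 0:
        batch_size += 1
    model.set_batch_size(batch_size)
    # compile the forward pass
    X = model.get_input_space().make_batch_theano()
    f = theano.function([X], model.fprop(X))
    # predict in fixed-size batches
    y = np.zeros((dataset.X.shape[0], n_classes))
    for i in xrange(dataset.X.shape[0]//batch_size):
        x_arg = dataset.X[i*batch_size:(i+1)*batch_size, :]
        if X.ndim > 2:
            x_arg = dataset.get_topological_view(x_arg)
        y[i*batch_size:(i+1)*batch_size, :] = f(x_arg.astype(X.dtype).T)
    # average over augmented copies and take one label per image
    y = y.reshape(-1, af, n_classes).mean(axis=1)
    return sklearn.metrics.log_loss(dataset.y[::af], y)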