WNixalo 2018/5/19-20;25-27
Making sure I have a working baseline for the MNIST dataset. See forum thread for motivation. PyTorch version: 0.3.1.post2
For a walkthrough on converting binary IDX files to NumPy arrays, see idx-to-numpy.ipynb
For a walkthrough debugging several issues with dataloading, see mnist-dataloader-issue.ipynb
This notebook is in large part a practice stage for a research-oriented work flow.
In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2
In [2]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from pathlib import Path
import os
import struct # for IDX conversion
import gzip # for IDX conversion
from urllib.request import urlretrieve # for IDX conversion
from fastai.conv_learner import * # if you want to use fastai Learner
In [3]:
PATH = Path('data/mnist')
In [4]:
bs = 64
sz = 28
In [358]:
def plot_loss(learner, val=None):
"""Plots iterations vs loss and learning rate. Plots training or validation."""
lrs = learner.sched.lrs
x_axis = range(len(lrs))
loss = learner.sched.losses
min_loss = min(loss)
fig,ax = plt.subplots(figsize=(14,7))
ax.set_xlim(left=-20, right=x_axis[-1]+20)
ax.plot(x_axis, loss, label='loss')
ax.plot(x_axis, lrs, label='learning rate', color='firebrick');
ax.set_xlabel('Iterations')
ax.set_ylabel('Loss & LR')
# Validation Loss
if val is not None:
ep_end = len(lrs) // len(val)
ax.scatter(range(ep_end-1, len(lrs), ep_end), val, c='r', s=20, label='val loss')
# Minimum Loss
ax.axhline(y=min_loss, c='r', alpha=0.9, label='Min loss', lw=0.5)
idx = np.argmin(loss)
yscal = 1 / (ax.get_ylim()[1] - ax.get_ylim()[0])
yrltv = (min_loss - ax.get_ylim()[0]) * yscal
ax.axvline(x=x_axis[idx], ymin=0.5*yrltv, ymax=1.5*yrltv, c='r', alpha=0.9, lw=0.5)
# 150% Minimum Loss
idx = np.where(np.array(loss) <= 1.5*min_loss)[0]
idx = idx[0] if len(idx != 0) else None
if idx is not None: ax.axvline(x=x_axis[idx], c='slateblue', alpha=0.9, label='50% above Min Loss', lw=0.5)
# 50% Maximum Loss
idx = np.where(np.array(loss) <= 0.5*max(loss))[0]
idx = idx[0] if len(idx != 0) else None
if idx is not None: ax.axvline(x=x_axis[idx], c='teal', alpha=0.9, label='50% of Max Loss', lw=0.5)
fig.legend(bbox_to_anchor=(0.82,0.82), loc="upper right")
The basic method for creating a DataLoader in PyTorch. Adapted from their tutorial and an older notebook.
In [6]:
# torchvision datasets are PIL.Image images of range [0,1]. Must trsfm them
# to Tensors of normalized range [-1,1]
transform = torchvision.transforms.Compose(
[torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
In [7]:
# see: https://gist.github.com/kevinzakka/d33bf8d6c7f06a9d8c76d97a7879f5cb
# frm: https://github.com/pytorch/pytorch/issues/1106
trainset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,
transform=transform)
validset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,
transform=transform)
testset = torchvision.datasets.MNIST(root=PATH, train=False, download=True,
transform=transform)
p_val = 0.15
n_val = int(p_val * len(trainset))
idxs = np.arange(len(trainset))
np.random.shuffle(idxs)
train_idxs, valid_idxs = idxs[n_val:], idxs[:n_val]
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idxs)
valid_sampler = torch.utils.data.sampler.SequentialSampler(valid_idxs)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=bs,
sampler=train_sampler, num_workers=2)
validloader = torch.utils.data.DataLoader(validset, batch_size=bs,
sampler=valid_sampler, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=bs, num_workers=2)
In [8]:
classes = [str(i) for i in range(10)]; classes
Out[8]:
The FastAI DataLoader shares some similarities in construction with the PyTorch one. The logic defining pytorch's DataLoader in the PyTorch source code:
if batch_sampler is None:
if sampler is None:
if shuffle:
sampler = RandomSampler(dataset)
else:
sampler = SequentialSampler(dataset)
batch_sampler = BatchSampler(sampler, batch_size, drop_last)
is the same as that in fast.ai's
if batch_sampler is None:
if sampler is None:
sampler = RandomSampler(dataset) if shuffle else SequentialSampler(dataset)
batch_sampler = BatchSampler(sampler, batch_size, drop_last)
So now I'm not confused about not using a batch sampler when building a pytorch dataloader, although I see one in fastai's DataLoader –– that's because pytorch does it too.
This loads and converts the MNIST IDX files into NumPy arrays. For MNIST data this looks to be about 45 MB for the images. This way allows for easy use of FastAI's ModelData class, and thus its (extremely useful) Learner abstraction and all other capabilities that come with it. The arrays can be loaded via: ImageClassifierData.from_arrays(..)
In [9]:
def download_mnist(path=Path('data/mnist')):
os.makedirs(path, exist_ok=True)
urls = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz',]
for url in urls:
fname = url.split('/')[-1]
if not os.path.exists(path/fname): urlretrieve(url, path/fname)
def read_IDX(fname):
"""see: https://gist.github.com/tylerneylon/ce60e8a06e7506ac45788443f7269e40"""
with gzip.open(fname) as f:
zero, data_type, dims = struct.unpack('>HBB', f.read(4))
shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dims))
return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)
In [10]:
download_mnist()
In [11]:
fnames = [o for o in os.listdir(PATH) if 'ubyte.gz' in o] # could just use glob
fnames
Out[11]:
In [12]:
# thanks to: https://stackoverflow.com/a/14849322
trn_x_idx = [i for i,s in enumerate(fnames) if 'train-imag' in s][0]
trn_y_idx = [i for i,s in enumerate(fnames) if 'train-lab' in s][0]
# test data:
tst_x_idx = [i for i,s in enumerate(fnames) if 't10k-imag' in s][0]
tst_y_idx = [i for i,s in enumerate(fnames) if 't10k-lab' in s][0]
In [13]:
# load entire IDX files into memory as ndarrays
train_x_array = read_IDX(PATH/fnames[trn_x_idx])
train_y_array = read_IDX(PATH/fnames[trn_y_idx])
# test data:
test_x_array = read_IDX(PATH/fnames[tst_x_idx])
test_y_array = read_IDX(PATH/fnames[tst_y_idx])
In [14]:
# size of numpy arrays in MBs
train_x_array.nbytes / 2**20, train_y_array.nbytes / 2**20
Out[14]:
inception_stats
have the same Normalization that the pytorch transform above uses for its dataloader. I don't do any data augmentation besides that normalization. I also use the same train/val indices from the pytorch dataloader – to ensure my pytorch model and fastai learner are working on the same data.
Additionally in order to use pretrained models I'm going to concatenate the dataset to have 3 channels instead of 1 by copying dimensions. Another option is to forego a pretrained model and use a fresh resnet set to have only 1 input channel.
In [15]:
tfms = tfms_from_stats(inception_stats, sz=sz)
# `inception_stats` are: ([0.5,0.5,0.5],[0.5,0.5,0.5])
# see: https://github.com/fastai/fastai/blob/master/fastai/transforms.py#L695
In [16]:
# using same trn/val indices as pytorch dataloader
valid_x_array, valid_y_array = train_x_array[valid_idxs], train_y_array[valid_idxs]
train_x_array, train_y_array = train_x_array[train_idxs], train_y_array[train_idxs]
In [17]:
# stack dims for 3 channels
train_x_array = np.stack((train_x_array, train_x_array, train_x_array), axis=-1)
valid_x_array = np.stack((valid_x_array, valid_x_array, valid_x_array), axis=-1)
test_x_array = np.stack((test_x_array, test_x_array, test_x_array), axis=-1)
# convert labels to np.int8
train_y_array = train_y_array.astype(np.int8)
valid_y_array = valid_y_array.astype(np.int8)
test_y_array = test_y_array.astype(np.int8)
In [18]:
model_data = ImageClassifierData.from_arrays(PATH,
(train_x_array, train_y_array), (valid_x_array, valid_y_array),
bs=bs, tfms=tfms, num_workers=2, test=(test_x_array, test_y_array))
I want to have a "solid" simple ConvNet to use throughout these experiments. This model will include a large field-of-view input conv layer followed by several conv layers. Each conv layer uses BatchNorm and Leaky ReLU (I don't know if this is better than ReLU, but it sounds like a good'ish idea to me). The model's head uses an AdaptiveConcat Pooling layer (Fast AI invention that concatenates two adaptive average and max pooling layers) leading to a Linear layer. This model doesn't use dropout (I'll add that if it looks like it needs it).
In [19]:
class AdaptiveConcatPool2d(nn.Module):
"""fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py"""
def __init__(self, sz=None):
super().__init__()
sz = sz or (1,1)
self.ap = torch.nn.AdaptiveAvgPool2d(sz)
self.mp = torch.nn.AdaptiveAvgPool2d(sz)
def forward(self, x):
return torch.cat([self.mp(x), self.ap(x)], 1)
class Flatten(nn.Module):
"""fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py"""
def __init__(self):
super().__init__()
def forward(self, x):
return x.view(x.size(0), -1)
In [20]:
class ConvBNLayer(nn.Module):
"""conv layer with batchnorm"""
def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0):
super().__init__()
self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride)
self.bn = nn.BatchNorm2d(ch_out, momentum=0.1) # mom at default 0.1
self.lrelu = nn.LeakyReLU(0.01, inplace=True) # neg slope at default 0.01
def forward(self, x): return self.lrelu(self.bn(self.conv(x)))
class ConvNet(nn.Module):
# see ref: https://github.com/fastai/fastai/blob/master/fastai/models/darknet.py
def __init__(self, ch_in=1):
super().__init__()
self.conv0 = ConvBNLayer(ch_in, 16, kernel_size=7, stride=1, padding=2) # large FoV Conv
self.conv1 = ConvBNLayer(16, 32)
self.conv2 = ConvBNLayer(32, 64)
self.conv3 = ConvBNLayer(64, 128)
self.neck = nn.Sequential(*[AdaptiveConcatPool2d(1), Flatten()])
self.head = nn.Sequential(*[nn.BatchNorm2d(256),
nn.Dropout(p=0.25),
nn.Linear(256, 10)])
def forward(self, x):
x = self.conv0(x)
x = self.conv1(x)
x = self.conv2(x)
x = self.conv3(x)
x = self.neck(x)
x = self.head(x)
return F.log_softmax(x, dim=-1)
In [21]:
convnet = ConvNet()
In [216]:
x,y = next(iter(trainloader))
x,y = Variable(x), Variable(y)
convnet(x)
I'll use two fast.ai learners: the basic convnet defined above that the pytorch model will also use, and a resnet18. I'll also use an ImageNet-pretrained resnet18 to see if that helps at all. If .pretrained
is not called, you will need to either use ConvnetBuilder
or define a custom head yourself. NOTE also that the standard pytorch ResNet model has a 7x7 ouput pooling layer by default, which may restrict your model's performance if it's not replaced (such as with ConvnetBuilder).
The non-pretrained learner's will need their conv layers unfrozen to train them.
In [22]:
model_data.c, model_data.is_multi, model_data.is_reg
Out[22]:
In [23]:
resnet_model = ConvnetBuilder(resnet18, model_data.c, model_data.is_multi, model_data.is_reg, pretrained=False)
resnet_learner = ConvLearner(model_data, resnet_model)
custom_learner = ConvLearner.from_model_data(ConvNet(ch_in=3), model_data)
pt_res_learner = ConvLearner.pretrained(resnet18, model_data, metrics=[accuracy]) ## NOTE: metrics=[accuracy] not needed - is default
Again, the learners' conv layers are initially frozen:
In [63]:
True in [[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]
Out[63]:
By default only the 'head' classification layer is trainable:
In [64]:
[[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]
Out[64]:
Construct the custom learner with ConvnetBuilder in order to make it's layers iterable:
In [66]:
[[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]
In [73]:
custom_learner.models
Out[73]:
In [74]:
resnet_learner.models
Out[74]:
In [78]:
# custom_learner
In [76]:
# resnet_learner
In [77]:
# pt_res_learner
I'll be comparing 4 models:
convnet
a 1-input channel custom CNN trained in straight PyTorchcustom_learner
a 3-input channel custom CNN trained with Fast AIresnet_learner
a 3-input channel fresh ResNet18 trained with Fast AIpt_res_learner
a 3-input channel pretrained (ImageNet) ResNet18 trained with Fast AI.Perhaps it'd be a good idea to replace the fresh ResNet18's input layer with a 1-channel input to compare it directly to the custom CNN. That's for a future run if I or anyone chooses to do so.
Do nn.functional.
loss functions go in the architecture, and nn.
loss functions become criterion? Huh, interesting. It calls nn.functional.
.
In [24]:
criterion = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE
optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01, momentum=0.9)
The Fast.ai Learners:
In [25]:
custom_learner.crit
Out[25]:
In [26]:
resnet_learner.crit
Out[26]:
In [27]:
pt_res_learner.crit
Out[27]:
As far as I know, training in base PyTorch is tedious, so I'll do a sanity-check of it first, then do all my training with Fast AI. See ref: §4: Training or §9.1: Train ConvNet & ConvNetMod in this notebook.
There are ways to implement learning-rate scheduling and other advanced techniques in PyTorch – but by that point unless you're doing it for practice or testing a new module: that's what Fast.AI is for.
In [28]:
len(trainloader) # ceil(51,000 / bs) batches
Out[28]:
There are more improvements to doing train / valid phases – including learning rate scheduling and automatically saving best weights (see: pytorch tutorial) – but that's what fast.ai's for. I'll practice those in the future. Also since the FastAI library is pending an update to PyTorch 0.4, torch.set_grad_enabled
can't be used for inference mode. Instead I follow the advice on this pytorch forum thread. For now:
In [29]:
optimizer
Out[29]:
NOTE 1 the criterion and optimizer need to be initialized after the model is sent to the GPU if it is. See pytorch thread.
NOTE 2: Variable.volatile = True
can only be set immediately after a Variable is created. See pytorch thread. (this is for using a validation set and not affecting the gradients) – I got this error when trying to set .volatile=True
after sending the val data to GPU (torch.FloatTensor
$\rightarrow$ torch.cuda.FloatTensor
)
In [30]:
def train(model=None, crit=None, trainloader=None, valloader=None, num_epochs=1, verbose=True):
# if verbose:
# displays = 5
# display_step = max(len(dataloader) // displays, 1)
t0 = time.time()
dataloaders = {'train':trainloader}
if valloader: dataloaders['valid'] = valloader
# model.to('cuda:0' if torch.cuda.is_available() else 'cpu') # pytorch >= 0.4
to_gpu(model)
criterion = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE
optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01, momentum=0.9)
# epoch w/ train & val phases
for epoch in range(num_epochs):
print(f'Epoch {epoch+1}/{num_epochs}\n{"-"*10}')
for phase in dataloaders:
running_loss = 0.0
running_correct = 0
for i,datum in enumerate(dataloaders[phase]):
inputs, labels = datum
inputs, labels = torch.autograd.Variable(inputs), torch.autograd.Variable(labels)
# zero param gradients
optimizer.zero_grad()
# (forward) track history if train
# with torch.set_grad_enabled(phase=='train'): # pytorch >= 0.4
if phase == 'valid': # pytorch 3.1 #
inputs.volatile=True #
labels.volatile=True #
# send data to gpu
inputs, labels = to_gpu(inputs), to_gpu(labels) # pytorch < 0.4
outputs = model(inputs) #
loss = crit(outputs, labels) #
_, preds= torch.max(outputs, 1) # for accuracy metric
#
# backward & optimize if train #
if phase == 'train': #
loss.backward() #
optimizer.step() # indent for pytorch >= 0.4
# stats
# pdb.set_trace()
running_loss += loss.data[0]
running_correct += torch.sum(preds == V(labels.data)) # wrap in V; pytorch 3.1
epoch_loss = running_loss / len(dataloaders[phase])
# if phase == 'valid': pdb.set_trace()
epoch_acc = float(running_correct.double() / len(dataloaders[phase])) # ? pytorch 3.1 reqs float conversion?
# pdb.set_trace()
print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
time_elapsed = time.time() - t0
print(f'Training Time {num_epochs} Epochs: {time_elapsed:.3f}s')
Manual PyTorch train / val training phases. See: pytorch tutorial
(forward) track history only if in train:
with torch.set_grad_enabled(False):
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
backward + optimize only if in training phase
if phase == 'train':
loss.backward()
optimizer.step()
NOTE: I think I'm doing something wrong with the validation phase. Saving. PyTorch Docs on Saving.
In [31]:
train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)
Previous run on CPU:
In [30]:
# train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)
In [32]:
torch.save(convnet.state_dict(), 'convnet_mnist_base.pth')
In [33]:
convnet.load_state_dict(torch.load('convnet_mnist_base.pth'))
To keep things simple, I won't be using 1-Cycle, Progressive Resizing, or much in the way of Cyclical Learning Rates. That could be a topic for later runs.
In [34]:
model_data.trn_ds.get1item(0)[1].dtype
Out[34]:
In [35]:
custom_learner.lr_find()
custom_learner.sched.plot()
In [36]:
custom_learner.sched.plot_lr()
In [67]:
# next(iter(model_data.get_dl(model_data.trn_ds, False)))
In [37]:
resnet_learner.lr_find()
resnet_learner.sched.plot()
In [38]:
pt_res_learner.lr_find()
pt_res_learner.sched.plot()
I'll use 1e-2
as the lr
for all of them.
In [39]:
lrs = 1e-2
In [40]:
# checking all conv layers are being trained:
[layer.trainable for layer in custom_learner.models.get_layer_groups()]
Out[40]:
In [41]:
%time custom_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)
Out[41]:
In [179]:
plot_metrics(custom_learner)
Just noticed this very useful feature. Even at very stripped-down settings, Fastai still 'revs' the learning rate up during train-start and back down before train-end:
In [45]:
custom_learner.sched.plot_lr()
In [50]:
[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]
Out[50]:
In [51]:
resnet_learner.unfreeze()
In [52]:
[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]
Out[52]:
In [53]:
%time resnet_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)
Out[53]:
In [180]:
plot_metrics(resnet_learner)
In [55]:
# only training classifier head
%time pt_res_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)
Out[55]:
In [199]:
# min(pt_res_learner.sched.losses)
pt_res_learner.sched.losses[-1]
Out[199]:
In [200]:
pt_res_learner.sched.val_losses
Out[200]:
In [181]:
plot_metrics(pt_res_learner)
In [182]:
x,y = next(iter(testloader)) # shape: ([64,1,28,28]; [64])
out = convnet(V(x)) # shape: ([64, 10])
In [183]:
_, preds = torch.max(out.data, 1)
In [184]:
list(zip(preds[:9], y[:9]))
Out[184]:
Cool, even with that little training it's able to get a lot right.
In [187]:
def test_pytorch(model, dataloader):
"""evaluation script. Returns tuple: (list of predictions, ratio correct)"""
correct = 0
total = 0
predictions = []
for batch in dataloader:
images, labels = batch ## could also go w: testloader.dataset.test_labels
images, labels = to_gpu(images), to_gpu(labels)
outputs = convnet(Variable(images))
_, preds = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (preds == labels).sum()
predictions.extend(preds)
return predictions, correct/total
In [364]:
preds, val_acc = test_pytorch(convnet, validloader)
val_acc
Out[364]:
In [188]:
preds, test_acc = test_pytorch(convnet, testloader)
test_acc
Out[188]:
97-98% accuracy on test set. Just checking:
In [189]:
_,y = next(iter(testloader))
list(zip(preds[:9], y[:9]))
Out[189]:
In [191]:
# get output predictions
log_preds = custom_learner.predict(is_test=True)
# compare top-scoring preds against dataset
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n
Out[191]:
In [195]:
## 2-3 ways to do the same thing
# log_preds_dl = custom_learner.predict_dl(testloader) # make sure num channels correct before trying this; havent tested
log_preds_dl = custom_learner.predict_dl(model_data.test_dl)
log_preds = custom_learner.predict(is_test=True)
I had some confusion. You do take the max as the top prediction; to get the actual probabilities, since it's a log softmax ouput, you exponentiate.
In [196]:
log_preds_dl.shape, log_preds.shape # same shape
Out[196]:
In [199]:
np.unique(log_preds_dl == log_preds) # same values
Out[199]:
In [232]:
Out[232]:
In [236]:
np.equal(testloader.dataset.test_labels, np.argmax(log_preds, axis=1)).sum() / len(testloader.dataset.test_labels)
Out[236]:
Untrained CNN gets sub-random (< 10%) accuracy. No surprise, it only ever guesses '5', and sometimes '4':
In [242]:
set(np.argmax(log_preds, axis=1)), np.argmax(log_preds, axis=1)
Out[242]:
In [192]:
log_preds = resnet_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n
Out[192]:
In [193]:
log_preds = pt_res_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n
Out[193]:
Seeing how far I can go (simply) before overfitting
In [273]:
# prev trn/val loss & valacc: 0.088194 0.068054 0.980333
%time custom_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)
Out[273]:
In [336]:
%time custom_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)
Out[336]:
In [361]:
plot_loss(custom_learner, val=custom_learner.sched.val_losses)
In [338]:
custom_learner.save('customcnn_mnist_acc_99056')
In [339]:
log_preds = custom_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n
Out[339]:
I think that's good enough for an MNIST warm up.
In [342]:
%time resnet_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)
Out[342]:
In [343]:
%time resnet_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)
Out[343]:
In [360]:
plot_loss(resnet_learner, val=resnet_learner.sched.val_losses)
In [345]:
log_preds = resnet_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n
Out[345]:
In [346]:
%time pt_res_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)
Out[346]:
In [347]:
%time pt_res_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)
Out[347]:
In [359]:
plot_loss(pt_res_learner, val=pt_res_learner.sched.val_losses)
In [365]:
log_preds = pt_res_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n
Out[365]:
With single-epoch test set accuracies already in the 90%s, I'm not sure how useful a standard-regime baseline with MNIST will be.
What has been extremely valuable was the practice setting this up has been. With pytorch, with fastai callbacks, with data processing, and a lot else. This'll hopefully make the next experiments with CIFAR-10 and ImageNet much smoother and to the point.
The custom CNN model convnet
in a simple pytorch training loop achieved a 97.83% test - accuracy after 1 epoch. I think I wrote the validation procedure wrong (current Pytorch documentation is for version 0.4; I'm working with 0.3.1), nonetheless a val loss of 0.0878 was recorded after 1 epoch.
The custom CNN learner custom_learner
achieved a 98.92% test accuracy after 7 epochs of training, 98.19% after only 1. Validation Loss (ep 7,1): 0.030089, 0.068054
The fresh ResNet18 learner resnet_learner
achieved a 99.31% test accuracy after 7, and 98.63% after 1. Validation Loss (ep 7,1): 0.028211, 0.05272
The pretrained ResNet18 learner pt_res_learner
(training only the classifier head) achieved a 92.36% test accuracy after 7, and 89.23% after 1. Validation Loss (ep 7,1): 0.334759, 0.58673
No model overfit, and only the fresh ResNet18 learner had a training loss better than validation. All learners appeared to be beginning to bottom-out in validation loss roughly around 0.3, maintaining the default Cosine Annealing learning-rate schedule fastai uses.
In looking up what default LR scheduler fastai uses: apparently fastai has a built-in SaveBestModel
callback in sgdr.py.
model/learner | 1-epoch val loss | 7-epoch val loss | 1-epoch test accuracy | 7-epoch test accuracy |
---|---|---|---|---|
convnet |
0.0878 | – | 97.83% | – |
custom_learner |
0.068054 | 0.030089 | 98.19% | 98.92% |
resnet_learner |
0.05272 | 0.028211 | 98.63% | 99.31% |
pt_res_learner |
0.58673 | 0.334759 | 89.23% | 92.36% |