MNIST Test

WNixalo 2018/5/19-20;25-27

Making sure I have a working baseline for the MNIST dataset. See forum thread for motivation. PyTorch version: 0.3.1.post2

For a walkthrough on converting binary IDX files to NumPy arrays, see idx-to-numpy.ipynb
For a walkthrough debugging several issues with dataloading, see mnist-dataloader-issue.ipynb

This notebook is in large part a practice stage for a research-oriented work flow.

Imports



In [1]:

    
%matplotlib inline
%reload_ext autoreload
%autoreload 2



In [2]:

    
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from pathlib import Path
import os
import struct   # for IDX conversion
import gzip     # for IDX conversion
from urllib.request import urlretrieve # for IDX conversion

from fastai.conv_learner import * # if you want to use fastai Learner



In [3]:

    
PATH = Path('data/mnist')



In [4]:

    
bs = 64
sz = 28



In [358]:

    
def plot_loss(learner, val=None):
    """Plots iterations vs loss and learning rate. Plots training or validation."""
    lrs  = learner.sched.lrs
    x_axis = range(len(lrs))
    loss = learner.sched.losses
    min_loss = min(loss)
        
    fig,ax = plt.subplots(figsize=(14,7))
    ax.set_xlim(left=-20, right=x_axis[-1]+20)
    ax.plot(x_axis, loss, label='loss')
    ax.plot(x_axis, lrs, label='learning rate', color='firebrick');
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Loss & LR')
    
    # Validation Loss
    if val is not None:
        ep_end = len(lrs) // len(val)
        ax.scatter(range(ep_end-1, len(lrs), ep_end), val, c='r', s=20, label='val loss')
    # Minimum Loss
    ax.axhline(y=min_loss, c='r', alpha=0.9, label='Min loss', lw=0.5)
    idx = np.argmin(loss)
    yscal = 1 / (ax.get_ylim()[1] - ax.get_ylim()[0])
    yrltv = (min_loss - ax.get_ylim()[0]) * yscal
    ax.axvline(x=x_axis[idx], ymin=0.5*yrltv, ymax=1.5*yrltv, c='r', alpha=0.9, lw=0.5)
    # 150% Minimum Loss
    idx = np.where(np.array(loss) <= 1.5*min_loss)[0]
    idx = idx[0] if len(idx != 0) else None
    if idx is not None: ax.axvline(x=x_axis[idx], c='slateblue', alpha=0.9, label='50% above Min Loss', lw=0.5)
    # 50% Maximum Loss
    idx = np.where(np.array(loss) <= 0.5*max(loss))[0]
    idx = idx[0] if len(idx != 0) else None
    if idx is not None: ax.axvline(x=x_axis[idx], c='teal', alpha=0.9, label='50% of Max Loss', lw=0.5)
    
    fig.legend(bbox_to_anchor=(0.82,0.82), loc="upper right")

1. Data

1.1 PyTorch method:

The basic method for creating a DataLoader in PyTorch. Adapted from their tutorial and an older notebook.

NOTE the normalization values are largely arbitrary.



In [6]:

    
# torchvision datasets are PIL.Image images of range [0,1]. Must trsfm them 
# to Tensors of normalized range [-1,1]
transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])



In [7]:

    
# see: https://gist.github.com/kevinzakka/d33bf8d6c7f06a9d8c76d97a7879f5cb
# frm: https://github.com/pytorch/pytorch/issues/1106

trainset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,
                                   transform=transform)
validset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,
                                   transform=transform)
testset  = torchvision.datasets.MNIST(root=PATH, train=False, download=True,
                                   transform=transform)
p_val = 0.15
n_val = int(p_val * len(trainset))
idxs  = np.arange(len(trainset))
np.random.shuffle(idxs)
train_idxs, valid_idxs = idxs[n_val:], idxs[:n_val]
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idxs)
valid_sampler = torch.utils.data.sampler.SequentialSampler(valid_idxs)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=bs,
                                          sampler=train_sampler, num_workers=2)
validloader = torch.utils.data.DataLoader(validset, batch_size=bs,
                                          sampler=valid_sampler, num_workers=2)
testloader  = torch.utils.data.DataLoader(testset, batch_size=bs, num_workers=2)



In [8]:

    
classes = [str(i) for i in range(10)]; classes









    Out[8]:





['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

1.1.1 Aside: DataLoaders – PyTorch & fastai:

See mnist-dataloader-issue.ipynb for an in depth dive.

The FastAI DataLoader shares some similarities in construction with the PyTorch one. The logic defining pytorch's DataLoader in the PyTorch source code:

if batch_sampler is None:
    if sampler is None:
        if shuffle:
            sampler = RandomSampler(dataset)
        else:
            sampler = SequentialSampler(dataset)
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)

is the same as that in fast.ai's

if batch_sampler is None:
    if sampler is None:
        sampler = RandomSampler(dataset) if shuffle else SequentialSampler(dataset)
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)

So now I'm not confused about not using a batch sampler when building a pytorch dataloader, although I see one in fastai's DataLoader –– that's because pytorch does it too.

1.2 Custom Method (for Fast AI Model Data)

This loads and converts the MNIST IDX files into NumPy arrays. For MNIST data this looks to be about 45 MB for the images. This way allows for easy use of FastAI's ModelData class, and thus its (extremely useful) Learner abstraction and all other capabilities that come with it. The arrays can be loaded via: ImageClassifierData.from_arrays(..)



In [9]:

    
def download_mnist(path=Path('data/mnist')):
    os.makedirs(path, exist_ok=True)
    urls = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
            'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
            'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
            'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz',]
    for url in urls:
        fname = url.split('/')[-1]
        if not os.path.exists(path/fname): urlretrieve(url, path/fname)

def read_IDX(fname):
    """see: https://gist.github.com/tylerneylon/ce60e8a06e7506ac45788443f7269e40"""
    with gzip.open(fname) as f:
        zero, data_type, dims = struct.unpack('>HBB', f.read(4))
        shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dims))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)



In [10]:

    
download_mnist()



In [11]:

    
fnames = [o for o in os.listdir(PATH) if 'ubyte.gz' in o] # could just use glob
fnames









    Out[11]:





['train-images-idx3-ubyte.gz',
 't10k-labels-idx1-ubyte.gz',
 'train-labels-idx1-ubyte.gz',
 't10k-images-idx3-ubyte.gz']



In [12]:

    
# thanks to: https://stackoverflow.com/a/14849322
trn_x_idx = [i for i,s in enumerate(fnames) if 'train-imag' in s][0]
trn_y_idx = [i for i,s in enumerate(fnames) if 'train-lab' in s][0]
# test data:
tst_x_idx = [i for i,s in enumerate(fnames) if 't10k-imag' in s][0]
tst_y_idx = [i for i,s in enumerate(fnames) if 't10k-lab' in s][0]



In [13]:

    
# load entire IDX files into memory as ndarrays
train_x_array = read_IDX(PATH/fnames[trn_x_idx])
train_y_array = read_IDX(PATH/fnames[trn_y_idx])
# test data:
test_x_array  = read_IDX(PATH/fnames[tst_x_idx])
test_y_array  = read_IDX(PATH/fnames[tst_y_idx])



In [14]:

    
# size of numpy arrays in MBs
train_x_array.nbytes / 2**20, train_y_array.nbytes / 2**20









    Out[14]:





(44.86083984375, 0.057220458984375)

1.3 Fast AI Model Data object

inception_stats have the same Normalization that the pytorch transform above uses for its dataloader. I don't do any data augmentation besides that normalization. I also use the same train/val indices from the pytorch dataloader – to ensure my pytorch model and fastai learner are working on the same data.

Additionally in order to use pretrained models I'm going to concatenate the dataset to have 3 channels instead of 1 by copying dimensions. Another option is to forego a pretrained model and use a fresh resnet set to have only 1 input channel.



In [15]:

    
tfms = tfms_from_stats(inception_stats, sz=sz)
# `inception_stats` are: ([0.5,0.5,0.5],[0.5,0.5,0.5])
# see: https://github.com/fastai/fastai/blob/master/fastai/transforms.py#L695



In [16]:

    
# using same trn/val indices as pytorch dataloader
valid_x_array, valid_y_array = train_x_array[valid_idxs], train_y_array[valid_idxs]
train_x_array, train_y_array = train_x_array[train_idxs], train_y_array[train_idxs]



In [17]:

    
# stack dims for 3 channels
train_x_array = np.stack((train_x_array, train_x_array, train_x_array), axis=-1)
valid_x_array = np.stack((valid_x_array, valid_x_array, valid_x_array), axis=-1)
test_x_array  = np.stack((test_x_array,  test_x_array,  test_x_array),  axis=-1)
# convert labels to np.int8
train_y_array = train_y_array.astype(np.int8)
valid_y_array = valid_y_array.astype(np.int8)
test_y_array  = test_y_array.astype(np.int8)



In [18]:

    
model_data = ImageClassifierData.from_arrays(PATH, 
    (train_x_array, train_y_array), (valid_x_array, valid_y_array),
    bs=bs, tfms=tfms, num_workers=2, test=(test_x_array, test_y_array))

2. Architecture

I want to have a "solid" simple ConvNet to use throughout these experiments. This model will include a large field-of-view input conv layer followed by several conv layers. Each conv layer uses BatchNorm and Leaky ReLU (I don't know if this is better than ReLU, but it sounds like a good'ish idea to me). The model's head uses an AdaptiveConcat Pooling layer (Fast AI invention that concatenates two adaptive average and max pooling layers) leading to a Linear layer. This model doesn't use dropout (I'll add that if it looks like it needs it).



In [19]:

    
class AdaptiveConcatPool2d(nn.Module):
    """fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py"""
    def __init__(self, sz=None):
        super().__init__()
        sz = sz or (1,1)
        self.ap = torch.nn.AdaptiveAvgPool2d(sz)
        self.mp = torch.nn.AdaptiveAvgPool2d(sz)
    def forward(self, x):
        return torch.cat([self.mp(x), self.ap(x)], 1)
    
class Flatten(nn.Module):
    """fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py"""
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return x.view(x.size(0), -1)



In [20]:

    
class ConvBNLayer(nn.Module):
    """conv layer with batchnorm"""
    def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0):
        super().__init__()
        self.conv  = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride)
        self.bn    = nn.BatchNorm2d(ch_out, momentum=0.1) # mom at default 0.1
        self.lrelu = nn.LeakyReLU(0.01, inplace=True)     # neg slope at default 0.01
    def forward(self, x): return self.lrelu(self.bn(self.conv(x)))

class ConvNet(nn.Module):
    # see ref: https://github.com/fastai/fastai/blob/master/fastai/models/darknet.py
    def __init__(self, ch_in=1):
        super().__init__()
        self.conv0   = ConvBNLayer(ch_in, 16, kernel_size=7, stride=1, padding=2) # large FoV Conv
        self.conv1   = ConvBNLayer(16, 32)
        self.conv2   = ConvBNLayer(32, 64)
        self.conv3   = ConvBNLayer(64, 128)
        self.neck    = nn.Sequential(*[AdaptiveConcatPool2d(1), Flatten()])
        self.head    = nn.Sequential(*[nn.BatchNorm2d(256), 
                                      nn.Dropout(p=0.25),
                                      nn.Linear(256, 10)])        
    def forward(self, x):
        x = self.conv0(x)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.neck(x)
        x = self.head(x)
        return F.log_softmax(x, dim=-1)



In [21]:

    
convnet = ConvNet()

2.0.1 Aside: Discovering AdaptiveConcatPool doubles input tensor length



In [216]:

    
x,y = next(iter(trainloader))
x,y = Variable(x), Variable(y)
convnet(x)









    



> <ipython-input-204-3df4356516d4>(24)forward()
-> x = self.conv0(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(25)forward()
-> x = self.conv1(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(26)forward()
-> x = self.conv2(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(27)forward()
-> x = self.conv3(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(28)forward()
-> x = self.neck(x)
(Pdb) x.shape # sanity check
torch.Size([64, 128, 16, 16])
(Pdb) AdaptiveConcatPool2d(1)(x).shape
torch.Size([64, 256, 1, 1])
(Pdb) q






    



---------------------------------------------------------------------------
BdbQuit                                   Traceback (most recent call last)
<ipython-input-216-965816993670> in <module>()
      1 x,y = next(iter(trainloader))
      2 x,y = Variable(x), Variable(y)
----> 3 convnet(x)

~/Miniconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

<ipython-input-204-3df4356516d4> in forward(self, x)
     26         x = self.conv2(x)
     27         x = self.conv3(x)
---> 28         x = self.neck(x)
     29         x = self.head(x)
     30         return F.log_softmax(x, dim=-1)

<ipython-input-204-3df4356516d4> in forward(self, x)
     26         x = self.conv2(x)
     27         x = self.conv3(x)
---> 28         x = self.neck(x)
     29         x = self.head(x)
     30         return F.log_softmax(x, dim=-1)

~/Miniconda3/envs/fastai/lib/python3.6/bdb.py in trace_dispatch(self, frame, event, arg)
     49             return # None
     50         if event == 'line':
---> 51             return self.dispatch_line(frame)
     52         if event == 'call':
     53             return self.dispatch_call(frame, arg)

~/Miniconda3/envs/fastai/lib/python3.6/bdb.py in dispatch_line(self, frame)
     68         if self.stop_here(frame) or self.break_here(frame):
     69             self.user_line(frame)
---> 70             if self.quitting: raise BdbQuit
     71         return self.trace_dispatch
     72 

BdbQuit:

2.1 Fast AI Learner

I'll use two fast.ai learners: the basic convnet defined above that the pytorch model will also use, and a resnet18. I'll also use an ImageNet-pretrained resnet18 to see if that helps at all. If .pretrained is not called, you will need to either use ConvnetBuilder or define a custom head yourself. NOTE also that the standard pytorch ResNet model has a 7x7 ouput pooling layer by default, which may restrict your model's performance if it's not replaced (such as with ConvnetBuilder).

The non-pretrained learner's will need their conv layers unfrozen to train them.



In [22]:

    
model_data.c, model_data.is_multi, model_data.is_reg









    Out[22]:





(10, False, False)



In [23]:

    
resnet_model = ConvnetBuilder(resnet18, model_data.c, model_data.is_multi, model_data.is_reg, pretrained=False)

resnet_learner = ConvLearner(model_data, resnet_model)
custom_learner = ConvLearner.from_model_data(ConvNet(ch_in=3), model_data)
pt_res_learner = ConvLearner.pretrained(resnet18, model_data, metrics=[accuracy]) ## NOTE: metrics=[accuracy] not needed - is default

2.1.1 Aside: Layers

Again, the learners' conv layers are initially frozen:



In [63]:

    
True in [[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]









    Out[63]:





False

By default only the 'head' classification layer is trainable:



In [64]:

    
[[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]









    Out[64]:





[[False, False, False, False, False, False],
 [False, False, False, False],
 [True, True, True, True, True, True, True, True]]

Construct the custom learner with ConvnetBuilder in order to make it's layers iterable:



In [66]:

    
[[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-e14f1b642468> in <module>()
----> 1 [[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]

<ipython-input-66-e14f1b642468> in <listcomp>(.0)
----> 1 [[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]

TypeError: 'ConvBNLayer' object is not iterable



In [73]:

    
custom_learner.models









    Out[73]:





<fastai.core.BasicModel at 0x133b41c50>



In [74]:

    
resnet_learner.models









    Out[74]:





<fastai.conv_learner.ConvnetBuilder at 0x13087b4e0>



In [78]:

    
# custom_learner



In [76]:

    
# resnet_learner



In [77]:

    
# pt_res_learner

2.1.2 Recap: Models

I'll be comparing 4 models:

convnet a 1-input channel custom CNN trained in straight PyTorch
custom_learner a 3-input channel custom CNN trained with Fast AI
resnet_learner a 3-input channel fresh ResNet18 trained with Fast AI
pt_res_learner a 3-input channel pretrained (ImageNet) ResNet18 trained with Fast AI.

Perhaps it'd be a good idea to replace the fresh ResNet18's input layer with a 1-channel input to compare it directly to the custom CNN. That's for a future run if I or anyone chooses to do so.

3. Loss Function

torch.nn.CrossEntropyLoss

Do nn.functional. loss functions go in the architecture, and nn. loss functions become criterion? Huh, interesting. It calls nn.functional..



In [24]:

    
criterion = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE
optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01, momentum=0.9)

The Fast.ai Learners:



In [25]:

    
custom_learner.crit









    Out[25]:





<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>



In [26]:

    
resnet_learner.crit









    Out[26]:





<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>



In [27]:

    
pt_res_learner.crit









    Out[27]:





<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>

4. Training

As far as I know, training in base PyTorch is tedious, so I'll do a sanity-check of it first, then do all my training with Fast AI. See ref: §4: Training or §9.1: Train ConvNet & ConvNetMod in this notebook.

There are ways to implement learning-rate scheduling and other advanced techniques in PyTorch – but by that point unless you're doing it for practice or testing a new module: that's what Fast.AI is for.

4.1 base PyTorch



In [28]:

    
len(trainloader) # ceil(51,000 / bs) batches









    Out[28]:





797

There are more improvements to doing train / valid phases – including learning rate scheduling and automatically saving best weights (see: pytorch tutorial) – but that's what fast.ai's for. I'll practice those in the future. Also since the FastAI library is pending an update to PyTorch 0.4, torch.set_grad_enabled can't be used for inference mode. Instead I follow the advice on this pytorch forum thread. For now:



In [29]:

    
optimizer









    Out[29]:





<torch.optim.sgd.SGD at 0x7f54e1448550>

NOTE 1 the criterion and optimizer need to be initialized after the model is sent to the GPU if it is. See pytorch thread.

NOTE 2: Variable.volatile = True can only be set immediately after a Variable is created. See pytorch thread. (this is for using a validation set and not affecting the gradients) – I got this error when trying to set .volatile=True after sending the val data to GPU (torch.FloatTensor $\rightarrow$ torch.cuda.FloatTensor)



In [30]:

    
def train(model=None, crit=None, trainloader=None, valloader=None, num_epochs=1, verbose=True):
    # if verbose:
    #     displays = 5
    #     display_step = max(len(dataloader) // displays, 1)
    t0 = time.time()
    
    dataloaders = {'train':trainloader}
    if valloader: dataloaders['valid'] = valloader
        
#     model.to('cuda:0' if torch.cuda.is_available() else 'cpu') # pytorch >= 0.4
    to_gpu(model)
    criterion = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE
    optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01, momentum=0.9)
    
    # epoch w/ train & val phases
    for epoch in range(num_epochs): 
        print(f'Epoch {epoch+1}/{num_epochs}\n{"-"*10}')
        
        for phase in dataloaders:
            running_loss = 0.0
            running_correct = 0
            
            for i,datum in enumerate(dataloaders[phase]):
                inputs, labels = datum
                inputs, labels = torch.autograd.Variable(inputs), torch.autograd.Variable(labels)
                
                # zero param gradients
                optimizer.zero_grad()

                # (forward) track history if train
                # with torch.set_grad_enabled(phase=='train'): # pytorch >= 0.4
                if phase == 'valid':     # pytorch 3.1 #
                    inputs.volatile=True               #
                    labels.volatile=True               #
                # send data to gpu
                inputs, labels = to_gpu(inputs), to_gpu(labels) # pytorch < 0.4
                outputs = model(inputs)                #
                loss    = crit(outputs, labels)        #
                _, preds= torch.max(outputs, 1) # for accuracy metric
                                                       #
                # backward & optimize if train         #
                if phase == 'train':                   #
                    loss.backward()                    #
                    optimizer.step()                   # indent for pytorch >= 0.4

                # stats
#                 pdb.set_trace()
                running_loss += loss.data[0]
                running_correct += torch.sum(preds == V(labels.data)) # wrap in V; pytorch 3.1
                    
            epoch_loss = running_loss / len(dataloaders[phase])
#             if phase == 'valid': pdb.set_trace()
            epoch_acc  = float(running_correct.double() / len(dataloaders[phase])) # ? pytorch 3.1 reqs float conversion?
#             pdb.set_trace()
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
                
    time_elapsed = time.time() - t0
    print(f'Training Time {num_epochs} Epochs: {time_elapsed:.3f}s')

Manual PyTorch train / val training phases. See: pytorch tutorial

(forward) track history only if in train:

with torch.set_grad_enabled(False):
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)

backward + optimize only if in training phase

    if phase == 'train':
        loss.backward()
        optimizer.step()

NOTE: I think I'm doing something wrong with the validation phase. Saving. PyTorch Docs on Saving.



In [31]:

    
train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)









    



Epoch 1/1
----------
train Loss: 0.1861 Acc: 0.2334
valid Loss: 0.0878 Acc: 0.4610
Training Time 1 Epochs: 17.535s

Previous run on CPU:



In [30]:

    
# train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)









    



Epoch 1/1
----------
train Loss: 0.1932 Acc: 0.0540
valid Loss: 0.0766 Acc: 0.7518
Training Time 1 Epochs: 230.497s



In [32]:

    
torch.save(convnet.state_dict(), 'convnet_mnist_base.pth')



In [33]:

    
convnet.load_state_dict(torch.load('convnet_mnist_base.pth'))

4.2 with Fast AI

4.2.1 Finding Learning Rates

To keep things simple, I won't be using 1-Cycle, Progressive Resizing, or much in the way of Cyclical Learning Rates. That could be a topic for later runs.



In [34]:

    
model_data.trn_ds.get1item(0)[1].dtype









    Out[34]:





dtype('int8')



In [35]:

    
custom_learner.lr_find()
custom_learner.sched.plot()









    





 
 










    



 84%|████████▍ | 673/797 [00:17<00:03, 37.97it/s, loss=1.06]



In [36]:

    
custom_learner.sched.plot_lr()



In [67]:

    
# next(iter(model_data.get_dl(model_data.trn_ds, False)))



In [37]:

    
resnet_learner.lr_find()
resnet_learner.sched.plot()









    





 
 










    



 82%|████████▏ | 653/797 [00:14<00:03, 44.88it/s, loss=2.78]



In [38]:

    
pt_res_learner.lr_find()
pt_res_learner.sched.plot()









    





 
 










    



 82%|████████▏ | 653/797 [00:14<00:03, 45.07it/s, loss=3.6]  





    












    



 82%|████████▏ | 653/797 [00:25<00:05, 25.36it/s, loss=3.6]

I'll use 1e-2 as the lr for all of them.



In [39]:

    
lrs = 1e-2

4.2.2 `custom_learner`



In [40]:

    
# checking all conv layers are being trained:
[layer.trainable for layer in custom_learner.models.get_layer_groups()]









    Out[40]:





[True, True, True, True, True, True]



In [41]:

    
%time custom_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                     
    0      0.088194   0.068054   0.980333  
CPU times: user 20.2 s, sys: 7.77 s, total: 28 s
Wall time: 22.8 s






    Out[41]:





[array([0.06805]), 0.9803333334392972]



In [179]:

    
plot_metrics(custom_learner)

4.2.2.1 Aside: Fast.ai Automatic LR scaling:

Just noticed this very useful feature. Even at very stripped-down settings, Fastai still 'revs' the learning rate up during train-start and back down before train-end:



In [45]:

    
custom_learner.sched.plot_lr()

4.2.3 `resnet_learner`



In [50]:

    
[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]









    Out[50]:





[False, False, True]



In [51]:

    
resnet_learner.unfreeze()



In [52]:

    
[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]









    Out[52]:





[True, True, True]



In [53]:

    
%time resnet_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                     
    0      0.087478   0.05272    0.983444  
CPU times: user 39.5 s, sys: 15.5 s, total: 55.1 s
Wall time: 49.7 s






    Out[53]:





[array([0.05272]), 0.9834444443914625]



In [180]:

    
plot_metrics(resnet_learner)

4.2.4 `pt_res_learner`



In [55]:

    
# only training classifier head
%time pt_res_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                    
    0      0.554677   0.58673    0.891556  
CPU times: user 19.6 s, sys: 6.02 s, total: 25.6 s
Wall time: 20.4 s






    Out[55]:





[array([0.58673]), 0.8915555556085375]



In [199]:

    
# min(pt_res_learner.sched.losses)
pt_res_learner.sched.losses[-1]









    Out[199]:





0.5546770245500905



In [200]:

    
pt_res_learner.sched.val_losses









    Out[200]:





[0.5867299038039313]



In [181]:

    
plot_metrics(pt_res_learner)

5. Testing

5.1 PyTorch convnet



In [182]:

    
x,y = next(iter(testloader)) # shape: ([64,1,28,28]; [64])
out = convnet(V(x))          # shape: ([64, 10])



In [183]:

    
_, preds = torch.max(out.data, 1)



In [184]:

    
list(zip(preds[:9], y[:9]))









    Out[184]:





[(7, 7), (2, 2), (1, 1), (0, 0), (4, 4), (1, 1), (4, 4), (9, 9), (5, 5)]

Cool, even with that little training it's able to get a lot right.



In [187]:

    
def test_pytorch(model, dataloader):
    """evaluation script. Returns tuple: (list of predictions, ratio correct)"""
    correct = 0
    total = 0
    
    predictions = []

    for batch in dataloader:
        images, labels = batch   ## could also go w: testloader.dataset.test_labels
        images, labels = to_gpu(images), to_gpu(labels)
        outputs  = convnet(Variable(images))
        _, preds = torch.max(outputs.data, 1)
        total   += labels.size(0)
        correct += (preds == labels).sum()
        
        predictions.extend(preds)
        
    return predictions, correct/total



In [364]:

    
preds, val_acc = test_pytorch(convnet, validloader)
val_acc









    Out[364]:





0.9744444444444444



In [188]:

    
preds, test_acc = test_pytorch(convnet, testloader)
test_acc









    Out[188]:





0.9783

97-98% accuracy on test set. Just checking:



In [189]:

    
_,y = next(iter(testloader))
list(zip(preds[:9], y[:9]))









    Out[189]:





[(7, 7), (2, 2), (1, 1), (0, 0), (4, 4), (1, 1), (4, 4), (9, 9), (5, 5)]

5.2 `custom_learner`



In [191]:

    
# get output predictions
log_preds = custom_learner.predict(is_test=True)
# compare top-scoring preds against dataset
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n









    Out[191]:





0.9819

5.2.1 Aside: (untrained) `custom_learner` Sanity Checks:



In [195]:

    
## 2-3 ways to do the same thing
# log_preds_dl = custom_learner.predict_dl(testloader) # make sure num channels correct before trying this; havent tested
log_preds_dl = custom_learner.predict_dl(model_data.test_dl)
log_preds = custom_learner.predict(is_test=True)

I had some confusion. You do take the max as the top prediction; to get the actual probabilities, since it's a log softmax ouput, you exponentiate.



In [196]:

    
log_preds_dl.shape, log_preds.shape # same shape









    Out[196]:





((10000, 10), (10000, 10))



In [199]:

    
np.unique(log_preds_dl == log_preds) # same values









    Out[199]:





array([ True])



In [232]:









    Out[232]:





 7
 2
 1
⋮ 
 4
 5
 6
[torch.LongTensor of size 10000]



In [236]:

    
np.equal(testloader.dataset.test_labels, np.argmax(log_preds, axis=1)).sum() / len(testloader.dataset.test_labels)









    Out[236]:





0.0892

Untrained CNN gets sub-random (< 10%) accuracy. No surprise, it only ever guesses '5', and sometimes '4':



In [242]:

    
set(np.argmax(log_preds, axis=1)), np.argmax(log_preds, axis=1)









    Out[242]:





({4, 5}, array([5, 5, 5, ..., 5, 5, 5]))

5.3 `resnet_learner`



In [192]:

    
log_preds = resnet_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n









    Out[192]:





0.9863

5.4 `pt_res_learner`



In [193]:

    
log_preds = pt_res_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n









    Out[193]:





0.8923

6. Further Training & Testing

Seeing how far I can go (simply) before overfitting

6.1 `custom_learner`:



In [273]:

    
# prev trn/val loss & valacc: 0.088194   0.068054   0.980333  
%time custom_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                     
    0      0.067517   0.049205   0.986222  
    1      0.050665   0.043011   0.987444                     
CPU times: user 41.1 s, sys: 15.1 s, total: 56.1 s
Wall time: 45.4 s






    Out[273]:





[array([0.04301]), 0.9874444445504083]



In [336]:

    
%time custom_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                     
    0      0.043123   0.036729   0.989444  
    1      0.043052   0.033036   0.989778                     
    2      0.033544   0.030643   0.990889                     
    3      0.043682   0.030089   0.990556                     
CPU times: user 1min 22s, sys: 30.8 s, total: 1min 53s
Wall time: 1min 31s






    Out[336]:





[array([0.03009]), 0.9905555556615193]



In [361]:

    
plot_loss(custom_learner, val=custom_learner.sched.val_losses)



In [338]:

    
custom_learner.save('customcnn_mnist_acc_99056')



In [339]:

    
log_preds = custom_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n









    Out[339]:





0.9892

I think that's good enough for an MNIST warm up.

6.2 `resnet_learner`:



In [342]:

    
%time resnet_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                     
    0      0.063101   0.038616   0.988444  
    1      0.041075   0.034616   0.990222                     
CPU times: user 1min 20s, sys: 30.1 s, total: 1min 50s
Wall time: 1min 39s






    Out[342]:





[array([0.03462]), 0.9902222221692403]



In [343]:

    
%time resnet_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                     
    0      0.039452   0.030857   0.990667  
    1      0.032786   0.028692   0.992111                     
    2      0.024677   0.029187   0.991778                     
    3      0.02215    0.028211   0.991333                     
CPU times: user 2min 39s, sys: 1min 1s, total: 3min 41s
Wall time: 3min 19s






    Out[343]:





[array([0.02821]), 0.9913333334392972]



In [360]:

    
plot_loss(resnet_learner, val=resnet_learner.sched.val_losses)



In [345]:

    
log_preds = resnet_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n









    Out[345]:





0.9931

6.3 `pt_res_learner`:



In [346]:

    
%time pt_res_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                    
    0      0.499828   0.521596   0.908778  
    1      0.456638   0.385642   0.914556                    
CPU times: user 39.5 s, sys: 12.2 s, total: 51.7 s
Wall time: 40.9 s






    Out[346]:





[array([0.38564]), 0.9145555555025736]



In [347]:

    
%time pt_res_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                    
    0      0.450119   0.430365   0.917333  
    1      0.435357   0.407292   0.922667                    
    2      0.412722   0.429438   0.923556                    
    3      0.411739   0.334759   0.925889                    
CPU times: user 1min 18s, sys: 24.3 s, total: 1min 43s
Wall time: 1min 21s






    Out[347]:





[array([0.33476]), 0.9258888889948527]



In [359]:

    
plot_loss(pt_res_learner, val=pt_res_learner.sched.val_losses)



In [365]:

    
log_preds = pt_res_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n









    Out[365]:





0.9236

7. Comparisons & Thoughts

With single-epoch test set accuracies already in the 90%s, I'm not sure how useful a standard-regime baseline with MNIST will be.

What has been extremely valuable was the practice setting this up has been. With pytorch, with fastai callbacks, with data processing, and a lot else. This'll hopefully make the next experiments with CIFAR-10 and ImageNet much smoother and to the point.

Stats:

The custom CNN model convnet in a simple pytorch training loop achieved a 97.83% test - accuracy after 1 epoch. I think I wrote the validation procedure wrong (current Pytorch documentation is for version 0.4; I'm working with 0.3.1), nonetheless a val loss of 0.0878 was recorded after 1 epoch.
The custom CNN learner custom_learner achieved a 98.92% test accuracy after 7 epochs of training, 98.19% after only 1. Validation Loss (ep 7,1): 0.030089, 0.068054
The fresh ResNet18 learner resnet_learner achieved a 99.31% test accuracy after 7, and 98.63% after 1. Validation Loss (ep 7,1): 0.028211, 0.05272
The pretrained ResNet18 learner pt_res_learner (training only the classifier head) achieved a 92.36% test accuracy after 7, and 89.23% after 1. Validation Loss (ep 7,1): 0.334759, 0.58673

No model overfit, and only the fresh ResNet18 learner had a training loss better than validation. All learners appeared to be beginning to bottom-out in validation loss roughly around 0.3, maintaining the default Cosine Annealing learning-rate schedule fastai uses.

In looking up what default LR scheduler fastai uses: apparently fastai has a built-in SaveBestModel callback in sgdr.py.

model/learner	1-epoch val loss	7-epoch val loss	1-epoch test accuracy	7-epoch test accuracy
`convnet`	0.0878	–	97.83%	–
`custom_learner`	0.068054	0.030089	98.19%	98.92%
`resnet_learner`	0.05272	0.028211	98.63%	99.31%
`pt_res_learner`	0.58673	0.334759	89.23%	92.36%