MNIST Test

WNixalo 2018/5/19-20;25-26

Making sure I have a working baseline for the MNIST dataset. PyTorch version: 0.3.1.post2

This notebook is in large part a practice stage for a research-oriented work flow.


Imports


In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [2]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from pathlib import Path
import os
import struct   # for IDX conversion
import gzip     # for IDX conversion
from urllib.request import urlretrieve # for IDX conversion

from fastai.conv_learner import * # if you want to use fastai Learner

In [3]:
PATH = Path('data/mnist')

In [4]:
bs = 64
sz = 28

In [333]:
def plot_loss(learner, val=None):
    """Plots iterations vs loss and learning rate. Plots training or validation."""
    lrs  = learner.sched.lrs
    x_axis = range(len(lrs))
    loss = learner.sched.losses
    min_loss = min(loss)
        
    fig,ax = plt.subplots(figsize=(14,7))
    ax.set_xlim(left=-20, right=x_axis[-1]+20)
    ax.plot(x_axis, loss, label='loss')
    ax.plot(x_axis, lrs, label='learning rate', color='firebrick');
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Loss & LR')
    
    # Validation Loss
    if val is not None:
        ep_end = len(lrs) // len(val)
        ax.scatter(range(ep_end-1, len(lrs), ep_end), val, c='r', s=20, label='val loss')
    
    # Minimum Loss
    ax.axhline(y=min_loss, c='r', alpha=0.9, label='Min loss', lw=0.5)
    idx = np.argmin(min_loss)
    yscal = 1 / (ax.get_ylim()[1] - ax.get_ylim()[0])
    yrltv = (min_loss - ax.get_ylim()[0]) * yscal
    ax.axvline(x=x_axis[idx], ymin=0.5*yrltv, ymax=1.5*yrltv, c='r', alpha=0.9, lw=0.5)
    # 150% Minimum Loss
    idx = np.where(np.array(loss) <= 1.5*min_loss)[0][0]
    ax.axvline(x=x_axis[idx], c='slateblue', alpha=0.9, label='50% above Min Loss', lw=0.5)
    # 50% Maximum Loss
    idx = np.where(np.array(loss) <= 0.5*max(loss))[0][0]
    ax.axvline(x=x_axis[idx], c='teal', alpha=0.9, label='50% of Max Loss', lw=0.5)
    
    fig.legend(bbox_to_anchor=(0.82,0.82), loc="upper right")

1. Data

1.1 PyTorch method:

The basic method for creating a DataLoader in PyTorch. Adapted from their tutorial and an older notebook.


In [6]:
# torchvision datasets are PIL.Image images of range [0,1]. Must trsfm them 
# to Tensors of normalized range [-1,1]
transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])

In [7]:
# see: https://gist.github.com/kevinzakka/d33bf8d6c7f06a9d8c76d97a7879f5cb
# frm: https://github.com/pytorch/pytorch/issues/1106

trainset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,
                                   transform=transform)
validset = torchvision.datasets.MNIST(root=PATH, train=True, download=True,
                                   transform=transform)
testset  = torchvision.datasets.MNIST(root=PATH, train=False, download=True,
                                   transform=transform)
p_val = 0.15
n_val = int(p_val * len(trainset))
idxs  = np.arange(len(trainset))
np.random.shuffle(idxs)
train_idxs, valid_idxs = idxs[n_val:], idxs[:n_val]
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idxs)
valid_sampler = torch.utils.data.sampler.SequentialSampler(valid_idxs)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=bs,
                                          sampler=train_sampler, num_workers=2)
validloader = torch.utils.data.DataLoader(validset, batch_size=bs,
                                          sampler=valid_sampler, num_workers=2)
testloader  = torch.utils.data.DataLoader(testset, batch_size=bs, num_workers=2)

In [8]:
classes = [str(i) for i in range(10)]; classes


Out[8]:
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

1.1.1 Aside: DataLoaders – PyTorch & fastai:

The FastAI DataLoader shares some similarities in construction with the PyTorch one. The logic defining pytorch's DataLoader in the PyTorch source code:

if batch_sampler is None:
    if sampler is None:
        if shuffle:
            sampler = RandomSampler(dataset)
        else:
            sampler = SequentialSampler(dataset)
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)

is the same as that in fast.ai's

if batch_sampler is None:
    if sampler is None:
        sampler = RandomSampler(dataset) if shuffle else SequentialSampler(dataset)
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)

So now I'm not confused about not using a batch sampler when building a pytorch dataloader, although I see one in fastai's DataLoader –– that's because pytorch does it too.

1.2 Custom Method (for Fast AI Model Data)

This loads and converts the MNIST IDX files into NumPy arrays. For MNIST data this looks to be about 45 MB for the images. This way allows for easy use of FastAI's ModelData class, and thus its (extremely useful) Learner abstraction and all other capabilities that come with it. The arrays can be loaded via: ImageClassifierData.from_arrays(..)


In [9]:
def download_mnist(path=Path('data/mnist')):
    os.makedirs(path, exist_ok=True)
    urls = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
            'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
            'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
            'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz',]
    for url in urls:
        fname = url.split('/')[-1]
        if not os.path.exists(path/fname): urlretrieve(url, path/fname)

def read_IDX(fname):
    """see: https://gist.github.com/tylerneylon/ce60e8a06e7506ac45788443f7269e40"""
    with gzip.open(fname) as f:
        zero, data_type, dims = struct.unpack('>HBB', f.read(4))
        shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dims))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)

In [10]:
download_mnist()

In [11]:
fnames = [o for o in os.listdir(PATH) if 'ubyte.gz' in o] # could just use glob
fnames


Out[11]:
['train-images-idx3-ubyte.gz',
 't10k-labels-idx1-ubyte.gz',
 'train-labels-idx1-ubyte.gz',
 't10k-images-idx3-ubyte.gz']

In [12]:
# thanks to: https://stackoverflow.com/a/14849322
trn_x_idx = [i for i,s in enumerate(fnames) if 'train-imag' in s][0]
trn_y_idx = [i for i,s in enumerate(fnames) if 'train-lab' in s][0]
# test data:
tst_x_idx = [i for i,s in enumerate(fnames) if 't10k-imag' in s][0]
tst_y_idx = [i for i,s in enumerate(fnames) if 't10k-lab' in s][0]

In [13]:
# load entire IDX files into memory as ndarrays
train_x_array = read_IDX(PATH/fnames[trn_x_idx])
train_y_array = read_IDX(PATH/fnames[trn_y_idx])
# test data:
test_x_array  = read_IDX(PATH/fnames[tst_x_idx])
test_y_array  = read_IDX(PATH/fnames[tst_y_idx])

In [14]:
# size of numpy arrays in MBs
train_x_array.nbytes / 2**20, train_y_array.nbytes / 2**20


Out[14]:
(44.86083984375, 0.057220458984375)

1.3 Fast AI Model Data object

inception_stats have the same Normalization that the pytorch transform above uses for its dataloader. I don't do any data augmentation besides that normalization. I also use the same train/val indices from the pytorch dataloader – to ensure my pytorch model and fastai learner are working on the same data.

Additionally in order to use pretrained models I'm going to concatenate the dataset to have 3 channels instead of 1 by copying dimensions. Another option is to forego a pretrained model and use a fresh resnet set to have only 1 input channel.


In [15]:
tfms = tfms_from_stats(inception_stats, sz=sz)
# `inception_stats` are: ([0.5,0.5,0.5],[0.5,0.5,0.5])
# see: https://github.com/fastai/fastai/blob/master/fastai/transforms.py#L695

In [16]:
# using same trn/val indices as pytorch dataloader
valid_x_array, valid_y_array = train_x_array[valid_idxs], train_y_array[valid_idxs]
train_x_array, train_y_array = train_x_array[train_idxs], train_y_array[train_idxs]

In [17]:
# stack dims for 3 channels
train_x_array = np.stack((train_x_array, train_x_array, train_x_array), axis=-1)
valid_x_array = np.stack((valid_x_array, valid_x_array, valid_x_array), axis=-1)
test_x_array  = np.stack((test_x_array,  test_x_array,  test_x_array),  axis=-1)
# convert labels to np.int8
train_y_array = train_y_array.astype(np.int8)
valid_y_array = valid_y_array.astype(np.int8)
test_y_array  = test_y_array.astype(np.int8)

In [18]:
model_data = ImageClassifierData.from_arrays(PATH, 
    (train_x_array, train_y_array), (valid_x_array, valid_y_array),
    bs=bs, tfms=tfms, num_workers=2, test=(test_x_array, test_y_array))

2. Architecture

I want to have a "solid" simple ConvNet to use throughout these experiments. This model will include a large field-of-view input conv layer followed by several conv layers. Each conv layer uses BatchNorm and Leaky ReLU (I don't know if this is better than ReLU, but it sounds like a good'ish idea to me). The model's head uses an AdaptiveConcat Pooling layer (Fast AI invention that concatenates two adaptive average and max pooling layers) leading to a Linear layer. This model doesn't use dropout (I'll add that if it looks like it needs it).


In [19]:
class AdaptiveConcatPool2d(nn.Module):
    """fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py"""
    def __init__(self, sz=None):
        super().__init__()
        sz = sz or (1,1)
        self.ap = torch.nn.AdaptiveAvgPool2d(sz)
        self.mp = torch.nn.AdaptiveAvgPool2d(sz)
    def forward(self, x):
        return torch.cat([self.mp(x), self.ap(x)], 1)
    
class Flatten(nn.Module):
    """fast.ai, see: https://github.com/fastai/fastai/tree/master/fastai/layers.py"""
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return x.view(x.size(0), -1)

In [20]:
class ConvBNLayer(nn.Module):
    """conv layer with batchnorm"""
    def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0):
        super().__init__()
        self.conv  = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride)
        self.bn    = nn.BatchNorm2d(ch_out, momentum=0.1) # mom at default 0.1
        self.lrelu = nn.LeakyReLU(0.01, inplace=True)     # neg slope at default 0.01
    def forward(self, x): return self.lrelu(self.bn(self.conv(x)))

class ConvNet(nn.Module):
    # see ref: https://github.com/fastai/fastai/blob/master/fastai/models/darknet.py
    def __init__(self, ch_in=1):
        super().__init__()
        self.conv0   = ConvBNLayer(ch_in, 16, kernel_size=7, stride=1, padding=2) # large FoV Conv
        self.conv1   = ConvBNLayer(16, 32)
        self.conv2   = ConvBNLayer(32, 64)
        self.conv3   = ConvBNLayer(64, 128)
        self.neck    = nn.Sequential(*[AdaptiveConcatPool2d(1), Flatten()])
        self.head    = nn.Sequential(*[nn.BatchNorm2d(256), 
                                      nn.Dropout(p=0.25),
                                      nn.Linear(256, 10)])        
    def forward(self, x):
        x = self.conv0(x)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.neck(x)
        x = self.head(x)
        return F.log_softmax(x, dim=-1)

In [21]:
convnet = ConvNet()

2.0.1 Aside: Discovering AdaptiveConcatPool doubles input tensor length


In [216]:
x,y = next(iter(trainloader))
x,y = Variable(x), Variable(y)
convnet(x)


> <ipython-input-204-3df4356516d4>(24)forward()
-> x = self.conv0(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(25)forward()
-> x = self.conv1(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(26)forward()
-> x = self.conv2(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(27)forward()
-> x = self.conv3(x)
(Pdb) n
> <ipython-input-204-3df4356516d4>(28)forward()
-> x = self.neck(x)
(Pdb) x.shape # sanity check
torch.Size([64, 128, 16, 16])
(Pdb) AdaptiveConcatPool2d(1)(x).shape
torch.Size([64, 256, 1, 1])
(Pdb) q
---------------------------------------------------------------------------
BdbQuit                                   Traceback (most recent call last)
<ipython-input-216-965816993670> in <module>()
      1 x,y = next(iter(trainloader))
      2 x,y = Variable(x), Variable(y)
----> 3 convnet(x)

~/Miniconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

<ipython-input-204-3df4356516d4> in forward(self, x)
     26         x = self.conv2(x)
     27         x = self.conv3(x)
---> 28         x = self.neck(x)
     29         x = self.head(x)
     30         return F.log_softmax(x, dim=-1)

<ipython-input-204-3df4356516d4> in forward(self, x)
     26         x = self.conv2(x)
     27         x = self.conv3(x)
---> 28         x = self.neck(x)
     29         x = self.head(x)
     30         return F.log_softmax(x, dim=-1)

~/Miniconda3/envs/fastai/lib/python3.6/bdb.py in trace_dispatch(self, frame, event, arg)
     49             return # None
     50         if event == 'line':
---> 51             return self.dispatch_line(frame)
     52         if event == 'call':
     53             return self.dispatch_call(frame, arg)

~/Miniconda3/envs/fastai/lib/python3.6/bdb.py in dispatch_line(self, frame)
     68         if self.stop_here(frame) or self.break_here(frame):
     69             self.user_line(frame)
---> 70             if self.quitting: raise BdbQuit
     71         return self.trace_dispatch
     72 

BdbQuit: 

2.1 Fast AI Learner

I'll use two fast.ai learners: the basic convnet defined above that the pytorch model will also use, and a resnet18. I'll also use an ImageNet-pretrained resnet18 to see if that helps at all. If .pretrained is not called, you will need to either use ConvnetBuilder or define a custom head yourself. NOTE also that the standard pytorch ResNet model has a 7x7 ouput pooling layer by default, which may restrict your model's performance if it's not replaced (such as with ConvnetBuilder).

The non-pretrained learner's will need their conv layers unfrozen to train them.


In [22]:
model_data.c, model_data.is_multi, model_data.is_reg


Out[22]:
(10, False, False)

In [23]:
resnet_model = ConvnetBuilder(resnet18, model_data.c, model_data.is_multi, model_data.is_reg, pretrained=False)

resnet_learner = ConvLearner(model_data, resnet_model)
custom_learner = ConvLearner.from_model_data(ConvNet(ch_in=3), model_data)
pt_res_learner = ConvLearner.pretrained(resnet18, model_data, metrics=[accuracy]) ## NOTE: metrics=[accuracy] not needed - is default

2.1.1 Aside: Layers

Again, the learners' conv layers are initially frozen:


In [63]:
True in [[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]


Out[63]:
False

By default only the 'head' classification layer is trainable:


In [64]:
[[layer.trainable for layer in layer_group] for layer_group in resnet_learner.get_layer_groups()]


Out[64]:
[[False, False, False, False, False, False],
 [False, False, False, False],
 [True, True, True, True, True, True, True, True]]

Construct the custom learner with ConvnetBuilder in order to make it's layers iterable:


In [66]:
[[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-e14f1b642468> in <module>()
----> 1 [[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]

<ipython-input-66-e14f1b642468> in <listcomp>(.0)
----> 1 [[layer.trainable for layer in layer_group] for layer_group in custom_learner.get_layer_groups()]

TypeError: 'ConvBNLayer' object is not iterable

In [73]:
custom_learner.models


Out[73]:
<fastai.core.BasicModel at 0x133b41c50>

In [74]:
resnet_learner.models


Out[74]:
<fastai.conv_learner.ConvnetBuilder at 0x13087b4e0>

In [78]:
# custom_learner

In [76]:
# resnet_learner

In [77]:
# pt_res_learner

2.1.2 Recap: Models

I'll be comparing 4 models:

  1. convnet a 1-input channel custom CNN trained in straight PyTorch
  2. custom_learner a 3-input channel custom CNN trained with Fast AI
  3. resnet_learner a 3-input channel fresh ResNet18 trained with Fast AI
  4. pt_res_learner a 3-input channel pretrained (ImageNet) ResNet18 trained with Fast AI.

Perhaps it'd be a good idea to replace the fresh ResNet18's input layer with a 1-channel input to compare it directly to the custom CNN. That's for a future run if I or anyone chooses to do so.

3. Loss Function

torch.nn.CrossEntropyLoss

Do nn.functional. loss functions go in the architecture, and nn. loss functions become criterion? Huh, interesting. It calls nn.functional..


In [24]:
criterion = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE
optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01, momentum=0.9)

The Fast.ai Learners:


In [25]:
custom_learner.crit


Out[25]:
<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>

In [26]:
resnet_learner.crit


Out[26]:
<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>

In [27]:
pt_res_learner.crit


Out[27]:
<function torch.nn.functional.nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True)>

4. Training

As far as I know, training in base PyTorch is tedious, so I'll do a sanity-check of it first, then do all my training with Fast AI. See ref: §4: Training or §9.1: Train ConvNet & ConvNetMod in this notebook.

There are ways to implement learning-rate scheduling and other advanced techniques in PyTorch – but by that point unless you're doing it for practice or testing a new module: that's what Fast.AI is for.

4.1 base PyTorch


In [28]:
len(trainloader) # ceil(51,000 / bs) batches


Out[28]:
797

There are more improvements to doing train / valid phases – including learning rate scheduling and automatically saving best weights (see: pytorch tutorial) – but that's what fast.ai's for. I'll practice those in the future. Also since the FastAI library is pending an update to PyTorch 0.4, torch.set_grad_enabled can't be used for inference mode. Instead I follow the advice on this pytorch forum thread. For now:


In [29]:
optimizer


Out[29]:
<torch.optim.sgd.SGD at 0x7f54e1448550>

NOTE 1 the criterion and optimizer need to be initialized after the model is sent to the GPU if it is. See pytorch thread.

NOTE 2: Variable.volatile = True can only be set immediately after a Variable is created. See pytorch thread. (this is for using a validation set and not affecting the gradients) – I got this error when trying to set .volatile=True after sending the val data to GPU (torch.FloatTensor $\rightarrow$ torch.cuda.FloatTensor)


In [30]:
def train(model=None, crit=None, trainloader=None, valloader=None, num_epochs=1, verbose=True):
    # if verbose:
    #     displays = 5
    #     display_step = max(len(dataloader) // displays, 1)
    t0 = time.time()
    
    dataloaders = {'train':trainloader}
    if valloader: dataloaders['valid'] = valloader
        
#     model.to('cuda:0' if torch.cuda.is_available() else 'cpu') # pytorch >= 0.4
    to_gpu(model)
    criterion = torch.nn.NLLLoss() # log_softmax already in arch; nll(log_softmax) <=> CE
    optimizer = torch.optim.SGD(convnet.parameters(), lr=0.01, momentum=0.9)
    
    # epoch w/ train & val phases
    for epoch in range(num_epochs): 
        print(f'Epoch {epoch+1}/{num_epochs}\n{"-"*10}')
        
        for phase in dataloaders:
            running_loss = 0.0
            running_correct = 0
            
            for i,datum in enumerate(dataloaders[phase]):
                inputs, labels = datum
                inputs, labels = torch.autograd.Variable(inputs), torch.autograd.Variable(labels)
                
                # zero param gradients
                optimizer.zero_grad()

                # (forward) track history if train
                # with torch.set_grad_enabled(phase=='train'): # pytorch >= 0.4
                if phase == 'valid':     # pytorch 3.1 #
                    inputs.volatile=True               #
                    labels.volatile=True               #
                # send data to gpu
                inputs, labels = to_gpu(inputs), to_gpu(labels) # pytorch < 0.4
                outputs = model(inputs)                #
                loss    = crit(outputs, labels)        #
                _, preds= torch.max(outputs, 1) # for accuracy metric
                                                       #
                # backward & optimize if train         #
                if phase == 'train':                   #
                    loss.backward()                    #
                    optimizer.step()                   # indent for pytorch >= 0.4

                # stats
#                 pdb.set_trace()
                running_loss += loss.data[0]
                running_correct += torch.sum(preds == V(labels.data)) # wrap in V; pytorch 3.1
                    
            epoch_loss = running_loss / len(dataloaders[phase])
#             if phase == 'valid': pdb.set_trace()
            epoch_acc  = float(running_correct.double() / len(dataloaders[phase])) # ? pytorch 3.1 reqs float conversion?
#             pdb.set_trace()
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
                
    time_elapsed = time.time() - t0
    print(f'Training Time {num_epochs} Epochs: {time_elapsed:.3f}s')

Manual PyTorch train / val training phases. See: pytorch tutorial

(forward) track history only if in train:

with torch.set_grad_enabled(False):
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)

backward + optimize only if in training phase

    if phase == 'train':
        loss.backward()
        optimizer.step()

NOTE: I think I'm doing something wrong with the validation phase. Saving. PyTorch Docs on Saving.


In [31]:
train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)


Epoch 1/1
----------
train Loss: 0.1861 Acc: 0.2334
valid Loss: 0.0878 Acc: 0.4610
Training Time 1 Epochs: 17.535s

Previous run on CPU:


In [30]:
# train(model=convnet, crit=criterion, trainloader=trainloader, valloader=validloader)


Epoch 1/1
----------
train Loss: 0.1932 Acc: 0.0540
valid Loss: 0.0766 Acc: 0.7518
Training Time 1 Epochs: 230.497s

In [32]:
torch.save(convnet.state_dict(), 'convnet_mnist_base.pth')

In [33]:
convnet.load_state_dict(torch.load('convnet_mnist_base.pth'))

4.2 with Fast AI

4.2.1 Finding Learning Rates

To keep things simple, I won't be using 1-Cycle, Progressive Resizing, or much in the way of Cyclical Learning Rates. That could be a topic for later runs.


In [34]:
model_data.trn_ds.get1item(0)[1].dtype


Out[34]:
dtype('int8')

In [35]:
custom_learner.lr_find()
custom_learner.sched.plot()


 84%|████████▍ | 673/797 [00:17<00:03, 37.97it/s, loss=1.06] 

In [36]:
custom_learner.sched.plot_lr()



In [67]:
# next(iter(model_data.get_dl(model_data.trn_ds, False)))

In [37]:
resnet_learner.lr_find()
resnet_learner.sched.plot()


 82%|████████▏ | 653/797 [00:14<00:03, 44.88it/s, loss=2.78] 

In [38]:
pt_res_learner.lr_find()
pt_res_learner.sched.plot()


 82%|████████▏ | 653/797 [00:14<00:03, 45.07it/s, loss=3.6]  
 82%|████████▏ | 653/797 [00:25<00:05, 25.36it/s, loss=3.6]

I'll use 1e-2 as the lr for all of them.


In [39]:
lrs = 1e-2

4.2.2 custom_learner


In [40]:
# checking all conv layers are being trained:
[layer.trainable for layer in custom_learner.models.get_layer_groups()]


Out[40]:
[True, True, True, True, True, True]

In [41]:
%time custom_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)


epoch      trn_loss   val_loss   accuracy                     
    0      0.088194   0.068054   0.980333  
CPU times: user 20.2 s, sys: 7.77 s, total: 28 s
Wall time: 22.8 s
Out[41]:
[array([0.06805]), 0.9803333334392972]

In [179]:
plot_metrics(custom_learner)


4.2.2.1 Aside: Fast.ai Automatic LR scaling:

Just noticed this very useful feature. Even at very stripped-down settings, Fastai still 'revs' the learning rate up during train-start and back down before train-end:


In [45]:
custom_learner.sched.plot_lr()


4.2.3 resnet_learner


In [50]:
[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]


Out[50]:
[False, False, True]

In [51]:
resnet_learner.unfreeze()

In [52]:
[layer[0].trainable for layer in resnet_learner.models.get_layer_groups()]


Out[52]:
[True, True, True]

In [53]:
%time resnet_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)


epoch      trn_loss   val_loss   accuracy                     
    0      0.087478   0.05272    0.983444  
CPU times: user 39.5 s, sys: 15.5 s, total: 55.1 s
Wall time: 49.7 s
Out[53]:
[array([0.05272]), 0.9834444443914625]

In [180]:
plot_metrics(resnet_learner)


4.2.4 pt_res_learner


In [55]:
# only training classifier head
%time pt_res_learner.fit(lrs, n_cycle=1, cycle_len=1, cycle_mult=1)


epoch      trn_loss   val_loss   accuracy                    
    0      0.554677   0.58673    0.891556  
CPU times: user 19.6 s, sys: 6.02 s, total: 25.6 s
Wall time: 20.4 s
Out[55]:
[array([0.58673]), 0.8915555556085375]

In [199]:
# min(pt_res_learner.sched.losses)
pt_res_learner.sched.losses[-1]


Out[199]:
0.5546770245500905

In [200]:
pt_res_learner.sched.val_losses


Out[200]:
[0.5867299038039313]

In [181]:
plot_metrics(pt_res_learner)


5. Testing

5.0.1 PyTorch convnet


In [182]:
x,y = next(iter(testloader)) # shape: ([64,1,28,28]; [64])
out = convnet(V(x))          # shape: ([64, 10])

In [183]:
_, preds = torch.max(out.data, 1)

In [184]:
list(zip(preds[:9], y[:9]))


Out[184]:
[(7, 7), (2, 2), (1, 1), (0, 0), (4, 4), (1, 1), (4, 4), (9, 9), (5, 5)]

Cool, even with that little training it's able to get a lot right.


In [187]:
def test_pytorch(model, dataloader):
    """evaluation script. Returns tuple: (list of predictions, ratio correct)"""
    correct = 0
    total = 0
    
    predictions = []

    for batch in dataloader:
        images, labels = batch   ## could also go w: testloader.dataset.test_labels
        images, labels = to_gpu(images), to_gpu(labels)
        outputs  = convnet(Variable(images))
        _, preds = torch.max(outputs.data, 1)
        total   += labels.size(0)
        correct += (preds == labels).sum()
        
        predictions.extend(preds)
        
    return predictions, correct/total

In [188]:
preds, test_acc = test_pytorch(convnet, testloader)
test_acc


Out[188]:
0.9783

97-98% accuracy on test set. Just checking:


In [189]:
_,y = next(iter(testloader))
list(zip(preds[:9], y[:9]))


Out[189]:
[(7, 7), (2, 2), (1, 1), (0, 0), (4, 4), (1, 1), (4, 4), (9, 9), (5, 5)]

5.0.2 custom_learner


In [191]:
# get output predictions
log_preds = custom_learner.predict(is_test=True)
# compare top-scoring preds against dataset
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n


Out[191]:
0.9819
Aside: (untrained) custom_learner Sanity Checks:

In [195]:
## 2-3 ways to do the same thing
# log_preds_dl = custom_learner.predict_dl(testloader) # make sure num channels correct before trying this; havent tested
log_preds_dl = custom_learner.predict_dl(model_data.test_dl)
log_preds = custom_learner.predict(is_test=True)

I had some confusion. You do take the max as the top prediction; to get the actual probabilities, since it's a log softmax ouput, you exponentiate.


In [196]:
log_preds_dl.shape, log_preds.shape # same shape


Out[196]:
((10000, 10), (10000, 10))

In [199]:
np.unique(log_preds_dl == log_preds) # same values


Out[199]:
array([ True])

In [232]:



Out[232]:
 7
 2
 1
⋮ 
 4
 5
 6
[torch.LongTensor of size 10000]

In [236]:
np.equal(testloader.dataset.test_labels, np.argmax(log_preds, axis=1)).sum() / len(testloader.dataset.test_labels)


Out[236]:
0.0892

Untrained CNN gets sub-random (< 10%) accuracy. No surprise, it only ever guesses '5', and sometimes '4':


In [242]:
set(np.argmax(log_preds, axis=1)), np.argmax(log_preds, axis=1)


Out[242]:
({4, 5}, array([5, 5, 5, ..., 5, 5, 5]))

5.0.3 resnet_learner


In [192]:
log_preds = resnet_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n


Out[192]:
0.9863

5.0.4 pt_res_learner


In [193]:
log_preds = pt_res_learner.predict(is_test=True)
np.equal(model_data.test_dl.dataset.y, np.argmax(log_preds, axis=1)).sum() / model_data.test_ds.n


Out[193]:
0.8923

Further Training & Testing

Seeing how far I can go (simply) before overfitting


In [273]:
# prev trn/val loss & valacc: 0.088194   0.068054   0.980333  
%time custom_learner.fit(lrs, n_cycle=2, cycle_len=1, cycle_mult=1)


epoch      trn_loss   val_loss   accuracy                     
    0      0.067517   0.049205   0.986222  
    1      0.050665   0.043011   0.987444                     
CPU times: user 41.1 s, sys: 15.1 s, total: 56.1 s
Wall time: 45.4 s
Out[273]:
[array([0.04301]), 0.9874444445504083]

Aside: Validation Loss Callback

Note: fastai callbacks tutorial.

I noticed only the last training session's losses are saved in learn.sched.losses and learn.sched.val_losses only holds the validation losses at the end of each epoch for the last training session. So I'll put together a callback to save validation losses and use that from here forward.

It would be very easy to have this automatically save the model at the best validation loss:

def on_epoch_end(self, metrics):
    ...
    val_loss = metrics[0]
    if val_loss < self.best_val_loss:
        self.best_loss = val_loss
        self.learner.save(...)
    ...

In [288]:
class SaveValidationLoss(Callback):
    def on_train_begin(self):
        self.val_losses = []
    def on_batch_end(self, metrics):
        print(metrics)
#         pdb.set_trace()
#         self.val_losses.append(metrics[0])
    def on_epoch_end(self, metrics):
        pdb.set_trace()
        self.val_losses.append(metrics[0])
    def plot(self):
        plt.plot(list(range(len(self.val_losses))), self.val_losses)

In [289]:
save_val = SaveValidationLoss()

In [290]:
# custom_learner.save('tempcnn')
custom_learner.load('tempcnn')

In [291]:
%time custom_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1, callbacks=[save_val])


  0%|          | 0/797 [00:00<?, ?it/s, loss=0.0255]0.025495141744613647
  0%|          | 1/797 [00:00<02:20,  5.66it/s, loss=0.0504]0.050424622616382496
  0%|          | 1/797 [00:00<02:43,  4.88it/s, loss=0.0387]0.03868748187216067
  0%|          | 1/797 [00:00<03:06,  4.27it/s, loss=0.0431]0.04306340767205375
  0%|          | 1/797 [00:00<03:29,  3.81it/s, loss=0.0601]0.060122291251162446
  1%|          | 5/797 [00:00<00:46, 17.11it/s, loss=0.0556]0.055577646002337874
  1%|          | 5/797 [00:00<00:50, 15.58it/s, loss=0.0726]0.07264379670625486
  1%|          | 5/797 [00:00<00:55, 14.28it/s, loss=0.0725]0.07245841972359143
  1%|          | 5/797 [00:00<00:59, 13.21it/s, loss=0.0713]0.07125278683668045
  1%|          | 9/797 [00:00<00:35, 22.05it/s, loss=0.0655]0.0654600846233789
  1%|          | 9/797 [00:00<00:38, 20.69it/s, loss=0.0747]0.07474335648370986
  1%|          | 9/797 [00:00<00:40, 19.53it/s, loss=0.0797]0.07971918638695787
  1%|          | 9/797 [00:00<00:42, 18.50it/s, loss=0.0737]0.07369732598372422
  2%|▏         | 13/797 [00:00<00:30, 25.35it/s, loss=0.0686]0.06860506100523578
  2%|▏         | 13/797 [00:00<00:32, 24.13it/s, loss=0.0702]0.07015624734260936
  2%|▏         | 13/797 [00:00<00:34, 23.01it/s, loss=0.0761]0.07613675360885212
  2%|▏         | 13/797 [00:00<00:35, 22.00it/s, loss=0.0728]0.0727628834336517
  2%|▏         | 17/797 [00:00<00:28, 27.63it/s, loss=0.0687]0.06872292558465692
  2%|▏         | 17/797 [00:00<00:29, 26.60it/s, loss=0.068] 0.06799065024581236
  2%|▏         | 17/797 [00:00<00:30, 25.65it/s, loss=0.0657]0.06574998564724517
  2%|▏         | 17/797 [00:00<00:34, 22.69it/s, loss=0.0658]0.06583293503336006
  3%|▎         | 21/797 [00:00<00:30, 25.57it/s, loss=0.0655]0.06546473719974298
  3%|▎         | 21/797 [00:00<00:31, 24.85it/s, loss=0.0662]0.06624144896734646
  3%|▎         | 21/797 [00:00<00:32, 24.14it/s, loss=0.0653]0.06530661542018654
  3%|▎         | 24/797 [00:00<00:28, 26.83it/s, loss=0.065] 0.06498355445789131
  3%|▎         | 24/797 [00:00<00:29, 26.13it/s, loss=0.0644]0.06444774443252303
  3%|▎         | 24/797 [00:00<00:30, 25.47it/s, loss=0.0635]0.0635090002517771
  3%|▎         | 24/797 [00:00<00:31, 24.84it/s, loss=0.062] 0.061969574535282144
  3%|▎         | 24/797 [00:00<00:31, 24.25it/s, loss=0.0595]0.05953027962028089
  4%|▎         | 29/797 [00:01<00:26, 28.59it/s, loss=0.0595]0.05954887448767442
  4%|▎         | 29/797 [00:01<00:27, 27.94it/s, loss=0.0592]0.05916899094099512
  4%|▎         | 29/797 [00:01<00:28, 27.31it/s, loss=0.0573]0.05733703315160773
  4%|▎         | 29/797 [00:01<00:28, 26.71it/s, loss=0.0568]0.056772518592660345
  4%|▎         | 29/797 [00:01<00:29, 26.16it/s, loss=0.0578]0.057821472652135136
  4%|▍         | 34/797 [00:01<00:25, 30.06it/s, loss=0.0564]0.05638734673278472
  4%|▍         | 34/797 [00:01<00:25, 29.48it/s, loss=0.0562]0.05621916576783855
  4%|▍         | 34/797 [00:01<00:26, 28.92it/s, loss=0.0547]0.05469811311111914
  4%|▍         | 34/797 [00:01<00:26, 28.38it/s, loss=0.0536]0.05358728378218238
  4%|▍         | 34/797 [00:01<00:27, 27.87it/s, loss=0.0524]0.05240603074612415
  5%|▍         | 39/797 [00:01<00:24, 31.39it/s, loss=0.0519]0.05189593755567513
  5%|▍         | 39/797 [00:01<00:24, 30.33it/s, loss=0.0523]0.052316725168858755
  5%|▍         | 39/797 [00:01<00:26, 28.41it/s, loss=0.0524]0.052423794028505534
  5%|▍         | 39/797 [00:01<00:27, 27.49it/s, loss=0.0535]0.0534643600138161
  5%|▌         | 43/797 [00:01<00:25, 29.87it/s, loss=0.0523]0.05230427099887717
  5%|▌         | 43/797 [00:01<00:25, 29.46it/s, loss=0.0525]0.052525960310538404
  5%|▌         | 43/797 [00:01<00:25, 29.05it/s, loss=0.0527]0.05265729522159424
  5%|▌         | 43/797 [00:01<00:26, 28.65it/s, loss=0.0545]0.05452367254709838
  5%|▌         | 43/797 [00:01<00:26, 28.27it/s, loss=0.053] 0.053028978249498905
  6%|▌         | 48/797 [00:01<00:24, 31.12it/s, loss=0.0521]0.05209058787320828
  6%|▌         | 48/797 [00:01<00:24, 30.72it/s, loss=0.0526]0.052553842753148304
  6%|▌         | 48/797 [00:01<00:24, 30.31it/s, loss=0.0522]0.05221025461175417
  6%|▌         | 48/797 [00:01<00:25, 29.93it/s, loss=0.0511]0.051083606775088865
  6%|▌         | 48/797 [00:01<00:25, 29.54it/s, loss=0.0506]0.05055640939653587
  7%|▋         | 53/797 [00:01<00:23, 32.19it/s, loss=0.0507]0.050658638313224
  7%|▋         | 53/797 [00:01<00:23, 31.79it/s, loss=0.0506]0.05061287269683382
  7%|▋         | 53/797 [00:01<00:23, 31.39it/s, loss=0.0528]0.052752921246403656
  7%|▋         | 53/797 [00:01<00:23, 31.01it/s, loss=0.0524]0.05242604678100572
  7%|▋         | 53/797 [00:01<00:24, 30.63it/s, loss=0.0532]0.053245942414464766
  7%|▋         | 58/797 [00:01<00:22, 33.11it/s, loss=0.0539]0.053935792433169996
  7%|▋         | 58/797 [00:01<00:22, 32.73it/s, loss=0.0534]0.053419958250443995
  7%|▋         | 58/797 [00:01<00:23, 31.65it/s, loss=0.0563]0.056275748635797096
  7%|▋         | 58/797 [00:01<00:24, 30.20it/s, loss=0.0559]0.05594139274337902
  7%|▋         | 58/797 [00:01<00:24, 29.66it/s, loss=0.0549]0.05487459878831601
  8%|▊         | 63/797 [00:01<00:23, 31.87it/s, loss=0.0563]0.05627637125590591
  8%|▊         | 63/797 [00:01<00:23, 31.55it/s, loss=0.0551]0.05510261680713505
  8%|▊         | 63/797 [00:02<00:23, 31.22it/s, loss=0.0564]0.05636321247529201
  8%|▊         | 63/797 [00:02<00:23, 30.89it/s, loss=0.0554]0.05544628251512652
  8%|▊         | 63/797 [00:02<00:24, 30.58it/s, loss=0.0546]0.05455534802501805
  9%|▊         | 68/797 [00:02<00:22, 32.67it/s, loss=0.0538]0.05376774531609394
  9%|▊         | 68/797 [00:02<00:22, 32.35it/s, loss=0.0536]0.053638791277826996
  9%|▊         | 68/797 [00:02<00:22, 32.03it/s, loss=0.053] 0.052967358776328674
  9%|▊         | 68/797 [00:02<00:22, 31.72it/s, loss=0.0522]0.052226393077301696
  9%|▊         | 68/797 [00:02<00:23, 31.41it/s, loss=0.0517]0.05174696457245576
  9%|▉         | 73/797 [00:02<00:21, 33.39it/s, loss=0.0539]0.05388824517273569
  9%|▉         | 73/797 [00:02<00:21, 33.07it/s, loss=0.057] 0.0570409384207392
  9%|▉         | 73/797 [00:02<00:22, 32.76it/s, loss=0.0575]0.05745801787283794
  9%|▉         | 73/797 [00:02<00:22, 32.45it/s, loss=0.0584]0.05844152916326373
  9%|▉         | 73/797 [00:02<00:22, 32.15it/s, loss=0.0598]0.059815854300449105
 10%|▉         | 78/797 [00:02<00:21, 34.03it/s, loss=0.0595]0.05953989061719913
 10%|▉         | 78/797 [00:02<00:21, 33.72it/s, loss=0.0598]0.05984711296934889
 10%|▉         | 78/797 [00:02<00:21, 32.79it/s, loss=0.0615]0.06149566500471457
 10%|▉         | 78/797 [00:02<00:22, 32.40it/s, loss=0.0646]0.06459322653770024
 10%|▉         | 78/797 [00:02<00:22, 31.42it/s, loss=0.0648]0.06483660310077373
 10%|█         | 83/797 [00:02<00:21, 33.14it/s, loss=0.0644]0.06442748290550976
 10%|█         | 83/797 [00:02<00:21, 32.87it/s, loss=0.0633]0.06326913932188864
 10%|█         | 83/797 [00:02<00:21, 32.60it/s, loss=0.0629]0.06288322393079246
 10%|█         | 83/797 [00:02<00:22, 32.33it/s, loss=0.0619]0.061875393349185744
 10%|█         | 83/797 [00:02<00:22, 32.06it/s, loss=0.0622]0.06215470209336003
 11%|█         | 88/797 [00:02<00:21, 33.70it/s, loss=0.0651]0.06507376319542497
 11%|█         | 88/797 [00:02<00:21, 33.44it/s, loss=0.0659]0.06591527537813362
 11%|█         | 88/797 [00:02<00:21, 33.18it/s, loss=0.0659]0.06587346467577489
 11%|█         | 88/797 [00:02<00:21, 32.92it/s, loss=0.0658]0.06580914633899997
 11%|█         | 88/797 [00:02<00:21, 32.66it/s, loss=0.0654]0.0653646462252638
 12%|█▏        | 93/797 [00:02<00:20, 34.23it/s, loss=0.0666]0.06662897311406497
 12%|█▏        | 93/797 [00:02<00:20, 33.97it/s, loss=0.0666]0.06663239166265829
 12%|█▏        | 93/797 [00:02<00:20, 33.70it/s, loss=0.0668]0.0667969225236392
 12%|█▏        | 93/797 [00:02<00:21, 33.44it/s, loss=0.0661]0.06610670107473147
 12%|█▏        | 93/797 [00:02<00:21, 33.19it/s, loss=0.0658]0.06584432414580853
 12%|█▏        | 98/797 [00:02<00:20, 34.69it/s, loss=0.0656]0.06558638323112057
 12%|█▏        | 98/797 [00:02<00:20, 34.44it/s, loss=0.0651]0.06509599472564108
 12%|█▏        | 98/797 [00:02<00:20, 33.89it/s, loss=0.0643]0.06430741332909727
 12%|█▏        | 98/797 [00:02<00:20, 33.54it/s, loss=0.0633]0.06331954155075763
 12%|█▏        | 98/797 [00:03<00:21, 32.47it/s, loss=0.0624]0.06240150180052603
 13%|█▎        | 103/797 [00:03<00:20, 33.88it/s, loss=0.0617]0.061706967129034715
 13%|█▎        | 103/797 [00:03<00:20, 33.64it/s, loss=0.0607]0.0607244040883885
 13%|█▎        | 103/797 [00:03<00:20, 33.41it/s, loss=0.061] 0.060985588675756576
 13%|█▎        | 103/797 [00:03<00:20, 33.18it/s, loss=0.0603]0.060275423793743434
 13%|█▎        | 103/797 [00:03<00:21, 32.96it/s, loss=0.0601]0.060145618981634584
 14%|█▎        | 108/797 [00:03<00:20, 34.32it/s, loss=0.0599]0.05985279854019292
 14%|█▎        | 108/797 [00:03<00:20, 34.10it/s, loss=0.0591]0.05912457935462375
 14%|█▎        | 108/797 [00:03<00:20, 33.87it/s, loss=0.0591]0.05909808563007341
 14%|█▎        | 108/797 [00:03<00:20, 33.65it/s, loss=0.0594]0.059382100318559734
 14%|█▎        | 108/797 [00:03<00:20, 33.43it/s, loss=0.0592]0.059226997403706724
 14%|█▍        | 113/797 [00:03<00:19, 34.74it/s, loss=0.0587]0.058700478600576714
 14%|█▍        | 113/797 [00:03<00:19, 34.52it/s, loss=0.0577]0.057685301736904
 14%|█▍        | 113/797 [00:03<00:19, 34.30it/s, loss=0.0569]0.0568762497190501
 14%|█▍        | 113/797 [00:03<00:20, 34.08it/s, loss=0.0574]0.057440079332502295
 14%|█▍        | 113/797 [00:03<00:20, 33.87it/s, loss=0.0581]0.05811344082192292
 15%|█▍        | 118/797 [00:03<00:19, 35.14it/s, loss=0.0577]0.05772355340063109
 15%|█▍        | 118/797 [00:03<00:19, 34.92it/s, loss=0.0588]0.05878476579718708
 15%|█▍        | 118/797 [00:03<00:19, 34.41it/s, loss=0.0582]0.058195846232784725
 15%|█▍        | 118/797 [00:03<00:19, 34.14it/s, loss=0.0578]0.057791777239383384
 15%|█▍        | 118/797 [00:03<00:20, 33.77it/s, loss=0.058] 0.05802993081392004
 15%|█▌        | 123/797 [00:03<00:19, 34.90it/s, loss=0.0586]0.058617066862376395
 15%|█▌        | 123/797 [00:03<00:19, 34.38it/s, loss=0.0579]0.057906499414038776
 15%|█▌        | 123/797 [00:03<00:19, 34.17it/s, loss=0.0586]0.05855045763052747
 15%|█▌        | 123/797 [00:03<00:19, 33.97it/s, loss=0.0604]0.06038780614270188
 15%|█▌        | 123/797 [00:03<00:19, 33.77it/s, loss=0.0596]0.05959729882769048
 16%|█▌        | 128/797 [00:03<00:19, 34.93it/s, loss=0.0609]0.06093780132981869
 16%|█▌        | 128/797 [00:03<00:19, 34.74it/s, loss=0.061] 0.060988622215085185
 16%|█▌        | 128/797 [00:03<00:19, 34.53it/s, loss=0.0638]0.0638384721051807
 16%|█▌        | 128/797 [00:03<00:19, 34.33it/s, loss=0.063] 0.06296930404346368
 16%|█▌        | 128/797 [00:03<00:19, 34.14it/s, loss=0.0623]0.06228783755765086
 17%|█▋        | 133/797 [00:03<00:18, 35.26it/s, loss=0.0627]0.06267340288562416
 17%|█▋        | 133/797 [00:03<00:18, 35.07it/s, loss=0.0629]0.06292555738248352
 17%|█▋        | 133/797 [00:03<00:19, 34.87it/s, loss=0.0622]0.06223223566049228
 17%|█▋        | 133/797 [00:03<00:19, 34.68it/s, loss=0.0621]0.06209164728846505
 17%|█▋        | 133/797 [00:03<00:19, 34.49it/s, loss=0.0632]0.06324846400814432
 17%|█▋        | 138/797 [00:03<00:18, 35.59it/s, loss=0.0643]0.06426510907961083
 17%|█▋        | 138/797 [00:03<00:18, 35.40it/s, loss=0.0652]0.06519324531293978
 17%|█▋        | 138/797 [00:03<00:18, 34.97it/s, loss=0.0643]0.06432678322092959
 17%|█▋        | 138/797 [00:03<00:19, 34.66it/s, loss=0.0638]0.06378264367953781
 17%|█▋        | 138/797 [00:04<00:19, 34.13it/s, loss=0.0636]0.06359474009321937
 18%|█▊        | 143/797 [00:04<00:18, 34.96it/s, loss=0.0637]0.06369234411653613
 18%|█▊        | 143/797 [00:04<00:18, 34.79it/s, loss=0.0627]0.0626698565293176
 18%|█▊        | 143/797 [00:04<00:18, 34.61it/s, loss=0.0621]0.062146069187436016
 18%|█▊        | 143/797 [00:04<00:18, 34.43it/s, loss=0.0628]0.0628160186526293
 18%|█▊        | 147/797 [00:04<00:18, 35.22it/s, loss=0.062] 0.062039239314843155
 18%|█▊        | 147/797 [00:04<00:18, 35.04it/s, loss=0.0621]0.06214862418046805
 18%|█▊        | 147/797 [00:04<00:18, 34.87it/s, loss=0.0615]0.06146720506411672
 18%|█▊        | 147/797 [00:04<00:18, 34.70it/s, loss=0.0604]0.06042155305492253
 18%|█▊        | 147/797 [00:04<00:18, 34.52it/s, loss=0.0596]0.059620789888922214
 19%|█▉        | 152/797 [00:04<00:18, 35.52it/s, loss=0.0601]0.060143248043902996
 19%|█▉        | 152/797 [00:04<00:18, 35.34it/s, loss=0.0594]0.05937783878016235
 19%|█▉        | 152/797 [00:04<00:18, 35.17it/s, loss=0.0592]0.059239369311851746
 19%|█▉        | 152/797 [00:04<00:18, 35.00it/s, loss=0.0584]0.05841518072128104
 19%|█▉        | 152/797 [00:04<00:18, 34.83it/s, loss=0.0593]0.059264839328606905
 20%|█▉        | 157/797 [00:04<00:17, 35.80it/s, loss=0.0603]0.06031733798899968
 20%|█▉        | 157/797 [00:04<00:17, 35.63it/s, loss=0.0618]0.06179019329230765
 20%|█▉        | 157/797 [00:04<00:18, 35.46it/s, loss=0.0629]0.06294219965364527
 20%|█▉        | 157/797 [00:04<00:18, 35.07it/s, loss=0.0621]0.06207044916602278
 20%|█▉        | 157/797 [00:04<00:18, 34.61it/s, loss=0.0613]0.061271195289354936
 20%|██        | 162/797 [00:04<00:18, 35.20it/s, loss=0.0645]0.06454079246185691
 20%|██        | 162/797 [00:04<00:18, 35.05it/s, loss=0.0662]0.06620858405962352
 20%|██        | 162/797 [00:04<00:18, 34.89it/s, loss=0.0653]0.06530967217592837
 20%|██        | 162/797 [00:04<00:18, 34.74it/s, loss=0.0642]0.06417909989716422
 21%|██        | 166/797 [00:04<00:17, 35.43it/s, loss=0.0634]0.06340066096350404
 21%|██        | 166/797 [00:04<00:17, 35.27it/s, loss=0.0638]0.06383365376114432
 21%|██        | 166/797 [00:04<00:17, 35.12it/s, loss=0.0626]0.06264456181197232
 21%|██        | 166/797 [00:04<00:18, 34.96it/s, loss=0.0617]0.061677045226643114
 21%|██        | 166/797 [00:04<00:18, 34.81it/s, loss=0.0618]0.06180478011691848
 21%|██▏       | 171/797 [00:04<00:17, 35.70it/s, loss=0.0611]0.0611344235954816
 21%|██▏       | 171/797 [00:04<00:17, 35.54it/s, loss=0.0666]0.06659540075942347
 21%|██▏       | 171/797 [00:04<00:17, 35.39it/s, loss=0.0656]0.06555328386501952
 21%|██▏       | 171/797 [00:04<00:17, 35.23it/s, loss=0.0653]0.06533954048335053
 21%|██▏       | 171/797 [00:04<00:17, 35.08it/s, loss=0.0654]0.0654264966251912
 22%|██▏       | 176/797 [00:04<00:17, 35.95it/s, loss=0.066] 0.06595868307882266
 22%|██▏       | 176/797 [00:04<00:17, 35.79it/s, loss=0.0681]0.06814912940714694
 22%|██▏       | 176/797 [00:04<00:17, 35.64it/s, loss=0.0686]0.06860372935006805
 22%|██▏       | 176/797 [00:04<00:17, 35.49it/s, loss=0.0683]0.06826406878543709
 22%|██▏       | 176/797 [00:05<00:17, 35.05it/s, loss=0.0679]0.06790660825532721
 23%|██▎       | 181/797 [00:05<00:17, 35.36it/s, loss=0.0693]0.06928127819732903
 23%|██▎       | 181/797 [00:05<00:17, 35.18it/s, loss=0.0689]0.06892746073993666
 23%|██▎       | 181/797 [00:05<00:17, 35.04it/s, loss=0.0681]0.06805959272866889
 23%|██▎       | 181/797 [00:05<00:17, 34.90it/s, loss=0.069] 0.06904105435126202
 23%|██▎       | 185/797 [00:05<00:17, 35.52it/s, loss=0.0695]0.06953134880978681
 23%|██▎       | 185/797 [00:05<00:17, 35.38it/s, loss=0.0687]0.0687302325535762
 23%|██▎       | 185/797 [00:05<00:17, 35.25it/s, loss=0.0679]0.06787094567813874
 23%|██▎       | 185/797 [00:05<00:17, 35.11it/s, loss=0.0677]0.06767721688023702
 23%|██▎       | 185/797 [00:05<00:17, 34.97it/s, loss=0.0668]0.06682328535707939
 24%|██▍       | 190/797 [00:05<00:16, 35.77it/s, loss=0.0673]0.06728467671651833
 24%|██▍       | 190/797 [00:05<00:17, 35.63it/s, loss=0.0665]0.06647914102293512
 24%|██▍       | 190/797 [00:05<00:17, 35.49it/s, loss=0.0659]0.06593007655008819
 24%|██▍       | 190/797 [00:05<00:17, 35.35it/s, loss=0.0659]0.06586096685074064
 24%|██▍       | 190/797 [00:05<00:17, 35.21it/s, loss=0.0654]0.06535318418883986
 24%|██▍       | 195/797 [00:05<00:16, 35.99it/s, loss=0.0675]0.06749425180040852
 24%|██▍       | 195/797 [00:05<00:16, 35.85it/s, loss=0.0686]0.06863267844755527
 24%|██▍       | 195/797 [00:05<00:16, 35.71it/s, loss=0.0678]0.06783130610032785
 24%|██▍       | 195/797 [00:05<00:16, 35.58it/s, loss=0.0688]0.06882420844958187
 24%|██▍       | 195/797 [00:05<00:16, 35.44it/s, loss=0.0689]0.06893370669201689
 25%|██▌       | 200/797 [00:05<00:16, 35.98it/s, loss=0.0683]0.06834657773108777
 25%|██▌       | 200/797 [00:05<00:16, 35.72it/s, loss=0.068] 0.06796274234222793
 25%|██▌       | 200/797 [00:05<00:16, 35.38it/s, loss=0.0701]0.07005965312190977
 25%|██▌       | 200/797 [00:05<00:16, 35.13it/s, loss=0.0695]0.06947172120528926
 25%|██▌       | 200/797 [00:05<00:17, 35.00it/s, loss=0.0693]0.0692863565396745
 26%|██▌       | 205/797 [00:05<00:16, 35.74it/s, loss=0.0712]0.07120370415023387
 26%|██▌       | 205/797 [00:05<00:16, 35.61it/s, loss=0.071] 0.07102592879217544
 26%|██▌       | 205/797 [00:05<00:16, 35.48it/s, loss=0.0703]0.07033015562807556
 26%|██▌       | 205/797 [00:05<00:16, 35.36it/s, loss=0.0737]0.07366223090174791
 26%|██▌       | 205/797 [00:05<00:16, 35.23it/s, loss=0.0724]0.07240945684097629
 26%|██▋       | 210/797 [00:05<00:16, 35.96it/s, loss=0.0721]0.07205726480172332
 26%|██▋       | 210/797 [00:05<00:16, 35.83it/s, loss=0.0726]0.07260982355101978
 26%|██▋       | 210/797 [00:05<00:16, 35.71it/s, loss=0.0729]0.07293934266326768
 26%|██▋       | 210/797 [00:05<00:16, 35.58it/s, loss=0.0733]0.07329974588145377
 26%|██▋       | 210/797 [00:05<00:16, 35.46it/s, loss=0.0737]0.073736892514432
 27%|██▋       | 215/797 [00:05<00:16, 36.17it/s, loss=0.0732]0.07320010226759684
 27%|██▋       | 215/797 [00:05<00:16, 36.05it/s, loss=0.0725]0.0725043339169911
 27%|██▋       | 215/797 [00:05<00:16, 35.92it/s, loss=0.0716]0.07159707083609132
 27%|██▋       | 215/797 [00:06<00:16, 35.80it/s, loss=0.071] 0.07101079983273374
 27%|██▋       | 215/797 [00:06<00:16, 35.67it/s, loss=0.0712]0.07121164408444783
 28%|██▊       | 220/797 [00:06<00:15, 36.17it/s, loss=0.0709]0.07094373090045454
 28%|██▊       | 220/797 [00:06<00:16, 35.83it/s, loss=0.0712]0.07119825758274169
 28%|██▊       | 220/797 [00:06<00:16, 35.44it/s, loss=0.0704]0.07039425241703831
 28%|██▊       | 220/797 [00:06<00:16, 35.31it/s, loss=0.0725]0.07248359472004512
 28%|██▊       | 220/797 [00:06<00:16, 35.20it/s, loss=0.0738]0.07380192305014002
 28%|██▊       | 225/797 [00:06<00:15, 35.87it/s, loss=0.0729]0.07287471736699434
 28%|██▊       | 225/797 [00:06<00:15, 35.76it/s, loss=0.0723]0.07234140478047621
 28%|██▊       | 225/797 [00:06<00:16, 35.64it/s, loss=0.0721]0.07208588410190073
 28%|██▊       | 225/797 [00:06<00:16, 35.52it/s, loss=0.0712]0.07117072773755169
 28%|██▊       | 225/797 [00:06<00:16, 35.40it/s, loss=0.0699]0.06988760576774049
 29%|██▉       | 230/797 [00:06<00:15, 36.07it/s, loss=0.0697]0.06965025664572819
 29%|██▉       | 230/797 [00:06<00:15, 35.95it/s, loss=0.0687]0.068712453014816
 29%|██▉       | 230/797 [00:06<00:15, 35.84it/s, loss=0.0679]0.06787912178557179
 29%|██▉       | 230/797 [00:06<00:15, 35.72it/s, loss=0.0683]0.06829105325834355
 29%|██▉       | 230/797 [00:06<00:15, 35.60it/s, loss=0.0689]0.06886895676560965
 29%|██▉       | 235/797 [00:06<00:15, 36.26it/s, loss=0.0683]0.06827382966292382
 29%|██▉       | 235/797 [00:06<00:15, 36.14it/s, loss=0.068] 0.068002001612218
 29%|██▉       | 235/797 [00:06<00:15, 36.02it/s, loss=0.0671]0.06711894254150848
 29%|██▉       | 235/797 [00:06<00:15, 35.91it/s, loss=0.0666]0.06657308821168535
 29%|██▉       | 235/797 [00:06<00:15, 35.79it/s, loss=0.0656]0.06559024495746793
 30%|███       | 240/797 [00:06<00:15, 36.27it/s, loss=0.066] 0.06598768411007314
 30%|███       | 240/797 [00:06<00:15, 36.08it/s, loss=0.0663]0.06632637794246855
 30%|███       | 240/797 [00:06<00:15, 35.63it/s, loss=0.066] 0.06601000014459255
 30%|███       | 240/797 [00:06<00:15, 35.52it/s, loss=0.0651]0.06511097732731924
 30%|███       | 240/797 [00:06<00:15, 35.40it/s, loss=0.0652]0.06522422926779749
 31%|███       | 245/797 [00:06<00:15, 36.03it/s, loss=0.0648]0.06482653248971791
 31%|███       | 245/797 [00:06<00:15, 35.92it/s, loss=0.0649]0.06492849601835848
 31%|███       | 245/797 [00:06<00:15, 35.80it/s, loss=0.0639]0.0638870782381578
 31%|███       | 245/797 [00:06<00:15, 35.69it/s, loss=0.064] 0.06396479137613972
 31%|███       | 245/797 [00:06<00:15, 35.58it/s, loss=0.0629]0.06288560144765842
 31%|███▏      | 250/797 [00:06<00:15, 36.19it/s, loss=0.0619]0.06194770116783754
 31%|███▏      | 250/797 [00:06<00:15, 36.09it/s, loss=0.0613]0.061257572820073734
 31%|███▏      | 250/797 [00:06<00:15, 35.97it/s, loss=0.0606]0.060637533008887384
 31%|███▏      | 250/797 [00:06<00:15, 35.86it/s, loss=0.0609]0.060926762308236106
 31%|███▏      | 250/797 [00:06<00:15, 35.75it/s, loss=0.061] 0.06101163994445353
 32%|███▏      | 255/797 [00:07<00:14, 36.35it/s, loss=0.0601]0.060050499638796405
 32%|███▏      | 255/797 [00:07<00:14, 36.24it/s, loss=0.0634]0.06336130257322044
 32%|███▏      | 255/797 [00:07<00:14, 36.13it/s, loss=0.0626]0.06259559075540685
 32%|███▏      | 255/797 [00:07<00:15, 36.03it/s, loss=0.0623]0.0623381689881315
 32%|███▏      | 255/797 [00:07<00:15, 35.92it/s, loss=0.0616]0.06158757101687718
 33%|███▎      | 260/797 [00:07<00:14, 36.40it/s, loss=0.0612]0.06118293885306241
 33%|███▎      | 260/797 [00:07<00:14, 36.23it/s, loss=0.0615]0.06150781559822076
 33%|███▎      | 260/797 [00:07<00:14, 35.90it/s, loss=0.0605]0.06045374131976387
 33%|███▎      | 260/797 [00:07<00:15, 35.67it/s, loss=0.0611]0.061125679668527756
 33%|███▎      | 260/797 [00:07<00:15, 35.57it/s, loss=0.0629]0.06287380765693884
 33%|███▎      | 265/797 [00:07<00:14, 36.14it/s, loss=0.0624]0.062395729204682476
 33%|███▎      | 265/797 [00:07<00:14, 36.04it/s, loss=0.0627]0.06273312409732595
 33%|███▎      | 265/797 [00:07<00:14, 35.94it/s, loss=0.0636]0.06361653933524872
 33%|███▎      | 265/797 [00:07<00:14, 35.84it/s, loss=0.0634]0.06342071147582937
 33%|███▎      | 265/797 [00:07<00:14, 35.74it/s, loss=0.0632]0.0631656583895072
 34%|███▍      | 270/797 [00:07<00:14, 36.31it/s, loss=0.0638]0.06381944170283707
 34%|███▍      | 270/797 [00:07<00:14, 36.21it/s, loss=0.0629]0.06294335803692418
 34%|███▍      | 270/797 [00:07<00:14, 36.11it/s, loss=0.0621]0.06207773204974256
 34%|███▍      | 270/797 [00:07<00:14, 36.01it/s, loss=0.0611]0.061064870202072394
 34%|███▍      | 270/797 [00:07<00:14, 35.91it/s, loss=0.0612]0.06122144792431983
 35%|███▍      | 275/797 [00:07<00:14, 36.47it/s, loss=0.0624]0.06241457432172082
 35%|███▍      | 275/797 [00:07<00:14, 36.37it/s, loss=0.0627]0.06269670082037168
 35%|███▍      | 275/797 [00:07<00:14, 36.27it/s, loss=0.0616]0.06163878766097815
 35%|███▍      | 275/797 [00:07<00:14, 36.17it/s, loss=0.0613]0.061303047410838536
 35%|███▍      | 275/797 [00:07<00:14, 36.07it/s, loss=0.0621]0.06212113870944572
 35%|███▌      | 280/797 [00:07<00:14, 36.26it/s, loss=0.0625]0.062458070478920304
 35%|███▌      | 280/797 [00:07<00:14, 35.97it/s, loss=0.0627]0.06267228695704025
 35%|███▌      | 280/797 [00:07<00:14, 35.87it/s, loss=0.0626]0.06256223733401087
 35%|███▌      | 280/797 [00:07<00:14, 35.78it/s, loss=0.0634]0.06335174255457993
 35%|███▌      | 280/797 [00:07<00:14, 35.68it/s, loss=0.0631]0.06310227698384366
 36%|███▌      | 285/797 [00:07<00:14, 36.22it/s, loss=0.0621]0.06213909683704554
 36%|███▌      | 285/797 [00:07<00:14, 36.13it/s, loss=0.0633]0.06325408573177645
 36%|███▌      | 285/797 [00:07<00:14, 36.04it/s, loss=0.0644]0.06435479005709713
 36%|███▌      | 285/797 [00:07<00:14, 35.94it/s, loss=0.0638]0.06380897675054277
 36%|███▌      | 285/797 [00:07<00:14, 35.85it/s, loss=0.063] 0.06295747737221448
 36%|███▋      | 290/797 [00:07<00:13, 36.38it/s, loss=0.0635]0.06345771757537262
 36%|███▋      | 290/797 [00:07<00:13, 36.29it/s, loss=0.0634]0.0634366188913726
 36%|███▋      | 290/797 [00:08<00:14, 36.19it/s, loss=0.0628]0.06284448016598017
 36%|███▋      | 290/797 [00:08<00:14, 36.10it/s, loss=0.0633]0.06327627299181587
 36%|███▋      | 290/797 [00:08<00:14, 36.01it/s, loss=0.0622]0.062161099437675026
 37%|███▋      | 295/797 [00:08<00:13, 36.53it/s, loss=0.0632]0.06315833494311202
 37%|███▋      | 295/797 [00:08<00:13, 36.43it/s, loss=0.0631]0.06309578575583277
 37%|███▋      | 295/797 [00:08<00:13, 36.34it/s, loss=0.0621]0.062139362865146895
 37%|███▋      | 295/797 [00:08<00:13, 36.25it/s, loss=0.062] 0.06204724142421232
 37%|███▋      | 295/797 [00:08<00:13, 36.16it/s, loss=0.0616]0.06163658854887346
 38%|███▊      | 300/797 [00:08<00:13, 36.30it/s, loss=0.061] 0.0609977261325936
 38%|███▊      | 300/797 [00:08<00:13, 36.02it/s, loss=0.0602]0.060217007880860624
 38%|███▊      | 300/797 [00:08<00:13, 35.93it/s, loss=0.0597]0.059745453149319984
 38%|███▊      | 300/797 [00:08<00:13, 35.84it/s, loss=0.0598]0.05980281116908782
 38%|███▊      | 300/797 [00:08<00:13, 35.75it/s, loss=0.0592]0.05918216355503344
 38%|███▊      | 305/797 [00:08<00:13, 36.26it/s, loss=0.0583]0.05831167273963325
 38%|███▊      | 305/797 [00:08<00:13, 36.17it/s, loss=0.0578]0.0577625310529873
 38%|███▊      | 305/797 [00:08<00:13, 36.08it/s, loss=0.0595]0.05951398533076919
 38%|███▊      | 305/797 [00:08<00:13, 35.99it/s, loss=0.0585]0.05854121587380143
 38%|███▊      | 305/797 [00:08<00:13, 35.90it/s, loss=0.0578]0.05783070522208261
 39%|███▉      | 310/797 [00:08<00:13, 36.40it/s, loss=0.0572]0.05720046727599205
 39%|███▉      | 310/797 [00:08<00:13, 36.32it/s, loss=0.0577]0.05765885601788354
 39%|███▉      | 310/797 [00:08<00:13, 36.23it/s, loss=0.0585]0.05849252842213519
 39%|███▉      | 310/797 [00:08<00:13, 36.14it/s, loss=0.0602]0.060226389377231695
 39%|███▉      | 310/797 [00:08<00:13, 36.05it/s, loss=0.0607]0.06074293539083039
 40%|███▉      | 315/797 [00:08<00:13, 36.54it/s, loss=0.0597]0.05972964031458973
 40%|███▉      | 315/797 [00:08<00:13, 36.46it/s, loss=0.0593]0.05925807332570838
 40%|███▉      | 315/797 [00:08<00:13, 36.37it/s, loss=0.059] 0.0590130380027883
 40%|███▉      | 315/797 [00:08<00:13, 36.28it/s, loss=0.0615]0.061509633751896664
 40%|███▉      | 315/797 [00:08<00:13, 36.19it/s, loss=0.0607]0.06067710940334628
 40%|████      | 320/797 [00:08<00:13, 36.51it/s, loss=0.0599]0.059852291406292904
 40%|████      | 320/797 [00:08<00:13, 36.19it/s, loss=0.0596]0.059610242478376464
 40%|████      | 320/797 [00:08<00:13, 36.03it/s, loss=0.061] 0.06101413255722865
 40%|████      | 320/797 [00:08<00:13, 35.94it/s, loss=0.0606]0.06061095027570472
 40%|████      | 320/797 [00:08<00:13, 35.86it/s, loss=0.0601]0.060109964232576384
 41%|████      | 325/797 [00:08<00:12, 36.34it/s, loss=0.0598]0.05981821057435306
 41%|████      | 325/797 [00:08<00:13, 36.25it/s, loss=0.0598]0.05983271794190636
 41%|████      | 325/797 [00:08<00:13, 36.17it/s, loss=0.0613]0.0612505917915838
 41%|████      | 325/797 [00:09<00:13, 36.09it/s, loss=0.061] 0.06097476511429439
 41%|████      | 325/797 [00:09<00:13, 36.01it/s, loss=0.0603]0.060251134595848424
 41%|████▏     | 330/797 [00:09<00:12, 36.47it/s, loss=0.0605]0.060457084580639815
 41%|████▏     | 330/797 [00:09<00:12, 36.39it/s, loss=0.0599]0.059946870831555574
 41%|████▏     | 330/797 [00:09<00:12, 36.30it/s, loss=0.0609]0.06092221389073424
 41%|████▏     | 330/797 [00:09<00:12, 36.22it/s, loss=0.061] 0.060958701952264285
 41%|████▏     | 330/797 [00:09<00:12, 36.14it/s, loss=0.0614]0.061400661996766935
 42%|████▏     | 335/797 [00:09<00:12, 36.60it/s, loss=0.0613]0.06130566845535139
 42%|████▏     | 335/797 [00:09<00:12, 36.51it/s, loss=0.0607]0.060669384548464836
 42%|████▏     | 335/797 [00:09<00:12, 36.43it/s, loss=0.0602]0.060246327892104316
 42%|████▏     | 335/797 [00:09<00:12, 36.35it/s, loss=0.061] 0.06097810518447057
 42%|████▏     | 335/797 [00:09<00:12, 36.26it/s, loss=0.0611]0.06114930964628229
 43%|████▎     | 340/797 [00:09<00:12, 36.59it/s, loss=0.0607]0.060715422113877496
 43%|████▎     | 340/797 [00:09<00:12, 36.29it/s, loss=0.0619]0.06189955512201805
 43%|████▎     | 340/797 [00:09<00:12, 36.19it/s, loss=0.0625]0.06250379154203886
 43%|████▎     | 340/797 [00:09<00:12, 36.03it/s, loss=0.0634]0.06338521505803149
 43%|████▎     | 340/797 [00:09<00:12, 35.96it/s, loss=0.0625]0.062462606560552116
 43%|████▎     | 345/797 [00:09<00:12, 36.40it/s, loss=0.0636]0.0635725795606117
 43%|████▎     | 345/797 [00:09<00:12, 36.32it/s, loss=0.064] 0.0639796183498852
 43%|████▎     | 345/797 [00:09<00:12, 36.24it/s, loss=0.0654]0.0653696632685196
 43%|████▎     | 345/797 [00:09<00:12, 36.16it/s, loss=0.0651]0.06511497647586217
 43%|████▎     | 345/797 [00:09<00:12, 36.09it/s, loss=0.0664]0.06636185431277702
 44%|████▍     | 350/797 [00:09<00:12, 36.53it/s, loss=0.0671]0.06708257429431762
 44%|████▍     | 350/797 [00:09<00:12, 36.45it/s, loss=0.0671]0.06708004980948734
 44%|████▍     | 350/797 [00:09<00:12, 36.37it/s, loss=0.0667]0.06673305771080543
 44%|████▍     | 350/797 [00:09<00:12, 36.29it/s, loss=0.0673]0.06727601003820995
 44%|████▍     | 350/797 [00:09<00:12, 36.21it/s, loss=0.0668]0.0668273451551488
 45%|████▍     | 355/797 [00:09<00:12, 36.64it/s, loss=0.066] 0.06603922450051794
 45%|████▍     | 355/797 [00:09<00:12, 36.56it/s, loss=0.0662]0.0661850233982582
 45%|████▍     | 355/797 [00:09<00:12, 36.48it/s, loss=0.0653]0.06525306585777887
 45%|████▍     | 355/797 [00:09<00:12, 36.40it/s, loss=0.0661]0.0660830932641103
 45%|████▍     | 355/797 [00:09<00:12, 36.33it/s, loss=0.0651]0.06513952037450108
 45%|████▌     | 360/797 [00:09<00:11, 36.65it/s, loss=0.064] 0.063968671055788
 45%|████▌     | 360/797 [00:09<00:12, 36.39it/s, loss=0.0646]0.06456647262186148
 45%|████▌     | 360/797 [00:09<00:12, 36.19it/s, loss=0.0641]0.06414553716476208
 45%|████▌     | 360/797 [00:09<00:12, 36.12it/s, loss=0.0644]0.06439907415702867
 45%|████▌     | 360/797 [00:09<00:12, 36.04it/s, loss=0.0636]0.06362999015386751
 46%|████▌     | 365/797 [00:10<00:11, 36.46it/s, loss=0.0629]0.06290748370016305
 46%|████▌     | 365/797 [00:10<00:11, 36.39it/s, loss=0.0631]0.06305160448359848
 46%|████▌     | 365/797 [00:10<00:11, 36.31it/s, loss=0.0631]0.06310071834075061
 46%|████▌     | 365/797 [00:10<00:11, 36.24it/s, loss=0.0621]0.06213374412563071
 46%|████▌     | 365/797 [00:10<00:11, 36.16it/s, loss=0.0614]0.06138347658748844
 46%|████▋     | 370/797 [00:10<00:11, 36.58it/s, loss=0.0603]0.06025282556649989
 46%|████▋     | 370/797 [00:10<00:11, 36.51it/s, loss=0.0596]0.0595620753844699
 46%|████▋     | 370/797 [00:10<00:11, 36.43it/s, loss=0.0585]0.05853584683453994
 46%|████▋     | 370/797 [00:10<00:11, 36.36it/s, loss=0.0583]0.0583235722689264
 46%|████▋     | 370/797 [00:10<00:11, 36.28it/s, loss=0.0581]0.058120884349048996
 47%|████▋     | 375/797 [00:10<00:11, 36.70it/s, loss=0.0586]0.05859903756332521
 47%|████▋     | 375/797 [00:10<00:11, 36.62it/s, loss=0.0593]0.059261350137531706
 47%|████▋     | 375/797 [00:10<00:11, 36.55it/s, loss=0.0587]0.05870178210689993
 47%|████▋     | 375/797 [00:10<00:11, 36.47it/s, loss=0.0588]0.058818254759067934
 47%|████▋     | 375/797 [00:10<00:11, 36.40it/s, loss=0.0579]0.05792792334293109
 48%|████▊     | 380/797 [00:10<00:11, 36.54it/s, loss=0.0573]0.05732493030799203
 48%|████▊     | 380/797 [00:10<00:11, 36.32it/s, loss=0.0564]0.056439649242380104
 48%|████▊     | 380/797 [00:10<00:11, 36.25it/s, loss=0.0568]0.05679183926562672
 48%|████▊     | 380/797 [00:10<00:11, 36.18it/s, loss=0.0561]0.05605904257234599
 48%|████▊     | 380/797 [00:10<00:11, 36.11it/s, loss=0.0552]0.05522402033900646
 48%|████▊     | 385/797 [00:10<00:11, 36.51it/s, loss=0.0546]0.05458364241942768
 48%|████▊     | 385/797 [00:10<00:11, 36.44it/s, loss=0.0541]0.05411110080974049
 48%|████▊     | 385/797 [00:10<00:11, 36.37it/s, loss=0.0552]0.05519345682954965
 48%|████▊     | 385/797 [00:10<00:11, 36.29it/s, loss=0.0542]0.05420408285428319
 48%|████▊     | 385/797 [00:10<00:11, 36.22it/s, loss=0.0546]0.054612205700646226
 49%|████▉     | 390/797 [00:10<00:11, 36.62it/s, loss=0.0544]0.05436304866870341
 49%|████▉     | 390/797 [00:10<00:11, 36.55it/s, loss=0.055] 0.05504795272721687
 49%|████▉     | 390/797 [00:10<00:11, 36.48it/s, loss=0.0543]0.05428784900154078
 49%|████▉     | 390/797 [00:10<00:11, 36.41it/s, loss=0.0534]0.05337670498907668
 49%|████▉     | 390/797 [00:10<00:11, 36.34it/s, loss=0.0555]0.05547524734836299
 50%|████▉     | 395/797 [00:10<00:10, 36.73it/s, loss=0.0558]0.055796785499717116
 50%|████▉     | 395/797 [00:10<00:10, 36.66it/s, loss=0.0548]0.05483862010232603
 50%|████▉     | 395/797 [00:10<00:10, 36.59it/s, loss=0.054] 0.05397459719259037
 50%|████▉     | 395/797 [00:10<00:11, 36.52it/s, loss=0.0538]0.053765804344099326
 50%|████▉     | 395/797 [00:10<00:11, 36.45it/s, loss=0.0551]0.055103589175065776
 50%|█████     | 400/797 [00:10<00:10, 36.71it/s, loss=0.0553]0.05525573971530736
 50%|█████     | 400/797 [00:10<00:10, 36.38it/s, loss=0.0557]0.05568263373335667
 50%|█████     | 400/797 [00:11<00:10, 36.28it/s, loss=0.055] 0.05503054138515574
 50%|█████     | 400/797 [00:11<00:10, 36.21it/s, loss=0.0546]0.0546436523245349
 50%|█████     | 400/797 [00:11<00:10, 36.14it/s, loss=0.0548]0.05482680455524297
 51%|█████     | 405/797 [00:11<00:10, 36.53it/s, loss=0.0544]0.0544284359112213
 51%|█████     | 405/797 [00:11<00:10, 36.46it/s, loss=0.054] 0.05395974810035997
 51%|█████     | 405/797 [00:11<00:10, 36.39it/s, loss=0.0535]0.053480342059561196
 51%|█████     | 405/797 [00:11<00:10, 36.32it/s, loss=0.0528]0.05275340482799898
 51%|█████     | 405/797 [00:11<00:10, 36.26it/s, loss=0.0525]0.05248750923095593
 51%|█████▏    | 410/797 [00:11<00:10, 36.63it/s, loss=0.0538]0.05378862953499366
 51%|█████▏    | 410/797 [00:11<00:10, 36.57it/s, loss=0.054] 0.05400426954802814
 51%|█████▏    | 410/797 [00:11<00:10, 36.50it/s, loss=0.0534]0.053414943497562345
 51%|█████▏    | 410/797 [00:11<00:10, 36.43it/s, loss=0.0525]0.05251568147879136
 51%|█████▏    | 410/797 [00:11<00:10, 36.36it/s, loss=0.0519]0.05190137763533696
 52%|█████▏    | 415/797 [00:11<00:10, 36.73it/s, loss=0.0526]0.05257822920793968
 52%|█████▏    | 415/797 [00:11<00:10, 36.67it/s, loss=0.0519]0.05190589574788438
 52%|█████▏    | 415/797 [00:11<00:10, 36.60it/s, loss=0.0522]0.052155341121229506
 52%|█████▏    | 415/797 [00:11<00:10, 36.53it/s, loss=0.0522]0.05218442874620665
 52%|█████▏    | 415/797 [00:11<00:10, 36.46it/s, loss=0.0532]0.05321805480167704
 53%|█████▎    | 420/797 [00:11<00:10, 36.56it/s, loss=0.0524]0.052413461657701166
 53%|█████▎    | 420/797 [00:11<00:10, 36.36it/s, loss=0.0516]0.051553420356673825
 53%|█████▎    | 420/797 [00:11<00:10, 36.29it/s, loss=0.0509]0.0509068194284516
 53%|█████▎    | 420/797 [00:11<00:10, 36.23it/s, loss=0.0504]0.05037974577591666
 53%|█████▎    | 420/797 [00:11<00:10, 36.16it/s, loss=0.0504]0.05044901602352077
 53%|█████▎    | 425/797 [00:11<00:10, 36.53it/s, loss=0.0524]0.052435654429782715
 53%|█████▎    | 425/797 [00:11<00:10, 36.46it/s, loss=0.0515]0.051522881328357945
 53%|█████▎    | 425/797 [00:11<00:10, 36.40it/s, loss=0.0532]0.05321458491139377
 53%|█████▎    | 425/797 [00:11<00:10, 36.34it/s, loss=0.0527]0.052661325513466924
 53%|█████▎    | 425/797 [00:11<00:10, 36.27it/s, loss=0.0544]0.05443511580470118
 54%|█████▍    | 430/797 [00:11<00:10, 36.63it/s, loss=0.0537]0.05369891804255553
 54%|█████▍    | 430/797 [00:11<00:10, 36.57it/s, loss=0.0528]0.05276261161887846
 54%|█████▍    | 430/797 [00:11<00:10, 36.50it/s, loss=0.0519]0.05191312752920002
 54%|█████▍    | 430/797 [00:11<00:10, 36.44it/s, loss=0.053] 0.05303794031174682
 54%|█████▍    | 430/797 [00:11<00:10, 36.37it/s, loss=0.0535]0.053540160587357076
 55%|█████▍    | 435/797 [00:11<00:09, 36.73it/s, loss=0.0544]0.054412293981777576
 55%|█████▍    | 435/797 [00:11<00:09, 36.67it/s, loss=0.054] 0.05404412169313102
 55%|█████▍    | 435/797 [00:11<00:09, 36.60it/s, loss=0.0542]0.05420185118907793
 55%|█████▍    | 435/797 [00:11<00:09, 36.54it/s, loss=0.0543]0.05425647329530538
 55%|█████▍    | 435/797 [00:11<00:09, 36.47it/s, loss=0.0536]0.053562669193881474
 55%|█████▌    | 440/797 [00:11<00:09, 36.73it/s, loss=0.0534]0.05336825574861108
 55%|█████▌    | 440/797 [00:12<00:09, 36.44it/s, loss=0.0574]0.057436610280420464
 55%|█████▌    | 440/797 [00:12<00:09, 36.35it/s, loss=0.0565]0.05650094787149156
 55%|█████▌    | 440/797 [00:12<00:09, 36.29it/s, loss=0.0557]0.055691872735664553
 55%|█████▌    | 440/797 [00:12<00:09, 36.22it/s, loss=0.0548]0.05484338463544417
 56%|█████▌    | 445/797 [00:12<00:09, 36.57it/s, loss=0.0543]0.05431878561219465
 56%|█████▌    | 445/797 [00:12<00:09, 36.51it/s, loss=0.0542]0.05423395119546579
 56%|█████▌    | 445/797 [00:12<00:09, 36.45it/s, loss=0.0549]0.054935475438277294
 56%|█████▌    | 445/797 [00:12<00:09, 36.39it/s, loss=0.0552]0.05516614023365712
 56%|█████▌    | 445/797 [00:12<00:09, 36.33it/s, loss=0.0564]0.05641682278048604
 56%|█████▋    | 450/797 [00:12<00:09, 36.67it/s, loss=0.0577]0.05774235770087944
 56%|█████▋    | 450/797 [00:12<00:09, 36.61it/s, loss=0.0573]0.05725491732851555
 56%|█████▋    | 450/797 [00:12<00:09, 36.55it/s, loss=0.0569]0.05692508517615329
 56%|█████▋    | 450/797 [00:12<00:09, 36.49it/s, loss=0.058] 0.057983247963107935
 56%|█████▋    | 450/797 [00:12<00:09, 36.42it/s, loss=0.057]0.05699800418491577
 57%|█████▋    | 455/797 [00:12<00:09, 36.76it/s, loss=0.0566]0.056649521010457914
 57%|█████▋    | 455/797 [00:12<00:09, 36.70it/s, loss=0.06]  0.06001010889366562
 57%|█████▋    | 455/797 [00:12<00:09, 36.64it/s, loss=0.0592]0.05919861126500736
 57%|█████▋    | 455/797 [00:12<00:09, 36.58it/s, loss=0.0583]0.05826791219578235
 57%|█████▋    | 455/797 [00:12<00:09, 36.52it/s, loss=0.0576]0.05758061985421175
 58%|█████▊    | 460/797 [00:12<00:09, 36.67it/s, loss=0.0575]0.05750550854824635
 58%|█████▊    | 460/797 [00:12<00:09, 36.58it/s, loss=0.0569]0.0569307325123767
 58%|█████▊    | 460/797 [00:12<00:09, 36.40it/s, loss=0.0561]0.056097032770483
 58%|█████▊    | 460/797 [00:12<00:09, 36.35it/s, loss=0.0556]0.05564860485868498
 58%|█████▊    | 460/797 [00:12<00:09, 36.29it/s, loss=0.0549]0.054851478359179404
 58%|█████▊    | 465/797 [00:12<00:09, 36.62it/s, loss=0.0565]0.05651042000641922
 58%|█████▊    | 465/797 [00:12<00:09, 36.56it/s, loss=0.0557]0.055659301950976896
 58%|█████▊    | 465/797 [00:12<00:09, 36.50it/s, loss=0.0559]0.055888303188899674
 58%|█████▊    | 465/797 [00:12<00:09, 36.44it/s, loss=0.0551]0.05513182984709252
 58%|█████▊    | 465/797 [00:12<00:09, 36.38it/s, loss=0.0561]0.0561113608713705
 59%|█████▉    | 470/797 [00:12<00:08, 36.71it/s, loss=0.0554]0.05539467747634058
 59%|█████▉    | 470/797 [00:12<00:08, 36.65it/s, loss=0.0552]0.05521805768366198
 59%|█████▉    | 470/797 [00:12<00:08, 36.59it/s, loss=0.0557]0.05570544386147968
 59%|█████▉    | 470/797 [00:12<00:08, 36.53it/s, loss=0.0556]0.05563765711299444
 59%|█████▉    | 470/797 [00:12<00:08, 36.47it/s, loss=0.056] 0.05596724404092449
 60%|█████▉    | 475/797 [00:12<00:08, 36.80it/s, loss=0.0555]0.05549831388735944
 60%|█████▉    | 475/797 [00:12<00:08, 36.74it/s, loss=0.0558]0.05578049271212493
 60%|█████▉    | 475/797 [00:12<00:08, 36.68it/s, loss=0.056] 0.055983886906863664
 60%|█████▉    | 475/797 [00:12<00:08, 36.62it/s, loss=0.0552]0.05515169441328422
 60%|█████▉    | 475/797 [00:12<00:08, 36.56it/s, loss=0.0563]0.05633051111913595
 60%|██████    | 480/797 [00:13<00:08, 36.75it/s, loss=0.0557]0.05569151334704209
 60%|██████    | 480/797 [00:13<00:08, 36.67it/s, loss=0.055] 0.054968924812325445
 60%|██████    | 480/797 [00:13<00:08, 36.46it/s, loss=0.0542]0.05420411168064381
 60%|██████    | 480/797 [00:13<00:08, 36.40it/s, loss=0.0552]0.05522671150339222
 60%|██████    | 480/797 [00:13<00:08, 36.34it/s, loss=0.0547]0.05468848418795722
 61%|██████    | 485/797 [00:13<00:08, 36.66it/s, loss=0.0543]0.05431691645733647
 61%|██████    | 485/797 [00:13<00:08, 36.61it/s, loss=0.0536]0.05360371746150852
 61%|██████    | 485/797 [00:13<00:08, 36.55it/s, loss=0.053] 0.05299189175382631
 61%|██████    | 485/797 [00:13<00:08, 36.49it/s, loss=0.0521]0.052137838710404366
 61%|██████    | 485/797 [00:13<00:08, 36.43it/s, loss=0.0532]0.053162453429354786
 61%|██████▏   | 490/797 [00:13<00:08, 36.75it/s, loss=0.0525]0.052534515954209215
 61%|██████▏   | 490/797 [00:13<00:08, 36.69it/s, loss=0.0538]0.05382850349093851
 61%|██████▏   | 490/797 [00:13<00:08, 36.63it/s, loss=0.053] 0.05299779968406081
 61%|██████▏   | 490/797 [00:13<00:08, 36.57it/s, loss=0.0532]0.05323446011808746
 61%|██████▏   | 490/797 [00:13<00:08, 36.52it/s, loss=0.0526]0.052600415053703026
 62%|██████▏   | 495/797 [00:13<00:08, 36.83it/s, loss=0.052] 0.05203850077068923
 62%|██████▏   | 495/797 [00:13<00:08, 36.77it/s, loss=0.0525]0.05252923746428335
 62%|██████▏   | 495/797 [00:13<00:08, 36.72it/s, loss=0.0524]0.05240574173543436
 62%|██████▏   | 495/797 [00:13<00:08, 36.66it/s, loss=0.0522]0.05221510743353611
 62%|██████▏   | 495/797 [00:13<00:08, 36.60it/s, loss=0.0528]0.05278205146597093
 63%|██████▎   | 500/797 [00:13<00:08, 36.83it/s, loss=0.0541]0.05412638754003824
 63%|██████▎   | 500/797 [00:13<00:08, 36.72it/s, loss=0.0537]0.053700459248643126
 63%|██████▎   | 500/797 [00:13<00:08, 36.50it/s, loss=0.0531]0.05314737375061083
 63%|██████▎   | 500/797 [00:13<00:08, 36.45it/s, loss=0.0529]0.05286450298339373
 63%|██████▎   | 500/797 [00:13<00:08, 36.39it/s, loss=0.0532]0.05315951750395728
 63%|██████▎   | 505/797 [00:13<00:07, 36.70it/s, loss=0.0526]0.05256802516198502
 63%|██████▎   | 505/797 [00:13<00:07, 36.65it/s, loss=0.0529]0.0528908567710085
 63%|██████▎   | 505/797 [00:13<00:07, 36.59it/s, loss=0.0548]0.05483361300766046
 63%|██████▎   | 505/797 [00:13<00:07, 36.54it/s, loss=0.0543]0.05429812415326061
 63%|██████▎   | 505/797 [00:13<00:08, 36.48it/s, loss=0.0573]0.057276822643542086
 64%|██████▍   | 510/797 [00:13<00:07, 36.79it/s, loss=0.0576]0.05763864470632974
 64%|██████▍   | 510/797 [00:13<00:07, 36.73it/s, loss=0.057] 0.057023290097196
 64%|██████▍   | 510/797 [00:13<00:07, 36.68it/s, loss=0.0589]0.058872877828277874
 64%|██████▍   | 510/797 [00:13<00:07, 36.62it/s, loss=0.0595]0.059499115374303656
 64%|██████▍   | 510/797 [00:13<00:07, 36.57it/s, loss=0.0599]0.05993015389717435
 65%|██████▍   | 515/797 [00:13<00:07, 36.87it/s, loss=0.0595]0.05951135489209712
 65%|██████▍   | 515/797 [00:13<00:07, 36.81it/s, loss=0.059] 0.059020864302427936
 65%|██████▍   | 515/797 [00:14<00:07, 36.76it/s, loss=0.0593]0.059288319712377045
 65%|██████▍   | 515/797 [00:14<00:07, 36.70it/s, loss=0.0587]0.058724096871743486
 65%|██████▍   | 515/797 [00:14<00:07, 36.65it/s, loss=0.0585]0.058479075603727795
 65%|██████▌   | 520/797 [00:14<00:07, 36.86it/s, loss=0.058] 0.05797133595284483
 65%|██████▌   | 520/797 [00:14<00:07, 36.69it/s, loss=0.057]0.0569862531347948
 65%|██████▌   | 520/797 [00:14<00:07, 36.56it/s, loss=0.0583]0.058275627907126946
 65%|██████▌   | 520/797 [00:14<00:07, 36.51it/s, loss=0.0576]0.05755214136035845
 65%|██████▌   | 520/797 [00:14<00:07, 36.45it/s, loss=0.0566]0.05657377102173137
 66%|██████▌   | 525/797 [00:14<00:07, 36.75it/s, loss=0.0559]0.05594571609474062
 66%|██████▌   | 525/797 [00:14<00:07, 36.70it/s, loss=0.0573]0.057295968803479175
 66%|██████▌   | 525/797 [00:14<00:07, 36.64it/s, loss=0.0563]0.05631920998214446
 66%|██████▌   | 525/797 [00:14<00:07, 36.59it/s, loss=0.056] 0.056033532193886466
 66%|██████▌   | 525/797 [00:14<00:07, 36.54it/s, loss=0.0598]0.059755786769637435
 66%|██████▋   | 530/797 [00:14<00:07, 36.83it/s, loss=0.0587]0.058724457626864845
 66%|██████▋   | 530/797 [00:14<00:07, 36.78it/s, loss=0.0579]0.057919821694951405
 66%|██████▋   | 530/797 [00:14<00:07, 36.73it/s, loss=0.0571]0.057064580983762755
 66%|██████▋   | 530/797 [00:14<00:07, 36.67it/s, loss=0.0573]0.05730926199693909
 66%|██████▋   | 530/797 [00:14<00:07, 36.62it/s, loss=0.0569]0.05688126368689594
 67%|██████▋   | 535/797 [00:14<00:07, 36.91it/s, loss=0.0562]0.05620991998957328
 67%|██████▋   | 535/797 [00:14<00:07, 36.86it/s, loss=0.0552]0.0551739228943881
 67%|██████▋   | 535/797 [00:14<00:07, 36.80it/s, loss=0.0544]0.05436424877697762
 67%|██████▋   | 535/797 [00:14<00:07, 36.75it/s, loss=0.0541]0.05407738341246677
 67%|██████▋   | 535/797 [00:14<00:07, 36.70it/s, loss=0.0539]0.05388440112362692
 68%|██████▊   | 540/797 [00:14<00:06, 36.93it/s, loss=0.0537]0.05367657695536043
 68%|██████▊   | 540/797 [00:14<00:06, 36.85it/s, loss=0.0536]0.05363556612276659
 68%|██████▊   | 540/797 [00:14<00:06, 36.79it/s, loss=0.0531]0.05309192547802703
 68%|██████▊   | 540/797 [00:14<00:07, 36.58it/s, loss=0.0534]0.05342999745757552
 68%|██████▊   | 540/797 [00:14<00:07, 36.52it/s, loss=0.053] 0.053036508229424476
 68%|██████▊   | 545/797 [00:14<00:06, 36.81it/s, loss=0.0532]0.05320366253165647
 68%|██████▊   | 545/797 [00:14<00:06, 36.75it/s, loss=0.0528]0.05276272559727592
 68%|██████▊   | 545/797 [00:14<00:06, 36.70it/s, loss=0.052] 0.05199976352414582
 68%|██████▊   | 545/797 [00:14<00:06, 36.65it/s, loss=0.0513]0.051284153835542176
 68%|██████▊   | 545/797 [00:14<00:06, 36.60it/s, loss=0.0505]0.050482901121345944
 69%|██████▉   | 550/797 [00:14<00:06, 36.88it/s, loss=0.0523]0.0522981659520229
 69%|██████▉   | 550/797 [00:14<00:06, 36.83it/s, loss=0.0518]0.05175332831713829
 69%|██████▉   | 550/797 [00:14<00:06, 36.78it/s, loss=0.0519]0.05188320745092366
 69%|██████▉   | 550/797 [00:14<00:06, 36.73it/s, loss=0.0516]0.05164793063522412
 69%|██████▉   | 550/797 [00:14<00:06, 36.68it/s, loss=0.0516]0.051598643796987524
 70%|██████▉   | 555/797 [00:15<00:06, 36.96it/s, loss=0.0517]0.051734474635261145
 70%|██████▉   | 555/797 [00:15<00:06, 36.91it/s, loss=0.0517]0.05171451402586527
 70%|██████▉   | 555/797 [00:15<00:06, 36.86it/s, loss=0.0509]0.050915604850296205
 70%|██████▉   | 555/797 [00:15<00:06, 36.81it/s, loss=0.0506]0.0506065997488836
 70%|██████▉   | 555/797 [00:15<00:06, 36.76it/s, loss=0.0498]0.04980553918230578
 70%|███████   | 560/797 [00:15<00:06, 36.78it/s, loss=0.0496]0.04955146682789384
 70%|███████   | 560/797 [00:15<00:06, 36.68it/s, loss=0.0487]0.048697400442562024
 70%|███████   | 560/797 [00:15<00:06, 36.63it/s, loss=0.0484]0.0483536416879688
 70%|███████   | 560/797 [00:15<00:06, 36.58it/s, loss=0.0484]0.04839997082456409
 70%|███████   | 560/797 [00:15<00:06, 36.53it/s, loss=0.0479]0.047857791438756625
 71%|███████   | 565/797 [00:15<00:06, 36.81it/s, loss=0.0485]0.048548888834892875
 71%|███████   | 565/797 [00:15<00:06, 36.76it/s, loss=0.0482]0.04816279318979855
 71%|███████   | 565/797 [00:15<00:06, 36.71it/s, loss=0.0496]0.049555126154960535
 71%|███████   | 565/797 [00:15<00:06, 36.66it/s, loss=0.0502]0.05018624839568345
 71%|███████   | 565/797 [00:15<00:06, 36.61it/s, loss=0.05]  0.04997924949734761
 72%|███████▏  | 570/797 [00:15<00:06, 36.88it/s, loss=0.05]0.04997646037285415
 72%|███████▏  | 570/797 [00:15<00:06, 36.83it/s, loss=0.0501]0.050064227121099356
 72%|███████▏  | 570/797 [00:15<00:06, 36.78it/s, loss=0.0507]0.05070614301176842
 72%|███████▏  | 570/797 [00:15<00:06, 36.73it/s, loss=0.05]  0.049963624534367694
 72%|███████▏  | 570/797 [00:15<00:06, 36.68it/s, loss=0.0493]0.049317263678951694
 72%|███████▏  | 575/797 [00:15<00:06, 36.95it/s, loss=0.0494]0.04941548851879204
 72%|███████▏  | 575/797 [00:15<00:06, 36.90it/s, loss=0.0486]0.04862374681944947
 72%|███████▏  | 575/797 [00:15<00:06, 36.85it/s, loss=0.0492]0.04923083946009102
 72%|███████▏  | 575/797 [00:15<00:06, 36.80it/s, loss=0.0504]0.05039175977696279
 72%|███████▏  | 575/797 [00:15<00:06, 36.75it/s, loss=0.0496]0.049638243562384166
 73%|███████▎  | 580/797 [00:15<00:05, 36.97it/s, loss=0.0495]0.04953391071826919
 73%|███████▎  | 580/797 [00:15<00:05, 36.78it/s, loss=0.0493]0.04929666554353737
 73%|███████▎  | 580/797 [00:15<00:05, 36.66it/s, loss=0.0488]0.048796491797067305
 73%|███████▎  | 580/797 [00:15<00:05, 36.61it/s, loss=0.0481]0.048106879460601056
 73%|███████▎  | 580/797 [00:15<00:05, 36.56it/s, loss=0.0478]0.047803006922347714
 73%|███████▎  | 585/797 [00:15<00:05, 36.83it/s, loss=0.0475]0.04750225774696931
 73%|███████▎  | 585/797 [00:15<00:05, 36.78it/s, loss=0.048] 0.048037372794687436
 73%|███████▎  | 585/797 [00:15<00:05, 36.73it/s, loss=0.0479]0.047853223409644166
 73%|███████▎  | 585/797 [00:15<00:05, 36.69it/s, loss=0.0474]0.047373515647512396
 73%|███████▎  | 585/797 [00:15<00:05, 36.64it/s, loss=0.0466]0.04662968024656156
 74%|███████▍  | 590/797 [00:15<00:05, 36.90it/s, loss=0.0467]0.04670650320159455
 74%|███████▍  | 590/797 [00:16<00:05, 36.85it/s, loss=0.0469]0.04693261341576037
 74%|███████▍  | 590/797 [00:16<00:05, 36.80it/s, loss=0.0484]0.048393266763962814
 74%|███████▍  | 590/797 [00:16<00:05, 36.75it/s, loss=0.048] 0.04798053286238533
 74%|███████▍  | 590/797 [00:16<00:05, 36.70it/s, loss=0.0471]0.047125534535476285
 75%|███████▍  | 595/797 [00:16<00:05, 36.96it/s, loss=0.049] 0.04899523148552035
 75%|███████▍  | 595/797 [00:16<00:05, 36.91it/s, loss=0.0482]0.04815488657194104
 75%|███████▍  | 595/797 [00:16<00:05, 36.87it/s, loss=0.0473]0.04730152375689959
 75%|███████▍  | 595/797 [00:16<00:05, 36.82it/s, loss=0.0475]0.047513086991298144
 75%|███████▍  | 595/797 [00:16<00:05, 36.77it/s, loss=0.0507]0.050655414082944654
 75%|███████▌  | 600/797 [00:16<00:05, 36.97it/s, loss=0.0498]0.049763573401513515
 75%|███████▌  | 600/797 [00:16<00:05, 36.83it/s, loss=0.0489]0.048888577001467724
 75%|███████▌  | 600/797 [00:16<00:05, 36.68it/s, loss=0.0489]0.04886483102563114
 75%|███████▌  | 600/797 [00:16<00:05, 36.63it/s, loss=0.0486]0.04858533173147237
 75%|███████▌  | 600/797 [00:16<00:05, 36.58it/s, loss=0.0479]0.04788130892546716
 76%|███████▌  | 605/797 [00:16<00:05, 36.84it/s, loss=0.048] 0.04799079268060905
 76%|███████▌  | 605/797 [00:16<00:05, 36.79it/s, loss=0.0485]0.04847232848851174
 76%|███████▌  | 605/797 [00:16<00:05, 36.75it/s, loss=0.0481]0.048083665319756864
 76%|███████▌  | 605/797 [00:16<00:05, 36.70it/s, loss=0.0472]0.047202801336188815
 76%|███████▌  | 605/797 [00:16<00:05, 36.65it/s, loss=0.0469]0.046929816549211946
 77%|███████▋  | 610/797 [00:16<00:05, 36.91it/s, loss=0.0464]0.046410203440592296
 77%|███████▋  | 610/797 [00:16<00:05, 36.86it/s, loss=0.0463]0.04625875408794437
 77%|███████▋  | 610/797 [00:16<00:05, 36.82it/s, loss=0.0454]0.045440348511463925
 77%|███████▋  | 610/797 [00:16<00:05, 36.77it/s, loss=0.0447]0.044676360410148656
 77%|███████▋  | 610/797 [00:16<00:05, 36.72it/s, loss=0.0445]0.04449081821517288
 77%|███████▋  | 615/797 [00:16<00:04, 36.97it/s, loss=0.0441]0.04411790557094416
 77%|███████▋  | 615/797 [00:16<00:04, 36.92it/s, loss=0.0441]0.04407176647302
 77%|███████▋  | 615/797 [00:16<00:04, 36.88it/s, loss=0.0443]0.044286724677456625
 77%|███████▋  | 615/797 [00:16<00:04, 36.83it/s, loss=0.0445]0.044546411816895906
 77%|███████▋  | 615/797 [00:16<00:04, 36.78it/s, loss=0.0442]0.044233546170599956
 78%|███████▊  | 620/797 [00:16<00:04, 36.75it/s, loss=0.0442]0.04420575718400441
 78%|███████▊  | 620/797 [00:16<00:04, 36.70it/s, loss=0.044] 0.04401172218497798
 78%|███████▊  | 620/797 [00:16<00:04, 36.65it/s, loss=0.0455]0.04547915714384941
 78%|███████▊  | 620/797 [00:16<00:04, 36.61it/s, loss=0.045] 0.045016516389763735
 78%|███████▊  | 620/797 [00:16<00:04, 36.57it/s, loss=0.0449]0.04487046805698043
 78%|███████▊  | 625/797 [00:16<00:04, 36.81it/s, loss=0.0447]0.04465349776795287
 78%|███████▊  | 625/797 [00:16<00:04, 36.77it/s, loss=0.0442]0.04415517007123996
 78%|███████▊  | 625/797 [00:17<00:04, 36.72it/s, loss=0.0437]0.043744008047566195
 78%|███████▊  | 625/797 [00:17<00:04, 36.68it/s, loss=0.0495]0.04952629638103849
 78%|███████▊  | 625/797 [00:17<00:04, 36.64it/s, loss=0.0506]0.050643942372304206
 79%|███████▉  | 630/797 [00:17<00:04, 36.88it/s, loss=0.0502]0.050202153266027526
 79%|███████▉  | 630/797 [00:17<00:04, 36.84it/s, loss=0.0495]0.049474254508121866
 79%|███████▉  | 630/797 [00:17<00:04, 36.79it/s, loss=0.0504]0.05043005058326303
 79%|███████▉  | 630/797 [00:17<00:04, 36.75it/s, loss=0.0506]0.05055524410279701
 79%|███████▉  | 630/797 [00:17<00:04, 36.70it/s, loss=0.0499]0.04985277406123452
 80%|███████▉  | 635/797 [00:17<00:04, 36.95it/s, loss=0.0496]0.04959728402991473
 80%|███████▉  | 635/797 [00:17<00:04, 36.90it/s, loss=0.0494]0.04939455272117871
 80%|███████▉  | 635/797 [00:17<00:04, 36.86it/s, loss=0.0494]0.049352743363441566
 80%|███████▉  | 635/797 [00:17<00:04, 36.81it/s, loss=0.0503]0.05030466108952742
 80%|███████▉  | 635/797 [00:17<00:04, 36.77it/s, loss=0.0524]0.05238261706496849
 80%|████████  | 640/797 [00:17<00:04, 36.79it/s, loss=0.0528]0.052825241486330776
 80%|████████  | 640/797 [00:17<00:04, 36.70it/s, loss=0.0522]0.05224525641048299
 80%|████████  | 640/797 [00:17<00:04, 36.66it/s, loss=0.0521]0.05214214720073427
 80%|████████  | 640/797 [00:17<00:04, 36.62it/s, loss=0.0516]0.051569345055264365
 80%|████████  | 640/797 [00:17<00:04, 36.57it/s, loss=0.052] 0.05201231979973081
 81%|████████  | 645/797 [00:17<00:04, 36.81it/s, loss=0.0519]0.051906093605406385
 81%|████████  | 645/797 [00:17<00:04, 36.77it/s, loss=0.0511]0.05114364399454405
 81%|████████  | 645/797 [00:17<00:04, 36.73it/s, loss=0.0509]0.050904206919067316
 81%|████████  | 645/797 [00:17<00:04, 36.68it/s, loss=0.0503]0.05030701545107082
 81%|████████  | 645/797 [00:17<00:04, 36.64it/s, loss=0.0501]0.05006566797045792
 82%|████████▏ | 650/797 [00:17<00:03, 36.88it/s, loss=0.0513]0.05134142409377964
 82%|████████▏ | 650/797 [00:17<00:03, 36.83it/s, loss=0.0513]0.051337434405457884
 82%|████████▏ | 650/797 [00:17<00:03, 36.79it/s, loss=0.0509]0.05087534811683496
 82%|████████▏ | 650/797 [00:17<00:04, 36.75it/s, loss=0.0506]0.050643533214047035
 82%|████████▏ | 650/797 [00:17<00:04, 36.70it/s, loss=0.0499]0.0498580384086885
 82%|████████▏ | 655/797 [00:17<00:03, 36.94it/s, loss=0.0511]0.051148123658434755
 82%|████████▏ | 655/797 [00:17<00:03, 36.90it/s, loss=0.0503]0.050342118619746766
 82%|████████▏ | 655/797 [00:17<00:03, 36.85it/s, loss=0.0499]0.04989876887104566
 82%|████████▏ | 655/797 [00:17<00:03, 36.81it/s, loss=0.049] 0.049035069112605474
 82%|████████▏ | 655/797 [00:17<00:03, 36.77it/s, loss=0.0493]0.04928425752660792
 83%|████████▎ | 660/797 [00:17<00:03, 36.86it/s, loss=0.0485]0.04852355754361468
 83%|████████▎ | 660/797 [00:17<00:03, 36.81it/s, loss=0.049] 0.04897331676539364
 83%|████████▎ | 660/797 [00:17<00:03, 36.70it/s, loss=0.0489]0.04889983659737613
 83%|████████▎ | 660/797 [00:18<00:03, 36.66it/s, loss=0.0488]0.04883277188894948
 83%|████████▎ | 660/797 [00:18<00:03, 36.61it/s, loss=0.0482]0.04817328893731885
 83%|████████▎ | 665/797 [00:18<00:03, 36.85it/s, loss=0.0474]0.04743510247425957
 83%|████████▎ | 665/797 [00:18<00:03, 36.81it/s, loss=0.0474]0.04738585307167259
 83%|████████▎ | 665/797 [00:18<00:03, 36.76it/s, loss=0.047] 0.04702667118201505
 83%|████████▎ | 665/797 [00:18<00:03, 36.72it/s, loss=0.0464]0.04635227710048133
 83%|████████▎ | 665/797 [00:18<00:03, 36.68it/s, loss=0.0457]0.04570084644854913
 84%|████████▍ | 670/797 [00:18<00:03, 36.91it/s, loss=0.0463]0.04632808962144497
 84%|████████▍ | 670/797 [00:18<00:03, 36.87it/s, loss=0.0466]0.0465863673908749
 84%|████████▍ | 670/797 [00:18<00:03, 36.83it/s, loss=0.0481]0.04808658190006507
 84%|████████▍ | 670/797 [00:18<00:03, 36.78it/s, loss=0.0497]0.049666226455445134
 84%|████████▍ | 670/797 [00:18<00:03, 36.74it/s, loss=0.0494]0.04938482229829554
 85%|████████▍ | 675/797 [00:18<00:03, 36.97it/s, loss=0.0487]0.04865242450272954
 85%|████████▍ | 675/797 [00:18<00:03, 36.93it/s, loss=0.0482]0.04824958398038565
 85%|████████▍ | 675/797 [00:18<00:03, 36.89it/s, loss=0.0489]0.04887390078837362
 85%|████████▍ | 675/797 [00:18<00:03, 36.84it/s, loss=0.0483]0.048326063921874766
 85%|████████▍ | 675/797 [00:18<00:03, 36.80it/s, loss=0.0504]0.0504403105334149
 85%|████████▌ | 680/797 [00:18<00:03, 36.96it/s, loss=0.0498]0.04983480778146487
 85%|████████▌ | 680/797 [00:18<00:03, 36.90it/s, loss=0.0517]0.05167529969390918
 85%|████████▌ | 680/797 [00:18<00:03, 36.72it/s, loss=0.0514]0.05138244995684089
 85%|████████▌ | 680/797 [00:18<00:03, 36.68it/s, loss=0.0513]0.051338534672256866
 85%|████████▌ | 680/797 [00:18<00:03, 36.64it/s, loss=0.0508]0.050783710331911695
 86%|████████▌ | 685/797 [00:18<00:03, 36.86it/s, loss=0.0502]0.05024102029220247
 86%|████████▌ | 685/797 [00:18<00:03, 36.82it/s, loss=0.0501]0.05006472037657879
 86%|████████▌ | 685/797 [00:18<00:03, 36.78it/s, loss=0.0496]0.04960245821188385
 86%|████████▌ | 685/797 [00:18<00:03, 36.74it/s, loss=0.051] 0.050966453436684635
 86%|████████▌ | 685/797 [00:18<00:03, 36.70it/s, loss=0.0525]0.05253299130849804
 87%|████████▋ | 690/797 [00:18<00:02, 36.92it/s, loss=0.0522]0.05217448355301776
 87%|████████▋ | 690/797 [00:18<00:02, 36.88it/s, loss=0.0521]0.0521430427256177
 87%|████████▋ | 690/797 [00:18<00:02, 36.84it/s, loss=0.0517]0.05173777256844596
 87%|████████▋ | 690/797 [00:18<00:02, 36.80it/s, loss=0.0511]0.05105516557079126
 87%|████████▋ | 690/797 [00:18<00:02, 36.76it/s, loss=0.0512]0.05115333996401623
 87%|████████▋ | 695/797 [00:18<00:02, 36.98it/s, loss=0.0507]0.05074994501349889
 87%|████████▋ | 695/797 [00:18<00:02, 36.94it/s, loss=0.0499]0.04988285526858861
 87%|████████▋ | 695/797 [00:18<00:02, 36.90it/s, loss=0.05]  0.05004587886285185
 87%|████████▋ | 695/797 [00:18<00:02, 36.86it/s, loss=0.0499]0.04985400993109017
 87%|████████▋ | 695/797 [00:18<00:02, 36.82it/s, loss=0.049] 0.04902802039501992
 88%|████████▊ | 700/797 [00:18<00:02, 36.87it/s, loss=0.0486]0.04859658147913512
 88%|████████▊ | 700/797 [00:19<00:02, 36.76it/s, loss=0.0544]0.05437143353151805
 88%|████████▊ | 700/797 [00:19<00:02, 36.72it/s, loss=0.0537]0.053658069566998694
 88%|████████▊ | 700/797 [00:19<00:02, 36.68it/s, loss=0.0531]0.05314987061196028
 88%|████████▊ | 700/797 [00:19<00:02, 36.64it/s, loss=0.0543]0.05427093966026628
 88%|████████▊ | 705/797 [00:19<00:02, 36.86it/s, loss=0.0534]0.05343941704884199
 88%|████████▊ | 705/797 [00:19<00:02, 36.82it/s, loss=0.0528]0.052836639943907096
 88%|████████▊ | 705/797 [00:19<00:02, 36.78it/s, loss=0.0519]0.05190336880210846
 88%|████████▊ | 705/797 [00:19<00:02, 36.74it/s, loss=0.051] 0.05101073830849924
 88%|████████▊ | 705/797 [00:19<00:02, 36.70it/s, loss=0.0507]0.05071468925443686
 89%|████████▉ | 710/797 [00:19<00:02, 36.92it/s, loss=0.0517]0.05169189048091883
 89%|████████▉ | 710/797 [00:19<00:02, 36.88it/s, loss=0.0522]0.05221997107995235
 89%|████████▉ | 710/797 [00:19<00:02, 36.84it/s, loss=0.0537]0.053724047758424646
 89%|████████▉ | 710/797 [00:19<00:02, 36.79it/s, loss=0.0533]0.05333664870443429
 89%|████████▉ | 710/797 [00:19<00:02, 36.75it/s, loss=0.0526]0.052599466438233516
 90%|████████▉ | 715/797 [00:19<00:02, 36.97it/s, loss=0.0524]0.052411627237487865
 90%|████████▉ | 715/797 [00:19<00:02, 36.93it/s, loss=0.052] 0.05200996779230071
 90%|████████▉ | 715/797 [00:19<00:02, 36.89it/s, loss=0.0519]0.051941630107363274
 90%|████████▉ | 715/797 [00:19<00:02, 36.85it/s, loss=0.052] 0.051950001497167615
 90%|████████▉ | 715/797 [00:19<00:02, 36.81it/s, loss=0.0513]0.0513077961326794
 90%|█████████ | 720/797 [00:19<00:02, 36.98it/s, loss=0.0512]0.051234160247725154
 90%|█████████ | 720/797 [00:19<00:02, 36.93it/s, loss=0.0506]0.050597542324810285
 90%|█████████ | 720/797 [00:19<00:02, 36.77it/s, loss=0.0512]0.05116990226942109
 90%|█████████ | 720/797 [00:19<00:02, 36.70it/s, loss=0.0502]0.05023696995122921
 90%|█████████ | 720/797 [00:19<00:02, 36.66it/s, loss=0.0504]0.05040019079020483
 91%|█████████ | 725/797 [00:19<00:01, 36.88it/s, loss=0.0496]0.04956996194352627
 91%|█████████ | 725/797 [00:19<00:01, 36.84it/s, loss=0.0495]0.04951047593642544
 91%|█████████ | 725/797 [00:19<00:01, 36.80it/s, loss=0.0499]0.04990182595002452
 91%|█████████ | 725/797 [00:19<00:01, 36.76it/s, loss=0.0492]0.04920355938660551
 91%|█████████ | 725/797 [00:19<00:01, 36.72it/s, loss=0.0484]0.0484257627300021
 92%|█████████▏| 730/797 [00:19<00:01, 36.93it/s, loss=0.0481]0.048144553294213265
 92%|█████████▏| 730/797 [00:19<00:01, 36.89it/s, loss=0.0484]0.04838833555517892
 92%|█████████▏| 730/797 [00:19<00:01, 36.86it/s, loss=0.0483]0.048264139626506476
 92%|█████████▏| 730/797 [00:19<00:01, 36.82it/s, loss=0.0499]0.04993906038152903
 92%|█████████▏| 730/797 [00:19<00:01, 36.78it/s, loss=0.0494]0.04943501423551175
 92%|█████████▏| 735/797 [00:19<00:01, 36.99it/s, loss=0.0489]0.04888609798822269
 92%|█████████▏| 735/797 [00:19<00:01, 36.95it/s, loss=0.0491]0.04913983281214115
 92%|█████████▏| 735/797 [00:19<00:01, 36.91it/s, loss=0.0483]0.04829316468542404
 92%|█████████▏| 735/797 [00:19<00:01, 36.87it/s, loss=0.0483]0.048334138892381945
 92%|█████████▏| 735/797 [00:19<00:01, 36.83it/s, loss=0.0482]0.04820492144864173
 93%|█████████▎| 740/797 [00:20<00:01, 36.90it/s, loss=0.0499]0.04987382803219141
 93%|█████████▎| 740/797 [00:20<00:01, 36.84it/s, loss=0.0489]0.04894274792704228
 93%|█████████▎| 740/797 [00:20<00:01, 36.79it/s, loss=0.0486]0.04862807458254773
 93%|█████████▎| 740/797 [00:20<00:01, 36.73it/s, loss=0.0484]0.048362126969893345
 93%|█████████▎| 740/797 [00:20<00:01, 36.70it/s, loss=0.0482]0.04823174493012405
 93%|█████████▎| 745/797 [00:20<00:01, 36.91it/s, loss=0.0487]0.048728455875431224
 93%|█████████▎| 745/797 [00:20<00:01, 36.87it/s, loss=0.0482]0.04819905999288691
 93%|█████████▎| 745/797 [00:20<00:01, 36.83it/s, loss=0.0478]0.04778295682535792
 93%|█████████▎| 745/797 [00:20<00:01, 36.79it/s, loss=0.0474]0.04738419987132625
 93%|█████████▎| 745/797 [00:20<00:01, 36.76it/s, loss=0.0467]0.046702554917270586
 94%|█████████▍| 750/797 [00:20<00:01, 36.97it/s, loss=0.046] 0.04596008189674072
 94%|█████████▍| 750/797 [00:20<00:01, 36.93it/s, loss=0.0472]0.047168804529285104
 94%|█████████▍| 750/797 [00:20<00:01, 36.89it/s, loss=0.0467]0.04669768471442908
 94%|█████████▍| 750/797 [00:20<00:01, 36.85it/s, loss=0.0479]0.047937347014631156
 94%|█████████▍| 750/797 [00:20<00:01, 36.81it/s, loss=0.0485]0.048477648078071346
 95%|█████████▍| 755/797 [00:20<00:01, 37.02it/s, loss=0.0486]0.04864157463064093
 95%|█████████▍| 755/797 [00:20<00:01, 36.98it/s, loss=0.0487]0.048701789248326854
 95%|█████████▍| 755/797 [00:20<00:01, 36.94it/s, loss=0.0485]0.04850971569592826
 95%|█████████▍| 755/797 [00:20<00:01, 36.90it/s, loss=0.0484]0.04836571355064639
 95%|█████████▍| 755/797 [00:20<00:01, 36.87it/s, loss=0.0492]0.04917286817288303
 95%|█████████▌| 760/797 [00:20<00:01, 36.91it/s, loss=0.0489]0.04885524163790175
 95%|█████████▌| 760/797 [00:20<00:01, 36.81it/s, loss=0.0483]0.04831228612239326
 95%|█████████▌| 760/797 [00:20<00:01, 36.77it/s, loss=0.0488]0.04875505480382748
 95%|█████████▌| 760/797 [00:20<00:01, 36.74it/s, loss=0.0489]0.048936743553195404
 95%|█████████▌| 760/797 [00:20<00:01, 36.70it/s, loss=0.0484]0.04840397470327416
 96%|█████████▌| 765/797 [00:20<00:00, 36.90it/s, loss=0.0479]0.047912311465113204
 96%|█████████▌| 765/797 [00:20<00:00, 36.86it/s, loss=0.0472]0.047186083928848446
 96%|█████████▌| 765/797 [00:20<00:00, 36.82it/s, loss=0.0474]0.047378198230967915
 96%|█████████▌| 765/797 [00:20<00:00, 36.78it/s, loss=0.0471]0.04706536824489036
 96%|█████████▌| 765/797 [00:20<00:00, 36.75it/s, loss=0.0472]0.04717456780386956
 97%|█████████▋| 770/797 [00:20<00:00, 36.95it/s, loss=0.0464]0.04643328925008367
 97%|█████████▋| 770/797 [00:20<00:00, 36.91it/s, loss=0.0459]0.04590276869373489
 97%|█████████▋| 770/797 [00:20<00:00, 36.88it/s, loss=0.0465]0.04649326033598309
 97%|█████████▋| 770/797 [00:20<00:00, 36.84it/s, loss=0.0461]0.046111222847695914
 97%|█████████▋| 770/797 [00:20<00:00, 36.80it/s, loss=0.0458]0.0458228944872674
 97%|█████████▋| 775/797 [00:20<00:00, 37.00it/s, loss=0.045] 0.04502049547404128
 97%|█████████▋| 775/797 [00:20<00:00, 36.96it/s, loss=0.0446]0.044596614949233726
 97%|█████████▋| 775/797 [00:20<00:00, 36.93it/s, loss=0.0445]0.044483559623456494
 97%|█████████▋| 775/797 [00:21<00:00, 36.89it/s, loss=0.044] 0.04402040340602993
 97%|█████████▋| 775/797 [00:21<00:00, 36.85it/s, loss=0.0445]0.04453574009313121
 98%|█████████▊| 780/797 [00:21<00:00, 37.01it/s, loss=0.0455]0.04545232646448011
 98%|█████████▊| 780/797 [00:21<00:00, 36.95it/s, loss=0.0448]0.044807956655800854
 98%|█████████▊| 780/797 [00:21<00:00, 36.88it/s, loss=0.0448]0.04479254495808288
 98%|█████████▊| 780/797 [00:21<00:00, 36.80it/s, loss=0.0448]0.04477929803032019
 98%|█████████▊| 780/797 [00:21<00:00, 36.76it/s, loss=0.0442]0.04420175626993313
 98%|█████████▊| 785/797 [00:21<00:00, 36.96it/s, loss=0.0443]0.0442604879674155
 98%|█████████▊| 785/797 [00:21<00:00, 36.92it/s, loss=0.0441]0.04410931833036296
 98%|█████████▊| 785/797 [00:21<00:00, 36.89it/s, loss=0.0433]0.04331366693309984
 98%|█████████▊| 785/797 [00:21<00:00, 36.85it/s, loss=0.0434]0.04342058039815732
 98%|█████████▊| 785/797 [00:21<00:00, 36.81it/s, loss=0.0436]0.04362849101245704
 99%|█████████▉| 790/797 [00:21<00:00, 37.01it/s, loss=0.0441]0.04408332696535829
 99%|█████████▉| 790/797 [00:21<00:00, 36.97it/s, loss=0.0442]0.044240972180604236
 99%|█████████▉| 790/797 [00:21<00:00, 36.94it/s, loss=0.044] 0.04403996491585933
 99%|█████████▉| 790/797 [00:21<00:00, 36.90it/s, loss=0.0433]0.043334275941036685
 99%|█████████▉| 790/797 [00:21<00:00, 36.87it/s, loss=0.0429]0.04289509619671417
100%|█████████▉| 795/797 [00:21<00:00, 37.06it/s, loss=0.0425]0.04252624945082725
100%|█████████▉| 795/797 [00:21<00:00, 37.03it/s, loss=0.0428]0.04275723303236188
> <ipython-input-288-6ed726a523ca>(10)on_epoch_end()          
-> self.val_losses.append(metrics[0])
(Pdb) metrics[0]
array([0.03617])
(Pdb) metrics
[array([0.03617]), 0.9888888889948527]
(Pdb) q
---------------------------------------------------------------------------
BdbQuit                                   Traceback (most recent call last)
<timed eval> in <module>()

~/Kaukasos/research/fastai/learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
    285         self.sched = None
    286         layer_opt = self.get_layer_opt(lrs, wds)
--> 287         return self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
    288 
    289     def warm_up(self, lr, wds=None):

~/Kaukasos/research/fastai/learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, best_save_name, use_clr, use_clr_beta, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, use_swa, swa_start, swa_eval_freq, **kwargs)
    232             metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, fp16=self.fp16,
    233             swa_model=self.swa_model if use_swa else None, swa_start=swa_start,
--> 234             swa_eval_freq=swa_eval_freq, **kwargs)
    235 
    236     def get_layer_groups(self): return self.models.get_layer_groups()

~/Kaukasos/research/fastai/model.py in fit(model, data, n_epochs, opt, crit, metrics, callbacks, stepper, swa_model, swa_start, swa_eval_freq, **kwargs)
    159             vals = validate(model_stepper, cur_data.val_dl, metrics)
    160             stop=False
--> 161             for cb in callbacks: stop = stop or cb.on_epoch_end(vals)
    162             if swa_model is not None:
    163                 if (epoch + 1) >= swa_start and ((epoch + 1 - swa_start) % swa_eval_freq == 0 or epoch == tot_epochs - 1):

<ipython-input-288-6ed726a523ca> in on_epoch_end(self, metrics)
      8     def on_epoch_end(self, metrics):
      9         pdb.set_trace()
---> 10         self.val_losses.append(metrics[0])
     11     def plot(self):
     12         plt.plot(list(range(len(self.val_losses))), self.val_losses)

<ipython-input-288-6ed726a523ca> in on_epoch_end(self, metrics)
      8     def on_epoch_end(self, metrics):
      9         pdb.set_trace()
---> 10         self.val_losses.append(metrics[0])
     11     def plot(self):
     12         plt.plot(list(range(len(self.val_losses))), self.val_losses)

~/src/anaconda3/envs/fastai/lib/python3.6/bdb.py in trace_dispatch(self, frame, event, arg)
     49             return # None
     50         if event == 'line':
---> 51             return self.dispatch_line(frame)
     52         if event == 'call':
     53             return self.dispatch_call(frame, arg)

~/src/anaconda3/envs/fastai/lib/python3.6/bdb.py in dispatch_line(self, frame)
     68         if self.stop_here(frame) or self.break_here(frame):
     69             self.user_line(frame)
---> 70             if self.quitting: raise BdbQuit
     71         return self.trace_dispatch
     72 

BdbQuit: 

In [292]:
custom_learner.metrics


Out[292]:
[<function fastai.metrics.accuracy(preds, targs)>]

In [260]:
save_val.val_losses


Out[260]:
[array([0.03371]), array([0.03247]), array([0.0286]), array([0.0288])]

In [ ]:


In [207]:
custom_learner.sched.val_losses


Out[207]:
[0.04781805194086498, 0.03861683146821128]

In [ ]:
custom_learner.

In [ ]:
%time custom_learner.fit(lrs, n_cycle=4, cycle_len=1, cycle_mult=1)

In [ ]:


In [ ]:

6. Comparisons & Thoughts