Fine-tune with Pre-trained Models

Many of the exciting deep learning algorithms for computer vision require massive datasets for training. The most popular benchmark dataset, ImageNet, for example, contains one million images across one thousand categories. But for most practical problems, we only have access to comparatively small datasets. In these cases, if we were to train a neural network's weights from scratch, starting from randomly initialized parameters, we would overfit the training set badly.

One approach to get around this problem is to first pretrain a deep net on a large-scale dataset, like ImageNet. Then, given a new dataset, we can start with these pretrained weights when training on our new task. This process is commonly called "fine-tuning". There are a number of variations of fine-tuning. Sometimes, the initial neural network is used only as a feature extractor: we freeze every layer prior to the output layer and simply learn a new output layer. In another document, we explained how to do this kind of feature extraction. Another approach is to update all of the network's weights for the new task, and that's the approach we demonstrate in this document.
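
As a point of reference, below is a minimal sketch of the feature-extractor variant using the Module API. It is an illustration only, not part of the training run in this document: it assumes you already have a truncated symbol net with a freshly added output layer named 'fc1' (built the same way as in get_fine_tune_model below), together with the pre-trained arg_params. Passing fixed_param_names keeps the listed weights frozen, so only the new output layer is learned.

import mxnet as mx

# Sketch only: freeze every pre-trained weight and learn just the new 'fc1' layer.
# `net` and `arg_params` are assumed to come from a pre-trained checkpoint plus a
# newly appended fully-connected output layer, as constructed later in this document.
fixed = [name for name in arg_params if 'fc1' not in name]
frozen_mod = mx.mod.Module(symbol=net, context=mx.gpu(), fixed_param_names=fixed)
# frozen_mod.fit(...) then updates only the 'fc1' weights and bias.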

To fine-tune a network, we must first replace the last fully-connected layer with a new one that outputs the desired number of classes, and initialize its weights randomly. Then we continue training as usual. It is common to use a smaller learning rate, based on the intuition that the pre-trained weights may already be close to a good result.
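
In the Module API, the learning rate is just an entry in optimizer_params. As a hedged illustration (the fit function defined later in this document uses a flat rate of 0.01), a smaller rate could be passed like this:

# Sketch only: a 10x smaller learning rate than the 0.01 used later in this
# document, plus momentum and a little weight decay; pass it to mod.fit(...).
optimizer_params = {'learning_rate': 0.001, 'momentum': 0.9, 'wd': 0.0001}
# mod.fit(train, val, optimizer='sgd', optimizer_params=optimizer_params, ...)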

In this demonstration, we'll fine-tune a model pre-trained on ImageNet on the smaller Caltech-256 dataset. Following this example, you can fine-tune on other datasets, even for strikingly different applications such as face identification.

We will show that, even with a simple hyper-parameter setting, we can match and even outperform state-of-the-art results on Caltech-256.

Network      Accuracy
ResNet-50    77.4%
ResNet-152   86.4%

Prepare data

We follow the standard protocol of sampling 60 images from each class as the training set, with the remaining images forming the validation set. We resize each image so that its shorter edge is 256 pixels and pack the images into RecordIO (.rec) files. The script to prepare the data is as follows.

wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar
tar -xf 256_ObjectCategories.tar

mkdir -p caltech_256_train_60
for i in 256_ObjectCategories/*; do
    c=`basename $i`
    mkdir -p caltech_256_train_60/$c
    for j in `ls $i/*.jpg | shuf | head -n 60`; do
        mv $j caltech_256_train_60/$c/
    done
done

python ~/mxnet/tools/im2rec.py --list True --recursive True caltech-256-60-train caltech_256_train_60/
python ~/mxnet/tools/im2rec.py --list True --recursive True caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-train caltech_256_train_60/

The following code downloads the pre-generated .rec files. It may take a few minutes.


In [1]:
import os, urllib
def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.urlretrieve(url, filename)
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')

Next, we define a function that returns the training and validation data iterators:


In [2]:
import mxnet as mx

def get_iterators(batch_size, data_shape=(3, 224, 224)):
    train = mx.io.ImageRecordIter(
        path_imgrec         = './caltech-256-60-train.rec',
        data_name           = 'data',
        label_name          = 'softmax_label',
        batch_size          = batch_size,
        data_shape          = data_shape,
        shuffle             = True,
        rand_crop           = True,
        rand_mirror         = True)
    val = mx.io.ImageRecordIter(
        path_imgrec         = './caltech-256-60-val.rec',
        data_name           = 'data',
        label_name          = 'softmax_label',
        batch_size          = batch_size,
        data_shape          = data_shape,
        rand_crop           = False,
        rand_mirror         = False)
    return (train, val)

We then download a pre-trained 50-layer ResNet model and load it into memory. Note that if load_checkpoint reports an error, you can remove the downloaded files and run get_model again.


In [3]:
def get_model(prefix, epoch):
    download(prefix+'-symbol.json')
    download(prefix+'-%04d.params' % (epoch,))

get_model('http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)

Train

We first define a function that replaces the last fully-connected layer of a given network.


In [4]:
def get_fine_tune_model(symbol, arg_params, num_classes, layer_name='flatten0'):
    """
    symbol: the pre-trained network symbol
    arg_params: the argument parameters of the pre-trained model
    num_classes: the number of classes for the fine-tune datasets
    layer_name: the layer name before the last fully-connected layer
    """
    all_layers = symbol.get_internals()
    net = all_layers[layer_name+'_output']
    net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes, name='fc1')
    net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
    new_args = dict({k:arg_params[k] for k in arg_params if 'fc1' not in k})
    return (net, new_args)

Now we create a module. We pass the argument parameters of the pre-trained model to fill in all parameters except for the last fully-connected layer; the new fully-connected layer is initialized randomly by the initializer passed to fit.


In [5]:
import logging
head = '%(asctime)-15s %(message)s'
logging.basicConfig(level=logging.DEBUG, format=head)

def fit(symbol, arg_params, aux_params, train, val, batch_size, num_gpus):
    devs = [mx.gpu(i) for i in range(num_gpus)]
    mod = mx.mod.Module(symbol=symbol, context=devs)
    mod.fit(train, val, 
        num_epoch=8,
        arg_params=arg_params,
        aux_params=aux_params,
        allow_missing=True,
        batch_end_callback = mx.callback.Speedometer(batch_size, 10),        
        kvstore='device',
        optimizer='sgd',
        optimizer_params={'learning_rate':0.01},
        initializer=mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2),
        eval_metric='acc')
    metric = mx.metric.Accuracy()
    return mod.score(val, metric)

Then we can start training. We use an AWS EC2 g2.8xlarge instance, which has 8 GPUs.


In [6]:
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
num_classes = 256
batch_per_gpu = 16
num_gpus = 8

(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)

batch_size = batch_per_gpu * num_gpus
(train, val) = get_iterators(batch_size)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.77, "Low validation accuracy."


2016-10-22 18:24:16,695 Already binded, ignoring bind()
2016-10-22 18:24:22,361 Epoch[0] Batch [10]	Speed: 325.98 samples/sec	Train-accuracy=0.004261
2016-10-22 18:24:26,205 Epoch[0] Batch [20]	Speed: 333.06 samples/sec	Train-accuracy=0.011719
2016-10-22 18:24:30,072 Epoch[0] Batch [30]	Speed: 331.06 samples/sec	Train-accuracy=0.021094
2016-10-22 18:24:33,954 Epoch[0] Batch [40]	Speed: 329.84 samples/sec	Train-accuracy=0.020313
2016-10-22 18:24:37,811 Epoch[0] Batch [50]	Speed: 331.93 samples/sec	Train-accuracy=0.023438
2016-10-22 18:24:41,668 Epoch[0] Batch [60]	Speed: 331.93 samples/sec	Train-accuracy=0.032813
2016-10-22 18:24:45,557 Epoch[0] Batch [70]	Speed: 329.22 samples/sec	Train-accuracy=0.049219
2016-10-22 18:24:49,424 Epoch[0] Batch [80]	Speed: 331.12 samples/sec	Train-accuracy=0.071875
2016-10-22 18:24:53,323 Epoch[0] Batch [90]	Speed: 328.36 samples/sec	Train-accuracy=0.084375
2016-10-22 18:24:57,203 Epoch[0] Batch [100]	Speed: 329.95 samples/sec	Train-accuracy=0.115625
2016-10-22 18:25:01,091 Epoch[0] Batch [110]	Speed: 329.33 samples/sec	Train-accuracy=0.153906
2016-10-22 18:25:05,000 Epoch[0] Batch [120]	Speed: 327.49 samples/sec	Train-accuracy=0.187500
2016-10-22 18:25:05,001 Epoch[0] Train-accuracy=nan
2016-10-22 18:25:05,002 Epoch[0] Time cost=48.301
2016-10-22 18:25:24,502 Epoch[0] Validation-accuracy=0.297072
2016-10-22 18:25:28,564 Epoch[1] Batch [10]	Speed: 330.58 samples/sec	Train-accuracy=0.240767
2016-10-22 18:25:32,426 Epoch[1] Batch [20]	Speed: 331.53 samples/sec	Train-accuracy=0.265625
2016-10-22 18:25:36,289 Epoch[1] Batch [30]	Speed: 331.41 samples/sec	Train-accuracy=0.287500
2016-10-22 18:25:40,173 Epoch[1] Batch [40]	Speed: 329.64 samples/sec	Train-accuracy=0.314063
2016-10-22 18:25:44,032 Epoch[1] Batch [50]	Speed: 331.80 samples/sec	Train-accuracy=0.361719
2016-10-22 18:25:47,876 Epoch[1] Batch [60]	Speed: 333.07 samples/sec	Train-accuracy=0.347656
2016-10-22 18:25:51,741 Epoch[1] Batch [70]	Speed: 331.30 samples/sec	Train-accuracy=0.410156
2016-10-22 18:25:55,603 Epoch[1] Batch [80]	Speed: 331.50 samples/sec	Train-accuracy=0.417187
2016-10-22 18:25:59,460 Epoch[1] Batch [90]	Speed: 331.88 samples/sec	Train-accuracy=0.425781
2016-10-22 18:26:03,304 Epoch[1] Batch [100]	Speed: 333.11 samples/sec	Train-accuracy=0.419531
2016-10-22 18:26:07,196 Epoch[1] Batch [110]	Speed: 328.97 samples/sec	Train-accuracy=0.496875
2016-10-22 18:26:10,665 Epoch[1] Train-accuracy=0.488715
2016-10-22 18:26:10,666 Epoch[1] Time cost=46.163
2016-10-22 18:26:29,719 Epoch[1] Validation-accuracy=0.556066
2016-10-22 18:26:33,883 Epoch[2] Batch [10]	Speed: 325.12 samples/sec	Train-accuracy=0.514915
2016-10-22 18:26:37,757 Epoch[2] Batch [20]	Speed: 330.50 samples/sec	Train-accuracy=0.524219
2016-10-22 18:26:41,684 Epoch[2] Batch [30]	Speed: 325.98 samples/sec	Train-accuracy=0.536719
2016-10-22 18:26:45,562 Epoch[2] Batch [40]	Speed: 330.21 samples/sec	Train-accuracy=0.514844
2016-10-22 18:26:49,448 Epoch[2] Batch [50]	Speed: 329.44 samples/sec	Train-accuracy=0.564844
2016-10-22 18:26:53,338 Epoch[2] Batch [60]	Speed: 329.16 samples/sec	Train-accuracy=0.534375
2016-10-22 18:26:57,230 Epoch[2] Batch [70]	Speed: 328.99 samples/sec	Train-accuracy=0.576562
2016-10-22 18:27:01,128 Epoch[2] Batch [80]	Speed: 328.42 samples/sec	Train-accuracy=0.604688
2016-10-22 18:27:04,990 Epoch[2] Batch [90]	Speed: 331.54 samples/sec	Train-accuracy=0.582812
2016-10-22 18:27:08,874 Epoch[2] Batch [100]	Speed: 329.63 samples/sec	Train-accuracy=0.572656
2016-10-22 18:27:12,737 Epoch[2] Batch [110]	Speed: 331.45 samples/sec	Train-accuracy=0.625781
2016-10-22 18:27:16,591 Epoch[2] Batch [120]	Speed: 332.20 samples/sec	Train-accuracy=0.603125
2016-10-22 18:27:16,597 Epoch[2] Train-accuracy=nan
2016-10-22 18:27:16,598 Epoch[2] Time cost=46.878
2016-10-22 18:27:34,905 Epoch[2] Validation-accuracy=0.651947
2016-10-22 18:27:38,961 Epoch[3] Batch [10]	Speed: 330.53 samples/sec	Train-accuracy=0.636364
2016-10-22 18:27:42,811 Epoch[3] Batch [20]	Speed: 332.56 samples/sec	Train-accuracy=0.634375
2016-10-22 18:27:46,675 Epoch[3] Batch [30]	Speed: 331.38 samples/sec	Train-accuracy=0.629687
2016-10-22 18:27:50,545 Epoch[3] Batch [40]	Speed: 330.79 samples/sec	Train-accuracy=0.641406
2016-10-22 18:27:54,423 Epoch[3] Batch [50]	Speed: 330.16 samples/sec	Train-accuracy=0.665625
2016-10-22 18:27:58,273 Epoch[3] Batch [60]	Speed: 332.54 samples/sec	Train-accuracy=0.638281
2016-10-22 18:28:02,131 Epoch[3] Batch [70]	Speed: 331.93 samples/sec	Train-accuracy=0.671875
2016-10-22 18:28:05,988 Epoch[3] Batch [80]	Speed: 331.88 samples/sec	Train-accuracy=0.691406
2016-10-22 18:28:09,870 Epoch[3] Batch [90]	Speed: 329.84 samples/sec	Train-accuracy=0.670312
2016-10-22 18:28:13,742 Epoch[3] Batch [100]	Speed: 330.65 samples/sec	Train-accuracy=0.660156
2016-10-22 18:28:17,636 Epoch[3] Batch [110]	Speed: 328.77 samples/sec	Train-accuracy=0.681250
2016-10-22 18:28:21,097 Epoch[3] Train-accuracy=0.684028
2016-10-22 18:28:21,098 Epoch[3] Time cost=46.192
2016-10-22 18:28:40,464 Epoch[3] Validation-accuracy=0.701943
2016-10-22 18:28:44,610 Epoch[4] Batch [10]	Speed: 327.03 samples/sec	Train-accuracy=0.708807
2016-10-22 18:28:48,480 Epoch[4] Batch [20]	Speed: 330.86 samples/sec	Train-accuracy=0.708594
2016-10-22 18:28:52,371 Epoch[4] Batch [30]	Speed: 329.02 samples/sec	Train-accuracy=0.713281
2016-10-22 18:28:56,234 Epoch[4] Batch [40]	Speed: 331.46 samples/sec	Train-accuracy=0.700781
2016-10-22 18:29:00,129 Epoch[4] Batch [50]	Speed: 328.65 samples/sec	Train-accuracy=0.712500
2016-10-22 18:29:04,006 Epoch[4] Batch [60]	Speed: 330.30 samples/sec	Train-accuracy=0.697656
2016-10-22 18:29:07,865 Epoch[4] Batch [70]	Speed: 331.74 samples/sec	Train-accuracy=0.717969
2016-10-22 18:29:11,737 Epoch[4] Batch [80]	Speed: 330.61 samples/sec	Train-accuracy=0.737500
2016-10-22 18:29:15,592 Epoch[4] Batch [90]	Speed: 332.19 samples/sec	Train-accuracy=0.714844
2016-10-22 18:29:19,435 Epoch[4] Batch [100]	Speed: 333.15 samples/sec	Train-accuracy=0.696875
2016-10-22 18:29:23,287 Epoch[4] Batch [110]	Speed: 332.35 samples/sec	Train-accuracy=0.734375
2016-10-22 18:29:27,136 Epoch[4] Batch [120]	Speed: 332.61 samples/sec	Train-accuracy=0.726562
2016-10-22 18:29:27,137 Epoch[4] Train-accuracy=nan
2016-10-22 18:29:27,138 Epoch[4] Time cost=46.673
2016-10-22 18:29:45,791 Epoch[4] Validation-accuracy=0.736935
2016-10-22 18:29:49,873 Epoch[5] Batch [10]	Speed: 332.48 samples/sec	Train-accuracy=0.749290
2016-10-22 18:29:53,765 Epoch[5] Batch [20]	Speed: 328.95 samples/sec	Train-accuracy=0.732031
2016-10-22 18:29:57,648 Epoch[5] Batch [30]	Speed: 329.67 samples/sec	Train-accuracy=0.736719
2016-10-22 18:30:01,540 Epoch[5] Batch [40]	Speed: 329.42 samples/sec	Train-accuracy=0.722656
2016-10-22 18:30:05,433 Epoch[5] Batch [50]	Speed: 328.82 samples/sec	Train-accuracy=0.751563
2016-10-22 18:30:09,309 Epoch[5] Batch [60]	Speed: 330.37 samples/sec	Train-accuracy=0.736719
2016-10-22 18:30:13,198 Epoch[5] Batch [70]	Speed: 329.27 samples/sec	Train-accuracy=0.771875
2016-10-22 18:30:17,084 Epoch[5] Batch [80]	Speed: 329.47 samples/sec	Train-accuracy=0.762500
2016-10-22 18:30:20,958 Epoch[5] Batch [90]	Speed: 330.43 samples/sec	Train-accuracy=0.742969
2016-10-22 18:30:24,858 Epoch[5] Batch [100]	Speed: 328.32 samples/sec	Train-accuracy=0.770312
2016-10-22 18:30:28,734 Epoch[5] Batch [110]	Speed: 330.27 samples/sec	Train-accuracy=0.781250
2016-10-22 18:30:32,217 Epoch[5] Train-accuracy=0.757812
2016-10-22 18:30:32,218 Epoch[5] Time cost=46.426
2016-10-22 18:30:51,745 Epoch[5] Validation-accuracy=0.752450
2016-10-22 18:30:55,887 Epoch[6] Batch [10]	Speed: 326.48 samples/sec	Train-accuracy=0.754261
2016-10-22 18:30:59,754 Epoch[6] Batch [20]	Speed: 331.16 samples/sec	Train-accuracy=0.768750
2016-10-22 18:31:03,612 Epoch[6] Batch [30]	Speed: 331.83 samples/sec	Train-accuracy=0.774219
2016-10-22 18:31:07,472 Epoch[6] Batch [40]	Speed: 331.66 samples/sec	Train-accuracy=0.751563
2016-10-22 18:31:11,326 Epoch[6] Batch [50]	Speed: 332.21 samples/sec	Train-accuracy=0.777344
2016-10-22 18:31:15,194 Epoch[6] Batch [60]	Speed: 331.01 samples/sec	Train-accuracy=0.762500
2016-10-22 18:31:19,062 Epoch[6] Batch [70]	Speed: 331.03 samples/sec	Train-accuracy=0.801562
2016-10-22 18:31:22,938 Epoch[6] Batch [80]	Speed: 330.32 samples/sec	Train-accuracy=0.788281
2016-10-22 18:31:26,802 Epoch[6] Batch [90]	Speed: 331.37 samples/sec	Train-accuracy=0.773438
2016-10-22 18:31:30,656 Epoch[6] Batch [100]	Speed: 332.24 samples/sec	Train-accuracy=0.777344
2016-10-22 18:31:34,555 Epoch[6] Batch [110]	Speed: 328.36 samples/sec	Train-accuracy=0.791406
2016-10-22 18:31:38,412 Epoch[6] Batch [120]	Speed: 331.89 samples/sec	Train-accuracy=0.791406
2016-10-22 18:31:38,413 Epoch[6] Train-accuracy=nan
2016-10-22 18:31:38,414 Epoch[6] Time cost=46.668
2016-10-22 18:31:57,459 Epoch[6] Validation-accuracy=0.768382
2016-10-22 18:32:01,634 Epoch[7] Batch [10]	Speed: 324.04 samples/sec	Train-accuracy=0.789773
2016-10-22 18:32:05,542 Epoch[7] Batch [20]	Speed: 327.57 samples/sec	Train-accuracy=0.794531
2016-10-22 18:32:09,411 Epoch[7] Batch [30]	Speed: 330.90 samples/sec	Train-accuracy=0.788281
2016-10-22 18:32:13,311 Epoch[7] Batch [40]	Speed: 328.36 samples/sec	Train-accuracy=0.778906
2016-10-22 18:32:17,190 Epoch[7] Batch [50]	Speed: 330.00 samples/sec	Train-accuracy=0.803125
2016-10-22 18:32:21,075 Epoch[7] Batch [60]	Speed: 329.54 samples/sec	Train-accuracy=0.780469
2016-10-22 18:32:24,934 Epoch[7] Batch [70]	Speed: 331.78 samples/sec	Train-accuracy=0.779687
2016-10-22 18:32:28,803 Epoch[7] Batch [80]	Speed: 330.92 samples/sec	Train-accuracy=0.821875
2016-10-22 18:32:32,662 Epoch[7] Batch [90]	Speed: 331.79 samples/sec	Train-accuracy=0.783594
2016-10-22 18:32:36,515 Epoch[7] Batch [100]	Speed: 332.32 samples/sec	Train-accuracy=0.802344
2016-10-22 18:32:40,393 Epoch[7] Batch [110]	Speed: 330.16 samples/sec	Train-accuracy=0.800000
2016-10-22 18:32:43,832 Epoch[7] Train-accuracy=0.782118
2016-10-22 18:32:43,833 Epoch[7] Time cost=46.373
2016-10-22 18:33:01,994 Epoch[7] Validation-accuracy=0.774422

As you can see, after only 8 epochs we reach about 78% validation accuracy. This matches state-of-the-art results from models trained on Caltech-256 alone, such as VGG.

Next, we fine-tune another pre-trained model. This model was trained on the full ImageNet dataset, which covers roughly 10x more classes than the common 1,000-class version, and it uses a 3x deeper ResNet architecture (152 layers instead of 50).


In [8]:
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
get_model('http://data.mxnet.io/models/imagenet-11k/resnet-152/resnet-152', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.86, "Low validation accuracy."


2016-10-22 18:35:42,274 Already binded, ignoring bind()
2016-10-22 18:35:55,659 Epoch[0] Batch [10]	Speed: 139.63 samples/sec	Train-accuracy=0.070312
2016-10-22 18:36:04,814 Epoch[0] Batch [20]	Speed: 139.83 samples/sec	Train-accuracy=0.349219
2016-10-22 18:36:13,991 Epoch[0] Batch [30]	Speed: 139.49 samples/sec	Train-accuracy=0.585156
2016-10-22 18:36:23,163 Epoch[0] Batch [40]	Speed: 139.57 samples/sec	Train-accuracy=0.642188
2016-10-22 18:36:32,309 Epoch[0] Batch [50]	Speed: 139.97 samples/sec	Train-accuracy=0.728906
2016-10-22 18:36:41,426 Epoch[0] Batch [60]	Speed: 140.41 samples/sec	Train-accuracy=0.760156
2016-10-22 18:36:50,531 Epoch[0] Batch [70]	Speed: 140.60 samples/sec	Train-accuracy=0.778906
2016-10-22 18:36:59,631 Epoch[0] Batch [80]	Speed: 140.68 samples/sec	Train-accuracy=0.786719
2016-10-22 18:37:08,742 Epoch[0] Batch [90]	Speed: 140.51 samples/sec	Train-accuracy=0.797656
2016-10-22 18:37:17,857 Epoch[0] Batch [100]	Speed: 140.45 samples/sec	Train-accuracy=0.823438
2016-10-22 18:37:26,969 Epoch[0] Batch [110]	Speed: 140.50 samples/sec	Train-accuracy=0.827344
2016-10-22 18:37:36,094 Epoch[0] Batch [120]	Speed: 140.29 samples/sec	Train-accuracy=0.829688
2016-10-22 18:37:36,095 Epoch[0] Train-accuracy=nan
2016-10-22 18:37:36,096 Epoch[0] Time cost=113.804
2016-10-22 18:38:08,728 Epoch[0] Validation-accuracy=0.829780
2016-10-22 18:38:18,228 Epoch[1] Batch [10]	Speed: 139.92 samples/sec	Train-accuracy=0.862926
2016-10-22 18:38:27,365 Epoch[1] Batch [20]	Speed: 140.10 samples/sec	Train-accuracy=0.867969
2016-10-22 18:38:36,476 Epoch[1] Batch [30]	Speed: 140.52 samples/sec	Train-accuracy=0.884375
2016-10-22 18:38:45,581 Epoch[1] Batch [40]	Speed: 140.60 samples/sec	Train-accuracy=0.856250
2016-10-22 18:38:54,671 Epoch[1] Batch [50]	Speed: 140.84 samples/sec	Train-accuracy=0.888281
2016-10-22 18:39:03,774 Epoch[1] Batch [60]	Speed: 140.62 samples/sec	Train-accuracy=0.891406
2016-10-22 18:39:12,893 Epoch[1] Batch [70]	Speed: 140.38 samples/sec	Train-accuracy=0.893750
2016-10-22 18:39:22,016 Epoch[1] Batch [80]	Speed: 140.33 samples/sec	Train-accuracy=0.911719
2016-10-22 18:39:31,173 Epoch[1] Batch [90]	Speed: 139.79 samples/sec	Train-accuracy=0.893750
2016-10-22 18:39:40,341 Epoch[1] Batch [100]	Speed: 139.65 samples/sec	Train-accuracy=0.885938
2016-10-22 18:39:49,522 Epoch[1] Batch [110]	Speed: 139.45 samples/sec	Train-accuracy=0.901563
2016-10-22 18:39:57,750 Epoch[1] Train-accuracy=0.907986
2016-10-22 18:39:57,751 Epoch[1] Time cost=109.022
2016-10-22 18:40:30,649 Epoch[1] Validation-accuracy=0.848608
2016-10-22 18:40:40,134 Epoch[2] Batch [10]	Speed: 140.33 samples/sec	Train-accuracy=0.921875
2016-10-22 18:40:49,247 Epoch[2] Batch [20]	Speed: 140.47 samples/sec	Train-accuracy=0.911719
2016-10-22 18:40:58,367 Epoch[2] Batch [30]	Speed: 140.37 samples/sec	Train-accuracy=0.914844
2016-10-22 18:41:07,515 Epoch[2] Batch [40]	Speed: 139.93 samples/sec	Train-accuracy=0.913281
2016-10-22 18:41:16,659 Epoch[2] Batch [50]	Speed: 140.01 samples/sec	Train-accuracy=0.929688
2016-10-22 18:41:25,826 Epoch[2] Batch [60]	Speed: 139.64 samples/sec	Train-accuracy=0.940625
2016-10-22 18:41:35,015 Epoch[2] Batch [70]	Speed: 139.31 samples/sec	Train-accuracy=0.927344
2016-10-22 18:41:44,178 Epoch[2] Batch [80]	Speed: 139.72 samples/sec	Train-accuracy=0.940625
2016-10-22 18:41:53,316 Epoch[2] Batch [90]	Speed: 140.09 samples/sec	Train-accuracy=0.928125
2016-10-22 18:42:02,413 Epoch[2] Batch [100]	Speed: 140.72 samples/sec	Train-accuracy=0.948438
2016-10-22 18:42:11,522 Epoch[2] Batch [110]	Speed: 140.53 samples/sec	Train-accuracy=0.925781
2016-10-22 18:42:20,624 Epoch[2] Batch [120]	Speed: 140.66 samples/sec	Train-accuracy=0.928906
2016-10-22 18:42:20,625 Epoch[2] Train-accuracy=nan
2016-10-22 18:42:20,626 Epoch[2] Time cost=109.976
2016-10-22 18:42:53,414 Epoch[2] Validation-accuracy=0.853269
2016-10-22 18:43:02,925 Epoch[3] Batch [10]	Speed: 139.86 samples/sec	Train-accuracy=0.941051
2016-10-22 18:43:12,095 Epoch[3] Batch [20]	Speed: 139.60 samples/sec	Train-accuracy=0.935156
2016-10-22 18:43:21,270 Epoch[3] Batch [30]	Speed: 139.52 samples/sec	Train-accuracy=0.939844
2016-10-22 18:43:30,434 Epoch[3] Batch [40]	Speed: 139.70 samples/sec	Train-accuracy=0.945312
2016-10-22 18:43:39,557 Epoch[3] Batch [50]	Speed: 140.31 samples/sec	Train-accuracy=0.946094
2016-10-22 18:43:48,680 Epoch[3] Batch [60]	Speed: 140.33 samples/sec	Train-accuracy=0.937500
2016-10-22 18:43:57,775 Epoch[3] Batch [70]	Speed: 140.75 samples/sec	Train-accuracy=0.951562
2016-10-22 18:44:06,899 Epoch[3] Batch [80]	Speed: 140.31 samples/sec	Train-accuracy=0.956250
2016-10-22 18:44:16,000 Epoch[3] Batch [90]	Speed: 140.67 samples/sec	Train-accuracy=0.942969
2016-10-22 18:44:25,110 Epoch[3] Batch [100]	Speed: 140.52 samples/sec	Train-accuracy=0.958594
2016-10-22 18:44:34,225 Epoch[3] Batch [110]	Speed: 140.46 samples/sec	Train-accuracy=0.946875
2016-10-22 18:44:42,448 Epoch[3] Train-accuracy=0.952257
2016-10-22 18:44:42,450 Epoch[3] Time cost=109.035
2016-10-22 18:45:15,423 Epoch[3] Validation-accuracy=0.857587
2016-10-22 18:45:24,921 Epoch[4] Batch [10]	Speed: 139.90 samples/sec	Train-accuracy=0.965199
2016-10-22 18:45:34,041 Epoch[4] Batch [20]	Speed: 140.37 samples/sec	Train-accuracy=0.964844
2016-10-22 18:45:43,172 Epoch[4] Batch [30]	Speed: 140.20 samples/sec	Train-accuracy=0.968750
2016-10-22 18:45:52,287 Epoch[4] Batch [40]	Speed: 140.45 samples/sec	Train-accuracy=0.955469
2016-10-22 18:46:01,418 Epoch[4] Batch [50]	Speed: 140.20 samples/sec	Train-accuracy=0.971094
2016-10-22 18:46:10,534 Epoch[4] Batch [60]	Speed: 140.43 samples/sec	Train-accuracy=0.954688
2016-10-22 18:46:19,664 Epoch[4] Batch [70]	Speed: 140.21 samples/sec	Train-accuracy=0.964063
2016-10-22 18:46:28,811 Epoch[4] Batch [80]	Speed: 139.96 samples/sec	Train-accuracy=0.969531
2016-10-22 18:46:37,986 Epoch[4] Batch [90]	Speed: 139.53 samples/sec	Train-accuracy=0.961719
2016-10-22 18:46:47,150 Epoch[4] Batch [100]	Speed: 139.70 samples/sec	Train-accuracy=0.966406
2016-10-22 18:46:56,307 Epoch[4] Batch [110]	Speed: 139.79 samples/sec	Train-accuracy=0.966406
2016-10-22 18:47:05,456 Epoch[4] Batch [120]	Speed: 139.94 samples/sec	Train-accuracy=0.966406
2016-10-22 18:47:05,457 Epoch[4] Train-accuracy=nan
2016-10-22 18:47:05,457 Epoch[4] Time cost=110.033
2016-10-22 18:47:38,303 Epoch[4] Validation-accuracy=0.862329
2016-10-22 18:47:47,779 Epoch[5] Batch [10]	Speed: 140.25 samples/sec	Train-accuracy=0.971591
2016-10-22 18:47:56,897 Epoch[5] Batch [20]	Speed: 140.40 samples/sec	Train-accuracy=0.970313
2016-10-22 18:48:06,006 Epoch[5] Batch [30]	Speed: 140.53 samples/sec	Train-accuracy=0.976562
2016-10-22 18:48:15,150 Epoch[5] Batch [40]	Speed: 140.01 samples/sec	Train-accuracy=0.967187
2016-10-22 18:48:24,320 Epoch[5] Batch [50]	Speed: 139.60 samples/sec	Train-accuracy=0.975781
2016-10-22 18:48:33,515 Epoch[5] Batch [60]	Speed: 139.22 samples/sec	Train-accuracy=0.971094
2016-10-22 18:48:42,707 Epoch[5] Batch [70]	Speed: 139.26 samples/sec	Train-accuracy=0.971875
2016-10-22 18:48:51,857 Epoch[5] Batch [80]	Speed: 139.92 samples/sec	Train-accuracy=0.988281
2016-10-22 18:49:00,980 Epoch[5] Batch [90]	Speed: 140.32 samples/sec	Train-accuracy=0.969531
2016-10-22 18:49:10,092 Epoch[5] Batch [100]	Speed: 140.49 samples/sec	Train-accuracy=0.984375
2016-10-22 18:49:19,205 Epoch[5] Batch [110]	Speed: 140.49 samples/sec	Train-accuracy=0.978125
2016-10-22 18:49:27,399 Epoch[5] Train-accuracy=0.968750
2016-10-22 18:49:27,400 Epoch[5] Time cost=109.095
2016-10-22 18:50:00,339 Epoch[5] Validation-accuracy=0.864102
2016-10-22 18:50:09,861 Epoch[6] Batch [10]	Speed: 139.72 samples/sec	Train-accuracy=0.978693
2016-10-22 18:50:19,028 Epoch[6] Batch [20]	Speed: 139.65 samples/sec	Train-accuracy=0.976562
2016-10-22 18:50:28,206 Epoch[6] Batch [30]	Speed: 139.48 samples/sec	Train-accuracy=0.975000
2016-10-22 18:50:37,343 Epoch[6] Batch [40]	Speed: 140.11 samples/sec	Train-accuracy=0.976562
2016-10-22 18:50:46,475 Epoch[6] Batch [50]	Speed: 140.18 samples/sec	Train-accuracy=0.971094
2016-10-22 18:50:55,613 Epoch[6] Batch [60]	Speed: 140.10 samples/sec	Train-accuracy=0.976562
2016-10-22 18:51:04,717 Epoch[6] Batch [70]	Speed: 140.60 samples/sec	Train-accuracy=0.978906
2016-10-22 18:51:13,821 Epoch[6] Batch [80]	Speed: 140.63 samples/sec	Train-accuracy=0.977344
2016-10-22 18:51:22,932 Epoch[6] Batch [90]	Speed: 140.50 samples/sec	Train-accuracy=0.971875
2016-10-22 18:51:32,039 Epoch[6] Batch [100]	Speed: 140.56 samples/sec	Train-accuracy=0.980469
2016-10-22 18:51:41,172 Epoch[6] Batch [110]	Speed: 140.17 samples/sec	Train-accuracy=0.978906
2016-10-22 18:51:50,312 Epoch[6] Batch [120]	Speed: 140.06 samples/sec	Train-accuracy=0.978906
2016-10-22 18:51:50,314 Epoch[6] Train-accuracy=nan
2016-10-22 18:51:50,314 Epoch[6] Time cost=109.974
2016-10-22 18:52:23,287 Epoch[6] Validation-accuracy=0.864738
2016-10-22 18:52:32,798 Epoch[7] Batch [10]	Speed: 139.84 samples/sec	Train-accuracy=0.982244
2016-10-22 18:52:41,881 Epoch[7] Batch [20]	Speed: 140.94 samples/sec	Train-accuracy=0.980469
2016-10-22 18:52:50,982 Epoch[7] Batch [30]	Speed: 140.67 samples/sec	Train-accuracy=0.978906
2016-10-22 18:53:00,086 Epoch[7] Batch [40]	Speed: 140.61 samples/sec	Train-accuracy=0.980469
2016-10-22 18:53:09,208 Epoch[7] Batch [50]	Speed: 140.35 samples/sec	Train-accuracy=0.975000
2016-10-22 18:53:18,342 Epoch[7] Batch [60]	Speed: 140.15 samples/sec	Train-accuracy=0.970313
2016-10-22 18:53:27,490 Epoch[7] Batch [70]	Speed: 139.94 samples/sec	Train-accuracy=0.978125
2016-10-22 18:53:36,623 Epoch[7] Batch [80]	Speed: 140.15 samples/sec	Train-accuracy=0.989844
2016-10-22 18:53:45,795 Epoch[7] Batch [90]	Speed: 139.58 samples/sec	Train-accuracy=0.976562
2016-10-22 18:53:54,958 Epoch[7] Batch [100]	Speed: 139.70 samples/sec	Train-accuracy=0.981250
2016-10-22 18:54:04,143 Epoch[7] Batch [110]	Speed: 139.39 samples/sec	Train-accuracy=0.974219
2016-10-22 18:54:12,364 Epoch[7] Train-accuracy=0.976562
2016-10-22 18:54:12,365 Epoch[7] Time cost=109.077
2016-10-22 18:54:45,259 Epoch[7] Validation-accuracy=0.863905

As can be seen, even after a single epoch it reaches 83% validation accuracy. After 8 epochs, the validation accuracy increases to 86.4%.
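
If you want to keep the fine-tuned weights for later use, one option (shown only as a sketch; it is not part of the runs above) is to add a checkpoint callback to the mod.fit call inside fit, which saves the symbol and parameters after every epoch:

# Sketch only: writes resnet-caltech256-symbol.json plus per-epoch parameter
# files such as resnet-caltech256-0008.params; the prefix is arbitrary.
checkpoint = mx.callback.do_checkpoint('resnet-caltech256')
# mod.fit(train, val, ..., epoch_end_callback=checkpoint)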