Let's try using a neural network as a classification estimator. We will use the awesome Caffe framework.
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_validation import train_test_split
from sklearn import datasets
digits = datasets.load_digits()
# split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split( digits.data,
                                                      digits.target,
                                                      test_size=0.33 )
# courtesy of the caffe devs - useful for visualisation of tiled images
def vis_square(data, padsize=1, padval=0):
    # work on a copy so the caller's array isn't normalised in place
    data = data.copy()
    data -= data.min()
    data /= data.max()
    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    plt.imshow(data, cmap=plt.get_cmap('gray'))
vis_square( X_train[ 0:100 ].reshape(100,8,8) )
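Before converting anything to Caffe's format, it's worth double-checking the shape and value range of the sklearn digits data, since the conversion below casts the pixels to uint8 and the network later rescales them by 1/16. A quick sanity check (plain numpy, nothing Caffe-specific):
print( digits.data.shape )                      # (1797, 64) - each row is a flattened 8x8 image
print( digits.data.min(), digits.data.max() )   # pixel intensities are integers in 0..16
print( np.unique( digits.target ) )             # labels are the digits 0..9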
We're going to use one convolutional layer. To avoid config-fu with Caffe, the data will go into an LMDB store. But first, let's clean up the environment to avoid costly debugging.
In [2]:
!rm -rf digits_train_lmdb digits_test_lmdb
Caffe uses a fairly simple key/value format. The code below converts the sklearn dataset into LMDB.
In [3]:
import lmdb
import sys
sys.path.append( './caffe/python' )
import caffe
env = lmdb.open( 'digits_train_lmdb', map_size=1000000000L )
k = 0
for i in range( X_train.shape[0] ):
    # pixels are integers in 0..16, so they fit into a single uint8 channel
    x_in = X_train[ i ].reshape( (8,8) ).astype( np.uint8 )
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = 1
    datum.height = 8
    datum.width = 8
    datum.data = x_in.tostring()
    datum.label = int( y_train[i] )
    str_id = '{:08}'.format( k )
    k += 1
    with env.begin( write=True ) as txn:
        txn.put( str_id.encode('ascii'), datum.SerializeToString() )
print( env.stat() )
env.close()

env = lmdb.open( 'digits_test_lmdb', map_size=1000000000L )
k = 0
for i in range( X_test.shape[0] ):
    x_in = X_test[ i ].reshape( (8,8) ).astype( np.uint8 )
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = 1
    datum.height = 8
    datum.width = 8
    datum.data = x_in.tostring()
    datum.label = int( y_test[i] )
    str_id = '{:08}'.format( k )
    k += 1
    with env.begin( write=True ) as txn:
        txn.put( str_id.encode('ascii'), datum.SerializeToString() )
print( env.stat() )
env.close()
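As an optional sanity check, one record can be read back from the LMDB and decoded; if the round trip worked, it should reproduce the first training digit and its label. A minimal sketch, assuming caffe.io.datum_to_array is available in your pycaffe build (it reverses the Datum packing used above):
env = lmdb.open( 'digits_train_lmdb', readonly=True )
with env.begin() as txn:
    raw = txn.get( '{:08}'.format( 0 ).encode('ascii') )
env.close()
datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString( raw )
img = caffe.io.datum_to_array( datum )   # shape (1, 8, 8), dtype uint8
print( img.shape, datum.label )
plt.imshow( img[0], cmap=plt.get_cmap('gray') )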
Here we define the solver parameters and the network structure. The original MNIST reference net included with Caffe has two convolutional layers; given the small size of the input image (8x8), we're only going to use one convolutional layer.
In [4]:
solver_txt = """
# The train/test net protocol buffer definition
net: "digits_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# the test batch is 100 images (see batch_size in the data layer below)
test_iter: 100
# Carry out testing every 100 training iterations.
test_interval: 100
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.95
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 1000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "digits"
# solver mode: CPU or GPU
solver_mode: GPU
"""
open( 'digits_solver.prototxt', 'wt' ).write( solver_txt )
digits_train_test = """
name: "digits"
layer {
  name: "digits_data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    # scale pixel values from 0..16 down to 0..1
    scale: 0.0625
  }
  data_param {
    source: "digits_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "digits_data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.0625
  }
  data_param {
    source: "digits_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
"""
open( 'digits_train_test.prototxt', 'wt' ).write( digits_train_test )
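Before training, the topology can be inspected from Python by instantiating the net in TEST phase and printing the blob shapes: with 8x8 inputs, the 3x3 convolution gives 6x6 maps and the 2x2 pooling reduces them to 3x3, so ip1 sees 50 x 3 x 3 = 450 values per image. A rough sketch, assuming a pycaffe version whose Net constructor takes the phase argument (the data layers open the LMDB stores created above on construction):
net = caffe.Net( 'digits_train_test.prototxt', caffe.TEST )
for name, blob in net.blobs.items():
    print( name, blob.data.shape )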
Let's start the training using the GPU.
In [5]:
!./caffe/build/tools/caffe train -solver digits_solver.prototxt 2> digits_train.log
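As an alternative to the command-line tool, the same training can be driven from pycaffe, which makes it easy to inspect the learned weights afterwards. A minimal sketch using the solver interface:
caffe.set_mode_gpu()                      # or caffe.set_mode_cpu() if no GPU is available
solver = caffe.SGDSolver( 'digits_solver.prototxt' )
solver.solve()                            # trains until max_iter from digits_solver.prototxt
print( solver.net.params['conv1'][0].data.shape )   # learned conv1 filters, (50, 1, 3, 3)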
In [6]:
!python ./caffe/tools/extra/parse_log.py digits_train.log .
In [7]:
import pandas as pd
# columns: NumIters,Seconds,TrainingLoss,LearningRate
df_train = pd.read_csv( 'digits_train.log.train' )
# columns: NumIters,Seconds,TestAccuracy,TestLoss
df_test = pd.read_csv( 'digits_train.log.test' )
fig, (ax1, ax2) = plt.subplots( 2, sharex=True, figsize=(10, 12) )
ax1.plot( df_train.NumIters, df_train.TrainingLoss, 'b-' )
ax1.set_xlabel( 'Iterations' )
ax1.set_ylabel( 'Loss' )
ax3 = ax1.twinx()
ax3.plot( df_train.NumIters, df_train.LearningRate, 'g-' )
ax3.set_ylabel( 'Learning Rate' )
ax2.plot( df_test.NumIters, df_test.TestAccuracy, 'r-' )
ax2.set_xlabel( 'Iterations' )
ax2.set_ylabel( 'Test Accuracy' )
plt.show()
In [8]:
print( 'Final accuracy: {0}'.format( df_test.tail(1).TestAccuracy.iloc[0] ) )
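The accuracy above comes from the solver's own TEST phase. The trained weights can also be loaded back and pushed through the test LMDB manually, which is handy for digging into individual mistakes. A sketch, assuming Caffe wrote a final snapshot named digits_iter_1000.caffemodel (by default it snapshots once more when max_iter is reached):
net = caffe.Net( 'digits_train_test.prototxt', 'digits_iter_1000.caffemodel', caffe.TEST )
accs = []
for _ in range( 6 ):                      # 6 batches of 100 roughly cover the ~600 test images
    out = net.forward()
    accs.append( float( out['accuracy'] ) )
print( 'Manual test accuracy: {0}'.format( np.mean( accs ) ) )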