The MNIST dataset contains 70,000 images of handwritten digits from 0-9. It is a very popular dataset, commonly used to benchmark the accuracy of computer vision models. Some argue that the dataset is too easy for modern computer vision models to learn, and is therefore not a good indicator of how accurate a model really is. The Fashion-MNIST dataset is used as a drop-in replacement for MNIST and is meant to be a much harder dataset to classify.
From github.com/zalandoresearch
Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.
For more information see - https://github.com/zalandoresearch/fashion-mnist
In [1]:
%matplotlib inline
from collections import OrderedDict
import swat as sw
import matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image, display
from casplt import plot_imgs
# Start CAS session
sw.options.cas.print_messages = False
s = sw.CAS('localhost', 5570, 'sas','pwd')
s.loadactionset('image')
s.loadactionset('deepLearn')
s.loadactionset('sampling')
sns.set()
class_dict = OrderedDict([('class0','T-shirt/top'), ('class1','Trouser'), ('class2','Pullover'), ('class3','Dress'),
('class4','Coat'),('class5','Sandal'), ('class6','Shirt'), ('class7','Sneaker'),
('class8','Bag'), ('class9','Ankle boot')])
In [2]:
# Load training images recursively
s.image.loadimages(casout=dict(name='train', blocksize='128',replace=True), recurse=True, decode=True,
distribution="RANDOM", labelLevels=1, path='/some/dir/mnist_fashion_train_png/')
# Load test images recursively
s.image.loadimages(casout=dict(name='test', blocksize='128',replace=True), recurse=True, decode=True,
distribution="RANDOM", labelLevels=1, path='/some/dir/mnist_fashion_test_png/')
s.image.summarizeimages(imagetable='train')['Summary']
Out[2]:
In [3]:
s.stratified(display={"names":"STRAFreq"},
output={"casOut":{"name":"train", "replace":True}, "copyVars":"ALL"},
samppct=20, partind=True, seed=10,
table={"name":"train"},
outputTables={"names":{"STRAFreq"},"replace":True})
Out[3]:
In [4]:
trainTbl = s.CASTable('train')
testTbl = s.CASTable('test')
trainTbl.shuffle(casout=dict(name='train', replace=True))
testTbl.shuffle(casout=dict(name='test', replace=True))
Out[4]:
In [5]:
trainTbl.columninfo()
Out[5]:
In [6]:
trainTbl.freq(inputs='_PartInd_')
Out[6]:
In [ ]:
trainTbl.head()
In [8]:
plot_imgs(trainTbl, images_per_class=10, figsize=(30,20), font_size=18)
In [9]:
display(Image(filename='some/dir/lenet_architecture.png', embed=True))
Define the model architecture by subsequently adding layers to the model object. The model is a type of ConvNet called LeNet, which consists mainly of 3 types of layers - convolution, pooling, and fully connected. The model can be made arbitrarily deep by repeating the sequence of convolution and pooling layers, but in this instance we only use 2 convolutional layers to keep the number of parameters down.
As each layer is built you must specify some options and hyperparameters. These options and hyperparameters are layer dependent. For example, the first layer, data, specifies the image dimensions as well as some image augmentation and normalization options. One option in the data layer is image standardization (std='STD'), where the input is standardized to have 0 mean and a standard deviation of 1. There is also a scale value specified which normalizes the pixel values of our image to be between 0 and 1 (0.004 ≈ 1/255).
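The two input-layer options above can be sketched in plain NumPy. This is a minimal illustration, not the exact CAS implementation - whether CAS standardizes per image or across the batch is an internal detail, so the batch-wide standardization below is an assumption:

```python
import numpy as np

# Hypothetical batch of 32 grayscale 28x28 images with uint8-style pixel values.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(32, 28, 28)).astype('float64')

# scale=0.004 (~1/255) rescales pixels from [0, 255] to roughly [0, 1].
scaled = batch * 0.004

# std='STD' then standardizes the input to zero mean, unit standard deviation.
standardized = (scaled - scaled.mean()) / scaled.std()
```

After standardization the batch has mean ≈ 0 and standard deviation ≈ 1, which generally speeds up and stabilizes gradient-based training.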
Following the input layer, the first convolutional layer is created. Hyperparameters for this layer such as the number of filters, filter size, stride, weight initialization method, and activation function are assigned. Aside from the standard activation functions (tanh, relu, etc.) you can also define your own activation functions, which have automatic differentiation built in.
The convolutional layer defines filters that slide across the input image or feature map. These filters are what the model is learning, as they build out the underlying features of our data set. For example, in Fashion-MNIST, a filter may detect vertical lines like those along a sleeve in the shirt class. The model learns this filter, and it will activate as shirts with sleeves propagate through the network.
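The sliding-filter idea can be shown with a toy pure-NumPy convolution (valid padding, stride 1). The image, the vertical-edge filter, and the `conv2d` helper are all illustrative, not part of the CAS API:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide one filter over a 2D image (valid padding)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 5x5 image with a vertical edge down the middle.
img = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A simple vertical-edge filter: responds where left and right columns differ.
edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

fmap = conv2d(img, edge_filter)
print(fmap)  # strong activations (3.0) where the filter straddles the edge
```

The feature map lights up exactly where the edge is, which is the sense in which a learned filter "detects" a feature.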
From the convolutional layer, define a max pooling layer which downsamples the output feature map from our convolutions to just contain the highest values from our activations, hence the max in max pooling. The reason pooling is performed is to reduce the number of parameters in the model. By reducing the number of parameters, there is less chance of overfitting the model, as well as a smaller computational footprint.
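A 2x2 max pool with stride 2, as used in the layers below, halves each spatial dimension by keeping only the largest activation per window. A minimal NumPy sketch (the `max_pool` helper is illustrative):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Keep the largest activation in each size x size window."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i*stride:i*stride+size,
                             j*stride:j*stride+size].max()
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 2],
                 [0, 1, 9, 8],
                 [2, 3, 4, 5]], dtype=float)

pooled = max_pool(fmap)
print(pooled)  # [[6. 2.] [3. 9.]]
```

A 4x4 map becomes 2x2, so every downstream layer sees a quarter of the activations.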
Repeat this one more time and then create a fully connected layer. This layer specifies a dropout hyperparameter to remove 40% of the neurons and their connections to the previous layer. Dropout is a technique for addressing the problem of overfitting and is an effective regularization technique.
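Mechanically, dropout zeroes a random subset of activations at training time. The sketch below uses the common "inverted dropout" scheme, where survivors are scaled by 1/(1 - rate) so the expected activation is unchanged at inference; whether CAS uses exactly this scheme internally is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Activations from a hypothetical 1024-unit fully connected layer.
activations = rng.standard_normal(1024)

# dropout=0.4: zero out ~40% of neurons, scale the survivors by 1/(1 - rate).
rate = 0.4
mask = rng.random(1024) >= rate
dropped = activations * mask / (1 - rate)
# Roughly 60% of the neurons survive; the rest are exactly zero this pass.
```

Because a different random subset is dropped on every training pass, no single neuron can be relied on, which discourages co-adaptation and overfitting.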
Finally, output the classifications using a softmax activation in order to get the predictions. The output contains a probability value for each of the 10 image classes. The class with the highest probability is the classification chosen for that input.
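Softmax itself is a small function: it exponentiates the raw output-layer scores and normalizes them to sum to 1. The logits below are made up for illustration:

```python
import numpy as np

def softmax(logits):
    """Convert raw output-layer scores to probabilities that sum to 1."""
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical raw scores for the 10 Fashion-MNIST classes.
logits = np.array([1.2, 0.3, 4.1, 0.5, 0.9, 0.1, 2.2, 0.4, 0.0, 0.7])
probs = softmax(logits)

print(int(np.argmax(probs)))  # 2 -> 'Pullover' is the predicted class
```

Here the largest logit (4.1, index 2) maps to the largest probability, so the prediction would be class 2 ('Pullover').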
In [10]:
# This will create a empty Conv. Net.
s.buildmodel(model=dict(name='lenet',replace=True),type='CNN')
# Add first layer. This is an input layer that reads in 1 channel (grayscale) 28x28 pixel images.
s.addlayer(model='lenet', name='data', replace=True,
layer=dict(type='input',nchannels=1, width=28, height=28, scale=0.004, std='STD'))
# Add 1st convolutional layer.
s.addLayer(model='lenet', name='conv1', replace=True,
layer=dict(type='convolution',act='relu', nFilters=32, width=5, height=5, stride=1, init='xavier'),
srcLayers=['data'])
# Add 1st max pooling layer.
s.addLayer(model='lenet', name='pool1', replace=True,
layer=dict(type='pooling', width=2, height=2, stride=2, pool='max'),
srcLayers=['conv1'])
# Add 2nd convolutional layer
s.addLayer(model='lenet', name='conv2', replace=True,
layer=dict(type='convolution',act='relu', nFilters=64, width=5, height=5, stride=1, init='xavier'),
srcLayers=['pool1'])
# Add 2nd max pooling layer
s.addLayer(model='lenet', name='pool2', replace=True,
layer=dict(type='pooling',width=2, height=2, stride=2, pool='max'),
srcLayers=['conv2'])
# Add fully connected layer
s.addLayer(model='lenet', name='fc1', replace=True,
layer=dict(type='fullconnect',n=1024, act='relu', init='xavier',dropout = 0.4),
srcLayers=['pool2'])
# Add softmax output layer
s.addLayer(model='lenet', name='outlayer', replace=True,
layer=dict(type='output',n=10,act='softmax', init='xavier'),
srcLayers=['fc1'])
s.modelInfo(model='lenet')
Out[10]:
In [11]:
sw.options.cas.print_messages = True
In [12]:
trainTbl.shuffle()
testTbl.shuffle()
Out[12]:
In [13]:
lenet = s.dltrain(model='lenet',
table=dict(name='train', where='_PartInd_ = 0.0'),
validTable=dict(name='train', where='_PartInd_ = 1.0'),
seed=54321,
nthreads=2,
gpu=dict(devices={0,1}),
inputs=['_image_'],
target='_label_',
modelWeights=dict(name='LeNet_Weights', replace=True),
optimizer=dict(mode=dict(type='synchronous'),
miniBatchSize=64,
algorithm=dict(method='momentum', learningRate=1e-2),
maxEpochs=50,
loglevel=1)
)
lenet
Out[13]:
In [14]:
s.dlscore(model='lenet',
initWeights='LeNet_Weights',
table=testTbl,
copyVars={'_label_','_image_'},
layerImageType='jpg',
layerOut=dict(name='layerOut', replace=True),
casout=dict(name='LeNet_scored', replace=True)
)
Out[14]:
In [15]:
sw.options.cas.print_messages = False
In [16]:
cmr = s.crosstab(table='LeNet_scored', row='_label_', col='_DL_PredName_')
cmr['Crosstab']
Out[16]:
In [17]:
c = cmr['Crosstab'].values
c = c[:, 1:].astype('float')
missedCount = []
for i in range(len(c)):
    missed = 0
    for j in range(len(c)):
        if i != j:
            missed = missed + c[i][j]
    missedCount.append(missed)
plt.subplots(figsize=(12, 6))
ax = sns.barplot(x=list(class_dict.values()), y=missedCount, color='blue', saturation=0.25)
plt.title('Misclassification Counts\n Total = ' + str(np.sum(missedCount)))
plt.xlabel('Classes')
plt.ylabel('Number Misclassified')
plt.show()
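As an aside, the nested loop above is equivalent to subtracting each row's diagonal (correct) entry from its row sum. A quick sketch on a toy 3x3 confusion matrix standing in for the real crosstab values:

```python
import numpy as np

# Toy confusion matrix (rows: true labels, cols: predictions).
c = np.array([[50, 3, 2],
              [4, 60, 1],
              [2, 2, 70]], dtype=float)

# Per-class misclassifications: row sum minus the diagonal entry.
missed = c.sum(axis=1) - np.diag(c)
print(missed)        # [5. 5. 4.]
print(missed.sum())  # 14.0
```

The vectorized form gives the same per-class counts without explicit loops.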
In [18]:
plt.subplots(figsize=(12,6))
ax = sns.heatmap(c, annot=True, fmt="g", cmap='PuBu',
                 yticklabels=list(class_dict.values()), xticklabels=list(class_dict.values()))
plt.title('Confusion Matrix')
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()
Looking above, it's apparent that our "shirt" class has the highest number of misclassifications and is often confused for T-shirts/tops, Pullovers, and Coats. We also see that those 3 classes in particular are mostly misclassified as shirts as well. To determine why shirts are so prevalent in cases of misclassification, we can look at a few cases of correct and incorrect predictions:
In [19]:
plot_imgs(s.CASTable('lenet_scored'), class_list=[6,2,0], query_condition='_label_ = _DL_PredName_',
images_per_class=10, figsize=(25,10), font_size=14)
In [20]:
plot_imgs(s.CASTable('lenet_scored'), class_list=[6,2,0], query_condition='_label_ ^= _DL_PredName_',
images_per_class=10, figsize=(25,10), font_size=14)
In [21]:
plot_imgs(s.CASTable('lenet_scored'),class_list=[6,2,0], query_condition='_label_ = _DL_PredName_ and _DL_PredP_ > 0.9',
images_per_class=10, figsize=(25,10),font_size=14)
In [22]:
plot_imgs(s.CASTable('lenet_scored'),class_list=[6,2,0], query_condition='_label_ ^= _DL_PredName_ and _DL_PredP_ > 0.9',
images_per_class=10, figsize=(25,10),font_size=14)
In [ ]:
layeroutTbl = s.CASTable('layerout')
l_out = layeroutTbl.fetch()['Fetch']
In [24]:
fig = plt.figure(figsize=(12, 6))
for i in range(20):
    w = np.asarray(layeroutTbl.fetchimages(image='_LayerAct_1_IMG_{}_'.format(i))['Images'].iloc[0][0])
    b = fig.add_subplot(5, 10, i+1)
    plt.imshow(w, cmap='PuBu_r')
    plt.title('act:{}'.format(i))
    plt.axis('off')
In [25]:
fig = plt.figure(figsize=(12, 6))
for i in range(50):
    w = np.asarray(layeroutTbl.fetchimages(image='_LayerAct_3_IMG_{}_'.format(i))['Images'].iloc[0][0])
    b = fig.add_subplot(5, 10, i+1)
    plt.imshow(w, cmap='PuBu_r')
    plt.title('act:{}'.format(i))
    plt.axis('off')
In [26]:
s.endsession()
Out[26]: