MNIST classification with Vowpal Wabbit


In [1]:
from __future__ import division
import re
import numpy as np
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
#%qtconsole

Train

I found some help with the parameters; in summary:

--cache_file train.cache
writes the training data to a binary cache for faster processing. On later runs (and on each pass after the first), VW reads the cache file instead of re-parsing the text file.

--passes
is the maximum number of passes over the training data.

--oaa 10
selects the one-against-all (OAA) multiclass reduction with 10 classes (labels 1 to 10). The run below uses --ect 10 instead, the error-correcting tournament reduction, which takes the same class count.

-q ii
creates quadratic interaction features between all pairs of features in the two named namespaces, which here are the same, i.e. the 'i' (image) namespace.
An interaction feature is created from two features 'A' and 'B' by multiplying the values of 'A' and 'B' (see the sketch after this list).

-f mnist_ALL.model
specifies the file where the trained model will be saved.

-b
sets the number of bits in the feature table, i.e. 2^b weight slots.
The default is 18, but since the interaction features greatly increase the feature count, the value of '-b' is raised (-b 19 in the run below).

-l rate
adjusts the learning rate. Defaults to 0.5.

--power_t p
specifies the power on the learning rate decay, with p in the range [0,1]. 0 means the learning rate does not decay, which can be helpful for state tracking, while 1 is very aggressive. Defaults to 0.5.

--early_terminate n
stops training once the holdout loss has failed to improve for n consecutive passes (3 in the run below).
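
A minimal sketch of what -q ii and -b do conceptually. This is only an illustration: the feature names are hypothetical and Python's built-in hash stands in for VW's actual murmur hash.

import itertools

B = 19                                   # -b 19  ->  2**B weight-table slots
pixels = {"p202": 0.32, "p203": 0.96, "p230": 0.67}  # hypothetical 'i' namespace

interactions = {}
for (name_a, val_a), (name_b, val_b) in itertools.combinations(pixels.items(), 2):
    # a quadratic feature's value is the product of its parents' values;
    # its weight index is a hash of the pair, folded into the 2**B-slot table
    index = hash(name_a + "^" + name_b) % (1 << B)
    interactions[index] = val_a * val_b

print("%d interaction features hashed into %d slots" % (len(interactions), 1 << B))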


In [3]:
!rm train_ect.vw.cache


rm: cannot remove ‘train_ect.vw.cache’: No such file or directory

In [4]:
!rm mnist_train_ect.model


rm: cannot remove ‘mnist_train_ect.model’: No such file or directory

In [5]:
!vw -d data/mnist_train.vw -b 19  --ect 10  -f mnist_train_ect.model  -q ii  --passes 100 -l 0.4  --early_terminate 3  --cache_file train_ect.vw.cache --power_t 0.6


creating quadratic features for pairs: ii 
final_regressor = mnist_train_ect.model
Num weight bits = 19
learning rate = 0.4
initial_t = 0
power_t = 0.6
decay_learning_rate = 1
creating cache_file = train_ect.vw.cache
Reading datafile = data/mnist_train.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0        6        1    14028
1.000000 1.000000            2            2.0        1        6    15753
1.000000 1.000000            4            4.0        2        6     4753
1.000000 1.000000            8            8.0        4        3    20301
0.812500 0.625000           16           16.0        3        2    11476
0.843750 0.875000           32           32.0        1       10    17020
0.687500 0.531250           64           64.0        2        2     7626
0.531250 0.375000          128          128.0        8        9     8646
0.476562 0.421875          256          256.0        1        1    27730
0.382812 0.289062          512          512.0        8        8     6786
0.313477 0.244141         1024         1024.0        7        7    14878
0.226074 0.138672         2048         2048.0       10       10     9316
0.171875 0.117676         4096         4096.0        3        3     9316
0.131836 0.091797         8192         8192.0        1        1    14365
0.104431 0.077026        16384        16384.0        5        5    10585
0.081207 0.057983        32768        32768.0        6        6    11325
0.063736 0.046265        65536        65536.0        6        6    15753
0.037561 0.037561       131072       131072.0       10       10    11781 h
0.031174 0.024787       262144       262144.0        5        5     6328 h
0.026367 0.021561       524288       524288.0        7        7    25425 h

finished run
number of examples per pass = 108000
passes used = 9
weighted example sum = 972000.000000
weighted label sum = 0.000000
average loss = 0.021250 h
total feature number = 13723570242
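
The h after the later loss figures marks losses measured on holdout examples: when more than one pass is requested, VW holds out one example in ten by default and uses the holdout loss to drive --early_terminate.

For reference, each line of data/mnist_train.vw is in VW's plain-text multiclass format: a label in [1,10] followed by the 'i' namespace and the pixel features. A hypothetical row (the actual feature names and value scaling depend on how the file was generated):

6 |i p202:0.32 p203:0.96 p204:0.99 p230:0.67 ...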

Predict

-t
runs VW in test-only mode: the data is used for prediction only and no learning takes place.

-i
specifies the model file created earlier.

-p
specifies the file where the class predictions (labels 1 to 10) are written, one per line.


In [6]:
!rm predict_ect.txt


rm: cannot remove ‘predict_ect.txt’: No such file or directory

In [7]:
!vw -t data/mnist_test.vw -i mnist_train_ect.model  -p predict_ect.txt


creating quadratic features for pairs: ii 
only testing
predictions = predict_ect.txt
Num weight bits = 19
learning rate = 10
initial_t = 1
power_t = 0.5
using no cache
Reading datafile = data/mnist_test.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.000000 0.000000            1            1.0        8        8     6903
0.000000 0.000000            2            2.0        3        3    13861
0.000000 0.000000            4            4.0        1        1    18915
0.000000 0.000000            8            8.0       10       10     8515
0.000000 0.000000           16           16.0        6        6     9591
0.000000 0.000000           32           32.0        2        2     1953
0.000000 0.000000           64           64.0        4        4     9591
0.007812 0.015625          128          128.0        6        6    12880
0.015625 0.023438          256          256.0        8        8     5460
0.037109 0.058594          512          512.0        5        5     5050
0.044922 0.052734         1024         1024.0        5        5    12246
0.055664 0.066406         2048         2048.0        7        6    18721
0.054688 0.053711         4096         4096.0       10       10    12403
0.042603 0.030518         8192         8192.0        7        7    14878
0.038086 0.033569        16384        16384.0        1        1    21528

finished run
number of examples per pass = 20000
passes used = 1
weighted example sum = 20000.000000
weighted label sum = 0.000000
average loss = 0.033600
total feature number = 288781223
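
Before parsing, a quick look at the raw predictions is a useful sanity check (a minimal sketch; each line of the -p file holds one predicted label, written as an integer or, in some VW builds, as a float such as 8.000000):

with open("predict_ect.txt") as f:
    for _ in range(3):
        print(f.readline().strip())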

Analyze


In [8]:
y_true = []
with open("data/mnist_test.vw", 'r') as f:
    for line in f:
        m = re.search(r'^\d+', line)  # the true label leads each VW example line
        if m:
            y_true.append(int(m.group()))


y_pred = []
with open("predict_ect.txt", 'r') as f:
    for line in f:
        m = re.search(r'^\d+', line)  # one predicted label per line
        if m:
            y_pred.append(int(m.group()))

assert len(y_true) == len(y_pred)

target_names = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]  # NOTE: VW labels are 1-10, i.e. digit d is labeled d+1

In [9]:
def plot_confusion_matrix(cm, 
                          target_names,
                          title='Proportional Confusion matrix: VW ect on 784 pixels', 
                          cmap=plt.cm.Paired):  
    """
    given a confusion matrix (cm), make a nice plot
    see the scikit-learn documentation for the original done for the iris dataset
    """
    plt.figure(figsize=(8, 6))
    # normalize each row by its true-class count; keepdims keeps the division
    # row-wise (a bare cm.sum(axis=1) would broadcast column-wise instead)
    plt.imshow(cm.astype(float) / cm.sum(axis=1, keepdims=True),
               interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(target_names))
    plt.xticks(tick_marks, target_names, rotation=45)
    plt.yticks(tick_marks, target_names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    
cm = confusion_matrix(y_true, y_pred)  

print(cm)
model_accuracy = sum(cm.diagonal())/len(y_pred)
model_misclass = 1 - model_accuracy
print("\nModel accuracy: {0}, model misclass rate: {1}".format(model_accuracy, model_misclass))

plot_confusion_matrix(cm, target_names)


[[1926    2    2    2    1   13    5    2    4    3]
 [   1 2252    5    2    3    4    2    0    1    0]
 [  10    8 2002    6    7    2    0    8   21    0]
 [   0    1    3 1970    2   21    0    8   10    5]
 [   9    4    7    0 1920    0    4    0    3   17]
 [   3    0    1   21    0 1739    4    0   14    2]
 [   9    6    4    3   23   80 1789    0    2    0]
 [   2   11   25    6   22    5    0 1954    8   23]
 [  13    4    7   28   18   15    0    4 1858    1]
 [   7    6    0   16   42   16    0   10    3 1918]]

Model accuracy: 0.9664, model misclass rate: 0.0336
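
For a per-class complement to the confusion matrix, scikit-learn's classification_report prints precision, recall and F1 for each of the ten labels (a minimal sketch reusing y_true, y_pred and target_names from above):

from sklearn.metrics import classification_report

# precision, recall and F1 per VW label (digit d carries label d+1)
print(classification_report(y_true, y_pred, target_names=target_names))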
