Machine Learning at Scale, Part I

KMeans clustering at scale

Training models with data that fits in memory is very limiting. But minibatch learners can easily work with data directly from disk.

We'll use the MNIST8M data set, which has 8 million images (about 17 GB). The dataset has been partitioned into groups of 100k images (using the Unix split command) and saved as compressed lz4 files. Because it is so large, it is not downloaded by default by getdata.sh; you have to fetch it explicitly by running getmnist.sh from the scripts directory. That script automatically splits the data into files that are small enough to load into memory.

Let's load BIDMat/BIDMach


In [1]:
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,Image,FMat,FND,GDMat,GMat,GIMat,GSDMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.JPlotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,KMeansw,ICA,LDA,LDAgibbs,Model,NMF,RandomForest,SFA,SVD}
import BIDMach.datasources.{DataSource,MatSource,FileSource,SFileSource}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}

Mat.checkMKL                              // check for Intel MKL native libraries
Mat.checkCUDA                             // check for CUDA GPUs
Mat.setInline
if (Mat.hasCUDA > 0) GPUmem               // report (fraction free, free bytes, total bytes)


1 CUDA device found, CUDA version 7.0
Out[1]:
(0.99132067,11974557696,12079398912)

And define the root directory for this dataset.


In [2]:
val mdir = "../data/MNIST8M/parts/"



Out[2]:
../data/MNIST8M/parts/

Constrained Clustering.

For this tutorial, we are going to evaluate the quality of clustering by using it for classification. We use a labeled dataset and compute clusters of training samples using k-Means. Then we match each new test sample to its closest cluster and assign it the majority label of that cluster.

This method by itself doesn't work well: clusters often straddle label boundaries, leading to poor labelings. It's better to force each cluster to have a single label. We do that by adding the labels as very strong features before clustering. The label features push samples with different labels very far apart, far enough that k-Means will never assign them to the same cluster. The data we want looks like this:

           Instance 0      Instance 1      Instance 2    ...
           has label "2"   has label "7"   has label "0" ...
           /    0               0           10000         ...
          |     0               0               0         ...
          | 10000               0               0         ...
          |     0               0               0         ...
label    /      0               0               0         ...
features \      0               0               0         ...
(10)      |     0               0               0         ...
          |     0           10000               0         ...
          |     0               0               0         ...
           \    0               0               0         ...

           /  128              19               5         ...
          |    47              28               9         ...
image    /     42             111              18         ...
features \     37             128              17         ...
(784)     |    18             176              14         ...
          |    ..              ..              ..

We chose the label feature weight (here 10000) so that the squared distance between two differently-labeled samples (2 x 10000^2) is larger than the largest possible squared distance between two images (bounded above by 1000 x 256^2). This guarantees that points will not be assigned to a cluster containing a different label (assuming there is initially at least one cluster center with each label).
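As a quick back-of-the-envelope check of this bound (plain Scala, not part of the BIDMach pipeline; the exact MNIST figures behind the rounded bound above are 784 pixels with values in 0..255):

// Squared-distance gap from the label block when two samples have different
// labels: two one-hot entries of 10000 disagree.
val labelGap = 2.0 * 10000.0 * 10000.0        // = 2.0e8
// Largest possible squared distance between two 28x28 images with pixel
// values in 0..255 (784 pixels, each maximally different).
val imageMax = 784.0 * 255.0 * 255.0          // ~ 5.1e7
labelGap > imageMax                           // true: the label term dominates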

Even though these label features are present in the cluster centroids after training, they don't affect matching at test time. Test images don't have the label features, and will match the closest cluster based only on image features. That cluster will have a unique label, which we then assign to the test point.

The files containing data in this form are named "alls00.fmat.lz4", "alls01.fmat.lz4", etc. Since they contain both data and labels, we don't need to load label files separately. We can create a learner using a filename pattern for accessing these files:


In [3]:
val (mm, opts) = KMeans.learner(mdir+"alls%02d.fmat.lz4")



Out[3]:
BIDMach.models.KMeans$FileOptions@21efd653

The string "%02d" is a C/Scala format string that expands into a two-digit ASCII number to help with the enumeration.

There are several new options that tailor a file datasource, but we'll mostly use the defaults. One thing we will do is set opts.nend, the number of the file at which training stops, so that some files are held out; the predictor later on uses files 70-79 as test data.


In [4]:
opts.dim = 30000
opts.nend = 10



Out[4]:
10

Note that the training data include both image data and labels (0-9). k-Means is an unsupervised algorithm, and if we used image data only it would often build clusters containing images of different digits. To produce cleaner clusters, and to facilitate classification later on, the alls data includes the labels in the first 10 rows and the image data in the remaining rows. The label features are scaled by a large constant factor, so images of different digits are far apart in feature space. This effectively prevents different digits from occurring in the same cluster.
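If you want to confirm this layout, you can load a single part directly (a quick sanity check, assuming the parts have been downloaded by getmnist.sh; expect 794 rows -- 10 label rows plus 784 pixel rows -- and roughly 100k columns):

val part0 = loadFMat(mdir + "alls00.fmat.lz4")   // one training part
(part0.nrows, part0.ncols)                       // roughly (794, 100000)
part0(0->10, 0->3)                               // scaled one-hot label block for 3 images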

Tuning Options

The following options are the important ones for tuning. For KMeans, batchSize has no effect on accuracy, since the algorithm uses all the data instances to perform each update, so you're free to tune it for best speed. Generally larger is better, as long as you don't use too much GPU RAM.

npasses is the number of passes over the dataset. Larger is typically better, but the model may overfit at some point.


In [5]:
opts.batchSize = 20000
opts.npasses = 10



Out[5]:
10

You invoke the learner the same way as before. You can change the options above after each run to optimize performance.


In [6]:
mm.train


pass= 0
First pass random centroid initialization
 4.00%, ll=0.00000, gf=702.986, secs=1.4, GB=0.13, MB/s=93.55, GPUmem=0.347705
25.00%, ll=0.00000, gf=755.267, secs=2.5, GB=0.83, MB/s=326.65, GPUmem=0.334922
48.00%, ll=0.00000, gf=777.832, secs=3.7, GB=1.52, MB/s=414.04, GPUmem=0.330495
70.00%, ll=0.00000, gf=794.722, secs=4.8, GB=2.22, MB/s=462.68, GPUmem=0.327652
91.00%, ll=0.00000, gf=809.445, secs=5.9, GB=2.92, MB/s=495.49, GPUmem=0.325633
100.00%, ll=0.00000, gf=878.386, secs=6.5, GB=3.18, MB/s=487.04, GPUmem=0.325123
pass= 1
 4.00%, ll=-1447031.50000, gf=999.597, secs=8.6, GB=3.30, MB/s=384.25, GPUmem=0.118532
25.00%, ll=-1487285.37500, gf=1403.871, secs=20.4, GB=4.00, MB/s=196.12, GPUmem=0.118532
48.00%, ll=-1519196.25000, gf=1511.701, secs=32.2, GB=4.70, MB/s=145.91, GPUmem=0.118532
70.00%, ll=-1528522.62500, gf=1561.714, secs=44.0, GB=5.40, MB/s=122.64, GPUmem=0.118532
91.00%, ll=-1480058.12500, gf=1590.571, secs=55.8, GB=6.10, MB/s=109.22, GPUmem=0.118532
100.00%, ll=-1464802.12500, gf=1597.795, secs=59.8, GB=6.35, MB/s=106.28, GPUmem=0.118532
pass= 2
 4.00%, ll=-1044000.43750, gf=1590.891, secs=61.8, GB=6.48, MB/s=104.80, GPUmem=0.118532
25.00%, ll=-1060547.75000, gf=1608.093, secs=73.6, GB=7.18, MB/s=97.48, GPUmem=0.118532
48.00%, ll=-1094171.00000, gf=1620.824, secs=85.4, GB=7.88, MB/s=92.20, GPUmem=0.118532
70.00%, ll=-1100127.12500, gf=1630.215, secs=97.2, GB=8.58, MB/s=88.19, GPUmem=0.118532
91.00%, ll=-1051547.75000, gf=1637.572, secs=109.0, GB=9.27, MB/s=85.05, GPUmem=0.118532
100.00%, ll=-1021016.25000, gf=1639.758, secs=113.0, GB=9.53, MB/s=84.34, GPUmem=0.118532
pass= 3
 4.00%, ll=-1017818.81250, gf=1635.083, secs=115.1, GB=9.66, MB/s=83.92, GPUmem=0.118532
25.00%, ll=-1040552.50000, gf=1641.148, secs=126.8, GB=10.35, MB/s=81.63, GPUmem=0.118532
48.00%, ll=-1077800.62500, gf=1646.002, secs=138.7, GB=11.05, MB/s=79.71, GPUmem=0.118532
70.00%, ll=-1084202.50000, gf=1650.260, secs=150.4, GB=11.75, MB/s=78.11, GPUmem=0.118532
91.00%, ll=-1029964.18750, gf=1653.745, secs=162.3, GB=12.45, MB/s=76.73, GPUmem=0.118532
100.00%, ll=-990975.06250, gf=1654.839, secs=166.2, GB=12.70, MB/s=76.44, GPUmem=0.118532
pass= 4
 4.00%, ll=-1007864.68750, gf=1651.447, secs=168.3, GB=12.83, MB/s=76.25, GPUmem=0.118532
25.00%, ll=-1033786.43750, gf=1654.509, secs=180.1, GB=13.53, MB/s=75.13, GPUmem=0.118532
48.00%, ll=-1072413.62500, gf=1657.325, secs=191.9, GB=14.23, MB/s=74.16, GPUmem=0.118532
70.00%, ll=-1079038.12500, gf=1659.691, secs=203.7, GB=14.93, MB/s=73.29, GPUmem=0.118532
91.00%, ll=-1022153.50000, gf=1661.791, secs=215.5, GB=15.63, MB/s=72.51, GPUmem=0.118532
100.00%, ll=-978369.18750, gf=1662.483, secs=219.4, GB=15.88, MB/s=72.37, GPUmem=0.118532
pass= 5
 4.00%, ll=-1002847.81250, gf=1659.962, secs=221.5, GB=16.01, MB/s=72.27, GPUmem=0.118532
25.00%, ll=-1030117.56250, gf=1661.895, secs=233.3, GB=16.71, MB/s=71.61, GPUmem=0.118532
48.00%, ll=-1069558.00000, gf=1663.635, secs=245.1, GB=17.40, MB/s=71.01, GPUmem=0.118532
70.00%, ll=-1076436.37500, gf=1665.124, secs=256.9, GB=18.10, MB/s=70.46, GPUmem=0.118532
91.00%, ll=-1017634.81250, gf=1666.569, secs=268.7, GB=18.80, MB/s=69.97, GPUmem=0.118532
100.00%, ll=-971843.25000, gf=1666.959, secs=272.7, GB=19.06, MB/s=69.88, GPUmem=0.118532
pass= 6
 4.00%, ll=-1000022.31250, gf=1664.802, secs=274.8, GB=19.18, MB/s=69.82, GPUmem=0.118532
25.00%, ll=-1028040.18750, gf=1666.171, secs=286.6, GB=19.88, MB/s=69.38, GPUmem=0.118532
48.00%, ll=-1067767.00000, gf=1667.352, secs=298.4, GB=20.58, MB/s=68.97, GPUmem=0.118532
70.00%, ll=-1074786.75000, gf=1668.439, secs=310.2, GB=21.28, MB/s=68.60, GPUmem=0.118532
91.00%, ll=-1014987.12500, gf=1669.523, secs=322.0, GB=21.98, MB/s=68.25, GPUmem=0.118532
100.00%, ll=-967646.18750, gf=1669.896, secs=326.0, GB=22.23, MB/s=68.21, GPUmem=0.118532
pass= 7
 4.00%, ll=-998211.56250, gf=1668.146, secs=328.0, GB=22.36, MB/s=68.17, GPUmem=0.118532
25.00%, ll=-1026361.31250, gf=1669.189, secs=339.8, GB=23.06, MB/s=67.85, GPUmem=0.118532
48.00%, ll=-1066458.75000, gf=1670.086, secs=351.6, GB=23.76, MB/s=67.56, GPUmem=0.118532
70.00%, ll=-1073685.12500, gf=1670.998, secs=363.5, GB=24.46, MB/s=67.29, GPUmem=0.118532
91.00%, ll=-1013243.18750, gf=1671.500, secs=375.3, GB=25.15, MB/s=67.02, GPUmem=0.118532
100.00%, ll=-965184.87500, gf=1671.800, secs=379.3, GB=25.41, MB/s=66.99, GPUmem=0.118532
pass= 8
 4.00%, ll=-996995.75000, gf=1670.215, secs=381.3, GB=25.54, MB/s=66.96, GPUmem=0.118532
25.00%, ll=-1025366.50000, gf=1671.054, secs=393.2, GB=26.23, MB/s=66.73, GPUmem=0.118532
48.00%, ll=-1065699.37500, gf=1671.844, secs=405.0, GB=26.93, MB/s=66.51, GPUmem=0.118532
70.00%, ll=-1073239.62500, gf=1672.585, secs=416.8, GB=27.63, MB/s=66.30, GPUmem=0.118532
91.00%, ll=-1011892.31250, gf=1673.290, secs=428.6, GB=28.33, MB/s=66.10, GPUmem=0.118532
100.00%, ll=-962925.87500, gf=1673.474, secs=432.5, GB=28.58, MB/s=66.09, GPUmem=0.118532
pass= 9
 4.00%, ll=-996031.43750, gf=1671.776, secs=434.7, GB=28.71, MB/s=66.05, GPUmem=0.118532
25.00%, ll=-1024622.68750, gf=1672.473, secs=446.5, GB=29.41, MB/s=65.87, GPUmem=0.118532
48.00%, ll=-1065454.25000, gf=1673.076, secs=458.3, GB=30.11, MB/s=65.69, GPUmem=0.118532
70.00%, ll=-1072694.00000, gf=1673.706, secs=470.1, GB=30.81, MB/s=65.53, GPUmem=0.118532
91.00%, ll=-1010900.93750, gf=1674.305, secs=481.9, GB=31.51, MB/s=65.37, GPUmem=0.118532
100.00%, ll=-961305.56250, gf=1674.516, secs=485.9, GB=31.76, MB/s=65.37, GPUmem=0.118532
Time=485.8830 secs, gflops=1674.46

Now let's extract the model as a floating-point matrix. Recall that we included the category features in the clustering to make sure that each cluster contains images of just one digit.


In [7]:
val modelmat = FMat(mm.modelmat)



Out[7]:
      0      0      0      0      0      0      0      0      0  10000...
  10000      0      0      0      0      0      0      0      0      0...
  10000      0      0      0      0      0      0      0      0      0...
      0      0      0      0      0      0  10000      0      0      0...
      0  10000      0      0      0      0      0      0      0      0...
      0      0      0  10000      0      0      0      0      0      0...
  10000      0      0      0      0      0      0      0      0      0...
      0  10000      0      0      0      0      0      0      0      0...
     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..

Next we build a 30 x 10 array of images to view the first 300 cluster centers as images.


In [8]:
val nx = 30                                       // grid width: clusters per row
val ny = 10                                       // grid height: clusters per column
val im = zeros(28,28)                             // buffer for one 28x28 centroid image
val allim = zeros(28*nx,28*ny)                    // composite image holding the whole grid
for (i <- 0 until nx) {
    for (j <- 0 until ny) {
        val slice = modelmat(i+nx*j, 10->794)     // skip the 10 label rows, keep the 784 pixels
        im(?) = slice(?)                          // reshape the row slice into the 28x28 buffer
        allim((28*i)->(28*(i+1)), (28*j)->(28*(j+1))) = im
    }
}
show(allim kron ones(2,2))                        // upscale 2x with a Kronecker product and display



Out[8]:

We'll predict using the closest cluster (or 1-NN if you like). Since we did constrained clustering, our training data include the labels for each instance, but unlabeled test data don't have them. So we project the model matrix down to remove its first 10 features. Before doing this, though, we find the strongest label for each cluster so that later on we can map from cluster id to label.


In [9]:
val igood = find(sum(modelmat,2) > 100)                // find non-empty clusters
val mmat = modelmat(igood,?)



Out[9]:
      0      0      0      0      0      0      0      0      0  10000...
  10000      0      0      0      0      0      0      0      0      0...
  10000      0      0      0      0      0      0      0      0      0...
      0      0      0      0      0      0  10000      0      0      0...
      0  10000      0      0      0      0      0      0      0      0...
      0      0      0  10000      0      0      0      0      0      0...
  10000      0      0      0      0      0      0      0      0      0...
      0  10000      0      0      0      0      0      0      0      0...
     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..

In [10]:
val (dmy, catmap) = maxi2(mmat(?,0->10).t)                // Lookup the label for each cluster
mm.model.modelmats(0) = mmat(?,10->mmat.ncols)            // Remove the label features
mm.model.modelmats(1) = mm.modelmats(1)(igood,0)
catmap(0->100)



Out[10]:
9,0,0,6,1,3,0,1,3,7,6,1,3,5,3,4,4,7,2,9,3,0,8,3,2,9,0,3,1,2,2,5,7,7,1,5,4,0,3,6,9,6,8,8,8,9,4,3,8,7,2,8,7,3,0,3,1,8,0,8,1,9,8,0,3,1,0,0,3,1,1,0,7,3,9,2,1,0,1,1,6,2,9,3,1,2,6,1,1,8,0,9,1,4,8,4,3,1,1,1

Next we define a predictor from the just-computed model and the test data, with the preds files to hold the predictions.


In [18]:
val (pp, popts) = KMeans.predictor(mm.model, mdir+"data%02d.fmat.lz4", mdir+"preds%02d.imat.lz4")

popts.nstart = 70                                      // start with file 70 as test data
popts.nend = 80                                        // finish at file 79
popts.ofcols = 100000                                  // columns (samples) per output preds file, matching the test files
popts.batchSize = 10000



Out[18]:
10000

Let's run the predictor.


In [19]:
pp.predict


Predicting
 2.00%, ll=-101005864.00000, gf=813.954, secs=1.1, GB=0.06, MB/s=55.07, GPUmem=0.67
 3.00%, ll=-100983656.00000, gf=979.324, secs=1.4, GB=0.09, MB/s=66.25, GPUmem=0.67
 5.00%, ll=-101019392.00000, gf=1188.581, secs=2.0, GB=0.16, MB/s=80.41, GPUmem=0.67
 6.00%, ll=-100990560.00000, gf=1246.652, secs=2.2, GB=0.19, MB/s=84.34, GPUmem=0.67
 8.00%, ll=-101043492.00000, gf=1335.388, secs=2.8, GB=0.25, MB/s=90.34, GPUmem=0.67
10.00%, ll=-100993888.00000, gf=1388.280, secs=3.3, GB=0.31, MB/s=93.92, GPUmem=0.67
11.00%, ll=-101004648.00000, gf=1414.821, secs=3.6, GB=0.34, MB/s=95.72, GPUmem=0.67
12.00%, ll=-101008464.00000, gf=1431.804, secs=3.9, GB=0.38, MB/s=96.86, GPUmem=0.67
14.00%, ll=-101006396.00000, gf=1464.602, secs=4.4, GB=0.44, MB/s=99.08, GPUmem=0.67
16.00%, ll=-100992752.00000, gf=1490.204, secs=5.0, GB=0.50, MB/s=100.82, GPUmem=0.67
17.00%, ll=-101068400.00000, gf=1510.744, secs=5.5, GB=0.56, MB/s=102.21, GPUmem=0.67
19.00%, ll=-100991520.00000, gf=1521.663, secs=5.8, GB=0.60, MB/s=102.94, GPUmem=0.67
20.00%, ll=-101005908.00000, gf=1536.861, secs=6.3, GB=0.66, MB/s=103.97, GPUmem=0.67
22.00%, ll=-100985192.00000, gf=1541.652, secs=6.6, GB=0.69, MB/s=104.30, GPUmem=0.67
24.00%, ll=-101015832.00000, gf=1550.323, secs=7.2, GB=0.75, MB/s=104.88, GPUmem=0.67
25.00%, ll=-101001400.00000, gf=1560.763, secs=7.7, GB=0.82, MB/s=105.59, GPUmem=0.67
27.00%, ll=-101019872.00000, gf=1566.820, secs=8.0, GB=0.85, MB/s=106.00, GPUmem=0.67
28.00%, ll=-101014976.00000, gf=1569.824, secs=8.3, GB=0.88, MB/s=106.20, GPUmem=0.67
29.00%, ll=-101045808.00000, gf=1575.211, secs=8.5, GB=0.91, MB/s=106.57, GPUmem=0.67
30.00%, ll=-101057432.00000, gf=1577.763, secs=8.8, GB=0.94, MB/s=106.74, GPUmem=0.67
31.00%, ll=-101037576.00000, gf=1582.593, secs=9.1, GB=0.97, MB/s=107.07, GPUmem=0.67
32.00%, ll=-100988068.00000, gf=1589.138, secs=9.6, GB=1.03, MB/s=107.51, GPUmem=0.67
34.00%, ll=-100921416.00000, gf=1591.014, secs=9.9, GB=1.07, MB/s=107.64, GPUmem=0.67
35.00%, ll=-100975080.00000, gf=1596.602, secs=10.5, GB=1.13, MB/s=108.01, GPUmem=0.67
37.00%, ll=-100981672.00000, gf=1600.226, secs=10.7, GB=1.16, MB/s=108.26, GPUmem=0.67
39.00%, ll=-100970040.00000, gf=1604.964, secs=11.3, GB=1.22, MB/s=108.58, GPUmem=0.67
40.00%, ll=-101022856.00000, gf=1606.191, secs=11.5, GB=1.25, MB/s=108.66, GPUmem=0.67
41.00%, ll=-100951208.00000, gf=1607.223, secs=11.8, GB=1.29, MB/s=108.73, GPUmem=0.67
43.00%, ll=-100996876.00000, gf=1611.228, secs=12.4, GB=1.35, MB/s=109.00, GPUmem=0.67
44.00%, ll=-100941592.00000, gf=1614.123, secs=12.6, GB=1.38, MB/s=109.20, GPUmem=0.67
45.00%, ll=-100917360.00000, gf=1617.596, secs=13.2, GB=1.44, MB/s=109.43, GPUmem=0.67
46.00%, ll=-101005136.00000, gf=1618.264, secs=13.5, GB=1.47, MB/s=109.48, GPUmem=0.67
48.00%, ll=-100946552.00000, gf=1620.793, secs=13.7, GB=1.51, MB/s=109.65, GPUmem=0.67
49.00%, ll=-100993024.00000, gf=1621.371, secs=14.0, GB=1.54, MB/s=109.69, GPUmem=0.67
50.00%, ll=-100991888.00000, gf=1621.927, secs=14.3, GB=1.57, MB/s=109.73, GPUmem=0.67
51.00%, ll=-101037652.00000, gf=1624.725, secs=14.8, GB=1.63, MB/s=109.92, GPUmem=0.67
53.00%, ll=-101000288.00000, gf=1626.910, secs=15.1, GB=1.66, MB/s=110.06, GPUmem=0.67
54.00%, ll=-100988368.00000, gf=1627.325, secs=15.4, GB=1.69, MB/s=110.09, GPUmem=0.67
55.00%, ll=-100963864.00000, gf=1629.390, secs=15.6, GB=1.72, MB/s=110.23, GPUmem=0.67
57.00%, ll=-100976692.00000, gf=1631.703, secs=16.2, GB=1.79, MB/s=110.39, GPUmem=0.67
58.00%, ll=-100936096.00000, gf=1632.008, secs=16.5, GB=1.82, MB/s=110.41, GPUmem=0.67
59.00%, ll=-101037264.00000, gf=1633.864, secs=16.7, GB=1.85, MB/s=110.53, GPUmem=0.67
60.00%, ll=-101008576.00000, gf=1634.125, secs=17.0, GB=1.88, MB/s=110.55, GPUmem=0.67
61.00%, ll=-100970072.00000, gf=1635.889, secs=17.3, GB=1.91, MB/s=110.67, GPUmem=0.67
63.00%, ll=-101016104.00000, gf=1637.790, secs=17.8, GB=1.98, MB/s=110.80, GPUmem=0.67
64.00%, ll=-101018004.00000, gf=1639.578, secs=18.4, GB=2.04, MB/s=110.92, GPUmem=0.67
65.00%, ll=-101016592.00000, gf=1639.730, secs=18.7, GB=2.07, MB/s=110.93, GPUmem=0.67
66.00%, ll=-100995392.00000, gf=1641.263, secs=18.9, GB=2.10, MB/s=111.04, GPUmem=0.67
68.00%, ll=-100982480.00000, gf=1641.386, secs=19.2, GB=2.13, MB/s=111.04, GPUmem=0.67
70.00%, ll=-100998912.00000, gf=1642.950, secs=19.8, GB=2.20, MB/s=111.15, GPUmem=0.67
71.00%, ll=-101013612.00000, gf=1644.431, secs=20.3, GB=2.26, MB/s=111.25, GPUmem=0.67
73.00%, ll=-101069136.00000, gf=1644.502, secs=20.6, GB=2.29, MB/s=111.25, GPUmem=0.67
74.00%, ll=-101034232.00000, gf=1645.833, secs=20.8, GB=2.32, MB/s=111.34, GPUmem=0.67
76.00%, ll=-101026964.00000, gf=1647.164, secs=21.4, GB=2.38, MB/s=111.43, GPUmem=0.67
78.00%, ll=-101007316.00000, gf=1648.429, secs=21.9, GB=2.45, MB/s=111.52, GPUmem=0.67
79.00%, ll=-101021680.00000, gf=1648.444, secs=22.2, GB=2.48, MB/s=111.52, GPUmem=0.67
80.00%, ll=-100982880.00000, gf=1649.632, secs=22.5, GB=2.51, MB/s=111.60, GPUmem=0.67
81.00%, ll=-100978512.00000, gf=1649.632, secs=22.8, GB=2.54, MB/s=111.60, GPUmem=0.67
83.00%, ll=-101017784.00000, gf=1650.765, secs=23.3, GB=2.60, MB/s=111.68, GPUmem=0.67
85.00%, ll=-101041340.00000, gf=1651.845, secs=23.9, GB=2.67, MB/s=111.75, GPUmem=0.67
86.00%, ll=-101010552.00000, gf=1652.915, secs=24.1, GB=2.70, MB/s=111.82, GPUmem=0.67
87.00%, ll=-100958200.00000, gf=1652.878, secs=24.4, GB=2.73, MB/s=111.82, GPUmem=0.67
89.00%, ll=-101018340.00000, gf=1653.865, secs=24.9, GB=2.79, MB/s=111.89, GPUmem=0.67
91.00%, ll=-101006796.00000, gf=1654.809, secs=25.5, GB=2.85, MB/s=111.95, GPUmem=0.67
93.00%, ll=-100994880.00000, gf=1655.714, secs=26.0, GB=2.92, MB/s=112.01, GPUmem=0.67
95.00%, ll=-101012468.00000, gf=1654.653, secs=26.6, GB=2.98, MB/s=111.94, GPUmem=0.67
96.00%, ll=-101068464.00000, gf=1655.585, secs=26.9, GB=3.01, MB/s=112.00, GPUmem=0.67
98.00%, ll=-101007076.00000, gf=1655.464, secs=27.4, GB=3.07, MB/s=112.00, GPUmem=0.67
100.00%, ll=-100981328.00000, gf=1656.293, secs=28.0, GB=3.14, MB/s=112.05, GPUmem=0.67
Time=27.9870 secs, gflops=1656.29

The preds files now contain the indices of the best-matching cluster centers. We still need to look up the category label for each one and compare it with the reference data. We'll do this one file at a time, so that our evaluation can scale to arbitrary problem sizes.


In [20]:
val totals = (popts.nstart until popts.nend).map(i => {
                    val preds = loadIMat(mdir + "preds%02d.imat.lz4" format i);    // predicted centroids
                    val cats = loadIMat(mdir + "cat%02d.imat.lz4" format i);       // reference labels
                    val cpreds = catmap(preds);                                    // map centroid to label
                    accum(cats.t \ cpreds.t, 1.0, 10, 10)                          // form a confusion matrix
}).reduce(_+_)

totals



Out[20]:
   98512      24      17      15       3      32     104      14      24...
       0  112177      28       8      20       0      12      52       3...
     148     174   98284      59      10      16      33     408      87...
      15      45     149  100926       1     456       7     164     300...
      17     240       5       0   95776       3      78     104      12...
      53      21      18     344      15   89357     344       9      86...
     140      62      14       3      43     169   98130       0      41...
       0     441      98       5     130       2       0  103225       9...
      ..      ..      ..      ..      ..      ..      ..      ..      ..

From these counts of actual vs. predicted categories, we can compute a normalized confusion matrix:


In [21]:
val conf = float(totals / sum(totals))



Out[21]:
     0.99453  0.00021154  0.00017225  0.00014721  3.1033e-05  0.00035376...
           0     0.98874  0.00028370  7.8512e-05  0.00020689           0...
   0.0014941   0.0015337     0.99584  0.00057903  0.00010344  0.00017688...
  0.00015143  0.00039664   0.0015097     0.99049  1.0344e-05   0.0050410...
  0.00017162   0.0021154  5.0661e-05           0     0.99073  3.3165e-05...
  0.00053506  0.00018510  0.00018238   0.0033760  0.00015516     0.98783...
   0.0014134  0.00054648  0.00014185  2.9442e-05  0.00044480   0.0018683...
           0   0.0038870  0.00099296  4.9070e-05   0.0013448  2.2110e-05...
          ..          ..          ..          ..          ..          ..

Now let's create an image by multiplying each confusion matrix cell by a 32 x 32 block of ones (a Kronecker product), so brighter squares mean higher rates:


In [22]:
show((conf * 250f) kron ones(32,32))



Out[22]:

It's useful to isolate the correct classification rate by digit, which is the diagonal of the confusion matrix:


In [23]:
val dacc = getdiag(conf).t



Out[23]:
0.99453,0.98874,0.99584,0.99049,0.99073,0.98783,0.99188,0.98695,0.99332,0.97656

We can take the mean of these per-digit accuracies to get a single accuracy figure for this model.


In [24]:
mean(dacc)



Out[24]:
0.98969
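Note that mean(dacc) averages the per-digit rates without weighting by class frequency. If you want a sample-weighted overall accuracy instead, it can be computed from the raw totals matrix (a small sketch using only functions already used above):

val dtot = sum(getdiag(totals), 1)     // correctly-labeled test samples (1x1)
val ntot = sum(sum(totals, 1), 2)      // all test samples (1x1)
dtot / ntot                            // sample-weighted overall accuracy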

Run the experiment again with a larger number of clusters (3000, then 30000). You should reduce the batchSize option to 20000 to avoid memory problems.

Include the training time output by the call to mm.train, but not the evaluation time. Rerun and fill out the table below; a sketch of one way to script the sweep follows the table.

KMeans Clusters    Training time    Avg. gflops    Accuracy
            300              ...            ...         ...
           3000              ...            ...         ...
          30000              ...            ...         ...
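One way to script the sweep (a sketch only; it reuses mdir from above and assumes you rerun the prediction and confusion-matrix cells after each training run):

for (k <- Seq(300, 3000, 30000)) {
  val (mm, opts) = KMeans.learner(mdir + "alls%02d.fmat.lz4")
  opts.dim = k                  // number of clusters for this run
  opts.nend = 10                // training files, as above
  opts.batchSize = 20000
  opts.npasses = 10
  mm.train                      // record the reported time and gflops
  // ... then rerun the prediction and evaluation cells above to get accuracy
}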