UCF


COCOA Method Based on L-BFGS Optimization

In this tutorial, we'll explore training and evaluating logistic regression classifiers based on the L-BFGS optimization method.

To start, we import the standard BIDMach class definitions.


In [1]:
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,FMat,GMat,GIMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,LDA,LDAgibbs,NMF,SFA}
import BIDMach.datasources.{MatDS,FilesDS,SFilesDS}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}

Mat.checkMKL
Mat.checkCUDA
if (Mat.hasCUDA > 0) GPUmem

//var myaaa = loadFMat("/Users/Anna/workspace/BIDMach/data/2_new.txt");
//myaaa.nrows


Cant find native HDF5 library
Couldnt load JCuda
Out[1]:
289

Now we load some training and test data, along with category labels. The data come from a Reuters news collection, a "classic" test set for classification. Each article belongs to one or more of 103 categories. The articles are represented as bag-of-words (BoW) column vectors: for a data matrix A, element A(i,j) holds the count of word i in document j.

The category matrices have 103 rows, and a category matrix C has a one in position C(i,j) if document j is tagged with category i, or zero otherwise.
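To make the two representations concrete, here is a tiny illustration in plain Python (the notebook itself uses BIDMach/Scala matrices; these counts and sizes are made up, with 2 documents and 2 categories rather than 103):

```python
# Toy bag-of-words data matrix A: A[i][j] = count of word i in document j.
A = [
    [2, 0],  # word 0: twice in doc 0, absent from doc 1
    [1, 1],  # word 1: once in each document
    [0, 3],  # word 2: three times in doc 1
]

# Toy category matrix C: C[i][j] = 1 if document j is tagged with category i.
C = [
    [1, 0],  # doc 0 carries category 0
    [0, 1],  # doc 1 carries category 1
]

print(A[1][0])  # count of word 1 in document 0 -> 1
```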

To reduce the computing time and memory footprint, the training data have been sampled. The full collection has about 700k documents. Our training set has 60k.

Since the document matrices contain word counts, we use a min function to cap each count at 1, because Naive Bayes needs binary features.
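The capping operation is just an elementwise min. A minimal plain-Python sketch of what BIDMach's `min(A, 1, A)` does (function name and data are ours, for illustration only):

```python
def binarize(counts):
    """Cap every count at 1, mirroring the notebook's min(A, 1, A) call."""
    return [[min(c, 1) for c in row] for row in counts]

A = [[2, 0], [1, 1], [0, 3]]
print(binarize(A))  # [[1, 0], [1, 1], [0, 1]]
```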


In [21]:
val dict = "/Users/Anna/workspace/BIDMach/data/"
val rpath = "/Users/Anna/workspace/BIDMach/data/"

//val dict = "/Users/Anna/workspace/C3D/examples/c3d_feature_extraction/UCF11/"
//val rpath = "/Users/Anna/workspace/C3D/examples/c3d_feature_extraction/UCF11/"
//val dict = "/mnt/data/FCVID/FCVID_Feature/"
//val rpath = "/mnt/data/FCVID/FCVID_Results/"

var index = 1; 
var index1 = 2;
var tnum = 1;
var tnum1 = 1;
var tnumx = 1;
var tnumx1 = 1;
//for(index <- 1 to 1){
 var aa = loadFMat(dict+index+"_new.txt");
// var c = loadFMat(dict+index+"_test"+".txt")
 var b = loadFMat(dict+index1+"_new.txt");

//var a = aa.t   
var atrain = aa(?,(1->aa.ncols))
//var atest = c(?,(1->c.ncols))
var atrain1 = b(?,(1->b.ncols))


var maxx1 = maxi(maxi(atrain,1),2);
var minx1 = mini(mini(atrain,1),2);
var maxx2 = maxi(maxi(atrain1,1),2);
var minx2 = mini(mini(atrain1,1),2);
// global min and max across both matrices, for min-max normalization
var maxx: Double = math.max(maxx1.data(0), maxx2.data(0));
var minx: Double = math.min(minx1.data(0), minx2.data(0));
//println(maxx,minx)  
atrain = (atrain-minx)/(maxx-minx);
atrain1 = (atrain1-minx)/(maxx-minx);

//TODO: sample balance problem
//the number of training samples
tnumx = (atrain.nrows + 1) / 2;
tnumx1 = (atrain1.nrows + 1) / 2;
// keep the two classes balanced: use the smaller half
tnum = math.min(tnumx, tnumx1);
tnum1 = tnum;

//var ttrain = atrain(0->tnum,?);                 //atrain(0->tnum,(239->atrain.ncols) )
//var ttest = atrain(tnum->atrain.nrows,?);       //atrain(tnum->atrain.nrows,(239->atrain.ncols) )
//println(tnum,tnum1);

//var ttrain1 = atrain1(0->tnum1,?);              //atrain1(0->tnum1,(239->atrain1.ncols) )
//var ttest1 = atrain1(tnum1->atrain1.nrows,?);   //atrain1(tnum1->atrain1.nrows,(239->atrain1.ncols) )


var trainx = zeros( (tnum+tnum1),4096);
var testx = zeros( (tnum+tnum1),4096);
var ctrain1=zeros(2,(tnum+tnum1) ) ;
var ctest=zeros(2,(tnum+tnum1) ) ;

// interleave the two classes: even rows from class 1, odd rows from class 2
for (i <- 0 to (tnum - 2)) {
  trainx((2*i), ?)   = atrain((2*i), ?);
  trainx((2*i+1), ?) = atrain1((2*i), ?);
  testx((2*i), ?)    = atrain((2*i+1), ?);
  testx((2*i+1), ?)  = atrain1((2*i+1), ?);
  ctrain1(0, (2*i))   = 1;   ctrain1(1, (2*i))   = -1;
  ctrain1(0, (2*i+1)) = -1;  ctrain1(1, (2*i+1)) = 1;
  ctest(0, (2*i))   = 1;
  ctest(1, (2*i+1)) = 1;
}

/*//
trainx(0->(ttrain.nrows),?)=ttrain
trainx( (ttrain.nrows)->(ttrain.nrows+ttrain1.nrows),?)=ttrain1
ctrain1(0,(0->(ttrain.nrows) ) )= 1
ctrain1(1,(ttrain.nrows)->(ttrain.nrows+ttrain1.nrows)) = 1
//ctrain1(1,(0->(ttrain.nrows) ) )= -1
//ctrain1(0,(ttrain.nrows)->(ttrain.nrows+ttrain1.nrows)) = -1
*///
println(trainx.nrows,trainx.ncols);
/*//
testx(0->(ttest.nrows),?)=ttest
testx( (ttest.nrows)->(ttest.nrows+ttest1.nrows),?)=ttest1

ctest(0,(0->(ttest.nrows) ) )= 1
ctest(1,(ttest.nrows)->(ttest.nrows+ttest1.nrows)) = 1
*///

//max(atrain, 0.001, atrain)                       // the first "traindata" argument is the input, the other is out

//tnum=testx.nrows
val cx=zeros(2,testx.nrows)
//println(tnum);
val (mm,mopts,nn,nopts)=GLM.learner(trainx.t,ctrain1,testx.t,cx,5)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=0.15
mopts.batchSize=30
mopts.dim=256
mopts.startBlock=0
mopts.npasses=1
mopts.updateAll=false
mm.train;
nn.predict;

//saveFMat(rpath+"r"+index+".txt",cx/tnum);
//}
var count = zeros(2,1);
var count1 = zeros(2,1);
min(cx, 1, cx)                       // clamp predictions into [0, 1]; the third argument receives the output
max(cx, 0, cx)
//println(cx.nrows,cx.ncols);
// threshold predictions and tally predicted positives and true positives
for (i <- 0 to 1) {
  for (j <- 0 to (cx.ncols - 1)) {
    if (cx(i, j) > 0.0) {
      cx(i, j) = 1; count(i, 0) += 1;             // predicted positive
      if (ctest(i, j) > 0.0) count1(i, 0) += 1;   // true positive
    }
  }
}
val p=ctest *@cx +(1-ctest) *@(1-cx);
var meanp=mean(p,2);
var meanq=count1(0,0)/count(0,0);
var meanq1=count1(1,0)/count(1,0);
saveFMat(rpath+"meanp"+index+".txt",meanp);
saveFMat(rpath+"meanq"+index+".txt",meanq);
//println(meanp,cx);
println(meanq,meanq1);
cx
//}


(280,4096)
corpus perplexity=3331.661008
pass= 0
21.00%, ll=-1.00000, gf=1.534, secs=0.0, GB=0.00, MB/s=887.18
100.00%, ll=-0.20000, gf=0.605, secs=0.0, GB=0.00, MB/s=254.82
Time=0.0160 secs, gflops=0.60
corpus perplexity=3672.354592
Predicting
 3.00%, ll=-2.83563, gf=0.164, secs=0.0, GB=0.00, MB/s=124.75
 7.00%, ll=-3.05231, gf=0.328, secs=0.0, GB=0.00, MB/s=267.08
10.00%, ll=-2.35773, gf=0.492, secs=0.0, GB=0.00, MB/s=438.24
14.00%, ll=-3.00912, gf=0.656, secs=0.0, GB=0.00, MB/s=591.13
17.00%, ll=-3.23159, gf=0.410, secs=0.0, GB=0.00, MB/s=362.80
21.00%, ll=-3.33330, gf=0.492, secs=0.0, GB=0.00, MB/s=432.51
25.00%, ll=-3.66408, gf=0.574, secs=0.0, GB=0.00, MB/s=511.49
28.00%, ll=-4.24531, gf=0.438, secs=0.0, GB=0.00, MB/s=382.96
32.00%, ll=-3.87677, gf=0.492, secs=0.0, GB=0.00, MB/s=430.12
35.00%, ll=-4.03520, gf=0.547, secs=0.0, GB=0.00, MB/s=478.84
39.00%, ll=-3.35823, gf=0.602, secs=0.0, GB=0.00, MB/s=527.00
42.00%, ll=-3.90740, gf=0.492, secs=0.0, GB=0.00, MB/s=422.07
46.00%, ll=-4.10422, gf=0.533, secs=0.0, GB=0.00, MB/s=462.76
50.00%, ll=-3.43480, gf=0.574, secs=0.0, GB=0.00, MB/s=497.98
53.00%, ll=-3.04730, gf=0.615, secs=0.0, GB=0.00, MB/s=544.74
57.00%, ll=-4.51894, gf=0.525, secs=0.0, GB=0.00, MB/s=457.13
60.00%, ll=-4.25245, gf=0.558, secs=0.0, GB=0.00, MB/s=480.51
64.00%, ll=-4.33993, gf=0.591, secs=0.0, GB=0.00, MB/s=513.97
67.00%, ll=-3.18239, gf=0.624, secs=0.0, GB=0.00, MB/s=550.77
71.00%, ll=-2.56534, gf=0.547, secs=0.0, GB=0.00, MB/s=486.99
75.00%, ll=-2.67207, gf=0.574, secs=0.0, GB=0.00, MB/s=509.66
78.00%, ll=-2.40262, gf=0.602, secs=0.0, GB=0.00, MB/s=529.84
82.00%, ll=-3.32243, gf=0.629, secs=0.0, GB=0.00, MB/s=550.73
85.00%, ll=-3.82720, gf=0.563, secs=0.0, GB=0.00, MB/s=493.16
89.00%, ll=-3.93341, gf=0.586, secs=0.0, GB=0.00, MB/s=515.91
92.00%, ll=-3.55948, gf=0.610, secs=0.0, GB=0.00, MB/s=536.09
96.00%, ll=-3.98607, gf=0.633, secs=0.0, GB=0.00, MB/s=557.91
100.00%, ll=-2.86721, gf=0.574, secs=0.0, GB=0.00, MB/s=503.76
Time=0.0080 secs, gflops=0.57
(1.0,1.0)
Out[21]:
   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1   0...
   0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1...

Compute the word-category counts from the data. This turns out to be equivalent to a matrix multiply: for a data matrix A and category matrix C, we want all (cat, word) pairs (i,j) such that C(i,k) and A(j,k) are both 1, which means that document k contains word j and is also tagged with category i. Summing over all documents gives us

$${\rm wordcatCounts(i,j)} = \sum_{k=1}^N C(i,k) A(j,k) = C * A^T$$

Because we are doing independent binary classifiers for each class, we need to construct the counts for words not in the class (negwcounts).

Finally, we add a smoothing count of 0.5 to counts that could otherwise be zero.
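The three steps (counts via $C A^T$, complement counts, smoothing) can be sketched in plain Python. This is a toy illustration with made-up binary data, not BIDMach code:

```python
def matmul_t(C, A):
    """Compute C * A^T: result[i][j] = sum_k C[i][k] * A[j][k]."""
    ncats, nwords, ndocs = len(C), len(A), len(C[0])
    return [[sum(C[i][k] * A[j][k] for k in range(ndocs))
             for j in range(nwords)] for i in range(ncats)]

# Binary doc-word data: A[j][k] = 1 if word j occurs in doc k.
A = [[1, 0, 1], [1, 1, 0]]
# C[i][k] = 1 if doc k is tagged with category i.
C = [[1, 1, 0], [0, 0, 1]]

wordcat = matmul_t(C, A)                               # (cat, word) counts
negC = [[1 - c for c in row] for row in C]             # complement of each cat
negwcounts = matmul_t(negC, A)                         # counts outside the cat
smoothed = [[x + 0.5 for x in row] for row in wordcat] # no zero counts remain
print(wordcat)  # [[1, 2], [1, 0]]
```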


In [4]:
val cx=zeros(ctest.nrows,ctest.ncols)
val (mm,mopts,nn,nopts)=GLM.learner(atrain,ctrain,atest,cx,0)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=0.1
mopts.batchSize=2
mopts.dim=256
mopts.startBlock=0
mopts.npasses=10
mopts.updateAll=false
//mopts.what



<console>:26: error: not found: value ctest
       val cx=zeros(ctest.nrows,ctest.ncols)
                    ^
<console>:26: error: not found: value ctest
       val cx=zeros(ctest.nrows,ctest.ncols)
                                ^
<console>:27: error: not found: value atrain
       val (mm,mopts,nn,nopts)=GLM.learner(atrain,ctrain,atest,cx,0)
                                           ^
<console>:27: error: not found: value ctrain
       val (mm,mopts,nn,nopts)=GLM.learner(atrain,ctrain,atest,cx,0)
                                                  ^
<console>:27: error: not found: value atest
       val (mm,mopts,nn,nopts)=GLM.learner(atrain,ctrain,atest,cx,0)
                                                         ^
<console>:45: error: value autoReset is not a member of Any
val $ires0 = mopts.autoReset
                   ^
<console>:46: error: value useGPU is not a member of Any
val $ires1 = mopts.useGPU
                   ^
<console>:47: error: value lrate is not a member of Any
val $ires2 = mopts.lrate
                   ^
<console>:48: error: value batchSize is not a member of Any
val $ires3 = mopts.batchSize
                   ^
<console>:49: error: value dim is not a member of Any
val $ires4 = mopts.dim
                   ^
<console>:50: error: value startBlock is not a member of Any
val $ires5 = mopts.startBlock
                   ^
<console>:51: error: value npasses is not a member of Any
val $ires6 = mopts.npasses
                   ^
<console>:52: error: value updateAll is not a member of Any
val $ires7 = mopts.updateAll
                   ^
<console>:28: error: value autoReset is not a member of Any
       mopts.autoReset=false
             ^
<console>:29: error: value useGPU is not a member of Any
       mopts.useGPU=false
             ^
<console>:30: error: value lrate is not a member of Any
       mopts.lrate=0.1
             ^
<console>:31: error: value batchSize is not a member of Any
       mopts.batchSize=2
             ^
<console>:32: error: value dim is not a member of Any
       mopts.dim=256
             ^
<console>:33: error: value startBlock is not a member of Any
       mopts.startBlock=0
             ^
<console>:34: error: value npasses is not a member of Any
       mopts.npasses=10
             ^
<console>:35: error: value updateAll is not a member of Any
       mopts.updateAll=false
             ^

Now compute the probabilities

  • pwordcat = probability of each word, given the category.
  • pwordncat = probability of each word, given the complement of the category.
  • pcat = probability that a document is in a given category.
  • spcat = sum of the pcat probabilities (> 1 because documents can belong to multiple categories).
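A quick numeric illustration of why spcat can exceed 1 (toy numbers in plain Python, not BIDMach code):

```python
# Toy setup: 3 documents, two overlapping categories with 2 documents each.
ndocs = 3
docs_per_cat = [2, 2]

pcat = [d / ndocs for d in docs_per_cat]  # Pr(doc is in each category)
spcat = sum(pcat)                         # documents can carry several tags

print(spcat)  # about 1.33, i.e. > 1 because the categories overlap
```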

In [17]:
//mm.train
//nn.predict
var myaaa = loadFMat("/Users/Anna/workspace/BIDMach/data/1_new.txt");
myaaa



<console>:29: error: not found: value loadHMat
              var myaaa = loadHMat("/Users/Anna/workspace/BIDMach/data/1_new.txt");
                          ^

Now take the logs of those probabilities. Here we're using the standard correspondence between Naive Bayes and logistic regression for independent data.

For each word, we compute the log of the ratio of the complementary word probability over the in-class word probability.

For each category, we compute the log of the ratio of the complementary category probability over the current category probability.

lpwordcat(j,i) represents $\log\left(\frac{{\rm Pr}(X_i|\neg c_j)}{{\rm Pr}(X_i|c_j)}\right)$

while lpcat(j) represents $\log\left(\frac{{\rm Pr}(\neg c)}{{\rm Pr}(c)}\right)$
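The two log-ratios above are straightforward to compute once the probabilities are in hand. A minimal Python sketch with made-up probabilities (scalar `lpcat` and a two-word `lpwordcat` for one class):

```python
import math

pcat = 0.4                 # Pr(c): prior that a doc is in the category
pword_in = [0.6, 0.1]      # Pr(X_i | c) for two words
pword_out = [0.3, 0.2]     # Pr(X_i | not c)

lpcat = math.log((1 - pcat) / pcat)
lpwordcat = [math.log(o / i) for o, i in zip(pword_out, pword_in)]

print(lpwordcat[0])  # log(0.5) < 0: word 0 is evidence FOR the category
print(lpwordcat[1])  # log(2.0) > 0: word 1 is evidence AGAINST it
```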


In [22]:
val cx1=cx
min(cx1, 1, cx1)                       // the first "traindata" argument is the input, the other is output
max(cx1, 0, cx1) 
val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
mean(p,2)
//saveFMat(rpath+index+".txt",cx)
cx



java.lang.RuntimeException: dims incompatible
    BIDMat.DenseMat$mcF$sp.ggMatOpStrictv$mcF$sp(DenseMat.scala:963)
    BIDMat.DenseMat$mcF$sp.ggMatOpv$mcF$sp(DenseMat.scala:951)
    BIDMat.FMat.ffMatOpv(FMat.scala:206)
    BIDMat.FMat.$times$at(FMat.scala:799)

Here's where we apply Naive Bayes, using the standard form of the posterior:

$${\rm Pr}(c|X_1,\ldots,X_k) = \frac{1}{1 + \frac{{\rm Pr}(\neg c)}{{\rm Pr}(c)}\prod_{i=1}^k\frac{{\rm Pr}(X_i|\neg c)}{{\rm Pr}(X_i|c)}}$$

and we can rewrite

$$\frac{{\rm Pr}(\neg c)}{{\rm Pr}(c)}\prod_{i=1}^k\frac{{\rm Pr}(X_i|\neg c)}{{\rm Pr}(X_i|c)}$$

as

$$\exp\left(\log\left(\frac{{\rm Pr}(\neg c)}{{\rm Pr}(c)}\right) + \sum_{i=1}^k\log\left(\frac{{\rm Pr}(X_i|\neg c)}{{\rm Pr}(X_i|c)}\right)\right) = \exp({\rm lpcat(j)} + {\rm lpwordcat(j,?)} * X)$$

for class number j and an input column $X$. This follows because an input column $X$ is a sparse vector with ones in the positions of the input features. The product ${\rm lpwordcat(j,?)} * X$ picks out the features occurring in the input document and adds the corresponding logs from lpwordcat.

Finally, we take the exponential above and fold it into the formula $P(c_j|X_1,\ldots,X_k) = 1/(1+\exp(\cdots))$. This gives us a matrix of predictions. preds(i,j) = prediction of membership in category i for test document j.
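The whole predictor fits in a few lines. Here is a plain-Python sketch (a toy illustration with dense vectors and made-up values, not BIDMach's sparse implementation):

```python
import math

def nb_predict(lpcat, lpwordcat, x):
    """Pr(c | X) = 1 / (1 + exp(lpcat + lpwordcat . x)) for binary features x."""
    s = lpcat + sum(lw * xi for lw, xi in zip(lpwordcat, x))
    return 1.0 / (1.0 + math.exp(s))

# Balanced prior and no word evidence: the posterior is exactly 0.5.
print(nb_predict(0.0, [0.0, 0.0], [1, 1]))  # 0.5
# Negative log-ratios (words favouring the class) push the posterior above 0.5.
print(nb_predict(0.0, [-1.0, -1.0], [1, 1]) > 0.5)  # True
```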


In [6]:
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



Out[6]:
0.76865

To measure the accuracy of the predictions above, we can compute the probability that the classifier outputs the right label. We used this formula in class for the expected accuracy for logistic regression. The "dot arrow" operator takes dot product along rows:
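A plain-Python version of that expression for a single category row (the "dot arrow" becomes an ordinary dot product over the row; values are made up):

```python
def expected_accuracy(preds, cats):
    """Row version of (p .-> c + (1-p) .-> (1-c)) / ncols: mean Pr(correct label)."""
    n = len(preds)
    return sum(p * c + (1 - p) * (1 - c) for p, c in zip(preds, cats)) / n

print(expected_accuracy([0.9, 0.2, 0.8], [1, 0, 1]))  # (0.9 + 0.8 + 0.8) / 3
```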


In [7]:
val model = mm.model
val (nn1, nopts1) = GLM.LBFGSpredictor(model, atest, cx)



Out[7]:
BIDMach.models.GLM$LearnLBFGSOptions@8eb62c4

Raw accuracy is not a good measure in most cases. When positives are rare (few instances in the class relative to its complement), maximizing accuracy simply drives down the false-positive rate at the expense of the false-negative rate. In the worst case, the learner may always predict "no" and still achieve high accuracy.

ROC curves and ROC Area Under the Curve (AUC) are much better. Here we compute the ROC curves from the predictions above. We need:

  • scores - the predicted quality from the formula above.
  • good - 1 for positive instances, 0 for negative instances.
  • bad - complement of good.
  • npoints (100) - specifies the number of X-axis points for the ROC plot.

itest specifies which of the categories to plot for. We chose itest=6 because that category has one of the highest positive rates, and gives the most stable accuracy plots.
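The ROC construction itself is simple: sort instances by descending score and accumulate true-positive and false-positive rates. A self-contained Python sketch (our own helper, with made-up scores; BIDMach's `roc` function does this internally):

```python
def roc_points(scores, labels):
    """ROC curve: sort by descending score, step up (TP) or right (FP) per item."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    tpr, fpr, pts = 0.0, 0.0, [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tpr += 1.0 / pos   # a true positive moves the curve up
        else:
            fpr += 1.0 / neg   # a false positive moves it right
        pts.append((fpr, tpr))
    return pts

pts = roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(pts[-1])  # the curve always ends at (1.0, 1.0)
```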


In [8]:
nn1.predict


corpus perplexity=1.970211
Predicting
 4.00%, ll=-0.88782, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
 8.00%, ll=-0.87371, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
12.00%, ll=-0.94405, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.14
16.00%, ll=-0.97745, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
20.00%, ll=-0.91692, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
24.00%, ll=-0.95906, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
28.00%, ll=-0.91178, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.22
32.00%, ll=-0.80047, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
36.00%, ll=-0.92667, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.22
40.00%, ll=-0.91742, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
44.00%, ll=-0.92421, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
48.00%, ll=-0.94550, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.23
51.00%, ll=-0.81677, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.25
56.00%, ll=-0.91331, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
60.00%, ll=-0.93091, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
64.00%, ll=-0.85525, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
68.00%, ll=-0.87134, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
72.00%, ll=-0.94563, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.25
76.00%, ll=-0.78031, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
80.00%, ll=-0.94823, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
83.00%, ll=-0.95096, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
88.00%, ll=-0.86657, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
92.00%, ll=-0.97759, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.28
96.00%, ll=-0.75901, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
100.00%, ll=-0.93326, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
Time=0.0090 secs, gflops=0.00

TODO 1: In the cell below, write an expression to derive the ROC Area under the curve (AUC) given the curve rr. rr gives the ROC curve y-coordinates at 100 evenly-spaced X-values from 0 to 1.0.
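One way to think about this: with y-values sampled at evenly spaced x-positions, the area is approximately a Riemann sum, i.e. the mean of the y-values times the total width (here 1.0). A hedged Python sketch of that idea (our own helper name, shown on the curve y = x whose exact area is 0.5):

```python
def auc_from_samples(rr):
    """Approximate area under a curve from y-values at evenly spaced x in (0, 1]."""
    return sum(rr) / len(rr)

# Sampling y = x at 100 points gives an area close to the exact 0.5.
rr = [(i + 1) / 100 for i in range(100)]
print(auc_from_samples(rr))  # 0.505
```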


In [9]:
val cx1=cx*10
min(cx1, 1, cx1)                       // clamp predictions into [0, 1]; the third argument receives the output
max(cx1, 0, cx1) 
val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
mean(p,2)
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



Out[9]:
0.81119

TODO 2: In the cell below, write the value of AUC returned by the expression above.


In [10]:
cx1



Out[10]:
        1  0.97167  0.32712        1        1        0  0.19741        1...
        0        0        0        0        0        1        0        0...

In [11]:
saveFMat(dict+"moonresult.fmat.txt",cx)



Logistic Regression

Now let's train a logistic classifier on the same data. BIDMach has an umbrella classifier called GLM, for Generalized Linear Model. GLM includes linear regression, logistic regression (with log-accuracy or direct accuracy optimization), and SVMs.

The learner function accepts these arguments:

  • traindata: the training data in the same format as for Naive Bayes
  • traincats: the training category labels
  • testdata: the test input data
  • predcats: a container for the predictions generated by the model
  • modeltype (GLM.logistic here): an integer that specifies the type of model (0=linear, 1=logistic log accuracy, 2=logistic accuracy, 3=SVM).

We'll construct the learner and then look at its options:


In [2]:
val dict = "/Users/Anna/workspace/BIDMach/data/wildlife/"
val aa = loadFMat(dict+"1.txt")
val c = loadFMat(dict+"c1.txt")
val b = loadFMat("/Users/Anna/workspace/BIDMach/data/wltestall.txt")
val d = loadFMat("/Users/Anna/workspace/BIDMach/data/wltestcatall.txt")
//val dict = "/Users/Anna/workspace/BIDMach_1.0.0-full-linux-x86_64/data/uci/"
//val aa = loadFMat(dict+"arabic.fmat.lz4")
//val c = loadFMat(dict+"arabic_cats.fmat.lz4")
val a = aa *10  
val atrain = a //a(?,(100->a.ncols))
val atest =  a //a(?,(0->100))
val ctrain = c //c(?,(100->a.ncols))
val ctest = c //c(?,(0->100))
//max(atrain, 0.001, atrain)                       // the first "traindata" argument is the input, the other is output
//max(atest, 0.001, atest)
atrain



Out[2]:
    -0.33386    -0.33914    -0.29299    -0.40609    -0.43935    -0.55155...
     0.28568     0.18359  -0.0037000    -0.17297   -0.095350   -0.013070...
   -0.098630    -0.13269     0.41184     0.40252     0.24642     0.26601...
   -0.031890   -0.038550     0.38608     0.35432     0.21913     0.31026...
     0.17352     0.25341     0.37047     0.53088     0.40656     0.43656...
     0.45181     0.47072     0.50574     0.36032     0.33288     0.26337...
     0.24997    -0.11709     0.54020     0.33701     0.30424     0.15961...
     0.10230    0.082690    0.017330     0.46488     0.24725   -0.025730...
          ..          ..          ..          ..          ..          ..

The most important options are:

  • lrate: the learning rate
  • batchSize: the minibatch size
  • npasses: the number of passes over the dataset

We'll use the following parameters for this training run.


In [7]:
val cx=zeros(2,ctest.ncols)
val (mm,mopts,nn,nopts)=GLM.SVMlearner(atrain,ctrain,atest,cx)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=1
mopts.batchSize=2
mopts.dim=256
mopts.startBlock=0
mopts.npasses=10
mopts.updateAll=false
mopts.what


Option Name       Type          Value
===========       ====          =====
addConstFeat      boolean       false
autoReset         boolean       false
batchSize         int           2
dim               int           256
doubleScore       boolean       false
epsilon           float         1.0E-5
evalStep          int           11
featThreshold     Mat           null
featType          int           1
hashFeatures      boolean       false
initsumsq         float         1.0E-5
iweight           FMat          null
lim               float         0.0
links             IMat             3
   3

lrate             FMat          1
mask              FMat          null
npasses           int           10
nzPerColumn       int           0
pstep             float         0.01
putBack           int           -1
r1nmats           int           1
r2nmats           int           1
reg1weight        FMat          1.0000e-07
reg2weight        FMat          1
resFile           String        null
rmask             FMat          null
sample            float         1.0
sizeMargin        float         3.0
startBlock        int           0
targets           FMat          null
targmap           FMat          null
texp              FMat          0.50000
updateAll         boolean       false
useCache          boolean       true
useDouble         boolean       false
useGPU            boolean       false
vexp              FMat          0.50000
waitsteps         int           2

In [8]:
mm.train
nn.predict
val cx1=cx
//min(cx1, 1, cx1)                       // the first "traindata" argument is the input, the other is output
//max(cx1, 0, cx1) 
//val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
//mean(p,2)
cx


corpus perplexity=NaN
pass= 0
 6.00%, ll=-1.00000, gf=0.011, secs=0.0, GB=0.00, MB/s=12.00
41.00%, ll=0.00000, gf=0.010, secs=0.1, GB=0.00, MB/s= 5.11
77.00%, ll=0.00000, gf=0.018, secs=0.1, GB=0.00, MB/s= 8.73
100.00%, ll=0.00000, gf=0.023, secs=0.1, GB=0.00, MB/s=10.78
pass= 1
 6.00%, ll=0.00000, gf=0.023, secs=0.1, GB=0.00, MB/s=11.31
41.00%, ll=0.00000, gf=0.026, secs=0.1, GB=0.00, MB/s=12.14
77.00%, ll=0.00000, gf=0.031, secs=0.1, GB=0.00, MB/s=14.35
100.00%, ll=0.00000, gf=0.026, secs=0.1, GB=0.00, MB/s=12.30
pass= 2
 6.00%, ll=0.00000, gf=0.026, secs=0.1, GB=0.00, MB/s=12.49
41.00%, ll=0.00000, gf=0.030, secs=0.1, GB=0.00, MB/s=13.95
77.00%, ll=0.00000, gf=0.029, secs=0.2, GB=0.00, MB/s=13.76
100.00%, ll=0.00000, gf=0.031, secs=0.2, GB=0.00, MB/s=14.59
pass= 3
 6.00%, ll=0.00000, gf=0.031, secs=0.2, GB=0.00, MB/s=14.80
41.00%, ll=0.00000, gf=0.034, secs=0.2, GB=0.00, MB/s=16.00
77.00%, ll=0.00000, gf=0.034, secs=0.2, GB=0.00, MB/s=15.95
100.00%, ll=0.00000, gf=0.035, secs=0.2, GB=0.00, MB/s=16.53
pass= 4
 6.00%, ll=0.00000, gf=0.036, secs=0.2, GB=0.00, MB/s=16.80
41.00%, ll=0.00000, gf=0.038, secs=0.2, GB=0.00, MB/s=17.77
77.00%, ll=-0.21902, gf=0.038, secs=0.2, GB=0.00, MB/s=17.67
100.00%, ll=0.00000, gf=0.039, secs=0.2, GB=0.00, MB/s=18.15
pass= 5
 6.00%, ll=0.00000, gf=0.039, secs=0.2, GB=0.00, MB/s=18.38
41.00%, ll=0.00000, gf=0.041, secs=0.2, GB=0.00, MB/s=19.11
77.00%, ll=-0.14903, gf=0.043, secs=0.2, GB=0.00, MB/s=19.89
100.00%, ll=0.00000, gf=0.040, secs=0.2, GB=0.00, MB/s=18.45
pass= 6
 6.00%, ll=0.00000, gf=0.040, secs=0.2, GB=0.00, MB/s=18.57
41.00%, ll=0.00000, gf=0.041, secs=0.3, GB=0.00, MB/s=19.10
77.00%, ll=-0.10156, gf=0.042, secs=0.3, GB=0.01, MB/s=19.76
100.00%, ll=0.00000, gf=0.043, secs=0.3, GB=0.01, MB/s=20.19
pass= 7
 6.00%, ll=0.00000, gf=0.043, secs=0.3, GB=0.01, MB/s=20.29
41.00%, ll=0.00000, gf=0.042, secs=0.3, GB=0.01, MB/s=19.78
77.00%, ll=0.00000, gf=0.043, secs=0.3, GB=0.01, MB/s=20.22
100.00%, ll=0.00000, gf=0.044, secs=0.3, GB=0.01, MB/s=20.52
pass= 8
 6.00%, ll=0.00000, gf=0.044, secs=0.3, GB=0.01, MB/s=20.62
41.00%, ll=0.00000, gf=0.045, secs=0.3, GB=0.01, MB/s=21.16
77.00%, ll=0.00000, gf=0.045, secs=0.3, GB=0.01, MB/s=20.86
100.00%, ll=0.00000, gf=0.045, secs=0.3, GB=0.01, MB/s=21.19
pass= 9
 6.00%, ll=0.00000, gf=0.046, secs=0.3, GB=0.01, MB/s=21.27
41.00%, ll=0.00000, gf=0.047, secs=0.3, GB=0.01, MB/s=21.76
77.00%, ll=0.00000, gf=0.048, secs=0.3, GB=0.01, MB/s=22.17
100.00%, ll=0.00000, gf=0.048, secs=0.3, GB=0.01, MB/s=22.48
Time=0.3310 secs, gflops=0.05
corpus perplexity=NaN
Predicting
 4.00%, ll=-5.35505, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
 9.00%, ll=-5.56336, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
14.00%, ll=-5.39651, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
19.00%, ll=-3.47791, gf=0.048, secs=0.0, GB=0.00, MB/s=144.00
24.00%, ll=-4.42609, gf=0.060, secs=0.0, GB=0.00, MB/s=180.00
29.00%, ll=-5.00045, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
33.00%, ll=-4.28123, gf=0.042, secs=0.0, GB=0.00, MB/s=126.00
38.00%, ll=-4.85835, gf=0.048, secs=0.0, GB=0.00, MB/s=144.00
43.00%, ll=-4.95342, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
48.00%, ll=-5.43300, gf=0.040, secs=0.0, GB=0.00, MB/s=120.00
53.00%, ll=-4.59199, gf=0.044, secs=0.0, GB=0.00, MB/s=132.00
58.00%, ll=-3.04440, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
62.00%, ll=-4.28866, gf=0.039, secs=0.0, GB=0.00, MB/s=117.00
67.00%, ll=-4.49006, gf=0.042, secs=0.0, GB=0.00, MB/s=126.00
72.00%, ll=-6.24142, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
77.00%, ll=-5.79272, gf=0.039, secs=0.0, GB=0.00, MB/s=115.20
82.00%, ll=-3.15073, gf=0.041, secs=0.0, GB=0.00, MB/s=122.40
87.00%, ll=-4.36058, gf=0.044, secs=0.0, GB=0.00, MB/s=129.60
91.00%, ll=-7.38732, gf=0.038, secs=0.0, GB=0.00, MB/s=114.00
96.00%, ll=-10.42975, gf=0.040, secs=0.0, GB=0.00, MB/s=120.00
100.00%, ll=-6.19474, gf=0.042, secs=0.0, GB=0.00, MB/s=124.00
Time=0.0070 secs, gflops=0.04
Out[8]:
  -7.7981  -10.713  -10.620  -11.079  -10.499  -8.8026  -12.241  -12.910...
   7.7981   10.713   10.620   11.079   10.499   8.8026   12.241   12.910...

Since we have the accuracy scores for both Naive Bayes and Logistic regression, we can plot both of them on the same axes. Naive Bayes is red, Logistic regression is blue. The x-axis is the category number from 0 to 102. The y-axis is the absolute accuracy of the predictor for that category.


In [7]:
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



java.lang.NoClassDefFoundError: Could not initialize class 

TODO 3: With the full training set (700k training documents), Logistic Regression is noticeably more accurate than Naive Bayes in every category. What do you observe in the plot above? Why do you think this is?

Next we'll compute the ROC plot and ROC area (AUC) for Logistic regression for category itest.


In [40]:
saveFMat(dict+"wildliferesults.fmat.txt",cx)



We computed the ROC curve for Naive Bayes earlier, so now we can plot them on the same axes. Naive Bayes is once again in red, Logistic regression in blue.


In [43]:
var my = loadFMat("/Users/Anna/workspace/BIDMach/data/UCF1.txt");
//var mya = loadFMat("/Users/Anna/workspace/C3D/examples/c3d_feature_extraction/UCF11/1.txt");
my



java.lang.ArrayIndexOutOfBoundsException: 

TODO 4: In the cell below, compute and plot lift curves from the ROC curves for Naive Bayes and Logistic regression. The lift curves should show the ratio of ROC y-values over a unit slope diagonal line (Y=X). The X-values should be the same as for the ROC plots, except that X=0 will be omitted since the lift will be undefined.
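The lift computation described above is a pointwise ratio. A small Python sketch of the idea (our own helper with made-up curve points, skipping x = 0 where the ratio is undefined):

```python
def lift_curve(xs, ys):
    """Lift = ROC y-value divided by the diagonal y = x; x = 0 is skipped."""
    return [(x, y / x) for x, y in zip(xs, ys) if x > 0]

xs = [0.0, 0.25, 0.5, 1.0]
ys = [0.0, 0.5, 0.8, 1.0]
print(lift_curve(xs, ys))  # [(0.25, 2.0), (0.5, 1.6), (1.0, 1.0)]
```

A curve that hugs the top-left corner yields lift well above 1 at small x; a random classifier has lift 1 everywhere.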


In [25]:
val dict = "/Users/Anna/workspace/BIDMach_1.0.0-full-linux-x86_64/data/"
//val a = loadFMat(dict+"arabic.fmat.lz4")
//val c = loadFMat(dict+"arabic_cats.fmat.lz4")
val aa = loadFMat(dict+"a.txt")
val c = loadFMat(dict+"alabel.txt")
val a = aa + 0.5 
val atrain =a(?,(100->a.ncols))
val atest =a(?,(0->100))
val ctrain =c(?,(100->a.ncols))
val ctest =c(?,(0->100))
max(atrain, 0.001, atrain)                       // the first "traindata" argument is the input, the other is output
max(atest, 0.001, atest)
atest



Out[25]:
   0.075172    0.42627    0.64251    0.36696    0.19400    0.58963...
    0.64074    0.42242    0.41518    0.87276    0.77227  0.0010000...

TODO 5: Experiment with different values for learning rate and batchSize to get the best performance for absolute accuracy and ROC area on category 6. Write your optimal values below:


In [41]:
val cx=zeros(ctest.nrows,ctest.ncols)
val (mm,mopts,nn,nopts)=GLM.SVMlearner(atrain,ctrain,atest,cx)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=0.1
mopts.batchSize=2
mopts.dim=256
mopts.startBlock=0
mopts.npasses=10
mopts.updateAll=false
mopts.what


Option Name       Type          Value
===========       ====          =====
addConstFeat      boolean       false
autoReset         boolean       false
batchSize         int           2
dim               int           256
doubleScore       boolean       false
epsilon           float         1.0E-5
evalStep          int           11
featThreshold     Mat           null
featType          int           1
hashFeatures      boolean       false
initsumsq         float         1.0E-5
iweight           FMat          null
lim               float         0.0
links             IMat             3
   3

lrate             FMat          0.10000
mask              FMat          null
npasses           int           10
nzPerColumn       int           0
pstep             float         0.01
putBack           int           -1
r1nmats           int           1
r2nmats           int           1
reg1weight        FMat          1.0000e-07
reg2weight        FMat          1
resFile           String        null
rmask             FMat          null
sample            float         1.0
sizeMargin        float         3.0
startBlock        int           0
targets           FMat          null
targmap           FMat          null
texp              FMat          0.50000
updateAll         boolean       false
useCache          boolean       true
useDouble         boolean       false
useGPU            boolean       false
vexp              FMat          0.50000
waitsteps         int           2

In [42]:
mm.train
nn.predict


corpus perplexity=992.808970
pass= 0
 6.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
41.00%, ll=-0.28270, gf=0.079, secs=0.0, GB=0.00, MB/s=39.00
77.00%, ll=-0.03527, gf=0.081, secs=0.0, GB=0.00, MB/s=38.40
100.00%, ll=-0.20491, gf=0.083, secs=0.0, GB=0.00, MB/s=39.16
pass= 1
 6.00%, ll=-1.20125, gf=0.078, secs=0.0, GB=0.00, MB/s=37.71
41.00%, ll=-0.30951, gf=0.083, secs=0.0, GB=0.00, MB/s=39.11
77.00%, ll=-0.24529, gf=0.085, secs=0.0, GB=0.00, MB/s=40.00
100.00%, ll=-0.13970, gf=0.086, secs=0.0, GB=0.00, MB/s=40.22
pass= 2
 6.00%, ll=-1.47287, gf=0.085, secs=0.0, GB=0.00, MB/s=40.42
41.00%, ll=-0.22679, gf=0.087, secs=0.0, GB=0.00, MB/s=40.91
77.00%, ll=-0.39799, gf=0.090, secs=0.0, GB=0.00, MB/s=42.12
100.00%, ll=-0.17308, gf=0.090, secs=0.1, GB=0.00, MB/s=42.11
pass= 3
 6.00%, ll=-1.47907, gf=0.090, secs=0.1, GB=0.00, MB/s=42.22
41.00%, ll=-0.30906, gf=0.092, secs=0.1, GB=0.00, MB/s=43.12
77.00%, ll=-0.49180, gf=0.093, secs=0.1, GB=0.00, MB/s=43.20
100.00%, ll=-0.23331, gf=0.094, secs=0.1, GB=0.00, MB/s=43.76
pass= 4
 6.00%, ll=-1.46579, gf=0.093, secs=0.1, GB=0.00, MB/s=43.83
41.00%, ll=-0.37264, gf=0.094, secs=0.1, GB=0.00, MB/s=43.84
77.00%, ll=-0.55793, gf=0.094, secs=0.1, GB=0.00, MB/s=43.85
100.00%, ll=-0.28842, gf=0.095, secs=0.1, GB=0.00, MB/s=44.29
pass= 5
 6.00%, ll=-1.44678, gf=0.095, secs=0.1, GB=0.00, MB/s=44.33
41.00%, ll=-0.42261, gf=0.095, secs=0.1, GB=0.00, MB/s=44.31
77.00%, ll=-0.60707, gf=0.095, secs=0.1, GB=0.00, MB/s=44.29
100.00%, ll=-0.33395, gf=0.096, secs=0.1, GB=0.00, MB/s=44.64
pass= 6
 6.00%, ll=-1.42705, gf=0.095, secs=0.1, GB=0.00, MB/s=44.67
41.00%, ll=-0.46288, gf=0.097, secs=0.1, GB=0.00, MB/s=45.06
77.00%, ll=-0.64504, gf=0.097, secs=0.1, GB=0.01, MB/s=45.40
100.00%, ll=-0.37222, gf=0.098, secs=0.1, GB=0.01, MB/s=45.68
pass= 7
 6.00%, ll=-1.40822, gf=0.098, secs=0.1, GB=0.01, MB/s=45.70
41.00%, ll=-0.49608, gf=0.099, secs=0.1, GB=0.01, MB/s=46.39
77.00%, ll=-0.67528, gf=0.099, secs=0.1, GB=0.01, MB/s=46.27
100.00%, ll=-0.40490, gf=0.100, secs=0.1, GB=0.01, MB/s=46.50
pass= 8
 6.00%, ll=-1.39079, gf=0.100, secs=0.1, GB=0.01, MB/s=46.51
41.00%, ll=-0.52395, gf=0.100, secs=0.1, GB=0.01, MB/s=46.75
77.00%, ll=-0.69995, gf=0.101, secs=0.1, GB=0.01, MB/s=46.96
100.00%, ll=-0.43319, gf=0.101, secs=0.1, GB=0.01, MB/s=47.15
pass= 9
 6.00%, ll=-1.37481, gf=0.101, secs=0.1, GB=0.01, MB/s=47.16
41.00%, ll=-0.54772, gf=0.102, secs=0.1, GB=0.01, MB/s=47.67
77.00%, ll=-0.72047, gf=0.103, secs=0.2, GB=0.01, MB/s=47.84
100.00%, ll=-0.45795, gf=0.103, secs=0.2, GB=0.01, MB/s=48.00
Time=0.1550 secs, gflops=0.10
corpus perplexity=985.317557
Predicting
 4.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
 9.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
14.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
19.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
24.00%, ll=-1.00000, gf=0.060, secs=0.0, GB=0.00, MB/s=180.00
29.00%, ll=-1.00000, gf=0.073, secs=0.0, GB=0.00, MB/s=216.00
33.00%, ll=-1.00000, gf=0.085, secs=0.0, GB=0.00, MB/s=252.00
38.00%, ll=-1.00000, gf=0.097, secs=0.0, GB=0.00, MB/s=288.00
43.00%, ll=-1.00000, gf=0.109, secs=0.0, GB=0.00, MB/s=324.00
48.00%, ll=-1.00000, gf=0.121, secs=0.0, GB=0.00, MB/s=360.00
53.00%, ll=-1.00000, gf=0.066, secs=0.0, GB=0.00, MB/s=198.00
58.00%, ll=-1.00000, gf=0.073, secs=0.0, GB=0.00, MB/s=216.00
62.00%, ll=-1.00000, gf=0.079, secs=0.0, GB=0.00, MB/s=234.00
67.00%, ll=-1.00000, gf=0.085, secs=0.0, GB=0.00, MB/s=251.99
72.00%, ll=-1.00000, gf=0.091, secs=0.0, GB=0.00, MB/s=269.99
77.00%, ll=-1.00000, gf=0.097, secs=0.0, GB=0.00, MB/s=287.99
82.00%, ll=-1.00000, gf=0.069, secs=0.0, GB=0.00, MB/s=204.00
87.00%, ll=-1.00000, gf=0.073, secs=0.0, GB=0.00, MB/s=216.00
91.00%, ll=-1.00000, gf=0.077, secs=0.0, GB=0.00, MB/s=228.00
96.00%, ll=-1.00000, gf=0.081, secs=0.0, GB=0.00, MB/s=240.00
100.00%, ll=-1.00000, gf=0.083, secs=0.0, GB=0.00, MB/s=248.00
Time=0.0030 secs, gflops=0.08

In [ ]:
val cx1 = cx*5
min(cx1, 1, cx1)                       // in-place clamp: the first argument is the input, the last is the output
max(cx1, 0, cx1)                       // clamp the other side, so predictions lie in [0,1]
val p = ctest *@ cx1 + (1-ctest) *@ (1-cx1)   // element-wise likelihood of the true labels
mean(p,2)
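The cell above scales the raw scores, clamps them into [0,1], and scores each prediction by the likelihood it assigns to the true binary label. A minimal plain-Scala sketch of that computation (no BIDMach dependency; the object name `LikelihoodSketch` and the toy arrays are made up for illustration):

```scala
// Sketch of the clamp-and-likelihood step: scale raw scores, clamp to [0,1],
// then score each prediction by the likelihood of the true 0/1 label.
object LikelihoodSketch {
  def clamp01(x: Double): Double = math.min(1.0, math.max(0.0, x))

  // p(i) = label(i)*pred(i) + (1 - label(i))*(1 - pred(i))
  def likelihood(pred: Array[Double], label: Array[Double]): Array[Double] =
    pred.zip(label).map { case (x, y) =>
      val c = clamp01(x)
      y * c + (1.0 - y) * (1.0 - c)
    }

  def main(args: Array[String]): Unit = {
    val scores = Array(0.1, 0.6, 0.02).map(_ * 5)  // scale by 5, as in the cell
    val labels = Array(1.0, 1.0, 0.0)
    println(likelihood(scores, labels).mkString(", "))
  }
}
```

A confidently correct prediction contributes a likelihood near 1; a confident wrong one contributes near 0, so the mean over documents tracks how well-calibrated the scaled scores are.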

In [6]:
cx1



Out[6]:
         1   0.48584   0.16356         1         1         0  0.098707...
         0         0         0         0         0   0.82548         0...

In [7]:
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)
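The `∙→` operator above is BIDMat's row-wise dot product, so `lacc` holds one accuracy-like score per category, averaged over the test documents. A plain-Scala sketch of the same per-row computation (the object name `AccuracySketch` and the sample vectors are illustrative, not from the notebook):

```scala
// Sketch of the per-category accuracy: for one category (row), measure the
// agreement between clamped predictions and 0/1 targets, averaged over docs.
object AccuracySketch {
  // acc = (pred . target + (1-pred) . (1-target)) / n  -- the row-wise form above
  def rowAccuracy(pred: Array[Double], target: Array[Double]): Double = {
    require(pred.length == target.length, "row lengths must match")
    val s = pred.indices
      .map(i => pred(i) * target(i) + (1 - pred(i)) * (1 - target(i)))
      .sum
    s / pred.length
  }

  def main(args: Array[String]): Unit = {
    val pred   = Array(1.0, 0.0, 0.7, 0.2)
    val target = Array(1.0, 0.0, 1.0, 0.0)
    println(rowAccuracy(pred, target))
  }
}
```

With hard 0/1 predictions this reduces to the fraction of documents labeled correctly for that category; with soft predictions it gives partial credit.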



Out[7]:
0.76865

In [8]:
val model = mm.model
val (nn1, nopts1) = GLM.LBFGSpredictor(model, atest, cx)



Out[8]:
BIDMach.models.GLM$LearnLBFGSOptions@3fde0ffe

In [9]:
nn1.predict


corpus perplexity=1.970211
Predicting
 4.00%, ll=-0.88782, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.10
 8.00%, ll=-0.87371, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
12.00%, ll=-0.94405, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
16.00%, ll=-0.97745, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.38
20.00%, ll=-0.91692, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
24.00%, ll=-0.95906, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
28.00%, ll=-0.91178, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.34
32.00%, ll=-0.80047, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.38
36.00%, ll=-0.92667, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.43
40.00%, ll=-0.91742, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.48
44.00%, ll=-0.92421, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.53
48.00%, ll=-0.94550, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.38
51.00%, ll=-0.81677, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.42
56.00%, ll=-0.91331, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.45
60.00%, ll=-0.93091, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.48
64.00%, ll=-0.85525, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.51
68.00%, ll=-0.87134, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.54
72.00%, ll=-0.94563, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.58
76.00%, ll=-0.78031, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.46
80.00%, ll=-0.94823, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.48
83.00%, ll=-0.95096, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.50
88.00%, ll=-0.86657, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.53
92.00%, ll=-0.97759, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.55
96.00%, ll=-0.75901, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.58
100.00%, ll=-0.93326, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.60
Time=0.0050 secs, gflops=0.00

In [10]:
val cx1 = cx*10
min(cx1, 1, cx1)                       // in-place clamp: the first argument is the input, the last is the output
max(cx1, 0, cx1)                       // clamp the other side, so predictions lie in [0,1]
val p = ctest *@ cx1 + (1-ctest) *@ (1-cx1)
mean(p,2)
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



Out[10]:
0.81119

In [11]:
cx1



Out[11]:
        1  0.97167  0.32712        1        1        0  0.19741        1...
        0        0        0        0        0        1        0        0...

In [23]:
saveFMat(dict+"moonresult.fmat.txt",cx)



