COCOA Method based on LBFGS Optimization

In this tutorial, we'll explore training and evaluating logistic regression classifiers based on the L-BFGS method.

To start, we import the standard BIDMach class definitions.


In [1]:
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,FMat,GMat,GIMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,LDA,LDAgibbs,NMF,SFA}
import BIDMach.datasources.{MatDS,FilesDS,SFilesDS}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}

Mat.checkMKL
Mat.checkCUDA
if (Mat.hasCUDA > 0) GPUmem


Cant find native HDF5 library
Couldnt load JCuda
Out[1]:
()

Now we load some training and test data, and some category labels. The data come from a news collection from Reuters and are a "classic" test set for classification. Each article belongs to one or more of 103 categories. The articles are represented as Bag-of-Words (BoW) column vectors: for a data matrix A, element A(i,j) holds the count of word i in document j.

The category matrices have 103 rows, and a category matrix C has a one in position C(i,j) if document j is tagged with category i, or zero otherwise.

To reduce the computing time and memory footprint, the training data have been sampled. The full collection has about 700k documents. Our training set has 60k.

Since the document matrices contain word counts, we use a min function to clip each count to 1, because naive Bayes needs binary features.
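As a minimal sketch of this clipping step, the plain-Scala function below caps every entry of a dense words × documents count matrix at 1, which is what the in-place `min(A, 1, A)` form used elsewhere in this notebook does. The object name `Binarize` and the `Array[Array[Float]]` layout are illustrative assumptions, not BIDMach code.

```scala
object Binarize {
  // counts(i)(j) = number of times word i occurs in document j.
  // Returns a new matrix whose entries are min(count, 1), i.e. word presence.
  def binarize(counts: Array[Array[Float]]): Array[Array[Float]] =
    counts.map(row => row.map(c => math.min(c, 1f)))
}
```

For example, `Binarize.binarize(Array(Array(3f, 0f), Array(1f, 2f)))` yields `Array(Array(1f, 0f), Array(1f, 1f))`: any nonzero count becomes 1 and zeros stay 0.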


In [15]:
//val dict = "/Users/Anna/workspace/BIDMach/data/"
//val rpath = "/Users/Anna/workspace/BIDMach/data/"


val dict = "/mnt/data/FCVID/FCVID_Feature/"
val rpath = "/mnt/data/FCVID/FCVID_Results/"

var tnum = 1;
var tnum1 = 1;
for(index <- 1 to 10){
 var aa = loadFMat(dict+index+".txt")
// var c = loadFMat(dict+index+"_test"+".txt")
 var b = loadFMat(dict+"all"+index+".txt")

//var a = aa.t   
var atrain = aa(?,(239->aa.ncols))
//var atest = c(?,(239->c.ncols))
var atrain1 = b(?,(239->b.ncols))


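// Global min and max over both sample sets, used to rescale features to [0,1].
var maxx1 = maxi(maxi(atrain,1),2);   // 1x1 matrix holding the max of atrain
var minx1 = mini(mini(atrain,1),2);
var maxx2 = maxi(maxi(atrain1,1),2);
var minx2 = mini(mini(atrain1,1),2);
// Take plain Float scalars from the 1x1 results; an Int such as 0 cannot serve
// as the output argument of max/min, so compute the combined range directly.
val maxx = math.max(maxx1(0), maxx2(0));
val minx = math.min(minx1(0), minx2(0));
atrain = (atrain-minx)/(maxx-minx);
atrain1 = (atrain1-minx)/(maxx-minx);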

tnum = (atrain.nrows+1)/2
var ttrain = atrain(0->tnum,(239->atrain.ncols) )
var ttest = atrain(tnum->atrain.nrows,(239->atrain.ncols) )
tnum1 = (atrain1.nrows+1)/2
var ttrain1 = atrain1(0->tnum1,(239->atrain1.ncols) )
var ttest1 = atrain1(tnum1->atrain1.nrows,(239->atrain1.ncols) )

var trainx = zeros( (ttrain.nrows+ttrain1.nrows),4096)
trainx(0->(ttrain.nrows),?)=ttrain
trainx( (ttrain.nrows)->(ttrain.nrows+ttrain1.nrows),?)=ttrain1

var testx = zeros( (ttest.nrows+ttest1.nrows),4096)
testx(0->(ttest.nrows),?)=ttest
testx( (ttest.nrows)->(ttest.nrows+ttest1.nrows),?)=ttest1

var ctrain1=zeros(2,(ttrain.nrows+ttrain1.nrows) ) 
ctrain1(0,(0->(ttrain.nrows) ) )= 1
ctrain1(1,(ttrain.nrows)->(ttrain.nrows+ttrain1.nrows)) = 1
ctrain1(1,(0->(ttrain.nrows) ) )= -1
ctrain1(0,(ttrain.nrows)->(ttrain.nrows+ttrain1.nrows)) = -1

var ctest=zeros(2,(ttest.nrows+ttest1.nrows) ) 
ctest(0,(0->(ttest.nrows) ) )= 1
ctest(1,(ttest.nrows)->(ttest.nrows+ttest1.nrows)) = 1

//max(atrain, 0.001, atrain)                       // the first "traindata" argument is the input, the other is out
tnum=testx.nrows
val cx=zeros(2,testx.nrows)
val (mm,mopts,nn,nopts)=GLM.COCOAlearner(trainx.t,ctrain1,testx.t,cx)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=0.1
mopts.batchSize=30
mopts.dim=256
mopts.startBlock=0
mopts.npasses=10
mopts.updateAll=false
mm.train;
nn.predict;

saveFMat(rpath+"r"+index+".txt",cx/tnum);
//}
 
min(cx, 1, cx)                       // clip in place: min(a, b, out) writes its result into out
max(cx, 0, cx)                       // floor at 0, so cx lies in [0,1]
val p=ctest *@cx +(1-ctest) *@(1-cx)
var meanp=mean(p,2)
saveFMat(rpath+"meanp"+index+".txt",meanp);
}


corpus perplexity=2590.533408
pass= 0
 7.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
47.00%, ll=0.00000, gf=0.185, secs=0.0, GB=0.00, MB/s=75.99
87.00%, ll=0.00000, gf=0.186, secs=0.0, GB=0.00, MB/s=80.30
100.00%, ll=0.00000, gf=0.184, secs=0.0, GB=0.00, MB/s=80.23
pass= 1
 7.00%, ll=0.00000, gf=0.186, secs=0.0, GB=0.00, MB/s=82.50
47.00%, ll=0.00000, gf=0.186, secs=0.0, GB=0.00, MB/s=78.92
87.00%, ll=0.00000, gf=0.183, secs=0.1, GB=0.00, MB/s=78.80
100.00%, ll=0.00000, gf=0.185, secs=0.1, GB=0.00, MB/s=80.23
pass= 2
 7.00%, ll=0.00000, gf=0.183, secs=0.1, GB=0.00, MB/s=80.03
47.00%, ll=0.00000, gf=0.184, secs=0.1, GB=0.01, MB/s=78.35
87.00%, ll=0.00000, gf=0.184, secs=0.1, GB=0.01, MB/s=79.29
100.00%, ll=0.00000, gf=0.186, secs=0.1, GB=0.01, MB/s=80.23
pass= 3
 7.00%, ll=0.00000, gf=0.185, secs=0.1, GB=0.01, MB/s=80.09
47.00%, ll=0.00000, gf=0.185, secs=0.1, GB=0.01, MB/s=78.89
87.00%, ll=0.00000, gf=0.185, secs=0.1, GB=0.01, MB/s=79.53
100.00%, ll=0.00000, gf=0.185, secs=0.1, GB=0.01, MB/s=79.54
pass= 4
 7.00%, ll=0.00000, gf=0.184, secs=0.1, GB=0.01, MB/s=79.45
47.00%, ll=0.00000, gf=0.184, secs=0.1, GB=0.01, MB/s=78.58
87.00%, ll=-1.30684, gf=0.184, secs=0.1, GB=0.01, MB/s=79.12
100.00%, ll=0.00000, gf=0.184, secs=0.1, GB=0.01, MB/s=79.14
pass= 5
 7.00%, ll=0.00000, gf=0.183, secs=0.1, GB=0.01, MB/s=79.07
47.00%, ll=0.00000, gf=0.183, secs=0.2, GB=0.01, MB/s=78.39
87.00%, ll=-1.30685, gf=0.184, secs=0.2, GB=0.01, MB/s=78.85
100.00%, ll=0.00000, gf=0.184, secs=0.2, GB=0.01, MB/s=79.32
pass= 6
 7.00%, ll=0.00000, gf=0.184, secs=0.2, GB=0.01, MB/s=79.26
47.00%, ll=0.00000, gf=0.184, secs=0.2, GB=0.01, MB/s=78.67
87.00%, ll=-1.30685, gf=0.184, secs=0.2, GB=0.02, MB/s=79.04
100.00%, ll=0.00000, gf=0.184, secs=0.2, GB=0.02, MB/s=79.06
pass= 7
 7.00%, ll=0.00000, gf=0.183, secs=0.2, GB=0.02, MB/s=79.01
47.00%, ll=0.00000, gf=0.184, secs=0.2, GB=0.02, MB/s=78.52
87.00%, ll=-1.30686, gf=0.183, secs=0.2, GB=0.02, MB/s=78.51
100.00%, ll=0.00000, gf=0.183, secs=0.2, GB=0.02, MB/s=78.53
pass= 8
 7.00%, ll=0.00000, gf=0.183, secs=0.2, GB=0.02, MB/s=78.83
47.00%, ll=0.00000, gf=0.183, secs=0.3, GB=0.02, MB/s=78.40
87.00%, ll=-1.30687, gf=0.183, secs=0.3, GB=0.02, MB/s=78.70
100.00%, ll=0.00000, gf=0.183, secs=0.3, GB=0.02, MB/s=78.72
pass= 9
 7.00%, ll=0.00000, gf=0.183, secs=0.3, GB=0.02, MB/s=78.69
47.00%, ll=0.00000, gf=0.183, secs=0.3, GB=0.02, MB/s=78.31
87.00%, ll=-1.30687, gf=0.183, secs=0.3, GB=0.02, MB/s=78.58
100.00%, ll=0.00000, gf=0.183, secs=0.3, GB=0.02, MB/s=78.60
Time=0.2960 secs, gflops=0.18
corpus perplexity=2885.136257
Predicting
 3.00%, ll=-34.31177, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
 7.00%, ll=-12.09627, gf=0.066, secs=0.0, GB=0.00, MB/s=142.00
11.00%, ll=-25.67790, gf=0.098, secs=0.0, GB=0.00, MB/s=202.78
14.00%, ll=-24.67247, gf=0.131, secs=0.0, GB=0.00, MB/s=273.26
18.00%, ll=-11.22077, gf=0.164, secs=0.0, GB=0.00, MB/s=336.98
22.00%, ll=-23.32488, gf=0.197, secs=0.0, GB=0.00, MB/s=401.00
25.00%, ll=-18.10493, gf=0.230, secs=0.0, GB=0.00, MB/s=488.93
29.00%, ll=-24.56264, gf=0.131, secs=0.0, GB=0.00, MB/s=280.49
33.00%, ll=-26.04318, gf=0.148, secs=0.0, GB=0.00, MB/s=313.34
37.00%, ll=-28.85150, gf=0.164, secs=0.0, GB=0.00, MB/s=350.53
40.00%, ll=-18.17740, gf=0.181, secs=0.0, GB=0.00, MB/s=393.37
44.00%, ll=-22.09091, gf=0.197, secs=0.0, GB=0.00, MB/s=428.68
48.00%, ll=-25.84660, gf=0.213, secs=0.0, GB=0.00, MB/s=455.83
51.00%, ll=-14.79853, gf=0.153, secs=0.0, GB=0.00, MB/s=330.42
55.00%, ll=-9.53028, gf=0.164, secs=0.0, GB=0.00, MB/s=362.90
59.00%, ll=-18.21640, gf=0.175, secs=0.0, GB=0.00, MB/s=387.96
62.00%, ll=-17.24561, gf=0.186, secs=0.0, GB=0.00, MB/s=417.92
66.00%, ll=-25.66841, gf=0.197, secs=0.0, GB=0.00, MB/s=446.41
70.00%, ll=-8.86158, gf=0.156, secs=0.0, GB=0.00, MB/s=357.61
74.00%, ll=-17.01925, gf=0.164, secs=0.0, GB=0.00, MB/s=380.54
77.00%, ll=-1.99512, gf=0.172, secs=0.0, GB=0.00, MB/s=396.58
81.00%, ll=-12.84921, gf=0.144, secs=0.0, GB=0.00, MB/s=336.24
85.00%, ll=-23.18103, gf=0.151, secs=0.0, GB=0.00, MB/s=354.82
88.00%, ll=-20.92567, gf=0.158, secs=0.0, GB=0.00, MB/s=374.25
92.00%, ll=-14.64597, gf=0.164, secs=0.0, GB=0.00, MB/s=393.61
96.00%, ll=-9.56748, gf=0.171, secs=0.0, GB=0.00, MB/s=412.80
100.00%, ll=-17.66913, gf=0.148, secs=0.0, GB=0.00, MB/s=358.68
Time=0.0060 secs, gflops=0.15
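The rescaling step in the cell above maps both sample sets (`atrain` and `atrain1`, rows = samples) into [0,1] using one shared minimum and maximum. The plain-Scala sketch below shows that idea on dense arrays; the object and method names are hypothetical, and the guard against a constant input is an addition that the cell itself does not have.

```scala
object MinMaxScale {
  // Rescale two sample sets (rows = samples) into [0, 1] with a single
  // min and max computed over both, so they share one coordinate system.
  def scaleTogether(a: Array[Array[Float]], b: Array[Array[Float]]): (Array[Array[Float]], Array[Array[Float]]) = {
    val all = (a ++ b).flatten
    val lo = all.reduce((x, y) => math.min(x, y))
    val hi = all.reduce((x, y) => math.max(x, y))
    val range = if (hi > lo) hi - lo else 1f   // avoid dividing by zero on constant data
    def scale(m: Array[Array[Float]]): Array[Array[Float]] = m.map(_.map(x => (x - lo) / range))
    (scale(a), scale(b))
  }
}
```

For instance, `MinMaxScale.scaleTogether(Array(Array(0f, 2f)), Array(Array(4f)))` returns `(Array(Array(0f, 0.5f)), Array(Array(1f)))`. Sharing the range keeps the two sets comparable; scaling each set by its own range would erase any offset between them.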

Next, get the (category, word) counts from the data. This turns out to be equivalent to a matrix multiply. For a data matrix A and category matrix C, we want all (cat, word) pairs (i,j) such that C(i,k) and A(j,k) are both 1, meaning that document k contains word j and is also tagged with category i. Summing over all documents gives us

$${\rm wordcatCounts(i,j)} = \sum_{k=1}^N C(i,k) A(j,k) = C * A^T$$

Because we train an independent binary classifier for each class, we also need the counts for words in documents outside the class (negwcounts).

Finally, we add a smoothing count 0.5 to counts that could be zero.
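The counting step can be sketched in plain Scala with dense 0/1 matrices: `wordcatCounts` computes C * A^T entry by entry, `negwcounts` subtracts it from the number of documents containing each word, and `smooth` adds the 0.5 smoothing count (here to every entry). The object and method names are hypothetical; in BIDMat the product itself would be a single matrix multiply.

```scala
object WordCatCounts {
  type M = Array[Array[Double]]

  // wordcat(i)(j) = sum_k C(i)(k) * A(j)(k), i.e. C * A^T
  // c: categories x documents (0/1 tags), a: words x documents (0/1 features)
  def wordcatCounts(c: M, a: M): M =
    c.map(ci => a.map(aj => ci.indices.map(k => ci(k) * aj(k)).sum))

  // Documents containing word j but NOT tagged with category i:
  // (documents containing word j) minus wordcat(i)(j).
  def negwcounts(c: M, a: M): M = {
    val total = a.map(_.sum)   // number of documents containing each word
    wordcatCounts(c, a).map(row => row.indices.map(j => total(j) - row(j)).toArray)
  }

  // Add a smoothing count so that no estimate is exactly zero.
  def smooth(m: M, s: Double = 0.5): M = m.map(_.map(_ + s))
}
```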


In [45]:
val cx=zeros(ctest.nrows,ctest.ncols)
val (mm,mopts,nn,nopts)=GLM.learner(atrain,ctrain,atest,cx,0)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=0.1
mopts.batchSize=2
mopts.dim=256
mopts.startBlock=0
mopts.npasses=10
mopts.updateAll=false
//mopts.what



Out[45]:
false

Now compute the probabilities:

  • pwordcat = probability of a word given the category.
  • pwordncat = probability of a word given the complement of the category.
  • pcat = probability that a document is in a given category.
  • spcat = sum of the pcat probabilities (> 1 because documents can be in multiple categories).
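A plain-Scala sketch of these four quantities for binary (Bernoulli) features follows. The smoothing scheme, adding 0.5 to each of the two outcomes so the denominator grows by 1, is an assumption for illustration; the notebook does not show its exact normalization, and the object name is hypothetical.

```scala
object NBProbs {
  // wordcat(i)(j): training docs in category i that contain word j
  // negw(i)(j):    training docs NOT in category i that contain word j
  // ncat(i):       training docs tagged with category i; ndocs: all training docs

  // Pr(word j present | category i), smoothed with 0.5 per outcome (assumption)
  def pwordcat(wordcat: Array[Array[Double]], ncat: Array[Double]): Array[Array[Double]] =
    wordcat.indices.map(i => wordcat(i).map(n => (n + 0.5) / (ncat(i) + 1.0))).toArray

  // Pr(word j present | not category i), same smoothing
  def pwordncat(negw: Array[Array[Double]], ncat: Array[Double], ndocs: Double): Array[Array[Double]] =
    negw.indices.map(i => negw(i).map(n => (n + 0.5) / (ndocs - ncat(i) + 1.0))).toArray

  // Pr(category i): fraction of documents tagged with it
  def pcat(ncat: Array[Double], ndocs: Double): Array[Double] = ncat.map(_ / ndocs)

  // Sum of the category priors; exceeds 1 when documents carry several categories
  def spcat(ncat: Array[Double], ndocs: Double): Double = pcat(ncat, ndocs).sum
}
```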

In [11]:
//mm.train
//nn.predict
atrain.ncols



java.lang.NoClassDefFoundError: Could not initialize class 

Now take the logs of those probabilities. Here we're using the formula presented here to match Naive Bayes to Logistic Regression for independent data.

For each word, we compute the log of the ratio of the complementary word probability over the in-class word probability.

For each category, we compute the log of the ratio of the complementary category probability over the current category probability.

lpwordcat(j,i) represents $\log\left(\frac{{\rm Pr}(X_i|\neg c_j)}{{\rm Pr}(X_i|c_j)}\right)$

while lpcat(j) represents $\log\left(\frac{{\rm Pr}(\neg c_j)}{{\rm Pr}(c_j)}\right)$
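Both log ratios are elementwise, so a plain-Scala sketch is short. It uses ${\rm Pr}(\neg c_j) = 1 - {\rm Pr}(c_j)$, which holds because each category gets its own binary classifier; the object name is hypothetical.

```scala
object NBLogRatios {
  // lpwordcat(j)(i) = log( Pr(X_i | not c_j) / Pr(X_i | c_j) )
  def lpwordcat(pwordcat: Array[Array[Double]], pwordncat: Array[Array[Double]]): Array[Array[Double]] =
    pwordcat.indices.map { j =>
      pwordcat(j).indices.map(i => math.log(pwordncat(j)(i) / pwordcat(j)(i))).toArray
    }.toArray

  // lpcat(j) = log( Pr(not c_j) / Pr(c_j) ), using Pr(not c_j) = 1 - Pr(c_j)
  def lpcat(pcat: Array[Double]): Array[Double] = pcat.map(p => math.log((1 - p) / p))
}
```

A category with prior 0.5 gets lpcat = 0, and one with prior 0.2 gets log(4): the rarer the category, the larger the evidence its words must supply.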


In [22]:
val cx1=cx                             // same matrix as cx, so the in-place clipping below also changes cx
min(cx1, 1, cx1)                       // clip in place: min(a, b, out) writes its result into out
max(cx1, 0, cx1)                       // floor at 0, so cx1 lies in [0,1]
val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
mean(p,2)
//saveFMat(rpath+index+".txt",cx)
cx



java.lang.RuntimeException: dims incompatible
    BIDMat.DenseMat$mcF$sp.ggMatOpStrictv$mcF$sp(DenseMat.scala:963)
    BIDMat.DenseMat$mcF$sp.ggMatOpv$mcF$sp(DenseMat.scala:951)
    BIDMat.FMat.ffMatOpv(FMat.scala:206)
    BIDMat.FMat.$times$at(FMat.scala:799)

Here's where we apply Naive Bayes. The formula we're using is borrowed from here.

$${\rm Pr}(c|X_1,\ldots,X_k) = \frac{1}{1 + \frac{{\rm Pr}(\neg c)}{{\rm Pr}(c)}\prod_{i=1}^k\frac{{\rm Pr}(X_i|\neg c)}{{\rm Pr}(X_i|c)}}$$

and we can rewrite

$$\frac{{\rm Pr}(\neg c)}{{\rm Pr}(c)}\prod_{i=1}^k\frac{{\rm Pr}(X_i|\neg c)}{{\rm Pr}(X_i|c)}$$

as

$$\exp\left(\log\left(\frac{{\rm Pr}(\neg c)}{{\rm Pr}(c)}\right) + \sum_{i=1}^k\log\left(\frac{{\rm Pr}(X_i|\neg c)}{{\rm Pr}(X_i|c)}\right)\right) = \exp({\rm lpcat(j)} + {\rm lpwordcat(j,?)} * X)$$

for class number j and an input column $X$. This works because $X$ is a sparse 0/1 vector with ones in the positions of the features present in the document, so the product ${\rm lpwordcat(j,?)} * X$ picks out those features and sums the corresponding logs from lpwordcat.

Finally, we take the exponential above and fold it into the formula ${\rm Pr}(c_j|X_1,\ldots,X_k) = 1/(1+\exp(\cdots))$. This gives us a matrix of predictions, where preds(i,j) is the predicted probability that test document j belongs to category i.
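The prediction formula can be sketched in plain Scala: for each category j and document k, sum lpcat(j) and the lpwordcat(j, i) terms of the features present in the document, then apply $1/(1+\exp(\cdot))$. It follows the formula above literally (only present features contribute), uses dense arrays, and has a hypothetical object name.

```scala
object NBPredict {
  // preds(j)(k) = 1 / (1 + exp(lpcat(j) + sum_i lpwordcat(j)(i) * x(i)(k)))
  // x: features x documents with 0/1 entries
  def predict(lpcat: Array[Double], lpwordcat: Array[Array[Double]], x: Array[Array[Double]]): Array[Array[Double]] = {
    val ndocs = x.head.length
    lpcat.indices.map { j =>
      (0 until ndocs).map { k =>
        // log-odds against category j for document k
        val z = lpcat(j) + lpwordcat(j).indices.map(i => lpwordcat(j)(i) * x(i)(k)).sum
        1.0 / (1.0 + math.exp(z))
      }.toArray
    }.toArray
  }
}
```

A negative lpwordcat entry means the word is more likely inside the category, so its presence pushes the prediction above 0.5.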


In [6]:
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



Out[6]:
0.76865

To measure the accuracy of the predictions above, we can compute the probability that the classifier outputs the right label. We used this formula in class for the expected accuracy of logistic regression. The "dot arrow" operator ∙→ takes the dot product along rows:
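The expression `(cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols` can be mirrored in plain Scala to show what it computes per category: the average, over documents, of pred·truth + (1 − pred)·(1 − truth). The object name is hypothetical and the matrices are dense arrays with categories as rows.

```scala
object ExpectedAccuracy {
  // pred(i)(k): predicted probability that document k is in category i
  // truth(i)(k): 1.0 if document k is tagged with category i, else 0.0
  // Returns, per category, the mean probability of outputting the right label.
  def perCategory(pred: Array[Array[Double]], truth: Array[Array[Double]]): Array[Double] =
    pred.indices.map { i =>
      val right = pred(i).indices.map { k =>
        pred(i)(k) * truth(i)(k) + (1 - pred(i)(k)) * (1 - truth(i)(k))
      }
      right.sum / pred(i).length
    }.toArray
}
```

For one category with predictions (1.0, 0.5) and labels (1, 0), the result is (1.0 + 0.5) / 2 = 0.75.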


In [7]:
val model = mm.model
val (nn1, nopts1) = GLM.LBFGSpredictor(model, atest, cx)



Out[7]:
BIDMach.models.GLM$LearnLBFGSOptions@8eb62c4

Raw accuracy is not a good measure in most cases. When positives (instances in the class) are rare compared with negatives (its complement), maximizing accuracy mostly drives down the false-positive rate at the expense of the false-negative rate. In the worst case, the learner may always predict "no" and still achieve high accuracy.
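A small plain-Scala example makes the worst case concrete: with 95 negatives and 5 positives, a classifier that always answers "no" (0) scores 95% accuracy while finding none of the positives. The object name is hypothetical.

```scala
object AccuracyPitfall {
  // Fraction of labels predicted correctly.
  def accuracy(pred: Array[Int], truth: Array[Int]): Double =
    pred.zip(truth).count { case (p, t) => p == t }.toDouble / truth.length

  // Fraction of true positives (label 1) that the classifier also labels 1.
  def recall(pred: Array[Int], truth: Array[Int]): Double = {
    val positives = truth.count(_ == 1)
    pred.zip(truth).count { case (p, t) => p == 1 && t == 1 }.toDouble / positives
  }
}
```

With `truth = Array.fill(95)(0) ++ Array.fill(5)(1)` and `pred = Array.fill(100)(0)`, accuracy is 0.95 and recall is 0.0.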

ROC curves and ROC Area Under the Curve (AUC) are much better. Here we compute the ROC curves from the predictions above. We need:

  • scores - the predicted quality from the formula above.
  • good - 1 for positive instances, 0 for negative instances.
  • bad - complement of good.
  • npoints (100) - specifies the number of X-axis points for the ROC plot.

itest specifies which category to plot. We chose itest=6 because that category has one of the highest positive rates and gives the most stable accuracy plots.
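To make these ingredients concrete, here is a plain-Scala sketch that turns scores, good and bad into ROC y-values (true-positive rates) sampled at npoints + 1 evenly spaced false-positive rates from 0 to 1. It is an illustration under stated simplifications (ties broken by sort order, step interpolation between ROC vertices, both classes present), not the routine the notebook calls; the object name is hypothetical.

```scala
object Roc {
  // Returns TPR at false-positive rates 0, 1/npoints, ..., 1.
  // good(k) / bad(k): 1.0 if document k is a positive / negative instance.
  def roc(scores: Array[Double], good: Array[Double], bad: Array[Double], npoints: Int): Array[Double] = {
    // Rank documents by decreasing score; ties keep their original order.
    val order = scores.indices.sortWith((i, j) => scores(i) > scores(j))
    val totGood = good.sum
    val totBad = bad.sum
    // ROC vertices after admitting the top m documents, m = 0..n
    val tpr = order.scanLeft(0.0)((acc, k) => acc + good(k)).map(_ / totGood)
    val fpr = order.scanLeft(0.0)((acc, k) => acc + bad(k)).map(_ / totBad)
    (0 to npoints).map { s =>
      val x = s.toDouble / npoints
      // Highest TPR reachable without exceeding false-positive rate x
      val reachable = tpr.indices.filter(m => fpr(m) <= x + 1e-12)
      reachable.map(m => tpr(m)).foldLeft(0.0)((a, b) => math.max(a, b))
    }.toArray
  }
}
```

A perfect ranking gives all ones; a random one hugs the diagonal Y = X.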


In [8]:
nn1.predict


corpus perplexity=1.970211
Predicting
 4.00%, ll=-0.88782, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
 8.00%, ll=-0.87371, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
12.00%, ll=-0.94405, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.14
16.00%, ll=-0.97745, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
20.00%, ll=-0.91692, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
24.00%, ll=-0.95906, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
28.00%, ll=-0.91178, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.22
32.00%, ll=-0.80047, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
36.00%, ll=-0.92667, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.22
40.00%, ll=-0.91742, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
44.00%, ll=-0.92421, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
48.00%, ll=-0.94550, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.23
51.00%, ll=-0.81677, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.25
56.00%, ll=-0.91331, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
60.00%, ll=-0.93091, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
64.00%, ll=-0.85525, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
68.00%, ll=-0.87134, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
72.00%, ll=-0.94563, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.25
76.00%, ll=-0.78031, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
80.00%, ll=-0.94823, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
83.00%, ll=-0.95096, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
88.00%, ll=-0.86657, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.26
92.00%, ll=-0.97759, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.28
96.00%, ll=-0.75901, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
100.00%, ll=-0.93326, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.27
Time=0.0090 secs, gflops=0.00

TODO 1: In the cell below, write an expression to derive the ROC Area under the curve (AUC) given the curve rr. rr gives the ROC curve y-coordinates at 100 evenly-spaced X-values from 0 to 1.0.
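One way to get the area from rr is the trapezoidal rule. The plain-Scala sketch below assumes rr includes both endpoints X = 0 and X = 1, so consecutive samples are 1/(rr.length - 1) apart; if rr omits an endpoint, the spacing must be adjusted. The object name is hypothetical.

```scala
object RocArea {
  // Trapezoidal area under y-values sampled at evenly spaced x from 0 to 1.
  def auc(rr: Array[Double]): Double = {
    require(rr.length >= 2, "need at least two samples")
    val dx = 1.0 / (rr.length - 1)
    rr.sliding(2).map(p => (p(0) + p(1)) / 2 * dx).sum
  }
}
```

As sanity checks, the diagonal (a random classifier) gives 0.5 and a curve of all ones (a perfect classifier) gives 1.0.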


In [9]:
val cx1=cx*10
min(cx1, 1, cx1)                       // clip in place: min(a, b, out) writes its result into out
max(cx1, 0, cx1)                       // floor at 0, so cx1 lies in [0,1]
val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
mean(p,2)
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



Out[9]:
0.81119

TODO 2: In the cell below, write the value of AUC returned by the expression above.


In [10]:
cx1



Out[10]:
        1  0.97167  0.32712        1        1        0  0.19741        1...
        0        0        0        0        0        1        0        0...

In [11]:
saveFMat(dict+"moonresult.fmat.txt",cx)



Logistic Regression

Now let's train a logistic classifier on the same data. BIDMach has an umbrella classifier called GLM, for Generalized Linear Model. GLM includes linear regression, logistic regression (with log-accuracy or direct-accuracy optimization), and SVM.

The learner function accepts these arguments:

  • traindata: the training data in the same format as for Naive Bayes
  • traincats: the training category labels
  • testdata: the test input data
  • predcats: a container for the predictions generated by the model
  • modeltype (GLM.logistic here): an integer that specifies the type of model (0=linear, 1=logistic log accuracy, 2=logistic accuracy, 3=SVM).
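The model types differ mainly in how the linear score η = w·x is turned into a prediction and which loss is minimized. The plain-Scala sketch below follows the textbook definitions for the codes listed above; it is not BIDMach's internal code, and the object name is hypothetical. The raw SVM margins of opposite sign in the two output rows of the SVM cell further down are consistent with the "raw margin" reading.

```scala
object GlmLinks {
  // Map a linear score eta = w . x to a prediction for each model type code
  // (0 = linear, 1 and 2 = logistic, 3 = SVM). Types 1 and 2 share the sigmoid
  // and differ only in the training objective.
  def predict(modelType: Int, eta: Double): Double = modelType match {
    case 0     => eta                           // linear regression: identity link
    case 1 | 2 => 1.0 / (1.0 + math.exp(-eta))  // logistic: sigmoid gives Pr(class)
    case 3     => eta                           // SVM: raw margin; its sign is the class
    case _     => throw new IllegalArgumentException(s"unknown model type $modelType")
  }

  // SVM hinge loss for a label y in {-1, +1}
  def hingeLoss(y: Double, eta: Double): Double = math.max(0.0, 1.0 - y * eta)

  // Logistic log loss for a label y in {0, 1}; p must lie strictly in (0, 1)
  def logLoss(y: Double, p: Double): Double = -(y * math.log(p) + (1 - y) * math.log(1 - p))
}
```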

We'll construct the learner and then look at its options:


In [2]:
val dict = "/Users/Anna/workspace/BIDMach/data/wildlife/"
val aa = loadFMat(dict+"1.txt")
val c = loadFMat(dict+"c1.txt")
val b = loadFMat("/Users/Anna/workspace/BIDMach/data/wltestall.txt")
val d = loadFMat("/Users/Anna/workspace/BIDMach/data/wltestcatall.txt")
//val dict = "/Users/Anna/workspace/BIDMach_1.0.0-full-linux-x86_64/data/uci/"
//val aa = loadFMat(dict+"arabic.fmat.lz4")
//val c = loadFMat(dict+"arabic_cats.fmat.lz4")
val a = aa *10  
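// Note: the train/test split is commented out, so atest/ctest are the training data
// and the accuracy computed later measures training fit, not generalization.
val atrain = a //a(?,(100->a.ncols))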
val atest =  a //a(?,(0->100))
val ctrain = c //c(?,(100->a.ncols))
val ctest = c //c(?,(0->100))
//max(atrain, 0.001, atrain)                       // the first "traindata" argument is the input, the other is output
//max(atest, 0.001, atest)
atrain



Out[2]:
    -0.33386    -0.33914    -0.29299    -0.40609    -0.43935    -0.55155...
     0.28568     0.18359  -0.0037000    -0.17297   -0.095350   -0.013070...
   -0.098630    -0.13269     0.41184     0.40252     0.24642     0.26601...
   -0.031890   -0.038550     0.38608     0.35432     0.21913     0.31026...
     0.17352     0.25341     0.37047     0.53088     0.40656     0.43656...
     0.45181     0.47072     0.50574     0.36032     0.33288     0.26337...
     0.24997    -0.11709     0.54020     0.33701     0.30424     0.15961...
     0.10230    0.082690    0.017330     0.46488     0.24725   -0.025730...
          ..          ..          ..          ..          ..          ..

The most important options are:

  • lrate: the learning rate
  • batchSize: the minibatch size
  • npasses: the number of passes over the dataset
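To show what these three options control, here is logistic regression trained by plain minibatch SGD in Scala. The updaters BIDMach imports at the top of this notebook (e.g. ADAGrad) are more elaborate, so this is only an illustration; the object name and dense-array layout (samples as rows) are assumptions.

```scala
object MiniBatchSgd {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // x: samples as rows, y: 0/1 labels; returns the learned weight vector
  def train(x: Array[Array[Double]], y: Array[Double],
            lrate: Double, batchSize: Int, npasses: Int): Array[Double] = {
    val w = Array.fill(x.head.length)(0.0)
    for (_ <- 0 until npasses) {                  // npasses: full sweeps over the data
      for (start <- x.indices by batchSize) {     // batchSize: samples per update
        val end = math.min(start + batchSize, x.length)
        val grad = Array.fill(w.length)(0.0)
        for (k <- start until end) {
          val err = y(k) - sigmoid(w.indices.map(i => w(i) * x(k)(i)).sum)
          for (i <- w.indices) grad(i) += err * x(k)(i)
        }
        // lrate: step size along the minibatch-averaged log-likelihood gradient
        for (i <- w.indices) w(i) += lrate * grad(i) / (end - start)
      }
    }
    w
  }
}
```

Smaller batches give more (noisier) updates per pass; a larger lrate moves faster but can overshoot.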

We'll use the following parameters for this training run.


In [7]:
val cx=zeros(2,ctest.ncols)
val (mm,mopts,nn,nopts)=GLM.SVMlearner(atrain,ctrain,atest,cx)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=1
mopts.batchSize=2
mopts.dim=256
mopts.startBlock=0
mopts.npasses=10
mopts.updateAll=false
mopts.what


Option Name       Type          Value
===========       ====          =====
addConstFeat      boolean       false
autoReset         boolean       false
batchSize         int           2
dim               int           256
doubleScore       boolean       false
epsilon           float         1.0E-5
evalStep          int           11
featThreshold     Mat           null
featType          int           1
hashFeatures      boolean       false
initsumsq         float         1.0E-5
iweight           FMat          null
lim               float         0.0
links             IMat             3
   3

lrate             FMat          1
mask              FMat          null
npasses           int           10
nzPerColumn       int           0
pstep             float         0.01
putBack           int           -1
r1nmats           int           1
r2nmats           int           1
reg1weight        FMat          1.0000e-07
reg2weight        FMat          1
resFile           String        null
rmask             FMat          null
sample            float         1.0
sizeMargin        float         3.0
startBlock        int           0
targets           FMat          null
targmap           FMat          null
texp              FMat          0.50000
updateAll         boolean       false
useCache          boolean       true
useDouble         boolean       false
useGPU            boolean       false
vexp              FMat          0.50000
waitsteps         int           2

In [8]:
mm.train
nn.predict
val cx1=cx
//min(cx1, 1, cx1)                       // the first "traindata" argument is the input, the other is output
//max(cx1, 0, cx1) 
//val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
//mean(p,2)
cx


corpus perplexity=NaN
pass= 0
 6.00%, ll=-1.00000, gf=0.011, secs=0.0, GB=0.00, MB/s=12.00
41.00%, ll=0.00000, gf=0.010, secs=0.1, GB=0.00, MB/s= 5.11
77.00%, ll=0.00000, gf=0.018, secs=0.1, GB=0.00, MB/s= 8.73
100.00%, ll=0.00000, gf=0.023, secs=0.1, GB=0.00, MB/s=10.78
pass= 1
 6.00%, ll=0.00000, gf=0.023, secs=0.1, GB=0.00, MB/s=11.31
41.00%, ll=0.00000, gf=0.026, secs=0.1, GB=0.00, MB/s=12.14
77.00%, ll=0.00000, gf=0.031, secs=0.1, GB=0.00, MB/s=14.35
100.00%, ll=0.00000, gf=0.026, secs=0.1, GB=0.00, MB/s=12.30
pass= 2
 6.00%, ll=0.00000, gf=0.026, secs=0.1, GB=0.00, MB/s=12.49
41.00%, ll=0.00000, gf=0.030, secs=0.1, GB=0.00, MB/s=13.95
77.00%, ll=0.00000, gf=0.029, secs=0.2, GB=0.00, MB/s=13.76
100.00%, ll=0.00000, gf=0.031, secs=0.2, GB=0.00, MB/s=14.59
pass= 3
 6.00%, ll=0.00000, gf=0.031, secs=0.2, GB=0.00, MB/s=14.80
41.00%, ll=0.00000, gf=0.034, secs=0.2, GB=0.00, MB/s=16.00
77.00%, ll=0.00000, gf=0.034, secs=0.2, GB=0.00, MB/s=15.95
100.00%, ll=0.00000, gf=0.035, secs=0.2, GB=0.00, MB/s=16.53
pass= 4
 6.00%, ll=0.00000, gf=0.036, secs=0.2, GB=0.00, MB/s=16.80
41.00%, ll=0.00000, gf=0.038, secs=0.2, GB=0.00, MB/s=17.77
77.00%, ll=-0.21902, gf=0.038, secs=0.2, GB=0.00, MB/s=17.67
100.00%, ll=0.00000, gf=0.039, secs=0.2, GB=0.00, MB/s=18.15
pass= 5
 6.00%, ll=0.00000, gf=0.039, secs=0.2, GB=0.00, MB/s=18.38
41.00%, ll=0.00000, gf=0.041, secs=0.2, GB=0.00, MB/s=19.11
77.00%, ll=-0.14903, gf=0.043, secs=0.2, GB=0.00, MB/s=19.89
100.00%, ll=0.00000, gf=0.040, secs=0.2, GB=0.00, MB/s=18.45
pass= 6
 6.00%, ll=0.00000, gf=0.040, secs=0.2, GB=0.00, MB/s=18.57
41.00%, ll=0.00000, gf=0.041, secs=0.3, GB=0.00, MB/s=19.10
77.00%, ll=-0.10156, gf=0.042, secs=0.3, GB=0.01, MB/s=19.76
100.00%, ll=0.00000, gf=0.043, secs=0.3, GB=0.01, MB/s=20.19
pass= 7
 6.00%, ll=0.00000, gf=0.043, secs=0.3, GB=0.01, MB/s=20.29
41.00%, ll=0.00000, gf=0.042, secs=0.3, GB=0.01, MB/s=19.78
77.00%, ll=0.00000, gf=0.043, secs=0.3, GB=0.01, MB/s=20.22
100.00%, ll=0.00000, gf=0.044, secs=0.3, GB=0.01, MB/s=20.52
pass= 8
 6.00%, ll=0.00000, gf=0.044, secs=0.3, GB=0.01, MB/s=20.62
41.00%, ll=0.00000, gf=0.045, secs=0.3, GB=0.01, MB/s=21.16
77.00%, ll=0.00000, gf=0.045, secs=0.3, GB=0.01, MB/s=20.86
100.00%, ll=0.00000, gf=0.045, secs=0.3, GB=0.01, MB/s=21.19
pass= 9
 6.00%, ll=0.00000, gf=0.046, secs=0.3, GB=0.01, MB/s=21.27
41.00%, ll=0.00000, gf=0.047, secs=0.3, GB=0.01, MB/s=21.76
77.00%, ll=0.00000, gf=0.048, secs=0.3, GB=0.01, MB/s=22.17
100.00%, ll=0.00000, gf=0.048, secs=0.3, GB=0.01, MB/s=22.48
Time=0.3310 secs, gflops=0.05
corpus perplexity=NaN
Predicting
 4.00%, ll=-5.35505, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
 9.00%, ll=-5.56336, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
14.00%, ll=-5.39651, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
19.00%, ll=-3.47791, gf=0.048, secs=0.0, GB=0.00, MB/s=144.00
24.00%, ll=-4.42609, gf=0.060, secs=0.0, GB=0.00, MB/s=180.00
29.00%, ll=-5.00045, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
33.00%, ll=-4.28123, gf=0.042, secs=0.0, GB=0.00, MB/s=126.00
38.00%, ll=-4.85835, gf=0.048, secs=0.0, GB=0.00, MB/s=144.00
43.00%, ll=-4.95342, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
48.00%, ll=-5.43300, gf=0.040, secs=0.0, GB=0.00, MB/s=120.00
53.00%, ll=-4.59199, gf=0.044, secs=0.0, GB=0.00, MB/s=132.00
58.00%, ll=-3.04440, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
62.00%, ll=-4.28866, gf=0.039, secs=0.0, GB=0.00, MB/s=117.00
67.00%, ll=-4.49006, gf=0.042, secs=0.0, GB=0.00, MB/s=126.00
72.00%, ll=-6.24142, gf=0.036, secs=0.0, GB=0.00, MB/s=108.00
77.00%, ll=-5.79272, gf=0.039, secs=0.0, GB=0.00, MB/s=115.20
82.00%, ll=-3.15073, gf=0.041, secs=0.0, GB=0.00, MB/s=122.40
87.00%, ll=-4.36058, gf=0.044, secs=0.0, GB=0.00, MB/s=129.60
91.00%, ll=-7.38732, gf=0.038, secs=0.0, GB=0.00, MB/s=114.00
96.00%, ll=-10.42975, gf=0.040, secs=0.0, GB=0.00, MB/s=120.00
100.00%, ll=-6.19474, gf=0.042, secs=0.0, GB=0.00, MB/s=124.00
Time=0.0070 secs, gflops=0.04
Out[8]:
  -7.7981  -10.713  -10.620  -11.079  -10.499  -8.8026  -12.241  -12.910...
   7.7981   10.713   10.620   11.079   10.499   8.8026   12.241   12.910...

Since we have the accuracy scores for both Naive Bayes and Logistic regression, we can plot both of them on the same axes. Naive Bayes is red, Logistic regression is blue. The x-axis is the category number from 0 to 102. The y-axis is the absolute accuracy of the predictor for that category.


In [7]:
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



java.lang.NoClassDefFoundError: Could not initialize class 

TODO 3: With the full training set (700k training documents), Logistic Regression is noticeably more accurate than Naive Bayes in every category. What do you observe in the plot above? Why do you think this is?

Next we'll compute the ROC plot and ROC area (AUC) for Logistic regression for category itest.


In [40]:
saveFMat(dict+"wildliferesults.fmat.txt",cx)



We computed the ROC curve for Naive Bayes earlier, so now we can plot them on the same axes. Naive Bayes is once again in red, Logistic regression in blue.


In [51]:
cx1



Out[51]:
  1.1398e-13  1.0409e-13  3.1763e-16  6.3486e-17  3.7628e-18  6.3368e-16...
  1.0488e-14  3.4730e-14  1.2205e-16  1.6546e-17  7.8444e-19  1.5367e-16...
  4.9068e-14  1.3641e-12  5.0059e-15  1.0316e-15  4.3107e-18  2.8967e-15...
  3.2656e-14  5.3654e-14  1.1086e-16  1.9517e-17  8.0842e-19  3.9087e-17...
  8.4080e-11  2.6030e-12  1.4020e-14  1.3571e-15  2.5275e-15  8.0566e-13...
  1.8410e-13  2.3721e-13  2.6570e-16  6.7162e-17  5.4204e-18  8.2176e-16...
  6.5161e-11  1.3945e-10  2.1820e-12  7.0878e-14  1.5250e-15  2.8615e-12...
  1.6085e-13  1.9523e-13  3.2423e-16  5.6627e-17  7.7785e-18  1.7874e-15...
          ..          ..          ..          ..          ..          ..

TODO 4: In the cell below, compute and plot lift curves from the ROC curves for Naive Bayes and Logistic regression. The lift curves should show the ratio of ROC y-values over a unit slope diagonal line (Y=X). The X-values should be the same as for the ROC plots, except that X=0 will be omitted since the lift will be undefined.
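Lift at each sample point is the ROC y-value divided by the diagonal's value Y = X at the same X. The plain-Scala sketch below assumes rr holds npoints + 1 values at X = 0, 1/npoints, ..., 1 and drops the X = 0 entry; the object name is hypothetical, and the same function would be applied to the Naive Bayes and logistic regression curves in turn.

```scala
object Lift {
  // Lift at x = s / npoints for s = 1..npoints; x = 0 is skipped (undefined).
  def lift(rr: Array[Double]): Array[Double] = {
    val npoints = rr.length - 1
    (1 to npoints).map(s => rr(s) / (s.toDouble / npoints)).toArray
  }
}
```

A random classifier (rr on the diagonal) has lift 1 everywhere; a perfect one has lift 1/X, largest at the smallest X.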


In [2]:
val dict = "/Users/Anna/workspace/BIDMach_1.0.0-full-linux-x86_64/data/"
//val a = loadFMat(dict+"arabic.fmat.lz4")
//val c = loadFMat(dict+"arabic_cats.fmat.lz4")
val aa = loadFMat(dict+"a.txt")
val c = loadFMat(dict+"alabel.txt")
val a = aa + 0.5 
val atrain =a(?,(100->a.ncols))
val atest =a(?,(0->100))
val ctrain =c(?,(100->a.ncols))
val ctest =c(?,(0->100))
max(atrain, 0.001, atrain)                       // floor values at 0.001 in place: max(a, b, out) writes into out
max(atest, 0.001, atest)
atest



Out[2]:
   0.075172    0.42627    0.64251    0.36696    0.19400    0.58963...
    0.64074    0.42242    0.41518    0.87276    0.77227  0.0010000...

TODO 5: Experiment with different values for learning rate and batchSize to get the best performance for absolute accuracy and ROC area on category 6. Write your optimal values below:


In [41]:
val cx=zeros(ctest.nrows,ctest.ncols)
val (mm,mopts,nn,nopts)=GLM.SVMlearner(atrain,ctrain,atest,cx)
mopts.autoReset=false
mopts.useGPU=false
mopts.lrate=0.1
mopts.batchSize=2
mopts.dim=256
mopts.startBlock=0
mopts.npasses=10
mopts.updateAll=false
mopts.what


Option Name       Type          Value
===========       ====          =====
addConstFeat      boolean       false
autoReset         boolean       false
batchSize         int           2
dim               int           256
doubleScore       boolean       false
epsilon           float         1.0E-5
evalStep          int           11
featThreshold     Mat           null
featType          int           1
hashFeatures      boolean       false
initsumsq         float         1.0E-5
iweight           FMat          null
lim               float         0.0
links             IMat             3
   3

lrate             FMat          0.10000
mask              FMat          null
npasses           int           10
nzPerColumn       int           0
pstep             float         0.01
putBack           int           -1
r1nmats           int           1
r2nmats           int           1
reg1weight        FMat          1.0000e-07
reg2weight        FMat          1
resFile           String        null
rmask             FMat          null
sample            float         1.0
sizeMargin        float         3.0
startBlock        int           0
targets           FMat          null
targmap           FMat          null
texp              FMat          0.50000
updateAll         boolean       false
useCache          boolean       true
useDouble         boolean       false
useGPU            boolean       false
vexp              FMat          0.50000
waitsteps         int           2

In [42]:
mm.train
nn.predict


corpus perplexity=992.808970
pass= 0
 6.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
41.00%, ll=-0.28270, gf=0.079, secs=0.0, GB=0.00, MB/s=39.00
77.00%, ll=-0.03527, gf=0.081, secs=0.0, GB=0.00, MB/s=38.40
100.00%, ll=-0.20491, gf=0.083, secs=0.0, GB=0.00, MB/s=39.16
pass= 1
 6.00%, ll=-1.20125, gf=0.078, secs=0.0, GB=0.00, MB/s=37.71
41.00%, ll=-0.30951, gf=0.083, secs=0.0, GB=0.00, MB/s=39.11
77.00%, ll=-0.24529, gf=0.085, secs=0.0, GB=0.00, MB/s=40.00
100.00%, ll=-0.13970, gf=0.086, secs=0.0, GB=0.00, MB/s=40.22
pass= 2
 6.00%, ll=-1.47287, gf=0.085, secs=0.0, GB=0.00, MB/s=40.42
41.00%, ll=-0.22679, gf=0.087, secs=0.0, GB=0.00, MB/s=40.91
77.00%, ll=-0.39799, gf=0.090, secs=0.0, GB=0.00, MB/s=42.12
100.00%, ll=-0.17308, gf=0.090, secs=0.1, GB=0.00, MB/s=42.11
pass= 3
 6.00%, ll=-1.47907, gf=0.090, secs=0.1, GB=0.00, MB/s=42.22
41.00%, ll=-0.30906, gf=0.092, secs=0.1, GB=0.00, MB/s=43.12
77.00%, ll=-0.49180, gf=0.093, secs=0.1, GB=0.00, MB/s=43.20
100.00%, ll=-0.23331, gf=0.094, secs=0.1, GB=0.00, MB/s=43.76
pass= 4
 6.00%, ll=-1.46579, gf=0.093, secs=0.1, GB=0.00, MB/s=43.83
41.00%, ll=-0.37264, gf=0.094, secs=0.1, GB=0.00, MB/s=43.84
77.00%, ll=-0.55793, gf=0.094, secs=0.1, GB=0.00, MB/s=43.85
100.00%, ll=-0.28842, gf=0.095, secs=0.1, GB=0.00, MB/s=44.29
pass= 5
 6.00%, ll=-1.44678, gf=0.095, secs=0.1, GB=0.00, MB/s=44.33
41.00%, ll=-0.42261, gf=0.095, secs=0.1, GB=0.00, MB/s=44.31
77.00%, ll=-0.60707, gf=0.095, secs=0.1, GB=0.00, MB/s=44.29
100.00%, ll=-0.33395, gf=0.096, secs=0.1, GB=0.00, MB/s=44.64
pass= 6
 6.00%, ll=-1.42705, gf=0.095, secs=0.1, GB=0.00, MB/s=44.67
41.00%, ll=-0.46288, gf=0.097, secs=0.1, GB=0.00, MB/s=45.06
77.00%, ll=-0.64504, gf=0.097, secs=0.1, GB=0.01, MB/s=45.40
100.00%, ll=-0.37222, gf=0.098, secs=0.1, GB=0.01, MB/s=45.68
pass= 7
 6.00%, ll=-1.40822, gf=0.098, secs=0.1, GB=0.01, MB/s=45.70
41.00%, ll=-0.49608, gf=0.099, secs=0.1, GB=0.01, MB/s=46.39
77.00%, ll=-0.67528, gf=0.099, secs=0.1, GB=0.01, MB/s=46.27
100.00%, ll=-0.40490, gf=0.100, secs=0.1, GB=0.01, MB/s=46.50
pass= 8
 6.00%, ll=-1.39079, gf=0.100, secs=0.1, GB=0.01, MB/s=46.51
41.00%, ll=-0.52395, gf=0.100, secs=0.1, GB=0.01, MB/s=46.75
77.00%, ll=-0.69995, gf=0.101, secs=0.1, GB=0.01, MB/s=46.96
100.00%, ll=-0.43319, gf=0.101, secs=0.1, GB=0.01, MB/s=47.15
pass= 9
 6.00%, ll=-1.37481, gf=0.101, secs=0.1, GB=0.01, MB/s=47.16
41.00%, ll=-0.54772, gf=0.102, secs=0.1, GB=0.01, MB/s=47.67
77.00%, ll=-0.72047, gf=0.103, secs=0.2, GB=0.01, MB/s=47.84
100.00%, ll=-0.45795, gf=0.103, secs=0.2, GB=0.01, MB/s=48.00
Time=0.1550 secs, gflops=0.10
corpus perplexity=985.317557
Predicting
 4.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
 9.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
14.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
19.00%, ll=-1.00000, gf=Infinity, secs=0.0, GB=0.00, MB/s=Infinity
24.00%, ll=-1.00000, gf=0.060, secs=0.0, GB=0.00, MB/s=180.00
29.00%, ll=-1.00000, gf=0.073, secs=0.0, GB=0.00, MB/s=216.00
33.00%, ll=-1.00000, gf=0.085, secs=0.0, GB=0.00, MB/s=252.00
38.00%, ll=-1.00000, gf=0.097, secs=0.0, GB=0.00, MB/s=288.00
43.00%, ll=-1.00000, gf=0.109, secs=0.0, GB=0.00, MB/s=324.00
48.00%, ll=-1.00000, gf=0.121, secs=0.0, GB=0.00, MB/s=360.00
53.00%, ll=-1.00000, gf=0.066, secs=0.0, GB=0.00, MB/s=198.00
58.00%, ll=-1.00000, gf=0.073, secs=0.0, GB=0.00, MB/s=216.00
62.00%, ll=-1.00000, gf=0.079, secs=0.0, GB=0.00, MB/s=234.00
67.00%, ll=-1.00000, gf=0.085, secs=0.0, GB=0.00, MB/s=251.99
72.00%, ll=-1.00000, gf=0.091, secs=0.0, GB=0.00, MB/s=269.99
77.00%, ll=-1.00000, gf=0.097, secs=0.0, GB=0.00, MB/s=287.99
82.00%, ll=-1.00000, gf=0.069, secs=0.0, GB=0.00, MB/s=204.00
87.00%, ll=-1.00000, gf=0.073, secs=0.0, GB=0.00, MB/s=216.00
91.00%, ll=-1.00000, gf=0.077, secs=0.0, GB=0.00, MB/s=228.00
96.00%, ll=-1.00000, gf=0.081, secs=0.0, GB=0.00, MB/s=240.00
100.00%, ll=-1.00000, gf=0.083, secs=0.0, GB=0.00, MB/s=248.00
Time=0.0030 secs, gflops=0.08

In [ ]:
val cx1=cx*5
min(cx1, 1, cx1)                       // clip in place: min(a, b, out) writes its result into out
max(cx1, 0, cx1)                       // floor at 0, so cx1 lies in [0,1]
val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
mean(p,2)

In [6]:
cx1



Out[6]:
         1   0.48584   0.16356         1         1         0  0.098707...
         0         0         0         0         0   0.82548         0...

In [7]:
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



Out[7]:
0.76865

In [8]:
val model = mm.model
val (nn1, nopts1) = GLM.LBFGSpredictor(model, atest, cx)



Out[8]:
BIDMach.models.GLM$LearnLBFGSOptions@3fde0ffe

In [9]:
nn1.predict


corpus perplexity=1.970211
Predicting
 4.00%, ll=-0.88782, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.10
 8.00%, ll=-0.87371, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.19
12.00%, ll=-0.94405, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
16.00%, ll=-0.97745, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.38
20.00%, ll=-0.91692, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.24
24.00%, ll=-0.95906, gf=0.000, secs=0.0, GB=0.00, MB/s= 0.29
28.00%, ll=-0.91178, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.34
32.00%, ll=-0.80047, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.38
36.00%, ll=-0.92667, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.43
40.00%, ll=-0.91742, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.48
44.00%, ll=-0.92421, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.53
48.00%, ll=-0.94550, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.38
51.00%, ll=-0.81677, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.42
56.00%, ll=-0.91331, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.45
60.00%, ll=-0.93091, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.48
64.00%, ll=-0.85525, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.51
68.00%, ll=-0.87134, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.54
72.00%, ll=-0.94563, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.58
76.00%, ll=-0.78031, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.46
80.00%, ll=-0.94823, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.48
83.00%, ll=-0.95096, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.50
88.00%, ll=-0.86657, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.53
92.00%, ll=-0.97759, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.55
96.00%, ll=-0.75901, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.58
100.00%, ll=-0.93326, gf=0.001, secs=0.0, GB=0.00, MB/s= 0.60
Time=0.0050 secs, gflops=0.00

In [10]:
val cx1=cx*10
min(cx1, 1, cx1)                       // clip in place: min(a, b, out) writes its result into out
max(cx1, 0, cx1)                       // floor at 0, so cx1 lies in [0,1]
val p=ctest *@cx1 +(1-ctest) *@(1-cx1)
mean(p,2)
val lacc = (cx1 ∙→ ctest + (1-cx1) ∙→ (1-ctest))/cx1.ncols
lacc.t
mean(lacc)



Out[10]:
0.81119

In [11]:
cx1



Out[11]:
        1  0.97167  0.32712        1        1        0  0.19741        1...
        0        0        0        0        0        1        0        0...

In [23]:
saveFMat(dict+"moonresult.fmat.txt",cx)




In [ ]:


In [19]: