In this notebook we'll explore automated parameter tuning by grid search.
In [1]:
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,FMat,FND,GDMat,GMat,GIMat,GSDMat,GSMat,HMat,Image,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,KMeansw,ICA,LDA,LDAgibbs,NMF,RandomForest,SFA}
import BIDMach.datasources.{MatDS,FilesDS,SFilesDS}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}
Mat.checkMKL
Mat.checkCUDA
if (Mat.hasCUDA > 0) GPUmem
Out[1]:
The dataset is the widely used Reuters news article dataset RCV1-v2. This dataset and several others are loaded by running the script getdata.sh
from the BIDMach/scripts directory. The data include train and test subsets of the documents, and the corresponding train and test category labels (cats).
In [2]:
var dir = "../data/rcv1/" // adjust to point to the BIDMach/data/rcv1 directory
tic
val train = loadSMat(dir+"docs.smat.lz4")
val cats = loadFMat(dir+"cats.fmat.lz4")
val test = loadSMat(dir+"testdocs.smat.lz4")
val tcats = loadFMat(dir+"testcats.fmat.lz4")
toc
Out[2]:
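As a quick sanity check (an addition, not in the original notebook): documents are stored as columns, so each document matrix should agree with its label matrix on the number of columns.
train.ncols == cats.ncols // should be true: one label column per training doc
test.ncols == tcats.ncols // should be true: one label column per test doc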
First let's enumerate some parameter combinations for the learning rate (lrate) and the time exponent of the optimizer (texp):
In [3]:
val lrates = col(0.03f, 0.1f, 0.3f, 1f) // 4 values
val texps = col(0.3f, 0.4f, 0.5f, 0.6f, 0.7f) // 5 values
Out[3]:
The next step is to enumerate all pairs of parameters. We can do this using the Kronecker product operator (⊗) for now; this will eventually be a custom function:
In [4]:
val lrateparams = ones(texps.nrows, 1) ⊗ lrates // the 4 lrates, cycled 5 times (20 rows)
val texpparams = texps ⊗ ones(lrates.nrows,1) // each of the 5 texps repeated 4 times (20 rows)
lrateparams \ texpparams // 20 rows: one (lrate, texp) pair per row
Out[4]:
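To see why this enumerates every pair, here is a minimal illustration with hypothetical toy values (not part of the original run):
val a = col(1f, 2f) // 2 values
val b = col(10f, 20f, 30f) // 3 values
val apairs = ones(b.nrows, 1) ⊗ a // 1 2 1 2 1 2: a cycles
val bpairs = b ⊗ ones(a.nrows, 1) // 10 10 20 20 30 30: each b repeats
apairs \ bpairs // 6 rows, one per (a, b) combination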
Here's the learner again:
In [5]:
val (mm, opts) = GLM.learner(train, cats, GLM.logistic)
Out[5]:
To keep things simple, we'll focus on just one category and train many models for it. The "targmap" option specifies a mapping from the actual base categories to the model categories. We'll map base category 6 to all of our models:
In [6]:
val nparams = lrateparams.length // 20 models, one per parameter pair
val targmap = zeros(nparams, 103) // RCV1 has 103 base categories
targmap(?,6) = 1 // every model targets base category 6
Out[6]:
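Since column 6 of targmap is all ones and everything else is zero, multiplying targmap by a category matrix copies that matrix's row 6 into every model row. A quick way to convince yourself (a sketch, an addition to the notebook):
sum(targmap) // 1x103 row: nparams (=20) in column 6, zeros elsewhere
sum(targmap, 2) // 20x1 column of ones: each model targets exactly one category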
In [7]:
opts.targmap = targmap // one model per parameter combination
opts.lrate = lrateparams // per-model learning rates (column vector)
opts.texp = texpparams // per-model time exponents (column vector)
Out[7]:
In [8]:
mm.train
In [9]:
val preds = zeros(targmap.nrows, tcats.ncols) // An array to hold the predictions
val (pp, popts) = GLM.predictor(mm.model, test, preds)
Out[9]:
And invoke the predict method on the predictor:
In [10]:
pp.predict
Although ll (log likelihood) values are printed above, they are not meaningful: there is no target to compare the predictions with.
We can now compare the accuracy of the predictions (the preds matrix) with the ground truth (the tcats matrix).
In [11]:
val vcats = targmap * tcats // virtual cats: one copy of the target category per model
val lls = mean(ln(1e-7f + vcats ∘ preds + (1-vcats) ∘ (1-preds)),2) // per-model mean log likelihood; the 1e-7f guards against ln(0)
mean(lls)
Out[11]:
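A complementary, cruder measure is 0/1 accuracy at a 0.5 threshold. This is a sketch, not part of the original notebook, assuming BIDMat's elementwise comparison operators, which return 0/1 float matrices:
val hits = vcats ∘ (preds > 0.5f) + (1-vcats) ∘ (preds <= 0.5f) // 1 where the thresholded prediction matches the label
mean(hits, 2) // per-model accuracy over the test set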
A more thorough measure is ROC area:
In [12]:
val rocs = roc2(preds, vcats, 1-vcats, 100) // Compute ROC curves for all categories
Out[12]:
In [13]:
plot(rocs)
Out[13]:
In [14]:
val aucs = mean(rocs)
Out[14]:
Since each ROC curve is sampled at 100 points, averaging a curve approximates its area under the curve (AUC). The maxi2 function will find the max value and its index.
In [15]:
val (bestv, besti) = maxi2(aucs)
Out[15]:
And using the best index we can find the optimal parameters:
In [16]:
texpparams(besti) \ lrateparams(besti) // the (texp, lrate) pair with the best AUC
Out[16]:
Write the optimal values in the cell below:
Note: although our parameters lie on a square grid, we could have enumerated any sequence of pairs, and we could have searched over more parameters, as sketched below. The learner infrastructure supports more intelligent model optimization (e.g. Bayesian methods).
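For instance, the same ⊗ trick extends to a third parameter. A sketch with hypothetical regularizer weights (not part of this notebook's run):
val regweights = col(1e-6f, 1e-5f, 1e-4f) // 3 hypothetical values
val npairs = lrateparams.nrows // the 20 (lrate, texp) pairs from before
val lr3 = ones(regweights.nrows, 1) ⊗ lrateparams // 60 rows: the 20 pairs cycled 3 times
val te3 = ones(regweights.nrows, 1) ⊗ texpparams
val reg3 = regweights ⊗ ones(npairs, 1) // each weight repeated 20 times
lr3 \ te3 \ reg3 // all 60 (lrate, texp, regweight) triples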