General-purpose prediction with DNNs

Multi-layer deep networks are powerful predictors and often outperform classical models such as kernel SVMs and gradient-boosted trees. Here we'll apply a simple multi-layer network to the classification of Higgs boson data.

Let's load BIDMat/BIDMach


In [ ]:
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,Image,FMat,FND,GDMat,GMat,GIMat,GSDMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.JPlotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,KMeansw,ICA,LDA,LDAgibbs,Model,NMF,RandomForest,SFA,SVD}
import BIDMach.networks.{Net}
import BIDMach.datasources.{DataSource,MatSource,FileSource,SFileSource}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}

Mat.checkMKL
Mat.checkCUDA
Mat.setInline
if (Mat.hasCUDA > 0) GPUmem

And define the root directory for this dataset.


In [ ]:
val dir = "/code/BIDMach/data/uci/Higgs/parts/"

Constructing a deep network Learner

The "Net" class is the parent class for Deep networks. By defining a learner, we also configure a datasource, an optimization method, and possibly a regularizer.


In [ ]:
val (mm, opts) = Net.learner(dir+"data%03d.fmat.lz4", dir+"label%03d.fmat.lz4")

The next step is to define the network to run. First we set some options:


In [ ]:
opts.hasBias = true;                    // Include an additive bias in the linear layers
opts.links = iones(1,1);                // The link function specifies the output loss; 1 = logistic
opts.nweight = 1e-4f;                   // Weight for the normalization layers
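
As a reminder of what link 1 means, here is a small plain-Scala sketch (illustrative only, not BIDMach API): the output of the final linear layer is mapped to a probability by the logistic function, and training minimizes the corresponding log loss.

  // Illustrative sketch of the logistic link (link code 1).
  def logistic(eta: Double): Double = 1.0 / (1.0 + math.exp(-eta))   // linear output -> probability

  def logLoss(eta: Double, target: Double): Double = {               // target is 0 or 1
    val p = logistic(eta)
    -(target * math.log(p) + (1 - target) * math.log(1 - p))
  }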

Now we define the network itself. We use the function "dnodes3", which builds a stack of layers in repeating groups of three: a linear layer, a non-linear layer and a normalization layer. The non-linearity is configurable. The arguments to the function are:

  • depth:Int the total number of layers (nodes),
  • width:Int the number of units in the first hidden layer (the input layer's size is set by the data source),
  • taper:Float the multiplicative decrease in width from each hidden layer to the next,
  • ntargs:Int the number of targets to predict,
  • opts:Opts the options defined above,
  • nonlin:Int the type of non-linear layer: 1=tanh, 2=sigmoid, 3=rectified linear (ReLU), 4=softplus

In [ ]:
opts.nodeset = Net.dnodes3(12, 500, 0.6f, 1, opts, 2);
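
To make the taper concrete, here's a small illustrative calculation (plain Scala, separate from the notebook's pipeline) of the hidden-layer widths this call produces: with width=500 and taper=0.6f, the three hidden linear layers get 500, 300 and 180 units, before the final 1-unit linear layer and the GLM output layer.

  // Illustrative: hidden linear-layer widths for dnodes3(12, 500, 0.6f, 1, opts, 2)
  var w = 500
  for (block <- 1 to 3) {              // depth 12 => three (linear, non-linear, norm) blocks
    println(s"hidden linear layer $block: $w units")
    w = (0.6f * w).toInt               // 500 -> 300 -> 180
  }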

Here's the source for dnodes3. It creates a "nodeset", or flow graph, for the network. The NodeSet is named "nodes" and can be accessed like an array: nodes(i) is set to a node whose input is nodes(i-1), and so on.

  def dnodes3(depth0:Int, width:Int, taper:Float, ntargs:Int, opts:Opts, nonlin:Int = 1):NodeSet = {
    val depth = (depth0/3)*3;              // Round depth down to a multiple of 3 nodes
    val nodes = new NodeSet(depth);
    var w = width;
    nodes(0) = new InputNode;
    for (i <- 1 until depth-2) {
        if (i % 3 == 1) {
            nodes(i) = new LinNode{inputs(0)=nodes(i-1); outdim=w; hasBias=opts.hasBias; aopts=opts.aopts};
            w = (taper*w).toInt;
        } else if (i % 3 == 2) {
          nonlin match {
            case 1 => nodes(i) = new TanhNode{inputs(0)=nodes(i-1)};
            case 2 => nodes(i) = new SigmoidNode{inputs(0)=nodes(i-1)};
            case 3 => nodes(i) = new RectNode{inputs(0)=nodes(i-1)};
            case 4 => nodes(i) = new SoftplusNode{inputs(0)=nodes(i-1)};
          }
        } else {
            nodes(i) = new NormNode{inputs(0)=nodes(i-1); targetNorm=opts.targetNorm; weight=opts.nweight};
        }
    }
    nodes(depth-2) = new LinNode{inputs(0)=nodes(depth-3); outdim=ntargs; hasBias=opts.hasBias; aopts=opts.aopts};
    nodes(depth-1) = new GLMNode{inputs(0)=nodes(depth-2); links=opts.links};
    nodes;
  }
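
To see how a nodeset is wired together, here is a minimal hand-built example in the same style. This is a sketch only; it assumes the node classes used above (NodeSet, InputNode, LinNode, GLMNode) are on the import path, e.g. via BIDMach.networks.layers. It wires an input node to one linear node and a GLM output node, i.e. plain logistic regression.

  // Sketch: a minimal hand-built nodeset, input -> linear -> GLM output.
  // Assumes the node classes used in dnodes3 above are imported
  // (e.g. import BIDMach.networks.layers._).
  val tiny = new NodeSet(3);
  tiny(0) = new InputNode;
  tiny(1) = new LinNode{inputs(0)=tiny(0); outdim=1; hasBias=true};
  tiny(2) = new GLMNode{inputs(0)=tiny(1); links=iones(1,1)};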
  

Tuning Options

Here are some tuning options:


In [ ]:
opts.nend = 10                         // The last file number in the datasource
opts.npasses = 5                       // How many passes to make over the data
opts.batchSize = 200                   // The minibatch size
opts.evalStep = 511                    // Number of minibatches between eval steps

opts.lrate = 0.01f;                    // Learning rate
opts.texp = 0.4f;                      // Time exponent for the ADAGrad updater

You invoke the learner the same way as before.


In [ ]:
mm.train

Now let's extract the model and use it to predict labels on a held-out sample of data.


In [ ]:
val model = mm.model.asInstanceOf[Net]

val ta = loadFMat(dir + "data%03d.fmat.lz4" format 10);
val tc = loadFMat(dir + "label%03d.fmat.lz4" format 10);

val (nn,nopts) = Net.predictor(model, ta);
nopts.batchSize=10000

Let's run the predictor


In [ ]:
nn.predict

To evaluate, we extract the predictions as a floating-point matrix and then compute a ROC curve from them. The mean of this curve is the AUC (Area Under the Curve).


In [ ]:
val pc = FMat(nn.preds(0))
val rc = roc(pc, tc, 1-tc, 1000);
mean(rc)

In [ ]:
plot(rc)
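
Besides the AUC, a quick sanity check is plain classification accuracy at a 0.5 threshold. Here is one possible sketch using BIDMat element-wise operations (it assumes pc holds the predicted probability of the positive class and that mean reduces across the row):

  // Sketch: accuracy at a 0.5 decision threshold (assumes pc = P(label 1)).
  val predlabels = (pc > 0.5f)                               // 0/1 predictions from probabilities
  val correct = tc *@ predlabels + (1-tc) *@ (1-predlabels)  // 1 where the prediction matches the label
  mean(correct)                                              // fraction of correct predictions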

Not bad! This net gives competitive performance on the Kaggle challenge for Higgs boson classification.

Tuning

This net can be optimized in a variety of ways. Try adding an extra block of layers (you should increment the net depth by 3) and re-running. You may need to restart the notebook.
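
For example, to add one extra (linear, non-linear, normalization) block, rebuild the nodeset with depth 15 and train again. This is a sketch that keeps all the other arguments the same; you may need to rebuild the learner (or restart the notebook) for the new nodeset to take effect.

  // Sketch: one extra block of layers (depth 12 -> 15), then retrain.
  opts.nodeset = Net.dnodes3(15, 500, 0.6f, 1, opts, 2);
  mm.train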