About

This notebook demonstrates neural network (NN) classifiers provided by the Reproducible experiment platform (REP) package.
REP contains wrappers for the following NN libraries:

  • theanets
  • neurolab
  • pybrain

In this notebook we show how to:

  • train a classifier
  • get predictions
  • measure quality
  • use pretraining and partial fitting
  • combine classifiers using meta-algorithms

Most of this is done in the same way as for other classifiers (see the notebook 01-howto-Classifiers.ipynb).

Loading data

Download the particle identification dataset from UCI.


In [27]:
!cd toy_datasets; wget -O MiniBooNE_PID.txt -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00199/MiniBooNE_PID.txt


File `MiniBooNE_PID.txt' already there; not retrieving.

In [28]:
import numpy, pandas
from rep.utils import train_test_split
from sklearn.metrics import roc_auc_score

# the first line of the file contains the numbers of signal and background events;
# the remaining lines are the feature values (signal events first, then background)
data = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep='\s*', skiprows=[0], header=None, engine='python')
labels = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep=' ', nrows=1, header=None)
labels = [1] * labels[1].values[0] + [0] * labels[2].values[0]
data.columns = ['feature_{}'.format(key) for key in data.columns]

First rows of our data


In [29]:
data[:5]


Out[29]:
feature_0 feature_1 feature_2 feature_3 feature_4 feature_5 feature_6 feature_7 feature_8 feature_9 ... feature_40 feature_41 feature_42 feature_43 feature_44 feature_45 feature_46 feature_47 feature_48 feature_49
0 2.59413 0.468803 20.6916 0.322648 0.009682 0.374393 0.803479 0.896592 3.59665 0.249282 ... 101.174 -31.3730 0.442259 5.86453 0.000000 0.090519 0.176909 0.457585 0.071769 0.245996
1 3.86388 0.645781 18.1375 0.233529 0.030733 0.361239 1.069740 0.878714 3.59243 0.200793 ... 186.516 45.9597 -0.478507 6.11126 0.001182 0.091800 -0.465572 0.935523 0.333613 0.230621
2 3.38584 1.197140 36.0807 0.200866 0.017341 0.260841 1.108950 0.884405 3.43159 0.177167 ... 129.931 -11.5608 -0.297008 8.27204 0.003854 0.141721 -0.210559 1.013450 0.255512 0.180901
3 4.28524 0.510155 674.2010 0.281923 0.009174 0.000000 0.998822 0.823390 3.16382 0.171678 ... 163.978 -18.4586 0.453886 2.48112 0.000000 0.180938 0.407968 4.341270 0.473081 0.258990
4 5.93662 0.832993 59.8796 0.232853 0.025066 0.233556 1.370040 0.787424 3.66546 0.174862 ... 229.555 42.9600 -0.975752 2.66109 0.000000 0.170836 -0.814403 4.679490 1.924990 0.253893

5 rows × 50 columns

Splitting into train and test


In [30]:
# Get train and test data
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, train_size=0.5)

Neural nets

All nets inherit from sklearn.BaseEstimator and have the same interface as the other wrappers in REP (see 01-howto-Classifiers for details).

All of these NN libraries support:

  • classification
  • multiclass classification
  • regression (see the sketch after this list)
  • multi-target regression
  • additional fitting (using the partial_fit method)

and don't support:

  • staged prediction methods
  • weights for data
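
As a quick illustration of the regression side of this interface, here is a minimal sketch. It assumes the corresponding regressor wrapper, TheanetsRegressor, is available in rep.estimators and accepts the same layers/trainers parameters as the classifier wrapper (the other libraries have analogous *Regressor classes); the choice of features and target below is purely illustrative.

from rep.estimators import TheanetsRegressor

# illustrative only: predict one (continuous) feature from two others,
# just to show that fit/predict follow the usual sklearn-like conventions
reg = TheanetsRegressor(layers=[10], trainers=[{'optimize': 'nag'}])
reg.fit(train_data[['feature_0', 'feature_1']], train_data['feature_2'])
predictions = reg.predict(test_data[['feature_0', 'feature_1']])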

Variables used in training


In [31]:
variables = list(data.columns[:25])

Theanets


In [32]:
from rep.estimators import TheanetsClassifier
print TheanetsClassifier.__doc__


Classifier from Theanets library. 

    Parameters:
    -----------
    :param features: list of features to train model
    :type features: None or list(str)
    :param layers: a sequence of values specifying the **hidden** layer configuration for the network.
        For more information please see 'Specifying layers' in theanets documentation:
        http://theanets.readthedocs.org/en/latest/creating.html#creating-specifying-layers
        Note that theanets "layers" parameter included input and output layers in the sequence as well.
    :type layers: sequence of int, tuple, dict
    :param int input_layer: size of the input layer. If equals -1, the size is taken from the training dataset
    :param int output_layer: size of the output layer. If equals -1, the size is taken from the training dataset
    :param str hidden_activation: the name of an activation function to use on hidden network layers by default
    :param str output_activation: the name of an activation function to use on the output layer by default
    :param float input_noise: standard deviation of desired noise to inject into input
    :param float hidden_noise: standard deviation of desired noise to inject into hidden unit activation output
    :param input_dropouts: proportion of input units to randomly set to 0
    :type input_dropouts: float in [0, 1]
    :param hidden_dropouts: proportion of hidden unit activations to randomly set to 0
    :type hidden_dropouts: float in [0, 1]
    :param decode_from: any of the hidden layers can be tapped at the output. Just specify a value greater than
        1 to tap the last N hidden layers. The default is 1, which decodes from just the last layer
    :type decode_from: positive int
    :param scaler: scaler used to transform data. If False, scaling will not be used
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param trainers: parameters to specify training algorithm(s)
        example: [{'optimize': sgd, 'momentum': 0.2}, {'optimize': 'nag'}]
    :type trainers: list[dict] or None
    :param int random_state: random seed


    For more information on available trainers and their parameters, see this page
    http://theanets.readthedocs.org/en/latest/training.html
    

Simple training


In [33]:
tn = TheanetsClassifier(features=variables, layers=[20], 
                        trainers=[{'optimize': 'nag', 'learning_rate': 0.1}])

tn.fit(train_data, train_labels)


Out[33]:
TheanetsClassifier(decode_from=1,
          features=['feature_0', 'feature_1', 'feature_2', 'feature_3', 'feature_4', 'feature_5', 'feature_6', 'feature_7', 'feature_8', 'feature_9', 'feature_10', 'feature_11', 'feature_12', 'feature_13', 'feature_14', 'feature_15', 'feature_16', 'feature_17', 'feature_18', 'feature_19', 'feature_20', 'feature_21', 'feature_22', 'feature_23', 'feature_24'],
          hidden_activation='logistic', hidden_dropouts=0, hidden_noise=0,
          input_dropouts=0, input_layer=-1, input_noise=0, layers=[20],
          output_activation='linear', output_layer=-1, random_state=42,
          scaler=StandardScaler(copy=True, with_mean=True, with_std=True),
          trainers=[{'learning_rate': 0.1, 'optimize': 'nag'}])

Predicting probabilities, measuring the quality


In [34]:
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print prob


[[  9.97073105e-01   2.92689511e-03]
 [  9.99829263e-01   1.70737368e-04]
 [  9.99931938e-01   6.80617719e-05]
 ..., 
 [  9.97636984e-01   2.36301586e-03]
 [  2.86741853e-02   9.71325815e-01]
 [  4.79396547e-01   5.20603453e-01]]

In [35]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])


ROC AUC 0.967231287392

Theanets multistage training

In some cases we need to continue training: for example, when new data become available or the current trainer is no longer making progress.

For this purpose there is the partial_fit method, which lets you continue training with a different trainer or on different data.


In [36]:
tn = TheanetsClassifier(features=variables, layers=[10, 10], 
                        trainers=[{'optimize': 'rprop'}])

tn.fit(train_data, train_labels)
print('training complete')


training complete

Second stage of fitting


In [37]:
tn.partial_fit(train_data, train_labels, **{'optimize': 'adadelta'})


Out[37]:
TheanetsClassifier(decode_from=1,
          features=['feature_0', 'feature_1', 'feature_2', 'feature_3', 'feature_4', 'feature_5', 'feature_6', 'feature_7', 'feature_8', 'feature_9', 'feature_10', 'feature_11', 'feature_12', 'feature_13', 'feature_14', 'feature_15', 'feature_16', 'feature_17', 'feature_18', 'feature_19', 'feature_20', 'feature_21', 'feature_22', 'feature_23', 'feature_24'],
          hidden_activation='logistic', hidden_dropouts=0, hidden_noise=0,
          input_dropouts=0, input_layer=-1, input_noise=0, layers=[10, 10],
          output_activation='linear', output_layer=-1, random_state=42,
          scaler=StandardScaler(copy=True, with_mean=True, with_std=True),
          trainers=[{'optimize': 'rprop'}, {'optimize': 'adadelta'}])

In [38]:
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print prob


[[  9.99319828e-01   6.80172318e-04]
 [  9.99325309e-01   6.74691256e-04]
 [  9.99662312e-01   3.37687728e-04]
 ..., 
 [  9.98850489e-01   1.14951136e-03]
 [  2.27691010e-02   9.77230899e-01]
 [  4.73644521e-01   5.26355479e-01]]

In [39]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])


ROC AUC 0.971760255012

Predictions of classes


In [40]:
tn.predict(test_data)


Out[40]:
array([0, 0, 0, ..., 0, 1, 1])

Neurolab


In [41]:
from rep.estimators import NeurolabClassifier
print NeurolabClassifier.__doc__


Classifier from neurolab library. 

    Parameters:
    -----------
    :param features: features used in training
    :type features: list[str] or None
    :param list[int] layers: sequence, number of units inside each **hidden** layer.
    :param string net_type: type of network
        One of 'feed-forward', 'single-layer', 'competing-layer', 'learning-vector',
        'elman-recurrent', 'hopfield-recurrent', 'hemming-recurrent'
    :param initf: layer initializers
    :type initf: anything implementing call(layer), e.g. nl.init.* or list[nl.init.*] of shape [n_layers]
    :param trainf: net train function, default value depends on type of network
    :param scaler: transformer to apply to the input objects
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param random_state: ignored, added for uniformity.
    :param dict kwargs: additional arguments to net __init__, varies with different net_types

    .. seealso:: https://pythonhosted.org/neurolab/lib.html for supported train functions and their parameters.
    

Let's train a network using the Rprop algorithm.


In [42]:
import neurolab
nl = NeurolabClassifier(features=variables, layers=[10], epochs=40, trainf=neurolab.train.train_rprop)
nl.fit(train_data, train_labels)
print('training complete')


The maximum number of train epochs is reached
training complete

After training the neural network, you can still improve it by calling partial_fit on other data:

nl.partial_fit(new_train_data, new_train_labels)

Predict probabilities and estimate quality


In [43]:
# predict probabilities for each class
prob = nl.predict_proba(test_data)
print prob


[[ 0.85450056  0.14549944]
 [ 0.84607857  0.15392143]
 [ 0.86431641  0.13568359]
 ..., 
 [ 0.83337519  0.16662481]
 [ 0.77451208  0.22548792]
 [ 0.72305126  0.27694874]]

In [44]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])


ROC AUC 0.798553242002

In [45]:
# predict labels
nl.predict(test_data)


Out[45]:
array([0, 0, 0, ..., 0, 0, 0])

Pybrain


In [46]:
from rep.estimators import PyBrainClassifier
print PyBrainClassifier.__doc__


Implements classification from PyBrain library 

    Parameters:
    -----------
    :param features: features used in training.
    :type features: list[str] or None
    :param scaler: transformer to apply to the input objects
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param bool use_rprop: flag to indicate whether we should use Rprop or SGD trainer
    :param bool verbose: print train/validation errors.
    :param random_state: ignored parameter, pybrain training isn't reproducible

    **Net parameters:**

    :param list[int] layers: indicate how many neurons in each hidden(!) layer; default is 1 hidden layer with 10 neurons
    :param list[str] hiddenclass: classes of the hidden layers; default is 'SigmoidLayer'
    :param dict params: other net parameters:
        bias and outputbias (boolean) flags to indicate whether the network should have the corresponding biases,
        both default to True;
        peepholes (boolean);
        recurrent (boolean) if the `recurrent` flag is set, a :class:`RecurrentNetwork` will be created,
        otherwise a :class:`FeedForwardNetwork`

    **Gradient descent trainer parameters:**

    :param float learningrate: gives the ratio of which parameters are changed into the direction of the gradient
    :param float lrdecay: the learning rate decreases by lrdecay, which is used to multiply the learning rate after each training step
    :param float momentum: the ratio by which the gradient of the last timestep is used
    :param boolean batchlearning: if set, the parameters are updated only at the end of each epoch. Default is False
    :param float weightdecay: corresponds to the weightdecay rate, where 0 is no weight decay at all

    **Rprop trainer parameters:**

    :param float etaminus: factor by which step width is decreased when overstepping (0.5)
    :param float etaplus: factor by which step width is increased when following gradient (1.2)
    :param float delta: step width for each weight
    :param float deltamin: minimum step width (1e-6)
    :param float deltamax: maximum step width (5.0)
    :param float delta0: initial step width (0.1)

    **Training termination parameters**

    :param int epochs: number of iterations of training; if < 0 then classifier trains until convergence
    :param int max_epochs: if is given, at most that many epochs are trained
    :param int continue_epochs: each time validation error decreases, try for continue_epochs epochs to find a better one
    :param float validation_proportion: the ratio of the dataset that is used for the validation dataset

    .. note::

        Details about parameters: http://pybrain.org/docs/
    

In [47]:
pb = PyBrainClassifier(features=variables, layers=[10, 2], hiddenclass=['TanhLayer', 'SigmoidLayer'])
pb.fit(train_data, train_labels)
print('training complete')


training complete

Predict probabilities and estimate quality

Again, we could continue training on a new dataset:

pb.partial_fit(new_train_data, new_train_labels)

In [48]:
prob = pb.predict_proba(test_data)
print 'ROC AUC:', roc_auc_score(test_labels, prob[:, 1])


ROC AUC: 0.955048270009

Predict labels


In [49]:
pb.predict(test_data)


Out[49]:
array([0, 0, 0, ..., 0, 1, 0])

Scaling of features

Initial prescaling of features is frequently crucial for getting good results from neural networks.

By default, all the networks use StandardScaler from sklearn, but you can use any other transformer, e.g. MinMaxScaler or a custom one, by passing it as the scaler parameter. All the networks support the scaler parameter in the same way.


In [50]:
from sklearn.preprocessing import MinMaxScaler
# will use StandardScaler
NeurolabClassifier(scaler='standard')
# will use MinMaxScaler
NeurolabClassifier(scaler=MinMaxScaler())
# will not use any pretransformation of features
NeurolabClassifier(scaler=False)


Out[50]:
NeurolabClassifier(initf=<function init_rand at 0x111f242a8>, layers=[10],
          net_type='feed-forward', random_state=None, scaler=False,
          trainf=None)

Advantages of the common interface

Let's build an ensemble of neural networks. This will be done with the bagging meta-algorithm.

Bagging over the Theanets classifier (the same can be done with any of the neural networks).

In practice, one needs many networks to obtain predictions better than those of a single network.


In [51]:
from sklearn.ensemble import BaggingClassifier

base_tn = TheanetsClassifier(layers=[20], trainers=[{'min_improvement': 0.01}])
bagging_tn = BaggingClassifier(base_estimator=base_tn, n_estimators=3)
bagging_tn.fit(train_data[variables], train_labels)
print('training complete')


training complete

In [52]:
prob = bagging_tn.predict_proba(test_data[variables])
print 'AUC', roc_auc_score(test_labels, prob[:, 1])


AUC 0.967326443313

Other advantages of the common interface

There are many things you can do with neural networks now:

  • cloning
  • getting/setting parameters as dictionaries
  • performing grid search over sizes of hidden layers and other parameters (see the sketch below)
  • building pipelines (sklearn.pipeline)
  • using hierarchical training, training on subsets
  • passing over the internet / training classifiers on other machines / distributed learning of ensembles

And you can replace classifiers at any moment.
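
For instance, here is a minimal sketch of cloning, parameter access, and grid search using sklearn's tools, assuming the objects defined above (variables, train_data, train_labels). The import path sklearn.grid_search matches sklearn versions contemporary with this notebook (newer versions use sklearn.model_selection), and the parameter values in param_grid are illustrative only.

from sklearn.base import clone
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer sklearn

base = TheanetsClassifier(layers=[20], trainers=[{'min_improvement': 0.1}])

# cloning and parameter access work as for any sklearn estimator
base_copy = clone(base)
print(base.get_params()['layers'])

# grid search over the hidden-layer configuration
grid = GridSearchCV(base, param_grid={'layers': [[10], [20], [10, 10]]},
                    scoring='roc_auc', cv=3)
grid.fit(train_data[variables], train_labels)
print(grid.best_params_)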