About

This notebook demonstrates neural network (NN) classifiers provided by the Reproducible Experiment Platform (REP) package.
REP contains wrappers for the following NN libraries:

  • theanets
  • neurolab
  • pybrain

In this notebook we show how to:

  • train a classifier
  • get predictions
  • measure quality
  • do pretraining and partial fitting
  • combine classifiers using meta-algorithms

Most of this is done in the same way as for other classifiers (see notebook 01-howto-Classifiers.ipynb).

The parameters selected here are deliberately chosen to make training very fast; they are far from optimal.

Loading data

Download the particle identification dataset from the UCI repository.


In [1]:
!cd toy_datasets; wget -O MiniBooNE_PID.txt -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00199/MiniBooNE_PID.txt


File `MiniBooNE_PID.txt' already there; not retrieving.

In [2]:
import numpy, pandas
from rep.utils import train_test_split
from sklearn.metrics import roc_auc_score

# the first line of the file holds the numbers of signal and background events;
# the remaining lines contain the 50 feature values of each event
data = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep='\s*', skiprows=[0], header=None, engine='python')
labels = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep=' ', nrows=1, header=None)
# signal events come first in the file, followed by background events
labels = [1] * labels[1].values[0] + [0] * labels[2].values[0]
data.columns = ['feature_{}'.format(key) for key in data.columns]

In [3]:
len(data)


Out[3]:
130064

First rows of data


In [4]:
data[:5]


Out[4]:
feature_0 feature_1 feature_2 feature_3 feature_4 feature_5 feature_6 feature_7 feature_8 feature_9 ... feature_40 feature_41 feature_42 feature_43 feature_44 feature_45 feature_46 feature_47 feature_48 feature_49
0 2.59413 0.468803 20.6916 0.322648 0.009682 0.374393 0.803479 0.896592 3.59665 0.249282 ... 101.174 -31.3730 0.442259 5.86453 0.000000 0.090519 0.176909 0.457585 0.071769 0.245996
1 3.86388 0.645781 18.1375 0.233529 0.030733 0.361239 1.069740 0.878714 3.59243 0.200793 ... 186.516 45.9597 -0.478507 6.11126 0.001182 0.091800 -0.465572 0.935523 0.333613 0.230621
2 3.38584 1.197140 36.0807 0.200866 0.017341 0.260841 1.108950 0.884405 3.43159 0.177167 ... 129.931 -11.5608 -0.297008 8.27204 0.003854 0.141721 -0.210559 1.013450 0.255512 0.180901
3 4.28524 0.510155 674.2010 0.281923 0.009174 0.000000 0.998822 0.823390 3.16382 0.171678 ... 163.978 -18.4586 0.453886 2.48112 0.000000 0.180938 0.407968 4.341270 0.473081 0.258990
4 5.93662 0.832993 59.8796 0.232853 0.025066 0.233556 1.370040 0.787424 3.66546 0.174862 ... 229.555 42.9600 -0.975752 2.66109 0.000000 0.170836 -0.814403 4.679490 1.924990 0.253893

5 rows × 50 columns

Splitting into train and test


In [5]:
# Get train and test data
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, train_size=0.25)

Neural nets

All networks inherit from sklearn.BaseEstimator and have the same interface as the other wrappers in REP (see 01-howto-Classifiers for details).

The neural network libraries support:

  • classification
  • multi-classification
  • regression
  • multi-target regression (see the regression sketch below)
  • additional fitting (using partial_fit method)

and don't support:

  • staged prediction methods
  • weights for data
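
For instance, the regression wrappers follow exactly the same interface. Below is a minimal sketch (it assumes rep.estimators.TheanetsRegressor, which REP ships alongside the classifier; the choice of feature_10 as a toy regression target, the toy_features name and the quick trainer settings are illustrative only):

from rep.estimators import TheanetsRegressor

# toy illustration: predict one feature column from the first ten features
toy_features = list(data.columns[:10])
reg = TheanetsRegressor(features=toy_features, layers=[5],
                        trainers=[{'optimize': 'nag', 'learning_rate': 0.1, 'min_improvement': 0.1}])
reg.fit(train_data, train_data['feature_10'])
predictions = reg.predict(test_data)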

Variables used in training


In [6]:
variables = list(data.columns[:15])

Theanets


In [7]:
from rep.estimators import TheanetsClassifier
print TheanetsClassifier.__doc__


Classifier from Theanets library. 

    Parameters:
    -----------
    :param features: list of features to train model
    :type features: None or list(str)
    :param layers: a sequence of values specifying the **hidden** layer configuration for the network.
        For more information please see 'Specifying layers' in theanets documentation:
        http://theanets.readthedocs.org/en/latest/creating.html#creating-specifying-layers
        Note that theanets "layers" parameter included input and output layers in the sequence as well.
    :type layers: sequence of int, tuple, dict
    :param int input_layer: size of the input layer. If equals -1, the size is taken from the training dataset
    :param int output_layer: size of the output layer. If equals -1, the size is taken from the training dataset
    :param str hidden_activation: the name of an activation function to use on hidden network layers by default
    :param str output_activation: the name of an activation function to use on the output layer by default
    :param float input_noise: standard deviation of desired noise to inject into input
    :param float hidden_noise: standard deviation of desired noise to inject into hidden unit activation output
    :param input_dropouts: proportion of input units to randomly set to 0
    :type input_dropouts: float in [0, 1]
    :param hidden_dropouts: proportion of hidden unit activations to randomly set to 0
    :type hidden_dropouts: float in [0, 1]
    :param decode_from: any of the hidden layers can be tapped at the output. Just specify a value greater than
        1 to tap the last N hidden layers. The default is 1, which decodes from just the last layer
    :type decode_from: positive int
    :param scaler: scaler used to transform data. If False, scaling will not be used
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param trainers: parameters to specify training algorithm(s)
        example: [{'optimize': sgd, 'momentum': 0.2}, {'optimize': 'nag'}]
    :type trainers: list[dict] or None
    :param int random_state: random seed


    For more information on available trainers and their parameters, see this page
    http://theanets.readthedocs.org/en/latest/training.html
    

Simple training


In [8]:
tn = TheanetsClassifier(features=variables, layers=[7], 
                        trainers=[{'optimize': 'nag', 'learning_rate': 0.1, 'min_improvement': 0.1}])

tn.fit(train_data, train_labels)
pass

Predicting probabilities, measuring the quality


In [9]:
prob = tn.predict_proba(test_data)
print prob


[[ 0.26320391  0.73679609]
 [ 0.81044349  0.18955651]
 [ 0.40544071  0.59455929]
 ..., 
 [ 0.90087309  0.09912691]
 [ 0.86900052  0.13099948]
 [ 0.90821799  0.09178201]]

In [10]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])


ROC AUC 0.843440299528

Theanets multistage training

In some cases we need to continue training: for example, new data has arrived, or the current trainer is no longer efficient.

For this purpose there is the partial_fit method, which lets you continue training with a different trainer or on different data.


In [11]:
tn = TheanetsClassifier(features=variables, layers=[10, 10], 
                        trainers=[{'algo': 'rprop', 'min_improvement': 0.1}])

tn.fit(train_data, train_labels)
print('training complete')


training complete

Second stage of fitting


In [12]:
tn.partial_fit(train_data, train_labels, **{'algo': 'adagrad', 'min_improvement': 0.1})
print('training complete')


training complete

In [13]:
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print prob


[[ 0.24486897  0.75513103]
 [ 0.78883091  0.21116909]
 [ 0.47429026  0.52570974]
 ..., 
 [ 0.90560846  0.09439154]
 [ 0.88662219  0.11337781]
 [ 0.9052761   0.0947239 ]]

In [14]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])


ROC AUC 0.844713853906

Predictions of classes


In [15]:
tn.predict(test_data)


Out[15]:
array([1, 0, 1, ..., 0, 0, 0])

Neurolab


In [16]:
from rep.estimators import NeurolabClassifier
print NeurolabClassifier.__doc__


Classifier from neurolab library. 

    Parameters:
    -----------
    :param features: features used in training
    :type features: list[str] or None
    :param list[int] layers: sequence, number of units inside each **hidden** layer.
    :param string net_type: type of network
        One of 'feed-forward', 'single-layer', 'competing-layer', 'learning-vector',
        'elman-recurrent', 'hopfield-recurrent', 'hemming-recurrent'
    :param initf: layer initializers
    :type initf: anything implementing call(layer), e.g. nl.init.* or list[nl.init.*] of shape [n_layers]
    :param trainf: net train function, default value depends on type of network
    :param scaler: transformer to apply to the input objects
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param random_state: ignored, added for uniformity.
    :param dict kwargs: additional arguments to net __init__, varies with different net_types

    .. seealso:: https://pythonhosted.org/neurolab/lib.html for supported train functions and their parameters.
    

Let's train a network using the Rprop algorithm


In [17]:
import neurolab
nl = NeurolabClassifier(features=variables, layers=[10], epochs=5, trainf=neurolab.train.train_rprop)
nl.fit(train_data, train_labels)
print('training complete')


The maximum number of train epochs is reached
training complete

After training the neural network, you can still improve it by using partial_fit on other data:

nl.partial_fit(new_train_data, new_train_labels)

Predict probabilities and estimate quality


In [18]:
# predict probabilities for each class
prob = nl.predict_proba(test_data)
print prob


[[ 0.72909063  0.27090937]
 [ 0.73301084  0.26698916]
 [ 0.7261278   0.2738722 ]
 ..., 
 [ 0.72833376  0.27166624]
 [ 0.72881829  0.27118171]
 [ 0.72708209  0.27291791]]

In [19]:
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])


ROC AUC 0.471191281317

In [20]:
# predict labels
nl.predict(test_data)


Out[20]:
array([0, 0, 0, ..., 0, 0, 0])

Pybrain


In [21]:
from rep.estimators import PyBrainClassifier
print PyBrainClassifier.__doc__


Implements classification from PyBrain library 

    Parameters:
    -----------
    :param features: features used in training.
    :type features: list[str] or None
    :param scaler: transformer to apply to the input objects
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param bool use_rprop: flag to indicate whether we should use Rprop or SGD trainer
    :param bool verbose: print train/validation errors.
    :param random_state: ignored parameter, pybrain training isn't reproducible

    **Net parameters:**

    :param list[int] layers: indicate how many neurons in each hidden(!) layer; default is 1 hidden layer with 10 neurons
    :param list[str] hiddenclass: classes of the hidden layers; default is 'SigmoidLayer'
    :param dict params: other net parameters:
        bias and outputbias (boolean) flags to indicate whether the network should have the corresponding biases,
        both default to True;
        peepholes (boolean);
        recurrent (boolean) if the `recurrent` flag is set, a :class:`RecurrentNetwork` will be created,
        otherwise a :class:`FeedForwardNetwork`

    **Gradient descent trainer parameters:**

    :param float learningrate: gives the ratio of which parameters are changed into the direction of the gradient
    :param float lrdecay: the learning rate decreases by lrdecay, which is used to multiply the learning rate after each training step
    :param float momentum: the ratio by which the gradient of the last timestep is used
    :param boolean batchlearning: if set, the parameters are updated only at the end of each epoch. Default is False
    :param float weightdecay: corresponds to the weightdecay rate, where 0 is no weight decay at all

    **Rprop trainer parameters:**

    :param float etaminus: factor by which step width is decreased when overstepping (0.5)
    :param float etaplus: factor by which step width is increased when following gradient (1.2)
    :param float delta: step width for each weight
    :param float deltamin: minimum step width (1e-6)
    :param float deltamax: maximum step width (5.0)
    :param float delta0: initial step width (0.1)

    **Training termination parameters**

    :param int epochs: number of iterations of training; if < 0 then classifier trains until convergence
    :param int max_epochs: if is given, at most that many epochs are trained
    :param int continue_epochs: each time validation error decreases, try for continue_epochs epochs to find a better one
    :param float validation_proportion: the ratio of the dataset that is used for the validation dataset

    .. note::

        Details about parameters: http://pybrain.org/docs/
    

In [22]:
pb = PyBrainClassifier(features=variables, layers=[5], epochs=2, hiddenclass=['TanhLayer'])
pb.fit(train_data, train_labels)
print('training complete')


training complete

Predict probabilities and estimate quality

Again, we could continue training on a new dataset:

pb.partial_fit(new_train_data, new_train_labels)

In [23]:
prob = pb.predict_proba(test_data)
print 'ROC AUC:', roc_auc_score(test_labels, prob[:, 1])


ROC AUC: 0.856107824713

Predict labels


In [24]:
pb.predict(test_data)


Out[24]:
array([1, 0, 1, ..., 0, 0, 0])

Scaling of features

Initial prescaling of features is frequently crucial for obtaining reasonable results with neural networks.

By default, all the networks use StandardScaler from sklearn, but you can use any other transformer (e.g. MinMaxScaler or one of your own) by passing an appropriate value as scaler. All the networks support the scaler parameter in the same way.


In [25]:
from sklearn.preprocessing import MinMaxScaler
# will use StandardScaler
NeurolabClassifier(scaler='standard')
# will use MinMaxScaler
NeurolabClassifier(scaler=MinMaxScaler())
# will not use any pretransformation of features
NeurolabClassifier(scaler=False)


Out[25]:
NeurolabClassifier(initf=<function init_rand at 0x112431f50>, layers=[10],
          net_type='feed-forward', random_state=None, scaler=False,
          trainf=None)

Advantages of common interface

Let's build an ensemble of neural networks using the bagging meta-algorithm.

Bagging over Theanets classifier

It is well known that the classification quality of a single neural network can be significantly improved by ensembling.

In the simplest case, we average the predictions of several neural networks. Bagging trains each classifier on a random subset of the training data and thus achieves higher quality and more stable predictions.

You can try the same trick with any other network, not only Theanets.


In [26]:
# uncomment the code below to try it out; this may take a long time

# from sklearn.ensemble import BaggingClassifier
# base_tn = TheanetsClassifier(layers=[10, 7], trainers=[{'algo': 'adadelta'}])
# bagging_tn = BaggingClassifier(base_estimator=base_tn, n_estimators=10)
# bagging_tn.fit(train_data[variables], train_labels)
# prob = bagging_tn.predict_proba(test_data[variables])
# print 'AUC', roc_auc_score(test_labels, prob[:, 1])

Other advantages of common interface

There are many things you can do with neural networks now:

  • cloning
  • getting / setting parameters as dictionaries
  • using grid search, playing with the sizes of hidden layers and other parameters
  • building pipelines (sklearn.pipeline)
  • using hierarchical training and training on subsets
  • passing them over the internet / training classifiers on other machines / distributed learning of ensembles

And you can replace classifiers at any moment. A short sketch of cloning, parameter handling and grid search is given below.
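
A minimal sketch of some of these operations, reusing the Theanets settings from above (the names base, another, params and grid are only illustrative, and the parameter values are not tuned; GridSearchCV lives in sklearn.grid_search in old scikit-learn and in sklearn.model_selection in newer releases):

from sklearn.base import clone
from sklearn.grid_search import GridSearchCV   # sklearn.model_selection.GridSearchCV in newer scikit-learn

base = TheanetsClassifier(features=variables, layers=[7],
                          trainers=[{'optimize': 'nag', 'learning_rate': 0.1, 'min_improvement': 0.1}])
another = clone(base)            # an unfitted copy with identical parameters
params = base.get_params()       # parameters as a dictionary
another.set_params(layers=[5])   # change the hidden layer size before training

# the same objects plug into sklearn tools, e.g. grid search over hidden layer sizes
grid = GridSearchCV(clone(base), param_grid={'layers': [[5], [10]]}, scoring='roc_auc', cv=2)
# grid.fit(train_data, train_labels)   # uncomment to run; this retrains several networks and takes a while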

See also

Sklearn-compatible libraries you can use within REP:

  1. hep_ml.nnet neural networks are sklearn-compatible (see the sketch below).
  2. nolearn wrappers are expected to be sklearn-compatible as well.
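
As a rough illustration of plugging such a library into REP (a sketch only: it assumes hep_ml is installed and that hep_ml.nnet.MLPClassifier and REP's SklearnClassifier wrapper accept the parameters shown here):

from hep_ml.nnet import MLPClassifier
from rep.estimators import SklearnClassifier

# wrap an sklearn-compatible network so it behaves like the other REP estimators
hep_nn = SklearnClassifier(MLPClassifier(layers=(5,)), features=variables)
hep_nn.fit(train_data, train_labels)
print 'ROC AUC', roc_auc_score(test_labels, hep_nn.predict_proba(test_data)[:, 1])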