This notebook demonstrates the neural network (NN) classifiers provided by the Reproducible experiment platform (REP) package.
REP contains wrappers for the following NN libraries: Theanets, Neurolab and PyBrain.
Most of this is done in the same way as for other classifiers (see notebook 01-howto-Classifiers.ipynb)
In [27]:
!cd toy_datasets; wget -O MiniBooNE_PID.txt -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00199/MiniBooNE_PID.txt
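If wget is not available, the same file can be fetched directly from Python. A minimal sketch using only the standard library (urlretrieve; the no-clobber check mimics wget's -nc flag):
import os
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00199/MiniBooNE_PID.txt'
target = 'toy_datasets/MiniBooNE_PID.txt'
if not os.path.isdir('toy_datasets'):
    os.makedirs('toy_datasets')
if not os.path.exists(target):  # no clobber: skip the download if the file is already there
    urlretrieve(url, target)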
In [28]:
import numpy, pandas
from rep.utils import train_test_split
from sklearn.metrics import roc_auc_score
# the first line of the file stores the numbers of signal and background events;
# all remaining lines are the feature values
data = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep=r'\s*', skiprows=[0], header=None, engine='python')
labels = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep=' ', nrows=1, header=None)
# build the target: signal events come first, then background events
labels = [1] * labels[1].values[0] + [0] * labels[2].values[0]
data.columns = ['feature_{}'.format(key) for key in data.columns]
In [29]:
data[:5]
Out[29]:
In [30]:
# Get train and test data
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, train_size=0.5)
All nets inherit from sklearn.BaseEstimator and have the same interface as the other wrappers in REP (see details in 01-howto-Classifiers); a small sketch of this sklearn compatibility follows below.
All of these net libraries support additional fitting (using the partial_fit method) and don't support staged prediction methods or sample weights.
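Since the wrappers are sklearn estimators, generic sklearn utilities work on them out of the box. A minimal sketch (the layer size here is an arbitrary choice, not a recommendation):
from sklearn.base import clone
from rep.estimators import TheanetsClassifier

base = TheanetsClassifier(layers=[20])  # arbitrary configuration for the sketch
copy = clone(base)  # an unfitted copy with identical parameters
print(copy.get_params())  # the usual sklearn parameter dictionary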
In [31]:
variables = list(data.columns[:25])  # use only the first 25 features
In [32]:
from rep.estimators import TheanetsClassifier
print(TheanetsClassifier.__doc__)
In [33]:
# one hidden layer of 20 neurons, trained with Nesterov's accelerated gradient
tn = TheanetsClassifier(features=variables, layers=[20],
                        trainers=[{'optimize': 'nag', 'learning_rate': 0.1}])
tn.fit(train_data, train_labels)
Out[33]:
In [34]:
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print(prob)
In [35]:
print('ROC AUC:', roc_auc_score(test_labels, prob[:, 1]))
In [36]:
# two hidden layers, trained with resilient backpropagation (rprop)
tn = TheanetsClassifier(features=variables, layers=[10, 10],
                        trainers=[{'optimize': 'rprop'}])
tn.fit(train_data, train_labels)
print('training complete')
In [37]:
# continue training the same network with another optimizer
tn.partial_fit(train_data, train_labels, optimize='adadelta')
Out[37]:
In [38]:
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print(prob)
In [39]:
print('ROC AUC:', roc_auc_score(test_labels, prob[:, 1]))
In [40]:
# predict labels
tn.predict(test_data)
Out[40]:
In [41]:
from rep.estimators import NeurolabClassifier
print(NeurolabClassifier.__doc__)
In [42]:
import neurolab
# one hidden layer of 10 neurons, trained with resilient backpropagation
nl = NeurolabClassifier(features=variables, layers=[10], epochs=40, trainf=neurolab.train.train_rprop)
nl.fit(train_data, train_labels)
print('training complete')
In [43]:
# predict probabilities for each class
prob = nl.predict_proba(test_data)
print(prob)
In [44]:
print('ROC AUC:', roc_auc_score(test_labels, prob[:, 1]))
In [45]:
# predict labels
nl.predict(test_data)
Out[45]:
In [46]:
from rep.estimators import PyBrainClassifier
print(PyBrainClassifier.__doc__)
In [47]:
# two hidden layers (10 and 2 neurons) with different activation functions
pb = PyBrainClassifier(features=variables, layers=[10, 2], hiddenclass=['TanhLayer', 'SigmoidLayer'])
pb.fit(train_data, train_labels)
print('training complete')
In [48]:
prob = pb.predict_proba(test_data)
print('ROC AUC:', roc_auc_score(test_labels, prob[:, 1]))
In [49]:
pb.predict(test_data)
Out[49]:
Initial prescaling of features is frequently crucial for neural networks to produce sensible results. By default all the networks use StandardScaler from sklearn, but you can pass any other transformer (say, MinMaxScaler, or a self-written one) via the scaler parameter. All the networks support the scaler parameter in the same way.
In [50]:
from sklearn.preprocessing import MinMaxScaler
# will use StandardScaler
NeurolabClassifier(scaler='standard')
# will use MinMaxScaler
NeurolabClassifier(scaler=MinMaxScaler())
# will not use any pretransformation of features
NeurolabClassifier(scaler=False)
Out[50]:
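To actually train with a non-default scaler, pass it like any other constructor argument. A minimal sketch reusing the Neurolab parameters from the example above:
# the same Neurolab net as above, but with min-max feature scaling
nl_minmax = NeurolabClassifier(features=variables, layers=[10], epochs=40,
                               trainf=neurolab.train.train_rprop, scaler=MinMaxScaler())
nl_minmax.fit(train_data, train_labels)
print('ROC AUC:', roc_auc_score(test_labels, nl_minmax.predict_proba(test_data)[:, 1]))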
In [51]:
from sklearn.ensemble import BaggingClassifier
# since the REP wrappers are sklearn-compatible, sklearn ensembling works on top of them
base_tn = TheanetsClassifier(layers=[20], trainers=[{'min_improvement': 0.01}])
bagging_tn = BaggingClassifier(base_estimator=base_tn, n_estimators=3)
bagging_tn.fit(train_data[variables], train_labels)
print('training complete')
In [52]:
prob = bagging_tn.predict_proba(test_data[variables])
print('ROC AUC:', roc_auc_score(test_labels, prob[:, 1]))
There are many things you can do with neural networks now:
- tune them with grid_search
- play with the sizes of hidden layers and other parameters
- combine them in pipelines (sklearn.pipeline, a sketch follows below)

And you can replace classifiers at any moment.
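As an illustration of the pipeline point, here is a minimal sketch chaining a PCA pre-step with a Theanets net via sklearn.pipeline. It assumes the wrappers accept plain numpy arrays, as sklearn-compatible estimators do; the component count and layer size are arbitrary choices:
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA

pipe = Pipeline([('pca', PCA(n_components=10)),            # arbitrary choice
                 ('net', TheanetsClassifier(layers=[20]))])  # arbitrary choice
pipe.fit(train_data[variables], train_labels)
prob = pipe.predict_proba(test_data[variables])
print('ROC AUC:', roc_auc_score(test_labels, prob[:, 1]))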