This example illustrates how to train a kitchen neural network on the standard 20 newsgroups dataset, in a similar way to sklearn's tutorial: http://scikit-learn.org/stable/datasets/twenty_newsgroups.html
In [1]:
import kitchen
import lasagne
The network architecture is specified using a Python class definition. kitchen uses multiple inheritance to combine a generic network structure (kitchen.Network), a training algorithm (kitchen.SGDNesterovMomentum), an optimization criterion (kitchen.CategoricalCrossentropy) and, optionally, a regularization (kitchen.L2Regularization).
The core method for specifying the network architecture is the create_layers() method. Its header looks like:
def create_layers(self, X_dim, y_dim, random_state):
where:
- self is self :-)
- X_dim is the dimensionality of an input feature vector
- y_dim is the dimensionality of an output target vector
- random_state is a random state used to initialize the network parameters
The goal of the create_layers() method is to construct the layers of the neural network and return the tuple (input_layer, output_layer).
To create the input layer, instantiate the lasagne.layers.InputLayer class and use the X_dim parameter to determine the size of an input vector.
The output layer is an instance of any lasagne layer. To train the network correctly, it is necessary to match the optimization criterion and the output layer activation function:
- BinaryCrossentropy is matched with lasagne.nonlinearities.sigmoid and used for binary classification problems
- CategoricalCrossentropy is matched with lasagne.nonlinearities.softmax and used for multi-class classification problems
Kitchen reimplements the lasagne initializers to support sklearn-like initialization via the random_state parameter. Just instantiate the kitchen.init classes in the create_layers() method instead of lasagne.init and pass the instances to the layer constructors.
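For illustration, a binary classification network would swap in the BinaryCrossentropy criterion and a sigmoid output layer. The sketch below is not part of this tutorial and only shows the pattern; the class name BinaryNetwork is made up here, and it assumes kitchen.BinaryCrossentropy is the criterion mixin mentioned above:

class BinaryNetwork(kitchen.Network, kitchen.SGDNesterovMomentum,
                    kitchen.BinaryCrossentropy, kitchen.L2Regularization):
    def create_layers(self, X_dim, y_dim, random_state):
        # sklearn-style initializers seeded with random_state (see above)
        initW = kitchen.init.GlorotUniform(random_state=random_state)
        initb = kitchen.init.Uniform(random_state=random_state)
        input = lasagne.layers.InputLayer(shape=(None, X_dim))
        # sigmoid output matched with the binary cross-entropy criterion;
        # for a single binary target, y_dim is typically 1
        output = lasagne.layers.DenseLayer(input,
                                           num_units=y_dim,
                                           nonlinearity=lasagne.nonlinearities.sigmoid,
                                           W=initW, b=initb)
        return input, output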
Network1
The example below creates a network with one hidden layer of 500 neurons and rectified linear (ReLU) activations.
The output layer uses the softmax activation function, which is suitable for multi-class classification.
In [2]:
class Network1(kitchen.Network, kitchen.SGDNesterovMomentum, kitchen.CategoricalCrossentropy, kitchen.L2Regularization):
def create_layers(self, X_dim, y_dim, random_state):
initW = kitchen.init.GlorotUniform(random_state=random_state, gain='relu')
initb = kitchen.init.Uniform(random_state=random_state)
input = lasagne.layers.InputLayer(shape=(None, X_dim))
hidden = lasagne.layers.DenseLayer(input,
num_units=500,
nonlinearity=lasagne.nonlinearities.rectify,
W=initW, b=initb)
output = lasagne.layers.DenseLayer(hidden,
num_units=y_dim,
nonlinearity=lasagne.nonlinearities.softmax,
W=initW, b=initb)
return input, output
Kitchen is focused on integration with sklearn: kitchen neural networks are sklearn classifiers and can be used in GridSearchCV and/or Pipeline (a short sketch follows the data loading below).
In this example, sklearn is used to fetch the 20 newsgroups dataset and transform it with the TfidfVectorizer into TF-IDF feature vectors.
The training data consists of the pair X and y, the test data of test_X and test_y.
In [3]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, roc_auc_score
newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')
vectorizer = TfidfVectorizer(min_df=10)
X = vectorizer.fit_transform(newsgroups_train.data).toarray()
y = newsgroups_train.target
test_X = vectorizer.transform(newsgroups_test.data).toarray()
test_y = newsgroups_test.target
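Because the network behaves like an sklearn classifier, it can also be dropped into sklearn's model-selection utilities. Below is a minimal sketch, assuming Network1 exposes its constructor parameters through sklearn's usual get_params()/set_params() conventions; the parameter grid is illustrative only and the search is expensive, so the fit call is left commented out:

from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older releases

# illustrative grid over two of the constructor parameters described below
param_grid = {'learning_rate': [0.01, 0.1],
              'alpha': [1e-4, 1e-3]}
search = GridSearchCV(Network1(batch_size=128, n_epochs=5, random_state=42),
                      param_grid, cv=3)
# search.fit(X, y)  # uncomment to run the search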
It is useful to monitor the training process. In this case we define two callbacks:
- epoch_callback, which is called after each epoch; it prints timing information as well as the training (and test) loss and accuracy.
- batch_callback, which is called after each mini-batch; it just prints the mini-batch number as visual feedback during training.
In [4]:
import sys

def epoch_callback(stats):
    # evaluate the current model on the test and training data
    avg_test_loss = clsf.loss(test_X, test_y)
    pred_y = clsf.predict(test_X)
    acc = accuracy_score(test_y, pred_y)
    pred_y = clsf.predict(X)
    acc_train = accuracy_score(y, pred_y)
    print("")
    print("Epoch {}, took {}".format(stats['epoch'], stats['t_epoch']))
    print("  training loss:    \t{:.6f}".format(stats['avg_train_loss']))
    print("  training accuracy:\t{:.5f}".format(acc_train))
    print("  test loss:        \t{:.6f}".format(avg_test_loss))
    print("  test accuracy:    \t{:.5f}".format(acc))

def batch_callback(stats):
    # print the mini-batch number as lightweight progress feedback
    print(stats['batch_num'], end=' ')
    sys.stdout.flush()
To train the network, instantiate the Network1 class just like any other classifier from sklearn. You can specify additional parameters, e.g.:
- batch_size - the size of the mini-batch in the SGD algorithm
- learning_rate - the size of an update step
- alpha - the weight of the regularization term
- n_epochs - the number of training epochs
After creating an instance of Network1, just call the fit() method with your training data.
In [5]:
clsf = Network1(batch_size=128,
random_state=42,
learning_rate=0.1,
alpha=0.0001,
n_epochs=10,
epoch_callback=epoch_callback,
batch_callback=batch_callback)
clsf.fit(X, y)
In [6]:
pred_y = clsf.predict(test_X)
acc = accuracy_score(test_y, pred_y)
prob_y = clsf.predict_proba(test_X)
print("Test accuracy: {:.5f}".format(acc))
print("Test ROC AUC for class:")
for idx, cls in enumerate(clsf.classes_):
auc = roc_auc_score(test_y==cls, prob_y[:, idx])
print(" {}: {:.5f}".format(cls, auc))