Kitchen and 20 newsgroups dataset

This example illustrates how to train a kitchen neural network on the standard 20 newsgroups dataset, in a similar way to sklearn's tutorial: http://scikit-learn.org/stable/datasets/twenty_newsgroups.html


In [1]:
import kitchen
import lasagne

Network architecture

The network architecture is specified using a Python class definition. kitchen uses multiple inheritance to combine a generic network structure (kitchen.Network), a training algorithm (kitchen.SGDNesterovMomentum), an optimization criterion (kitchen.CategoricalCrossentropy) and, optionally, a regularization scheme (kitchen.L2Regularization).

create_layers()

The core method for specifying the network architecture is create_layers(). Its signature looks like:

def create_layers(self, X_dim, y_dim, random_state):

where:

  • self is self :-)
  • X_dim is the dimensionality of an input feature vector
  • y_dim is the dimensionality of an output target vector
  • random_state is a random state used to initialize network parameters

The goal of the create_layers() method is to construct the layers of the neural network and return the tuple (input_layer, output_layer).
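
A minimal sketch of the overall shape (the class name MyNetwork is hypothetical; the mixins are the ones listed above, and the two layers are discussed next):

class MyNetwork(kitchen.Network,
                kitchen.SGDNesterovMomentum,
                kitchen.CategoricalCrossentropy,
                kitchen.L2Regularization):
    def create_layers(self, X_dim, y_dim, random_state):
        input_layer = ...   # an InputLayer, see "Input layer" below
        output_layer = ...  # any lasagne layer, see "Output layer" below
        return input_layer, output_layer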

Input layer

To create the input layer, instantiate the lasagne.layers.InputLayer class and use the X_dim parameter to determine the size of the input vector.

Output layer

The output layer is an instance of any lasagne layer. To train the network correctly, the optimization criterion must match the output layer's activation function:

  • BinaryCrossentropy is matched with lasagne.nonlinearities.sigmoid and used for binary classification problems (a sketch follows this list)
  • CategoricalCrossentropy is matched with lasagne.nonlinearities.softmax and used for multi-class classification problems
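
For example, a binary classifier would pair the BinaryCrossentropy mixin with a sigmoid output layer. A hedged sketch (the class name BinaryNet is hypothetical; parameter initialization is left at lasagne's defaults here and is discussed next):

class BinaryNet(kitchen.Network,
                kitchen.SGDNesterovMomentum,
                kitchen.BinaryCrossentropy):
    def create_layers(self, X_dim, y_dim, random_state):
        input_layer = lasagne.layers.InputLayer(shape=(None, X_dim))
        output_layer = lasagne.layers.DenseLayer(
            input_layer,
            num_units=y_dim,  # the target dimensionality
            nonlinearity=lasagne.nonlinearities.sigmoid)
        return input_layer, output_layer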

Parameter initialization

Kitchen reimplements the lasagne initializers to support sklearn-like initialization via the random_state parameter. Simply instantiate kitchen.init classes in the create_layers() method instead of lasagne.init and pass the instances to the layer constructors.
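
A minimal sketch of the swap (initW and initb are the names also used in the example below):

# plain lasagne initializer - not seeded through an explicit random_state:
# W = lasagne.init.GlorotUniform(gain='relu')

# kitchen drop-in replacements - reproducible via random_state:
initW = kitchen.init.GlorotUniform(random_state=random_state, gain='relu')
initb = kitchen.init.Uniform(random_state=random_state)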

Example network - Network1

The example below creates a network with one hidden layer of 500 neurons with rectified linear (ReLU) activations.

The output layer uses the softmax activation function, which is suitable for multi-class classification.


In [2]:
class Network1(kitchen.Network, kitchen.SGDNesterovMomentum, kitchen.CategoricalCrossentropy, kitchen.L2Regularization):
    def create_layers(self, X_dim, y_dim, random_state):
        initW = kitchen.init.GlorotUniform(random_state=random_state, gain='relu')
        initb = kitchen.init.Uniform(random_state=random_state)

        input = lasagne.layers.InputLayer(shape=(None, X_dim))

        hidden = lasagne.layers.DenseLayer(input,
                                           num_units=500,
                                           nonlinearity=lasagne.nonlinearities.rectify,
                                           W=initW, b=initb)

        output = lasagne.layers.DenseLayer(hidden,
                                           num_units=y_dim,
                                           nonlinearity=lasagne.nonlinearities.softmax,
                                           W=initW, b=initb)

        return input, output

sklearn integration

Kitchen focuses on integration with sklearn: kitchen neural networks are sklearn classifiers and can be used in GridSearchCV and/or Pipeline.
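
For instance, a hedged sketch of a tuning run with GridSearchCV (sklearn >= 0.18 module path; it assumes Network1 supports sklearn's get_params()/set_params() cloning protocol, and uses the X and y built in the next cell):

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(Network1(batch_size=128, random_state=42, n_epochs=10),
                    param_grid={'alpha': [0.0001, 0.001],
                                'learning_rate': [0.01, 0.1]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)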

In this example, sklearn is used to fetch the 20 newsgroups dataset and transform it using TfidfVectorizer into TF-IDF feature vectors.

The training data consists of the pair X and y, the test data of test_X and test_y.


In [3]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, roc_auc_score


newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')

vectorizer = TfidfVectorizer(min_df=10)
X = vectorizer.fit_transform(newsgroups_train.data).toarray()
y = newsgroups_train.target

test_X = vectorizer.transform(newsgroups_test.data).toarray()
test_y = newsgroups_test.target

Epoch and batch callbacks

It is useful to monitor the training process. In this case, we define two callbacks:

  • epoch_callback, which is called after each epoch; it prints timing information as well as the training (and test) loss and accuracy.
  • batch_callback, which is called after each mini-batch; it just prints the mini-batch number as visual feedback during training.

In [4]:
import sys

def epoch_callback(stats):
    # uses the global clsf, which is bound before fit() runs (see In [5] below)
    avg_test_loss = clsf.loss(test_X, test_y)

    pred_y = clsf.predict(test_X)
    acc = accuracy_score(test_y, pred_y)

    pred_y = clsf.predict(X)
    acc_train = accuracy_score(y, pred_y)

    print("")
    print("Epoch {}, took {}".format(stats['epoch'], stats['t_epoch']))
    print("  training loss:    \t{:.6f}".format(stats['avg_train_loss']))
    print("  training accuracy:\t{:.5f}".format(acc_train))
    print("  test loss:        \t{:.6f}".format(avg_test_loss))
    print("  test accuracy:    \t{:.5f}".format(acc))

def batch_callback(stats):
    print(stats['batch_num'], end=' ')
    sys.stdout.flush()

Training the network

To train the network, instantiate the Network1 class just like any other classifier from sklearn. You can specify additional parameters, e.g.:

  • batch_size - the size of a mini-batch in the SGD algorithm
  • learning_rate - the size of an update step
  • alpha - the weight of the regularization term
  • n_epochs - the number of training epochs

After creating an instance of Network1, just call the fit() method and use your training data.


In [5]:
clsf = Network1(batch_size=128,
                random_state=42,
                learning_rate=0.1,
                alpha=0.0001,
                n_epochs=10,
                epoch_callback=epoch_callback,
                batch_callback=batch_callback)
clsf.fit(X, y)


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 1, took 0:00:22.879267
  training loss:    	2.899898
  training accuracy:	0.38616
  test loss:        	2.747076
  test accuracy:    	0.31121
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 2, took 0:00:20.962085
  training loss:    	2.377049
  training accuracy:	0.73732
  test loss:        	2.109553
  test accuracy:    	0.62454
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 3, took 0:00:21.979369
  training loss:    	1.567305
  training accuracy:	0.82977
  test loss:        	1.454442
  test accuracy:    	0.72557
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 4, took 0:00:19.573091
  training loss:    	0.975413
  training accuracy:	0.89791
  test loss:        	1.079327
  test accuracy:    	0.78306
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 5, took 0:00:21.657240
  training loss:    	0.657697
  training accuracy:	0.92107
  test loss:        	0.917733
  test accuracy:    	0.78266
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 6, took 0:00:21.731523
  training loss:    	0.483460
  training accuracy:	0.94564
  test loss:        	0.803713
  test accuracy:    	0.80550
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 7, took 0:00:21.414499
  training loss:    	0.371634
  training accuracy:	0.95952
  test loss:        	0.736851
  test accuracy:    	0.81506
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 8, took 0:00:21.804400
  training loss:    	0.297562
  training accuracy:	0.97066
  test loss:        	0.699920
  test accuracy:    	0.81532
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 9, took 0:00:23.867777
  training loss:    	0.243854
  training accuracy:	0.97879
  test loss:        	0.667521
  test accuracy:    	0.82236
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
Epoch 10, took 0:00:22.632328
  training loss:    	0.203605
  training accuracy:	0.98400
  test loss:        	0.663889
  test accuracy:    	0.81798

Using the classifier

Use the classifier in the same way as any other sklearn classifier. You can call predict() to predict the target classes, predict_proba() to predict class probabilities, or pickle the classifier (a sketch follows the output below).


In [6]:
pred_y = clsf.predict(test_X)
acc = accuracy_score(test_y, pred_y)

prob_y = clsf.predict_proba(test_X)

print("Test accuracy: {:.5f}".format(acc))
print("Test ROC AUC for class:")

for idx, cls in enumerate(clsf.classes_):
    auc = roc_auc_score(test_y==cls, prob_y[:, idx])
    print("    {}: {:.5f}".format(cls, auc))


Test accuracy: 0.81798
Test ROC AUC for class:
    0: 0.98401
    1: 0.97850
    2: 0.97475
    3: 0.98016
    4: 0.98628
    5: 0.98192
    6: 0.99289
    7: 0.99496
    8: 0.99889
    9: 0.99761
    10: 0.99915
    11: 0.99392
    12: 0.97810
    13: 0.98875
    14: 0.99593
    15: 0.99437
    16: 0.98499
    17: 0.99642
    18: 0.95440
    19: 0.96634
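
Since the trained network is a regular sklearn estimator, it can also be persisted with pickle, as mentioned above. A minimal sketch (the file name is arbitrary):

import pickle

# serialize the trained classifier to disk
with open('network1.pkl', 'wb') as f:
    pickle.dump(clsf, f)

# restore it and verify that it still predicts
with open('network1.pkl', 'rb') as f:
    restored = pickle.load(f)

print("Restored test accuracy: {:.5f}".format(
    accuracy_score(test_y, restored.predict(test_X))))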