Multi-layer Perceptron (a standard feed-forward neural network) for Reuters newswire topic classification

The original script that this notebook is based on is here


In [1]:
# Imports
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import reuters
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.utils import np_utils
from keras.preprocessing.text import Tokenizer


Using Theano backend.

Neural Network Settings

  • max_words: Keep only this many words as features, using the most common words in the corpus.
  • Iterations: Together these control how many gradient updates are performed (see the sketch after this list).
    • batch_size: The number of samples used for each gradient update. Larger batches give a smoother, more accurate gradient estimate, but fewer updates per epoch, so training can take longer to reach a good fit.
    • nb_epoch: The number of passes through all of the training data. Since batch_size is smaller than the training set, each "epoch" performs many gradient updates, so the total number of updates is roughly nb_epoch * training_set_size / batch_size.
  • nb_hidden: The number of hidden layers to use
  • nb_dense: The number of units to use in the hidden layer(s).
  • p_dropout: Randomly sets this fraction of the input units to 0 at each gradient update. It helps to prevent overfitting.
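
As a quick sanity check of the iteration arithmetic above, here is a minimal sketch of how many gradient updates one training run performs. The 8083 comes from the fit output further down (90% of the training data, since validation_split=0.1 is used there); the other values match this notebook's settings.

import math

nb_train_samples = 8083                                        # training samples after the 10% validation split
updates_per_epoch = int(math.ceil(nb_train_samples / 32.))     # batch_size = 32 -> 253 updates per epoch
total_updates = updates_per_epoch * 15                         # nb_epoch = 15 -> 3795 updates in total
print(updates_per_epoch, total_updates)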

Network Architecture:

Here is something close to what the neural network we use here looks like.

Each input node corresponds to a yes/no answer to the question "Does this article contain the word 'x'?" In our model, we have max_words input nodes instead of the 3 shown here.

The next layer, the hidden layer, is where a lot of the magic happens. Each hidden node's input is a linear combination of the input layer values, and its output is a nonlinear "activation" function applied to that input. Typical activation functions are tanh or, as in this case, relu. The more hidden nodes you have, the more complex the patterns the network can represent, though more nodes also mean more parameters and a greater risk of overfitting.

The output layer in our case has one node per news topic. Like the hidden layer, each node computes a linear combination of the previous layer's outputs; a softmax activation then turns those values into a probability for each topic.
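
To make the "linear combination plus activation" idea concrete, here is a tiny numpy sketch of one forward pass through a network of this shape. The sizes (3 inputs, 4 hidden nodes, 2 classes) and the random weights are made up purely for illustration.

import numpy as np

x = np.array([1., 0., 1.])                       # yes/no word-presence features
W1, b1 = np.random.randn(3, 4), np.zeros(4)      # input -> hidden weights and biases
W2, b2 = np.random.randn(4, 2), np.zeros(2)      # hidden -> output weights and biases

hidden = np.maximum(0., np.dot(x, W1) + b1)      # relu applied to a linear combination of the inputs
scores = np.dot(hidden, W2) + b2                 # linear combination of the hidden outputs
probs = np.exp(scores) / np.exp(scores).sum()    # softmax turns scores into class probabilities
print(probs)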


In [2]:
max_words = 1000
batch_size = 32
nb_epoch = 15
nb_dense = 512
nb_hidden = 1   # The number of hidden layers to use
p_dropout = 0.5

Get the data


In [3]:
print('Loading data...')
(X_train, y_train), (X_test, y_test) = reuters.load_data(nb_words=max_words, test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

nb_classes = np.max(y_train)+1
print(nb_classes, 'classes')

print('Vectorizing sequence data...')
tokenizer = Tokenizer(nb_words=max_words)
X_train = tokenizer.sequences_to_matrix(X_train, mode='binary')
X_test = tokenizer.sequences_to_matrix(X_test, mode='binary')
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

print('Convert class vector to binary class matrix (for use with categorical_crossentropy)')
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
print('Y_train shape:', Y_train.shape)
print('Y_test shape:', Y_test.shape)


Loading data...
8982 train sequences
2246 test sequences
46 classes
Vectorizing sequence data...
X_train shape: (8982, 1000)
X_test shape: (2246, 1000)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
Y_train shape: (8982, 46)
Y_test shape: (2246, 46)
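
To convince yourself that each row really is a yes/no bag-of-words vector, a quick check like this (run after the cell above) does the job:

print(np.unique(X_train))    # only 0.0 and 1.0 appear with mode='binary'
print(X_train[0].sum())      # how many of the 1000 kept words appear in the first article
print(Y_train[0])            # one-hot row: a single 1 at the article's true topic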

Build the Neural Network


In [4]:
print('Building model...')
model = Sequential()
model.add(Dense(nb_dense, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(p_dropout))
for _ in range(nb_hidden-1):
    model.add(Dense(nb_dense))
    model.add(Activation('relu'))
    model.add(Dropout(p_dropout))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])


Building model...
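
With nb_hidden = 1 the parameter count is easy to check by hand: 1000 × 512 weights plus 512 biases into the hidden layer, and 512 × 46 weights plus 46 biases into the output layer. The sketch below is just that arithmetic; model.summary() reports the same layer-by-layer breakdown.

n_hidden_params = max_words * nb_dense + nb_dense      # 1000*512 + 512 = 512512
n_output_params = nb_dense * nb_classes + nb_classes   # 512*46 + 46 = 23598
print(n_hidden_params + n_output_params)               # 536110 trainable parameters

model.summary()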

Fit and Evaluate


In [5]:
import time
t1 = time.time()
history = model.fit(X_train, Y_train,
                    nb_epoch=nb_epoch, batch_size=batch_size,
                    verbose=1, validation_split=0.1)
t2 = time.time()
print('Model training took {:.2g} minutes'.format((t2-t1)/60))


Train on 8083 samples, validate on 899 samples
Epoch 1/15
8083/8083 [==============================] - 1s - loss: 1.4274 - acc: 0.6812 - val_loss: 1.0934 - val_acc: 0.7553
Epoch 2/15
8083/8083 [==============================] - 1s - loss: 0.7737 - acc: 0.8169 - val_loss: 0.9172 - val_acc: 0.7920
Epoch 3/15
8083/8083 [==============================] - 1s - loss: 0.5441 - acc: 0.8691 - val_loss: 0.8529 - val_acc: 0.8087
Epoch 4/15
8083/8083 [==============================] - 1s - loss: 0.4132 - acc: 0.8968 - val_loss: 0.8730 - val_acc: 0.8076
Epoch 5/15
8083/8083 [==============================] - 1s - loss: 0.3338 - acc: 0.9180 - val_loss: 0.8931 - val_acc: 0.8176
Epoch 6/15
8083/8083 [==============================] - 1s - loss: 0.2751 - acc: 0.9287 - val_loss: 0.9250 - val_acc: 0.8176
Epoch 7/15
8083/8083 [==============================] - 1s - loss: 0.2381 - acc: 0.9380 - val_loss: 0.9519 - val_acc: 0.8109
Epoch 8/15
8083/8083 [==============================] - 1s - loss: 0.2198 - acc: 0.9440 - val_loss: 0.9600 - val_acc: 0.8098
Epoch 9/15
8083/8083 [==============================] - 1s - loss: 0.2000 - acc: 0.9478 - val_loss: 1.0384 - val_acc: 0.7964
Epoch 10/15
8083/8083 [==============================] - 1s - loss: 0.1850 - acc: 0.9490 - val_loss: 1.0403 - val_acc: 0.7864
Epoch 11/15
8083/8083 [==============================] - 1s - loss: 0.1829 - acc: 0.9522 - val_loss: 1.0429 - val_acc: 0.7898
Epoch 12/15
8083/8083 [==============================] - 2s - loss: 0.1706 - acc: 0.9536 - val_loss: 1.1037 - val_acc: 0.7898
Epoch 13/15
8083/8083 [==============================] - 2s - loss: 0.1684 - acc: 0.9545 - val_loss: 1.0831 - val_acc: 0.7931
Epoch 14/15
8083/8083 [==============================] - 2s - loss: 0.1581 - acc: 0.9553 - val_loss: 1.1036 - val_acc: 0.7887
Epoch 15/15
8083/8083 [==============================] - 2s - loss: 0.1562 - acc: 0.9535 - val_loss: 1.1101 - val_acc: 0.7998
Model training took 0.48 minutes
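
The validation loss bottoms out around epoch 3 and then creeps upward while the training loss keeps falling, which is the usual sign of overfitting. A minimal matplotlib sketch for eyeballing that from the History object (assuming matplotlib is installed; the history keys are the Keras 1.x names used above):

import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('categorical cross-entropy')
plt.legend()
plt.show()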

In [6]:
score = model.evaluate(X_test, Y_test,
                       batch_size=batch_size, verbose=1)
print('\nTest score:', score[0])
print('Test accuracy:', score[1])


2016/2246 [=========================>....] - ETA: 0s
Test score: 1.08631595775
Test accuracy: 0.786286731968
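
To see per-article predictions rather than just the aggregate accuracy, something like this works (a minimal sketch; the labels are the integer topic ids used by the Reuters dataset):

pred_probs = model.predict(X_test[:5], verbose=0)
print(np.argmax(pred_probs, axis=1))   # predicted topic id for each of the first 5 test articles
print(y_test[:5])                      # true topic ids for comparison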

Save fitted model


In [13]:
import output_model

In [19]:
output_model.save_model(model, 'models/Reuters_MLP_model')
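
output_model is a local helper module, so its exact behaviour is not shown here. If it is not available, the built-in Keras calls below do roughly the same job (a minimal sketch; writing the weights requires h5py):

with open('models/Reuters_MLP_model.json', 'w') as f:
    f.write(model.to_json())                        # architecture as JSON
model.save_weights('models/Reuters_MLP_model.h5')   # learned weights as HDF5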
