Text Classification of Movie Reviews (Keras & TensorFlow Hub)

This text classification example:

  • trains a simple neural network on the IMDB large movie review dataset for sentiment analysis.
  • uses Verta's Python client to log observations and artifacts.

This notebook classifies movie reviews as positive or negative using the text of the review. This is an example of binary—or two-class—classification, an important and widely applicable kind of machine learning problem.

The tutorial demonstrates the basic application of transfer learning with TensorFlow Hub and Keras.

We'll use the IMDB dataset that contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.

This notebook uses tf.keras, a high-level API to build and train models in TensorFlow, and TensorFlow Hub, a library and platform for transfer learning.

Set Up Environment

This notebook has been tested with the following package versions:
(you may need to change pip to pip3, depending on your own Python environment)


In [1]:
# Python 3.6
!pip install verta
!pip install matplotlib==3.1.1
!pip install tensorflow==2.0.0-beta1
!pip install tensorflow-hub==0.5.0
!pip install tensorflow-datasets==1.0.2

Set Up Verta


In [2]:
HOST = 'app.verta.ai'

PROJECT_NAME = 'Text-Classification'
EXPERIMENT_NAME = 'basic-clf'

In [3]:
# import os
# os.environ['VERTA_EMAIL'] = 
# os.environ['VERTA_DEV_KEY'] =

In [4]:
from verta import Client
from verta.utils import ModelAPI

client = Client(HOST, use_git=False)

proj = client.set_project(PROJECT_NAME)
expt = client.set_experiment(EXPERIMENT_NAME)
run = client.set_experiment_run()

Imports


In [5]:
from __future__ import absolute_import, division, print_function, unicode_literals
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import numpy as np
import six

import matplotlib.pyplot as plt

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.test.is_gpu_available() else "NOT AVAILABLE")

Download the IMDB dataset

The IMDB dataset is available in TensorFlow Datasets. The following code downloads the IMDB dataset to your machine:


In [6]:
# Split the training set into 60% and 40%, so we'll end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])

(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews", 
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)
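
The arithmetic behind the split sizes in the comment above can be checked with a small dependency-free sketch. The helper name split_sizes is hypothetical, and the exact boundaries that TensorFlow Datasets picks for a weighted subsplit may differ slightly; this only reproduces the proportions.

```python
# Sizes implied by splitting the 25,000 training reviews with weights 6:4.
TOTAL_TRAIN = 25000   # training reviews in the IMDB dataset
WEIGHTS = (6, 4)      # the weights passed to subsplit above


def split_sizes(total, weights):
    """Return how many examples each weighted part receives (hypothetical helper)."""
    weight_sum = sum(weights)
    return [total * w // weight_sum for w in weights]


train_size, validation_size = split_sizes(TOTAL_TRAIN, WEIGHTS)
print(train_size, validation_size)
```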

Explore the data

Let's take a moment to understand the format of the data. Each example is a sentence representing the movie review and a corresponding label. Let's print the first 2 examples.


In [7]:
train_examples_batch, train_labels_batch = next(iter(train_data.batch(2)))
train_examples_batch

Let's also print the first 2 labels.


In [8]:
train_labels_batch

Build the model

In this example, the input data consists of sentences. The labels to predict are either 0 or 1. One way to represent the text is to convert sentences into embedding vectors. If we use a pre-trained text embedding as the first layer, we don't have to worry about text preprocessing.

For this example we will use a pre-trained text embedding model from TensorFlow Hub called google/tf2-preview/gnews-swivel-20dim/1.


In [9]:
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[], 
                           dtype=tf.string, trainable=True)
hub_layer(train_examples_batch[:3])
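
To build some intuition for what a sentence-embedding layer does, here is a toy sketch that needs no TensorFlow. It is not the Hub module's implementation: the whitespace tokenizer, the pseudo-random per-token vectors, and the averaging step are all assumptions made for illustration, and toy_embed is a hypothetical name. The one property it shares with the real layer is that sentences of any length map to a fixed-length vector.

```python
import random

EMBEDDING_DIM = 20  # same output size as the 20-dimensional module used above


def toy_embed(sentence, table, dim=EMBEDDING_DIM):
    """Split on whitespace, look up (or create) a vector per token, then average."""
    tokens = sentence.lower().split()
    if not tokens:
        return [0.0] * dim
    for token in tokens:
        if token not in table:
            # Assumption: a pseudo-random vector seeded by the token stands in
            # for a learned embedding.
            rng = random.Random(token)
            table[token] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [sum(table[t][i] for t in tokens) / len(tokens) for i in range(dim)]


table = {}
short_vector = toy_embed("great movie", table)
long_vector = toy_embed("this was a surprisingly great movie", table)
print(len(short_vector), len(long_vector))
```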

Let's now build the full model:


In [10]:
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.summary()

The layers are stacked sequentially to build the classifier:

  1. The first layer is a TensorFlow Hub layer. This layer uses a pre-trained SavedModel to map a sentence into its embedding vector. The pre-trained text embedding model that we are using (google/tf2-preview/gnews-swivel-20dim/1) splits the sentence into tokens, embeds each token and then combines the embeddings. The resulting dimensions are: (num_examples, embedding_dimension).
  2. This fixed-length output vector is piped through a fully-connected (Dense) layer with 16 hidden units.
  3. The last layer is densely connected with a single output node. Using the sigmoid activation function, this value is a float between 0 and 1, representing a probability, or confidence level.
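
The two Dense layers described above can be sketched in plain Python. This is only an illustration: the weights are random placeholders rather than trained values, the input vector is a stand-in for the Hub layer's output, and the helper names are hypothetical. It shows the parameter count of each Dense layer (excluding the embedding layer) and that the sigmoid output lands between 0 and 1.

```python
import math
import random

EMBEDDING_DIM = 20   # output size of the embedding layer
HIDDEN_UNITS = 16    # units in the first Dense layer


def dense_param_count(n_inputs, n_units):
    """Weights plus one bias per unit for a fully-connected layer."""
    return n_inputs * n_units + n_units


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Apply one fully-connected layer; weights holds one row per output unit."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]


rng = random.Random(0)  # fixed seed; values are placeholders, not trained weights


def random_layer(n_inputs, n_units):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_units)]
    return weights, [0.0] * n_units


# Stand-in for the embedding of one review.
embedding_vector = [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
w1, b1 = random_layer(EMBEDDING_DIM, HIDDEN_UNITS)
w2, b2 = random_layer(HIDDEN_UNITS, 1)

hidden = dense(embedding_vector, w1, b1, relu)
probability = dense(hidden, w2, b2, sigmoid)[0]

print(dense_param_count(EMBEDDING_DIM, HIDDEN_UNITS))  # 20*16 + 16 = 336
print(dense_param_count(HIDDEN_UNITS, 1))              # 16*1 + 1 = 17
print(0.0 < probability < 1.0)
```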

Let's compile the model.


In [11]:
hyperparams = {'optimizer':'adam',
               'loss':'binary_crossentropy',
               'metrics':'accuracy', 
               'train_batch_size':512,
               'num_epochs':20, 
               'validation_batch_size':512, 
               'test_batch_size':512,
              }

run.log_hyperparameters(hyperparams)

In [12]:
model.compile(optimizer=hyperparams['optimizer'],
              loss=hyperparams['loss'],
              metrics=[hyperparams['metrics']])
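
For reference, the binary cross-entropy loss named in the hyperparameters can be written out by hand. The sketch below is a plain-Python version for intuition, not the Keras implementation; the clipping constant eps is an assumption added to avoid taking log(0).

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    losses = []
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# A confident correct prediction costs little; a confident wrong one costs a lot.
confident_right = binary_crossentropy([1], [0.95])
unsure = binary_crossentropy([1], [0.5])
confident_wrong = binary_crossentropy([1], [0.05])
print(confident_right < unsure < confident_wrong)
```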

Train the model

Train the model for 20 epochs in mini-batches of 512 samples. This is 20 iterations over all samples in the training dataset (train_data). While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set:


In [13]:
# called at the end of each epoch - logging loss, accuracy as observations for the run
class LossAndErrorLoggingCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print('The average loss for epoch {} is {:7.2f}, accuracy is {:7.2f}.'.format(epoch, logs['loss'], logs['accuracy']))
        run.log_observation("train_loss", float(logs['loss']))
        run.log_observation("train_acc", float(logs['accuracy']))
        run.log_observation("val_loss", float(logs['val_loss']))
        run.log_observation("val_acc", float(logs['val_accuracy']))
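
The per-epoch logging pattern used by the callback can be tried without Keras or a Verta connection. In the sketch below, InMemoryRun is a hypothetical stand-in that only mimics a log_observation(key, value) method, and the numbers in fake_logs are made up to show the shape of the logs dictionary; they are not results from this model.

```python
from collections import defaultdict


class InMemoryRun:
    """Hypothetical stand-in for an experiment run; keeps observations in lists."""

    def __init__(self):
        self.observations = defaultdict(list)

    def log_observation(self, key, value):
        self.observations[key].append(float(value))


# Observation name -> key in the logs dict passed to on_epoch_end.
OBSERVATION_KEYS = {
    "train_loss": "loss",
    "train_acc": "accuracy",
    "val_loss": "val_loss",
    "val_acc": "val_accuracy",
}


def log_epoch(run, logs):
    """Record one epoch's metrics under the same names the callback above uses."""
    for observation_name, log_key in OBSERVATION_KEYS.items():
        run.log_observation(observation_name, logs[log_key])


demo_run = InMemoryRun()
fake_logs = {"loss": 0.7, "accuracy": 0.5, "val_loss": 0.6, "val_accuracy": 0.6}  # made-up values
log_epoch(demo_run, fake_logs)
log_epoch(demo_run, fake_logs)
print(dict(demo_run.observations))
```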

In [14]:
history = model.fit(train_data.shuffle(10000).batch(hyperparams['train_batch_size']),
                    epochs=hyperparams['num_epochs'],
                    validation_data=validation_data.batch(hyperparams['validation_batch_size']),
                    callbacks=[LossAndErrorLoggingCallback()])
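
As a quick check on what these settings mean in practice, the sketch below works out how many gradient steps the run performs. It assumes 15,000 training examples (the 60% split described earlier) and that the final, smaller batch of each epoch is kept rather than dropped.

```python
import math

NUM_TRAIN_EXAMPLES = 15000  # assumed size of the 60% training split
BATCH_SIZE = 512            # hyperparams['train_batch_size']
NUM_EPOCHS = 20             # hyperparams['num_epochs']

# The last batch of each epoch is smaller because 15000 is not a multiple of 512.
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / BATCH_SIZE)
last_batch_size = NUM_TRAIN_EXAMPLES - (steps_per_epoch - 1) * BATCH_SIZE
total_steps = steps_per_epoch * NUM_EPOCHS

print(steps_per_epoch, last_batch_size, total_steps)
```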

Evaluate the model

Let's see how the model performs on the test set. Two values are returned: loss (a number that represents our error; lower values are better) and accuracy.


In [15]:
results = model.evaluate(test_data.batch(hyperparams['test_batch_size']), verbose=0)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
    run.log_metric(name, value)

With more advanced approaches, the model's accuracy should get closer to 95%.
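
To make the accuracy number concrete: for a binary classifier with a sigmoid output, accuracy is the fraction of examples whose predicted probability falls on the correct side of a threshold. The sketch below assumes a threshold of 0.5 and uses made-up labels and probabilities, not outputs of this model.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples where (probability > threshold) matches the label."""
    correct = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return correct / len(y_true)


# Made-up example: the first two predictions are right, the last two are wrong.
example_labels = [1, 0, 1, 0]
example_probs = [0.9, 0.2, 0.4, 0.6]
example_accuracy = binary_accuracy(example_labels, example_probs)
print(example_accuracy)
```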

Plotting our metrics


In [16]:
def plot_graphs(history, string, run, plot_title):
    plt.plot(history.history[string])
    plt.plot(history.history['val_'+string])
    plt.xlabel('Epochs')
    plt.ylabel(string)
    plt.legend([string, 'val_'+string])
    run.log_image(plot_title, plt)
    plt.show()

In [17]:
# plotting graphs to see variation in accuracy and loss
plot_graphs(history, 'accuracy', run, 'accuracy_plt')
plot_graphs(history, 'loss', run, 'loss_plt')

Prediction with the model


In [18]:
sample_pred = np.array(["The new Spiderman movie is a fun watch. Loved it!"])
model.predict(sample_pred)
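
The value returned by model.predict is a probability, so a final step is needed to turn it into a sentiment label. The helper below is a hypothetical sketch: it assumes that label 1 means a positive review and uses a 0.5 cut-off, and the probabilities shown are illustrative values, not predictions from the trained model.

```python
def label_from_probability(probability, threshold=0.5):
    """Map a sigmoid output to a sentiment label (assumes 1 means positive)."""
    return "positive" if probability > threshold else "negative"


# Illustrative values shaped like model.predict output: one row per review.
hypothetical_predictions = [[0.91], [0.12]]
labels = [label_from_probability(row[0]) for row in hypothetical_predictions]
print(labels)
```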