Sentiment Analysis of Movie Reviews

In this tutorial, we will load a trained model and perform inference on a new movie review.

Setup

As before, we first create a computational backend, which tells neon which device to run the computation on.


In [ ]:
from neon.backends import gen_backend

be = gen_backend(backend='gpu', batch_size=1)
print(be)

We also define a few parameters and then load the vocabulary. The vocab is a one-to-one mapping of words to numbers. The file imdb.vocab can be downloaded from https://s3-us-west-1.amazonaws.com/nervana-course/imdb.vocab and placed in the data directory.


In [ ]:
import pickle as pkl

sentence_length = 128
vocab_size = 20000

# we have some special codes
pad_char = 0  # padding character
start = 1  # marker for start of review
oov = 2  # when the word is out of the vocab
index_from = 3  # index of first word in vocab

# load the vocab
with open('data/imdb.vocab', 'rb') as f:
    vocab, rev_vocab = pkl.load(f)

Load Model

To load the model, we just pass in the saved model file. neon will automatically generate the layers specified in the model file and load the corresponding weights.


In [ ]:
from neon.models import Model

model = Model('imdb_lstm.pkl')

# we initialize the model, passing in the size of the input data.
model.initialize(dataset=(sentence_length, 1))

Inference

We first allocate buffers on both the host (CPU) and the device (GPU) to hold the input data that we want to pass to the model for inference. The variable be below is the backend that we created with gen_backend earlier in the code. The backend supports numpy-like functions for allocating buffers on the compute device.


In [ ]:
import numpy as np

input_device = be.zeros((sentence_length, 1), dtype=np.int32)  # `be` is the backend that we created earlier in the code.
input_numpy = np.zeros((sentence_length, 1), dtype=np.int32)

Now we write our new movie review. We've included a few samples here, but feel free to write your own and see how well the model responds.

POSITIVE:

"The pace is steady and constant, the characters full and engaging, the relationships and interactions natural showing that you do not need floods of tears to show emotion, screams to show fear, shouting to show dispute or violence to show anger. Naturally Joyce's short story lends the film a ready made structure as perfect as a polished diamond, but the small changes Huston makes such as the inclusion of the poem fit in neatly. It is truly a masterpiece of tact, subtlety and overwhelming beauty."

NEGATIVE:

"Beautiful attracts excellent idea, but ruined with a bad selection of the actors. The main character is a loser and his woman friend and his friend upset viewers. Apart from the first episode all the other become more boring and boring. First, it considers it illogical behavior. No one normal would not behave the way the main character behaves. It all represents a typical Halmark way to endear viewers to the reduced amount of intelligence. Does such a scenario, or the casting director and destroy this question is on Halmark producers. Cat is the main character is wonderful. The main character behaves according to his friend selfish."

NEUTRAL:

"The characters voices were very good. I was only really bothered by Kanga. The music, however, was twice as loud in parts than the dialog, and incongruous to the film. As for the story, it was a bit preachy and militant in tone. Overall, I was disappointed, but I would go again just to see the same excitement on my child's face. I liked Lumpy's laugh..."


In [ ]:
line = """Beautiful attracts excellent idea, but ruined with a bad selection of the actors. The main character is
          a loser and his woman friend and his friend upset viewers. Apart from the first episode all the other become 
          more boring and boring. First, it considers it illogical behavior. No one normal would not behave the way the 
          main character behaves. It all represents a typical Halmark way to endear viewers to the reduced amount of 
          intelligence. Does such a scenario, or the casting director and destroy this question is on Halmark 
          producers. Cat is the main character is wonderful. The main character behaves according to 
          his friend selfish."""

Before we send the data to the model, we need to convert the string to a sequence of numbers, with each number representing a word, using the vocab that we loaded earlier in the code. If a word is not in our vocab, we use a special out-of-vocab number.


In [ ]:
from neon.data.text_preprocessing import clean_string

tokens = clean_string(line).strip().split()

sent = [len(vocab) + 1 if t not in vocab else vocab[t] for t in tokens]
sent = [start] + [w + index_from for w in sent]
sent = [oov if w >= vocab_size else w for w in sent]
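
To see what these three steps do, here is a minimal sketch on a toy vocabulary. The dictionary toy_vocab, the small toy_vocab_size, and the helper encode are made up for illustration; nothing is assumed about the actual contents of imdb.vocab, and a plain split() stands in for clean_string.

```python
# Special codes, matching the values defined in the setup cell.
start = 1        # marker for start of review
oov = 2          # code used when a word is out of the vocab
index_from = 3   # index of the first real word

# Hypothetical toy vocabulary (word -> rank); NOT the real imdb.vocab.
toy_vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "boring": 4}
toy_vocab_size = 7  # stands in for vocab_size = 20000


def encode(tokens, vocab, vocab_size):
    # Unknown words get an index past the end of the vocab...
    ids = [len(vocab) + 1 if t not in vocab else vocab[t] for t in tokens]
    # ...every index is shifted to make room for the special codes...
    ids = [start] + [w + index_from for w in ids]
    # ...and anything at or beyond vocab_size collapses to the oov code.
    return [oov if w >= vocab_size else w for w in ids]


encoded = encode("the movie was dreadful".split(), toy_vocab, toy_vocab_size)
print(encoded)  # [1, 3, 4, 5, 2]
```

Note that a word can be present in the vocab and still map to the out-of-vocab code: in this toy setup, "boring" has rank 4, which shifts to 7 and is therefore cut off by toy_vocab_size.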

The text data is now converted to a list of integers:


In [ ]:
print(sent)
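
As a sanity check, it can help to map the integers back to words. The sketch below assumes that the reverse vocabulary maps each rank back to its word, which the name rev_vocab suggests but this notebook does not confirm; toy_rev_vocab and decode are hypothetical names used only for illustration.

```python
# Special codes, matching the values defined in the setup cell.
pad_char = 0
start = 1
oov = 2
index_from = 3

# Hypothetical reverse vocabulary (rank -> word); the real rev_vocab
# is assumed, not known, to have this shape.
toy_rev_vocab = {0: "the", 1: "movie", 2: "was", 3: "great"}


def decode(ids, rev_vocab):
    words = []
    for i in ids:
        if i == pad_char:
            words.append("<pad>")
        elif i == start:
            words.append("<start>")
        elif i == oov:
            words.append("<oov>")
        else:
            # Undo the index_from shift applied during encoding.
            words.append(rev_vocab[i - index_from])
    return " ".join(words)


decoded = decode([1, 3, 4, 5, 2], toy_rev_vocab)
print(decoded)  # <start> the movie was <oov>
```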

We truncate the input to sentence_length=128 words, keeping the last 128. If the text is shorter than 128 words, we pad the front with zeros. The text is then loaded into the numpy array named input_numpy.


In [ ]:
trunc = sent[-sentence_length:]  # take the last sentence_length words

input_numpy[:] = 0  # fill with zeros
input_numpy[-len(trunc):, 0] = trunc   # place the input into the numpy array
print(input_numpy.T)
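
The same truncate-then-left-pad logic can be sketched with plain Python lists and a short length so that the result is easy to read. The helper pad_or_truncate is a hypothetical name for illustration; it mirrors the slicing used above but is not part of neon.

```python
def pad_or_truncate(ids, length, pad=0):
    # Keep only the last `length` entries, as sent[-sentence_length:] does.
    trunc = ids[-length:]
    # Pad on the left so the review ends at the final position.
    return [pad] * (length - len(trunc)) + trunc


short = pad_or_truncate([1, 3, 4], 5)
long_ = pad_or_truncate([1, 3, 4, 5, 6, 7, 8], 5)
print(short)  # [0, 0, 1, 3, 4]
print(long_)  # [4, 5, 6, 7, 8]
```

In the second case the start marker (1) is dropped, because truncation keeps the end of the review rather than the beginning.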

In [ ]:
input_device.set(input_numpy)  # copy the numpy array to device
y_pred = model.fprop(input_device, inference=True)  # run the forward pass through the model

print("Predicted sentiment: {}".format(y_pred.get()[1]))  # print the estimated sentiment
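
The printed value is a score rather than a label. The sketch below turns a score into a coarse label, under two assumptions that this notebook does not state: that index 1 of the output holds the model's probability for the positive class, and that the cutoffs 0.4 and 0.6 are reasonable. Both cutoffs and the helper label_sentiment are hypothetical and should be tuned or replaced for real use.

```python
def label_sentiment(score, low=0.4, high=0.6):
    # `score` is expected to be a plain float in [0, 1]; if it comes from
    # an array such as y_pred.get()[1], extract the scalar first.
    # The low/high cutoffs are illustrative assumptions, not model values.
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    return "neutral"


for s in (0.93, 0.07, 0.52):
    print("{:.2f} -> {}".format(s, label_sentiment(s)))
```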

Experimentation

To make it easy for you to experiment with model inference, below we wrap all the code above into a single function that you can call.


In [ ]:
def sentiment(line, model):  # `line` is the raw review text to score
    input_device = be.zeros((sentence_length, 1), dtype=np.int32)
    input_numpy = np.zeros((sentence_length, 1), dtype=np.int32) 
    tokens = clean_string(line).strip().split()

    sent = [len(vocab) + 1 if t not in vocab else vocab[t] for t in tokens]
    sent = [start] + [w + index_from for w in sent]
    sent = [oov if w >= vocab_size else w for w in sent]
    
    trunc = sent[-sentence_length:]  # take the last sentence_length words

    input_numpy[:] = 0  # fill with zeros
    input_numpy[-len(trunc):, 0] = trunc   # place the input into the numpy array
    input_device.set(input_numpy)  # copy the numpy array to device
    y_pred = model.fprop(input_device, inference=True)  # run the forward pass through the model
    
    return y_pred.get()[1]

Now you can easily enter your own review and get the result. Below we try a more neutral review:


In [ ]:
line = """The characters voices were very good. I was only really bothered by Kanga. The music, however, was twice 
          as loud in parts than the dialog, and incongruous to the film. As for the story, it was a bit preachy and 
          militant in tone. Overall, I was disappointed, but I would go again just to see the same excitement on my 
          child's face. I liked Lumpy's laugh..."""

result = sentiment(line, model)
print("Sentiment: {}".format(result))
