In this notebook, you'll implement a recurrent neural network that performs sentiment analysis. Using an RNN rather than a feedforward network is more accurate since we can include information about the sequence of words. Here we'll use a dataset of movie reviews, accompanied by labels.
The architecture for this network is shown below.
Here, we'll pass in words to an embedding layer. We need an embedding layer because we have tens of thousands of words, so we'll need a more efficient representation for our input data than one-hot encoded vectors. You should have seen this before from the word2vec lesson. You can actually train up an embedding with word2vec and use it here. But it's good enough to just have an embedding layer and let the network learn the embedding table on its own.
From the embedding layer, the new representations will be passed to LSTM cells. These will add recurrent connections to the network so we can include information about the sequence of words in the data. Finally, the LSTM cells will go to a sigmoid output layer here. We're using the sigmoid because we're trying to predict if this text has positive or negative sentiment. The output layer will just be a single unit then, with a sigmoid activation function.
We only care about the sigmoid output from the very last time step; the rest can be ignored. We'll calculate the cost from the output of the last step and the training label.
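To make the data flow concrete, here's a rough shape walk-through for one batch (a sketch only; it assumes the sequence length of 200 used later in this notebook, with embed_size and lstm_size standing in for whatever embedding and LSTM sizes you pick):
# inputs_        -> (batch_size, 200)              integer word ids
# embed          -> (batch_size, 200, embed_size)  after the embedding lookup
# outputs        -> (batch_size, 200, lstm_size)   one LSTM output per time step
# outputs[:, -1] -> (batch_size, lstm_size)        only the last time step is kept
# predictions    -> (batch_size, 1)                single sigmoid unit per review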
In [1]:
import numpy as np
import tensorflow as tf
In [2]:
with open('../sentiment-network/reviews.txt', 'r') as f:
    reviews = f.read()
with open('../sentiment-network/labels.txt', 'r') as f:
    labels_orig = f.read()
In [3]:
reviews[:2000]
Out[3]:
The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.
You can see an example of the reviews data above. We'll want to get rid of those periods. Also, you might notice that the reviews are delimited with newlines \n. To deal with those, I'm going to split the text into each review using \n as the delimiter. Then I can combine all the reviews back together into one big string.
First, let's remove all punctuation. Then get all the text without the newlines and split it into individual words.
In [4]:
from string import punctuation
all_text = ''.join([c for c in reviews if c not in punctuation])
reviews = all_text.split('\n')
all_text = ' '.join(reviews)
words = all_text.split()
In [5]:
all_text[:2000]
Out[5]:
In [6]:
words[:100]
Out[6]:
The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.
Exercise: Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers start at 1, not 0. Also, convert the reviews to integers and store them in a new list called reviews_ints.
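The next cell loads a pretrained vocabulary and embedding matrix from a pickle file instead of building the dictionary by hand. If you'd rather build vocab_to_int from scratch, a minimal sketch (using the words and reviews variables defined above) might look like this:

from collections import Counter

# Count word frequencies and assign integers starting at 1 (0 is reserved for padding)
word_counts = Counter(words)
sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
vocab_to_int = {word: i for i, word in enumerate(sorted_vocab, 1)}

# Convert each review into a list of integers
reviews_ints = [[vocab_to_int[word] for word in review.split()] for review in reviews]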
In [7]:
import pickle
with open('../embeddings/word_embeddings.pkl', 'rb') as f:
    [vocab_to_int, embed_mat] = pickle.load(f)
In [8]:
embed_mat.shape
Out[8]:
In [9]:
bak = embed_mat[vocab_to_int['the'],:]
In [10]:
# Prepend a random row at index 0 so that id 0 can be reserved for the empty/padding
# token, and shift every existing word id up by one to match.
embed_mat = np.concatenate((np.random.uniform(-1, 1, (1, embed_mat.shape[1])),
                            embed_mat),
                           axis=0)
vocab_to_int = {k: v+1 for k, v in vocab_to_int.items()}
# embed_mat = embed_mat.copy()
# embed_mat.resize((embed_mat.shape[0]+1, embed_mat.shape[1]))
# embed_mat[-1,:] = embed_mat[0]
# embed_mat[0,:] = np.random.uniform(-1,1, (1,embed_mat.shape[1]))
# embed_mat.shape
In [11]:
vocab_to_int[''] = 0
In [12]:
assert(all(bak == embed_mat[vocab_to_int['the'],:]))
In [13]:
[k for k,v in vocab_to_int.items() if v == 0]
Out[13]:
In [14]:
embed_mat[vocab_to_int['stupid'],:]
Out[14]:
In [15]:
non_words = set(['','.','\n'])
extra_words = set([w for w in set(words) if w not in vocab_to_int and w not in non_words])
new_vocab = [(word, index) for index,word in enumerate(extra_words, len(vocab_to_int))]
embed_mat = np.concatenate(
    (embed_mat,
     np.random.uniform(-1, 1, (len(extra_words), embed_mat.shape[1]))),
    axis=0)
print("added {} extra words".format(len(extra_words)))
vocab_to_int.update(new_vocab)
del extra_words
del new_vocab
In [16]:
37807/63641
Out[16]:
In [17]:
reviews_ints = [[vocab_to_int[word] for word in review.split(' ') if word not in non_words] for review in reviews]
In [18]:
set([word for word in set(words) if word not in vocab_to_int])
Out[18]:
In [19]:
len(vocab_to_int)
Out[19]:
In [20]:
# Convert labels to 1s and 0s for 'positive' and 'negative'
labels = np.array([(0 if l == 'negative' else 1) for l in labels_orig.split('\n')])
If you built labels correctly, you should see the next output.
In [21]:
from collections import Counter
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))
Okay, a couple issues here. We seem to have one review with zero length. And, the maximum review length is way too many steps for our RNN. Let's truncate to 200 steps. For reviews shorter than 200, we'll pad with 0s. For reviews longer than 200, we can truncate them to the first 200 words.
Exercise: First, remove the review with zero length from the reviews_ints list.
In [22]:
# Quick sanity check: slicing past the end of a list just returns the whole list
x = [1, 2, 3]
x[:10]
Out[22]:
In [23]:
# Filter out the zero-length review and truncate every remaining review to the first 200 words
new_values = [(review_ints[:200], label) for review_ints, label
              in zip(reviews_ints, labels)
              if len(review_ints) > 0]
reviews_ints, labels = zip(*new_values)
Exercise: Now, create an array features that contains the data we'll pass to the network. The data should come from reviews_ints, since we want to feed integers to the network. Each row should be 200 elements long. For reviews shorter than 200 words, left pad with 0s. That is, if the review is ['best', 'movie', 'ever'], [117, 18, 128] as integers, the row will look like [0, 0, 0, ..., 0, 117, 18, 128]. For reviews longer than 200, use only the first 200 words as the feature vector.
This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.
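One common way to do it (a sketch using the variables defined above; the cell below builds the same array with a slightly different list-based approach) is to pre-allocate a zero matrix and fill each row from the right, so shorter reviews end up left-padded:

seq_len = 200
# Start from all zeros so shorter reviews are automatically left-padded
features = np.zeros((len(reviews_ints), seq_len), dtype=int)
for i, row in enumerate(reviews_ints):
    features[i, -len(row):] = np.array(row)[:seq_len]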
In [24]:
seq_len = 200
features = np.array([([0] * (seq_len-len(review))) + review for review in reviews_ints])
labels = np.array(labels)
If you built features correctly, it should look like the cell output below.
In [25]:
review = reviews_ints[0]
In [26]:
len(review)
Out[26]:
In [27]:
features[:10,:]
Out[27]:
With our data in nice shape, we'll split it into training, validation, and test sets.
Exercise: Create the training, validation, and test sets here. You'll need to create sets for the features and the labels, train_x and train_y for example. Define a split fraction, split_frac, as the fraction of data to keep in the training set. Usually this is set to 0.8 or 0.9. The rest of the data will be split in half to create the validation and testing data.
In [28]:
split_frac = 0.8
split_tv = int(features.shape[0] * split_frac)
split_vt = int(round(features.shape[0] * (1-split_frac) / 2)) + split_tv
train_x = features[:split_tv, :]
train_y = labels[:split_tv]
val_x = features[split_tv:split_vt, :]
val_y = labels[split_tv:split_vt]
test_x = features[split_vt:, :]
test_y = labels[split_vt:]
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape),
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))
With train, validation, and test fractions of 0.8, 0.1, 0.1, the final shapes should look like:
Feature Shapes:
Train set: (20000, 200)
Validation set: (2500, 200)
Test set: (2500, 200)
Here, we'll build the graph. First up, defining the hyperparameters.

lstm_size: Number of units in the hidden layers in the LSTM cells. Usually larger is better performance wise. Common values are 128, 256, 512, etc.
lstm_layers: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
batch_size: The number of reviews to feed the network in one training pass. Typically this should be set as high as you can go without running out of memory.
learning_rate: Learning rate
In [53]:
#run_number = 7
if 'run_number' in locals():
    run_number += 1
else:
    run_number = 1
run_number
Out[53]:
In [54]:
lstm_size = 512
lstm_layers = 1
batch_size = 500
learning_rate = 0.001
For the network itself, we'll be passing in our 200-element-long review vectors. Each batch will be batch_size vectors. We'll also be using dropout on the LSTM layer, so we'll make a placeholder for the keep probability.

Exercise: Create the inputs_, labels_, and dropout keep_prob placeholders using tf.placeholder. labels_ needs to be two-dimensional to work with some functions later. Since keep_prob is a scalar (a 0-dimensional tensor), you shouldn't provide a size to tf.placeholder.
In [55]:
n_words = len(vocab_to_int)
# Create the graph object
graph = tf.Graph()
# Add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(tf.int32, shape=(None, seq_len), name='inputs')
    labels_ = tf.placeholder(tf.int32, shape=(None, 1), name='labels')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
In [56]:
n_words
Out[56]:
Now we'll add an embedding layer. We need to do this because there are 74000 words in our vocabulary. It is massively inefficient to one-hot encode our classes here. You should remember dealing with this problem from the word2vec lesson. Instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using word2vec, then load it here. But, it's fine to just make a new layer and let the network learn the weights.
Exercise: Create the embedding lookup matrix as a tf.Variable. Use that embedding matrix to get the embedded vectors to pass to the LSTM cell with tf.nn.embedding_lookup. This function takes the embedding matrix and an input tensor, such as the review vectors. Then, it'll return another tensor with the embedded vectors. So, if the embedding layer has 200 units, the function will return a tensor with size [batch_size, seq_len, 200].
In [57]:
import os
os.makedirs('./logs/{}/val'.format(run_number), exist_ok=True)
In [58]:
word_order = {v: k for k, v in vocab_to_int.items()}
embedding_metadata_file = './logs/{}/val/metadata.tsv'.format(run_number)
with open(embedding_metadata_file, 'w') as f:
    for i in range(len(word_order)):
        f.write(word_order[i] + '\n')
In [59]:
projector_config = tf.contrib.tensorboard.plugins.projector.ProjectorConfig()
embedding_config = projector_config.embeddings.add()
In [60]:
embed_mat.dtype
Out[60]:
In [61]:
# Size of the embedding vectors (number of units in the embedding layer)
if False:
    # Option 1: train a fresh embedding from scratch
    embed_size = 300
    with graph.as_default():
        with tf.name_scope('embedding'):
            embedding = tf.Variable(
                tf.random_uniform((n_words, embed_size), -1, 1),
                name="word_embedding")
            embedding_config.tensor_name = embedding.name
            embedding_config.metadata_path = embedding_metadata_file
            embed = tf.nn.embedding_lookup(embedding, inputs_)
            tf.summary.histogram('embedding', embedding)
else:
    # Option 2: initialize the embedding from the pretrained matrix loaded earlier
    embed_size = embed_mat.shape[1]
    with graph.as_default():
        with tf.name_scope('embedding'):
            embedding = tf.Variable(embed_mat, name="word_embedding", dtype=tf.float32)
            embedding_config.tensor_name = embedding.name
            embedding_config.metadata_path = embedding_metadata_file
            embed = tf.nn.embedding_lookup(embedding, inputs_)
            tf.summary.histogram('embedding', embedding)
Next, we'll create our LSTM cells to use in the recurrent network (TensorFlow documentation). Here we are just defining what the cells look like. This isn't actually building the graph, just defining the type of cells we want in our graph.
To create a basic LSTM cell for the graph, you'll want to use tf.contrib.rnn.BasicLSTMCell. Looking at the function documentation:
tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=<function tanh at 0x109f1ef28>)
you can see it takes a parameter called num_units, the number of units in the cell, called lstm_size in this code. So then, you can write something like
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
to create an LSTM cell with num_units. Next, you can add dropout to the cell with tf.contrib.rnn.DropoutWrapper. This just wraps the cell in another cell, but with dropout added to the inputs and/or outputs. It's a really convenient way to make your network better with almost no effort! So you'd do something like
drop = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)
Most of the time, your network will have better performance with more layers. That's sort of the magic of deep learning, adding more layers allows the network to learn really complex relationships. Again, there is a simple way to create multiple layers of LSTM cells with tf.contrib.rnn.MultiRNNCell:
cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
Here, [drop] * lstm_layers creates a list of cells (drop) that is lstm_layers long. The MultiRNNCell wrapper builds this into multiple layers of RNN cells, one for each cell in the list.
So the final cell you're using in the network is actually multiple (or just one) LSTM cells with dropout. But it all works the same from an architectural viewpoint, just a more complicated graph in the cell.
Exercise: Below, use tf.contrib.rnn.BasicLSTMCell to create an LSTM cell. Then, add dropout to it with tf.contrib.rnn.DropoutWrapper. Finally, create multiple LSTM layers with tf.contrib.rnn.MultiRNNCell.
Here is a tutorial on building RNNs that will help you out.
In [62]:
with graph.as_default():
    with tf.name_scope('LSTM'):
        # Your basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        # Stack up multiple LSTM layers, for deep learning
        cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
        # Getting an initial state of all zeros
        initial_state = cell.zero_state(batch_size, tf.float32)
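One caveat: in later TensorFlow 1.x releases, reusing the same wrapped cell object via [drop] * lstm_layers can raise a variable-reuse error, because every layer would share one set of weights. If you run into that, a sketch of the usual workaround (built inside the same graph.as_default() / name_scope block as above) is to create a fresh cell per layer:

def build_cell(lstm_size, keep_prob):
    # One independent LSTM cell with dropout per layer
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

cell = tf.contrib.rnn.MultiRNNCell(
    [build_cell(lstm_size, keep_prob) for _ in range(lstm_layers)])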
Now we need to actually run the data through the RNN nodes. You can use tf.nn.dynamic_rnn to do this. You'd pass in the RNN cell you created (our multiple layered LSTM cell, for instance), and the inputs to the network.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
Above I created an initial state, initial_state, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. tf.nn.dynamic_rnn takes care of most of the work for us. We pass in our cell and the input to the cell, then it does the unrolling and everything else for us. It returns outputs for each time step and the final_state of the hidden layer.
Exercise: Use tf.nn.dynamic_rnn to add the forward pass through the RNN. Remember that we're actually passing in vectors from the embedding layer, embed.
In [63]:
with graph.as_default():
    with tf.name_scope('LSTM'):
        outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state)
In [64]:
with graph.as_default():
    with tf.name_scope('Prediction'):
        # Only the output from the last time step feeds the single sigmoid unit
        predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
    with tf.name_scope('Loss'):
        cost = tf.losses.mean_squared_error(labels_, predictions)
        tf.summary.scalar('cost', cost)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
In [65]:
with graph.as_default():
    with tf.name_scope('Accuracy'):
        correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
        tf.summary.scalar('accuracy', accuracy)
In [66]:
def get_batches(x, y, batch_size=100):
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]
In [ ]:
In [67]:
train_y.mean()
Out[67]:
In [68]:
epochs = 20
with graph.as_default():
    merged = tf.summary.merge_all()
    saver = tf.train.Saver()
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./logs/{}/train'.format(run_number), sess.graph)
    val_writer = tf.summary.FileWriter('./logs/{}/val'.format(run_number))
    tf.contrib.tensorboard.plugins.projector.visualize_embeddings(val_writer, embedding_config)
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob: 0.5,
                    initial_state: state}
            summary, loss, state, _ = sess.run([merged, cost, final_state, optimizer],
                                               feed_dict=feed)
            train_writer.add_summary(summary, iteration)
            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))
            if iteration%25==0:
                val_acc = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y[:, None],
                            keep_prob: 1,
                            initial_state: val_state}
                    summary, batch_acc, val_state = sess.run([merged, accuracy, final_state],
                                                             feed_dict=feed)
                    val_acc.append(batch_acc)
                val_writer.add_summary(summary, iteration)
                saver.save(sess, './logs/{}/model.ckpt'.format(run_number), iteration)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration += 1
    saver.save(sess, "checkpoints/sentiment.ckpt")
In [69]:
train_writer.flush()
val_writer.flush()
In [ ]:
In [70]:
test_acc = []
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        feed = {inputs_: x,
                labels_: y[:, None],
                keep_prob: 1,
                initial_state: test_state}
        batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
        test_acc.append(batch_acc)
    print("Test accuracy: {:.3f}".format(np.mean(test_acc)))
Test accuracy: 0.748
Test accuracy: 0.784
In [71]:
print(features.shape)
In [82]:
def TestSomeText(text):
    # Apply the same preprocessing as the training data: lowercase the text
    # and replace punctuation/newlines with spaces
    text = text.lower()
    delete = ['.', '!', ',', '"', "'", '\n']
    for d in delete:
        text = text.replace(d, " ")
    # Convert to integers, keeping only words that are in the vocabulary
    text_ints = [vocab_to_int[word] for word in text.split(' ') if word in vocab_to_int]
    print(len(text_ints))
    text_ints = text_ints[:seq_len]
    # Left-pad to seq_len and repeat the same review across the whole batch,
    # since the graph's initial_state is built for a fixed batch_size
    text_features = np.array([([0] * (seq_len - len(text_ints))) + text_ints] * batch_size)
    with tf.Session(graph=graph) as sess:
        saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
        test_state = sess.run(cell.zero_state(batch_size, tf.float32))
        feed = {inputs_: text_features,
                labels_: [[0]] * batch_size,  # dummy labels; only the predictions are used
                keep_prob: 1,
                initial_state: test_state}
        pred = sess.run(predictions, feed_dict=feed)
    return pred[0, 0]
In [73]:
TestSomeText("highly underrated movie")
#pred[0,0]
Out[73]:
In [74]:
TestSomeText('overrated movie')
Out[74]:
In [83]:
TestSomeText("""I ve been looking forward to a viking film or TV series for many years
and when my wishes were finally granted, I was very worried that this production
was going to be total crap. After viewing the first two episodes I do not worry
about that anymore. Thank you, Odin
As a person of some historical knowledge of the viking era, I can point out numerous
flaws - but they don't ruin the story for me, so I will let them slip. Historical
accounts about those days are, after all, not entirely reliable.
Happy to see Travis Fimmel in a role that totally suits him. A physical and intense
character, with that spice of humor that is the viking trademark from the sagas.
Gabriel Byrne plays a stern leader, that made me think of him in "Prince of Jutland",
and Clive Standen seems like he's going to surprise us.
Been pondering the Game of Thrones comparison, since I love that show too, but in my
opinion Vikings has its own thing going on. Way fewer lead characters to begin with,
and also a more straight forward approach. Plenty of room for more series with this
high class!
Can I wish for more than the planned nine episodes, PLEASE!!!""")
Out[83]:
In [96]:
TestSomeText("""vikings""")
Out[96]:
In [77]:
TestSomeText("""Pirates of the Caribbean has always been a franchise that makes no attempt for Oscar worthy material but in its own way is massively enjoyable.
Pirates of the Caribbean: Dead Men Tell No Tales certainly embraces the aspects of the original movie while also incorporating new plot lines that fit in well with plots from the original story. With the introduction of Henry and Karina there is a new love interest that is provided to the audience that rivals that of Will and Elizabeth Turner's.
Henry Turner is portrayed as an almost exact copy of his father except just a teensy bit worse at sword fighting while Karina differs from the usual women as she remains just as important, if not more, as Henry as she guides the course towards Posiedon's trident.
Jack Sparrow is entertaining as always with his usual drunk characteristics. For those of you who are tired of Sparrow acting this way Don't SEE THE MOVIE Jack sparrow isn't going to change because it doesn't make sense for his character to suddenly change.
All together the movie was expertly written and expertly performed by the entire cast even Kiera Knightely who didn't manage to get one word throughout the whole movie. I know as a major fan of the Pirates of the Caribbean I can't wait to see what happens for the future of the franchise.
""")
Out[77]:
In [78]:
TestSomeText("""If your child is a fan of the Wimpy Kid series, they'll no doubt enjoy this one, it's entertaining and lowbrow enough to also appease the moodiest of teens and grumpiest adults.""")
Out[78]:
In [79]:
TestSomeText("""At first I thought the film was going to be just a normal thriller but it turned out to be a thousand times better than I expected. The film is truly original and was so dark & sinister that gives the tensive mood also it is emotionally & psychologically thrilling, the whole movie is charged with pulse pounding suspense and seems like it's really happening. It's amazing that how they managed to make an 80 minute movie with just a guy in a phone booth but the full credit goes to Colin Farrell and Larry Cohen the writer not Joel Schumacher because he is a crappy director. Joel Schumacher's films are rubbish especially The Number 23, Phone Booth was shot in 10 days with a budget of $10 million so it wasn't a hard job to make it, that's why Joel doesn't get any credit but the cast & crew did a fantastic job. I also really liked the raspberry coloured shirt Colin was wearing and it was an excellent choice of clothing because the viewers are going to watch him throughout the whole film. When I first saw the movie I fell in love with it and I bought it on DVD the next day and I've seen it about 20 times and I'm still not fed up with it. Phone Booth is and always will be Colin Farrell's best film! Overall it is simply one of my favourite films and I even argued over my friend because he didn't like it.
""")
Out[79]:
In [80]:
TestSomeText("""There are few quality movies or series about the Vikings, and this series is outstanding and well worth waiting for. Not only is Vikings a series that is a joy to watch, it is also a series that is easy to recommend. I personally feel that the creator and producers did a fine job of giving the viewer quality material. Now, there are a few inconsistencies with the series, most notably would be the idea that Vikings had very little knowledge of other European countries and were amazed by these people across the big waters. In reality Vikings engaged in somewhat normal commercial activities with other Anglo-Saxons, so the idea that Vikings were as amazed as they seemed when they realize that other people were out there is not that realistic. However, it is this small inconsistency that goes a long way in holding the premise together. I simply love the series and would recommend it to anyone wanting to watch a quality show.""")
Out[80]:
In [81]:
TestSomeText("""This movie didn't feel any different from the other anime movies out there. Sure, the sibling dynamics were good, as well as the family values, the childhood memories and older brother anxiety. The main idea was interesting, with the new baby seeming rather like a boss sent into the family to spy on the parents and solve a big problem for his company. You can't help but identify with the older kid, especially if you have younger siblings. But eventually, the action was a bit main stream. The action scenes were not original and kind of boring. Other than that, the story became a little complicated when you start to think about what's real and what's not. The narration was good and the animation was nice, with the cute babies and puppies. So, 4 out of 10.
""")
Out[81]:
In [98]:
TestSomeText('seriously awesome movie')
Out[98]: