# Sentiment Classification & How To "Frame Problems" for a Neural Network

### What You Should Already Know

• neural networks, forward and back-propagation
• mean squared error
• and train/test splits

### Where to Get Help If You Need It

• Re-watch previous Udacity Lectures
• Leverage the recommended Course Reading Material - Grokking Deep Learning (40% Off: traskud17)
• Shoot me a tweet @iamtrask

### Tutorial Outline:

• Intro: The Importance of "Framing a Problem"
• Curate a Dataset
• Developing a "Predictive Theory"
• PROJECT 1: Quick Theory Validation
• Transforming Text to Numbers
• PROJECT 2: Creating the Input/Output Data
• Putting it all together in a Neural Network
• PROJECT 3: Building our Neural Network
• Understanding Neural Noise
• PROJECT 4: Making Learning Faster by Reducing Noise
• Analyzing Inefficiencies in our Network
• PROJECT 5: Making our Network Train and Run Faster
• Further Noise Reduction
• PROJECT 6: Reducing Noise by Strategically Reducing the Vocabulary
• Analysis: What's going on in the weights?

# Lesson: Curate a Dataset

``````

In [50]:

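def pretty_print_review_and_label(i):
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# One review per line; strip the trailing newline.
with open('reviews.txt', 'r') as g:  # What we know!
    reviews = [line.rstrip('\n') for line in g]

# One label per line; upper-cased (an assumption) to match the
# 'POSITIVE' / 'NEGATIVE' values used in the later cells.
with open('labels.txt', 'r') as g:  # What we WANT to know!
    labels = [line.rstrip('\n').upper() for line in g]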

``````
``````

In [51]:

len(reviews)

``````
``````

Out[51]:

25000

``````
``````

In [52]:

reviews[0]

``````
``````

Out[52]:

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

``````
``````

In [53]:

labels[0]

``````
``````

Out[53]:

'POSITIVE'

``````

# Lesson: Develop a Predictive Theory

``````

In [54]:

print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

``````
``````

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...

``````
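The six snippets printed above already hint at a theory: "terrible" shows up in the negative reviews and "excellent" in the positive ones. A minimal sketch of that check, using only the truncated text visible in the output (the helper `label_counts` is a hypothetical name introduced here, not part of the original notebook):

```python
# The six truncated snippets printed above, paired with their labels.
samples = [
    ("NEGATIVE", "this movie is terrible but it has some good effects ."),
    ("POSITIVE", "adrian pasdar is excellent is this film . he makes a fascinating woman ."),
    ("NEGATIVE", "comment this movie is impossible . is terrible  very improbable  bad interpretat"),
    ("POSITIVE", "excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more"),
    ("NEGATIVE", "if you haven  t seen this  it  s terrible . it is pure trash . i saw this about "),
    ("POSITIVE", "this schiffer guy is a real genius  the movie is of excellent quality and both e"),
]

def label_counts(word, samples):
    """Count how many snippets of each label contain `word` as a whole token."""
    counts = {"POSITIVE": 0, "NEGATIVE": 0}
    for label, text in samples:
        if word in text.split():
            counts[label] += 1
    return counts

terrible_counts = label_counts("terrible", samples)
excellent_counts = label_counts("excellent", samples)
print("terrible :", terrible_counts)
print("excellent:", excellent_counts)
```

Six hand-picked snippets prove nothing on their own, but they suggest that individual word counts carry signal about the label, which is what Project 1 tests on the full dataset.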

# Project 1: Quick Theory Validation

``````

In [98]:

import numpy as np

``````
``````

In [114]:

bag_of_words = {}  # total count of each word across all reviews
pos_words = {}     # count of each word in POSITIVE reviews
neg_words = {}     # count of each word in NEGATIVE reviews

for i in range(len(reviews)):
    words = reviews[i].split(' ')
    for word in words:
        if word in bag_of_words:
            bag_of_words[word] += 1
        else:
            # First sighting: initialise all three counters for this word.
            bag_of_words[word] = 1
            pos_words[word] = 0
            neg_words[word] = 0

        if labels[i] == 'POSITIVE':
            pos_words[word] += 1
        elif labels[i] == 'NEGATIVE':
            neg_words[word] += 1

# Score only reasonably common words (more than 500 occurrences).
words_pos_neg_ratio = []
for word in bag_of_words:
    if bag_of_words[word] > 500:
        # The +1 in the denominator avoids division by zero.
        pos_neg_ratio = pos_words[word] / float(neg_words[word] + 1)
        words_pos_neg_ratio.append((word, np.log(pos_neg_ratio)))

# Most positive words first, most negative words last.
words_pos_neg_ratio = sorted(words_pos_neg_ratio, key=lambda x: x[1], reverse=True)

``````
``````

In [115]:

print('\nTop positive words: \n')
for i in range(10):
    print(words_pos_neg_ratio[i][0],': ', round(words_pos_neg_ratio[i][1], 10), sep='')

``````
``````

Top positive words:

superb: 1.7091514459
wonderful: 1.5645425925
fantastic: 1.5048433869
excellent: 1.4647538506
amazing: 1.3919815802
powerful: 1.2999662776
favorite: 1.2668956298
perfect: 1.2467424807
brilliant: 1.2287554138
perfectly: 1.1971931173

``````
``````

In [116]:

print('\nTop negative words: \n')
for i in range(-1, -11, -1):
    print(words_pos_neg_ratio[i][0],': ', round(words_pos_neg_ratio[i][1], 10), sep='')

``````
``````

Top negative words:

waste: -2.619384564
pointless: -2.45530618
worst: -2.2869878962
awful: -2.227194247
poorly: -2.2207550747
lame: -1.9817674589
horrible: -1.910259094
wasted: -1.8382794849
crap: -1.8281271134

``````
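To see how the score behaves, here is a small self-contained sketch of the same `log(pos / (neg + 1))` calculation on toy data. The reviews, labels, and the `log_ratio` helper are all hypothetical and are not taken from `reviews.txt` or `labels.txt`. It uses `collections.Counter` in place of the hand-rolled dictionaries, which is a stylistic substitution, not the notebook's method.

```python
import math
from collections import Counter

# Hypothetical toy data, for illustration only.
toy_reviews = [
    "superb acting and a wonderful story",
    "a wonderful film",
    "a waste of time",
    "awful plot and a waste of money",
]
toy_labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

# Count word occurrences separately for each label.
pos_counts, neg_counts = Counter(), Counter()
for review, label in zip(toy_reviews, toy_labels):
    target = pos_counts if label == "POSITIVE" else neg_counts
    target.update(review.split())

def log_ratio(word):
    """Return log(pos / (neg + 1)); -inf when the word never appears in a positive review."""
    pos = pos_counts[word]
    if pos == 0:
        return float("-inf")
    return math.log(pos / (neg_counts[word] + 1))

toy_scores = {w: log_ratio(w) for w in ("wonderful", "a", "waste")}
for word, score in toy_scores.items():
    print(word, score)
```

Because the `+ 1` sits only in the denominator, a word that appears equally often in both classes (like "a" here) scores slightly below zero instead of exactly zero. On the full dataset, with counts above 500, that bias is small.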