# Sentiment Classification & How To "Frame Problems" for a Neural Network

### What You Should Already Know

• Neural networks, forward propagation, and backpropagation
• Mean squared error
• Train/test splits

### Where to Get Help if You Need It

• Re-watch previous Udacity Lectures
• Leverage the recommended Course Reading Material - Grokking Deep Learning (40% Off: traskud17)
• Shoot me a tweet @iamtrask

### Tutorial Outline:

• Intro: The Importance of "Framing a Problem"
• Curate a Dataset
• Developing a "Predictive Theory"
• PROJECT 1: Quick Theory Validation
• Transforming Text to Numbers
• PROJECT 2: Creating the Input/Output Data
• Putting it all together in a Neural Network
• PROJECT 3: Building our Neural Network
• Understanding Neural Noise
• PROJECT 4: Making Learning Faster by Reducing Noise
• Analyzing Inefficiencies in our Network
• PROJECT 5: Making our Network Train and Run Faster
• Further Noise Reduction
• PROJECT 6: Reducing Noise by Strategically Reducing the Vocabulary
• Analysis: What's going on in the weights?

# Lesson: Curate a Dataset

``````

In [1]:

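def pretty_print_review_and_label(i):
    # Print the label and the first 80 characters of review i.
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

g = open('reviews.txt','r') # What we know!
# Assumed format: one review per line; strip only the trailing newline.
reviews = [line.rstrip('\n') for line in g]
g.close()

g = open('labels.txt','r') # What we WANT to know!
# Assumed format: one label per line; upper-cased to match the 'POSITIVE' comparison used later.
labels = [line.rstrip('\n').upper() for line in g]
g.close()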

``````
``````

In [2]:

len(reviews)

``````
``````

Out[2]:

25000

``````
``````

In [5]:

reviews[0]

``````
``````

Out[5]:

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

``````
``````

In [6]:

labels[0]

``````
``````

Out[6]:

'POSITIVE'

``````

# Lesson: Develop a Predictive Theory

``````

In [7]:

print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

``````
``````

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...

``````

# Project 1: Quick Theory Validation

``````

In [9]:

from collections import Counter
import numpy as np

``````
``````

In [10]:

positive_counts = Counter()
negative_counts = Counter()
total_counts = Counter()

``````
``````

In [11]:

for i in range(len(reviews)):
    if(labels[i] == 'POSITIVE'):
        for word in reviews[i].split(" "):
            positive_counts[word] += 1
            total_counts[word] += 1
    else:
        for word in reviews[i].split(" "):
            negative_counts[word] += 1
            total_counts[word] += 1

``````
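To make the counting step easier to check by hand, here is a minimal sketch of the same idea on a two-review toy corpus. The names `toy_reviews`, `toy_labels`, and `count_words_by_label` are hypothetical and exist only for this illustration; the real notebook works on `reviews` and `labels` directly.

```python
from collections import Counter

# Hypothetical toy data standing in for reviews.txt / labels.txt.
toy_reviews = ["excellent film , excellent cast", "terrible plot , terrible pacing"]
toy_labels = ["POSITIVE", "NEGATIVE"]

def count_words_by_label(reviews, labels):
    """Return (positive, negative, total) word Counters for the given corpus."""
    positive, negative, total = Counter(), Counter(), Counter()
    for review, label in zip(reviews, labels):
        words = review.split(" ")
        # Route the words to the Counter that matches the review's label.
        target = positive if label == "POSITIVE" else negative
        target.update(words)
        total.update(words)
    return positive, negative, total

pos, neg, tot = count_words_by_label(toy_reviews, toy_labels)
print(pos.most_common(2))
print(neg.most_common(2))
```

One detail worth knowing: `split(" ")` treats every single space as a separator, so a run of two spaces (common in these reviews) produces an empty-string token that gets counted like any other word. Calling `split()` with no argument would skip those empty tokens, at the cost of changing the counts slightly.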
``````

In [12]:

positive_counts.most_common()

``````
``````

Out[12]:

``````
``````

In [20]:

pos_neg_ratios = Counter()

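# Ratio of positive to negative uses, for words seen more than 100 times.
for term,cnt in list(total_counts.most_common()):
    if(cnt > 100):
        # +1 in the denominator avoids division by zero.
        pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)
        pos_neg_ratios[term] = pos_neg_ratio

# Log-scale the ratios so positive words score > 0 and negative words score < 0.
for word,ratio in pos_neg_ratios.most_common():
    if(ratio > 1):
        pos_neg_ratios[word] = np.log(ratio)
    else:
        # The 0.01 keeps the log finite when the ratio is 0.
        pos_neg_ratios[word] = -np.log((1 / (ratio+0.01)))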

``````
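The scoring rule in the cell above can be tried on individual counts. The sketch below wraps the same arithmetic in a hypothetical helper, `pos_neg_score`, and uses `math.log` in place of `np.log` so it runs without NumPy; the counts passed in are made-up examples, not values from the dataset.

```python
import math

def pos_neg_score(positive_count, negative_count):
    """Log-scaled positive-to-negative ratio, mirroring the arithmetic of the cell above."""
    # +1 in the denominator avoids division by zero for words never seen in negative reviews.
    ratio = positive_count / float(negative_count + 1)
    if ratio > 1:
        return math.log(ratio)
    # -log(1 / (ratio + 0.01)) is the same as log(ratio + 0.01);
    # the 0.01 keeps the result finite when ratio is 0.
    return -math.log(1 / (ratio + 0.01))

print(pos_neg_score(200, 49))   # ratio 4.0: mostly positive
print(pos_neg_score(49, 199))   # ratio 0.245: mostly negative
print(pos_neg_score(100, 99))   # ratio 1.0: roughly neutral
print(pos_neg_score(0, 500))    # ratio 0.0: never seen in a positive review
```

Taking the log is what makes the scale roughly symmetric: a raw ratio squeezes all negative-leaning words into the interval between 0 and 1 while positive-leaning words can grow without bound, whereas the log score puts neutral words near 0, positive words above it, and negative words below it. A ratio of exactly 4 gives ln 4 ≈ 1.386, which is the score shown for words such as 'chilling' in `Out[21]`.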
``````

In [21]:

# words most frequently seen in a review with a "POSITIVE" label
pos_neg_ratios.most_common()

``````
``````

Out[21]:

[('edie', 4.6913478822291435),
('paulie', 4.0775374439057197),
('felix', 3.1527360223636558),
('polanski', 2.8233610476132043),
('matthau', 2.8067217286092401),
('victoria', 2.6810215287142909),
('mildred', 2.6026896854443837),
('gandhi', 2.5389738710582761),
('flawless', 2.451005098112319),
('superbly', 2.2600254785752498),
('perfection', 2.1594842493533721),
('astaire', 2.1400661634962708),
('captures', 2.0386195471595809),
('voight', 2.0301704926730531),
('wonderfully', 2.0218960560332353),
('powell', 1.9783454248084671),
('brosnan', 1.9547990964725592),
('lily', 1.9203768470501485),
('bakshi', 1.9029851043382795),
('lincoln', 1.9014583864844796),
('refreshing', 1.8551812956655511),
('breathtaking', 1.8481124057791867),
('bourne', 1.8478489358790986),
('lemmon', 1.8458266904983307),
('delightful', 1.8002701588959635),
('flynn', 1.7996646487351682),
('andrews', 1.7764919970972666),
('homer', 1.7692866133759964),
('beautifully', 1.7626953362841438),
('soccer', 1.7578579175523736),
('elvira', 1.7397031072720019),
('underrated', 1.7197859696029656),
('gripping', 1.7165360479904674),
('superb', 1.7091514458966952),
('delight', 1.6714733033535532),
('welles', 1.6677068205580761),
('sinatra', 1.6389967146756448),
('touching', 1.637217476541176),
('timeless', 1.62924053973028),
('macy', 1.6211339521972916),
('unforgettable', 1.6177367152487956),
('favorites', 1.6158688027643908),
('stewart', 1.6119987332957739),
('hartley', 1.6094379124341003),
('sullivan', 1.6094379124341003),
('extraordinary', 1.6094379124341003),
('brilliantly', 1.5950491749820008),
('friendship', 1.5677652160335325),
('wonderful', 1.5645425925262093),
('palma', 1.5553706911638245),
('magnificent', 1.54663701119507),
('finest', 1.5462590108125689),
('jackie', 1.5439233053234738),
('ritter', 1.5404450409471491),
('tremendous', 1.5184661342283736),
('freedom', 1.5091151908062312),
('fantastic', 1.5048433868558566),
('terrific', 1.5026699370083942),
('noir', 1.493925025312256),
('sidney', 1.493925025312256),
('outstanding', 1.4910053152089213),
('mann', 1.4894785973551214),
('pleasantly', 1.4894785973551214),
('nancy', 1.488077055429833),
('marie', 1.4825711915553104),
('marvelous', 1.4739999415389962),
('excellent', 1.4647538505723599),
('ruth', 1.4596256342054401),
('stanwyck', 1.4412101187160054),
('widmark', 1.4350845252893227),
('splendid', 1.4271163556401458),
('chan', 1.423108334242607),
('exceptional', 1.4201959127955721),
('tender', 1.410986973710262),
('gentle', 1.4078005663408544),
('poignant', 1.4022947024663317),
('gem', 1.3932148039644643),
('amazing', 1.3919815802404802),
('chilling', 1.3862943611198906),
('captivating', 1.3862943611198906),
('fisher', 1.3862943611198906),
('davies', 1.3862943611198906),
('darker', 1.3652409519220583),
('april', 1.3499267169490159),
('kelly', 1.3461743673304654),
('blake', 1.3418425985490567),
('overlooked', 1.329135947279942),
('ralph', 1.32818673031261),
('bette', 1.3156767939059373),
('hoffman', 1.3150668518315229),
('cole', 1.3121863889661687),
('shines', 1.3049487216659381),
('powerful', 1.2999662776313934),
('notch', 1.2950456896547455),
('remarkable', 1.2883688239495823),
('pitt', 1.286210902562908),
('winters', 1.2833463918674481),
('vivid', 1.2762934659055623),
('gritty', 1.2757524867200667),
('giallo', 1.2745029551317739),
('portrait', 1.2704625455947689),
('innocence', 1.2694300209805796),
('psychiatrist', 1.2685113254635072),
('favorite', 1.2668956297860055),
('ensemble', 1.2656663733312759),
('stunning', 1.2622417124499117),
('burns', 1.259880436264232),
('garbo', 1.258954938743289),
('barbara', 1.2580400255962119),
('panic', 1.2527629684953681),
('holly', 1.2527629684953681),
('philip', 1.2527629684953681),
('carol', 1.2481440226390734),
('perfect', 1.246742480713785),
('appreciated', 1.2462482874741743),
('favourite', 1.2411123512753928),
('journey', 1.2367626271489269),
('rural', 1.235471471385307),
('bond', 1.2321436812926323),
('builds', 1.2305398317106577),
('brilliant', 1.2287554137664785),
('brooklyn', 1.2286654169163074),
('von', 1.225175011976539),
('unfolds', 1.2163953243244932),
('recommended', 1.2163953243244932),
('daniel', 1.20215296760895),
('perfectly', 1.1971931173405572),
('crafted', 1.1962507582320256),
('prince', 1.1939224684724346),
('troubled', 1.192138346678933),
('consequences', 1.1865810616140668),
('haunting', 1.1814999484738773),
('cinderella', 1.180052620608284),
('alexander', 1.1759989522835299),
('emotions', 1.1753049094563641),
('boxing', 1.1735135968412274),
('subtle', 1.1734135017508081),
('curtis', 1.1649873576129823),
('rare', 1.1566438362402944),
('loved', 1.1563661500586044),
('daughters', 1.1526795099383853),
('courage', 1.1438688802562305),
('dentist', 1.1426722784621401),
('highly', 1.1420208631618658),
('nominated', 1.1409146683587992),
('tony', 1.1397491942285991),
('draws', 1.1325138403437911),
('everyday', 1.1306150197542835),
('contrast', 1.1284652518177909),
('cried', 1.1213405397456659),
('fabulous', 1.1210851445201684),
('ned', 1.120591195386885),
('fay', 1.120591195386885),
('emma', 1.1184149159642893),
('sensitive', 1.113318436057805),
('smooth', 1.1089750757036563),
('dramas', 1.1080910326226534),
('today', 1.1050431789984001),
('helps', 1.1023091505494358),
('inspiring', 1.0986122886681098),
('jimmy', 1.0937696641923216),
('awesome', 1.0931328229034842),
('unique', 1.0881409888008142),
('tragic', 1.0871835928444868),
('intense', 1.0870514662670339),
('stellar', 1.0857088838322018),
('rival', 1.0822184788924332),
('provides', 1.0797081340289569),
('depression', 1.0782034170369026),
('shy', 1.0775588794702773),
('carrie', 1.076139432816051),
('blend', 1.0753554265038423),
('hank', 1.0736109864626924),
('diana', 1.0726368022648489),
('unexpected', 1.0722255334949147),
('achievement', 1.0668635903535293),
('bettie', 1.0663514264498881),
('happiness', 1.0632729222228008),
('glorious', 1.0608719606852626),
('davis', 1.0541605260972757),
('terrifying', 1.0525211814678428),
('beauty', 1.050410186850232),
('ideal', 1.0479685558493548),
('fears', 1.0467872208035236),
('hong', 1.0438040521731147),
('seasons', 1.0433496099930604),
('fascinating', 1.0414538748281612),
('carries', 1.0345904299031787),
('satisfying', 1.0321225473992768),
('definite', 1.0319209141694374),
('touched', 1.0296194171811581),
('greatest', 1.0248947127715422),
('creates', 1.0241097613701886),
('aunt', 1.023388867430522),
('walter', 1.022328983918479),
('spectacular', 1.0198314108149955),
('portrayal', 1.0189810189761024),
('ann', 1.0127808528183286),
('enterprise', 1.0116009116784799),
('musicals', 1.0096648026516135),
('deeply', 1.0094845087721023),
('incredible', 1.0061677561461084),
('mature', 1.0060195018402847),
('triumph', 0.99682959435816731),
('margaret', 0.99682959435816731),
('navy', 0.99493385919326827),
('harry', 0.99176919305006062),
('lucas', 0.990398704027877),
('sweet', 0.98966110487955483),
('joey', 0.98794672078059009),
('oscar', 0.98721905111049713),
('balance', 0.98649499054740353),
('warm', 0.98485340331145166),
('ages', 0.98449898190068863),
('glover', 0.98082925301172619),
('guilt', 0.98082925301172619),
('carrey', 0.98082925301172619),
('learns', 0.97881108885548895),
('unusual', 0.97788374278196932),
('sons', 0.97777581552483595),
('complex', 0.97761897738147796),
('essence', 0.97753435711487369),
('brazil', 0.9769153536905899),
('widow', 0.97650959186720987),
('solid', 0.97537964824416146),
('beautiful', 0.97326301262841053),
('holmes', 0.97246100334120955),
('awe', 0.97186058302896583),
('vhs', 0.97116734209998934),
('eerie', 0.97116734209998934),
('lonely', 0.96873720724669754),
('grim', 0.96873720724669754),
('sport', 0.96825047080486615),
('debut', 0.96508089604358704),
('destiny', 0.96343751029985703),
('thrillers', 0.96281074750904794),
('tears', 0.95977584381389391),
('rose', 0.95664202739772253),
('feelings', 0.95551144502743635),
('ginger', 0.95551144502743635),
('winning', 0.95471810900804055),
('stanley', 0.95387344302319799),
('cox', 0.95343027882361187),
('paris', 0.95278479030472663),
('heart', 0.95238806924516806),
('hooked', 0.95155887071161305),
('comfortable', 0.94803943018873538),
('mgm', 0.94446160884085151),
('masterpiece', 0.94155039863339296),
('themes', 0.94118828349588235),
('danny', 0.93967118051821874),
('anime', 0.93378388932167222),
('perry', 0.93328830824272613),
('joy', 0.93301752567946861),
('lovable', 0.93081883243706487),
('hal', 0.92953595862417571),
('mysteries', 0.92953595862417571),
('louis', 0.92871325187271225),
('charming', 0.92520609553210742),
('urban', 0.92367083917177761),
('allows', 0.92183091224977043),
('impact', 0.91815814604895041),
('lifestyle', 0.91629073187415511),
('italy', 0.91629073187415511),
('spy', 0.91289514287301687),
('treat', 0.91193342650519937),
('subsequent', 0.91056005716517008),
('kennedy', 0.90981821736853763),
('loving', 0.90967549275543591),
('surprising', 0.90937028902958128),
('quiet', 0.90648673177753425),
('winter', 0.90624039602065365),
('reveals', 0.90490540964902977),
('raw', 0.90445627422715225),
('funniest', 0.90078654533818991),
('norman', 0.89994159387262562),
('thief', 0.89874642222324552),
('season', 0.89827222637147675),
('secrets', 0.89794159320595857),
('colorful', 0.89705936994626756),
('highest', 0.8967461358011849),
('compelling', 0.89462923509297576),
('danes', 0.89248008318043659),
('castle', 0.88967708335606499),
('kudos', 0.88889175768604067),
('great', 0.88810470901464589),
('baseball', 0.88730319500090271),
('subtitles', 0.88730319500090271),
('bleak', 0.88730319500090271),
('winner', 0.88643776872447388),
('tragedy', 0.88563699078315261),
('todd', 0.88551907320740142),
('nicely', 0.87924946019380601),
('arthur', 0.87546873735389985),
('essential', 0.87373111745535925),
('gorgeous', 0.8731725250935497),
('fonda', 0.87294029100054127),
('eastwood', 0.87139541196626402),
('focuses', 0.87082835779739776),
('enjoyed', 0.87070195951624607),
('natural', 0.86997924506912838),
('intensity', 0.86835126958503595),
('witty', 0.86824103423244681),
('rob', 0.8642954367557748),
('worlds', 0.86377269759070874),
('health', 0.86113891179907498),
('magical', 0.85953791528170564),
('deeper', 0.85802182375017932),
('lucy', 0.85618680780444956),
('moving', 0.85566611005772031),
('lovely', 0.85290640004681306),
('purple', 0.8513711857748395),
('memorable', 0.84801189112086062),
('sings', 0.84729786038720367),
('craig', 0.84342938360928321),
('modesty', 0.84342938360928321),
('relate', 0.84326559685926517),
('episodes', 0.84223712084137292),
('strong', 0.84167135777060931),
('smith', 0.83959811108590054),
('tear', 0.83704136022001441),
('apartment', 0.83333115290549531),
('princess', 0.83290912293510388),
('disagree', 0.83290912293510388),
('kung', 0.83173334384609199),
('columbo', 0.82667857318446791),
('jake', 0.82667857318446791),
('hart', 0.82472353834866463),
('strength', 0.82417544296634937),
('realizes', 0.82360006895738058),
('dave', 0.8232003088081431),
('childhood', 0.82208086393583857),
('forbidden', 0.81989888619908913),
('tight', 0.81883539572344199),
('surreal', 0.8178506590609026),
('manager', 0.81770990320170756),
('dancer', 0.81574950265227764),
('con', 0.81093021621632877),
('studios', 0.81093021621632877),
('miike', 0.80821651034473263),
('realistic', 0.80807714723392232),
('explicit', 0.80792269515237358),
('kurt', 0.8060875917405409),
('deals', 0.80535917116687328),
('holds', 0.80493858654806194),
('carl', 0.80437281567016972),
('touches', 0.80396154690023547),
('gene', 0.80314807577427383),
('albert', 0.8027669055771679),
('abc', 0.80234647252493729),
('cry', 0.80011930011211307),
('sides', 0.7995275841185171),
('develops', 0.79850769621777162),
('eyre', 0.79850769621777162),
('dances', 0.79694397424158891),
('oscars', 0.79633141679517616),
('legendary', 0.79600456599965308),
('importance', 0.79492987486988764),
('hearted', 0.79492987486988764),
('portraying', 0.79356592830699269),
('impressed', 0.79258107754813223),
('waters', 0.79112758892014912),
('empire', 0.79078565012386137),
('edge', 0.789774016249017),
('environment', 0.78845736036427028),
('jean', 0.78845736036427028),
('sentimental', 0.7864791203521645),
('captured', 0.78623760362595729),
('styles', 0.78592891401091158),
('daring', 0.78592891401091158),
('backgrounds', 0.78275933924963248),
('frank', 0.78275933924963248),
('matches', 0.78275933924963248),
('tense', 0.78275933924963248),
('gothic', 0.78209466657644144),
('sharp', 0.7814397877056235),
('achieved', 0.78015855754957497),
('court', 0.77947526404844247),
('steals', 0.7789140023173704),
('rules', 0.77844476107184035),
('colors', 0.77684619943659217),
('reunion', 0.77318988823348167),
('covers', 0.77139937745969345),
('tale', 0.77010822169607374),
('rain', 0.7683706017975328),
('denzel', 0.76804848873306297),
('stays', 0.76787072675588186),
('blob', 0.76725515271366718),
('conventional', 0.76214005204689672),
('maria', 0.76214005204689672),
('fresh', 0.76158434211317383),
('midnight', 0.76096977689870637),
('landscape', 0.75852993982279704),
('animated', 0.75768570169751648),
('titanic', 0.75666058628227129),
('sunday', 0.75666058628227129),
('spring', 0.7537718023763802),
('cagney', 0.7537718023763802),
('enjoyable', 0.75246375771636476),
('immensely', 0.75198768058287868),
('sir', 0.7507762933965817),
('nevertheless', 0.75067102469813185),
('driven', 0.74994477895307854),
('performances', 0.74883252516063137),
('memories', 0.74721440183022114),
('simple', 0.74641420974143258),
('golden', 0.74533293373051557),
('leslie', 0.74533293373051557),
('lovers', 0.74497224842453125),
('relationship', 0.74484232345601786),
('supporting', 0.74357803418683721),
('che', 0.74262723782331497),
('packed', 0.7410032017375805),
('trek', 0.74021469141793106),
('provoking', 0.73840377214806618),
('strikes', 0.73759894313077912),
('depiction', 0.73682224406260699),
('emotional', 0.73678211645681524),
('secretary', 0.7366322924996842),
('influenced', 0.73511137965897755),
('florida', 0.73511137965897755),
('germany', 0.73288750920945944),
('brings', 0.73142936713096229),
('lewis', 0.73129894652432159),
('elderly', 0.73088750854279239),
('owner', 0.72743625403857748),
('streets', 0.72666987259858895),
('henry', 0.72642196944481741),
('portrays', 0.72593700338293632),
('bears', 0.7252354951114458),
('china', 0.72489587887452556),
('anger', 0.72439972406404984),
('society', 0.72433010799663333),
('available', 0.72415741730250549),
('best', 0.72347034060446314),
('bugs', 0.72270598280148979),
('magic', 0.71878961117328299),
('verhoeven', 0.71846498854423513),
('delivers', 0.71846498854423513),
('jim', 0.71783979315031676),
('donald', 0.71667767797013937),
('endearing', 0.71465338578090898),
('relationships', 0.71393795022901896),
('greatly', 0.71256526641704687),
('charlie', 0.71024161391924534),
('simon', 0.70967648251115578),
('effectively', 0.70914752190638641),
('march', 0.70774597998109789),
('atmosphere', 0.70744773070214162),
('influence', 0.70733181555190172),
('genius', 0.706392407309966),
('emotionally', 0.70556970055850243),
('ken', 0.70526854109229009),
('identity', 0.70484322032313651),
('sophisticated', 0.70470800296102132),
('dan', 0.70457587638356811),
('andrew', 0.70329955202396321),
('india', 0.70144598337464037),
('roy', 0.69970458110610434),
('surprisingly', 0.6995780708902356),
('sky', 0.69780919366575667),
('romantic', 0.69664981111114743),
('match', 0.69566924999265523),
('britain', 0.69314718055994529),
('beatty', 0.69314718055994529),
('affected', 0.69314718055994529),
('cowboy', 0.69314718055994529),
('wave', 0.69314718055994529),
('stylish', 0.69314718055994529),
('bitter', 0.69314718055994529),
('patient', 0.69314718055994529),
('meets', 0.69314718055994529),
('love', 0.69198533541937324),
('paul', 0.68980827929443067),
('andy', 0.68846333124751902),
('performance', 0.68797386327972465),
('patrick', 0.68645819240914863),
('unlike', 0.68546468438792907),
('brooks', 0.68433655087779044),
('refuses', 0.68348526964820844),
('award', 0.6824518914431974),
('complaint', 0.6824518914431974),
('ride', 0.68229716453587952),
('dawson', 0.68171848473632257),
('luke', 0.68158635815886937),
('wells', 0.68087708796813096),
('france', 0.6804081547825156),
('handsome', 0.68007509899259255),
('sports', 0.68007509899259255),
('rebel', 0.67875844310784572),
('directs', 0.67875844310784572),
('greater', 0.67605274720064523),
('dreams', 0.67599410133369586),
('effective', 0.67565402311242806),
('interpretation', 0.67479804189174875),
('works', 0.67445504754779284),
('brando', 0.67445504754779284),
('noble', 0.6737290947028437),
('paced', 0.67314651385327573),
('le', 0.67067432470788668),
('master', 0.67015766233524654),
('h', 0.6696166831497512),
('rings', 0.66904962898088483),
('easy', 0.66895995494594152),
('city', 0.66820823221269321),
('sunshine', 0.66782937257565544),
('succeeds', 0.66647893347778397),
('relations', 0.664159643686693),
('england', 0.66387679825983203),
('glimpse', 0.66329421741026418),
('aired', 0.66268797307523675),
('sees', 0.66263163663399482),
('both', 0.66248336767382998),
('definitely', 0.66199789483898808),
('imaginative', 0.66139848224536502),
('appreciate', 0.66083893732728749),
('tricks', 0.66071190480679143),
('striking', 0.66071190480679143),
('carefully', 0.65999497324304479),
('complicated', 0.65981076029235353),
('perspective', 0.65962448852130173),
('trilogy', 0.65877953705573755),
('future', 0.65834665141052828),
('lion', 0.65742909795786608),
('victor', 0.65540685257709819),
('douglas', 0.65540685257709819),
('inspired', 0.65459851044271034),
('marriage', 0.65392646740666405),
('demands', 0.65392646740666405),
('father', 0.65172321672194655),
('page', 0.65123628494430852),
('instant', 0.65058756614114943),
('era', 0.6495567444850836),
('ruthless', 0.64934455790155243),
('saga', 0.64934455790155243),
('joan', 0.64891392558311978),
('joseph', 0.64841128671855386),
('workers', 0.64829661439459352),
('fantasy', 0.64726757480925168),
('accomplished', 0.64551913157069074),
('distant', 0.64551913157069074),
('manhattan', 0.64435701639051324),
('personal', 0.64355023942057321),
('pushing', 0.64313675998528386),
('meeting', 0.64313675998528386),
('individual', 0.64313675998528386),
('pleasant', 0.64250344774119039),
('brave', 0.64185388617239469),
('william', 0.64083139119578469),
('hudson', 0.64077919504262937),
('friendly', 0.63949446706762514),
('eccentric', 0.63907995928966954),
('awards', 0.63875310849414646),
('jack', 0.63838309514997038),
('seeking', 0.63808740337691783),
('colonel', 0.63757732940513456),
('divorce', 0.63757732940513456),
('jane', 0.63443957973316734),
('keeping', 0.63414883979798953),
('gives', 0.63383568159497883),
('ted', 0.63342794585832296),
('animation', 0.63208692379869902),
('progress', 0.6317782341836532),
('concert', 0.63127177684185776),
('larger', 0.63127177684185776),
('nation', 0.6296337748376194),
('albeit', 0.62739580299716491),
('discovers', 0.62542900650499444),
('classic', 0.62504956428050518),
('segment', 0.62335141862440335),
('morgan', 0.62303761437291871),
('mouse', 0.62294292188669675),
('impressive', 0.62211140744319349),
('artist', 0.62168821657780038),
('ultimate', 0.62168821657780038),
('griffith', 0.62117368093485603),
('emily', 0.62082651898031915),
('drew', 0.62082651898031915),
('moved', 0.6197197120051281),
('profound', 0.61903920840622351),
('families', 0.61903920840622351),
('innocent', 0.61851219917136446),
('versions', 0.61730910416844087),
('eddie', 0.61691981517206107),
('criticism', 0.61651395453902935),
('nature', 0.61594514653194088),
('recognized', 0.61518563909023349),
('sexuality', 0.61467556511845012),
('contract', 0.61400986000122149),
('brian', 0.61344043794920278),
('remembered', 0.6131044728864089),
('determined', 0.6123858239154869),
('offers', 0.61207935747116349),
('pleasure', 0.61195702582993206),
('washington', 0.61180154110599294),
('images', 0.61159731359583758),
('games', 0.61067095873570676),
('fashioned', 0.60798937221963845),
('melodrama', 0.60749173598145145),
('peoples', 0.60613580357031549),
('charismatic', 0.60613580357031549),
('rough', 0.60613580357031549),
('dealing', 0.60517840761398811),
('fine', 0.60496962268013299),
('tap', 0.60391604683200273),
('trio', 0.60157998703445481),
('russell', 0.60120968523425966),
('figures', 0.60077386042893011),
('ward', 0.60005675749393339),
('shine', 0.59911823091166894),
('job', 0.59845562125168661),
('satisfied', 0.59652034487087369),
('river', 0.59637962862495086),
('brown', 0.595773016534769),
('believable', 0.59566072133302495),
('bound', 0.59470710774669278),
('always', 0.59470710774669278),
('hall', 0.5933967777928858),
('cook', 0.5916777203950857),
('claire', 0.59136448625000293),
('anna', 0.58778666490211906),
('peace', 0.58628403501758408),
('visually', 0.58539431926349916),
('falk', 0.58525821854876026),
('morality', 0.58525821854876026),
('growing', 0.58466653756587539),
('experiences', 0.58314628534561685),
('stood', 0.58314628534561685),
('touch', 0.58122926435596001),
('lives', 0.5810976767513224),
('kubrick', 0.58066919713325493),
('timing', 0.58047401805583243),
('struggles', 0.57981849525294216),
('expressions', 0.57981849525294216),
('authentic', 0.57848427223980559),
('helen', 0.57763429343810091),
('pre', 0.57700753064729182),
('quirky', 0.5753641449035618),
('young', 0.57531672344534313),
('inner', 0.57454143815209846),
('mexico', 0.57443087372056334),
('clint', 0.57380042292737909),
('sisters', 0.57286101468544337),
('realism', 0.57226528899949558),
('personalities', 0.5720692490067093),
('french', 0.5720692490067093),
('surprises', 0.57113222999698177),
('overcome', 0.5697681593994407),
('timothy', 0.56953322459276867),
('tales', 0.56909453188996639),
('war', 0.56843317302781682),
('civil', 0.5679840376059393),
('countries', 0.56737779327091187),
('streep', 0.56710645966458029),
('oliver', 0.56673325570428668),
('australia', 0.56580775818334383),
('understanding', 0.56531380905006046),
('players', 0.56509525370004821),
('knowing', 0.56489284503626647),
('rogers', 0.56421349718405212),
('suspenseful', 0.56368911332305849),
('variety', 0.56368911332305849),
('true', 0.56281525180810066),
('jr', 0.56220982311246936),
('psychological', 0.56108745854687891),
('branagh', 0.55961578793542266),
('wealth', 0.55961578793542266),
('performing', 0.55961578793542266),
('odds', 0.55961578793542266),
('sent', 0.55961578793542266),
('reminiscent', 0.55961578793542266),
('grand', 0.55961578793542266),
('overwhelming', 0.55961578793542266),
('brothers', 0.55891181043362848),
('howard', 0.55811089675600245),
('david', 0.55693122256475369),
('generation', 0.55628799784274796),
('grow', 0.55612538299565417),
('survival', 0.55594605904646033),
('mainstream', 0.55574731115750231),
('dick', 0.55431073570572953),
('charm', 0.55288175575407861),
('kirk', 0.55278982286502287),
('twists', 0.55244729845681018),
('gangster', 0.55206858230003986),
('jeff', 0.55179306225421365),
('family', 0.55116244510065526),
('tend', 0.55053307336110335),
('thanks', 0.55049088015842218),
('world', 0.54744234723432639),
('sutherland', 0.54743536937855164),
('life', 0.54695514434959924),
('disc', 0.54654370636806993),
('bug', 0.54654370636806993),
('tribute', 0.5455111817538808),
('europe', 0.54522705048332309),
('sacrifice', 0.54430155296238014),
('color', 0.54405127139431109),
('superior', 0.54333490233128523),
('york', 0.54318235866536513),
('pulls', 0.54266622962164945),
('hearts', 0.54232429082536171),
('jackson', 0.54232429082536171),
('enjoy', 0.54124285135906114),
('redemption', 0.54056759296472823),
('hamilton', 0.5389965007326869),
('stands', 0.5389965007326869),
('trial', 0.5389965007326869),
('greek', 0.5389965007326869),
('each', 0.5388212312554177),
('faithful', 0.53773307668591508),
('jealous', 0.53714293208336406),
('documentaries', 0.53714293208336406),
('different', 0.53709860682460819),
('describes', 0.53680111016925136),
('shorts', 0.53596159703753288),
('brilliance', 0.53551823635636209),
('mountains', 0.53492317534505118),
('share', 0.53408248593025787),
('dealt', 0.53408248593025787),
('providing', 0.53329847961804933),
('explore', 0.53329847961804933),
('series', 0.5325809226575603),
('fellow', 0.5323318289869543),
('loves', 0.53062825106217038),
('olivier', 0.53062825106217038),
('revolution', 0.53062825106217038),
('roman', 0.53062825106217038),
('century', 0.53002783074992665),
('musical', 0.52966871156747064),
('heroic', 0.52925932545482868),
('ironically', 0.52806743020049673),
('approach', 0.52806743020049673),
('temple', 0.52806743020049673),
('moves', 0.5279372642387119),
('julie', 0.52609309589677911),
('tells', 0.52415107836314001),
('uncle', 0.52354439617376536),
('union', 0.52324814376454787),
('deep', 0.52309571635780505),
('reminds', 0.52157841554225237),
('famous', 0.52118841080153722),
('jazz', 0.52053443789295151),
('dennis', 0.51987545928590861),
('epic', 0.51919387343650736),
('shows', 0.51915322220375304),
('performed', 0.5191244265806858),
('demons', 0.5191244265806858),
('eric', 0.51879379341516751),
('discovered', 0.51879379341516751),
('youth', 0.5185626062681431),
('human', 0.51851411224987087),
('tarzan', 0.51813827061227724),
('ourselves', 0.51794309153485463),
('wwii', 0.51758240622887042),
('passion', 0.5162164724008671),
('desire', 0.51607497965213445),
('pays', 0.51581316527702981),
('fox', 0.51557622652458857),
('dirty', 0.51557622652458857),
('symbolism', 0.51546600332249293),
('sympathetic', 0.51546600332249293),
('attitude', 0.51530993621331933),
('appearances', 0.51466440007315639),
('jeremy', 0.51466440007315639),
('fun', 0.51439068993048687),
('south', 0.51420972175023116),
('arrives', 0.51409894911095988),
('present', 0.51341965894303732),
('com', 0.51326167856387173),
('smile', 0.51265880484765169),
('fits', 0.51082562376599072),
('provided', 0.51082562376599072),
('carter', 0.51082562376599072),
('ring', 0.51082562376599072),
('aging', 0.51082562376599072),
('countryside', 0.51082562376599072),
('alan', 0.51082562376599072),
('visit', 0.51082562376599072),
('begins', 0.51015650363396647),
('success', 0.50900578704900468),
('japan', 0.50900578704900468),
('accurate', 0.50895471583017893),
('proud', 0.50800474742434931),
('daily', 0.5075946031845443),
('atmospheric', 0.50724780241810674),
('karloff', 0.50724780241810674),
('recently', 0.50714914903668207),
('fu', 0.50704490092608467),
('horrors', 0.50656122497953315),
('finding', 0.50637127341661037),
('lust', 0.5059356384717989),
('hitchcock', 0.50574947073413001),
('among', 0.50334004951332734),
('viewing', 0.50302139827440906),
('shining', 0.50262885656181222),
('investigation', 0.50262885656181222),
('duo', 0.5020919437972361),
('cameron', 0.5020919437972361),
('finds', 0.50128303100539795),
('contemporary', 0.50077528791248915),
('genuine', 0.50046283673044401),
('frightening', 0.49995595152908684),
('plays', 0.49975983848890226),
('age', 0.49941323171424595),
('position', 0.49899116611898781),
('continues', 0.49863035067217237),
('roles', 0.49839716550752178),
('james', 0.49837216269470402),
('individuals', 0.49824684155913052),
('brought', 0.49783842823917956),
('hilarious', 0.49714551986191058),
('brutal', 0.49681488669639234),
('appropriate', 0.49643688631389105),
('dance', 0.49581998314812048),
('league', 0.49578774640145024),
('helping', 0.49578774640145024),
('stunts', 0.49561620510246196),
('traveling', 0.49532143723002542),
('thoroughly', 0.49414593456733524),
('depicted', 0.49317068852726992),
('honor', 0.49247648509779424),
('combination', 0.49247648509779424),
('differences', 0.49247648509779424),
('fully', 0.49213349075383811),
('tracy', 0.49159426183810306),
('battles', 0.49140753790888908),
('possibility', 0.49112055268665822),
('romance', 0.4901589869574316),
('initially', 0.49002249613622745),
('happy', 0.4898997500608791),
('crime', 0.48977221456815834),
('singing', 0.4893852925281213),
('especially', 0.48901267837860624),
('shakespeare', 0.48754793889664511),
('hugh', 0.48729512635579658),
('detail', 0.48609484250827351),
('guide', 0.48550781578170082),
('companion', 0.48550781578170082),
('julia', 0.48550781578170082),
('san', 0.48550781578170082),
('desperation', 0.48550781578170082),
('strongly', 0.48460242866688824),
('necessary', 0.48302334245403883),
('humanity', 0.48265474679929443),
('drama', 0.48221998493060503),
('warming', 0.48183808689273838),
('intrigue', 0.48183808689273838),
('nonetheless', 0.48183808689273838),
('cuba', 0.48183808689273838),
('planned', 0.47957308026188628),
('pictures', 0.47929937011921681),
('nine', 0.47803580094299974),
('settings', 0.47743860773325364),
('history', 0.47732966933780852),
('ordinary', 0.47725880012690741),
('primary', 0.47608267532211779),
('official', 0.47608267532211779),
('episode', 0.47529620261150429),
('role', 0.47520268270188676),
('spirit', 0.47477690799839323),
('grey', 0.47409361449726067),
('ways', 0.47323464982718205),
('cup', 0.47260441094579297),
('piano', 0.47260441094579297),
('familiar', 0.47241617565111949),
('sinister', 0.47198579044972683),
('reveal', 0.47171449364936496),
('max', 0.47150852042515579),
('dated', 0.47121648567094482),
('discovery', 0.47000362924573563),
('vicious', 0.47000362924573563),
('losing', 0.47000362924573563),
('genuinely', 0.46871413841586385),
('hatred', 0.46734051182625186),
('mistaken', 0.46702300110759781),
('dream', 0.46608972992459924),
('challenge', 0.46608972992459924),
('crisis', 0.46575733836428446),
('photographed', 0.46488852857896512),
('machines', 0.46430560813109778),
('critics', 0.46430560813109778),
('bird', 0.46430560813109778),
('born', 0.46411383518967209),
('detective', 0.4636633473511525),
('higher', 0.46328467899699055),
('remains', 0.46262352194811296),
('inevitable', 0.46262352194811296),
('soviet', 0.4618180446592961),
('ryan', 0.46134556650262099),
('african', 0.46112595521371813),
('smaller', 0.46081520319132935),
('techniques', 0.46052488529119184),
('information', 0.46034171833399862),
('deserved', 0.45999798712841444),
('cynical', 0.45953232937844013),
('lynch', 0.45953232937844013),
('francisco', 0.45953232937844013),
('tour', 0.45953232937844013),
('spielberg', 0.45953232937844013),
('struggle', 0.45911782160048453),
('language', 0.45902121257712653),
('visual', 0.45823514408822852),
('warner', 0.45724137763188427),
('social', 0.45720078250735313),
('reality', 0.45719346885019546),
('hidden', 0.45675840249571492),
('breaking', 0.45601738727099561),
('sometimes', 0.45563021171182794),
('modern', 0.45500247579345005),
('surfing', 0.45425527227759638),
('popular', 0.45410691533051023),
('surprised', 0.4534409399850382),
('follows', 0.45245361754408348),
('keeps', 0.45234869400701483),
('john', 0.4520909494482197),
('defeat', 0.45198512374305722),
('mixed', 0.45198512374305722),
('justice', 0.45142724367280018),
('treasure', 0.45083371313801535),
('presents', 0.44973793178615257),
('years', 0.44919197032104968),
('chief', 0.44895022004790319),
('closely', 0.44701411102103689),
('segments', 0.44701411102103689),
('lose', 0.44658335503763702),
('caine', 0.44628710262841953),
('caught', 0.44610275383999071),
('hamlet', 0.44558510189758965),
('chinese', 0.44507424620321018),
('welcome', 0.44438052435783792),
('birth', 0.44368632092836219),
('represents', 0.44320543609101143),
('puts', 0.44279106572085081),
('fame', 0.44183275227903923),
('closer', 0.44183275227903923),
('visuals', 0.44183275227903923),
('web', 0.44183275227903923),
('criminal', 0.4412745608048752),
('minor', 0.4409224199448939),
('jon', 0.44086703515908027),
('liked', 0.44074991514020723),
('restaurant', 0.44031183943833246),
('flaws', 0.43983275161237217),
('de', 0.43983275161237217),
('searching', 0.4393666597838457),
('rap', 0.43891304217570443),
('light', 0.43884433018199892),
('elizabeth', 0.43872232986464677),
('marry', 0.43861731542506488),
('oz', 0.43825493093115531),
('controversial', 0.43825493093115531),
('learned', 0.43825493093115531),
('slowly', 0.43785660389939979),
('bridge', 0.43721380642274466),
('thrilling', 0.43721380642274466),
('wayne', 0.43721380642274466),
('comedic', 0.43721380642274466),
('married', 0.43658501682196887),
('nazi', 0.4361020775700542),
('murder', 0.4353180712578455),
('physical', 0.4353180712578455),
('johnny', 0.43483971678806865),
('michelle', 0.43445264498141672),
('wallace', 0.43403848055222038),
('silent', 0.43395706390247063),
('comedies', 0.43395706390247063),
('played', 0.43387244114515305),
('international', 0.43363598507486073),
('vision', 0.43286408229627887),
('intelligent', 0.43196704885367099),
('shop', 0.43078291609245434),
('also', 0.43036720209769169),
('levels', 0.4302451371066513),
('miss', 0.43006426712153217),
('ocean', 0.4295626596872249),
...]

``````
``````

In [22]:

# words most frequently seen in a review with a "NEGATIVE" label
list(reversed(pos_neg_ratios.most_common()))[0:30]

``````
``````

Out[22]:

[('boll', -4.0778152602708904),
('uwe', -3.9218753018711578),
('seagal', -3.3202501058581921),
('unwatchable', -3.0269848170580955),
('stinker', -2.9876839403711624),
('mst', -2.7753833211707968),
('incoherent', -2.7641396677532537),
('unfunny', -2.5545257844967644),
('waste', -2.4907515123361046),
('blah', -2.4475792789485005),
('horrid', -2.3715779644809971),
('pointless', -2.3451073877136341),
('atrocious', -2.3187369339642556),
('redeeming', -2.2667790015910296),
('prom', -2.2601040980178784),
('drivel', -2.2476029585766928),
('lousy', -2.2118080125207054),
('worst', -2.1930856334332267),
('laughable', -2.172468615469592),
('awful', -2.1385076866397488),
('poorly', -2.1326133844207011),
('wasting', -2.1178155545614512),
('remotely', -2.111046881095167),
('existent', -2.0024805005437076),
('boredom', -1.9241486572738005),
('miserably', -1.9216610938019989),
('sucks', -1.9166645809588516),
('uninspired', -1.9131499212248517),
('lame', -1.9117232884159072),
('insult', -1.9085323769376259)]

``````
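
The scores above come from `pos_neg_ratios`, which is built earlier in the notebook. As a rough illustration of how a score like this separates positive from negative vocabulary, the sketch below computes a smoothed log ratio of per-label word counts on four made-up reviews. The `+ 1` smoothing, the toy data, and the names `toy_data`, `log_ratio` and `toy_ratios` are assumptions for this example, not necessarily the exact formula used above.

```python
from collections import Counter
from math import log

# Four made-up labelled reviews (hypothetical data, for illustration only).
toy_data = [
    ("POSITIVE", "a superb and touching film"),
    ("POSITIVE", "superb acting and a touching story"),
    ("NEGATIVE", "a lousy and pointless film"),
    ("NEGATIVE", "pointless plot and lousy acting"),
]

positive_counts, negative_counts = Counter(), Counter()
for label, review in toy_data:
    target = positive_counts if label == "POSITIVE" else negative_counts
    target.update(review.split(" "))

def log_ratio(word):
    # Assumption: +1 smoothing on both counts so the ratio is always defined.
    return log((positive_counts[word] + 1) / (negative_counts[word] + 1))

all_words = set(positive_counts) | set(negative_counts)
toy_ratios = Counter({word: log_ratio(word) for word in all_words})

# Words used mostly in positive reviews score above 0, words used mostly in
# negative reviews score below 0, and evenly used words sit at 0.
for word in ("superb", "and", "lousy"):
    print(word, round(toy_ratios[word], 3))
```

Read this way, a score near zero (for example `and` in the toy data) marks a word that carries little sentiment, while large positive or negative scores mark words that appear mostly under one label.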

# Transforming Text into Numbers

``````

In [26]:

from IPython.display import Image

review = "This was a horrible, terrible movie."

Image(filename='sentiment_network.png')

``````
``````

Out[26]:

``````
``````

In [27]:

review = "The movie was excellent"

Image(filename='sentiment_network_pos.png')

``````
``````

Out[27]:

``````
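
The idea in this section is to give each vocabulary word its own input node and feed in how often that word appears in the review. A minimal sketch of that encoding on a two-review toy corpus follows; the names `toy_reviews`, `toy_word2index` and `review_to_counts` are made up for this example.

```python
from collections import Counter

# Two made-up reviews (hypothetical data, for illustration only).
toy_reviews = ["this was a horrible terrible movie", "the movie was excellent"]

# One slot per distinct word; sorting just makes the order predictable.
toy_vocab = sorted({word for review in toy_reviews for word in review.split(" ")})
toy_word2index = {word: i for i, word in enumerate(toy_vocab)}

def review_to_counts(review, word2index):
    """Return a list of word counts, one slot per vocabulary word."""
    counts = [0] * len(word2index)
    for word, n in Counter(review.split(" ")).items():
        if word in word2index:  # skip words outside the vocabulary
            counts[word2index[word]] += n
    return counts

vector = review_to_counts("the movie was excellent", toy_word2index)
print(toy_vocab)
print(vector)
```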

# Project 2: Creating the Input/Output Data

``````

In [74]:

vocab = set(total_counts.keys())
vocab_size = len(vocab)
print(vocab_size)

``````
``````

74074

``````
``````

In [75]:

list(vocab)

``````
``````

Out[75]:

``````
``````

In [46]:

import numpy as np

layer_0 = np.zeros((1,vocab_size))
layer_0

``````
``````

Out[46]:

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.]])

``````
``````

In [47]:

from IPython.display import Image
Image(filename='sentiment_network.png')

``````
``````

Out[47]:

``````
``````

In [48]:

word2index = {}

# give every vocabulary word a fixed position in the input layer
for i, word in enumerate(vocab):
    word2index[word] = i
word2index

``````
``````

Out[48]:

{'': 0,
'inhabitants': 1,
'goku': 2,
'stunts': 3,
'catepillar': 4,
'kristensen': 5,
'goddess': 7,
'offing': 49797,
'distroy': 8,
'unexplainably': 9,
'concoctions': 10,
'petite': 11,
'paramilitary': 24759,
'scribe': 12,
'stevson': 13,
'senegal': 6,
'sctv': 14,
'soundscape': 15,
'rana': 16,
'immortalizer': 18,
'rene': 67354,
'eko': 23,
'planning': 20,
'akiva': 21,
'plod': 22,
'orderly': 24,
'zeleznice': 25,
'critize': 29,
'baguettes': 25649,
'jefferies': 30,
'uncertainties': 61695,
'mountainbillies': 31,
'steinbichler': 32,
'vowel': 33,
'rafe': 34,
'donig': 68719,
'tulipe': 36,
'clot': 37,
'hack': 12526,
'distended': 38,
'cornered': 37116,
'impatiently': 40,
'batrice': 12525,
'unfortuntly': 41,
'lung': 42,
'scapegoats': 43,
'pscychosexual': 45,
'outbid': 46,
'obit': 47,
'sideshows': 48,
'jugde': 49,
'kevloun': 51,
'quartier': 53,
'harp': 61948,
'unravelling': 54,
'antiques': 56,
'strutts': 57,
'tilts': 58,
'disconcert': 59,
'dossiers': 60,
'sorriest': 61,
'craftsman': 49412,
'blart': 62,
'dependence': 37120,
'sated': 61698,
'iberia': 63,
'sagan': 72,
'frmann': 65,
'daniell': 66,
'rays': 67,
'pried': 68,
'khoobsurat': 69,
'leavitt': 70,
'caiano': 71,
'attractiveness': 73,
'kitaparaporn': 74,
'hamilton': 75,
'massages': 76,
...}

``````
``````

In [49]:

def update_input_layer(review):

    global layer_0

    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

``````
``````

In [33]:

layer_0

``````
``````

Out[33]:

array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

``````
``````

In [51]:

def get_target_for_label(label):
    if(label == 'POSITIVE'):
        return 1
    else:
        return 0

``````
``````

In [54]:

labels[0]

``````
``````

Out[54]:

'POSITIVE'

``````
``````

In [52]:

get_target_for_label(labels[0])

``````
``````

Out[52]:

1

``````
``````

In [55]:

labels[1]

``````
``````

Out[55]:

'NEGATIVE'

``````
``````

In [53]:

get_target_for_label(labels[1])

``````
``````

Out[53]:

0

``````

# Project 3: Building a Neural Network

• 3-layer neural network
• no non-linearity in the hidden layer
• use our functions to create the training data
• create a "pre_process_data" function that builds the vocabulary used by our training-data functions
• modify "train" to train over the entire corpus
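
The first two bullets can be sketched as a forward pass through an input layer, a linear hidden layer, and a sigmoid output. The layer sizes, the random weights, and the helper name `forward` are placeholders chosen for this example, not the values the project uses.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden_nodes = 6, 3  # toy sizes, chosen for illustration
weights_0_1 = rng.normal(0.0, 0.1, (vocab_size, hidden_nodes))
weights_1_2 = rng.normal(0.0, 0.1, (hidden_nodes, 1))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(layer_0):
    layer_1 = layer_0.dot(weights_0_1)           # hidden layer: linear, no activation
    layer_2 = sigmoid(layer_1.dot(weights_1_2))  # output layer: sigmoid, value in (0, 1)
    return layer_1, layer_2

layer_0 = np.array([[1., 0., 2., 0., 0., 1.]])   # toy word counts for one review
layer_1, layer_2 = forward(layer_0)
print(layer_1.shape, layer_2.shape)
```

Because the hidden layer applies no activation, doubling the input doubles `layer_1`; only the output sigmoid bends the result.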

``````

In [86]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1):

        # seed the random number generator so runs are reproducible
        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word seen in the training reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)

        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # map each word (and each label) to a fixed index
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1, input_nodes))

    def update_input_layer(self, review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            # skip words that were not in the training vocabulary
            if(word in self.word2index):
                self.layer_0[0][self.word2index[word]] += 1

    def get_target_for_label(self, label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)

    def train(self, training_reviews, training_labels):

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            # a prediction counts as correct when it lands on the right side of 0.5
            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
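
The backward pass in `train` can be read as gradient descent on a squared-error loss. The derivation below is a sketch under that assumption (the loss is never written out in the code). The symbols $x$, $h$, $y$, $t$ and $\alpha$ are introduced here for `layer_0`, `layer_1`, `layer_2`, the target and `learning_rate`; $W_{01}$ and $W_{12}$ stand for `weights_0_1` and `weights_1_2`.

```latex
\begin{aligned}
h &= x W_{01}, \qquad y = \sigma(h W_{12}), \qquad E = \tfrac{1}{2}(y - t)^2 \\
\delta_2 &= \frac{\partial E}{\partial (h W_{12})} = (y - t)\, y\, (1 - y) \\
\delta_1 &= \frac{\partial E}{\partial h} = \delta_2 W_{12}^{\top} \\
W_{12} &\leftarrow W_{12} - \alpha\, h^{\top} \delta_2, \qquad
W_{01} \leftarrow W_{01} - \alpha\, x^{\top} \delta_1
\end{aligned}
```

Here $\delta_2$ corresponds to `layer_2_delta` and $\delta_1$ to `layer_1_delta`. Because the hidden layer has no non-linearity, $\delta_1$ is just the back-propagated error, which matches `layer_1_delta = layer_1_error` in the code.
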
``````

In [87]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [61]:

# evaluate our model before training (just to show how horrible it is)
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):587.5% #Correct:500 #Tested:1000 Testing Accuracy:50.0%

``````
``````

In [62]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):89.58 #Correct:1250 #Trained:2501 Training Accuracy:49.9%
Progress:20.8% Speed(reviews/sec):95.03 #Correct:2500 #Trained:5001 Training Accuracy:49.9%
Progress:27.4% Speed(reviews/sec):95.46 #Correct:3295 #Trained:6592 Training Accuracy:49.9%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````
``````

In [63]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)

``````
``````

In [64]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):96.39 #Correct:1247 #Trained:2501 Training Accuracy:49.8%
Progress:20.8% Speed(reviews/sec):99.31 #Correct:2497 #Trained:5001 Training Accuracy:49.9%
Progress:22.8% Speed(reviews/sec):99.02 #Correct:2735 #Trained:5476 Training Accuracy:49.9%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````
``````

In [65]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)

``````
``````

In [66]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):98.77 #Correct:1267 #Trained:2501 Training Accuracy:50.6%
Progress:20.8% Speed(reviews/sec):98.79 #Correct:2640 #Trained:5001 Training Accuracy:52.7%
Progress:31.2% Speed(reviews/sec):98.58 #Correct:4109 #Trained:7501 Training Accuracy:54.7%
Progress:41.6% Speed(reviews/sec):93.78 #Correct:5638 #Trained:10001 Training Accuracy:56.3%
Progress:52.0% Speed(reviews/sec):91.76 #Correct:7246 #Trained:12501 Training Accuracy:57.9%
Progress:62.5% Speed(reviews/sec):92.42 #Correct:8841 #Trained:15001 Training Accuracy:58.9%
Progress:69.4% Speed(reviews/sec):92.58 #Correct:9934 #Trained:16668 Training Accuracy:59.5%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````

# Understanding Neural Noise

``````

In [67]:

from IPython.display import Image
Image(filename='sentiment_network.png')

``````
``````

Out[67]:

``````
``````

In [70]:

def update_input_layer(review):

    global layer_0

    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

``````
``````

In [71]:

layer_0

``````
``````

Out[71]:

array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

``````
``````

In [79]:

review_counter = Counter()

``````
``````

In [80]:

for word in reviews[0].split(" "):
    review_counter[word] += 1

``````
``````

In [81]:

review_counter.most_common()

``````
``````

Out[81]:

[('.', 27),
('', 18),
('the', 9),
('to', 6),
('i', 5),
('high', 5),
('is', 4),
('of', 4),
('a', 4),
('bromwell', 4),
('teachers', 4),
('that', 4),
('their', 2),
('my', 2),
('at', 2),
('as', 2),
('me', 2),
('in', 2),
('students', 2),
('it', 2),
('student', 2),
('school', 2),
('through', 1),
('insightful', 1),
('ran', 1),
('years', 1),
('here', 1),
('episode', 1),
('reality', 1),
('what', 1),
('far', 1),
('t', 1),
('saw', 1),
('s', 1),
('repeatedly', 1),
('isn', 1),
('closer', 1),
('and', 1),
('fetched', 1),
('remind', 1),
('can', 1),
('welcome', 1),
('line', 1),
('your', 1),
('survive', 1),
('teaching', 1),
('satire', 1),
('classic', 1),
('who', 1),
('age', 1),
('knew', 1),
('schools', 1),
('inspector', 1),
('comedy', 1),
('down', 1),
('pity', 1),
('m', 1),
('all', 1),
('see', 1),
('think', 1),
('situation', 1),
('time', 1),
('pomp', 1),
('other', 1),
('much', 1),
('many', 1),
('which', 1),
('one', 1),
('profession', 1),
('programs', 1),
('same', 1),
('some', 1),
('such', 1),
('pettiness', 1),
('immediately', 1),
('expect', 1),
('financially', 1),
('recalled', 1),
('tried', 1),
('whole', 1),
('right', 1),
('life', 1),
('cartoon', 1),
('scramble', 1),
('sack', 1),
('believe', 1),
('when', 1),
('than', 1),
('burn', 1),
('pathetic', 1)]

``````
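
In the counts above, `'.'` (27) and the empty string (18) dominate the input vector even though neither says anything about sentiment. The sketch below, on a made-up review, compares how much of the total input a period contributes under raw counts versus a binary present/absent encoding. The review text and the variable names are illustrative only.

```python
from collections import Counter

# A made-up review; the double space produces an empty-string token,
# just like in the real data.
toy_review = "the movie was great . . . the  acting was great ."

counts = Counter(toy_review.split(" "))   # raw counts: what "+= 1" produces
binary = {word: 1 for word in counts}     # presence only: what "= 1" produces

count_share = counts["."] / sum(counts.values())
binary_share = binary["."] / sum(binary.values())

print(counts.most_common(3))
print("share of input from '.' with counts:", round(count_share, 3))
print("share of input from '.' with binary:", round(binary_share, 3))
```

With raw counts the period outweighs every sentiment-bearing word; with the binary encoding each distinct token contributes equally, which is the change Project 4 makes in `update_input_layer`.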

# Project 4: Reducing Noise in our Input Data

``````

In [82]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1):

        # seed the random number generator so runs are reproducible
        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word seen in the training reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)

        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # map each word (and each label) to a fixed index
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1, input_nodes))

    def update_input_layer(self, review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if(word in self.word2index):
                # noise reduction: record presence (= 1) instead of a count (+= 1)
                self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self, label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)

    def train(self, training_reviews, training_labels):

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            # a prediction counts as correct when it lands on the right side of 0.5
            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
``````

In [83]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [84]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):91.50 #Correct:1795 #Trained:2501 Training Accuracy:71.7%
Progress:20.8% Speed(reviews/sec):95.25 #Correct:3811 #Trained:5001 Training Accuracy:76.2%
Progress:31.2% Speed(reviews/sec):93.74 #Correct:5898 #Trained:7501 Training Accuracy:78.6%
Progress:41.6% Speed(reviews/sec):93.69 #Correct:8042 #Trained:10001 Training Accuracy:80.4%
Progress:52.0% Speed(reviews/sec):95.27 #Correct:10186 #Trained:12501 Training Accuracy:81.4%
Progress:62.5% Speed(reviews/sec):98.19 #Correct:12317 #Trained:15001 Training Accuracy:82.1%
Progress:72.9% Speed(reviews/sec):98.56 #Correct:14440 #Trained:17501 Training Accuracy:82.5%
Progress:83.3% Speed(reviews/sec):99.74 #Correct:16613 #Trained:20001 Training Accuracy:83.0%
Progress:93.7% Speed(reviews/sec):100.7 #Correct:18794 #Trained:22501 Training Accuracy:83.5%
Progress:99.9% Speed(reviews/sec):101.9 #Correct:20115 #Trained:24000 Training Accuracy:83.8%

``````
``````

In [85]:

# evaluate our trained model on the 1,000 held-out reviews
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):832.7% #Correct:851 #Tested:1000 Testing Accuracy:85.1%

``````

# Analyzing Inefficiencies in our Network

``````

In [88]:

Image(filename='sentiment_network_sparse.png')

``````
``````

Out[88]:

``````
``````

In [89]:

layer_0 = np.zeros(10)

``````
``````

In [90]:

layer_0

``````
``````

Out[90]:

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

``````
``````

In [91]:

layer_0[4] = 1
layer_0[9] = 1

``````
``````

In [92]:

layer_0

``````
``````

Out[92]:

array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.])

``````
``````

In [93]:

weights_0_1 = np.random.randn(10,5)

``````
``````

In [94]:

layer_0.dot(weights_0_1)

``````
``````

Out[94]:

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

``````
``````

In [101]:

indices = [4,9]

``````
``````

In [102]:

layer_1 = np.zeros(5)

``````
``````

In [103]:

for index in indices:
    layer_1 += (weights_0_1[index])

``````
``````

In [104]:

layer_1

``````
``````

Out[104]:

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

``````
``````

In [100]:

Image(filename='sentiment_network_sparse_2.png')

``````
``````

Out[100]:

``````
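
The cells above show, for a 10-word vocabulary, that summing the weight rows of the active words gives the same hidden layer as the full dot product. The sketch below repeats that check at a slightly larger, still made-up size and counts how much arithmetic each approach needs. The sizes, indices and variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab_size, hidden_nodes = 1000, 5        # toy sizes, chosen for illustration
weights_0_1 = rng.standard_normal((vocab_size, hidden_nodes))

active_indices = [4, 9, 512]              # words present in a hypothetical review
layer_0 = np.zeros(vocab_size)
layer_0[active_indices] = 1

# Dense: multiply every input (mostly zeros) by every weight.
dense_layer_1 = layer_0.dot(weights_0_1)

# Sparse: add up only the weight rows of the words that are present.
sparse_layer_1 = weights_0_1[active_indices].sum(axis=0)

dense_ops = vocab_size * hidden_nodes             # multiply-adds in the dot product
sparse_ops = len(active_indices) * hidden_nodes   # additions in the row sum

print(np.allclose(dense_layer_1, sparse_layer_1))
print(dense_ops, "vs", sparse_ops)
```

The shortcut relies on the input being binary: a row is either added once or skipped, so no multiplication is needed.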

# Project 5: Making our Network More Efficient

``````

In [105]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1):

        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word seen in the training reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)

        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # map each word (and each label) to a fixed index
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1, input_nodes))
        # the hidden layer is now kept on the object and filled in by summing weight rows
        self.layer_1 = np.zeros((1, hidden_nodes))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)

    def update_input_layer(self, review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self, label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def train(self, training_reviews_raw, training_labels):

        # convert each raw review into the list of unique word indices it contains
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if(word in self.word2index):
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            ### Forward pass ###

            # Input Layer (implicit: `review` already holds the active indices)

            # Hidden layer
            # replaces: layer_1 = self.layer_0.dot(self.weights_0_1)
            self.layer_1 *= 0
            for index in review:
                self.layer_1 += self.weights_0_1[index]

            # Output layer
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step

            # only the rows for words in this review receive an update
            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer (implicit: gather the unique indices of known words)

        # Hidden layer
        self.layer_1 *= 0
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index:
                unique_indices.add(self.word2index[word])
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]

        # Output layer
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
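
The per-index weight update in `train` is meant to give the same result as the dense update `weights_0_1 -= layer_0.T.dot(layer_1_delta) * learning_rate` from the earlier projects. The sketch below checks that equivalence on small random arrays. The sizes, indices and array names are placeholders for this example, and the check assumes a binary input with unique indices.

```python
import numpy as np

rng = np.random.default_rng(2)

vocab_size, hidden_nodes, learning_rate = 8, 3, 0.1   # toy values
weights = rng.standard_normal((vocab_size, hidden_nodes))
layer_1_delta = rng.standard_normal((1, hidden_nodes))

indices = [1, 5, 6]                       # unique word indices of one review
layer_0 = np.zeros((1, vocab_size))
layer_0[0, indices] = 1                   # binary input built from the same indices

# Dense update: touches every row, but rows of absent words change by zero.
dense_updated = weights - layer_0.T.dot(layer_1_delta) * learning_rate

# Sparse update: touches only the rows of words that are present.
sparse_updated = weights.copy()
for index in indices:
    sparse_updated[index] -= layer_1_delta[0] * learning_rate

print(np.allclose(dense_updated, sparse_updated))
```

The indices have to be unique for the two updates to agree, which is why `train` and `run` collect them in a `set` first.
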
``````

In [106]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [111]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

In [109]:

# evaluate the trained model on the 1,000 held-out reviews
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):1581.% #Correct:857 #Tested:1000 Testing Accuracy:85.7%

``````

# Further Noise Reduction

``````

In [112]:

Image(filename='sentiment_network_sparse_2.png')

``````
``````

In [113]:

# words most strongly associated with a "POSITIVE" label (highest positive-to-negative log ratio)
pos_neg_ratios.most_common()

``````
``````

Out[113]:

[('edie', 4.6913478822291435),
('paulie', 4.0775374439057197),
('felix', 3.1527360223636558),
('polanski', 2.8233610476132043),
('matthau', 2.8067217286092401),
('victoria', 2.6810215287142909),
('mildred', 2.6026896854443837),
('gandhi', 2.5389738710582761),
('flawless', 2.451005098112319),
('superbly', 2.2600254785752498),
('perfection', 2.1594842493533721),
('astaire', 2.1400661634962708),
('captures', 2.0386195471595809),
('voight', 2.0301704926730531),
('wonderfully', 2.0218960560332353),
('powell', 1.9783454248084671),
('brosnan', 1.9547990964725592),
('lily', 1.9203768470501485),
('bakshi', 1.9029851043382795),
('lincoln', 1.9014583864844796),
('refreshing', 1.8551812956655511),
('breathtaking', 1.8481124057791867),
('bourne', 1.8478489358790986),
('lemmon', 1.8458266904983307),
('delightful', 1.8002701588959635),
('flynn', 1.7996646487351682),
('andrews', 1.7764919970972666),
('homer', 1.7692866133759964),
('beautifully', 1.7626953362841438),
('soccer', 1.7578579175523736),
('elvira', 1.7397031072720019),
('underrated', 1.7197859696029656),
('gripping', 1.7165360479904674),
('superb', 1.7091514458966952),
('delight', 1.6714733033535532),
('welles', 1.6677068205580761),
('sinatra', 1.6389967146756448),
('touching', 1.637217476541176),
('timeless', 1.62924053973028),
('macy', 1.6211339521972916),
('unforgettable', 1.6177367152487956),
('favorites', 1.6158688027643908),
('stewart', 1.6119987332957739),
('hartley', 1.6094379124341003),
('sullivan', 1.6094379124341003),
('extraordinary', 1.6094379124341003),
('brilliantly', 1.5950491749820008),
('friendship', 1.5677652160335325),
('wonderful', 1.5645425925262093),
('palma', 1.5553706911638245),
('magnificent', 1.54663701119507),
('finest', 1.5462590108125689),
('jackie', 1.5439233053234738),
('ritter', 1.5404450409471491),
('tremendous', 1.5184661342283736),
('freedom', 1.5091151908062312),
('fantastic', 1.5048433868558566),
('terrific', 1.5026699370083942),
('noir', 1.493925025312256),
('sidney', 1.493925025312256),
('outstanding', 1.4910053152089213),
('mann', 1.4894785973551214),
('pleasantly', 1.4894785973551214),
('nancy', 1.488077055429833),
('marie', 1.4825711915553104),
('marvelous', 1.4739999415389962),
('excellent', 1.4647538505723599),
('ruth', 1.4596256342054401),
('stanwyck', 1.4412101187160054),
('widmark', 1.4350845252893227),
('splendid', 1.4271163556401458),
('chan', 1.423108334242607),
('exceptional', 1.4201959127955721),
('tender', 1.410986973710262),
('gentle', 1.4078005663408544),
('poignant', 1.4022947024663317),
('gem', 1.3932148039644643),
('amazing', 1.3919815802404802),
('chilling', 1.3862943611198906),
('captivating', 1.3862943611198906),
('fisher', 1.3862943611198906),
('davies', 1.3862943611198906),
('darker', 1.3652409519220583),
('april', 1.3499267169490159),
('kelly', 1.3461743673304654),
('blake', 1.3418425985490567),
('overlooked', 1.329135947279942),
('ralph', 1.32818673031261),
('bette', 1.3156767939059373),
('hoffman', 1.3150668518315229),
('cole', 1.3121863889661687),
('shines', 1.3049487216659381),
('powerful', 1.2999662776313934),
('notch', 1.2950456896547455),
('remarkable', 1.2883688239495823),
('pitt', 1.286210902562908),
('winters', 1.2833463918674481),
('vivid', 1.2762934659055623),
('gritty', 1.2757524867200667),
('giallo', 1.2745029551317739),
('portrait', 1.2704625455947689),
('innocence', 1.2694300209805796),
('psychiatrist', 1.2685113254635072),
('favorite', 1.2668956297860055),
('ensemble', 1.2656663733312759),
('stunning', 1.2622417124499117),
('burns', 1.259880436264232),
('garbo', 1.258954938743289),
('barbara', 1.2580400255962119),
('panic', 1.2527629684953681),
('holly', 1.2527629684953681),
('philip', 1.2527629684953681),
('carol', 1.2481440226390734),
('perfect', 1.246742480713785),
('appreciated', 1.2462482874741743),
('favourite', 1.2411123512753928),
('journey', 1.2367626271489269),
('rural', 1.235471471385307),
('bond', 1.2321436812926323),
('builds', 1.2305398317106577),
('brilliant', 1.2287554137664785),
('brooklyn', 1.2286654169163074),
('von', 1.225175011976539),
('unfolds', 1.2163953243244932),
('recommended', 1.2163953243244932),
('daniel', 1.20215296760895),
('perfectly', 1.1971931173405572),
('crafted', 1.1962507582320256),
('prince', 1.1939224684724346),
('troubled', 1.192138346678933),
('consequences', 1.1865810616140668),
('haunting', 1.1814999484738773),
('cinderella', 1.180052620608284),
('alexander', 1.1759989522835299),
('emotions', 1.1753049094563641),
('boxing', 1.1735135968412274),
('subtle', 1.1734135017508081),
('curtis', 1.1649873576129823),
('rare', 1.1566438362402944),
('loved', 1.1563661500586044),
('daughters', 1.1526795099383853),
('courage', 1.1438688802562305),
('dentist', 1.1426722784621401),
('highly', 1.1420208631618658),
('nominated', 1.1409146683587992),
('tony', 1.1397491942285991),
('draws', 1.1325138403437911),
('everyday', 1.1306150197542835),
('contrast', 1.1284652518177909),
('cried', 1.1213405397456659),
('fabulous', 1.1210851445201684),
('ned', 1.120591195386885),
('fay', 1.120591195386885),
('emma', 1.1184149159642893),
('sensitive', 1.113318436057805),
('smooth', 1.1089750757036563),
('dramas', 1.1080910326226534),
('today', 1.1050431789984001),
('helps', 1.1023091505494358),
('inspiring', 1.0986122886681098),
('jimmy', 1.0937696641923216),
('awesome', 1.0931328229034842),
('unique', 1.0881409888008142),
('tragic', 1.0871835928444868),
('intense', 1.0870514662670339),
('stellar', 1.0857088838322018),
('rival', 1.0822184788924332),
('provides', 1.0797081340289569),
('depression', 1.0782034170369026),
('shy', 1.0775588794702773),
('carrie', 1.076139432816051),
('blend', 1.0753554265038423),
('hank', 1.0736109864626924),
('diana', 1.0726368022648489),
('unexpected', 1.0722255334949147),
('achievement', 1.0668635903535293),
('bettie', 1.0663514264498881),
('happiness', 1.0632729222228008),
('glorious', 1.0608719606852626),
('davis', 1.0541605260972757),
('terrifying', 1.0525211814678428),
('beauty', 1.050410186850232),
('ideal', 1.0479685558493548),
('fears', 1.0467872208035236),
('hong', 1.0438040521731147),
('seasons', 1.0433496099930604),
('fascinating', 1.0414538748281612),
('carries', 1.0345904299031787),
('satisfying', 1.0321225473992768),
('definite', 1.0319209141694374),
('touched', 1.0296194171811581),
('greatest', 1.0248947127715422),
('creates', 1.0241097613701886),
('aunt', 1.023388867430522),
('walter', 1.022328983918479),
('spectacular', 1.0198314108149955),
('portrayal', 1.0189810189761024),
('ann', 1.0127808528183286),
('enterprise', 1.0116009116784799),
('musicals', 1.0096648026516135),
('deeply', 1.0094845087721023),
('incredible', 1.0061677561461084),
('mature', 1.0060195018402847),
('triumph', 0.99682959435816731),
('margaret', 0.99682959435816731),
('navy', 0.99493385919326827),
('harry', 0.99176919305006062),
('lucas', 0.990398704027877),
('sweet', 0.98966110487955483),
('joey', 0.98794672078059009),
('oscar', 0.98721905111049713),
('balance', 0.98649499054740353),
('warm', 0.98485340331145166),
('ages', 0.98449898190068863),
('glover', 0.98082925301172619),
('guilt', 0.98082925301172619),
('carrey', 0.98082925301172619),
('learns', 0.97881108885548895),
('unusual', 0.97788374278196932),
('sons', 0.97777581552483595),
('complex', 0.97761897738147796),
('essence', 0.97753435711487369),
('brazil', 0.9769153536905899),
('widow', 0.97650959186720987),
('solid', 0.97537964824416146),
('beautiful', 0.97326301262841053),
('holmes', 0.97246100334120955),
('awe', 0.97186058302896583),
('vhs', 0.97116734209998934),
('eerie', 0.97116734209998934),
('lonely', 0.96873720724669754),
('grim', 0.96873720724669754),
('sport', 0.96825047080486615),
('debut', 0.96508089604358704),
('destiny', 0.96343751029985703),
('thrillers', 0.96281074750904794),
('tears', 0.95977584381389391),
('rose', 0.95664202739772253),
('feelings', 0.95551144502743635),
('ginger', 0.95551144502743635),
('winning', 0.95471810900804055),
('stanley', 0.95387344302319799),
('cox', 0.95343027882361187),
('paris', 0.95278479030472663),
('heart', 0.95238806924516806),
('hooked', 0.95155887071161305),
('comfortable', 0.94803943018873538),
('mgm', 0.94446160884085151),
('masterpiece', 0.94155039863339296),
('themes', 0.94118828349588235),
('danny', 0.93967118051821874),
('anime', 0.93378388932167222),
('perry', 0.93328830824272613),
('joy', 0.93301752567946861),
('lovable', 0.93081883243706487),
('hal', 0.92953595862417571),
('mysteries', 0.92953595862417571),
('louis', 0.92871325187271225),
('charming', 0.92520609553210742),
('urban', 0.92367083917177761),
('allows', 0.92183091224977043),
('impact', 0.91815814604895041),
('lifestyle', 0.91629073187415511),
('italy', 0.91629073187415511),
('spy', 0.91289514287301687),
('treat', 0.91193342650519937),
('subsequent', 0.91056005716517008),
('kennedy', 0.90981821736853763),
('loving', 0.90967549275543591),
('surprising', 0.90937028902958128),
('quiet', 0.90648673177753425),
('winter', 0.90624039602065365),
('reveals', 0.90490540964902977),
('raw', 0.90445627422715225),
('funniest', 0.90078654533818991),
('norman', 0.89994159387262562),
('thief', 0.89874642222324552),
('season', 0.89827222637147675),
('secrets', 0.89794159320595857),
('colorful', 0.89705936994626756),
('highest', 0.8967461358011849),
('compelling', 0.89462923509297576),
('danes', 0.89248008318043659),
('castle', 0.88967708335606499),
('kudos', 0.88889175768604067),
('great', 0.88810470901464589),
('baseball', 0.88730319500090271),
('subtitles', 0.88730319500090271),
('bleak', 0.88730319500090271),
('winner', 0.88643776872447388),
('tragedy', 0.88563699078315261),
('todd', 0.88551907320740142),
('nicely', 0.87924946019380601),
('arthur', 0.87546873735389985),
('essential', 0.87373111745535925),
('gorgeous', 0.8731725250935497),
('fonda', 0.87294029100054127),
('eastwood', 0.87139541196626402),
('focuses', 0.87082835779739776),
('enjoyed', 0.87070195951624607),
('natural', 0.86997924506912838),
('intensity', 0.86835126958503595),
('witty', 0.86824103423244681),
('rob', 0.8642954367557748),
('worlds', 0.86377269759070874),
('health', 0.86113891179907498),
('magical', 0.85953791528170564),
('deeper', 0.85802182375017932),
('lucy', 0.85618680780444956),
('moving', 0.85566611005772031),
('lovely', 0.85290640004681306),
('purple', 0.8513711857748395),
('memorable', 0.84801189112086062),
('sings', 0.84729786038720367),
('craig', 0.84342938360928321),
('modesty', 0.84342938360928321),
('relate', 0.84326559685926517),
('episodes', 0.84223712084137292),
('strong', 0.84167135777060931),
('smith', 0.83959811108590054),
('tear', 0.83704136022001441),
('apartment', 0.83333115290549531),
('princess', 0.83290912293510388),
('disagree', 0.83290912293510388),
('kung', 0.83173334384609199),
('columbo', 0.82667857318446791),
('jake', 0.82667857318446791),
('hart', 0.82472353834866463),
('strength', 0.82417544296634937),
('realizes', 0.82360006895738058),
('dave', 0.8232003088081431),
('childhood', 0.82208086393583857),
('forbidden', 0.81989888619908913),
('tight', 0.81883539572344199),
('surreal', 0.8178506590609026),
('manager', 0.81770990320170756),
('dancer', 0.81574950265227764),
('con', 0.81093021621632877),
('studios', 0.81093021621632877),
('miike', 0.80821651034473263),
('realistic', 0.80807714723392232),
('explicit', 0.80792269515237358),
('kurt', 0.8060875917405409),
('deals', 0.80535917116687328),
('holds', 0.80493858654806194),
('carl', 0.80437281567016972),
('touches', 0.80396154690023547),
('gene', 0.80314807577427383),
('albert', 0.8027669055771679),
('abc', 0.80234647252493729),
('cry', 0.80011930011211307),
('sides', 0.7995275841185171),
('develops', 0.79850769621777162),
('eyre', 0.79850769621777162),
('dances', 0.79694397424158891),
('oscars', 0.79633141679517616),
('legendary', 0.79600456599965308),
('importance', 0.79492987486988764),
('hearted', 0.79492987486988764),
('portraying', 0.79356592830699269),
('impressed', 0.79258107754813223),
('waters', 0.79112758892014912),
('empire', 0.79078565012386137),
('edge', 0.789774016249017),
('environment', 0.78845736036427028),
('jean', 0.78845736036427028),
('sentimental', 0.7864791203521645),
('captured', 0.78623760362595729),
('styles', 0.78592891401091158),
('daring', 0.78592891401091158),
('backgrounds', 0.78275933924963248),
('frank', 0.78275933924963248),
('matches', 0.78275933924963248),
('tense', 0.78275933924963248),
('gothic', 0.78209466657644144),
('sharp', 0.7814397877056235),
('achieved', 0.78015855754957497),
('court', 0.77947526404844247),
('steals', 0.7789140023173704),
('rules', 0.77844476107184035),
('colors', 0.77684619943659217),
('reunion', 0.77318988823348167),
('covers', 0.77139937745969345),
('tale', 0.77010822169607374),
('rain', 0.7683706017975328),
('denzel', 0.76804848873306297),
('stays', 0.76787072675588186),
('blob', 0.76725515271366718),
('conventional', 0.76214005204689672),
('maria', 0.76214005204689672),
('fresh', 0.76158434211317383),
('midnight', 0.76096977689870637),
('landscape', 0.75852993982279704),
('animated', 0.75768570169751648),
('titanic', 0.75666058628227129),
('sunday', 0.75666058628227129),
('spring', 0.7537718023763802),
('cagney', 0.7537718023763802),
('enjoyable', 0.75246375771636476),
('immensely', 0.75198768058287868),
('sir', 0.7507762933965817),
``````
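
Several scores in the list above are exact logarithms (1.6094... is ln 5 and 1.3862... is ln 4), which is consistent with each score being the log of a positive-to-negative count ratio. The sketch below rebuilds that kind of score on four made-up reviews. The add-one smoothing on both counts is an assumption chosen to keep the example simple, and it may differ from how `pos_neg_ratios` was actually built.

```python
from collections import Counter
from math import log

# Toy labelled reviews, made up for illustration.
toy_reviews = [
    ("POSITIVE", "superb acting and a superb script"),
    ("POSITIVE", "a superb and touching film"),
    ("NEGATIVE", "a dull script and wooden acting"),
    ("NEGATIVE", "dull dull dull"),
]

# Count how often each word appears under each label.
positive_counts = Counter()
negative_counts = Counter()
for label, text in toy_reviews:
    counts = positive_counts if label == "POSITIVE" else negative_counts
    counts.update(text.split(" "))

vocab = set(positive_counts) | set(negative_counts)

# Log ratio with add-one smoothing (an assumption of this sketch):
# above 0 leans positive, below 0 leans negative, near 0 is neutral.
toy_ratios = Counter()
for word in vocab:
    toy_ratios[word] = log((positive_counts[word] + 1) / (negative_counts[word] + 1))

ranked = toy_ratios.most_common()
print(ranked[0][0], ranked[-1][0])
```

On this toy data the top word is `superb` and the bottom word is `dull`, while `script`, which appears once under each label, scores exactly 0. Words near 0 carry little sentiment signal, which is what makes them candidates for removal from the vocabulary.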
``````

In [114]:

# words most strongly associated with a "NEGATIVE" label (lowest positive-to-negative log ratio)
list(reversed(pos_neg_ratios.most_common()))[0:30]

``````
``````

Out[114]:

``````
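
One plausible way to find words the network treats alike is to compare their rows in the input-to-hidden weight matrix with a dot product. The sketch below does that on a hand-written weight table. `most_similar_words`, the four-word vocabulary and the weight values are hypothetical stand-ins, not the notebook's `get_most_similar_words` or its trained weights.

```python
from collections import Counter

# Hypothetical vocabulary and input-to-hidden weights (one row per word).
word2index = {"terrible": 0, "awful": 1, "superb": 2, "the": 3}
weights_0_1 = [
    [-0.30, 0.20],   # terrible
    [-0.25, 0.15],   # awful
    [0.28, -0.22],   # superb
    [0.01, 0.00],    # the
]

def most_similar_words(focus):
    """Rank every word by the dot product of its weight row with the focus word's row."""
    focus_row = weights_0_1[word2index[focus]]
    scores = Counter()
    for word, index in word2index.items():
        scores[word] = sum(a * b for a, b in zip(weights_0_1[index], focus_row))
    return scores.most_common()

ranked_words = [word for word, score in most_similar_words("terrible")]
print(ranked_words)
```

Here `awful` ranks right behind `terrible`, the near-zero row for `the` lands in the middle, and `superb` comes last with a negative score. Because a dot product is not normalised, a word with larger weights can score higher than the focus word itself.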
``````

In [134]:

get_most_similar_words("terrible")

``````
``````

Out[134]:

[('worst', 0.16966107259049848),
('awful', 0.12026847019691242),
('waste', 0.11945367265311002),
('poor', 0.092758887574435483),
('terrible', 0.091425387197727914),
('dull', 0.084209271678223591),
('poorly', 0.081241544516042027),
('disappointment', 0.080064759621368692),
('fails', 0.078599773723337499),
('disappointing', 0.07733948548032335),
('boring', 0.077127858748012895),
('unfortunately', 0.075502449705859093),
('worse', 0.070601835364194662),
('mess', 0.070564299623590385),
('stupid', 0.069484822832543036),
('annoying', 0.065687021903374138),
('save', 0.06288059749586572),
('disappointed', 0.062692353812072846),
('wasted', 0.061387183028051268),
('supposed', 0.060985452957725138),
('horrible', 0.060121772339380097),
('laughable', 0.05869840628546763),
('crap', 0.058104528667884549),
('basically', 0.057218840369636148),
('nothing', 0.057158220043034176),
('ridiculous', 0.056905481068931438),
('lacks', 0.055766565889465436),
('lame', 0.055616009058110163),
('avoid', 0.055518726073197189),
('unless', 0.054208926212940739),
('script', 0.053948359467048485),
('failed', 0.05341393055000912),
('pointless', 0.052855531546894111),
('oh', 0.052761580933176816),
('effort', 0.050773747127292324),
('guess', 0.050379576420076538),
('minutes', 0.049784532804242179),
('wooden', 0.049453108380727175),
('redeeming', 0.049182869114721736),
('seems', 0.049079625154669751),
('weak', 0.046496387374765677),
('pathetic', 0.046099741149715746),
('looks', 0.045796536730244836),
('hoping', 0.045082242887577006),
('wonder', 0.044669791780934595),
('forgettable', 0.042854349251871711),
('silly', 0.042237829687270009),
('attempt', 0.041706299941373509),
('predictable', 0.041514442438568111),
('someone', 0.041506119027337314),
('sorry', 0.04086887728153335),
('might', 0.040445683500688362),
('slow', 0.040346869107034944),
('painful', 0.040220039039613249),
('thin', 0.040062642253777855),
('mediocre', 0.03940716537757738),
('garbage', 0.039310979440981095),
('money', 0.038907973313640501),
('none', 0.038300807052230948),
('bland', 0.038062246057085039),
('couldn', 0.038016664218957927),
('either', 0.037738833070341968),
('unfunny', 0.037076629805044496),
('entire', 0.036642119399463165),
('cheap', 0.036516800802525562),
('honestly', 0.036212041543797806),
('mildly', 0.035744850608185635),
('total', 0.035560454471013067),
('neither', 0.035415946043548564),
('making', 0.035244315060985604),
('problem', 0.035088251034562444),
('flat', 0.034518947038747069),
('bizarre', 0.034509460694521141),
('group', 0.034335883528586783),
('ludicrous', 0.034159649323816037),
('decent', 0.033771585787868943),
('clich', 0.033751444631720563),
('daughter', 0.033732725858384868),
('bored', 0.033622879572852551),
('horror', 0.033464120619956815),
('writing', 0.033437913916756788),
('skip', 0.033430639850491162),
('absurd', 0.033154173530163311),
('barely', 0.032653416827517712),
('idea', 0.032584013175663208),
('wasn', 0.032481207966272067),
('fake', 0.032136435098031532),
('believe', 0.031677858935800801),
('uninteresting', 0.031526815915867132),
('reason', 0.031390715260270527),
('scenes', 0.031216362935389166),
('alright', 0.031046883113956258),
('body', 0.030999982945986659),
('no', 0.030917695380560415),
('insult', 0.030808450146355922),
('mst', 0.030527916471397853),
('nowhere', 0.030352177599338292),
('lousy', 0.030160195468380797),
('didn', 0.030115903194061419),
('interest', 0.029888118468771117),
('half', 0.02981324611505725),
('lee', 0.029804235955718638),
('dimensional', 0.029562861996904038),
('unconvincing', 0.029322607679950232),
('left', 0.029322408787030522),
('sex', 0.029296748476082143),
('even', 0.029225209450923412),
('far', 0.029192618334294547),
('tries', 0.029004001132703523),
('anything', 0.028988097743501137),
('trying', 0.028919477228465107),
('accent', 0.028779542310252575),
('nudity', 0.028662654953266066),
('apparently', 0.028291626941517919),
('zombies', 0.028178583120430672),
('sense', 0.028166740534758782),
('incoherent', 0.027988926190862507),
('something', 0.027986519420278216),
('tedious', 0.027952212405329514),
('wrong', 0.027831947557365632),
('were', 0.027825695799985381),
('endless', 0.027824591794431464),
('turkey', 0.027624266205058482),
('zombie', 0.027543333835110859),
('appears', 0.027469840878483233),
('embarrassing', 0.027425437142424351),
('walked', 0.027411768647042707),
('premise', 0.027346072285964175),
('ok', 0.027333008356232001),
('result', 0.027312558653191901),
('complete', 0.027247564384243413),
('t', 0.02718673746561022),
('least', 0.02694907263201728),
('was', 0.026917906772065299),
('unwatchable', 0.026829458762459381),
('sat', 0.026806511532143459),
('to', 0.026801902698524071),
('christmas', 0.026735555962199221),
('gore', 0.0266701616306084),
('mother', 0.026612696987437748),
('aspects', 0.026583237615263797),
('amateurish', 0.026565159291175689),
('below', 0.026548271016778126),
('stupidity', 0.026460990221946916),
('appeal', 0.026396596713420969),
('trite', 0.026331168557051404),
('then', 0.026284629203937655),
('rubbish', 0.026216695246125493),
('okay', 0.025981446095883619),
('sucks', 0.025930224401969335),
('pretentious', 0.02590791237062829),
('positive', 0.025773976409798761),
('confusing', 0.025737618729473628),
('remotely', 0.025699566061653016),
('obnoxious', 0.025454829745850248),
('m', 0.025435495928249188),
('rent', 0.025373441934038503),
('laughs', 0.025346512576104405),
('re', 0.025342239903627856),
('context', 0.025274382593713566),
('disgusting', 0.025195418263468175),
('so', 0.025148024611438793),
('tiresome', 0.025031684199042097),
('miscast', 0.024970026716882358),
('aren', 0.024968703889385907),
('forced', 0.024933299777713691),
('paid', 0.024906929703330336),
('utter', 0.024802282233385511),
('uninspired', 0.024799576212017463),
('falls', 0.024749631706810708),
('throw', 0.024614954073046699),
('been', 0.024470487429445055),
('ugly', 0.024334820044832374),
('hopes', 0.024315635652054308),
('dire', 0.024191221840051083),
('hunter', 0.02417129112741848),
('producers', 0.024089231997130214),
('seem', 0.024065146985976858),
('straight', 0.02399666645155215),
('vampire', 0.023942797574072673),
('paper', 0.023908828083961012),
('crappy', 0.023807255546688062),
('excited', 0.023764516357875833),
('start', 0.023739057832096774),
('material', 0.023729757962158746),
('excuse', 0.023681577270328096),
('cop', 0.023480677028928129),
('f', 0.023312251619610848),
('ms', 0.023282327986278314),
('villain', 0.023158273483660733),
('fest', 0.023091425711778239),
('lack', 0.023039437894325183),
('such', 0.023031161078650962),
('saving', 0.023025745893238067),
('clichs', 0.022928209200342307),
('enough', 0.02292139725392528),
('mistake', 0.022868689470375),
('unbelievable', 0.022864325693347887),
('maybe', 0.022825002748295277),
('blame', 0.022808369279543168),
('bunch', 0.022769532876362856),
('version', 0.022753296945755487),
('candy', 0.022749363632616742),
('island', 0.02274580066608017),
('tripe', 0.022695188509832674),
('wasting', 0.022681371343356752),
('inept', 0.022679276425665765),
('actor', 0.02263697537177102),
('flop', 0.022613758633444538),
('any', 0.022560608437607196),
('k', 0.022554017579615032),
('appalling', 0.022500975853556059),
('propaganda', 0.022465024430755737),
('major', 0.022430482324246572),
('sequel', 0.022362296462477865),
('offensive', 0.022326080604825445),
('revenge', 0.022315150942472609),
('shoot', 0.02228810570921174),
('whatsoever', 0.022286498346940933),
('ruined', 0.022173811528211046),
('painfully', 0.022152008209040921),
('on', 0.022016020939730041),
('shame', 0.021981493467648269),
('effects', 0.021849482201960254),
('wouldn', 0.021848506706035151),
('development', 0.021773241990065747),
('plot', 0.021733893676650608),
('co', 0.021728673026887642),
('church', 0.021719723717009982),
('storyline', 0.021663404462350763),
('screenwriter', 0.02166017725248592),
('bother', 0.02157169990956697),
('miserably', 0.021516173872499805),
('christian', 0.021515873507543644),
('found', 0.021449077767987133),
('watching', 0.021344833140596573),
('pseudo', 0.021308384076023465),
('boredom', 0.021119995917930002),
('talent', 0.021005847445274794),
('continuity', 0.021005145852421921),
('talents', 0.020992716564348882),
('college', 0.020990718952374872),
('tried', 0.020978219626186817),
('editing', 0.020865814801443755),
('lines', 0.020853755408845792),
('drivel', 0.020726493692759695),
('generous', 0.020697017742241999),
('potential', 0.020672988272090822),
('creatures', 0.020601399429061324),
('disjointed', 0.020581338926655212),
('irritating', 0.020576764848872681),
('pile', 0.020560898967541538),
('acts', 0.020560043588043517),
('junk', 0.020558505639508208),
('raped', 0.020550629285133258),
('christ', 0.020481424289613519),
('brain', 0.020431161137662711),
('slasher', 0.020425652445140888),
('seconds', 0.020390927443421889),
('nobody', 0.020389268101762604),
('dialog', 0.020338349197601486),
('makers', 0.020333184431951125),
('excitement', 0.0202904560242918),
('flashbacks', 0.020267510512910234),
('sloppy', 0.020234078734398357),
('joke', 0.020212187048528514),
('sleep', 0.020108895811675787),
('bottom', 0.019986770547280194),
('however', 0.019981104962051167),
('fail', 0.01993740521162023),
('sucked', 0.019874923017311572),
('soap', 0.019853525395543012),
('looked', 0.019810211840927107),
('stinks', 0.019769365381781159),
('deserve', 0.019614034321096468),
('exact', 0.019555320028259),
('substance', 0.019552647432498176),
('yeah', 0.019513150136671549),
('production', 0.019510696746296522),
('female', 0.0194769149781218),
('unintentional', 0.019387723280198922),
('army', 0.019364852889641605),
('minute', 0.019351862554568222),
('unrealistic', 0.019350657250497855),
('rescue', 0.019340920364464904),
('theater', 0.019333829276668497),
('monsters', 0.019332636015751026),
('frankly', 0.019326550823843876),
('children', 0.019314240606868868),
('convince', 0.019312073515560635),
('shallow', 0.019298445504930546),
('synopsis', 0.019259706392396589),
('scott', 0.01918347440557033),
('seriously', 0.019182027987149994),
('ridiculously', 0.019169300285178967),
('looking', 0.019150985439966562),
('kareena', 0.019110212601710658),
('wrote', 0.019015323411486429),
('attempts', 0.019006343780653929),
('bothered', 0.018970712777578509),
('utterly', 0.018924824767803397),
('giant', 0.018891084650049701),
('writers', 0.018868906582101302),
('atrocious', 0.018848042351202358),
('plain', 0.018828766525513598),
('presumably', 0.018826629750947937),
('example', 0.018796453237837171),
('murray', 0.018754173430046931),
('seemed', 0.018749132295913067),
('stay', 0.01874415970643268),
('interview', 0.018672085964709519),
('disaster', 0.018553283301235145),
('value', 0.018544080955166367),
('paint', 0.018529607132429377),
('original', 0.018528190682362417),
('difficult', 0.018518455298178582),
('care', 0.018494804801171251),
('watchable', 0.01848187060538909),
('useless', 0.018470481000366853),
('desperately', 0.018421675047000256),
('except', 0.018391993551238547),
('doing', 0.018384737621350646),
('errors', 0.018380414978330258),
('solely', 0.018349321075079389),
('sitting', 0.018346519170301077),
('giving', 0.018335957397904827),
('ideas', 0.018327099221245188),
('unbearable', 0.018321159676201411),
('nor', 0.018254420259554285),
('project', 0.018252633214771746),
('dozen', 0.018206363291515752),
('charles', 0.018163660578293446),
('plastic', 0.018161741020378652),
('book', 0.018139011699011297),
('shots', 0.018114876064363863),
('ill', 0.018103621818215732),
('where', 0.018065882599695146),
('women', 0.018026883825059355),
('screenplay', 0.018014307024101311),
('through', 0.017990863003241389),
('actress', 0.017876003487857155),
('sign', 0.01786563614405693),
('walk', 0.017823522607756631),
('santa', 0.017727102733219178),
('happens', 0.017722408798843584),
('contrived', 0.017720303645882781),
('gun', 0.01768599317693384),
('ashamed', 0.017679623098721585),
('gratuitous', 0.017665737783803856),
('one', 0.017608259344043253),
('not', 0.017562336441189891),
('credibility', 0.017558852870687949),
('promising', 0.017544417082572289),
('risk', 0.017532600100721239),
('sub', 0.017531947750389465),
('lacking', 0.017513759836446527),
('fell', 0.017464857159331278),
('scenery', 0.017451365955319952),
('flesh', 0.017402514298262693),
('animal', 0.017386681692205423),
('tired', 0.017383214541566681),
('writer', 0.017380887757560838),
('dialogue', 0.017319373946647603),
('terribly', 0.017291135257276879),
('downright', 0.017277675563205447),
('rented', 0.017247977656900705),
('clumsy', 0.017241290805182073),
('blah', 0.017217377177396766),
('random', 0.017199913549247985),
('members', 0.017198947117344765),
('three', 0.017189383912215896),
('celluloid', 0.017174000803758888),
('your', 0.017140173886430049),
('lost', 0.017127763322061815),
('suddenly', 0.017124566068806111),
('cover', 0.017066680835874294),
('existent', 0.017028540662919325),
('mostly', 0.017009366180205404),
('dig', 0.016990887715494299),
('spending', 0.016944400877991015),
('elsewhere', 0.016937877167916525),
('suck', 0.016897737192407582),
('apparent', 0.016783874225807266),
('fill', 0.016766110935370601),
('running', 0.016728621099996368),
('jokes', 0.016718920312228033),
('cheese', 0.016699473014889825),
('outer', 0.016612591391981471),
('anil', 0.016581200840654876),
('director', 0.01651289445031142),
('awfully', 0.016492200414985295),
('mix', 0.016468214294032502),
('naturally', 0.016404879835269445),
('scientist', 0.016395078905109238),
('imdb', 0.016343168034107167),
('dumb', 0.016289693549692445),
('curiosity', 0.016277433551029966),
('somewhere', 0.016236117446747977),
('stereotyped', 0.016235814767295294),
('officer', 0.016235401039884571),
('shelf', 0.016151304702362455),
('spends', 0.016089566181633208),
('explanation', 0.016040330428242214),
('proof', 0.016021381235154272),
('killed', 0.016004979798664866),
('songs', 0.016002280189188103),
('why', 0.015994497048455167),
('assume', 0.015953574865902424),
('mean', 0.015907137878947274),
('year', 0.015900265748875844),
('named', 0.015897377296493403),
('actors', 0.015880849255718699),
('dreck', 0.015844184837849263),
('ripped', 0.015809352391222227),
('exception', 0.015801037653546943),
('let', 0.015747554995806858),
('said', 0.015739206756809128),
('handed', 0.015729421480492771),
('five', 0.015692627471399438),
('manage', 0.015647108880417118),
('thousands', 0.015643430975892967),
('faith', 0.015616976955551864),
('hideous', 0.015589158171890801),
('alas', 0.015538213296394241),
('interesting', 0.015537431607034399),
('camera', 0.01553421777185927),
('affair', 0.0154993718203294),
('saved', 0.015479619606949033),
('allow', 0.01547129065797),
('embarrassed', 0.01546569091101236),
('historically', 0.015405093934372963),
('guy', 0.01537764125447004),
('smoking', 0.015346508854378344),
('implausible', 0.01534045398602275),
('entirely', 0.01533469278818364),
('insulting', 0.015328508644691492),
('unable', 0.015321433538157139),
('supposedly', 0.015316107621242397),
('replaced', 0.015263381265213493),
('write', 0.015247349730647834),
('devoid', 0.01519618192038018),
('angry', 0.015128878425101411),
('cannot', 0.015124671278970766),
('stinker', 0.015117424017513681),
('types', 0.015097306608066994),
('hype', 0.015076288365524311),
('responsible', 0.014991356276561583),
('peter', 0.014969127137333012),
('putting', 0.014910707254937244),
('over', 0.014897181020826423),
('cardboard', 0.014888714204149049),
('interspersed', 0.014883165331874143),
('haired', 0.014880449676198546),
('spend', 0.01487609431622766),
('elvis', 0.01485470984415174),
('indulgent', 0.014847232132387197),
('catholic', 0.014843519648135949),
('downhill', 0.014807184967767797),
('lazy', 0.01478151469522973),
('aged', 0.014773315829198606),
('exist', 0.014753607788843255),
('torture', 0.014733998799388373),
('prove', 0.014729418674653008),
('tolerable', 0.014680880104255794),
('four', 0.014654547592632506),
('acceptable', 0.014651730694965842),
('chick', 0.014641428398798827),
('unimaginative', 0.014629366067627067),
('whiny', 0.014626751487134576),
('artsy', 0.014597921349167277),
('decide', 0.014596087755808965),
('unpleasant', 0.014539257963097196),
('rotten', 0.014526987482368661),
('racist', 0.014521318292204636),
('air', 0.014513999400043521),
('flimsy', 0.014510298364381131),
('baldwin', 0.014458793249711601),
('merely', 0.014423588430956459),
('wood', 0.01440518212855918),
('thinking', 0.014365675477621536),
('earth', 0.01435295387020083),
('kidding', 0.014337420788166336),
('unintentionally', 0.014336443850996722),
('vampires', 0.014325905430975226),
('generic', 0.014319871170399814),
('defense', 0.014290336242912224),
('saif', 0.014289573796132719),
('asleep', 0.014289012435576958),
('execution', 0.01428396200827341),
('figure', 0.014283770855230148),
('lackluster', 0.014273058981901444),
('hoped', 0.014264724762345842),
('nonsense', 0.014261341497203126),
('horrid', 0.01425321660445842),
('god', 0.01423736354744793),
('l', 0.01418729677374257),
('caricatures', 0.014181564208326641),
('starts', 0.014153430344591595),
('dry', 0.014133935534427947),
('display', 0.014128179969827091),
('button', 0.014116471162614745),
('bore', 0.014116389381443268),
('empty', 0.014096772700681904),
('harold', 0.01405213089664656),
('incomprehensible', 0.014009428713655188),
('annie', 0.014008405850952511),
('thrown', 0.014007462594894682),
('incredibly', 0.014005185007294354),
('renting', 0.01392668760863046),
('connect', 0.013922471736926735),
('younger', 0.013921148395141743),
('author', 0.013908729139553388),
('mistakes', 0.013902060662024712),
('vague', 0.013900188409028451),
('susan', 0.013899718009237958),
('obvious', 0.013862928310275266),
('public', 0.013848261281553172),
('porn', 0.013842110384054581),
('trash', 0.013803990572178484),
('stevens', 0.013796967244647431),
('sequels', 0.013782463861472683),
('hurt', 0.013769543921240131),
('desert', 0.013763619124969734),
('did', 0.013737639449728181),
('behave', 0.013719767167839486),
('served', 0.01371483823922371),
('claims', 0.013706886269650505),
('ultimately', 0.01369764359110015),
('wide', 0.013685211021307753),
('wow', 0.013679184770624804),
('worthless', 0.013670533296298285),
('dear', 0.01365359137960015),
('plodding', 0.013622845840855251),
('mike', 0.013594086031988709),
('favor', 0.013578310381078488),
('call', 0.013577646631327921),
('biggest', 0.013529947586389569),
('worthy', 0.013524754842185308),
('meaning', 0.013517997531900548),
('scientific', 0.01351539665384285),
('hanks', 0.013467213376215899),
('gay', 0.013414840808688235),
('embarrassingly', 0.013401336286973733),
('literary', 0.013389208999321035),
('playing', 0.013329954634726381),
('bo', 0.013312890564682506),
('manipulative', 0.013287016941406323),
('dressed', 0.013285092423656568),
('embarrassment', 0.013269530319198216),
('regarding', 0.013233250211631659),
('stilted', 0.013215539220141915),
('sleeve', 0.013215085161586723),
('rating', 0.013203442200940888),
('kills', 0.013183919467358734),
('sounds', 0.013178727878711712),
('ali', 0.013173031266866366),
('non', 0.013162603751805228),
('pie', 0.013161492629253844),
('populated', 0.013152746747459266),
('killing', 0.013111860853151807),
('else', 0.013110592541316683),
('schneider', 0.013093514941690403),
('priest', 0.013071537555948209),
('hollow', 0.013068001463175459),
('shower', 0.013029604174841079),
('ruins', 0.013021597567104507),
('mental', 0.013019696244479805),
('this', 0.01300977816966453),
('pregnant', 0.012997074834619551),
('make', 0.01299285191649867),
('timberlake', 0.012979689860020446),
('saves', 0.012915795355367859),
('vastly', 0.012914828969565756),
('swear', 0.01290105947549007),
('stella', 0.012883911119651205),
('grave', 0.01288255504027714),
('thats', 0.012861061812910335),
('drinking', 0.012860129471019707),
('boom', 0.012851779594694185),
('introduction', 0.012831129197335454),
('programming', 0.012796219757750256),
('career', 0.012773059501084117),
('stereotype', 0.012769447626661472),
('attractive', 0.012765873120010159),
('victims', 0.012749299245502169),
('pass', 0.012735021821089279),
('experiment', 0.012716112941788907),
('retarded', 0.012713099529852412),
('stuck', 0.012709332698253251),
('akshay', 0.01268427306987787),
('cut', 0.012676285239015485),
('shoddy', 0.012674792040888047),
('damme', 0.01266653641765667),
('inaccurate', 0.012653687577536547),
('ray', 0.01264981802351017),
('woman', 0.012646521945546323),
('research', 0.012640494662864557),
('mile', 0.012627245693716727),
('place', 0.012624645831509396),
('demon', 0.012621688470792604),
('vulgar', 0.012612150302693324),
('engage', 0.012602272831074858),
('wives', 0.012601890190118301),
('mention', 0.012581598480006471),
('if', 0.012569631262234704),
('cartoon', 0.012561864177985764),
('unbelievably', 0.012550391668315846),
('only', 0.012517107727859128),
('ended', 0.012507282716729776),
('stereotypical', 0.012506426536204346),
('spent', 0.012503032775055239),
('thing', 0.012483110991541426),
('phone', 0.012464039991489134),
('stock', 0.012446742147556615),
('drop', 0.012432978683590463),
('self', 0.012432059211520803),
('escapes', 0.01241921129824892),
('conceived', 0.012392639977060704),
('required', 0.012392260947042827),
('assassin', 0.012332404091910096),
('meat', 0.012327751187890425),
('therefore', 0.012316138729629601),
('struggling', 0.012308628353572298),
('ho', 0.012307714936265705),
('ta', 0.012299409649320241),
('cold', 0.012289510775209258),
('expects', 0.012271684887263188),
('furthermore', 0.012263298696316198),
('remote', 0.012254529263879217),
('cgi', 0.012250569964074181),
('arab', 0.012230232115225252),
('feminist', 0.012220004405980534),
('hair', 0.012213792907949595),
('intelligence', 0.012203964889416771),
('destroy', 0.012190213907023965),
('cameo', 0.012186034087855131),
('claus', 0.012181510618531243),
('awake', 0.012171290237450144),
('sums', 0.012139945909251909),
('auto', 0.012126012687040619),
('cue', 0.012120943623008957),
('speak', 0.012117784815618097),
('stereotypes', 0.012106976159466581),
('footage', 0.012103658001584283),
('maker', 0.01209336953927035),
('rental', 0.012083052888147337),
('proper', 0.012063210621690412),
('mercifully', 0.012047936344961967),
('gimmick', 0.012041001769926642),
('coherent', 0.012027899920693617),
('inane', 0.011993175877578827),
('relies', 0.011992345660343812),
('nomination', 0.011982252573531246),
('segal', 0.011947340234058405),
('christians', 0.011946398905489899),
('overrated', 0.011926101166626013),
('don', 0.011924357980777277),
('severely', 0.011916168552237318),
('phony', 0.011913822393121727),
('selfish', 0.011900529017180243),
('resume', 0.011897346320859058),
('another', 0.01187768443136164),
('sean', 0.011876040214137608),
('hepburn', 0.011869243078008905),
('secondly', 0.01186310933445027),
('ups', 0.011859394818287428),
('planet', 0.011852030247443603),
('changed', 0.01184533561188748),
('amused', 0.011842962845878567),
('lowest', 0.011831634819501915),
('fools', 0.011824116232842369),
('spelling', 0.011821902194872624),
('repressed', 0.011821527286346348),
('unlikeable', 0.01181876011058648),
('failure', 0.011816519901709052),
('line', 0.011796438571873891),
('hyped', 0.011784666544684304),
('anti', 0.011764086315539161),
('acting', 0.011752348314205383),
('promise', 0.011749711660046624),
('observe', 0.011739608959278626),
('mindless', 0.011729368774426891),
('lacked', 0.011718485221863709),
('rather', 0.011704535222487891),
('ed', 0.011700096242496997),
('significant', 0.01169617650193994),
('talks', 0.011678101476086883),
('arty', 0.011674972481678897),
('spit', 0.011671408526135128),
('ilk', 0.011661568455359029),
('unoriginal', 0.011651107245840887),
('forward', 0.011646719533106094),
('toilet', 0.011635522207639078),
('suppose', 0.011633258510072186),
('feed', 0.01161744751742516),
('surrounded', 0.011607897169523127),
('wanted', 0.011604506869089724),
('tashan', 0.011596205445299108),
('dr', 0.01154394928133564),
('scare', 0.011543316667712905),
('murderer', 0.011535350571639676),
('explained', 0.011466329649783205),
('cheated', 0.011455846970137712),
('whats', 0.01145144357723085),
('romance', 0.011445558616225329),
('jewish', 0.01144156416364368),
('sexual', 0.011438682797255701),
('books', 0.01141981177753516),
('throwing', 0.011404165894740239),
('nose', 0.011395583651720624),
('parking', 0.011390688400833907),
('pick', 0.011357671445382181),
('chose', 0.011354353327826118),
('improve', 0.011350584813053918),
('kapoor', 0.011340767814074903),
('costs', 0.011325900726890981),
('saying', 0.011325617629551313),
('early', 0.011320525734188087),
('technically', 0.011317672837061938),
('hackman', 0.011288294849240651),
('birthday', 0.011282785404027751),
('cinematography', 0.011263572785831684),
('hurts', 0.011250154303091528),
('saturday', 0.011247837147971233),
('meaningless', 0.011239510238506719),
('mannered', 0.011239044207972256),
('screaming', 0.01123862031022237),
('should', 0.011236648355832369),
('crazed', 0.011236418275421324),
('dignity', 0.011236150963786546),
('mate', 0.0112167000098445),
('letters', 0.011208675517174478),
('recycled', 0.011206236378205576),
('promptly', 0.011202237607822145),
('inexplicably', 0.01116132181154625),
('or', 0.011152965343305343),
('simply', 0.011146233896835922),
('too', 0.011130044921930288),
('nerd', 0.011122543127721436),
('chris', 0.011116119389820144),
('proceedings', 0.011111786695547108),
('lived', 0.011100598930695569),
('code', 0.01109542524270142),
('potentially', 0.011093285835678523),
('open', 0.011075631889800954),
('faster', 0.011074177906888309),
('moore', 0.011070458274337773),
('bowl', 0.011060417562531431),
('absolutely', 0.011044130796846871),
('just', 0.011033356854991554),
('suspension', 0.011031781173072127),
('enemy', 0.011025820754518639),
('conclusion', 0.010986051066943338),
('hospital', 0.010977494845678686),
('romances', 0.010962761722118311),
('spoke', 0.010962116403553662),
('hardly', 0.010960545391113456),
('olds', 0.010951344004097441),
('creek', 0.010950023924322864),
('shouting', 0.01094372750254274),
('originality', 0.010912963822714918),
('bollywood', 0.010911409137577785),
('cape', 0.01090232612951828),
('teeth', 0.01090050204600262),
('backdrop', 0.010885688008708722),
('turn', 0.010880478059425644),
('mason', 0.010866951716170654),
('grace', 0.010848406257382322),
('valley', 0.010845180425875844),
('depressing', 0.010827818086738505),
('superficial', 0.01082640323755853),
('invested', 0.010812488716640862),
('bomb', 0.010811727591767118),
('embarrass', 0.010778451069403564),
('sided', 0.010773707983617679),
('sticking', 0.010762292435547709),
('common', 0.010754536408451008),
('boat', 0.010750196487059143),
('promised', 0.010746025901289747),
('wayans', 0.010744338945929416),
('sheer', 0.010734103279474522),
('wrestling', 0.010724515540975418),
('staff', 0.010715523520497053),
('apollo', 0.010711377643774767),
('leigh', 0.010702080598678557),
('virtually', 0.010691942663824007),
('seagal', 0.010677324100672111),
('comes', 0.0106748997197255),
('edition', 0.010673353805904191),
('predictably', 0.010666551243955741),
('stuff', 0.010664915811483258),
('gang', 0.010664441184213124),
('cancer', 0.010643225900463574),
('obviously', 0.010641670080654522),
('would', 0.010623530922231164),
('totally', 0.010616092995147883),
('profile', 0.010596003501785214),
('spacey', 0.010595967407784398),
('ability', 0.01058459252136016),
('horrendous', 0.010580213328532085),
('blood', 0.010579520401095313),
('imitation', 0.010568550630572958),
('bikini', 0.010568043371931093),
('talented', 0.010566001035979433),
('basis', 0.010564729746933205),
('dialogs', 0.010551191397294005),
('showing', 0.010548613564454221),
('door', 0.010544563357219762),
('portray', 0.01052779962849062),
('strictly', 0.010526959295132305),
('mexican', 0.01050873151782232),
('stick', 0.010465961443388669),
('east', 0.010455324716016765),
('anywhere', 0.010431532734666283),
('remake', 0.01041986919495284),
('am', 0.010410414209203916),
('attempting', 0.010386393998627376),
('disturbing', 0.010381152608581442),
('jude', 0.010377136500506754),
('wondering', 0.010363512690012198),
('celebrated', 0.01036011176907586),
('use', 0.010350554074714637),
('wreck', 0.010344734410393921),
('appear', 0.010344438351539177),
('entitled', 0.010335246001593065),
('youth', 0.010323214445994815),
('letdown', 0.01031855344625868),
('moran', 0.010305507693633359),
('mediocrity', 0.010302827140695369),
('news', 0.010292874788426091),
('bits', 0.010276065293631171),
('alone', 0.010268492053981953),
('accents', 0.010263852094534689),
('inhabited', 0.010244117693024815),
('mock', 0.010244061360675905),
('g', 0.010223458175403785),
('box', 0.010203304329265734),
('term', 0.010199983044386091),
('behavior', 0.010198776124373237),
('tedium', 0.010190092201507218),
('intent', 0.010190038120698582),
('husband', 0.01018950226595784),
('presence', 0.01018719233607417),
('z', 0.010184318583214757),
('unappealing', 0.010146391189444364),
('much', 0.010136790117697133),
('tree', 0.010113534581593916),
('doctors', 0.010099854380484191),
('pi', 0.010095099419111339),
('rodney', 0.010090819798082389),
('franchise', 0.010089650929674206),
('piece', 0.010086011549585341),
('company', 0.01008353958260106),
('choppy', 0.010079223420593732),
('turned', 0.010069855547990123),
('test', 0.010041505355613897),
('ball', 0.010040944323609524),
('hated', 0.010035509058945862),
('bear', 0.01003427246505746),
('serves', 0.010027495172169224),
('leonard', 0.010022751390164689),
('deserved', 0.010022334081283371),
('part', 0.010016360436147431),
('opportunity', 0.010013126012646686),
('turning', 0.010011850960865775),
('overacting', 0.010008994714980207),
('refer', 0.010006488920574083),
('flies', 0.010006418749637626),
('uninvolving', 0.0099991338976208165),
('produce', 0.0099962014038013722),
('jumpy', 0.0099947855808415129),
('die', 0.0099914129058671017),
('root', 0.0099747135001128327),
('insomnia', 0.0099744642555285069),
('blatant', 0.0099596620005663813),
('larry', 0.0099556905367902439),
('threw', 0.0099473965388449589),
('billed', 0.0099285818753670832),
('bullets', 0.0099281758971005909),
('intellectually', 0.009908138827878615),
('rip', 0.009901323399604086),
('stretching', 0.0099012969699172632),
('protest', 0.0098984552675623581),
('soldiers', 0.0098936923822449258),
('flick', 0.009887063364977652),
('justin', 0.009862246602717558),
('highlights', 0.0098589088020586291),
('move', 0.0098539899809540372),
('merit', 0.0098431205949966738),
('russian', 0.009841171721984102),
('security', 0.0098373450338831089),
('idiotic', 0.0098341234288144581),
('produced', 0.0098294307574258062),
('king', 0.009826687234317566),
('magically', 0.009822884247682559),
('united', 0.0098070847890707642),
('missile', 0.0097990578193348551),
('unlikable', 0.0097869158986480815),
('ignorant', 0.0097732743173460923),
('amateur', 0.0097674059870561138),
('bachelor', 0.0097673429455405695),
('asylum', 0.0097627338519779908),
('screw', 0.0097568098573927193),
('report', 0.0097479232699172434),
('dracula', 0.0097467323393205588),
('removed', 0.0097416519499422087),
('confess', 0.0097162925211573253),
('brand', 0.0097152534660907616),
('conspiracy', 0.0097116972290396987),
('horribly', 0.009708378556425248),
('switch', 0.009702684093379545),
('jaws', 0.0096877455513713073),
('unsuspecting', 0.0096853425035846423),
('betty', 0.0096770352133324685),
('forwarding', 0.0096711196893192741),
('university', 0.0096636715878149638),
('star', 0.0096623254931800309),
('crawl', 0.0096464318968590562),
('dopey', 0.0096460863315858663),
('ruin', 0.0096230106385457228),
('lifeless', 0.0096228807274879972),
('flash', 0.0096193625359650009),
('whoever', 0.0096174128915875422),
('coincidence', 0.0096024599741402102),
('choosing', 0.0095951100051069292),
('avid', 0.0095900913284222636),
('intended', 0.0095846987041676261),
('remained', 0.0095839628178583831),
('c', 0.0095732676681762417),
('waiting', 0.0095562258694348833),
('cassie', 0.0095481354442238063),
('garage', 0.0095349544587830237),
('clarke', 0.0095345445855698589),
('fortune', 0.0095330396648302049),
('interminable', 0.0095328159563552606),
('incessant', 0.0095235485026846332),
('plots', 0.0095225805490624683),
('danger', 0.0095171205654692899),
('costumes', 0.0094980144667524413),
('evidently', 0.0094952158467012243),
('minus', 0.0094911495174661263),
('reporters', 0.0094836811040990825),
('israeli', 0.0094750077183364638),
('failing', 0.0094711841313976849),
('paying', 0.00946923440668513),
('godzilla', 0.0094586915548437872),
('dumber', 0.0094582903092924817),
('earn', 0.009447622492842497),
('slows', 0.0094467463872487598),
('held', 0.0094452736817914641),
('chase', 0.0094438362611946568),
('lies', 0.0094383969845033347),
('hands', 0.0094381781614589089),
('grief', 0.009423849453410283),
('brains', 0.009418215341663207),
('tom', 0.0094130433384347137),
('resurrected', 0.0094083423437290523),
('sleeps', 0.0094017951882658275),
('porno', 0.0093907201413965108),
('somehow', 0.0093889261270860523),
('sarcasm', 0.0093886064393904137),
('tie', 0.0093856009366311641),
('fall', 0.0093801640008931118),
('bring', 0.0093791273545761507),
('rape', 0.0093760851230746296),
('village', 0.0093684513318614063),
('kitchen', 0.0093649071460109555),
('concerned', 0.0093611353238811264),
('republic', 0.0093499426948764237),
('hell', 0.0093400360705317119),
('inducing', 0.0093382129792553541),
('stomach', 0.0093378286385158524),
('shambles', 0.0093335457329829716),
('virgin', 0.0093312001339055962),
('extraneous', 0.0093250413800351276),
('cameras', 0.009322946026797712),
('suffers', 0.0093204929924829982),
('justified', 0.0093163217479363125),
('plummer', 0.0092948273285103911),
('ponderous', 0.0092880344237223321),
('player', 0.0092802296345443694),
('survivor', 0.0092767026472125765),
('rainy', 0.0092697034218137443),
('graces', 0.0092620944963291256),
...]

``````
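The cell that produced the ranked list above sits outside this section, so the following is only a minimal sketch of how such `(word, score)` pairs could be built. It assumes each score is the dot product between one focus word's hidden-layer weight vector and every other word's vector. The function name `most_similar_words` and the toy weights are hypothetical and are not taken from the trained network.

```python
from collections import Counter

import numpy as np


def most_similar_words(weights, word2index, focus):
    """Rank every word by the dot product of its weight row with the focus word's row."""
    focus_vector = weights[word2index[focus]]
    scores = Counter()
    for word, index in word2index.items():
        scores[word] = float(np.dot(weights[index], focus_vector))
    # most_common() sorts from the highest score to the lowest
    return scores.most_common()


# Toy stand-ins (hypothetical values, not the real mlp_full weights)
toy_word2index = {"terrible": 0, "awful": 1, "excellent": 2}
toy_weights = np.array([
    [0.5, -0.2],   # terrible
    [0.4, -0.1],   # awful
    [-0.3, 0.6],   # excellent
])

ranking = most_similar_words(toy_weights, toy_word2index, "terrible")
print(ranking)
```

Words whose vectors point the same way as the focus word land at the top of the list, and words pointing the opposite way land at the bottom.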
``````

In [135]:

import matplotlib.colors as colors

# Collect the 500 most positive words that are in the network's vocabulary
words_to_visualize = list()
for word, ratio in pos_neg_ratios.most_common(500):
    if(word in mlp_full.word2index.keys()):
        words_to_visualize.append(word)

# Then the 500 most negative words (the tail of the ranking, reversed)
for word, ratio in list(reversed(pos_neg_ratios.most_common()))[0:500]:
    if(word in mlp_full.word2index.keys()):
        words_to_visualize.append(word)

``````
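The cell above takes both ends of a `Counter` ranking. Here is a small sketch of that pattern on made-up data, assuming `pos_neg_ratios` behaves like a `collections.Counter` (which is what `most_common` suggests). The name `toy_ratios` and its values are hypothetical.

```python
from collections import Counter

# Hypothetical scores standing in for pos_neg_ratios
toy_ratios = Counter({"superb": 2.1, "great": 1.2, "okay": 0.1,
                      "dull": -1.0, "awful": -2.3})

n = 2
top_n = toy_ratios.most_common(n)                              # highest scores first
bottom_n = list(reversed(toy_ratios.most_common()))[0:n]       # lowest scores first, as in the notebook
bottom_n_sliced = toy_ratios.most_common()[:-n - 1:-1]         # same result with one slice

print(top_n)
print(bottom_n)
```

Both spellings of the "bottom n" give the same words. The slice version just skips building the reversed copy of the whole list.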
``````

In [136]:

pos = 0
neg = 0

# One weight vector and one color per word: green = positive, black = negative
colors_list = list()
vectors_list = list()
for word in words_to_visualize:
    if word in pos_neg_ratios.keys():
        vectors_list.append(mlp_full.weights_0_1[mlp_full.word2index[word]])
        if(pos_neg_ratios[word] > 0):
            pos+=1
            colors_list.append("#00ff00")
        else:
            neg+=1
            colors_list.append("#000000")

``````
``````

In [137]:

from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, random_state=0)
words_top_ted_tsne = tsne.fit_transform(vectors_list)

``````
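If you want to see what `TSNE.fit_transform` returns before running it on the real weights, here is a minimal sketch on synthetic data. The two clusters, their sizes, and the `perplexity=10` setting are assumptions chosen to keep the example small. The notebook itself uses the default perplexity on the word vectors.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Two synthetic clusters of 10-dimensional vectors (stand-ins for word vectors)
cluster_a = rng.normal(loc=2.0, scale=0.3, size=(30, 10))
cluster_b = rng.normal(loc=-2.0, scale=0.3, size=(30, 10))
vectors = np.vstack([cluster_a, cluster_b])

# perplexity must be smaller than the number of samples, so use a small value here
tsne = TSNE(n_components=2, random_state=0, perplexity=10)
embedded = tsne.fit_transform(vectors)

print(embedded.shape)  # one 2-D point per input vector
```

Each row of `embedded` is the 2-D position of the matching input row, which is why the plotting cell can use column 0 as `x1` and column 1 as `x2`.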
``````

In [139]:

p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="vector T-SNE for most polarized words")

source = ColumnDataSource(data=dict(x1=words_top_ted_tsne[:,0],
                                    x2=words_top_ted_tsne[:,1],
                                    names=words_to_visualize))

p.scatter(x="x1", y="x2", size=8, source=source, color=colors_list)

word_labels = LabelSet(x="x1", y="x2", text="names", y_offset=6,
                       text_font_size="8pt", text_color="#555555",
                       source=source, text_align='center')
p.add_layout(word_labels)  # attach the labels; without this they are never drawn

show(p)

# green indicates positive words, black indicates negative words

``````