# Sentiment Classification & How To "Frame Problems" for a Neural Network

### What You Should Already Know

• neural networks, forward and back-propagation
• mean squared error
• train/test splits

### Where to Get Help if You Need it

• Re-watch previous Udacity Lectures
• Leverage the recommended Course Reading Material - Grokking Deep Learning (40% Off: traskud17)
• Shoot me a tweet @iamtrask

### Tutorial Outline:

• Intro: The Importance of "Framing a Problem"
• Curate a Dataset
• Developing a "Predictive Theory"
• PROJECT 1: Quick Theory Validation
• Transforming Text to Numbers
• PROJECT 2: Creating the Input/Output Data
• Putting it all together in a Neural Network
• PROJECT 3: Building our Neural Network
• Understanding Neural Noise
• PROJECT 4: Making Learning Faster by Reducing Noise
• Analyzing Inefficiencies in our Network
• PROJECT 5: Making our Network Train and Run Faster
• Further Noise Reduction
• PROJECT 6: Reducing Noise by Strategically Reducing the Vocabulary
• Analysis: What's going on in the weights?

# Lesson: Curate a Dataset

``````

In [1]:

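def pretty_print_review_and_label(i):
    # Print a label next to the first 80 characters of its review.
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

g = open('reviews.txt','r') # What we know!
reviews = [line.rstrip('\n') for line in g]  # one review per line
g.close()

g = open('labels.txt','r') # What we WANT to know!
labels = [line.rstrip('\n').upper() for line in g]  # upper-cased to match 'POSITIVE'/'NEGATIVE' below
g.close()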

``````
``````

In [2]:

len(reviews)

``````
``````

Out[2]:

25000

``````
``````

In [3]:

reviews[0]

``````
``````

Out[3]:

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

``````
``````

In [4]:

labels[0]

``````
``````

Out[4]:

'POSITIVE'

``````
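
Before building anything on top of `reviews` and `labels`, it may be worth confirming that the two lists line up one-to-one and seeing how the labels are distributed. The sketch below is only an illustration: `check_alignment` is a hypothetical helper, and the three toy reviews stand in for the real files so the snippet runs on its own.

```python
from collections import Counter

def check_alignment(reviews, labels):
    """Return label counts after confirming each review has exactly one label.

    Hypothetical helper, not part of the original notebook.
    """
    if len(reviews) != len(labels):
        raise ValueError(
            "expected one label per review, got %d reviews and %d labels"
            % (len(reviews), len(labels))
        )
    return Counter(labels)

# Toy stand-ins for the contents of reviews.txt and labels.txt.
toy_reviews = ["this movie is terrible", "an excellent film", "pure trash"]
toy_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE"]

label_counts = check_alignment(toy_reviews, toy_labels)
print(label_counts.most_common())
```

Calling the same helper on the real `reviews` and `labels` lists would show how many reviews fall under each label.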

# Lesson: Develop a Predictive Theory

``````

In [5]:

print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

``````
``````

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...

``````
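
The six snippets above hint at a theory: individual words such as "terrible" and "excellent" seem to track the label, while a word like "movie" does not. A minimal sketch of that check, using only the six printed snippets as data, might look like the following. `label_counts_for` is a hypothetical helper introduced for illustration, and it splits on any whitespace rather than on single spaces.

```python
from collections import Counter

# The six (label, snippet) pairs printed above, truncated exactly as shown.
samples = [
    ("NEGATIVE", "this movie is terrible but it has some good effects ."),
    ("POSITIVE", "adrian pasdar is excellent is this film . he makes a fascinating woman ."),
    ("NEGATIVE", "comment this movie is impossible . is terrible  very improbable  bad interpretat"),
    ("POSITIVE", "excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more"),
    ("NEGATIVE", "if you haven  t seen this  it  s terrible . it is pure trash . i saw this about "),
    ("POSITIVE", "this schiffer guy is a real genius  the movie is of excellent quality and both e"),
]

def label_counts_for(word, samples):
    """Count how many snippets of each label contain `word` (hypothetical helper)."""
    counts = Counter()
    for label, text in samples:
        if word in text.split():
            counts[label] += 1
    return counts

print(label_counts_for("terrible", samples))   # only NEGATIVE snippets contain it
print(label_counts_for("excellent", samples))  # only POSITIVE snippets contain it
print(label_counts_for("movie", samples))      # split evenly, so not informative here
```

Six examples prove nothing on their own, which is why Project 1 repeats this kind of count over the full dataset.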

# Project 1: Quick Theory Validation

``````

In [6]:

from collections import Counter
import numpy as np

``````
``````

In [7]:

positive_counts = Counter()
negative_counts = Counter()
total_counts = Counter()

``````
``````

In [8]:

for i in range(len(reviews)):
    if(labels[i] == 'POSITIVE'):
        for word in reviews[i].split(" "):
            positive_counts[word] += 1
            total_counts[word] += 1
    else:
        for word in reviews[i].split(" "):
            negative_counts[word] += 1
            total_counts[word] += 1

``````
``````

In [9]:

positive_counts.most_common()

``````
``````

Out[9]:

[('', 550468),
('the', 173324),
('.', 159654),
('and', 89722),
('a', 83688),
('of', 76855),
('to', 66746),
('is', 57245),
('in', 50215),
('br', 49235),
('it', 48025),
('i', 40743),
('that', 35630),
('this', 35080),
('s', 33815),
('as', 26308),
('with', 23247),
('for', 22416),
('was', 21917),
('film', 20937),
('but', 20822),
('movie', 19074),
('his', 17227),
('on', 17008),
('you', 16681),
('he', 16282),
('are', 14807),
('not', 14272),
('t', 13720),
('one', 13655),
('have', 12587),
('be', 12416),
('by', 11997),
('all', 11942),
('who', 11464),
('an', 11294),
('at', 11234),
('from', 10767),
('her', 10474),
('they', 9895),
('has', 9186),
('so', 9154),
('like', 9038),
('very', 8305),
('out', 8134),
('there', 8057),
('she', 7779),
('what', 7737),
('or', 7732),
('good', 7720),
('more', 7521),
('when', 7456),
('some', 7441),
('if', 7285),
('just', 7152),
('can', 7001),
('story', 6780),
('time', 6515),
('my', 6488),
('great', 6419),
('well', 6405),
('up', 6321),
('which', 6267),
('their', 6107),
('see', 6026),
('also', 5550),
('we', 5531),
('really', 5476),
('would', 5400),
('will', 5218),
('me', 5167),
('only', 5137),
('him', 5018),
('even', 4964),
('most', 4864),
('other', 4858),
('were', 4782),
('first', 4755),
('than', 4736),
('much', 4685),
('its', 4622),
('no', 4574),
('into', 4544),
('people', 4479),
('best', 4319),
('love', 4301),
('get', 4272),
('how', 4213),
('life', 4199),
('been', 4189),
('because', 4079),
('way', 4036),
('do', 3941),
('films', 3813),
('them', 3805),
('after', 3800),
('many', 3766),
('two', 3733),
('too', 3659),
('think', 3655),
('movies', 3586),
('characters', 3560),
('character', 3514),
('don', 3468),
('man', 3460),
('show', 3432),
('watch', 3424),
('seen', 3414),
('then', 3358),
('little', 3341),
('still', 3340),
('make', 3303),
('could', 3237),
('never', 3226),
('being', 3217),
('where', 3173),
('does', 3069),
('over', 3017),
('any', 3002),
('while', 2899),
('know', 2833),
('did', 2790),
('years', 2758),
('here', 2740),
('ever', 2734),
('end', 2696),
('these', 2694),
('such', 2590),
('real', 2568),
('scene', 2567),
('back', 2547),
('those', 2485),
('though', 2475),
('off', 2463),
('new', 2458),
('your', 2453),
('go', 2440),
('acting', 2437),
('plot', 2432),
('world', 2429),
('scenes', 2427),
('say', 2414),
('through', 2409),
('makes', 2390),
('better', 2381),
('now', 2368),
('work', 2346),
('young', 2343),
('old', 2311),
('ve', 2307),
('find', 2272),
('both', 2248),
('before', 2177),
('us', 2162),
('again', 2158),
('series', 2153),
('quite', 2143),
('something', 2135),
('cast', 2133),
('should', 2121),
('part', 2098),
('always', 2088),
('lot', 2087),
('another', 2075),
('actors', 2047),
('director', 2040),
('family', 2032),
('between', 2016),
('own', 2016),
('m', 1998),
('may', 1997),
('same', 1972),
('role', 1967),
('watching', 1966),
('every', 1954),
('funny', 1953),
('doesn', 1935),
('performance', 1928),
('few', 1918),
('look', 1900),
('re', 1884),
('why', 1855),
('things', 1849),
('times', 1832),
('big', 1815),
('however', 1795),
('actually', 1790),
('action', 1789),
('going', 1783),
('bit', 1757),
('comedy', 1742),
('down', 1740),
('music', 1738),
('must', 1728),
('take', 1709),
('saw', 1692),
('long', 1690),
('right', 1688),
('fun', 1686),
('fact', 1684),
('excellent', 1683),
('around', 1674),
('didn', 1672),
('without', 1671),
('thing', 1662),
('thought', 1639),
('got', 1635),
('each', 1630),
('day', 1614),
('feel', 1597),
('seems', 1596),
('come', 1594),
('done', 1586),
('beautiful', 1580),
('especially', 1572),
('played', 1571),
('almost', 1566),
('want', 1562),
('yet', 1556),
('give', 1553),
('pretty', 1549),
('last', 1543),
('since', 1519),
('different', 1504),
('although', 1501),
('gets', 1490),
('true', 1487),
('interesting', 1481),
('job', 1470),
('enough', 1455),
('our', 1454),
('shows', 1447),
('horror', 1441),
('woman', 1439),
('tv', 1400),
('probably', 1398),
('father', 1395),
('original', 1393),
('girl', 1390),
('point', 1379),
('plays', 1378),
('wonderful', 1372),
('far', 1358),
('course', 1358),
('john', 1350),
('rather', 1340),
('isn', 1328),
('ll', 1326),
('later', 1324),
('dvd', 1324),
('whole', 1310),
('war', 1310),
('d', 1307),
('found', 1306),
('away', 1306),
('screen', 1305),
('nothing', 1300),
('year', 1297),
('once', 1296),
('hard', 1294),
('together', 1280),
('set', 1277),
('am', 1277),
('having', 1266),
('making', 1265),
('place', 1263),
('might', 1260),
('comes', 1260),
('sure', 1253),
('american', 1248),
('play', 1245),
('kind', 1244),
('perfect', 1242),
('takes', 1242),
('performances', 1237),
('himself', 1230),
('worth', 1221),
('everyone', 1221),
('anyone', 1214),
('actor', 1203),
('three', 1201),
('wife', 1196),
('classic', 1192),
('goes', 1186),
('ending', 1178),
('version', 1168),
('star', 1149),
('enjoy', 1146),
('book', 1142),
('nice', 1132),
('everything', 1128),
('during', 1124),
('put', 1118),
('seeing', 1111),
('least', 1102),
('house', 1100),
('high', 1095),
('watched', 1094),
('loved', 1087),
('men', 1087),
('night', 1082),
('anything', 1075),
('believe', 1071),
('guy', 1071),
('top', 1063),
('amazing', 1058),
('hollywood', 1056),
('looking', 1053),
('main', 1044),
('definitely', 1043),
('gives', 1031),
('home', 1029),
('seem', 1028),
('episode', 1023),
('audience', 1020),
('sense', 1020),
('truly', 1017),
('special', 1011),
('second', 1009),
('short', 1009),
('fan', 1009),
('mind', 1005),
('human', 1001),
('recommend', 999),
('full', 996),
('black', 995),
('help', 991),
('along', 989),
('trying', 987),
('small', 986),
('death', 985),
('friends', 981),
('remember', 974),
('often', 970),
('said', 966),
('favorite', 962),
('heart', 959),
('early', 957),
('left', 956),
('until', 955),
('script', 954),
('let', 954),
('maybe', 937),
('today', 936),
('live', 934),
('less', 934),
('moments', 933),
('others', 929),
('brilliant', 926),
('shot', 925),
('liked', 923),
('become', 916),
('won', 915),
('used', 910),
('style', 907),
('mother', 895),
('lives', 894),
('came', 893),
('stars', 890),
('cinema', 889),
('looks', 885),
('perhaps', 884),
('enjoyed', 879),
('boy', 875),
('drama', 873),
('highly', 871),
('given', 870),
('playing', 867),
('use', 864),
('next', 859),
('women', 858),
('fine', 857),
('effects', 856),
('kids', 854),
('entertaining', 853),
('need', 852),
('line', 850),
('works', 848),
('someone', 847),
('mr', 836),
('simply', 835),
('picture', 833),
('children', 833),
('face', 831),
('keep', 831),
('friend', 831),
('dark', 830),
('overall', 828),
('certainly', 828),
('minutes', 827),
('wasn', 824),
('history', 822),
('finally', 820),
('couple', 816),
('against', 815),
('son', 809),
('understand', 808),
('lost', 807),
('michael', 805),
('else', 801),
('throughout', 798),
('fans', 797),
('city', 792),
('reason', 789),
('written', 787),
('production', 787),
('several', 784),
('school', 783),
('based', 781),
('rest', 781),
('try', 780),
('hope', 775),
('strong', 768),
('white', 765),
('tell', 759),
('itself', 758),
('half', 753),
('person', 749),
('sometimes', 746),
('past', 744),
('start', 744),
('genre', 743),
('beginning', 739),
('final', 739),
('town', 738),
('art', 734),
('humor', 732),
('game', 732),
('yes', 731),
('idea', 731),
('late', 730),
('becomes', 729),
('despite', 729),
('able', 726),
('case', 726),
('money', 723),
('child', 721),
('completely', 721),
('side', 719),
('camera', 716),
('getting', 714),
('soon', 702),
('under', 700),
('viewer', 699),
('age', 697),
('days', 696),
('stories', 696),
('felt', 694),
('simple', 694),
('roles', 693),
('video', 688),
('name', 683),
('either', 683),
('doing', 677),
('turns', 674),
('wants', 671),
('close', 671),
('title', 669),
('wrong', 668),
('went', 666),
('james', 665),
('evil', 659),
('budget', 657),
('episodes', 657),
('relationship', 655),
('fantastic', 653),
('piece', 653),
('david', 651),
('turn', 648),
('murder', 646),
('parts', 645),
('brother', 644),
('absolutely', 643),
('experience', 642),
('eyes', 641),
('sex', 638),
('direction', 637),
('called', 637),
('directed', 636),
('lines', 634),
('behind', 633),
('sort', 632),
('actress', 631),
('oscar', 628),
('including', 627),
('example', 627),
('known', 625),
('musical', 625),
('chance', 621),
('score', 620),
('feeling', 619),
('hit', 619),
('voice', 615),
('moment', 612),
('living', 612),
('low', 610),
('supporting', 610),
('ago', 609),
('themselves', 608),
('reality', 605),
('hilarious', 605),
('jack', 604),
('told', 603),
('hand', 601),
('quality', 600),
('moving', 600),
('dialogue', 600),
('song', 599),
('happy', 599),
('matter', 598),
('paul', 598),
('light', 594),
('future', 593),
('entire', 592),
('finds', 591),
('gave', 589),
('laugh', 587),
('released', 586),
('expect', 584),
('fight', 581),
('particularly', 580),
('cinematography', 579),
('police', 579),
('whose', 578),
('type', 578),
('sound', 578),
('view', 573),
('enjoyable', 573),
('number', 572),
('romantic', 572),
('husband', 572),
('daughter', 572),
('documentary', 571),
('self', 570),
('superb', 569),
('modern', 569),
('took', 569),
('robert', 569),
('mean', 566),
('shown', 563),
('coming', 561),
('important', 560),
('king', 559),
('leave', 559),
('change', 558),
('somewhat', 555),
('wanted', 555),
('tells', 554),
('events', 552),
('run', 552),
('career', 552),
('country', 552),
('heard', 550),
('season', 550),
('greatest', 549),
('girls', 549),
('etc', 547),
('care', 546),
('starts', 545),
('english', 542),
('killer', 541),
('tale', 540),
('guys', 540),
('totally', 540),
('animation', 540),
('usual', 539),
('miss', 535),
('opinion', 535),
('easy', 531),
('violence', 531),
('songs', 530),
('british', 528),
('says', 526),
('realistic', 525),
('writing', 524),
('writer', 522),
('act', 522),
('comic', 521),
('thriller', 519),
('television', 517),
('power', 516),
('ones', 515),
('kid', 514),
('york', 513),
('novel', 513),
('alone', 512),
('problem', 512),
('attention', 509),
('involved', 508),
('kill', 507),
('extremely', 507),
('seemed', 506),
('hero', 505),
('french', 505),
('rock', 504),
('stuff', 501),
('wish', 499),
('begins', 498),
('taken', 497),
('ways', 496),
('richard', 495),
('knows', 494),
('atmosphere', 493),
('similar', 491),
('surprised', 491),
('taking', 491),
('car', 491),
('george', 490),
('perfectly', 490),
('across', 489),
('team', 489),
('eye', 489),
('sequence', 489),
('room', 488),
('due', 488),
('among', 488),
('serious', 488),
('powerful', 488),
('strange', 487),
('order', 487),
('cannot', 487),
('b', 487),
('beauty', 486),
('famous', 485),
('happened', 484),
('tries', 484),
('herself', 484),
('myself', 484),
('class', 483),
('four', 482),
('cool', 481),
('release', 479),
('anyway', 479),
('theme', 479),
('opening', 478),
('entertainment', 477),
('slow', 475),
('ends', 475),
('unique', 475),
('exactly', 475),
('easily', 474),
('level', 474),
('o', 474),
('red', 474),
('interest', 472),
('happen', 471),
('crime', 470),
('viewing', 468),
('sets', 467),
('memorable', 467),
('stop', 466),
('group', 466),
('problems', 463),
('dance', 463),
('working', 463),
('sister', 463),
('message', 463),
('knew', 462),
('mystery', 461),
('nature', 461),
('bring', 460),
('believable', 459),
('thinking', 459),
('brought', 459),
('mostly', 458),
('disney', 457),
('couldn', 457),
('society', 456),
('within', 455),
('blood', 454),
('parents', 453),
('upon', 453),
('viewers', 453),
('meets', 452),
('form', 452),
('peter', 452),
('tom', 452),
('usually', 452),
('soundtrack', 452),
('local', 450),
('certain', 448),
('follow', 448),
('whether', 447),
('possible', 446),
('emotional', 445),
('killed', 444),
('above', 444),
('de', 444),
('god', 443),
('middle', 443),
('needs', 442),
('happens', 442),
('flick', 442),
('masterpiece', 441),
('period', 440),
('major', 440),
('named', 439),
('haven', 439),
('particular', 438),
('th', 438),
('earth', 437),
('feature', 437),
('stand', 436),
('words', 435),
('typical', 435),
('elements', 433),
('obviously', 433),
('romance', 431),
('jane', 430),
('yourself', 427),
('showing', 427),
('brings', 426),
('fantasy', 426),
('guess', 423),
('america', 423),
('unfortunately', 422),
('huge', 422),
('indeed', 421),
('running', 421),
('talent', 420),
('stage', 419),
('started', 418),
('sweet', 417),
('japanese', 417),
('poor', 416),
('deal', 416),
('incredible', 413),
('personal', 413),
('fast', 412),
('became', 410),
('deep', 410),
('hours', 409),
('giving', 408),
('nearly', 408),
('dream', 408),
('clearly', 407),
('turned', 407),
('obvious', 406),
('near', 406),
('cut', 405),
('surprise', 405),
('era', 404),
('body', 404),
('hour', 403),
('female', 403),
('five', 403),
('note', 399),
('learn', 398),
('truth', 398),
('except', 397),
('feels', 397),
('match', 397),
('tony', 397),
('filmed', 394),
('clear', 394),
('complete', 394),
('street', 393),
('eventually', 393),
('keeps', 393),
('older', 393),
('lots', 393),
('william', 391),
('stewart', 391),
('fall', 390),
('joe', 390),
('meet', 390),
('unlike', 389),
('talking', 389),
('shots', 389),
('rating', 389),
('difficult', 389),
('dramatic', 388),
('means', 388),
('situation', 386),
('wonder', 386),
('present', 386),
('appears', 386),
('subject', 386),
('general', 383),
('sequences', 383),
('lee', 383),
('points', 382),
('earlier', 382),
('gone', 379),
('check', 379),
('suspense', 378),
('recommended', 378),
('ten', 378),
('third', 377),
('talk', 375),
('leaves', 375),
('beyond', 375),
('portrayal', 374),
('beautifully', 373),
('single', 372),
('bill', 372),
('plenty', 371),
('word', 371),
('whom', 370),
('falls', 370),
('scary', 369),
('non', 369),
('figure', 369),
('battle', 369),
('using', 368),
('return', 368),
('doubt', 367),
('hear', 366),
('solid', 366),
('success', 366),
('jokes', 365),
('oh', 365),
('touching', 365),
('political', 365),
('hell', 364),
('awesome', 364),
('boys', 364),
('sexual', 362),
('recently', 362),
('dog', 362),
('wouldn', 361),
('straight', 361),
('features', 361),
('forget', 360),
('setting', 360),
('lack', 360),
('married', 359),
('mark', 359),
('social', 357),
('interested', 356),
('actual', 355),
('terrific', 355),
('sees', 355),
('brothers', 355),
('move', 354),
('call', 354),
('various', 353),
('theater', 353),
('dr', 353),
('animated', 352),
('western', 351),
('baby', 350),
('space', 350),
('disappointed', 348),
('portrayed', 346),
('aren', 346),
('screenplay', 345),
('smith', 345),
('towards', 344),
('hate', 344),
('noir', 343),
('outstanding', 342),
('decent', 342),
('kelly', 342),
('directors', 341),
('journey', 341),
('none', 340),
('looked', 340),
('effective', 340),
('storyline', 339),
('caught', 339),
('sci', 339),
('fi', 339),
('cold', 339),
('mary', 339),
('rich', 338),
('charming', 338),
('popular', 337),
('rare', 337),
('manages', 337),
('harry', 337),
('spirit', 336),
('appreciate', 335),
('open', 335),
('moves', 334),
('basically', 334),
('acted', 334),
('inside', 333),
('boring', 333),
('century', 333),
('mention', 333),
('deserves', 333),
('subtle', 333),
('pace', 333),
('familiar', 332),
('background', 332),
('ben', 331),
('creepy', 330),
('supposed', 330),
('secret', 329),
('die', 328),
('jim', 328),
('question', 327),
('effect', 327),
('natural', 327),
('impressive', 326),
('rate', 326),
('language', 326),
('saying', 325),
('intelligent', 325),
('telling', 324),
('realize', 324),
('material', 324),
('scott', 324),
('singing', 323),
('dancing', 322),
('visual', 321),
('imagine', 321),
('kept', 320),
('office', 320),
('uses', 319),
('pure', 318),
('wait', 318),
('stunning', 318),
('review', 317),
('previous', 317),
('copy', 317),
('seriously', 317),
('create', 316),
('hot', 316),
('created', 316),
('magic', 316),
('somehow', 316),
('stay', 315),
('attempt', 315),
('escape', 315),
('crazy', 315),
('air', 315),
('frank', 315),
('hands', 314),
('filled', 313),
('expected', 312),
('average', 312),
('surprisingly', 312),
('complex', 311),
('quickly', 310),
('successful', 310),
('studio', 310),
('plus', 309),
('male', 309),
('co', 307),
('images', 306),
('casting', 306),
('following', 306),
('minute', 306),
('exciting', 306),
('members', 305),
('follows', 305),
('themes', 305),
('german', 305),
('reasons', 305),
('e', 305),
('touch', 304),
('edge', 304),
('free', 304),
('cute', 304),
('genius', 304),
('outside', 303),
('reviews', 302),
('ok', 302),
('younger', 302),
('fighting', 301),
('odd', 301),
('master', 301),
('recent', 300),
('thanks', 300),
('break', 300),
('comment', 300),
('apart', 299),
('emotions', 298),
('lovely', 298),
('begin', 298),
('doctor', 297),
('party', 297),
('italian', 297),
('la', 296),
('missed', 296),
...]

``````
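
The most common "word" in the counts above is the empty string `''`. That follows from splitting on a single space: the reviews contain runs of two or more spaces (visible in `reviews[0]`), and `split(" ")` returns an empty string between each pair of adjacent spaces. The short sketch below uses a fragment of `reviews[0]` to show the difference from `split()` with no argument. It is an illustration only, not a change to the notebook's counting code.

```python
from collections import Counter

# A fragment of reviews[0]; note the double spaces.
text = "bromwell high  s satire is much closer to reality than is  teachers  ."

tokens_single_space = text.split(" ")  # what the counting loop does
tokens_whitespace = text.split()       # splits on runs of whitespace instead

counts = Counter(tokens_single_space)
print(counts[""])               # number of empty-string tokens
print("" in tokens_whitespace)  # False: no empty tokens with split()
```

Because the empty token shows up in positive and negative reviews alike, it carries little signal either way, which is the kind of noise the later projects set out to reduce.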
``````

In [10]:

pos_neg_ratios = Counter()

for term,cnt in list(total_counts.most_common()):
    if(cnt > 100):
        pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)
        pos_neg_ratios[term] = pos_neg_ratio

for word,ratio in pos_neg_ratios.most_common():
    if(ratio > 1):
        pos_neg_ratios[word] = np.log(ratio)
    else:
        pos_neg_ratios[word] = -np.log((1 / (ratio+0.01)))

``````
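
To see what the cell above does to a single word, it can help to run the same arithmetic on a few hand-picked counts. The sketch below mirrors the two steps (ratio, then log) in one hypothetical function, `log_ratio`, and uses `math.log` in place of `np.log` so it runs without NumPy. The counts are made up for illustration.

```python
import math

def log_ratio(pos_count, neg_count):
    """Mirror the notebook's transform for one word (hypothetical helper)."""
    # The +1 keeps the denominator non-zero when a word never appears
    # in a negative review.
    ratio = pos_count / float(neg_count + 1)
    if ratio > 1:
        return math.log(ratio)
    # The +0.01 presumably keeps the log finite when ratio is 0.
    return -math.log(1 / (ratio + 0.01))

# Made-up counts: (positive occurrences, negative occurrences)
mostly_positive = log_ratio(200, 99)   # ratio 2.0 -> about +0.69
mostly_negative = log_ratio(100, 199)  # ratio 0.5 -> about -0.67
balanced = log_ratio(100, 99)          # ratio 1.0 -> about +0.01
never_positive = log_ratio(0, 100)     # ratio 0.0 -> about -4.61

print(mostly_positive, mostly_negative, balanced, never_positive)
```

The point of the log is symmetry: a word twice as common in positive reviews and a word twice as common in negative reviews land at roughly equal distances from zero, with opposite signs. Read this way, a score near 1.386 (ln 4) means a word appeared about four times as often in positive reviews as in negative ones.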
``````

In [11]:

# words most frequently seen in a review with a "POSITIVE" label
pos_neg_ratios.most_common()

``````
``````

Out[11]:

[('edie', 4.6913478822291435),
('paulie', 4.0775374439057197),
('felix', 3.1527360223636558),
('polanski', 2.8233610476132043),
('matthau', 2.8067217286092401),
('victoria', 2.6810215287142909),
('mildred', 2.6026896854443837),
('gandhi', 2.5389738710582761),
('flawless', 2.451005098112319),
('superbly', 2.2600254785752498),
('perfection', 2.1594842493533721),
('astaire', 2.1400661634962708),
('captures', 2.0386195471595809),
('voight', 2.0301704926730531),
('wonderfully', 2.0218960560332353),
('powell', 1.9783454248084671),
('brosnan', 1.9547990964725592),
('lily', 1.9203768470501485),
('bakshi', 1.9029851043382795),
('lincoln', 1.9014583864844796),
('refreshing', 1.8551812956655511),
('breathtaking', 1.8481124057791867),
('bourne', 1.8478489358790986),
('lemmon', 1.8458266904983307),
('delightful', 1.8002701588959635),
('flynn', 1.7996646487351682),
('andrews', 1.7764919970972666),
('homer', 1.7692866133759964),
('beautifully', 1.7626953362841438),
('soccer', 1.7578579175523736),
('elvira', 1.7397031072720019),
('underrated', 1.7197859696029656),
('gripping', 1.7165360479904674),
('superb', 1.7091514458966952),
('delight', 1.6714733033535532),
('welles', 1.6677068205580761),
('sinatra', 1.6389967146756448),
('touching', 1.637217476541176),
('timeless', 1.62924053973028),
('macy', 1.6211339521972916),
('unforgettable', 1.6177367152487956),
('favorites', 1.6158688027643908),
('stewart', 1.6119987332957739),
('sullivan', 1.6094379124341003),
('extraordinary', 1.6094379124341003),
('hartley', 1.6094379124341003),
('brilliantly', 1.5950491749820008),
('friendship', 1.5677652160335325),
('wonderful', 1.5645425925262093),
('palma', 1.5553706911638245),
('magnificent', 1.54663701119507),
('finest', 1.5462590108125689),
('jackie', 1.5439233053234738),
('ritter', 1.5404450409471491),
('tremendous', 1.5184661342283736),
('freedom', 1.5091151908062312),
('fantastic', 1.5048433868558566),
('terrific', 1.5026699370083942),
('noir', 1.493925025312256),
('sidney', 1.493925025312256),
('outstanding', 1.4910053152089213),
('pleasantly', 1.4894785973551214),
('mann', 1.4894785973551214),
('nancy', 1.488077055429833),
('marie', 1.4825711915553104),
('marvelous', 1.4739999415389962),
('excellent', 1.4647538505723599),
('ruth', 1.4596256342054401),
('stanwyck', 1.4412101187160054),
('widmark', 1.4350845252893227),
('splendid', 1.4271163556401458),
('chan', 1.423108334242607),
('exceptional', 1.4201959127955721),
('tender', 1.410986973710262),
('gentle', 1.4078005663408544),
('poignant', 1.4022947024663317),
('gem', 1.3932148039644643),
('amazing', 1.3919815802404802),
('chilling', 1.3862943611198906),
('fisher', 1.3862943611198906),
('davies', 1.3862943611198906),
('captivating', 1.3862943611198906),
('darker', 1.3652409519220583),
('april', 1.3499267169490159),
('kelly', 1.3461743673304654),
('blake', 1.3418425985490567),
('overlooked', 1.329135947279942),
('ralph', 1.32818673031261),
('bette', 1.3156767939059373),
('hoffman', 1.3150668518315229),
('cole', 1.3121863889661687),
('shines', 1.3049487216659381),
('powerful', 1.2999662776313934),
('notch', 1.2950456896547455),
('remarkable', 1.2883688239495823),
('pitt', 1.286210902562908),
('winters', 1.2833463918674481),
('vivid', 1.2762934659055623),
('gritty', 1.2757524867200667),
('giallo', 1.2745029551317739),
('portrait', 1.2704625455947689),
('innocence', 1.2694300209805796),
('psychiatrist', 1.2685113254635072),
('favorite', 1.2668956297860055),
('ensemble', 1.2656663733312759),
('stunning', 1.2622417124499117),
('burns', 1.259880436264232),
('garbo', 1.258954938743289),
('barbara', 1.2580400255962119),
('philip', 1.2527629684953681),
('panic', 1.2527629684953681),
('holly', 1.2527629684953681),
('carol', 1.2481440226390734),
('perfect', 1.246742480713785),
('appreciated', 1.2462482874741743),
('favourite', 1.2411123512753928),
('journey', 1.2367626271489269),
('rural', 1.235471471385307),
('bond', 1.2321436812926323),
('builds', 1.2305398317106577),
('brilliant', 1.2287554137664785),
('brooklyn', 1.2286654169163074),
('von', 1.225175011976539),
('recommended', 1.2163953243244932),
('unfolds', 1.2163953243244932),
('daniel', 1.20215296760895),
('perfectly', 1.1971931173405572),
('crafted', 1.1962507582320256),
('prince', 1.1939224684724346),
('troubled', 1.192138346678933),
('consequences', 1.1865810616140668),
('haunting', 1.1814999484738773),
('cinderella', 1.180052620608284),
('alexander', 1.1759989522835299),
('emotions', 1.1753049094563641),
('boxing', 1.1735135968412274),
('subtle', 1.1734135017508081),
('curtis', 1.1649873576129823),
('rare', 1.1566438362402944),
('loved', 1.1563661500586044),
('daughters', 1.1526795099383853),
('courage', 1.1438688802562305),
('dentist', 1.1426722784621401),
('highly', 1.1420208631618658),
('nominated', 1.1409146683587992),
('tony', 1.1397491942285991),
('draws', 1.1325138403437911),
('everyday', 1.1306150197542835),
('contrast', 1.1284652518177909),
('cried', 1.1213405397456659),
('fabulous', 1.1210851445201684),
('ned', 1.120591195386885),
('fay', 1.120591195386885),
('emma', 1.1184149159642893),
('sensitive', 1.113318436057805),
('smooth', 1.1089750757036563),
('dramas', 1.1080910326226534),
('today', 1.1050431789984001),
('helps', 1.1023091505494358),
('inspiring', 1.0986122886681098),
('jimmy', 1.0937696641923216),
('awesome', 1.0931328229034842),
('unique', 1.0881409888008142),
('tragic', 1.0871835928444868),
('intense', 1.0870514662670339),
('stellar', 1.0857088838322018),
('rival', 1.0822184788924332),
('provides', 1.0797081340289569),
('depression', 1.0782034170369026),
('shy', 1.0775588794702773),
('carrie', 1.076139432816051),
('blend', 1.0753554265038423),
('hank', 1.0736109864626924),
('diana', 1.0726368022648489),
('unexpected', 1.0722255334949147),
('achievement', 1.0668635903535293),
('bettie', 1.0663514264498881),
('happiness', 1.0632729222228008),
('glorious', 1.0608719606852626),
('davis', 1.0541605260972757),
('terrifying', 1.0525211814678428),
('beauty', 1.050410186850232),
('ideal', 1.0479685558493548),
('fears', 1.0467872208035236),
('hong', 1.0438040521731147),
('seasons', 1.0433496099930604),
('fascinating', 1.0414538748281612),
('carries', 1.0345904299031787),
('satisfying', 1.0321225473992768),
('definite', 1.0319209141694374),
('touched', 1.0296194171811581),
('greatest', 1.0248947127715422),
('creates', 1.0241097613701886),
('aunt', 1.023388867430522),
('walter', 1.022328983918479),
('spectacular', 1.0198314108149955),
('portrayal', 1.0189810189761024),
('ann', 1.0127808528183286),
('enterprise', 1.0116009116784799),
('musicals', 1.0096648026516135),
('deeply', 1.0094845087721023),
('incredible', 1.0061677561461084),
('mature', 1.0060195018402847),
('triumph', 0.99682959435816731),
('margaret', 0.99682959435816731),
('navy', 0.99493385919326827),
('harry', 0.99176919305006062),
('lucas', 0.990398704027877),
('sweet', 0.98966110487955483),
('joey', 0.98794672078059009),
('oscar', 0.98721905111049713),
('balance', 0.98649499054740353),
('warm', 0.98485340331145166),
('ages', 0.98449898190068863),
('guilt', 0.98082925301172619),
('glover', 0.98082925301172619),
('carrey', 0.98082925301172619),
('learns', 0.97881108885548895),
('unusual', 0.97788374278196932),
('sons', 0.97777581552483595),
('complex', 0.97761897738147796),
('essence', 0.97753435711487369),
('brazil', 0.9769153536905899),
('widow', 0.97650959186720987),
('solid', 0.97537964824416146),
('beautiful', 0.97326301262841053),
('holmes', 0.97246100334120955),
('awe', 0.97186058302896583),
('vhs', 0.97116734209998934),
('eerie', 0.97116734209998934),
('lonely', 0.96873720724669754),
('grim', 0.96873720724669754),
('sport', 0.96825047080486615),
('debut', 0.96508089604358704),
('destiny', 0.96343751029985703),
('thrillers', 0.96281074750904794),
('tears', 0.95977584381389391),
('rose', 0.95664202739772253),
('feelings', 0.95551144502743635),
('ginger', 0.95551144502743635),
('winning', 0.95471810900804055),
('stanley', 0.95387344302319799),
('cox', 0.95343027882361187),
('paris', 0.95278479030472663),
('heart', 0.95238806924516806),
('hooked', 0.95155887071161305),
('comfortable', 0.94803943018873538),
('mgm', 0.94446160884085151),
('masterpiece', 0.94155039863339296),
('themes', 0.94118828349588235),
('danny', 0.93967118051821874),
('anime', 0.93378388932167222),
('perry', 0.93328830824272613),
('joy', 0.93301752567946861),
('lovable', 0.93081883243706487),
('mysteries', 0.92953595862417571),
('hal', 0.92953595862417571),
('louis', 0.92871325187271225),
('charming', 0.92520609553210742),
('urban', 0.92367083917177761),
('allows', 0.92183091224977043),
('impact', 0.91815814604895041),
('italy', 0.91629073187415511),
('lifestyle', 0.91629073187415511),
('spy', 0.91289514287301687),
('treat', 0.91193342650519937),
('subsequent', 0.91056005716517008),
('kennedy', 0.90981821736853763),
('loving', 0.90967549275543591),
('surprising', 0.90937028902958128),
('quiet', 0.90648673177753425),
('winter', 0.90624039602065365),
('reveals', 0.90490540964902977),
('raw', 0.90445627422715225),
('funniest', 0.90078654533818991),
('norman', 0.89994159387262562),
('thief', 0.89874642222324552),
('season', 0.89827222637147675),
('secrets', 0.89794159320595857),
('colorful', 0.89705936994626756),
('highest', 0.8967461358011849),
('compelling', 0.89462923509297576),
('danes', 0.89248008318043659),
('castle', 0.88967708335606499),
('kudos', 0.88889175768604067),
('great', 0.88810470901464589),
('baseball', 0.88730319500090271),
('subtitles', 0.88730319500090271),
('bleak', 0.88730319500090271),
('winner', 0.88643776872447388),
('tragedy', 0.88563699078315261),
('todd', 0.88551907320740142),
('nicely', 0.87924946019380601),
('arthur', 0.87546873735389985),
('essential', 0.87373111745535925),
('gorgeous', 0.8731725250935497),
('fonda', 0.87294029100054127),
('eastwood', 0.87139541196626402),
('focuses', 0.87082835779739776),
('enjoyed', 0.87070195951624607),
('natural', 0.86997924506912838),
('intensity', 0.86835126958503595),
('witty', 0.86824103423244681),
('rob', 0.8642954367557748),
('worlds', 0.86377269759070874),
('health', 0.86113891179907498),
('magical', 0.85953791528170564),
('deeper', 0.85802182375017932),
('lucy', 0.85618680780444956),
('moving', 0.85566611005772031),
('lovely', 0.85290640004681306),
('purple', 0.8513711857748395),
('memorable', 0.84801189112086062),
('sings', 0.84729786038720367),
('craig', 0.84342938360928321),
('modesty', 0.84342938360928321),
('relate', 0.84326559685926517),
('episodes', 0.84223712084137292),
('strong', 0.84167135777060931),
('smith', 0.83959811108590054),
('tear', 0.83704136022001441),
('apartment', 0.83333115290549531),
('princess', 0.83290912293510388),
('disagree', 0.83290912293510388),
('kung', 0.83173334384609199),
('columbo', 0.82667857318446791),
('jake', 0.82667857318446791),
('hart', 0.82472353834866463),
('strength', 0.82417544296634937),
('realizes', 0.82360006895738058),
('dave', 0.8232003088081431),
('childhood', 0.82208086393583857),
('forbidden', 0.81989888619908913),
('tight', 0.81883539572344199),
('surreal', 0.8178506590609026),
('manager', 0.81770990320170756),
('dancer', 0.81574950265227764),
('studios', 0.81093021621632877),
('con', 0.81093021621632877),
('miike', 0.80821651034473263),
('realistic', 0.80807714723392232),
('explicit', 0.80792269515237358),
('kurt', 0.8060875917405409),
('deals', 0.80535917116687328),
('holds', 0.80493858654806194),
('carl', 0.80437281567016972),
('touches', 0.80396154690023547),
('gene', 0.80314807577427383),
('albert', 0.8027669055771679),
('abc', 0.80234647252493729),
('cry', 0.80011930011211307),
('sides', 0.7995275841185171),
('develops', 0.79850769621777162),
('eyre', 0.79850769621777162),
('dances', 0.79694397424158891),
('oscars', 0.79633141679517616),
('legendary', 0.79600456599965308),
('hearted', 0.79492987486988764),
('importance', 0.79492987486988764),
('portraying', 0.79356592830699269),
('impressed', 0.79258107754813223),
('waters', 0.79112758892014912),
('empire', 0.79078565012386137),
('edge', 0.789774016249017),
('jean', 0.78845736036427028),
('environment', 0.78845736036427028),
('sentimental', 0.7864791203521645),
('captured', 0.78623760362595729),
('styles', 0.78592891401091158),
('daring', 0.78592891401091158),
('frank', 0.78275933924963248),
('tense', 0.78275933924963248),
('backgrounds', 0.78275933924963248),
('matches', 0.78275933924963248),
('gothic', 0.78209466657644144),
('sharp', 0.7814397877056235),
('achieved', 0.78015855754957497),
('court', 0.77947526404844247),
('steals', 0.7789140023173704),
('rules', 0.77844476107184035),
('colors', 0.77684619943659217),
('reunion', 0.77318988823348167),
('covers', 0.77139937745969345),
('tale', 0.77010822169607374),
('rain', 0.7683706017975328),
('denzel', 0.76804848873306297),
('stays', 0.76787072675588186),
('blob', 0.76725515271366718),
('maria', 0.76214005204689672),
('conventional', 0.76214005204689672),
('fresh', 0.76158434211317383),
('midnight', 0.76096977689870637),
('landscape', 0.75852993982279704),
('animated', 0.75768570169751648),
('titanic', 0.75666058628227129),
('sunday', 0.75666058628227129),
('spring', 0.7537718023763802),
('cagney', 0.7537718023763802),
('enjoyable', 0.75246375771636476),
('immensely', 0.75198768058287868),
('sir', 0.7507762933965817),
('nevertheless', 0.75067102469813185),
('driven', 0.74994477895307854),
('performances', 0.74883252516063137),
('memories', 0.74721440183022114),
('simple', 0.74641420974143258),
('golden', 0.74533293373051557),
('leslie', 0.74533293373051557),
('lovers', 0.74497224842453125),
('relationship', 0.74484232345601786),
('supporting', 0.74357803418683721),
('che', 0.74262723782331497),
('packed', 0.7410032017375805),
('trek', 0.74021469141793106),
('provoking', 0.73840377214806618),
('strikes', 0.73759894313077912),
('depiction', 0.73682224406260699),
('emotional', 0.73678211645681524),
('secretary', 0.7366322924996842),
('influenced', 0.73511137965897755),
('florida', 0.73511137965897755),
('germany', 0.73288750920945944),
('brings', 0.73142936713096229),
('lewis', 0.73129894652432159),
('elderly', 0.73088750854279239),
('owner', 0.72743625403857748),
('streets', 0.72666987259858895),
('henry', 0.72642196944481741),
('portrays', 0.72593700338293632),
('bears', 0.7252354951114458),
('china', 0.72489587887452556),
('anger', 0.72439972406404984),
('society', 0.72433010799663333),
('available', 0.72415741730250549),
('best', 0.72347034060446314),
('bugs', 0.72270598280148979),
('magic', 0.71878961117328299),
('delivers', 0.71846498854423513),
('verhoeven', 0.71846498854423513),
('jim', 0.71783979315031676),
('donald', 0.71667767797013937),
('endearing', 0.71465338578090898),
('relationships', 0.71393795022901896),
('greatly', 0.71256526641704687),
('charlie', 0.71024161391924534),
('simon', 0.70967648251115578),
('effectively', 0.70914752190638641),
('march', 0.70774597998109789),
('atmosphere', 0.70744773070214162),
('influence', 0.70733181555190172),
('genius', 0.706392407309966),
('emotionally', 0.70556970055850243),
('ken', 0.70526854109229009),
('identity', 0.70484322032313651),
('sophisticated', 0.70470800296102132),
('dan', 0.70457587638356811),
('andrew', 0.70329955202396321),
('india', 0.70144598337464037),
('roy', 0.69970458110610434),
('surprisingly', 0.6995780708902356),
('sky', 0.69780919366575667),
('romantic', 0.69664981111114743),
('match', 0.69566924999265523),
('meets', 0.69314718055994529),
('cowboy', 0.69314718055994529),
('wave', 0.69314718055994529),
('bitter', 0.69314718055994529),
('patient', 0.69314718055994529),
('stylish', 0.69314718055994529),
('britain', 0.69314718055994529),
('affected', 0.69314718055994529),
('beatty', 0.69314718055994529),
('love', 0.69198533541937324),
('paul', 0.68980827929443067),
('andy', 0.68846333124751902),
('performance', 0.68797386327972465),
('patrick', 0.68645819240914863),
('unlike', 0.68546468438792907),
('brooks', 0.68433655087779044),
('refuses', 0.68348526964820844),
('award', 0.6824518914431974),
('complaint', 0.6824518914431974),
('ride', 0.68229716453587952),
('dawson', 0.68171848473632257),
('luke', 0.68158635815886937),
('wells', 0.68087708796813096),
('france', 0.6804081547825156),
('sports', 0.68007509899259255),
('handsome', 0.68007509899259255),
('directs', 0.67875844310784572),
('rebel', 0.67875844310784572),
('greater', 0.67605274720064523),
('dreams', 0.67599410133369586),
('effective', 0.67565402311242806),
('interpretation', 0.67479804189174875),
('works', 0.67445504754779284),
('brando', 0.67445504754779284),
('noble', 0.6737290947028437),
('paced', 0.67314651385327573),
('le', 0.67067432470788668),
('master', 0.67015766233524654),
('h', 0.6696166831497512),
('rings', 0.66904962898088483),
('easy', 0.66895995494594152),
('city', 0.66820823221269321),
('sunshine', 0.66782937257565544),
('succeeds', 0.66647893347778397),
('relations', 0.664159643686693),
('england', 0.66387679825983203),
('glimpse', 0.66329421741026418),
('aired', 0.66268797307523675),
('sees', 0.66263163663399482),
('both', 0.66248336767382998),
('definitely', 0.66199789483898808),
('imaginative', 0.66139848224536502),
('appreciate', 0.66083893732728749),
('tricks', 0.66071190480679143),
('striking', 0.66071190480679143),
('carefully', 0.65999497324304479),
('complicated', 0.65981076029235353),
('perspective', 0.65962448852130173),
('trilogy', 0.65877953705573755),
('future', 0.65834665141052828),
('lion', 0.65742909795786608),
('douglas', 0.65540685257709819),
('victor', 0.65540685257709819),
('inspired', 0.65459851044271034),
('marriage', 0.65392646740666405),
('demands', 0.65392646740666405),
('father', 0.65172321672194655),
('page', 0.65123628494430852),
('instant', 0.65058756614114943),
('era', 0.6495567444850836),
('ruthless', 0.64934455790155243),
('saga', 0.64934455790155243),
('joan', 0.64891392558311978),
('joseph', 0.64841128671855386),
('workers', 0.64829661439459352),
('fantasy', 0.64726757480925168),
('distant', 0.64551913157069074),
('accomplished', 0.64551913157069074),
('manhattan', 0.64435701639051324),
('personal', 0.64355023942057321),
('meeting', 0.64313675998528386),
('individual', 0.64313675998528386),
('pushing', 0.64313675998528386),
('pleasant', 0.64250344774119039),
('brave', 0.64185388617239469),
('william', 0.64083139119578469),
('hudson', 0.64077919504262937),
('friendly', 0.63949446706762514),
('eccentric', 0.63907995928966954),
('awards', 0.63875310849414646),
('jack', 0.63838309514997038),
('seeking', 0.63808740337691783),
('divorce', 0.63757732940513456),
('colonel', 0.63757732940513456),
('jane', 0.63443957973316734),
('keeping', 0.63414883979798953),
('gives', 0.63383568159497883),
('ted', 0.63342794585832296),
('animation', 0.63208692379869902),
('progress', 0.6317782341836532),
('larger', 0.63127177684185776),
('concert', 0.63127177684185776),
('nation', 0.6296337748376194),
('albeit', 0.62739580299716491),
('discovers', 0.62542900650499444),
('classic', 0.62504956428050518),
('segment', 0.62335141862440335),
('morgan', 0.62303761437291871),
('mouse', 0.62294292188669675),
('impressive', 0.62211140744319349),
('artist', 0.62168821657780038),
('ultimate', 0.62168821657780038),
('griffith', 0.62117368093485603),
('drew', 0.62082651898031915),
('emily', 0.62082651898031915),
('moved', 0.6197197120051281),
('families', 0.61903920840622351),
('profound', 0.61903920840622351),
('innocent', 0.61851219917136446),
('versions', 0.61730910416844087),
('eddie', 0.61691981517206107),
('criticism', 0.61651395453902935),
('nature', 0.61594514653194088),
('recognized', 0.61518563909023349),
('sexuality', 0.61467556511845012),
('contract', 0.61400986000122149),
('brian', 0.61344043794920278),
('remembered', 0.6131044728864089),
('determined', 0.6123858239154869),
('offers', 0.61207935747116349),
('pleasure', 0.61195702582993206),
('washington', 0.61180154110599294),
('images', 0.61159731359583758),
('games', 0.61067095873570676),
('fashioned', 0.60798937221963845),
('melodrama', 0.60749173598145145),
('rough', 0.60613580357031549),
('charismatic', 0.60613580357031549),
('peoples', 0.60613580357031549),
('dealing', 0.60517840761398811),
('fine', 0.60496962268013299),
('tap', 0.60391604683200273),
('trio', 0.60157998703445481),
('russell', 0.60120968523425966),
('figures', 0.60077386042893011),
('ward', 0.60005675749393339),
('shine', 0.59911823091166894),
('job', 0.59845562125168661),
('satisfied', 0.59652034487087369),
('river', 0.59637962862495086),
('brown', 0.595773016534769),
('believable', 0.59566072133302495),
('always', 0.59470710774669278),
('bound', 0.59470710774669278),
('hall', 0.5933967777928858),
('cook', 0.5916777203950857),
('claire', 0.59136448625000293),
('anna', 0.58778666490211906),
('peace', 0.58628403501758408),
('visually', 0.58539431926349916),
('morality', 0.58525821854876026),
('falk', 0.58525821854876026),
('growing', 0.58466653756587539),
('experiences', 0.58314628534561685),
('stood', 0.58314628534561685),
('touch', 0.58122926435596001),
('lives', 0.5810976767513224),
('kubrick', 0.58066919713325493),
('timing', 0.58047401805583243),
('expressions', 0.57981849525294216),
('struggles', 0.57981849525294216),
('authentic', 0.57848427223980559),
('helen', 0.57763429343810091),
('pre', 0.57700753064729182),
('quirky', 0.5753641449035618),
('young', 0.57531672344534313),
('inner', 0.57454143815209846),
('mexico', 0.57443087372056334),
('clint', 0.57380042292737909),
('sisters', 0.57286101468544337),
('realism', 0.57226528899949558),
('french', 0.5720692490067093),
('personalities', 0.5720692490067093),
('surprises', 0.57113222999698177),
('overcome', 0.5697681593994407),
('timothy', 0.56953322459276867),
('tales', 0.56909453188996639),
('war', 0.56843317302781682),
('civil', 0.5679840376059393),
('countries', 0.56737779327091187),
('streep', 0.56710645966458029),
('oliver', 0.56673325570428668),
('australia', 0.56580775818334383),
('understanding', 0.56531380905006046),
('players', 0.56509525370004821),
('knowing', 0.56489284503626647),
('rogers', 0.56421349718405212),
('suspenseful', 0.56368911332305849),
('variety', 0.56368911332305849),
('true', 0.56281525180810066),
('jr', 0.56220982311246936),
('psychological', 0.56108745854687891),
('sent', 0.55961578793542266),
('grand', 0.55961578793542266),
('branagh', 0.55961578793542266),
('reminiscent', 0.55961578793542266),
('performing', 0.55961578793542266),
('wealth', 0.55961578793542266),
('overwhelming', 0.55961578793542266),
('odds', 0.55961578793542266),
('brothers', 0.55891181043362848),
('howard', 0.55811089675600245),
('david', 0.55693122256475369),
('generation', 0.55628799784274796),
('grow', 0.55612538299565417),
('survival', 0.55594605904646033),
('mainstream', 0.55574731115750231),
('dick', 0.55431073570572953),
('charm', 0.55288175575407861),
('kirk', 0.55278982286502287),
('twists', 0.55244729845681018),
('gangster', 0.55206858230003986),
('jeff', 0.55179306225421365),
('family', 0.55116244510065526),
('tend', 0.55053307336110335),
('thanks', 0.55049088015842218),
('world', 0.54744234723432639),
('sutherland', 0.54743536937855164),
('life', 0.54695514434959924),
('disc', 0.54654370636806993),
('bug', 0.54654370636806993),
('tribute', 0.5455111817538808),
('europe', 0.54522705048332309),
('sacrifice', 0.54430155296238014),
('color', 0.54405127139431109),
('superior', 0.54333490233128523),
('york', 0.54318235866536513),
('pulls', 0.54266622962164945),
('jackson', 0.54232429082536171),
('hearts', 0.54232429082536171),
('enjoy', 0.54124285135906114),
('redemption', 0.54056759296472823),
('stands', 0.5389965007326869),
('trial', 0.5389965007326869),
('greek', 0.5389965007326869),
('hamilton', 0.5389965007326869),
('each', 0.5388212312554177),
('faithful', 0.53773307668591508),
('documentaries', 0.53714293208336406),
('jealous', 0.53714293208336406),
('different', 0.53709860682460819),
('describes', 0.53680111016925136),
('shorts', 0.53596159703753288),
('brilliance', 0.53551823635636209),
('mountains', 0.53492317534505118),
('share', 0.53408248593025787),
('dealt', 0.53408248593025787),
('providing', 0.53329847961804933),
('explore', 0.53329847961804933),
('series', 0.5325809226575603),
('fellow', 0.5323318289869543),
('loves', 0.53062825106217038),
('revolution', 0.53062825106217038),
('olivier', 0.53062825106217038),
('roman', 0.53062825106217038),
('century', 0.53002783074992665),
('musical', 0.52966871156747064),
('heroic', 0.52925932545482868),
('approach', 0.52806743020049673),
('ironically', 0.52806743020049673),
('temple', 0.52806743020049673),
('moves', 0.5279372642387119),
('julie', 0.52609309589677911),
('tells', 0.52415107836314001),
('uncle', 0.52354439617376536),
('union', 0.52324814376454787),
('deep', 0.52309571635780505),
('reminds', 0.52157841554225237),
('famous', 0.52118841080153722),
('jazz', 0.52053443789295151),
('dennis', 0.51987545928590861),
('epic', 0.51919387343650736),
('shows', 0.51915322220375304),
('performed', 0.5191244265806858),
('demons', 0.5191244265806858),
('discovered', 0.51879379341516751),
('eric', 0.51879379341516751),
('youth', 0.5185626062681431),
('human', 0.51851411224987087),
('tarzan', 0.51813827061227724),
('ourselves', 0.51794309153485463),
('wwii', 0.51758240622887042),
('passion', 0.5162164724008671),
('desire', 0.51607497965213445),
('pays', 0.51581316527702981),
('dirty', 0.51557622652458857),
('fox', 0.51557622652458857),
('sympathetic', 0.51546600332249293),
('symbolism', 0.51546600332249293),
('attitude', 0.51530993621331933),
('appearances', 0.51466440007315639),
('jeremy', 0.51466440007315639),
('fun', 0.51439068993048687),
('south', 0.51420972175023116),
('arrives', 0.51409894911095988),
('present', 0.51341965894303732),
('com', 0.51326167856387173),
('smile', 0.51265880484765169),
('alan', 0.51082562376599072),
('ring', 0.51082562376599072),
('visit', 0.51082562376599072),
('fits', 0.51082562376599072),
('provided', 0.51082562376599072),
('carter', 0.51082562376599072),
('aging', 0.51082562376599072),
('countryside', 0.51082562376599072),
('begins', 0.51015650363396647),
('success', 0.50900578704900468),
('japan', 0.50900578704900468),
('accurate', 0.50895471583017893),
('proud', 0.50800474742434931),
('daily', 0.5075946031845443),
('karloff', 0.50724780241810674),
('atmospheric', 0.50724780241810674),
('recently', 0.50714914903668207),
('fu', 0.50704490092608467),
('horrors', 0.50656122497953315),
('finding', 0.50637127341661037),
('lust', 0.5059356384717989),
('hitchcock', 0.50574947073413001),
('among', 0.50334004951332734),
('viewing', 0.50302139827440906),
('investigation', 0.50262885656181222),
('shining', 0.50262885656181222),
('duo', 0.5020919437972361),
('cameron', 0.5020919437972361),
('finds', 0.50128303100539795),
('contemporary', 0.50077528791248915),
('genuine', 0.50046283673044401),
('frightening', 0.49995595152908684),
('plays', 0.49975983848890226),
('age', 0.49941323171424595),
('position', 0.49899116611898781),
('continues', 0.49863035067217237),
('roles', 0.49839716550752178),
('james', 0.49837216269470402),
('individuals', 0.49824684155913052),
('brought', 0.49783842823917956),
('hilarious', 0.49714551986191058),
('brutal', 0.49681488669639234),
('appropriate', 0.49643688631389105),
('dance', 0.49581998314812048),
('league', 0.49578774640145024),
('helping', 0.49578774640145024),
('stunts', 0.49561620510246196),
('traveling', 0.49532143723002542),
('thoroughly', 0.49414593456733524),
('depicted', 0.49317068852726992),
('combination', 0.49247648509779424),
('honor', 0.49247648509779424),
('differences', 0.49247648509779424),
('fully', 0.49213349075383811),
('tracy', 0.49159426183810306),
('battles', 0.49140753790888908),
('possibility', 0.49112055268665822),
('romance', 0.4901589869574316),
('initially', 0.49002249613622745),
('happy', 0.4898997500608791),
('crime', 0.48977221456815834),
('singing', 0.4893852925281213),
('especially', 0.48901267837860624),
('shakespeare', 0.48754793889664511),
('hugh', 0.48729512635579658),
('detail', 0.48609484250827351),
('julia', 0.48550781578170082),
('san', 0.48550781578170082),
('guide', 0.48550781578170082),
('desperation', 0.48550781578170082),
('companion', 0.48550781578170082),
('strongly', 0.48460242866688824),
('necessary', 0.48302334245403883),
('humanity', 0.48265474679929443),
('drama', 0.48221998493060503),
('nonetheless', 0.48183808689273838),
('intrigue', 0.48183808689273838),
('warming', 0.48183808689273838),
('cuba', 0.48183808689273838),
('planned', 0.47957308026188628),
('pictures', 0.47929937011921681),
('nine', 0.47803580094299974),
('settings', 0.47743860773325364),
('history', 0.47732966933780852),
('ordinary', 0.47725880012690741),
('official', 0.47608267532211779),
('primary', 0.47608267532211779),
('episode', 0.47529620261150429),
('role', 0.47520268270188676),
('spirit', 0.47477690799839323),
('grey', 0.47409361449726067),
('ways', 0.47323464982718205),
('cup', 0.47260441094579297),
('piano', 0.47260441094579297),
('familiar', 0.47241617565111949),
('sinister', 0.47198579044972683),
('reveal', 0.47171449364936496),
('max', 0.47150852042515579),
('dated', 0.47121648567094482),
('losing', 0.47000362924573563),
('discovery', 0.47000362924573563),
('vicious', 0.47000362924573563),
('genuinely', 0.46871413841586385),
('hatred', 0.46734051182625186),
('mistaken', 0.46702300110759781),
('dream', 0.46608972992459924),
('challenge', 0.46608972992459924),
('crisis', 0.46575733836428446),
('photographed', 0.46488852857896512),
('critics', 0.46430560813109778),
('bird', 0.46430560813109778),
('machines', 0.46430560813109778),
('born', 0.46411383518967209),
('detective', 0.4636633473511525),
('higher', 0.46328467899699055),
('remains', 0.46262352194811296),
('inevitable', 0.46262352194811296),
('soviet', 0.4618180446592961),
('ryan', 0.46134556650262099),
('african', 0.46112595521371813),
('smaller', 0.46081520319132935),
('techniques', 0.46052488529119184),
('information', 0.46034171833399862),
('deserved', 0.45999798712841444),
('lynch', 0.45953232937844013),
('spielberg', 0.45953232937844013),
('cynical', 0.45953232937844013),
('tour', 0.45953232937844013),
('francisco', 0.45953232937844013),
('struggle', 0.45911782160048453),
('language', 0.45902121257712653),
('visual', 0.45823514408822852),
('warner', 0.45724137763188427),
('social', 0.45720078250735313),
('reality', 0.45719346885019546),
('hidden', 0.45675840249571492),
('breaking', 0.45601738727099561),
('sometimes', 0.45563021171182794),
('modern', 0.45500247579345005),
('surfing', 0.45425527227759638),
('popular', 0.45410691533051023),
('surprised', 0.4534409399850382),
('follows', 0.45245361754408348),
('keeps', 0.45234869400701483),
('john', 0.4520909494482197),
('mixed', 0.45198512374305722),
('defeat', 0.45198512374305722),
('justice', 0.45142724367280018),
('treasure', 0.45083371313801535),
('presents', 0.44973793178615257),
('years', 0.44919197032104968),
('chief', 0.44895022004790319),
('closely', 0.44701411102103689),
('segments', 0.44701411102103689),
('lose', 0.44658335503763702),
('caine', 0.44628710262841953),
('caught', 0.44610275383999071),
('hamlet', 0.44558510189758965),
('chinese', 0.44507424620321018),
('welcome', 0.44438052435783792),
('birth', 0.44368632092836219),
('represents', 0.44320543609101143),
('puts', 0.44279106572085081),
('visuals', 0.44183275227903923),
('fame', 0.44183275227903923),
('closer', 0.44183275227903923),
('web', 0.44183275227903923),
('criminal', 0.4412745608048752),
('minor', 0.4409224199448939),
('jon', 0.44086703515908027),
('liked', 0.44074991514020723),
('restaurant', 0.44031183943833246),
('de', 0.43983275161237217),
('flaws', 0.43983275161237217),
('searching', 0.4393666597838457),
('rap', 0.43891304217570443),
('light', 0.43884433018199892),
('elizabeth', 0.43872232986464682),
('marry', 0.43861731542506488),
('learned', 0.43825493093115531),
('controversial', 0.43825493093115531),
('oz', 0.43825493093115531),
('slowly', 0.43785660389939979),
('comedic', 0.43721380642274466),
('wayne', 0.43721380642274466),
('thrilling', 0.43721380642274466),
('bridge', 0.43721380642274466),
('married', 0.43658501682196887),
('nazi', 0.4361020775700542),
('murder', 0.4353180712578455),
('physical', 0.4353180712578455),
('johnny', 0.43483971678806865),
('michelle', 0.43445264498141672),
('wallace', 0.43403848055222038),
('comedies', 0.43395706390247063),
('silent', 0.43395706390247063),
('played', 0.43387244114515305),
('international', 0.43363598507486073),
('vision', 0.43286408229627887),
('intelligent', 0.43196704885367099),
('shop', 0.43078291609245434),
('also', 0.43036720209769169),
('levels', 0.4302451371066513),
('miss', 0.43006426712153217),
('movement', 0.4295626596872249),
...]

``````
``````

In [12]:

# words most strongly associated with a "NEGATIVE" label (lowest positive-to-negative ratios)
list(reversed(pos_neg_ratios.most_common()))[0:30]

``````
``````

Out[12]:

[('boll', -4.0778152602708904),
('uwe', -3.9218753018711578),
('seagal', -3.3202501058581921),
('unwatchable', -3.0269848170580955),
('stinker', -2.9876839403711624),
('mst', -2.7753833211707968),
('incoherent', -2.7641396677532537),
('unfunny', -2.5545257844967644),
('waste', -2.4907515123361046),
('blah', -2.4475792789485005),
('horrid', -2.3715779644809971),
('pointless', -2.3451073877136341),
('atrocious', -2.3187369339642556),
('redeeming', -2.2667790015910296),
('prom', -2.2601040980178784),
('drivel', -2.2476029585766928),
('lousy', -2.2118080125207054),
('worst', -2.1930856334332267),
('laughable', -2.172468615469592),
('awful', -2.1385076866397488),
('poorly', -2.1326133844207011),
('wasting', -2.1178155545614512),
('remotely', -2.111046881095167),
('existent', -2.0024805005437076),
('boredom', -1.9241486572738005),
('miserably', -1.9216610938019989),
('sucks', -1.9166645809588516),
('uninspired', -1.9131499212248517),
('lame', -1.9117232884159072),
('insult', -1.9085323769376259)]

``````
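
The scores in these two lists are easiest to read as log ratios. Below is a minimal sketch, assuming (as values such as 0.693, which is about ln 2, suggest) that each score is the natural log of a word's positive-to-negative count ratio. The counts are made up for illustration, and `log_ratio` is a hypothetical helper, not the notebook's own code.

```python
import math

def log_ratio(pos_count, neg_count):
    """Natural log of the positive-to-negative count ratio (hypothetical helper)."""
    return math.log(pos_count / neg_count)

# Made-up counts, chosen only to illustrate the scale of the scores
twice_as_positive = log_ratio(200, 100)   # word seen 2x as often in positive reviews
balanced = log_ratio(150, 150)            # word seen equally often in both
twice_as_negative = log_ratio(100, 200)   # word seen 2x as often in negative reviews

print(twice_as_positive, balanced, twice_as_negative)
```

On this scale a score near 0 marks a neutral word, and equal-sized positive and negative scores mark equally strong associations in opposite directions. Under the same assumption, 'boll' at about -4.08 would correspond to a ratio of roughly 59 to 1 in favor of negative reviews.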

# Transforming Text into Numbers

``````

In [13]:

from IPython.display import Image

review = "This was a horrible, terrible movie."

Image(filename='sentiment_network.png')

``````
``````

Out[13]:

``````
``````

In [14]:

review = "The movie was excellent"

Image(filename='sentiment_network_pos.png')

``````
``````

Out[14]:

``````
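
A common way to turn reviews like the two above into numbers is to count how often each word occurs. Below is a minimal sketch, assuming lowercase whitespace tokenization with punctuation stripped. The `count_words` helper is hypothetical and not part of the notebook.

```python
from collections import Counter
import string

def count_words(review):
    """Lowercase the review, strip punctuation, and count each word (hypothetical helper)."""
    cleaned = review.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

counts_neg = count_words("This was a horrible, terrible movie.")
counts_pos = count_words("The movie was excellent")

print(counts_neg["horrible"], counts_neg["excellent"])  # counts for the first review
print(counts_pos["horrible"], counts_pos["excellent"])  # counts for the second review
```

A `Counter` returns 0 for words it has never seen, so the same lookup works for any review without extra checks.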

# Project 2: Creating the Input/Output Data

``````

In [15]:

vocab = set(total_counts.keys())
vocab_size = len(vocab)
print(vocab_size)

``````
``````

74074

``````
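
With a vocabulary in hand, each word can be given a fixed position in an input vector, and each label a numeric target. Below is a minimal sketch that uses a toy vocabulary in place of the 74,074-word one and a plain Python list in place of an array. `word2index`, `review_to_vector`, and `label_to_target` are illustrative names, and the labels are assumed to be the strings "POSITIVE" and "NEGATIVE".

```python
# Toy stand-in for the full vocabulary; sorted so the positions are reproducible
vocab = sorted({"the", "movie", "was", "excellent", "horrible", "terrible"})
word2index = {word: i for i, word in enumerate(vocab)}

def review_to_vector(review):
    """Return a list of word counts, one slot per vocabulary word."""
    vector = [0] * len(vocab)
    for word in review.split(" "):
        if word in word2index:          # skip words outside the vocabulary
            vector[word2index[word]] += 1
    return vector

def label_to_target(label):
    """Map a label string to 1 (positive) or 0 (negative)."""
    return 1 if label == "POSITIVE" else 0

x = review_to_vector("the movie was excellent")
y = label_to_target("POSITIVE")
print(x, y)
```

Because a Python set has no guaranteed order, building the lookup once and reusing it keeps every word in the same slot for every review.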
``````

In [16]:

list(vocab)

``````
``````

Out[16]:

['',
'stage',
'yuen',
'balder',
'timers',
'muro',
'abromowitz',
'partly',
'joies',
'azar',
'ddr',
'germane',
'bllsosopher',
'dissolve',
'breathing',
'tableau',
'prosthetic',
'taurus',
'gleamed',
'diverge',
'nighttime',
'homelessness',
'thanatopsis',
'untreated',
'doctrines',
'goodloe',
'rhythm',
'substandard',
'tentatively',
'underlying',
'whittier',
'pico',
'peopled',
'bullsh',
'pesky',
'yale',
'foulata',
'hyperkinetic',
'scholl',
'laughometer',
'oren',
'suprising',
'cans',
'lecturing',
'umber',
'forgery',
'autonomous',
'indigestible',
'chides',
'reclamation',
'wardens',
'footed',
'unilaterally',
'affter',
'ferber',
'portrayals',
'allows',
'extracurricular',
'neo',
'washing',
'ukraine',
'miryang',
'annick',
'reckless',
'blissfully',
'tsu',
'denison',
'paypal',
'louque',
'traced',
'relegates',
'loiret',
'ropers',
'unwinds',
'aito',
'dashingly',
'racist',
'fondly',
'frostbite',
'vampiros',
'repulsed',
'predicated',
'forsa',
'flitty',
'sunekosuri',
'vampyr',
'oless',
'nuke',
'punky',
'sawney',
'upsets',
'expels',
'dena',
'kiva',
'squeazy',
'penal',
'dartboard',
'boarders',
'mnm',
'mrquez',
'perversions',
'aggrandizing',
'brokovich',
'dependent',
'pursuing',
'familiarized',
'marchal',
'raju',
'bogarts',
'panes',
'caitlin',
'paarthale',
'recur',
'warping',
'intergender',
'subterranean',
'assistant',
'unscheduled',
'ozporns',
'liner',
'aragorn',
'lonliness',
'tashy',
'corleone',
'bombshell',
'companionship',
'ricci',
'solves',
'isint',
'underflowing',
'pransky',
'internalist',
'liaison',
'teletype',
'wile',
'programmation',
'applause',
'unmated',
'hassett',
'achterbusch',
'irk',
'bloodbath',
'explorations',
'dearies',
'rocco',
'homework',
'scales',
'yul',
'engine',
'unchoreographed',
'talented',
'ruler',
'maude',
'preferences',
'punsley',
'reentered',
'ditches',
'skis',
'tribbiani',
'normal',
'bryans',
'varhola',
'seam',
'coates',
'clavell',
'harping',
'chipped',
'sages',
'abolition',
'medias',
'megalomania',
'masina',
'peeves',
'bohlen',
'disdainful',
'cucumbers',
'vehicles',
'excepting',
'fizzly',
'stopovers',
'kumai',
'carabiners',
'reconnoitering',
'psychoanalytical',
'novarro',
'squirmish',
'carfully',
'spruced',
'reid',
'esha',
'unknowns',
'communicable',
'poundage',
'cartwright',
'homoeroticism',
'peyote',
'neutrality',
'reefer',
'premedical',
'alekos',
'schnook',
'quotation',
'rashly',
'ingenue',
'keenan',
'hagia',
'studding',
'amusements',
'critic',
'worshiper',
'psychokinetic',
'braking',
'capo',
'whisking',
'mc',
'hou',
'basis',
'aniston',
'screwee',
'followings',
'breakaway',
'gharlie',
'reichskanzler',
'pebble',
'discotheque',
'huntsbery',
'grueling',
'wilmington',
'insurgency',
'gaa',
'personifies',
'poodles',
'er',
'solutions',
'larraz',
'em',
'slouches',
'avenues',
'magnified',
'pear',
'swamps',
'braslia',
'wrinkling',
'bernal',
'giza',
'craig',
'hof',
'giordano',
'munchkin',
'dough',
'leery',
'crucifixion',
'posturing',
'riveting',
'defectives',
'transpose',
'cajoling',
'combines',
'livery',
'mining',
'wong',
'poldi',
'perdition',
'daw',
'bloopers',
'defacing',
'euthanasiarist',
'outrages',
'gfx',
'goodluck',
'pnico',
'honored',
'doofuses',
'indigineous',
'bldy',
'paint',
'weeny',
'dailey',
'wolfpack',
'supplanted',
'kiera',
'hairbrained',
'teleportation',
'sense',
'yiiii',
'inject',
'flamboyant',
'ahlberg',
'puszta',
'lorean',
'fiers',
'shallow',
'charteris',
'glitxy',
'sinclair',
'kindegarden',
'refusals',
'leonidas',
'undeserved',
'jensen',
'sabretooth',
'vitriolic',
'bereaved',
'fishtail',
'questmaster',
'impostor',
'coaxing',
'videotaping',
'orchidea',
'hedaya',
'bell',
'delpy',
'brit',
'lawnmowerman',
'calculating',
'phoned',
'container',
'resistant',
'proprietress',
'vodyanoi',
'leashes',
'benzedrine',
'lenghts',
'painkillers',
'dreams',
'zabriskie',
'harleys',
'foundationally',
'lassie',
'trustees',
'ducks',
'workers',
'cough',
'sizing',
'cardos',
'dong',
'uniforms',
'acquitted',
'bohnen',
'slightyly',
'surfaced',
'diced',
'lashley',
'shotgunning',
'submerges',
'centrepiece',
'perron',
'fundamental',
'sizzling',
'undefeated',
'sprinkle',
'speckle',
'teller',
'moviefreak',
'skaal',
'raindeer',
'uncompromizing',
'lamonte',
'laguna',
'cryptozoology',
'mohamed',
'sllskapsresan',
'pesce',
'walder',
'espionage',
'seams',
'necklace',
'reviles',
'provisions',
'butter',
'fledgling',
'revamped',
'xvid',
'transmits',
'bronsan',
'swirls',
'mindy',
'tethered',
'redid',
'gathered',
'griffen',
'sabrian',
'jurking',
'swindlers',
'bettering',
'triviata',
'wilding',
'mojo',
'disrepair',
'ruptured',
'circuits',
'analyzing',
'wirsching',
'escaping',
'sickingly',
'splitting',
'gft',
'licencing',
'frock',
'lyoko',
'males',
'franklin',
'vaitongi',
'sightless',
'bmx',
'viewability',
'conditional',
'burstingly',
'chauvinistic',
'bergerac',
'operetta',
'grungy',
'levens',
'eaves',
'expansionist',
'casablanka',
'oneself',
'excessiveness',
'keitel',
'honolulu',
'horrifying',
'stupefying',
'weekdays',
'eyebrow',
'gratefulness',
'mere',
'finals',
'cannible',
'dozing',
'salaries',
'prescience',
'bashings',
'liken',
'lenoire',
'americaness',
'staunchly',
'gruff',
'silliest',
'bleek',
'circumlocution',
'fearlessly',
'hit',
'vays',
'randolph',
'long',
'matarazzo',
'dorsey',
'rediculas',
'gao',
'doones',
'iglesia',
'torin',
'songwriters',
'plentiful',
'horsecocky',
'dreufuss',
'dicky',
'esq',
'besco',
'underused',
'forerunner',
'dreamgirl',
'gaining',
'platters',
'franciosa',
'legacy',
'carlita',
'repartees',
'decimation',
'borel',
'poach',
'aces',
'reorganized',
'purrs',
'shockers',
'campesinos',
'rohal',
'volunteered',
'pathedic',
'sayings',
'putty',
'isham',
'iwas',
'wretched',
'lovelier',
'cartooned',
'depressive',
'sissily',
'moe',
'infringement',
'fairview',
'artificial',
'plotholes',
'konchalovsky',
'himbut',
'correspondence',
'imagination',
'bancroft',
'outpost',
'sbardellati',
'scob',
'timeshifts',
'tenacity',
'labourer',
'unclever',
'deniers',
'narrtor',
'marathan',
'peculating',
'bridges',
'quinnn',
'chewed',
'doghi',
'savanna',
'hulbert',
'sarde',
'valenti',
'manson',
'glib',
'strays',
'when',
'annoyingly',
'andrei',
'anxiety',
'mlc',
'ears',
'paine',
'rummaged',
'musa',
'inspected',
'hopelessly',
'assassinate',
'relished',
'joke',
'warmhearted',
'undefined',
'une',
'incorporates',
'chee',
'takeko',
'ghosthouse',
'homebase',
'unlikley',
'unambiguous',
'dearest',
'preforming',
'group',
'selects',
'wrestles',
'moravia',
'mears',
'gaita',
'completest',
'joel',
'highlights',
'ooooohhhh',
'launching',
'snorting',
'cruiser',
'weingartner',
'beans',
'brion',
'couldve',
'descents',
'inferno',
'vining',
'westwood',
'gibs',
'gundam',
'pining',
'mates',
'tickling',
'appoint',
'overabundance',
'mnica',
'aspires',
'twinned',
'bitsmidohio',
'vctor',
'peak',
'gamers',
'interactive',
'decree',
'formosa',
'undressed',
'individuation',
'cabo',
'seboipepe',
'ryoko',
'friels',
'unbounded',
'rajnikant',
'freaky',
'ompuri',
'hallmark',
'glamourous',
'klok',
'calmly',
'attracted',
'powermaster',
'lyricists',
'dissing',
'portfolios',
'shakily',
'stair',
'document',
'unforgettable',
'sociable',
'vrsel',
'backlash',
'skitters',
'crapo',
'nicholls',
'alta',
'violation',
'bedevils',
'potion',
'italia',
'seiing',
'torpedos',
'tirith',
'templates',
'limbs',
'solver',
'stationary',
'malfique',
'denys',
'coulthard',
'schygulla',
'emannuelle',
'bunuel',
'xu',
'mon',
'xd',
'pb',
'consider',
'pianist',
'risks',
'dahl',
'beachcomber',
'repairs',
'jing',
'strobes',
'crediblity',
'canvas',
'torments',
'despicable',
'philbin',
'histrionics',
'awsomeness',
'bleed',
'bickering',
'finishing',
'von',
'motormouth',
'leclerc',
'dharmendra',
'globally',
'exhooker',
'illuminations',
'showiest',
'norris',
'seselj',
'denominator',
'il',
'spanishness',
'vandalizing',
'mch',
'trample',
'cleve',
'litters',
'lifeblood',
'entrusted',
'cc',
'coroner',
'lahaye',
'deludes',
'wishbone',
'sari',
'withdrawal',
'accentuate',
'klan',
'tain',
'bronco',
'jovan',
'lidsville',
'lexus',
'snyder',
'raves',
'striped',
'pupi',
'bravo',
'uno',
'saving',
'empathized',
'goetter',
'regimental',
'sprawling',
'aranoa',
'floundered',
'trifecta',
'powerglove',
'hifi',
'franfreako',
'goodnik',
'gillette',
'byronic',
'pollak',
'polution',
'grammatically',
'insurgents',
'apaches',
'gall',
'sneaking',
'pout',
'gull',
'siddons',
'zavet',
'knockdown',
'supports',
'hampeita',
'tripods',
'hito',
'philanthropic',
'punks',
'clytemenstra',
'kinski',
'cherri',
'mantis',
'smartest',
'uninjured',
'seagoing',
'faustino',
'hig',
'simpons',
'ethan',
'gumshoe',
'sunnydale',
'youknowwhat',
'piece',
'compelling',
'instigator',
'pollyanna',
'sirbossman',
'quayle',
'rissole',
'gaslit',
'vomited',
'plastic',
'salkow',
'rosenstrasse',
'yall',
'tamo',
'herod',
'vivacious',
'rhinos',
'applewhite',
'originators',
'hypnotising',
'bulgakov',
'tottering',
'vilifies',
'gnash',
'sophisticate',
'spheres',
'sprocket',
'weeks',
'citizenx',
'ist',
'viren',
'compute',
'deteriorate',
'popularize',
'enterntainment',
'at',
'proposition',
'filmstiftung',
'assael',
'terribly',
'normand',
'ritual',
'tame',
'threateningly',
'classrooms',
'shite',
'flimsily',
'artists',
'sandbag',
'horowitz',
'removes',
'hoofer',
'biggest',
'anathema',
'shattering',
'twists',
'comas',
'parameters',
'berliner',
'vaticani',
'dolly',
'crypts',
'squirrels',
'flubbing',
'yeccch',
'findlay',
'personae',
'rectitude',
'dnouement',
'indisputably',
'arithmetic',
'nebot',
'geeeee',
'rampantly',
'fickleness',
'natassia',
'jellybean',
'formulae',
'scorning',
'robald',
'lurching',
'petter',
'ivanek',
'zombiefest',
'hunnicutt',
'contrived',
'sags',
'israelis',
'earner',
'zaara',
'booker',
'bergre',
'plaudits',
'gubra',
'plex',
'lecter',
'hurrrts',
'zapp',
'police',
'pocketbooks',
'doctoral',
'yabba',
'speeds',
'shauvians',
'juxtaposed',
'eastman',
'integrates',
'starfucker',
'pursuant',
'authority',
'shlocky',
'swooshes',
'shovel',
'cannavale',
'avjo',
'assess',
'stucco',
'completetly',
'waved',
'irrepressible',
'distractive',
'interiors',
'alps',
'scorer',
'tetsukichi',
'dried',
'micah',
'patient',
'emminently',
'arrgh',
'trickling',
'aimanov',
'farily',
'deitrich',
'whorde',
'orca',
'leaped',
'linguistically',
'extreamely',
'fbl',
'prem',
'blanc',
'rearrange',
'salgueiro',
'channels',
'chris',
'feij',
'lapsed',
'sensible',
'boyum',
'bases',
'haywood',
'chikatilo',
'apollonia',
'contactable',
'clenched',
'aborigines',
'negativistic',
'mochrie',
'piggy',
'twoooooooo',
'suchet',
'looping',
'dasilva',
'privilege',
'sooooo',
'juliana',
'chapin',
'depreciative',
'lomas',
'bop',
'jetee',
'pausing',
'peephole',
'intoxication',
'babied',
'greengrass',
'steelcrafts',
'astrogators',
'ensure',
'pandora',
'excution',
'kikabidze',
'fetching',
'liferaft',
'transpires',
'stroh',
'hillman',
'jembs',
'deco',
'biased',
'fassbinder',
'envelopes',
'mumford',
'fugace',
'blinds',
'formats',
'roscoe',
'yokels',
'kirsty',
'crossfire',
'mistaken',
'captivating',
'replies',
'fratelli',
'sarafina',
'mn',
'plod',
'daines',
'cheeni',
'conquerors',
'budding',
'exterminating',
'carefully',
'corporation',
'ideologically',
'halpin',
'vfx',
'conaughey',
'floating',
'belivably',
'sweaters',
'favourably',
'female',
'western',
'infinity',
'uncharismatic',
'idiotized',
'ronnie',
'examined',
'atmospheres',
'perspiring',
'cookers',
'courtesan',
'mostof',
'format',
'polonius',
'asphyxiated',
...]

``````
``````

In [17]:

import numpy as np

layer_0 = np.zeros((1,vocab_size))
layer_0

``````
``````

Out[17]:

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.]])

``````
``````

In [18]:

from IPython.display import Image
Image(filename='sentiment_network.png')

``````
``````

Out[18]:

``````
``````

In [48]:

word2index = {}

for i,word in enumerate(vocab):
    word2index[word] = i
word2index

``````
``````

Out[48]:

{'': 0,
'inhabitants': 1,
'goku': 2,
'stunts': 3,
'catepillar': 4,
'kristensen': 5,
'goddess': 7,
'offing': 49797,
'distroy': 8,
'unexplainably': 9,
'concoctions': 10,
'petite': 11,
'paramilitary': 24759,
'scribe': 12,
'stevson': 13,
'senegal': 6,
'sctv': 14,
'soundscape': 15,
'rana': 16,
'immortalizer': 18,
'rene': 67354,
'eko': 23,
'planning': 20,
'akiva': 21,
'plod': 22,
'orderly': 24,
'zeleznice': 25,
'critize': 29,
'baguettes': 25649,
'jefferies': 30,
'uncertainties': 61695,
'mountainbillies': 31,
'steinbichler': 32,
'vowel': 33,
'rafe': 34,
'donig': 68719,
'tulipe': 36,
'clot': 37,
'hack': 12526,
'distended': 38,
'cornered': 37116,
'impatiently': 40,
'batrice': 12525,
'unfortuntly': 41,
'lung': 42,
'scapegoats': 43,
'pscychosexual': 45,
'outbid': 46,
'obit': 47,
'sideshows': 48,
'jugde': 49,
'kevloun': 51,
'quartier': 53,
'harp': 61948,
'unravelling': 54,
'antiques': 56,
'strutts': 57,
'tilts': 58,
'disconcert': 59,
'dossiers': 60,
'sorriest': 61,
'craftsman': 49412,
'blart': 62,
'dependence': 37120,
'sated': 61698,
'iberia': 63,
'sagan': 72,
'frmann': 65,
'daniell': 66,
'rays': 67,
'pried': 68,
'khoobsurat': 69,
'leavitt': 70,
'caiano': 71,
'attractiveness': 73,
'kitaparaporn': 74,
'hamilton': 75,
'massages': 76,
'horgan': 78,
'chemist': 79,
'audrey': 80,
'yeow': 55655,
'jana': 81,
'dutch': 82,
'pinchot': 24773,
'override': 83,
'dwervick': 63223,
'spasms': 84,
'resumed': 85,
'tamale': 66259,
'calibanian': 49636,
'stinson': 86,
'widows': 87,
'stonewall': 88,
'palatial': 89,
'neuman': 90,
'abandon': 91,
'lemmings': 65314,
'anglophile': 92,
'ertha': 61706,
'chevette': 94,
'unscary': 95,
'spoilerific': 97,
'neworleans': 67639,
'metamorphose': 17,
'brigand': 99,
'cheating': 41603,
'clued': 101,
'dermatonecrotic': 102,
'mulligan': 104,
'ol': 105,
'incubation': 107,
'plaintiffs': 110,
'snden': 109,
'fk': 111,
'deply': 112,
'franchot': 113,
'henstridge': 19,
'cyhper': 114,
'verbose': 26,
'mazovia': 116,
'elizabeth': 117,
'palestine': 118,
'robby': 119,
'wongo': 120,
'moshing': 121,
'mstified': 12543,
'eeeee': 122,
'doltish': 123,
'bree': 124,
'postponed': 125,
'debacles': 127,
'amplify': 27,
'kamm': 128,
'phantom': 18893,
'boylen': 136,
'rolando': 131,
'premises': 133,
'bruck': 134,
'loosely': 135,
'wodehousian': 139,
'onishi': 70389,
'encapsuling': 140,
'partly': 141,
'calms': 143,
'darkie': 148,
'wheeling': 147,
'ursla': 15875,
'subsidized': 49420,
'mckellar': 149,
'ooookkkk': 151,
'milky': 152,
'unfolded': 153,
'authenticating': 155,
'writeup': 12548,
'rotheroe': 156,
'beart': 157,
'intoxicants': 160,
'grispin': 159,
'cannes': 61718,
'antithetical': 70398,
'nnette': 161,
'tsukamoto': 163,
'antwones': 44205,
'stows': 164,
'suddenness': 165,
'vol': 61720,
'waqt': 166,
'camazotz': 168,
'paps': 55042,
'shakher': 170,
'terminate': 63868,
'kotex': 56419,
'delinquency': 171,
'bromwell': 25214,
'insecticide': 173,
'charlton': 174,
'titted': 24791,
'urbane': 178,
'depicted': 54491,
'hyping': 181,
'yr': 182,
'hebert': 183,
'waxwork': 12990,
'deathrow': 185,
'nourishes': 24792,
'unmediated': 187,
'tamper': 37143,
'alphabet': 189,
'donen': 191,
'lord': 192,
'recess': 193,
'watchably': 61023,
'handsome': 194,
'vignettes': 196,
'pairings': 198,
'uselful': 199,
'sanders': 200,
'outbursts': 72891,
'nots': 201,
'hatsumomo': 202,
'actioned': 18292,
'krimi': 24797,
'appleby': 203,
'tampax': 204,
'sprinkling': 205,
'defacing': 206,
'lofty': 207,
'verger': 213,
'tablespoons': 211,
'bernhard': 212,
'goosebump': 64565,
'acumen': 214,
'percentages': 215,
'wendingo': 216,
'resonating': 217,
'vntoarea': 218,
'redundancies': 219,
'strictly': 57081,
'pitied': 221,
'belying': 222,
'michelangelo': 53153,
'gleefulness': 223,
'environmentalist': 24803,
'gitane': 226,
'corrected': 66547,
'journalist': 227,
'focusing': 228,
'plethora': 229,
'his': 39,
'citizen': 230,
'south': 55579,
'clunkers': 232,
'pendulous': 55991,
'mounds': 24805,
'deplorable': 233,
'forgive': 234,
'proplems': 235,
'bankers': 237,
'aqua': 238,
'donated': 239,
'disbelieving': 240,
'acomplication': 241,
'contrasted': 243,
'muzzle': 44,
'amphibians': 72141,
'springs': 246,
'reformatted': 49443,
'toolbox': 247,
'contacting': 248,
'washrooms': 250,
'raving': 251,
'dynamism': 252,
'mae': 253,
'disharmony': 255,
'molls': 72979,
'dewaere': 12569,
'untutored': 256,
'icarus': 257,
'taint': 258,
'kargil': 259,
'captain': 260,
'paucity': 261,
'fits': 262,
'tumbles': 263,
'amer': 264,
'bueller': 265,
'cleansed': 267,
'shara': 269,
'humma': 270,
'outa': 272,
'piglets': 273,
'gombell': 274,
'supermen': 275,
'superlow': 276,
'kubanskie': 280,
'goode': 278,
'disorganised': 45570,
'zenith': 281,
'ananda': 282,
'matlin': 284,
'particolare': 50,
'presumptuous': 286,
'rerun': 287,
'toyko': 288,
'bilb': 291,
'sundry': 290,
'fugly': 292,
'orchestrating': 293,
'prosaically': 294,
'moveis': 296,
'conelly': 297,
'estrange': 298,
'elfriede': 49455,
'masterful': 52,
'seasonings': 300,
'quincey': 303,
'frowning': 49456,
'painkillers': 53444,
'high': 25515,
'flesh': 304,
'tootsie': 305,
'ai': 306,
'tenma': 307,
'duguay': 71257,
'appropriations': 308,
'ides': 310,
'rui': 61734,
'surrogacy': 311,
'pungent': 312,
'damaso': 314,
'authoritarian': 61736,
'caribou': 315,
'ro': 318,
'supplying': 317,
'yuy': 319,
'debuted': 321,
'mounts': 323,
'interpolated': 324,
'aetv': 325,
'plummer': 326,
'asunder': 331,
'airfix': 333,
'dubiel': 329,
'clavichord': 330,
'crafty': 50465,
'sublety': 332,
'stoltzfus': 334,
'ruth': 335,
'fluorescent': 336,
'improves': 337,
'russells': 339,
'tick': 43838,
'zsa': 341,
'macs': 343,
'jlb': 345,
'locus': 348,
'merly': 49461,
'corey': 350,
'blundered': 351,
'humourless': 3568,
'disorganized': 353,
'discuss': 354,
'sharifi': 45391,
'tieing': 356,
'kats': 34784,
'bbc': 360,
'pranked': 362,
'superman': 363,
'holroyd': 9223,
'aggravated': 364,
'rifleman': 365,
'yvone': 366,
'vaugier': 24820,
'galico': 368,
'debris': 369,
'btw': 371,
'denote': 24822,
'havnt': 372,
'francen': 373,
'chattered': 374,
'scathed': 375,
'pic': 376,
'ceremonies': 377,
'everyplace': 65309,
'betsy': 379,
'finster': 37176,
'meercat': 381,
'noirs': 382,
'grunts': 383,
'tribulations': 385,
'apparatus': 47673,
'martnez': 25825,
'telethons': 24825,
'alloimono': 390,
'situations': 64,
'scrutinising': 391,
'geta': 392,
'beltrami': 393,
'pvc': 394,
'horse': 395,
'tiburon': 396,
'huitime': 397,
'ripple': 398,
'exceed': 61748,
'loitering': 399,
'forensics': 400,
'nearly': 401,
'ellington': 403,
'uzi': 404,
'rung': 408,
'pillaged': 24829,
'gao': 409,
'licitates': 410,
'protocol': 411,
'smirker': 412,
'torin': 413,
'vizier': 31853,
'newlywed': 414,
'dismay': 416,
'moonwalks': 418,
'skyler': 417,
'invested': 18455,
'grifter': 421,
'undersold': 422,
'chearator': 423,
'marino': 424,
'scala': 425,
'conditioner': 426,
'lamarre': 428,
'figueroa': 429,
'mcinnerny': 61753,
'allllllll': 431,
'slide': 432,
'lateness': 433,
'selbst': 434,
'dramatizing': 436,
'doable': 438,
'hollywoodize': 27207,
'alexanderplatz': 440,
'wholesome': 45745,
'pandemonium': 441,
'earth': 443,
'mounties': 444,
'seeker': 445,
'cheat': 446,
'outbreaks': 447,
'savagely': 61759,
'snowstorm': 448,
'baur': 449,
'schedules': 450,
'bathetic': 451,
'johnathon': 453,
'origonal': 57843,
'rosanne': 454,
'cauldrons': 456,
'forrest': 457,
'poky': 458,
'aristos': 54856,
'womanness': 460,
'spender': 461,
'pagliai': 37108,
'rational': 463,
'terrell': 464,
'affronts': 472,
'concise': 49476,
'mathew': 468,
'narnia': 469,
'naseeruddin': 470,
'bucks': 471,
'proceeds': 69809,
'topple': 473,
'degree': 474,
'passionately': 476,
'defeats': 477,
'gras': 49477,
'sources': 479,
'pflug': 49976,
'botticelli': 480,
'fwd': 486,
'waiving': 483,
'gunnar': 484,
'stiffler': 485,
'unwise': 49480,
'kawajiri': 487,
'sistahs': 489,
'swallowed': 30511,
'soulhunter': 490,
'belies': 491,
'wrathful': 492,
'unforgivably': 497,
'weirdy': 496,
'violation': 63309,
'chepart': 498,
'departmentthe': 500,
'posehn': 49483,
'peyote': 37188,
'psychiatrically': 24846,
'marionettes': 503,
'blatty': 502,
'atop': 504,
'debases': 25135,
'henze': 24845,
'unrooted': 510,
'cloudscape': 508,
'resignedly': 509,
'begin': 49917,
'hitlerian': 512,
'reedus': 517,
'crewed': 514,
'bedeviled': 515,
'unfurnished': 516,
'herrmann': 12602,
'circumstances': 518,
'grasped': 519,
'fn': 521,
'beefed': 22200,
'scwatch': 64018,
'dishwashers': 522,
'ruthlessness': 524,
'migrant': 12605,
'refrains': 525,
'preponderance': 44377,
'lampooning': 526,
'richart': 528,
'gwenneth': 530,
'enmity': 531,
'vortex': 61772,
'assess': 532,
'manufacturer': 533,
'bullosa': 534,
'citizenship': 61774,
'chekov': 537,
'hogan': 536,
'blithe': 538,
'aredavid': 542,
'drillings': 540,
'revolvers': 541,
'boyfriendhe': 545,
'achcha': 544,
'wallow': 546,
'toga': 547,
'bosnians': 551,
'going': 550,
'willy': 552,
'fim': 554,
'forbidding': 555,
'delete': 56779,
'rationalised': 557,
'shimomo': 558,
'opposition': 559,
'landis': 560,
'minded': 561,
'arghhhhh': 564,
'trialat': 566,
'protected': 567,
'negras': 568,
'tracker': 571,
'muti': 570,
'dinky': 49489,
'shawl': 572,
'differentiates': 573,
'dipaolo': 61779,
'sweetheart': 574,
'manmohan': 576,
'enamored': 66265,
'trevethyn': 577,
'brain': 578,
'incomprehensibly': 579,
'bruton': 59142,
'shtick': 582,
'ute': 583,
'viggo': 584,
'relevent': 589,
'cites': 587,
'greenaways': 61781,
'minidress': 590,
'philosopher': 591,
'mahattan': 593,
'moden': 594,
'compiling': 595,
'unimaginative': 598,
'rogues': 597,
'subpaar': 599,
'darkly': 601,
'saturate': 602,
'fledgling': 603,
'breaths': 604,
'sceam': 37206,
'empathized': 58870,
'aszombi': 606,
'incalculable': 608,
'formations': 28596,
'hampden': 619,
'rawail': 612,
'forbid': 613,
'holiness': 617,
'unessential': 618,
'reputedly': 616,
'wage': 63181,
'kewpie': 24860,
'asylum': 620,
'bolye': 621,
'celticism': 63189,
'strangers': 622,
'rantzen': 623,
'farrellys': 624,
'marathon': 93,
'cantinflas': 626,
'disproportionately': 12617,
'bared': 67212,
'enshrined': 627,
'expetations': 629,
'replaying': 630,
'topless': 636,
'bukater': 632,
'overpaid': 633,
'exhude': 634,
'nitwits': 638,
'tsst': 51554,
'sufferings': 637,
'ci': 24693,
'eponymously': 96,
'ferdy': 644,
'danira': 641,
'unrelenting': 642,
'disabling': 643,
'gerard': 645,
'drewitt': 646,
'lamping': 650,
'demy': 652,
'wicklow': 37214,
'relinquish': 651,
'feminized': 64196,
'drink': 653,
'chamberlin': 654,
'floodwaters': 657,
'searing': 658,
'isral': 659,
'ling': 660,
'grossness': 661,
'sassier': 24865,
'pickier': 662,
'pax': 663,
'fleashens': 98,
'wierd': 664,
'tereasa': 665,
'smog': 666,
'girotti': 667,
'zooey': 64814,
'spat': 668,
'sera': 669,
'misbehaving': 671,
'scouts': 672,
'refreshments': 673,
'itll': 39668,
'toyomichi': 676,
'politeness': 100,
'bits': 677,
'psychotics': 678,
'optimistic': 61796,
'barzell': 679,
'colt': 680,
'anita': 49501,
'shivering': 681,
'utah': 59297,
'scrivener': 686,
'predicable': 687,
'dryer': 684,
'reissues': 685,
'sexier': 26115,
'spellbind': 691,
'seems': 690,
'wyke': 37223,
'innovator': 693,
'inthused': 695,
'scatman': 6309,
'contestants': 696,
'bertolucci': 106,
'serviced': 699,
'nozires': 700,
'ins': 701,
'mutilating': 702,
'dupes': 703,
'launius': 704,
'widescreen': 705,
'joo': 706,
'discretionary': 707,
'enlivens': 708,
'manos': 55596,
'bushes': 709,
'activist': 712,
'gethsemane': 713,
'phoenixs': 714,
'wreathed': 715,
'oldboy': 108,
'electrifyingly': 717,
'inseparability': 24874,
'ghidora': 719,
'binder': 720,
'tibet': 51530,
'doddsville': 723,
'sugar': 722,
'porkys': 724,
'hopefully': 37226,
'scattershot': 725,
'refunded': 726,
'rudely': 727,
'enacts': 67435,
'nightwatch': 61803,
'eurotrash': 730,
'unreservedly': 73710,
'vall': 49508,
'boogeman': 733,
'flunked': 24880,
'weighs': 734,
'glorfindel': 738,
'hypothermia': 737,
'misled': 64919,
'toiletries': 71501,
'birthdays': 739,
'attentive': 740,
'mallepa': 741,
'manoy': 743,
'bombshells': 744,
'glorifying': 115,
'southron': 747,
'destruction': 748,
'manhole': 750,
'elainor': 751,
'bounder': 13003,
'bowersock': 752,
'lowly': 753,
'wfst': 754,
'limousines': 755,
'skolimowski': 756,
'saban': 757,
'malaysia': 759,
'cyd': 761,
'bonecrushing': 763,
'merest': 765,
'janina': 766,
'chemotrodes': 767,
'trials': 768,
'whilhelm': 770,
'asthmatic': 771,
'missteps': 773,
'melyvn': 24885,
'embittered': 774,
'profit': 37234,
'seeming': 776,
'miscalculate': 777,
'recommeded': 778,
'mankin': 37235,
'schoolwork': 779,
'coy': 780,
'mcconaughey': 781,
'waver': 783,
'unwatchably': 786,
'saggy': 787,
'breakup': 790,
'pufnstuf': 37237,
'superstars': 792,
'replay': 793,
'aggravates': 794,
'urging': 796,
'snidely': 797,
'aleksandar': 798,
'hildy': 799,
'kazuhiro': 800,
'slayer': 801,
'tangy': 802,
'horne': 804,
'masayuki': 805,
'molden': 806,
'unravel': 807,
'goodtime': 808,
'rowboat': 811,
'dekhiye': 815,
'datedness': 813,
'astrotheology': 814,
'suriani': 59610,
'hostilities': 819,
'wipes': 818,
'sentimentalising': 820,
'documentary': 821,
'virtue': 823,
'unreasonably': 824,
'cei': 826,
'hobbled': 37240,
'unglamorised': 827,
'balky': 828,
'complementary': 829,
'paychecks': 830,
'tughlaq': 45551,
'functionality': 836,
'ily': 833,
'prc': 834,
'ennobling': 835,
'dissociated': 837,
'elk': 838,
'throbbing': 839,
'tempe': 840,
'linoleum': 841,
'bottacin': 843,
'hipper': 844,
'barging': 846,
'untie': 847,
'sacchetti': 848,
'gnat': 849,
'roedel': 850,
'performs': 852,
'nanavati': 856,
'migrs': 854,
'teachs': 855,
'gunslinger': 126,
'fresco': 857,
'davison': 858,
'jet': 59446,
'burglar': 860,
'jerker': 69267,
'masue': 861,
'dickory': 862,
'muggy': 46634,
'grills': 863,
'figment': 28693,
'monogamistic': 49527,
'appelagate': 864,
'loesser': 867,
'patties': 868,
'prudent': 869,
'mallorquins': 870,
'nativetex': 871,
'suprise': 872,
'quill': 874,
'angsty': 71451,
'speeded': 875,
'farscape': 876,
'herman': 129,
'centuries': 878,
'mos': 879,
'neccessarily': 881,
'tankers': 883,
'latte': 884,
'faracy': 886,
'stilts': 24897,
'synthetically': 887,
'thoughtless': 888,
'authoring': 62813,
'rake': 889,
'ropes': 890,
'whitewashed': 892,
'donal': 893,
'arching': 4910,
'cockamamie': 899,
'lifeless': 895,
'perfidy': 896,
'teresa': 897,
'bulldog': 898,
'vingh': 73726,
'evacuees': 65858,
'rasberries': 900,
'chiseling': 903,
'clampets': 905,
'grecianized': 138,
'smaller': 904,
'kluznick': 62184,
'aaaahhhhhhh': 909,
'wellingtonian': 908,
'dither': 910,
'incertitude': 911,
'florentine': 912,
'imperioli': 913,
'licking': 914,
'disparagement': 915,
'artfully': 916,
'feds': 917,
'fumiya': 918,
'jbl': 52774,
'tearfully': 919,
'welfare': 24905,
'idyllically': 49534,
'isha': 43702,
'lanchester': 920,
'undertaken': 921,
'longlost': 922,
'netted': 923,
'carrell': 924,
'uncompelling': 925,
'stems': 37258,
'reliefs': 926,
'leona': 927,
'autorenfilm': 928,
'unfriendly': 929,
'typewriter': 930,
'shifted': 931,
'bertrand': 932,
'blesses': 933,
'leukemia': 12666,
'posative': 142,
'tricking': 934,
'zanes': 936,
'dashboard': 12667,
'unknowingly': 937,
'flatmates': 51897,
'unnerve': 938,
'caning': 939,
'shortland': 146,
'recluse': 941,
'dcreasy': 942,
'scratchiness': 24911,
'pms': 30930,
'chipmunk': 943,
'tkachenko': 49537,
'dipper': 944,
'europeans': 61601,
'berserkers': 948,
'shys': 947,
'monte': 68505,
'eve': 949,
'luxury': 61828,
'conflagration': 950,
'water': 46389,
'irks': 951,
'positronic': 954,
'cushy': 150,
'swiftness': 957,
'underimpressed': 964,
'imprint': 959,
'sundance': 961,
'aida': 31951,
'thematically': 963,
'uno': 965,
'expressly': 966,
'russkies': 967,
'discos': 968,
'shaping': 969,
'verson': 970,
'blushed': 61831,
'prototype': 971,
'lifewell': 976,
'trafficker': 973,
'crucifixions': 62188,
'unrealistically': 975,
'rivas': 977,
'consequent': 978,
'katsu': 979,
'titantic': 980,
'jalees': 981,
'ranee': 982,
'gambles': 984,
'dispenses': 985,
'disfigurement': 986,
'bright': 987,
'cristian': 988,
'subculture': 37268,
'capta': 991,
'jewel': 992,
'erect': 993,
'avoide': 996,
'inconnu': 997,
'babbling': 1000,
'pac': 1001,
'performace': 1003,
'dorrit': 1004,
'runners': 1005,
'sentimentality': 1006,
'marred': 1007,
'commemorative': 1008,
'helpers': 1012,
'chiles': 1011,
'snowy': 1013,
'cheddar': 1014,
'neath': 158,
'outshine': 1016,
'wellbeing': 1020,
'envisioned': 43779,
'fanaticism': 1021,
'morrisette': 12687,
'sesame': 1024,
'gran': 1023,
'marlina': 1025,
'artificiality': 1030,
'coinsidence': 1027,
'founders': 1028,
'dismissably': 1029,
'dracht': 66299,
'scavengers': 1031,
'neese': 12685,
'pangborn': 1034,
'elmore': 1039,
'bristol': 71162,
'lillies': 1035,
'parkers': 1036,
'skipped': 1038,
'clipboard': 1042,
'jucier': 1041,
'haifa': 1043,
...}

``````
``````

In [49]:

def update_input_layer(review):

    global layer_0

    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

``````
``````

In [33]:

layer_0

``````
``````

Out[33]:

array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

``````
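To make the counting step concrete, here is a minimal sketch on a made-up two-review corpus. The names `toy_reviews`, `toy_word2index`, and `count_vector` are assumptions for illustration only and are not part of the notebook. It also shows why position 0 can pick up a large count: splitting on a single space turns every double space into an empty-string token, and in the output above `''` happens to sit at index 0.

```python
import numpy as np

# Hypothetical toy corpus (not the real reviews.txt); note the double space.
toy_reviews = ["great movie  great cast", "terrible plot"]

# Build a vocabulary and a word -> column index lookup, sorted for determinism.
toy_vocab = sorted({word for review in toy_reviews for word in review.split(" ")})
toy_word2index = {word: i for i, word in enumerate(toy_vocab)}

def count_vector(review, word2index):
    """Return a (1, vocab_size) array holding how often each word occurs."""
    vec = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        vec[0][word2index[word]] += 1
    return vec

vec = count_vector(toy_reviews[0], toy_word2index)
print(toy_word2index)
print(vec)
```

In this toy vocabulary the empty string sorts first, so the double space shows up as a count in column 0, just like the real review does.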
``````

In [51]:

def get_target_for_label(label):
    if(label == 'POSITIVE'):
        return 1
    else:
        return 0

``````
``````

In [54]:

labels[0]

``````
``````

Out[54]:

'POSITIVE'

``````
``````

In [52]:

get_target_for_label(labels[0])

``````
``````

Out[52]:

1

``````
``````

In [55]:

labels[1]

``````
``````

Out[55]:

'NEGATIVE'

``````
``````

In [53]:

get_target_for_label(labels[1])

``````
``````

Out[53]:

0

``````

# Project 3: Building a Neural Network

• 3 layer neural network
• no non-linearity in hidden layer
• use our functions to create the training data
• create a "pre_process_data" function that builds the vocabulary used by our training-data generating functions
• modify "train" to train over the entire corpus
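As a rough sketch of the architecture in the list above (input layer, linear hidden layer, sigmoid output), the forward pass could look like this. The layer sizes, the random weights, and the `forward` helper are assumptions chosen for illustration; they are not the values used in the project.

```python
import numpy as np

# Hypothetical toy sizes: 6 vocabulary words, 3 hidden nodes, 1 output node.
rng = np.random.default_rng(0)
n_input, n_hidden, n_output = 6, 3, 1
weights_0_1 = rng.normal(0.0, 0.1, (n_input, n_hidden))
weights_1_2 = rng.normal(0.0, 0.1, (n_hidden, n_output))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(layer_0):
    # Hidden layer: a plain matrix product, no non-linearity applied.
    layer_1 = layer_0.dot(weights_0_1)
    # Output layer: sigmoid squashes the score into (0, 1).
    layer_2 = sigmoid(layer_1.dot(weights_1_2))
    return layer_1, layer_2

layer_0 = np.array([[1., 1., 2., 1., 0., 0.]])  # word counts for one review
layer_1, layer_2 = forward(layer_0)
print(layer_1.shape, layer_2.shape)
```

An output above 0.5 would be read as POSITIVE and anything else as NEGATIVE.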

``````

In [86]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):

        # set our random number generator
        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word that appears in the reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)

        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # map each word to its column in the input layer
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1,input_nodes))

    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            # skip words that were not seen when the vocabulary was built
            if word in self.word2index:
                self.layer_0[0][self.word2index[word]] += 1

    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)

    def train(self, training_reviews, training_labels):

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            #### Implement the forward pass here ####
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            #### Implement the backward pass here ####
            ### Backward pass ###

            # TODO: Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # output error: actual output minus desired target
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # TODO: Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # TODO: Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
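The update rule inside `train` can be isolated into a small standalone sketch. The `train_step` function, the three-word input, and the hand-picked starting weights below are assumptions for illustration; the arithmetic mirrors the forward and backward pass in the class above, including the all-zero input-to-hidden weights at the start.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(layer_0, target, weights_0_1, weights_1_2, learning_rate):
    """Run one forward/backward pass and update both weight matrices in place."""
    # Forward pass: linear hidden layer, sigmoid output.
    layer_1 = layer_0.dot(weights_0_1)
    layer_2 = sigmoid(layer_1.dot(weights_1_2))

    # Backward pass: output error scaled by the sigmoid slope.
    layer_2_error = layer_2 - target
    layer_2_delta = layer_2_error * layer_2 * (1 - layer_2)
    # No hidden non-linearity, so the hidden delta is just the propagated error.
    layer_1_delta = layer_2_delta.dot(weights_1_2.T)

    # Gradient descent step on both weight matrices.
    weights_1_2 -= layer_1.T.dot(layer_2_delta) * learning_rate
    weights_0_1 -= layer_0.T.dot(layer_1_delta) * learning_rate
    return float(np.abs(layer_2_error)[0][0])

# Hypothetical toy example: three input words, two hidden nodes, target = 1.
layer_0 = np.array([[1., 0., 1.]])
weights_0_1 = np.zeros((3, 2))
weights_1_2 = np.array([[0.5], [-0.5]])

errors = [train_step(layer_0, 1, weights_0_1, weights_1_2, 0.1) for _ in range(50)]
print(errors[0], errors[-1])
```

Repeating the step on the same example should shrink the absolute error, which is a quick sanity check that the signs in the update are right.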
``````

In [87]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [61]:

# evaluate our model before training (just to show how horrible it is)
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):587.5% #Correct:500 #Tested:1000 Testing Accuracy:50.0%

``````
``````

In [62]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):89.58 #Correct:1250 #Trained:2501 Training Accuracy:49.9%
Progress:20.8% Speed(reviews/sec):95.03 #Correct:2500 #Trained:5001 Training Accuracy:49.9%
Progress:27.4% Speed(reviews/sec):95.46 #Correct:3295 #Trained:6592 Training Accuracy:49.9%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````
``````

In [63]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)

``````
``````

In [64]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):96.39 #Correct:1247 #Trained:2501 Training Accuracy:49.8%
Progress:20.8% Speed(reviews/sec):99.31 #Correct:2497 #Trained:5001 Training Accuracy:49.9%
Progress:22.8% Speed(reviews/sec):99.02 #Correct:2735 #Trained:5476 Training Accuracy:49.9%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````
``````

In [65]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)

``````
``````

In [66]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):98.77 #Correct:1267 #Trained:2501 Training Accuracy:50.6%
Progress:20.8% Speed(reviews/sec):98.79 #Correct:2640 #Trained:5001 Training Accuracy:52.7%
Progress:31.2% Speed(reviews/sec):98.58 #Correct:4109 #Trained:7501 Training Accuracy:54.7%
Progress:41.6% Speed(reviews/sec):93.78 #Correct:5638 #Trained:10001 Training Accuracy:56.3%
Progress:52.0% Speed(reviews/sec):91.76 #Correct:7246 #Trained:12501 Training Accuracy:57.9%
Progress:62.5% Speed(reviews/sec):92.42 #Correct:8841 #Trained:15001 Training Accuracy:58.9%
Progress:69.4% Speed(reviews/sec):92.58 #Correct:9934 #Trained:16668 Training Accuracy:59.5%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````

# Understanding Neural Noise

``````

In [67]:

from IPython.display import Image
Image(filename='sentiment_network.png')

``````
``````

Out[67]:

``````
``````

In [70]:

def update_input_layer(review):

    global layer_0

    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

``````
``````

In [71]:

layer_0

``````
``````

Out[71]:

array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

``````
``````

In [79]:

from collections import Counter

review_counter = Counter()

``````
``````

In [80]:

for word in reviews[0].split(" "):
    review_counter[word] += 1

``````
``````

In [81]:

review_counter.most_common()

``````
``````

Out[81]:

[('.', 27),
('', 18),
('the', 9),
('to', 6),
('i', 5),
('high', 5),
('is', 4),
('of', 4),
('a', 4),
('bromwell', 4),
('teachers', 4),
('that', 4),
('their', 2),
('my', 2),
('at', 2),
('as', 2),
('me', 2),
('in', 2),
('students', 2),
('it', 2),
('student', 2),
('school', 2),
('through', 1),
('insightful', 1),
('ran', 1),
('years', 1),
('here', 1),
('episode', 1),
('reality', 1),
('what', 1),
('far', 1),
('t', 1),
('saw', 1),
('s', 1),
('repeatedly', 1),
('isn', 1),
('closer', 1),
('and', 1),
('fetched', 1),
('remind', 1),
('can', 1),
('welcome', 1),
('line', 1),
('your', 1),
('survive', 1),
('teaching', 1),
('satire', 1),
('classic', 1),
('who', 1),
('age', 1),
('knew', 1),
('schools', 1),
('inspector', 1),
('comedy', 1),
('down', 1),
('pity', 1),
('m', 1),
('all', 1),
('see', 1),
('think', 1),
('situation', 1),
('time', 1),
('pomp', 1),
('other', 1),
('much', 1),
('many', 1),
('which', 1),
('one', 1),
('profession', 1),
('programs', 1),
('same', 1),
('some', 1),
('such', 1),
('pettiness', 1),
('immediately', 1),
('expect', 1),
('financially', 1),
('recalled', 1),
('tried', 1),
('whole', 1),
('right', 1),
('life', 1),
('cartoon', 1),
('scramble', 1),
('sack', 1),
('believe', 1),
('when', 1),
('than', 1),
('burn', 1),
('pathetic', 1)]

``````
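The counts above are dominated by tokens such as `.`, the empty string, and `the`, which say little about sentiment. A small sketch on a made-up sentence shows the same effect and one possible response, recording only whether a word is present. The sentence and the `binary` dictionary are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical review text, tokenized the same way as the real reviews.
review = "this movie is great . . . . the cast is great ."
counts = Counter(review.split(" "))

# With raw counts, punctuation outweighs the word that carries the sentiment.
top_token, top_count = counts.most_common(1)[0]
print(top_token, top_count, counts["great"])

# With presence/absence, every word that occurs contributes the same signal.
binary = {word: 1 for word in counts}
print(binary["."], binary["great"])
```

With counts, the period feeds the largest value into the network; with presence flags, `.` and `great` carry equal weight and the weights alone decide which one matters.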

# Project 4: Reducing Noise in our Input Data

``````

In [82]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes=10, learning_rate=0.1):

        # seed the random number generator so runs are reproducible
        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word that appears in the reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)

        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # map each word and each label to an integer index
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1, input_nodes))

    def update_input_layer(self, review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if word in self.word2index:
                # record presence only (1), not the number of occurrences
                self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self, label):
        if label == 'POSITIVE':
            return 1
        else:
            return 0

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)

    def train(self, training_reviews, training_labels):

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # actual output minus desired target
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if np.abs(layer_2_error) < 0.5:
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if i % 2500 == 0:
                print("")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if pred == testing_labels[i]:
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

        if layer_2[0] > 0.5:
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
``````

In [83]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [84]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):91.50 #Correct:1795 #Trained:2501 Training Accuracy:71.7%
Progress:20.8% Speed(reviews/sec):95.25 #Correct:3811 #Trained:5001 Training Accuracy:76.2%
Progress:31.2% Speed(reviews/sec):93.74 #Correct:5898 #Trained:7501 Training Accuracy:78.6%
Progress:41.6% Speed(reviews/sec):93.69 #Correct:8042 #Trained:10001 Training Accuracy:80.4%
Progress:52.0% Speed(reviews/sec):95.27 #Correct:10186 #Trained:12501 Training Accuracy:81.4%
Progress:62.5% Speed(reviews/sec):98.19 #Correct:12317 #Trained:15001 Training Accuracy:82.1%
Progress:72.9% Speed(reviews/sec):98.56 #Correct:14440 #Trained:17501 Training Accuracy:82.5%
Progress:83.3% Speed(reviews/sec):99.74 #Correct:16613 #Trained:20001 Training Accuracy:83.0%
Progress:93.7% Speed(reviews/sec):100.7 #Correct:18794 #Trained:22501 Training Accuracy:83.5%
Progress:99.9% Speed(reviews/sec):101.9 #Correct:20115 #Trained:24000 Training Accuracy:83.8%

``````
``````

In [85]:

# evaluate the trained model on the 1,000 held-out reviews
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):832.7% #Correct:851 #Tested:1000 Testing Accuracy:85.1%

``````
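The change in Project 4 lives in `update_input_layer`, which sets an input to 1 when a word is present instead of tallying how often it occurs. The sketch below is only an illustration: it uses a made-up toy review and assumes the earlier version of the network accumulated counts with `+= 1`. Under counts, filler tokens such as `.` dominate the input vector; under presence-only encoding every word contributes equally.

```python
import numpy as np

# Hypothetical toy review; the repeated "." tokens mimic the punctuation
# that appears throughout the real reviews.
review = "this movie was excellent . . . . . truly excellent ."
vocab = sorted(set(review.split(" ")))
word2index = {word: i for i, word in enumerate(vocab)}

counts = np.zeros(len(vocab))  # count-based encoding (assumed earlier behaviour)
binary = np.zeros(len(vocab))  # presence-only encoding, as in Project 4

for word in review.split(" "):
    counts[word2index[word]] += 1
    binary[word2index[word]] = 1

for word in vocab:
    print(word, counts[word2index[word]], binary[word2index[word]])
```

In the count vector the `.` token carries three times the weight of `excellent`, even though it says nothing about sentiment; in the binary vector both carry a weight of 1.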

# Analyzing Inefficiencies in our Network

``````

In [88]:

Image(filename='sentiment_network_sparse.png')

``````
``````

Out[88]:

``````
``````

In [89]:

layer_0 = np.zeros(10)

``````
``````

In [90]:

layer_0

``````
``````

Out[90]:

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

``````
``````

In [91]:

layer_0[4] = 1
layer_0[9] = 1

``````
``````

In [92]:

layer_0

``````
``````

Out[92]:

array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.])

``````
``````

In [93]:

weights_0_1 = np.random.randn(10,5)

``````
``````

In [94]:

layer_0.dot(weights_0_1)

``````
``````

Out[94]:

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

``````
``````

In [101]:

indices = [4,9]

``````
``````

In [102]:

layer_1 = np.zeros(5)

``````
``````

In [103]:

for index in indices:
    layer_1 += (weights_0_1[index])

``````
``````

In [104]:

layer_1

``````
``````

Out[104]:

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

``````
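The cells above show that, for a 0/1 input vector, the dot product with `weights_0_1` equals the sum of the weight rows at the active indices. A minimal sketch of the same check at a larger size follows; the sizes, the seed, and the number of active words are arbitrary choices for illustration, not values from the network above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for the sketch

vocab_size, hidden_size = 1000, 5  # hypothetical sizes
weights_0_1 = rng.standard_normal((vocab_size, hidden_size))

# Choose a handful of distinct "active" word indices.
indices = rng.choice(vocab_size, size=20, replace=False)
layer_0 = np.zeros(vocab_size)
layer_0[indices] = 1

dense = layer_0.dot(weights_0_1)           # multiplies through every row, mostly by 0
sparse = weights_0_1[indices].sum(axis=0)  # touches only the 20 active rows

print(np.allclose(dense, sparse))
```

The dense product does work proportional to the whole vocabulary, while the sparse sum does work proportional to the number of distinct words in the review, which is the saving Project 5 relies on.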
``````

In [100]:

Image(filename='sentiment_network_sparse_2.png')

``````
``````

Out[100]:

``````

# Project 5: Making our Network More Efficient

``````

In [105]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes=10, learning_rate=0.1):

        # seed the random number generator so runs are reproducible
        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word that appears in the reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)

        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # map each word and each label to an integer index
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1, input_nodes))
        self.layer_1 = np.zeros((1, hidden_nodes))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)

    def update_input_layer(self, review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if word in self.word2index:
                self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self, label):
        if label == 'POSITIVE':
            return 1
        else:
            return 0

    def train(self, training_reviews_raw, training_labels):

        # convert each review into the list of indices of the distinct words it contains
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if word in self.word2index:
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            ### Forward pass ###

            # Hidden layer: sum the weight rows of the active words
            # (replaces layer_1 = self.layer_0.dot(self.weights_0_1))
            self.layer_1 *= 0
            for index in review:
                self.layer_1 += self.weights_0_1[index]

            # Output layer
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # actual output minus desired target
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step

            # only the rows of the words in this review change
            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate # update input-to-hidden weights with gradient descent step

            if np.abs(layer_2_error) < 0.5:
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if pred == testing_labels[i]:
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Hidden layer: sum the weight rows of the distinct known words in the review
        self.layer_1 *= 0
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index:
                unique_indices.add(self.word2index[word])
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]

        # Output layer
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

        if layer_2[0] > 0.5:
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
``````

In [106]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [111]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

In [109]:

# evaluate the trained model on the 1,000 held-out reviews
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):1581.% #Correct:857 #Tested:1000 Testing Accuracy:85.7%

``````
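Project 5 also replaces the dense weight update `self.layer_0.T.dot(layer_1_delta)` with a loop over the active word indices. The sketch below is a rough check, under made-up sizes and indices, that the two updates produce the same weights when `layer_0` holds only 0s and 1s; the names mirror the class above, but everything here is standalone and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for the sketch

vocab_size, hidden_size = 50, 4  # hypothetical sizes
learning_rate = 0.1
indices = [3, 17, 42]            # hypothetical indices of the words in one review

layer_0 = np.zeros((1, vocab_size))
layer_0[0, indices] = 1
layer_1_delta = rng.standard_normal((1, hidden_size))

weights_dense = rng.standard_normal((vocab_size, hidden_size))
weights_sparse = weights_dense.copy()

# Dense update: builds a full (vocab_size, hidden_size) gradient, mostly zeros.
weights_dense -= layer_0.T.dot(layer_1_delta) * learning_rate

# Sparse update: touches only the rows of the active words.
for index in indices:
    weights_sparse[index] -= layer_1_delta[0] * learning_rate

print(np.allclose(weights_dense, weights_sparse))
```

Rows for words that do not appear in the review receive a zero gradient in the dense version, so skipping them changes nothing except the amount of work done.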

# Further Noise Reduction

``````

In [112]:

Image(filename='sentiment_network_sparse_2.png')

``````
``````

Out[112]:

``````
``````

In [113]:

# words most frequently seen in a review with a "POSITIVE" label
pos_neg_ratios.most_common()

``````
``````

Out[113]:

[('edie', 4.6913478822291435),
('paulie', 4.0775374439057197),
('felix', 3.1527360223636558),
('polanski', 2.8233610476132043),
('matthau', 2.8067217286092401),
('victoria', 2.6810215287142909),
('mildred', 2.6026896854443837),
('gandhi', 2.5389738710582761),
('flawless', 2.451005098112319),
('superbly', 2.2600254785752498),
('perfection', 2.1594842493533721),
('astaire', 2.1400661634962708),
('captures', 2.0386195471595809),
('voight', 2.0301704926730531),
('wonderfully', 2.0218960560332353),
('powell', 1.9783454248084671),
('brosnan', 1.9547990964725592),
('lily', 1.9203768470501485),
('bakshi', 1.9029851043382795),
('lincoln', 1.9014583864844796),
('refreshing', 1.8551812956655511),
('breathtaking', 1.8481124057791867),
('bourne', 1.8478489358790986),
('lemmon', 1.8458266904983307),
('delightful', 1.8002701588959635),
('flynn', 1.7996646487351682),
('andrews', 1.7764919970972666),
('homer', 1.7692866133759964),
('beautifully', 1.7626953362841438),
('soccer', 1.7578579175523736),
('elvira', 1.7397031072720019),
('underrated', 1.7197859696029656),
('gripping', 1.7165360479904674),
('superb', 1.7091514458966952),
('delight', 1.6714733033535532),
('welles', 1.6677068205580761),
('sinatra', 1.6389967146756448),
('touching', 1.637217476541176),
('timeless', 1.62924053973028),
('macy', 1.6211339521972916),
('unforgettable', 1.6177367152487956),
('favorites', 1.6158688027643908),
('stewart', 1.6119987332957739),
('hartley', 1.6094379124341003),
('sullivan', 1.6094379124341003),
('extraordinary', 1.6094379124341003),
('brilliantly', 1.5950491749820008),
('friendship', 1.5677652160335325),
('wonderful', 1.5645425925262093),
('palma', 1.5553706911638245),
('magnificent', 1.54663701119507),
('finest', 1.5462590108125689),
('jackie', 1.5439233053234738),
('ritter', 1.5404450409471491),
('tremendous', 1.5184661342283736),
('freedom', 1.5091151908062312),
('fantastic', 1.5048433868558566),
('terrific', 1.5026699370083942),
('noir', 1.493925025312256),
('sidney', 1.493925025312256),
('outstanding', 1.4910053152089213),
('mann', 1.4894785973551214),
('pleasantly', 1.4894785973551214),
('nancy', 1.488077055429833),
('marie', 1.4825711915553104),
('marvelous', 1.4739999415389962),
('excellent', 1.4647538505723599),
('ruth', 1.4596256342054401),
('stanwyck', 1.4412101187160054),
('widmark', 1.4350845252893227),
('splendid', 1.4271163556401458),
('chan', 1.423108334242607),
('exceptional', 1.4201959127955721),
('tender', 1.410986973710262),
('gentle', 1.4078005663408544),
('poignant', 1.4022947024663317),
('gem', 1.3932148039644643),
('amazing', 1.3919815802404802),
('chilling', 1.3862943611198906),
('captivating', 1.3862943611198906),
('fisher', 1.3862943611198906),
('davies', 1.3862943611198906),
('darker', 1.3652409519220583),
('april', 1.3499267169490159),
('kelly', 1.3461743673304654),
('blake', 1.3418425985490567),
('overlooked', 1.329135947279942),
('ralph', 1.32818673031261),
('bette', 1.3156767939059373),
('hoffman', 1.3150668518315229),
('cole', 1.3121863889661687),
('shines', 1.3049487216659381),
('powerful', 1.2999662776313934),
('notch', 1.2950456896547455),
('remarkable', 1.2883688239495823),
('pitt', 1.286210902562908),
('winters', 1.2833463918674481),
('vivid', 1.2762934659055623),
('gritty', 1.2757524867200667),
('giallo', 1.2745029551317739),
('portrait', 1.2704625455947689),
('innocence', 1.2694300209805796),
('psychiatrist', 1.2685113254635072),
('favorite', 1.2668956297860055),
('ensemble', 1.2656663733312759),
('stunning', 1.2622417124499117),
('burns', 1.259880436264232),
('garbo', 1.258954938743289),
('barbara', 1.2580400255962119),
('panic', 1.2527629684953681),
('holly', 1.2527629684953681),
('philip', 1.2527629684953681),
('carol', 1.2481440226390734),
('perfect', 1.246742480713785),
('appreciated', 1.2462482874741743),
('favourite', 1.2411123512753928),
('journey', 1.2367626271489269),
('rural', 1.235471471385307),
('bond', 1.2321436812926323),
('builds', 1.2305398317106577),
('brilliant', 1.2287554137664785),
('brooklyn', 1.2286654169163074),
('von', 1.225175011976539),
('unfolds', 1.2163953243244932),
('recommended', 1.2163953243244932),
('daniel', 1.20215296760895),
('perfectly', 1.1971931173405572),
('crafted', 1.1962507582320256),
('prince', 1.1939224684724346),
('troubled', 1.192138346678933),
('consequences', 1.1865810616140668),
('haunting', 1.1814999484738773),
('cinderella', 1.180052620608284),
('alexander', 1.1759989522835299),
('emotions', 1.1753049094563641),
('boxing', 1.1735135968412274),
('subtle', 1.1734135017508081),
('curtis', 1.1649873576129823),
('rare', 1.1566438362402944),
('loved', 1.1563661500586044),
('daughters', 1.1526795099383853),
('courage', 1.1438688802562305),
('dentist', 1.1426722784621401),
('highly', 1.1420208631618658),
('nominated', 1.1409146683587992),
('tony', 1.1397491942285991),
('draws', 1.1325138403437911),
('everyday', 1.1306150197542835),
('contrast', 1.1284652518177909),
('cried', 1.1213405397456659),
('fabulous', 1.1210851445201684),
('ned', 1.120591195386885),
('fay', 1.120591195386885),
('emma', 1.1184149159642893),
('sensitive', 1.113318436057805),
('smooth', 1.1089750757036563),
('dramas', 1.1080910326226534),
('today', 1.1050431789984001),
('helps', 1.1023091505494358),
('inspiring', 1.0986122886681098),
('jimmy', 1.0937696641923216),
('awesome', 1.0931328229034842),
('unique', 1.0881409888008142),
('tragic', 1.0871835928444868),
('intense', 1.0870514662670339),
('stellar', 1.0857088838322018),
('rival', 1.0822184788924332),
('provides', 1.0797081340289569),
('depression', 1.0782034170369026),
('shy', 1.0775588794702773),
('carrie', 1.076139432816051),
('blend', 1.0753554265038423),
('hank', 1.0736109864626924),
('diana', 1.0726368022648489),
('unexpected', 1.0722255334949147),
('achievement', 1.0668635903535293),
('bettie', 1.0663514264498881),
('happiness', 1.0632729222228008),
('glorious', 1.0608719606852626),
('davis', 1.0541605260972757),
('terrifying', 1.0525211814678428),
('beauty', 1.050410186850232),
('ideal', 1.0479685558493548),
('fears', 1.0467872208035236),
('hong', 1.0438040521731147),
('seasons', 1.0433496099930604),
('fascinating', 1.0414538748281612),
('carries', 1.0345904299031787),
('satisfying', 1.0321225473992768),
('definite', 1.0319209141694374),
('touched', 1.0296194171811581),
('greatest', 1.0248947127715422),
('creates', 1.0241097613701886),
('aunt', 1.023388867430522),
('walter', 1.022328983918479),
('spectacular', 1.0198314108149955),
('portrayal', 1.0189810189761024),
('ann', 1.0127808528183286),
('enterprise', 1.0116009116784799),
('musicals', 1.0096648026516135),
('deeply', 1.0094845087721023),
('incredible', 1.0061677561461084),
('mature', 1.0060195018402847),
('triumph', 0.99682959435816731),
('margaret', 0.99682959435816731),
('navy', 0.99493385919326827),
('harry', 0.99176919305006062),
('lucas', 0.990398704027877),
('sweet', 0.98966110487955483),
('joey', 0.98794672078059009),
('oscar', 0.98721905111049713),
('balance', 0.98649499054740353),
('warm', 0.98485340331145166),
('ages', 0.98449898190068863),
('glover', 0.98082925301172619),
('guilt', 0.98082925301172619),
('carrey', 0.98082925301172619),
('learns', 0.97881108885548895),
('unusual', 0.97788374278196932),
('sons', 0.97777581552483595),
('complex', 0.97761897738147796),
('essence', 0.97753435711487369),
('brazil', 0.9769153536905899),
('widow', 0.97650959186720987),
('solid', 0.97537964824416146),
('beautiful', 0.97326301262841053),
('holmes', 0.97246100334120955),
('awe', 0.97186058302896583),
('vhs', 0.97116734209998934),
('eerie', 0.97116734209998934),
('lonely', 0.96873720724669754),
('grim', 0.96873720724669754),
('sport', 0.96825047080486615),
('debut', 0.96508089604358704),
('destiny', 0.96343751029985703),
('thrillers', 0.96281074750904794),
('tears', 0.95977584381389391),
('rose', 0.95664202739772253),
('feelings', 0.95551144502743635),
('ginger', 0.95551144502743635),
('winning', 0.95471810900804055),
('stanley', 0.95387344302319799),
('cox', 0.95343027882361187),
('paris', 0.95278479030472663),
('heart', 0.95238806924516806),
('hooked', 0.95155887071161305),
('comfortable', 0.94803943018873538),
('mgm', 0.94446160884085151),
('masterpiece', 0.94155039863339296),
('themes', 0.94118828349588235),
('danny', 0.93967118051821874),
('anime', 0.93378388932167222),
('perry', 0.93328830824272613),
('joy', 0.93301752567946861),
('lovable', 0.93081883243706487),
('hal', 0.92953595862417571),
('mysteries', 0.92953595862417571),
('louis', 0.92871325187271225),
('charming', 0.92520609553210742),
('urban', 0.92367083917177761),
('allows', 0.92183091224977043),
('impact', 0.91815814604895041),
('lifestyle', 0.91629073187415511),
('italy', 0.91629073187415511),
('spy', 0.91289514287301687),
('treat', 0.91193342650519937),
('subsequent', 0.91056005716517008),
('kennedy', 0.90981821736853763),
('loving', 0.90967549275543591),
('surprising', 0.90937028902958128),
('quiet', 0.90648673177753425),
('winter', 0.90624039602065365),
('reveals', 0.90490540964902977),
('raw', 0.90445627422715225),
('funniest', 0.90078654533818991),
('norman', 0.89994159387262562),
('thief', 0.89874642222324552),
('season', 0.89827222637147675),
('secrets', 0.89794159320595857),
('colorful', 0.89705936994626756),
('highest', 0.8967461358011849),
('compelling', 0.89462923509297576),
('danes', 0.89248008318043659),
('castle', 0.88967708335606499),
('kudos', 0.88889175768604067),
('great', 0.88810470901464589),
('baseball', 0.88730319500090271),
('subtitles', 0.88730319500090271),
('bleak', 0.88730319500090271),
('winner', 0.88643776872447388),
('tragedy', 0.88563699078315261),
('todd', 0.88551907320740142),
('nicely', 0.87924946019380601),
('arthur', 0.87546873735389985),
('essential', 0.87373111745535925),
('gorgeous', 0.8731725250935497),
('fonda', 0.87294029100054127),
('eastwood', 0.87139541196626402),
('focuses', 0.87082835779739776),
('enjoyed', 0.87070195951624607),
('natural', 0.86997924506912838),
('intensity', 0.86835126958503595),
('witty', 0.86824103423244681),
('rob', 0.8642954367557748),
('worlds', 0.86377269759070874),
('health', 0.86113891179907498),
('magical', 0.85953791528170564),
('deeper', 0.85802182375017932),
('lucy', 0.85618680780444956),
('moving', 0.85566611005772031),
('lovely', 0.85290640004681306),
('purple', 0.8513711857748395),
('memorable', 0.84801189112086062),
('sings', 0.84729786038720367),
('craig', 0.84342938360928321),
('modesty', 0.84342938360928321),
('relate', 0.84326559685926517),
('episodes', 0.84223712084137292),
('strong', 0.84167135777060931),
('smith', 0.83959811108590054),
('tear', 0.83704136022001441),
('apartment', 0.83333115290549531),
('princess', 0.83290912293510388),
('disagree', 0.83290912293510388),
('kung', 0.83173334384609199),
('columbo', 0.82667857318446791),
('jake', 0.82667857318446791),
('hart', 0.82472353834866463),
('strength', 0.82417544296634937),
('realizes', 0.82360006895738058),
('dave', 0.8232003088081431),
('childhood', 0.82208086393583857),
('forbidden', 0.81989888619908913),
('tight', 0.81883539572344199),
('surreal', 0.8178506590609026),
('manager', 0.81770990320170756),
('dancer', 0.81574950265227764),
('con', 0.81093021621632877),
('studios', 0.81093021621632877),
('miike', 0.80821651034473263),
('realistic', 0.80807714723392232),
('explicit', 0.80792269515237358),
('kurt', 0.8060875917405409),
('deals', 0.80535917116687328),
('holds', 0.80493858654806194),
('carl', 0.80437281567016972),
('touches', 0.80396154690023547),
('gene', 0.80314807577427383),
('albert', 0.8027669055771679),
('abc', 0.80234647252493729),
('cry', 0.80011930011211307),
('sides', 0.7995275841185171),
('develops', 0.79850769621777162),
('eyre', 0.79850769621777162),
('dances', 0.79694397424158891),
('oscars', 0.79633141679517616),
('legendary', 0.79600456599965308),
('importance', 0.79492987486988764),
('hearted', 0.79492987486988764),
('portraying', 0.79356592830699269),
('impressed', 0.79258107754813223),
('waters', 0.79112758892014912),
('empire', 0.79078565012386137),
('edge', 0.789774016249017),
('environment', 0.78845736036427028),
('jean', 0.78845736036427028),
('sentimental', 0.7864791203521645),
('captured', 0.78623760362595729),
('styles', 0.78592891401091158),
('daring', 0.78592891401091158),
('backgrounds', 0.78275933924963248),
('frank', 0.78275933924963248),
('matches', 0.78275933924963248),
('tense', 0.78275933924963248),
('gothic', 0.78209466657644144),
('sharp', 0.7814397877056235),
('achieved', 0.78015855754957497),
('court', 0.77947526404844247),
('steals', 0.7789140023173704),
('rules', 0.77844476107184035),
('colors', 0.77684619943659217),
('reunion', 0.77318988823348167),
('covers', 0.77139937745969345),
('tale', 0.77010822169607374),
('rain', 0.7683706017975328),
('denzel', 0.76804848873306297),
('stays', 0.76787072675588186),
('blob', 0.76725515271366718),
('conventional', 0.76214005204689672),
('maria', 0.76214005204689672),
('fresh', 0.76158434211317383),
('midnight', 0.76096977689870637),
('landscape', 0.75852993982279704),
('animated', 0.75768570169751648),
('titanic', 0.75666058628227129),
('sunday', 0.75666058628227129),
('spring', 0.7537718023763802),
('cagney', 0.7537718023763802),
('enjoyable', 0.75246375771636476),
('immensely', 0.75198768058287868),
('sir', 0.7507762933965817),
('nevertheless', 0.75067102469813185),
('driven', 0.74994477895307854),
('performances', 0.74883252516063137),
('memories', 0.74721440183022114),
('simple', 0.74641420974143258),
('golden', 0.74533293373051557),
('leslie', 0.74533293373051557),
('lovers', 0.74497224842453125),
('relationship', 0.74484232345601786),
('supporting', 0.74357803418683721),
('che', 0.74262723782331497),
('packed', 0.7410032017375805),
('trek', 0.74021469141793106),
('provoking', 0.73840377214806618),
('strikes', 0.73759894313077912),
('depiction', 0.73682224406260699),
('emotional', 0.73678211645681524),
('secretary', 0.7366322924996842),
('influenced', 0.73511137965897755),
('florida', 0.73511137965897755),
('germany', 0.73288750920945944),
('brings', 0.73142936713096229),
('lewis', 0.73129894652432159),
('elderly', 0.73088750854279239),
('owner', 0.72743625403857748),
('streets', 0.72666987259858895),
('henry', 0.72642196944481741),
('portrays', 0.72593700338293632),
('bears', 0.7252354951114458),
('china', 0.72489587887452556),
('anger', 0.72439972406404984),
('society', 0.72433010799663333),
('available', 0.72415741730250549),
('best', 0.72347034060446314),
('bugs', 0.72270598280148979),
('magic', 0.71878961117328299),
('verhoeven', 0.71846498854423513),
('delivers', 0.71846498854423513),
('jim', 0.71783979315031676),
('donald', 0.71667767797013937),
('endearing', 0.71465338578090898),
('relationships', 0.71393795022901896),
('greatly', 0.71256526641704687),
('charlie', 0.71024161391924534),
('simon', 0.70967648251115578),
('effectively', 0.70914752190638641),
('march', 0.70774597998109789),
('atmosphere', 0.70744773070214162),
('influence', 0.70733181555190172),
('genius', 0.706392407309966),
('emotionally', 0.70556970055850243),
('ken', 0.70526854109229009),
('identity', 0.70484322032313651),
('sophisticated', 0.70470800296102132),
('dan', 0.70457587638356811),
('andrew', 0.70329955202396321),
('india', 0.70144598337464037),
('roy', 0.69970458110610434),
('surprisingly', 0.6995780708902356),
('sky', 0.69780919366575667),
('romantic', 0.69664981111114743),
('match', 0.69566924999265523),
('britain', 0.69314718055994529),
('beatty', 0.69314718055994529),
('affected', 0.69314718055994529),
('cowboy', 0.69314718055994529),
('wave', 0.69314718055994529),
('stylish', 0.69314718055994529),
('bitter', 0.69314718055994529),
('patient', 0.69314718055994529),
('meets', 0.69314718055994529),
('love', 0.69198533541937324),
('paul', 0.68980827929443067),
('andy', 0.68846333124751902),
('performance', 0.68797386327972465),
('patrick', 0.68645819240914863),
('unlike', 0.68546468438792907),
('brooks', 0.68433655087779044),
('refuses', 0.68348526964820844),
('award', 0.6824518914431974),
('complaint', 0.6824518914431974),
('ride', 0.68229716453587952),
('dawson', 0.68171848473632257),
('luke', 0.68158635815886937),
('wells', 0.68087708796813096),
('france', 0.6804081547825156),
('handsome', 0.68007509899259255),
('sports', 0.68007509899259255),
('rebel', 0.67875844310784572),
('directs', 0.67875844310784572),
('greater', 0.67605274720064523),
('dreams', 0.67599410133369586),
('effective', 0.67565402311242806),
('interpretation', 0.67479804189174875),
('works', 0.67445504754779284),
('brando', 0.67445504754779284),
('noble', 0.6737290947028437),
('paced', 0.67314651385327573),
('le', 0.67067432470788668),
('master', 0.67015766233524654),
('h', 0.6696166831497512),
('rings', 0.66904962898088483),
('easy', 0.66895995494594152),
('city', 0.66820823221269321),
('sunshine', 0.66782937257565544),
('succeeds', 0.66647893347778397),
('relations', 0.664159643686693),
('england', 0.66387679825983203),
('glimpse', 0.66329421741026418),
('aired', 0.66268797307523675),
('sees', 0.66263163663399482),
('both', 0.66248336767382998),
('definitely', 0.66199789483898808),
('imaginative', 0.66139848224536502),
('appreciate', 0.66083893732728749),
('tricks', 0.66071190480679143),
('striking', 0.66071190480679143),
('carefully', 0.65999497324304479),
('complicated', 0.65981076029235353),
('perspective', 0.65962448852130173),
('trilogy', 0.65877953705573755),
('future', 0.65834665141052828),
('lion', 0.65742909795786608),
('victor', 0.65540685257709819),
('douglas', 0.65540685257709819),
('inspired', 0.65459851044271034),
('marriage', 0.65392646740666405),
('demands', 0.65392646740666405),
('father', 0.65172321672194655),
('page', 0.65123628494430852),
('instant', 0.65058756614114943),
('era', 0.6495567444850836),
('ruthless', 0.64934455790155243),
('saga', 0.64934455790155243),
('joan', 0.64891392558311978),
('joseph', 0.64841128671855386),
('workers', 0.64829661439459352),
('fantasy', 0.64726757480925168),
('accomplished', 0.64551913157069074),
('distant', 0.64551913157069074),
('manhattan', 0.64435701639051324),
('personal', 0.64355023942057321),
('pushing', 0.64313675998528386),
('meeting', 0.64313675998528386),
('individual', 0.64313675998528386),
('pleasant', 0.64250344774119039),
('brave', 0.64185388617239469),
('william', 0.64083139119578469),
('hudson', 0.64077919504262937),
('friendly', 0.63949446706762514),
('eccentric', 0.63907995928966954),
('awards', 0.63875310849414646),
('jack', 0.63838309514997038),
('seeking', 0.63808740337691783),
('colonel', 0.63757732940513456),
('divorce', 0.63757732940513456),
('jane', 0.63443957973316734),
('keeping', 0.63414883979798953),
('gives', 0.63383568159497883),
('ted', 0.63342794585832296),
('animation', 0.63208692379869902),
('progress', 0.6317782341836532),
('concert', 0.63127177684185776),
('larger', 0.63127177684185776),
('nation', 0.6296337748376194),
('albeit', 0.62739580299716491),
('discovers', 0.62542900650499444),
('classic', 0.62504956428050518),
('segment', 0.62335141862440335),
('morgan', 0.62303761437291871),
('mouse', 0.62294292188669675),
('impressive', 0.62211140744319349),
('artist', 0.62168821657780038),
('ultimate', 0.62168821657780038),
('griffith', 0.62117368093485603),
('emily', 0.62082651898031915),
('drew', 0.62082651898031915),
('moved', 0.6197197120051281),
('profound', 0.61903920840622351),
('families', 0.61903920840622351),
('innocent', 0.61851219917136446),
('versions', 0.61730910416844087),
('eddie', 0.61691981517206107),
('criticism', 0.61651395453902935),
('nature', 0.61594514653194088),
('recognized', 0.61518563909023349),
('sexuality', 0.61467556511845012),
('contract', 0.61400986000122149),
('brian', 0.61344043794920278),
('remembered', 0.6131044728864089),
('determined', 0.6123858239154869),
('offers', 0.61207935747116349),
('pleasure', 0.61195702582993206),
('washington', 0.61180154110599294),
('images', 0.61159731359583758),
('games', 0.61067095873570676),
('fashioned', 0.60798937221963845),
('melodrama', 0.60749173598145145),
('peoples', 0.60613580357031549),
('charismatic', 0.60613580357031549),
('rough', 0.60613580357031549),
('dealing', 0.60517840761398811),
('fine', 0.60496962268013299),
('tap', 0.60391604683200273),
('trio', 0.60157998703445481),
('russell', 0.60120968523425966),
('figures', 0.60077386042893011),
('ward', 0.60005675749393339),
('shine', 0.59911823091166894),
('job', 0.59845562125168661),
('satisfied', 0.59652034487087369),
('river', 0.59637962862495086),
('brown', 0.595773016534769),
('believable', 0.59566072133302495),
('bound', 0.59470710774669278),
('always', 0.59470710774669278),
('hall', 0.5933967777928858),
('cook', 0.5916777203950857),
('claire', 0.59136448625000293),
('anna', 0.58778666490211906),
('peace', 0.58628403501758408),
('visually', 0.58539431926349916),
('falk', 0.58525821854876026),
('morality', 0.58525821854876026),
('growing', 0.58466653756587539),
('experiences', 0.58314628534561685),
('stood', 0.58314628534561685),
('touch', 0.58122926435596001),
('lives', 0.5810976767513224),
('kubrick', 0.58066919713325493),
('timing', 0.58047401805583243),
('struggles', 0.57981849525294216),
('expressions', 0.57981849525294216),
('authentic', 0.57848427223980559),
('helen', 0.57763429343810091),
('pre', 0.57700753064729182),
('quirky', 0.5753641449035618),
('young', 0.57531672344534313),
('inner', 0.57454143815209846),
('mexico', 0.57443087372056334),
('clint', 0.57380042292737909),
('sisters', 0.57286101468544337),
('realism', 0.57226528899949558),
('personalities', 0.5720692490067093),
('french', 0.5720692490067093),
('surprises', 0.57113222999698177),
('overcome', 0.5697681593994407),
('timothy', 0.56953322459276867),
('tales', 0.56909453188996639),
('war', 0.56843317302781682),
('civil', 0.5679840376059393),
('countries', 0.56737779327091187),
('streep', 0.56710645966458029),
('oliver', 0.56673325570428668),
('australia', 0.56580775818334383),
('understanding', 0.56531380905006046),
('players', 0.56509525370004821),
('knowing', 0.56489284503626647),
('rogers', 0.56421349718405212),
('suspenseful', 0.56368911332305849),
('variety', 0.56368911332305849),
('true', 0.56281525180810066),
('jr', 0.56220982311246936),
('psychological', 0.56108745854687891),
('branagh', 0.55961578793542266),
('wealth', 0.55961578793542266),
('performing', 0.55961578793542266),
('odds', 0.55961578793542266),
('sent', 0.55961578793542266),
('reminiscent', 0.55961578793542266),
('grand', 0.55961578793542266),
('overwhelming', 0.55961578793542266),
('brothers', 0.55891181043362848),
('howard', 0.55811089675600245),
('david', 0.55693122256475369),
('generation', 0.55628799784274796),
('grow', 0.55612538299565417),
('survival', 0.55594605904646033),
('mainstream', 0.55574731115750231),
('dick', 0.55431073570572953),
('charm', 0.55288175575407861),
('kirk', 0.55278982286502287),
('twists', 0.55244729845681018),
('gangster', 0.55206858230003986),
('jeff', 0.55179306225421365),
('family', 0.55116244510065526),
('tend', 0.55053307336110335),
('thanks', 0.55049088015842218),
('world', 0.54744234723432639),
('sutherland', 0.54743536937855164),
('life', 0.54695514434959924),
('disc', 0.54654370636806993),
('bug', 0.54654370636806993),
('tribute', 0.5455111817538808),
('europe', 0.54522705048332309),
('sacrifice', 0.54430155296238014),
('color', 0.54405127139431109),
('superior', 0.54333490233128523),
('york', 0.54318235866536513),
('pulls', 0.54266622962164945),
('hearts', 0.54232429082536171),
('jackson', 0.54232429082536171),
('enjoy', 0.54124285135906114),
('redemption', 0.54056759296472823),
('hamilton', 0.5389965007326869),
('stands', 0.5389965007326869),
('trial', 0.5389965007326869),
('greek', 0.5389965007326869),
('each', 0.5388212312554177),
('faithful', 0.53773307668591508),
('jealous', 0.53714293208336406),
('documentaries', 0.53714293208336406),
('different', 0.53709860682460819),
('describes', 0.53680111016925136),
('shorts', 0.53596159703753288),
('brilliance', 0.53551823635636209),
('mountains', 0.53492317534505118),
('share', 0.53408248593025787),
('dealt', 0.53408248593025787),
('providing', 0.53329847961804933),
('explore', 0.53329847961804933),
('series', 0.5325809226575603),
('fellow', 0.5323318289869543),
('loves', 0.53062825106217038),
('olivier', 0.53062825106217038),
('revolution', 0.53062825106217038),
('roman', 0.53062825106217038),
('century', 0.53002783074992665),
('musical', 0.52966871156747064),
('heroic', 0.52925932545482868),
('ironically', 0.52806743020049673),
('approach', 0.52806743020049673),
('temple', 0.52806743020049673),
('moves', 0.5279372642387119),
('julie', 0.52609309589677911),
('tells', 0.52415107836314001),
('uncle', 0.52354439617376536),
('union', 0.52324814376454787),
('deep', 0.52309571635780505),
('reminds', 0.52157841554225237),
('famous', 0.52118841080153722),
('jazz', 0.52053443789295151),
('dennis', 0.51987545928590861),
('epic', 0.51919387343650736),
('shows', 0.51915322220375304),
('performed', 0.5191244265806858),
('demons', 0.5191244265806858),
('eric', 0.51879379341516751),
('discovered', 0.51879379341516751),
('youth', 0.5185626062681431),
('human', 0.51851411224987087),
('tarzan', 0.51813827061227724),
('ourselves', 0.51794309153485463),
('wwii', 0.51758240622887042),
('passion', 0.5162164724008671),
('desire', 0.51607497965213445),
('pays', 0.51581316527702981),
('fox', 0.51557622652458857),
('dirty', 0.51557622652458857),
('symbolism', 0.51546600332249293),
('sympathetic', 0.51546600332249293),
('attitude', 0.51530993621331933),
('appearances', 0.51466440007315639),
('jeremy', 0.51466440007315639),
('fun', 0.51439068993048687),
('south', 0.51420972175023116),
('arrives', 0.51409894911095988),
('present', 0.51341965894303732),
('com', 0.51326167856387173),
('smile', 0.51265880484765169),
('fits', 0.51082562376599072),
('provided', 0.51082562376599072),
('carter', 0.51082562376599072),
('ring', 0.51082562376599072),
('aging', 0.51082562376599072),
('countryside', 0.51082562376599072),
('alan', 0.51082562376599072),
('visit', 0.51082562376599072),
('begins', 0.51015650363396647),
('success', 0.50900578704900468),
('japan', 0.50900578704900468),
('accurate', 0.50895471583017893),
('proud', 0.50800474742434931),
('daily', 0.5075946031845443),
('atmospheric', 0.50724780241810674),
('karloff', 0.50724780241810674),
('recently', 0.50714914903668207),
('fu', 0.50704490092608467),
('horrors', 0.50656122497953315),
('finding', 0.50637127341661037),
('lust', 0.5059356384717989),
('hitchcock', 0.50574947073413001),
('among', 0.50334004951332734),
('viewing', 0.50302139827440906),
('shining', 0.50262885656181222),
('investigation', 0.50262885656181222),
('duo', 0.5020919437972361),
('cameron', 0.5020919437972361),
('finds', 0.50128303100539795),
('contemporary', 0.50077528791248915),
('genuine', 0.50046283673044401),
('frightening', 0.49995595152908684),
('plays', 0.49975983848890226),
('age', 0.49941323171424595),
('position', 0.49899116611898781),
('continues', 0.49863035067217237),
('roles', 0.49839716550752178),
('james', 0.49837216269470402),
('individuals', 0.49824684155913052),
('brought', 0.49783842823917956),
('hilarious', 0.49714551986191058),
('brutal', 0.49681488669639234),
('appropriate', 0.49643688631389105),
('dance', 0.49581998314812048),
('league', 0.49578774640145024),
('helping', 0.49578774640145024),
('stunts', 0.49561620510246196),
('traveling', 0.49532143723002542),
('thoroughly', 0.49414593456733524),
('depicted', 0.49317068852726992),
('honor', 0.49247648509779424),
('combination', 0.49247648509779424),
('differences', 0.49247648509779424),
('fully', 0.49213349075383811),
('tracy', 0.49159426183810306),
('battles', 0.49140753790888908),
('possibility', 0.49112055268665822),
('romance', 0.4901589869574316),
('initially', 0.49002249613622745),
('happy', 0.4898997500608791),
('crime', 0.48977221456815834),
('singing', 0.4893852925281213),
('especially', 0.48901267837860624),
('shakespeare', 0.48754793889664511),
('hugh', 0.48729512635579658),
('detail', 0.48609484250827351),
('guide', 0.48550781578170082),
('companion', 0.48550781578170082),
('julia', 0.48550781578170082),
('san', 0.48550781578170082),
('desperation', 0.48550781578170082),
('strongly', 0.48460242866688824),
('necessary', 0.48302334245403883),
('humanity', 0.48265474679929443),
('drama', 0.48221998493060503),
('warming', 0.48183808689273838),
('intrigue', 0.48183808689273838),
('nonetheless', 0.48183808689273838),
('cuba', 0.48183808689273838),
('planned', 0.47957308026188628),
('pictures', 0.47929937011921681),
('nine', 0.47803580094299974),
('settings', 0.47743860773325364),
('history', 0.47732966933780852),
('ordinary', 0.47725880012690741),
('primary', 0.47608267532211779),
('official', 0.47608267532211779),
('episode', 0.47529620261150429),
('role', 0.47520268270188676),
('spirit', 0.47477690799839323),
('grey', 0.47409361449726067),
('ways', 0.47323464982718205),
('cup', 0.47260441094579297),
('piano', 0.47260441094579297),
('familiar', 0.47241617565111949),
('sinister', 0.47198579044972683),
('reveal', 0.47171449364936496),
('max', 0.47150852042515579),
('dated', 0.47121648567094482),
('discovery', 0.47000362924573563),
('vicious', 0.47000362924573563),
('losing', 0.47000362924573563),
('genuinely', 0.46871413841586385),
('hatred', 0.46734051182625186),
('mistaken', 0.46702300110759781),
('dream', 0.46608972992459924),
('challenge', 0.46608972992459924),
('crisis', 0.46575733836428446),
('photographed', 0.46488852857896512),
('machines', 0.46430560813109778),
('critics', 0.46430560813109778),
('bird', 0.46430560813109778),
('born', 0.46411383518967209),
('detective', 0.4636633473511525),
('higher', 0.46328467899699055),
('remains', 0.46262352194811296),
('inevitable', 0.46262352194811296),
('soviet', 0.4618180446592961),
('ryan', 0.46134556650262099),
('african', 0.46112595521371813),
('smaller', 0.46081520319132935),
('techniques', 0.46052488529119184),
('information', 0.46034171833399862),
('deserved', 0.45999798712841444),
('cynical', 0.45953232937844013),
('lynch', 0.45953232937844013),
('francisco', 0.45953232937844013),
('tour', 0.45953232937844013),
('spielberg', 0.45953232937844013),
('struggle', 0.45911782160048453),
('language', 0.45902121257712653),
('visual', 0.45823514408822852),
('warner', 0.45724137763188427),
('social', 0.45720078250735313),
('reality', 0.45719346885019546),
('hidden', 0.45675840249571492),
('breaking', 0.45601738727099561),
('sometimes', 0.45563021171182794),
('modern', 0.45500247579345005),
('surfing', 0.45425527227759638),
('popular', 0.45410691533051023),
('surprised', 0.4534409399850382),
('follows', 0.45245361754408348),
('keeps', 0.45234869400701483),
('john', 0.4520909494482197),
('defeat', 0.45198512374305722),
('mixed', 0.45198512374305722),
('justice', 0.45142724367280018),
('treasure', 0.45083371313801535),
('presents', 0.44973793178615257),
('years', 0.44919197032104968),
('chief', 0.44895022004790319),
('closely', 0.44701411102103689),
('segments', 0.44701411102103689),
('lose', 0.44658335503763702),
('caine', 0.44628710262841953),
('caught', 0.44610275383999071),
('hamlet', 0.44558510189758965),
('chinese', 0.44507424620321018),
('welcome', 0.44438052435783792),
('birth', 0.44368632092836219),
('represents', 0.44320543609101143),
('puts', 0.44279106572085081),
('fame', 0.44183275227903923),
('closer', 0.44183275227903923),
('visuals', 0.44183275227903923),
('web', 0.44183275227903923),
('criminal', 0.4412745608048752),
('minor', 0.4409224199448939),
('jon', 0.44086703515908027),
('liked', 0.44074991514020723),
('restaurant', 0.44031183943833246),
('flaws', 0.43983275161237217),
('de', 0.43983275161237217),
('searching', 0.4393666597838457),
('rap', 0.43891304217570443),
('light', 0.43884433018199892),
('elizabeth', 0.43872232986464677),
('marry', 0.43861731542506488),
('oz', 0.43825493093115531),
('controversial', 0.43825493093115531),
('learned', 0.43825493093115531),
('slowly', 0.43785660389939979),
('bridge', 0.43721380642274466),
('thrilling', 0.43721380642274466),
('wayne', 0.43721380642274466),
('comedic', 0.43721380642274466),
('married', 0.43658501682196887),
('nazi', 0.4361020775700542),
('murder', 0.4353180712578455),
('physical', 0.4353180712578455),
('johnny', 0.43483971678806865),
('michelle', 0.43445264498141672),
('wallace', 0.43403848055222038),
('silent', 0.43395706390247063),
('comedies', 0.43395706390247063),
('played', 0.43387244114515305),
('international', 0.43363598507486073),
('vision', 0.43286408229627887),
('intelligent', 0.43196704885367099),
('shop', 0.43078291609245434),
('also', 0.43036720209769169),
('levels', 0.4302451371066513),
('miss', 0.43006426712153217),
('ocean', 0.4295626596872249),
...]

``````
``````

In [114]:

# words most frequently seen in a review with a "NEGATIVE" label
list(reversed(pos_neg_ratios.most_common()))[0:30]

``````
``````

Out[114]:

[('boll', -4.0778152602708904),
('uwe', -3.9218753018711578),
('seagal', -3.3202501058581921),
('unwatchable', -3.0269848170580955),
('stinker', -2.9876839403711624),
('mst', -2.7753833211707968),
('incoherent', -2.7641396677532537),
('unfunny', -2.5545257844967644),
('waste', -2.4907515123361046),
('blah', -2.4475792789485005),
('horrid', -2.3715779644809971),
('pointless', -2.3451073877136341),
('atrocious', -2.3187369339642556),
('redeeming', -2.2667790015910296),
('prom', -2.2601040980178784),
('drivel', -2.2476029585766928),
('lousy', -2.2118080125207054),
('worst', -2.1930856334332267),
('laughable', -2.172468615469592),
('awful', -2.1385076866397488),
('poorly', -2.1326133844207011),
('wasting', -2.1178155545614512),
('remotely', -2.111046881095167),
('existent', -2.0024805005437076),
('boredom', -1.9241486572738005),
('miserably', -1.9216610938019989),
('sucks', -1.9166645809588516),
('uninspired', -1.9131499212248517),
('lame', -1.9117232884159072),
('insult', -1.9085323769376259)]

``````
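
The scores in these two lists are log ratios of how often a word shows up in positive versus negative reviews. The sketch below reproduces that calculation on a tiny corpus, using the same formula that appears in `pre_process_data`. The `toy_reviews` and `toy_labels` data are made up for illustration, and the minimum-count filter is left out to keep the example short.

```python
from collections import Counter
import math

# Hypothetical toy data; the notebook itself uses 25,000 real reviews.
toy_reviews = ["great fun great", "awful waste", "great awful", "fun fun"]
toy_labels = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]

# Count word occurrences separately for each label.
toy_positive_counts = Counter()
toy_negative_counts = Counter()
for review, label in zip(toy_reviews, toy_labels):
    target = toy_positive_counts if label == "POSITIVE" else toy_negative_counts
    target.update(review.split(" "))

# Convert raw ratios to log ratios: positive words score above 0,
# negative words score below 0, and balanced words land near 0.
toy_ratios = {}
for word in set(toy_positive_counts) | set(toy_negative_counts):
    ratio = toy_positive_counts[word] / float(toy_negative_counts[word] + 1)
    if ratio > 1:
        toy_ratios[word] = math.log(ratio)
    else:
        toy_ratios[word] = -math.log(1 / (ratio + 0.01))

for word, score in sorted(toy_ratios.items(), key=lambda item: item[1]):
    print(word, round(score, 3))
```

Words that never appear in a positive review, such as "awful" here, bottom out at `-log(1 / 0.01)`, which is why the `+ 0.01` term is needed to avoid dividing by zero.
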
``````

In [22]:

from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure, show, output_file
from bokeh.io import output_notebook
output_notebook()

``````
``````

In [116]:

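# `normed` is dropped here: `density=True` already normalizes, and newer NumPy rejects `normed`
hist, edges = np.histogram(list(map(lambda x:x[1],pos_neg_ratios.most_common())), density=True, bins=100)

p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="Word Positive/Negative Affinity Distribution")
# draw one bar per histogram bin
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="#555555")
show(p)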

``````
``````

In [117]:

frequency_frequency = Counter()

for word, cnt in total_counts.most_common():
    frequency_frequency[cnt] += 1

``````
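
`frequency_frequency` maps each occurrence count to the number of distinct words that occur exactly that many times. A small sketch on made-up counts (the `toy_total_counts` values are assumptions, not taken from the dataset):

```python
from collections import Counter

# Hypothetical word counts standing in for `total_counts`.
toy_total_counts = Counter({"the": 5, "movie": 3, "noir": 3, "boll": 1, "uwe": 1, "prom": 1})

# For every count value, tally how many words share it.
toy_frequency_frequency = Counter()
for word, cnt in toy_total_counts.most_common():
    toy_frequency_frequency[cnt] += 1

print(sorted(toy_frequency_frequency.items()))  # prints [(1, 3), (3, 2), (5, 1)]
```

Here three words occur once, two occur three times, and one occurs five times. In natural-language corpora the low counts typically dominate, which is the motivation for a minimum-count filter.
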
``````

In [118]:

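# `normed` is dropped here: `density=True` already normalizes, and newer NumPy rejects `normed`
hist, edges = np.histogram(list(map(lambda x:x[1],frequency_frequency.most_common())), density=True, bins=100)

p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="The frequency distribution of the words in our corpus")
# draw one bar per histogram bin
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="#555555")
show(p)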

``````

# Reducing Noise by Strategically Reducing the Vocabulary

``````

In [19]:

import time
import sys
import numpy as np
from collections import Counter

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,min_count = 10,polarity_cutoff = 0.1,hidden_nodes = 10, learning_rate = 0.1):

        np.random.seed(1)

        self.pre_process_data(reviews, labels, polarity_cutoff, min_count)

        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels, polarity_cutoff, min_count):

        # count how often each word appears in positive and negative reviews
        positive_counts = Counter()
        negative_counts = Counter()
        total_counts = Counter()

        for i in range(len(reviews)):
            if(labels[i] == 'POSITIVE'):
                for word in reviews[i].split(" "):
                    positive_counts[word] += 1
                    total_counts[word] += 1
            else:
                for word in reviews[i].split(" "):
                    negative_counts[word] += 1
                    total_counts[word] += 1

        # positive-to-negative ratio for words seen at least 50 times
        pos_neg_ratios = Counter()

        for term,cnt in list(total_counts.most_common()):
            if(cnt >= 50):
                pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)
                pos_neg_ratios[term] = pos_neg_ratio

        # convert to log ratios so that neutral words land near 0
        for word,ratio in pos_neg_ratios.most_common():
            if(ratio > 1):
                pos_neg_ratios[word] = np.log(ratio)
            else:
                pos_neg_ratios[word] = -np.log((1 / (ratio + 0.01)))

        # keep words that are frequent enough and, if scored, polar enough
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                if(total_counts[word] > min_count):
                    if(word in pos_neg_ratios.keys()):
                        if((pos_neg_ratios[word] >= polarity_cutoff) or (pos_neg_ratios[word] <= -polarity_cutoff)):
                            review_vocab.add(word)
                    else:
                        review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        label_vocab = set()
        for label in labels:
            label_vocab.add(label)

        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1,input_nodes))
        self.layer_1 = np.zeros((1,hidden_nodes))

    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)

    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            # words filtered out of the vocabulary have no index, so skip them
            if(word in self.word2index.keys()):
                self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def train(self, training_reviews_raw, training_labels):

        # convert each review to the list of unique vocabulary indices it contains
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if(word in self.word2index.keys()):
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            #### Implement the forward pass here ####
            ### Forward pass ###

            # Input Layer

            # Hidden layer
            #             layer_1 = self.layer_0.dot(self.weights_0_1)
            self.layer_1 *= 0
            for index in review:
                self.layer_1 += self.weights_0_1[index]

            # Output layer
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            #### Implement the backward pass here ####
            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step

            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(layer_2 >= 0.5 and label == 'POSITIVE'):
                correct_so_far += 1
            if(layer_2 < 0.5 and label == 'NEGATIVE'):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer

        # Hidden layer
        self.layer_1 *= 0
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index.keys():
                unique_indices.add(self.word2index[word])
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]

        # Output layer
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

        if(layer_2[0] >= 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
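
The `train` and `run` methods skip the dense product `layer_0.dot(weights_0_1)` (left commented out in the class) and instead add up the weight rows of the words that actually occur in the review. The sketch below checks that the two give the same hidden layer when the input is a 0/1 vector. The sizes, random weights, and `active_indices` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_nodes = 8, 3           # hypothetical layer sizes
weights_0_1 = rng.normal(size=(vocab_size, hidden_nodes))

active_indices = [1, 4, 6]                # indices of the words present in one review

# Dense version: build the full 0/1 input vector and multiply.
layer_0 = np.zeros((1, vocab_size))
layer_0[0, active_indices] = 1
dense_layer_1 = layer_0.dot(weights_0_1)

# Sparse version: sum only the rows that belong to the active words.
sparse_layer_1 = np.zeros((1, hidden_nodes))
for index in active_indices:
    sparse_layer_1 += weights_0_1[index]

print(np.allclose(dense_layer_1, sparse_layer_1))  # prints True
```

The sparse version touches only as many rows as the review has distinct words, so its cost no longer grows with the size of the vocabulary.
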
``````

In [123]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000],min_count=20,polarity_cutoff=0.05,learning_rate=0.01)

``````
``````

In [124]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:99.9% Speed(reviews/sec):1371. #Correct:20461 #Trained:24000 Training Accuracy:85.2%

``````
``````

In [125]:

mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):1708.% #Correct:859 #Tested:1000 Testing Accuracy:85.9%

``````
``````

In [126]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000],min_count=20,polarity_cutoff=0.8,learning_rate=0.01)

``````
``````

In [127]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:99.9% Speed(reviews/sec):7089. #Correct:20552 #Trained:24000 Training Accuracy:85.6%

``````
``````

In [128]:

mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):3805.% #Correct:822 #Tested:1000 Testing Accuracy:82.2%

``````
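
The two runs above differ only in `polarity_cutoff`. With 0.8 instead of 0.05, training ran at roughly 7089 rather than 1371 reviews per second, while test accuracy dropped from 85.9% to 82.2%. The sketch below isolates the vocabulary filter to show why a higher cutoff leaves fewer words to process. `filter_vocab` is a hypothetical helper that mirrors the logic in `pre_process_data`, and the counts and scores are made up.

```python
def filter_vocab(total_counts, pos_neg_ratios, min_count, polarity_cutoff):
    """Keep words that are frequent enough and, if they have a score, polar enough."""
    kept = set()
    for word, count in total_counts.items():
        if count <= min_count:
            continue                      # too rare
        score = pos_neg_ratios.get(word)
        if score is None or abs(score) >= polarity_cutoff:
            kept.add(word)                # unscored words pass, as in the class
    return kept

# Hypothetical counts and log-ratio scores.
toy_counts = {"excellent": 120, "awful": 90, "movie": 5000, "the": 30000, "prom": 12}
toy_scores = {"excellent": 1.3, "awful": -2.1, "movie": -0.06, "the": 0.02}

loose = filter_vocab(toy_counts, toy_scores, min_count=20, polarity_cutoff=0.05)
strict = filter_vocab(toy_counts, toy_scores, min_count=20, polarity_cutoff=0.8)

print(sorted(loose))   # prints ['awful', 'excellent', 'movie']
print(sorted(strict))  # prints ['awful', 'excellent']
```

A stricter cutoff discards weakly polar words, so each review activates fewer inputs. That presumably accounts for the speed-up, and the discarded words presumably carried some of the signal that the lost accuracy depended on.
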

# Analysis: What's Going on in the Weights?

``````

In [20]:

mlp_full = SentimentNetwork(reviews[:-1000],labels[:-1000],min_count=0,polarity_cutoff=0,learning_rate=0.01)

``````
``````

In [21]:

mlp_full.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:99.9% Speed(reviews/sec):717.6 #Correct:20335 #Trained:24000 Training Accuracy:84.7%

``````
``````

In [23]:

Image(filename='sentiment_network_sparse.png')

``````
``````

In [24]:

def get_most_similar_words(focus = "horrible"):
    most_similar = Counter()

    for word in mlp_full.word2index.keys():
        most_similar[word] = np.dot(mlp_full.weights_0_1[mlp_full.word2index[word]],
                                    mlp_full.weights_0_1[mlp_full.word2index[focus]])

    return most_similar.most_common()

``````
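
`get_most_similar_words` ranks every word by the dot product between its row of `weights_0_1` and the row of the focus word. A self-contained sketch of the same idea on a hand-written weight matrix (the three words and their weights are assumptions, not values learned by `mlp_full`):

```python
from collections import Counter
import numpy as np

# Hypothetical vocabulary and input-to-hidden weights (one row per word).
toy_word2index = {"excellent": 0, "perfect": 1, "terrible": 2}
toy_weights_0_1 = np.array([
    [0.30, -0.10],
    [0.25, -0.05],
    [-0.40, 0.20],
])

def toy_most_similar(focus):
    """Rank all words by dot product with the focus word's weight row."""
    focus_vector = toy_weights_0_1[toy_word2index[focus]]
    scores = Counter()
    for word, index in toy_word2index.items():
        scores[word] = float(np.dot(toy_weights_0_1[index], focus_vector))
    return scores.most_common()

for word, score in toy_most_similar("excellent"):
    print(word, round(score, 3))
```

Words whose rows point the same way as the focus word score high, and words pointing the opposite way score negative. The dot product is not normalized, so a word with large weights can outrank a closer match with small weights.
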
``````

In [25]:

get_most_similar_words("excellent")

``````
``````

Out[25]:

[('excellent', 0.13672950757352462),
('perfect', 0.12548286087225946),
('amazing', 0.091827633925999672),
('today', 0.090223662694414217),
('wonderful', 0.089355976962214562),
('fun', 0.087504466674206888),
('great', 0.087141758882292031),
('best', 0.085810885617880611),
('liked', 0.077697629123843398),
('definitely', 0.076628781406965982),
('brilliant', 0.073423858769279038),
('loved', 0.073285428928122121),
('favorite', 0.072781136036160779),
('superb', 0.07173620717850504),
('fantastic', 0.070922191916266197),
('job', 0.069160617207634015),
('incredible', 0.066424077952614402),
('enjoyable', 0.065632560502888765),
('rare', 0.064819212662615075),
('highly', 0.063889453350970501),
('enjoyed', 0.062127546101812925),
('wonderfully', 0.062055178604090155),
('perfectly', 0.061093208811887373),
('fascinating', 0.060663547937493852),
('bit', 0.059655427045653034),
('gem', 0.059510859296156772),
('outstanding', 0.058860808147083013),
('beautiful', 0.058613934703162063),
('surprised', 0.058273314482562975),
('worth', 0.057657484236471213),
('especially', 0.057422020781760771),
('refreshing', 0.057310532092265755),
('entertaining', 0.056612033835629197),
('hilarious', 0.05616854103228662),
('masterpiece', 0.054993988649431565),
('simple', 0.054484083134924075),
('subtle', 0.054368883033508605),
('funniest', 0.05345716487130267),
('solid', 0.052903564743620644),
('awesome', 0.05248919420277038),
('always', 0.052260328525345262),
('noir', 0.05153019472640688),
('guys', 0.051109413645642678),
('sweet', 0.05081893031752599),
('unique', 0.050670162263589169),
('very', 0.050132994948528464),
('heart', 0.049948058498243582),
('moving', 0.04942460116437912),
('atmosphere', 0.048842500895912841),
('strong', 0.04857088063175919),
('remember', 0.048479036942291255),
('believable', 0.04841538439160379),
('shows', 0.048336045608039578),
('love', 0.047310648160924638),
('beautifully', 0.047118717440814889),
('both', 0.046957278901480319),
('terrific', 0.046686597975756625),
('touching', 0.046589962377280955),
('fine', 0.046256431328855763),
('caught', 0.046163326224782343),
('recommended', 0.045876341160885285),
('jack', 0.045352909975188316),
('everyone', 0.045145273964599379),
('episodes', 0.045064457062621285),
('classic', 0.044985816637932753),
('will', 0.044966672557930437),
('appreciate', 0.044764139584570858),
('powerful', 0.044176442621852781),
('realistic', 0.0435974822834648),
('performances', 0.043020249087841744),
('human', 0.042657925475092541),
('expecting', 0.042588442995212208),
('each', 0.042163774519666963),
('delightful', 0.041815007170235494),
('cry', 0.041750968395934819),
('enjoy', 0.041660091797818107),
('you', 0.041465994778271065),
('surprisingly', 0.041393139256517372),
('think', 0.041103720571057038),
('performance', 0.040844259420896839),
('nice', 0.040016506666931712),
('paced', 0.03994448864759962),
('true', 0.03975059264337067),
('tight', 0.039425438825552647),
('similar', 0.039222380170683482),
('friendship', 0.039110112764204286),
('somewhat', 0.03906961573101022),
('beauty', 0.038130922554738787),
('short', 0.037981700131409189),
('life', 0.037716639265310249),
('stunning', 0.037507364832543751),
('still', 0.037479827910101501),
('normal', 0.037422144669435109),
('works', 0.037255830186344166),
('appreciated', 0.037156165138066244),
('mind', 0.037080739403157759),
('twists', 0.036932552473074122),
('knowing', 0.036786021801572068),
('captures', 0.03646750688449471),
('certain', 0.036348359494082834),
('later', 0.03621004278676522),
('finest', 0.036132101827862646),
('compelling', 0.036098464918935765),
('others', 0.03609012020219609),
('tragic', 0.036005003580472768),
('viewing', 0.035933572455522977),
('above', 0.035886717849742573),
('them', 0.035717513281555736),
('matter', 0.035602710619685625),
('future', 0.035323777987573399),
('good', 0.035250130839512749),
('hooked', 0.035154077227307991),
('world', 0.035098777806455032),
('unexpected', 0.035078442502957774),
('innocent', 0.034765360696729197),
('tears', 0.034338309927008842),
('certainly', 0.034301037742714126),
('available', 0.034268101109488011),
('unlike', 0.034253988843446569),
('season', 0.034038922427011613),
('vhs', 0.034011519281018122),
('superior', 0.03391762273249576),
('unusual', 0.033797799688239358),
('genre', 0.033766115408287264),
('criminal', 0.033744472720326824),
('makes', 0.033587001877476604),
('greatest', 0.03343185227197535),
('small', 0.033426529870538395),
('episode', 0.033336443796849899),
('deal', 0.033336107665281924),
('now', 0.033283339034235505),
('quiet', 0.033147935977529276),
('played', 0.033108782201536791),
('day', 0.033074949731286586),
('moved', 0.032873980754099884),
('underrated', 0.032738818192726324),
('society', 0.032613580418616235),
('focuses', 0.032607333858382818),
('intense', 0.032564318613854969),
('sharp', 0.032309211040923339),
('check', 0.032030541149668801),
('take', 0.031717140193258622),
('deeply', 0.031693099458454561),
('games', 0.03166349528572017),
('pre', 0.031251131973427111),
('change', 0.031183353959862565),
('thanks', 0.031172398048464698),
('own', 0.03112133794334707),
('easy', 0.031088479340529641),
('pace', 0.03093436149167823),
('parts', 0.030850186028628303),
('truly', 0.030836637734471671),
('tony', 0.030739434811745025),
('inspired', 0.030725453849735001),
('thought', 0.030707437377997408),
('complex', 0.030464622676702042),
('worlds', 0.030391255174782039),
('language', 0.03026497620030956),
('soundtrack', 0.030210032139046033),
('steals', 0.030207167115964783),
('ride', 0.029801794809751706),
('came', 0.029760628313031532),
('impact', 0.029695785634015842),
('personally', 0.029677477012254878),
('gritty', 0.029540021762614992),
('effective', 0.029512382123355347),
('wise', 0.029510408701830332),
('ultimate', 0.029442440672320932),
('ways', 0.02943934179284419),
('well', 0.029238386207701295),
('sent', 0.029147924396380077),
('after', 0.029037668915531285),
('tells', 0.029004383695691471),
('along', 0.028932972901634893),
('modern', 0.028910642159349308),
('family', 0.028897380662865534),
('pleasantly', 0.028754280601052389),
('edge', 0.02874468747624128),
('american', 0.028706398764554442),
('england', 0.028640930969798108),
('grand', 0.02858110240637193),
('slowly', 0.028470328912922983),
('treat', 0.028418097520915946),
('pleasure', 0.02837070411200417),
('living', 0.028335845213660421),
('impressed', 0.028311856507726555),
('fans', 0.028234674336798968),
('suspenseful', 0.028156658725541142),
('smile', 0.02806565183459761),
('jim', 0.027910842672277562),
('saw', 0.027900239466183013),
('length', 0.027896431301274532),
('impressive', 0.027894778243362794),
('times', 0.027869981332762559),
('witty', 0.027809121334036416),
('flawless', 0.027676409302939117),
('magic', 0.027671001404746015),
('though', 0.027434087841071524),
('subtitles', 0.02743198117938046),
('stands', 0.02734851854841645),
('freedom', 0.027271908118037379),
('relationship', 0.027231146375769136),
('tape', 0.027213179198573838),
('apartment', 0.027198859160909989),
('shown', 0.027062169058709857),
('films', 0.027035590529373481),
('lot', 0.026934527370476375),
('barbara', 0.026837141036193602),
('office', 0.026775230449656282),
('damn', 0.026751196837598828),
('murder', 0.026709073212876626),
('brilliantly', 0.026701889741880671),
('learns', 0.026699872569574595),
('tends', 0.02668377436133576),
('complaint', 0.026587011626106858),
('themselves', 0.026524658938498969),
('war', 0.026518675436425346),
('violence', 0.026450628158076143),
('judge', 0.026443267774947338),
('thriller', 0.026431555027632114),
('his', 0.026370773394088613),
('finding', 0.026362279892885004),
('cast', 0.026360860883736618),
('police', 0.026352129453305256),
('once', 0.026255817642908224),
('spectacular', 0.026245466997092372),
('deserves', 0.026214508159961684),
('driven', 0.026194930792511638),
('spot', 0.026171686780563669),
('carrey', 0.026162838804053026),
('negative', 0.026161677045062219),
('suspense', 0.026110016575822789),
('flaws', 0.026085421601700295),
('brave', 0.026080835779725298),
('surprising', 0.026070851171974708),
('gives', 0.026069978044960768),
('takes', 0.026047493401813327),
('light', 0.025921067904644501),
('timing', 0.025900303450693638),
('crime', 0.025886011572638652),
('thank', 0.025873161609513372),
('century', 0.02587105631011263),
('until', 0.025870245942132507),
('nature', 0.025817942935875453),
('stellar', 0.025803971141651155),
('emotions', 0.025783809728671912),
('tremendous', 0.025772614605786559),
('missed', 0.025657501028952572),
('overall', 0.025655652485101776),
('haven', 0.025650692177140791),
('portrayal', 0.025594273657909627),
('taylor', 0.025516992710898162),
('appropriate', 0.025495908849901629),
('joan', 0.025489829859140629),
('realize', 0.025452457061382182),
('different', 0.02543407397006044),
('return', 0.025384569542597581),
('bound', 0.025380084410398834),
('noticed', 0.02530649499844077),
('constantly', 0.025282186745762457),
('first', 0.025246100888919792),
('lovable', 0.025213500492273062),
('comic', 0.025074597800944055),
('scared', 0.024995376513809509),
('fight', 0.024943209945836396),
('extraordinary', 0.024940366453083611),
('know', 0.024749519416087051),
('brothers', 0.024675058346350743),
('action', 0.024660907824635262),
('needs', 0.024634851651549335),
('jerry', 0.02462148438534386),
('while', 0.024620233313683841),
('also', 0.024519480987472433),
('definite', 0.024509585305468838),
('genius', 0.024500478757646955),
('tragedy', 0.024481339186882275),
('heard', 0.024446567944460477),
('haunting', 0.024431007352898926),
('legendary', 0.02441277726490897),
('uses', 0.024358972452014002),
('years', 0.024316094895735246),
('notch', 0.024310571597216266),
('fabulous', 0.024258810824927635),
('herself', 0.024241390957491057),
('battle', 0.024205827940178122),
('ralph', 0.024205046194653326),
('provoking', 0.024106106062481807),
('ago', 0.024024541904156496),
('game', 0.024004541901512372),
('deals', 0.02394702024903099),
('themes', 0.023936597120221115),
('my', 0.023928374753346037),
('which', 0.023908264765228698),
('together', 0.02388768394280821),
('record', 0.023879473557965502),
('chilling', 0.023877413677317435),
('absorbing', 0.023848541510400112),
('studios', 0.023840610970325336),
('helps', 0.023800338082370958),
('paul', 0.023782537407117978),
('drama', 0.023766688862014711),
('spots', 0.023727534480488408),
('japanese', 0.02370847543051147),
('com', 0.023663537310393355),
('meets', 0.023649415936523126),
('may', 0.023577512715288872),
('goal', 0.02357199244925659),
('out', 0.023558753773465096),
('page', 0.023530160671184863),
('con', 0.023523200814540533),
('thankfully', 0.023405004970711695),
('number', 0.023389568775323531),
('captured', 0.023351056068531193),
('joy', 0.023338854638575421),
('brought', 0.023336907813285936),
('max', 0.023250909447975868),
('superbly', 0.023239871167515597),
('those', 0.023176845007530665),
('course', 0.023170128305056523),
('inspiring', 0.023124940469820013),
('troubled', 0.023104553288143287),
('starring', 0.023098181939380305),
('famous', 0.023080990484234912),
('gripping', 0.023039160339941953),
('identity', 0.023038352369265169),
('many', 0.023030059748964153),
('victor', 0.023028627724258649),
('michael', 0.022946522358330855),
('stop', 0.022927047859442076),
('eerie', 0.022877301562370816),
('seen', 0.022820929217422629),
('caused', 0.022791670672167533),
('moment', 0.022789062338184275),
('portraying', 0.022729334983088951),
('influence', 0.022698569029077062),
('when', 0.022541791159242781),
('touched', 0.022525639292270201),
('complicated', 0.022432126566344631),
('turns', 0.022415566693423837),
('young', 0.022415228068631974),
('award', 0.022414761392271602),
('put', 0.022325849008177176),
('trust', 0.022301497663936395),
('issues', 0.02225775337618751),
('innocence', 0.022236928993752819),
('anime', 0.022201683728338893),
('without', 0.02214454398785886),
('himself', 0.022068240705874407),
('charlie', 0.02205203730146018),
('parents', 0.021888138202371763),
('covered', 0.02188753333796175),
('final', 0.021877215769079549),
('killers', 0.021830664900395119),
('ages', 0.021774376677575584),
('usual', 0.021760980512718141),
('physical', 0.021749103191221798),
('like', 0.021730991541426742),
('crazy', 0.021727382570242992),
('puts', 0.021725737321791543),
('got', 0.021701574500289096),
('room', 0.021690968569465629),
('complaints', 0.021670426593916568),
('type', 0.021663628982945167),
('brings', 0.021600600975875413),
('remarkable', 0.021576791719396034),
('get', 0.021538325389801369),
('city', 0.021523385378314882),
('coming', 0.021492351614142778),
('romantic', 0.021420587536168552),
('cinema', 0.021411776829230966),
('regular', 0.021395882255575833),
('intelligent', 0.021391350897315427),
('music', 0.021381013806527443),
('humor', 0.021365697759571513),
('experience', 0.021314525649372935),
('favourite', 0.02125347648387825),
('social', 0.021250085255237357),
('feelings', 0.021245030895714345),
('cried', 0.021233271641070747),
('rock', 0.02121328002983236),
('against', 0.021157314119587243),
('including', 0.021156674122491399),
('honest', 0.02114345875879349),
('parallel', 0.021107353247706448),
('eddie', 0.021080182147252723),
('crafted', 0.020979194953745086),
('more', 0.02093379734319379),
('glued', 0.02093198872193016),
('insanity', 0.020914935599101146),
('thoroughly', 0.020905661542252759),
('eyes', 0.020868013291281091),
('jr', 0.020865268971014535),
('dramas', 0.020836398428109217),
('follows', 0.020814937146708408),
('situation', 0.020814821105666462),
('understood', 0.020749677092470175),
('face', 0.020701739464945038),
('albeit', 0.020680340389878413),
('memorable', 0.020608260124115527),
('accurate', 0.020585303033408747),
('under', 0.020574430698374231),
('arthur', 0.020562083939889477),
('elderly', 0.020545350471808114),
('opinion', 0.020539570922797755),
('whoopi', 0.020515675744150079),
('helped', 0.02047624233713052),
('detract', 0.020443807698341677),
('flawed', 0.020436371691432333),
('unusually', 0.020433523835905333),
('performing', 0.020396957567555725),
('smooth', 0.020347681451465368),
('magnificent', 0.020334637688102838),
('desperation', 0.02028776899905723),
('lose', 0.02027753568325787),
('satisfying', 0.020251527110272068),
('friend', 0.020227651020398935),
('kudos', 0.020201477326926613),
('breaking', 0.020117861519854292),
('elephant', 0.020115783447057042),
('colors', 0.020112155987764876),
('willing', 0.020087728040224326),
('fresh', 0.02005401912359376),
('offers', 0.020003415308141065),
('provides', 0.020002909565985012),
('guilt', 0.019987917970659564),
('shouldn', 0.019907879458024347),
('japan', 0.019906368589571698),
('secrets', 0.019876976104814387),
('obligatory', 0.019789665431840405),
('dvd', 0.019782796187823429),
('tale', 0.019752149872839884),
('since', 0.019726258912690298),
('roles', 0.019710495505207995),
('breathtaking', 0.019705824135660525),
('ground', 0.019687236524961869),
('higher', 0.019670526139537556),
('jean', 0.019665400087401592),
('rich', 0.019653095716660716),
('right', 0.019629293580435747),
('stone', 0.0196105959056691),
('lives', 0.01961034893671014),
('it', 0.019542002303277555),
('essential', 0.01953386009392041),
('tend', 0.019523404457496819),
('places', 0.019510216587218014),
('recommend', 0.019506211559818108),
('loy', 0.019481148560970923),
('tell', 0.019450286669268766),
('challenge', 0.019374490591710928),
('fiction', 0.019350601498735361),
('able', 0.019340445094151421),
('animated', 0.019333069625267079),
('complain', 0.019332028796550112),
('deeper', 0.019318681931941164),
('blew', 0.019304454395430125),
('seeing', 0.019302442445035529),
('release', 0.019209904006239131),
('unfolds', 0.019184703456013679),
('boys', 0.019177414753158387),
('favorites', 0.019160378141489524),
('throughout', 0.019136892845690673),
('marvelous', 0.019110015321943563),
('relax', 0.019044075162625462),
('desire', 0.019016117204605987),
('end', 0.019014420138293214),
('questions', 0.018977699968684838),
('man', 0.018956744494720245),
('rea', 0.018928733395777456),
('vengeance', 0.018908638777923942),
('brian', 0.018906876323023587),
('learned', 0.01889994792370445),
('lovely', 0.018854980464698644),
('seasons', 0.018852496578683823),
('shines', 0.018827509959493258),
('justice', 0.018827310862034669),
('succeeds', 0.018776998522312769),
('discovered', 0.018766802216817063),
('touch', 0.018762806738861472),
('white', 0.018743225697414191),
('bitter', 0.018724701999912878),
('knows', 0.01871906328874429),
('gene', 0.018660060796556237),
('mainstream', 0.018654252436913901),
('raw', 0.018609728881254825),
('focus', 0.018605078305494939),
('won', 0.018597537876871639),
('ve', 0.018560162581379304),
('million', 0.018514133006256917),
('attention', 0.018406547682637144),
('river', 0.018403383531225694),
('classics', 0.018375185367387345),
('quirky', 0.018358100535754599),
('although', 0.018350252973821906),
('september', 0.018345012211358883),
('emotional', 0.01832716507095174),
('events', 0.01832455447591811),
('released', 0.018304767183625538),
('thus', 0.018302709016086102),
('rules', 0.018298967789718675),
('trilogy', 0.018261985922288494),
('jackie', 0.018261017705562571),
('country', 0.018248984107628784),
('find', 0.018220001120247339),
('sure', 0.018205281970545894),
('overlooked', 0.01817364459210739),
('sensitive', 0.018173518786609135),
('harsh', 0.018143998075916396),
('chair', 0.018127987063468094),
('neatly', 0.018123044612179433),
('round', 0.018082305853658363),
('strength', 0.018042558269708915),
('aunt', 0.018028313353173651),
('description', 0.017997557340833973),
('perspective', 0.017974761193339694),
('closer', 0.017945066423908043),
('extra', 0.017934760731343116),
('hit', 0.017910740181690348),
('tough', 0.017904509470376237),
('work', 0.017882494289916093),
('captivating', 0.01787507230892095),
('swim', 0.017853354272014843),
('holmes', 0.017846058193393119),
('unlikely', 0.017843839699452125),
('fears', 0.017838067451752794),
('nominated', 0.0178374393045206),
('neat', 0.01782306847491319),
('discovers', 0.017801301834152447),
('paris', 0.01779805788420007),
('streets', 0.017746147480597593),
('realism', 0.017729724930388029),
('travel', 0.017694257020940293),
('keep', 0.017684400089090099),
('anyway', 0.017675995400919457),
('realizes', 0.017618932935696142),
('variety', 0.017618487604827659),
('chief', 0.017603963834362808),
('broke', 0.017601657476194944),
('craven', 0.017597613499935324),
('moves', 0.017559744221771676),
('see', 0.017554713803040193),
('intellectual', 0.017537349329235133),
('normally', 0.017511237908563505),
('technique', 0.0175022650778302),
('dancer', 0.017501395365645257),
('awe', 0.017467446640641395),
('technology', 0.017414969148737202),
('kelly', 0.017380794671638257),
('particular', 0.017380503339109222),
('awards', 0.017343067374305077),
('twisted', 0.0173427316555122),
('manager', 0.017337683585341688),
('fantasy', 0.017314736380004723),
('blake', 0.017282963990552191),
('criticism', 0.017279558676803669),
('identify', 0.017277471199843665),
('collection', 0.017253533052260926),
('sidney', 0.017239120845031548),
('ironic', 0.017225809884120875),
('score', 0.017223046869263518),
('charm', 0.017204164112517871),
('lonely', 0.017192972607511965),
('recall', 0.01718951228267028),
('dream', 0.017185607849471301),
('known', 0.017169341473045805),
('hoffman', 0.017123937023014246),
('taking', 0.017102244694823313),
('color', 0.017086755659474456),
('existed', 0.017084491834780034),
('mel', 0.017080644125498475),
('treats', 0.017076365809061664),
('kennedy', 0.017063054110179412),
('millionaire', 0.017058120181534065),
('stewart', 0.01701786393539512),
('soon', 0.017016949690113498),
('style', 0.016978446616527424),
('urban', 0.01696177374188855),
('sides', 0.016958377563876283),
('nicely', 0.016956584044665043),
('survive', 0.01695320106620354),
('contrast', 0.016949017788907707),
('granted', 0.016948500759420799),
('wes', 0.016856895803564035),
('heroic', 0.016849533387674559),
('faults', 0.016833966998505426),
('walter', 0.016813645209614796),
('exceptional', 0.016810242985337294),
('dangerous', 0.016796058008032438),
('fan', 0.016737120507724371),
('witch', 0.016717085914917339),
('occasionally', 0.016711349636820468),
('movies', 0.016676687954063647),
('celebration', 0.016664197566723733),
('castle', 0.016661909651854559),
('catch', 0.016647995152024701),
('its', 0.016639302941262289),
('tribute', 0.016629617927918797),
('jimmy', 0.016625132101972986),
('bravo', 0.01661675415646004),
('enjoying', 0.016613140144305667),
('bus', 0.016593157501778099),
('documentary', 0.016564651461285371),
('frightening', 0.016559987706802767),
('guilty', 0.016536110253664235),
('slightly', 0.016526421724199342),
('is', 0.016511509443399758),
('chan', 0.016507204515006663),
('mixed', 0.016506847567311397),
('curious', 0.016506488394564579),
('spirit', 0.016502977044099081),
('most', 0.016476759333214065),
('chemistry', 0.016425356343989044),
('age', 0.016410666314929878),
('understanding', 0.016345696202945559),
('marie', 0.016341053241072701),
('dreams', 0.016332672013556312),
('again', 0.016287090973937747),
('union', 0.016282379359022551),
('spy', 0.016278154923785915),
('presented', 0.016273043238663489),
('steele', 0.016260993339006803),
('lay', 0.01625999545879786),
('plenty', 0.01624719418983283),
('horrors', 0.016246022980305589),
('black', 0.016223176851856817),
('comedy', 0.01622040802201059),
('winner', 0.0162203188573984),
('african', 0.016214456609794946),
('drummer', 0.016178152199513924),
('entertainment', 0.016173112007890945),
('delivers', 0.016166599465683076),
('stays', 0.016139476352793784),
('america', 0.016108896341111487),
('disappoint', 0.016066615933996442),
('gorgeous', 0.016062350166815054),
('sisters', 0.016060080355840684),
('subsequent', 0.016043574203873975),
('cerebral', 0.016039058904070029),
('french', 0.016038425317363183),
('perfection', 0.016033154869346932),
('likable', 0.016021713396124571),
('warm', 0.016019144095827342),
('studio', 0.016007232818464591),
('late', 0.015997923350457081),
('reality', 0.015978872249423726),
('showed', 0.015938750644323929),
('figures', 0.01592744660892324),
('ever', 0.015926454600790643),
('italy', 0.015909186780479357),
('accustomed', 0.015906246911558279),
('into', 0.015892173681617976),
('he', 0.015866239932092338),
('journey', 0.015817191390925522),
('waters', 0.0158009068788263),
('bill', 0.015785976148791337),
('cousin', 0.015784382710801671),
('explores', 0.015768756345569589),
('originally', 0.015766016465315408),
('astonishing', 0.015741175347778347),
('mouse', 0.015739473070555076),
('affect', 0.01571979846044326),
('authenticity', 0.015716491136675281),
('key', 0.015706372736941261),
('authorities', 0.015700111946298497),
('fortunately', 0.015676427069879848),
('notes', 0.015668388567765468),
('disagree', 0.015659822231464247),
('contribution', 0.015651919381489538),
('flaw', 0.015630623175485556),
('burning', 0.015593951152590362),
('scoop', 0.015580911014213493),
('levels', 0.015579506047588169),
('reveals', 0.015552631094426428),
('explicit', 0.015535052542383238),
('fault', 0.015532818014787668),
('requires', 0.015440001642516231),
('way', 0.015434313286947601),
('waitress', 0.015433929845739224),
('vividly', 0.015399209375312219),
('truman', 0.015388667015530332),
('leslie', 0.015388355420398653),
('cool', 0.015362419182461003),
('i', 0.015358846209804482),
('dated', 0.01535189493470787),
('ruthless', 0.015347223840634985),
('anymore', 0.015327840988573713),
('batman', 0.015325445892906488),
('york', 0.01532365079728272),
('expressions', 0.015290943599335199),
('terms', 0.015285161966075779),
('sunday', 0.015279982329904816),
('chinese', 0.015240680418926652),
('done', 0.015230733309302687),
('behind', 0.015219079842199838),
('event', 0.015214794169662826),
('chamberlain', 0.015214082741427186),
('mysteries', 0.01520455675940992),
('manages', 0.015203486934632015),
('simpsons', 0.015191849812926213),
('mine', 0.015191085212402703),
('purple', 0.015100505661562468),
('website', 0.015095063701722864),
('master', 0.01509152869655765),
('charming', 0.015088362486196539),
('joe', 0.01508192017787815),
('reservations', 0.015077821343474077),
('fever', 0.015076873583983718),
('covers', 0.015047233453258807),
('glimpse', 0.014991086926970954),
('pilot', 0.014978443271049677),
('johansson', 0.014975808461544405),
('explains', 0.014970512080227464),
('excellently', 0.014970388571598848),
('hawke', 0.01496975010993136),
('genuinely', 0.014947672770702568),
('often', 0.014942833143544474),
('cube', 0.014939928709365356),
('clean', 0.014937853229023522),
('ensemble', 0.014913656909087875),
('referred', 0.014910582069880152),
('replies', 0.014907131594945567),
('disease', 0.014895193110452173),
('wish', 0.014892245549307043),
('logical', 0.014888665766304057),
('nathan', 0.014869928851670402),
('aware', 0.01486986711289452),
('exciting', 0.014823139694980614),
('gone', 0.014821497224651535),
('critics', 0.014818559383907356),
('split', 0.014788117032985612),
('series', 0.014770708703162182),
('henry', 0.014757735101897452),
('prisoners', 0.014747710184003867),
('sentenced', 0.014746219906503842),
('laughing', 0.014722151818909786),
('president', 0.014671766779490544),
('list', 0.014666775185665164),
('ones', 0.01465899785410932),
('information', 0.014651687169784215),
('bonus', 0.014648059891508171),
('chicago', 0.014631769872667611),
('someday', 0.014629340475262568),
('splendid', 0.014609703424340649),
('surprises', 0.014608824054662468),
('sentimental', 0.014591361045287955),
('previously', 0.014571223247118625),
('conveys', 0.014567143509152123),
('prominent', 0.01454736311408328),
('born', 0.014536990751946699),
('necessary', 0.014533225697989453),
('yes', 0.014531704633026978),
('marvel', 0.014527554209112409),
('initially', 0.014510187714555967),
('jake', 0.014502509408478864),
('matters', 0.01449773042608421),
('lucas', 0.014496736417950695),
('stories', 0.014475382661229963),
('happy', 0.014471040644253806),
('improvement', 0.014459225025278393),
('anger', 0.01444069696929931),
('hong', 0.014412020732763238),
('devotion', 0.014406165594180752),
('infamous', 0.014402483161136861),
('sir', 0.014390585849942563),
('fashioned', 0.014376495163092877),
('whenever', 0.014311984840844727),
('facing', 0.014311813694297498),
('spin', 0.014300937890947244),
('clear', 0.014297831903635035),
('verhoeven', 0.014290838087095132),
('onto', 0.014287704198288412),
('sheriff', 0.014266680346279261),
('boy', 0.0142383932121725),
('felix', 0.014236371593101711),
('what', 0.014231196728127856),
('site', 0.01421283932921704),
('hits', 0.014208508715996906),
('convincingly', 0.014165838532387459),
('multiple', 0.014150723728410523),
('wrapped', 0.014118759103459127),
('reveal', 0.01407651065382279),
('toby', 0.01407522149311176),
('months', 0.014061986005374691),
('comedies', 0.014050301808876078),
('shot', 0.014031987455271896),
('holds', 0.014023504904484214),
('weeks', 0.014002257803042338),
('window', 0.013985434541614843),
('him', 0.013968181093938303),
('court', 0.013964352058193527),
('double', 0.013960483190947275),
('refuses', 0.013957613385590659),
('stand', 0.01394881385922137),
('shocked', 0.013935157243261928),
('powell', 0.013934062441977023),
('brutal', 0.013924129605946689),
('among', 0.013913156765292948),
('prostitute', 0.013911765274631796),
('nine', 0.013882343344720896),
('timeless', 0.013858274395499411),
('likes', 0.013844971514262236),
('kurosawa', 0.013820064338774894),
('fact', 0.013814297186034387),
('ass', 0.013813899781949799),
('deanna', 0.013799520782801162),
('almost', 0.013791517357271339),
('technicolor', 0.013790541990858995),
('gerard', 0.013776140434137591),
('analysis', 0.013764039325045371),
('mid', 0.013747853289146213),
('stanwyck', 0.013738927891779258),
('mann', 0.013726915645691881),
('stuart', 0.013700229069235782),
('reluctantly', 0.013697113976504024),
('humanity', 0.01369083073691104),
('classical', 0.013688949911986586),
('health', 0.013684784640613444),
('edie', 0.013683859176013941),
('british', 0.013666460250876436),
('primary', 0.013661794714033906),
('coaster', 0.013660631014138395),
('explore', 0.013656042478726909),
('china', 0.013638756081011151),
('protagonists', 0.013627593648932781),
('partly', 0.013617059618125359),
('artist', 0.013597123465502839),
('terrifying', 0.013581203319898153),
('scarlett', 0.013567078625941564),
('mesmerizing', 0.01354781689947941),
('prince', 0.013541105943095598),
('weird', 0.013535346249579566),
('vance', 0.013518150392608121),
('collect', 0.013513303578887652),
('humour', 0.013508890166677978),
('doc', 0.013507286431402924),
('history', 0.013506120200788268),
('miss', 0.01349818799089743),
('angles', 0.013497507265665435),
('dealers', 0.013493607234383895),
('mass', 0.013472328625932874),
('paramount', 0.013467546662344522),
('musicians', 0.013464517138686273),
('jackman', 0.013441428735872098),
('cheer', 0.013440230376864145),
('aired', 0.013427957547366854),
('personal', 0.013422418887670071),
('become', 0.013415910991211793),
('wang', 0.013406655764270567),
('unforgettable', 0.013405651085753997),
('theme', 0.013397995857105537),
('satisfy', 0.01336101263463744),
('beginning', 0.013353575498360082),
('tongue', 0.013332587937334757),
('ran', 0.013322580056022444),
('vh', 0.013321694862247338),
('april', 0.013317958082689022),
('cracking', 0.01331648265485188),
('hilariously', 0.013312111975215814),
('factory', 0.013302408850101527),
('bloom', 0.013287106893282025),
('outcome', 0.013278893812795744),
('startling', 0.013276469703553513),
('portrait', 0.01327305510099926),
('raines', 0.013257908724754863),
('sky', 0.013252502620889894),
('earlier', 0.013233110743632559),
('atlantis', 0.01322818861014456),
('delirious', 0.013226874818125445),
('titanic', 0.013205633401144466),
('nevertheless', 0.013198200611184941),
('proved', 0.013189760358384484),
('denzel', 0.013188430841614765),
('pleasant', 0.013180077348723361),
('horses', 0.013178651568029467),
('astounding', 0.013161698337226808),
('savage', 0.013154100553759934),
('winning', 0.01315324670837965),
('rose', 0.013145586701309777),
('fitting', 0.013133578254330347),
('compared', 0.013131693803520051),
('took', 0.013119343481498985),
('masterson', 0.013112762074217891),
('owner', 0.013108690454819136),
('delight', 0.013107278788311012),
('conventions', 0.01310603977069605),
('natali', 0.013094964441143215),
('message', 0.013093664295113416),
('stood', 0.013090122718303425),
('sailor', 0.01305895917042345),
('ida', 0.013058842950256232),
('escaping', 0.01305272362470678),
('top', 0.013047466741024414),
('louis', 0.013046238442637009),
('peace', 0.013040907918892328),
('several', 0.01302824488706027),
('info', 0.013023754625550174),
('graphics', 0.013020850288881849),
('reflection', 0.013019243823940105),
('slimy', 0.013014377070231845),
('elvira', 0.013009811638957064),
('andre', 0.01300004731344674),
('kong', 0.012999080313300528),
('mayor', 0.012994758409723564),
('punishment', 0.012988264949614938),
('morris', 0.012983710119604964),
('hall', 0.012981593609354808),
('match', 0.012980233583057324),
('bleak', 0.012972505086304058),
('lindy', 0.01297224893312126),
('sequence', 0.012964435808713573),
('learn', 0.012938848970083345),
('happen', 0.012932836387873745),
('john', 0.012929524979001666),
('gothic', 0.012926957011734876),
('wider', 0.012920985981480958),
('popular', 0.012891690509844083),
('diverse', 0.012875263936567813),
('compare', 0.012869395292065187),
('brooklyn', 0.012852986243263932),
('zane', 0.012834302957709145),
('andrew', 0.012824020940615251),
('finely', 0.012822716004015855),
('confronted', 0.012817523686608621),
('going', 0.012809762839304961),
('likewise', 0.012804639349082516),
('breath', 0.012790132659417912),
('building', 0.012789809704793867),
('suggesting', 0.012780624321169345),
('contemporary', 0.012772749462937513),
('midnight', 0.012766963563112075),
('victoria', 0.012756422131580528),
('lasting', 0.012752424415642593),
('kitty', 0.012751468371946009),
('continued', 0.012744325456485397),
('indian', 0.012712962842718674),
('subplots', 0.012709887814283906),
('douglas', 0.012693830679455896),
('explosions', 0.012692697593201855),
('bond', 0.012689802823687821),
('delightfully', 0.012669417460922622),
('understated', 0.012669374312789351),
('greater', 0.012664580396020154),
('sailing', 0.012662424581282427),
('images', 0.012661803048859875),
('copy', 0.012624649645734159),
('seat', 0.012610464273152518),
('eleven', 0.012602533659978888),
('riveting', 0.012591829460094515),
('boiled', 0.01258886352963876),
('whilst', 0.01256984165329564),
('heaven', 0.012547361621330921),
('fruit', 0.012543513029693252),
('reviewer', 0.012534273375083893),
('cost', 0.012529643005796618),
('week', 0.01252284501500827),
('intriguing', 0.012508687653306356),
('streak', 0.012507752385208551),
('san', 0.012502130058217922),
('awareness', 0.012476446442012446),
('catching', 0.012467108595451535),
('kicks', 0.012457714930570582),
('complexities', 0.012454362663082466),
('draws', 0.012447753285125911),
('easily', 0.012444885855614905),
('ealing', 0.012444339255708921),
('psychopath', 0.012431259926282277),
('skin', 0.012424248540973574),
('creative', 0.012386713452491526),
('recognition', 0.012354025801439423),
('downey', 0.012348698765161131),
('symbolism', 0.012329925038271326),
('touches', 0.012328013470751463),
('everyday', 0.012324934809895896),
('achieves', 0.012314898707483493),
('outcast', 0.012313662230219676),
('overwhelmed', 0.012306633138869472),
...]

``````
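The ranked lists in this section are the output of `get_most_similar_words`, whose definition is not shown here. As a rough sketch only, one plausible way to produce such a ranking is to score every vocabulary word by the dot product between its weight vector and the focus word's weight vector, then sort by score. The names `rank_by_dot_product`, `dot`, and `toy_word_vectors` are hypothetical, and the numbers are made-up toy values, not the notebook's trained weights.

```python
from collections import Counter

# Hypothetical toy data: each word maps to a small made-up weight vector.
# In a trained network these would be rows of a learned weight matrix.
toy_word_vectors = {
    "excellent": [0.9, 0.1],
    "superb":    [0.8, 0.2],
    "terrible":  [-0.7, 0.1],
    "worst":     [-1.2, 0.3],
    "the":       [0.0, 0.05],
}

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_dot_product(focus, word_vectors):
    """Return (word, score) pairs sorted by dot product with the focus word."""
    scores = Counter()
    focus_vec = word_vectors[focus]
    for word, vec in word_vectors.items():
        scores[word] = dot(vec, focus_vec)
    # most_common() with no argument returns every entry, highest score first
    return scores.most_common()

ranking = rank_by_dot_product("terrible", toy_word_vectors)
for word, score in ranking:
    print(word, round(score, 3))
```

Because a raw dot product is not normalized by vector length, a word with a larger vector pointing the same way can outscore the focus word itself. In the toy data, "worst" ranks above "terrible". That would be consistent with the real output for "terrible", where the focus word appears fifth rather than first, assuming the notebook scores words in a similar way.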
``````

In [26]:

get_most_similar_words("terrible")

``````
``````

Out[26]:

[('worst', 0.16966107259049848),
('awful', 0.12026847019691245),
('waste', 0.11945367265311002),
('poor', 0.09275888757443547),
('terrible', 0.091425387197727942),
('dull', 0.084209271678223604),
('poorly', 0.081241544516042055),
('disappointment', 0.080064759621368706),
('fails', 0.078599773723337527),
('disappointing', 0.07733948548032335),
('boring', 0.077127858748012895),
('unfortunately', 0.075502449705859051),
('worse', 0.070601835364194662),
('mess', 0.070564299623590412),
('stupid', 0.069484822832543036),
('annoying', 0.065687021903374165),
('save', 0.062880597495865748),
('disappointed', 0.06269235381207286),
('wasted', 0.061387183028051275),
('supposed', 0.060985452957725145),
('horrible', 0.060121772339380118),
('laughable', 0.058698406285467651),
('crap', 0.058104528667884577),
('basically', 0.057218840369636162),
('nothing', 0.057158220043034204),
('ridiculous', 0.056905481068931445),
('lacks', 0.055766565889465457),
('lame', 0.055616009058110184),
('avoid', 0.05551872607319721),
('unless', 0.054208926212940732),
('script', 0.053948359467048533),
('failed', 0.05341393055000912),
('pointless', 0.052855531546894118),
('oh', 0.052761580933176837),
('effort', 0.050773747127292324),
('guess', 0.050379576420076545),
('minutes', 0.049784532804242193),
('wooden', 0.049453108380727188),
('redeeming', 0.049182869114721757),
('seems', 0.049079625154669751),
('weak', 0.046496387374765663),
('pathetic', 0.04609974114971576),
('looks', 0.045796536730244877),
('hoping', 0.045082242887577034),
('wonder', 0.044669791780934602),
('forgettable', 0.042854349251871711),
('silly', 0.042237829687270009),
('attempt', 0.04170629994137353),
('predictable', 0.041514442438568125),
('someone', 0.0415061190273373),
('sorry', 0.040868877281533364),
('might', 0.040445683500688355),
('slow', 0.040346869107034951),
('painful', 0.040220039039613256),
('thin', 0.040062642253777855),
('mediocre', 0.039407165377577387),
('garbage', 0.039310979440981109),
('money', 0.038907973313640494),
('none', 0.038300807052230941),
('bland', 0.038062246057085046),
('couldn', 0.038016664218957934),
('either', 0.037738833070341961),
('unfunny', 0.03707662980504451),
('entire', 0.036642119399463165),
('cheap', 0.036516800802525583),
('honestly', 0.03621204154379784),
('mildly', 0.035744850608185635),
('total', 0.035560454471013074),
('neither', 0.035415946043548557),
('making', 0.035244315060985618),
('problem', 0.035088251034562444),
('flat', 0.034518947038747076),
('bizarre', 0.034509460694521141),
('group', 0.034335883528586797),
('ludicrous', 0.03415964932381603),
('decent', 0.03377158578786895),
('clich', 0.033751444631720556),
('daughter', 0.033732725858384882),
('bored', 0.033622879572852558),
('horror', 0.033464120619956815),
('writing', 0.033437913916756788),
('skip', 0.033430639850491169),
('absurd', 0.033154173530163318),
('barely', 0.032653416827517719),
('idea', 0.032584013175663243),
('wasn', 0.03248120796627206),
('fake', 0.032136435098031518),
('believe', 0.031677858935800801),
('uninteresting', 0.031526815915867139),
('reason', 0.031390715260270541),
('scenes', 0.03121636293538917),
('alright', 0.031046883113956251),
('body', 0.03099998294598668),
('no', 0.030917695380560412),
('insult', 0.030808450146355935),
('mst', 0.030527916471397864),
('nowhere', 0.030352177599338292),
('lousy', 0.03016019546838079),
('didn', 0.030115903194061419),
('interest', 0.029888118468771124),
('half', 0.029813246115057257),
('lee', 0.029804235955718652),
('dimensional', 0.029562861996904038),
('unconvincing', 0.029322607679950242),
('left', 0.029322408787030529),
('sex', 0.029296748476082147),
('even', 0.029225209450923412),
('far', 0.029192618334294561),
('tries', 0.029004001132703541),
('anything', 0.028988097743501119),
('trying', 0.02891947722846511),
('accent', 0.028779542310252575),
('nudity', 0.028662654953266063),
('apparently', 0.028291626941517923),
('zombies', 0.028178583120430676),
('sense', 0.028166740534758778),
('incoherent', 0.027988926190862514),
('something', 0.027986519420278223),
('tedious', 0.027952212405329517),
('wrong', 0.027831947557365632),
('were', 0.027825695799985388),
('endless', 0.027824591794431468),
('turkey', 0.027624266205058482),
('zombie', 0.027543333835110859),
('appears', 0.02746984087848325),
('embarrassing', 0.027425437142424351),
('walked', 0.027411768647042711),
('premise', 0.027346072285964189),
('ok', 0.027333008356232008),
('result', 0.027312558653191918),
('complete', 0.027247564384243431),
('t', 0.027186737465610209),
('least', 0.02694907263201728),
('was', 0.026917906772065292),
('unwatchable', 0.026829458762459388),
('sat', 0.026806511532143463),
('to', 0.026801902698524085),
('christmas', 0.026735555962199217),
('gore', 0.026670161630608404),
('mother', 0.026612696987437758),
('aspects', 0.026583237615263801),
('amateurish', 0.0265651592911757),
('below', 0.026548271016778147),
('stupidity', 0.026460990221946933),
('appeal', 0.02639659671342098),
('trite', 0.026331168557051404),
('then', 0.026284629203937659),
('rubbish', 0.026216695246125507),
('okay', 0.025981446095883612),
('sucks', 0.025930224401969348),
('pretentious', 0.025907912370628297),
('positive', 0.025773976409798761),
('confusing', 0.025737618729473642),
('remotely', 0.025699566061653023),
('obnoxious', 0.025454829745850255),
('m', 0.025435495928249188),
('rent', 0.025373441934038499),
('laughs', 0.025346512576104412),
('re', 0.025342239903627863),
('context', 0.025274382593713576),
('disgusting', 0.025195418263468185),
('so', 0.025148024611438818),
('tiresome', 0.025031684199042101),
('miscast', 0.024970026716882372),
('aren', 0.024968703889385904),
('forced', 0.024933299777713702),
('paid', 0.024906929703330343),
('utter', 0.024802282233385525),
('uninspired', 0.024799576212017463),
('falls', 0.024749631706810705),
('throw', 0.024614954073046699),
('been', 0.024470487429445045),
('ugly', 0.024334820044832381),
('hopes', 0.024315635652054312),
('dire', 0.024191221840051083),
('hunter', 0.024171291127418466),
('producers', 0.024089231997130232),
('seem', 0.024065146985976841),
('straight', 0.02399666645155216),
('vampire', 0.023942797574072684),
('paper', 0.023908828083961008),
('crappy', 0.023807255546688062),
('excited', 0.023764516357875815),
('start', 0.023739057832096774),
('material', 0.023729757962158749),
('excuse', 0.023681577270328102),
('cop', 0.023480677028928126),
('f', 0.023312251619610837),
('ms', 0.023282327986278321),
('villain', 0.023158273483660743),
('fest', 0.023091425711778243),
('lack', 0.023039437894325179),
('such', 0.023031161078650945),
('saving', 0.023025745893238081),
('clichs', 0.022928209200342314),
('enough', 0.022921397253925297),
('mistake', 0.022868689470375007),
('unbelievable', 0.022864325693347887),
('maybe', 0.022825002748295287),
('blame', 0.022808369279543172),
('bunch', 0.022769532876362859),
('version', 0.02275329694575548),
('candy', 0.022749363632616763),
('island', 0.02274580066608016),
('tripe', 0.022695188509832681),
('wasting', 0.022681371343356765),
('inept', 0.022679276425665761),
('actor', 0.022636975371771055),
('flop', 0.022613758633444534),
('any', 0.022560608437607207),
('k', 0.02255401757961505),
('appalling', 0.022500975853556055),
('propaganda', 0.022465024430755744),
('major', 0.022430482324246579),
('sequel', 0.022362296462477879),
('offensive', 0.022326080604825445),
('revenge', 0.022315150942472623),
('shoot', 0.02228810570921174),
('whatsoever', 0.02228649834694094),
('ruined', 0.022173811528211032),
('painfully', 0.022152008209040924),
('on', 0.022016020939730058),
('shame', 0.021981493467648276),
('effects', 0.021849482201960247),
('wouldn', 0.021848506706035161),
('development', 0.021773241990065747),
('plot', 0.021733893676650604),
('co', 0.021728673026887638),
('church', 0.021719723717009982),
('storyline', 0.021663404462350766),
('screenwriter', 0.021660177252485924),
('bother', 0.021571699909566967),
('miserably', 0.021516173872499805),
('christian', 0.021515873507543661),
('found', 0.021449077767987147),
('watching', 0.021344833140596594),
('pseudo', 0.021308384076023461),
('boredom', 0.021119995917930005),
('talent', 0.021005847445274783),
('continuity', 0.02100514585242191),
('talents', 0.020992716564348899),
('college', 0.020990718952374872),
('tried', 0.02097821962618682),
('editing', 0.020865814801443752),
('lines', 0.020853755408845785),
('drivel', 0.020726493692759695),
('generous', 0.020697017742242002),
('potential', 0.020672988272090836),
('creatures', 0.020601399429061324),
('disjointed', 0.020581338926655209),
('irritating', 0.020576764848872681),
('pile', 0.020560898967541534),
('acts', 0.020560043588043531),
('junk', 0.020558505639508211),
('raped', 0.020550629285133258),
('christ', 0.020481424289613526),
('brain', 0.020431161137662711),
('slasher', 0.020425652445140888),
('seconds', 0.020390927443421879),
('nobody', 0.020389268101762611),
('dialog', 0.020338349197601496),
('makers', 0.020333184431951135),
('excitement', 0.020290456024291803),
('flashbacks', 0.020267510512910245),
('sloppy', 0.020234078734398368),
('joke', 0.020212187048528514),
('sleep', 0.020108895811675784),
('bottom', 0.01998677054728017),
('however', 0.019981104962051171),
('fail', 0.019937405211620234),
('sucked', 0.019874923017311578),
('soap', 0.019853525395543015),
('looked', 0.019810211840927103),
('stinks', 0.019769365381781166),
('deserve', 0.019614034321096454),
('exact', 0.019555320028258997),
('substance', 0.019552647432498179),
('yeah', 0.019513150136671552),
('production', 0.019510696746296532),
('female', 0.019476914978121786),
('unintentional', 0.019387723280198929),
('army', 0.019364852889641612),
('minute', 0.019351862554568253),
('unrealistic', 0.019350657250497862),
('rescue', 0.019340920364464918),
('theater', 0.01933382927666849),
('monsters', 0.019332636015751022),
('frankly', 0.01932655082384388),
('children', 0.019314240606868868),
('convince', 0.019312073515560642),
('shallow', 0.019298445504930539),
('synopsis', 0.019259706392396592),
('scott', 0.01918347440557033),
('seriously', 0.019182027987149991),
('ridiculously', 0.019169300285178974),
('looking', 0.019150985439966572),
('kareena', 0.019110212601710662),
('wrote', 0.019015323411486432),
('attempts', 0.019006343780653943),
('bothered', 0.018970712777578516),
('utterly', 0.018924824767803394),
('giant', 0.018891084650049701),
('writers', 0.018868906582101285),
('atrocious', 0.018848042351202358),
('plain', 0.018828766525513598),
('presumably', 0.018826629750947944),
('example', 0.018796453237837189),
('murray', 0.018754173430046931),
('seemed', 0.018749132295913074),
('stay', 0.01874415970643269),
('interview', 0.018672085964709526),
('disaster', 0.018553283301235148),
('value', 0.018544080955166374),
('paint', 0.018529607132429366),
('original', 0.018528190682362406),
('difficult', 0.018518455298178589),
('care', 0.018494804801171258),
('watchable', 0.018481870605389104),
('useless', 0.018470481000366856),
('desperately', 0.01842167504700026),
('except', 0.018391993551238543),
('doing', 0.018384737621350653),
('errors', 0.018380414978330265),
('solely', 0.018349321075079392),
('sitting', 0.018346519170301064),
('giving', 0.018335957397904838),
('ideas', 0.018327099221245192),
('unbearable', 0.018321159676201411),
('nor', 0.018254420259554292),
('project', 0.018252633214771746),
('dozen', 0.018206363291515749),
('charles', 0.018163660578293463),
('plastic', 0.018161741020378659),
('book', 0.018139011699011297),
('shots', 0.018114876064363867),
('ill', 0.018103621818215735),
('where', 0.01806588259969515),
('women', 0.018026883825059355),
('screenplay', 0.018014307024101332),
('through', 0.017990863003241406),
('actress', 0.017876003487857148),
('sign', 0.01786563614405693),
('walk', 0.017823522607756635),
('santa', 0.017727102733219178),
('happens', 0.017722408798843577),
('contrived', 0.017720303645882802),
('gun', 0.017685993176933833),
('ashamed', 0.017679623098721592),
('gratuitous', 0.017665737783803856),
('one', 0.017608259344043278),
('not', 0.017562336441189881),
('credibility', 0.017558852870687959),
('promising', 0.017544417082572289),
('risk', 0.017532600100721243),
('sub', 0.017531947750389461),
('lacking', 0.017513759836446527),
('fell', 0.017464857159331271),
('scenery', 0.017451365955319969),
('flesh', 0.017402514298262693),
('animal', 0.017386681692205426),
('tired', 0.017383214541566681),
('writer', 0.017380887757560842),
('dialogue', 0.017319373946647617),
('terribly', 0.017291135257276893),
('downright', 0.017277675563205454),
('rented', 0.017247977656900716),
('clumsy', 0.01724129080518208),
('blah', 0.017217377177396763),
('random', 0.017199913549247988),
('members', 0.017198947117344762),
('three', 0.017189383912215913),
('celluloid', 0.017174000803758888),
('your', 0.017140173886430052),
('lost', 0.017127763322061815),
('suddenly', 0.017124566068806111),
('cover', 0.017066680835874291),
('existent', 0.017028540662919325),
('mostly', 0.017009366180205387),
('dig', 0.016990887715494292),
('spending', 0.016944400877991015),
('elsewhere', 0.016937877167916518),
('suck', 0.016897737192407596),
('apparent', 0.016783874225807262),
('fill', 0.016766110935370608),
('running', 0.016728621099996364),
('jokes', 0.016718920312228033),
('cheese', 0.016699473014889846),
('outer', 0.016612591391981468),
('anil', 0.016581200840654873),
('director', 0.016512894450311424),
('awfully', 0.016492200414985302),
('mix', 0.016468214294032498),
('naturally', 0.016404879835269455),
('scientist', 0.016395078905109245),
('imdb', 0.016343168034107167),
('dumb', 0.016289693549692456),
('curiosity', 0.016277433551029962),
('somewhere', 0.01623611744674798),
('stereotyped', 0.016235814767295294),
('officer', 0.016235401039884582),
('shelf', 0.016151304702362455),
('spends', 0.016089566181633218),
('explanation', 0.016040330428242218),
('proof', 0.016021381235154293),
('killed', 0.016004979798664883),
('songs', 0.016002280189188103),
('why', 0.015994497048455181),
('assume', 0.015953574865902428),
('mean', 0.015907137878947281),
('year', 0.015900265748875854),
('named', 0.015897377296493421),
('actors', 0.015880849255718713),
('dreck', 0.01584418483784927),
('ripped', 0.01580935239122223),
('exception', 0.015801037653546946),
('let', 0.01574755499580684),
('said', 0.015739206756809138),
('handed', 0.015729421480492774),
('five', 0.015692627471399444),
('manage', 0.015647108880417121),
('thousands', 0.01564343097589297),
('faith', 0.015616976955551868),
('hideous', 0.015589158171890808),
('alas', 0.015538213296394238),
('interesting', 0.015537431607034399),
('camera', 0.015534217771859279),
('affair', 0.015499371820329419),
('saved', 0.015479619606949038),
('allow', 0.015471290657970002),
('embarrassed', 0.015465690911012365),
('historically', 0.015405093934372957),
('guy', 0.015377641254470054),
('smoking', 0.01534650885437833),
('implausible', 0.015340453986022747),
('entirely', 0.015334692788183628),
('insulting', 0.015328508644691501),
('unable', 0.015321433538157143),
('supposedly', 0.015316107621242393),
('replaced', 0.015263381265213493),
('write', 0.015247349730647845),
('devoid', 0.01519618192038018),
('angry', 0.01512887842510143),
('cannot', 0.015124671278970775),
('stinker', 0.015117424017513684),
('types', 0.015097306608066994),
('hype', 0.015076288365524312),
('responsible', 0.014991356276561571),
('peter', 0.014969127137333007),
('putting', 0.01491070725493724),
('over', 0.014897181020826416),
('cardboard', 0.014888714204149054),
('interspersed', 0.014883165331874143),
('haired', 0.014880449676198558),
('spend', 0.014876094316227651),
('elvis', 0.014854709844151742),
('indulgent', 0.014847232132387193),
('catholic', 0.014843519648135945),
('downhill', 0.014807184967767801),
('lazy', 0.014781514695229727),
('aged', 0.014773315829198596),
('exist', 0.014753607788843276),
('torture', 0.014733998799388383),
('prove', 0.014729418674653008),
('tolerable', 0.014680880104255794),
('four', 0.014654547592632508),
('acceptable', 0.01465173069496585),
('chick', 0.014641428398798825),
('unimaginative', 0.014629366067627067),
('whiny', 0.014626751487134585),
('artsy', 0.014597921349167287),
('decide', 0.014596087755808963),
('unpleasant', 0.014539257963097203),
('rotten', 0.014526987482368666),
('racist', 0.014521318292204649),
('air', 0.014513999400043538),
('flimsy', 0.014510298364381134),
('baldwin', 0.014458793249711608),
('merely', 0.014423588430956447),
('wood', 0.014405182128559185),
('thinking', 0.014365675477621551),
('earth', 0.014352953870200838),
('kidding', 0.014337420788166334),
('unintentionally', 0.014336443850996722),
('vampires', 0.014325905430975231),
('generic', 0.014319871170399822),
('defense', 0.014290336242912222),
('saif', 0.014289573796132724),
('asleep', 0.014289012435576957),
('execution', 0.01428396200827341),
('figure', 0.014283770855230152),
('lackluster', 0.014273058981901449),
('hoped', 0.014264724762345849),
('nonsense', 0.014261341497203133),
('horrid', 0.014253216604458425),
('god', 0.014237363547447925),
('l', 0.014187296773742579),
('caricatures', 0.014181564208326643),
('starts', 0.014153430344591583),
('dry', 0.014133935534427954),
('display', 0.014128179969827095),
('button', 0.014116471162614745),
('bore', 0.014116389381443269),
('empty', 0.014096772700681905),
('harold', 0.014052130896646571),
('incomprehensible', 0.014009428713655195),
('annie', 0.014008405850952515),
('thrown', 0.014007462594894701),
('incredibly', 0.014005185007294351),
('renting', 0.013926687608630473),
('connect', 0.013922471736926739),
('younger', 0.01392114839514175),
('author', 0.013908729139553405),
('mistakes', 0.013902060662024717),
('vague', 0.013900188409028444),
('susan', 0.013899718009237951),
('obvious', 0.013862928310275264),
('public', 0.013848261281553181),
('porn', 0.013842110384054571),
('trash', 0.013803990572178482),
('stevens', 0.013796967244647431),
('sequels', 0.013782463861472688),
('hurt', 0.01376954392124014),
('desert', 0.01376361912496973),
('did', 0.013737639449728171),
('behave', 0.013719767167839477),
('served', 0.013714838239223717),
('claims', 0.01370688626965051),
('ultimately', 0.013697643591100152),
('wide', 0.013685211021307757),
('wow', 0.013679184770624806),
('worthless', 0.01367053329629828),
('dear', 0.013653591379600143),
('plodding', 0.01362284584085525),
('mike', 0.013594086031988719),
('favor', 0.013578310381078491),
('call', 0.013577646631327938),
('biggest', 0.013529947586389578),
('worthy', 0.013524754842185318),
('meaning', 0.013517997531900569),
('scientific', 0.013515396653842859),
('hanks', 0.013467213376215904),
('gay', 0.01341484080868823),
('embarrassingly', 0.013401336286973733),
('literary', 0.013389208999321039),
('playing', 0.01332995463472637),
('bo', 0.013312890564682513),
('manipulative', 0.013287016941406334),
('dressed', 0.013285092423656558),
('embarrassment', 0.01326953031919822),
('regarding', 0.013233250211631659),
('stilted', 0.013215539220141915),
('sleeve', 0.013215085161586726),
('rating', 0.013203442200940885),
('kills', 0.013183919467358743),
('sounds', 0.013178727878711719),
('ali', 0.013173031266866376),
('non', 0.01316260375180524),
('pie', 0.013161492629253851),
('populated', 0.013152746747459266),
('killing', 0.013111860853151806),
('else', 0.013110592541316695),
('schneider', 0.013093514941690405),
('priest', 0.013071537555948205),
('hollow', 0.013068001463175462),
('shower', 0.013029604174841072),
('ruins', 0.013021597567104512),
('mental', 0.013019696244479823),
('this', 0.013009778169664532),
('pregnant', 0.012997074834619548),
('make', 0.012992851916498642),
('timberlake', 0.012979689860020448),
('saves', 0.012915795355367859),
('vastly', 0.012914828969565754),
('swear', 0.012901059475490069),
('stella', 0.012883911119651205),
('grave', 0.012882555040277143),
('thats', 0.01286106181291035),
('drinking', 0.012860129471019702),
('boom', 0.01285177959469419),
('introduction', 0.012831129197335455),
('programming', 0.012796219757750258),
('career', 0.012773059501084108),
('stereotype', 0.012769447626661472),
('attractive', 0.012765873120010146),
('victims', 0.012749299245502168),
('pass', 0.012735021821089288),
('experiment', 0.012716112941788916),
('retarded', 0.012713099529852416),
('stuck', 0.012709332698253249),
('akshay', 0.012684273069877867),
('cut', 0.012676285239015487),
('shoddy', 0.012674792040888049),
('damme', 0.012666536417656676),
('inaccurate', 0.012653687577536547),
('ray', 0.01264981802351018),
('woman', 0.012646521945546326),
('research', 0.01264049466286456),
('mile', 0.012627245693716732),
('place', 0.012624645831509419),
('demon', 0.012621688470792605),
('vulgar', 0.012612150302693319),
('engage', 0.012602272831074859),
('wives', 0.012601890190118302),
('mention', 0.01258159848000647),
('if', 0.012569631262234709),
('cartoon', 0.012561864177985764),
('unbelievably', 0.01255039166831585),
('only', 0.012517107727859141),
('ended', 0.012507282716729793),
('stereotypical', 0.012506426536204342),
('spent', 0.012503032775055226),
('thing', 0.012483110991541426),
('phone', 0.012464039991489132),
('stock', 0.01244674214755662),
('drop', 0.012432978683590465),
('self', 0.012432059211520791),
('escapes', 0.012419211298248923),
('conceived', 0.012392639977060709),
('required', 0.012392260947042842),
('assassin', 0.012332404091910106),
('meat', 0.012327751187890422),
('therefore', 0.012316138729629602),
('struggling', 0.012308628353572293),
('ho', 0.012307714936265706),
('ta', 0.012299409649320241),
('cold', 0.012289510775209267),
('expects', 0.012271684887263188),
('furthermore', 0.012263298696316208),
('remote', 0.012254529263879222),
('cgi', 0.012250569964074172),
('arab', 0.012230232115225254),
('feminist', 0.012220004405980549),
('hair', 0.012213792907949607),
('intelligence', 0.012203964889416778),
('destroy', 0.01219021390702397),
('cameo', 0.012186034087855138),
('claus', 0.012181510618531245),
('awake', 0.012171290237450141),
('sums', 0.012139945909251911),
('auto', 0.012126012687040624),
('cue', 0.012120943623008961),
('speak', 0.012117784815618099),
('stereotypes', 0.012106976159466593),
('footage', 0.012103658001584281),
('maker', 0.012093369539270357),
('rental', 0.012083052888147337),
('proper', 0.012063210621690414),
('mercifully', 0.012047936344961967),
('gimmick', 0.012041001769926649),
('coherent', 0.012027899920693618),
('inane', 0.011993175877578831),
('relies', 0.011992345660343809),
('nomination', 0.011982252573531251),
('segal', 0.011947340234058407),
('christians', 0.011946398905489907),
('overrated', 0.011926101166626015),
('don', 0.011924357980777279),
('severely', 0.011916168552237321),
('phony', 0.01191382239312172),
('selfish', 0.011900529017180249),
('resume', 0.011897346320859063),
('another', 0.011877684431361642),
('sean', 0.01187604021413761),
('hepburn', 0.011869243078008906),
('secondly', 0.011863109334450275),
('ups', 0.011859394818287424),
('planet', 0.011852030247443595),
('changed', 0.011845335611887473),
('amused', 0.011842962845878571),
('lowest', 0.011831634819501925),
('fools', 0.011824116232842373),
('spelling', 0.011821902194872622),
('repressed', 0.011821527286346355),
('unlikeable', 0.011818760110586484),
('failure', 0.011816519901709057),
('line', 0.011796438571873895),
('hyped', 0.011784666544684309),
('anti', 0.011764086315539175),
('acting', 0.011752348314205381),
('promise', 0.011749711660046621),
('observe', 0.01173960895927862),
('mindless', 0.011729368774426884),
('lacked', 0.011718485221863712),
('rather', 0.011704535222487881),
('ed', 0.011700096242496993),
('significant', 0.011696176501939935),
('talks', 0.011678101476086888),
('arty', 0.011674972481678902),
('spit', 0.011671408526135135),
('ilk', 0.011661568455359032),
('unoriginal', 0.01165110724584089),
('forward', 0.011646719533106092),
('toilet', 0.01163552220763908),
('suppose', 0.011633258510072193),
('feed', 0.011617447517425161),
('surrounded', 0.011607897169523132),
('wanted', 0.011604506869089728),
('tashan', 0.011596205445299114),
('dr', 0.011543949281335645),
('scare', 0.011543316667712905),
('murderer', 0.011535350571639668),
('explained', 0.011466329649783223),
('cheated', 0.011455846970137714),
('whats', 0.011451443577230849),
('romance', 0.011445558616225327),
('jewish', 0.011441564163643688),
('sexual', 0.011438682797255701),
('books', 0.011419811777535161),
('throwing', 0.011404165894740241),
('nose', 0.01139558365172063),
('parking', 0.011390688400833916),
('pick', 0.011357671445382187),
('chose', 0.011354353327826123),
('improve', 0.011350584813053918),
('kapoor', 0.01134076781407491),
('costs', 0.011325900726890985),
('saying', 0.011325617629551317),
('early', 0.01132052573418809),
('technically', 0.011317672837061947),
('hackman', 0.011288294849240653),
('birthday', 0.011282785404027754),
('cinematography', 0.011263572785831694),
('hurts', 0.011250154303091526),
('saturday', 0.011247837147971238),
('meaningless', 0.011239510238506721),
('mannered', 0.011239044207972256),
('screaming', 0.01123862031022237),
('should', 0.011236648355832374),
('crazed', 0.011236418275421323),
('dignity', 0.011236150963786551),
('mate', 0.011216700009844505),
('letters', 0.011208675517174492),
('recycled', 0.011206236378205576),
('promptly', 0.011202237607822147),
('inexplicably', 0.011161321811546259),
('or', 0.01115296534330535),
('simply', 0.011146233896835904),
('too', 0.011130044921930284),
('nerd', 0.011122543127721441),
('chris', 0.011116119389820142),
('proceedings', 0.011111786695547103),
('lived', 0.011100598930695576),
('code', 0.011095425242701426),
('potentially', 0.011093285835678526),
('open', 0.011075631889800952),
('faster', 0.011074177906888309),
('moore', 0.011070458274337775),
('bowl', 0.011060417562531438),
('absolutely', 0.011044130796846871),
('just', 0.011033356854991554),
('suspension', 0.011031781173072127),
('enemy', 0.011025820754518642),
('conclusion', 0.010986051066943354),
('hospital', 0.010977494845678698),
('romances', 0.010962761722118314),
('spoke', 0.010962116403553655),
('hardly', 0.010960545391113441),
('olds', 0.010951344004097443),
('creek', 0.01095002392432287),
('shouting', 0.010943727502542746),
('originality', 0.010912963822714922),
('bollywood', 0.010911409137577786),
('cape', 0.010902326129518278),
('teeth', 0.010900502046002614),
('backdrop', 0.010885688008708729),
('turn', 0.010880478059425666),
('mason', 0.010866951716170662),
('grace', 0.010848406257382317),
('valley', 0.010845180425875851),
('depressing', 0.01082781808673851),
('superficial', 0.010826403237558527),
('invested', 0.01081248871664086),
('bomb', 0.010811727591767118),
('embarrass', 0.010778451069403573),
('sided', 0.010773707983617683),
('sticking', 0.01076229243554771),
('common', 0.010754536408451018),
('boat', 0.010750196487059148),
('promised', 0.010746025901289752),
('wayans', 0.010744338945929417),
('sheer', 0.01073410327947452),
('wrestling', 0.010724515540975418),
('staff', 0.010715523520497058),
('apollo', 0.010711377643774771),
('leigh', 0.010702080598678557),
('virtually', 0.010691942663824006),
('seagal', 0.010677324100672115),
('comes', 0.010674899719725498),
('edition', 0.010673353805904194),
('predictably', 0.010666551243955751),
('stuff', 0.010664915811483258),
('gang', 0.010664441184213122),
('cancer', 0.010643225900463578),
('obviously', 0.010641670080654524),
('would', 0.010623530922231167),
('totally', 0.010616092995147892),
('profile', 0.010596003501785217),
('spacey', 0.010595967407784396),
('ability', 0.01058459252136016),
('horrendous', 0.010580213328532087),
('blood', 0.010579520401095315),
('imitation', 0.010568550630572965),
('bikini', 0.010568043371931098),
('talented', 0.010566001035979433),
('basis', 0.010564729746933199),
('dialogs', 0.010551191397294006),
('showing', 0.010548613564454237),
('door', 0.010544563357219785),
('portray', 0.010527799628490634),
('strictly', 0.010526959295132308),
('mexican', 0.010508731517822329),
('stick', 0.010465961443388684),
('east', 0.01045532471601677),
('anywhere', 0.01043153273466628),
('remake', 0.010419869194952835),
('am', 0.010410414209203937),
('attempting', 0.010386393998627374),
('disturbing', 0.010381152608581447),
('jude', 0.010377136500506754),
('wondering', 0.0103635126900122),
('celebrated', 0.010360111769075862),
('use', 0.010350554074714646),
('wreck', 0.010344734410393921),
('appear', 0.010344438351539169),
('entitled', 0.010335246001593064),
('youth', 0.010323214445994804),
('letdown', 0.010318553446258687),
('moran', 0.010305507693633363),
('mediocrity', 0.010302827140695373),
('news', 0.010292874788426096),
('bits', 0.010276065293631165),
('alone', 0.010268492053981974),
('accents', 0.010263852094534688),
('inhabited', 0.010244117693024822),
('mock', 0.010244061360675906),
('g', 0.010223458175403786),
('box', 0.010203304329265748),
('term', 0.010199983044386097),
('behavior', 0.010198776124373244),
('tedium', 0.01019009220150722),
('intent', 0.010190038120698576),
('husband', 0.010189502265957844),
('presence', 0.010187192336074173),
('z', 0.010184318583214764),
('unappealing', 0.010146391189444366),
('much', 0.010136790117697142),
('tree', 0.010113534581593914),
('doctors', 0.010099854380484188),
('pi', 0.010095099419111337),
('rodney', 0.010090819798082386),
('franchise', 0.010089650929674203),
('piece', 0.010086011549585333),
('company', 0.010083539582601045),
('choppy', 0.010079223420593735),
('turned', 0.010069855547990144),
('test', 0.010041505355613897),
('ball', 0.010040944323609528),
('hated', 0.010035509058945867),
('bear', 0.010034272465057463),
('serves', 0.010027495172169233),
('leonard', 0.010022751390164696),
('deserved', 0.010022334081283375),
('part', 0.01001636043614744),
('opportunity', 0.010013126012646695),
('turning', 0.010011850960865772),
('overacting', 0.010008994714980214),
('refer', 0.010006488920574088),
('flies', 0.010006418749637628),
('uninvolving', 0.0099991338976208148),
('produce', 0.0099962014038013792),
('jumpy', 0.0099947855808415198),
('die', 0.0099914129058670999),
('root', 0.0099747135001128275),
('insomnia', 0.0099744642555285139),
('blatant', 0.0099596620005663883),
('larry', 0.0099556905367902578),
('threw', 0.0099473965388449607),
('billed', 0.0099285818753670936),
('bullets', 0.0099281758971005961),
('intellectually', 0.0099081388278786167),
('rip', 0.0099013233996040825),
('stretching', 0.0099012969699172632),
('protest', 0.0098984552675623616),
('soldiers', 0.0098936923822449188),
('flick', 0.009887063364977652),
('justin', 0.009862246602717558),
('highlights', 0.0098589088020586326),
('move', 0.0098539899809540407),
('merit', 0.0098431205949966755),
('russian', 0.0098411717219841037),
('security', 0.0098373450338831055),
('idiotic', 0.009834123428814465),
('produced', 0.0098294307574257923),
('king', 0.0098266872343175573),
('magically', 0.0098228842476825642),
('united', 0.0098070847890707729),
('missile', 0.0097990578193348533),
('unlikable', 0.0097869158986480815),
('ignorant', 0.009773274317346101),
('amateur', 0.009767405987056119),
('bachelor', 0.0097673429455405695),
('asylum', 0.009762733851977996),
('screw', 0.009756809857392721),
('report', 0.0097479232699172417),
('dracula', 0.0097467323393205605),
('removed', 0.0097416519499422052),
('confess', 0.0097162925211573305),
('brand', 0.0097152534660907564),
('conspiracy', 0.0097116972290397056),
('horribly', 0.0097083785564252584),
('switch', 0.0097026840933795502),
('jaws', 0.0096877455513713073),
('unsuspecting', 0.009685342503584644),
('betty', 0.009677035213332465),
('forwarding', 0.0096711196893192793),
('university', 0.0096636715878149586),
('star', 0.0096623254931800431),
('crawl', 0.0096464318968590562),
('dopey', 0.0096460863315858646),
('ruin', 0.009623010638545728),
('lifeless', 0.009622880727487999),
('flash', 0.0096193625359650009),
('whoever', 0.0096174128915875439),
('coincidence', 0.0096024599741402154),
('choosing', 0.0095951100051069223),
('avid', 0.0095900913284222636),
('intended', 0.0095846987041676296),
('remained', 0.0095839628178583866),
('c', 0.0095732676681762399),
('waiting', 0.009556225869434885),
('cassie', 0.009548135444223808),
('garage', 0.0095349544587830272),
('clarke', 0.0095345445855698624),
('fortune', 0.0095330396648302101),
('interminable', 0.0095328159563552659),
('incessant', 0.0095235485026846384),
('plots', 0.0095225805490624666),
('danger', 0.0095171205654692934),
('costumes', 0.0094980144667524448),
('evidently', 0.0094952158467012208),
('minus', 0.009491149517466128),
('reporters', 0.009483681104099086),
('israeli', 0.0094750077183364638),
('failing', 0.0094711841313976936),
('paying', 0.0094692344066851265),
('godzilla', 0.0094586915548437855),
('dumber', 0.0094582903092924851),
('earn', 0.0094476224928425005),
('slows', 0.0094467463872487632),
('held', 0.0094452736817914849),
('chase', 0.0094438362611946516),
('lies', 0.0094383969845033399),
('hands', 0.0094381781614589055),
('grief', 0.00942384945341029),
('brains', 0.009418215341663214),
('tom', 0.0094130433384347241),
('resurrected', 0.0094083423437290557),
('sleeps', 0.009401795188265831),
('porno', 0.0093907201413965125),
('somehow', 0.0093889261270860523),
('sarcasm', 0.0093886064393904137),
('tie', 0.0093856009366311572),
('fall', 0.0093801640008931257),
('bring', 0.0093791273545761524),
('rape', 0.0093760851230746452),
('village', 0.0093684513318614028),
('kitchen', 0.0093649071460109607),
('concerned', 0.0093611353238811368),
('republic', 0.009349942694876422),
('hell', 0.0093400360705317275),
('inducing', 0.0093382129792553489),
('stomach', 0.0093378286385158559),
('shambles', 0.0093335457329829768),
('virgin', 0.0093312001339055928),
('extraneous', 0.0093250413800351293),
('cameras', 0.0093229460267977154),
('suffers', 0.0093204929924830034),
('justified', 0.009316321747936316),
('plummer', 0.0092948273285103945),
('ponderous', 0.0092880344237223338),
('player', 0.0092802296345443642),
('survivor', 0.0092767026472125712),
('rainy', 0.009269703421813753),
('graces', 0.0092620944963291291),
...]

``````
``````

In [27]:

import matplotlib.colors as colors

# Collect the 500 most positive words that are in the network's vocabulary.
words_to_visualize = list()
for word, ratio in pos_neg_ratios.most_common(500):
    if(word in mlp_full.word2index.keys()):
        words_to_visualize.append(word)

# Then collect the 500 most negative words (the tail of the ranking, reversed).
for word, ratio in list(reversed(pos_neg_ratios.most_common()))[0:500]:
    if(word in mlp_full.word2index.keys()):
        words_to_visualize.append(word)

``````
``````

In [28]:

pos = 0
neg = 0

# For each selected word, store its input-to-hidden weight vector and a colour:
# green for a positive ratio, red for a negative one.
colors_list = list()
vectors_list = list()
for word in words_to_visualize:
    if word in pos_neg_ratios.keys():
        vectors_list.append(mlp_full.weights_0_1[mlp_full.word2index[word]])
        if(pos_neg_ratios[word] > 0):
            pos+=1
            colors_list.append("#00ff00")
        else:
            neg+=1
            colors_list.append("#ff0000")

``````
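To make the selection and colouring logic of the two cells above concrete, here is a minimal, self-contained sketch on toy data. The ratios in `toy_ratios` and the vocabulary in `toy_word2index` are invented for illustration; they stand in for `pos_neg_ratios` and `mlp_full.word2index`. The helper `pick_polarized_words` is hypothetical and not part of the notebook.

```python
from collections import Counter

# Toy stand-ins (assumed values, not taken from the real dataset).
toy_ratios = Counter({
    "superb": 1.7, "wonderful": 1.5, "fine": 0.2,
    "plot": -0.1, "boring": -1.4, "awful": -1.9,
})
# Pretend the network's vocabulary does not contain "fine".
toy_word2index = {"superb": 0, "wonderful": 1, "plot": 2, "boring": 3, "awful": 4}


def pick_polarized_words(ratios, word2index, n):
    """Return the n most positive, then the n most negative, in-vocabulary words."""
    ranked = ratios.most_common()   # sorted from highest to lowest ratio
    most_positive = [w for w, _ in ranked[:n] if w in word2index]
    most_negative = [w for w, _ in ranked[::-1][:n] if w in word2index]
    return most_positive + most_negative


words = pick_polarized_words(toy_ratios, toy_word2index, n=3)
# Green for a positive ratio, red otherwise, as in the cell above.
colors = ["#00ff00" if toy_ratios[w] > 0 else "#ff0000" for w in words]

print(words)
print(colors)
```

With `n=3`, "fine" is among the three most positive words but is dropped because it is missing from the toy vocabulary, which mirrors the `word2index` check in the notebook.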
``````

In [33]:

from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, random_state=0)
words_top_ted_tsne = tsne.fit_transform(vectors_list)

``````
``````

In [31]:

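# Bokeh imports, repeated here in case they were not run in an earlier cell.
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure, show

p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="vector T-SNE for most polarized words")

# Keep the colours in the data source too: passing a source together with a
# plain Python list to a glyph method is deprecated in Bokeh.
source = ColumnDataSource(data=dict(x1=words_top_ted_tsne[:,0],
                                    x2=words_top_ted_tsne[:,1],
                                    names=words_to_visualize,
                                    colors=colors_list))

p.scatter(x="x1", y="x2", size=8, source=source, color="colors")

word_labels = LabelSet(x="x1", y="x2", text="names", y_offset=6,
                       text_font_size="8pt", text_color="#555555",
                       source=source, text_align='center')
p.add_layout(word_labels)  # attach the labels; otherwise they are never drawn

show(p)

# green indicates positive words, red indicates negative words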

``````
``````

[Interactive Bokeh scatter plot: "vector T-SNE for most polarized words"]

``````