Sentiment Classification & How To "Frame Problems" for a Neural Network

by Andrew Trask

What You Should Already Know

  • neural networks, forward and back-propagation
  • stochastic gradient descent
  • mean squared error
  • train/test splits

Where to Get Help If You Need It

  • Re-watch previous Udacity Lectures
  • Leverage the recommended Course Reading Material - Grokking Deep Learning (40% Off: traskud17)
  • Shoot me a tweet @iamtrask

Tutorial Outline:

  • Intro: The Importance of "Framing a Problem"
  • Curate a Dataset
  • Developing a "Predictive Theory"
  • PROJECT 1: Quick Theory Validation
  • Transforming Text to Numbers
  • PROJECT 2: Creating the Input/Output Data
  • Putting it all together in a Neural Network
  • PROJECT 3: Building our Neural Network
  • Understanding Neural Noise
  • PROJECT 4: Making Learning Faster by Reducing Noise
  • Analyzing Inefficiencies in our Network
  • PROJECT 5: Making our Network Train and Run Faster
  • Further Noise Reduction
  • PROJECT 6: Reducing Noise by Strategically Reducing the Vocabulary
  • Analysis: What's going on in the weights?

Lesson: Curate a Dataset


In [1]:
def pretty_print_review_and_label(i):
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# What we know: one review per line.
with open('reviews.txt', 'r') as g:
    reviews = [line.rstrip('\n') for line in g]

# What we WANT to know: one label per line.
with open('labels.txt', 'r') as g:
    labels = [line.rstrip('\n').upper() for line in g]

In [2]:
len(reviews)


Out[2]:
25000

In [3]:
reviews[0]


Out[3]:
'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

In [4]:
labels[0]


Out[4]:
'POSITIVE'

Lesson: Develop a Predictive Theory


In [5]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)


labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...
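
The six snippets above hint at a theory: individual words such as "terrible" and "excellent" track the label, while a word like "movie" does not. As a rough check, the sketch below counts the labels of the snippets containing a given word. The `samples` list simply re-types the printed snippets (without the trailing "..."), and `label_counts` is a helper name introduced here for illustration, not part of the original notebook.

```python
from collections import Counter

# The six truncated snippets printed above, paired with their labels.
samples = [
    ("NEGATIVE", "this movie is terrible but it has some good effects ."),
    ("POSITIVE", "adrian pasdar is excellent is this film . he makes a fascinating woman ."),
    ("NEGATIVE", "comment this movie is impossible . is terrible  very improbable  bad interpretat"),
    ("POSITIVE", "excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more"),
    ("NEGATIVE", "if you haven  t seen this  it  s terrible . it is pure trash . i saw this about "),
    ("POSITIVE", "this schiffer guy is a real genius  the movie is of excellent quality and both e"),
]

def label_counts(word):
    """Count the labels of the snippets that contain `word` as a whole token."""
    return Counter(label for label, text in samples if word in text.split())

for word in ("terrible", "excellent", "movie"):
    print(word, dict(label_counts(word)))
```

Six hand-picked snippets prove nothing on their own, which is why the next project repeats the count over all 25000 reviews.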

Project 1: Quick Theory Validation


In [6]:
from collections import Counter
import numpy as np

In [7]:
positive_counts = Counter()
negative_counts = Counter()
total_counts = Counter()

In [8]:
for review, label in zip(reviews, labels):
    # Pick the counter that matches this review's label.
    label_counts = positive_counts if label == 'POSITIVE' else negative_counts
    for word in review.split(" "):
        label_counts[word] += 1
        total_counts[word] += 1
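
To see what the counting loop produces, here is a minimal sketch on a two-review dataset. The `toy_*` names and their contents are made up for illustration. It also shows where the empty-string token in the counts comes from: `split(" ")` on two consecutive spaces yields an empty string between them.

```python
from collections import Counter

# Hypothetical mini-dataset, for illustration only.
toy_reviews = ["great  film", "terrible film"]  # note the double space
toy_labels = ["POSITIVE", "NEGATIVE"]

toy_positive = Counter()
toy_negative = Counter()
toy_total = Counter()

for review, label in zip(toy_reviews, toy_labels):
    # Same logic as the loop above: one counter per label, plus a total.
    target = toy_positive if label == "POSITIVE" else toy_negative
    for word in review.split(" "):
        target[word] += 1
        toy_total[word] += 1

print(toy_positive)
print(toy_negative)
print(toy_total)
```

Calling `review.split()` with no argument would drop the empty tokens, but that changes the counts, so the notebook's `split(" ")` is kept here.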

In [9]:
positive_counts.most_common()


Out[9]:
[('', 550468),
 ('the', 173324),
 ('.', 159654),
 ('and', 89722),
 ('a', 83688),
 ('of', 76855),
 ('to', 66746),
 ('is', 57245),
 ('in', 50215),
 ('br', 49235),
 ('it', 48025),
 ('i', 40743),
 ('that', 35630),
 ('this', 35080),
 ('s', 33815),
 ('as', 26308),
 ('with', 23247),
 ('for', 22416),
 ('was', 21917),
 ('film', 20937),
 ('but', 20822),
 ('movie', 19074),
 ('his', 17227),
 ('on', 17008),
 ('you', 16681),
 ('he', 16282),
 ('are', 14807),
 ('not', 14272),
 ('t', 13720),
 ('one', 13655),
 ('have', 12587),
 ('be', 12416),
 ('by', 11997),
 ('all', 11942),
 ('who', 11464),
 ('an', 11294),
 ('at', 11234),
 ('from', 10767),
 ('her', 10474),
 ('they', 9895),
 ('has', 9186),
 ('so', 9154),
 ('like', 9038),
 ('about', 8313),
 ('very', 8305),
 ('out', 8134),
 ('there', 8057),
 ('she', 7779),
 ('what', 7737),
 ('or', 7732),
 ('good', 7720),
 ('more', 7521),
 ('when', 7456),
 ('some', 7441),
 ('if', 7285),
 ('just', 7152),
 ('can', 7001),
 ('story', 6780),
 ('time', 6515),
 ('my', 6488),
 ('great', 6419),
 ('well', 6405),
 ('up', 6321),
 ('which', 6267),
 ('their', 6107),
 ('see', 6026),
 ('also', 5550),
 ('we', 5531),
 ('really', 5476),
 ('would', 5400),
 ('will', 5218),
 ('me', 5167),
 ('had', 5148),
 ('only', 5137),
 ('him', 5018),
 ('even', 4964),
 ('most', 4864),
 ('other', 4858),
 ('were', 4782),
 ('first', 4755),
 ('than', 4736),
 ('much', 4685),
 ('its', 4622),
 ('no', 4574),
 ('into', 4544),
 ('people', 4479),
 ('best', 4319),
 ('love', 4301),
 ('get', 4272),
 ('how', 4213),
 ('life', 4199),
 ('been', 4189),
 ('because', 4079),
 ('way', 4036),
 ('do', 3941),
 ('made', 3823),
 ('films', 3813),
 ('them', 3805),
 ('after', 3800),
 ('many', 3766),
 ('two', 3733),
 ('too', 3659),
 ('think', 3655),
 ('movies', 3586),
 ('characters', 3560),
 ('character', 3514),
 ('don', 3468),
 ('man', 3460),
 ('show', 3432),
 ('watch', 3424),
 ('seen', 3414),
 ('then', 3358),
 ('little', 3341),
 ('still', 3340),
 ('make', 3303),
 ('could', 3237),
 ('never', 3226),
 ('being', 3217),
 ('where', 3173),
 ('does', 3069),
 ('over', 3017),
 ('any', 3002),
 ('while', 2899),
 ('know', 2833),
 ('did', 2790),
 ('years', 2758),
 ('here', 2740),
 ('ever', 2734),
 ('end', 2696),
 ('these', 2694),
 ('such', 2590),
 ('real', 2568),
 ('scene', 2567),
 ('back', 2547),
 ('those', 2485),
 ('though', 2475),
 ('off', 2463),
 ('new', 2458),
 ('your', 2453),
 ('go', 2440),
 ('acting', 2437),
 ('plot', 2432),
 ('world', 2429),
 ('scenes', 2427),
 ('say', 2414),
 ('through', 2409),
 ('makes', 2390),
 ('better', 2381),
 ('now', 2368),
 ('work', 2346),
 ('young', 2343),
 ('old', 2311),
 ('ve', 2307),
 ('find', 2272),
 ('both', 2248),
 ('before', 2177),
 ('us', 2162),
 ('again', 2158),
 ('series', 2153),
 ('quite', 2143),
 ('something', 2135),
 ('cast', 2133),
 ('should', 2121),
 ('part', 2098),
 ('always', 2088),
 ('lot', 2087),
 ('another', 2075),
 ('actors', 2047),
 ('director', 2040),
 ('family', 2032),
 ('between', 2016),
 ('own', 2016),
 ('m', 1998),
 ('may', 1997),
 ('same', 1972),
 ('role', 1967),
 ('watching', 1966),
 ('every', 1954),
 ('funny', 1953),
 ('doesn', 1935),
 ('performance', 1928),
 ('few', 1918),
 ('bad', 1907),
 ('look', 1900),
 ('re', 1884),
 ('why', 1855),
 ('things', 1849),
 ('times', 1832),
 ('big', 1815),
 ('however', 1795),
 ('actually', 1790),
 ('action', 1789),
 ('going', 1783),
 ('bit', 1757),
 ('comedy', 1742),
 ('down', 1740),
 ('music', 1738),
 ('must', 1728),
 ('take', 1709),
 ('saw', 1692),
 ('long', 1690),
 ('right', 1688),
 ('fun', 1686),
 ('fact', 1684),
 ('excellent', 1683),
 ('around', 1674),
 ('didn', 1672),
 ('without', 1671),
 ('thing', 1662),
 ('thought', 1639),
 ('got', 1635),
 ('each', 1630),
 ('day', 1614),
 ('feel', 1597),
 ('seems', 1596),
 ('come', 1594),
 ('done', 1586),
 ('beautiful', 1580),
 ('especially', 1572),
 ('played', 1571),
 ('almost', 1566),
 ('want', 1562),
 ('yet', 1556),
 ('give', 1553),
 ('pretty', 1549),
 ('last', 1543),
 ('since', 1519),
 ('different', 1504),
 ('although', 1501),
 ('gets', 1490),
 ('true', 1487),
 ('interesting', 1481),
 ('job', 1470),
 ('enough', 1455),
 ('our', 1454),
 ('shows', 1447),
 ('horror', 1441),
 ('woman', 1439),
 ('tv', 1400),
 ('probably', 1398),
 ('father', 1395),
 ('original', 1393),
 ('girl', 1390),
 ('point', 1379),
 ('plays', 1378),
 ('wonderful', 1372),
 ('far', 1358),
 ('course', 1358),
 ('john', 1350),
 ('rather', 1340),
 ('isn', 1328),
 ('ll', 1326),
 ('later', 1324),
 ('dvd', 1324),
 ('whole', 1310),
 ('war', 1310),
 ('d', 1307),
 ('found', 1306),
 ('away', 1306),
 ('screen', 1305),
 ('nothing', 1300),
 ('year', 1297),
 ('once', 1296),
 ('hard', 1294),
 ('together', 1280),
 ('set', 1277),
 ('am', 1277),
 ('having', 1266),
 ('making', 1265),
 ('place', 1263),
 ('might', 1260),
 ('comes', 1260),
 ('sure', 1253),
 ('american', 1248),
 ('play', 1245),
 ('kind', 1244),
 ('perfect', 1242),
 ('takes', 1242),
 ('performances', 1237),
 ('himself', 1230),
 ('worth', 1221),
 ('everyone', 1221),
 ('anyone', 1214),
 ('actor', 1203),
 ('three', 1201),
 ('wife', 1196),
 ('classic', 1192),
 ('goes', 1186),
 ('ending', 1178),
 ('version', 1168),
 ('star', 1149),
 ('enjoy', 1146),
 ('book', 1142),
 ('nice', 1132),
 ('everything', 1128),
 ('during', 1124),
 ('put', 1118),
 ('seeing', 1111),
 ('least', 1102),
 ('house', 1100),
 ('high', 1095),
 ('watched', 1094),
 ('loved', 1087),
 ('men', 1087),
 ('night', 1082),
 ('anything', 1075),
 ('believe', 1071),
 ('guy', 1071),
 ('top', 1063),
 ('amazing', 1058),
 ('hollywood', 1056),
 ('looking', 1053),
 ('main', 1044),
 ('definitely', 1043),
 ('gives', 1031),
 ('home', 1029),
 ('seem', 1028),
 ('episode', 1023),
 ('audience', 1020),
 ('sense', 1020),
 ('truly', 1017),
 ('special', 1011),
 ('second', 1009),
 ('short', 1009),
 ('fan', 1009),
 ('mind', 1005),
 ('human', 1001),
 ('recommend', 999),
 ('full', 996),
 ('black', 995),
 ('help', 991),
 ('along', 989),
 ('trying', 987),
 ('small', 986),
 ('death', 985),
 ('friends', 981),
 ('remember', 974),
 ('often', 970),
 ('said', 966),
 ('favorite', 962),
 ('heart', 959),
 ('early', 957),
 ('left', 956),
 ('until', 955),
 ('script', 954),
 ('let', 954),
 ('maybe', 937),
 ('today', 936),
 ('live', 934),
 ('less', 934),
 ('moments', 933),
 ('others', 929),
 ('brilliant', 926),
 ('shot', 925),
 ('liked', 923),
 ('become', 916),
 ('won', 915),
 ('used', 910),
 ('style', 907),
 ('mother', 895),
 ('lives', 894),
 ('came', 893),
 ('stars', 890),
 ('cinema', 889),
 ('looks', 885),
 ('perhaps', 884),
 ('read', 882),
 ('enjoyed', 879),
 ('boy', 875),
 ('drama', 873),
 ('highly', 871),
 ('given', 870),
 ('playing', 867),
 ('use', 864),
 ('next', 859),
 ('women', 858),
 ('fine', 857),
 ('effects', 856),
 ('kids', 854),
 ('entertaining', 853),
 ('need', 852),
 ('line', 850),
 ('works', 848),
 ('someone', 847),
 ('mr', 836),
 ('simply', 835),
 ('picture', 833),
 ('children', 833),
 ('face', 831),
 ('keep', 831),
 ('friend', 831),
 ('dark', 830),
 ('overall', 828),
 ('certainly', 828),
 ('minutes', 827),
 ('wasn', 824),
 ('history', 822),
 ('finally', 820),
 ('couple', 816),
 ('against', 815),
 ('son', 809),
 ('understand', 808),
 ('lost', 807),
 ('michael', 805),
 ('else', 801),
 ('throughout', 798),
 ('fans', 797),
 ('city', 792),
 ('reason', 789),
 ('written', 787),
 ('production', 787),
 ('several', 784),
 ('school', 783),
 ('based', 781),
 ('rest', 781),
 ('try', 780),
 ('dead', 776),
 ('hope', 775),
 ('strong', 768),
 ('white', 765),
 ('tell', 759),
 ('itself', 758),
 ('half', 753),
 ('person', 749),
 ('sometimes', 746),
 ('past', 744),
 ('start', 744),
 ('genre', 743),
 ('beginning', 739),
 ('final', 739),
 ('town', 738),
 ('art', 734),
 ('humor', 732),
 ('game', 732),
 ('yes', 731),
 ('idea', 731),
 ('late', 730),
 ('becomes', 729),
 ('despite', 729),
 ('able', 726),
 ('case', 726),
 ('money', 723),
 ('child', 721),
 ('completely', 721),
 ('side', 719),
 ('camera', 716),
 ('getting', 714),
 ('instead', 712),
 ('soon', 702),
 ('under', 700),
 ('viewer', 699),
 ('age', 697),
 ('days', 696),
 ('stories', 696),
 ('felt', 694),
 ('simple', 694),
 ('roles', 693),
 ('video', 688),
 ('name', 683),
 ('either', 683),
 ('doing', 677),
 ('turns', 674),
 ('wants', 671),
 ('close', 671),
 ('title', 669),
 ('wrong', 668),
 ('went', 666),
 ('james', 665),
 ('evil', 659),
 ('budget', 657),
 ('episodes', 657),
 ('relationship', 655),
 ('fantastic', 653),
 ('piece', 653),
 ('david', 651),
 ('turn', 648),
 ('murder', 646),
 ('parts', 645),
 ('brother', 644),
 ('absolutely', 643),
 ('head', 643),
 ('experience', 642),
 ('eyes', 641),
 ('sex', 638),
 ('direction', 637),
 ('called', 637),
 ('directed', 636),
 ('lines', 634),
 ('behind', 633),
 ('sort', 632),
 ('actress', 631),
 ('lead', 630),
 ('oscar', 628),
 ('including', 627),
 ('example', 627),
 ('known', 625),
 ('musical', 625),
 ('chance', 621),
 ('score', 620),
 ('already', 619),
 ('feeling', 619),
 ('hit', 619),
 ('voice', 615),
 ('moment', 612),
 ('living', 612),
 ('low', 610),
 ('supporting', 610),
 ('ago', 609),
 ('themselves', 608),
 ('reality', 605),
 ('hilarious', 605),
 ('jack', 604),
 ('told', 603),
 ('hand', 601),
 ('quality', 600),
 ('moving', 600),
 ('dialogue', 600),
 ('song', 599),
 ('happy', 599),
 ('matter', 598),
 ('paul', 598),
 ('light', 594),
 ('future', 593),
 ('entire', 592),
 ('finds', 591),
 ('gave', 589),
 ('laugh', 587),
 ('released', 586),
 ('expect', 584),
 ('fight', 581),
 ('particularly', 580),
 ('cinematography', 579),
 ('police', 579),
 ('whose', 578),
 ('type', 578),
 ('sound', 578),
 ('view', 573),
 ('enjoyable', 573),
 ('number', 572),
 ('romantic', 572),
 ('husband', 572),
 ('daughter', 572),
 ('documentary', 571),
 ('self', 570),
 ('superb', 569),
 ('modern', 569),
 ('took', 569),
 ('robert', 569),
 ('mean', 566),
 ('shown', 563),
 ('coming', 561),
 ('important', 560),
 ('king', 559),
 ('leave', 559),
 ('change', 558),
 ('somewhat', 555),
 ('wanted', 555),
 ('tells', 554),
 ('events', 552),
 ('run', 552),
 ('career', 552),
 ('country', 552),
 ('heard', 550),
 ('season', 550),
 ('greatest', 549),
 ('girls', 549),
 ('etc', 547),
 ('care', 546),
 ('starts', 545),
 ('english', 542),
 ('killer', 541),
 ('tale', 540),
 ('guys', 540),
 ('totally', 540),
 ('animation', 540),
 ('usual', 539),
 ('miss', 535),
 ('opinion', 535),
 ('easy', 531),
 ('violence', 531),
 ('songs', 530),
 ('british', 528),
 ('says', 526),
 ('realistic', 525),
 ('writing', 524),
 ('writer', 522),
 ('act', 522),
 ('comic', 521),
 ('thriller', 519),
 ('television', 517),
 ('power', 516),
 ('ones', 515),
 ('kid', 514),
 ('york', 513),
 ('novel', 513),
 ('alone', 512),
 ('problem', 512),
 ('attention', 509),
 ('involved', 508),
 ('kill', 507),
 ('extremely', 507),
 ('seemed', 506),
 ('hero', 505),
 ('french', 505),
 ('rock', 504),
 ('stuff', 501),
 ('wish', 499),
 ('begins', 498),
 ('taken', 497),
 ('sad', 497),
 ('ways', 496),
 ('richard', 495),
 ('knows', 494),
 ('atmosphere', 493),
 ('similar', 491),
 ('surprised', 491),
 ('taking', 491),
 ('car', 491),
 ('george', 490),
 ('perfectly', 490),
 ('across', 489),
 ('team', 489),
 ('eye', 489),
 ('sequence', 489),
 ('room', 488),
 ('due', 488),
 ('among', 488),
 ('serious', 488),
 ('powerful', 488),
 ('strange', 487),
 ('order', 487),
 ('cannot', 487),
 ('b', 487),
 ('beauty', 486),
 ('famous', 485),
 ('happened', 484),
 ('tries', 484),
 ('herself', 484),
 ('myself', 484),
 ('class', 483),
 ('four', 482),
 ('cool', 481),
 ('release', 479),
 ('anyway', 479),
 ('theme', 479),
 ('opening', 478),
 ('entertainment', 477),
 ('slow', 475),
 ('ends', 475),
 ('unique', 475),
 ('exactly', 475),
 ('easily', 474),
 ('level', 474),
 ('o', 474),
 ('red', 474),
 ('interest', 472),
 ('happen', 471),
 ('crime', 470),
 ('viewing', 468),
 ('sets', 467),
 ('memorable', 467),
 ('stop', 466),
 ('group', 466),
 ('problems', 463),
 ('dance', 463),
 ('working', 463),
 ('sister', 463),
 ('message', 463),
 ('knew', 462),
 ('mystery', 461),
 ('nature', 461),
 ('bring', 460),
 ('believable', 459),
 ('thinking', 459),
 ('brought', 459),
 ('mostly', 458),
 ('disney', 457),
 ('couldn', 457),
 ('society', 456),
 ('lady', 455),
 ('within', 455),
 ('blood', 454),
 ('parents', 453),
 ('upon', 453),
 ('viewers', 453),
 ('meets', 452),
 ('form', 452),
 ('peter', 452),
 ('tom', 452),
 ('usually', 452),
 ('soundtrack', 452),
 ('local', 450),
 ('certain', 448),
 ('follow', 448),
 ('whether', 447),
 ('possible', 446),
 ('emotional', 445),
 ('killed', 444),
 ('above', 444),
 ('de', 444),
 ('god', 443),
 ('middle', 443),
 ('needs', 442),
 ('happens', 442),
 ('flick', 442),
 ('masterpiece', 441),
 ('period', 440),
 ('major', 440),
 ('named', 439),
 ('haven', 439),
 ('particular', 438),
 ('th', 438),
 ('earth', 437),
 ('feature', 437),
 ('stand', 436),
 ('words', 435),
 ('typical', 435),
 ('elements', 433),
 ('obviously', 433),
 ('romance', 431),
 ('jane', 430),
 ('yourself', 427),
 ('showing', 427),
 ('brings', 426),
 ('fantasy', 426),
 ('guess', 423),
 ('america', 423),
 ('unfortunately', 422),
 ('huge', 422),
 ('indeed', 421),
 ('running', 421),
 ('talent', 420),
 ('stage', 419),
 ('started', 418),
 ('leads', 417),
 ('sweet', 417),
 ('japanese', 417),
 ('poor', 416),
 ('deal', 416),
 ('incredible', 413),
 ('personal', 413),
 ('fast', 412),
 ('became', 410),
 ('deep', 410),
 ('hours', 409),
 ('giving', 408),
 ('nearly', 408),
 ('dream', 408),
 ('clearly', 407),
 ('turned', 407),
 ('obvious', 406),
 ('near', 406),
 ('cut', 405),
 ('surprise', 405),
 ('era', 404),
 ('body', 404),
 ('hour', 403),
 ('female', 403),
 ('five', 403),
 ('note', 399),
 ('learn', 398),
 ('truth', 398),
 ('except', 397),
 ('feels', 397),
 ('match', 397),
 ('tony', 397),
 ('filmed', 394),
 ('clear', 394),
 ('complete', 394),
 ('street', 393),
 ('eventually', 393),
 ('keeps', 393),
 ('older', 393),
 ('lots', 393),
 ('buy', 392),
 ('william', 391),
 ('stewart', 391),
 ('fall', 390),
 ('joe', 390),
 ('meet', 390),
 ('unlike', 389),
 ('talking', 389),
 ('shots', 389),
 ('rating', 389),
 ('difficult', 389),
 ('dramatic', 388),
 ('means', 388),
 ('situation', 386),
 ('wonder', 386),
 ('present', 386),
 ('appears', 386),
 ('subject', 386),
 ('comments', 385),
 ('general', 383),
 ('sequences', 383),
 ('lee', 383),
 ('points', 382),
 ('earlier', 382),
 ('gone', 379),
 ('check', 379),
 ('suspense', 378),
 ('recommended', 378),
 ('ten', 378),
 ('third', 377),
 ('business', 377),
 ('talk', 375),
 ('leaves', 375),
 ('beyond', 375),
 ('portrayal', 374),
 ('beautifully', 373),
 ('single', 372),
 ('bill', 372),
 ('plenty', 371),
 ('word', 371),
 ('whom', 370),
 ('falls', 370),
 ('scary', 369),
 ('non', 369),
 ('figure', 369),
 ('battle', 369),
 ('using', 368),
 ('return', 368),
 ('doubt', 367),
 ('add', 367),
 ('hear', 366),
 ('solid', 366),
 ('success', 366),
 ('jokes', 365),
 ('oh', 365),
 ('touching', 365),
 ('political', 365),
 ('hell', 364),
 ('awesome', 364),
 ('boys', 364),
 ('sexual', 362),
 ('recently', 362),
 ('dog', 362),
 ('please', 361),
 ('wouldn', 361),
 ('straight', 361),
 ('features', 361),
 ('forget', 360),
 ('setting', 360),
 ('lack', 360),
 ('married', 359),
 ('mark', 359),
 ('social', 357),
 ('interested', 356),
 ('adventure', 356),
 ('actual', 355),
 ('terrific', 355),
 ('sees', 355),
 ('brothers', 355),
 ('move', 354),
 ('call', 354),
 ('various', 353),
 ('theater', 353),
 ('dr', 353),
 ('animated', 352),
 ('western', 351),
 ('baby', 350),
 ('space', 350),
 ('leading', 348),
 ('disappointed', 348),
 ('portrayed', 346),
 ('aren', 346),
 ('screenplay', 345),
 ('smith', 345),
 ('towards', 344),
 ('hate', 344),
 ('noir', 343),
 ('outstanding', 342),
 ('decent', 342),
 ('kelly', 342),
 ('directors', 341),
 ('journey', 341),
 ('none', 340),
 ('looked', 340),
 ('effective', 340),
 ('storyline', 339),
 ('caught', 339),
 ('sci', 339),
 ('fi', 339),
 ('cold', 339),
 ('mary', 339),
 ('rich', 338),
 ('charming', 338),
 ('popular', 337),
 ('rare', 337),
 ('manages', 337),
 ('harry', 337),
 ('spirit', 336),
 ('appreciate', 335),
 ('open', 335),
 ('moves', 334),
 ('basically', 334),
 ('acted', 334),
 ('inside', 333),
 ('boring', 333),
 ('century', 333),
 ('mention', 333),
 ('deserves', 333),
 ('subtle', 333),
 ('pace', 333),
 ('familiar', 332),
 ('background', 332),
 ('ben', 331),
 ('creepy', 330),
 ('supposed', 330),
 ('secret', 329),
 ('die', 328),
 ('jim', 328),
 ('question', 327),
 ('effect', 327),
 ('natural', 327),
 ('impressive', 326),
 ('rate', 326),
 ('language', 326),
 ('saying', 325),
 ('intelligent', 325),
 ('telling', 324),
 ('realize', 324),
 ('material', 324),
 ('scott', 324),
 ('singing', 323),
 ('dancing', 322),
 ('visual', 321),
 ('adult', 321),
 ('imagine', 321),
 ('kept', 320),
 ('office', 320),
 ('uses', 319),
 ('pure', 318),
 ('wait', 318),
 ('stunning', 318),
 ('review', 317),
 ('previous', 317),
 ('copy', 317),
 ('seriously', 317),
 ('reading', 316),
 ('create', 316),
 ('hot', 316),
 ('created', 316),
 ('magic', 316),
 ('somehow', 316),
 ('stay', 315),
 ('attempt', 315),
 ('escape', 315),
 ('crazy', 315),
 ('air', 315),
 ('frank', 315),
 ('hands', 314),
 ('filled', 313),
 ('expected', 312),
 ('average', 312),
 ('surprisingly', 312),
 ('complex', 311),
 ('quickly', 310),
 ('successful', 310),
 ('studio', 310),
 ('plus', 309),
 ('male', 309),
 ('co', 307),
 ('images', 306),
 ('casting', 306),
 ('following', 306),
 ('minute', 306),
 ('exciting', 306),
 ('members', 305),
 ('follows', 305),
 ('themes', 305),
 ('german', 305),
 ('reasons', 305),
 ('e', 305),
 ('touch', 304),
 ('edge', 304),
 ('free', 304),
 ('cute', 304),
 ('genius', 304),
 ('outside', 303),
 ('reviews', 302),
 ('admit', 302),
 ('ok', 302),
 ('younger', 302),
 ('fighting', 301),
 ('odd', 301),
 ('master', 301),
 ('recent', 300),
 ('thanks', 300),
 ('break', 300),
 ('comment', 300),
 ('apart', 299),
 ('emotions', 298),
 ('lovely', 298),
 ('begin', 298),
 ('doctor', 297),
 ('party', 297),
 ('italian', 297),
 ('la', 296),
 ('missed', 296),
 ...]

In [10]:
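pos_neg_ratios = Counter()

# Ratio of positive to negative uses, for words seen more than 100 times.
# The +1 in the denominator avoids division by zero.
for term, cnt in total_counts.most_common():
    if cnt > 100:
        pos_neg_ratios[term] = positive_counts[term] / float(negative_counts[term] + 1)

# Take the log so positive-leaning words score above 0 and
# negative-leaning words below 0; the 0.01 keeps log() away from zero.
for word, ratio in pos_neg_ratios.most_common():
    if ratio > 1:
        pos_neg_ratios[word] = np.log(ratio)
    else:
        pos_neg_ratios[word] = -np.log(1 / (ratio + 0.01))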
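
The effect of the two steps in this cell is easier to see on small numbers. The sketch below applies the same formula to three hypothetical words; the counts are invented for illustration and the helper `log_ratio` is not part of the original notebook.

```python
from collections import Counter

import numpy as np

# Hypothetical toy counts, not taken from the review data.
toy_positive = Counter({"great": 30, "the": 100, "awful": 2})
toy_negative = Counter({"great": 9, "the": 99, "awful": 39})

def log_ratio(word):
    """Smoothed positive/negative ratio followed by a signed log."""
    ratio = toy_positive[word] / float(toy_negative[word] + 1)
    if ratio > 1:
        return float(np.log(ratio))
    return float(-np.log(1 / (ratio + 0.01)))

for word in ("great", "the", "awful"):
    print(word, round(log_ratio(word), 3))
```

Here "great" has a ratio of 3 and lands at log(3), about 1.1. "the" has a ratio of exactly 1 and lands near 0, and "awful" has a ratio of 0.05 and lands well below 0. The raw ratios are lopsided (anything from 0 to 1 is negative-leaning, anything from 1 upward is positive-leaning); the log makes the two sides roughly symmetric around 0.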

In [11]:
# words with the highest positive-to-negative log ratio, i.e. most characteristic of "POSITIVE" reviews
pos_neg_ratios.most_common()


Out[11]:
[('edie', 4.6913478822291435),
 ('paulie', 4.0775374439057197),
 ('felix', 3.1527360223636558),
 ('polanski', 2.8233610476132043),
 ('matthau', 2.8067217286092401),
 ('victoria', 2.6810215287142909),
 ('mildred', 2.6026896854443837),
 ('gandhi', 2.5389738710582761),
 ('flawless', 2.451005098112319),
 ('superbly', 2.2600254785752498),
 ('perfection', 2.1594842493533721),
 ('astaire', 2.1400661634962708),
 ('captures', 2.0386195471595809),
 ('voight', 2.0301704926730531),
 ('wonderfully', 2.0218960560332353),
 ('powell', 1.9783454248084671),
 ('brosnan', 1.9547990964725592),
 ('lily', 1.9203768470501485),
 ('bakshi', 1.9029851043382795),
 ('lincoln', 1.9014583864844796),
 ('refreshing', 1.8551812956655511),
 ('breathtaking', 1.8481124057791867),
 ('bourne', 1.8478489358790986),
 ('lemmon', 1.8458266904983307),
 ('delightful', 1.8002701588959635),
 ('flynn', 1.7996646487351682),
 ('andrews', 1.7764919970972666),
 ('homer', 1.7692866133759964),
 ('beautifully', 1.7626953362841438),
 ('soccer', 1.7578579175523736),
 ('elvira', 1.7397031072720019),
 ('underrated', 1.7197859696029656),
 ('gripping', 1.7165360479904674),
 ('superb', 1.7091514458966952),
 ('delight', 1.6714733033535532),
 ('welles', 1.6677068205580761),
 ('sadness', 1.663505133704376),
 ('sinatra', 1.6389967146756448),
 ('touching', 1.637217476541176),
 ('timeless', 1.62924053973028),
 ('macy', 1.6211339521972916),
 ('unforgettable', 1.6177367152487956),
 ('favorites', 1.6158688027643908),
 ('stewart', 1.6119987332957739),
 ('sullivan', 1.6094379124341003),
 ('extraordinary', 1.6094379124341003),
 ('hartley', 1.6094379124341003),
 ('brilliantly', 1.5950491749820008),
 ('friendship', 1.5677652160335325),
 ('wonderful', 1.5645425925262093),
 ('palma', 1.5553706911638245),
 ('magnificent', 1.54663701119507),
 ('finest', 1.5462590108125689),
 ('jackie', 1.5439233053234738),
 ('ritter', 1.5404450409471491),
 ('tremendous', 1.5184661342283736),
 ('freedom', 1.5091151908062312),
 ('fantastic', 1.5048433868558566),
 ('terrific', 1.5026699370083942),
 ('noir', 1.493925025312256),
 ('sidney', 1.493925025312256),
 ('outstanding', 1.4910053152089213),
 ('pleasantly', 1.4894785973551214),
 ('mann', 1.4894785973551214),
 ('nancy', 1.488077055429833),
 ('marie', 1.4825711915553104),
 ('marvelous', 1.4739999415389962),
 ('excellent', 1.4647538505723599),
 ('ruth', 1.4596256342054401),
 ('stanwyck', 1.4412101187160054),
 ('widmark', 1.4350845252893227),
 ('splendid', 1.4271163556401458),
 ('chan', 1.423108334242607),
 ('exceptional', 1.4201959127955721),
 ('tender', 1.410986973710262),
 ('gentle', 1.4078005663408544),
 ('poignant', 1.4022947024663317),
 ('gem', 1.3932148039644643),
 ('amazing', 1.3919815802404802),
 ('chilling', 1.3862943611198906),
 ('fisher', 1.3862943611198906),
 ('davies', 1.3862943611198906),
 ('captivating', 1.3862943611198906),
 ('darker', 1.3652409519220583),
 ('april', 1.3499267169490159),
 ('kelly', 1.3461743673304654),
 ('blake', 1.3418425985490567),
 ('overlooked', 1.329135947279942),
 ('ralph', 1.32818673031261),
 ('bette', 1.3156767939059373),
 ('hoffman', 1.3150668518315229),
 ('cole', 1.3121863889661687),
 ('shines', 1.3049487216659381),
 ('powerful', 1.2999662776313934),
 ('notch', 1.2950456896547455),
 ('remarkable', 1.2883688239495823),
 ('pitt', 1.286210902562908),
 ('winters', 1.2833463918674481),
 ('vivid', 1.2762934659055623),
 ('gritty', 1.2757524867200667),
 ('giallo', 1.2745029551317739),
 ('portrait', 1.2704625455947689),
 ('innocence', 1.2694300209805796),
 ('psychiatrist', 1.2685113254635072),
 ('favorite', 1.2668956297860055),
 ('ensemble', 1.2656663733312759),
 ('stunning', 1.2622417124499117),
 ('burns', 1.259880436264232),
 ('garbo', 1.258954938743289),
 ('barbara', 1.2580400255962119),
 ('philip', 1.2527629684953681),
 ('panic', 1.2527629684953681),
 ('holly', 1.2527629684953681),
 ('carol', 1.2481440226390734),
 ('perfect', 1.246742480713785),
 ('appreciated', 1.2462482874741743),
 ('favourite', 1.2411123512753928),
 ('journey', 1.2367626271489269),
 ('rural', 1.235471471385307),
 ('bond', 1.2321436812926323),
 ('builds', 1.2305398317106577),
 ('brilliant', 1.2287554137664785),
 ('brooklyn', 1.2286654169163074),
 ('von', 1.225175011976539),
 ('recommended', 1.2163953243244932),
 ('unfolds', 1.2163953243244932),
 ('daniel', 1.20215296760895),
 ('perfectly', 1.1971931173405572),
 ('crafted', 1.1962507582320256),
 ('prince', 1.1939224684724346),
 ('troubled', 1.192138346678933),
 ('consequences', 1.1865810616140668),
 ('haunting', 1.1814999484738773),
 ('cinderella', 1.180052620608284),
 ('alexander', 1.1759989522835299),
 ('emotions', 1.1753049094563641),
 ('boxing', 1.1735135968412274),
 ('subtle', 1.1734135017508081),
 ('curtis', 1.1649873576129823),
 ('rare', 1.1566438362402944),
 ('loved', 1.1563661500586044),
 ('daughters', 1.1526795099383853),
 ('courage', 1.1438688802562305),
 ('dentist', 1.1426722784621401),
 ('highly', 1.1420208631618658),
 ('nominated', 1.1409146683587992),
 ('tony', 1.1397491942285991),
 ('draws', 1.1325138403437911),
 ('everyday', 1.1306150197542835),
 ('contrast', 1.1284652518177909),
 ('cried', 1.1213405397456659),
 ('fabulous', 1.1210851445201684),
 ('ned', 1.120591195386885),
 ('fay', 1.120591195386885),
 ('emma', 1.1184149159642893),
 ('sensitive', 1.113318436057805),
 ('smooth', 1.1089750757036563),
 ('dramas', 1.1080910326226534),
 ('today', 1.1050431789984001),
 ('helps', 1.1023091505494358),
 ('inspiring', 1.0986122886681098),
 ('jimmy', 1.0937696641923216),
 ('awesome', 1.0931328229034842),
 ('unique', 1.0881409888008142),
 ('tragic', 1.0871835928444868),
 ('intense', 1.0870514662670339),
 ('stellar', 1.0857088838322018),
 ('rival', 1.0822184788924332),
 ('provides', 1.0797081340289569),
 ('depression', 1.0782034170369026),
 ('shy', 1.0775588794702773),
 ('carrie', 1.076139432816051),
 ('blend', 1.0753554265038423),
 ('hank', 1.0736109864626924),
 ('diana', 1.0726368022648489),
 ('adorable', 1.0726368022648489),
 ('unexpected', 1.0722255334949147),
 ('achievement', 1.0668635903535293),
 ('bettie', 1.0663514264498881),
 ('happiness', 1.0632729222228008),
 ('glorious', 1.0608719606852626),
 ('davis', 1.0541605260972757),
 ('terrifying', 1.0525211814678428),
 ('beauty', 1.050410186850232),
 ('ideal', 1.0479685558493548),
 ('fears', 1.0467872208035236),
 ('hong', 1.0438040521731147),
 ('seasons', 1.0433496099930604),
 ('fascinating', 1.0414538748281612),
 ('carries', 1.0345904299031787),
 ('satisfying', 1.0321225473992768),
 ('definite', 1.0319209141694374),
 ('touched', 1.0296194171811581),
 ('greatest', 1.0248947127715422),
 ('creates', 1.0241097613701886),
 ('aunt', 1.023388867430522),
 ('walter', 1.022328983918479),
 ('spectacular', 1.0198314108149955),
 ('portrayal', 1.0189810189761024),
 ('ann', 1.0127808528183286),
 ('enterprise', 1.0116009116784799),
 ('musicals', 1.0096648026516135),
 ('deeply', 1.0094845087721023),
 ('incredible', 1.0061677561461084),
 ('mature', 1.0060195018402847),
 ('triumph', 0.99682959435816731),
 ('margaret', 0.99682959435816731),
 ('navy', 0.99493385919326827),
 ('harry', 0.99176919305006062),
 ('lucas', 0.990398704027877),
 ('sweet', 0.98966110487955483),
 ('joey', 0.98794672078059009),
 ('oscar', 0.98721905111049713),
 ('balance', 0.98649499054740353),
 ('warm', 0.98485340331145166),
 ('ages', 0.98449898190068863),
 ('guilt', 0.98082925301172619),
 ('glover', 0.98082925301172619),
 ('carrey', 0.98082925301172619),
 ('learns', 0.97881108885548895),
 ('unusual', 0.97788374278196932),
 ('sons', 0.97777581552483595),
 ('complex', 0.97761897738147796),
 ('essence', 0.97753435711487369),
 ('brazil', 0.9769153536905899),
 ('widow', 0.97650959186720987),
 ('solid', 0.97537964824416146),
 ('beautiful', 0.97326301262841053),
 ('holmes', 0.97246100334120955),
 ('awe', 0.97186058302896583),
 ('vhs', 0.97116734209998934),
 ('eerie', 0.97116734209998934),
 ('lonely', 0.96873720724669754),
 ('grim', 0.96873720724669754),
 ('sport', 0.96825047080486615),
 ('debut', 0.96508089604358704),
 ('destiny', 0.96343751029985703),
 ('thrillers', 0.96281074750904794),
 ('tears', 0.95977584381389391),
 ('rose', 0.95664202739772253),
 ('feelings', 0.95551144502743635),
 ('ginger', 0.95551144502743635),
 ('winning', 0.95471810900804055),
 ('stanley', 0.95387344302319799),
 ('cox', 0.95343027882361187),
 ('paris', 0.95278479030472663),
 ('heart', 0.95238806924516806),
 ('hooked', 0.95155887071161305),
 ('comfortable', 0.94803943018873538),
 ('mgm', 0.94446160884085151),
 ('masterpiece', 0.94155039863339296),
 ('themes', 0.94118828349588235),
 ('danny', 0.93967118051821874),
 ('anime', 0.93378388932167222),
 ('perry', 0.93328830824272613),
 ('joy', 0.93301752567946861),
 ('lovable', 0.93081883243706487),
 ('mysteries', 0.92953595862417571),
 ('hal', 0.92953595862417571),
 ('louis', 0.92871325187271225),
 ('charming', 0.92520609553210742),
 ('urban', 0.92367083917177761),
 ('allows', 0.92183091224977043),
 ('impact', 0.91815814604895041),
 ('italy', 0.91629073187415511),
 ('gradually', 0.91629073187415511),
 ('lifestyle', 0.91629073187415511),
 ('spy', 0.91289514287301687),
 ('treat', 0.91193342650519937),
 ('subsequent', 0.91056005716517008),
 ('kennedy', 0.90981821736853763),
 ('loving', 0.90967549275543591),
 ('surprising', 0.90937028902958128),
 ('quiet', 0.90648673177753425),
 ('winter', 0.90624039602065365),
 ('reveals', 0.90490540964902977),
 ('raw', 0.90445627422715225),
 ('funniest', 0.90078654533818991),
 ('pleased', 0.89994159387262562),
 ('norman', 0.89994159387262562),
 ('thief', 0.89874642222324552),
 ('season', 0.89827222637147675),
 ('secrets', 0.89794159320595857),
 ('colorful', 0.89705936994626756),
 ('highest', 0.8967461358011849),
 ('compelling', 0.89462923509297576),
 ('danes', 0.89248008318043659),
 ('castle', 0.88967708335606499),
 ('kudos', 0.88889175768604067),
 ('great', 0.88810470901464589),
 ('baseball', 0.88730319500090271),
 ('subtitles', 0.88730319500090271),
 ('bleak', 0.88730319500090271),
 ('winner', 0.88643776872447388),
 ('tragedy', 0.88563699078315261),
 ('todd', 0.88551907320740142),
 ('nicely', 0.87924946019380601),
 ('arthur', 0.87546873735389985),
 ('essential', 0.87373111745535925),
 ('gorgeous', 0.8731725250935497),
 ('fonda', 0.87294029100054127),
 ('eastwood', 0.87139541196626402),
 ('focuses', 0.87082835779739776),
 ('enjoyed', 0.87070195951624607),
 ('natural', 0.86997924506912838),
 ('intensity', 0.86835126958503595),
 ('witty', 0.86824103423244681),
 ('rob', 0.8642954367557748),
 ('worlds', 0.86377269759070874),
 ('health', 0.86113891179907498),
 ('magical', 0.85953791528170564),
 ('deeper', 0.85802182375017932),
 ('lucy', 0.85618680780444956),
 ('moving', 0.85566611005772031),
 ('lovely', 0.85290640004681306),
 ('purple', 0.8513711857748395),
 ('memorable', 0.84801189112086062),
 ('sings', 0.84729786038720367),
 ('craig', 0.84342938360928321),
 ('modesty', 0.84342938360928321),
 ('relate', 0.84326559685926517),
 ('episodes', 0.84223712084137292),
 ('strong', 0.84167135777060931),
 ('smith', 0.83959811108590054),
 ('tear', 0.83704136022001441),
 ('apartment', 0.83333115290549531),
 ('princess', 0.83290912293510388),
 ('disagree', 0.83290912293510388),
 ('kung', 0.83173334384609199),
 ('adventure', 0.83150561393278388),
 ('columbo', 0.82667857318446791),
 ('jake', 0.82667857318446791),
 ('adds', 0.82485652591452319),
 ('hart', 0.82472353834866463),
 ('strength', 0.82417544296634937),
 ('realizes', 0.82360006895738058),
 ('dave', 0.8232003088081431),
 ('childhood', 0.82208086393583857),
 ('forbidden', 0.81989888619908913),
 ('tight', 0.81883539572344199),
 ('surreal', 0.8178506590609026),
 ('manager', 0.81770990320170756),
 ('dancer', 0.81574950265227764),
 ('studios', 0.81093021621632877),
 ('con', 0.81093021621632877),
 ('miike', 0.80821651034473263),
 ('realistic', 0.80807714723392232),
 ('explicit', 0.80792269515237358),
 ('kurt', 0.8060875917405409),
 ('traditional', 0.80535917116687328),
 ('deals', 0.80535917116687328),
 ('holds', 0.80493858654806194),
 ('carl', 0.80437281567016972),
 ('touches', 0.80396154690023547),
 ('gene', 0.80314807577427383),
 ('albert', 0.8027669055771679),
 ('abc', 0.80234647252493729),
 ('cry', 0.80011930011211307),
 ('sides', 0.7995275841185171),
 ('develops', 0.79850769621777162),
 ('eyre', 0.79850769621777162),
 ('dances', 0.79694397424158891),
 ('oscars', 0.79633141679517616),
 ('legendary', 0.79600456599965308),
 ('hearted', 0.79492987486988764),
 ('importance', 0.79492987486988764),
 ('portraying', 0.79356592830699269),
 ('impressed', 0.79258107754813223),
 ('waters', 0.79112758892014912),
 ('empire', 0.79078565012386137),
 ('edge', 0.789774016249017),
 ('jean', 0.78845736036427028),
 ('environment', 0.78845736036427028),
 ('sentimental', 0.7864791203521645),
 ('captured', 0.78623760362595729),
 ('styles', 0.78592891401091158),
 ('daring', 0.78592891401091158),
 ('frank', 0.78275933924963248),
 ('tense', 0.78275933924963248),
 ('backgrounds', 0.78275933924963248),
 ('matches', 0.78275933924963248),
 ('gothic', 0.78209466657644144),
 ('sharp', 0.7814397877056235),
 ('achieved', 0.78015855754957497),
 ('court', 0.77947526404844247),
 ('steals', 0.7789140023173704),
 ('rules', 0.77844476107184035),
 ('colors', 0.77684619943659217),
 ('reunion', 0.77318988823348167),
 ('covers', 0.77139937745969345),
 ('tale', 0.77010822169607374),
 ('rain', 0.7683706017975328),
 ('denzel', 0.76804848873306297),
 ('stays', 0.76787072675588186),
 ('blob', 0.76725515271366718),
 ('maria', 0.76214005204689672),
 ('conventional', 0.76214005204689672),
 ('fresh', 0.76158434211317383),
 ('midnight', 0.76096977689870637),
 ('landscape', 0.75852993982279704),
 ('animated', 0.75768570169751648),
 ('titanic', 0.75666058628227129),
 ('sunday', 0.75666058628227129),
 ('spring', 0.7537718023763802),
 ('cagney', 0.7537718023763802),
 ('enjoyable', 0.75246375771636476),
 ('immensely', 0.75198768058287868),
 ('sir', 0.7507762933965817),
 ('nevertheless', 0.75067102469813185),
 ('driven', 0.74994477895307854),
 ('performances', 0.74883252516063137),
 ('memories', 0.74721440183022114),
 ('nowadays', 0.74721440183022114),
 ('simple', 0.74641420974143258),
 ('golden', 0.74533293373051557),
 ('leslie', 0.74533293373051557),
 ('lovers', 0.74497224842453125),
 ('relationship', 0.74484232345601786),
 ('supporting', 0.74357803418683721),
 ('che', 0.74262723782331497),
 ('packed', 0.7410032017375805),
 ('trek', 0.74021469141793106),
 ('provoking', 0.73840377214806618),
 ('strikes', 0.73759894313077912),
 ('depiction', 0.73682224406260699),
 ('emotional', 0.73678211645681524),
 ('secretary', 0.7366322924996842),
 ('influenced', 0.73511137965897755),
 ('florida', 0.73511137965897755),
 ('germany', 0.73288750920945944),
 ('brings', 0.73142936713096229),
 ('lewis', 0.73129894652432159),
 ('elderly', 0.73088750854279239),
 ('owner', 0.72743625403857748),
 ('streets', 0.72666987259858895),
 ('henry', 0.72642196944481741),
 ('portrays', 0.72593700338293632),
 ('bears', 0.7252354951114458),
 ('china', 0.72489587887452556),
 ('anger', 0.72439972406404984),
 ('society', 0.72433010799663333),
 ('available', 0.72415741730250549),
 ('best', 0.72347034060446314),
 ('bugs', 0.72270598280148979),
 ('magic', 0.71878961117328299),
 ('delivers', 0.71846498854423513),
 ('verhoeven', 0.71846498854423513),
 ('jim', 0.71783979315031676),
 ('donald', 0.71667767797013937),
 ('endearing', 0.71465338578090898),
 ('relationships', 0.71393795022901896),
 ('greatly', 0.71256526641704687),
 ('charlie', 0.71024161391924534),
 ('brad', 0.71024161391924534),
 ('simon', 0.70967648251115578),
 ('effectively', 0.70914752190638641),
 ('march', 0.70774597998109789),
 ('atmosphere', 0.70744773070214162),
 ('influence', 0.70733181555190172),
 ('genius', 0.706392407309966),
 ('emotionally', 0.70556970055850243),
 ('ken', 0.70526854109229009),
 ('identity', 0.70484322032313651),
 ('sophisticated', 0.70470800296102132),
 ('dan', 0.70457587638356811),
 ('andrew', 0.70329955202396321),
 ('india', 0.70144598337464037),
 ('roy', 0.69970458110610434),
 ('surprisingly', 0.6995780708902356),
 ('sky', 0.69780919366575667),
 ('romantic', 0.69664981111114743),
 ('match', 0.69566924999265523),
 ('meets', 0.69314718055994529),
 ('cowboy', 0.69314718055994529),
 ('wave', 0.69314718055994529),
 ('bitter', 0.69314718055994529),
 ('patient', 0.69314718055994529),
 ('stylish', 0.69314718055994529),
 ('britain', 0.69314718055994529),
 ('affected', 0.69314718055994529),
 ('beatty', 0.69314718055994529),
 ('love', 0.69198533541937324),
 ('paul', 0.68980827929443067),
 ('andy', 0.68846333124751902),
 ('performance', 0.68797386327972465),
 ('patrick', 0.68645819240914863),
 ('unlike', 0.68546468438792907),
 ('brooks', 0.68433655087779044),
 ('refuses', 0.68348526964820844),
 ('award', 0.6824518914431974),
 ('complaint', 0.6824518914431974),
 ('ride', 0.68229716453587952),
 ('dawson', 0.68171848473632257),
 ('luke', 0.68158635815886937),
 ('wells', 0.68087708796813096),
 ('france', 0.6804081547825156),
 ('sports', 0.68007509899259255),
 ('handsome', 0.68007509899259255),
 ('directs', 0.67875844310784572),
 ('rebel', 0.67875844310784572),
 ('greater', 0.67605274720064523),
 ('dreams', 0.67599410133369586),
 ('effective', 0.67565402311242806),
 ('interpretation', 0.67479804189174875),
 ('works', 0.67445504754779284),
 ('brando', 0.67445504754779284),
 ('noble', 0.6737290947028437),
 ('paced', 0.67314651385327573),
 ('le', 0.67067432470788668),
 ('master', 0.67015766233524654),
 ('h', 0.6696166831497512),
 ('rings', 0.66904962898088483),
 ('easy', 0.66895995494594152),
 ('city', 0.66820823221269321),
 ('sunshine', 0.66782937257565544),
 ('succeeds', 0.66647893347778397),
 ('relations', 0.664159643686693),
 ('england', 0.66387679825983203),
 ('glimpse', 0.66329421741026418),
 ('aired', 0.66268797307523675),
 ('sees', 0.66263163663399482),
 ('both', 0.66248336767382998),
 ('definitely', 0.66199789483898808),
 ('imaginative', 0.66139848224536502),
 ('appreciate', 0.66083893732728749),
 ('tricks', 0.66071190480679143),
 ('striking', 0.66071190480679143),
 ('carefully', 0.65999497324304479),
 ('complicated', 0.65981076029235353),
 ('perspective', 0.65962448852130173),
 ('trilogy', 0.65877953705573755),
 ('future', 0.65834665141052828),
 ('lion', 0.65742909795786608),
 ('douglas', 0.65540685257709819),
 ('victor', 0.65540685257709819),
 ('inspired', 0.65459851044271034),
 ('marriage', 0.65392646740666405),
 ('demands', 0.65392646740666405),
 ('father', 0.65172321672194655),
 ('page', 0.65123628494430852),
 ('instant', 0.65058756614114943),
 ('era', 0.6495567444850836),
 ('ruthless', 0.64934455790155243),
 ('saga', 0.64934455790155243),
 ('joan', 0.64891392558311978),
 ('joseph', 0.64841128671855386),
 ('workers', 0.64829661439459352),
 ('fantasy', 0.64726757480925168),
 ('distant', 0.64551913157069074),
 ('accomplished', 0.64551913157069074),
 ('manhattan', 0.64435701639051324),
 ('personal', 0.64355023942057321),
 ('meeting', 0.64313675998528386),
 ('individual', 0.64313675998528386),
 ('pushing', 0.64313675998528386),
 ('pleasant', 0.64250344774119039),
 ('brave', 0.64185388617239469),
 ('william', 0.64083139119578469),
 ('hudson', 0.64077919504262937),
 ('friendly', 0.63949446706762514),
 ('eccentric', 0.63907995928966954),
 ('awards', 0.63875310849414646),
 ('jack', 0.63838309514997038),
 ('seeking', 0.63808740337691783),
 ('divorce', 0.63757732940513456),
 ('colonel', 0.63757732940513456),
 ('jane', 0.63443957973316734),
 ('keeping', 0.63414883979798953),
 ('gives', 0.63383568159497883),
 ('ted', 0.63342794585832296),
 ('animation', 0.63208692379869902),
 ('progress', 0.6317782341836532),
 ('larger', 0.63127177684185776),
 ('concert', 0.63127177684185776),
 ('nation', 0.6296337748376194),
 ('albeit', 0.62739580299716491),
 ('adapted', 0.62613647027698516),
 ('discovers', 0.62542900650499444),
 ('classic', 0.62504956428050518),
 ('segment', 0.62335141862440335),
 ('morgan', 0.62303761437291871),
 ('mouse', 0.62294292188669675),
 ('impressive', 0.62211140744319349),
 ('artist', 0.62168821657780038),
 ('ultimate', 0.62168821657780038),
 ('griffith', 0.62117368093485603),
 ('drew', 0.62082651898031915),
 ('emily', 0.62082651898031915),
 ('moved', 0.6197197120051281),
 ('families', 0.61903920840622351),
 ('profound', 0.61903920840622351),
 ('innocent', 0.61851219917136446),
 ('versions', 0.61730910416844087),
 ('eddie', 0.61691981517206107),
 ('criticism', 0.61651395453902935),
 ('nature', 0.61594514653194088),
 ('recognized', 0.61518563909023349),
 ('sexuality', 0.61467556511845012),
 ('contract', 0.61400986000122149),
 ('brian', 0.61344043794920278),
 ('remembered', 0.6131044728864089),
 ('determined', 0.6123858239154869),
 ('offers', 0.61207935747116349),
 ('pleasure', 0.61195702582993206),
 ('washington', 0.61180154110599294),
 ('images', 0.61159731359583758),
 ('games', 0.61067095873570676),
 ('academy', 0.60872983874736208),
 ('fashioned', 0.60798937221963845),
 ('melodrama', 0.60749173598145145),
 ('rough', 0.60613580357031549),
 ('charismatic', 0.60613580357031549),
 ('peoples', 0.60613580357031549),
 ('dealing', 0.60517840761398811),
 ('fine', 0.60496962268013299),
 ('tap', 0.60391604683200273),
 ('trio', 0.60157998703445481),
 ('russell', 0.60120968523425966),
 ('figures', 0.60077386042893011),
 ('ward', 0.60005675749393339),
 ('shine', 0.59911823091166894),
 ('brady', 0.59911823091166894),
 ('job', 0.59845562125168661),
 ('satisfied', 0.59652034487087369),
 ('river', 0.59637962862495086),
 ('brown', 0.595773016534769),
 ('believable', 0.59566072133302495),
 ('always', 0.59470710774669278),
 ('bound', 0.59470710774669278),
 ('hall', 0.5933967777928858),
 ('cook', 0.5916777203950857),
 ('claire', 0.59136448625000293),
 ('broadway', 0.59033768669372433),
 ('anna', 0.58778666490211906),
 ('peace', 0.58628403501758408),
 ('visually', 0.58539431926349916),
 ('morality', 0.58525821854876026),
 ('falk', 0.58525821854876026),
 ('growing', 0.58466653756587539),
 ('experiences', 0.58314628534561685),
 ('stood', 0.58314628534561685),
 ('touch', 0.58122926435596001),
 ('lives', 0.5810976767513224),
 ('kubrick', 0.58066919713325493),
 ('timing', 0.58047401805583243),
 ('expressions', 0.57981849525294216),
 ('struggles', 0.57981849525294216),
 ('authentic', 0.57848427223980559),
 ('helen', 0.57763429343810091),
 ('pre', 0.57700753064729182),
 ('quirky', 0.5753641449035618),
 ('young', 0.57531672344534313),
 ('inner', 0.57454143815209846),
 ('mexico', 0.57443087372056334),
 ('clint', 0.57380042292737909),
 ('sisters', 0.57286101468544337),
 ('realism', 0.57226528899949558),
 ('french', 0.5720692490067093),
 ('personalities', 0.5720692490067093),
 ('surprises', 0.57113222999698177),
 ('adventures', 0.57113222999698177),
 ('overcome', 0.5697681593994407),
 ('timothy', 0.56953322459276867),
 ('tales', 0.56909453188996639),
 ('war', 0.56843317302781682),
 ('civil', 0.5679840376059393),
 ('countries', 0.56737779327091187),
 ('streep', 0.56710645966458029),
 ('tradition', 0.56685345523565323),
 ('oliver', 0.56673325570428668),
 ('australia', 0.56580775818334383),
 ('understanding', 0.56531380905006046),
 ('players', 0.56509525370004821),
 ('knowing', 0.56489284503626647),
 ('rogers', 0.56421349718405212),
 ('suspenseful', 0.56368911332305849),
 ('variety', 0.56368911332305849),
 ('true', 0.56281525180810066),
 ('jr', 0.56220982311246936),
 ('psychological', 0.56108745854687891),
 ('sent', 0.55961578793542266),
 ('grand', 0.55961578793542266),
 ('branagh', 0.55961578793542266),
 ('reminiscent', 0.55961578793542266),
 ('performing', 0.55961578793542266),
 ('wealth', 0.55961578793542266),
 ('overwhelming', 0.55961578793542266),
 ('odds', 0.55961578793542266),
 ('brothers', 0.55891181043362848),
 ('howard', 0.55811089675600245),
 ('david', 0.55693122256475369),
 ('generation', 0.55628799784274796),
 ('grow', 0.55612538299565417),
 ('survival', 0.55594605904646033),
 ('mainstream', 0.55574731115750231),
 ('dick', 0.55431073570572953),
 ('charm', 0.55288175575407861),
 ('kirk', 0.55278982286502287),
 ('twists', 0.55244729845681018),
 ('gangster', 0.55206858230003986),
 ('jeff', 0.55179306225421365),
 ('family', 0.55116244510065526),
 ('tend', 0.55053307336110335),
 ('thanks', 0.55049088015842218),
 ('world', 0.54744234723432639),
 ('sutherland', 0.54743536937855164),
 ('life', 0.54695514434959924),
 ('disc', 0.54654370636806993),
 ('bug', 0.54654370636806993),
 ('tribute', 0.5455111817538808),
 ('europe', 0.54522705048332309),
 ('sacrifice', 0.54430155296238014),
 ('color', 0.54405127139431109),
 ('superior', 0.54333490233128523),
 ('york', 0.54318235866536513),
 ('pulls', 0.54266622962164945),
 ('jackson', 0.54232429082536171),
 ('hearts', 0.54232429082536171),
 ('enjoy', 0.54124285135906114),
 ('redemption', 0.54056759296472823),
 ('madness', 0.540384426007535),
 ('stands', 0.5389965007326869),
 ('trial', 0.5389965007326869),
 ('greek', 0.5389965007326869),
 ('hamilton', 0.5389965007326869),
 ('each', 0.5388212312554177),
 ('faithful', 0.53773307668591508),
 ('received', 0.5372768098531604),
 ('documentaries', 0.53714293208336406),
 ('jealous', 0.53714293208336406),
 ('different', 0.53709860682460819),
 ('describes', 0.53680111016925136),
 ('shorts', 0.53596159703753288),
 ('brilliance', 0.53551823635636209),
 ('mountains', 0.53492317534505118),
 ('share', 0.53408248593025787),
 ('dealt', 0.53408248593025787),
 ('providing', 0.53329847961804933),
 ('explore', 0.53329847961804933),
 ('series', 0.5325809226575603),
 ('fellow', 0.5323318289869543),
 ('loves', 0.53062825106217038),
 ('revolution', 0.53062825106217038),
 ('olivier', 0.53062825106217038),
 ('roman', 0.53062825106217038),
 ('century', 0.53002783074992665),
 ('musical', 0.52966871156747064),
 ('heroic', 0.52925932545482868),
 ('approach', 0.52806743020049673),
 ('ironically', 0.52806743020049673),
 ('temple', 0.52806743020049673),
 ('moves', 0.5279372642387119),
 ('gift', 0.52702030968597136),
 ('julie', 0.52609309589677911),
 ('tells', 0.52415107836314001),
 ('radio', 0.52394671172868779),
 ('uncle', 0.52354439617376536),
 ('union', 0.52324814376454787),
 ('deep', 0.52309571635780505),
 ('reminds', 0.52157841554225237),
 ('famous', 0.52118841080153722),
 ('jazz', 0.52053443789295151),
 ('dennis', 0.51987545928590861),
 ('epic', 0.51919387343650736),
 ('adult', 0.519167695083386),
 ('shows', 0.51915322220375304),
 ('performed', 0.5191244265806858),
 ('demons', 0.5191244265806858),
 ('discovered', 0.51879379341516751),
 ('eric', 0.51879379341516751),
 ('youth', 0.5185626062681431),
 ('human', 0.51851411224987087),
 ('tarzan', 0.51813827061227724),
 ('ourselves', 0.51794309153485463),
 ('wwii', 0.51758240622887042),
 ('passion', 0.5162164724008671),
 ('desire', 0.51607497965213445),
 ('pays', 0.51581316527702981),
 ('dirty', 0.51557622652458857),
 ('fox', 0.51557622652458857),
 ('sympathetic', 0.51546600332249293),
 ('symbolism', 0.51546600332249293),
 ('attitude', 0.51530993621331933),
 ('appearances', 0.51466440007315639),
 ('jeremy', 0.51466440007315639),
 ('fun', 0.51439068993048687),
 ('south', 0.51420972175023116),
 ('arrives', 0.51409894911095988),
 ('present', 0.51341965894303732),
 ('com', 0.51326167856387173),
 ('smile', 0.51265880484765169),
 ('alan', 0.51082562376599072),
 ('ring', 0.51082562376599072),
 ('visit', 0.51082562376599072),
 ('fits', 0.51082562376599072),
 ('provided', 0.51082562376599072),
 ('carter', 0.51082562376599072),
 ('aging', 0.51082562376599072),
 ('countryside', 0.51082562376599072),
 ('begins', 0.51015650363396647),
 ('success', 0.50900578704900468),
 ('japan', 0.50900578704900468),
 ('accurate', 0.50895471583017893),
 ('proud', 0.50800474742434931),
 ('daily', 0.5075946031845443),
 ('karloff', 0.50724780241810674),
 ('atmospheric', 0.50724780241810674),
 ('recently', 0.50714914903668207),
 ('fu', 0.50704490092608467),
 ('horrors', 0.50656122497953315),
 ('finding', 0.50637127341661037),
 ('lust', 0.5059356384717989),
 ('hitchcock', 0.50574947073413001),
 ('among', 0.50334004951332734),
 ('viewing', 0.50302139827440906),
 ('investigation', 0.50262885656181222),
 ('shining', 0.50262885656181222),
 ('duo', 0.5020919437972361),
 ('cameron', 0.5020919437972361),
 ('finds', 0.50128303100539795),
 ('contemporary', 0.50077528791248915),
 ('genuine', 0.50046283673044401),
 ('frightening', 0.49995595152908684),
 ('plays', 0.49975983848890226),
 ('age', 0.49941323171424595),
 ('position', 0.49899116611898781),
 ('continues', 0.49863035067217237),
 ('roles', 0.49839716550752178),
 ('james', 0.49837216269470402),
 ('individuals', 0.49824684155913052),
 ('brought', 0.49783842823917956),
 ('hilarious', 0.49714551986191058),
 ('brutal', 0.49681488669639234),
 ('appropriate', 0.49643688631389105),
 ('dance', 0.49581998314812048),
 ('league', 0.49578774640145024),
 ('helping', 0.49578774640145024),
 ('answers', 0.49578774640145024),
 ('stunts', 0.49561620510246196),
 ('traveling', 0.49532143723002542),
 ('thoroughly', 0.49414593456733524),
 ('depicted', 0.49317068852726992),
 ('combination', 0.49247648509779424),
 ('honor', 0.49247648509779424),
 ('differences', 0.49247648509779424),
 ('fully', 0.49213349075383811),
 ('tracy', 0.49159426183810306),
 ('battles', 0.49140753790888908),
 ('possibility', 0.49112055268665822),
 ('romance', 0.4901589869574316),
 ('initially', 0.49002249613622745),
 ('happy', 0.4898997500608791),
 ('crime', 0.48977221456815834),
 ('singing', 0.4893852925281213),
 ('especially', 0.48901267837860624),
 ('shakespeare', 0.48754793889664511),
 ('hugh', 0.48729512635579658),
 ('detail', 0.48609484250827351),
 ('julia', 0.48550781578170082),
 ('san', 0.48550781578170082),
 ('guide', 0.48550781578170082),
 ('desperation', 0.48550781578170082),
 ('companion', 0.48550781578170082),
 ('strongly', 0.48460242866688824),
 ('necessary', 0.48302334245403883),
 ('humanity', 0.48265474679929443),
 ('drama', 0.48221998493060503),
 ('nonetheless', 0.48183808689273838),
 ('intrigue', 0.48183808689273838),
 ('warming', 0.48183808689273838),
 ('cuba', 0.48183808689273838),
 ('planned', 0.47957308026188628),
 ('pictures', 0.47929937011921681),
 ('broadcast', 0.47849024312305422),
 ('nine', 0.47803580094299974),
 ('settings', 0.47743860773325364),
 ('history', 0.47732966933780852),
 ('ordinary', 0.47725880012690741),
 ('trade', 0.47692407209030935),
 ('official', 0.47608267532211779),
 ('primary', 0.47608267532211779),
 ('episode', 0.47529620261150429),
 ('role', 0.47520268270188676),
 ('spirit', 0.47477690799839323),
 ('grey', 0.47409361449726067),
 ('ways', 0.47323464982718205),
 ('cup', 0.47260441094579297),
 ('piano', 0.47260441094579297),
 ('familiar', 0.47241617565111949),
 ('sinister', 0.47198579044972683),
 ('reveal', 0.47171449364936496),
 ('max', 0.47150852042515579),
 ('dated', 0.47121648567094482),
 ('losing', 0.47000362924573563),
 ('discovery', 0.47000362924573563),
 ('vicious', 0.47000362924573563),
 ('genuinely', 0.46871413841586385),
 ('hatred', 0.46734051182625186),
 ('mistaken', 0.46702300110759781),
 ('dream', 0.46608972992459924),
 ('challenge', 0.46608972992459924),
 ('crisis', 0.46575733836428446),
 ('photographed', 0.46488852857896512),
 ('critics', 0.46430560813109778),
 ('bird', 0.46430560813109778),
 ('machines', 0.46430560813109778),
 ('born', 0.46411383518967209),
 ('detective', 0.4636633473511525),
 ('higher', 0.46328467899699055),
 ('remains', 0.46262352194811296),
 ('inevitable', 0.46262352194811296),
 ('soviet', 0.4618180446592961),
 ('ryan', 0.46134556650262099),
 ('african', 0.46112595521371813),
 ('smaller', 0.46081520319132935),
 ('techniques', 0.46052488529119184),
 ('information', 0.46034171833399862),
 ('deserved', 0.45999798712841444),
 ('lynch', 0.45953232937844013),
 ('spielberg', 0.45953232937844013),
 ('cynical', 0.45953232937844013),
 ('tour', 0.45953232937844013),
 ('francisco', 0.45953232937844013),
 ('struggle', 0.45911782160048453),
 ('language', 0.45902121257712653),
 ('visual', 0.45823514408822852),
 ('warner', 0.45724137763188427),
 ('social', 0.45720078250735313),
 ('reality', 0.45719346885019546),
 ('hidden', 0.45675840249571492),
 ('breaking', 0.45601738727099561),
 ('sometimes', 0.45563021171182794),
 ('modern', 0.45500247579345005),
 ('surfing', 0.45425527227759638),
 ('popular', 0.45410691533051023),
 ('surprised', 0.4534409399850382),
 ('follows', 0.45245361754408348),
 ('keeps', 0.45234869400701483),
 ('john', 0.4520909494482197),
 ('mixed', 0.45198512374305722),
 ('defeat', 0.45198512374305722),
 ('justice', 0.45142724367280018),
 ('treasure', 0.45083371313801535),
 ('presents', 0.44973793178615257),
 ('years', 0.44919197032104968),
 ('chief', 0.44895022004790319),
 ('shadows', 0.44802472252696035),
 ('closely', 0.44701411102103689),
 ('segments', 0.44701411102103689),
 ('lose', 0.44658335503763702),
 ('caine', 0.44628710262841953),
 ('caught', 0.44610275383999071),
 ('hamlet', 0.44558510189758965),
 ('chinese', 0.44507424620321018),
 ('welcome', 0.44438052435783792),
 ('birth', 0.44368632092836219),
 ('represents', 0.44320543609101143),
 ('puts', 0.44279106572085081),
 ('visuals', 0.44183275227903923),
 ('fame', 0.44183275227903923),
 ('closer', 0.44183275227903923),
 ('web', 0.44183275227903923),
 ('criminal', 0.4412745608048752),
 ('minor', 0.4409224199448939),
 ('jon', 0.44086703515908027),
 ('liked', 0.44074991514020723),
 ('restaurant', 0.44031183943833246),
 ('de', 0.43983275161237217),
 ('flaws', 0.43983275161237217),
 ('searching', 0.4393666597838457),
 ('rap', 0.43891304217570443),
 ('light', 0.43884433018199892),
 ('elizabeth', 0.43872232986464682),
 ('marry', 0.43861731542506488),
 ('learned', 0.43825493093115531),
 ('controversial', 0.43825493093115531),
 ('oz', 0.43825493093115531),
 ('slowly', 0.43785660389939979),
 ('comedic', 0.43721380642274466),
 ('wayne', 0.43721380642274466),
 ('thrilling', 0.43721380642274466),
 ('bridge', 0.43721380642274466),
 ('married', 0.43658501682196887),
 ('nazi', 0.4361020775700542),
 ('murder', 0.4353180712578455),
 ('physical', 0.4353180712578455),
 ('johnny', 0.43483971678806865),
 ('michelle', 0.43445264498141672),
 ('wallace', 0.43403848055222038),
 ('comedies', 0.43395706390247063),
 ('silent', 0.43395706390247063),
 ('played', 0.43387244114515305),
 ('international', 0.43363598507486073),
 ('vision', 0.43286408229627887),
 ('intelligent', 0.43196704885367099),
 ('shop', 0.43078291609245434),
 ('also', 0.43036720209769169),
 ('levels', 0.4302451371066513),
 ('miss', 0.43006426712153217),
 ('movement', 0.4295626596872249),
 ...]

In [12]:
# words most strongly associated with the "NEGATIVE" label (lowest log ratios first)
list(reversed(pos_neg_ratios.most_common()))[0:30]


Out[12]:
[('boll', -4.0778152602708904),
 ('uwe', -3.9218753018711578),
 ('seagal', -3.3202501058581921),
 ('unwatchable', -3.0269848170580955),
 ('stinker', -2.9876839403711624),
 ('mst', -2.7753833211707968),
 ('incoherent', -2.7641396677532537),
 ('unfunny', -2.5545257844967644),
 ('waste', -2.4907515123361046),
 ('blah', -2.4475792789485005),
 ('horrid', -2.3715779644809971),
 ('pointless', -2.3451073877136341),
 ('atrocious', -2.3187369339642556),
 ('redeeming', -2.2667790015910296),
 ('prom', -2.2601040980178784),
 ('drivel', -2.2476029585766928),
 ('lousy', -2.2118080125207054),
 ('worst', -2.1930856334332267),
 ('laughable', -2.172468615469592),
 ('awful', -2.1385076866397488),
 ('poorly', -2.1326133844207011),
 ('wasting', -2.1178155545614512),
 ('remotely', -2.111046881095167),
 ('existent', -2.0024805005437076),
 ('boredom', -1.9241486572738005),
 ('miserably', -1.9216610938019989),
 ('sucks', -1.9166645809588516),
 ('uninspired', -1.9131499212248517),
 ('lame', -1.9117232884159072),
 ('insult', -1.9085323769376259)]
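
The scores in these two lists sit on a log scale, so they read symmetrically around zero: strongly positive words get large positive values, strongly negative words get large negative values, and words used about equally in both kinds of review land near 0. A minimal sketch of that idea follows. It assumes a smoothed log of positive count over negative count, which may differ in detail from the formula used earlier in this notebook; the helper `log_ratio` and the toy counts are invented for illustration.

```python
import math
from collections import Counter

# Toy counts, invented for illustration only (not taken from the dataset).
positive_counts = Counter({"charming": 30, "the": 1000, "lousy": 2})
negative_counts = Counter({"charming": 9, "the": 1000, "lousy": 29})

def log_ratio(word):
    # Hypothetical helper: add-one smoothing keeps the ratio finite and non-zero.
    return math.log((positive_counts[word] + 1) / (negative_counts[word] + 1))

scores = Counter({word: log_ratio(word) for word in positive_counts})

# most_common() sorts from highest to lowest score, so reversing it
# puts the most negative words first.
most_negative = list(reversed(scores.most_common()))
print(most_negative[0][0])  # lousy
print(scores["the"])        # 0.0, a neutral word
```

Taking the log is what makes the two ends comparable: a word that is 10 times more common in positive reviews and one that is 10 times more common in negative reviews end up the same distance from zero.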

Transforming Text into Numbers


In [13]:
from IPython.display import Image

review = "This was a horrible, terrible movie."

Image(filename='sentiment_network.png')


Out[13]:

In [14]:
review = "The movie was excellent"

Image(filename='sentiment_network_pos.png')


Out[14]:
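
One simple way to turn the two example reviews above into numbers is to count how often each vocabulary word appears and feed those counts to the input layer. The sketch below assumes that counting scheme; `toy_vocab` and `review_to_counts` are hypothetical names, and the six-word vocabulary is hand-picked just to keep the vectors readable.

```python
# A tiny hand-picked vocabulary, for illustration only.
toy_vocab = ["excellent", "horrible", "movie", "terrible", "the", "was"]

def review_to_counts(review):
    # Lowercase, replace simple punctuation with spaces, split on whitespace.
    tokens = review.lower().replace(",", " ").replace(".", " ").split()
    # One slot per vocabulary word, holding that word's count in the review.
    return [tokens.count(word) for word in toy_vocab]

print(review_to_counts("This was a horrible, terrible movie."))  # [0, 1, 1, 1, 0, 1]
print(review_to_counts("The movie was excellent"))               # [1, 0, 1, 0, 1, 1]
```

Words outside the vocabulary ("this", "a") are simply dropped, and every review becomes a vector of the same length no matter how long the text is.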

Project 2: Creating the Input/Output Data


In [15]:
vocab = set(total_counts.keys())
vocab_size = len(vocab)
print(vocab_size)


74074
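
With 74074 distinct words, each word needs a fixed position in the input vector, and each label needs a numeric target. A Python set has no guaranteed order, so the position has to be pinned down once and reused. The following is a rough sketch under stated assumptions: the small `vocab` below stands in for the real one, `get_target_for_label` is a hypothetical helper, and the labels are assumed to be the strings "POSITIVE" and "NEGATIVE".

```python
# Stand-in for the notebook's `vocab` set; the real one has 74074 entries.
vocab = {"movie", "excellent", "horrible", "was"}

# A set has no fixed order, so freeze one by sorting, then number the words.
word2index = {word: i for i, word in enumerate(sorted(vocab))}

def get_target_for_label(label):
    # Hypothetical helper: assumes labels are "POSITIVE" or "NEGATIVE".
    return 1 if label == "POSITIVE" else 0

print(word2index["excellent"], len(word2index))                     # 0 4
print(get_target_for_label("POSITIVE"), get_target_for_label("NEGATIVE"))  # 1 0
```

Sorting is only one way to fix the order; any ordering works as long as the same mapping is used for both training and prediction.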

In [16]:
list(vocab)


Out[16]:
['',
 'stage',
 'yuen',
 'balder',
 'timers',
 'mask',
 'muro',
 'abromowitz',
 'partly',
 'joies',
 'azar',
 'ddr',
 'germane',
 'bllsosopher',
 'dissolve',
 'breathing',
 'tableau',
 'prosthetic',
 'taurus',
 'gleamed',
 'diverge',
 'nighttime',
 'homelessness',
 'thanatopsis',
 'untreated',
 'doctrines',
 'goodloe',
 'rhythm',
 'substandard',
 'tentatively',
 'underlying',
 'whittier',
 'pico',
 'peopled',
 'bullsh',
 'pesky',
 'yale',
 'foulata',
 'hyperkinetic',
 'scholl',
 'laughometer',
 'oren',
 'suprising',
 'cans',
 'lecturing',
 'umber',
 'forgery',
 'autonomous',
 'indigestible',
 'chides',
 'reclamation',
 'wardens',
 'footed',
 'unilaterally',
 'affter',
 'ferber',
 'portrayals',
 'allows',
 'extracurricular',
 'neo',
 'washing',
 'ukraine',
 'miryang',
 'annick',
 'reckless',
 'blissfully',
 'tsu',
 'denison',
 'headache',
 'paypal',
 'louque',
 'traced',
 'relegates',
 'loiret',
 'ropers',
 'unwinds',
 'aito',
 'dashingly',
 'racist',
 'fondly',
 'frostbite',
 'vampiros',
 'repulsed',
 'predicated',
 'forsa',
 'flitty',
 'sunekosuri',
 'vampyr',
 'oless',
 'nuke',
 'punky',
 'sawney',
 'upsets',
 'expels',
 'dena',
 'kiva',
 'squeazy',
 'penal',
 'dartboard',
 'boarders',
 'mnm',
 'mrquez',
 'perversions',
 'aggrandizing',
 'brokovich',
 'dependent',
 'pursuing',
 'familiarized',
 'marchal',
 'raju',
 'bogarts',
 'panes',
 'caitlin',
 'paarthale',
 'recur',
 'warping',
 'bradycardia',
 'arcadia',
 'intergender',
 'subterranean',
 'assistant',
 'unscheduled',
 'ozporns',
 'liner',
 'aragorn',
 'lonliness',
 'tashy',
 'corleone',
 'bombshell',
 'companionship',
 'ricci',
 'solves',
 'isint',
 'underflowing',
 'pransky',
 'internalist',
 'liaison',
 'teletype',
 'wile',
 'programmation',
 'applause',
 'unmated',
 'hassett',
 'achterbusch',
 'irk',
 'bloodbath',
 'explorations',
 'dearies',
 'rocco',
 'homework',
 'addresses',
 'scales',
 'yul',
 'engine',
 'unchoreographed',
 'talented',
 'ruler',
 'maude',
 'preferences',
 'punsley',
 'reentered',
 'ditches',
 'skis',
 'tribbiani',
 'normal',
 'bryans',
 'varhola',
 'seam',
 'coates',
 'clavell',
 'harping',
 'chipped',
 'sages',
 'abolition',
 'medias',
 'megalomania',
 'masina',
 'peeves',
 'bohlen',
 'disdainful',
 'cucumbers',
 'vehicles',
 'excepting',
 'fizzly',
 'treads',
 'stopovers',
 'kumai',
 'carabiners',
 'reconnoitering',
 'psychoanalytical',
 'novarro',
 'squirmish',
 'carfully',
 'spruced',
 'reid',
 'esha',
 'unknowns',
 'communicable',
 'poundage',
 'cartwright',
 'homoeroticism',
 'peyote',
 'neutrality',
 'reefer',
 'premedical',
 'alekos',
 'schnook',
 'quotation',
 'rashly',
 'ingenue',
 'keenan',
 'hagia',
 'studding',
 'amusements',
 'critic',
 'worshiper',
 'psychokinetic',
 'braking',
 'capo',
 'whisking',
 'mc',
 'hou',
 'basis',
 'aniston',
 'screwee',
 'followings',
 'breakaway',
 'gharlie',
 'reichskanzler',
 'pebble',
 'discotheque',
 'huntsbery',
 'grueling',
 'wilmington',
 'insurgency',
 'gaa',
 'personifies',
 'poodles',
 'er',
 'solutions',
 'larraz',
 'em',
 'slouches',
 'raducanu',
 'avenues',
 'magnified',
 'pear',
 'swamps',
 'braslia',
 'wrinkling',
 'bernal',
 'giza',
 'craig',
 'hof',
 'giordano',
 'munchkin',
 'dough',
 'leery',
 'crucifixion',
 'posturing',
 'riveting',
 'defectives',
 'transpose',
 'cajoling',
 'combines',
 'livery',
 'mining',
 'wong',
 'poldi',
 'perdition',
 'daw',
 'bloopers',
 'defacing',
 'euthanasiarist',
 'outrages',
 'gfx',
 'goodluck',
 'pnico',
 'asks',
 'honored',
 'doofuses',
 'indigineous',
 'bldy',
 'paint',
 'weeny',
 'dailey',
 'wolfpack',
 'supplanted',
 'kiera',
 'hairbrained',
 'teleportation',
 'sense',
 'yiiii',
 'inject',
 'flamboyant',
 'ahlberg',
 'puszta',
 'lorean',
 'fiers',
 'shallow',
 'charteris',
 'glitxy',
 'sinclair',
 'kindegarden',
 'refusals',
 'leonidas',
 'undeserved',
 'jensen',
 'sabretooth',
 'vitriolic',
 'bereaved',
 'fishtail',
 'camaraderie',
 'questmaster',
 'adverse',
 'impostor',
 'coaxing',
 'videotaping',
 'orchidea',
 'hedaya',
 'bell',
 'delpy',
 'brit',
 'lawnmowerman',
 'calculating',
 'phoned',
 'container',
 'resistant',
 'proprietress',
 'vodyanoi',
 'leashes',
 'benzedrine',
 'lenghts',
 'painkillers',
 'dreams',
 'zabriskie',
 'harleys',
 'foundationally',
 'lassie',
 'trustees',
 'ducks',
 'workers',
 'cough',
 'sizing',
 'cardos',
 'dong',
 'uniforms',
 'acquitted',
 'bohnen',
 'slightyly',
 'surfaced',
 'diced',
 'lashley',
 'shotgunning',
 'submerges',
 'centrepiece',
 'perron',
 'fundamental',
 'sizzling',
 'undefeated',
 'sprinkle',
 'speckle',
 'teller',
 'moviefreak',
 'skaal',
 'raindeer',
 'ironhead',
 'uncompromizing',
 'lamonte',
 'laguna',
 'cryptozoology',
 'mohamed',
 'sllskapsresan',
 'pesce',
 'walder',
 'espionage',
 'seams',
 'necklace',
 'reviles',
 'provisions',
 'butter',
 'fledgling',
 'revamped',
 'xvid',
 'transmits',
 'bronsan',
 'swirls',
 'mindy',
 'tethered',
 'redid',
 'gathered',
 'griffen',
 'sabrian',
 'jurking',
 'swindlers',
 'bettering',
 'triviata',
 'dread',
 'wilding',
 'mojo',
 'disrepair',
 'ruptured',
 'circuits',
 'analyzing',
 'wirsching',
 'escaping',
 'sickingly',
 'splitting',
 'gft',
 'licencing',
 'frock',
 'lyoko',
 'males',
 'franklin',
 'vaitongi',
 'sightless',
 'bmx',
 'viewability',
 'conditional',
 'burstingly',
 'chauvinistic',
 'bergerac',
 'operetta',
 'grungy',
 'broadbent',
 'levens',
 'eaves',
 'expansionist',
 'casablanka',
 'oneself',
 'excessiveness',
 'keitel',
 'honolulu',
 'horrifying',
 'stupefying',
 'weekdays',
 'eyebrow',
 'gratefulness',
 'mere',
 'finals',
 'cannible',
 'dozing',
 'salaries',
 'prescience',
 'bashings',
 'liken',
 'lenoire',
 'americaness',
 'staunchly',
 'gruff',
 'silliest',
 'bleek',
 'circumlocution',
 'fearlessly',
 'hit',
 'vays',
 'randolph',
 'long',
 'matarazzo',
 'dorsey',
 'rediculas',
 'gao',
 'doones',
 'iglesia',
 'torin',
 'songwriters',
 'plentiful',
 'horsecocky',
 'dreufuss',
 'dicky',
 'esq',
 'besco',
 'underused',
 'forerunner',
 'dreamgirl',
 'gaining',
 'rereads',
 'platters',
 'franciosa',
 'legacy',
 'carlita',
 'repartees',
 'decimation',
 'borel',
 'poach',
 'aces',
 'reorganized',
 'purrs',
 'shockers',
 'campesinos',
 'rohal',
 'volunteered',
 'pathedic',
 'sayings',
 'putty',
 'isham',
 'iwas',
 'wretched',
 'lovelier',
 'cartooned',
 'depressive',
 'sissily',
 'moe',
 'infringement',
 'fairview',
 'artificial',
 'plotholes',
 'konchalovsky',
 'himbut',
 'correspondence',
 'imagination',
 'bancroft',
 'outpost',
 'sbardellati',
 'scob',
 'timeshifts',
 'tenacity',
 'labourer',
 'unclever',
 'deniers',
 'narrtor',
 'marathan',
 'peculating',
 'bridges',
 'quinnn',
 'chewed',
 'doghi',
 'savanna',
 'hulbert',
 'sarde',
 'valenti',
 'manson',
 'glib',
 'strays',
 'when',
 'annoyingly',
 'andrei',
 'anxiety',
 'mlc',
 'ears',
 'paine',
 'rummaged',
 'musa',
 'inspected',
 'hopelessly',
 'assassinate',
 'relished',
 'joke',
 'warmhearted',
 'undefined',
 'une',
 'incorporates',
 'chee',
 'takeko',
 'ghosthouse',
 'homebase',
 'unlikley',
 'unambiguous',
 'dearest',
 'preforming',
 'group',
 'selects',
 'wrestles',
 'moravia',
 'mears',
 'gaita',
 'completest',
 'joel',
 'highlights',
 'ooooohhhh',
 'launching',
 'snorting',
 'cruiser',
 'weingartner',
 'beans',
 'brion',
 'deadlier',
 'couldve',
 'descents',
 'inferno',
 'vining',
 'westwood',
 'gibs',
 'gundam',
 'pining',
 'mates',
 'tickling',
 'appoint',
 'overabundance',
 'mnica',
 'deadfall',
 'aspires',
 'twinned',
 'bitsmidohio',
 'vctor',
 'peak',
 'gamers',
 'interactive',
 'decree',
 'formosa',
 'undressed',
 'individuation',
 'cabo',
 'seboipepe',
 'ryoko',
 'friels',
 'unbounded',
 'rajnikant',
 'freaky',
 'ompuri',
 'hallmark',
 'glamourous',
 'klok',
 'calmly',
 'attracted',
 'powermaster',
 'lyricists',
 'dissing',
 'portfolios',
 'shakily',
 'stair',
 'document',
 'unforgettable',
 'sociable',
 'vrsel',
 'backlash',
 'skitters',
 'crapo',
 'nicholls',
 'alta',
 'violation',
 'bedevils',
 'potion',
 'italia',
 'seiing',
 'torpedos',
 'tirith',
 'templates',
 'limbs',
 'solver',
 'stationary',
 'malfique',
 'denys',
 'coulthard',
 'schygulla',
 'emannuelle',
 'bunuel',
 'xu',
 'mon',
 'xd',
 'pb',
 'consider',
 'pianist',
 'risks',
 'dahl',
 'beachcomber',
 'repairs',
 'jing',
 'strobes',
 'crediblity',
 'canvas',
 'torments',
 'despicable',
 'philbin',
 'histrionics',
 'awsomeness',
 'bleed',
 'bickering',
 'finishing',
 'von',
 'motormouth',
 'leclerc',
 'dharmendra',
 'globally',
 'exhooker',
 'illuminations',
 'showiest',
 'norris',
 'seselj',
 'denominator',
 'il',
 'spanishness',
 'vandalizing',
 'mch',
 'trample',
 'cleve',
 'litters',
 'lifeblood',
 'entrusted',
 'cc',
 'coroner',
 'lahaye',
 'deludes',
 'wishbone',
 'sari',
 'withdrawal',
 'accentuate',
 'klan',
 'tain',
 'bronco',
 'jovan',
 'lidsville',
 'dodesukaden',
 'lexus',
 'snyder',
 'raves',
 'striped',
 'pupi',
 'bravo',
 'uno',
 'saving',
 'empathized',
 'goetter',
 'regimental',
 'sprawling',
 'aranoa',
 'floundered',
 'trifecta',
 'powerglove',
 'hifi',
 'franfreako',
 'goodnik',
 'gillette',
 'byronic',
 'pollak',
 'polution',
 'grammatically',
 'insurgents',
 'apaches',
 'gall',
 'sneaking',
 'pout',
 'gull',
 'siddons',
 'zavet',
 'knockdown',
 'supports',
 'hampeita',
 'tripods',
 'hito',
 'philanthropic',
 'punks',
 'clytemenstra',
 'kinski',
 'cherri',
 'mantis',
 'smartest',
 'uninjured',
 'seagoing',
 'faustino',
 'hig',
 'simpons',
 'ethan',
 'gumshoe',
 'sunnydale',
 'youknowwhat',
 'piece',
 'compelling',
 'instigator',
 'pollyanna',
 'sirbossman',
 'quayle',
 'rissole',
 'gaslit',
 'vomited',
 'roadster',
 'plastic',
 'salkow',
 'thad',
 'rosenstrasse',
 'yall',
 'tamo',
 'herod',
 'vivacious',
 'rhinos',
 'applewhite',
 'originators',
 'hypnotising',
 'bulgakov',
 'tottering',
 'vilifies',
 'gnash',
 'sophisticate',
 'spheres',
 'sprocket',
 'weeks',
 'citizenx',
 'ist',
 'viren',
 'compute',
 'deteriorate',
 'popularize',
 'enterntainment',
 'at',
 'proposition',
 'filmstiftung',
 'assael',
 'terribly',
 'normand',
 'ritual',
 'tame',
 'threateningly',
 'classrooms',
 'shite',
 'flimsily',
 'artists',
 'sandbag',
 'horowitz',
 'removes',
 'hoofer',
 'biggest',
 'anathema',
 'shattering',
 'twists',
 'comas',
 'parameters',
 'berliner',
 'vaticani',
 'dolly',
 'crypts',
 'squirrels',
 'flubbing',
 'yeccch',
 'findlay',
 'personae',
 'rectitude',
 'dnouement',
 'indisputably',
 'arithmetic',
 'nebot',
 'geeeee',
 'rampantly',
 'fickleness',
 'natassia',
 'jellybean',
 'formulae',
 'scorning',
 'robald',
 'lurching',
 'petter',
 'ivanek',
 'zombiefest',
 'hunnicutt',
 'contrived',
 'sags',
 'israelis',
 'earner',
 'zaara',
 'booker',
 'bergre',
 'plaudits',
 'gubra',
 'plex',
 'lecter',
 'hurrrts',
 'zapp',
 'police',
 'pocketbooks',
 'doctoral',
 'yabba',
 'speeds',
 'shauvians',
 'juxtaposed',
 'eastman',
 'integrates',
 'starfucker',
 'pursuant',
 'authority',
 'shlocky',
 'swooshes',
 'shovel',
 'cannavale',
 'avjo',
 'assess',
 'stucco',
 'completetly',
 'waved',
 'irrepressible',
 'distractive',
 'interiors',
 'alps',
 'scorer',
 'tetsukichi',
 'dried',
 'micah',
 'patient',
 'emminently',
 'arrgh',
 'trickling',
 'aimanov',
 'farily',
 'deitrich',
 'whorde',
 'orca',
 'leaped',
 'linguistically',
 'extreamely',
 'fbl',
 'prem',
 'blanc',
 'rearrange',
 'salgueiro',
 'channels',
 'chris',
 'feij',
 'lapsed',
 'sensible',
 'boyum',
 'bases',
 'haywood',
 'chikatilo',
 'apollonia',
 'contactable',
 'clenched',
 'aborigines',
 'negativistic',
 'mochrie',
 'piggy',
 'twoooooooo',
 'suchet',
 'looping',
 'dasilva',
 'privilege',
 'sooooo',
 'juliana',
 'chapin',
 'depreciative',
 'lomas',
 'bop',
 'jetee',
 'pausing',
 'peephole',
 'hassadeevichit',
 'intoxication',
 'babied',
 'greengrass',
 'steelcrafts',
 'astrogators',
 'ensure',
 'pandora',
 'excution',
 'handmade',
 'kikabidze',
 'fetching',
 'liferaft',
 'transpires',
 'stroh',
 'hillman',
 'jembs',
 'deco',
 'biased',
 'fassbinder',
 'envelopes',
 'mumford',
 'fugace',
 'blinds',
 'formats',
 'roscoe',
 'yokels',
 'kirsty',
 'crossfire',
 'mistaken',
 'captivating',
 'replies',
 'fratelli',
 'sarafina',
 'mn',
 'plod',
 'daines',
 'cheeni',
 'conquerors',
 'budding',
 'exterminating',
 'carefully',
 'corporation',
 'ideologically',
 'halpin',
 'vfx',
 'conaughey',
 'floating',
 'belivably',
 'adoption',
 'sweaters',
 'favourably',
 'readable',
 'female',
 'western',
 'infinity',
 'uncharismatic',
 'idiotized',
 'ronnie',
 'examined',
 'atmospheres',
 'perspiring',
 'cookers',
 'courtesan',
 'mostof',
 'format',
 'polonius',
 'asphyxiated',
 ...]

In [17]:
import numpy as np

layer_0 = np.zeros((1,vocab_size))
layer_0


Out[17]:
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.]])

In [18]:
from IPython.display import Image
Image(filename='sentiment_network.png')


Out[18]:

In [48]:
word2index = {}

for i,word in enumerate(vocab):
    word2index[word] = i
word2index


Out[48]:
{'': 0,
 'inhabitants': 1,
 'goku': 2,
 'stunts': 3,
 'catepillar': 4,
 'kristensen': 5,
 'goddess': 7,
 'offing': 49797,
 'distroy': 8,
 'unexplainably': 9,
 'concoctions': 10,
 'petite': 11,
 'paramilitary': 24759,
 'scribe': 12,
 'stevson': 13,
 'senegal': 6,
 'sctv': 14,
 'soundscape': 15,
 'rana': 16,
 'immortalizer': 18,
 'rene': 67354,
 'eko': 23,
 'planning': 20,
 'akiva': 21,
 'plod': 22,
 'orderly': 24,
 'zeleznice': 25,
 'critize': 29,
 'baguettes': 25649,
 'jefferies': 30,
 'uncertainties': 61695,
 'mountainbillies': 31,
 'steinbichler': 32,
 'vowel': 33,
 'rafe': 34,
 'donig': 68719,
 'tulipe': 36,
 'clot': 37,
 'hack': 12526,
 'distended': 38,
 'cornered': 37116,
 'impatiently': 40,
 'batrice': 12525,
 'unfortuntly': 41,
 'lung': 42,
 'scapegoats': 43,
 'pscychosexual': 45,
 'outbid': 46,
 'obit': 47,
 'sideshows': 48,
 'jugde': 49,
 'kevloun': 51,
 'quartier': 53,
 'harp': 61948,
 'unravelling': 54,
 'antiques': 56,
 'strutts': 57,
 'tilts': 58,
 'disconcert': 59,
 'dossiers': 60,
 'sorriest': 61,
 'craftsman': 49412,
 'blart': 62,
 'dependence': 37120,
 'sated': 61698,
 'iberia': 63,
 'sagan': 72,
 'frmann': 65,
 'daniell': 66,
 'rays': 67,
 'pried': 68,
 'khoobsurat': 69,
 'leavitt': 70,
 'caiano': 71,
 'attractiveness': 73,
 'kitaparaporn': 74,
 'hamilton': 75,
 'massages': 76,
 'horgan': 78,
 'chemist': 79,
 'audrey': 80,
 'yeow': 55655,
 'jana': 81,
 'dutch': 82,
 'pinchot': 24773,
 'override': 83,
 'dwervick': 63223,
 'spasms': 84,
 'resumed': 85,
 'tamale': 66259,
 'calibanian': 49636,
 'stinson': 86,
 'widows': 87,
 'stonewall': 88,
 'palatial': 89,
 'neuman': 90,
 'abandon': 91,
 'lemmings': 65314,
 'anglophile': 92,
 'ertha': 61706,
 'chevette': 94,
 'unscary': 95,
 'spoilerific': 97,
 'neworleans': 67639,
 'metamorphose': 17,
 'brigand': 99,
 'cheating': 41603,
 'clued': 101,
 'dermatonecrotic': 102,
 'grady': 103,
 'mulligan': 104,
 'ol': 105,
 'incubation': 107,
 'plaintiffs': 110,
 'snden': 109,
 'fk': 111,
 'deply': 112,
 'franchot': 113,
 'henstridge': 19,
 'cyhper': 114,
 'verbose': 26,
 'mazovia': 116,
 'elizabeth': 117,
 'palestine': 118,
 'robby': 119,
 'wongo': 120,
 'moshing': 121,
 'mstified': 12543,
 'eeeee': 122,
 'doltish': 123,
 'bree': 124,
 'postponed': 125,
 'debacles': 127,
 'amplify': 27,
 'kamm': 128,
 'phantom': 18893,
 'boylen': 136,
 'rolando': 131,
 'premises': 133,
 'bruck': 134,
 'loosely': 135,
 'wodehousian': 139,
 'onishi': 70389,
 'encapsuling': 140,
 'partly': 141,
 'stadling': 144,
 'calms': 143,
 'darkie': 148,
 'wheeling': 147,
 'ursla': 15875,
 'subsidized': 49420,
 'mckellar': 149,
 'ooookkkk': 151,
 'milky': 152,
 'unfolded': 153,
 'degrades': 154,
 'authenticating': 155,
 'writeup': 12548,
 'rotheroe': 156,
 'beart': 157,
 'intoxicants': 160,
 'grispin': 159,
 'cannes': 61718,
 'antithetical': 70398,
 'nnette': 161,
 'tsukamoto': 163,
 'antwones': 44205,
 'stows': 164,
 'suddenness': 165,
 'vol': 61720,
 'waqt': 166,
 'camazotz': 168,
 'paps': 55042,
 'shakher': 170,
 'terminate': 63868,
 'kotex': 56419,
 'delinquency': 171,
 'bromwell': 25214,
 'insecticide': 173,
 'charlton': 174,
 'nakada': 177,
 'titted': 24791,
 'urbane': 178,
 'depicted': 54491,
 'sadomasochistic': 179,
 'hyping': 181,
 'yr': 182,
 'hebert': 183,
 'waxwork': 12990,
 'deathrow': 185,
 'nourishes': 24792,
 'unmediated': 187,
 'tamper': 37143,
 'soad': 190,
 'alphabet': 189,
 'donen': 191,
 'lord': 192,
 'recess': 193,
 'watchably': 61023,
 'handsome': 194,
 'vignettes': 196,
 'pairings': 198,
 'uselful': 199,
 'sanders': 200,
 'outbursts': 72891,
 'nots': 201,
 'hatsumomo': 202,
 'actioned': 18292,
 'krimi': 24797,
 'appleby': 203,
 'tampax': 204,
 'sprinkling': 205,
 'defacing': 206,
 'lofty': 207,
 'verger': 213,
 'tablespoons': 211,
 'bernhard': 212,
 'goosebump': 64565,
 'acumen': 214,
 'percentages': 215,
 'wendingo': 216,
 'resonating': 217,
 'vntoarea': 218,
 'redundancies': 219,
 'strictly': 57081,
 'pitied': 221,
 'belying': 222,
 'michelangelo': 53153,
 'gleefulness': 223,
 'environmentalist': 24803,
 'gitane': 226,
 'corrected': 66547,
 'journalist': 227,
 'focusing': 228,
 'plethora': 229,
 'his': 39,
 'citizen': 230,
 'south': 55579,
 'clunkers': 232,
 'pendulous': 55991,
 'mounds': 24805,
 'deplorable': 233,
 'forgive': 234,
 'proplems': 235,
 'bankers': 237,
 'aqua': 238,
 'donated': 239,
 'disbelieving': 240,
 'acomplication': 241,
 'contrasted': 243,
 'muzzle': 44,
 'amphibians': 72141,
 'springs': 246,
 'reformatted': 49443,
 'toolbox': 247,
 'contacting': 248,
 'washrooms': 250,
 'raving': 251,
 'dynamism': 252,
 'mae': 253,
 'disharmony': 255,
 'molls': 72979,
 'dewaere': 12569,
 'untutored': 256,
 'icarus': 257,
 'taint': 258,
 'kargil': 259,
 'captain': 260,
 'paucity': 261,
 'fits': 262,
 'tumbles': 263,
 'amer': 264,
 'bueller': 265,
 'cleansed': 267,
 'shara': 269,
 'humma': 270,
 'outa': 272,
 'piglets': 273,
 'gombell': 274,
 'supermen': 275,
 'superlow': 276,
 'kubanskie': 280,
 'goode': 278,
 'disorganised': 45570,
 'zenith': 281,
 'ananda': 282,
 'matlin': 284,
 'particolare': 50,
 'presumptuous': 286,
 'rerun': 287,
 'toyko': 288,
 'bilb': 291,
 'sundry': 290,
 'fugly': 292,
 'orchestrating': 293,
 'prosaically': 294,
 'moveis': 296,
 'conelly': 297,
 'estrange': 298,
 'elfriede': 49455,
 'masterful': 52,
 'seasonings': 300,
 'quincey': 303,
 'frowning': 49456,
 'painkillers': 53444,
 'high': 25515,
 'flesh': 304,
 'tootsie': 305,
 'ai': 306,
 'tenma': 307,
 'duguay': 71257,
 'appropriations': 308,
 'ides': 310,
 'rui': 61734,
 'surrogacy': 311,
 'pungent': 312,
 'damaso': 314,
 'authoritarian': 61736,
 'caribou': 315,
 'ro': 318,
 'supplying': 317,
 'yuy': 319,
 'debuted': 321,
 'mounts': 323,
 'interpolated': 324,
 'aetv': 325,
 'plummer': 326,
 'asunder': 331,
 'airfix': 333,
 'dubiel': 329,
 'clavichord': 330,
 'crafty': 50465,
 'sublety': 332,
 'stoltzfus': 334,
 'ruth': 335,
 'fluorescent': 336,
 'improves': 337,
 'russells': 339,
 'tick': 43838,
 'zsa': 341,
 'macs': 343,
 'jlb': 345,
 'locus': 348,
 'mislead': 349,
 'merly': 49461,
 'corey': 350,
 'blundered': 351,
 'humourless': 3568,
 'disorganized': 353,
 'discuss': 354,
 'sharifi': 45391,
 'tieing': 356,
 'kats': 34784,
 'bbc': 360,
 'pranked': 362,
 'superman': 363,
 'holroyd': 9223,
 'aggravated': 364,
 'rifleman': 365,
 'yvone': 366,
 'vaugier': 24820,
 'radiant': 367,
 'galico': 368,
 'debris': 369,
 'btw': 371,
 'denote': 24822,
 'havnt': 372,
 'francen': 373,
 'chattered': 374,
 'scathed': 375,
 'pic': 376,
 'ceremonies': 377,
 'everyplace': 65309,
 'betsy': 379,
 'finster': 37176,
 'meercat': 381,
 'noirs': 382,
 'grunts': 383,
 'tribulations': 385,
 'apparatus': 47673,
 'martnez': 25825,
 'telethons': 24825,
 'talladega': 387,
 'alloimono': 390,
 'situations': 64,
 'scrutinising': 391,
 'geta': 392,
 'beltrami': 393,
 'pvc': 394,
 'horse': 395,
 'tiburon': 396,
 'huitime': 397,
 'ripple': 398,
 'exceed': 61748,
 'loitering': 399,
 'forensics': 400,
 'nearly': 401,
 'ellington': 403,
 'uzi': 404,
 'rung': 408,
 'pillaged': 24829,
 'gao': 409,
 'licitates': 410,
 'protocol': 411,
 'smirker': 412,
 'torin': 413,
 'vizier': 31853,
 'newlywed': 414,
 'dismay': 416,
 'moonwalks': 418,
 'skyler': 417,
 'invested': 18455,
 'grifter': 421,
 'undersold': 422,
 'chearator': 423,
 'marino': 424,
 'scala': 425,
 'conditioner': 426,
 'lamarre': 428,
 'figueroa': 429,
 'mcinnerny': 61753,
 'allllllll': 431,
 'slide': 432,
 'lateness': 433,
 'selbst': 434,
 'dramatizing': 436,
 'doable': 438,
 'hollywoodize': 27207,
 'alexanderplatz': 440,
 'wholesome': 45745,
 'pandemonium': 441,
 'earth': 443,
 'mounties': 444,
 'seeker': 445,
 'cheat': 446,
 'outbreaks': 447,
 'savagely': 61759,
 'snowstorm': 448,
 'baur': 449,
 'schedules': 450,
 'bathetic': 451,
 'johnathon': 453,
 'origonal': 57843,
 'rosanne': 454,
 'cauldrons': 456,
 'forrest': 457,
 'poky': 458,
 'aristos': 54856,
 'womanness': 460,
 'spender': 461,
 'pagliai': 37108,
 'rational': 463,
 'terrell': 464,
 'affronts': 472,
 'concise': 49476,
 'mathew': 468,
 'narnia': 469,
 'naseeruddin': 470,
 'bucks': 471,
 'proceeds': 69809,
 'topple': 473,
 'degree': 474,
 'passionately': 476,
 'defeats': 477,
 'gras': 49477,
 'sources': 479,
 'pflug': 49976,
 'botticelli': 480,
 'fwd': 486,
 'waiving': 483,
 'gunnar': 484,
 'stiffler': 485,
 'unwise': 49480,
 'kawajiri': 487,
 'sistahs': 489,
 'swallowed': 30511,
 'soulhunter': 490,
 'belies': 491,
 'wrathful': 492,
 'badmouth': 16696,
 'floradora': 61766,
 'unforgivably': 497,
 'weirdy': 496,
 'violation': 63309,
 'chepart': 498,
 'departmentthe': 500,
 'posehn': 49483,
 'peyote': 37188,
 'psychiatrically': 24846,
 'marionettes': 503,
 'blatty': 502,
 'atop': 504,
 'debases': 25135,
 'henze': 24845,
 'unrooted': 510,
 'cloudscape': 508,
 'resignedly': 509,
 'begin': 49917,
 'hitlerian': 512,
 'reedus': 517,
 'crewed': 514,
 'bedeviled': 515,
 'unfurnished': 516,
 'herrmann': 12602,
 'circumstances': 518,
 'grasped': 519,
 'fn': 521,
 'beefed': 22200,
 'scwatch': 64018,
 'dishwashers': 522,
 'roadie': 523,
 'ruthlessness': 524,
 'migrant': 12605,
 'refrains': 525,
 'preponderance': 44377,
 'lampooning': 526,
 'richart': 528,
 'gwenneth': 530,
 'enmity': 531,
 'vortex': 61772,
 'assess': 532,
 'manufacturer': 533,
 'bullosa': 534,
 'citizenship': 61774,
 'chekov': 537,
 'hogan': 536,
 'blithe': 538,
 'aredavid': 542,
 'drillings': 540,
 'revolvers': 541,
 'boyfriendhe': 545,
 'achcha': 544,
 'wallow': 546,
 'toga': 547,
 'bosnians': 551,
 'going': 550,
 'willy': 552,
 'fim': 554,
 'forbidding': 555,
 'delete': 56779,
 'rationalised': 557,
 'shimomo': 558,
 'opposition': 559,
 'landis': 560,
 'minded': 561,
 'arghhhhh': 564,
 'trialat': 566,
 'protected': 567,
 'negras': 568,
 'tracker': 571,
 'muti': 570,
 'dinky': 49489,
 'shawl': 572,
 'differentiates': 573,
 'dipaolo': 61779,
 'sweetheart': 574,
 'manmohan': 576,
 'enamored': 66265,
 'trevethyn': 577,
 'brain': 578,
 'incomprehensibly': 579,
 'pasadena': 581,
 'bruton': 59142,
 'shtick': 582,
 'ute': 583,
 'viggo': 584,
 'relevent': 589,
 'cites': 587,
 'greenaways': 61781,
 'minidress': 590,
 'philosopher': 591,
 'mahattan': 593,
 'moden': 594,
 'compiling': 595,
 'unimaginative': 598,
 'rogues': 597,
 'subpaar': 599,
 'darkly': 601,
 'saturate': 602,
 'fledgling': 603,
 'breaths': 604,
 'sceam': 37206,
 'empathized': 58870,
 'aszombi': 606,
 'incalculable': 608,
 'formations': 28596,
 'hampden': 619,
 'rawail': 612,
 'forbid': 613,
 'holiness': 617,
 'unessential': 618,
 'reputedly': 616,
 'wage': 63181,
 'kewpie': 24860,
 'asylum': 620,
 'bolye': 621,
 'celticism': 63189,
 'strangers': 622,
 'rantzen': 623,
 'farrellys': 624,
 'marathon': 93,
 'cantinflas': 626,
 'disproportionately': 12617,
 'bared': 67212,
 'enshrined': 627,
 'expetations': 629,
 'replaying': 630,
 'topless': 636,
 'bukater': 632,
 'overpaid': 633,
 'exhude': 634,
 'nitwits': 638,
 'tsst': 51554,
 'sufferings': 637,
 'ci': 24693,
 'eponymously': 96,
 'ferdy': 644,
 'danira': 641,
 'unrelenting': 642,
 'disabling': 643,
 'gerard': 645,
 'drewitt': 646,
 'lamping': 650,
 'demy': 652,
 'wicklow': 37214,
 'relinquish': 651,
 'feminized': 64196,
 'drink': 653,
 'chamberlin': 654,
 'floodwaters': 657,
 'searing': 658,
 'isral': 659,
 'ling': 660,
 'grossness': 661,
 'sassier': 24865,
 'pickier': 662,
 'pax': 663,
 'fleashens': 98,
 'wierd': 664,
 'tereasa': 665,
 'smog': 666,
 'girotti': 667,
 'zooey': 64814,
 'spat': 668,
 'sera': 669,
 'misbehaving': 671,
 'scouts': 672,
 'refreshments': 673,
 'itll': 39668,
 'toyomichi': 676,
 'politeness': 100,
 'bits': 677,
 'psychotics': 678,
 'optimistic': 61796,
 'barzell': 679,
 'colt': 680,
 'anita': 49501,
 'shivering': 681,
 'utah': 59297,
 'scrivener': 686,
 'predicable': 687,
 'dryer': 684,
 'reissues': 685,
 'sexier': 26115,
 'spellbind': 691,
 'marmalade': 689,
 'seems': 690,
 'wyke': 37223,
 'innovator': 693,
 'inthused': 695,
 'scatman': 6309,
 'contestants': 696,
 'bertolucci': 106,
 'serviced': 699,
 'nozires': 700,
 'ins': 701,
 'mutilating': 702,
 'dupes': 703,
 'launius': 704,
 'widescreen': 705,
 'joo': 706,
 'discretionary': 707,
 'enlivens': 708,
 'manos': 55596,
 'bushes': 709,
 'header': 711,
 'activist': 712,
 'gethsemane': 713,
 'phoenixs': 714,
 'wreathed': 715,
 'oldboy': 108,
 'electrifyingly': 717,
 'inseparability': 24874,
 'ghidora': 719,
 'binder': 720,
 'tibet': 51530,
 'doddsville': 723,
 'sugar': 722,
 'porkys': 724,
 'hopefully': 37226,
 'scattershot': 725,
 'refunded': 726,
 'rudely': 727,
 'enacts': 67435,
 'insteadit': 728,
 'nightwatch': 61803,
 'eurotrash': 730,
 'radioraptus': 731,
 'unreservedly': 73710,
 'vall': 49508,
 'boogeman': 733,
 'flunked': 24880,
 'weighs': 734,
 'glorfindel': 738,
 'hypothermia': 737,
 'misled': 64919,
 'toiletries': 71501,
 'birthdays': 739,
 'attentive': 740,
 'mallepa': 741,
 'manoy': 743,
 'bombshells': 744,
 'glorifying': 115,
 'southron': 747,
 'destruction': 748,
 'manhole': 750,
 'elainor': 751,
 'bounder': 13003,
 'bowersock': 752,
 'lowly': 753,
 'wfst': 754,
 'limousines': 755,
 'skolimowski': 756,
 'saban': 757,
 'malaysia': 759,
 'cyd': 761,
 'bonecrushing': 763,
 'merest': 765,
 'janina': 766,
 'chemotrodes': 767,
 'trials': 768,
 'whilhelm': 770,
 'asthmatic': 771,
 'missteps': 773,
 'melyvn': 24885,
 'embittered': 774,
 'profit': 37234,
 'seeming': 776,
 'miscalculate': 777,
 'recommeded': 778,
 'mankin': 37235,
 'schoolwork': 779,
 'coy': 780,
 'mcconaughey': 781,
 'waver': 783,
 'unwatchably': 786,
 'saggy': 787,
 'breakup': 790,
 'pufnstuf': 37237,
 'superstars': 792,
 'replay': 793,
 'aggravates': 794,
 'urging': 796,
 'snidely': 797,
 'aleksandar': 798,
 'hildy': 799,
 'kazuhiro': 800,
 'slayer': 801,
 'tangy': 802,
 'horne': 804,
 'masayuki': 805,
 'molden': 806,
 'unravel': 807,
 'goodtime': 808,
 'rowboat': 811,
 'dekhiye': 815,
 'datedness': 813,
 'astrotheology': 814,
 'suriani': 59610,
 'hostilities': 819,
 'wipes': 818,
 'sentimentalising': 820,
 'documentary': 821,
 'virtue': 823,
 'unreasonably': 824,
 'cei': 826,
 'hobbled': 37240,
 'unglamorised': 827,
 'balky': 828,
 'complementary': 829,
 'paychecks': 830,
 'tughlaq': 45551,
 'functionality': 836,
 'ily': 833,
 'prc': 834,
 'ennobling': 835,
 'dissociated': 837,
 'elk': 838,
 'throbbing': 839,
 'tempe': 840,
 'linoleum': 841,
 'bottacin': 843,
 'hipper': 844,
 'barging': 846,
 'untie': 847,
 'sacchetti': 848,
 'gnat': 849,
 'roedel': 850,
 'performs': 852,
 'nanavati': 856,
 'migrs': 854,
 'teachs': 855,
 'gunslinger': 126,
 'fresco': 857,
 'davison': 858,
 'jet': 59446,
 'burglar': 860,
 'jerker': 69267,
 'masue': 861,
 'dickory': 862,
 'muggy': 46634,
 'grills': 863,
 'figment': 28693,
 'monogamistic': 49527,
 'appelagate': 864,
 'linkage': 865,
 'loesser': 867,
 'patties': 868,
 'prudent': 869,
 'mallorquins': 870,
 'nativetex': 871,
 'suprise': 872,
 'quill': 874,
 'angsty': 71451,
 'speeded': 875,
 'farscape': 876,
 'herman': 129,
 'saddening': 877,
 'centuries': 878,
 'mos': 879,
 'neccessarily': 881,
 'tankers': 883,
 'latte': 884,
 'faracy': 886,
 'stilts': 24897,
 'synthetically': 887,
 'thoughtless': 888,
 'authoring': 62813,
 'rake': 889,
 'ropes': 890,
 'whitewashed': 892,
 'donal': 893,
 'arching': 4910,
 'cockamamie': 899,
 'lifeless': 895,
 'perfidy': 896,
 'teresa': 897,
 'bulldog': 898,
 'vingh': 73726,
 'evacuees': 65858,
 'rasberries': 900,
 'chiseling': 903,
 'clampets': 905,
 'grecianized': 138,
 'smaller': 904,
 'kluznick': 62184,
 'alerts': 906,
 'aaaahhhhhhh': 909,
 'wellingtonian': 908,
 'dither': 910,
 'incertitude': 911,
 'florentine': 912,
 'imperioli': 913,
 'licking': 914,
 'disparagement': 915,
 'artfully': 916,
 'feds': 917,
 'fumiya': 918,
 'jbl': 52774,
 'tearfully': 919,
 'welfare': 24905,
 'idyllically': 49534,
 'isha': 43702,
 'lanchester': 920,
 'undertaken': 921,
 'longlost': 922,
 'netted': 923,
 'carrell': 924,
 'uncompelling': 925,
 'stems': 37258,
 'reliefs': 926,
 'leona': 927,
 'autorenfilm': 928,
 'unfriendly': 929,
 'typewriter': 930,
 'shifted': 931,
 'bertrand': 932,
 'blesses': 933,
 'leukemia': 12666,
 'posative': 142,
 'tricking': 934,
 'zanes': 936,
 'dashboard': 12667,
 'unknowingly': 937,
 'flatmates': 51897,
 'unnerve': 938,
 'caning': 939,
 'shortland': 146,
 'recluse': 941,
 'dcreasy': 942,
 'scratchiness': 24911,
 'pms': 30930,
 'chipmunk': 943,
 'tkachenko': 49537,
 'dipper': 944,
 'europeans': 61601,
 'berserkers': 948,
 'shys': 947,
 'monte': 68505,
 'eve': 949,
 'luxury': 61828,
 'conflagration': 950,
 'water': 46389,
 'irks': 951,
 'positronic': 954,
 'cushy': 150,
 'swiftness': 957,
 'underimpressed': 964,
 'imprint': 959,
 'sundance': 961,
 'aida': 31951,
 'thematically': 963,
 'uno': 965,
 'expressly': 966,
 'russkies': 967,
 'discos': 968,
 'shaping': 969,
 'verson': 970,
 'blushed': 61831,
 'prototype': 971,
 'lifewell': 976,
 'trafficker': 973,
 'crucifixions': 62188,
 'unrealistically': 975,
 'rivas': 977,
 'consequent': 978,
 'katsu': 979,
 'titantic': 980,
 'jalees': 981,
 'ranee': 982,
 'gambles': 984,
 'dispenses': 985,
 'disfigurement': 986,
 'bright': 987,
 'cristian': 988,
 'subculture': 37268,
 'capta': 991,
 'jewel': 992,
 'erect': 993,
 'avoide': 996,
 'inconnu': 997,
 'headquarters': 998,
 'babbling': 1000,
 'pac': 1001,
 'performace': 1003,
 'dorrit': 1004,
 'runners': 1005,
 'sentimentality': 1006,
 'marred': 1007,
 'commemorative': 1008,
 'helpers': 1012,
 'chiles': 1011,
 'snowy': 1013,
 'cheddar': 1014,
 'neath': 158,
 'outshine': 1016,
 'nadu': 1019,
 'wellbeing': 1020,
 'envisioned': 43779,
 'fanaticism': 1021,
 'morrisette': 12687,
 'sesame': 1024,
 'gran': 1023,
 'marlina': 1025,
 'artificiality': 1030,
 'coinsidence': 1027,
 'founders': 1028,
 'dismissably': 1029,
 'dracht': 66299,
 'scavengers': 1031,
 'neese': 12685,
 'pangborn': 1034,
 'elmore': 1039,
 'bristol': 71162,
 'lillies': 1035,
 'parkers': 1036,
 'skipped': 1038,
 'clipboard': 1042,
 'jucier': 1041,
 'haifa': 1043,
 ...}

In [49]:
def update_input_layer(review):
    
    global layer_0
    
    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

In [33]:
layer_0


Out[33]:
array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])
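
The non-zero count in the first position is worth a second look. Assuming the mapping shown in Out[48] was in effect when this cell ran, index 0 belongs to the empty string '', and review.split(" ") emits an empty token wherever the review text contains consecutive spaces. The sketch below reproduces that behaviour on a small made-up vocabulary; toy_vocab, toy_word2index and count_words are hypothetical names used only for illustration, not part of the notebook.

```python
import numpy as np

# Toy vocabulary (hypothetical; the notebook builds its vocab from reviews.txt)
toy_vocab = ["", "great", "movie", "terrible"]
toy_word2index = {word: i for i, word in enumerate(toy_vocab)}

def count_words(review, word2index):
    """Return a (1, vocab_size) array of per-word counts for one review."""
    layer = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        layer[0][word2index[word]] += 1
    return layer

# The double space yields one empty-string token between "great" and "movie".
toy_layer = count_words("great  movie great", toy_word2index)
print(toy_layer)  # counts for '', 'great', 'movie', 'terrible'
```

Here the '' slot receives a count of 1 because of the single double space, which is the same mechanism that would produce a larger count for a full-length review.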

In [51]:
def get_target_for_label(label):
    if(label == 'POSITIVE'):
        return 1
    else:
        return 0

In [54]:
labels[0]


Out[54]:
'POSITIVE'

In [52]:
get_target_for_label(labels[0])


Out[52]:
1

In [55]:
labels[1]


Out[55]:
'NEGATIVE'

In [53]:
get_target_for_label(labels[1])


Out[53]:
0

Project 3: Building a Neural Network

  • Start with your neural network from the last chapter
  • Use a 3-layer neural network
  • Use no non-linearity in the hidden layer
  • Use our functions (update_input_layer, get_target_for_label) to create the training data
  • Create a "pre_process_data" function that builds the vocabulary used by our training-data-generating functions
  • Modify "train" to train over the entire corpus
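
As a rough picture of the architecture described in the list above, here is a minimal forward pass for a 3-layer network whose hidden layer has no non-linearity. The layer sizes, the random weights and the input counts are all made-up toy values; only the sequence of operations is meant to mirror the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; in the project the input layer has one node per vocabulary word.
input_nodes, hidden_nodes, output_nodes = 5, 3, 1

weights_0_1 = rng.normal(0.0, 0.1, (input_nodes, hidden_nodes))
weights_1_2 = rng.normal(0.0, 0.1, (hidden_nodes, output_nodes))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

layer_0 = np.array([[2.0, 0.0, 1.0, 0.0, 0.0]])  # word counts, shape (1, 5)
layer_1 = layer_0.dot(weights_0_1)                # linear hidden layer, shape (1, 3)
layer_2 = sigmoid(layer_1.dot(weights_1_2))       # prediction in (0, 1), shape (1, 1)

# With a linear hidden layer, layer_1 is just a count-weighted sum of the
# weight rows belonging to the words that appear in the review.
same_as_row_sum = np.allclose(layer_1, 2 * weights_0_1[0] + 1 * weights_0_1[2])
print(layer_1.shape, layer_2.shape, same_as_row_sum)
```

Only the output layer applies a sigmoid, so the single output can be read as a score between 0 and 1 and compared against the 0/1 targets.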


In [86]:
import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):
       
        # set our random number generator 
        np.random.seed(1)
    
        self.pre_process_data(reviews, labels)
        
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)
        
        
    def pre_process_data(self, reviews, labels):
        
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        self.label_vocab = list(label_vocab)
        
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
         
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        self.learning_rate = learning_rate
        
        self.layer_0 = np.zeros((1,input_nodes))
    
        
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if(word in self.word2index):
                self.layer_0[0][self.word2index[word]] += 1
                
    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def train(self, training_reviews, training_labels):
        
        assert(len(training_reviews) == len(training_labels))
        
        correct_so_far = 0
        
        start = time.time()
        
        for i in range(len(training_reviews)):
            
            review = training_reviews[i]
            label = training_labels[i]
            
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer (linear: no activation function)
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the actual output minus the desired target.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        
        correct = 0
        
        start = time.time()
        
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                            + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        
        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"
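
One way to gain confidence in the backward pass inside train() is a numerical gradient check. The sketch below assumes the loss for a single example is half the squared error, 0.5 * (prediction - target)**2; the class never names its loss, so that is an assumption. It compares the analytic gradients, written the same way as in train(), against central finite differences. The toy input, weight shapes, and the helper names loss and numeric_grad are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(w01, w12, x, target):
    """Half squared error for one example (assumed loss)."""
    pred = sigmoid(x.dot(w01).dot(w12))
    return 0.5 * ((pred - target) ** 2).item()

rng = np.random.default_rng(1)
x = np.array([[1.0, 0.0, 2.0, 1.0]])   # toy word counts (hypothetical)
w01 = rng.normal(0.0, 0.5, (4, 3))     # input-to-hidden weights
w12 = rng.normal(0.0, 0.5, (3, 1))     # hidden-to-output weights
target = 1

# Analytic gradients, following the same steps as train()
layer_1 = x.dot(w01)
layer_2 = sigmoid(layer_1.dot(w12))
layer_2_delta = (layer_2 - target) * layer_2 * (1 - layer_2)
layer_1_delta = layer_2_delta.dot(w12.T)
grad_w12 = layer_1.T.dot(layer_2_delta)
grad_w01 = x.T.dot(layer_1_delta)

eps = 1e-6

def numeric_grad(which, idx):
    """Central finite difference for one entry of w01 (which=0) or w12 (which=1)."""
    plus = [w01.copy(), w12.copy()]
    minus = [w01.copy(), w12.copy()]
    plus[which][idx] += eps
    minus[which][idx] -= eps
    return (loss(plus[0], plus[1], x, target) - loss(minus[0], minus[1], x, target)) / (2 * eps)

print(grad_w01[2, 1], numeric_grad(0, (2, 1)))
print(grad_w12[0, 0], numeric_grad(1, (0, 0)))
```

If the two numbers on each printed line agree to several decimal places, the delta and update expressions are consistent with gradient descent on that assumed loss.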

In [87]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

In [61]:
# evaluate our model before training (just to show how horrible it is)
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):587.5% #Correct:500 #Tested:1000 Testing Accuracy:50.0%
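
The exact 50.0% is consistent with how init_network sets up the weights. Because weights_0_1 starts as all zeros, the hidden layer is zero for every review, the output is sigmoid(0) = 0.5, and run() only returns "POSITIVE" when the output is strictly greater than 0.5. An untrained network would therefore label every review NEGATIVE, and 500 correct out of 1000 would follow if half of the held-out labels are NEGATIVE. That last point is an inference from the code and the output, not something stated in the notebook. A small sketch with made-up sizes and inputs:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy sizes (hypothetical); weights_0_1 starts at zero as in init_network.
weights_0_1 = np.zeros((6, 4))
weights_1_2 = np.random.default_rng(1).normal(0.0, 1.0, (4, 1))

predictions = []
for counts in ([[3, 0, 1, 0, 0, 2]], [[0, 5, 0, 0, 1, 0]]):
    layer_1 = np.array(counts, dtype=float).dot(weights_0_1)  # all zeros
    layer_2 = sigmoid(layer_1.dot(weights_1_2))               # exactly 0.5
    predictions.append("POSITIVE" if layer_2[0, 0] > 0.5 else "NEGATIVE")

print(predictions)  # → ['NEGATIVE', 'NEGATIVE']
```

The values in weights_1_2 do not matter here, since they are multiplied by a hidden layer of zeros.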

In [62]:
# train the network
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):89.58 #Correct:1250 #Trained:2501 Training Accuracy:49.9%
Progress:20.8% Speed(reviews/sec):95.03 #Correct:2500 #Trained:5001 Training Accuracy:49.9%
Progress:27.4% Speed(reviews/sec):95.46 #Correct:3295 #Trained:6592 Training Accuracy:49.9%
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-62-d0f5d85ad402> in <module>()
      1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
    117             # TODO: Update the weights
    118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
    120 
    121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt: 

In [63]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)

In [64]:
# train the network
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):96.39 #Correct:1247 #Trained:2501 Training Accuracy:49.8%
Progress:20.8% Speed(reviews/sec):99.31 #Correct:2497 #Trained:5001 Training Accuracy:49.9%
Progress:22.8% Speed(reviews/sec):99.02 #Correct:2735 #Trained:5476 Training Accuracy:49.9%
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-64-d0f5d85ad402> in <module>()
      1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
    117             # TODO: Update the weights
    118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
    120 
    121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt: 

In [65]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)

In [66]:
# train the network
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):98.77 #Correct:1267 #Trained:2501 Training Accuracy:50.6%
Progress:20.8% Speed(reviews/sec):98.79 #Correct:2640 #Trained:5001 Training Accuracy:52.7%
Progress:31.2% Speed(reviews/sec):98.58 #Correct:4109 #Trained:7501 Training Accuracy:54.7%
Progress:41.6% Speed(reviews/sec):93.78 #Correct:5638 #Trained:10001 Training Accuracy:56.3%
Progress:52.0% Speed(reviews/sec):91.76 #Correct:7246 #Trained:12501 Training Accuracy:57.9%
Progress:62.5% Speed(reviews/sec):92.42 #Correct:8841 #Trained:15001 Training Accuracy:58.9%
Progress:69.4% Speed(reviews/sec):92.58 #Correct:9934 #Trained:16668 Training Accuracy:59.5%
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-66-d0f5d85ad402> in <module>()
      1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
    117             # TODO: Update the weights
    118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
    120 
    121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt: 

Understanding Neural Noise


In [67]:
from IPython.display import Image
Image(filename='sentiment_network.png')


Out[67]:

In [70]:
def update_input_layer(review):
    
    global layer_0
    
    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

In [71]:
layer_0


Out[71]:
array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

In [79]:
review_counter = Counter()

In [80]:
for word in reviews[0].split(" "):
    review_counter[word] += 1

In [81]:
review_counter.most_common()


Out[81]:
[('.', 27),
 ('', 18),
 ('the', 9),
 ('to', 6),
 ('i', 5),
 ('high', 5),
 ('is', 4),
 ('of', 4),
 ('a', 4),
 ('bromwell', 4),
 ('teachers', 4),
 ('that', 4),
 ('their', 2),
 ('my', 2),
 ('at', 2),
 ('as', 2),
 ('me', 2),
 ('in', 2),
 ('students', 2),
 ('it', 2),
 ('student', 2),
 ('school', 2),
 ('through', 1),
 ('insightful', 1),
 ('ran', 1),
 ('years', 1),
 ('here', 1),
 ('episode', 1),
 ('reality', 1),
 ('what', 1),
 ('far', 1),
 ('t', 1),
 ('saw', 1),
 ('s', 1),
 ('repeatedly', 1),
 ('isn', 1),
 ('closer', 1),
 ('and', 1),
 ('fetched', 1),
 ('remind', 1),
 ('can', 1),
 ('welcome', 1),
 ('line', 1),
 ('your', 1),
 ('survive', 1),
 ('teaching', 1),
 ('satire', 1),
 ('classic', 1),
 ('who', 1),
 ('age', 1),
 ('knew', 1),
 ('schools', 1),
 ('inspector', 1),
 ('comedy', 1),
 ('down', 1),
 ('about', 1),
 ('pity', 1),
 ('m', 1),
 ('all', 1),
 ('adults', 1),
 ('see', 1),
 ('think', 1),
 ('situation', 1),
 ('time', 1),
 ('pomp', 1),
 ('lead', 1),
 ('other', 1),
 ('much', 1),
 ('many', 1),
 ('which', 1),
 ('one', 1),
 ('profession', 1),
 ('programs', 1),
 ('same', 1),
 ('some', 1),
 ('such', 1),
 ('pettiness', 1),
 ('immediately', 1),
 ('expect', 1),
 ('financially', 1),
 ('recalled', 1),
 ('tried', 1),
 ('whole', 1),
 ('right', 1),
 ('life', 1),
 ('cartoon', 1),
 ('scramble', 1),
 ('sack', 1),
 ('believe', 1),
 ('when', 1),
 ('than', 1),
 ('burn', 1),
 ('pathetic', 1)]
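
The counts above hint at why the first network struggled: with raw counts in layer_0, the largest inputs belong to tokens such as '.' and the empty string, which say little about sentiment. Here is a minimal sketch of that effect, using a made-up review string and hypothetical names (toy_review, count_vector, binary_vector). It compares the count encoding with a 0/1 presence encoding of the kind Project 4 switches to.

```python
from collections import Counter

import numpy as np

# Hypothetical toy review; the double spaces produce '' tokens, as in reviews[0].
toy_review = "great film .  great cast .  .  . the plot is great"

tokens = toy_review.split(" ")
vocab = sorted(set(tokens))
word2index = {word: i for i, word in enumerate(vocab)}

# Count encoding: each input is the number of times the token occurs.
count_vector = np.zeros(len(vocab))
for token in tokens:
    count_vector[word2index[token]] += 1

# Binary encoding: each input is 1 if the token occurs at all.
binary_vector = (count_vector > 0).astype(float)

counts = Counter(tokens)
dominant_token, dominant_count = counts.most_common(1)[0]

print("most common token:", repr(dominant_token), dominant_count)
print("largest count input:", count_vector.max())
print("largest binary input:", binary_vector.max())
```

In the count encoding the punctuation token is the strongest signal entering the hidden layer. In the binary encoding every token that is present contributes equally, so the weights alone decide which words matter.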

Project 4: Reducing Noise in our Input Data


In [82]:
import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):
       
        # set our random number generator 
        np.random.seed(1)
    
        self.pre_process_data(reviews, labels)
        
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)
        
        
    def pre_process_data(self, reviews, labels):
        
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        self.label_vocab = list(label_vocab)
        
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
         
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        self.learning_rate = learning_rate
        
        self.layer_0 = np.zeros((1,input_nodes))
    
        
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if word in self.word2index:
                self.layer_0[0][self.word2index[word]] = 1
                
    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def train(self, training_reviews, training_labels):
        
        assert(len(training_reviews) == len(training_labels))
        
        correct_so_far = 0
        
        start = time.time()
        
        for i in range(len(training_reviews)):
            
            review = training_reviews[i]
            label = training_labels[i]
            
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        
        correct = 0
        
        start = time.time()
        
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                            + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        
        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

In [83]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

In [84]:
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):91.50 #Correct:1795 #Trained:2501 Training Accuracy:71.7%
Progress:20.8% Speed(reviews/sec):95.25 #Correct:3811 #Trained:5001 Training Accuracy:76.2%
Progress:31.2% Speed(reviews/sec):93.74 #Correct:5898 #Trained:7501 Training Accuracy:78.6%
Progress:41.6% Speed(reviews/sec):93.69 #Correct:8042 #Trained:10001 Training Accuracy:80.4%
Progress:52.0% Speed(reviews/sec):95.27 #Correct:10186 #Trained:12501 Training Accuracy:81.4%
Progress:62.5% Speed(reviews/sec):98.19 #Correct:12317 #Trained:15001 Training Accuracy:82.1%
Progress:72.9% Speed(reviews/sec):98.56 #Correct:14440 #Trained:17501 Training Accuracy:82.5%
Progress:83.3% Speed(reviews/sec):99.74 #Correct:16613 #Trained:20001 Training Accuracy:83.0%
Progress:93.7% Speed(reviews/sec):100.7 #Correct:18794 #Trained:22501 Training Accuracy:83.5%
Progress:99.9% Speed(reviews/sec):101.9 #Correct:20115 #Trained:24000 Training Accuracy:83.8%

In [85]:
# evaluate our model after training, on the 1,000 reviews held out from training
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):832.7% #Correct:851 #Tested:1000 Testing Accuracy:85.1%

Analyzing Inefficiencies in our Network


In [88]:
Image(filename='sentiment_network_sparse.png')


Out[88]:

In [89]:
layer_0 = np.zeros(10)

In [90]:
layer_0


Out[90]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [91]:
layer_0[4] = 1
layer_0[9] = 1

In [92]:
layer_0


Out[92]:
array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.])

In [93]:
weights_0_1 = np.random.randn(10,5)

In [94]:
layer_0.dot(weights_0_1)


Out[94]:
array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

In [101]:
indices = [4,9]

In [102]:
layer_1 = np.zeros(5)

In [103]:
for index in indices:
    layer_1 += (weights_0_1[index])

In [104]:
layer_1


Out[104]:
array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])
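
The two results match because multiplying a 0/1 vector by a matrix simply adds up the rows selected by the 1s. Here is a small sketch that checks this with np.allclose. The seed and the names dense_layer_1 and sparse_layer_1 are choices made for this example, not part of the notebook.

```python
import numpy as np

rng = np.random.RandomState(0)  # seed chosen only to make the example repeatable

weights_0_1 = rng.randn(10, 5)

# Dense input: a 0/1 vector with ones at positions 4 and 9.
layer_0 = np.zeros(10)
indices = [4, 9]
layer_0[indices] = 1

# Dense forward pass: full vector-matrix product over all 10 rows.
dense_layer_1 = layer_0.dot(weights_0_1)

# Sparse forward pass: add up only the rows that the ones select.
sparse_layer_1 = np.zeros(5)
for index in indices:
    sparse_layer_1 += weights_0_1[index]

print(np.allclose(dense_layer_1, sparse_layer_1))
```

The dense product touches all 10 rows and the loop touches 2. With a real vocabulary, where a review contains only a small fraction of all known words, that gap is where the speedup comes from.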

In [100]:
Image(filename='sentiment_network_sparse_2.png')


Out[100]:

Project 5: Making our Network More Efficient


In [105]:
import time
import sys

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):
       
        np.random.seed(1)
    
        self.pre_process_data(reviews, labels)
        
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)
        
        
    def pre_process_data(self, reviews, labels):
        
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        self.label_vocab = list(label_vocab)
        
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
         
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        self.learning_rate = learning_rate
        
        self.layer_0 = np.zeros((1,input_nodes))
        self.layer_1 = np.zeros((1,hidden_nodes))
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def train(self, training_reviews_raw, training_labels):
        
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if word in self.word2index:
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))
        
        assert(len(training_reviews) == len(training_labels))
        
        correct_so_far = 0
        
        start = time.time()
        
        for i in range(len(training_reviews)):
            
            review = training_reviews[i]
            label = training_labels[i]
            
            ### Forward pass ###

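            # Input layer: `review` is already a list of word indices, so layer_0 is not built

            # Hidden layer: summing the weight rows of the words present gives the
            # same result as self.layer_0.dot(self.weights_0_1) for a 0/1 layer_0
            self.layer_1 *= 0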
            for index in review:
                self.layer_1 += self.weights_0_1[index]
            
            # Output layer
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            
            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
        
    
    def test(self, testing_reviews, testing_labels):
        
        correct = 0
        
        start = time.time()
        
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                            + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        
        # Input Layer


        # Hidden layer
        self.layer_1 *= 0
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index:
                unique_indices.add(self.word2index[word])
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]
        
        # Output layer
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"
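
Both train and run convert a raw review into the set of vocabulary indices for the words it contains. That step could be factored into a helper. The sketch below is one possible shape: review_to_indices is a hypothetical name, the toy word2index is made up, and returning a sorted list (rather than list(set)) is a choice made here so the output is deterministic.

```python
def review_to_indices(review, word2index):
    """Return the sorted vocabulary indices of the distinct known words in a review."""
    indices = set()
    for word in review.split(" "):
        # Words that were never seen during pre-processing are skipped.
        if word in word2index:
            indices.add(word2index[word])
    return sorted(indices)


# Toy vocabulary, made up for this example.
word2index = {"": 0, ".": 1, "great": 2, "film": 3, "awful": 4}

print(review_to_indices("great film . great  unknownword", word2index))
```

Repeated words collapse to a single index, which matches the 0/1 input encoding from Project 4.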

In [106]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

In [111]:
mlp.train(reviews[:-1000],labels[:-1000])

In [109]:
# evaluate our model after training, on the 1,000 reviews held out from training
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):1581.% #Correct:857 #Tested:1000 Testing Accuracy:85.7%
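
The same shortcut applies to the weight update in train. When layer_0 holds only 0s and 1s, the outer product layer_0.T.dot(layer_1_delta) is zero in every row except those of the words present, and each of those rows equals layer_1_delta[0]. Here is a sketch with made-up sizes and random values (all names and numbers are illustrative) that checks the row-wise update against the dense one.

```python
import numpy as np

rng = np.random.RandomState(1)       # seed chosen only for repeatability
input_nodes, hidden_nodes = 8, 3     # toy sizes, far smaller than the real vocabulary
learning_rate = 0.1
indices = [2, 5]                     # positions of the words "present" in the review

# Dense 0/1 input row vector with ones at the chosen indices.
layer_0 = np.zeros((1, input_nodes))
layer_0[0, indices] = 1

# Stand-in for the backpropagated hidden-layer delta.
layer_1_delta = rng.randn(1, hidden_nodes)

weights_dense = rng.randn(input_nodes, hidden_nodes)
weights_sparse = weights_dense.copy()

# Dense update: outer product over every input row.
weights_dense -= layer_0.T.dot(layer_1_delta) * learning_rate

# Sparse update: adjust only the rows of the words present.
for index in indices:
    weights_sparse[index] -= layer_1_delta[0] * learning_rate

print(np.allclose(weights_dense, weights_sparse))
```

The equivalence relies on the inputs being exactly 0 or 1. With count inputs, each row update would also have to be scaled by that word's count.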

Further Noise Reduction


In [112]:
Image(filename='sentiment_network_sparse_2.png')


Out[112]:

In [113]:
# words most frequently seen in a review with a "POSITIVE" label
pos_neg_ratios.most_common()


Out[113]:
[('edie', 4.6913478822291435),
 ('paulie', 4.0775374439057197),
 ('felix', 3.1527360223636558),
 ('polanski', 2.8233610476132043),
 ('matthau', 2.8067217286092401),
 ('victoria', 2.6810215287142909),
 ('mildred', 2.6026896854443837),
 ('gandhi', 2.5389738710582761),
 ('flawless', 2.451005098112319),
 ('superbly', 2.2600254785752498),
 ('perfection', 2.1594842493533721),
 ('astaire', 2.1400661634962708),
 ('captures', 2.0386195471595809),
 ('voight', 2.0301704926730531),
 ('wonderfully', 2.0218960560332353),
 ('powell', 1.9783454248084671),
 ('brosnan', 1.9547990964725592),
 ('lily', 1.9203768470501485),
 ('bakshi', 1.9029851043382795),
 ('lincoln', 1.9014583864844796),
 ('refreshing', 1.8551812956655511),
 ('breathtaking', 1.8481124057791867),
 ('bourne', 1.8478489358790986),
 ('lemmon', 1.8458266904983307),
 ('delightful', 1.8002701588959635),
 ('flynn', 1.7996646487351682),
 ('andrews', 1.7764919970972666),
 ('homer', 1.7692866133759964),
 ('beautifully', 1.7626953362841438),
 ('soccer', 1.7578579175523736),
 ('elvira', 1.7397031072720019),
 ('underrated', 1.7197859696029656),
 ('gripping', 1.7165360479904674),
 ('superb', 1.7091514458966952),
 ('delight', 1.6714733033535532),
 ('welles', 1.6677068205580761),
 ('sadness', 1.663505133704376),
 ('sinatra', 1.6389967146756448),
 ('touching', 1.637217476541176),
 ('timeless', 1.62924053973028),
 ('macy', 1.6211339521972916),
 ('unforgettable', 1.6177367152487956),
 ('favorites', 1.6158688027643908),
 ('stewart', 1.6119987332957739),
 ('hartley', 1.6094379124341003),
 ('sullivan', 1.6094379124341003),
 ('extraordinary', 1.6094379124341003),
 ('brilliantly', 1.5950491749820008),
 ('friendship', 1.5677652160335325),
 ('wonderful', 1.5645425925262093),
 ('palma', 1.5553706911638245),
 ('magnificent', 1.54663701119507),
 ('finest', 1.5462590108125689),
 ('jackie', 1.5439233053234738),
 ('ritter', 1.5404450409471491),
 ('tremendous', 1.5184661342283736),
 ('freedom', 1.5091151908062312),
 ('fantastic', 1.5048433868558566),
 ('terrific', 1.5026699370083942),
 ('noir', 1.493925025312256),
 ('sidney', 1.493925025312256),
 ('outstanding', 1.4910053152089213),
 ('mann', 1.4894785973551214),
 ('pleasantly', 1.4894785973551214),
 ('nancy', 1.488077055429833),
 ('marie', 1.4825711915553104),
 ('marvelous', 1.4739999415389962),
 ('excellent', 1.4647538505723599),
 ('ruth', 1.4596256342054401),
 ('stanwyck', 1.4412101187160054),
 ('widmark', 1.4350845252893227),
 ('splendid', 1.4271163556401458),
 ('chan', 1.423108334242607),
 ('exceptional', 1.4201959127955721),
 ('tender', 1.410986973710262),
 ('gentle', 1.4078005663408544),
 ('poignant', 1.4022947024663317),
 ('gem', 1.3932148039644643),
 ('amazing', 1.3919815802404802),
 ('chilling', 1.3862943611198906),
 ('captivating', 1.3862943611198906),
 ('fisher', 1.3862943611198906),
 ('davies', 1.3862943611198906),
 ('darker', 1.3652409519220583),
 ('april', 1.3499267169490159),
 ('kelly', 1.3461743673304654),
 ('blake', 1.3418425985490567),
 ('overlooked', 1.329135947279942),
 ('ralph', 1.32818673031261),
 ('bette', 1.3156767939059373),
 ('hoffman', 1.3150668518315229),
 ('cole', 1.3121863889661687),
 ('shines', 1.3049487216659381),
 ('powerful', 1.2999662776313934),
 ('notch', 1.2950456896547455),
 ('remarkable', 1.2883688239495823),
 ('pitt', 1.286210902562908),
 ('winters', 1.2833463918674481),
 ('vivid', 1.2762934659055623),
 ('gritty', 1.2757524867200667),
 ('giallo', 1.2745029551317739),
 ('portrait', 1.2704625455947689),
 ('innocence', 1.2694300209805796),
 ('psychiatrist', 1.2685113254635072),
 ('favorite', 1.2668956297860055),
 ('ensemble', 1.2656663733312759),
 ('stunning', 1.2622417124499117),
 ('burns', 1.259880436264232),
 ('garbo', 1.258954938743289),
 ('barbara', 1.2580400255962119),
 ('panic', 1.2527629684953681),
 ('holly', 1.2527629684953681),
 ('philip', 1.2527629684953681),
 ('carol', 1.2481440226390734),
 ('perfect', 1.246742480713785),
 ('appreciated', 1.2462482874741743),
 ('favourite', 1.2411123512753928),
 ('journey', 1.2367626271489269),
 ('rural', 1.235471471385307),
 ('bond', 1.2321436812926323),
 ('builds', 1.2305398317106577),
 ('brilliant', 1.2287554137664785),
 ('brooklyn', 1.2286654169163074),
 ('von', 1.225175011976539),
 ('unfolds', 1.2163953243244932),
 ('recommended', 1.2163953243244932),
 ('daniel', 1.20215296760895),
 ('perfectly', 1.1971931173405572),
 ('crafted', 1.1962507582320256),
 ('prince', 1.1939224684724346),
 ('troubled', 1.192138346678933),
 ('consequences', 1.1865810616140668),
 ('haunting', 1.1814999484738773),
 ('cinderella', 1.180052620608284),
 ('alexander', 1.1759989522835299),
 ('emotions', 1.1753049094563641),
 ('boxing', 1.1735135968412274),
 ('subtle', 1.1734135017508081),
 ('curtis', 1.1649873576129823),
 ('rare', 1.1566438362402944),
 ('loved', 1.1563661500586044),
 ('daughters', 1.1526795099383853),
 ('courage', 1.1438688802562305),
 ('dentist', 1.1426722784621401),
 ('highly', 1.1420208631618658),
 ('nominated', 1.1409146683587992),
 ('tony', 1.1397491942285991),
 ('draws', 1.1325138403437911),
 ('everyday', 1.1306150197542835),
 ('contrast', 1.1284652518177909),
 ('cried', 1.1213405397456659),
 ('fabulous', 1.1210851445201684),
 ('ned', 1.120591195386885),
 ('fay', 1.120591195386885),
 ('emma', 1.1184149159642893),
 ('sensitive', 1.113318436057805),
 ('smooth', 1.1089750757036563),
 ('dramas', 1.1080910326226534),
 ('today', 1.1050431789984001),
 ('helps', 1.1023091505494358),
 ('inspiring', 1.0986122886681098),
 ('jimmy', 1.0937696641923216),
 ('awesome', 1.0931328229034842),
 ('unique', 1.0881409888008142),
 ('tragic', 1.0871835928444868),
 ('intense', 1.0870514662670339),
 ('stellar', 1.0857088838322018),
 ('rival', 1.0822184788924332),
 ('provides', 1.0797081340289569),
 ('depression', 1.0782034170369026),
 ('shy', 1.0775588794702773),
 ('carrie', 1.076139432816051),
 ('blend', 1.0753554265038423),
 ('hank', 1.0736109864626924),
 ('diana', 1.0726368022648489),
 ('adorable', 1.0726368022648489),
 ('unexpected', 1.0722255334949147),
 ('achievement', 1.0668635903535293),
 ('bettie', 1.0663514264498881),
 ('happiness', 1.0632729222228008),
 ('glorious', 1.0608719606852626),
 ('davis', 1.0541605260972757),
 ('terrifying', 1.0525211814678428),
 ('beauty', 1.050410186850232),
 ('ideal', 1.0479685558493548),
 ('fears', 1.0467872208035236),
 ('hong', 1.0438040521731147),
 ('seasons', 1.0433496099930604),
 ('fascinating', 1.0414538748281612),
 ('carries', 1.0345904299031787),
 ('satisfying', 1.0321225473992768),
 ('definite', 1.0319209141694374),
 ('touched', 1.0296194171811581),
 ('greatest', 1.0248947127715422),
 ('creates', 1.0241097613701886),
 ('aunt', 1.023388867430522),
 ('walter', 1.022328983918479),
 ('spectacular', 1.0198314108149955),
 ('portrayal', 1.0189810189761024),
 ('ann', 1.0127808528183286),
 ('enterprise', 1.0116009116784799),
 ('musicals', 1.0096648026516135),
 ('deeply', 1.0094845087721023),
 ('incredible', 1.0061677561461084),
 ('mature', 1.0060195018402847),
 ('triumph', 0.99682959435816731),
 ('margaret', 0.99682959435816731),
 ('navy', 0.99493385919326827),
 ('harry', 0.99176919305006062),
 ('lucas', 0.990398704027877),
 ('sweet', 0.98966110487955483),
 ('joey', 0.98794672078059009),
 ('oscar', 0.98721905111049713),
 ('balance', 0.98649499054740353),
 ('warm', 0.98485340331145166),
 ('ages', 0.98449898190068863),
 ('glover', 0.98082925301172619),
 ('guilt', 0.98082925301172619),
 ('carrey', 0.98082925301172619),
 ('learns', 0.97881108885548895),
 ('unusual', 0.97788374278196932),
 ('sons', 0.97777581552483595),
 ('complex', 0.97761897738147796),
 ('essence', 0.97753435711487369),
 ('brazil', 0.9769153536905899),
 ('widow', 0.97650959186720987),
 ('solid', 0.97537964824416146),
 ('beautiful', 0.97326301262841053),
 ('holmes', 0.97246100334120955),
 ('awe', 0.97186058302896583),
 ('vhs', 0.97116734209998934),
 ('eerie', 0.97116734209998934),
 ('lonely', 0.96873720724669754),
 ('grim', 0.96873720724669754),
 ('sport', 0.96825047080486615),
 ('debut', 0.96508089604358704),
 ('destiny', 0.96343751029985703),
 ('thrillers', 0.96281074750904794),
 ('tears', 0.95977584381389391),
 ('rose', 0.95664202739772253),
 ('feelings', 0.95551144502743635),
 ('ginger', 0.95551144502743635),
 ('winning', 0.95471810900804055),
 ('stanley', 0.95387344302319799),
 ('cox', 0.95343027882361187),
 ('paris', 0.95278479030472663),
 ('heart', 0.95238806924516806),
 ('hooked', 0.95155887071161305),
 ('comfortable', 0.94803943018873538),
 ('mgm', 0.94446160884085151),
 ('masterpiece', 0.94155039863339296),
 ('themes', 0.94118828349588235),
 ('danny', 0.93967118051821874),
 ('anime', 0.93378388932167222),
 ('perry', 0.93328830824272613),
 ('joy', 0.93301752567946861),
 ('lovable', 0.93081883243706487),
 ('hal', 0.92953595862417571),
 ('mysteries', 0.92953595862417571),
 ('louis', 0.92871325187271225),
 ('charming', 0.92520609553210742),
 ('urban', 0.92367083917177761),
 ('allows', 0.92183091224977043),
 ('impact', 0.91815814604895041),
 ('gradually', 0.91629073187415511),
 ('lifestyle', 0.91629073187415511),
 ('italy', 0.91629073187415511),
 ('spy', 0.91289514287301687),
 ('treat', 0.91193342650519937),
 ('subsequent', 0.91056005716517008),
 ('kennedy', 0.90981821736853763),
 ('loving', 0.90967549275543591),
 ('surprising', 0.90937028902958128),
 ('quiet', 0.90648673177753425),
 ('winter', 0.90624039602065365),
 ('reveals', 0.90490540964902977),
 ('raw', 0.90445627422715225),
 ('funniest', 0.90078654533818991),
 ('pleased', 0.89994159387262562),
 ('norman', 0.89994159387262562),
 ('thief', 0.89874642222324552),
 ('season', 0.89827222637147675),
 ('secrets', 0.89794159320595857),
 ('colorful', 0.89705936994626756),
 ('highest', 0.8967461358011849),
 ('compelling', 0.89462923509297576),
 ('danes', 0.89248008318043659),
 ('castle', 0.88967708335606499),
 ('kudos', 0.88889175768604067),
 ('great', 0.88810470901464589),
 ('baseball', 0.88730319500090271),
 ('subtitles', 0.88730319500090271),
 ('bleak', 0.88730319500090271),
 ('winner', 0.88643776872447388),
 ('tragedy', 0.88563699078315261),
 ('todd', 0.88551907320740142),
 ('nicely', 0.87924946019380601),
 ('arthur', 0.87546873735389985),
 ('essential', 0.87373111745535925),
 ('gorgeous', 0.8731725250935497),
 ('fonda', 0.87294029100054127),
 ('eastwood', 0.87139541196626402),
 ('focuses', 0.87082835779739776),
 ('enjoyed', 0.87070195951624607),
 ('natural', 0.86997924506912838),
 ('intensity', 0.86835126958503595),
 ('witty', 0.86824103423244681),
 ('rob', 0.8642954367557748),
 ('worlds', 0.86377269759070874),
 ('health', 0.86113891179907498),
 ('magical', 0.85953791528170564),
 ('deeper', 0.85802182375017932),
 ('lucy', 0.85618680780444956),
 ('moving', 0.85566611005772031),
 ('lovely', 0.85290640004681306),
 ('purple', 0.8513711857748395),
 ('memorable', 0.84801189112086062),
 ('sings', 0.84729786038720367),
 ('craig', 0.84342938360928321),
 ('modesty', 0.84342938360928321),
 ('relate', 0.84326559685926517),
 ('episodes', 0.84223712084137292),
 ('strong', 0.84167135777060931),
 ('smith', 0.83959811108590054),
 ('tear', 0.83704136022001441),
 ('apartment', 0.83333115290549531),
 ('princess', 0.83290912293510388),
 ('disagree', 0.83290912293510388),
 ('kung', 0.83173334384609199),
 ('adventure', 0.83150561393278388),
 ('columbo', 0.82667857318446791),
 ('jake', 0.82667857318446791),
 ('adds', 0.82485652591452319),
 ('hart', 0.82472353834866463),
 ('strength', 0.82417544296634937),
 ('realizes', 0.82360006895738058),
 ('dave', 0.8232003088081431),
 ('childhood', 0.82208086393583857),
 ('forbidden', 0.81989888619908913),
 ('tight', 0.81883539572344199),
 ('surreal', 0.8178506590609026),
 ('manager', 0.81770990320170756),
 ('dancer', 0.81574950265227764),
 ('con', 0.81093021621632877),
 ('studios', 0.81093021621632877),
 ('miike', 0.80821651034473263),
 ('realistic', 0.80807714723392232),
 ('explicit', 0.80792269515237358),
 ('kurt', 0.8060875917405409),
 ('traditional', 0.80535917116687328),
 ('deals', 0.80535917116687328),
 ('holds', 0.80493858654806194),
 ('carl', 0.80437281567016972),
 ('touches', 0.80396154690023547),
 ('gene', 0.80314807577427383),
 ('albert', 0.8027669055771679),
 ('abc', 0.80234647252493729),
 ('cry', 0.80011930011211307),
 ('sides', 0.7995275841185171),
 ('develops', 0.79850769621777162),
 ('eyre', 0.79850769621777162),
 ('dances', 0.79694397424158891),
 ('oscars', 0.79633141679517616),
 ('legendary', 0.79600456599965308),
 ('importance', 0.79492987486988764),
 ('hearted', 0.79492987486988764),
 ('portraying', 0.79356592830699269),
 ('impressed', 0.79258107754813223),
 ('waters', 0.79112758892014912),
 ('empire', 0.79078565012386137),
 ('edge', 0.789774016249017),
 ('environment', 0.78845736036427028),
 ('jean', 0.78845736036427028),
 ('sentimental', 0.7864791203521645),
 ('captured', 0.78623760362595729),
 ('styles', 0.78592891401091158),
 ('daring', 0.78592891401091158),
 ('backgrounds', 0.78275933924963248),
 ('frank', 0.78275933924963248),
 ('matches', 0.78275933924963248),
 ('tense', 0.78275933924963248),
 ('gothic', 0.78209466657644144),
 ('sharp', 0.7814397877056235),
 ('achieved', 0.78015855754957497),
 ('court', 0.77947526404844247),
 ('steals', 0.7789140023173704),
 ('rules', 0.77844476107184035),
 ('colors', 0.77684619943659217),
 ('reunion', 0.77318988823348167),
 ('covers', 0.77139937745969345),
 ('tale', 0.77010822169607374),
 ('rain', 0.7683706017975328),
 ('denzel', 0.76804848873306297),
 ('stays', 0.76787072675588186),
 ('blob', 0.76725515271366718),
 ('conventional', 0.76214005204689672),
 ('maria', 0.76214005204689672),
 ('fresh', 0.76158434211317383),
 ('midnight', 0.76096977689870637),
 ('landscape', 0.75852993982279704),
 ('animated', 0.75768570169751648),
 ('titanic', 0.75666058628227129),
 ('sunday', 0.75666058628227129),
 ('spring', 0.7537718023763802),
 ('cagney', 0.7537718023763802),
 ('enjoyable', 0.75246375771636476),
 ('immensely', 0.75198768058287868),
 ('sir', 0.7507762933965817),
 ('nevertheless', 0.75067102469813185),
 ('driven', 0.74994477895307854),
 ('performances', 0.74883252516063137),
 ('memories', 0.74721440183022114),
 ('nowadays', 0.74721440183022114),
 ('simple', 0.74641420974143258),
 ('golden', 0.74533293373051557),
 ('leslie', 0.74533293373051557),
 ('lovers', 0.74497224842453125),
 ('relationship', 0.74484232345601786),
 ('supporting', 0.74357803418683721),
 ('che', 0.74262723782331497),
 ('packed', 0.7410032017375805),
 ('trek', 0.74021469141793106),
 ('provoking', 0.73840377214806618),
 ('strikes', 0.73759894313077912),
 ('depiction', 0.73682224406260699),
 ('emotional', 0.73678211645681524),
 ('secretary', 0.7366322924996842),
 ('influenced', 0.73511137965897755),
 ('florida', 0.73511137965897755),
 ('germany', 0.73288750920945944),
 ('brings', 0.73142936713096229),
 ('lewis', 0.73129894652432159),
 ('elderly', 0.73088750854279239),
 ('owner', 0.72743625403857748),
 ('streets', 0.72666987259858895),
 ('henry', 0.72642196944481741),
 ('portrays', 0.72593700338293632),
 ('bears', 0.7252354951114458),
 ('china', 0.72489587887452556),
 ('anger', 0.72439972406404984),
 ('society', 0.72433010799663333),
 ('available', 0.72415741730250549),
 ('best', 0.72347034060446314),
 ('bugs', 0.72270598280148979),
 ('magic', 0.71878961117328299),
 ('verhoeven', 0.71846498854423513),
 ('delivers', 0.71846498854423513),
 ('jim', 0.71783979315031676),
 ('donald', 0.71667767797013937),
 ('endearing', 0.71465338578090898),
 ('relationships', 0.71393795022901896),
 ('greatly', 0.71256526641704687),
 ('charlie', 0.71024161391924534),
 ('brad', 0.71024161391924534),
 ('simon', 0.70967648251115578),
 ('effectively', 0.70914752190638641),
 ('march', 0.70774597998109789),
 ('atmosphere', 0.70744773070214162),
 ('influence', 0.70733181555190172),
 ('genius', 0.706392407309966),
 ('emotionally', 0.70556970055850243),
 ('ken', 0.70526854109229009),
 ('identity', 0.70484322032313651),
 ('sophisticated', 0.70470800296102132),
 ('dan', 0.70457587638356811),
 ('andrew', 0.70329955202396321),
 ('india', 0.70144598337464037),
 ('roy', 0.69970458110610434),
 ('surprisingly', 0.6995780708902356),
 ('sky', 0.69780919366575667),
 ('romantic', 0.69664981111114743),
 ('match', 0.69566924999265523),
 ('britain', 0.69314718055994529),
 ('beatty', 0.69314718055994529),
 ('affected', 0.69314718055994529),
 ('cowboy', 0.69314718055994529),
 ('wave', 0.69314718055994529),
 ('stylish', 0.69314718055994529),
 ('bitter', 0.69314718055994529),
 ('patient', 0.69314718055994529),
 ('meets', 0.69314718055994529),
 ('love', 0.69198533541937324),
 ('paul', 0.68980827929443067),
 ('andy', 0.68846333124751902),
 ('performance', 0.68797386327972465),
 ('patrick', 0.68645819240914863),
 ('unlike', 0.68546468438792907),
 ('brooks', 0.68433655087779044),
 ('refuses', 0.68348526964820844),
 ('award', 0.6824518914431974),
 ('complaint', 0.6824518914431974),
 ('ride', 0.68229716453587952),
 ('dawson', 0.68171848473632257),
 ('luke', 0.68158635815886937),
 ('wells', 0.68087708796813096),
 ('france', 0.6804081547825156),
 ('handsome', 0.68007509899259255),
 ('sports', 0.68007509899259255),
 ('rebel', 0.67875844310784572),
 ('directs', 0.67875844310784572),
 ('greater', 0.67605274720064523),
 ('dreams', 0.67599410133369586),
 ('effective', 0.67565402311242806),
 ('interpretation', 0.67479804189174875),
 ('works', 0.67445504754779284),
 ('brando', 0.67445504754779284),
 ('noble', 0.6737290947028437),
 ('paced', 0.67314651385327573),
 ('le', 0.67067432470788668),
 ('master', 0.67015766233524654),
 ('h', 0.6696166831497512),
 ('rings', 0.66904962898088483),
 ('easy', 0.66895995494594152),
 ('city', 0.66820823221269321),
 ('sunshine', 0.66782937257565544),
 ('succeeds', 0.66647893347778397),
 ('relations', 0.664159643686693),
 ('england', 0.66387679825983203),
 ('glimpse', 0.66329421741026418),
 ('aired', 0.66268797307523675),
 ('sees', 0.66263163663399482),
 ('both', 0.66248336767382998),
 ('definitely', 0.66199789483898808),
 ('imaginative', 0.66139848224536502),
 ('appreciate', 0.66083893732728749),
 ('tricks', 0.66071190480679143),
 ('striking', 0.66071190480679143),
 ('carefully', 0.65999497324304479),
 ('complicated', 0.65981076029235353),
 ('perspective', 0.65962448852130173),
 ('trilogy', 0.65877953705573755),
 ('future', 0.65834665141052828),
 ('lion', 0.65742909795786608),
 ('victor', 0.65540685257709819),
 ('douglas', 0.65540685257709819),
 ('inspired', 0.65459851044271034),
 ('marriage', 0.65392646740666405),
 ('demands', 0.65392646740666405),
 ('father', 0.65172321672194655),
 ('page', 0.65123628494430852),
 ('instant', 0.65058756614114943),
 ('era', 0.6495567444850836),
 ('ruthless', 0.64934455790155243),
 ('saga', 0.64934455790155243),
 ('joan', 0.64891392558311978),
 ('joseph', 0.64841128671855386),
 ('workers', 0.64829661439459352),
 ('fantasy', 0.64726757480925168),
 ('accomplished', 0.64551913157069074),
 ('distant', 0.64551913157069074),
 ('manhattan', 0.64435701639051324),
 ('personal', 0.64355023942057321),
 ('pushing', 0.64313675998528386),
 ('meeting', 0.64313675998528386),
 ('individual', 0.64313675998528386),
 ('pleasant', 0.64250344774119039),
 ('brave', 0.64185388617239469),
 ('william', 0.64083139119578469),
 ('hudson', 0.64077919504262937),
 ('friendly', 0.63949446706762514),
 ('eccentric', 0.63907995928966954),
 ('awards', 0.63875310849414646),
 ('jack', 0.63838309514997038),
 ('seeking', 0.63808740337691783),
 ('colonel', 0.63757732940513456),
 ('divorce', 0.63757732940513456),
 ('jane', 0.63443957973316734),
 ('keeping', 0.63414883979798953),
 ('gives', 0.63383568159497883),
 ('ted', 0.63342794585832296),
 ('animation', 0.63208692379869902),
 ('progress', 0.6317782341836532),
 ('concert', 0.63127177684185776),
 ('larger', 0.63127177684185776),
 ('nation', 0.6296337748376194),
 ('albeit', 0.62739580299716491),
 ('adapted', 0.62613647027698516),
 ('discovers', 0.62542900650499444),
 ('classic', 0.62504956428050518),
 ('segment', 0.62335141862440335),
 ('morgan', 0.62303761437291871),
 ('mouse', 0.62294292188669675),
 ('impressive', 0.62211140744319349),
 ('artist', 0.62168821657780038),
 ('ultimate', 0.62168821657780038),
 ('griffith', 0.62117368093485603),
 ('emily', 0.62082651898031915),
 ('drew', 0.62082651898031915),
 ('moved', 0.6197197120051281),
 ('profound', 0.61903920840622351),
 ('families', 0.61903920840622351),
 ('innocent', 0.61851219917136446),
 ('versions', 0.61730910416844087),
 ('eddie', 0.61691981517206107),
 ('criticism', 0.61651395453902935),
 ('nature', 0.61594514653194088),
 ('recognized', 0.61518563909023349),
 ('sexuality', 0.61467556511845012),
 ('contract', 0.61400986000122149),
 ('brian', 0.61344043794920278),
 ('remembered', 0.6131044728864089),
 ('determined', 0.6123858239154869),
 ('offers', 0.61207935747116349),
 ('pleasure', 0.61195702582993206),
 ('washington', 0.61180154110599294),
 ('images', 0.61159731359583758),
 ('games', 0.61067095873570676),
 ('academy', 0.60872983874736208),
 ('fashioned', 0.60798937221963845),
 ('melodrama', 0.60749173598145145),
 ('peoples', 0.60613580357031549),
 ('charismatic', 0.60613580357031549),
 ('rough', 0.60613580357031549),
 ('dealing', 0.60517840761398811),
 ('fine', 0.60496962268013299),
 ('tap', 0.60391604683200273),
 ('trio', 0.60157998703445481),
 ('russell', 0.60120968523425966),
 ('figures', 0.60077386042893011),
 ('ward', 0.60005675749393339),
 ('shine', 0.59911823091166894),
 ('brady', 0.59911823091166894),
 ('job', 0.59845562125168661),
 ('satisfied', 0.59652034487087369),
 ('river', 0.59637962862495086),
 ('brown', 0.595773016534769),
 ('believable', 0.59566072133302495),
 ('bound', 0.59470710774669278),
 ('always', 0.59470710774669278),
 ('hall', 0.5933967777928858),
 ('cook', 0.5916777203950857),
 ('claire', 0.59136448625000293),
 ('broadway', 0.59033768669372433),
 ('anna', 0.58778666490211906),
 ('peace', 0.58628403501758408),
 ('visually', 0.58539431926349916),
 ('falk', 0.58525821854876026),
 ('morality', 0.58525821854876026),
 ('growing', 0.58466653756587539),
 ('experiences', 0.58314628534561685),
 ('stood', 0.58314628534561685),
 ('touch', 0.58122926435596001),
 ('lives', 0.5810976767513224),
 ('kubrick', 0.58066919713325493),
 ('timing', 0.58047401805583243),
 ('struggles', 0.57981849525294216),
 ('expressions', 0.57981849525294216),
 ('authentic', 0.57848427223980559),
 ('helen', 0.57763429343810091),
 ('pre', 0.57700753064729182),
 ('quirky', 0.5753641449035618),
 ('young', 0.57531672344534313),
 ('inner', 0.57454143815209846),
 ('mexico', 0.57443087372056334),
 ('clint', 0.57380042292737909),
 ('sisters', 0.57286101468544337),
 ('realism', 0.57226528899949558),
 ('personalities', 0.5720692490067093),
 ('french', 0.5720692490067093),
 ('surprises', 0.57113222999698177),
 ('adventures', 0.57113222999698177),
 ('overcome', 0.5697681593994407),
 ('timothy', 0.56953322459276867),
 ('tales', 0.56909453188996639),
 ('war', 0.56843317302781682),
 ('civil', 0.5679840376059393),
 ('countries', 0.56737779327091187),
 ('streep', 0.56710645966458029),
 ('tradition', 0.56685345523565323),
 ('oliver', 0.56673325570428668),
 ('australia', 0.56580775818334383),
 ('understanding', 0.56531380905006046),
 ('players', 0.56509525370004821),
 ('knowing', 0.56489284503626647),
 ('rogers', 0.56421349718405212),
 ('suspenseful', 0.56368911332305849),
 ('variety', 0.56368911332305849),
 ('true', 0.56281525180810066),
 ('jr', 0.56220982311246936),
 ('psychological', 0.56108745854687891),
 ('branagh', 0.55961578793542266),
 ('wealth', 0.55961578793542266),
 ('performing', 0.55961578793542266),
 ('odds', 0.55961578793542266),
 ('sent', 0.55961578793542266),
 ('reminiscent', 0.55961578793542266),
 ('grand', 0.55961578793542266),
 ('overwhelming', 0.55961578793542266),
 ('brothers', 0.55891181043362848),
 ('howard', 0.55811089675600245),
 ('david', 0.55693122256475369),
 ('generation', 0.55628799784274796),
 ('grow', 0.55612538299565417),
 ('survival', 0.55594605904646033),
 ('mainstream', 0.55574731115750231),
 ('dick', 0.55431073570572953),
 ('charm', 0.55288175575407861),
 ('kirk', 0.55278982286502287),
 ('twists', 0.55244729845681018),
 ('gangster', 0.55206858230003986),
 ('jeff', 0.55179306225421365),
 ('family', 0.55116244510065526),
 ('tend', 0.55053307336110335),
 ('thanks', 0.55049088015842218),
 ('world', 0.54744234723432639),
 ('sutherland', 0.54743536937855164),
 ('life', 0.54695514434959924),
 ('disc', 0.54654370636806993),
 ('bug', 0.54654370636806993),
 ('tribute', 0.5455111817538808),
 ('europe', 0.54522705048332309),
 ('sacrifice', 0.54430155296238014),
 ('color', 0.54405127139431109),
 ('superior', 0.54333490233128523),
 ('york', 0.54318235866536513),
 ('pulls', 0.54266622962164945),
 ('hearts', 0.54232429082536171),
 ('jackson', 0.54232429082536171),
 ('enjoy', 0.54124285135906114),
 ('redemption', 0.54056759296472823),
 ('madness', 0.540384426007535),
 ('hamilton', 0.5389965007326869),
 ('stands', 0.5389965007326869),
 ('trial', 0.5389965007326869),
 ('greek', 0.5389965007326869),
 ('each', 0.5388212312554177),
 ('faithful', 0.53773307668591508),
 ('received', 0.5372768098531604),
 ('jealous', 0.53714293208336406),
 ('documentaries', 0.53714293208336406),
 ('different', 0.53709860682460819),
 ('describes', 0.53680111016925136),
 ('shorts', 0.53596159703753288),
 ('brilliance', 0.53551823635636209),
 ('mountains', 0.53492317534505118),
 ('share', 0.53408248593025787),
 ('dealt', 0.53408248593025787),
 ('providing', 0.53329847961804933),
 ('explore', 0.53329847961804933),
 ('series', 0.5325809226575603),
 ('fellow', 0.5323318289869543),
 ('loves', 0.53062825106217038),
 ('olivier', 0.53062825106217038),
 ('revolution', 0.53062825106217038),
 ('roman', 0.53062825106217038),
 ('century', 0.53002783074992665),
 ('musical', 0.52966871156747064),
 ('heroic', 0.52925932545482868),
 ('ironically', 0.52806743020049673),
 ('approach', 0.52806743020049673),
 ('temple', 0.52806743020049673),
 ('moves', 0.5279372642387119),
 ('gift', 0.52702030968597136),
 ('julie', 0.52609309589677911),
 ('tells', 0.52415107836314001),
 ('radio', 0.52394671172868779),
 ('uncle', 0.52354439617376536),
 ('union', 0.52324814376454787),
 ('deep', 0.52309571635780505),
 ('reminds', 0.52157841554225237),
 ('famous', 0.52118841080153722),
 ('jazz', 0.52053443789295151),
 ('dennis', 0.51987545928590861),
 ('epic', 0.51919387343650736),
 ('adult', 0.519167695083386),
 ('shows', 0.51915322220375304),
 ('performed', 0.5191244265806858),
 ('demons', 0.5191244265806858),
 ('eric', 0.51879379341516751),
 ('discovered', 0.51879379341516751),
 ('youth', 0.5185626062681431),
 ('human', 0.51851411224987087),
 ('tarzan', 0.51813827061227724),
 ('ourselves', 0.51794309153485463),
 ('wwii', 0.51758240622887042),
 ('passion', 0.5162164724008671),
 ('desire', 0.51607497965213445),
 ('pays', 0.51581316527702981),
 ('fox', 0.51557622652458857),
 ('dirty', 0.51557622652458857),
 ('symbolism', 0.51546600332249293),
 ('sympathetic', 0.51546600332249293),
 ('attitude', 0.51530993621331933),
 ('appearances', 0.51466440007315639),
 ('jeremy', 0.51466440007315639),
 ('fun', 0.51439068993048687),
 ('south', 0.51420972175023116),
 ('arrives', 0.51409894911095988),
 ('present', 0.51341965894303732),
 ('com', 0.51326167856387173),
 ('smile', 0.51265880484765169),
 ('fits', 0.51082562376599072),
 ('provided', 0.51082562376599072),
 ('carter', 0.51082562376599072),
 ('ring', 0.51082562376599072),
 ('aging', 0.51082562376599072),
 ('countryside', 0.51082562376599072),
 ('alan', 0.51082562376599072),
 ('visit', 0.51082562376599072),
 ('begins', 0.51015650363396647),
 ('success', 0.50900578704900468),
 ('japan', 0.50900578704900468),
 ('accurate', 0.50895471583017893),
 ('proud', 0.50800474742434931),
 ('daily', 0.5075946031845443),
 ('atmospheric', 0.50724780241810674),
 ('karloff', 0.50724780241810674),
 ('recently', 0.50714914903668207),
 ('fu', 0.50704490092608467),
 ('horrors', 0.50656122497953315),
 ('finding', 0.50637127341661037),
 ('lust', 0.5059356384717989),
 ('hitchcock', 0.50574947073413001),
 ('among', 0.50334004951332734),
 ('viewing', 0.50302139827440906),
 ('shining', 0.50262885656181222),
 ('investigation', 0.50262885656181222),
 ('duo', 0.5020919437972361),
 ('cameron', 0.5020919437972361),
 ('finds', 0.50128303100539795),
 ('contemporary', 0.50077528791248915),
 ('genuine', 0.50046283673044401),
 ('frightening', 0.49995595152908684),
 ('plays', 0.49975983848890226),
 ('age', 0.49941323171424595),
 ('position', 0.49899116611898781),
 ('continues', 0.49863035067217237),
 ('roles', 0.49839716550752178),
 ('james', 0.49837216269470402),
 ('individuals', 0.49824684155913052),
 ('brought', 0.49783842823917956),
 ('hilarious', 0.49714551986191058),
 ('brutal', 0.49681488669639234),
 ('appropriate', 0.49643688631389105),
 ('dance', 0.49581998314812048),
 ('league', 0.49578774640145024),
 ('helping', 0.49578774640145024),
 ('answers', 0.49578774640145024),
 ('stunts', 0.49561620510246196),
 ('traveling', 0.49532143723002542),
 ('thoroughly', 0.49414593456733524),
 ('depicted', 0.49317068852726992),
 ('honor', 0.49247648509779424),
 ('combination', 0.49247648509779424),
 ('differences', 0.49247648509779424),
 ('fully', 0.49213349075383811),
 ('tracy', 0.49159426183810306),
 ('battles', 0.49140753790888908),
 ('possibility', 0.49112055268665822),
 ('romance', 0.4901589869574316),
 ('initially', 0.49002249613622745),
 ('happy', 0.4898997500608791),
 ('crime', 0.48977221456815834),
 ('singing', 0.4893852925281213),
 ('especially', 0.48901267837860624),
 ('shakespeare', 0.48754793889664511),
 ('hugh', 0.48729512635579658),
 ('detail', 0.48609484250827351),
 ('guide', 0.48550781578170082),
 ('companion', 0.48550781578170082),
 ('julia', 0.48550781578170082),
 ('san', 0.48550781578170082),
 ('desperation', 0.48550781578170082),
 ('strongly', 0.48460242866688824),
 ('necessary', 0.48302334245403883),
 ('humanity', 0.48265474679929443),
 ('drama', 0.48221998493060503),
 ('warming', 0.48183808689273838),
 ('intrigue', 0.48183808689273838),
 ('nonetheless', 0.48183808689273838),
 ('cuba', 0.48183808689273838),
 ('planned', 0.47957308026188628),
 ('pictures', 0.47929937011921681),
 ('broadcast', 0.47849024312305422),
 ('nine', 0.47803580094299974),
 ('settings', 0.47743860773325364),
 ('history', 0.47732966933780852),
 ('ordinary', 0.47725880012690741),
 ('trade', 0.47692407209030935),
 ('primary', 0.47608267532211779),
 ('official', 0.47608267532211779),
 ('episode', 0.47529620261150429),
 ('role', 0.47520268270188676),
 ('spirit', 0.47477690799839323),
 ('grey', 0.47409361449726067),
 ('ways', 0.47323464982718205),
 ('cup', 0.47260441094579297),
 ('piano', 0.47260441094579297),
 ('familiar', 0.47241617565111949),
 ('sinister', 0.47198579044972683),
 ('reveal', 0.47171449364936496),
 ('max', 0.47150852042515579),
 ('dated', 0.47121648567094482),
 ('discovery', 0.47000362924573563),
 ('vicious', 0.47000362924573563),
 ('losing', 0.47000362924573563),
 ('genuinely', 0.46871413841586385),
 ('hatred', 0.46734051182625186),
 ('mistaken', 0.46702300110759781),
 ('dream', 0.46608972992459924),
 ('challenge', 0.46608972992459924),
 ('crisis', 0.46575733836428446),
 ('photographed', 0.46488852857896512),
 ('machines', 0.46430560813109778),
 ('critics', 0.46430560813109778),
 ('bird', 0.46430560813109778),
 ('born', 0.46411383518967209),
 ('detective', 0.4636633473511525),
 ('higher', 0.46328467899699055),
 ('remains', 0.46262352194811296),
 ('inevitable', 0.46262352194811296),
 ('soviet', 0.4618180446592961),
 ('ryan', 0.46134556650262099),
 ('african', 0.46112595521371813),
 ('smaller', 0.46081520319132935),
 ('techniques', 0.46052488529119184),
 ('information', 0.46034171833399862),
 ('deserved', 0.45999798712841444),
 ('cynical', 0.45953232937844013),
 ('lynch', 0.45953232937844013),
 ('francisco', 0.45953232937844013),
 ('tour', 0.45953232937844013),
 ('spielberg', 0.45953232937844013),
 ('struggle', 0.45911782160048453),
 ('language', 0.45902121257712653),
 ('visual', 0.45823514408822852),
 ('warner', 0.45724137763188427),
 ('social', 0.45720078250735313),
 ('reality', 0.45719346885019546),
 ('hidden', 0.45675840249571492),
 ('breaking', 0.45601738727099561),
 ('sometimes', 0.45563021171182794),
 ('modern', 0.45500247579345005),
 ('surfing', 0.45425527227759638),
 ('popular', 0.45410691533051023),
 ('surprised', 0.4534409399850382),
 ('follows', 0.45245361754408348),
 ('keeps', 0.45234869400701483),
 ('john', 0.4520909494482197),
 ('defeat', 0.45198512374305722),
 ('mixed', 0.45198512374305722),
 ('justice', 0.45142724367280018),
 ('treasure', 0.45083371313801535),
 ('presents', 0.44973793178615257),
 ('years', 0.44919197032104968),
 ('chief', 0.44895022004790319),
 ('shadows', 0.44802472252696035),
 ('closely', 0.44701411102103689),
 ('segments', 0.44701411102103689),
 ('lose', 0.44658335503763702),
 ('caine', 0.44628710262841953),
 ('caught', 0.44610275383999071),
 ('hamlet', 0.44558510189758965),
 ('chinese', 0.44507424620321018),
 ('welcome', 0.44438052435783792),
 ('birth', 0.44368632092836219),
 ('represents', 0.44320543609101143),
 ('puts', 0.44279106572085081),
 ('fame', 0.44183275227903923),
 ('closer', 0.44183275227903923),
 ('visuals', 0.44183275227903923),
 ('web', 0.44183275227903923),
 ('criminal', 0.4412745608048752),
 ('minor', 0.4409224199448939),
 ('jon', 0.44086703515908027),
 ('liked', 0.44074991514020723),
 ('restaurant', 0.44031183943833246),
 ('flaws', 0.43983275161237217),
 ('de', 0.43983275161237217),
 ('searching', 0.4393666597838457),
 ('rap', 0.43891304217570443),
 ('light', 0.43884433018199892),
 ('elizabeth', 0.43872232986464677),
 ('marry', 0.43861731542506488),
 ('oz', 0.43825493093115531),
 ('controversial', 0.43825493093115531),
 ('learned', 0.43825493093115531),
 ('slowly', 0.43785660389939979),
 ('bridge', 0.43721380642274466),
 ('thrilling', 0.43721380642274466),
 ('wayne', 0.43721380642274466),
 ('comedic', 0.43721380642274466),
 ('married', 0.43658501682196887),
 ('nazi', 0.4361020775700542),
 ('murder', 0.4353180712578455),
 ('physical', 0.4353180712578455),
 ('johnny', 0.43483971678806865),
 ('michelle', 0.43445264498141672),
 ('wallace', 0.43403848055222038),
 ('silent', 0.43395706390247063),
 ('comedies', 0.43395706390247063),
 ('played', 0.43387244114515305),
 ('international', 0.43363598507486073),
 ('vision', 0.43286408229627887),
 ('intelligent', 0.43196704885367099),
 ('shop', 0.43078291609245434),
 ('also', 0.43036720209769169),
 ('levels', 0.4302451371066513),
 ('miss', 0.43006426712153217),
 ('ocean', 0.4295626596872249),
 ...]

In [114]:
# words most frequently seen in a review with a "NEGATIVE" label
list(reversed(pos_neg_ratios.most_common()))[0:30]


Out[114]:
[('boll', -4.0778152602708904),
 ('uwe', -3.9218753018711578),
 ('seagal', -3.3202501058581921),
 ('unwatchable', -3.0269848170580955),
 ('stinker', -2.9876839403711624),
 ('mst', -2.7753833211707968),
 ('incoherent', -2.7641396677532537),
 ('unfunny', -2.5545257844967644),
 ('waste', -2.4907515123361046),
 ('blah', -2.4475792789485005),
 ('horrid', -2.3715779644809971),
 ('pointless', -2.3451073877136341),
 ('atrocious', -2.3187369339642556),
 ('redeeming', -2.2667790015910296),
 ('prom', -2.2601040980178784),
 ('drivel', -2.2476029585766928),
 ('lousy', -2.2118080125207054),
 ('worst', -2.1930856334332267),
 ('laughable', -2.172468615469592),
 ('awful', -2.1385076866397488),
 ('poorly', -2.1326133844207011),
 ('wasting', -2.1178155545614512),
 ('remotely', -2.111046881095167),
 ('existent', -2.0024805005437076),
 ('boredom', -1.9241486572738005),
 ('miserably', -1.9216610938019989),
 ('sucks', -1.9166645809588516),
 ('uninspired', -1.9131499212248517),
 ('lame', -1.9117232884159072),
 ('insult', -1.9085323769376259)]
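
To make these scores less mysterious, here is a minimal sketch of the same log-ratio calculation that `pre_process_data` uses further down, run on a tiny made-up corpus. The function name `toy_pos_neg_log_ratios`, the `min_total` parameter and the toy reviews are all illustrative assumptions, not part of the original notebook.

```python
from collections import Counter
import math

def toy_pos_neg_log_ratios(reviews, labels, min_total=1):
    """Hypothetical helper: score each word by the log of its positive/negative ratio."""
    positive_counts, negative_counts, total_counts = Counter(), Counter(), Counter()
    for review, label in zip(reviews, labels):
        target = positive_counts if label == "POSITIVE" else negative_counts
        for word in review.split(" "):
            target[word] += 1
            total_counts[word] += 1

    ratios = {}
    for word, cnt in total_counts.items():
        if cnt >= min_total:
            # +1 in the denominator avoids dividing by zero for words never seen as negative
            ratio = positive_counts[word] / float(negative_counts[word] + 1)
            if ratio > 1:
                ratios[word] = math.log(ratio)
            else:
                # +0.01 keeps the log finite when the ratio is exactly 0
                ratios[word] = -math.log(1 / (ratio + 0.01))
    return ratios

# Made-up data, only for illustration
toy_reviews = ["great great great movie", "awful movie", "great film", "awful awful plot"]
toy_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
toy_ratios = toy_pos_neg_log_ratios(toy_reviews, toy_labels)
```

Words seen mostly in positive reviews ("great") get a score above zero, and words seen only in negative reviews ("awful") get a strongly negative one. On counts this small, a word that shows up once in each class ("movie") comes out negative rather than zero, because of the +1 in the denominator.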

In [22]:
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure, show, output_file
from bokeh.io import output_notebook
output_notebook()



In [116]:
hist, edges = np.histogram(list(map(lambda x:x[1],pos_neg_ratios.most_common())), density=True, bins=100)

p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="Word Positive/Negative Affinity Distribution")
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="#555555")
show(p)
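
If you want to check what `density=True` does to the bar heights without plotting anything, here is a small sketch. It uses random made-up values (not the notebook's `pos_neg_ratios`) and confirms that the bars, multiplied by their widths, add up to an area of 1.

```python
import numpy as np

# Made-up sample, standing in for the list of word scores
rng = np.random.default_rng(0)
values = rng.normal(size=1000)

# density=True rescales the counts so the histogram integrates to 1
hist, edges = np.histogram(values, bins=100, density=True)

# Area under the histogram = sum of (bar height * bar width)
area = float(np.sum(hist * np.diff(edges)))
```

`edges` always has one more entry than `hist`, which is why the plotting cell uses `edges[:-1]` for the left sides of the bars and `edges[1:]` for the right sides.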



In [117]:
frequency_frequency = Counter()

for word, cnt in total_counts.most_common():
    frequency_frequency[cnt] += 1
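
A quick sketch of what this "frequency of frequencies" counter holds, on a made-up sentence. The `toy_` names are assumptions for illustration only.

```python
from collections import Counter

# Made-up corpus: "the" appears 3 times, every other word once
toy_total_counts = Counter("the cat sat on the mat the end".split(" "))

# Map each word count to the number of words that have that count
toy_frequency_frequency = Counter()
for word, cnt in toy_total_counts.most_common():
    toy_frequency_frequency[cnt] += 1
```

Here five words appear exactly once and one word appears three times, so the counter maps 1 to 5 and 3 to 1.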

In [118]:
hist, edges = np.histogram(list(map(lambda x:x[1],frequency_frequency.most_common())), density=True, bins=100)

p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="The frequency distribution of the words in our corpus")
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="#555555")
show(p)


Reducing Noise by Strategically Reducing the Vocabulary


In [19]:
import time
import sys
from collections import Counter
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,min_count = 10,polarity_cutoff = 0.1,hidden_nodes = 10, learning_rate = 0.1):
       
        np.random.seed(1)
    
        # pass labels in explicitly so the counts use this network's labels, not a global variable
        self.pre_process_data(reviews, labels, polarity_cutoff, min_count)
        
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)
        
        
    def pre_process_data(self,reviews, labels, polarity_cutoff,min_count):
        
        positive_counts = Counter()
        negative_counts = Counter()
        total_counts = Counter()

        for i in range(len(reviews)):
            if(labels[i] == 'POSITIVE'):
                for word in reviews[i].split(" "):
                    positive_counts[word] += 1
                    total_counts[word] += 1
            else:
                for word in reviews[i].split(" "):
                    negative_counts[word] += 1
                    total_counts[word] += 1

        pos_neg_ratios = Counter()

        for term,cnt in list(total_counts.most_common()):
            if(cnt >= 50):
                pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)
                pos_neg_ratios[term] = pos_neg_ratio

        for word,ratio in pos_neg_ratios.most_common():
            if(ratio > 1):
                pos_neg_ratios[word] = np.log(ratio)
            else:
                pos_neg_ratios[word] = -np.log((1 / (ratio + 0.01)))
        
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                if(total_counts[word] > min_count):
                    if(word in pos_neg_ratios.keys()):
                        if((pos_neg_ratios[word] >= polarity_cutoff) or (pos_neg_ratios[word] <= -polarity_cutoff)):
                            review_vocab.add(word)
                    else:
                        review_vocab.add(word)
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        self.label_vocab = list(label_vocab)
        
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
         
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        self.learning_rate = learning_rate
        
        self.layer_0 = np.zeros((1,input_nodes))
        self.layer_1 = np.zeros((1,hidden_nodes))
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            # skip words that were filtered out of the reduced vocabulary
            if(word in self.word2index):
                self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def train(self, training_reviews_raw, training_labels):
        
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if(word in self.word2index.keys()):
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))
        
        assert(len(training_reviews) == len(training_labels))
        
        correct_so_far = 0
        
        start = time.time()
        
        for i in range(len(training_reviews)):
            
            review = training_reviews[i]
            label = training_labels[i]
            
            ### Forward pass ###

            # Hidden layer: add up the weight rows for the word indices in this review.
            # Same result as self.layer_0.dot(self.weights_0_1) with a 0/1 input vector,
            # without multiplying by all the zeros.
            self.layer_1 *= 0
            for index in review:
                self.layer_1 += self.weights_0_1[index]
            
            # Output layer
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error: actual output minus the desired target.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            
            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(layer_2 >= 0.5 and label == 'POSITIVE'):
                correct_so_far += 1
            if(layer_2 < 0.5 and label == 'NEGATIVE'):
                correct_so_far += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
        
    
    def test(self, testing_reviews, testing_labels):
        
        correct = 0
        
        start = time.time()
        
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
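            # Guard against a zero elapsed time on the very first iteration
            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0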
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                            + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        
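        # Input layer: map the raw review text to the set of known word indices,
        # so each word counts once and out-of-vocabulary words are skipped
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index:
                unique_indices.add(self.word2index[word])

        # Hidden layer: sum the weight rows of the words present
        self.layer_1 *= 0
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]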
        
        # Output layer
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] >= 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"
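
The hidden-layer shortcut used in `train` and `run` above (summing the rows of `weights_0_1` for the words that are present, instead of multiplying a mostly-zero `layer_0` by the whole matrix) gives the same result as the dense product whenever `layer_0` holds only 0s and 1s. Here is a minimal sketch of that equivalence. The sizes, the random weights, and the word indices are made up for illustration and are not taken from the network above.

```python
import numpy as np

# Hypothetical toy sizes; the real network uses the full vocabulary.
rng = np.random.default_rng(0)
vocab_size, hidden_nodes = 10, 4
weights_0_1 = rng.normal(size=(vocab_size, hidden_nodes))

# Indices of the words that appear in one (made-up) review.
indices = [1, 4, 7]

# Dense version: build the 0/1 input vector and multiply by the full matrix.
layer_0 = np.zeros((1, vocab_size))
layer_0[0, indices] = 1
dense_layer_1 = layer_0.dot(weights_0_1)

# Sparse version: add up only the rows for the words that are present.
sparse_layer_1 = np.zeros((1, hidden_nodes))
for index in indices:
    sparse_layer_1 += weights_0_1[index]

print(np.allclose(dense_layer_1, sparse_layer_1))  # True
```

The sparse version does work proportional to the number of distinct words in the review rather than to the vocabulary size, which is where the speedup comes from.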

In [123]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000],min_count=20,polarity_cutoff=0.05,learning_rate=0.01)

In [124]:
mlp.train(reviews[:-1000],labels[:-1000])


Progress:99.9% Speed(reviews/sec):1371. #Correct:20461 #Trained:24000 Training Accuracy:85.2%

In [125]:
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):1708.% #Correct:859 #Tested:1000 Testing Accuracy:85.9%

In [126]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000],min_count=20,polarity_cutoff=0.8,learning_rate=0.01)

In [127]:
mlp.train(reviews[:-1000],labels[:-1000])


Progress:99.9% Speed(reviews/sec):7089. #Correct:20552 #Trained:24000 Training Accuracy:85.6%

In [128]:
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):3805.% #Correct:822 #Tested:1000 Testing Accuracy:82.2%
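
The two runs above can be compared directly from the logged numbers. The sketch below only does the arithmetic on the figures as printed. The progress line truncates speeds to five characters, so the ratios are approximate, and the dictionary layout is an assumption made for illustration.

```python
# Figures copied from the progress lines above, keyed by polarity_cutoff.
runs = {
    0.05: {"train_speed": 1371.0, "test_speed": 1708.0, "test_correct": 859},
    0.8: {"train_speed": 7089.0, "test_speed": 3805.0, "test_correct": 822},
}
n_tested = 1000

# How much faster the higher cutoff trains and tests.
train_speedup = runs[0.8]["train_speed"] / runs[0.05]["train_speed"]
test_speedup = runs[0.8]["test_speed"] / runs[0.05]["test_speed"]

# Testing accuracy given up, in percentage points.
accuracy_drop = 100.0 * (runs[0.05]["test_correct"] - runs[0.8]["test_correct"]) / n_tested

print("Training speedup: %.2fx" % train_speedup)
print("Testing speedup: %.2fx" % test_speedup)
print("Testing accuracy drop: %.1f points" % accuracy_drop)
```

On these logs, raising `polarity_cutoff` from 0.05 to 0.8 trains roughly 5.2 times faster and tests roughly 2.2 times faster, at a cost of 3.7 points of testing accuracy.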

Analysis: What's Going on in the Weights?


In [20]:
mlp_full = SentimentNetwork(reviews[:-1000],labels[:-1000],min_count=0,polarity_cutoff=0,learning_rate=0.01)

In [21]:
mlp_full.train(reviews[:-1000],labels[:-1000])


Progress:99.9% Speed(reviews/sec):717.6 #Correct:20335 #Trained:24000 Training Accuracy:84.7%

In [23]:
Image(filename='sentiment_network_sparse.png')


Out[23]:

In [24]:
def get_most_similar_words(focus = "horrible"):
    # Rank every vocabulary word by the dot product of its weights_0_1 row
    # with the row of the focus word
    most_similar = Counter()
    focus_weights = mlp_full.weights_0_1[mlp_full.word2index[focus]]

    for word, index in mlp_full.word2index.items():
        most_similar[word] = np.dot(mlp_full.weights_0_1[index], focus_weights)
    
    return most_similar.most_common()
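
`get_most_similar_words` ranks words by the raw dot product of their `weights_0_1` rows, so a row with a large magnitude can score high on length alone. A common alternative, which is not what this notebook uses, is cosine similarity: it divides the dot product by both vector norms so that only direction matters. The sketch below uses a made-up four-word vocabulary with hand-picked weights. `toy_word2index`, `toy_weights`, and `cosine_most_similar` are hypothetical names for illustration only.

```python
from collections import Counter
import numpy as np

# Hypothetical toy vocabulary and hidden-layer weights (2 hidden nodes).
toy_word2index = {"excellent": 0, "perfect": 1, "terrible": 2, "the": 3}
toy_weights = np.array([
    [0.30, -0.20],   # excellent
    [0.25, -0.15],   # perfect
    [-0.28, 0.22],   # terrible
    [0.01, 0.01],    # the
])

def cosine_most_similar(focus, word2index, weights):
    # Rank words by the cosine of the angle between their weight rows
    # and the weight row of the focus word.
    focus_vec = weights[word2index[focus]]
    focus_norm = np.linalg.norm(focus_vec)
    most_similar = Counter()
    for word, index in word2index.items():
        vec = weights[index]
        cosine = np.dot(vec, focus_vec) / (np.linalg.norm(vec) * focus_norm)
        most_similar[word] = float(cosine)
    return most_similar.most_common()

ranking = cosine_most_similar("excellent", toy_word2index, toy_weights)
print([word for word, _ in ranking])
```

In this toy setup the word whose row points the same way as the focus word ranks just below the focus word itself, and the word whose row points the opposite way ranks last.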

In [25]:
get_most_similar_words("excellent")


Out[25]:
[('excellent', 0.13672950757352462),
 ('perfect', 0.12548286087225946),
 ('amazing', 0.091827633925999672),
 ('today', 0.090223662694414217),
 ('wonderful', 0.089355976962214562),
 ('fun', 0.087504466674206888),
 ('great', 0.087141758882292031),
 ('best', 0.085810885617880611),
 ('liked', 0.077697629123843398),
 ('definitely', 0.076628781406965982),
 ('brilliant', 0.073423858769279038),
 ('loved', 0.073285428928122121),
 ('favorite', 0.072781136036160779),
 ('superb', 0.07173620717850504),
 ('fantastic', 0.070922191916266197),
 ('job', 0.069160617207634015),
 ('incredible', 0.066424077952614402),
 ('enjoyable', 0.065632560502888765),
 ('rare', 0.064819212662615075),
 ('highly', 0.063889453350970501),
 ('enjoyed', 0.062127546101812925),
 ('wonderfully', 0.062055178604090155),
 ('perfectly', 0.061093208811887373),
 ('fascinating', 0.060663547937493852),
 ('bit', 0.059655427045653034),
 ('gem', 0.059510859296156772),
 ('outstanding', 0.058860808147083013),
 ('beautiful', 0.058613934703162063),
 ('surprised', 0.058273314482562975),
 ('worth', 0.057657484236471213),
 ('especially', 0.057422020781760771),
 ('refreshing', 0.057310532092265755),
 ('entertaining', 0.056612033835629197),
 ('hilarious', 0.05616854103228662),
 ('masterpiece', 0.054993988649431565),
 ('simple', 0.054484083134924075),
 ('subtle', 0.054368883033508605),
 ('funniest', 0.05345716487130267),
 ('solid', 0.052903564743620644),
 ('awesome', 0.05248919420277038),
 ('always', 0.052260328525345262),
 ('noir', 0.05153019472640688),
 ('guys', 0.051109413645642678),
 ('sweet', 0.05081893031752599),
 ('unique', 0.050670162263589169),
 ('very', 0.050132994948528464),
 ('heart', 0.049948058498243582),
 ('moving', 0.04942460116437912),
 ('atmosphere', 0.048842500895912841),
 ('strong', 0.04857088063175919),
 ('remember', 0.048479036942291255),
 ('believable', 0.04841538439160379),
 ('shows', 0.048336045608039578),
 ('love', 0.047310648160924638),
 ('beautifully', 0.047118717440814889),
 ('both', 0.046957278901480319),
 ('terrific', 0.046686597975756625),
 ('touching', 0.046589962377280955),
 ('fine', 0.046256431328855763),
 ('caught', 0.046163326224782343),
 ('recommended', 0.045876341160885285),
 ('jack', 0.045352909975188316),
 ('everyone', 0.045145273964599379),
 ('episodes', 0.045064457062621285),
 ('classic', 0.044985816637932753),
 ('will', 0.044966672557930437),
 ('appreciate', 0.044764139584570858),
 ('powerful', 0.044176442621852781),
 ('realistic', 0.0435974822834648),
 ('performances', 0.043020249087841744),
 ('human', 0.042657925475092541),
 ('expecting', 0.042588442995212208),
 ('each', 0.042163774519666963),
 ('delightful', 0.041815007170235494),
 ('cry', 0.041750968395934819),
 ('enjoy', 0.041660091797818107),
 ('you', 0.041465994778271065),
 ('surprisingly', 0.041393139256517372),
 ('think', 0.041103720571057038),
 ('performance', 0.040844259420896839),
 ('nice', 0.040016506666931712),
 ('paced', 0.03994448864759962),
 ('true', 0.03975059264337067),
 ('tight', 0.039425438825552647),
 ('similar', 0.039222380170683482),
 ('friendship', 0.039110112764204286),
 ('somewhat', 0.03906961573101022),
 ('beauty', 0.038130922554738787),
 ('short', 0.037981700131409189),
 ('life', 0.037716639265310249),
 ('stunning', 0.037507364832543751),
 ('still', 0.037479827910101501),
 ('normal', 0.037422144669435109),
 ('works', 0.037255830186344166),
 ('appreciated', 0.037156165138066244),
 ('mind', 0.037080739403157759),
 ('twists', 0.036932552473074122),
 ('knowing', 0.036786021801572068),
 ('captures', 0.03646750688449471),
 ('certain', 0.036348359494082834),
 ('later', 0.03621004278676522),
 ('finest', 0.036132101827862646),
 ('compelling', 0.036098464918935765),
 ('others', 0.03609012020219609),
 ('tragic', 0.036005003580472768),
 ('viewing', 0.035933572455522977),
 ('above', 0.035886717849742573),
 ('them', 0.035717513281555736),
 ('matter', 0.035602710619685625),
 ('future', 0.035323777987573399),
 ('good', 0.035250130839512749),
 ('hooked', 0.035154077227307991),
 ('world', 0.035098777806455032),
 ('unexpected', 0.035078442502957774),
 ('innocent', 0.034765360696729197),
 ('tears', 0.034338309927008842),
 ('certainly', 0.034301037742714126),
 ('available', 0.034268101109488011),
 ('unlike', 0.034253988843446569),
 ('season', 0.034038922427011613),
 ('vhs', 0.034011519281018122),
 ('superior', 0.03391762273249576),
 ('unusual', 0.033797799688239358),
 ('genre', 0.033766115408287264),
 ('criminal', 0.033744472720326824),
 ('makes', 0.033587001877476604),
 ('greatest', 0.03343185227197535),
 ('small', 0.033426529870538395),
 ('episode', 0.033336443796849899),
 ('deal', 0.033336107665281924),
 ('now', 0.033283339034235505),
 ('quiet', 0.033147935977529276),
 ('played', 0.033108782201536791),
 ('day', 0.033074949731286586),
 ('moved', 0.032873980754099884),
 ('underrated', 0.032738818192726324),
 ('society', 0.032613580418616235),
 ('focuses', 0.032607333858382818),
 ('intense', 0.032564318613854969),
 ('sharp', 0.032309211040923339),
 ('adds', 0.032236076588351779),
 ('check', 0.032030541149668801),
 ('take', 0.031717140193258622),
 ('deeply', 0.031693099458454561),
 ('games', 0.03166349528572017),
 ('pre', 0.031251131973427111),
 ('change', 0.031183353959862565),
 ('thanks', 0.031172398048464698),
 ('own', 0.03112133794334707),
 ('easy', 0.031088479340529641),
 ('pace', 0.03093436149167823),
 ('parts', 0.030850186028628303),
 ('truly', 0.030836637734471671),
 ('tony', 0.030739434811745025),
 ('inspired', 0.030725453849735001),
 ('thought', 0.030707437377997408),
 ('complex', 0.030464622676702042),
 ('worlds', 0.030391255174782039),
 ('language', 0.03026497620030956),
 ('soundtrack', 0.030210032139046033),
 ('steals', 0.030207167115964783),
 ('glad', 0.029812003262142256),
 ('ride', 0.029801794809751706),
 ('came', 0.029760628313031532),
 ('impact', 0.029695785634015842),
 ('personally', 0.029677477012254878),
 ('gritty', 0.029540021762614992),
 ('effective', 0.029512382123355347),
 ('wise', 0.029510408701830332),
 ('ultimate', 0.029442440672320932),
 ('ways', 0.02943934179284419),
 ('well', 0.029238386207701295),
 ('sent', 0.029147924396380077),
 ('after', 0.029037668915531285),
 ('tells', 0.029004383695691471),
 ('along', 0.028932972901634893),
 ('modern', 0.028910642159349308),
 ('family', 0.028897380662865534),
 ('pleasantly', 0.028754280601052389),
 ('edge', 0.02874468747624128),
 ('american', 0.028706398764554442),
 ('england', 0.028640930969798108),
 ('grand', 0.02858110240637193),
 ('slowly', 0.028470328912922983),
 ('treat', 0.028418097520915946),
 ('pleasure', 0.02837070411200417),
 ('living', 0.028335845213660421),
 ('impressed', 0.028311856507726555),
 ('fans', 0.028234674336798968),
 ('suspenseful', 0.028156658725541142),
 ('smile', 0.02806565183459761),
 ('jim', 0.027910842672277562),
 ('saw', 0.027900239466183013),
 ('length', 0.027896431301274532),
 ('impressive', 0.027894778243362794),
 ('times', 0.027869981332762559),
 ('witty', 0.027809121334036416),
 ('flawless', 0.027676409302939117),
 ('magic', 0.027671001404746015),
 ('though', 0.027434087841071524),
 ('subtitles', 0.02743198117938046),
 ('stands', 0.02734851854841645),
 ('freedom', 0.027271908118037379),
 ('relationship', 0.027231146375769136),
 ('tape', 0.027213179198573838),
 ('apartment', 0.027198859160909989),
 ('shown', 0.027062169058709857),
 ('films', 0.027035590529373481),
 ('lot', 0.026934527370476375),
 ('barbara', 0.026837141036193602),
 ('office', 0.026775230449656282),
 ('damn', 0.026751196837598828),
 ('murder', 0.026709073212876626),
 ('brilliantly', 0.026701889741880671),
 ('learns', 0.026699872569574595),
 ('tends', 0.02668377436133576),
 ('complaint', 0.026587011626106858),
 ('themselves', 0.026524658938498969),
 ('war', 0.026518675436425346),
 ('violence', 0.026450628158076143),
 ('judge', 0.026443267774947338),
 ('thriller', 0.026431555027632114),
 ('his', 0.026370773394088613),
 ('finding', 0.026362279892885004),
 ('cast', 0.026360860883736618),
 ('police', 0.026352129453305256),
 ('once', 0.026255817642908224),
 ('spectacular', 0.026245466997092372),
 ('deserves', 0.026214508159961684),
 ('driven', 0.026194930792511638),
 ('spot', 0.026171686780563669),
 ('carrey', 0.026162838804053026),
 ('negative', 0.026161677045062219),
 ('suspense', 0.026110016575822789),
 ('flaws', 0.026085421601700295),
 ('brave', 0.026080835779725298),
 ('surprising', 0.026070851171974708),
 ('gives', 0.026069978044960768),
 ('takes', 0.026047493401813327),
 ('light', 0.025921067904644501),
 ('timing', 0.025900303450693638),
 ('crime', 0.025886011572638652),
 ('thank', 0.025873161609513372),
 ('century', 0.02587105631011263),
 ('until', 0.025870245942132507),
 ('nature', 0.025817942935875453),
 ('stellar', 0.025803971141651155),
 ('emotions', 0.025783809728671912),
 ('tremendous', 0.025772614605786559),
 ('missed', 0.025657501028952572),
 ('overall', 0.025655652485101776),
 ('haven', 0.025650692177140791),
 ('portrayal', 0.025594273657909627),
 ('taylor', 0.025516992710898162),
 ('appropriate', 0.025495908849901629),
 ('joan', 0.025489829859140629),
 ('realize', 0.025452457061382182),
 ('different', 0.02543407397006044),
 ('return', 0.025384569542597581),
 ('bound', 0.025380084410398834),
 ('noticed', 0.02530649499844077),
 ('constantly', 0.025282186745762457),
 ('first', 0.025246100888919792),
 ('lovable', 0.025213500492273062),
 ('comic', 0.025074597800944055),
 ('scared', 0.024995376513809509),
 ('fight', 0.024943209945836396),
 ('extraordinary', 0.024940366453083611),
 ('buy', 0.024803940824255584),
 ('know', 0.024749519416087051),
 ('brothers', 0.024675058346350743),
 ('action', 0.024660907824635262),
 ('needs', 0.024634851651549335),
 ('jerry', 0.02462148438534386),
 ('while', 0.024620233313683841),
 ('also', 0.024519480987472433),
 ('definite', 0.024509585305468838),
 ('genius', 0.024500478757646955),
 ('tragedy', 0.024481339186882275),
 ('heard', 0.024446567944460477),
 ('haunting', 0.024431007352898926),
 ('legendary', 0.02441277726490897),
 ('uses', 0.024358972452014002),
 ('years', 0.024316094895735246),
 ('notch', 0.024310571597216266),
 ('fabulous', 0.024258810824927635),
 ('herself', 0.024241390957491057),
 ('battle', 0.024205827940178122),
 ('ralph', 0.024205046194653326),
 ('provoking', 0.024106106062481807),
 ('ago', 0.024024541904156496),
 ('game', 0.024004541901512372),
 ('deals', 0.02394702024903099),
 ('themes', 0.023936597120221115),
 ('my', 0.023928374753346037),
 ('which', 0.023908264765228698),
 ('together', 0.02388768394280821),
 ('record', 0.023879473557965502),
 ('chilling', 0.023877413677317435),
 ('absorbing', 0.023848541510400112),
 ('studios', 0.023840610970325336),
 ('helps', 0.023800338082370958),
 ('paul', 0.023782537407117978),
 ('drama', 0.023766688862014711),
 ('spots', 0.023727534480488408),
 ('japanese', 0.02370847543051147),
 ('com', 0.023663537310393355),
 ('meets', 0.023649415936523126),
 ('may', 0.023577512715288872),
 ('goal', 0.02357199244925659),
 ('out', 0.023558753773465096),
 ('page', 0.023530160671184863),
 ('con', 0.023523200814540533),
 ('thankfully', 0.023405004970711695),
 ('number', 0.023389568775323531),
 ('captured', 0.023351056068531193),
 ('joy', 0.023338854638575421),
 ('brought', 0.023336907813285936),
 ('max', 0.023250909447975868),
 ('superbly', 0.023239871167515597),
 ('those', 0.023176845007530665),
 ('course', 0.023170128305056523),
 ('inspiring', 0.023124940469820013),
 ('troubled', 0.023104553288143287),
 ('starring', 0.023098181939380305),
 ('famous', 0.023080990484234912),
 ('nowadays', 0.023041214534459814),
 ('gripping', 0.023039160339941953),
 ('identity', 0.023038352369265169),
 ('many', 0.023030059748964153),
 ('victor', 0.023028627724258649),
 ('michael', 0.022946522358330855),
 ('stop', 0.022927047859442076),
 ('eerie', 0.022877301562370816),
 ('seen', 0.022820929217422629),
 ('caused', 0.022791670672167533),
 ('moment', 0.022789062338184275),
 ('portraying', 0.022729334983088951),
 ('influence', 0.022698569029077062),
 ('when', 0.022541791159242781),
 ('touched', 0.022525639292270201),
 ('complicated', 0.022432126566344631),
 ('turns', 0.022415566693423837),
 ('young', 0.022415228068631974),
 ('award', 0.022414761392271602),
 ('put', 0.022325849008177176),
 ('trust', 0.022301497663936395),
 ('issues', 0.02225775337618751),
 ('innocence', 0.022236928993752819),
 ('anime', 0.022201683728338893),
 ('without', 0.02214454398785886),
 ('himself', 0.022068240705874407),
 ('charlie', 0.02205203730146018),
 ('parents', 0.021888138202371763),
 ('covered', 0.02188753333796175),
 ('final', 0.021877215769079549),
 ('killers', 0.021830664900395119),
 ('ages', 0.021774376677575584),
 ('usual', 0.021760980512718141),
 ('physical', 0.021749103191221798),
 ('like', 0.021730991541426742),
 ('crazy', 0.021727382570242992),
 ('puts', 0.021725737321791543),
 ('got', 0.021701574500289096),
 ('room', 0.021690968569465629),
 ('complaints', 0.021670426593916568),
 ('type', 0.021663628982945167),
 ('brings', 0.021600600975875413),
 ('remarkable', 0.021576791719396034),
 ('get', 0.021538325389801369),
 ('city', 0.021523385378314882),
 ('coming', 0.021492351614142778),
 ('traditional', 0.021430875828269805),
 ('romantic', 0.021420587536168552),
 ('cinema', 0.021411776829230966),
 ('regular', 0.021395882255575833),
 ('intelligent', 0.021391350897315427),
 ('music', 0.021381013806527443),
 ('humor', 0.021365697759571513),
 ('experience', 0.021314525649372935),
 ('favourite', 0.02125347648387825),
 ('social', 0.021250085255237357),
 ('feelings', 0.021245030895714345),
 ('cried', 0.021233271641070747),
 ('rock', 0.02121328002983236),
 ('against', 0.021157314119587243),
 ('including', 0.021156674122491399),
 ('honest', 0.02114345875879349),
 ('parallel', 0.021107353247706448),
 ('eddie', 0.021080182147252723),
 ('crafted', 0.020979194953745086),
 ('more', 0.02093379734319379),
 ('glued', 0.02093198872193016),
 ('insanity', 0.020914935599101146),
 ('thoroughly', 0.020905661542252759),
 ('eyes', 0.020868013291281091),
 ('jr', 0.020865268971014535),
 ('dramas', 0.020836398428109217),
 ('follows', 0.020814937146708408),
 ('situation', 0.020814821105666462),
 ('understood', 0.020749677092470175),
 ('face', 0.020701739464945038),
 ('albeit', 0.020680340389878413),
 ('memorable', 0.020608260124115527),
 ('accurate', 0.020585303033408747),
 ('under', 0.020574430698374231),
 ('arthur', 0.020562083939889477),
 ('elderly', 0.020545350471808114),
 ('opinion', 0.020539570922797755),
 ('whoopi', 0.020515675744150079),
 ('helped', 0.02047624233713052),
 ('detract', 0.020443807698341677),
 ('flawed', 0.020436371691432333),
 ('unusually', 0.020433523835905333),
 ('performing', 0.020396957567555725),
 ('smooth', 0.020347681451465368),
 ('magnificent', 0.020334637688102838),
 ('desperation', 0.02028776899905723),
 ('lose', 0.02027753568325787),
 ('satisfying', 0.020251527110272068),
 ('friend', 0.020227651020398935),
 ('kudos', 0.020201477326926613),
 ('breaking', 0.020117861519854292),
 ('elephant', 0.020115783447057042),
 ('colors', 0.020112155987764876),
 ('willing', 0.020087728040224326),
 ('fresh', 0.02005401912359376),
 ('offers', 0.020003415308141065),
 ('provides', 0.020002909565985012),
 ('guilt', 0.019987917970659564),
 ('shouldn', 0.019907879458024347),
 ('japan', 0.019906368589571698),
 ('secrets', 0.019876976104814387),
 ('obligatory', 0.019789665431840405),
 ('dvd', 0.019782796187823429),
 ('tale', 0.019752149872839884),
 ('since', 0.019726258912690298),
 ('roles', 0.019710495505207995),
 ('breathtaking', 0.019705824135660525),
 ('ground', 0.019687236524961869),
 ('higher', 0.019670526139537556),
 ('jean', 0.019665400087401592),
 ('rich', 0.019653095716660716),
 ('right', 0.019629293580435747),
 ('stone', 0.0196105959056691),
 ('lives', 0.01961034893671014),
 ('it', 0.019542002303277555),
 ('essential', 0.01953386009392041),
 ('tend', 0.019523404457496819),
 ('places', 0.019510216587218014),
 ('recommend', 0.019506211559818108),
 ('loy', 0.019481148560970923),
 ('tell', 0.019450286669268766),
 ('challenge', 0.019374490591710928),
 ('fiction', 0.019350601498735361),
 ('able', 0.019340445094151421),
 ('animated', 0.019333069625267079),
 ('complain', 0.019332028796550112),
 ('deeper', 0.019318681931941164),
 ('blew', 0.019304454395430125),
 ('seeing', 0.019302442445035529),
 ('release', 0.019209904006239131),
 ('unfolds', 0.019184703456013679),
 ('boys', 0.019177414753158387),
 ('favorites', 0.019160378141489524),
 ('throughout', 0.019136892845690673),
 ('marvelous', 0.019110015321943563),
 ('relax', 0.019044075162625462),
 ('desire', 0.019016117204605987),
 ('end', 0.019014420138293214),
 ('questions', 0.018977699968684838),
 ('man', 0.018956744494720245),
 ('rea', 0.018928733395777456),
 ('comments', 0.018923870708363082),
 ('vengeance', 0.018908638777923942),
 ('brian', 0.018906876323023587),
 ('learned', 0.01889994792370445),
 ('lovely', 0.018854980464698644),
 ('seasons', 0.018852496578683823),
 ('shines', 0.018827509959493258),
 ('justice', 0.018827310862034669),
 ('succeeds', 0.018776998522312769),
 ('discovered', 0.018766802216817063),
 ('touch', 0.018762806738861472),
 ('white', 0.018743225697414191),
 ('bitter', 0.018724701999912878),
 ('knows', 0.01871906328874429),
 ('gene', 0.018660060796556237),
 ('mainstream', 0.018654252436913901),
 ('raw', 0.018609728881254825),
 ('focus', 0.018605078305494939),
 ('won', 0.018597537876871639),
 ('ve', 0.018560162581379304),
 ('million', 0.018514133006256917),
 ('attention', 0.018406547682637144),
 ('river', 0.018403383531225694),
 ('classics', 0.018375185367387345),
 ('quirky', 0.018358100535754599),
 ('although', 0.018350252973821906),
 ('september', 0.018345012211358883),
 ('emotional', 0.01832716507095174),
 ('events', 0.01832455447591811),
 ('released', 0.018304767183625538),
 ('thus', 0.018302709016086102),
 ('rules', 0.018298967789718675),
 ('trilogy', 0.018261985922288494),
 ('jackie', 0.018261017705562571),
 ('country', 0.018248984107628784),
 ('find', 0.018220001120247339),
 ('sure', 0.018205281970545894),
 ('overlooked', 0.01817364459210739),
 ('sensitive', 0.018173518786609135),
 ('harsh', 0.018143998075916396),
 ('chair', 0.018127987063468094),
 ('neatly', 0.018123044612179433),
 ('round', 0.018082305853658363),
 ('adult', 0.018060718859389518),
 ('strength', 0.018042558269708915),
 ('aunt', 0.018028313353173651),
 ('description', 0.017997557340833973),
 ('perspective', 0.017974761193339694),
 ('closer', 0.017945066423908043),
 ('extra', 0.017934760731343116),
 ('hit', 0.017910740181690348),
 ('tough', 0.017904509470376237),
 ('work', 0.017882494289916093),
 ('captivating', 0.01787507230892095),
 ('swim', 0.017853354272014843),
 ('holmes', 0.017846058193393119),
 ('unlikely', 0.017843839699452125),
 ('fears', 0.017838067451752794),
 ('nominated', 0.0178374393045206),
 ('neat', 0.01782306847491319),
 ('discovers', 0.017801301834152447),
 ('paris', 0.01779805788420007),
 ('streets', 0.017746147480597593),
 ('realism', 0.017729724930388029),
 ('travel', 0.017694257020940293),
 ('keep', 0.017684400089090099),
 ('anyway', 0.017675995400919457),
 ('realizes', 0.017618932935696142),
 ('variety', 0.017618487604827659),
 ('chief', 0.017603963834362808),
 ('broke', 0.017601657476194944),
 ('craven', 0.017597613499935324),
 ('moves', 0.017559744221771676),
 ('see', 0.017554713803040193),
 ('intellectual', 0.017537349329235133),
 ('normally', 0.017511237908563505),
 ('technique', 0.0175022650778302),
 ('dancer', 0.017501395365645257),
 ('awe', 0.017467446640641395),
 ('technology', 0.017414969148737202),
 ('kelly', 0.017380794671638257),
 ('particular', 0.017380503339109222),
 ('awards', 0.017343067374305077),
 ('twisted', 0.0173427316555122),
 ('manager', 0.017337683585341688),
 ('fantasy', 0.017314736380004723),
 ('blake', 0.017282963990552191),
 ('criticism', 0.017279558676803669),
 ('identify', 0.017277471199843665),
 ('collection', 0.017253533052260926),
 ('sidney', 0.017239120845031548),
 ('ironic', 0.017225809884120875),
 ('score', 0.017223046869263518),
 ('charm', 0.017204164112517871),
 ('lonely', 0.017192972607511965),
 ('recall', 0.01718951228267028),
 ('dream', 0.017185607849471301),
 ('known', 0.017169341473045805),
 ('hoffman', 0.017123937023014246),
 ('answers', 0.017112374531695257),
 ('taking', 0.017102244694823313),
 ('color', 0.017086755659474456),
 ('existed', 0.017084491834780034),
 ('mel', 0.017080644125498475),
 ('treats', 0.017076365809061664),
 ('kennedy', 0.017063054110179412),
 ('millionaire', 0.017058120181534065),
 ('stewart', 0.01701786393539512),
 ('soon', 0.017016949690113498),
 ('style', 0.016978446616527424),
 ('urban', 0.01696177374188855),
 ('sides', 0.016958377563876283),
 ('nicely', 0.016956584044665043),
 ('survive', 0.01695320106620354),
 ('contrast', 0.016949017788907707),
 ('granted', 0.016948500759420799),
 ('wes', 0.016856895803564035),
 ('heroic', 0.016849533387674559),
 ('sadness', 0.016836182986070525),
 ('faults', 0.016833966998505426),
 ('ladies', 0.016818146836646251),
 ('walter', 0.016813645209614796),
 ('exceptional', 0.016810242985337294),
 ('dangerous', 0.016796058008032438),
 ('fan', 0.016737120507724371),
 ('witch', 0.016717085914917339),
 ('occasionally', 0.016711349636820468),
 ('movies', 0.016676687954063647),
 ('celebration', 0.016664197566723733),
 ('castle', 0.016661909651854559),
 ('catch', 0.016647995152024701),
 ('its', 0.016639302941262289),
 ('tribute', 0.016629617927918797),
 ('jimmy', 0.016625132101972986),
 ('bravo', 0.01661675415646004),
 ('enjoying', 0.016613140144305667),
 ('bus', 0.016593157501778099),
 ('documentary', 0.016564651461285371),
 ('frightening', 0.016559987706802767),
 ('guilty', 0.016536110253664235),
 ('slightly', 0.016526421724199342),
 ('is', 0.016511509443399758),
 ('chan', 0.016507204515006663),
 ('mixed', 0.016506847567311397),
 ('curious', 0.016506488394564579),
 ('spirit', 0.016502977044099081),
 ('pleased', 0.016487261129390269),
 ('most', 0.016476759333214065),
 ('chemistry', 0.016425356343989044),
 ('age', 0.016410666314929878),
 ('understanding', 0.016345696202945559),
 ('marie', 0.016341053241072701),
 ('dreams', 0.016332672013556312),
 ('again', 0.016287090973937747),
 ('union', 0.016282379359022551),
 ('spy', 0.016278154923785915),
 ('presented', 0.016273043238663489),
 ('steele', 0.016260993339006803),
 ('lay', 0.01625999545879786),
 ('plenty', 0.01624719418983283),
 ('horrors', 0.016246022980305589),
 ('black', 0.016223176851856817),
 ('comedy', 0.01622040802201059),
 ('winner', 0.0162203188573984),
 ('african', 0.016214456609794946),
 ('drummer', 0.016178152199513924),
 ('entertainment', 0.016173112007890945),
 ('delivers', 0.016166599465683076),
 ('stays', 0.016139476352793784),
 ('america', 0.016108896341111487),
 ('disappoint', 0.016066615933996442),
 ('gorgeous', 0.016062350166815054),
 ('sisters', 0.016060080355840684),
 ('subsequent', 0.016043574203873975),
 ('cerebral', 0.016039058904070029),
 ('french', 0.016038425317363183),
 ('perfection', 0.016033154869346932),
 ('likable', 0.016021713396124571),
 ('warm', 0.016019144095827342),
 ('studio', 0.016007232818464591),
 ('late', 0.015997923350457081),
 ('reality', 0.015978872249423726),
 ('showed', 0.015938750644323929),
 ('figures', 0.01592744660892324),
 ('ever', 0.015926454600790643),
 ('italy', 0.015909186780479357),
 ('accustomed', 0.015906246911558279),
 ('into', 0.015892173681617976),
 ('he', 0.015866239932092338),
 ('journey', 0.015817191390925522),
 ('waters', 0.0158009068788263),
 ('bill', 0.015785976148791337),
 ('cousin', 0.015784382710801671),
 ('explores', 0.015768756345569589),
 ('originally', 0.015766016465315408),
 ('astonishing', 0.015741175347778347),
 ('mouse', 0.015739473070555076),
 ('affect', 0.01571979846044326),
 ('authenticity', 0.015716491136675281),
 ('key', 0.015706372736941261),
 ('authorities', 0.015700111946298497),
 ('fortunately', 0.015676427069879848),
 ('notes', 0.015668388567765468),
 ('disagree', 0.015659822231464247),
 ('advanced', 0.015653464856497611),
 ('contribution', 0.015651919381489538),
 ('flaw', 0.015630623175485556),
 ('burning', 0.015593951152590362),
 ('scoop', 0.015580911014213493),
 ('levels', 0.015579506047588169),
 ('dead', 0.015575945832152268),
 ('reveals', 0.015552631094426428),
 ('explicit', 0.015535052542383238),
 ('fault', 0.015532818014787668),
 ('requires', 0.015440001642516231),
 ('way', 0.015434313286947601),
 ('waitress', 0.015433929845739224),
 ('vividly', 0.015399209375312219),
 ('truman', 0.015388667015530332),
 ('leslie', 0.015388355420398653),
 ('cool', 0.015362419182461003),
 ('i', 0.015358846209804482),
 ('dated', 0.01535189493470787),
 ('ruthless', 0.015347223840634985),
 ('anymore', 0.015327840988573713),
 ('batman', 0.015325445892906488),
 ('york', 0.01532365079728272),
 ('expressions', 0.015290943599335199),
 ('terms', 0.015285161966075779),
 ('sunday', 0.015279982329904816),
 ('chinese', 0.015240680418926652),
 ('done', 0.015230733309302687),
 ('behind', 0.015219079842199838),
 ('event', 0.015214794169662826),
 ('chamberlain', 0.015214082741427186),
 ('mysteries', 0.01520455675940992),
 ('manages', 0.015203486934632015),
 ('simpsons', 0.015191849812926213),
 ('mine', 0.015191085212402703),
 ('canadian', 0.015117611742208794),
 ('purple', 0.015100505661562468),
 ('website', 0.015095063701722864),
 ('master', 0.01509152869655765),
 ('charming', 0.015088362486196539),
 ('joe', 0.01508192017787815),
 ('reservations', 0.015077821343474077),
 ('fever', 0.015076873583983718),
 ('covers', 0.015047233453258807),
 ('madness', 0.015030361859657226),
 ('glimpse', 0.014991086926970954),
 ('pilot', 0.014978443271049677),
 ('johansson', 0.014975808461544405),
 ('explains', 0.014970512080227464),
 ('excellently', 0.014970388571598848),
 ('hawke', 0.01496975010993136),
 ('genuinely', 0.014947672770702568),
 ('often', 0.014942833143544474),
 ('cube', 0.014939928709365356),
 ('clean', 0.014937853229023522),
 ('ensemble', 0.014913656909087875),
 ('referred', 0.014910582069880152),
 ('replies', 0.014907131594945567),
 ('disease', 0.014895193110452173),
 ('wish', 0.014892245549307043),
 ('logical', 0.014888665766304057),
 ('nathan', 0.014869928851670402),
 ('aware', 0.01486986711289452),
 ('exciting', 0.014823139694980614),
 ('gone', 0.014821497224651535),
 ('critics', 0.014818559383907356),
 ('split', 0.014788117032985612),
 ('series', 0.014770708703162182),
 ('henry', 0.014757735101897452),
 ('prisoners', 0.014747710184003867),
 ('sentenced', 0.014746219906503842),
 ('laughing', 0.014722151818909786),
 ('president', 0.014671766779490544),
 ('list', 0.014666775185665164),
 ('ones', 0.01465899785410932),
 ('information', 0.014651687169784215),
 ('bonus', 0.014648059891508171),
 ('chicago', 0.014631769872667611),
 ('someday', 0.014629340475262568),
 ('splendid', 0.014609703424340649),
 ('surprises', 0.014608824054662468),
 ('sentimental', 0.014591361045287955),
 ('admit', 0.014588098910742779),
 ('previously', 0.014571223247118625),
 ('conveys', 0.014567143509152123),
 ('prominent', 0.01454736311408328),
 ('born', 0.014536990751946699),
 ('necessary', 0.014533225697989453),
 ('yes', 0.014531704633026978),
 ('marvel', 0.014527554209112409),
 ('initially', 0.014510187714555967),
 ('jake', 0.014502509408478864),
 ('matters', 0.01449773042608421),
 ('lucas', 0.014496736417950695),
 ('stories', 0.014475382661229963),
 ('happy', 0.014471040644253806),
 ('improvement', 0.014459225025278393),
 ('anger', 0.01444069696929931),
 ('hong', 0.014412020732763238),
 ('devotion', 0.014406165594180752),
 ('infamous', 0.014402483161136861),
 ('sir', 0.014390585849942563),
 ('fashioned', 0.014376495163092877),
 ('whenever', 0.014311984840844727),
 ('facing', 0.014311813694297498),
 ('spin', 0.014300937890947244),
 ('clear', 0.014297831903635035),
 ('verhoeven', 0.014290838087095132),
 ('onto', 0.014287704198288412),
 ('sheriff', 0.014266680346279261),
 ('boy', 0.0142383932121725),
 ('felix', 0.014236371593101711),
 ('what', 0.014231196728127856),
 ('site', 0.01421283932921704),
 ('hits', 0.014208508715996906),
 ('convincingly', 0.014165838532387459),
 ('adventures', 0.014158492204346286),
 ('multiple', 0.014150723728410523),
 ('wrapped', 0.014118759103459127),
 ('reveal', 0.01407651065382279),
 ('toby', 0.01407522149311176),
 ('months', 0.014061986005374691),
 ('comedies', 0.014050301808876078),
 ('shot', 0.014031987455271896),
 ('holds', 0.014023504904484214),
 ('weeks', 0.014002257803042338),
 ('window', 0.013985434541614843),
 ('received', 0.013983301709629938),
 ('him', 0.013968181093938303),
 ('court', 0.013964352058193527),
 ('double', 0.013960483190947275),
 ('refuses', 0.013957613385590659),
 ('stand', 0.01394881385922137),
 ('shocked', 0.013935157243261928),
 ('powell', 0.013934062441977023),
 ('brutal', 0.013924129605946689),
 ('among', 0.013913156765292948),
 ('prostitute', 0.013911765274631796),
 ('nine', 0.013882343344720896),
 ('timeless', 0.013858274395499411),
 ('likes', 0.013844971514262236),
 ('kurosawa', 0.013820064338774894),
 ('fact', 0.013814297186034387),
 ('ass', 0.013813899781949799),
 ('deanna', 0.013799520782801162),
 ('almost', 0.013791517357271339),
 ('technicolor', 0.013790541990858995),
 ('adventure', 0.013782999907047068),
 ('gerard', 0.013776140434137591),
 ('analysis', 0.013764039325045371),
 ('mid', 0.013747853289146213),
 ('stanwyck', 0.013738927891779258),
 ('mann', 0.013726915645691881),
 ('stuart', 0.013700229069235782),
 ('reluctantly', 0.013697113976504024),
 ('humanity', 0.01369083073691104),
 ('classical', 0.013688949911986586),
 ('health', 0.013684784640613444),
 ('edie', 0.013683859176013941),
 ('british', 0.013666460250876436),
 ('primary', 0.013661794714033906),
 ('coaster', 0.013660631014138395),
 ('explore', 0.013656042478726909),
 ('china', 0.013638756081011151),
 ('advantage', 0.013631698822745387),
 ('protagonists', 0.013627593648932781),
 ('partly', 0.013617059618125359),
 ('artist', 0.013597123465502839),
 ('terrifying', 0.013581203319898153),
 ('scarlett', 0.013567078625941564),
 ('mesmerizing', 0.01354781689947941),
 ('prince', 0.013541105943095598),
 ('weird', 0.013535346249579566),
 ('vance', 0.013518150392608121),
 ('collect', 0.013513303578887652),
 ('humour', 0.013508890166677978),
 ('doc', 0.013507286431402924),
 ('history', 0.013506120200788268),
 ('miss', 0.01349818799089743),
 ('angles', 0.013497507265665435),
 ('dealers', 0.013493607234383895),
 ('mass', 0.013472328625932874),
 ('paramount', 0.013467546662344522),
 ('musicians', 0.013464517138686273),
 ('jackman', 0.013441428735872098),
 ('cheer', 0.013440230376864145),
 ('aired', 0.013427957547366854),
 ('personal', 0.013422418887670071),
 ('become', 0.013415910991211793),
 ('wang', 0.013406655764270567),
 ('unforgettable', 0.013405651085753997),
 ('theme', 0.013397995857105537),
 ('satisfy', 0.01336101263463744),
 ('beginning', 0.013353575498360082),
 ('tongue', 0.013332587937334757),
 ('ran', 0.013322580056022444),
 ('vh', 0.013321694862247338),
 ('april', 0.013317958082689022),
 ('cracking', 0.01331648265485188),
 ('hilariously', 0.013312111975215814),
 ('addictive', 0.013304056341282523),
 ('factory', 0.013302408850101527),
 ('bloom', 0.013287106893282025),
 ('outcome', 0.013278893812795744),
 ('startling', 0.013276469703553513),
 ('portrait', 0.01327305510099926),
 ('adapted', 0.013258514308676838),
 ('raines', 0.013257908724754863),
 ('sky', 0.013252502620889894),
 ('earlier', 0.013233110743632559),
 ('atlantis', 0.01322818861014456),
 ('delirious', 0.013226874818125445),
 ('titanic', 0.013205633401144466),
 ('nevertheless', 0.013198200611184941),
 ('proved', 0.013189760358384484),
 ('denzel', 0.013188430841614765),
 ('pleasant', 0.013180077348723361),
 ('horses', 0.013178651568029467),
 ('about', 0.013166154528006856),
 ('astounding', 0.013161698337226808),
 ('savage', 0.013154100553759934),
 ('winning', 0.01315324670837965),
 ('rose', 0.013145586701309777),
 ('fitting', 0.013133578254330347),
 ('compared', 0.013131693803520051),
 ('took', 0.013119343481498985),
 ('masterson', 0.013112762074217891),
 ('owner', 0.013108690454819136),
 ('delight', 0.013107278788311012),
 ('conventions', 0.01310603977069605),
 ('natali', 0.013094964441143215),
 ('message', 0.013093664295113416),
 ('stood', 0.013090122718303425),
 ('sailor', 0.01305895917042345),
 ('ida', 0.013058842950256232),
 ('escaping', 0.01305272362470678),
 ('top', 0.013047466741024414),
 ('louis', 0.013046238442637009),
 ('peace', 0.013040907918892328),
 ('several', 0.01302824488706027),
 ('info', 0.013023754625550174),
 ('graphics', 0.013020850288881849),
 ('reflection', 0.013019243823940105),
 ('slimy', 0.013014377070231845),
 ('elvira', 0.013009811638957064),
 ('andre', 0.01300004731344674),
 ('kong', 0.012999080313300528),
 ('mayor', 0.012994758409723564),
 ('punishment', 0.012988264949614938),
 ('morris', 0.012983710119604964),
 ('hall', 0.012981593609354808),
 ('match', 0.012980233583057324),
 ('bleak', 0.012972505086304058),
 ('lindy', 0.01297224893312126),
 ('sequence', 0.012964435808713573),
 ('learn', 0.012938848970083345),
 ('happen', 0.012932836387873745),
 ('john', 0.012929524979001666),
 ('gothic', 0.012926957011734876),
 ('wider', 0.012920985981480958),
 ('popular', 0.012891690509844083),
 ('diverse', 0.012875263936567813),
 ('compare', 0.012869395292065187),
 ('brooklyn', 0.012852986243263932),
 ('broadcast', 0.012839574692097613),
 ('zane', 0.012834302957709145),
 ('andrew', 0.012824020940615251),
 ('finely', 0.012822716004015855),
 ('confronted', 0.012817523686608621),
 ('going', 0.012809762839304961),
 ('likewise', 0.012804639349082516),
 ('breath', 0.012790132659417912),
 ('building', 0.012789809704793867),
 ('suggesting', 0.012780624321169345),
 ('contemporary', 0.012772749462937513),
 ('midnight', 0.012766963563112075),
 ('victoria', 0.012756422131580528),
 ('lasting', 0.012752424415642593),
 ('kitty', 0.012751468371946009),
 ('continued', 0.012744325456485397),
 ('indian', 0.012712962842718674),
 ('subplots', 0.012709887814283906),
 ('douglas', 0.012693830679455896),
 ('explosions', 0.012692697593201855),
 ('bond', 0.012689802823687821),
 ('delightfully', 0.012669417460922622),
 ('understated', 0.012669374312789351),
 ('greater', 0.012664580396020154),
 ('sailing', 0.012662424581282427),
 ('images', 0.012661803048859875),
 ('copy', 0.012624649645734159),
 ('seat', 0.012610464273152518),
 ('eleven', 0.012602533659978888),
 ('riveting', 0.012591829460094515),
 ('boiled', 0.01258886352963876),
 ('academy', 0.012581996178142983),
 ('whilst', 0.01256984165329564),
 ('heaven', 0.012547361621330921),
 ('fruit', 0.012543513029693252),
 ('reviewer', 0.012534273375083893),
 ('cost', 0.012529643005796618),
 ('week', 0.01252284501500827),
 ('intriguing', 0.012508687653306356),
 ('streak', 0.012507752385208551),
 ('san', 0.012502130058217922),
 ('awareness', 0.012476446442012446),
 ('catching', 0.012467108595451535),
 ('kicks', 0.012457714930570582),
 ('complexities', 0.012454362663082466),
 ('draws', 0.012447753285125911),
 ('easily', 0.012444885855614905),
 ('ealing', 0.012444339255708921),
 ('psychopath', 0.012431259926282277),
 ('skin', 0.012424248540973574),
 ('creative', 0.012386713452491526),
 ('recognition', 0.012354025801439423),
 ('downey', 0.012348698765161131),
 ('symbolism', 0.012329925038271326),
 ('touches', 0.012328013470751463),
 ('everyday', 0.012324934809895896),
 ('achieves', 0.012314898707483493),
 ('outcast', 0.012313662230219676),
 ('overwhelmed', 0.012306633138869472),
 ...]
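
The next cell ranks the vocabulary against the word "terrible". As a minimal sketch of how such a ranking could be produced, assume each word owns one row of the network's input-to-hidden weight matrix and that similarity is the raw (unnormalized) dot product between two rows. The names `most_similar_words_sketch`, `word2index`, and `weights_0_1`, and all the numbers below, are illustrative assumptions, not the trained network from this tutorial.

```python
from collections import Counter

import numpy as np

# Hypothetical toy vocabulary and input-to-hidden weights (one row per word).
# These values are invented for illustration; they are not trained weights.
word2index = {"worst": 0, "terrible": 1, "awful": 2, "excellent": 3}
weights_0_1 = np.array([
    [-0.40, -0.30],  # worst
    [-0.25, -0.20],  # terrible
    [-0.30, -0.15],  # awful
    [0.35, 0.30],    # excellent
])


def most_similar_words_sketch(focus, word2index, weights):
    """Rank every word by the raw dot product of its weight row with the focus row."""
    focus_vec = weights[word2index[focus]]
    scores = Counter()
    for word, idx in word2index.items():
        scores[word] = float(np.dot(weights[idx], focus_vec))
    # most_common() returns (word, score) pairs sorted from highest to lowest score
    return scores.most_common()


ranked = most_similar_words_sketch("terrible", word2index, weights_0_1)
for word, score in ranked:
    print(word, round(score, 4))
```

In this toy setup "worst" and "awful" both outrank "terrible" itself, because a raw dot product rewards rows with larger magnitude as well as rows pointing in the same direction. That would be consistent with the output below, where 'terrible' is listed fifth rather than first. If a ranking in which the focus word always comes first were wanted, dividing each score by the product of the two row norms (cosine similarity) is one common alternative.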

In [26]:
get_most_similar_words("terrible")


Out[26]:
[('worst', 0.16966107259049848),
 ('awful', 0.12026847019691245),
 ('waste', 0.11945367265311002),
 ('poor', 0.09275888757443547),
 ('terrible', 0.091425387197727942),
 ('dull', 0.084209271678223604),
 ('poorly', 0.081241544516042055),
 ('disappointment', 0.080064759621368706),
 ('fails', 0.078599773723337527),
 ('disappointing', 0.07733948548032335),
 ('boring', 0.077127858748012895),
 ('unfortunately', 0.075502449705859051),
 ('worse', 0.070601835364194662),
 ('mess', 0.070564299623590412),
 ('stupid', 0.069484822832543036),
 ('badly', 0.066888903666228558),
 ('annoying', 0.065687021903374165),
 ('bad', 0.063093814537572152),
 ('save', 0.062880597495865748),
 ('disappointed', 0.06269235381207286),
 ('wasted', 0.061387183028051275),
 ('supposed', 0.060985452957725145),
 ('horrible', 0.060121772339380118),
 ('laughable', 0.058698406285467651),
 ('crap', 0.058104528667884577),
 ('basically', 0.057218840369636162),
 ('nothing', 0.057158220043034204),
 ('ridiculous', 0.056905481068931445),
 ('lacks', 0.055766565889465457),
 ('lame', 0.055616009058110184),
 ('avoid', 0.05551872607319721),
 ('unless', 0.054208926212940732),
 ('script', 0.053948359467048533),
 ('failed', 0.05341393055000912),
 ('pointless', 0.052855531546894118),
 ('oh', 0.052761580933176837),
 ('effort', 0.050773747127292324),
 ('guess', 0.050379576420076545),
 ('minutes', 0.049784532804242193),
 ('wooden', 0.049453108380727188),
 ('redeeming', 0.049182869114721757),
 ('seems', 0.049079625154669751),
 ('instead', 0.047957645123532282),
 ('weak', 0.046496387374765663),
 ('pathetic', 0.04609974114971576),
 ('looks', 0.045796536730244877),
 ('hoping', 0.045082242887577034),
 ('wonder', 0.044669791780934602),
 ('forgettable', 0.042854349251871711),
 ('silly', 0.042237829687270009),
 ('attempt', 0.04170629994137353),
 ('predictable', 0.041514442438568125),
 ('someone', 0.0415061190273373),
 ('sorry', 0.040868877281533364),
 ('might', 0.040445683500688355),
 ('slow', 0.040346869107034951),
 ('painful', 0.040220039039613256),
 ('thin', 0.040062642253777855),
 ('mediocre', 0.039407165377577387),
 ('garbage', 0.039310979440981109),
 ('money', 0.038907973313640494),
 ('none', 0.038300807052230941),
 ('bland', 0.038062246057085046),
 ('couldn', 0.038016664218957934),
 ('either', 0.037738833070341961),
 ('unfunny', 0.03707662980504451),
 ('entire', 0.036642119399463165),
 ('cheap', 0.036516800802525583),
 ('honestly', 0.03621204154379784),
 ('mildly', 0.035744850608185635),
 ('total', 0.035560454471013074),
 ('neither', 0.035415946043548557),
 ('making', 0.035244315060985618),
 ('problem', 0.035088251034562444),
 ('flat', 0.034518947038747076),
 ('bizarre', 0.034509460694521141),
 ('group', 0.034335883528586797),
 ('dreadful', 0.034287618511331858),
 ('ludicrous', 0.03415964932381603),
 ('decent', 0.03377158578786895),
 ('clich', 0.033751444631720556),
 ('daughter', 0.033732725858384882),
 ('bored', 0.033622879572852558),
 ('horror', 0.033464120619956815),
 ('writing', 0.033437913916756788),
 ('skip', 0.033430639850491169),
 ('absurd', 0.033154173530163318),
 ('barely', 0.032653416827517719),
 ('idea', 0.032584013175663243),
 ('wasn', 0.03248120796627206),
 ('fake', 0.032136435098031518),
 ('believe', 0.031677858935800801),
 ('uninteresting', 0.031526815915867139),
 ('reason', 0.031390715260270541),
 ('scenes', 0.03121636293538917),
 ('alright', 0.031046883113956251),
 ('body', 0.03099998294598668),
 ('no', 0.030917695380560412),
 ('insult', 0.030808450146355935),
 ('mst', 0.030527916471397864),
 ('nowhere', 0.030352177599338292),
 ('lousy', 0.03016019546838079),
 ('didn', 0.030115903194061419),
 ('interest', 0.029888118468771124),
 ('half', 0.029813246115057257),
 ('lee', 0.029804235955718652),
 ('dimensional', 0.029562861996904038),
 ('unconvincing', 0.029322607679950242),
 ('left', 0.029322408787030529),
 ('sex', 0.029296748476082147),
 ('even', 0.029225209450923412),
 ('far', 0.029192618334294561),
 ('tries', 0.029004001132703541),
 ('anything', 0.028988097743501119),
 ('trying', 0.02891947722846511),
 ('accent', 0.028779542310252575),
 ('nudity', 0.028662654953266063),
 ('apparently', 0.028291626941517923),
 ('zombies', 0.028178583120430676),
 ('sense', 0.028166740534758778),
 ('incoherent', 0.027988926190862514),
 ('something', 0.027986519420278223),
 ('tedious', 0.027952212405329517),
 ('wrong', 0.027831947557365632),
 ('were', 0.027825695799985388),
 ('endless', 0.027824591794431468),
 ('turkey', 0.027624266205058482),
 ('zombie', 0.027543333835110859),
 ('appears', 0.02746984087848325),
 ('embarrassing', 0.027425437142424351),
 ('walked', 0.027411768647042711),
 ('premise', 0.027346072285964189),
 ('ok', 0.027333008356232008),
 ('result', 0.027312558653191918),
 ('complete', 0.027247564384243431),
 ('t', 0.027186737465610209),
 ('least', 0.02694907263201728),
 ('was', 0.026917906772065292),
 ('unwatchable', 0.026829458762459388),
 ('sat', 0.026806511532143463),
 ('to', 0.026801902698524085),
 ('sadly', 0.026753380035391513),
 ('christmas', 0.026735555962199217),
 ('gore', 0.026670161630608404),
 ('mother', 0.026612696987437758),
 ('aspects', 0.026583237615263801),
 ('amateurish', 0.0265651592911757),
 ('below', 0.026548271016778147),
 ('stupidity', 0.026460990221946933),
 ('appeal', 0.02639659671342098),
 ('trite', 0.026331168557051404),
 ('then', 0.026284629203937659),
 ('rubbish', 0.026216695246125507),
 ('okay', 0.025981446095883612),
 ('sucks', 0.025930224401969348),
 ('pretentious', 0.025907912370628297),
 ('positive', 0.025773976409798761),
 ('confusing', 0.025737618729473642),
 ('remotely', 0.025699566061653023),
 ('obnoxious', 0.025454829745850255),
 ('m', 0.025435495928249188),
 ('rent', 0.025373441934038499),
 ('laughs', 0.025346512576104412),
 ('re', 0.025342239903627863),
 ('context', 0.025274382593713576),
 ('disgusting', 0.025195418263468185),
 ('so', 0.025148024611438818),
 ('tiresome', 0.025031684199042101),
 ('miscast', 0.024970026716882372),
 ('aren', 0.024968703889385904),
 ('forced', 0.024933299777713702),
 ('paid', 0.024906929703330343),
 ('utter', 0.024802282233385525),
 ('uninspired', 0.024799576212017463),
 ('falls', 0.024749631706810705),
 ('throw', 0.024614954073046699),
 ('been', 0.024470487429445045),
 ('ugly', 0.024334820044832381),
 ('hopes', 0.024315635652054312),
 ('dire', 0.024191221840051083),
 ('hunter', 0.024171291127418466),
 ('producers', 0.024089231997130232),
 ('seem', 0.024065146985976841),
 ('straight', 0.02399666645155216),
 ('vampire', 0.023942797574072684),
 ('paper', 0.023908828083961008),
 ('crappy', 0.023807255546688062),
 ('excited', 0.023764516357875815),
 ('start', 0.023739057832096774),
 ('material', 0.023729757962158749),
 ('excuse', 0.023681577270328102),
 ('cop', 0.023480677028928126),
 ('f', 0.023312251619610837),
 ('ms', 0.023282327986278321),
 ('villain', 0.023158273483660743),
 ('fest', 0.023091425711778243),
 ('lack', 0.023039437894325179),
 ('such', 0.023031161078650945),
 ('saving', 0.023025745893238081),
 ('clichs', 0.022928209200342314),
 ('enough', 0.022921397253925297),
 ('mistake', 0.022868689470375007),
 ('unbelievable', 0.022864325693347887),
 ('maybe', 0.022825002748295287),
 ('blame', 0.022808369279543172),
 ('bunch', 0.022769532876362859),
 ('version', 0.02275329694575548),
 ('candy', 0.022749363632616763),
 ('island', 0.02274580066608016),
 ('tripe', 0.022695188509832681),
 ('wasting', 0.022681371343356765),
 ('inept', 0.022679276425665761),
 ('actor', 0.022636975371771055),
 ('flop', 0.022613758633444534),
 ('any', 0.022560608437607207),
 ('k', 0.02255401757961505),
 ('appalling', 0.022500975853556055),
 ('propaganda', 0.022465024430755744),
 ('major', 0.022430482324246579),
 ('sequel', 0.022362296462477879),
 ('offensive', 0.022326080604825445),
 ('revenge', 0.022315150942472623),
 ('shoot', 0.02228810570921174),
 ('whatsoever', 0.02228649834694094),
 ('ruined', 0.022173811528211032),
 ('painfully', 0.022152008209040924),
 ('on', 0.022016020939730058),
 ('shame', 0.021981493467648276),
 ('effects', 0.021849482201960247),
 ('wouldn', 0.021848506706035161),
 ('development', 0.021773241990065747),
 ('plot', 0.021733893676650604),
 ('co', 0.021728673026887638),
 ('church', 0.021719723717009982),
 ('storyline', 0.021663404462350766),
 ('screenwriter', 0.021660177252485924),
 ('bother', 0.021571699909566967),
 ('miserably', 0.021516173872499805),
 ('christian', 0.021515873507543661),
 ('add', 0.021468134313277949),
 ('found', 0.021449077767987147),
 ('watching', 0.021344833140596594),
 ('pseudo', 0.021308384076023461),
 ('boredom', 0.021119995917930005),
 ('please', 0.021090765093296302),
 ('talent', 0.021005847445274783),
 ('continuity', 0.02100514585242191),
 ('talents', 0.020992716564348899),
 ('college', 0.020990718952374872),
 ('tried', 0.02097821962618682),
 ('editing', 0.020865814801443752),
 ('lines', 0.020853755408845785),
 ('drivel', 0.020726493692759695),
 ('generous', 0.020697017742242002),
 ('potential', 0.020672988272090836),
 ('creatures', 0.020601399429061324),
 ('disjointed', 0.020581338926655209),
 ('irritating', 0.020576764848872681),
 ('pile', 0.020560898967541534),
 ('acts', 0.020560043588043531),
 ('junk', 0.020558505639508211),
 ('raped', 0.020550629285133258),
 ('christ', 0.020481424289613526),
 ('brain', 0.020431161137662711),
 ('slasher', 0.020425652445140888),
 ('seconds', 0.020390927443421879),
 ('nobody', 0.020389268101762611),
 ('dialog', 0.020338349197601496),
 ('makers', 0.020333184431951135),
 ('excitement', 0.020290456024291803),
 ('flashbacks', 0.020267510512910245),
 ('sloppy', 0.020234078734398368),
 ('joke', 0.020212187048528514),
 ('sleep', 0.020108895811675784),
 ('bottom', 0.01998677054728017),
 ('however', 0.019981104962051171),
 ('fail', 0.019937405211620234),
 ('sucked', 0.019874923017311578),
 ('soap', 0.019853525395543015),
 ('looked', 0.019810211840927103),
 ('stinks', 0.019769365381781166),
 ('deserve', 0.019614034321096454),
 ('exact', 0.019555320028258997),
 ('substance', 0.019552647432498179),
 ('yeah', 0.019513150136671552),
 ('production', 0.019510696746296532),
 ('female', 0.019476914978121786),
 ('unintentional', 0.019387723280198929),
 ('army', 0.019364852889641612),
 ('minute', 0.019351862554568253),
 ('unrealistic', 0.019350657250497862),
 ('rescue', 0.019340920364464918),
 ('theater', 0.01933382927666849),
 ('monsters', 0.019332636015751022),
 ('frankly', 0.01932655082384388),
 ('children', 0.019314240606868868),
 ('convince', 0.019312073515560642),
 ('shallow', 0.019298445504930539),
 ('synopsis', 0.019259706392396592),
 ('scott', 0.01918347440557033),
 ('seriously', 0.019182027987149991),
 ('ridiculously', 0.019169300285178974),
 ('looking', 0.019150985439966572),
 ('kareena', 0.019110212601710662),
 ('wrote', 0.019015323411486432),
 ('attempts', 0.019006343780653943),
 ('bothered', 0.018970712777578516),
 ('utterly', 0.018924824767803394),
 ('giant', 0.018891084650049701),
 ('writers', 0.018868906582101285),
 ('atrocious', 0.018848042351202358),
 ('plain', 0.018828766525513598),
 ('presumably', 0.018826629750947944),
 ('example', 0.018796453237837189),
 ('murray', 0.018754173430046931),
 ('seemed', 0.018749132295913074),
 ('stay', 0.01874415970643269),
 ('interview', 0.018672085964709526),
 ('disaster', 0.018553283301235148),
 ('value', 0.018544080955166374),
 ('paint', 0.018529607132429366),
 ('original', 0.018528190682362406),
 ('difficult', 0.018518455298178589),
 ('care', 0.018494804801171258),
 ('watchable', 0.018481870605389104),
 ('useless', 0.018470481000366856),
 ('desperately', 0.01842167504700026),
 ('except', 0.018391993551238543),
 ('doing', 0.018384737621350653),
 ('errors', 0.018380414978330265),
 ('solely', 0.018349321075079392),
 ('sitting', 0.018346519170301064),
 ('giving', 0.018335957397904838),
 ('ideas', 0.018327099221245192),
 ('unbearable', 0.018321159676201411),
 ('advice', 0.018273372527688847),
 ('nor', 0.018254420259554292),
 ('project', 0.018252633214771746),
 ('dozen', 0.018206363291515749),
 ('charles', 0.018163660578293463),
 ('plastic', 0.018161741020378659),
 ('book', 0.018139011699011297),
 ('shots', 0.018114876064363867),
 ('ill', 0.018103621818215735),
 ('grade', 0.018088309511242354),
 ('where', 0.01806588259969515),
 ('women', 0.018026883825059355),
 ('screenplay', 0.018014307024101332),
 ('through', 0.017990863003241406),
 ('actress', 0.017876003487857148),
 ('sign', 0.01786563614405693),
 ('walk', 0.017823522607756635),
 ('santa', 0.017727102733219178),
 ('happens', 0.017722408798843577),
 ('contrived', 0.017720303645882802),
 ('gun', 0.017685993176933833),
 ('ashamed', 0.017679623098721592),
 ('gratuitous', 0.017665737783803856),
 ('one', 0.017608259344043278),
 ('not', 0.017562336441189881),
 ('credibility', 0.017558852870687959),
 ('promising', 0.017544417082572289),
 ('risk', 0.017532600100721243),
 ('sub', 0.017531947750389461),
 ('lacking', 0.017513759836446527),
 ('fell', 0.017464857159331271),
 ('scenery', 0.017451365955319969),
 ('flesh', 0.017402514298262693),
 ('animal', 0.017386681692205426),
 ('tired', 0.017383214541566681),
 ('writer', 0.017380887757560842),
 ('lady', 0.017370657212565477),
 ('dialogue', 0.017319373946647617),
 ('terribly', 0.017291135257276893),
 ('downright', 0.017277675563205454),
 ('rented', 0.017247977656900716),
 ('clumsy', 0.01724129080518208),
 ('blah', 0.017217377177396763),
 ('random', 0.017199913549247988),
 ('members', 0.017198947117344762),
 ('three', 0.017189383912215913),
 ('celluloid', 0.017174000803758888),
 ('your', 0.017140173886430052),
 ('lost', 0.017127763322061815),
 ('suddenly', 0.017124566068806111),
 ('cover', 0.017066680835874291),
 ('existent', 0.017028540662919325),
 ('mostly', 0.017009366180205387),
 ('dig', 0.016990887715494292),
 ('spending', 0.016944400877991015),
 ('elsewhere', 0.016937877167916518),
 ('suck', 0.016897737192407596),
 ('apparent', 0.016783874225807262),
 ('fill', 0.016766110935370608),
 ('running', 0.016728621099996364),
 ('jokes', 0.016718920312228033),
 ('cheese', 0.016699473014889846),
 ('outer', 0.016612591391981468),
 ('anil', 0.016581200840654873),
 ('director', 0.016512894450311424),
 ('awfully', 0.016492200414985302),
 ('mix', 0.016468214294032498),
 ('naturally', 0.016404879835269455),
 ('scientist', 0.016395078905109245),
 ('imdb', 0.016343168034107167),
 ('dumb', 0.016289693549692456),
 ('made', 0.016279809910441426),
 ('curiosity', 0.016277433551029962),
 ('somewhere', 0.01623611744674798),
 ('stereotyped', 0.016235814767295294),
 ('officer', 0.016235401039884582),
 ('shelf', 0.016151304702362455),
 ('spends', 0.016089566181633218),
 ('explanation', 0.016040330428242218),
 ('proof', 0.016021381235154293),
 ('killed', 0.016004979798664883),
 ('songs', 0.016002280189188103),
 ('why', 0.015994497048455181),
 ('adequate', 0.015978003410591603),
 ('assume', 0.015953574865902428),
 ('mean', 0.015907137878947281),
 ('year', 0.015900265748875854),
 ('named', 0.015897377296493421),
 ('actors', 0.015880849255718713),
 ('dreck', 0.01584418483784927),
 ('ripped', 0.01580935239122223),
 ('exception', 0.015801037653546946),
 ('let', 0.01574755499580684),
 ('said', 0.015739206756809138),
 ('handed', 0.015729421480492774),
 ('five', 0.015692627471399444),
 ('manage', 0.015647108880417121),
 ('thousands', 0.01564343097589297),
 ('faith', 0.015616976955551868),
 ('hideous', 0.015589158171890808),
 ('alas', 0.015538213296394238),
 ('interesting', 0.015537431607034399),
 ('camera', 0.015534217771859279),
 ('affair', 0.015499371820329419),
 ('basketball', 0.015498025904813828),
 ('saved', 0.015479619606949038),
 ('allow', 0.015471290657970002),
 ('embarrassed', 0.015465690911012365),
 ('historically', 0.015405093934372957),
 ('guy', 0.015377641254470054),
 ('smoking', 0.01534650885437833),
 ('implausible', 0.015340453986022747),
 ('entirely', 0.015334692788183628),
 ('insulting', 0.015328508644691501),
 ('unable', 0.015321433538157143),
 ('supposedly', 0.015316107621242393),
 ('replaced', 0.015263381265213493),
 ('write', 0.015247349730647845),
 ('devoid', 0.01519618192038018),
 ('angry', 0.01512887842510143),
 ('cannot', 0.015124671278970775),
 ('stinker', 0.015117424017513684),
 ('types', 0.015097306608066994),
 ('hype', 0.015076288365524312),
 ('responsible', 0.014991356276561571),
 ('peter', 0.014969127137333007),
 ('putting', 0.01491070725493724),
 ('over', 0.014897181020826416),
 ('cardboard', 0.014888714204149054),
 ('interspersed', 0.014883165331874143),
 ('haired', 0.014880449676198558),
 ('spend', 0.014876094316227651),
 ('elvis', 0.014854709844151742),
 ('indulgent', 0.014847232132387193),
 ('catholic', 0.014843519648135945),
 ('downhill', 0.014807184967767801),
 ('lazy', 0.014781514695229727),
 ('aged', 0.014773315829198596),
 ('exist', 0.014753607788843276),
 ('torture', 0.014733998799388383),
 ('prove', 0.014729418674653008),
 ('tolerable', 0.014680880104255794),
 ('four', 0.014654547592632508),
 ('acceptable', 0.01465173069496585),
 ('chick', 0.014641428398798825),
 ('unimaginative', 0.014629366067627067),
 ('whiny', 0.014626751487134585),
 ('artsy', 0.014597921349167287),
 ('decide', 0.014596087755808963),
 ('unpleasant', 0.014539257963097203),
 ('rotten', 0.014526987482368666),
 ('racist', 0.014521318292204649),
 ('air', 0.014513999400043538),
 ('flimsy', 0.014510298364381134),
 ('baldwin', 0.014458793249711608),
 ('merely', 0.014423588430956447),
 ('wood', 0.014405182128559185),
 ('thinking', 0.014365675477621551),
 ('earth', 0.014352953870200838),
 ('kidding', 0.014337420788166334),
 ('unintentionally', 0.014336443850996722),
 ('vampires', 0.014325905430975231),
 ('generic', 0.014319871170399822),
 ('defense', 0.014290336242912222),
 ('saif', 0.014289573796132724),
 ('asleep', 0.014289012435576957),
 ('execution', 0.01428396200827341),
 ('figure', 0.014283770855230152),
 ('lackluster', 0.014273058981901449),
 ('hoped', 0.014264724762345849),
 ('nonsense', 0.014261341497203133),
 ('horrid', 0.014253216604458425),
 ('god', 0.014237363547447925),
 ('l', 0.014187296773742579),
 ('caricatures', 0.014181564208326643),
 ('starts', 0.014153430344591583),
 ('dry', 0.014133935534427954),
 ('display', 0.014128179969827095),
 ('button', 0.014116471162614745),
 ('bore', 0.014116389381443269),
 ('empty', 0.014096772700681905),
 ('harold', 0.014052130896646571),
 ('incomprehensible', 0.014009428713655195),
 ('annie', 0.014008405850952515),
 ('thrown', 0.014007462594894701),
 ('incredibly', 0.014005185007294351),
 ('renting', 0.013926687608630473),
 ('connect', 0.013922471736926739),
 ('younger', 0.01392114839514175),
 ('author', 0.013908729139553405),
 ('mistakes', 0.013902060662024717),
 ('vague', 0.013900188409028444),
 ('susan', 0.013899718009237951),
 ('obvious', 0.013862928310275264),
 ('public', 0.013848261281553181),
 ('porn', 0.013842110384054571),
 ('trash', 0.013803990572178482),
 ('stevens', 0.013796967244647431),
 ('sequels', 0.013782463861472688),
 ('hurt', 0.01376954392124014),
 ('desert', 0.01376361912496973),
 ('did', 0.013737639449728171),
 ('behave', 0.013719767167839477),
 ('served', 0.013714838239223717),
 ('claims', 0.01370688626965051),
 ('ultimately', 0.013697643591100152),
 ('wide', 0.013685211021307757),
 ('wow', 0.013679184770624806),
 ('worthless', 0.01367053329629828),
 ('dear', 0.013653591379600143),
 ('plodding', 0.01362284584085525),
 ('mike', 0.013594086031988719),
 ('favor', 0.013578310381078491),
 ('call', 0.013577646631327938),
 ('biggest', 0.013529947586389578),
 ('worthy', 0.013524754842185318),
 ('meaning', 0.013517997531900569),
 ('scientific', 0.013515396653842859),
 ('hanks', 0.013467213376215904),
 ('ads', 0.013463653421760931),
 ('gay', 0.01341484080868823),
 ('embarrassingly', 0.013401336286973733),
 ('literary', 0.013389208999321039),
 ('playing', 0.01332995463472637),
 ('bo', 0.013312890564682513),
 ('manipulative', 0.013287016941406334),
 ('dressed', 0.013285092423656558),
 ('embarrassment', 0.01326953031919822),
 ('regarding', 0.013233250211631659),
 ('stilted', 0.013215539220141915),
 ('sleeve', 0.013215085161586726),
 ('rating', 0.013203442200940885),
 ('kills', 0.013183919467358743),
 ('sounds', 0.013178727878711719),
 ('ali', 0.013173031266866376),
 ('non', 0.01316260375180524),
 ('pie', 0.013161492629253851),
 ('populated', 0.013152746747459266),
 ('killing', 0.013111860853151806),
 ('else', 0.013110592541316695),
 ('schneider', 0.013093514941690405),
 ('priest', 0.013071537555948205),
 ('hollow', 0.013068001463175462),
 ('shower', 0.013029604174841072),
 ('ruins', 0.013021597567104512),
 ('mental', 0.013019696244479823),
 ('this', 0.013009778169664532),
 ('pregnant', 0.012997074834619548),
 ('make', 0.012992851916498642),
 ('timberlake', 0.012979689860020448),
 ('saves', 0.012915795355367859),
 ('vastly', 0.012914828969565754),
 ('swear', 0.012901059475490069),
 ('stella', 0.012883911119651205),
 ('grave', 0.012882555040277143),
 ('thats', 0.01286106181291035),
 ('drinking', 0.012860129471019702),
 ('boom', 0.01285177959469419),
 ('introduction', 0.012831129197335455),
 ('programming', 0.012796219757750258),
 ('career', 0.012773059501084108),
 ('stereotype', 0.012769447626661472),
 ('attractive', 0.012765873120010146),
 ('victims', 0.012749299245502168),
 ('pass', 0.012735021821089288),
 ('experiment', 0.012716112941788916),
 ('retarded', 0.012713099529852416),
 ('stuck', 0.012709332698253249),
 ('akshay', 0.012684273069877867),
 ('cut', 0.012676285239015487),
 ('shoddy', 0.012674792040888049),
 ('damme', 0.012666536417656676),
 ('inaccurate', 0.012653687577536547),
 ('ray', 0.01264981802351018),
 ('woman', 0.012646521945546326),
 ('research', 0.01264049466286456),
 ('mile', 0.012627245693716732),
 ('place', 0.012624645831509419),
 ('demon', 0.012621688470792605),
 ('vulgar', 0.012612150302693319),
 ('engage', 0.012602272831074859),
 ('wives', 0.012601890190118302),
 ('mention', 0.01258159848000647),
 ('if', 0.012569631262234709),
 ('cartoon', 0.012561864177985764),
 ('unbelievably', 0.01255039166831585),
 ('only', 0.012517107727859141),
 ('ended', 0.012507282716729793),
 ('stereotypical', 0.012506426536204342),
 ('spent', 0.012503032775055226),
 ('thing', 0.012483110991541426),
 ('phone', 0.012464039991489132),
 ('stock', 0.01244674214755662),
 ('drop', 0.012432978683590465),
 ('self', 0.012432059211520791),
 ('headache', 0.01242449513419548),
 ('escapes', 0.012419211298248923),
 ('conceived', 0.012392639977060709),
 ('required', 0.012392260947042842),
 ('assassin', 0.012332404091910106),
 ('meat', 0.012327751187890422),
 ('therefore', 0.012316138729629602),
 ('struggling', 0.012308628353572293),
 ('ho', 0.012307714936265706),
 ('ta', 0.012299409649320241),
 ('cold', 0.012289510775209267),
 ('expects', 0.012271684887263188),
 ('furthermore', 0.012263298696316208),
 ('remote', 0.012254529263879222),
 ('cgi', 0.012250569964074172),
 ('arab', 0.012230232115225254),
 ('feminist', 0.012220004405980549),
 ('hair', 0.012213792907949607),
 ('intelligence', 0.012203964889416778),
 ('destroy', 0.01219021390702397),
 ('cameo', 0.012186034087855138),
 ('claus', 0.012181510618531245),
 ('awake', 0.012171290237450141),
 ('sums', 0.012139945909251911),
 ('auto', 0.012126012687040624),
 ('cue', 0.012120943623008961),
 ('speak', 0.012117784815618099),
 ('stereotypes', 0.012106976159466593),
 ('footage', 0.012103658001584281),
 ('maker', 0.012093369539270357),
 ('rental', 0.012083052888147337),
 ('proper', 0.012063210621690414),
 ('mercifully', 0.012047936344961967),
 ('gimmick', 0.012041001769926649),
 ('coherent', 0.012027899920693618),
 ('inane', 0.011993175877578831),
 ('relies', 0.011992345660343809),
 ('nomination', 0.011982252573531251),
 ('segal', 0.011947340234058407),
 ('christians', 0.011946398905489907),
 ('overrated', 0.011926101166626015),
 ('don', 0.011924357980777279),
 ('severely', 0.011916168552237321),
 ('phony', 0.01191382239312172),
 ('selfish', 0.011900529017180249),
 ('resume', 0.011897346320859063),
 ('another', 0.011877684431361642),
 ('sean', 0.01187604021413761),
 ('hepburn', 0.011869243078008906),
 ('secondly', 0.011863109334450275),
 ('ups', 0.011859394818287424),
 ('planet', 0.011852030247443595),
 ('changed', 0.011845335611887473),
 ('amused', 0.011842962845878571),
 ('lowest', 0.011831634819501925),
 ('fools', 0.011824116232842373),
 ('spelling', 0.011821902194872622),
 ('repressed', 0.011821527286346355),
 ('unlikeable', 0.011818760110586484),
 ('failure', 0.011816519901709057),
 ('line', 0.011796438571873895),
 ('hyped', 0.011784666544684309),
 ('anti', 0.011764086315539175),
 ('acting', 0.011752348314205381),
 ('promise', 0.011749711660046621),
 ('observe', 0.01173960895927862),
 ('mindless', 0.011729368774426884),
 ('lacked', 0.011718485221863712),
 ('rather', 0.011704535222487881),
 ('ed', 0.011700096242496993),
 ('significant', 0.011696176501939935),
 ('talks', 0.011678101476086888),
 ('arty', 0.011674972481678902),
 ('spit', 0.011671408526135135),
 ('ilk', 0.011661568455359032),
 ('unoriginal', 0.01165110724584089),
 ('forward', 0.011646719533106092),
 ('toilet', 0.01163552220763908),
 ('suppose', 0.011633258510072193),
 ('feed', 0.011617447517425161),
 ('surrounded', 0.011607897169523132),
 ('wanted', 0.011604506869089728),
 ('tashan', 0.011596205445299114),
 ('dr', 0.011543949281335645),
 ('scare', 0.011543316667712905),
 ('murderer', 0.011535350571639668),
 ('explained', 0.011466329649783223),
 ('cheated', 0.011455846970137714),
 ('whats', 0.011451443577230849),
 ('romance', 0.011445558616225327),
 ('jewish', 0.011441564163643688),
 ('sexual', 0.011438682797255701),
 ('books', 0.011419811777535161),
 ('throwing', 0.011404165894740241),
 ('nose', 0.01139558365172063),
 ('parking', 0.011390688400833916),
 ('pick', 0.011357671445382187),
 ('chose', 0.011354353327826123),
 ('improve', 0.011350584813053918),
 ('kapoor', 0.01134076781407491),
 ('costs', 0.011325900726890985),
 ('saying', 0.011325617629551317),
 ('early', 0.01132052573418809),
 ('technically', 0.011317672837061947),
 ('hackman', 0.011288294849240653),
 ('birthday', 0.011282785404027754),
 ('cinematography', 0.011263572785831694),
 ('hurts', 0.011250154303091526),
 ('saturday', 0.011247837147971238),
 ('meaningless', 0.011239510238506721),
 ('mannered', 0.011239044207972256),
 ('screaming', 0.01123862031022237),
 ('should', 0.011236648355832374),
 ('crazed', 0.011236418275421323),
 ('dignity', 0.011236150963786551),
 ('mate', 0.011216700009844505),
 ('letters', 0.011208675517174492),
 ('recycled', 0.011206236378205576),
 ('promptly', 0.011202237607822147),
 ('inexplicably', 0.011161321811546259),
 ('or', 0.01115296534330535),
 ('simply', 0.011146233896835904),
 ('too', 0.011130044921930284),
 ('nerd', 0.011122543127721441),
 ('chris', 0.011116119389820142),
 ('proceedings', 0.011111786695547103),
 ('lived', 0.011100598930695576),
 ('code', 0.011095425242701426),
 ('potentially', 0.011093285835678526),
 ('open', 0.011075631889800952),
 ('faster', 0.011074177906888309),
 ('moore', 0.011070458274337775),
 ('bowl', 0.011060417562531438),
 ('absolutely', 0.011044130796846871),
 ('just', 0.011033356854991554),
 ('suspension', 0.011031781173072127),
 ('enemy', 0.011025820754518642),
 ('conclusion', 0.010986051066943354),
 ('hospital', 0.010977494845678698),
 ('romances', 0.010962761722118314),
 ('spoke', 0.010962116403553655),
 ('hardly', 0.010960545391113441),
 ('olds', 0.010951344004097443),
 ('creek', 0.01095002392432287),
 ('shouting', 0.010943727502542746),
 ('originality', 0.010912963822714922),
 ('bollywood', 0.010911409137577786),
 ('cape', 0.010902326129518278),
 ('teeth', 0.010900502046002614),
 ('backdrop', 0.010885688008708729),
 ('turn', 0.010880478059425666),
 ('mason', 0.010866951716170662),
 ('grace', 0.010848406257382317),
 ('valley', 0.010845180425875851),
 ('depressing', 0.01082781808673851),
 ('superficial', 0.010826403237558527),
 ('invested', 0.01081248871664086),
 ('bomb', 0.010811727591767118),
 ('embarrass', 0.010778451069403573),
 ('sided', 0.010773707983617683),
 ('sticking', 0.01076229243554771),
 ('common', 0.010754536408451018),
 ('boat', 0.010750196487059148),
 ('promised', 0.010746025901289752),
 ('wayans', 0.010744338945929417),
 ('sheer', 0.01073410327947452),
 ('wrestling', 0.010724515540975418),
 ('staff', 0.010715523520497058),
 ('apollo', 0.010711377643774771),
 ('leigh', 0.010702080598678557),
 ('virtually', 0.010691942663824006),
 ('seagal', 0.010677324100672115),
 ('comes', 0.010674899719725498),
 ('edition', 0.010673353805904194),
 ('predictably', 0.010666551243955751),
 ('stuff', 0.010664915811483258),
 ('gang', 0.010664441184213122),
 ('cancer', 0.010643225900463578),
 ('obviously', 0.010641670080654524),
 ('would', 0.010623530922231167),
 ('totally', 0.010616092995147892),
 ('profile', 0.010596003501785217),
 ('spacey', 0.010595967407784396),
 ('ability', 0.01058459252136016),
 ('horrendous', 0.010580213328532087),
 ('blood', 0.010579520401095315),
 ('imitation', 0.010568550630572965),
 ('bikini', 0.010568043371931098),
 ('talented', 0.010566001035979433),
 ('basis', 0.010564729746933199),
 ('dialogs', 0.010551191397294006),
 ('showing', 0.010548613564454237),
 ('door', 0.010544563357219785),
 ('portray', 0.010527799628490634),
 ('strictly', 0.010526959295132308),
 ('mexican', 0.010508731517822329),
 ('stick', 0.010465961443388684),
 ('east', 0.01045532471601677),
 ('anywhere', 0.01043153273466628),
 ('remake', 0.010419869194952835),
 ('am', 0.010410414209203937),
 ('attempting', 0.010386393998627374),
 ('disturbing', 0.010381152608581447),
 ('jude', 0.010377136500506754),
 ('wondering', 0.0103635126900122),
 ('celebrated', 0.010360111769075862),
 ('use', 0.010350554074714646),
 ('wreck', 0.010344734410393921),
 ('appear', 0.010344438351539169),
 ('entitled', 0.010335246001593064),
 ('youth', 0.010323214445994804),
 ('letdown', 0.010318553446258687),
 ('moran', 0.010305507693633363),
 ('mediocrity', 0.010302827140695373),
 ('news', 0.010292874788426096),
 ('bits', 0.010276065293631165),
 ('alone', 0.010268492053981974),
 ('accents', 0.010263852094534688),
 ('inhabited', 0.010244117693024822),
 ('mock', 0.010244061360675906),
 ('g', 0.010223458175403786),
 ('box', 0.010203304329265748),
 ('term', 0.010199983044386097),
 ('behavior', 0.010198776124373244),
 ('tedium', 0.01019009220150722),
 ('intent', 0.010190038120698576),
 ('husband', 0.010189502265957844),
 ('presence', 0.010187192336074173),
 ('z', 0.010184318583214764),
 ('unappealing', 0.010146391189444366),
 ('much', 0.010136790117697142),
 ('tree', 0.010113534581593914),
 ('doctors', 0.010099854380484188),
 ('pi', 0.010095099419111337),
 ('rodney', 0.010090819798082386),
 ('franchise', 0.010089650929674203),
 ('piece', 0.010086011549585333),
 ('company', 0.010083539582601045),
 ('choppy', 0.010079223420593735),
 ('turned', 0.010069855547990144),
 ('test', 0.010041505355613897),
 ('ball', 0.010040944323609528),
 ('hated', 0.010035509058945867),
 ('bear', 0.010034272465057463),
 ('serves', 0.010027495172169233),
 ('leonard', 0.010022751390164696),
 ('deserved', 0.010022334081283375),
 ('part', 0.01001636043614744),
 ('opportunity', 0.010013126012646695),
 ('turning', 0.010011850960865772),
 ('overacting', 0.010008994714980214),
 ('refer', 0.010006488920574088),
 ('flies', 0.010006418749637628),
 ('uninvolving', 0.0099991338976208148),
 ('produce', 0.0099962014038013792),
 ('jumpy', 0.0099947855808415198),
 ('die', 0.0099914129058670999),
 ('root', 0.0099747135001128275),
 ('insomnia', 0.0099744642555285139),
 ('blatant', 0.0099596620005663883),
 ('larry', 0.0099556905367902578),
 ('threw', 0.0099473965388449607),
 ('billed', 0.0099285818753670936),
 ('bullets', 0.0099281758971005961),
 ('intellectually', 0.0099081388278786167),
 ('rip', 0.0099013233996040825),
 ('stretching', 0.0099012969699172632),
 ('protest', 0.0098984552675623616),
 ('soldiers', 0.0098936923822449188),
 ('flick', 0.009887063364977652),
 ('justin', 0.009862246602717558),
 ('highlights', 0.0098589088020586326),
 ('move', 0.0098539899809540407),
 ('merit', 0.0098431205949966755),
 ('russian', 0.0098411717219841037),
 ('security', 0.0098373450338831055),
 ('idiotic', 0.009834123428814465),
 ('produced', 0.0098294307574257923),
 ('king', 0.0098266872343175573),
 ('magically', 0.0098228842476825642),
 ('united', 0.0098070847890707729),
 ('missile', 0.0097990578193348533),
 ('unlikable', 0.0097869158986480815),
 ('ignorant', 0.009773274317346101),
 ('amateur', 0.009767405987056119),
 ('bachelor', 0.0097673429455405695),
 ('asylum', 0.009762733851977996),
 ('screw', 0.009756809857392721),
 ('report', 0.0097479232699172417),
 ('dracula', 0.0097467323393205605),
 ('removed', 0.0097416519499422052),
 ('confess', 0.0097162925211573305),
 ('brand', 0.0097152534660907564),
 ('conspiracy', 0.0097116972290397056),
 ('horribly', 0.0097083785564252584),
 ('switch', 0.0097026840933795502),
 ('jaws', 0.0096877455513713073),
 ('unsuspecting', 0.009685342503584644),
 ('betty', 0.009677035213332465),
 ('forwarding', 0.0096711196893192793),
 ('university', 0.0096636715878149586),
 ('star', 0.0096623254931800431),
 ('crawl', 0.0096464318968590562),
 ('dopey', 0.0096460863315858646),
 ('ruin', 0.009623010638545728),
 ('lifeless', 0.009622880727487999),
 ('flash', 0.0096193625359650009),
 ('whoever', 0.0096174128915875439),
 ('coincidence', 0.0096024599741402154),
 ('choosing', 0.0095951100051069223),
 ('avid', 0.0095900913284222636),
 ('intended', 0.0095846987041676296),
 ('remained', 0.0095839628178583866),
 ('c', 0.0095732676681762399),
 ('waiting', 0.009556225869434885),
 ('cassie', 0.009548135444223808),
 ('garage', 0.0095349544587830272),
 ('clarke', 0.0095345445855698624),
 ('fortune', 0.0095330396648302101),
 ('interminable', 0.0095328159563552659),
 ('incessant', 0.0095235485026846384),
 ('plots', 0.0095225805490624666),
 ('danger', 0.0095171205654692934),
 ('costumes', 0.0094980144667524448),
 ('evidently', 0.0094952158467012208),
 ('minus', 0.009491149517466128),
 ('reporters', 0.009483681104099086),
 ('israeli', 0.0094750077183364638),
 ('failing', 0.0094711841313976936),
 ('paying', 0.0094692344066851265),
 ('godzilla', 0.0094586915548437855),
 ('dumber', 0.0094582903092924851),
 ('earn', 0.0094476224928425005),
 ('slows', 0.0094467463872487632),
 ('held', 0.0094452736817914849),
 ('chase', 0.0094438362611946516),
 ('lies', 0.0094383969845033399),
 ('hands', 0.0094381781614589055),
 ('grief', 0.00942384945341029),
 ('brains', 0.009418215341663214),
 ('tom', 0.0094130433384347241),
 ('resurrected', 0.0094083423437290557),
 ('asking', 0.0094021029403453284),
 ('sleeps', 0.009401795188265831),
 ('porno', 0.0093907201413965125),
 ('somehow', 0.0093889261270860523),
 ('sarcasm', 0.0093886064393904137),
 ('tie', 0.0093856009366311572),
 ('fall', 0.0093801640008931257),
 ('bring', 0.0093791273545761524),
 ('rape', 0.0093760851230746452),
 ('village', 0.0093684513318614028),
 ('kitchen', 0.0093649071460109607),
 ('concerned', 0.0093611353238811368),
 ('republic', 0.009349942694876422),
 ('hell', 0.0093400360705317275),
 ('inducing', 0.0093382129792553489),
 ('stomach', 0.0093378286385158559),
 ('shambles', 0.0093335457329829768),
 ('virgin', 0.0093312001339055928),
 ('extraneous', 0.0093250413800351293),
 ('cameras', 0.0093229460267977154),
 ('suffers', 0.0093204929924830034),
 ('justified', 0.009316321747936316),
 ('plummer', 0.0092948273285103945),
 ('ponderous', 0.0092880344237223338),
 ('player', 0.0092802296345443642),
 ('survivor', 0.0092767026472125712),
 ('rainy', 0.009269703421813753),
 ('graces', 0.0092620944963291291),
 ...]

In [27]:
import matplotlib.colors as colors

# Collect the 500 highest-ratio (most positive) words that are in the network's vocabulary
words_to_visualize = list()
for word, ratio in pos_neg_ratios.most_common(500):
    if word in mlp_full.word2index:
        words_to_visualize.append(word)

# Then the 500 lowest-ratio (most negative) words, lowest first
for word, ratio in list(reversed(pos_neg_ratios.most_common()))[0:500]:
    if word in mlp_full.word2index:
        words_to_visualize.append(word)
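
The cell above takes words from both ends of the ratio ranking and keeps only those the network knows. A minimal sketch of the same idea on toy data follows. Here `toy_ratios` and `toy_vocab` are made-up stand-ins for `pos_neg_ratios` and `mlp_full.word2index`, `most_polarized` is a hypothetical helper, and `n` replaces the 500 used above.

```python
from collections import Counter

# Hypothetical stand-ins for pos_neg_ratios and mlp_full.word2index
toy_ratios = Counter({"superb": 2.1, "great": 1.3, "plot": 0.0,
                      "boring": -1.6, "awful": -2.4})
toy_vocab = {"superb": 0, "great": 1, "plot": 2, "awful": 3}  # "boring" left out on purpose


def most_polarized(ratios, vocab, n):
    """Return the n highest- and n lowest-ratio words that also appear in vocab."""
    ranked = ratios.most_common()  # sorted from highest to lowest ratio
    top = [w for w, _ in ranked[:n] if w in vocab]
    bottom = [w for w, _ in reversed(ranked[-n:]) if w in vocab]  # lowest ratio first
    return top + bottom


selected = most_polarized(toy_ratios, toy_vocab, n=2)
print(selected)
```

Words missing from the vocabulary are dropped silently, so the result can hold fewer than `2 * n` entries, just as `words_to_visualize` can hold fewer than 1000.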

In [28]:
pos = 0
neg = 0

# Gather each word's row of weights_0_1 and color it by the sign of its ratio
colors_list = list()
vectors_list = list()
for word in words_to_visualize:
    if word in pos_neg_ratios:
        vectors_list.append(mlp_full.weights_0_1[mlp_full.word2index[word]])
        if pos_neg_ratios[word] > 0:
            pos += 1
            colors_list.append("#00ff00")  # green: positive
        else:
            neg += 1
            colors_list.append("#ff0000")  # red: negative
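
The coloring rule in the cell above depends only on the sign of each word's ratio. As a small sketch with made-up values (`toy_scores` and `color_for` are hypothetical names, not part of the notebook), the rule and the two counters can be checked in isolation:

```python
# Hypothetical toy ratios; in the notebook these come from pos_neg_ratios
toy_scores = {"superb": 2.1, "great": 1.3, "awful": -2.4}


def color_for(ratio):
    """Green for a ratio above zero, red otherwise (zero is treated as negative, as above)."""
    return "#00ff00" if ratio > 0 else "#ff0000"


toy_colors = [color_for(toy_scores[w]) for w in toy_scores]
n_pos = toy_colors.count("#00ff00")
n_neg = toy_colors.count("#ff0000")
print(n_pos, n_neg)
```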

In [33]:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, random_state=0)
words_top_ted_tsne = tsne.fit_transform(vectors_list)

In [31]:
p = figure(tools="pan,wheel_zoom,reset,save",
           toolbar_location="above",
           title="vector T-SNE for most polarized words")

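# Keep the colors in the data source; passing an iterable next to `source` is deprecated in Bokeh
source = ColumnDataSource(data=dict(x1=words_top_ted_tsne[:,0],
                                    x2=words_top_ted_tsne[:,1],
                                    names=words_to_visualize,
                                    color=colors_list))

p.scatter(x="x1", y="x2", size=8, source=source, color="color")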

word_labels = LabelSet(x="x1", y="x2", text="names", y_offset=6,
                  text_font_size="8pt", text_color="#555555",
                  source=source, text_align='center')
#p.add_layout(word_labels)

show(p)

# green indicates positive words, red indicates negative words


/opt/conda/envs/dlnd/lib/python3.6/site-packages/bokeh/util/deprecation.py:34: BokehDeprecationWarning: 
Supplying a user-defined data source AND iterable values to glyph methods is deprecated.

See https://github.com/bokeh/bokeh/issues/2056 for more information.

  warn(message)

In [ ]: