Sentiment Classification & How To "Frame Problems" for a Neural Network

by Andrew Trask

What You Should Already Know

  • neural networks, forward and back-propagation
  • stochastic gradient descent
  • mean squared error
  • train/test splits

Where to Get Help if You Need it

  • Re-watch previous Udacity Lectures
  • Leverage the recommended Course Reading Material - Grokking Deep Learning (40% Off: traskud17)
  • Shoot me a tweet @iamtrask

Tutorial Outline:

  • Intro: The Importance of "Framing a Problem"
  • Curate a Dataset
  • Developing a "Predictive Theory"
  • PROJECT 1: Quick Theory Validation
  • Transforming Text to Numbers
  • PROJECT 2: Creating the Input/Output Data
  • Putting it all together in a Neural Network
  • PROJECT 3: Building our Neural Network
  • Understanding Neural Noise
  • PROJECT 4: Making Learning Faster by Reducing Noise
  • Analyzing Inefficiencies in our Network
  • PROJECT 5: Making our Network Train and Run Faster
  • Further Noise Reduction
  • PROJECT 6: Reducing Noise by Strategically Reducing the Vocabulary
  • Analysis: What's going on in the weights?

Lesson: Curate a Dataset


In [1]:
def pretty_print_review_and_label(i):
    # Print the label followed by the first 80 characters of review i.
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# What we know: one review per line.
with open('reviews.txt', 'r') as g:
    reviews = [line.rstrip('\n') for line in g]

# What we WANT to know: one label per line, upper-cased.
with open('labels.txt', 'r') as g:
    labels = [line.rstrip('\n').upper() for line in g]

In [2]:
len(reviews)


Out[2]:
25000

In [3]:
reviews[0]


Out[3]:
'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

In [4]:
labels[0]


Out[4]:
'POSITIVE'

Lesson: Develop a Predictive Theory


In [5]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)


labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...

Project 1: Quick Theory Validation


In [6]:
from collections import Counter
import numpy as np

In [7]:
positive_counts = Counter()
negative_counts = Counter()
total_counts = Counter()

In [8]:
# Tally every word occurrence, both per label and overall.
for review, label in zip(reviews, labels):
    counts = positive_counts if label == 'POSITIVE' else negative_counts
    for word in review.split(" "):
        counts[word] += 1
        total_counts[word] += 1
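
The counting step above can be tried on a tiny dataset to see what the three counters end up holding. This is only a sketch: `toy_reviews`, `toy_labels`, and the `toy_*` counters are made-up names and data, not part of the notebook, and `Counter.update` is used as a shorthand for the per-word increments.

```python
from collections import Counter

# Hypothetical toy data standing in for `reviews` and `labels`.
toy_reviews = ["great great film", "terrible film", "great story"]
toy_labels = ["POSITIVE", "NEGATIVE", "POSITIVE"]

toy_positive = Counter()
toy_negative = Counter()
toy_total = Counter()

for review, label in zip(toy_reviews, toy_labels):
    words = review.split(" ")
    # Counter.update adds one count per occurrence of each word.
    if label == "POSITIVE":
        toy_positive.update(words)
    else:
        toy_negative.update(words)
    toy_total.update(words)

print(toy_positive.most_common())
print(toy_negative.most_common())
print(toy_total.most_common())
```

Words that appear under both labels (here "film") show up in both per-label counters, which is why raw counts alone say little about sentiment.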

In [9]:
positive_counts.most_common()


Out[9]:
[('', 550468),
 ('the', 173324),
 ('.', 159654),
 ('and', 89722),
 ('a', 83688),
 ('of', 76855),
 ('to', 66746),
 ('is', 57245),
 ('in', 50215),
 ('br', 49235),
 ('it', 48025),
 ('i', 40743),
 ('that', 35630),
 ('this', 35080),
 ('s', 33815),
 ('as', 26308),
 ('with', 23247),
 ('for', 22416),
 ('was', 21917),
 ('film', 20937),
 ('but', 20822),
 ('movie', 19074),
 ('his', 17227),
 ('on', 17008),
 ('you', 16681),
 ('he', 16282),
 ('are', 14807),
 ('not', 14272),
 ('t', 13720),
 ('one', 13655),
 ('have', 12587),
 ('be', 12416),
 ('by', 11997),
 ('all', 11942),
 ('who', 11464),
 ('an', 11294),
 ('at', 11234),
 ('from', 10767),
 ('her', 10474),
 ('they', 9895),
 ('has', 9186),
 ('so', 9154),
 ('like', 9038),
 ('about', 8313),
 ('very', 8305),
 ('out', 8134),
 ('there', 8057),
 ('she', 7779),
 ('what', 7737),
 ('or', 7732),
 ('good', 7720),
 ('more', 7521),
 ('when', 7456),
 ('some', 7441),
 ('if', 7285),
 ('just', 7152),
 ('can', 7001),
 ('story', 6780),
 ('time', 6515),
 ('my', 6488),
 ('great', 6419),
 ('well', 6405),
 ('up', 6321),
 ('which', 6267),
 ('their', 6107),
 ('see', 6026),
 ('also', 5550),
 ('we', 5531),
 ('really', 5476),
 ('would', 5400),
 ('will', 5218),
 ('me', 5167),
 ('had', 5148),
 ('only', 5137),
 ('him', 5018),
 ('even', 4964),
 ('most', 4864),
 ('other', 4858),
 ('were', 4782),
 ('first', 4755),
 ('than', 4736),
 ('much', 4685),
 ('its', 4622),
 ('no', 4574),
 ('into', 4544),
 ('people', 4479),
 ('best', 4319),
 ('love', 4301),
 ('get', 4272),
 ('how', 4213),
 ('life', 4199),
 ('been', 4189),
 ('because', 4079),
 ('way', 4036),
 ('do', 3941),
 ('made', 3823),
 ('films', 3813),
 ('them', 3805),
 ('after', 3800),
 ('many', 3766),
 ('two', 3733),
 ('too', 3659),
 ('think', 3655),
 ('movies', 3586),
 ('characters', 3560),
 ('character', 3514),
 ('don', 3468),
 ('man', 3460),
 ('show', 3432),
 ('watch', 3424),
 ('seen', 3414),
 ('then', 3358),
 ('little', 3341),
 ('still', 3340),
 ('make', 3303),
 ('could', 3237),
 ('never', 3226),
 ('being', 3217),
 ('where', 3173),
 ('does', 3069),
 ('over', 3017),
 ('any', 3002),
 ('while', 2899),
 ('know', 2833),
 ('did', 2790),
 ('years', 2758),
 ('here', 2740),
 ('ever', 2734),
 ('end', 2696),
 ('these', 2694),
 ('such', 2590),
 ('real', 2568),
 ('scene', 2567),
 ('back', 2547),
 ('those', 2485),
 ('though', 2475),
 ('off', 2463),
 ('new', 2458),
 ('your', 2453),
 ('go', 2440),
 ('acting', 2437),
 ('plot', 2432),
 ('world', 2429),
 ('scenes', 2427),
 ('say', 2414),
 ('through', 2409),
 ('makes', 2390),
 ('better', 2381),
 ('now', 2368),
 ('work', 2346),
 ('young', 2343),
 ('old', 2311),
 ('ve', 2307),
 ('find', 2272),
 ('both', 2248),
 ('before', 2177),
 ('us', 2162),
 ('again', 2158),
 ('series', 2153),
 ('quite', 2143),
 ('something', 2135),
 ('cast', 2133),
 ('should', 2121),
 ('part', 2098),
 ('always', 2088),
 ('lot', 2087),
 ('another', 2075),
 ('actors', 2047),
 ('director', 2040),
 ('family', 2032),
 ('between', 2016),
 ('own', 2016),
 ('m', 1998),
 ('may', 1997),
 ('same', 1972),
 ('role', 1967),
 ('watching', 1966),
 ('every', 1954),
 ('funny', 1953),
 ('doesn', 1935),
 ('performance', 1928),
 ('few', 1918),
 ('bad', 1907),
 ('look', 1900),
 ('re', 1884),
 ('why', 1855),
 ('things', 1849),
 ('times', 1832),
 ('big', 1815),
 ('however', 1795),
 ('actually', 1790),
 ('action', 1789),
 ('going', 1783),
 ('bit', 1757),
 ('comedy', 1742),
 ('down', 1740),
 ('music', 1738),
 ('must', 1728),
 ('take', 1709),
 ('saw', 1692),
 ('long', 1690),
 ('right', 1688),
 ('fun', 1686),
 ('fact', 1684),
 ('excellent', 1683),
 ('around', 1674),
 ('didn', 1672),
 ('without', 1671),
 ('thing', 1662),
 ('thought', 1639),
 ('got', 1635),
 ('each', 1630),
 ('day', 1614),
 ('feel', 1597),
 ('seems', 1596),
 ('come', 1594),
 ('done', 1586),
 ('beautiful', 1580),
 ('especially', 1572),
 ('played', 1571),
 ('almost', 1566),
 ('want', 1562),
 ('yet', 1556),
 ('give', 1553),
 ('pretty', 1549),
 ('last', 1543),
 ('since', 1519),
 ('different', 1504),
 ('although', 1501),
 ('gets', 1490),
 ('true', 1487),
 ('interesting', 1481),
 ('job', 1470),
 ('enough', 1455),
 ('our', 1454),
 ('shows', 1447),
 ('horror', 1441),
 ('woman', 1439),
 ('tv', 1400),
 ('probably', 1398),
 ('father', 1395),
 ('original', 1393),
 ('girl', 1390),
 ('point', 1379),
 ('plays', 1378),
 ('wonderful', 1372),
 ('far', 1358),
 ('course', 1358),
 ('john', 1350),
 ('rather', 1340),
 ('isn', 1328),
 ('ll', 1326),
 ('later', 1324),
 ('dvd', 1324),
 ('whole', 1310),
 ('war', 1310),
 ('d', 1307),
 ('found', 1306),
 ('away', 1306),
 ('screen', 1305),
 ('nothing', 1300),
 ('year', 1297),
 ('once', 1296),
 ('hard', 1294),
 ('together', 1280),
 ('set', 1277),
 ('am', 1277),
 ('having', 1266),
 ('making', 1265),
 ('place', 1263),
 ('might', 1260),
 ('comes', 1260),
 ('sure', 1253),
 ('american', 1248),
 ('play', 1245),
 ('kind', 1244),
 ('perfect', 1242),
 ('takes', 1242),
 ('performances', 1237),
 ('himself', 1230),
 ('worth', 1221),
 ('everyone', 1221),
 ('anyone', 1214),
 ('actor', 1203),
 ('three', 1201),
 ('wife', 1196),
 ('classic', 1192),
 ('goes', 1186),
 ('ending', 1178),
 ('version', 1168),
 ('star', 1149),
 ('enjoy', 1146),
 ('book', 1142),
 ('nice', 1132),
 ('everything', 1128),
 ('during', 1124),
 ('put', 1118),
 ('seeing', 1111),
 ('least', 1102),
 ('house', 1100),
 ('high', 1095),
 ('watched', 1094),
 ('loved', 1087),
 ('men', 1087),
 ('night', 1082),
 ('anything', 1075),
 ('believe', 1071),
 ('guy', 1071),
 ('top', 1063),
 ('amazing', 1058),
 ('hollywood', 1056),
 ('looking', 1053),
 ('main', 1044),
 ('definitely', 1043),
 ('gives', 1031),
 ('home', 1029),
 ('seem', 1028),
 ('episode', 1023),
 ('audience', 1020),
 ('sense', 1020),
 ('truly', 1017),
 ('special', 1011),
 ('second', 1009),
 ('short', 1009),
 ('fan', 1009),
 ('mind', 1005),
 ('human', 1001),
 ('recommend', 999),
 ('full', 996),
 ('black', 995),
 ('help', 991),
 ('along', 989),
 ('trying', 987),
 ('small', 986),
 ('death', 985),
 ('friends', 981),
 ('remember', 974),
 ('often', 970),
 ('said', 966),
 ('favorite', 962),
 ('heart', 959),
 ('early', 957),
 ('left', 956),
 ('until', 955),
 ('script', 954),
 ('let', 954),
 ('maybe', 937),
 ('today', 936),
 ('live', 934),
 ('less', 934),
 ('moments', 933),
 ('others', 929),
 ('brilliant', 926),
 ('shot', 925),
 ('liked', 923),
 ('become', 916),
 ('won', 915),
 ('used', 910),
 ('style', 907),
 ('mother', 895),
 ('lives', 894),
 ('came', 893),
 ('stars', 890),
 ('cinema', 889),
 ('looks', 885),
 ('perhaps', 884),
 ('read', 882),
 ('enjoyed', 879),
 ('boy', 875),
 ('drama', 873),
 ('highly', 871),
 ('given', 870),
 ('playing', 867),
 ('use', 864),
 ('next', 859),
 ('women', 858),
 ('fine', 857),
 ('effects', 856),
 ('kids', 854),
 ('entertaining', 853),
 ('need', 852),
 ('line', 850),
 ('works', 848),
 ('someone', 847),
 ('mr', 836),
 ('simply', 835),
 ('picture', 833),
 ('children', 833),
 ('face', 831),
 ('keep', 831),
 ('friend', 831),
 ('dark', 830),
 ('overall', 828),
 ('certainly', 828),
 ('minutes', 827),
 ('wasn', 824),
 ('history', 822),
 ('finally', 820),
 ('couple', 816),
 ('against', 815),
 ('son', 809),
 ('understand', 808),
 ('lost', 807),
 ('michael', 805),
 ('else', 801),
 ('throughout', 798),
 ('fans', 797),
 ('city', 792),
 ('reason', 789),
 ('written', 787),
 ('production', 787),
 ('several', 784),
 ('school', 783),
 ('based', 781),
 ('rest', 781),
 ('try', 780),
 ('dead', 776),
 ('hope', 775),
 ('strong', 768),
 ('white', 765),
 ('tell', 759),
 ('itself', 758),
 ('half', 753),
 ('person', 749),
 ('sometimes', 746),
 ('past', 744),
 ('start', 744),
 ('genre', 743),
 ('beginning', 739),
 ('final', 739),
 ('town', 738),
 ('art', 734),
 ('humor', 732),
 ('game', 732),
 ('yes', 731),
 ('idea', 731),
 ('late', 730),
 ('becomes', 729),
 ('despite', 729),
 ('able', 726),
 ('case', 726),
 ('money', 723),
 ('child', 721),
 ('completely', 721),
 ('side', 719),
 ('camera', 716),
 ('getting', 714),
 ('instead', 712),
 ('soon', 702),
 ('under', 700),
 ('viewer', 699),
 ('age', 697),
 ('days', 696),
 ('stories', 696),
 ('felt', 694),
 ('simple', 694),
 ('roles', 693),
 ('video', 688),
 ('name', 683),
 ('either', 683),
 ('doing', 677),
 ('turns', 674),
 ('wants', 671),
 ('close', 671),
 ('title', 669),
 ('wrong', 668),
 ('went', 666),
 ('james', 665),
 ('evil', 659),
 ('budget', 657),
 ('episodes', 657),
 ('relationship', 655),
 ('fantastic', 653),
 ('piece', 653),
 ('david', 651),
 ('turn', 648),
 ('murder', 646),
 ('parts', 645),
 ('brother', 644),
 ('absolutely', 643),
 ('head', 643),
 ('experience', 642),
 ('eyes', 641),
 ('sex', 638),
 ('direction', 637),
 ('called', 637),
 ('directed', 636),
 ('lines', 634),
 ('behind', 633),
 ('sort', 632),
 ('actress', 631),
 ('lead', 630),
 ('oscar', 628),
 ('including', 627),
 ('example', 627),
 ('known', 625),
 ('musical', 625),
 ('chance', 621),
 ('score', 620),
 ('already', 619),
 ('feeling', 619),
 ('hit', 619),
 ('voice', 615),
 ('moment', 612),
 ('living', 612),
 ('low', 610),
 ('supporting', 610),
 ('ago', 609),
 ('themselves', 608),
 ('reality', 605),
 ('hilarious', 605),
 ('jack', 604),
 ('told', 603),
 ('hand', 601),
 ('quality', 600),
 ('moving', 600),
 ('dialogue', 600),
 ('song', 599),
 ('happy', 599),
 ('matter', 598),
 ('paul', 598),
 ('light', 594),
 ('future', 593),
 ('entire', 592),
 ('finds', 591),
 ('gave', 589),
 ('laugh', 587),
 ('released', 586),
 ('expect', 584),
 ('fight', 581),
 ('particularly', 580),
 ('cinematography', 579),
 ('police', 579),
 ('whose', 578),
 ('type', 578),
 ('sound', 578),
 ('view', 573),
 ('enjoyable', 573),
 ('number', 572),
 ('romantic', 572),
 ('husband', 572),
 ('daughter', 572),
 ('documentary', 571),
 ('self', 570),
 ('superb', 569),
 ('modern', 569),
 ('took', 569),
 ('robert', 569),
 ('mean', 566),
 ('shown', 563),
 ('coming', 561),
 ('important', 560),
 ('king', 559),
 ('leave', 559),
 ('change', 558),
 ('somewhat', 555),
 ('wanted', 555),
 ('tells', 554),
 ('events', 552),
 ('run', 552),
 ('career', 552),
 ('country', 552),
 ('heard', 550),
 ('season', 550),
 ('greatest', 549),
 ('girls', 549),
 ('etc', 547),
 ('care', 546),
 ('starts', 545),
 ('english', 542),
 ('killer', 541),
 ('tale', 540),
 ('guys', 540),
 ('totally', 540),
 ('animation', 540),
 ('usual', 539),
 ('miss', 535),
 ('opinion', 535),
 ('easy', 531),
 ('violence', 531),
 ('songs', 530),
 ('british', 528),
 ('says', 526),
 ('realistic', 525),
 ('writing', 524),
 ('writer', 522),
 ('act', 522),
 ('comic', 521),
 ('thriller', 519),
 ('television', 517),
 ('power', 516),
 ('ones', 515),
 ('kid', 514),
 ('york', 513),
 ('novel', 513),
 ('alone', 512),
 ('problem', 512),
 ('attention', 509),
 ('involved', 508),
 ('kill', 507),
 ('extremely', 507),
 ('seemed', 506),
 ('hero', 505),
 ('french', 505),
 ('rock', 504),
 ('stuff', 501),
 ('wish', 499),
 ('begins', 498),
 ('taken', 497),
 ('sad', 497),
 ('ways', 496),
 ('richard', 495),
 ('knows', 494),
 ('atmosphere', 493),
 ('similar', 491),
 ('surprised', 491),
 ('taking', 491),
 ('car', 491),
 ('george', 490),
 ('perfectly', 490),
 ('across', 489),
 ('team', 489),
 ('eye', 489),
 ('sequence', 489),
 ('room', 488),
 ('due', 488),
 ('among', 488),
 ('serious', 488),
 ('powerful', 488),
 ('strange', 487),
 ('order', 487),
 ('cannot', 487),
 ('b', 487),
 ('beauty', 486),
 ('famous', 485),
 ('happened', 484),
 ('tries', 484),
 ('herself', 484),
 ('myself', 484),
 ('class', 483),
 ('four', 482),
 ('cool', 481),
 ('release', 479),
 ('anyway', 479),
 ('theme', 479),
 ('opening', 478),
 ('entertainment', 477),
 ('slow', 475),
 ('ends', 475),
 ('unique', 475),
 ('exactly', 475),
 ('easily', 474),
 ('level', 474),
 ('o', 474),
 ('red', 474),
 ('interest', 472),
 ('happen', 471),
 ('crime', 470),
 ('viewing', 468),
 ('sets', 467),
 ('memorable', 467),
 ('stop', 466),
 ('group', 466),
 ('problems', 463),
 ('dance', 463),
 ('working', 463),
 ('sister', 463),
 ('message', 463),
 ('knew', 462),
 ('mystery', 461),
 ('nature', 461),
 ('bring', 460),
 ('believable', 459),
 ('thinking', 459),
 ('brought', 459),
 ('mostly', 458),
 ('disney', 457),
 ('couldn', 457),
 ('society', 456),
 ('lady', 455),
 ('within', 455),
 ('blood', 454),
 ('parents', 453),
 ('upon', 453),
 ('viewers', 453),
 ('meets', 452),
 ('form', 452),
 ('peter', 452),
 ('tom', 452),
 ('usually', 452),
 ('soundtrack', 452),
 ('local', 450),
 ('certain', 448),
 ('follow', 448),
 ('whether', 447),
 ('possible', 446),
 ('emotional', 445),
 ('killed', 444),
 ('above', 444),
 ('de', 444),
 ('god', 443),
 ('middle', 443),
 ('needs', 442),
 ('happens', 442),
 ('flick', 442),
 ('masterpiece', 441),
 ('period', 440),
 ('major', 440),
 ('named', 439),
 ('haven', 439),
 ('particular', 438),
 ('th', 438),
 ('earth', 437),
 ('feature', 437),
 ('stand', 436),
 ('words', 435),
 ('typical', 435),
 ('elements', 433),
 ('obviously', 433),
 ('romance', 431),
 ('jane', 430),
 ('yourself', 427),
 ('showing', 427),
 ('brings', 426),
 ('fantasy', 426),
 ('guess', 423),
 ('america', 423),
 ('unfortunately', 422),
 ('huge', 422),
 ('indeed', 421),
 ('running', 421),
 ('talent', 420),
 ('stage', 419),
 ('started', 418),
 ('leads', 417),
 ('sweet', 417),
 ('japanese', 417),
 ('poor', 416),
 ('deal', 416),
 ('incredible', 413),
 ('personal', 413),
 ('fast', 412),
 ('became', 410),
 ('deep', 410),
 ('hours', 409),
 ('giving', 408),
 ('nearly', 408),
 ('dream', 408),
 ('clearly', 407),
 ('turned', 407),
 ('obvious', 406),
 ('near', 406),
 ('cut', 405),
 ('surprise', 405),
 ('era', 404),
 ('body', 404),
 ('hour', 403),
 ('female', 403),
 ('five', 403),
 ('note', 399),
 ('learn', 398),
 ('truth', 398),
 ('except', 397),
 ('feels', 397),
 ('match', 397),
 ('tony', 397),
 ('filmed', 394),
 ('clear', 394),
 ('complete', 394),
 ('street', 393),
 ('eventually', 393),
 ('keeps', 393),
 ('older', 393),
 ('lots', 393),
 ('buy', 392),
 ('william', 391),
 ('stewart', 391),
 ('fall', 390),
 ('joe', 390),
 ('meet', 390),
 ('unlike', 389),
 ('talking', 389),
 ('shots', 389),
 ('rating', 389),
 ('difficult', 389),
 ('dramatic', 388),
 ('means', 388),
 ('situation', 386),
 ('wonder', 386),
 ('present', 386),
 ('appears', 386),
 ('subject', 386),
 ('comments', 385),
 ('general', 383),
 ('sequences', 383),
 ('lee', 383),
 ('points', 382),
 ('earlier', 382),
 ('gone', 379),
 ('check', 379),
 ('suspense', 378),
 ('recommended', 378),
 ('ten', 378),
 ('third', 377),
 ('business', 377),
 ('talk', 375),
 ('leaves', 375),
 ('beyond', 375),
 ('portrayal', 374),
 ('beautifully', 373),
 ('single', 372),
 ('bill', 372),
 ('plenty', 371),
 ('word', 371),
 ('whom', 370),
 ('falls', 370),
 ('scary', 369),
 ('non', 369),
 ('figure', 369),
 ('battle', 369),
 ('using', 368),
 ('return', 368),
 ('doubt', 367),
 ('add', 367),
 ('hear', 366),
 ('solid', 366),
 ('success', 366),
 ('jokes', 365),
 ('oh', 365),
 ('touching', 365),
 ('political', 365),
 ('hell', 364),
 ('awesome', 364),
 ('boys', 364),
 ('sexual', 362),
 ('recently', 362),
 ('dog', 362),
 ('please', 361),
 ('wouldn', 361),
 ('straight', 361),
 ('features', 361),
 ('forget', 360),
 ('setting', 360),
 ('lack', 360),
 ('married', 359),
 ('mark', 359),
 ('social', 357),
 ('interested', 356),
 ('adventure', 356),
 ('actual', 355),
 ('terrific', 355),
 ('sees', 355),
 ('brothers', 355),
 ('move', 354),
 ('call', 354),
 ('various', 353),
 ('theater', 353),
 ('dr', 353),
 ('animated', 352),
 ('western', 351),
 ('baby', 350),
 ('space', 350),
 ('leading', 348),
 ('disappointed', 348),
 ('portrayed', 346),
 ('aren', 346),
 ('screenplay', 345),
 ('smith', 345),
 ('towards', 344),
 ('hate', 344),
 ('noir', 343),
 ('outstanding', 342),
 ('decent', 342),
 ('kelly', 342),
 ('directors', 341),
 ('journey', 341),
 ('none', 340),
 ('looked', 340),
 ('effective', 340),
 ('storyline', 339),
 ('caught', 339),
 ('sci', 339),
 ('fi', 339),
 ('cold', 339),
 ('mary', 339),
 ('rich', 338),
 ('charming', 338),
 ('popular', 337),
 ('rare', 337),
 ('manages', 337),
 ('harry', 337),
 ('spirit', 336),
 ('appreciate', 335),
 ('open', 335),
 ('moves', 334),
 ('basically', 334),
 ('acted', 334),
 ('inside', 333),
 ('boring', 333),
 ('century', 333),
 ('mention', 333),
 ('deserves', 333),
 ('subtle', 333),
 ('pace', 333),
 ('familiar', 332),
 ('background', 332),
 ('ben', 331),
 ('creepy', 330),
 ('supposed', 330),
 ('secret', 329),
 ('die', 328),
 ('jim', 328),
 ('question', 327),
 ('effect', 327),
 ('natural', 327),
 ('impressive', 326),
 ('rate', 326),
 ('language', 326),
 ('saying', 325),
 ('intelligent', 325),
 ('telling', 324),
 ('realize', 324),
 ('material', 324),
 ('scott', 324),
 ('singing', 323),
 ('dancing', 322),
 ('visual', 321),
 ('adult', 321),
 ('imagine', 321),
 ('kept', 320),
 ('office', 320),
 ('uses', 319),
 ('pure', 318),
 ('wait', 318),
 ('stunning', 318),
 ('review', 317),
 ('previous', 317),
 ('copy', 317),
 ('seriously', 317),
 ('reading', 316),
 ('create', 316),
 ('hot', 316),
 ('created', 316),
 ('magic', 316),
 ('somehow', 316),
 ('stay', 315),
 ('attempt', 315),
 ('escape', 315),
 ('crazy', 315),
 ('air', 315),
 ('frank', 315),
 ('hands', 314),
 ('filled', 313),
 ('expected', 312),
 ('average', 312),
 ('surprisingly', 312),
 ('complex', 311),
 ('quickly', 310),
 ('successful', 310),
 ('studio', 310),
 ('plus', 309),
 ('male', 309),
 ('co', 307),
 ('images', 306),
 ('casting', 306),
 ('following', 306),
 ('minute', 306),
 ('exciting', 306),
 ('members', 305),
 ('follows', 305),
 ('themes', 305),
 ('german', 305),
 ('reasons', 305),
 ('e', 305),
 ('touch', 304),
 ('edge', 304),
 ('free', 304),
 ('cute', 304),
 ('genius', 304),
 ('outside', 303),
 ('reviews', 302),
 ('admit', 302),
 ('ok', 302),
 ('younger', 302),
 ('fighting', 301),
 ('odd', 301),
 ('master', 301),
 ('recent', 300),
 ('thanks', 300),
 ('break', 300),
 ('comment', 300),
 ('apart', 299),
 ('emotions', 298),
 ('lovely', 298),
 ('begin', 298),
 ('doctor', 297),
 ('party', 297),
 ('italian', 297),
 ('la', 296),
 ('missed', 296),
 ...]
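
The `('', 550468)` entry at the top of this output is most likely a side effect of splitting on a single space: wherever a review contains two or more consecutive spaces, `split(" ")` yields empty strings. A minimal sketch of that behavior, using a made-up sample string rather than a line from the dataset:

```python
from collections import Counter

# Hypothetical sample text with doubled and tripled spaces,
# similar in shape to the reviews shown earlier.
sample = "such as  teachers  . my   years"

# Splitting on a single space keeps an empty string for every extra space.
with_empties = Counter(sample.split(" "))

# Splitting with no argument collapses runs of whitespace,
# so no empty-string tokens are produced.
without_empties = Counter(sample.split())

print(with_empties[""])     # number of empty-string tokens
print(without_empties[""])  # 0
```

Using `split()` in the counting loop would drop the `''` entry and leave the other word counts unchanged.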

In [12]:
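pos_neg_ratios = Counter()

# Positive-to-negative usage ratio, only for words seen more than 100 times.
for term, cnt in total_counts.most_common():
    if cnt > 100:
        # The +1 in the denominator avoids division by zero.
        pos_neg_ratios[term] = positive_counts[term] / float(negative_counts[term] + 1)

# Move the ratios to a log scale so that neutral words sit near zero.
for word, ratio in pos_neg_ratios.most_common():
    if ratio > 1:
        pos_neg_ratios[word] = np.log(ratio)
    else:
        # Same as np.log(ratio + 0.01); the offset keeps the log finite when ratio is 0.
        pos_neg_ratios[word] = -np.log(1 / (ratio + 0.01))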
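
To see why the ratios are moved to a log scale, it can help to apply the same transformation to a few hand-picked counts. The sketch below mirrors the cell above for a single word. The helper name `log_ratio` and all the counts are hypothetical, and `math.log` is swapped in for `np.log` only to keep the example free of dependencies.

```python
import math

def log_ratio(pos_count, neg_count):
    """Apply the cell's ratio-then-log transformation to one word's counts."""
    ratio = pos_count / float(neg_count + 1)  # +1 avoids division by zero
    if ratio > 1:
        return math.log(ratio)
    # The 0.01 offset keeps the log finite when pos_count is 0.
    return -math.log(1 / (ratio + 0.01))

# Hypothetical counts, chosen only to illustrate the shape of the scores.
print(log_ratio(400, 99))   # raw ratio 4.0:  clearly positive score
print(log_ratio(100, 399))  # raw ratio 0.25: clearly negative score
print(log_ratio(100, 99))   # raw ratio 1.0:  score close to zero
print(log_ratio(0, 200))    # raw ratio 0.0:  strongly negative but finite
```

On the raw scale, a word used four times as often in positive reviews scores 4 while its negative counterpart scores 0.25. After the log, the two land at roughly equal distances on either side of zero, which makes the positive and negative ends of the ranking comparable.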

In [13]:
# words with the highest positive-to-negative log ratio, i.e. most strongly tied to a "POSITIVE" label
pos_neg_ratios.most_common()


Out[13]:
[('edie', 4.6913478822291435),
 ('paulie', 4.0775374439057197),
 ('felix', 3.1527360223636558),
 ('polanski', 2.8233610476132043),
 ('matthau', 2.8067217286092401),
 ('victoria', 2.6810215287142909),
 ('mildred', 2.6026896854443837),
 ('gandhi', 2.5389738710582761),
 ('flawless', 2.451005098112319),
 ('superbly', 2.2600254785752498),
 ('perfection', 2.1594842493533721),
 ('astaire', 2.1400661634962708),
 ('captures', 2.0386195471595809),
 ('voight', 2.0301704926730531),
 ('wonderfully', 2.0218960560332353),
 ('powell', 1.9783454248084671),
 ('brosnan', 1.9547990964725592),
 ('lily', 1.9203768470501485),
 ('bakshi', 1.9029851043382795),
 ('lincoln', 1.9014583864844796),
 ('refreshing', 1.8551812956655511),
 ('breathtaking', 1.8481124057791867),
 ('bourne', 1.8478489358790986),
 ('lemmon', 1.8458266904983307),
 ('delightful', 1.8002701588959635),
 ('flynn', 1.7996646487351682),
 ('andrews', 1.7764919970972666),
 ('homer', 1.7692866133759964),
 ('beautifully', 1.7626953362841438),
 ('soccer', 1.7578579175523736),
 ('elvira', 1.7397031072720019),
 ('underrated', 1.7197859696029656),
 ('gripping', 1.7165360479904674),
 ('superb', 1.7091514458966952),
 ('delight', 1.6714733033535532),
 ('welles', 1.6677068205580761),
 ('sadness', 1.663505133704376),
 ('sinatra', 1.6389967146756448),
 ('touching', 1.637217476541176),
 ('timeless', 1.62924053973028),
 ('macy', 1.6211339521972916),
 ('unforgettable', 1.6177367152487956),
 ('favorites', 1.6158688027643908),
 ('stewart', 1.6119987332957739),
 ('sullivan', 1.6094379124341003),
 ('extraordinary', 1.6094379124341003),
 ('hartley', 1.6094379124341003),
 ('brilliantly', 1.5950491749820008),
 ('friendship', 1.5677652160335325),
 ('wonderful', 1.5645425925262093),
 ('palma', 1.5553706911638245),
 ('magnificent', 1.54663701119507),
 ('finest', 1.5462590108125689),
 ('jackie', 1.5439233053234738),
 ('ritter', 1.5404450409471491),
 ('tremendous', 1.5184661342283736),
 ('freedom', 1.5091151908062312),
 ('fantastic', 1.5048433868558566),
 ('terrific', 1.5026699370083942),
 ('noir', 1.493925025312256),
 ('sidney', 1.493925025312256),
 ('outstanding', 1.4910053152089213),
 ('pleasantly', 1.4894785973551214),
 ('mann', 1.4894785973551214),
 ('nancy', 1.488077055429833),
 ('marie', 1.4825711915553104),
 ('marvelous', 1.4739999415389962),
 ('excellent', 1.4647538505723599),
 ('ruth', 1.4596256342054401),
 ('stanwyck', 1.4412101187160054),
 ('widmark', 1.4350845252893227),
 ('splendid', 1.4271163556401458),
 ('chan', 1.423108334242607),
 ('exceptional', 1.4201959127955721),
 ('tender', 1.410986973710262),
 ('gentle', 1.4078005663408544),
 ('poignant', 1.4022947024663317),
 ('gem', 1.3932148039644643),
 ('amazing', 1.3919815802404802),
 ('chilling', 1.3862943611198906),
 ('fisher', 1.3862943611198906),
 ('davies', 1.3862943611198906),
 ('captivating', 1.3862943611198906),
 ('darker', 1.3652409519220583),
 ('april', 1.3499267169490159),
 ('kelly', 1.3461743673304654),
 ('blake', 1.3418425985490567),
 ('overlooked', 1.329135947279942),
 ('ralph', 1.32818673031261),
 ('bette', 1.3156767939059373),
 ('hoffman', 1.3150668518315229),
 ('cole', 1.3121863889661687),
 ('shines', 1.3049487216659381),
 ('powerful', 1.2999662776313934),
 ('notch', 1.2950456896547455),
 ('remarkable', 1.2883688239495823),
 ('pitt', 1.286210902562908),
 ('winters', 1.2833463918674481),
 ('vivid', 1.2762934659055623),
 ('gritty', 1.2757524867200667),
 ('giallo', 1.2745029551317739),
 ('portrait', 1.2704625455947689),
 ('innocence', 1.2694300209805796),
 ('psychiatrist', 1.2685113254635072),
 ('favorite', 1.2668956297860055),
 ('ensemble', 1.2656663733312759),
 ('stunning', 1.2622417124499117),
 ('burns', 1.259880436264232),
 ('garbo', 1.258954938743289),
 ('barbara', 1.2580400255962119),
 ('philip', 1.2527629684953681),
 ('panic', 1.2527629684953681),
 ('holly', 1.2527629684953681),
 ('carol', 1.2481440226390734),
 ('perfect', 1.246742480713785),
 ('appreciated', 1.2462482874741743),
 ('favourite', 1.2411123512753928),
 ('journey', 1.2367626271489269),
 ('rural', 1.235471471385307),
 ('bond', 1.2321436812926323),
 ('builds', 1.2305398317106577),
 ('brilliant', 1.2287554137664785),
 ('brooklyn', 1.2286654169163074),
 ('von', 1.225175011976539),
 ('recommended', 1.2163953243244932),
 ('unfolds', 1.2163953243244932),
 ('daniel', 1.20215296760895),
 ('perfectly', 1.1971931173405572),
 ('crafted', 1.1962507582320256),
 ('prince', 1.1939224684724346),
 ('troubled', 1.192138346678933),
 ('consequences', 1.1865810616140668),
 ('haunting', 1.1814999484738773),
 ('cinderella', 1.180052620608284),
 ('alexander', 1.1759989522835299),
 ('emotions', 1.1753049094563641),
 ('boxing', 1.1735135968412274),
 ('subtle', 1.1734135017508081),
 ('curtis', 1.1649873576129823),
 ('rare', 1.1566438362402944),
 ('loved', 1.1563661500586044),
 ('daughters', 1.1526795099383853),
 ('courage', 1.1438688802562305),
 ('dentist', 1.1426722784621401),
 ('highly', 1.1420208631618658),
 ('nominated', 1.1409146683587992),
 ('tony', 1.1397491942285991),
 ('draws', 1.1325138403437911),
 ('everyday', 1.1306150197542835),
 ('contrast', 1.1284652518177909),
 ('cried', 1.1213405397456659),
 ('fabulous', 1.1210851445201684),
 ('ned', 1.120591195386885),
 ('fay', 1.120591195386885),
 ('emma', 1.1184149159642893),
 ('sensitive', 1.113318436057805),
 ('smooth', 1.1089750757036563),
 ('dramas', 1.1080910326226534),
 ('today', 1.1050431789984001),
 ('helps', 1.1023091505494358),
 ('inspiring', 1.0986122886681098),
 ('jimmy', 1.0937696641923216),
 ('awesome', 1.0931328229034842),
 ('unique', 1.0881409888008142),
 ('tragic', 1.0871835928444868),
 ('intense', 1.0870514662670339),
 ('stellar', 1.0857088838322018),
 ('rival', 1.0822184788924332),
 ('provides', 1.0797081340289569),
 ('depression', 1.0782034170369026),
 ('shy', 1.0775588794702773),
 ('carrie', 1.076139432816051),
 ('blend', 1.0753554265038423),
 ('hank', 1.0736109864626924),
 ('diana', 1.0726368022648489),
 ('adorable', 1.0726368022648489),
 ('unexpected', 1.0722255334949147),
 ('achievement', 1.0668635903535293),
 ('bettie', 1.0663514264498881),
 ('happiness', 1.0632729222228008),
 ('glorious', 1.0608719606852626),
 ('davis', 1.0541605260972757),
 ('terrifying', 1.0525211814678428),
 ('beauty', 1.050410186850232),
 ('ideal', 1.0479685558493548),
 ('fears', 1.0467872208035236),
 ('hong', 1.0438040521731147),
 ('seasons', 1.0433496099930604),
 ('fascinating', 1.0414538748281612),
 ('carries', 1.0345904299031787),
 ('satisfying', 1.0321225473992768),
 ('definite', 1.0319209141694374),
 ('touched', 1.0296194171811581),
 ('greatest', 1.0248947127715422),
 ('creates', 1.0241097613701886),
 ('aunt', 1.023388867430522),
 ('walter', 1.022328983918479),
 ('spectacular', 1.0198314108149955),
 ('portrayal', 1.0189810189761024),
 ('ann', 1.0127808528183286),
 ('enterprise', 1.0116009116784799),
 ('musicals', 1.0096648026516135),
 ('deeply', 1.0094845087721023),
 ('incredible', 1.0061677561461084),
 ('mature', 1.0060195018402847),
 ('triumph', 0.99682959435816731),
 ('margaret', 0.99682959435816731),
 ('navy', 0.99493385919326827),
 ('harry', 0.99176919305006062),
 ('lucas', 0.990398704027877),
 ('sweet', 0.98966110487955483),
 ('joey', 0.98794672078059009),
 ('oscar', 0.98721905111049713),
 ('balance', 0.98649499054740353),
 ('warm', 0.98485340331145166),
 ('ages', 0.98449898190068863),
 ('guilt', 0.98082925301172619),
 ('glover', 0.98082925301172619),
 ('carrey', 0.98082925301172619),
 ('learns', 0.97881108885548895),
 ('unusual', 0.97788374278196932),
 ('sons', 0.97777581552483595),
 ('complex', 0.97761897738147796),
 ('essence', 0.97753435711487369),
 ('brazil', 0.9769153536905899),
 ('widow', 0.97650959186720987),
 ('solid', 0.97537964824416146),
 ('beautiful', 0.97326301262841053),
 ('holmes', 0.97246100334120955),
 ('awe', 0.97186058302896583),
 ('vhs', 0.97116734209998934),
 ('eerie', 0.97116734209998934),
 ('lonely', 0.96873720724669754),
 ('grim', 0.96873720724669754),
 ('sport', 0.96825047080486615),
 ('debut', 0.96508089604358704),
 ('destiny', 0.96343751029985703),
 ('thrillers', 0.96281074750904794),
 ('tears', 0.95977584381389391),
 ('rose', 0.95664202739772253),
 ('feelings', 0.95551144502743635),
 ('ginger', 0.95551144502743635),
 ('winning', 0.95471810900804055),
 ('stanley', 0.95387344302319799),
 ('cox', 0.95343027882361187),
 ('paris', 0.95278479030472663),
 ('heart', 0.95238806924516806),
 ('hooked', 0.95155887071161305),
 ('comfortable', 0.94803943018873538),
 ('mgm', 0.94446160884085151),
 ('masterpiece', 0.94155039863339296),
 ('themes', 0.94118828349588235),
 ('danny', 0.93967118051821874),
 ('anime', 0.93378388932167222),
 ('perry', 0.93328830824272613),
 ('joy', 0.93301752567946861),
 ('lovable', 0.93081883243706487),
 ('mysteries', 0.92953595862417571),
 ('hal', 0.92953595862417571),
 ('louis', 0.92871325187271225),
 ('charming', 0.92520609553210742),
 ('urban', 0.92367083917177761),
 ('allows', 0.92183091224977043),
 ('impact', 0.91815814604895041),
 ('italy', 0.91629073187415511),
 ('gradually', 0.91629073187415511),
 ('lifestyle', 0.91629073187415511),
 ('spy', 0.91289514287301687),
 ('treat', 0.91193342650519937),
 ('subsequent', 0.91056005716517008),
 ('kennedy', 0.90981821736853763),
 ('loving', 0.90967549275543591),
 ('surprising', 0.90937028902958128),
 ('quiet', 0.90648673177753425),
 ('winter', 0.90624039602065365),
 ('reveals', 0.90490540964902977),
 ('raw', 0.90445627422715225),
 ('funniest', 0.90078654533818991),
 ('pleased', 0.89994159387262562),
 ('norman', 0.89994159387262562),
 ('thief', 0.89874642222324552),
 ('season', 0.89827222637147675),
 ('secrets', 0.89794159320595857),
 ('colorful', 0.89705936994626756),
 ('highest', 0.8967461358011849),
 ('compelling', 0.89462923509297576),
 ('danes', 0.89248008318043659),
 ('castle', 0.88967708335606499),
 ('kudos', 0.88889175768604067),
 ('great', 0.88810470901464589),
 ('baseball', 0.88730319500090271),
 ('subtitles', 0.88730319500090271),
 ('bleak', 0.88730319500090271),
 ('winner', 0.88643776872447388),
 ('tragedy', 0.88563699078315261),
 ('todd', 0.88551907320740142),
 ('nicely', 0.87924946019380601),
 ('arthur', 0.87546873735389985),
 ('essential', 0.87373111745535925),
 ('gorgeous', 0.8731725250935497),
 ('fonda', 0.87294029100054127),
 ('eastwood', 0.87139541196626402),
 ('focuses', 0.87082835779739776),
 ('enjoyed', 0.87070195951624607),
 ('natural', 0.86997924506912838),
 ('intensity', 0.86835126958503595),
 ('witty', 0.86824103423244681),
 ('rob', 0.8642954367557748),
 ('worlds', 0.86377269759070874),
 ('health', 0.86113891179907498),
 ('magical', 0.85953791528170564),
 ('deeper', 0.85802182375017932),
 ('lucy', 0.85618680780444956),
 ('moving', 0.85566611005772031),
 ('lovely', 0.85290640004681306),
 ('purple', 0.8513711857748395),
 ('memorable', 0.84801189112086062),
 ('sings', 0.84729786038720367),
 ('craig', 0.84342938360928321),
 ('modesty', 0.84342938360928321),
 ('relate', 0.84326559685926517),
 ('episodes', 0.84223712084137292),
 ('strong', 0.84167135777060931),
 ('smith', 0.83959811108590054),
 ('tear', 0.83704136022001441),
 ('apartment', 0.83333115290549531),
 ('princess', 0.83290912293510388),
 ('disagree', 0.83290912293510388),
 ('kung', 0.83173334384609199),
 ('adventure', 0.83150561393278388),
 ('columbo', 0.82667857318446791),
 ('jake', 0.82667857318446791),
 ('adds', 0.82485652591452319),
 ('hart', 0.82472353834866463),
 ('strength', 0.82417544296634937),
 ('realizes', 0.82360006895738058),
 ('dave', 0.8232003088081431),
 ('childhood', 0.82208086393583857),
 ('forbidden', 0.81989888619908913),
 ('tight', 0.81883539572344199),
 ('surreal', 0.8178506590609026),
 ('manager', 0.81770990320170756),
 ('dancer', 0.81574950265227764),
 ('studios', 0.81093021621632877),
 ('con', 0.81093021621632877),
 ('miike', 0.80821651034473263),
 ('realistic', 0.80807714723392232),
 ('explicit', 0.80792269515237358),
 ('kurt', 0.8060875917405409),
 ('traditional', 0.80535917116687328),
 ('deals', 0.80535917116687328),
 ('holds', 0.80493858654806194),
 ('carl', 0.80437281567016972),
 ('touches', 0.80396154690023547),
 ('gene', 0.80314807577427383),
 ('albert', 0.8027669055771679),
 ('abc', 0.80234647252493729),
 ('cry', 0.80011930011211307),
 ('sides', 0.7995275841185171),
 ('develops', 0.79850769621777162),
 ('eyre', 0.79850769621777162),
 ('dances', 0.79694397424158891),
 ('oscars', 0.79633141679517616),
 ('legendary', 0.79600456599965308),
 ('hearted', 0.79492987486988764),
 ('importance', 0.79492987486988764),
 ('portraying', 0.79356592830699269),
 ('impressed', 0.79258107754813223),
 ('waters', 0.79112758892014912),
 ('empire', 0.79078565012386137),
 ('edge', 0.789774016249017),
 ('jean', 0.78845736036427028),
 ('environment', 0.78845736036427028),
 ('sentimental', 0.7864791203521645),
 ('captured', 0.78623760362595729),
 ('styles', 0.78592891401091158),
 ('daring', 0.78592891401091158),
 ('frank', 0.78275933924963248),
 ('tense', 0.78275933924963248),
 ('backgrounds', 0.78275933924963248),
 ('matches', 0.78275933924963248),
 ('gothic', 0.78209466657644144),
 ('sharp', 0.7814397877056235),
 ('achieved', 0.78015855754957497),
 ('court', 0.77947526404844247),
 ('steals', 0.7789140023173704),
 ('rules', 0.77844476107184035),
 ('colors', 0.77684619943659217),
 ('reunion', 0.77318988823348167),
 ('covers', 0.77139937745969345),
 ('tale', 0.77010822169607374),
 ('rain', 0.7683706017975328),
 ('denzel', 0.76804848873306297),
 ('stays', 0.76787072675588186),
 ('blob', 0.76725515271366718),
 ('maria', 0.76214005204689672),
 ('conventional', 0.76214005204689672),
 ('fresh', 0.76158434211317383),
 ('midnight', 0.76096977689870637),
 ('landscape', 0.75852993982279704),
 ('animated', 0.75768570169751648),
 ('titanic', 0.75666058628227129),
 ('sunday', 0.75666058628227129),
 ('spring', 0.7537718023763802),
 ('cagney', 0.7537718023763802),
 ('enjoyable', 0.75246375771636476),
 ('immensely', 0.75198768058287868),
 ('sir', 0.7507762933965817),
 ('nevertheless', 0.75067102469813185),
 ('driven', 0.74994477895307854),
 ('performances', 0.74883252516063137),
 ('memories', 0.74721440183022114),
 ('nowadays', 0.74721440183022114),
 ('simple', 0.74641420974143258),
 ('golden', 0.74533293373051557),
 ('leslie', 0.74533293373051557),
 ('lovers', 0.74497224842453125),
 ('relationship', 0.74484232345601786),
 ('supporting', 0.74357803418683721),
 ('che', 0.74262723782331497),
 ('packed', 0.7410032017375805),
 ('trek', 0.74021469141793106),
 ('provoking', 0.73840377214806618),
 ('strikes', 0.73759894313077912),
 ('depiction', 0.73682224406260699),
 ('emotional', 0.73678211645681524),
 ('secretary', 0.7366322924996842),
 ('influenced', 0.73511137965897755),
 ('florida', 0.73511137965897755),
 ('germany', 0.73288750920945944),
 ('brings', 0.73142936713096229),
 ('lewis', 0.73129894652432159),
 ('elderly', 0.73088750854279239),
 ('owner', 0.72743625403857748),
 ('streets', 0.72666987259858895),
 ('henry', 0.72642196944481741),
 ('portrays', 0.72593700338293632),
 ('bears', 0.7252354951114458),
 ('china', 0.72489587887452556),
 ('anger', 0.72439972406404984),
 ('society', 0.72433010799663333),
 ('available', 0.72415741730250549),
 ('best', 0.72347034060446314),
 ('bugs', 0.72270598280148979),
 ('magic', 0.71878961117328299),
 ('delivers', 0.71846498854423513),
 ('verhoeven', 0.71846498854423513),
 ('jim', 0.71783979315031676),
 ('donald', 0.71667767797013937),
 ('endearing', 0.71465338578090898),
 ('relationships', 0.71393795022901896),
 ('greatly', 0.71256526641704687),
 ('charlie', 0.71024161391924534),
 ('brad', 0.71024161391924534),
 ('simon', 0.70967648251115578),
 ('effectively', 0.70914752190638641),
 ('march', 0.70774597998109789),
 ('atmosphere', 0.70744773070214162),
 ('influence', 0.70733181555190172),
 ('genius', 0.706392407309966),
 ('emotionally', 0.70556970055850243),
 ('ken', 0.70526854109229009),
 ('identity', 0.70484322032313651),
 ('sophisticated', 0.70470800296102132),
 ('dan', 0.70457587638356811),
 ('andrew', 0.70329955202396321),
 ('india', 0.70144598337464037),
 ('roy', 0.69970458110610434),
 ('surprisingly', 0.6995780708902356),
 ('sky', 0.69780919366575667),
 ('romantic', 0.69664981111114743),
 ('match', 0.69566924999265523),
 ('meets', 0.69314718055994529),
 ('cowboy', 0.69314718055994529),
 ('wave', 0.69314718055994529),
 ('bitter', 0.69314718055994529),
 ('patient', 0.69314718055994529),
 ('stylish', 0.69314718055994529),
 ('britain', 0.69314718055994529),
 ('affected', 0.69314718055994529),
 ('beatty', 0.69314718055994529),
 ('love', 0.69198533541937324),
 ('paul', 0.68980827929443067),
 ('andy', 0.68846333124751902),
 ('performance', 0.68797386327972465),
 ('patrick', 0.68645819240914863),
 ('unlike', 0.68546468438792907),
 ('brooks', 0.68433655087779044),
 ('refuses', 0.68348526964820844),
 ('award', 0.6824518914431974),
 ('complaint', 0.6824518914431974),
 ('ride', 0.68229716453587952),
 ('dawson', 0.68171848473632257),
 ('luke', 0.68158635815886937),
 ('wells', 0.68087708796813096),
 ('france', 0.6804081547825156),
 ('sports', 0.68007509899259255),
 ('handsome', 0.68007509899259255),
 ('directs', 0.67875844310784572),
 ('rebel', 0.67875844310784572),
 ('greater', 0.67605274720064523),
 ('dreams', 0.67599410133369586),
 ('effective', 0.67565402311242806),
 ('interpretation', 0.67479804189174875),
 ('works', 0.67445504754779284),
 ('brando', 0.67445504754779284),
 ('noble', 0.6737290947028437),
 ('paced', 0.67314651385327573),
 ('le', 0.67067432470788668),
 ('master', 0.67015766233524654),
 ('h', 0.6696166831497512),
 ('rings', 0.66904962898088483),
 ('easy', 0.66895995494594152),
 ('city', 0.66820823221269321),
 ('sunshine', 0.66782937257565544),
 ('succeeds', 0.66647893347778397),
 ('relations', 0.664159643686693),
 ('england', 0.66387679825983203),
 ('glimpse', 0.66329421741026418),
 ('aired', 0.66268797307523675),
 ('sees', 0.66263163663399482),
 ('both', 0.66248336767382998),
 ('definitely', 0.66199789483898808),
 ('imaginative', 0.66139848224536502),
 ('appreciate', 0.66083893732728749),
 ('tricks', 0.66071190480679143),
 ('striking', 0.66071190480679143),
 ('carefully', 0.65999497324304479),
 ('complicated', 0.65981076029235353),
 ('perspective', 0.65962448852130173),
 ('trilogy', 0.65877953705573755),
 ('future', 0.65834665141052828),
 ('lion', 0.65742909795786608),
 ('douglas', 0.65540685257709819),
 ('victor', 0.65540685257709819),
 ('inspired', 0.65459851044271034),
 ('marriage', 0.65392646740666405),
 ('demands', 0.65392646740666405),
 ('father', 0.65172321672194655),
 ('page', 0.65123628494430852),
 ('instant', 0.65058756614114943),
 ('era', 0.6495567444850836),
 ('ruthless', 0.64934455790155243),
 ('saga', 0.64934455790155243),
 ('joan', 0.64891392558311978),
 ('joseph', 0.64841128671855386),
 ('workers', 0.64829661439459352),
 ('fantasy', 0.64726757480925168),
 ('distant', 0.64551913157069074),
 ('accomplished', 0.64551913157069074),
 ('manhattan', 0.64435701639051324),
 ('personal', 0.64355023942057321),
 ('meeting', 0.64313675998528386),
 ('individual', 0.64313675998528386),
 ('pushing', 0.64313675998528386),
 ('pleasant', 0.64250344774119039),
 ('brave', 0.64185388617239469),
 ('william', 0.64083139119578469),
 ('hudson', 0.64077919504262937),
 ('friendly', 0.63949446706762514),
 ('eccentric', 0.63907995928966954),
 ('awards', 0.63875310849414646),
 ('jack', 0.63838309514997038),
 ('seeking', 0.63808740337691783),
 ('divorce', 0.63757732940513456),
 ('colonel', 0.63757732940513456),
 ('jane', 0.63443957973316734),
 ('keeping', 0.63414883979798953),
 ('gives', 0.63383568159497883),
 ('ted', 0.63342794585832296),
 ('animation', 0.63208692379869902),
 ('progress', 0.6317782341836532),
 ('larger', 0.63127177684185776),
 ('concert', 0.63127177684185776),
 ('nation', 0.6296337748376194),
 ('albeit', 0.62739580299716491),
 ('adapted', 0.62613647027698516),
 ('discovers', 0.62542900650499444),
 ('classic', 0.62504956428050518),
 ('segment', 0.62335141862440335),
 ('morgan', 0.62303761437291871),
 ('mouse', 0.62294292188669675),
 ('impressive', 0.62211140744319349),
 ('artist', 0.62168821657780038),
 ('ultimate', 0.62168821657780038),
 ('griffith', 0.62117368093485603),
 ('drew', 0.62082651898031915),
 ('emily', 0.62082651898031915),
 ('moved', 0.6197197120051281),
 ('families', 0.61903920840622351),
 ('profound', 0.61903920840622351),
 ('innocent', 0.61851219917136446),
 ('versions', 0.61730910416844087),
 ('eddie', 0.61691981517206107),
 ('criticism', 0.61651395453902935),
 ('nature', 0.61594514653194088),
 ('recognized', 0.61518563909023349),
 ('sexuality', 0.61467556511845012),
 ('contract', 0.61400986000122149),
 ('brian', 0.61344043794920278),
 ('remembered', 0.6131044728864089),
 ('determined', 0.6123858239154869),
 ('offers', 0.61207935747116349),
 ('pleasure', 0.61195702582993206),
 ('washington', 0.61180154110599294),
 ('images', 0.61159731359583758),
 ('games', 0.61067095873570676),
 ('academy', 0.60872983874736208),
 ('fashioned', 0.60798937221963845),
 ('melodrama', 0.60749173598145145),
 ('rough', 0.60613580357031549),
 ('charismatic', 0.60613580357031549),
 ('peoples', 0.60613580357031549),
 ('dealing', 0.60517840761398811),
 ('fine', 0.60496962268013299),
 ('tap', 0.60391604683200273),
 ('trio', 0.60157998703445481),
 ('russell', 0.60120968523425966),
 ('figures', 0.60077386042893011),
 ('ward', 0.60005675749393339),
 ('shine', 0.59911823091166894),
 ('brady', 0.59911823091166894),
 ('job', 0.59845562125168661),
 ('satisfied', 0.59652034487087369),
 ('river', 0.59637962862495086),
 ('brown', 0.595773016534769),
 ('believable', 0.59566072133302495),
 ('always', 0.59470710774669278),
 ('bound', 0.59470710774669278),
 ('hall', 0.5933967777928858),
 ('cook', 0.5916777203950857),
 ('claire', 0.59136448625000293),
 ('broadway', 0.59033768669372433),
 ('anna', 0.58778666490211906),
 ('peace', 0.58628403501758408),
 ('visually', 0.58539431926349916),
 ('morality', 0.58525821854876026),
 ('falk', 0.58525821854876026),
 ('growing', 0.58466653756587539),
 ('experiences', 0.58314628534561685),
 ('stood', 0.58314628534561685),
 ('touch', 0.58122926435596001),
 ('lives', 0.5810976767513224),
 ('kubrick', 0.58066919713325493),
 ('timing', 0.58047401805583243),
 ('expressions', 0.57981849525294216),
 ('struggles', 0.57981849525294216),
 ('authentic', 0.57848427223980559),
 ('helen', 0.57763429343810091),
 ('pre', 0.57700753064729182),
 ('quirky', 0.5753641449035618),
 ('young', 0.57531672344534313),
 ('inner', 0.57454143815209846),
 ('mexico', 0.57443087372056334),
 ('clint', 0.57380042292737909),
 ('sisters', 0.57286101468544337),
 ('realism', 0.57226528899949558),
 ('french', 0.5720692490067093),
 ('personalities', 0.5720692490067093),
 ('surprises', 0.57113222999698177),
 ('adventures', 0.57113222999698177),
 ('overcome', 0.5697681593994407),
 ('timothy', 0.56953322459276867),
 ('tales', 0.56909453188996639),
 ('war', 0.56843317302781682),
 ('civil', 0.5679840376059393),
 ('countries', 0.56737779327091187),
 ('streep', 0.56710645966458029),
 ('tradition', 0.56685345523565323),
 ('oliver', 0.56673325570428668),
 ('australia', 0.56580775818334383),
 ('understanding', 0.56531380905006046),
 ('players', 0.56509525370004821),
 ('knowing', 0.56489284503626647),
 ('rogers', 0.56421349718405212),
 ('suspenseful', 0.56368911332305849),
 ('variety', 0.56368911332305849),
 ('true', 0.56281525180810066),
 ('jr', 0.56220982311246936),
 ('psychological', 0.56108745854687891),
 ('sent', 0.55961578793542266),
 ('grand', 0.55961578793542266),
 ('branagh', 0.55961578793542266),
 ('reminiscent', 0.55961578793542266),
 ('performing', 0.55961578793542266),
 ('wealth', 0.55961578793542266),
 ('overwhelming', 0.55961578793542266),
 ('odds', 0.55961578793542266),
 ('brothers', 0.55891181043362848),
 ('howard', 0.55811089675600245),
 ('david', 0.55693122256475369),
 ('generation', 0.55628799784274796),
 ('grow', 0.55612538299565417),
 ('survival', 0.55594605904646033),
 ('mainstream', 0.55574731115750231),
 ('dick', 0.55431073570572953),
 ('charm', 0.55288175575407861),
 ('kirk', 0.55278982286502287),
 ('twists', 0.55244729845681018),
 ('gangster', 0.55206858230003986),
 ('jeff', 0.55179306225421365),
 ('family', 0.55116244510065526),
 ('tend', 0.55053307336110335),
 ('thanks', 0.55049088015842218),
 ('world', 0.54744234723432639),
 ('sutherland', 0.54743536937855164),
 ('life', 0.54695514434959924),
 ('disc', 0.54654370636806993),
 ('bug', 0.54654370636806993),
 ('tribute', 0.5455111817538808),
 ('europe', 0.54522705048332309),
 ('sacrifice', 0.54430155296238014),
 ('color', 0.54405127139431109),
 ('superior', 0.54333490233128523),
 ('york', 0.54318235866536513),
 ('pulls', 0.54266622962164945),
 ('jackson', 0.54232429082536171),
 ('hearts', 0.54232429082536171),
 ('enjoy', 0.54124285135906114),
 ('redemption', 0.54056759296472823),
 ('madness', 0.540384426007535),
 ('stands', 0.5389965007326869),
 ('trial', 0.5389965007326869),
 ('greek', 0.5389965007326869),
 ('hamilton', 0.5389965007326869),
 ('each', 0.5388212312554177),
 ('faithful', 0.53773307668591508),
 ('received', 0.5372768098531604),
 ('documentaries', 0.53714293208336406),
 ('jealous', 0.53714293208336406),
 ('different', 0.53709860682460819),
 ('describes', 0.53680111016925136),
 ('shorts', 0.53596159703753288),
 ('brilliance', 0.53551823635636209),
 ('mountains', 0.53492317534505118),
 ('share', 0.53408248593025787),
 ('dealt', 0.53408248593025787),
 ('providing', 0.53329847961804933),
 ('explore', 0.53329847961804933),
 ('series', 0.5325809226575603),
 ('fellow', 0.5323318289869543),
 ('loves', 0.53062825106217038),
 ('revolution', 0.53062825106217038),
 ('olivier', 0.53062825106217038),
 ('roman', 0.53062825106217038),
 ('century', 0.53002783074992665),
 ('musical', 0.52966871156747064),
 ('heroic', 0.52925932545482868),
 ('approach', 0.52806743020049673),
 ('ironically', 0.52806743020049673),
 ('temple', 0.52806743020049673),
 ('moves', 0.5279372642387119),
 ('gift', 0.52702030968597136),
 ('julie', 0.52609309589677911),
 ('tells', 0.52415107836314001),
 ('radio', 0.52394671172868779),
 ('uncle', 0.52354439617376536),
 ('union', 0.52324814376454787),
 ('deep', 0.52309571635780505),
 ('reminds', 0.52157841554225237),
 ('famous', 0.52118841080153722),
 ('jazz', 0.52053443789295151),
 ('dennis', 0.51987545928590861),
 ('epic', 0.51919387343650736),
 ('adult', 0.519167695083386),
 ('shows', 0.51915322220375304),
 ('performed', 0.5191244265806858),
 ('demons', 0.5191244265806858),
 ('discovered', 0.51879379341516751),
 ('eric', 0.51879379341516751),
 ('youth', 0.5185626062681431),
 ('human', 0.51851411224987087),
 ('tarzan', 0.51813827061227724),
 ('ourselves', 0.51794309153485463),
 ('wwii', 0.51758240622887042),
 ('passion', 0.5162164724008671),
 ('desire', 0.51607497965213445),
 ('pays', 0.51581316527702981),
 ('dirty', 0.51557622652458857),
 ('fox', 0.51557622652458857),
 ('sympathetic', 0.51546600332249293),
 ('symbolism', 0.51546600332249293),
 ('attitude', 0.51530993621331933),
 ('appearances', 0.51466440007315639),
 ('jeremy', 0.51466440007315639),
 ('fun', 0.51439068993048687),
 ('south', 0.51420972175023116),
 ('arrives', 0.51409894911095988),
 ('present', 0.51341965894303732),
 ('com', 0.51326167856387173),
 ('smile', 0.51265880484765169),
 ('alan', 0.51082562376599072),
 ('ring', 0.51082562376599072),
 ('visit', 0.51082562376599072),
 ('fits', 0.51082562376599072),
 ('provided', 0.51082562376599072),
 ('carter', 0.51082562376599072),
 ('aging', 0.51082562376599072),
 ('countryside', 0.51082562376599072),
 ('begins', 0.51015650363396647),
 ('success', 0.50900578704900468),
 ('japan', 0.50900578704900468),
 ('accurate', 0.50895471583017893),
 ('proud', 0.50800474742434931),
 ('daily', 0.5075946031845443),
 ('karloff', 0.50724780241810674),
 ('atmospheric', 0.50724780241810674),
 ('recently', 0.50714914903668207),
 ('fu', 0.50704490092608467),
 ('horrors', 0.50656122497953315),
 ('finding', 0.50637127341661037),
 ('lust', 0.5059356384717989),
 ('hitchcock', 0.50574947073413001),
 ('among', 0.50334004951332734),
 ('viewing', 0.50302139827440906),
 ('investigation', 0.50262885656181222),
 ('shining', 0.50262885656181222),
 ('duo', 0.5020919437972361),
 ('cameron', 0.5020919437972361),
 ('finds', 0.50128303100539795),
 ('contemporary', 0.50077528791248915),
 ('genuine', 0.50046283673044401),
 ('frightening', 0.49995595152908684),
 ('plays', 0.49975983848890226),
 ('age', 0.49941323171424595),
 ('position', 0.49899116611898781),
 ('continues', 0.49863035067217237),
 ('roles', 0.49839716550752178),
 ('james', 0.49837216269470402),
 ('individuals', 0.49824684155913052),
 ('brought', 0.49783842823917956),
 ('hilarious', 0.49714551986191058),
 ('brutal', 0.49681488669639234),
 ('appropriate', 0.49643688631389105),
 ('dance', 0.49581998314812048),
 ('league', 0.49578774640145024),
 ('helping', 0.49578774640145024),
 ('answers', 0.49578774640145024),
 ('stunts', 0.49561620510246196),
 ('traveling', 0.49532143723002542),
 ('thoroughly', 0.49414593456733524),
 ('depicted', 0.49317068852726992),
 ('combination', 0.49247648509779424),
 ('honor', 0.49247648509779424),
 ('differences', 0.49247648509779424),
 ('fully', 0.49213349075383811),
 ('tracy', 0.49159426183810306),
 ('battles', 0.49140753790888908),
 ('possibility', 0.49112055268665822),
 ('romance', 0.4901589869574316),
 ('initially', 0.49002249613622745),
 ('happy', 0.4898997500608791),
 ('crime', 0.48977221456815834),
 ('singing', 0.4893852925281213),
 ('especially', 0.48901267837860624),
 ('shakespeare', 0.48754793889664511),
 ('hugh', 0.48729512635579658),
 ('detail', 0.48609484250827351),
 ('julia', 0.48550781578170082),
 ('san', 0.48550781578170082),
 ('guide', 0.48550781578170082),
 ('desperation', 0.48550781578170082),
 ('companion', 0.48550781578170082),
 ('strongly', 0.48460242866688824),
 ('necessary', 0.48302334245403883),
 ('humanity', 0.48265474679929443),
 ('drama', 0.48221998493060503),
 ('nonetheless', 0.48183808689273838),
 ('intrigue', 0.48183808689273838),
 ('warming', 0.48183808689273838),
 ('cuba', 0.48183808689273838),
 ('planned', 0.47957308026188628),
 ('pictures', 0.47929937011921681),
 ('broadcast', 0.47849024312305422),
 ('nine', 0.47803580094299974),
 ('settings', 0.47743860773325364),
 ('history', 0.47732966933780852),
 ('ordinary', 0.47725880012690741),
 ('trade', 0.47692407209030935),
 ('official', 0.47608267532211779),
 ('primary', 0.47608267532211779),
 ('episode', 0.47529620261150429),
 ('role', 0.47520268270188676),
 ('spirit', 0.47477690799839323),
 ('grey', 0.47409361449726067),
 ('ways', 0.47323464982718205),
 ('cup', 0.47260441094579297),
 ('piano', 0.47260441094579297),
 ('familiar', 0.47241617565111949),
 ('sinister', 0.47198579044972683),
 ('reveal', 0.47171449364936496),
 ('max', 0.47150852042515579),
 ('dated', 0.47121648567094482),
 ('losing', 0.47000362924573563),
 ('discovery', 0.47000362924573563),
 ('vicious', 0.47000362924573563),
 ('genuinely', 0.46871413841586385),
 ('hatred', 0.46734051182625186),
 ('mistaken', 0.46702300110759781),
 ('dream', 0.46608972992459924),
 ('challenge', 0.46608972992459924),
 ('crisis', 0.46575733836428446),
 ('photographed', 0.46488852857896512),
 ('critics', 0.46430560813109778),
 ('bird', 0.46430560813109778),
 ('machines', 0.46430560813109778),
 ('born', 0.46411383518967209),
 ('detective', 0.4636633473511525),
 ('higher', 0.46328467899699055),
 ('remains', 0.46262352194811296),
 ('inevitable', 0.46262352194811296),
 ('soviet', 0.4618180446592961),
 ('ryan', 0.46134556650262099),
 ('african', 0.46112595521371813),
 ('smaller', 0.46081520319132935),
 ('techniques', 0.46052488529119184),
 ('information', 0.46034171833399862),
 ('deserved', 0.45999798712841444),
 ('lynch', 0.45953232937844013),
 ('spielberg', 0.45953232937844013),
 ('cynical', 0.45953232937844013),
 ('tour', 0.45953232937844013),
 ('francisco', 0.45953232937844013),
 ('struggle', 0.45911782160048453),
 ('language', 0.45902121257712653),
 ('visual', 0.45823514408822852),
 ('warner', 0.45724137763188427),
 ('social', 0.45720078250735313),
 ('reality', 0.45719346885019546),
 ('hidden', 0.45675840249571492),
 ('breaking', 0.45601738727099561),
 ('sometimes', 0.45563021171182794),
 ('modern', 0.45500247579345005),
 ('surfing', 0.45425527227759638),
 ('popular', 0.45410691533051023),
 ('surprised', 0.4534409399850382),
 ('follows', 0.45245361754408348),
 ('keeps', 0.45234869400701483),
 ('john', 0.4520909494482197),
 ('mixed', 0.45198512374305722),
 ('defeat', 0.45198512374305722),
 ('justice', 0.45142724367280018),
 ('treasure', 0.45083371313801535),
 ('presents', 0.44973793178615257),
 ('years', 0.44919197032104968),
 ('chief', 0.44895022004790319),
 ('shadows', 0.44802472252696035),
 ('closely', 0.44701411102103689),
 ('segments', 0.44701411102103689),
 ('lose', 0.44658335503763702),
 ('caine', 0.44628710262841953),
 ('caught', 0.44610275383999071),
 ('hamlet', 0.44558510189758965),
 ('chinese', 0.44507424620321018),
 ('welcome', 0.44438052435783792),
 ('birth', 0.44368632092836219),
 ('represents', 0.44320543609101143),
 ('puts', 0.44279106572085081),
 ('visuals', 0.44183275227903923),
 ('fame', 0.44183275227903923),
 ('closer', 0.44183275227903923),
 ('web', 0.44183275227903923),
 ('criminal', 0.4412745608048752),
 ('minor', 0.4409224199448939),
 ('jon', 0.44086703515908027),
 ('liked', 0.44074991514020723),
 ('restaurant', 0.44031183943833246),
 ('de', 0.43983275161237217),
 ('flaws', 0.43983275161237217),
 ('searching', 0.4393666597838457),
 ('rap', 0.43891304217570443),
 ('light', 0.43884433018199892),
 ('elizabeth', 0.43872232986464682),
 ('marry', 0.43861731542506488),
 ('learned', 0.43825493093115531),
 ('controversial', 0.43825493093115531),
 ('oz', 0.43825493093115531),
 ('slowly', 0.43785660389939979),
 ('comedic', 0.43721380642274466),
 ('wayne', 0.43721380642274466),
 ('thrilling', 0.43721380642274466),
 ('bridge', 0.43721380642274466),
 ('married', 0.43658501682196887),
 ('nazi', 0.4361020775700542),
 ('murder', 0.4353180712578455),
 ('physical', 0.4353180712578455),
 ('johnny', 0.43483971678806865),
 ('michelle', 0.43445264498141672),
 ('wallace', 0.43403848055222038),
 ('comedies', 0.43395706390247063),
 ('silent', 0.43395706390247063),
 ('played', 0.43387244114515305),
 ('international', 0.43363598507486073),
 ('vision', 0.43286408229627887),
 ('intelligent', 0.43196704885367099),
 ('shop', 0.43078291609245434),
 ('also', 0.43036720209769169),
 ('levels', 0.4302451371066513),
 ('miss', 0.43006426712153217),
 ('movement', 0.4295626596872249),
 ...]

In [14]:
# words most strongly associated with a "NEGATIVE" label (lowest pos_neg_ratios values first)
list(reversed(pos_neg_ratios.most_common()))[0:30]


Out[14]:
[('boll', -4.0778152602708904),
 ('uwe', -3.9218753018711578),
 ('seagal', -3.3202501058581921),
 ('unwatchable', -3.0269848170580955),
 ('stinker', -2.9876839403711624),
 ('mst', -2.7753833211707968),
 ('incoherent', -2.7641396677532537),
 ('unfunny', -2.5545257844967644),
 ('waste', -2.4907515123361046),
 ('blah', -2.4475792789485005),
 ('horrid', -2.3715779644809971),
 ('pointless', -2.3451073877136341),
 ('atrocious', -2.3187369339642556),
 ('redeeming', -2.2667790015910296),
 ('prom', -2.2601040980178784),
 ('drivel', -2.2476029585766928),
 ('lousy', -2.2118080125207054),
 ('worst', -2.1930856334332267),
 ('laughable', -2.172468615469592),
 ('awful', -2.1385076866397488),
 ('poorly', -2.1326133844207011),
 ('wasting', -2.1178155545614512),
 ('remotely', -2.111046881095167),
 ('existent', -2.0024805005437076),
 ('boredom', -1.9241486572738005),
 ('miserably', -1.9216610938019989),
 ('sucks', -1.9166645809588516),
 ('uninspired', -1.9131499212248517),
 ('lame', -1.9117232884159072),
 ('insult', -1.9085323769376259)]
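
The scores in these two lists are easier to read once converted back to plain ratios. The sketch below assumes each score is the natural log of a positive-to-negative count ratio. That assumption fits the value 0.69314718055994529 shared by 'meets', 'cowboy' and several other words, which equals ln 2. The helper name `score_to_ratio` is hypothetical, and any smoothing constants used when the scores were computed are ignored, so the results are approximate.

```python
import math

def score_to_ratio(score):
    """Convert a log-ratio score back into an approximate positive:negative ratio."""
    return math.exp(score)

# Scores copied from the two lists above
examples = {
    "eerie": 0.97116734209998934,
    "meets": 0.69314718055994529,
    "boll": -4.0778152602708904,
}

for word, score in examples.items():
    ratio = score_to_ratio(score)
    print(f"{word:>6}: score {score:+.3f} -> ratio of about {ratio:.3f}")
```

Under this reading, a score near 0 means a word appears about equally often under both labels. Positive scores lean toward "POSITIVE" reviews and negative scores toward "NEGATIVE" ones, and scores of equal magnitude lean equally hard in opposite directions.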

Transforming Text into Numbers


In [15]:
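from IPython.display import Image

# Example negative review (illustrative only; the variable is not used below)
review = "This was a horrible, terrible movie."

# Display the accompanying network diagram
Image(filename='sentiment_network.png')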


Out[15]:
[image: sentiment_network.png]

In [16]:
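# Example positive review (illustrative only; the variable is not used below)
review = "The movie was excellent"

# Display the accompanying network diagram
Image(filename='sentiment_network_pos.png')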


Out[16]:
[image: sentiment_network_pos.png]
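
One common way to turn a review into numbers is to count how often each word occurs. The sketch below applies that idea to the two example reviews. The tokenizer (lowercase, keep alphabetic runs) and the helper name `count_words` are assumptions made for illustration, not necessarily what the rest of this notebook does.

```python
import re
from collections import Counter

def count_words(review):
    """Lowercase a review, split it into alphabetic tokens, and count each token."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return Counter(tokens)

negative = count_words("This was a horrible, terrible movie.")
positive = count_words("The movie was excellent")

print(negative)
print(positive)
```

Each review becomes a small table of word counts. Words such as "horrible" or "excellent" then show up as numeric signals that a network can weight.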

Project 2: Creating the Input/Output Data


In [17]:
# The vocabulary is the set of distinct words counted in total_counts
vocab = set(total_counts.keys())
vocab_size = len(vocab)
print(vocab_size)


74074
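
A vocabulary like this can be used to give every word a fixed position in an input vector. The sketch below uses a tiny stand-in vocabulary. The names `build_word_index`, `review_to_vector` and `toy_vocab` are hypothetical, and sorting the words is a choice made here for reproducibility. Iteration order over a Python set of strings is arbitrary and can differ between runs.

```python
def build_word_index(vocab):
    """Map each word to a fixed column index; sorting makes the mapping reproducible."""
    return {word: i for i, word in enumerate(sorted(vocab))}

def review_to_vector(review, word_index):
    """Return a list of per-word counts, one slot per vocabulary entry."""
    vector = [0] * len(word_index)
    for word in review.split(" "):
        if word in word_index:  # words outside the vocabulary are skipped
            vector[word_index[word]] += 1
    return vector

# Tiny stand-in vocabulary (hypothetical; the real one has 74074 entries)
toy_vocab = {"movie", "was", "the", "excellent", "horrible", "terrible"}
toy_index = build_word_index(toy_vocab)

print(toy_index)
print(review_to_vector("the movie was excellent", toy_index))
```

With the full vocabulary, the vector would have 74074 slots, and almost all of them would be zero for any single review.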

In [18]:
# Inspect the vocabulary; a set has no defined order, so the listing order is arbitrary
list(vocab)


Out[18]:
['',
 'paradise',
 'devotion',
 'rwtd',
 'hazy',
 'governing',
 'shied',
 'usb',
 'kahlua',
 'dolman',
 'tiana',
 'toms',
 'waterslides',
 'perpetuates',
 'ragged',
 'kirtland',
 'unparalleled',
 'scrubbed',
 'contemperaneous',
 'hellishly',
 'gimli',
 'uncynical',
 'synonym',
 'proba',
 'trident',
 'sidestep',
 'babban',
 'chenoweth',
 'cannabalistic',
 'spots',
 'ihave',
 'kriemshild',
 'frys',
 'route',
 'innsbruck',
 'nonreligious',
 'swig',
 'haft',
 'wags',
 'rugrats',
 'blocker',
 'paltrow',
 'pacer',
 'prada',
 'snapshotters',
 'weaver',
 'eaters',
 'floradora',
 'deflected',
 'batch',
 'dub',
 'unreasonable',
 'clenching',
 'introducing',
 'plugged',
 'venice',
 'nlp',
 'honored',
 'pittman',
 'plodded',
 'airliners',
 'grimacing',
 'centerfold',
 'whiteboy',
 'preservatives',
 'ninteen',
 'beaut',
 'gaggling',
 'hotdog',
 'demonized',
 'therapeutic',
 'demon',
 'jonatha',
 'ramping',
 'ingmar',
 'accorsi',
 'lsd',
 'facts',
 'shiny',
 'climber',
 'gallows',
 'konigin',
 'drownes',
 'preparatory',
 'crimson',
 'simplyfied',
 'recomendation',
 'demolish',
 'orignal',
 'preempt',
 'mpkdh',
 'counterfeit',
 'beastly',
 'overhead',
 'sollett',
 'unskillful',
 'andersen',
 'plateaus',
 'old',
 'noam',
 'startling',
 'mutilation',
 'projective',
 'liquefied',
 'favorites',
 'delusional',
 'meddlesome',
 'swankiest',
 'lwensohn',
 'eds',
 'anabel',
 'mcdougall',
 'waving',
 'following',
 'hissy',
 'voracious',
 'demotes',
 'geopolitical',
 'paramour',
 'hultn',
 'gendered',
 'atrociousness',
 'kipling',
 'withdrawal',
 'mucks',
 'repairmen',
 'headaches',
 'kilts',
 'rosson',
 'linklatter',
 'spasitc',
 'rude',
 'pupsi',
 'blainsworth',
 'undeclared',
 'birdman',
 'tian',
 'excell',
 'spaces',
 'reviewing',
 'arlette',
 'deewar',
 'blanked',
 'wellesley',
 'yaphet',
 'despise',
 'infective',
 'accredited',
 'hatsumo',
 'allows',
 'thieriot',
 'royale',
 'bolye',
 'concentrated',
 'lacked',
 'benedict',
 'fluegel',
 'undercurrent',
 'unrurly',
 'idly',
 'sophomore',
 'schilling',
 'burlesque',
 'undead',
 'define',
 'reincarnations',
 'transparencies',
 'aparthied',
 'reggae',
 'sprog',
 'matel',
 'ekeing',
 'comparable',
 'marcy',
 'caregiver',
 'roo',
 'prating',
 'skips',
 'petrillo',
 'underpinnings',
 'stallonethat',
 'mistresses',
 'laundromat',
 'contrite',
 'split',
 'eliminating',
 'poppins',
 'meanie',
 'southron',
 'tait',
 'allover',
 'korman',
 'normally',
 'departure',
 'effortlessly',
 'otherworldliness',
 'chest',
 'pah',
 'scorned',
 'bumble',
 'velocity',
 'duration',
 'motivates',
 'catty',
 'improbabilities',
 'facet',
 'concieved',
 'lascivious',
 'ferociously',
 'silvia',
 'walder',
 'transfixing',
 'gruntled',
 'penpusher',
 'vill',
 'herriman',
 'smartie',
 'collapse',
 'unashamedly',
 'idioterne',
 'darkside',
 'anachronic',
 'kerim',
 'nihlani',
 'topping',
 'yama',
 'arrghh',
 'detatched',
 'magnificant',
 'strength',
 'sontee',
 'troublesome',
 'lakers',
 'resolve',
 'resurrection',
 'editorializing',
 'wexler',
 'backdrops',
 'stevenson',
 'erred',
 'comparrison',
 'dabney',
 'parallels',
 'wcw',
 'adrien',
 'hasselhof',
 'tongues',
 'quip',
 'artist',
 'pagan',
 'trademarks',
 'bingham',
 'necheyev',
 'sais',
 'winger',
 'finds',
 'gryll',
 'winterwonder',
 'producer',
 'humoristic',
 'dighton',
 'blocking',
 'feij',
 'gabbled',
 'lars',
 'yamashita',
 'bekmambetov',
 'pure',
 'perspicacious',
 'concerned',
 'protaganiste',
 'electing',
 'ceremony',
 'nva',
 'longendecker',
 'tassi',
 'overviews',
 'hannay',
 'dumb',
 'gaberial',
 'booboo',
 'hoffman',
 'harrowing',
 'guerriri',
 'prominant',
 'whirry',
 'evangelion',
 'popinjay',
 'recommended',
 'develops',
 'heroistic',
 'crumpled',
 'scoop',
 'azuma',
 'contextualising',
 'durn',
 'joycey',
 'tvm',
 'megabomb',
 'versatile',
 'listens',
 'natalie',
 'shelved',
 'recored',
 'symbiosis',
 'unfocused',
 'berth',
 'georgians',
 'amlie',
 'heavyarms',
 'collette',
 'soldierly',
 'contracting',
 'anastacia',
 'herv',
 'wayback',
 'insensitive',
 'activist',
 'judders',
 'upstream',
 'romanced',
 'flix',
 'presently',
 'securing',
 'jurgens',
 'husen',
 'bleakness',
 'geare',
 'regularity',
 'casper',
 'arming',
 'yubari',
 'catwalk',
 'domestic',
 'transplantation',
 'ennia',
 'fleurieu',
 'censorious',
 'westwood',
 'psyching',
 'divers',
 'kikki',
 'passion',
 'coctails',
 'ne',
 'intermingle',
 'lagravenese',
 'bassinger',
 'disengaged',
 'pennies',
 'patheticness',
 'kheymeh',
 'kiarostami',
 'dions',
 'hearing',
 'caustic',
 'impending',
 'shrine',
 'corinne',
 'sorrowfully',
 'ramotswe',
 'ears',
 'benq',
 'oompah',
 'maneur',
 'pleasance',
 'plummer',
 'carver',
 'laborious',
 'chancellor',
 'nyquist',
 'houseboats',
 'womman',
 'jayenge',
 'boswell',
 'keillor',
 'askey',
 'objected',
 'conflicting',
 'shapes',
 'ww',
 'gravitate',
 'warpath',
 'guadalajara',
 'idylls',
 'dignities',
 'muddies',
 'fairytales',
 'untangle',
 'feast',
 'soars',
 'lowered',
 'notwithstanding',
 'malplaced',
 'arriv',
 'ridden',
 'laawaris',
 'lyu',
 'reappears',
 'playoffs',
 'idol',
 'orphanage',
 'iceholes',
 'cebuano',
 'multilevel',
 'drugging',
 'argued',
 'vapidness',
 'unzips',
 'ayone',
 'peeling',
 'baggot',
 'abishek',
 'halluzinations',
 'restrictions',
 'mirrors',
 'mimic',
 'reaganism',
 'bgr',
 'magazines',
 'naysayer',
 'richly',
 'modulation',
 'perspectives',
 'luckett',
 'willed',
 'plying',
 'sherrys',
 'bloodiness',
 'mortal',
 'tieing',
 'rename',
 'agns',
 'melman',
 'caminho',
 'cnn',
 'unbeknownest',
 'kinematograph',
 'rtl',
 'footnotes',
 'kobe',
 'insurgents',
 'hathcocks',
 'salk',
 'alyce',
 'bestowing',
 'complacency',
 'soured',
 'ullman',
 'yield',
 'calchas',
 'glinda',
 'hennessey',
 'amateur',
 'halperin',
 'zealands',
 'purposely',
 'chilton',
 'linch',
 'eartha',
 'hyperspace',
 'legioners',
 'ninjas',
 'grosse',
 'balls',
 'satanic',
 'shoudl',
 'scalped',
 'afterschool',
 'transient',
 'shalom',
 'coherently',
 'endeavoring',
 'tobei',
 'gnashingly',
 'manhole',
 'coixet',
 'soundtracks',
 'kohala',
 'edo',
 'incentivized',
 'ibsen',
 'breckenridge',
 'thoughtlessness',
 'identification',
 'derails',
 'cinematographers',
 'tamako',
 'jeroen',
 'rhind',
 'deniselacey',
 'candolis',
 'caisse',
 'rationalize',
 'exasperatedly',
 'ibnez',
 'congregations',
 'heartbreak',
 'unintended',
 'could',
 'deceiving',
 'ubc',
 'thumbtack',
 'economies',
 'delanda',
 'booooring',
 'ati',
 'mariel',
 'computability',
 'engulf',
 'tindersticks',
 'reverse',
 'bobo',
 'sewing',
 'boreham',
 'defying',
 'toning',
 'packed',
 'innovated',
 'clot',
 'rosarios',
 'arbore',
 'terror',
 'sticks',
 'fraudulence',
 'subsided',
 'regretful',
 'snipering',
 'deficating',
 'rampages',
 'blackbird',
 'howser',
 'naismith',
 'kornbluths',
 'somersaulted',
 'sensationalistic',
 'qualitatively',
 'contrasting',
 'crapdom',
 'chichi',
 'phi',
 'ladder',
 'shrunken',
 'duh',
 'materialistic',
 'winstons',
 'mohd',
 'warters',
 'husbandgino',
 'benno',
 'clomps',
 'arros',
 'reefer',
 'hilarius',
 'attemps',
 'smirky',
 'serrador',
 'trivialized',
 'thirds',
 'neva',
 'tykes',
 'dyeing',
 'excelent',
 'crest',
 'essential',
 'aloft',
 'matchbox',
 'befores',
 'mpaarated',
 'thinly',
 'vaudevillian',
 'cineliterate',
 'leaving',
 'burlinson',
 'muzzy',
 'oddly',
 'proper',
 'airphone',
 'mujde',
 'client',
 'imaginatively',
 'addario',
 'coworkers',
 'suspicious',
 'cutlet',
 'barley',
 'sparring',
 'francine',
 'gwilym',
 'insanities',
 'blank',
 'farly',
 'groove',
 'fallowing',
 'ogre',
 'mightiest',
 'telling',
 'interracial',
 'undo',
 'prosthetic',
 'outmatched',
 'operating',
 'fill',
 'goldthwait',
 'trappings',
 'govida',
 'coral',
 'masssacre',
 'stooges',
 'simulated',
 'kik',
 'culminates',
 'occaisionally',
 'mme',
 'unluckily',
 'hometown',
 'irishman',
 'jymn',
 'devouring',
 'reprimanded',
 'dealings',
 'belaboured',
 'denmark',
 'kooks',
 'maman',
 'attain',
 'ahhhhhh',
 'demolishing',
 'bdus',
 'karadzhic',
 'proficient',
 'friedrich',
 'kaczorowski',
 'catboy',
 'kazakh',
 'takechi',
 'inexhaustible',
 'bragg',
 'verikoan',
 'soh',
 'cletus',
 'fugue',
 'carteloise',
 'visayans',
 'microwaving',
 'diepardieu',
 'iyer',
 'outmoded',
 'partha',
 'firework',
 'spores',
 'clack',
 'kimberley',
 'capraesque',
 'elmes',
 'kaye',
 'conceptions',
 'compiled',
 'tastic',
 'zizek',
 'distribution',
 'rajasthani',
 'kak',
 'waaaay',
 'mousy',
 'martnez',
 'tollywood',
 'suggestively',
 'phase',
 'trios',
 'tripped',
 'rombero',
 'jianxiang',
 'hyser',
 'stumps',
 'butlers',
 'vaughan',
 'indra',
 'fairmindedness',
 'unshaven',
 'idiotically',
 'rudolf',
 'circulate',
 'kmadden',
 'titantic',
 'wallop',
 'christo',
 'imprisonment',
 'actively',
 'westernisation',
 'personalize',
 'enraging',
 'impersonating',
 'benson',
 'daghang',
 'fork',
 'eventide',
 'convinced',
 'haughtiness',
 'underclothing',
 'idyllic',
 'pragmatism',
 'reporter',
 'slowish',
 'sanjeev',
 'diagnosis',
 'diamantino',
 'overdue',
 'patriarchal',
 'intros',
 'byu',
 'frisky',
 'tum',
 'silhouetted',
 'cruelity',
 'cannibal',
 'cule',
 'failure',
 'darts',
 'seminar',
 'pret',
 'coleridge',
 'sourpuss',
 'buccaneer',
 'photowise',
 'redundancies',
 'critisim',
 'arielle',
 'furtive',
 'atlantians',
 'kwok',
 'mccain',
 'costar',
 'sleaziest',
 'reaally',
 'repugnancy',
 'celery',
 'streamlining',
 'basra',
 'virtuous',
 'democrats',
 'brazilian',
 'inanely',
 'cranial',
 'thrice',
 'artiest',
 'expose',
 'hackenstein',
 'nuns',
 'garda',
 'savalas',
 'debts',
 'replicated',
 'hotwired',
 'trolls',
 'antiwar',
 'mcallister',
 'appalachia',
 'dimes',
 'steretyped',
 'rukh',
 'tramps',
 'impulses',
 'collaborator',
 'exeption',
 'hms',
 'wolsky',
 'terrorizer',
 'roflmao',
 'barrio',
 'rantzen',
 'kaufmann',
 'arms',
 'telkovsky',
 'estes',
 'clearer',
 'vachtangi',
 'rougher',
 'mikuni',
 'zinemann',
 'unizhennye',
 'gothas',
 'governmentmedia',
 'lis',
 'affable',
 'unfotunately',
 'wieder',
 'delane',
 'achievable',
 'spinsterish',
 'clytemnestra',
 'wichita',
 'textbook',
 'regrets',
 'gosha',
 'clement',
 'wiggly',
 'salle',
 'derboiler',
 'wads',
 'fraculater',
 'directors',
 'tugging',
 'stuhr',
 'revelling',
 'bedlam',
 'fanaticism',
 'keyser',
 'pests',
 'joey',
 'sleepless',
 'ruggia',
 'watkins',
 'cadby',
 'quotes',
 'centralized',
 'publicists',
 'marshal',
 'tadger',
 'traditionally',
 'pat',
 'adaptaion',
 'nonprofessional',
 'puny',
 'developping',
 'huey',
 'morrisette',
 'waldomiro',
 'auditioning',
 'eastwoods',
 'counterweight',
 'metamorphis',
 'attanborough',
 'sadness',
 'torre',
 'fraidy',
 'piercings',
 'superwonderscope',
 'nietszche',
 'sione',
 'beggining',
 'rotne',
 'indomitability',
 'atley',
 'molnar',
 'fruits',
 'greeter',
 'recompense',
 'foreshadowed',
 'tannhauser',
 'cats',
 'goriness',
 'hirjee',
 'clockers',
 'scums',
 'extort',
 'sets',
 'brooked',
 'charley',
 'dissing',
 'paraphernalia',
 'belisario',
 'ververgaert',
 'bonet',
 'toly',
 'raggedys',
 'chuck',
 'saxophonists',
 'sulfurous',
 'carrion',
 'fangorn',
 'haige',
 'bambaiya',
 'rentar',
 'raptus',
 'lupa',
 'mordant',
 'chestnuts',
 'methodology',
 'synchronicity',
 'lbs',
 'mutilating',
 'fellatio',
 'zapar',
 'apparel',
 'descendant',
 'delaware',
 'proof',
 'combatant',
 'oozed',
 'unbelieveable',
 'adjuster',
 'bliep',
 'speared',
 'smelling',
 'soviet',
 'strings',
 'keen',
 'picturization',
 'curits',
 'brad',
 'explosive',
 'rosa',
 'regales',
 'blackgood',
 'prosy',
 'roadkill',
 'brocoli',
 'snickers',
 'benussi',
 'propagandist',
 'castle',
 'hayseed',
 'stretchs',
 'badgering',
 'fatherland',
 'makeup',
 'aldiss',
 'inverts',
 'outward',
 'looking',
 'lutz',
 'huitieme',
 'cds',
 'whispers',
 'inconsequential',
 'substantiate',
 'klembecker',
 'fluctuates',
 'lamented',
 'rides',
 'trustees',
 'omarosa',
 'poliwhirl',
 'mothballed',
 'femi',
 'dinged',
 'casio',
 'nighty',
 'espionage',
 'golgo',
 'commonality',
 'bodysuckers',
 'semester',
 'unnaturally',
 'surging',
 'havana',
 'classicists',
 'chimps',
 'rusting',
 'sooni',
 'gish',
 'strickland',
 'unctuous',
 'quarreled',
 'expands',
 'zeffirelli',
 'inarguably',
 'blackploitation',
 'manhattanites',
 'summing',
 'absolutly',
 'galvanize',
 'clerks',
 'insidiously',
 'empt',
 'brewery',
 'steph',
 'batali',
 'coulouris',
 'arena',
 'turkish',
 'undercooked',
 'juveniles',
 'hopes',
 'departs',
 'jima',
 'burgendy',
 'mbongeni',
 'gazillion',
 'calicos',
 'oaks',
 'wrestled',
 'puling',
 'trixie',
 'kalashnikov',
 'strangeness',
 'cots',
 'populated',
 'thespic',
 'mache',
 'daubeney',
 'steaming',
 'parmistan',
 'waaaaaay',
 'misbehaves',
 'local',
 'resent',
 'massacred',
 'trifling',
 ...]

In [19]:
import numpy as np

layer_0 = np.zeros((1,vocab_size))
layer_0


Out[19]:
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.]])

In [20]:
from IPython.display import Image
Image(filename='sentiment_network.png')


Out[20]:

In [21]:
word2index = {}

for i,word in enumerate(vocab):
    word2index[word] = i
word2index


Out[21]:
{'': 0,
 'paradise': 1,
 'devotion': 2,
 'rwtd': 3,
 'hazy': 4,
 'governing': 5,
 'shied': 6,
 'usb': 7,
 'kahlua': 8,
 'dolman': 9,
 'tiana': 10,
 'toms': 11,
 'waterslides': 12,
 'perpetuates': 13,
 'ragged': 14,
 'kirtland': 15,
 'unparalleled': 16,
 'scrubbed': 17,
 'contemperaneous': 18,
 'hellishly': 19,
 'gimli': 20,
 'uncynical': 21,
 'synonym': 22,
 'proba': 23,
 'trident': 24,
 'sidestep': 25,
 'babban': 26,
 'chenoweth': 27,
 'cannabalistic': 28,
 'spots': 29,
 'ihave': 30,
 'kriemshild': 31,
 'frys': 32,
 'route': 33,
 'innsbruck': 34,
 'nonreligious': 35,
 'swig': 36,
 'haft': 37,
 'wags': 38,
 'rugrats': 39,
 'blocker': 40,
 'paltrow': 41,
 'pacer': 42,
 'prada': 43,
 'snapshotters': 44,
 'weaver': 45,
 'eaters': 46,
 'floradora': 47,
 'deflected': 48,
 'batch': 49,
 'dub': 50,
 'unreasonable': 51,
 'clenching': 52,
 'introducing': 53,
 'plugged': 54,
 'venice': 55,
 'nlp': 56,
 'honored': 57,
 'pittman': 58,
 'plodded': 59,
 'airliners': 60,
 'grimacing': 61,
 'centerfold': 62,
 'whiteboy': 63,
 'preservatives': 64,
 'ninteen': 65,
 'beaut': 66,
 'gaggling': 67,
 'hotdog': 68,
 'demonized': 69,
 'therapeutic': 70,
 'demon': 71,
 'jonatha': 72,
 'ramping': 73,
 'ingmar': 74,
 'accorsi': 75,
 'lsd': 76,
 'facts': 77,
 'shiny': 78,
 'climber': 79,
 'gallows': 80,
 'konigin': 81,
 'drownes': 82,
 'preparatory': 83,
 'crimson': 84,
 'simplyfied': 85,
 'recomendation': 86,
 'demolish': 87,
 'orignal': 88,
 'preempt': 89,
 'mpkdh': 90,
 'counterfeit': 91,
 'beastly': 92,
 'overhead': 93,
 'sollett': 94,
 'unskillful': 95,
 'andersen': 96,
 'plateaus': 97,
 'old': 98,
 'noam': 99,
 'startling': 100,
 'mutilation': 101,
 'projective': 102,
 'liquefied': 103,
 'favorites': 104,
 'delusional': 105,
 'meddlesome': 106,
 'swankiest': 107,
 'lwensohn': 108,
 'eds': 109,
 'anabel': 110,
 'mcdougall': 111,
 'waving': 112,
 'following': 113,
 'hissy': 114,
 'voracious': 115,
 'demotes': 116,
 'geopolitical': 117,
 'paramour': 118,
 'hultn': 119,
 'gendered': 120,
 'atrociousness': 121,
 'kipling': 122,
 'withdrawal': 123,
 'mucks': 124,
 'repairmen': 125,
 'headaches': 126,
 'kilts': 127,
 'rosson': 128,
 'linklatter': 129,
 'spasitc': 130,
 'rude': 131,
 'pupsi': 132,
 'blainsworth': 133,
 'undeclared': 134,
 'birdman': 135,
 'tian': 136,
 'excell': 137,
 'spaces': 138,
 'reviewing': 139,
 'arlette': 140,
 'deewar': 141,
 'blanked': 142,
 'wellesley': 143,
 'yaphet': 144,
 'despise': 145,
 'infective': 146,
 'accredited': 147,
 'hatsumo': 148,
 'allows': 149,
 'thieriot': 150,
 'royale': 151,
 'bolye': 152,
 'concentrated': 153,
 'lacked': 154,
 'benedict': 155,
 'fluegel': 156,
 'undercurrent': 157,
 'unrurly': 158,
 'idly': 159,
 'sophomore': 160,
 'schilling': 161,
 'burlesque': 162,
 'undead': 163,
 'define': 164,
 'reincarnations': 165,
 'transparencies': 166,
 'aparthied': 167,
 'reggae': 168,
 'sprog': 169,
 'matel': 170,
 'ekeing': 171,
 'comparable': 172,
 'marcy': 173,
 'caregiver': 174,
 'roo': 175,
 'prating': 176,
 'skips': 177,
 'petrillo': 178,
 'underpinnings': 179,
 'stallonethat': 180,
 'mistresses': 181,
 'laundromat': 182,
 'contrite': 183,
 'split': 184,
 'eliminating': 185,
 'poppins': 186,
 'meanie': 187,
 'southron': 188,
 'tait': 189,
 'allover': 190,
 'korman': 191,
 'normally': 192,
 'departure': 193,
 'effortlessly': 194,
 'otherworldliness': 195,
 'chest': 196,
 'pah': 197,
 'scorned': 198,
 'bumble': 199,
 'velocity': 200,
 'duration': 201,
 'motivates': 202,
 'catty': 203,
 'improbabilities': 204,
 'facet': 205,
 'concieved': 206,
 'lascivious': 207,
 'ferociously': 208,
 'silvia': 209,
 'walder': 210,
 'transfixing': 211,
 'gruntled': 212,
 'penpusher': 213,
 'vill': 214,
 'herriman': 215,
 'smartie': 216,
 'collapse': 217,
 'unashamedly': 218,
 'idioterne': 219,
 'darkside': 220,
 'anachronic': 221,
 'kerim': 222,
 'nihlani': 223,
 'topping': 224,
 'yama': 225,
 'arrghh': 226,
 'detatched': 227,
 'magnificant': 228,
 'strength': 229,
 'sontee': 230,
 'troublesome': 231,
 'lakers': 232,
 'resolve': 233,
 'resurrection': 234,
 'editorializing': 235,
 'wexler': 236,
 'backdrops': 237,
 'stevenson': 238,
 'erred': 239,
 'comparrison': 240,
 'dabney': 241,
 'parallels': 242,
 'wcw': 243,
 'adrien': 244,
 'hasselhof': 245,
 'tongues': 246,
 'quip': 247,
 'artist': 248,
 'pagan': 249,
 'trademarks': 250,
 'bingham': 251,
 'necheyev': 252,
 'sais': 253,
 'winger': 254,
 'finds': 255,
 'gryll': 256,
 'winterwonder': 257,
 'producer': 258,
 'humoristic': 259,
 'dighton': 260,
 'blocking': 261,
 'feij': 262,
 'gabbled': 263,
 'lars': 264,
 'yamashita': 265,
 'bekmambetov': 266,
 'pure': 267,
 'perspicacious': 268,
 'concerned': 269,
 'protaganiste': 270,
 'electing': 271,
 'ceremony': 272,
 'nva': 273,
 'longendecker': 274,
 'tassi': 275,
 'overviews': 276,
 'hannay': 277,
 'dumb': 278,
 'gaberial': 279,
 'booboo': 280,
 'hoffman': 281,
 'harrowing': 282,
 'guerriri': 283,
 'prominant': 284,
 'whirry': 285,
 'evangelion': 286,
 'popinjay': 287,
 'recommended': 288,
 'develops': 289,
 'heroistic': 290,
 'crumpled': 291,
 'scoop': 292,
 'azuma': 293,
 'contextualising': 294,
 'durn': 295,
 'joycey': 296,
 'tvm': 297,
 'megabomb': 298,
 'versatile': 299,
 'listens': 300,
 'natalie': 301,
 'shelved': 302,
 'recored': 303,
 'symbiosis': 304,
 'unfocused': 305,
 'berth': 306,
 'georgians': 307,
 'amlie': 308,
 'heavyarms': 309,
 'collette': 310,
 'soldierly': 311,
 'contracting': 312,
 'anastacia': 313,
 'herv': 314,
 'wayback': 315,
 'insensitive': 316,
 'activist': 317,
 'judders': 318,
 'upstream': 319,
 'romanced': 320,
 'flix': 321,
 'presently': 322,
 'securing': 323,
 'jurgens': 324,
 'husen': 325,
 'bleakness': 326,
 'geare': 327,
 'regularity': 328,
 'casper': 329,
 'arming': 330,
 'yubari': 331,
 'catwalk': 332,
 'domestic': 333,
 'transplantation': 334,
 'ennia': 335,
 'fleurieu': 336,
 'censorious': 337,
 'westwood': 338,
 'psyching': 339,
 'divers': 340,
 'kikki': 341,
 'passion': 342,
 'coctails': 343,
 'ne': 344,
 'intermingle': 345,
 'lagravenese': 346,
 'bassinger': 347,
 'disengaged': 348,
 'pennies': 349,
 'patheticness': 350,
 'kheymeh': 351,
 'kiarostami': 352,
 'dions': 353,
 'hearing': 354,
 'caustic': 355,
 'impending': 356,
 'shrine': 357,
 'corinne': 358,
 'sorrowfully': 359,
 'ramotswe': 360,
 'ears': 361,
 'benq': 362,
 'oompah': 363,
 'maneur': 364,
 'pleasance': 365,
 'plummer': 366,
 'carver': 367,
 'laborious': 368,
 'chancellor': 369,
 'nyquist': 370,
 'houseboats': 371,
 'womman': 372,
 'jayenge': 373,
 'boswell': 374,
 'keillor': 375,
 'askey': 376,
 'objected': 377,
 'conflicting': 378,
 'shapes': 379,
 'ww': 380,
 'gravitate': 381,
 'warpath': 382,
 'guadalajara': 383,
 'idylls': 384,
 'dignities': 385,
 'muddies': 386,
 'fairytales': 387,
 'untangle': 388,
 'feast': 389,
 'soars': 390,
 'lowered': 391,
 'notwithstanding': 392,
 'malplaced': 393,
 'arriv': 394,
 'ridden': 395,
 'laawaris': 396,
 'lyu': 397,
 'reappears': 398,
 'playoffs': 399,
 'idol': 400,
 'orphanage': 401,
 'iceholes': 402,
 'cebuano': 403,
 'multilevel': 404,
 'drugging': 405,
 'argued': 406,
 'vapidness': 407,
 'unzips': 408,
 'ayone': 409,
 'peeling': 410,
 'baggot': 411,
 'abishek': 412,
 'halluzinations': 413,
 'restrictions': 414,
 'mirrors': 415,
 'mimic': 416,
 'reaganism': 417,
 'bgr': 418,
 'magazines': 419,
 'naysayer': 420,
 'richly': 421,
 'modulation': 422,
 'perspectives': 423,
 'luckett': 424,
 'willed': 425,
 'plying': 426,
 'sherrys': 427,
 'bloodiness': 428,
 'mortal': 429,
 'tieing': 430,
 'rename': 431,
 'agns': 432,
 'melman': 433,
 'caminho': 434,
 'cnn': 435,
 'unbeknownest': 436,
 'kinematograph': 437,
 'rtl': 438,
 'footnotes': 439,
 'kobe': 440,
 'insurgents': 441,
 'hathcocks': 442,
 'salk': 443,
 'alyce': 444,
 'bestowing': 445,
 'complacency': 446,
 'soured': 447,
 'ullman': 448,
 'yield': 449,
 'calchas': 450,
 'glinda': 451,
 'hennessey': 452,
 'amateur': 453,
 'halperin': 454,
 'zealands': 455,
 'purposely': 456,
 'chilton': 457,
 'linch': 458,
 'eartha': 459,
 'hyperspace': 460,
 'legioners': 461,
 'ninjas': 462,
 'grosse': 463,
 'balls': 464,
 'satanic': 465,
 'shoudl': 466,
 'scalped': 467,
 'afterschool': 468,
 'transient': 469,
 'shalom': 470,
 'coherently': 471,
 'endeavoring': 472,
 'tobei': 473,
 'gnashingly': 474,
 'manhole': 475,
 'coixet': 476,
 'soundtracks': 477,
 'kohala': 478,
 'edo': 479,
 'incentivized': 480,
 'ibsen': 481,
 'breckenridge': 482,
 'thoughtlessness': 483,
 'identification': 484,
 'derails': 485,
 'cinematographers': 486,
 'tamako': 487,
 'jeroen': 488,
 'rhind': 489,
 'deniselacey': 490,
 'candolis': 491,
 'caisse': 492,
 'rationalize': 493,
 'exasperatedly': 494,
 'ibnez': 495,
 'congregations': 496,
 'heartbreak': 497,
 'unintended': 498,
 'could': 499,
 'deceiving': 500,
 'ubc': 501,
 'thumbtack': 502,
 'economies': 503,
 'delanda': 504,
 'booooring': 505,
 'ati': 506,
 'mariel': 507,
 'computability': 508,
 'engulf': 509,
 'tindersticks': 510,
 'reverse': 511,
 'bobo': 512,
 'sewing': 513,
 'boreham': 514,
 'defying': 515,
 'toning': 516,
 'packed': 517,
 'innovated': 518,
 'clot': 519,
 'rosarios': 520,
 'arbore': 521,
 'terror': 522,
 'sticks': 523,
 'fraudulence': 524,
 'subsided': 525,
 'regretful': 526,
 'snipering': 527,
 'deficating': 528,
 'rampages': 529,
 'blackbird': 530,
 'howser': 531,
 'naismith': 532,
 'kornbluths': 533,
 'somersaulted': 534,
 'sensationalistic': 535,
 'qualitatively': 536,
 'contrasting': 537,
 'crapdom': 538,
 'chichi': 539,
 'phi': 540,
 'ladder': 541,
 'shrunken': 542,
 'duh': 543,
 'materialistic': 544,
 'winstons': 545,
 'mohd': 546,
 'warters': 547,
 'husbandgino': 548,
 'benno': 549,
 'clomps': 550,
 'arros': 551,
 'reefer': 552,
 'hilarius': 553,
 'attemps': 554,
 'smirky': 555,
 'serrador': 556,
 'trivialized': 557,
 'thirds': 558,
 'neva': 559,
 'tykes': 560,
 'dyeing': 561,
 'excelent': 562,
 'crest': 563,
 'essential': 564,
 'aloft': 565,
 'matchbox': 566,
 'befores': 567,
 'mpaarated': 568,
 'thinly': 569,
 'vaudevillian': 570,
 'cineliterate': 571,
 'leaving': 572,
 'burlinson': 573,
 'muzzy': 574,
 'oddly': 575,
 'proper': 576,
 'airphone': 577,
 'mujde': 578,
 'client': 579,
 'imaginatively': 580,
 'addario': 581,
 'coworkers': 582,
 'suspicious': 583,
 'cutlet': 584,
 'barley': 585,
 'sparring': 586,
 'francine': 587,
 'gwilym': 588,
 'insanities': 589,
 'blank': 590,
 'farly': 591,
 'groove': 592,
 'fallowing': 593,
 'ogre': 594,
 'mightiest': 595,
 'telling': 596,
 'interracial': 597,
 'undo': 598,
 'prosthetic': 599,
 'outmatched': 600,
 'operating': 601,
 'fill': 602,
 'goldthwait': 603,
 'trappings': 604,
 'govida': 605,
 'coral': 606,
 'masssacre': 607,
 'stooges': 608,
 'simulated': 609,
 'kik': 610,
 'culminates': 611,
 'occaisionally': 612,
 'mme': 613,
 'unluckily': 614,
 'hometown': 615,
 'irishman': 616,
 'jymn': 617,
 'devouring': 618,
 'reprimanded': 619,
 'dealings': 620,
 'belaboured': 621,
 'denmark': 622,
 'kooks': 623,
 'maman': 624,
 'attain': 625,
 'ahhhhhh': 626,
 'demolishing': 627,
 'bdus': 628,
 'karadzhic': 629,
 'proficient': 630,
 'friedrich': 631,
 'kaczorowski': 632,
 'catboy': 633,
 'kazakh': 634,
 'takechi': 635,
 'inexhaustible': 636,
 'bragg': 637,
 'verikoan': 638,
 'soh': 639,
 'cletus': 640,
 'fugue': 641,
 'carteloise': 642,
 'visayans': 643,
 'microwaving': 644,
 'diepardieu': 645,
 'iyer': 646,
 'outmoded': 647,
 'partha': 648,
 'firework': 649,
 'spores': 650,
 'clack': 651,
 'kimberley': 652,
 'capraesque': 653,
 'elmes': 654,
 'kaye': 655,
 'conceptions': 656,
 'compiled': 657,
 'tastic': 658,
 'zizek': 659,
 'distribution': 660,
 'rajasthani': 661,
 'kak': 662,
 'waaaay': 663,
 'mousy': 664,
 'martnez': 665,
 'tollywood': 666,
 'suggestively': 667,
 'phase': 668,
 'trios': 669,
 'tripped': 670,
 'rombero': 671,
 'jianxiang': 672,
 'hyser': 673,
 'stumps': 674,
 'butlers': 675,
 'vaughan': 676,
 'indra': 677,
 'fairmindedness': 678,
 'unshaven': 679,
 'idiotically': 680,
 'rudolf': 681,
 'circulate': 682,
 'kmadden': 683,
 'titantic': 684,
 'wallop': 685,
 'christo': 686,
 'imprisonment': 687,
 'actively': 688,
 'westernisation': 689,
 'personalize': 690,
 'enraging': 691,
 'impersonating': 692,
 'benson': 693,
 'daghang': 694,
 'fork': 695,
 'eventide': 696,
 'convinced': 697,
 'haughtiness': 698,
 'underclothing': 699,
 'idyllic': 700,
 'pragmatism': 701,
 'reporter': 702,
 'slowish': 703,
 'sanjeev': 704,
 'diagnosis': 705,
 'diamantino': 706,
 'overdue': 707,
 'patriarchal': 708,
 'intros': 709,
 'byu': 710,
 'frisky': 711,
 'tum': 712,
 'silhouetted': 713,
 'cruelity': 714,
 'cannibal': 715,
 'cule': 716,
 'failure': 717,
 'darts': 718,
 'seminar': 719,
 'pret': 720,
 'coleridge': 721,
 'sourpuss': 722,
 'buccaneer': 723,
 'photowise': 724,
 'redundancies': 725,
 'critisim': 726,
 'arielle': 727,
 'furtive': 728,
 'atlantians': 729,
 'kwok': 730,
 'mccain': 731,
 'costar': 732,
 'sleaziest': 733,
 'reaally': 734,
 'repugnancy': 735,
 'celery': 736,
 'streamlining': 737,
 'basra': 738,
 'virtuous': 739,
 'democrats': 740,
 'brazilian': 741,
 'inanely': 742,
 'cranial': 743,
 'thrice': 744,
 'artiest': 745,
 'expose': 746,
 'hackenstein': 747,
 'nuns': 748,
 'garda': 749,
 'savalas': 750,
 'debts': 751,
 'replicated': 752,
 'hotwired': 753,
 'trolls': 754,
 'antiwar': 755,
 'mcallister': 756,
 'appalachia': 757,
 'dimes': 758,
 'steretyped': 759,
 'rukh': 760,
 'tramps': 761,
 'impulses': 762,
 'collaborator': 763,
 'exeption': 764,
 'hms': 765,
 'wolsky': 766,
 'terrorizer': 767,
 'roflmao': 768,
 'barrio': 769,
 'rantzen': 770,
 'kaufmann': 771,
 'arms': 772,
 'telkovsky': 773,
 'estes': 774,
 'clearer': 775,
 'vachtangi': 776,
 'rougher': 777,
 'mikuni': 778,
 'zinemann': 779,
 'unizhennye': 780,
 'gothas': 781,
 'governmentmedia': 782,
 'lis': 783,
 'affable': 784,
 'unfotunately': 785,
 'wieder': 786,
 'delane': 787,
 'achievable': 788,
 'spinsterish': 789,
 'clytemnestra': 790,
 'wichita': 791,
 'textbook': 792,
 'regrets': 793,
 'gosha': 794,
 'clement': 795,
 'wiggly': 796,
 'salle': 797,
 'derboiler': 798,
 'wads': 799,
 'fraculater': 800,
 'directors': 801,
 'tugging': 802,
 'stuhr': 803,
 'revelling': 804,
 'bedlam': 805,
 'fanaticism': 806,
 'keyser': 807,
 'pests': 808,
 'joey': 809,
 'sleepless': 810,
 'ruggia': 811,
 'watkins': 812,
 'cadby': 813,
 'quotes': 814,
 'centralized': 815,
 'publicists': 816,
 'marshal': 817,
 'tadger': 818,
 'traditionally': 819,
 'pat': 820,
 'adaptaion': 821,
 'nonprofessional': 822,
 'puny': 823,
 'developping': 824,
 'huey': 825,
 'morrisette': 826,
 'waldomiro': 827,
 'auditioning': 828,
 'eastwoods': 829,
 'counterweight': 830,
 'metamorphis': 831,
 'attanborough': 832,
 'sadness': 833,
 'torre': 834,
 'fraidy': 835,
 'piercings': 836,
 'superwonderscope': 837,
 'nietszche': 838,
 'sione': 839,
 'beggining': 840,
 'rotne': 841,
 'indomitability': 842,
 'atley': 843,
 'molnar': 844,
 'fruits': 845,
 'greeter': 846,
 'recompense': 847,
 'foreshadowed': 848,
 'tannhauser': 849,
 'cats': 850,
 'goriness': 851,
 'hirjee': 852,
 'clockers': 853,
 'scums': 854,
 'extort': 855,
 'sets': 856,
 'brooked': 857,
 'charley': 858,
 'dissing': 859,
 'paraphernalia': 860,
 'belisario': 861,
 'ververgaert': 862,
 'bonet': 863,
 'toly': 864,
 'raggedys': 865,
 'chuck': 866,
 'saxophonists': 867,
 'sulfurous': 868,
 'carrion': 869,
 'fangorn': 870,
 'haige': 871,
 'bambaiya': 872,
 'rentar': 873,
 'raptus': 874,
 'lupa': 875,
 'mordant': 876,
 'chestnuts': 877,
 'methodology': 878,
 'synchronicity': 879,
 'lbs': 880,
 'mutilating': 881,
 'fellatio': 882,
 'zapar': 883,
 'apparel': 884,
 'descendant': 885,
 'delaware': 886,
 'proof': 887,
 'combatant': 888,
 'oozed': 889,
 'unbelieveable': 890,
 'adjuster': 891,
 'bliep': 892,
 'speared': 893,
 'smelling': 894,
 'soviet': 895,
 'strings': 896,
 'keen': 897,
 'picturization': 898,
 'curits': 899,
 'brad': 900,
 'explosive': 901,
 'rosa': 902,
 'regales': 903,
 'blackgood': 904,
 'prosy': 905,
 'roadkill': 906,
 'brocoli': 907,
 'snickers': 908,
 'benussi': 909,
 'propagandist': 910,
 'castle': 911,
 'hayseed': 912,
 'stretchs': 913,
 'badgering': 914,
 'fatherland': 915,
 'makeup': 916,
 'aldiss': 917,
 'inverts': 918,
 'outward': 919,
 'looking': 920,
 'lutz': 921,
 'huitieme': 922,
 'cds': 923,
 'whispers': 924,
 'inconsequential': 925,
 'substantiate': 926,
 'klembecker': 927,
 'fluctuates': 928,
 'lamented': 929,
 'rides': 930,
 'trustees': 931,
 'omarosa': 932,
 'poliwhirl': 933,
 'mothballed': 934,
 'femi': 935,
 'dinged': 936,
 'casio': 937,
 'nighty': 938,
 'espionage': 939,
 'golgo': 940,
 'commonality': 941,
 'bodysuckers': 942,
 'semester': 943,
 'unnaturally': 944,
 'surging': 945,
 'havana': 946,
 'classicists': 947,
 'chimps': 948,
 'rusting': 949,
 'sooni': 950,
 'gish': 951,
 'strickland': 952,
 'unctuous': 953,
 'quarreled': 954,
 'expands': 955,
 'zeffirelli': 956,
 'inarguably': 957,
 'blackploitation': 958,
 'manhattanites': 959,
 'summing': 960,
 'absolutly': 961,
 'galvanize': 962,
 'clerks': 963,
 'insidiously': 964,
 'empt': 965,
 'brewery': 966,
 'steph': 967,
 'batali': 968,
 'coulouris': 969,
 'arena': 970,
 'turkish': 971,
 'undercooked': 972,
 'juveniles': 973,
 'hopes': 974,
 'departs': 975,
 'jima': 976,
 'burgendy': 977,
 'mbongeni': 978,
 'gazillion': 979,
 'calicos': 980,
 'oaks': 981,
 'wrestled': 982,
 'puling': 983,
 'trixie': 984,
 'kalashnikov': 985,
 'strangeness': 986,
 'cots': 987,
 'populated': 988,
 'thespic': 989,
 'mache': 990,
 'daubeney': 991,
 'steaming': 992,
 'parmistan': 993,
 'waaaaaay': 994,
 'misbehaves': 995,
 'local': 996,
 'resent': 997,
 'massacred': 998,
 'trifling': 999,
 ...}

In [22]:
def update_input_layer(review):
    
    global layer_0
    
    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

In [23]:
layer_0


Out[23]:
array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])
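
The 18 in position 0 is most likely the count of the empty-string token: word2index maps '' to index 0, and splitting on a single space yields an empty string wherever the review text contains consecutive spaces. Here is a minimal sketch of the same counting on a toy vocabulary. The names toy_vocab, toy_word2index and count_vector are hypothetical, made up for this illustration, and are not part of the notebook.

```python
from collections import Counter

import numpy as np

# Hypothetical toy vocabulary; index 0 is the empty string, as in Out[21]
toy_vocab = ["", "good", "bad", "movie"]
toy_word2index = {word: i for i, word in enumerate(toy_vocab)}

def count_vector(review, word2index):
    """Return a (1, vocab_size) row of word counts for one review."""
    vec = np.zeros((1, len(word2index)))
    for word, count in Counter(review.split(" ")).items():
        if word in word2index:  # skip words outside the vocabulary
            vec[0][word2index[word]] = count
    return vec

# Note the double space: split(" ") produces one empty-string token here
vec = count_vector("good movie  good", toy_word2index)
print(vec)
```

The double space contributes one count to index 0, which is the same effect that inflates the first entry of layer_0 above.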

In [24]:
def get_target_for_label(label):
    if(label == 'POSITIVE'):
        return 1
    else:
        return 0

In [25]:
labels[0]


Out[25]:
'POSITIVE'

In [26]:
get_target_for_label(labels[0])


Out[26]:
1

In [27]:
labels[1]


Out[27]:
'NEGATIVE'

In [28]:
get_target_for_label(labels[1])


Out[28]:
0

Project 3: Building a Neural Network

  • Start with your neural network from the last chapter
  • 3-layer neural network
  • no non-linearity in the hidden layer
  • use our functions to create the training data
  • create a "pre_process_data" function that builds the vocabulary for our training-data generating functions
  • modify "train" to train over the entire corpus
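
One consequence of leaving the hidden layer linear: the two weight matrices compose into a single linear map, so with one output node the network computes the same function as a single-layer logistic model. The sketch below shows this with made-up sizes; every name and shape in it is an illustrative assumption, not the notebook's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_nodes = 8, 3  # toy sizes, chosen only for illustration

# Fake word counts standing in for layer_0
layer_0 = rng.integers(0, 3, size=(1, vocab_size)).astype(float)
weights_0_1 = rng.normal(size=(vocab_size, hidden_nodes))
weights_1_2 = rng.normal(size=(hidden_nodes, 1))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Forward pass as described above: linear hidden layer, sigmoid output
layer_1 = layer_0.dot(weights_0_1)            # shape (1, hidden_nodes)
layer_2 = sigmoid(layer_1.dot(weights_1_2))   # shape (1, 1)

# Same prediction from one combined weight vector
combined = weights_0_1.dot(weights_1_2)       # shape (vocab_size, 1)
layer_2_combined = sigmoid(layer_0.dot(combined))

print(layer_2, layer_2_combined)
```

The factored form still changes how gradient descent moves through weight space, so the two are not interchangeable during training even though they represent the same set of functions.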


In [29]:
import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):
       
        # set our random number generator 
        np.random.seed(1)
    
        self.pre_process_data(reviews, labels)
        
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)
        
        
    def pre_process_data(self, reviews, labels):
        
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        self.label_vocab = list(label_vocab)
        
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
         
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        self.learning_rate = learning_rate
        
        self.layer_0 = np.zeros((1,input_nodes))
    
        
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if word in self.word2index:
                self.layer_0[0][self.word2index[word]] += 1
                
    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def train(self, training_reviews, training_labels):
        
        assert(len(training_reviews) == len(training_labels))
        
        correct_so_far = 0
        
        start = time.time()
        
        for i in range(len(training_reviews)):
            
            review = training_reviews[i]
            label = training_labels[i]
            
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the actual output minus the desired target.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # TODO: Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        
        correct = 0
        
        start = time.time()
        
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                            + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        
        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"
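
The weight updates in train correspond to gradient descent on the squared error 0.5 * (layer_2 - target)**2 for a single review. One way to convince yourself of that is a numerical gradient check on a tiny network. The sketch below reuses the same delta formulas with standalone, hypothetical names (loss, numerical_grad) and toy sizes. It is an assumption-laden illustration, not part of the class above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(1, 5)).astype(float)  # toy word counts
w01 = rng.normal(size=(5, 4))                      # input-to-hidden weights
w12 = rng.normal(size=(4, 1))                      # hidden-to-output weights
target = 1.0

def loss():
    """Squared error for the single toy example, using the current weights."""
    out = sigmoid(x.dot(w01).dot(w12))
    return (0.5 * (out - target) ** 2).item()

# Analytic gradients, written with the same formulas as the train method
layer_1 = x.dot(w01)
layer_2 = sigmoid(layer_1.dot(w12))
layer_2_delta = (layer_2 - target) * layer_2 * (1 - layer_2)
layer_1_delta = layer_2_delta.dot(w12.T)  # linear hidden layer: delta equals error
grad_w12 = layer_1.T.dot(layer_2_delta)
grad_w01 = x.T.dot(layer_1_delta)

def numerical_grad(w, eps=1e-6):
    """Central-difference estimate of d(loss)/dw, perturbing w in place."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        upper = loss()
        w[idx] = old - eps
        lower = loss()
        w[idx] = old  # restore the original weight
        grad[idx] = (upper - lower) / (2 * eps)
    return grad

num_w01 = numerical_grad(w01)
num_w12 = numerical_grad(w12)

print(np.allclose(grad_w01, num_w01, atol=1e-6))
print(np.allclose(grad_w12, num_w12, atol=1e-6))
```

If the analytic and numerical gradients agree, the backward pass is at least internally consistent. Whether the network then learns depends on other choices such as the learning rate and the input representation.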

In [87]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

In [61]:
# evaluate our model before training (just to show how horrible it is)
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):587.5% #Correct:500 #Tested:1000 Testing Accuracy:50.0%
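
Exactly 50% is what this initialization should give. weights_0_1 starts as all zeros, so layer_1 is zero for any review, layer_2 is sigmoid(0) = 0.5, and run only returns "POSITIVE" when the output is strictly greater than 0.5. The untrained network therefore predicts "NEGATIVE" for everything, which is consistent with 500 correct out of 1,000 if half of the held-out reviews are negative. Here is a toy reproduction; the sizes and counts are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
vocab_size, hidden_nodes = 6, 10  # toy sizes (assumptions)

# Same initialization scheme as init_network: zeros, then random normal
weights_0_1 = np.zeros((vocab_size, hidden_nodes))
weights_1_2 = rng.normal(0.0, 1.0, (hidden_nodes, 1))

# Arbitrary word counts; the result does not depend on them
layer_0 = np.array([[2.0, 0.0, 1.0, 0.0, 0.0, 3.0]])

layer_1 = layer_0.dot(weights_0_1)            # all zeros
layer_2 = sigmoid(layer_1.dot(weights_1_2))   # sigmoid(0) = 0.5

prediction = "POSITIVE" if layer_2[0][0] > 0.5 else "NEGATIVE"
print(layer_2[0][0], prediction)
```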

In [62]:
# train the network
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):89.58 #Correct:1250 #Trained:2501 Training Accuracy:49.9%
Progress:20.8% Speed(reviews/sec):95.03 #Correct:2500 #Trained:5001 Training Accuracy:49.9%
Progress:27.4% Speed(reviews/sec):95.46 #Correct:3295 #Trained:6592 Training Accuracy:49.9%
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-62-d0f5d85ad402> in <module>()
      1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
    117             # TODO: Update the weights
    118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
    120 
    121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt: 

In [63]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)

In [64]:
# train the network
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):96.39 #Correct:1247 #Trained:2501 Training Accuracy:49.8%
Progress:20.8% Speed(reviews/sec):99.31 #Correct:2497 #Trained:5001 Training Accuracy:49.9%
Progress:22.8% Speed(reviews/sec):99.02 #Correct:2735 #Trained:5476 Training Accuracy:49.9%
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-64-d0f5d85ad402> in <module>()
      1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
    117             # TODO: Update the weights
    118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
    120 
    121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt: 

In [65]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)

In [66]:
# train the network
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):98.77 #Correct:1267 #Trained:2501 Training Accuracy:50.6%
Progress:20.8% Speed(reviews/sec):98.79 #Correct:2640 #Trained:5001 Training Accuracy:52.7%
Progress:31.2% Speed(reviews/sec):98.58 #Correct:4109 #Trained:7501 Training Accuracy:54.7%
Progress:41.6% Speed(reviews/sec):93.78 #Correct:5638 #Trained:10001 Training Accuracy:56.3%
Progress:52.0% Speed(reviews/sec):91.76 #Correct:7246 #Trained:12501 Training Accuracy:57.9%
Progress:62.5% Speed(reviews/sec):92.42 #Correct:8841 #Trained:15001 Training Accuracy:58.9%
Progress:69.4% Speed(reviews/sec):92.58 #Correct:9934 #Trained:16668 Training Accuracy:59.5%
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-66-d0f5d85ad402> in <module>()
      1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
    117             # TODO: Update the weights
    118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
    120 
    121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt: 

Understanding Neural Noise


In [67]:
from IPython.display import Image
Image(filename='sentiment_network.png')


Out[67]:

In [70]:
def update_input_layer(review):
    
    global layer_0
    
    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

In [71]:
layer_0


Out[71]:
array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

In [79]:
review_counter = Counter()

In [80]:
for word in reviews[0].split(" "):
    review_counter[word] += 1

In [81]:
review_counter.most_common()


Out[81]:
[('.', 27),
 ('', 18),
 ('the', 9),
 ('to', 6),
 ('i', 5),
 ('high', 5),
 ('is', 4),
 ('of', 4),
 ('a', 4),
 ('bromwell', 4),
 ('teachers', 4),
 ('that', 4),
 ('their', 2),
 ('my', 2),
 ('at', 2),
 ('as', 2),
 ('me', 2),
 ('in', 2),
 ('students', 2),
 ('it', 2),
 ('student', 2),
 ('school', 2),
 ('through', 1),
 ('insightful', 1),
 ('ran', 1),
 ('years', 1),
 ('here', 1),
 ('episode', 1),
 ('reality', 1),
 ('what', 1),
 ('far', 1),
 ('t', 1),
 ('saw', 1),
 ('s', 1),
 ('repeatedly', 1),
 ('isn', 1),
 ('closer', 1),
 ('and', 1),
 ('fetched', 1),
 ('remind', 1),
 ('can', 1),
 ('welcome', 1),
 ('line', 1),
 ('your', 1),
 ('survive', 1),
 ('teaching', 1),
 ('satire', 1),
 ('classic', 1),
 ('who', 1),
 ('age', 1),
 ('knew', 1),
 ('schools', 1),
 ('inspector', 1),
 ('comedy', 1),
 ('down', 1),
 ('about', 1),
 ('pity', 1),
 ('m', 1),
 ('all', 1),
 ('adults', 1),
 ('see', 1),
 ('think', 1),
 ('situation', 1),
 ('time', 1),
 ('pomp', 1),
 ('lead', 1),
 ('other', 1),
 ('much', 1),
 ('many', 1),
 ('which', 1),
 ('one', 1),
 ('profession', 1),
 ('programs', 1),
 ('same', 1),
 ('some', 1),
 ('such', 1),
 ('pettiness', 1),
 ('immediately', 1),
 ('expect', 1),
 ('financially', 1),
 ('recalled', 1),
 ('tried', 1),
 ('whole', 1),
 ('right', 1),
 ('life', 1),
 ('cartoon', 1),
 ('scramble', 1),
 ('sack', 1),
 ('believe', 1),
 ('when', 1),
 ('than', 1),
 ('burn', 1),
 ('pathetic', 1)]
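
The counts above hint at where the noise comes from: the two largest entries are '.' and the empty string, which carry no sentiment, yet with count inputs they feed the largest values into the hidden layer. Below is a minimal sketch of the two encodings on a made-up toy review. The names `toy_review`, `count_vector`, and `binary_vector` are illustrative assumptions, not part of the notebook.

```python
from collections import Counter

import numpy as np

# Hypothetical toy review, tokenized the same way as the notebook (split on a single space).
toy_review = "this movie was great . . . . really great ."
tokens = toy_review.split(" ")

# Build a small vocabulary and index, mirroring word2index in the notebook.
vocab = sorted(set(tokens))
word2index = {word: i for i, word in enumerate(vocab)}

# Count encoding: each input holds how many times the word occurs.
count_vector = np.zeros(len(vocab))
for word in tokens:
    count_vector[word2index[word]] += 1

# Binary encoding: each input only records whether the word occurs at all.
binary_vector = np.zeros(len(vocab))
for word in tokens:
    binary_vector[word2index[word]] = 1

most_common_word, most_common_count = Counter(tokens).most_common(1)[0]
print("largest count input:", most_common_word, most_common_count)
print("count vector: ", count_vector)
print("binary vector:", binary_vector)
```

In the count vector the punctuation token outweighs "great"; in the binary vector every word that appears contributes equally, which is the change Project 4 makes in `update_input_layer`.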

Project 4: Reducing Noise in our Input Data


In [82]:
import time
import sys
import numpy as np

# Tweak the network from before: inputs now record word presence (0 or 1) instead of raw counts
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):
       
        # set our random number generator 
        np.random.seed(1)
    
        self.pre_process_data(reviews, labels)
        
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)
        
        
    def pre_process_data(self, reviews, labels):
        
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        self.label_vocab = list(label_vocab)
        
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
         
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        self.learning_rate = learning_rate
        
        self.layer_0 = np.zeros((1,input_nodes))
    
        
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if(word in self.word2index.keys()):
                self.layer_0[0][self.word2index[word]] = 1
                
    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def train(self, training_reviews, training_labels):
        
        assert(len(training_reviews) == len(training_labels))
        
        correct_so_far = 0
        
        start = time.time()
        
        for i in range(len(training_reviews)):
            
            review = training_reviews[i]
            label = training_labels[i]
            
            #### Implement the forward pass here ####
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            #### Implement the backward pass here ####
            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        
        correct = 0
        
        start = time.time()
        
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                            + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        
        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

In [83]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

In [84]:
mlp.train(reviews[:-1000],labels[:-1000])


Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):91.50 #Correct:1795 #Trained:2501 Training Accuracy:71.7%
Progress:20.8% Speed(reviews/sec):95.25 #Correct:3811 #Trained:5001 Training Accuracy:76.2%
Progress:31.2% Speed(reviews/sec):93.74 #Correct:5898 #Trained:7501 Training Accuracy:78.6%
Progress:41.6% Speed(reviews/sec):93.69 #Correct:8042 #Trained:10001 Training Accuracy:80.4%
Progress:52.0% Speed(reviews/sec):95.27 #Correct:10186 #Trained:12501 Training Accuracy:81.4%
Progress:62.5% Speed(reviews/sec):98.19 #Correct:12317 #Trained:15001 Training Accuracy:82.1%
Progress:72.9% Speed(reviews/sec):98.56 #Correct:14440 #Trained:17501 Training Accuracy:82.5%
Progress:83.3% Speed(reviews/sec):99.74 #Correct:16613 #Trained:20001 Training Accuracy:83.0%
Progress:93.7% Speed(reviews/sec):100.7 #Correct:18794 #Trained:22501 Training Accuracy:83.5%
Progress:99.9% Speed(reviews/sec):101.9 #Correct:20115 #Trained:24000 Training Accuracy:83.8%

In [85]:
# evaluate our model after training, on the 1,000 held-out reviews
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):832.7% #Correct:851 #Tested:1000 Testing Accuracy:85.1%

Analyzing Inefficiencies in our Network


In [88]:
Image(filename='sentiment_network_sparse.png')


Out[88]:

In [89]:
layer_0 = np.zeros(10)

In [90]:
layer_0


Out[90]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [91]:
layer_0[4] = 1
layer_0[9] = 1

In [92]:
layer_0


Out[92]:
array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.])

In [93]:
weights_0_1 = np.random.randn(10,5)

In [94]:
layer_0.dot(weights_0_1)


Out[94]:
array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

In [101]:
indices = [4,9]

In [102]:
layer_1 = np.zeros(5)

In [103]:
for index in indices:
    layer_1 += (weights_0_1[index])

In [104]:
layer_1


Out[104]:
array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])
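
The two cells above produce the same hidden layer by different routes. As a quick check, here is a minimal sketch that compares the dense product against the sum of the active weight rows. It uses its own seeded random weights, so the numbers will differ from the output above; `active_indices`, `dense_layer_1`, and `sparse_layer_1` are illustrative names.

```python
import numpy as np

rng = np.random.RandomState(1)  # fixed seed so the sketch is repeatable

# Same shapes as the cells above: 10 inputs, 5 hidden units.
weights_0_1 = rng.randn(10, 5)

# Dense binary input with two active positions.
layer_0 = np.zeros(10)
active_indices = [4, 9]
layer_0[active_indices] = 1

# Dense path: full vector-matrix product (most multiplications are by zero).
dense_layer_1 = layer_0.dot(weights_0_1)

# Sparse path: add up only the weight rows of the active inputs.
sparse_layer_1 = weights_0_1[active_indices].sum(axis=0)

print(np.allclose(dense_layer_1, sparse_layer_1))
```

The shortcut relies on the inputs being 0 or 1: multiplying by 1 leaves a weight row unchanged and multiplying by 0 removes it, so only the active rows need to be touched.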

In [100]:
Image(filename='sentiment_network_sparse_2.png')


Out[100]:

Project 5: Making our Network More Efficient


In [30]:
import time
import sys
import numpy as np

# Tweak the network from before: skip the dense input layer and work with word indices directly
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):
       
        np.random.seed(1)
    
        self.pre_process_data(reviews, labels)
        
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)
        
        
    def pre_process_data(self, reviews, labels):
        
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        self.label_vocab = list(label_vocab)
        
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
         
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        self.learning_rate = learning_rate
        
        self.layer_0 = np.zeros((1,input_nodes))
        self.layer_1 = np.zeros((1,hidden_nodes))
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def train(self, training_reviews_raw, training_labels):
        
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if(word in self.word2index.keys()):
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))
        
        assert(len(training_reviews) == len(training_labels))
        
        correct_so_far = 0
        
        start = time.time()
        
        for i in range(len(training_reviews)):
            
            review = training_reviews[i]
            label = training_labels[i]
            
            #### Implement the forward pass here ####
            ### Forward pass ###

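            # Input layer: `review` is already a list of word indices, so no dense layer_0 is built

            # Hidden layer: sum the weight rows of the words present
            # (for binary inputs this equals the dense self.layer_0.dot(self.weights_0_1))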
            self.layer_1 *= 0
            for index in review:
                self.layer_1 += self.weights_0_1[index]
            
            # Output layer
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            #### Implement the backward pass here ####
            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            
            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
        
    
    def test(self, testing_reviews, testing_labels):
        
        correct = 0
        
        start = time.time()
        
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            reviews_per_second = i / float(time.time() - start)
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                            + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        
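        # Input layer: collect the indices of the known words instead of filling layer_0

        # Hidden layer: sum the weight rows of those words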
        self.layer_1 *= 0
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index.keys():
                unique_indices.add(self.word2index[word])
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]
        
        # Output layer
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

In [31]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

In [32]:
mlp.train(reviews[:-1000],labels[:-1000])


Progress:99.9% Speed(reviews/sec):964.0 #Correct:20076 #Trained:24000 Training Accuracy:83.6%

In [33]:
# evaluate our model after training, on the 1,000 held-out reviews
mlp.test(reviews[-1000:],labels[-1000:])


Progress:99.9% Speed(reviews/sec):1201.% #Correct:851 #Tested:1000 Testing Accuracy:85.1%
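
Project 5 also replaces the dense input-to-hidden weight update with a loop over the active word indices. Below is a minimal sketch of why the two updates agree, assuming binary inputs and unique indices (as produced by the `set()` in `train`). The delta here is random stand-in data, and the names `dense_weights` and `sparse_weights` are illustrative.

```python
import numpy as np

rng = np.random.RandomState(1)  # fixed seed so the sketch is repeatable
learning_rate = 0.1

input_nodes, hidden_nodes = 10, 5
active_indices = [4, 9]  # words present in a hypothetical review

# Dense binary input row, shaped (1, input_nodes) like self.layer_0.
layer_0 = np.zeros((1, input_nodes))
layer_0[0, active_indices] = 1

# Stand-in for a backpropagated hidden-layer delta, shaped (1, hidden_nodes).
layer_1_delta = rng.randn(1, hidden_nodes)

dense_weights = rng.randn(input_nodes, hidden_nodes)
sparse_weights = dense_weights.copy()

# Dense update: the outer product touches every row, but inactive rows get zero.
dense_weights -= layer_0.T.dot(layer_1_delta) * learning_rate

# Sparse update: only the rows of the active inputs change.
for index in active_indices:
    sparse_weights[index] -= layer_1_delta[0] * learning_rate

print(np.allclose(dense_weights, sparse_weights))
```

Because the same weights come out either way, the sparse version should reach the same accuracy while doing far less arithmetic per review, which is consistent with the jump in reviews per second reported above.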

In [ ]: