# Sentiment Classification & How To "Frame Problems" for a Neural Network

### What You Should Already Know

• neural networks, forward and back-propagation
• stochastic gradient descent
• mean squared error
• train/test splits

### Where to Get Help if You Need It

• Re-watch previous Udacity lectures
• Leverage the recommended course reading: Grokking Deep Learning (40% off: traskud17)
• Shoot me a tweet @iamtrask

### Tutorial Outline:

• Intro: The Importance of "Framing a Problem"
• Curate a Dataset
• Developing a "Predictive Theory"
• PROJECT 1: Quick Theory Validation
• Transforming Text to Numbers
• PROJECT 2: Creating the Input/Output Data
• Putting it all together in a Neural Network
• PROJECT 3: Building our Neural Network
• Understanding Neural Noise
• PROJECT 4: Making Learning Faster by Reducing Noise
• Analyzing Inefficiencies in our Network
• PROJECT 5: Making our Network Train and Run Faster
• Further Noise Reduction
• PROJECT 6: Reducing Noise by Strategically Reducing the Vocabulary
• Analysis: What's going on in the weights?

# Lesson: Curate a Dataset

``````

In [1]:

def pretty_print_review_and_label(i):
    # Show the label, then the first 80 characters of the review
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

with open('reviews.txt', 'r') as g:  # What we know!
    reviews = [line.rstrip('\n') for line in g]

with open('labels.txt', 'r') as g:  # What we WANT to know!
    labels = [line.rstrip('\n').upper() for line in g]

``````
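Before building a theory on top of these two lists, it can help to confirm that they pair up one-to-one. The helper below, `check_dataset`, is a hypothetical name (not part of the course code). It is a minimal sketch that assumes `labels.txt` holds exactly one label per line, upper-cased to `POSITIVE` or `NEGATIVE` as in the cell above.

```python
from collections import Counter

def check_dataset(reviews, labels, allowed=("POSITIVE", "NEGATIVE")):
    """Confirm reviews and labels pair up one-to-one and return the label counts."""
    if len(reviews) != len(labels):
        raise ValueError(f"{len(reviews)} reviews but {len(labels)} labels")
    unexpected = set(labels) - set(allowed)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    return Counter(labels)

# Tiny stand-in data; in the notebook you would pass the real `reviews` and `labels` lists.
demo_reviews = ["this movie is terrible but it has some good effects .",
                "adrian pasdar is excellent is this film ."]
demo_labels = ["NEGATIVE", "POSITIVE"]
print(check_dataset(demo_reviews, demo_labels))
```

On the full dataset, the returned counter also shows how balanced the two classes are, which matters later when judging whether an accuracy figure is better than guessing.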
``````

In [2]:

len(reviews)

``````
``````

Out[2]:

25000

``````
``````

In [3]:

reviews[0]

``````
``````

Out[3]:

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

``````
``````

In [4]:

labels[0]

``````
``````

Out[4]:

'POSITIVE'

``````

# Lesson: Develop a Predictive Theory

``````

In [5]:

print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

``````
``````

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...

``````
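The six samples suggest a simple theory: words such as "terrible" tend to show up in NEGATIVE reviews and words such as "excellent" in POSITIVE ones. One way to make that theory testable is a keyword vote. This is a minimal sketch, not the course's method; `keyword_predict`, `coverage_and_accuracy`, and the one-word cue lists are illustrative assumptions drawn only from the samples printed above.

```python
def keyword_predict(review, positive_words=("excellent",), negative_words=("terrible",)):
    """Hypothetical baseline: vote with hand-picked cue words."""
    tokens = review.split()
    score = sum(t in positive_words for t in tokens) - sum(t in negative_words for t in tokens)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "UNKNOWN"  # no cue words present, so the theory makes no prediction

def coverage_and_accuracy(reviews, labels, predict=keyword_predict):
    """Fraction of reviews the rule fires on, and its accuracy on those reviews."""
    predictions = [(predict(r), l) for r, l in zip(reviews, labels)]
    fired = [(p, l) for p, l in predictions if p != "UNKNOWN"]
    coverage = len(fired) / len(reviews) if reviews else 0.0
    accuracy = sum(p == l for p, l in fired) / len(fired) if fired else 0.0
    return coverage, accuracy

samples = ["this movie is terrible but it has some good effects .",
           "adrian pasdar is excellent is this film . he makes a fascinating woman .",
           "a film with no cue words ."]
sample_labels = ["NEGATIVE", "POSITIVE", "POSITIVE"]
print(coverage_and_accuracy(samples, sample_labels))
```

Reporting coverage separately from accuracy keeps the rule honest: a handful of cue words may be very accurate yet fire on only a small share of reviews, which is the gap that counting every word (next project) tries to close.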

# Project 1: Quick Theory Validation

``````

In [6]:

from collections import Counter
import numpy as np

``````
``````

In [7]:

positive_counts = Counter()
negative_counts = Counter()
total_counts = Counter()

``````
``````

In [8]:

# Tally each word under the label of the review it appears in
for i in range(len(reviews)):
    if(labels[i] == 'POSITIVE'):
        for word in reviews[i].split(" "):
            positive_counts[word] += 1
            total_counts[word] += 1
    else:
        for word in reviews[i].split(" "):
            negative_counts[word] += 1
            total_counts[word] += 1

``````
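The same tallies can be written more compactly with `Counter.update`, which counts every item of an iterable in one call. This is a minimal sketch; `count_words` is a hypothetical helper name. Like the `else` branch in the loop above, it treats any label other than `"POSITIVE"` as negative, and it still splits on a single space so the counts match.

```python
from collections import Counter

def count_words(reviews, labels):
    """Tally word counts per label, splitting on single spaces."""
    positive_counts, negative_counts = Counter(), Counter()
    for review, label in zip(reviews, labels):  # zip assumes the two lists are aligned
        target = positive_counts if label == "POSITIVE" else negative_counts
        target.update(review.split(" "))
    # Adding Counters sums counts key by key (keeping only positive totals,
    # which covers every key here)
    total_counts = positive_counts + negative_counts
    return positive_counts, negative_counts, total_counts

pos, neg, total = count_words(["a great film", "a terrible film"], ["POSITIVE", "NEGATIVE"])
print(total["film"], pos["great"], neg["great"])
```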
``````

In [9]:

positive_counts.most_common()

``````
``````

Out[9]:

[('', 550468),
('the', 173324),
('.', 159654),
('and', 89722),
('a', 83688),
('of', 76855),
('to', 66746),
('is', 57245),
('in', 50215),
('br', 49235),
('it', 48025),
('i', 40743),
('that', 35630),
('this', 35080),
('s', 33815),
('as', 26308),
('with', 23247),
('for', 22416),
('was', 21917),
('film', 20937),
('but', 20822),
('movie', 19074),
('his', 17227),
('on', 17008),
('you', 16681),
('he', 16282),
('are', 14807),
('not', 14272),
('t', 13720),
('one', 13655),
('have', 12587),
('be', 12416),
('by', 11997),
('all', 11942),
('who', 11464),
('an', 11294),
('at', 11234),
('from', 10767),
('her', 10474),
('they', 9895),
('has', 9186),
('so', 9154),
('like', 9038),
('very', 8305),
('out', 8134),
('there', 8057),
('she', 7779),
('what', 7737),
('or', 7732),
('good', 7720),
('more', 7521),
('when', 7456),
('some', 7441),
('if', 7285),
('just', 7152),
('can', 7001),
('story', 6780),
('time', 6515),
('my', 6488),
('great', 6419),
('well', 6405),
('up', 6321),
('which', 6267),
('their', 6107),
('see', 6026),
('also', 5550),
('we', 5531),
('really', 5476),
('would', 5400),
('will', 5218),
('me', 5167),
('only', 5137),
('him', 5018),
('even', 4964),
('most', 4864),
('other', 4858),
('were', 4782),
('first', 4755),
('than', 4736),
('much', 4685),
('its', 4622),
('no', 4574),
('into', 4544),
('people', 4479),
('best', 4319),
('love', 4301),
('get', 4272),
('how', 4213),
('life', 4199),
('been', 4189),
('because', 4079),
('way', 4036),
('do', 3941),
('films', 3813),
('them', 3805),
('after', 3800),
('many', 3766),
('two', 3733),
('too', 3659),
('think', 3655),
('movies', 3586),
('characters', 3560),
('character', 3514),
('don', 3468),
('man', 3460),
('show', 3432),
('watch', 3424),
('seen', 3414),
('then', 3358),
('little', 3341),
('still', 3340),
('make', 3303),
('could', 3237),
('never', 3226),
('being', 3217),
('where', 3173),
('does', 3069),
('over', 3017),
('any', 3002),
('while', 2899),
('know', 2833),
('did', 2790),
('years', 2758),
('here', 2740),
('ever', 2734),
('end', 2696),
('these', 2694),
('such', 2590),
('real', 2568),
('scene', 2567),
('back', 2547),
('those', 2485),
('though', 2475),
('off', 2463),
('new', 2458),
('your', 2453),
('go', 2440),
('acting', 2437),
('plot', 2432),
('world', 2429),
('scenes', 2427),
('say', 2414),
('through', 2409),
('makes', 2390),
('better', 2381),
('now', 2368),
('work', 2346),
('young', 2343),
('old', 2311),
('ve', 2307),
('find', 2272),
('both', 2248),
('before', 2177),
('us', 2162),
('again', 2158),
('series', 2153),
('quite', 2143),
('something', 2135),
('cast', 2133),
('should', 2121),
('part', 2098),
('always', 2088),
('lot', 2087),
('another', 2075),
('actors', 2047),
('director', 2040),
('family', 2032),
('between', 2016),
('own', 2016),
('m', 1998),
('may', 1997),
('same', 1972),
('role', 1967),
('watching', 1966),
('every', 1954),
('funny', 1953),
('doesn', 1935),
('performance', 1928),
('few', 1918),
('look', 1900),
('re', 1884),
('why', 1855),
('things', 1849),
('times', 1832),
('big', 1815),
('however', 1795),
('actually', 1790),
('action', 1789),
('going', 1783),
('bit', 1757),
('comedy', 1742),
('down', 1740),
('music', 1738),
('must', 1728),
('take', 1709),
('saw', 1692),
('long', 1690),
('right', 1688),
('fun', 1686),
('fact', 1684),
('excellent', 1683),
('around', 1674),
('didn', 1672),
('without', 1671),
('thing', 1662),
('thought', 1639),
('got', 1635),
('each', 1630),
('day', 1614),
('feel', 1597),
('seems', 1596),
('come', 1594),
('done', 1586),
('beautiful', 1580),
('especially', 1572),
('played', 1571),
('almost', 1566),
('want', 1562),
('yet', 1556),
('give', 1553),
('pretty', 1549),
('last', 1543),
('since', 1519),
('different', 1504),
('although', 1501),
('gets', 1490),
('true', 1487),
('interesting', 1481),
('job', 1470),
('enough', 1455),
('our', 1454),
('shows', 1447),
('horror', 1441),
('woman', 1439),
('tv', 1400),
('probably', 1398),
('father', 1395),
('original', 1393),
('girl', 1390),
('point', 1379),
('plays', 1378),
('wonderful', 1372),
('far', 1358),
('course', 1358),
('john', 1350),
('rather', 1340),
('isn', 1328),
('ll', 1326),
('later', 1324),
('dvd', 1324),
('whole', 1310),
('war', 1310),
('d', 1307),
('found', 1306),
('away', 1306),
('screen', 1305),
('nothing', 1300),
('year', 1297),
('once', 1296),
('hard', 1294),
('together', 1280),
('set', 1277),
('am', 1277),
('having', 1266),
('making', 1265),
('place', 1263),
('might', 1260),
('comes', 1260),
('sure', 1253),
('american', 1248),
('play', 1245),
('kind', 1244),
('perfect', 1242),
('takes', 1242),
('performances', 1237),
('himself', 1230),
('worth', 1221),
('everyone', 1221),
('anyone', 1214),
('actor', 1203),
('three', 1201),
('wife', 1196),
('classic', 1192),
('goes', 1186),
('ending', 1178),
('version', 1168),
('star', 1149),
('enjoy', 1146),
('book', 1142),
('nice', 1132),
('everything', 1128),
('during', 1124),
('put', 1118),
('seeing', 1111),
('least', 1102),
('house', 1100),
('high', 1095),
('watched', 1094),
('loved', 1087),
('men', 1087),
('night', 1082),
('anything', 1075),
('believe', 1071),
('guy', 1071),
('top', 1063),
('amazing', 1058),
('hollywood', 1056),
('looking', 1053),
('main', 1044),
('definitely', 1043),
('gives', 1031),
('home', 1029),
('seem', 1028),
('episode', 1023),
('audience', 1020),
('sense', 1020),
('truly', 1017),
('special', 1011),
('second', 1009),
('short', 1009),
('fan', 1009),
('mind', 1005),
('human', 1001),
('recommend', 999),
('full', 996),
('black', 995),
('help', 991),
('along', 989),
('trying', 987),
('small', 986),
('death', 985),
('friends', 981),
('remember', 974),
('often', 970),
('said', 966),
('favorite', 962),
('heart', 959),
('early', 957),
('left', 956),
('until', 955),
('script', 954),
('let', 954),
('maybe', 937),
('today', 936),
('live', 934),
('less', 934),
('moments', 933),
('others', 929),
('brilliant', 926),
('shot', 925),
('liked', 923),
('become', 916),
('won', 915),
('used', 910),
('style', 907),
('mother', 895),
('lives', 894),
('came', 893),
('stars', 890),
('cinema', 889),
('looks', 885),
('perhaps', 884),
('enjoyed', 879),
('boy', 875),
('drama', 873),
('highly', 871),
('given', 870),
('playing', 867),
('use', 864),
('next', 859),
('women', 858),
('fine', 857),
('effects', 856),
('kids', 854),
('entertaining', 853),
('need', 852),
('line', 850),
('works', 848),
('someone', 847),
('mr', 836),
('simply', 835),
('picture', 833),
('children', 833),
('face', 831),
('keep', 831),
('friend', 831),
('dark', 830),
('overall', 828),
('certainly', 828),
('minutes', 827),
('wasn', 824),
('history', 822),
('finally', 820),
('couple', 816),
('against', 815),
('son', 809),
('understand', 808),
('lost', 807),
('michael', 805),
('else', 801),
('throughout', 798),
('fans', 797),
('city', 792),
('reason', 789),
('written', 787),
('production', 787),
('several', 784),
('school', 783),
('based', 781),
('rest', 781),
('try', 780),
('hope', 775),
('strong', 768),
('white', 765),
('tell', 759),
('itself', 758),
('half', 753),
('person', 749),
('sometimes', 746),
('past', 744),
('start', 744),
('genre', 743),
('beginning', 739),
('final', 739),
('town', 738),
('art', 734),
('humor', 732),
('game', 732),
('yes', 731),
('idea', 731),
('late', 730),
('becomes', 729),
('despite', 729),
('able', 726),
('case', 726),
('money', 723),
('child', 721),
('completely', 721),
('side', 719),
('camera', 716),
('getting', 714),
('soon', 702),
('under', 700),
('viewer', 699),
('age', 697),
('days', 696),
('stories', 696),
('felt', 694),
('simple', 694),
('roles', 693),
('video', 688),
('name', 683),
('either', 683),
('doing', 677),
('turns', 674),
('wants', 671),
('close', 671),
('title', 669),
('wrong', 668),
('went', 666),
('james', 665),
('evil', 659),
('budget', 657),
('episodes', 657),
('relationship', 655),
('fantastic', 653),
('piece', 653),
('david', 651),
('turn', 648),
('murder', 646),
('parts', 645),
('brother', 644),
('absolutely', 643),
('experience', 642),
('eyes', 641),
('sex', 638),
('direction', 637),
('called', 637),
('directed', 636),
('lines', 634),
('behind', 633),
('sort', 632),
('actress', 631),
('oscar', 628),
('including', 627),
('example', 627),
('known', 625),
('musical', 625),
('chance', 621),
('score', 620),
('feeling', 619),
('hit', 619),
('voice', 615),
('moment', 612),
('living', 612),
('low', 610),
('supporting', 610),
('ago', 609),
('themselves', 608),
('reality', 605),
('hilarious', 605),
('jack', 604),
('told', 603),
('hand', 601),
('quality', 600),
('moving', 600),
('dialogue', 600),
('song', 599),
('happy', 599),
('matter', 598),
('paul', 598),
('light', 594),
('future', 593),
('entire', 592),
('finds', 591),
('gave', 589),
('laugh', 587),
('released', 586),
('expect', 584),
('fight', 581),
('particularly', 580),
('cinematography', 579),
('police', 579),
('whose', 578),
('type', 578),
('sound', 578),
('view', 573),
('enjoyable', 573),
('number', 572),
('romantic', 572),
('husband', 572),
('daughter', 572),
('documentary', 571),
('self', 570),
('superb', 569),
('modern', 569),
('took', 569),
('robert', 569),
('mean', 566),
('shown', 563),
('coming', 561),
('important', 560),
('king', 559),
('leave', 559),
('change', 558),
('somewhat', 555),
('wanted', 555),
('tells', 554),
('events', 552),
('run', 552),
('career', 552),
('country', 552),
('heard', 550),
('season', 550),
('greatest', 549),
('girls', 549),
('etc', 547),
('care', 546),
('starts', 545),
('english', 542),
('killer', 541),
('tale', 540),
('guys', 540),
('totally', 540),
('animation', 540),
('usual', 539),
('miss', 535),
('opinion', 535),
('easy', 531),
('violence', 531),
('songs', 530),
('british', 528),
('says', 526),
('realistic', 525),
('writing', 524),
('writer', 522),
('act', 522),
('comic', 521),
('thriller', 519),
('television', 517),
('power', 516),
('ones', 515),
('kid', 514),
('york', 513),
('novel', 513),
('alone', 512),
('problem', 512),
('attention', 509),
('involved', 508),
('kill', 507),
('extremely', 507),
('seemed', 506),
('hero', 505),
('french', 505),
('rock', 504),
('stuff', 501),
('wish', 499),
('begins', 498),
('taken', 497),
('ways', 496),
('richard', 495),
('knows', 494),
('atmosphere', 493),
('similar', 491),
('surprised', 491),
('taking', 491),
('car', 491),
('george', 490),
('perfectly', 490),
('across', 489),
('team', 489),
('eye', 489),
('sequence', 489),
('room', 488),
('due', 488),
('among', 488),
('serious', 488),
('powerful', 488),
('strange', 487),
('order', 487),
('cannot', 487),
('b', 487),
('beauty', 486),
('famous', 485),
('happened', 484),
('tries', 484),
('herself', 484),
('myself', 484),
('class', 483),
('four', 482),
('cool', 481),
('release', 479),
('anyway', 479),
('theme', 479),
('opening', 478),
('entertainment', 477),
('slow', 475),
('ends', 475),
('unique', 475),
('exactly', 475),
('easily', 474),
('level', 474),
('o', 474),
('red', 474),
('interest', 472),
('happen', 471),
('crime', 470),
('viewing', 468),
('sets', 467),
('memorable', 467),
('stop', 466),
('group', 466),
('problems', 463),
('dance', 463),
('working', 463),
('sister', 463),
('message', 463),
('knew', 462),
('mystery', 461),
('nature', 461),
('bring', 460),
('believable', 459),
('thinking', 459),
('brought', 459),
('mostly', 458),
('disney', 457),
('couldn', 457),
('society', 456),
('within', 455),
('blood', 454),
('parents', 453),
('upon', 453),
('viewers', 453),
('meets', 452),
('form', 452),
('peter', 452),
('tom', 452),
('usually', 452),
('soundtrack', 452),
('local', 450),
('certain', 448),
('follow', 448),
('whether', 447),
('possible', 446),
('emotional', 445),
('killed', 444),
('above', 444),
('de', 444),
('god', 443),
('middle', 443),
('needs', 442),
('happens', 442),
('flick', 442),
('masterpiece', 441),
('period', 440),
('major', 440),
('named', 439),
('haven', 439),
('particular', 438),
('th', 438),
('earth', 437),
('feature', 437),
('stand', 436),
('words', 435),
('typical', 435),
('elements', 433),
('obviously', 433),
('romance', 431),
('jane', 430),
('yourself', 427),
('showing', 427),
('brings', 426),
('fantasy', 426),
('guess', 423),
('america', 423),
('unfortunately', 422),
('huge', 422),
('indeed', 421),
('running', 421),
('talent', 420),
('stage', 419),
('started', 418),
('sweet', 417),
('japanese', 417),
('poor', 416),
('deal', 416),
('incredible', 413),
('personal', 413),
('fast', 412),
('became', 410),
('deep', 410),
('hours', 409),
('giving', 408),
('nearly', 408),
('dream', 408),
('clearly', 407),
('turned', 407),
('obvious', 406),
('near', 406),
('cut', 405),
('surprise', 405),
('era', 404),
('body', 404),
('hour', 403),
('female', 403),
('five', 403),
('note', 399),
('learn', 398),
('truth', 398),
('except', 397),
('feels', 397),
('match', 397),
('tony', 397),
('filmed', 394),
('clear', 394),
('complete', 394),
('street', 393),
('eventually', 393),
('keeps', 393),
('older', 393),
('lots', 393),
('william', 391),
('stewart', 391),
('fall', 390),
('joe', 390),
('meet', 390),
('unlike', 389),
('talking', 389),
('shots', 389),
('rating', 389),
('difficult', 389),
('dramatic', 388),
('means', 388),
('situation', 386),
('wonder', 386),
('present', 386),
('appears', 386),
('subject', 386),
('general', 383),
('sequences', 383),
('lee', 383),
('points', 382),
('earlier', 382),
('gone', 379),
('check', 379),
('suspense', 378),
('recommended', 378),
('ten', 378),
('third', 377),
('talk', 375),
('leaves', 375),
('beyond', 375),
('portrayal', 374),
('beautifully', 373),
('single', 372),
('bill', 372),
('plenty', 371),
('word', 371),
('whom', 370),
('falls', 370),
('scary', 369),
('non', 369),
('figure', 369),
('battle', 369),
('using', 368),
('return', 368),
('doubt', 367),
('hear', 366),
('solid', 366),
('success', 366),
('jokes', 365),
('oh', 365),
('touching', 365),
('political', 365),
('hell', 364),
('awesome', 364),
('boys', 364),
('sexual', 362),
('recently', 362),
('dog', 362),
('wouldn', 361),
('straight', 361),
('features', 361),
('forget', 360),
('setting', 360),
('lack', 360),
('married', 359),
('mark', 359),
('social', 357),
('interested', 356),
('actual', 355),
('terrific', 355),
('sees', 355),
('brothers', 355),
('move', 354),
('call', 354),
('various', 353),
('theater', 353),
('dr', 353),
('animated', 352),
('western', 351),
('baby', 350),
('space', 350),
('disappointed', 348),
('portrayed', 346),
('aren', 346),
('screenplay', 345),
('smith', 345),
('towards', 344),
('hate', 344),
('noir', 343),
('outstanding', 342),
('decent', 342),
('kelly', 342),
('directors', 341),
('journey', 341),
('none', 340),
('looked', 340),
('effective', 340),
('storyline', 339),
('caught', 339),
('sci', 339),
('fi', 339),
('cold', 339),
('mary', 339),
('rich', 338),
('charming', 338),
('popular', 337),
('rare', 337),
('manages', 337),
('harry', 337),
('spirit', 336),
('appreciate', 335),
('open', 335),
('moves', 334),
('basically', 334),
('acted', 334),
('inside', 333),
('boring', 333),
('century', 333),
('mention', 333),
('deserves', 333),
('subtle', 333),
('pace', 333),
('familiar', 332),
('background', 332),
('ben', 331),
('creepy', 330),
('supposed', 330),
('secret', 329),
('die', 328),
('jim', 328),
('question', 327),
('effect', 327),
('natural', 327),
('impressive', 326),
('rate', 326),
('language', 326),
('saying', 325),
('intelligent', 325),
('telling', 324),
('realize', 324),
('material', 324),
('scott', 324),
('singing', 323),
('dancing', 322),
('visual', 321),
('imagine', 321),
('kept', 320),
('office', 320),
('uses', 319),
('pure', 318),
('wait', 318),
('stunning', 318),
('review', 317),
('previous', 317),
('copy', 317),
('seriously', 317),
('create', 316),
('hot', 316),
('created', 316),
('magic', 316),
('somehow', 316),
('stay', 315),
('attempt', 315),
('escape', 315),
('crazy', 315),
('air', 315),
('frank', 315),
('hands', 314),
('filled', 313),
('expected', 312),
('average', 312),
('surprisingly', 312),
('complex', 311),
('quickly', 310),
('successful', 310),
('studio', 310),
('plus', 309),
('male', 309),
('co', 307),
('images', 306),
('casting', 306),
('following', 306),
('minute', 306),
('exciting', 306),
('members', 305),
('follows', 305),
('themes', 305),
('german', 305),
('reasons', 305),
('e', 305),
('touch', 304),
('edge', 304),
('free', 304),
('cute', 304),
('genius', 304),
('outside', 303),
('reviews', 302),
('ok', 302),
('younger', 302),
('fighting', 301),
('odd', 301),
('master', 301),
('recent', 300),
('thanks', 300),
('break', 300),
('comment', 300),
('apart', 299),
('emotions', 298),
('lovely', 298),
('begin', 298),
('doctor', 297),
('party', 297),
('italian', 297),
('la', 296),
('missed', 296),
...]

``````
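The top entry, `''`, is not a word. `split(" ")` emits an empty string wherever two spaces sit side by side, and the preprocessed reviews contain many such runs (visible in `reviews[0]`). The sketch below reproduces this on a fragment of that first review. Switching to `split()` with no argument would drop the empty tokens, but it would also change every count shown here, so treat it as an option to weigh rather than what this notebook does.

```python
from collections import Counter

# A fragment of reviews[0]; note the doubled spaces left by preprocessing
text = "bromwell high  s satire is much closer to reality than is  teachers  ."

print(text.split(" ")[:5])           # ['bromwell', 'high', '', 's', 'satire']
print(Counter(text.split(" "))[""])  # 3: one empty string per doubled space
print(Counter(text.split())[""])     # 0: split() with no argument collapses whitespace runs
```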
``````

In [12]:

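pos_neg_ratios = Counter()

# Positive-to-negative ratio for every word seen more than 100 times
# (the +1 in the denominator avoids dividing by zero)
for term,cnt in list(total_counts.most_common()):
    if(cnt > 100):
        pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)
        pos_neg_ratios[term] = pos_neg_ratio

# Move the ratios onto a log scale: positive-leaning words end up above 0, negative-leaning words below 0
for word,ratio in pos_neg_ratios.most_common():
    if(ratio > 1):
        pos_neg_ratios[word] = np.log(ratio)
    else:
        pos_neg_ratios[word] = -np.log((1 / (ratio+0.01)))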

``````
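To see what the transform does to a single word, here is a worked version of the same arithmetic. This is a sketch that mirrors the cell above; `log_pos_neg_ratio` is a hypothetical helper name, and the word counts fed to it are made up for illustration. Note that `-log(1 / (ratio + 0.01))` simplifies to `log(ratio + 0.01)`; the `0.01` keeps the logarithm finite for words that never appear in a positive review.

```python
import numpy as np

def log_pos_neg_ratio(pos_count, neg_count):
    """Apply the notebook's ratio-then-log transform to one word's counts."""
    ratio = pos_count / float(neg_count + 1)  # +1 avoids dividing by zero
    if ratio > 1:
        return np.log(ratio)
    # Equivalent to np.log(ratio + 0.01); the 0.01 avoids log(0)
    return -np.log(1 / (ratio + 0.01))

# Made-up counts: a strongly positive word, an evenly split word, a strongly negative word
for pos, neg in [(99, 0), (50, 49), (0, 99)]:
    print(pos, neg, round(float(log_pos_neg_ratio(pos, neg)), 3))
```

The log scale makes the scores symmetric: a word 99 times more common in positive reviews lands near +4.6, one that only ever appears in negative reviews lands near -4.6, and an evenly split word sits close to 0.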
``````

In [13]:

# words with the highest positive-to-negative log ratio, i.e. most strongly associated with a "POSITIVE" label
pos_neg_ratios.most_common()

``````
``````

Out[13]:

[('edie', 4.6913478822291435),
('paulie', 4.0775374439057197),
('felix', 3.1527360223636558),
('polanski', 2.8233610476132043),
('matthau', 2.8067217286092401),
('victoria', 2.6810215287142909),
('mildred', 2.6026896854443837),
('gandhi', 2.5389738710582761),
('flawless', 2.451005098112319),
('superbly', 2.2600254785752498),
('perfection', 2.1594842493533721),
('astaire', 2.1400661634962708),
('captures', 2.0386195471595809),
('voight', 2.0301704926730531),
('wonderfully', 2.0218960560332353),
('powell', 1.9783454248084671),
('brosnan', 1.9547990964725592),
('lily', 1.9203768470501485),
('bakshi', 1.9029851043382795),
('lincoln', 1.9014583864844796),
('refreshing', 1.8551812956655511),
('breathtaking', 1.8481124057791867),
('bourne', 1.8478489358790986),
('lemmon', 1.8458266904983307),
('delightful', 1.8002701588959635),
('flynn', 1.7996646487351682),
('andrews', 1.7764919970972666),
('homer', 1.7692866133759964),
('beautifully', 1.7626953362841438),
('soccer', 1.7578579175523736),
('elvira', 1.7397031072720019),
('underrated', 1.7197859696029656),
('gripping', 1.7165360479904674),
('superb', 1.7091514458966952),
('delight', 1.6714733033535532),
('welles', 1.6677068205580761),
('sinatra', 1.6389967146756448),
('touching', 1.637217476541176),
('timeless', 1.62924053973028),
('macy', 1.6211339521972916),
('unforgettable', 1.6177367152487956),
('favorites', 1.6158688027643908),
('stewart', 1.6119987332957739),
('sullivan', 1.6094379124341003),
('extraordinary', 1.6094379124341003),
('hartley', 1.6094379124341003),
('brilliantly', 1.5950491749820008),
('friendship', 1.5677652160335325),
('wonderful', 1.5645425925262093),
('palma', 1.5553706911638245),
('magnificent', 1.54663701119507),
('finest', 1.5462590108125689),
('jackie', 1.5439233053234738),
('ritter', 1.5404450409471491),
('tremendous', 1.5184661342283736),
('freedom', 1.5091151908062312),
('fantastic', 1.5048433868558566),
('terrific', 1.5026699370083942),
('noir', 1.493925025312256),
('sidney', 1.493925025312256),
('outstanding', 1.4910053152089213),
('pleasantly', 1.4894785973551214),
('mann', 1.4894785973551214),
('nancy', 1.488077055429833),
('marie', 1.4825711915553104),
('marvelous', 1.4739999415389962),
('excellent', 1.4647538505723599),
('ruth', 1.4596256342054401),
('stanwyck', 1.4412101187160054),
('widmark', 1.4350845252893227),
('splendid', 1.4271163556401458),
('chan', 1.423108334242607),
('exceptional', 1.4201959127955721),
('tender', 1.410986973710262),
('gentle', 1.4078005663408544),
('poignant', 1.4022947024663317),
('gem', 1.3932148039644643),
('amazing', 1.3919815802404802),
('chilling', 1.3862943611198906),
('fisher', 1.3862943611198906),
('davies', 1.3862943611198906),
('captivating', 1.3862943611198906),
('darker', 1.3652409519220583),
('april', 1.3499267169490159),
('kelly', 1.3461743673304654),
('blake', 1.3418425985490567),
('overlooked', 1.329135947279942),
('ralph', 1.32818673031261),
('bette', 1.3156767939059373),
('hoffman', 1.3150668518315229),
('cole', 1.3121863889661687),
('shines', 1.3049487216659381),
('powerful', 1.2999662776313934),
('notch', 1.2950456896547455),
('remarkable', 1.2883688239495823),
('pitt', 1.286210902562908),
('winters', 1.2833463918674481),
('vivid', 1.2762934659055623),
('gritty', 1.2757524867200667),
('giallo', 1.2745029551317739),
('portrait', 1.2704625455947689),
('innocence', 1.2694300209805796),
('psychiatrist', 1.2685113254635072),
('favorite', 1.2668956297860055),
('ensemble', 1.2656663733312759),
('stunning', 1.2622417124499117),
('burns', 1.259880436264232),
('garbo', 1.258954938743289),
('barbara', 1.2580400255962119),
('philip', 1.2527629684953681),
('panic', 1.2527629684953681),
('holly', 1.2527629684953681),
('carol', 1.2481440226390734),
('perfect', 1.246742480713785),
('appreciated', 1.2462482874741743),
('favourite', 1.2411123512753928),
('journey', 1.2367626271489269),
('rural', 1.235471471385307),
('bond', 1.2321436812926323),
('builds', 1.2305398317106577),
('brilliant', 1.2287554137664785),
('brooklyn', 1.2286654169163074),
('von', 1.225175011976539),
('recommended', 1.2163953243244932),
('unfolds', 1.2163953243244932),
('daniel', 1.20215296760895),
('perfectly', 1.1971931173405572),
('crafted', 1.1962507582320256),
('prince', 1.1939224684724346),
('troubled', 1.192138346678933),
('consequences', 1.1865810616140668),
('haunting', 1.1814999484738773),
('cinderella', 1.180052620608284),
('alexander', 1.1759989522835299),
('emotions', 1.1753049094563641),
('boxing', 1.1735135968412274),
('subtle', 1.1734135017508081),
('curtis', 1.1649873576129823),
('rare', 1.1566438362402944),
('loved', 1.1563661500586044),
('daughters', 1.1526795099383853),
('courage', 1.1438688802562305),
('dentist', 1.1426722784621401),
('highly', 1.1420208631618658),
('nominated', 1.1409146683587992),
('tony', 1.1397491942285991),
('draws', 1.1325138403437911),
('everyday', 1.1306150197542835),
('contrast', 1.1284652518177909),
('cried', 1.1213405397456659),
('fabulous', 1.1210851445201684),
('ned', 1.120591195386885),
('fay', 1.120591195386885),
('emma', 1.1184149159642893),
('sensitive', 1.113318436057805),
('smooth', 1.1089750757036563),
('dramas', 1.1080910326226534),
('today', 1.1050431789984001),
('helps', 1.1023091505494358),
('inspiring', 1.0986122886681098),
('jimmy', 1.0937696641923216),
('awesome', 1.0931328229034842),
('unique', 1.0881409888008142),
('tragic', 1.0871835928444868),
('intense', 1.0870514662670339),
('stellar', 1.0857088838322018),
('rival', 1.0822184788924332),
('provides', 1.0797081340289569),
('depression', 1.0782034170369026),
('shy', 1.0775588794702773),
('carrie', 1.076139432816051),
('blend', 1.0753554265038423),
('hank', 1.0736109864626924),
('diana', 1.0726368022648489),
('unexpected', 1.0722255334949147),
('achievement', 1.0668635903535293),
('bettie', 1.0663514264498881),
('happiness', 1.0632729222228008),
('glorious', 1.0608719606852626),
('davis', 1.0541605260972757),
('terrifying', 1.0525211814678428),
('beauty', 1.050410186850232),
('ideal', 1.0479685558493548),
('fears', 1.0467872208035236),
('hong', 1.0438040521731147),
('seasons', 1.0433496099930604),
('fascinating', 1.0414538748281612),
('carries', 1.0345904299031787),
('satisfying', 1.0321225473992768),
('definite', 1.0319209141694374),
('touched', 1.0296194171811581),
('greatest', 1.0248947127715422),
('creates', 1.0241097613701886),
('aunt', 1.023388867430522),
('walter', 1.022328983918479),
('spectacular', 1.0198314108149955),
('portrayal', 1.0189810189761024),
('ann', 1.0127808528183286),
('enterprise', 1.0116009116784799),
('musicals', 1.0096648026516135),
('deeply', 1.0094845087721023),
('incredible', 1.0061677561461084),
('mature', 1.0060195018402847),
('triumph', 0.99682959435816731),
('margaret', 0.99682959435816731),
('navy', 0.99493385919326827),
('harry', 0.99176919305006062),
('lucas', 0.990398704027877),
('sweet', 0.98966110487955483),
('joey', 0.98794672078059009),
('oscar', 0.98721905111049713),
('balance', 0.98649499054740353),
('warm', 0.98485340331145166),
('ages', 0.98449898190068863),
('guilt', 0.98082925301172619),
('glover', 0.98082925301172619),
('carrey', 0.98082925301172619),
('learns', 0.97881108885548895),
('unusual', 0.97788374278196932),
('sons', 0.97777581552483595),
('complex', 0.97761897738147796),
('essence', 0.97753435711487369),
('brazil', 0.9769153536905899),
('widow', 0.97650959186720987),
('solid', 0.97537964824416146),
('beautiful', 0.97326301262841053),
('holmes', 0.97246100334120955),
('awe', 0.97186058302896583),
('vhs', 0.97116734209998934),
('eerie', 0.97116734209998934),
('lonely', 0.96873720724669754),
('grim', 0.96873720724669754),
('sport', 0.96825047080486615),
('debut', 0.96508089604358704),
('destiny', 0.96343751029985703),
('thrillers', 0.96281074750904794),
('tears', 0.95977584381389391),
('rose', 0.95664202739772253),
('feelings', 0.95551144502743635),
('ginger', 0.95551144502743635),
('winning', 0.95471810900804055),
('stanley', 0.95387344302319799),
('cox', 0.95343027882361187),
('paris', 0.95278479030472663),
('heart', 0.95238806924516806),
('hooked', 0.95155887071161305),
('comfortable', 0.94803943018873538),
('mgm', 0.94446160884085151),
('masterpiece', 0.94155039863339296),
('themes', 0.94118828349588235),
('danny', 0.93967118051821874),
('anime', 0.93378388932167222),
('perry', 0.93328830824272613),
('joy', 0.93301752567946861),
('lovable', 0.93081883243706487),
('mysteries', 0.92953595862417571),
('hal', 0.92953595862417571),
('louis', 0.92871325187271225),
('charming', 0.92520609553210742),
('urban', 0.92367083917177761),
('allows', 0.92183091224977043),
('impact', 0.91815814604895041),
('italy', 0.91629073187415511),
('lifestyle', 0.91629073187415511),
('spy', 0.91289514287301687),
('treat', 0.91193342650519937),
('subsequent', 0.91056005716517008),
('kennedy', 0.90981821736853763),
('loving', 0.90967549275543591),
('surprising', 0.90937028902958128),
('quiet', 0.90648673177753425),
('winter', 0.90624039602065365),
('reveals', 0.90490540964902977),
('raw', 0.90445627422715225),
('funniest', 0.90078654533818991),
('norman', 0.89994159387262562),
('thief', 0.89874642222324552),
('season', 0.89827222637147675),
('secrets', 0.89794159320595857),
('colorful', 0.89705936994626756),
('highest', 0.8967461358011849),
('compelling', 0.89462923509297576),
('danes', 0.89248008318043659),
('castle', 0.88967708335606499),
('kudos', 0.88889175768604067),
('great', 0.88810470901464589),
('baseball', 0.88730319500090271),
('subtitles', 0.88730319500090271),
('bleak', 0.88730319500090271),
('winner', 0.88643776872447388),
('tragedy', 0.88563699078315261),
('todd', 0.88551907320740142),
('nicely', 0.87924946019380601),
('arthur', 0.87546873735389985),
('essential', 0.87373111745535925),
('gorgeous', 0.8731725250935497),
('fonda', 0.87294029100054127),
('eastwood', 0.87139541196626402),
('focuses', 0.87082835779739776),
('enjoyed', 0.87070195951624607),
('natural', 0.86997924506912838),
('intensity', 0.86835126958503595),
('witty', 0.86824103423244681),
('rob', 0.8642954367557748),
('worlds', 0.86377269759070874),
('health', 0.86113891179907498),
('magical', 0.85953791528170564),
('deeper', 0.85802182375017932),
('lucy', 0.85618680780444956),
('moving', 0.85566611005772031),
('lovely', 0.85290640004681306),
('purple', 0.8513711857748395),
('memorable', 0.84801189112086062),
('sings', 0.84729786038720367),
('craig', 0.84342938360928321),
('modesty', 0.84342938360928321),
('relate', 0.84326559685926517),
('episodes', 0.84223712084137292),
('strong', 0.84167135777060931),
('smith', 0.83959811108590054),
('tear', 0.83704136022001441),
('apartment', 0.83333115290549531),
('princess', 0.83290912293510388),
('disagree', 0.83290912293510388),
('kung', 0.83173334384609199),
('columbo', 0.82667857318446791),
('jake', 0.82667857318446791),
('hart', 0.82472353834866463),
('strength', 0.82417544296634937),
('realizes', 0.82360006895738058),
('dave', 0.8232003088081431),
('childhood', 0.82208086393583857),
('forbidden', 0.81989888619908913),
('tight', 0.81883539572344199),
('surreal', 0.8178506590609026),
('manager', 0.81770990320170756),
('dancer', 0.81574950265227764),
('studios', 0.81093021621632877),
('con', 0.81093021621632877),
('miike', 0.80821651034473263),
('realistic', 0.80807714723392232),
('explicit', 0.80792269515237358),
('kurt', 0.8060875917405409),
('deals', 0.80535917116687328),
('holds', 0.80493858654806194),
('carl', 0.80437281567016972),
('touches', 0.80396154690023547),
('gene', 0.80314807577427383),
('albert', 0.8027669055771679),
('abc', 0.80234647252493729),
('cry', 0.80011930011211307),
('sides', 0.7995275841185171),
('develops', 0.79850769621777162),
('eyre', 0.79850769621777162),
('dances', 0.79694397424158891),
('oscars', 0.79633141679517616),
('legendary', 0.79600456599965308),
('hearted', 0.79492987486988764),
('importance', 0.79492987486988764),
('portraying', 0.79356592830699269),
('impressed', 0.79258107754813223),
('waters', 0.79112758892014912),
('empire', 0.79078565012386137),
('edge', 0.789774016249017),
('jean', 0.78845736036427028),
('environment', 0.78845736036427028),
('sentimental', 0.7864791203521645),
('captured', 0.78623760362595729),
('styles', 0.78592891401091158),
('daring', 0.78592891401091158),
('frank', 0.78275933924963248),
('tense', 0.78275933924963248),
('backgrounds', 0.78275933924963248),
('matches', 0.78275933924963248),
('gothic', 0.78209466657644144),
('sharp', 0.7814397877056235),
('achieved', 0.78015855754957497),
('court', 0.77947526404844247),
('steals', 0.7789140023173704),
('rules', 0.77844476107184035),
('colors', 0.77684619943659217),
('reunion', 0.77318988823348167),
('covers', 0.77139937745969345),
('tale', 0.77010822169607374),
('rain', 0.7683706017975328),
('denzel', 0.76804848873306297),
('stays', 0.76787072675588186),
('blob', 0.76725515271366718),
('maria', 0.76214005204689672),
('conventional', 0.76214005204689672),
('fresh', 0.76158434211317383),
('midnight', 0.76096977689870637),
('landscape', 0.75852993982279704),
('animated', 0.75768570169751648),
('titanic', 0.75666058628227129),
('sunday', 0.75666058628227129),
('spring', 0.7537718023763802),
('cagney', 0.7537718023763802),
('enjoyable', 0.75246375771636476),
('immensely', 0.75198768058287868),
('sir', 0.7507762933965817),
('nevertheless', 0.75067102469813185),
('driven', 0.74994477895307854),
('performances', 0.74883252516063137),
('memories', 0.74721440183022114),
('simple', 0.74641420974143258),
('golden', 0.74533293373051557),
('leslie', 0.74533293373051557),
('lovers', 0.74497224842453125),
('relationship', 0.74484232345601786),
('supporting', 0.74357803418683721),
('che', 0.74262723782331497),
('packed', 0.7410032017375805),
('trek', 0.74021469141793106),
('provoking', 0.73840377214806618),
('strikes', 0.73759894313077912),
('depiction', 0.73682224406260699),
('emotional', 0.73678211645681524),
('secretary', 0.7366322924996842),
('influenced', 0.73511137965897755),
('florida', 0.73511137965897755),
('germany', 0.73288750920945944),
('brings', 0.73142936713096229),
('lewis', 0.73129894652432159),
('elderly', 0.73088750854279239),
('owner', 0.72743625403857748),
('streets', 0.72666987259858895),
('henry', 0.72642196944481741),
('portrays', 0.72593700338293632),
('bears', 0.7252354951114458),
('china', 0.72489587887452556),
('anger', 0.72439972406404984),
('society', 0.72433010799663333),
('available', 0.72415741730250549),
('best', 0.72347034060446314),
('bugs', 0.72270598280148979),
('magic', 0.71878961117328299),
('delivers', 0.71846498854423513),
('verhoeven', 0.71846498854423513),
('jim', 0.71783979315031676),
('donald', 0.71667767797013937),
('endearing', 0.71465338578090898),
('relationships', 0.71393795022901896),
('greatly', 0.71256526641704687),
('charlie', 0.71024161391924534),
('simon', 0.70967648251115578),
('effectively', 0.70914752190638641),
('march', 0.70774597998109789),
('atmosphere', 0.70744773070214162),
('influence', 0.70733181555190172),
('genius', 0.706392407309966),
('emotionally', 0.70556970055850243),
('ken', 0.70526854109229009),
('identity', 0.70484322032313651),
('sophisticated', 0.70470800296102132),
('dan', 0.70457587638356811),
('andrew', 0.70329955202396321),
('india', 0.70144598337464037),
('roy', 0.69970458110610434),
('surprisingly', 0.6995780708902356),
('sky', 0.69780919366575667),
('romantic', 0.69664981111114743),
('match', 0.69566924999265523),
('meets', 0.69314718055994529),
('cowboy', 0.69314718055994529),
('wave', 0.69314718055994529),
('bitter', 0.69314718055994529),
('patient', 0.69314718055994529),
('stylish', 0.69314718055994529),
('britain', 0.69314718055994529),
('affected', 0.69314718055994529),
('beatty', 0.69314718055994529),
('love', 0.69198533541937324),
('paul', 0.68980827929443067),
('andy', 0.68846333124751902),
('performance', 0.68797386327972465),
('patrick', 0.68645819240914863),
('unlike', 0.68546468438792907),
('brooks', 0.68433655087779044),
('refuses', 0.68348526964820844),
('award', 0.6824518914431974),
('complaint', 0.6824518914431974),
('ride', 0.68229716453587952),
('dawson', 0.68171848473632257),
('luke', 0.68158635815886937),
('wells', 0.68087708796813096),
('france', 0.6804081547825156),
('sports', 0.68007509899259255),
('handsome', 0.68007509899259255),
('directs', 0.67875844310784572),
('rebel', 0.67875844310784572),
('greater', 0.67605274720064523),
('dreams', 0.67599410133369586),
('effective', 0.67565402311242806),
('interpretation', 0.67479804189174875),
('works', 0.67445504754779284),
('brando', 0.67445504754779284),
('noble', 0.6737290947028437),
('paced', 0.67314651385327573),
('le', 0.67067432470788668),
('master', 0.67015766233524654),
('h', 0.6696166831497512),
('rings', 0.66904962898088483),
('easy', 0.66895995494594152),
('city', 0.66820823221269321),
('sunshine', 0.66782937257565544),
('succeeds', 0.66647893347778397),
('relations', 0.664159643686693),
('england', 0.66387679825983203),
('glimpse', 0.66329421741026418),
('aired', 0.66268797307523675),
('sees', 0.66263163663399482),
('both', 0.66248336767382998),
('definitely', 0.66199789483898808),
('imaginative', 0.66139848224536502),
('appreciate', 0.66083893732728749),
('tricks', 0.66071190480679143),
('striking', 0.66071190480679143),
('carefully', 0.65999497324304479),
('complicated', 0.65981076029235353),
('perspective', 0.65962448852130173),
('trilogy', 0.65877953705573755),
('future', 0.65834665141052828),
('lion', 0.65742909795786608),
('douglas', 0.65540685257709819),
('victor', 0.65540685257709819),
('inspired', 0.65459851044271034),
('marriage', 0.65392646740666405),
('demands', 0.65392646740666405),
('father', 0.65172321672194655),
('page', 0.65123628494430852),
('instant', 0.65058756614114943),
('era', 0.6495567444850836),
('ruthless', 0.64934455790155243),
('saga', 0.64934455790155243),
('joan', 0.64891392558311978),
('joseph', 0.64841128671855386),
('workers', 0.64829661439459352),
('fantasy', 0.64726757480925168),
('distant', 0.64551913157069074),
('accomplished', 0.64551913157069074),
('manhattan', 0.64435701639051324),
('personal', 0.64355023942057321),
('meeting', 0.64313675998528386),
('individual', 0.64313675998528386),
('pushing', 0.64313675998528386),
('pleasant', 0.64250344774119039),
('brave', 0.64185388617239469),
('william', 0.64083139119578469),
('hudson', 0.64077919504262937),
('friendly', 0.63949446706762514),
('eccentric', 0.63907995928966954),
('awards', 0.63875310849414646),
('jack', 0.63838309514997038),
('seeking', 0.63808740337691783),
('divorce', 0.63757732940513456),
('colonel', 0.63757732940513456),
('jane', 0.63443957973316734),
('keeping', 0.63414883979798953),
('gives', 0.63383568159497883),
('ted', 0.63342794585832296),
('animation', 0.63208692379869902),
('progress', 0.6317782341836532),
('larger', 0.63127177684185776),
('concert', 0.63127177684185776),
('nation', 0.6296337748376194),
('albeit', 0.62739580299716491),
('discovers', 0.62542900650499444),
('classic', 0.62504956428050518),
('segment', 0.62335141862440335),
('morgan', 0.62303761437291871),
('mouse', 0.62294292188669675),
('impressive', 0.62211140744319349),
('artist', 0.62168821657780038),
('ultimate', 0.62168821657780038),
('griffith', 0.62117368093485603),
('drew', 0.62082651898031915),
('emily', 0.62082651898031915),
('moved', 0.6197197120051281),
('families', 0.61903920840622351),
('profound', 0.61903920840622351),
('innocent', 0.61851219917136446),
('versions', 0.61730910416844087),
('eddie', 0.61691981517206107),
('criticism', 0.61651395453902935),
('nature', 0.61594514653194088),
('recognized', 0.61518563909023349),
('sexuality', 0.61467556511845012),
('contract', 0.61400986000122149),
('brian', 0.61344043794920278),
('remembered', 0.6131044728864089),
('determined', 0.6123858239154869),
('offers', 0.61207935747116349),
('pleasure', 0.61195702582993206),
('washington', 0.61180154110599294),
('images', 0.61159731359583758),
('games', 0.61067095873570676),
('fashioned', 0.60798937221963845),
('melodrama', 0.60749173598145145),
('rough', 0.60613580357031549),
('charismatic', 0.60613580357031549),
('peoples', 0.60613580357031549),
('dealing', 0.60517840761398811),
('fine', 0.60496962268013299),
('tap', 0.60391604683200273),
('trio', 0.60157998703445481),
('russell', 0.60120968523425966),
('figures', 0.60077386042893011),
('ward', 0.60005675749393339),
('shine', 0.59911823091166894),
('job', 0.59845562125168661),
('satisfied', 0.59652034487087369),
('river', 0.59637962862495086),
('brown', 0.595773016534769),
('believable', 0.59566072133302495),
('always', 0.59470710774669278),
('bound', 0.59470710774669278),
('hall', 0.5933967777928858),
('cook', 0.5916777203950857),
('claire', 0.59136448625000293),
('anna', 0.58778666490211906),
('peace', 0.58628403501758408),
('visually', 0.58539431926349916),
('morality', 0.58525821854876026),
('falk', 0.58525821854876026),
('growing', 0.58466653756587539),
('experiences', 0.58314628534561685),
('stood', 0.58314628534561685),
('touch', 0.58122926435596001),
('lives', 0.5810976767513224),
('kubrick', 0.58066919713325493),
('timing', 0.58047401805583243),
('expressions', 0.57981849525294216),
('struggles', 0.57981849525294216),
('authentic', 0.57848427223980559),
('helen', 0.57763429343810091),
('pre', 0.57700753064729182),
('quirky', 0.5753641449035618),
('young', 0.57531672344534313),
('inner', 0.57454143815209846),
('mexico', 0.57443087372056334),
('clint', 0.57380042292737909),
('sisters', 0.57286101468544337),
('realism', 0.57226528899949558),
('french', 0.5720692490067093),
('personalities', 0.5720692490067093),
('surprises', 0.57113222999698177),
('overcome', 0.5697681593994407),
('timothy', 0.56953322459276867),
('tales', 0.56909453188996639),
('war', 0.56843317302781682),
('civil', 0.5679840376059393),
('countries', 0.56737779327091187),
('streep', 0.56710645966458029),
('oliver', 0.56673325570428668),
('australia', 0.56580775818334383),
('understanding', 0.56531380905006046),
('players', 0.56509525370004821),
('knowing', 0.56489284503626647),
('rogers', 0.56421349718405212),
('suspenseful', 0.56368911332305849),
('variety', 0.56368911332305849),
('true', 0.56281525180810066),
('jr', 0.56220982311246936),
('psychological', 0.56108745854687891),
('sent', 0.55961578793542266),
('grand', 0.55961578793542266),
('branagh', 0.55961578793542266),
('reminiscent', 0.55961578793542266),
('performing', 0.55961578793542266),
('wealth', 0.55961578793542266),
('overwhelming', 0.55961578793542266),
('odds', 0.55961578793542266),
('brothers', 0.55891181043362848),
('howard', 0.55811089675600245),
('david', 0.55693122256475369),
('generation', 0.55628799784274796),
('grow', 0.55612538299565417),
('survival', 0.55594605904646033),
('mainstream', 0.55574731115750231),
('dick', 0.55431073570572953),
('charm', 0.55288175575407861),
('kirk', 0.55278982286502287),
('twists', 0.55244729845681018),
('gangster', 0.55206858230003986),
('jeff', 0.55179306225421365),
('family', 0.55116244510065526),
('tend', 0.55053307336110335),
('thanks', 0.55049088015842218),
('world', 0.54744234723432639),
('sutherland', 0.54743536937855164),
('life', 0.54695514434959924),
('disc', 0.54654370636806993),
('bug', 0.54654370636806993),
('tribute', 0.5455111817538808),
('europe', 0.54522705048332309),
('sacrifice', 0.54430155296238014),
('color', 0.54405127139431109),
('superior', 0.54333490233128523),
('york', 0.54318235866536513),
('pulls', 0.54266622962164945),
('jackson', 0.54232429082536171),
('hearts', 0.54232429082536171),
('enjoy', 0.54124285135906114),
('redemption', 0.54056759296472823),
('stands', 0.5389965007326869),
('trial', 0.5389965007326869),
('greek', 0.5389965007326869),
('hamilton', 0.5389965007326869),
('each', 0.5388212312554177),
('faithful', 0.53773307668591508),
('documentaries', 0.53714293208336406),
('jealous', 0.53714293208336406),
('different', 0.53709860682460819),
('describes', 0.53680111016925136),
('shorts', 0.53596159703753288),
('brilliance', 0.53551823635636209),
('mountains', 0.53492317534505118),
('share', 0.53408248593025787),
('dealt', 0.53408248593025787),
('providing', 0.53329847961804933),
('explore', 0.53329847961804933),
('series', 0.5325809226575603),
('fellow', 0.5323318289869543),
('loves', 0.53062825106217038),
('revolution', 0.53062825106217038),
('olivier', 0.53062825106217038),
('roman', 0.53062825106217038),
('century', 0.53002783074992665),
('musical', 0.52966871156747064),
('heroic', 0.52925932545482868),
('approach', 0.52806743020049673),
('ironically', 0.52806743020049673),
('temple', 0.52806743020049673),
('moves', 0.5279372642387119),
('julie', 0.52609309589677911),
('tells', 0.52415107836314001),
('uncle', 0.52354439617376536),
('union', 0.52324814376454787),
('deep', 0.52309571635780505),
('reminds', 0.52157841554225237),
('famous', 0.52118841080153722),
('jazz', 0.52053443789295151),
('dennis', 0.51987545928590861),
('epic', 0.51919387343650736),
('shows', 0.51915322220375304),
('performed', 0.5191244265806858),
('demons', 0.5191244265806858),
('discovered', 0.51879379341516751),
('eric', 0.51879379341516751),
('youth', 0.5185626062681431),
('human', 0.51851411224987087),
('tarzan', 0.51813827061227724),
('ourselves', 0.51794309153485463),
('wwii', 0.51758240622887042),
('passion', 0.5162164724008671),
('desire', 0.51607497965213445),
('pays', 0.51581316527702981),
('dirty', 0.51557622652458857),
('fox', 0.51557622652458857),
('sympathetic', 0.51546600332249293),
('symbolism', 0.51546600332249293),
('attitude', 0.51530993621331933),
('appearances', 0.51466440007315639),
('jeremy', 0.51466440007315639),
('fun', 0.51439068993048687),
('south', 0.51420972175023116),
('arrives', 0.51409894911095988),
('present', 0.51341965894303732),
('com', 0.51326167856387173),
('smile', 0.51265880484765169),
('alan', 0.51082562376599072),
('ring', 0.51082562376599072),
('visit', 0.51082562376599072),
('fits', 0.51082562376599072),
('provided', 0.51082562376599072),
('carter', 0.51082562376599072),
('aging', 0.51082562376599072),
('countryside', 0.51082562376599072),
('begins', 0.51015650363396647),
('success', 0.50900578704900468),
('japan', 0.50900578704900468),
('accurate', 0.50895471583017893),
('proud', 0.50800474742434931),
('daily', 0.5075946031845443),
('karloff', 0.50724780241810674),
('atmospheric', 0.50724780241810674),
('recently', 0.50714914903668207),
('fu', 0.50704490092608467),
('horrors', 0.50656122497953315),
('finding', 0.50637127341661037),
('lust', 0.5059356384717989),
('hitchcock', 0.50574947073413001),
('among', 0.50334004951332734),
('viewing', 0.50302139827440906),
('investigation', 0.50262885656181222),
('shining', 0.50262885656181222),
('duo', 0.5020919437972361),
('cameron', 0.5020919437972361),
('finds', 0.50128303100539795),
('contemporary', 0.50077528791248915),
('genuine', 0.50046283673044401),
('frightening', 0.49995595152908684),
('plays', 0.49975983848890226),
('age', 0.49941323171424595),
('position', 0.49899116611898781),
('continues', 0.49863035067217237),
('roles', 0.49839716550752178),
('james', 0.49837216269470402),
('individuals', 0.49824684155913052),
('brought', 0.49783842823917956),
('hilarious', 0.49714551986191058),
('brutal', 0.49681488669639234),
('appropriate', 0.49643688631389105),
('dance', 0.49581998314812048),
('league', 0.49578774640145024),
('helping', 0.49578774640145024),
('stunts', 0.49561620510246196),
('traveling', 0.49532143723002542),
('thoroughly', 0.49414593456733524),
('depicted', 0.49317068852726992),
('combination', 0.49247648509779424),
('honor', 0.49247648509779424),
('differences', 0.49247648509779424),
('fully', 0.49213349075383811),
('tracy', 0.49159426183810306),
('battles', 0.49140753790888908),
('possibility', 0.49112055268665822),
('romance', 0.4901589869574316),
('initially', 0.49002249613622745),
('happy', 0.4898997500608791),
('crime', 0.48977221456815834),
('singing', 0.4893852925281213),
('especially', 0.48901267837860624),
('shakespeare', 0.48754793889664511),
('hugh', 0.48729512635579658),
('detail', 0.48609484250827351),
('julia', 0.48550781578170082),
('san', 0.48550781578170082),
('guide', 0.48550781578170082),
('desperation', 0.48550781578170082),
('companion', 0.48550781578170082),
('strongly', 0.48460242866688824),
('necessary', 0.48302334245403883),
('humanity', 0.48265474679929443),
('drama', 0.48221998493060503),
('nonetheless', 0.48183808689273838),
('intrigue', 0.48183808689273838),
('warming', 0.48183808689273838),
('cuba', 0.48183808689273838),
('planned', 0.47957308026188628),
('pictures', 0.47929937011921681),
('nine', 0.47803580094299974),
('settings', 0.47743860773325364),
('history', 0.47732966933780852),
('ordinary', 0.47725880012690741),
('official', 0.47608267532211779),
('primary', 0.47608267532211779),
('episode', 0.47529620261150429),
('role', 0.47520268270188676),
('spirit', 0.47477690799839323),
('grey', 0.47409361449726067),
('ways', 0.47323464982718205),
('cup', 0.47260441094579297),
('piano', 0.47260441094579297),
('familiar', 0.47241617565111949),
('sinister', 0.47198579044972683),
('reveal', 0.47171449364936496),
('max', 0.47150852042515579),
('dated', 0.47121648567094482),
('losing', 0.47000362924573563),
('discovery', 0.47000362924573563),
('vicious', 0.47000362924573563),
('genuinely', 0.46871413841586385),
('hatred', 0.46734051182625186),
('mistaken', 0.46702300110759781),
('dream', 0.46608972992459924),
('challenge', 0.46608972992459924),
('crisis', 0.46575733836428446),
('photographed', 0.46488852857896512),
('critics', 0.46430560813109778),
('bird', 0.46430560813109778),
('machines', 0.46430560813109778),
('born', 0.46411383518967209),
('detective', 0.4636633473511525),
('higher', 0.46328467899699055),
('remains', 0.46262352194811296),
('inevitable', 0.46262352194811296),
('soviet', 0.4618180446592961),
('ryan', 0.46134556650262099),
('african', 0.46112595521371813),
('smaller', 0.46081520319132935),
('techniques', 0.46052488529119184),
('information', 0.46034171833399862),
('deserved', 0.45999798712841444),
('lynch', 0.45953232937844013),
('spielberg', 0.45953232937844013),
('cynical', 0.45953232937844013),
('tour', 0.45953232937844013),
('francisco', 0.45953232937844013),
('struggle', 0.45911782160048453),
('language', 0.45902121257712653),
('visual', 0.45823514408822852),
('warner', 0.45724137763188427),
('social', 0.45720078250735313),
('reality', 0.45719346885019546),
('hidden', 0.45675840249571492),
('breaking', 0.45601738727099561),
('sometimes', 0.45563021171182794),
('modern', 0.45500247579345005),
('surfing', 0.45425527227759638),
('popular', 0.45410691533051023),
('surprised', 0.4534409399850382),
('follows', 0.45245361754408348),
('keeps', 0.45234869400701483),
('john', 0.4520909494482197),
('mixed', 0.45198512374305722),
('defeat', 0.45198512374305722),
('justice', 0.45142724367280018),
('treasure', 0.45083371313801535),
('presents', 0.44973793178615257),
('years', 0.44919197032104968),
('chief', 0.44895022004790319),
('closely', 0.44701411102103689),
('segments', 0.44701411102103689),
('lose', 0.44658335503763702),
('caine', 0.44628710262841953),
('caught', 0.44610275383999071),
('hamlet', 0.44558510189758965),
('chinese', 0.44507424620321018),
('welcome', 0.44438052435783792),
('birth', 0.44368632092836219),
('represents', 0.44320543609101143),
('puts', 0.44279106572085081),
('visuals', 0.44183275227903923),
('fame', 0.44183275227903923),
('closer', 0.44183275227903923),
('web', 0.44183275227903923),
('criminal', 0.4412745608048752),
('minor', 0.4409224199448939),
('jon', 0.44086703515908027),
('liked', 0.44074991514020723),
('restaurant', 0.44031183943833246),
('de', 0.43983275161237217),
('flaws', 0.43983275161237217),
('searching', 0.4393666597838457),
('rap', 0.43891304217570443),
('light', 0.43884433018199892),
('elizabeth', 0.43872232986464682),
('marry', 0.43861731542506488),
('learned', 0.43825493093115531),
('controversial', 0.43825493093115531),
('oz', 0.43825493093115531),
('slowly', 0.43785660389939979),
('comedic', 0.43721380642274466),
('wayne', 0.43721380642274466),
('thrilling', 0.43721380642274466),
('bridge', 0.43721380642274466),
('married', 0.43658501682196887),
('nazi', 0.4361020775700542),
('murder', 0.4353180712578455),
('physical', 0.4353180712578455),
('johnny', 0.43483971678806865),
('michelle', 0.43445264498141672),
('wallace', 0.43403848055222038),
('comedies', 0.43395706390247063),
('silent', 0.43395706390247063),
('played', 0.43387244114515305),
('international', 0.43363598507486073),
('vision', 0.43286408229627887),
('intelligent', 0.43196704885367099),
('shop', 0.43078291609245434),
('also', 0.43036720209769169),
('levels', 0.4302451371066513),
('miss', 0.43006426712153217),
('movement', 0.4295626596872249),
...]

``````
``````

In [14]:

# words most strongly associated with a "NEGATIVE" label (lowest positive-to-negative ratios)
list(reversed(pos_neg_ratios.most_common()))[0:30]

``````
``````

Out[14]:

[('boll', -4.0778152602708904),
('uwe', -3.9218753018711578),
('seagal', -3.3202501058581921),
('unwatchable', -3.0269848170580955),
('stinker', -2.9876839403711624),
('mst', -2.7753833211707968),
('incoherent', -2.7641396677532537),
('unfunny', -2.5545257844967644),
('waste', -2.4907515123361046),
('blah', -2.4475792789485005),
('horrid', -2.3715779644809971),
('pointless', -2.3451073877136341),
('atrocious', -2.3187369339642556),
('redeeming', -2.2667790015910296),
('prom', -2.2601040980178784),
('drivel', -2.2476029585766928),
('lousy', -2.2118080125207054),
('worst', -2.1930856334332267),
('laughable', -2.172468615469592),
('awful', -2.1385076866397488),
('poorly', -2.1326133844207011),
('wasting', -2.1178155545614512),
('remotely', -2.111046881095167),
('existent', -2.0024805005437076),
('boredom', -1.9241486572738005),
('miserably', -1.9216610938019989),
('sucks', -1.9166645809588516),
('uninspired', -1.9131499212248517),
('lame', -1.9117232884159072),
('insult', -1.9085323769376259)]

``````
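The scores above are signed log ratios. A positive value leans toward POSITIVE reviews, a negative value leans toward NEGATIVE, and 0 means the word shows no preference. The value 0.69314718... that repeats in the positive list is ln 2, which corresponds to a smoothed positive-to-negative ratio of exactly 2.

The sketch below shows one way to compute such scores. It assumes add-one smoothing on both counts, and the helper name `pos_neg_log_ratios` is hypothetical. The notebook's own formula, defined earlier, may smooth the counts differently.

```python
import math
from collections import Counter

def pos_neg_log_ratios(positive_counts, negative_counts):
    """Score each word by a smoothed log ratio of positive to negative counts.

    Assumption: add-one smoothing on both counts, so a word never seen in
    negative reviews does not cause a division by zero.
    Score > 0 leans POSITIVE, < 0 leans NEGATIVE, 0 means no preference.
    """
    ratios = Counter()
    for word in set(positive_counts) | set(negative_counts):
        pos = positive_counts[word]  # Counter returns 0 for missing words
        neg = negative_counts[word]
        ratios[word] = math.log((pos + 1) / (neg + 1))
    return ratios

positive_counts = Counter({"great": 3, "fine": 1})
negative_counts = Counter({"awful": 3})
ratios = pos_neg_log_ratios(positive_counts, negative_counts)
print(ratios.most_common(1))                    # strongest POSITIVE word
print(list(reversed(ratios.most_common()))[0])  # strongest NEGATIVE word
```

Because the scores are logarithms, a word that is 4x more common in positive reviews and a word that is 4x more common in negative reviews get scores of equal size and opposite sign. This symmetry is what lets the same `most_common()` list be read from either end.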

# Transforming Text into Numbers

``````

In [15]:

from IPython.display import Image

review = "This was a horrible, terrible movie."

Image(filename='sentiment_network.png')

``````
``````

Out[15]:

[Image: sentiment_network.png]

``````
``````

In [16]:

review = "The movie was excellent"

Image(filename='sentiment_network_pos.png')

``````
``````

Out[16]:

[Image: sentiment_network_pos.png]

``````
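To feed a review into the network, its text has to become numbers. One simple encoding gives the input layer one slot per vocabulary word and fills each slot with how often that word appears in the review. The sketch below does this over a toy vocabulary.

The helper `review_to_counts` and its naive punctuation stripping are assumptions for illustration. The reviews in `reviews.txt` are already lower-cased with punctuation split off, so splitting on spaces is enough there.

```python
from collections import Counter

def review_to_counts(review, vocab):
    """Return one count per vocabulary word for a single review.

    Assumption: lower-casing, whitespace splitting and stripping surrounding
    punctuation stand in for the real preprocessing of reviews.txt.
    """
    tokens = [word.strip(".,!?") for word in review.lower().split()]
    counts = Counter(tokens)
    # Words absent from the review get a count of 0 (Counter's default).
    return [counts[word] for word in vocab]

toy_vocab = ["excellent", "horrible", "movie", "terrible", "the", "was"]
print(review_to_counts("The movie was excellent", toy_vocab))
# [1, 0, 1, 0, 1, 1]
print(review_to_counts("This was a horrible, terrible movie.", toy_vocab))
# [0, 1, 1, 1, 0, 1]
```

With this encoding, "excellent" lights up only for the positive review, and "horrible" and "terrible" light up only for the negative one. That separation is the signal the network is expected to learn from.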

# Project 2: Creating the Input/Output Data

``````

In [17]:

vocab = set(total_counts.keys())  # every distinct word seen across all reviews
vocab_size = len(vocab)           # one input node per word
print(vocab_size)

``````
``````

74074

``````
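With 74,074 distinct words, each review becomes a 74,074-wide input vector. The NumPy sketch below covers the input/output encoding in three parts: a word-to-column lookup, a counts vector shaped `(1, vocab_size)`, and a 1/0 target for the label. The names `build_word2index`, `review_to_input` and `label_to_target` are hypothetical.

Sorting the vocabulary is a design choice that keeps the column order reproducible. A Python `set` like `vocab` has no stable ordering, which is also why the `list(vocab)` output below looks random. The empty string `''` at the start of that listing is most likely a by-product of splitting reviews on single spaces where they contain consecutive spaces. The sketch simply skips any token that is not in the index.

```python
import numpy as np

def build_word2index(vocab):
    """Map each word to a fixed column index (hypothetical helper).

    Sorting gives a reproducible order; iterating a set directly does not.
    """
    return {word: i for i, word in enumerate(sorted(vocab))}

def review_to_input(review, word2index):
    """Return a (1, vocab_size) array holding word counts for one review."""
    layer_0 = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        if word in word2index:  # ignore tokens outside the vocabulary
            layer_0[0, word2index[word]] += 1
    return layer_0

def label_to_target(label):
    """Encode the label as the network's target output: POSITIVE -> 1, else 0."""
    return 1 if label == "POSITIVE" else 0

# Usage with the notebook's data (assumes vocab, reviews and labels exist):
# word2index = build_word2index(vocab)
# layer_0 = review_to_input(reviews[0], word2index)
# target = label_to_target(labels[0])
```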
``````

In [18]:

list(vocab)  # sets are unordered, so this listing order is arbitrary

``````
``````

Out[18]:

['',
'devotion',
'rwtd',
'hazy',
'governing',
'shied',
'usb',
'kahlua',
'dolman',
'tiana',
'toms',
'waterslides',
'perpetuates',
'ragged',
'kirtland',
'unparalleled',
'scrubbed',
'contemperaneous',
'hellishly',
'gimli',
'uncynical',
'synonym',
'proba',
'trident',
'sidestep',
'babban',
'chenoweth',
'cannabalistic',
'spots',
'ihave',
'kriemshild',
'frys',
'route',
'innsbruck',
'nonreligious',
'swig',
'haft',
'wags',
'rugrats',
'blocker',
'paltrow',
'pacer',
'snapshotters',
'weaver',
'eaters',
'deflected',
'batch',
'dub',
'unreasonable',
'clenching',
'introducing',
'plugged',
'venice',
'nlp',
'honored',
'pittman',
'plodded',
'airliners',
'grimacing',
'centerfold',
'whiteboy',
'preservatives',
'ninteen',
'beaut',
'gaggling',
'hotdog',
'demonized',
'therapeutic',
'demon',
'jonatha',
'ramping',
'ingmar',
'accorsi',
'lsd',
'facts',
'shiny',
'climber',
'gallows',
'konigin',
'drownes',
'preparatory',
'crimson',
'simplyfied',
'recomendation',
'demolish',
'orignal',
'preempt',
'mpkdh',
'counterfeit',
'beastly',
'sollett',
'unskillful',
'andersen',
'plateaus',
'old',
'noam',
'startling',
'mutilation',
'projective',
'liquefied',
'favorites',
'delusional',
'meddlesome',
'swankiest',
'lwensohn',
'eds',
'anabel',
'mcdougall',
'waving',
'following',
'hissy',
'voracious',
'demotes',
'geopolitical',
'paramour',
'hultn',
'gendered',
'atrociousness',
'kipling',
'withdrawal',
'mucks',
'repairmen',
'kilts',
'rosson',
'spasitc',
'rude',
'pupsi',
'blainsworth',
'undeclared',
'birdman',
'tian',
'excell',
'spaces',
'reviewing',
'arlette',
'deewar',
'blanked',
'wellesley',
'yaphet',
'despise',
'infective',
'accredited',
'hatsumo',
'allows',
'thieriot',
'royale',
'bolye',
'concentrated',
'lacked',
'benedict',
'fluegel',
'undercurrent',
'unrurly',
'idly',
'sophomore',
'schilling',
'burlesque',
'define',
'reincarnations',
'transparencies',
'aparthied',
'reggae',
'sprog',
'matel',
'ekeing',
'comparable',
'marcy',
'caregiver',
'roo',
'prating',
'skips',
'petrillo',
'underpinnings',
'stallonethat',
'mistresses',
'laundromat',
'contrite',
'split',
'eliminating',
'poppins',
'meanie',
'southron',
'tait',
'allover',
'korman',
'normally',
'departure',
'effortlessly',
'otherworldliness',
'chest',
'pah',
'scorned',
'bumble',
'velocity',
'duration',
'motivates',
'catty',
'improbabilities',
'facet',
'concieved',
'lascivious',
'ferociously',
'silvia',
'walder',
'transfixing',
'gruntled',
'penpusher',
'vill',
'herriman',
'smartie',
'collapse',
'unashamedly',
'idioterne',
'darkside',
'anachronic',
'kerim',
'nihlani',
'topping',
'yama',
'arrghh',
'detatched',
'magnificant',
'strength',
'sontee',
'troublesome',
'lakers',
'resolve',
'resurrection',
'editorializing',
'wexler',
'backdrops',
'stevenson',
'erred',
'comparrison',
'dabney',
'parallels',
'wcw',
'hasselhof',
'tongues',
'quip',
'artist',
'pagan',
'bingham',
'necheyev',
'sais',
'winger',
'finds',
'gryll',
'winterwonder',
'producer',
'humoristic',
'dighton',
'blocking',
'feij',
'gabbled',
'lars',
'yamashita',
'bekmambetov',
'pure',
'perspicacious',
'concerned',
'protaganiste',
'electing',
'ceremony',
'nva',
'longendecker',
'tassi',
'overviews',
'hannay',
'dumb',
'gaberial',
'booboo',
'hoffman',
'harrowing',
'guerriri',
'prominant',
'whirry',
'evangelion',
'popinjay',
'recommended',
'develops',
'heroistic',
'crumpled',
'scoop',
'azuma',
'contextualising',
'durn',
'joycey',
'tvm',
'megabomb',
'versatile',
'listens',
'natalie',
'shelved',
'recored',
'symbiosis',
'unfocused',
'berth',
'georgians',
'amlie',
'heavyarms',
'collette',
'soldierly',
'contracting',
'anastacia',
'herv',
'wayback',
'insensitive',
'activist',
'judders',
'upstream',
'romanced',
'flix',
'presently',
'securing',
'jurgens',
'husen',
'bleakness',
'geare',
'regularity',
'casper',
'arming',
'yubari',
'catwalk',
'domestic',
'transplantation',
'ennia',
'fleurieu',
'censorious',
'westwood',
'psyching',
'divers',
'kikki',
'passion',
'coctails',
'ne',
'intermingle',
'lagravenese',
'bassinger',
'disengaged',
'pennies',
'patheticness',
'kheymeh',
'kiarostami',
'dions',
'hearing',
'caustic',
'impending',
'shrine',
'corinne',
'sorrowfully',
'ramotswe',
'ears',
'benq',
'oompah',
'maneur',
'pleasance',
'plummer',
'carver',
'laborious',
'chancellor',
'nyquist',
'houseboats',
'womman',
'jayenge',
'boswell',
'keillor',
'objected',
'conflicting',
'shapes',
'ww',
'gravitate',
'warpath',
'idylls',
'dignities',
'muddies',
'fairytales',
'untangle',
'feast',
'soars',
'lowered',
'notwithstanding',
'malplaced',
'arriv',
'ridden',
'laawaris',
'lyu',
'reappears',
'playoffs',
'idol',
'orphanage',
'iceholes',
'cebuano',
'multilevel',
'drugging',
'argued',
'vapidness',
'unzips',
'ayone',
'peeling',
'baggot',
'abishek',
'halluzinations',
'restrictions',
'mirrors',
'mimic',
'reaganism',
'bgr',
'magazines',
'naysayer',
'richly',
'modulation',
'perspectives',
'luckett',
'willed',
'plying',
'sherrys',
'bloodiness',
'mortal',
'tieing',
'rename',
'agns',
'melman',
'caminho',
'cnn',
'unbeknownest',
'kinematograph',
'rtl',
'footnotes',
'kobe',
'insurgents',
'hathcocks',
'salk',
'alyce',
'bestowing',
'complacency',
'soured',
'ullman',
'yield',
'calchas',
'glinda',
'hennessey',
'amateur',
'halperin',
'zealands',
'purposely',
'chilton',
'linch',
'eartha',
'hyperspace',
'legioners',
'ninjas',
'grosse',
'balls',
'satanic',
'shoudl',
'scalped',
'afterschool',
'transient',
'shalom',
'coherently',
'endeavoring',
'tobei',
'gnashingly',
'manhole',
'coixet',
'soundtracks',
'kohala',
'edo',
'incentivized',
'ibsen',
'breckenridge',
'thoughtlessness',
'identification',
'derails',
'cinematographers',
'tamako',
'jeroen',
'rhind',
'deniselacey',
'candolis',
'caisse',
'rationalize',
'exasperatedly',
'ibnez',
'congregations',
'heartbreak',
'unintended',
'could',
'deceiving',
'ubc',
'thumbtack',
'economies',
'delanda',
'booooring',
'ati',
'mariel',
'computability',
'engulf',
'tindersticks',
'reverse',
'bobo',
'sewing',
'boreham',
'defying',
'toning',
'packed',
'innovated',
'clot',
'rosarios',
'arbore',
'terror',
'sticks',
'fraudulence',
'subsided',
'regretful',
'snipering',
'deficating',
'rampages',
'blackbird',
'howser',
'naismith',
'kornbluths',
'somersaulted',
'sensationalistic',
'qualitatively',
'contrasting',
'crapdom',
'chichi',
'phi',
'shrunken',
'duh',
'materialistic',
'winstons',
'mohd',
'warters',
'husbandgino',
'benno',
'clomps',
'arros',
'reefer',
'hilarius',
'attemps',
'smirky',
'trivialized',
'thirds',
'neva',
'tykes',
'dyeing',
'excelent',
'crest',
'essential',
'aloft',
'matchbox',
'befores',
'mpaarated',
'thinly',
'vaudevillian',
'cineliterate',
'leaving',
'burlinson',
'muzzy',
'oddly',
'proper',
'airphone',
'mujde',
'client',
'imaginatively',
'coworkers',
'suspicious',
'cutlet',
'barley',
'sparring',
'francine',
'gwilym',
'insanities',
'blank',
'farly',
'groove',
'fallowing',
'ogre',
'mightiest',
'telling',
'interracial',
'undo',
'prosthetic',
'outmatched',
'operating',
'fill',
'goldthwait',
'trappings',
'govida',
'coral',
'masssacre',
'stooges',
'simulated',
'kik',
'culminates',
'occaisionally',
'mme',
'unluckily',
'hometown',
'irishman',
'jymn',
'devouring',
'reprimanded',
'dealings',
'belaboured',
'denmark',
'kooks',
'maman',
'attain',
'ahhhhhh',
'demolishing',
'bdus',
'proficient',
'friedrich',
'kaczorowski',
'catboy',
'kazakh',
'takechi',
'inexhaustible',
'bragg',
'verikoan',
'soh',
'cletus',
'fugue',
'carteloise',
'visayans',
'microwaving',
'diepardieu',
'iyer',
'outmoded',
'partha',
'firework',
'spores',
'clack',
'kimberley',
'capraesque',
'elmes',
'kaye',
'conceptions',
'compiled',
'tastic',
'zizek',
'distribution',
'rajasthani',
'kak',
'waaaay',
'mousy',
'martnez',
'tollywood',
'suggestively',
'phase',
'trios',
'tripped',
'rombero',
'jianxiang',
'hyser',
'stumps',
'butlers',
'vaughan',
'indra',
'fairmindedness',
'unshaven',
'idiotically',
'rudolf',
'circulate',
'titantic',
'wallop',
'christo',
'imprisonment',
'actively',
'westernisation',
'personalize',
'enraging',
'impersonating',
'benson',
'daghang',
'fork',
'eventide',
'convinced',
'haughtiness',
'underclothing',
'idyllic',
'pragmatism',
'reporter',
'slowish',
'sanjeev',
'diagnosis',
'diamantino',
'overdue',
'patriarchal',
'intros',
'byu',
'frisky',
'tum',
'silhouetted',
'cruelity',
'cannibal',
'cule',
'failure',
'darts',
'seminar',
'pret',
'coleridge',
'sourpuss',
'buccaneer',
'photowise',
'redundancies',
'critisim',
'arielle',
'furtive',
'atlantians',
'kwok',
'mccain',
'costar',
'sleaziest',
'reaally',
'repugnancy',
'celery',
'streamlining',
'basra',
'virtuous',
'democrats',
'brazilian',
'inanely',
'cranial',
'thrice',
'artiest',
'expose',
'hackenstein',
'nuns',
'garda',
'savalas',
'debts',
'replicated',
'hotwired',
'trolls',
'antiwar',
'mcallister',
'appalachia',
'dimes',
'steretyped',
'rukh',
'tramps',
'impulses',
'collaborator',
'exeption',
'hms',
'wolsky',
'terrorizer',
'roflmao',
'barrio',
'rantzen',
'kaufmann',
'arms',
'telkovsky',
'estes',
'clearer',
'vachtangi',
'rougher',
'mikuni',
'zinemann',
'unizhennye',
'gothas',
'governmentmedia',
'lis',
'affable',
'unfotunately',
'wieder',
'delane',
'achievable',
'spinsterish',
'clytemnestra',
'wichita',
'textbook',
'regrets',
'gosha',
'clement',
'wiggly',
'salle',
'derboiler',
'fraculater',
'directors',
'tugging',
'stuhr',
'revelling',
'bedlam',
'fanaticism',
'keyser',
'pests',
'joey',
'sleepless',
'ruggia',
'watkins',
'quotes',
'centralized',
'publicists',
'marshal',
'pat',
'nonprofessional',
'puny',
'developping',
'huey',
'morrisette',
'waldomiro',
'auditioning',
'eastwoods',
'counterweight',
'metamorphis',
'attanborough',
'torre',
'fraidy',
'piercings',
'superwonderscope',
'nietszche',
'sione',
'beggining',
'rotne',
'indomitability',
'atley',
'molnar',
'fruits',
'greeter',
'recompense',
'tannhauser',
'cats',
'goriness',
'hirjee',
'clockers',
'scums',
'extort',
'sets',
'brooked',
'charley',
'dissing',
'paraphernalia',
'belisario',
'ververgaert',
'bonet',
'toly',
'raggedys',
'chuck',
'saxophonists',
'sulfurous',
'carrion',
'fangorn',
'haige',
'bambaiya',
'rentar',
'raptus',
'lupa',
'mordant',
'chestnuts',
'methodology',
'synchronicity',
'lbs',
'mutilating',
'fellatio',
'zapar',
'apparel',
'descendant',
'delaware',
'proof',
'combatant',
'oozed',
'unbelieveable',
'bliep',
'speared',
'smelling',
'soviet',
'strings',
'keen',
'picturization',
'curits',
'explosive',
'rosa',
'regales',
'blackgood',
'prosy',
'brocoli',
'snickers',
'benussi',
'propagandist',
'castle',
'hayseed',
'stretchs',
'fatherland',
'makeup',
'aldiss',
'inverts',
'outward',
'looking',
'lutz',
'huitieme',
'cds',
'whispers',
'inconsequential',
'substantiate',
'klembecker',
'fluctuates',
'lamented',
'rides',
'trustees',
'omarosa',
'poliwhirl',
'mothballed',
'femi',
'dinged',
'casio',
'nighty',
'espionage',
'golgo',
'commonality',
'bodysuckers',
'semester',
'unnaturally',
'surging',
'havana',
'classicists',
'chimps',
'rusting',
'sooni',
'gish',
'strickland',
'unctuous',
'quarreled',
'expands',
'zeffirelli',
'inarguably',
'blackploitation',
'manhattanites',
'summing',
'absolutly',
'galvanize',
'clerks',
'insidiously',
'empt',
'brewery',
'steph',
'batali',
'coulouris',
'arena',
'turkish',
'undercooked',
'juveniles',
'hopes',
'departs',
'jima',
'burgendy',
'mbongeni',
'gazillion',
'calicos',
'oaks',
'wrestled',
'puling',
'trixie',
'kalashnikov',
'strangeness',
'cots',
'populated',
'thespic',
'mache',
'daubeney',
'steaming',
'parmistan',
'waaaaaay',
'misbehaves',
'local',
'resent',
'massacred',
'trifling',
...]

``````
``````

In [19]:

import numpy as np

layer_0 = np.zeros((1,vocab_size))
layer_0

``````
``````

Out[19]:

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.]])

``````
``````

In [20]:

from IPython.display import Image
Image(filename='sentiment_network.png')

``````
``````

In [21]:

word2index = {}

for i,word in enumerate(vocab):
    word2index[word] = i
word2index

``````
``````

Out[21]:

{'': 0,
'devotion': 2,
'rwtd': 3,
'hazy': 4,
'governing': 5,
'shied': 6,
'usb': 7,
'kahlua': 8,
'dolman': 9,
'tiana': 10,
'toms': 11,
'waterslides': 12,
'perpetuates': 13,
'ragged': 14,
'kirtland': 15,
'unparalleled': 16,
'scrubbed': 17,
'contemperaneous': 18,
'hellishly': 19,
'gimli': 20,
'uncynical': 21,
'synonym': 22,
'proba': 23,
'trident': 24,
'sidestep': 25,
'babban': 26,
'chenoweth': 27,
'cannabalistic': 28,
'spots': 29,
'ihave': 30,
'kriemshild': 31,
'frys': 32,
'route': 33,
'innsbruck': 34,
'nonreligious': 35,
'swig': 36,
'haft': 37,
'wags': 38,
'rugrats': 39,
'blocker': 40,
'paltrow': 41,
'pacer': 42,
'snapshotters': 44,
'weaver': 45,
'eaters': 46,
'deflected': 48,
'batch': 49,
'dub': 50,
'unreasonable': 51,
'clenching': 52,
'introducing': 53,
'plugged': 54,
'venice': 55,
'nlp': 56,
'honored': 57,
'pittman': 58,
'plodded': 59,
'airliners': 60,
'grimacing': 61,
'centerfold': 62,
'whiteboy': 63,
'preservatives': 64,
'ninteen': 65,
'beaut': 66,
'gaggling': 67,
'hotdog': 68,
'demonized': 69,
'therapeutic': 70,
'demon': 71,
'jonatha': 72,
'ramping': 73,
'ingmar': 74,
'accorsi': 75,
'lsd': 76,
'facts': 77,
'shiny': 78,
'climber': 79,
'gallows': 80,
'konigin': 81,
'drownes': 82,
'preparatory': 83,
'crimson': 84,
'simplyfied': 85,
'recomendation': 86,
'demolish': 87,
'orignal': 88,
'preempt': 89,
'mpkdh': 90,
'counterfeit': 91,
'beastly': 92,
'sollett': 94,
'unskillful': 95,
'andersen': 96,
'plateaus': 97,
'old': 98,
'noam': 99,
'startling': 100,
'mutilation': 101,
'projective': 102,
'liquefied': 103,
'favorites': 104,
'delusional': 105,
'meddlesome': 106,
'swankiest': 107,
'lwensohn': 108,
'eds': 109,
'anabel': 110,
'mcdougall': 111,
'waving': 112,
'following': 113,
'hissy': 114,
'voracious': 115,
'demotes': 116,
'geopolitical': 117,
'paramour': 118,
'hultn': 119,
'gendered': 120,
'atrociousness': 121,
'kipling': 122,
'withdrawal': 123,
'mucks': 124,
'repairmen': 125,
'kilts': 127,
'rosson': 128,
'spasitc': 130,
'rude': 131,
'pupsi': 132,
'blainsworth': 133,
'undeclared': 134,
'birdman': 135,
'tian': 136,
'excell': 137,
'spaces': 138,
'reviewing': 139,
'arlette': 140,
'deewar': 141,
'blanked': 142,
'wellesley': 143,
'yaphet': 144,
'despise': 145,
'infective': 146,
'accredited': 147,
'hatsumo': 148,
'allows': 149,
'thieriot': 150,
'royale': 151,
'bolye': 152,
'concentrated': 153,
'lacked': 154,
'benedict': 155,
'fluegel': 156,
'undercurrent': 157,
'unrurly': 158,
'idly': 159,
'sophomore': 160,
'schilling': 161,
'burlesque': 162,
'define': 164,
'reincarnations': 165,
'transparencies': 166,
'aparthied': 167,
'reggae': 168,
'sprog': 169,
'matel': 170,
'ekeing': 171,
'comparable': 172,
'marcy': 173,
'caregiver': 174,
'roo': 175,
'prating': 176,
'skips': 177,
'petrillo': 178,
'underpinnings': 179,
'stallonethat': 180,
'mistresses': 181,
'laundromat': 182,
'contrite': 183,
'split': 184,
'eliminating': 185,
'poppins': 186,
'meanie': 187,
'southron': 188,
'tait': 189,
'allover': 190,
'korman': 191,
'normally': 192,
'departure': 193,
'effortlessly': 194,
'otherworldliness': 195,
'chest': 196,
'pah': 197,
'scorned': 198,
'bumble': 199,
'velocity': 200,
'duration': 201,
'motivates': 202,
'catty': 203,
'improbabilities': 204,
'facet': 205,
'concieved': 206,
'lascivious': 207,
'ferociously': 208,
'silvia': 209,
'walder': 210,
'transfixing': 211,
'gruntled': 212,
'penpusher': 213,
'vill': 214,
'herriman': 215,
'smartie': 216,
'collapse': 217,
'unashamedly': 218,
'idioterne': 219,
'darkside': 220,
'anachronic': 221,
'kerim': 222,
'nihlani': 223,
'topping': 224,
'yama': 225,
'arrghh': 226,
'detatched': 227,
'magnificant': 228,
'strength': 229,
'sontee': 230,
'troublesome': 231,
'lakers': 232,
'resolve': 233,
'resurrection': 234,
'editorializing': 235,
'wexler': 236,
'backdrops': 237,
'stevenson': 238,
'erred': 239,
'comparrison': 240,
'dabney': 241,
'parallels': 242,
'wcw': 243,
'hasselhof': 245,
'tongues': 246,
'quip': 247,
'artist': 248,
'pagan': 249,
'bingham': 251,
'necheyev': 252,
'sais': 253,
'winger': 254,
'finds': 255,
'gryll': 256,
'winterwonder': 257,
'producer': 258,
'humoristic': 259,
'dighton': 260,
'blocking': 261,
'feij': 262,
'gabbled': 263,
'lars': 264,
'yamashita': 265,
'bekmambetov': 266,
'pure': 267,
'perspicacious': 268,
'concerned': 269,
'protaganiste': 270,
'electing': 271,
'ceremony': 272,
'nva': 273,
'longendecker': 274,
'tassi': 275,
'overviews': 276,
'hannay': 277,
'dumb': 278,
'gaberial': 279,
'booboo': 280,
'hoffman': 281,
'harrowing': 282,
'guerriri': 283,
'prominant': 284,
'whirry': 285,
'evangelion': 286,
'popinjay': 287,
'recommended': 288,
'develops': 289,
'heroistic': 290,
'crumpled': 291,
'scoop': 292,
'azuma': 293,
'contextualising': 294,
'durn': 295,
'joycey': 296,
'tvm': 297,
'megabomb': 298,
'versatile': 299,
'listens': 300,
'natalie': 301,
'shelved': 302,
'recored': 303,
'symbiosis': 304,
'unfocused': 305,
'berth': 306,
'georgians': 307,
'amlie': 308,
'heavyarms': 309,
'collette': 310,
'soldierly': 311,
'contracting': 312,
'anastacia': 313,
'herv': 314,
'wayback': 315,
'insensitive': 316,
'activist': 317,
'judders': 318,
'upstream': 319,
'romanced': 320,
'flix': 321,
'presently': 322,
'securing': 323,
'jurgens': 324,
'husen': 325,
'bleakness': 326,
'geare': 327,
'regularity': 328,
'casper': 329,
'arming': 330,
'yubari': 331,
'catwalk': 332,
'domestic': 333,
'transplantation': 334,
'ennia': 335,
'fleurieu': 336,
'censorious': 337,
'westwood': 338,
'psyching': 339,
'divers': 340,
'kikki': 341,
'passion': 342,
'coctails': 343,
'ne': 344,
'intermingle': 345,
'lagravenese': 346,
'bassinger': 347,
'disengaged': 348,
'pennies': 349,
'patheticness': 350,
'kheymeh': 351,
'kiarostami': 352,
'dions': 353,
'hearing': 354,
'caustic': 355,
'impending': 356,
'shrine': 357,
'corinne': 358,
'sorrowfully': 359,
'ramotswe': 360,
'ears': 361,
'benq': 362,
'oompah': 363,
'maneur': 364,
'pleasance': 365,
'plummer': 366,
'carver': 367,
'laborious': 368,
'chancellor': 369,
'nyquist': 370,
'houseboats': 371,
'womman': 372,
'jayenge': 373,
'boswell': 374,
'keillor': 375,
'objected': 377,
'conflicting': 378,
'shapes': 379,
'ww': 380,
'gravitate': 381,
'warpath': 382,
'idylls': 384,
'dignities': 385,
'muddies': 386,
'fairytales': 387,
'untangle': 388,
'feast': 389,
'soars': 390,
'lowered': 391,
'notwithstanding': 392,
'malplaced': 393,
'arriv': 394,
'ridden': 395,
'laawaris': 396,
'lyu': 397,
'reappears': 398,
'playoffs': 399,
'idol': 400,
'orphanage': 401,
'iceholes': 402,
'cebuano': 403,
'multilevel': 404,
'drugging': 405,
'argued': 406,
'vapidness': 407,
'unzips': 408,
'ayone': 409,
'peeling': 410,
'baggot': 411,
'abishek': 412,
'halluzinations': 413,
'restrictions': 414,
'mirrors': 415,
'mimic': 416,
'reaganism': 417,
'bgr': 418,
'magazines': 419,
'naysayer': 420,
'richly': 421,
'modulation': 422,
'perspectives': 423,
'luckett': 424,
'willed': 425,
'plying': 426,
'sherrys': 427,
'bloodiness': 428,
'mortal': 429,
'tieing': 430,
'rename': 431,
'agns': 432,
'melman': 433,
'caminho': 434,
'cnn': 435,
'unbeknownest': 436,
'kinematograph': 437,
'rtl': 438,
'footnotes': 439,
'kobe': 440,
'insurgents': 441,
'hathcocks': 442,
'salk': 443,
'alyce': 444,
'bestowing': 445,
'complacency': 446,
'soured': 447,
'ullman': 448,
'yield': 449,
'calchas': 450,
'glinda': 451,
'hennessey': 452,
'amateur': 453,
'halperin': 454,
'zealands': 455,
'purposely': 456,
'chilton': 457,
'linch': 458,
'eartha': 459,
'hyperspace': 460,
'legioners': 461,
'ninjas': 462,
'grosse': 463,
'balls': 464,
'satanic': 465,
'shoudl': 466,
'scalped': 467,
'afterschool': 468,
'transient': 469,
'shalom': 470,
'coherently': 471,
'endeavoring': 472,
'tobei': 473,
'gnashingly': 474,
'manhole': 475,
'coixet': 476,
'soundtracks': 477,
'kohala': 478,
'edo': 479,
'incentivized': 480,
'ibsen': 481,
'breckenridge': 482,
'thoughtlessness': 483,
'identification': 484,
'derails': 485,
'cinematographers': 486,
'tamako': 487,
'jeroen': 488,
'rhind': 489,
'deniselacey': 490,
'candolis': 491,
'caisse': 492,
'rationalize': 493,
'exasperatedly': 494,
'ibnez': 495,
'congregations': 496,
'heartbreak': 497,
'unintended': 498,
'could': 499,
'deceiving': 500,
'ubc': 501,
'thumbtack': 502,
'economies': 503,
'delanda': 504,
'booooring': 505,
'ati': 506,
'mariel': 507,
'computability': 508,
'engulf': 509,
'tindersticks': 510,
'reverse': 511,
'bobo': 512,
'sewing': 513,
'boreham': 514,
'defying': 515,
'toning': 516,
'packed': 517,
'innovated': 518,
'clot': 519,
'rosarios': 520,
'arbore': 521,
'terror': 522,
'sticks': 523,
'fraudulence': 524,
'subsided': 525,
'regretful': 526,
'snipering': 527,
'deficating': 528,
'rampages': 529,
'blackbird': 530,
'howser': 531,
'naismith': 532,
'kornbluths': 533,
'somersaulted': 534,
'sensationalistic': 535,
'qualitatively': 536,
'contrasting': 537,
'crapdom': 538,
'chichi': 539,
'phi': 540,
'shrunken': 542,
'duh': 543,
'materialistic': 544,
'winstons': 545,
'mohd': 546,
'warters': 547,
'husbandgino': 548,
'benno': 549,
'clomps': 550,
'arros': 551,
'reefer': 552,
'hilarius': 553,
'attemps': 554,
'smirky': 555,
'trivialized': 557,
'thirds': 558,
'neva': 559,
'tykes': 560,
'dyeing': 561,
'excelent': 562,
'crest': 563,
'essential': 564,
'aloft': 565,
'matchbox': 566,
'befores': 567,
'mpaarated': 568,
'thinly': 569,
'vaudevillian': 570,
'cineliterate': 571,
'leaving': 572,
'burlinson': 573,
'muzzy': 574,
'oddly': 575,
'proper': 576,
'airphone': 577,
'mujde': 578,
'client': 579,
'imaginatively': 580,
'coworkers': 582,
'suspicious': 583,
'cutlet': 584,
'barley': 585,
'sparring': 586,
'francine': 587,
'gwilym': 588,
'insanities': 589,
'blank': 590,
'farly': 591,
'groove': 592,
'fallowing': 593,
'ogre': 594,
'mightiest': 595,
'telling': 596,
'interracial': 597,
'undo': 598,
'prosthetic': 599,
'outmatched': 600,
'operating': 601,
'fill': 602,
'goldthwait': 603,
'trappings': 604,
'govida': 605,
'coral': 606,
'masssacre': 607,
'stooges': 608,
'simulated': 609,
'kik': 610,
'culminates': 611,
'occaisionally': 612,
'mme': 613,
'unluckily': 614,
'hometown': 615,
'irishman': 616,
'jymn': 617,
'devouring': 618,
'reprimanded': 619,
'dealings': 620,
'belaboured': 621,
'denmark': 622,
'kooks': 623,
'maman': 624,
'attain': 625,
'ahhhhhh': 626,
'demolishing': 627,
'bdus': 628,
'proficient': 630,
'friedrich': 631,
'kaczorowski': 632,
'catboy': 633,
'kazakh': 634,
'takechi': 635,
'inexhaustible': 636,
'bragg': 637,
'verikoan': 638,
'soh': 639,
'cletus': 640,
'fugue': 641,
'carteloise': 642,
'visayans': 643,
'microwaving': 644,
'diepardieu': 645,
'iyer': 646,
'outmoded': 647,
'partha': 648,
'firework': 649,
'spores': 650,
'clack': 651,
'kimberley': 652,
'capraesque': 653,
'elmes': 654,
'kaye': 655,
'conceptions': 656,
'compiled': 657,
'tastic': 658,
'zizek': 659,
'distribution': 660,
'rajasthani': 661,
'kak': 662,
'waaaay': 663,
'mousy': 664,
'martnez': 665,
'tollywood': 666,
'suggestively': 667,
'phase': 668,
'trios': 669,
'tripped': 670,
'rombero': 671,
'jianxiang': 672,
'hyser': 673,
'stumps': 674,
'butlers': 675,
'vaughan': 676,
'indra': 677,
'fairmindedness': 678,
'unshaven': 679,
'idiotically': 680,
'rudolf': 681,
'circulate': 682,
'titantic': 684,
'wallop': 685,
'christo': 686,
'imprisonment': 687,
'actively': 688,
'westernisation': 689,
'personalize': 690,
'enraging': 691,
'impersonating': 692,
'benson': 693,
'daghang': 694,
'fork': 695,
'eventide': 696,
'convinced': 697,
'haughtiness': 698,
'underclothing': 699,
'idyllic': 700,
'pragmatism': 701,
'reporter': 702,
'slowish': 703,
'sanjeev': 704,
'diagnosis': 705,
'diamantino': 706,
'overdue': 707,
'patriarchal': 708,
'intros': 709,
'byu': 710,
'frisky': 711,
'tum': 712,
'silhouetted': 713,
'cruelity': 714,
'cannibal': 715,
'cule': 716,
'failure': 717,
'darts': 718,
'seminar': 719,
'pret': 720,
'coleridge': 721,
'sourpuss': 722,
'buccaneer': 723,
'photowise': 724,
'redundancies': 725,
'critisim': 726,
'arielle': 727,
'furtive': 728,
'atlantians': 729,
'kwok': 730,
'mccain': 731,
'costar': 732,
'sleaziest': 733,
'reaally': 734,
'repugnancy': 735,
'celery': 736,
'streamlining': 737,
'basra': 738,
'virtuous': 739,
'democrats': 740,
'brazilian': 741,
'inanely': 742,
'cranial': 743,
'thrice': 744,
'artiest': 745,
'expose': 746,
'hackenstein': 747,
'nuns': 748,
'garda': 749,
'savalas': 750,
'debts': 751,
'replicated': 752,
'hotwired': 753,
'trolls': 754,
'antiwar': 755,
'mcallister': 756,
'appalachia': 757,
'dimes': 758,
'steretyped': 759,
'rukh': 760,
'tramps': 761,
'impulses': 762,
'collaborator': 763,
'exeption': 764,
'hms': 765,
'wolsky': 766,
'terrorizer': 767,
'roflmao': 768,
'barrio': 769,
'rantzen': 770,
'kaufmann': 771,
'arms': 772,
'telkovsky': 773,
'estes': 774,
'clearer': 775,
'vachtangi': 776,
'rougher': 777,
'mikuni': 778,
'zinemann': 779,
'unizhennye': 780,
'gothas': 781,
'governmentmedia': 782,
'lis': 783,
'affable': 784,
'unfotunately': 785,
'wieder': 786,
'delane': 787,
'achievable': 788,
'spinsterish': 789,
'clytemnestra': 790,
'wichita': 791,
'textbook': 792,
'regrets': 793,
'gosha': 794,
'clement': 795,
'wiggly': 796,
'salle': 797,
'derboiler': 798,
'fraculater': 800,
'directors': 801,
'tugging': 802,
'stuhr': 803,
'revelling': 804,
'bedlam': 805,
'fanaticism': 806,
'keyser': 807,
'pests': 808,
'joey': 809,
'sleepless': 810,
'ruggia': 811,
'watkins': 812,
'quotes': 814,
'centralized': 815,
'publicists': 816,
'marshal': 817,
'pat': 820,
'nonprofessional': 822,
'puny': 823,
'developping': 824,
'huey': 825,
'morrisette': 826,
'waldomiro': 827,
'auditioning': 828,
'eastwoods': 829,
'counterweight': 830,
'metamorphis': 831,
'attanborough': 832,
'torre': 834,
'fraidy': 835,
'piercings': 836,
'superwonderscope': 837,
'nietszche': 838,
'sione': 839,
'beggining': 840,
'rotne': 841,
'indomitability': 842,
'atley': 843,
'molnar': 844,
'fruits': 845,
'greeter': 846,
'recompense': 847,
'tannhauser': 849,
'cats': 850,
'goriness': 851,
'hirjee': 852,
'clockers': 853,
'scums': 854,
'extort': 855,
'sets': 856,
'brooked': 857,
'charley': 858,
'dissing': 859,
'paraphernalia': 860,
'belisario': 861,
'ververgaert': 862,
'bonet': 863,
'toly': 864,
'raggedys': 865,
'chuck': 866,
'saxophonists': 867,
'sulfurous': 868,
'carrion': 869,
'fangorn': 870,
'haige': 871,
'bambaiya': 872,
'rentar': 873,
'raptus': 874,
'lupa': 875,
'mordant': 876,
'chestnuts': 877,
'methodology': 878,
'synchronicity': 879,
'lbs': 880,
'mutilating': 881,
'fellatio': 882,
'zapar': 883,
'apparel': 884,
'descendant': 885,
'delaware': 886,
'proof': 887,
'combatant': 888,
'oozed': 889,
'unbelieveable': 890,
'bliep': 892,
'speared': 893,
'smelling': 894,
'soviet': 895,
'strings': 896,
'keen': 897,
'picturization': 898,
'curits': 899,
'explosive': 901,
'rosa': 902,
'regales': 903,
'blackgood': 904,
'prosy': 905,
'brocoli': 907,
'snickers': 908,
'benussi': 909,
'propagandist': 910,
'castle': 911,
'hayseed': 912,
'stretchs': 913,
'fatherland': 915,
'makeup': 916,
'aldiss': 917,
'inverts': 918,
'outward': 919,
'looking': 920,
'lutz': 921,
'huitieme': 922,
'cds': 923,
'whispers': 924,
'inconsequential': 925,
'substantiate': 926,
'klembecker': 927,
'fluctuates': 928,
'lamented': 929,
'rides': 930,
'trustees': 931,
'omarosa': 932,
'poliwhirl': 933,
'mothballed': 934,
'femi': 935,
'dinged': 936,
'casio': 937,
'nighty': 938,
'espionage': 939,
'golgo': 940,
'commonality': 941,
'bodysuckers': 942,
'semester': 943,
'unnaturally': 944,
'surging': 945,
'havana': 946,
'classicists': 947,
'chimps': 948,
'rusting': 949,
'sooni': 950,
'gish': 951,
'strickland': 952,
'unctuous': 953,
'quarreled': 954,
'expands': 955,
'zeffirelli': 956,
'inarguably': 957,
'blackploitation': 958,
'manhattanites': 959,
'summing': 960,
'absolutly': 961,
'galvanize': 962,
'clerks': 963,
'insidiously': 964,
'empt': 965,
'brewery': 966,
'steph': 967,
'batali': 968,
'coulouris': 969,
'arena': 970,
'turkish': 971,
'undercooked': 972,
'juveniles': 973,
'hopes': 974,
'departs': 975,
'jima': 976,
'burgendy': 977,
'mbongeni': 978,
'gazillion': 979,
'calicos': 980,
'oaks': 981,
'wrestled': 982,
'puling': 983,
'trixie': 984,
'kalashnikov': 985,
'strangeness': 986,
'cots': 987,
'populated': 988,
'thespic': 989,
'mache': 990,
'daubeney': 991,
'steaming': 992,
'parmistan': 993,
'waaaaaay': 994,
'misbehaves': 995,
'local': 996,
'resent': 997,
'massacred': 998,
'trifling': 999,
...}

``````
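The indices above come from enumerating `vocab`, which was built from a Python `set`. Python randomizes string hashing by default, so the iteration order of a set of strings can change from one interpreter session to the next. As a result, a given word may get a different index each time the notebook is restarted. The sketch below assumes you want reproducible indices and sorts the vocabulary before numbering it. `build_word2index` is a hypothetical helper, not part of the tutorial.

```python
def build_word2index(reviews):
    """Map each distinct token to a stable integer index (hypothetical helper)."""
    vocab = set()
    for review in reviews:
        # split(" ") (not split()) matches the tutorial, so the "" token
        # produced by runs of spaces is kept as its own vocabulary entry
        vocab.update(review.split(" "))
    # sorting makes the mapping independent of set iteration order
    return {word: i for i, word in enumerate(sorted(vocab))}

word2index_demo = build_word2index(["a good film", "a bad  film"])
print(word2index_demo)  # {'': 0, 'a': 1, 'bad': 2, 'film': 3, 'good': 4}
```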
``````

In [22]:

def update_input_layer(review):

    global layer_0

    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

``````
``````

In [23]:

layer_0

``````
``````

Out[23]:

array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

``````
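The 18 at position 0 is the count of the empty-string token. `split(" ")` produces an extra `""` for every run of two spaces, and in this run `""` happened to receive index 0. Below is a sketch of the same count encoding written as a pure function that does not touch the global `layer_0`. Unlike the cell above, it skips unknown words instead of raising `KeyError`. `review_to_counts` is a hypothetical name.

```python
import numpy as np

def review_to_counts(review, word2index):
    """Return a (1, vocab_size) array holding per-word counts for one review."""
    layer = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        # words missing from the vocabulary are ignored rather than raising KeyError
        if word in word2index:
            layer[0, word2index[word]] += 1
    return layer

toy_index = {"": 0, "great": 1, "film": 2}
toy_counts = review_to_counts("great  great film", toy_index)  # note the double space
print(toy_counts)  # [[1. 2. 1.]]
```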
``````

In [24]:

def get_target_for_label(label):
    if(label == 'POSITIVE'):
        return 1
    else:
        return 0

``````
``````

In [25]:

labels[0]

``````
``````

Out[25]:

'POSITIVE'

``````
``````

In [26]:

get_target_for_label(labels[0])

``````
``````

Out[26]:

1

``````
``````

In [27]:

labels[1]

``````
``````

Out[27]:

'NEGATIVE'

``````
``````

In [28]:

get_target_for_label(labels[1])

``````
``````

Out[28]:

0

``````

# Project 3: Building a Neural Network

• Start with your neural network from the last chapter
• Use a 3-layer neural network
• Use no non-linearity in the hidden layer
• Use our functions to create the training data
• Create a "pre_process_data" function that builds the vocabulary used by our training-data generating functions
• Modify "train" to train over the entire corpus
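The bullets above describe an input layer, a linear hidden layer and a sigmoid output. The sketch below performs one stochastic-gradient-descent step for that shape on a toy 4-word input. It follows the same error convention (output minus target) and update order as the class below. The function name `train_step`, the toy sizes and the standard deviation of 1.0 are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(layer_0, target, weights_0_1, weights_1_2, learning_rate):
    """One SGD step: input -> linear hidden -> sigmoid output (hypothetical helper).

    The weight arrays are updated in place; the pre-update output is returned.
    """
    # forward pass: no non-linearity in the hidden layer
    layer_1 = layer_0.dot(weights_0_1)
    layer_2 = sigmoid(layer_1.dot(weights_1_2))

    # backward pass
    layer_2_error = layer_2 - target
    layer_2_delta = layer_2_error * layer_2 * (1 - layer_2)  # sigmoid derivative
    layer_1_delta = layer_2_delta.dot(weights_1_2.T)         # linear hidden layer

    # gradient descent updates (in place on the caller's arrays)
    weights_1_2 -= layer_1.T.dot(layer_2_delta) * learning_rate
    weights_0_1 -= layer_0.T.dot(layer_1_delta) * learning_rate
    return float(layer_2[0, 0])

rng = np.random.default_rng(1)
x = np.array([[1.0, 0.0, 1.0, 0.0]])  # toy input: words 0 and 2 present
w01 = np.zeros((4, 3))                # zero-initialised, like weights_0_1 in the class
w12 = rng.normal(0.0, 1.0, (3, 1))    # random hidden-to-output weights
outputs = [train_step(x, 1.0, w01, w12, 0.1) for _ in range(50)]
print(outputs[0])  # 0.5, because w01 starts at zero
```

With a positive target, each step in this sketch pushes the output further above 0.5.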


``````

In [29]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):

        # set our random number generator
        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1,input_nodes))

    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if(word in self.word2index.keys()):
                self.layer_0[0][self.word2index[word]] += 1

    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)

    def train(self, training_reviews, training_labels):

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            #### Implement the forward pass here ####
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            #### Implement the backward pass here ####
            ### Backward pass ###

            # TODO: Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between the actual output and the desired target.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # TODO: Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # TODO: Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + "% #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
``````

In [87]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [61]:

# evaluate our model before training (just to show how horrible it is)
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):587.5% #Correct:500 #Tested:1000 Testing Accuracy:50.0%

``````
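The 50.0% above can be explained from `init_network` and `run`. `weights_0_1` starts as all zeros, so `layer_1` is zero for every review and the output is sigmoid(0) = 0.5. `run` returns "POSITIVE" only when the output is strictly greater than 0.5, so the untrained network labels every review "NEGATIVE". The 500 correct answers are therefore exactly the NEGATIVE reviews among the last 1000. The quick check below uses toy sizes (an assumption) and reproduces this.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

vocab_size, hidden_nodes = 5, 10                                     # toy sizes
w01 = np.zeros((vocab_size, hidden_nodes))                           # zero-initialised, as in init_network
w12 = np.random.default_rng().normal(0.0, 1.0, (hidden_nodes, 1))    # any values give the same result

predictions = []
for counts in ([[3, 0, 1, 0, 2]], [[0, 7, 0, 0, 0]]):
    x = np.array(counts, dtype=float)
    output = sigmoid(x.dot(w01).dot(w12))[0, 0]
    # same decision rule as run(): POSITIVE only if the output exceeds 0.5
    label = "POSITIVE" if output > 0.5 else "NEGATIVE"
    predictions.append((float(output), label))

print(predictions)  # [(0.5, 'NEGATIVE'), (0.5, 'NEGATIVE')]
```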
``````

In [62]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):89.58 #Correct:1250 #Trained:2501 Training Accuracy:49.9%
Progress:20.8% Speed(reviews/sec):95.03 #Correct:2500 #Trained:5001 Training Accuracy:49.9%
Progress:27.4% Speed(reviews/sec):95.46 #Correct:3295 #Trained:6592 Training Accuracy:49.9%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````
``````

In [63]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)

``````
``````

In [64]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):96.39 #Correct:1247 #Trained:2501 Training Accuracy:49.8%
Progress:20.8% Speed(reviews/sec):99.31 #Correct:2497 #Trained:5001 Training Accuracy:49.9%
Progress:22.8% Speed(reviews/sec):99.02 #Correct:2735 #Trained:5476 Training Accuracy:49.9%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````
``````

In [65]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)

``````
``````

In [66]:

# train the network
mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):98.77 #Correct:1267 #Trained:2501 Training Accuracy:50.6%
Progress:20.8% Speed(reviews/sec):98.79 #Correct:2640 #Trained:5001 Training Accuracy:52.7%
Progress:31.2% Speed(reviews/sec):98.58 #Correct:4109 #Trained:7501 Training Accuracy:54.7%
Progress:41.6% Speed(reviews/sec):93.78 #Correct:5638 #Trained:10001 Training Accuracy:56.3%
Progress:52.0% Speed(reviews/sec):91.76 #Correct:7246 #Trained:12501 Training Accuracy:57.9%
Progress:62.5% Speed(reviews/sec):92.42 #Correct:8841 #Trained:15001 Training Accuracy:58.9%
Progress:69.4% Speed(reviews/sec):92.58 #Correct:9934 #Trained:16668 Training Accuracy:59.5%

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
1 # train the network
----> 2 mlp.train(reviews[:-1000],labels[:-1000])

<ipython-input-59-6334c4ec4642> in train(self, training_reviews, training_labels)
117             # TODO: Update the weights
118             self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
--> 119             self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step
120
121             if(np.abs(layer_2_error) < 0.5):

KeyboardInterrupt:

``````
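All three runs feed count-encoded reviews into the network. In the update `self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate`, each word's row of `weights_0_1` moves in proportion to that word's count. A token such as `.`, which appears 27 times in the first review, therefore gets a step 27 times larger than a word that appears once. This is one plausible reason only the smallest learning rate made steady progress above. The numeric check below uses made-up values for the counts and the hidden-layer delta.

```python
import numpy as np

# hypothetical counts: "." 27 times, "" 18 times, one rare word once
toy_layer_0 = np.array([[27.0, 18.0, 1.0]])
toy_layer_1_delta = np.array([[0.2, -0.1, 0.05]])  # arbitrary hidden-layer delta
lr = 0.1

# same form as the tutorial's input-to-hidden update: one row per input word
step = toy_layer_0.T.dot(toy_layer_1_delta) * lr
ratio = step[0] / step[2]  # "." row versus rare-word row
print(np.round(ratio, 6))  # [27. 27. 27.]
```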

# Understanding Neural Noise

``````

In [67]:

from IPython.display import Image
Image(filename='sentiment_network.png')

``````
``````

In [70]:

def update_input_layer(review):

    global layer_0

    # clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

update_input_layer(reviews[0])

``````
``````

In [71]:

layer_0

``````
``````

Out[71]:

array([[ 18.,   0.,   0., ...,   0.,   0.,   0.]])

``````
``````

In [79]:

from collections import Counter

review_counter = Counter()

``````
``````

In [80]:

for word in reviews[0].split(" "):
    review_counter[word] += 1

``````
``````

In [81]:

review_counter.most_common()

``````
``````

Out[81]:

[('.', 27),
('', 18),
('the', 9),
('to', 6),
('i', 5),
('high', 5),
('is', 4),
('of', 4),
('a', 4),
('bromwell', 4),
('teachers', 4),
('that', 4),
('their', 2),
('my', 2),
('at', 2),
('as', 2),
('me', 2),
('in', 2),
('students', 2),
('it', 2),
('student', 2),
('school', 2),
('through', 1),
('insightful', 1),
('ran', 1),
('years', 1),
('here', 1),
('episode', 1),
('reality', 1),
('what', 1),
('far', 1),
('t', 1),
('saw', 1),
('s', 1),
('repeatedly', 1),
('isn', 1),
('closer', 1),
('and', 1),
('fetched', 1),
('remind', 1),
('can', 1),
('welcome', 1),
('line', 1),
('your', 1),
('survive', 1),
('teaching', 1),
('satire', 1),
('classic', 1),
('who', 1),
('age', 1),
('knew', 1),
('schools', 1),
('inspector', 1),
('comedy', 1),
('down', 1),
('pity', 1),
('m', 1),
('all', 1),
('see', 1),
('think', 1),
('situation', 1),
('time', 1),
('pomp', 1),
('other', 1),
('much', 1),
('many', 1),
('which', 1),
('one', 1),
('profession', 1),
('programs', 1),
('same', 1),
('some', 1),
('such', 1),
('pettiness', 1),
('immediately', 1),
('expect', 1),
('financially', 1),
('recalled', 1),
('tried', 1),
('whole', 1),
('right', 1),
('life', 1),
('cartoon', 1),
('scramble', 1),
('sack', 1),
('believe', 1),
('when', 1),
('than', 1),
('burn', 1),
('pathetic', 1)]

``````
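The counts above are dominated by `.` (27) and `''` (18), tokens that say little about sentiment, while words such as "pathetic" and "insightful" each count only 1. The network below changes `+= 1` to `= 1` in `update_input_layer`, so the input records only whether a word is present. The sketch compares the two encodings on a short made-up review. `encode` is a hypothetical helper.

```python
from collections import Counter
import numpy as np

def encode(review, word2index, binary):
    """Count encoding (binary=False) or presence encoding (binary=True)."""
    layer = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        if word in word2index:
            if binary:
                layer[0, word2index[word]] = 1   # presence only, as in Project 4
            else:
                layer[0, word2index[word]] += 1  # raw counts, as in Project 3
    return layer

toy_review = "the film . the acting . pathetic . ."
toy_index = {w: i for i, w in enumerate(sorted(set(toy_review.split(" "))))}
print(Counter(toy_review.split(" ")).most_common(2))  # [('.', 4), ('the', 2)]

count_vec = encode(toy_review, toy_index, binary=False)
presence_vec = encode(toy_review, toy_index, binary=True)
print(count_vec.max(), presence_vec.max())  # 4.0 1.0
```

With presence encoding, every word in the review contributes the same input value, so `.` no longer outweighs "pathetic".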

# Project 4: Reducing Noise in our Input Data

``````

In [82]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1):

        # set our random number generator
        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word that appears in the reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # give each word and each label a fixed index
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1, input_nodes))

    def update_input_layer(self, review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        for word in review.split(" "):
            if(word in self.word2index.keys()):
                # mark the word as present (1) instead of counting its occurrences
                self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self, label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)

    def train(self, training_reviews, training_labels):

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # output layer error: actual output minus desired target
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4]
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5]
                             + " #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))

        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
``````

In [83]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [84]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:0.0% Speed(reviews/sec):0.0 #Correct:0 #Trained:1 Training Accuracy:0.0%
Progress:10.4% Speed(reviews/sec):91.50 #Correct:1795 #Trained:2501 Training Accuracy:71.7%
Progress:20.8% Speed(reviews/sec):95.25 #Correct:3811 #Trained:5001 Training Accuracy:76.2%
Progress:31.2% Speed(reviews/sec):93.74 #Correct:5898 #Trained:7501 Training Accuracy:78.6%
Progress:41.6% Speed(reviews/sec):93.69 #Correct:8042 #Trained:10001 Training Accuracy:80.4%
Progress:52.0% Speed(reviews/sec):95.27 #Correct:10186 #Trained:12501 Training Accuracy:81.4%
Progress:62.5% Speed(reviews/sec):98.19 #Correct:12317 #Trained:15001 Training Accuracy:82.1%
Progress:72.9% Speed(reviews/sec):98.56 #Correct:14440 #Trained:17501 Training Accuracy:82.5%
Progress:83.3% Speed(reviews/sec):99.74 #Correct:16613 #Trained:20001 Training Accuracy:83.0%
Progress:93.7% Speed(reviews/sec):100.7 #Correct:18794 #Trained:22501 Training Accuracy:83.5%
Progress:99.9% Speed(reviews/sec):101.9 #Correct:20115 #Trained:24000 Training Accuracy:83.8%

``````
``````

In [85]:

# evaluate the trained model on the 1,000 held-out test reviews
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):832.7% #Correct:851 #Tested:1000 Testing Accuracy:85.1%

``````
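The only change to the input in this version is that `update_input_layer` now sets a word's node to 1 instead of incrementing it, so `layer_0` records whether a word appears rather than how often. A minimal sketch of the difference between the two encodings; the toy vocabulary, the review string and the helper names `encode_counts` / `encode_binary` are hypothetical and exist only for illustration:

```python
import numpy as np

# Hypothetical toy vocabulary; in the notebook this comes from pre_process_data
vocab = ["the", "movie", "was", "excellent", "terrible", "."]
word2index = {word: i for i, word in enumerate(vocab)}

def encode_counts(review):
    """Count-based input: each node holds how many times its word occurs."""
    layer_0 = np.zeros((1, len(vocab)))
    for word in review.split(" "):
        if word in word2index:
            layer_0[0][word2index[word]] += 1
    return layer_0

def encode_binary(review):
    """Presence-based input, as in update_input_layer above: 1 if the word occurs at all."""
    layer_0 = np.zeros((1, len(vocab)))
    for word in review.split(" "):
        if word in word2index:
            layer_0[0][word2index[word]] = 1
    return layer_0

review = "the movie was excellent . the the the ."
counts = encode_counts(review)
binary = encode_binary(review)
print(counts)  # "the" has a value of 4, more than any other word
print(binary)  # every word that is present contributes exactly 1
```

With counts, frequent low-signal tokens such as "the" and "." carry the largest inputs into the hidden layer's weighted sum; with the binary encoding they weigh no more than "excellent".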

# Analyzing Inefficiencies in our Network

``````

In [88]:

Image(filename='sentiment_network_sparse.png')

``````
``````

In [89]:

layer_0 = np.zeros(10)

``````
``````

In [90]:

layer_0

``````
``````

Out[90]:

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

``````
``````

In [91]:

layer_0[4] = 1
layer_0[9] = 1

``````
``````

In [92]:

layer_0

``````
``````

Out[92]:

array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.])

``````
``````

In [93]:

weights_0_1 = np.random.randn(10,5)

``````
``````

In [94]:

layer_0.dot(weights_0_1)

``````
``````

Out[94]:

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

``````
``````

In [101]:

indices = [4,9]

``````
``````

In [102]:

layer_1 = np.zeros(5)

``````
``````

In [103]:

for index in indices:
    layer_1 += weights_0_1[index]

``````
``````

In [104]:

layer_1

``````
``````

Out[104]:

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

``````
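The cells above show that multiplying a binary `layer_0` by `weights_0_1` just sums the rows of `weights_0_1` picked out by the active indices. A small sketch that checks the same identity on a larger, randomly generated sparse input; the sizes (1,000 words, 20 active) and the helper names `dense_hidden` / `sparse_hidden` are illustrative assumptions, not values from the dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden_nodes = 1000, 10
weights_0_1 = rng.standard_normal((vocab_size, hidden_nodes))

# A review activates only a handful of the vocab_size input nodes
active = rng.choice(vocab_size, size=20, replace=False)

def dense_hidden(indices):
    """Build the full binary input vector and do a full matrix product."""
    layer_0 = np.zeros((1, vocab_size))
    layer_0[0, indices] = 1
    return layer_0.dot(weights_0_1)  # touches all vocab_size weight rows

def sparse_hidden(indices):
    """Sum only the weight rows of the active words, skipping every zero input."""
    layer_1 = np.zeros((1, hidden_nodes))
    for index in indices:
        layer_1 += weights_0_1[index]  # touches len(indices) weight rows
    return layer_1

print(np.allclose(dense_hidden(active), sparse_hidden(active)))  # True
```

Here the sparse version reads 20 of the 1,000 weight rows (2%) and gives the same hidden layer; `weights_0_1[active].sum(axis=0)` is a vectorized way to write the same sum.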
``````

In [100]:

Image(filename='sentiment_network_sparse_2.png')

``````

# Project 5: Making our Network More Efficient

``````

In [30]:

import time
import sys
import numpy as np

# Let's tweak our network from before to model these phenomena
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1):

        np.random.seed(1)

        self.pre_process_data(reviews, labels)

        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):

        # collect every distinct word that appears in the reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        self.review_vocab = list(review_vocab)

        # collect every distinct label
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        self.label_vocab = list(label_vocab)

        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # give each word and each label a fixed index
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i

        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))

        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                            (self.hidden_nodes, self.output_nodes))

        self.learning_rate = learning_rate

        self.layer_0 = np.zeros((1, input_nodes))
        self.layer_1 = np.zeros((1, hidden_nodes))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)

    def update_input_layer(self, review):

        # clear out previous state, reset the layer to be all 0s
        # (kept for reference: train() and run() below no longer build layer_0)
        self.layer_0 *= 0
        for word in review.split(" "):
            if(word in self.word2index.keys()):
                self.layer_0[0][self.word2index[word]] = 1

    def get_target_for_label(self, label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0

    def train(self, training_reviews_raw, training_labels):

        # pre-process each review into the list of its unique word indices
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if(word in self.word2index.keys()):
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))

        assert(len(training_reviews) == len(training_labels))

        correct_so_far = 0

        start = time.time()

        for i in range(len(training_reviews)):

            review = training_reviews[i]
            label = training_labels[i]

            ### Forward pass ###

            # Input Layer: not built explicitly; review already holds the active indices

            # Hidden layer: sum the weight rows of the active words, replacing
            # layer_1 = self.layer_0.dot(self.weights_0_1)
            self.layer_1 *= 0
            for index in review:
                self.layer_1 += self.weights_0_1[index]

            # Output layer
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # output layer error: actual output minus desired target
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step

            # only the rows of the active words receive a nonzero gradient
            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate # update input-to-hidden weights with gradient descent step

            if(np.abs(layer_2_error) < 0.5):
                correct_so_far += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")

    def test(self, testing_reviews, testing_labels):

        correct = 0

        start = time.time()

        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            reviews_per_second = i / float(time.time() - start)

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4]
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5]
                             + " #Correct:" + str(correct) + " #Tested:" + str(i+1) + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):

        # Input Layer: collect the unique indices of the known words
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index.keys():
                unique_indices.add(self.word2index[word])

        # Hidden layer
        self.layer_1 *= 0
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]

        # Output layer
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

        if(layer_2[0] > 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"

``````
``````

In [31]:

mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

``````
``````

In [32]:

mlp.train(reviews[:-1000],labels[:-1000])

``````
``````

Progress:99.9% Speed(reviews/sec):964.0 #Correct:20076 #Trained:24000 Training Accuracy:83.6%

``````
``````

In [33]:

# evaluate the trained model on the 1,000 held-out test reviews
mlp.test(reviews[-1000:],labels[-1000:])

``````
``````

Progress:99.9% Speed(reviews/sec):1201.% #Correct:851 #Tested:1000 Testing Accuracy:85.1%

``````
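Project 5 also replaces `self.layer_0.T.dot(layer_1_delta)` in the input-to-hidden weight update with a loop over the review's indices. Because `layer_0` is binary, that outer product is zero everywhere except in the rows of the active words, and each of those rows equals `layer_1_delta`. A minimal check of the equivalence, with toy sizes, a random stand-in gradient and hypothetical active indices chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

vocab_size, hidden_nodes, learning_rate = 8, 3, 0.1
weights = rng.standard_normal((vocab_size, hidden_nodes))
layer_1_delta = rng.standard_normal((1, hidden_nodes))  # stand-in hidden-layer gradient
review_indices = [2, 5, 7]                              # hypothetical active words

# Dense update, as in Project 4: outer product of the binary input with the delta
layer_0 = np.zeros((1, vocab_size))
layer_0[0, review_indices] = 1
dense = weights - layer_0.T.dot(layer_1_delta) * learning_rate

# Sparse update, as in Project 5: touch only the active rows
sparse = weights.copy()
for index in review_indices:
    sparse[index] -= layer_1_delta[0] * learning_rate

print(np.allclose(dense, sparse))  # True
```

This is also why `train` collects each review's indices in a `set`: a repeated index would have its row updated more than once, which would no longer match the binary input layer.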