Algorithms Exercise 1

Imports


In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

Word counting

Write a function tokenize that takes a string of English text returns a list of words. It should also remove stop words, which are common short words that are often removed before natural language processing. Your function should have the following logic:

  • Split the string into lines using splitlines.
  • Split each line into a list of words and merge the lists for each line.
  • Use Python's builtin filter function to remove all punctuation.
  • If stop_words is a list, remove all occurences of the words in the list.
  • If stop_words is a space delimeted string of words, split them and remove them.
  • Remove any remaining empty words.
  • Make all words lowercase.

In [2]:
def tokenize(s, stop_words=None, punctuation='`~!@#$%^&*()_-+={[}]|\:;"<,>.?/}\t'):
    """Split a string into a list of words, removing punctuation and stop words."""
    low = []
    t = s.splitlines()
    for i in range(len(t)):
        u = t[i].split()
        for j in range(len(u)):
            low.append(u[j])                        # Turns multi-lined string into neat list
    no_more_punc = []
    for k in range(len(low)):
        no_more_punc.append(''.join(list(filter(lambda x: not x in punctuation, low[k]))))      # Removes punctuation
    if type(stop_words)==list:
        no_more_stops = list(filter(lambda x: not x in stop_words, no_more_punc))               # Removes stop words
    elif type(stop_words)==str:
        no_more_stops = list(filter(lambda x: not x in stop_words.split(), no_more_punc))       # for different input types
    elif stop_words==None:
        no_more_stops = no_more_punc
    no_gaps = list(filter(lambda x: not x=='', no_more_stops))                                  # Removes empty strings
    all_low = []
    for l in range(len(no_gaps)):
        all_low.append(no_gaps[l].lower())                                                      # Removes any capitalization
    return all_low

In [3]:
tokenize("This, is the way; that things will end", stop_words=['the', 'is'])


Out[3]:
['this', 'way', 'that', 'things', 'will', 'end']

In [4]:
assert tokenize("This, is the way; that things will end", stop_words=['the', 'is']) == \
    ['this', 'way', 'that', 'things', 'will', 'end']
wasteland = """
APRIL is the cruellest month, breeding
Lilacs out of the dead land, mixing
Memory and desire, stirring
Dull roots with spring rain.
"""

assert tokenize(wasteland, stop_words='is the of and') == \
    ['april','cruellest','month','breeding','lilacs','out','dead','land',
     'mixing','memory','desire','stirring','dull','roots','with','spring',
     'rain']

Write a function count_words that takes a list of words and returns a dictionary where the keys in the dictionary are the unique words in the list and the values are the word counts.


In [5]:
p = [1,2,3]
q = ['one','two','three']
pq = []
for i in range(len(p)):
    pq.append((p[i],q[i]))
d = dict(pq)
d


Out[5]:
{1: 'one', 2: 'two', 3: 'three'}

In [6]:
def count_words(data):
    """Return a word count dictionary from the list of words in data."""
    word = []
    count = []
    for i in range(len(data)):
        if not data[i] in word:
            word.append(data[i])
            count.append(1)
        elif data[i] in word:
            for j in range(len(word)):
                if word[j]==data[i]:
                    count[j]+=1
    wc = []
    for j in range(len(word)):
        wc.append((word[j],count[j]))
    wcd = dict(wc)
    return wcd

In [7]:
count_words(tokenize('this and the this from and a a a'))


Out[7]:
{'a': 3, 'and': 2, 'from': 1, 'the': 1, 'this': 2}

In [8]:
assert count_words(tokenize('this and the this from and a a a')) == \
    {'a': 3, 'and': 2, 'from': 1, 'the': 1, 'this': 2}

Write a function sort_word_counts that return a list of sorted word counts:

  • Each element of the list should be a (word, count) tuple.
  • The list should be sorted by the word counts, with the higest counts coming first.
  • To perform this sort, look at using the sorted function with a custom key and reverse argument.

In [9]:
wiggity = [(1,'one'),(3,'three'),(2,'two')]
sorted(wiggity, key=lambda x: x[0], reverse=True)


Out[9]:
[(3, 'three'), (2, 'two'), (1, 'one')]

In [10]:
boogity = {1:2,2:3,3:4}
list(iter(boogity)),boogity[1]


Out[10]:
([1, 2, 3], 2)

In [11]:
def sort_word_counts(wc):
    """Return a list of 2-tuples of (word, count), sorted by count descending."""
    word = list(iter(wc))
    count = []
    for i in word:
        count.append(wc[i])
    tups = []
    for j in range(len(word)):
        tups.append((word[j],count[j]))
    return sorted(tups, key=lambda x: x[1], reverse=True)

In [12]:
sort_word_counts(count_words(tokenize('this and a the this this and a a a')))


Out[12]:
[('a', 4), ('this', 3), ('and', 2), ('the', 1)]

In [13]:
assert sort_word_counts(count_words(tokenize('this and a the this this and a a a'))) == \
    [('a', 4), ('this', 3), ('and', 2), ('the', 1)]

Perform a word count analysis on Chapter 1 of Moby Dick, whose text can be found in the file mobydick_chapter1.txt:

  • Read the file into a string.
  • Tokenize with stop words of 'the of and a to in is it that as'.
  • Perform a word count, the sort and save the result in a variable named swc.

In [14]:
with open('mobydick_chapter1.txt', 'r') as f:
    read_text = f.read()
f.closed


Out[14]:
True

In [15]:
swc = sort_word_counts(count_words(tokenize(read_text, 'the of and a to in is it that as')))

In [16]:
assert swc[0]==('i',43)
assert len(swc)==848

Create a "Cleveland Style" dotplot of the counts of the top 50 words using Matplotlib. If you don't know what a dotplot is, you will have to do some research...


In [17]:
swc


Out[17]:
[('i', 43),
 ('me', 24),
 ('you', 23),
 ('all', 23),
 ('this', 17),
 ('for', 16),
 ('but', 15),
 ('there', 15),
 ('my', 14),
 ('with', 13),
 ('on', 12),
 ('they', 12),
 ('go', 12),
 ('not', 11),
 ('from', 11),
 ('some', 11),
 ('or', 10),
 ('sea', 10),
 ('one', 10),
 ('his', 10),
 ('into', 9),
 ('be', 9),
 ('upon', 9),
 ('if', 9),
 ('he', 9),
 ('by', 8),
 ('have', 8),
 ('was', 8),
 ('part', 7),
 ('why', 7),
 ('and', 7),
 ('do', 7),
 ('were', 7),
 ('what', 7),
 ('it', 6),
 ('about', 6),
 ('like', 6),
 ('more', 6),
 ('can', 6),
 ('voyage', 6),
 ('take', 6),
 ('get', 6),
 ('water', 6),
 ('land', 6),
 ('time', 6),
 ('old', 6),
 ('down', 6),
 ('will', 6),
 ('your', 6),
 ('at', 5),
 ('are', 5),
 ('did', 5),
 ('the', 5),
 ('here', 5),
 ('whenever', 5),
 ('no', 5),
 ('when', 5),
 ('see', 5),
 ('sailor', 5),
 ('most', 5),
 ('same', 5),
 ('ever', 5),
 ('whaling', 5),
 ('who', 5),
 ('other', 5),
 ('such', 5),
 ('now', 5),
 ('must', 4),
 ('being', 4),
 ('an', 4),
 ('than', 4),
 ('them', 4),
 ('men', 4),
 ('because', 4),
 ('up', 4),
 ('those', 4),
 ('little', 4),
 ('then', 4),
 ('so', 4),
 ('way', 4),
 ('passenger', 4),
 ('money', 4),
 ('every', 4),
 ('am', 4),
 ('which', 4),
 ('tell', 4),
 ('though', 4),
 ('first', 4),
 ('never', 4),
 ('these', 4),
 ('their', 4),
 ('much', 4),
 ('things', 4),
 ('two', 4),
 ('high', 4),
 ('passengers', 3),
 ('winds', 3),
 ('head', 3),
 ('come', 3),
 ('man', 3),
 ('image', 3),
 ('nothing', 3),
 ('out', 3),
 ('would', 3),
 ('whale', 3),
 ('how', 3),
 ('something', 3),
 ('myself', 3),
 ('purse', 3),
 ('may', 3),
 ('we', 3),
 ('almost', 3),
 ('say', 3),
 ('him', 3),
 ('world', 3),
 ('broiled', 3),
 ('parts', 3),
 ('soul', 3),
 ('paying', 3),
 ('ships', 3),
 ('grand', 3),
 ('great', 3),
 ('yet', 3),
 ('going', 3),
 ('before', 3),
 ('forecastle', 3),
 ('sort', 3),
 ('should', 3),
 ('right', 3),
 ('without', 3),
 ('hand', 3),
 ('set', 3),
 ('still', 3),
 ('stand', 3),
 ('always', 2),
 ('metaphysical', 2),
 ('look', 2),
 ('round', 2),
 ('cook', 2),
 ('leaves', 2),
 ('particular', 2),
 ('scores', 2),
 ('requires', 2),
 ('aloft', 2),
 ('could', 2),
 ('plunged', 2),
 ('make', 2),
 ('among', 2),
 ('phantom', 2),
 ('pay', 2),
 ('themselves', 2),
 ('crowds', 2),
 ('country', 2),
 ('leaders', 2),
 ('find', 2),
 ('chief', 2),
 ('under', 2),
 ('city', 2),
 ('fixed', 2),
 ('its', 2),
 ('think', 2),
 ('point', 2),
 ('content', 2),
 ('once', 2),
 ('account', 2),
 ('let', 2),
 ('where', 2),
 ('seas', 2),
 ('healthy', 2),
 ('warehouses', 2),
 ('long', 2),
 ('well', 2),
 ('glory', 2),
 ('ocean', 2),
 ('off', 2),
 ("shepherd's", 2),
 ('in', 2),
 ('thinks', 2),
 ('better', 2),
 ('thousands', 2),
 ('wild', 2),
 ('ship', 2),
 ('sail', 2),
 ('himself', 2),
 ('unless', 2),
 ('strong', 2),
 ('hunks', 2),
 ('previous', 2),
 ('officer', 2),
 ('just', 2),
 ('thousand', 2),
 ('fates', 2),
 ('schoolmaster', 2),
 ('care', 2),
 ('commodore', 2),
 ('does', 2),
 ('stream', 2),
 ('spar', 2),
 ('over', 2),
 ('meadow', 2),
 ('motives', 2),
 ('between', 2),
 ('exactly', 2),
 ('thump', 2),
 ('sight', 2),
 ('distant', 2),
 ('cannot', 2),
 ('lead', 2),
 ('ourselves', 2),
 ('respectfully', 2),
 ('besides', 2),
 ('air', 2),
 ('mean', 2),
 ('own', 2),
 ('each', 2),
 ('begin', 2),
 ('any', 2),
 ('sleep', 2),
 ('ishmael', 2),
 ('miles', 2),
 ('yonder', 2),
 ('streets', 2),
 ('been', 2),
 ('meaning', 2),
 ('magic', 2),
 ('perhaps', 2),
 ('order', 2),
 ('robust', 2),
 ('else', 2),
 ('quietly', 1),
 ('judgment', 1),
 ('enjoy', 1),
 ('orchard', 1),
 ('lungs', 1),
 ('performing', 1),
 ('randolphs', 1),
 ('speak', 1),
 ('lakes', 1),
 ('ills', 1),
 ('sleepy', 1),
 ('royal', 1),
 ('leaning', 1),
 ('rockaway', 1),
 ('wade', 1),
 ('orders', 1),
 ('swung', 1),
 ('whether', 1),
 ('cooled', 1),
 ('tennessee', 1),
 ('terms', 1),
 ('lee', 1),
 ('thought', 1),
 ('employs', 1),
 ('trouble', 1),
 ('breezes', 1),
 ('oceans', 1),
 ('pistol', 1),
 ('captain', 1),
 ('huge', 1),
 ('masthead', 1),
 ('trials', 1),
 ('taking', 1),
 ('growing', 1),
 ('astern', 1),
 ('hypos', 1),
 ('give', 1),
 ('east', 1),
 ('welcome', 1),
 ('wharves', 1),
 ('ball', 1),
 ('street', 1),
 ('answer', 1),
 ('circumstances', 1),
 ('mummies', 1),
 ('cunningly', 1),
 ('benches', 1),
 ('lording', 1),
 ('programme', 1),
 ('really', 1),
 ('seacaptain', 1),
 ('mystical', 1),
 ('shabby', 1),
 ('barques', 1),
 ('brother', 1),
 ('lath', 1),
 ('sounds', 1),
 ('pure', 1),
 ('recall', 1),
 ('unpleasant', 1),
 ('loitering', 1),
 ('various', 1),
 ('constant', 1),
 ('quick', 1),
 ('lodges', 1),
 ('archangel', 1),
 ('few', 1),
 ('cheerfully', 1),
 ('funeral', 1),
 ('honourable', 1),
 ('unite', 1),
 ('undeliverable', 1),
 ('commonalty', 1),
 ('flourish', 1),
 ('shakes', 1),
 ('poet', 1),
 ('surrounds', 1),
 ('river', 1),
 ('smoke', 1),
 ('us', 1),
 ('ignoring', 1),
 ('dreamiest', 1),
 ('battery', 1),
 ('deepest', 1),
 ('washed', 1),
 ('knocking', 1),
 ('fowl', 1),
 ('bakehouses', 1),
 ('wayhe', 1),
 ('heard', 1),
 ('consign', 1),
 ('root', 1),
 ('particularly', 1),
 ('considerable', 1),
 ('rigging', 1),
 ('deeper', 1),
 ('standmiles', 1),
 ('making', 1),
 ('soon', 1),
 ('portentous', 1),
 ('promptly', 1),
 ('eyes', 1),
 ('jump', 1),
 ('hours', 1),
 ('magnificent', 1),
 ('roused', 1),
 ('coasts', 1),
 ('overwhelming', 1),
 ('pyramids', 1),
 ('damp', 1),
 ('maxim', 1),
 ('tigerlilieswhat', 1),
 ('trees', 1),
 ('noble', 1),
 ('thingno', 1),
 ('itwould', 1),
 ('barbarous', 1),
 ('thus', 1),
 ('came', 1),
 ('testament', 1),
 ('election', 1),
 ('general', 1),
 ('honour', 1),
 ('suffice', 1),
 ('managers', 1),
 ('inducements', 1),
 ('slip', 1),
 ('american', 1),
 ('methodically', 1),
 ('judgmatically', 1),
 ('fountain', 1),
 ('handfuls', 1),
 ('sights', 1),
 ('yes', 1),
 ('finally', 1),
 ('upper', 1),
 ('principle', 1),
 ('habit', 1),
 ('dale', 1),
 ('awe', 1),
 ('offices', 1),
 ('patagonian', 1),
 ('brief', 1),
 ('separate', 1),
 ('ah', 1),
 ('nor', 1),
 ("one's", 1),
 ('silver', 1),
 ('presented', 1),
 ('cataract', 1),
 ('indignity', 1),
 ('looking', 1),
 ('extensive', 1),
 ('thing', 1),
 ('nightsdo', 1),
 ('breathes', 1),
 ('grasp', 1),
 ('marvellous', 1),
 ('pierheads', 1),
 ('circumambulate', 1),
 ('suddenly', 1),
 ('shoulderblades', 1),
 ('seasickgrow', 1),
 ('providence', 1),
 ('shipboardyet', 1),
 ('scales', 1),
 ('absentminded', 1),
 ('themleagues', 1),
 ('mysterious', 1),
 ('amount', 1),
 ('happen', 1),
 ('uncomfortable', 1),
 ('far', 1),
 ('mortal', 1),
 ('good', 1),
 ('shady', 1),
 ('gabriel', 1),
 ('tallest', 1),
 ('agoing', 1),
 ('narcissus', 1),
 ('seemingly', 1),
 ('drop', 1),
 ('key', 1),
 ('grim', 1),
 ('mid', 1),
 ('wholesome', 1),
 ('assure', 1),
 ('pacing', 1),
 ('compasses', 1),
 ('wayeither', 1),
 ('picture', 1),
 ('many', 1),
 ('fancied', 1),
 ('deep', 1),
 ('smelt', 1),
 ('judiciously', 1),
 ('atmosphere', 1),
 ('saw', 1),
 ('boys', 1),
 ('grasshopper', 1),
 ('inmates', 1),
 ('quietest', 1),
 ('prevent', 1),
 ('moral', 1),
 ('legs', 1),
 ('pythagorean', 1),
 ('please', 1),
 ('virtue', 1),
 ('wedded', 1),
 ('experiment', 1),
 ('inlanders', 1),
 ('preciselyhaving', 1),
 ('caravan', 1),
 ('cattle', 1),
 ('june', 1),
 ('sailors', 1),
 ('persians', 1),
 ('philosophical', 1),
 ('delusion', 1),
 ('told', 1),
 ('broiling', 1),
 ('pent', 1),
 ('helped', 1),
 ('farcesthough', 1),
 ('meditation', 1),
 ('indian', 1),
 ('years', 1),
 ('easy', 1),
 ('niagara', 1),
 ('contrary', 1),
 ('hill', 1),
 ('thither', 1),
 ('downtown', 1),
 ('whatsoever', 1),
 ('battle', 1),
 ('bloody', 1),
 ('west', 1),
 ('second', 1),
 ('slave', 1),
 ('woodlands', 1),
 ('trunk', 1),
 ('pinetree', 1),
 ('stoics', 1),
 ('supplied', 1),
 ('believe', 1),
 ('interest', 1),
 ('sleeps', 1),
 ('discriminating', 1),
 ('sabbath', 1),
 ('urbane', 1),
 ('enchanting', 1),
 ('fields', 1),
 ('feelings', 1),
 ('rag', 1),
 ('idea', 1),
 ('desires', 1),
 ('limit', 1),
 ('sense', 1),
 ('visit', 1),
 ('perdition', 1),
 ('sand', 1),
 ('hardicanutes', 1),
 ('tarpot', 1),
 ('drizzly', 1),
 ('reason', 1),
 ('hooded', 1),
 ('very', 1),
 ('touches', 1),
 ('charm', 1),
 ('monied', 1),
 ('salted', 1),
 ('purpose', 1),
 ('life', 1),
 ('waterward', 1),
 ('cajoling', 1),
 ('knowing', 1),
 ('infliction', 1),
 ('itch', 1),
 ('bear', 1),
 ('mild', 1),
 ('whereas', 1),
 ('insular', 1),
 ('left', 1),
 ('days', 1),
 ('paidwhat', 1),
 ('crucifix', 1),
 ('pool', 1),
 ('coffin', 1),
 ('again', 1),
 ('reaching', 1),
 ('van', 1),
 ('reverentially', 1),
 ('grow', 1),
 ('contested', 1),
 ('hazy', 1),
 ('somehow', 1),
 ('mind', 1),
 ('seneca', 1),
 ('carries', 1),
 ('everybody', 1),
 ('stage', 1),
 ('unaccountable', 1),
 ('exercise', 1),
 ('driving', 1),
 ('infallibly', 1),
 ('seaward', 1),
 ('choice', 1),
 ('belted', 1),
 ('ago', 1),
 ('bit', 1),
 ('paid', 1),
 ('island', 1),
 ('bathed', 1),
 ('penny', 1),
 ('perils', 1),
 ('coral', 1),
 ('corlears', 1),
 ("people's", 1),
 ('bound', 1),
 ('around', 1),
 ('hats', 1),
 ('thieves', 1),
 ('sway', 1),
 ('nameless', 1),
 ('gone', 1),
 ('invest', 1),
 ('landscape', 1),
 ('pausing', 1),
 ('induced', 1),
 ('creatures', 1),
 ('single', 1),
 ('salt', 1),
 ('abouthowever', 1),
 ('bulwarks', 1),
 ('extremest', 1),
 ('regulating', 1),
 ('sweep', 1),
 ('poor', 1),
 ('story', 1),
 ('violate', 1),
 ('spleen', 1),
 ('valley', 1),
 ('tranced', 1),
 ('thence', 1),
 ('secretly', 1),
 ('place', 1),
 ('egyptians', 1),
 ('served', 1),
 ('processions', 1),
 ('rivers', 1),
 ('substitute', 1),
 ('dreamy', 1),
 ('rather', 1),
 ('south', 1),
 ('mole', 1),
 ('counters', 1),
 ('influences', 1),
 ('bringing', 1),
 ('doubtless', 1),
 ('ungraspable', 1),
 ('shadiest', 1),
 ('established', 1),
 ('floodgates', 1),
 ('sighs', 1),
 ('seacaptains', 1),
 ('rensselaers', 1),
 ('repeatedly', 1),
 ('friendly', 1),
 ('bulk', 1),
 ("quarrelsomedon't", 1),
 ('town', 1),
 ('abominate', 1),
 ('knows', 1),
 ('against', 1),
 ('activity', 1),
 ('china', 1),
 ('put', 1),
 ("ain't", 1),
 ('saco', 1),
 ('buttered', 1),
 ('mesince', 1),
 ('wish', 1),
 ('cookthough', 1),
 ('lanes', 1),
 ('distinction', 1),
 ('strange', 1),
 ('striving', 1),
 ('travel', 1),
 ('weighed', 1),
 ('formed', 1),
 ('jove', 1),
 ('floated', 1),
 ('seeposted', 1),
 ('artist', 1),
 ('run', 1),
 ('needles', 1),
 ('bill', 1),
 ('her', 1),
 ('tormenting', 1),
 ('wears', 1),
 ('hermit', 1),
 ('wantingwaterthere', 1),
 ('putting', 1),
 ('extreme', 1),
 ('invisible', 1),
 ('rear', 1),
 ('element', 1),
 ('cato', 1),
 ('surveillance', 1),
 ('desert', 1),
 ('athirst', 1),
 ('eye', 1),
 ('kneedeep', 1),
 ('possibly', 1),
 ('buy', 1),
 ('dotings', 1),
 ('drowned', 1),
 ('coenties', 1),
 ('inmost', 1),
 ('meet', 1),
 ('afternoon', 1),
 ('mountains', 1),
 ('rolled', 1),
 ('watery', 1),
 ('receives', 1),
 ('sadly', 1),
 ('surely', 1),
 ('decoction', 1),
 ('boy', 1),
 ('prairies', 1),
 ('whitehall', 1),
 ('degree', 1),
 ('clinched', 1),
 ('springs', 1),
 ('disguises', 1),
 ('watergazers', 1),
 ('hold', 1),
 ('throws', 1),
 ('sentinels', 1),
 ('surf', 1),
 ('call', 1),
 ('states', 1),
 ('freewill', 1),
 ('roasted', 1),
 ('social', 1),
 ('obey', 1),
 ('entailed', 1),
 ('seated', 1),
 ('greeks', 1),
 ('feet', 1),
 ('true', 1),
 ('act', 1),
 ('landsmen', 1),
 ('tribulations', 1),
 ('swayed', 1),
 ('cherish', 1),
 ('idolatrous', 1),
 ('romantic', 1),
 ('peep', 1),
 ('punch', 1),
 ('suspect', 1),
 ('isles', 1),
 ('broom', 1),
 ('stepping', 1),
 ('nailed', 1),
 ('genteel', 1),
 ('short', 1),
 ('spiles', 1),
 ('conceits', 1),
 ('everlasting', 1),
 ('region', 1),
 ('yourself', 1),
 ('path', 1),
 ('nearly', 1),
 ('deity', 1),
 ('monster', 1),
 ('ibis', 1),
 ('wonderworld', 1),
 ('ten', 1),
 ('physical', 1),
 ('schooners', 1),
 ('affghanistan', 1),
 ('forbidden', 1),
 ('waves', 1),
 ('nigh', 1),
 ('after', 1),
 ('mazy', 1),
 ('confess', 1),
 ('overlapping', 1),
 ('remote', 1),
 ('horror', 1),
 ('love', 1),
 ('earthly', 1),
 ('earnestly', 1),
 ('paint', 1),
 ('heaven', 1),
 ('plastertied', 1),
 ('transition', 1),
 ('solo', 1),
 ('of', 1),
 ('police', 1),
 ('deliberately', 1),
 ('abandon', 1),
 ('knew', 1),
 ('falling', 1),
 ('compare', 1),
 ('november', 1),
 ('brigs', 1),
 ('green', 1),
 ('surprising', 1),
 ('gets', 1),
 ('attending', 1),
 ('drawn', 1),
 ('less', 1),
 ('within', 1),
 ('silent', 1),
 ('view', 1),
 ('deck', 1),
 ('inferred', 1),
 ('offthen', 1),
 ('needed', 1),
 ('kind', 1),
 ('hillside', 1),
 ('snow', 1),
 ('considering', 1),
 ('reveriesstand', 1),
 ('tormented', 1),
 ('northward', 1),
 ('unbiased', 1),
 ('pedestrian', 1),
 ('even', 1),
 ('conscious', 1),
 ('coat', 1),
 ('enable', 1),
 ('needs', 1),
 ('simple', 1),
 ('hollow', 1),
 ('holy', 1),
 ('instance', 1),
 ('satisfaction', 1),
 ('open', 1),
 ('plumb', 1),
 ('dogs', 1),
 ('trip', 1),
 ('enter', 1),
 ('universal', 1),
 ('week', 1),
 ('hook', 1),
 ('comedies', 1),
 ('deliberate', 1),
 ('reveries', 1),
 ('passed', 1),
 ('especially', 1),
 ('attract', 1),
 ('dive', 1),
 ('grin', 1),
 ('involuntarily', 1),
 ('try', 1),
 ('has', 1),
 ('towards', 1),
 ('lies', 1),
 ('desks', 1),
 ('presidency', 1),
 ('enough', 1),
 ('keen', 1),
 ('prevalent', 1),
 ('new', 1),
 ('endless', 1),
 ('receiving', 1),
 ('professor', 1),
 ('mast', 1),
 ('horse', 1),
 ('however', 1),
 ('difference', 1),
 ('tragedies', 1),
 ('crazy', 1),
 ('performances', 1),
 ('vain', 1),
 ('quite', 1),
 ('united', 1),
 ('reefscommerce', 1),
 ('manhattoes', 1),
 ('wherefore', 1),
 ('perceive', 1),
 ('cottage', 1),
 ('peppered', 1),
 ('straight', 1),
 ('shore', 1),
 ('marvels', 1),
 ('toils', 1),
 ('merchant', 1),
 ('family', 1),
 ('goes', 1),
 ('anything', 1),
 ('quarterdeck', 1),
 ('vibration', 1),
 ('resulting', 1),
 ('avenuesnorth', 1),
 ('curiosity', 1),
 ('spurs', 1),
 ('hands', 1),
 ('alleys', 1),
 ('sword', 1),
 ('mouth', 1),
 ('jolly', 1),
 ('magnetic', 1),
 ('circulation', 1),
 ('blue', 1),
 ('others', 1),
 ('feel', 1),
 ('respectable', 1),
 ('beach', 1),
 ('fowlsthough', 1),
 ("other's", 1),
 ('rub', 1),
 ('agonever', 1),
 ('having', 1),
 ('interlude', 1),
 ('decks', 1)]

In [18]:
swc[1][1]


Out[18]:
24

In [19]:
x = []
y = []
for i in range(50):
    j = 1
    while j <= swc[i][1]:
        x.append(i)
        y.append(j)
        j+=1
print(x, y)                          # Perfect data for normal dot plot... Oops.


[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 43, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5]

In [20]:
plt.scatter(x,y)


Out[20]:
<matplotlib.collections.PathCollection at 0x7fb6baf92390>

In [21]:
x = []
for i in range(50):
    x.append(swc[i][1])
y = list(range(1,51))[::-1]

In [22]:
x,y


Out[22]:
([43,
  24,
  23,
  23,
  17,
  16,
  15,
  15,
  14,
  13,
  12,
  12,
  12,
  11,
  11,
  11,
  10,
  10,
  10,
  10,
  9,
  9,
  9,
  9,
  9,
  8,
  8,
  8,
  7,
  7,
  7,
  7,
  7,
  7,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  6,
  5],
 [50,
  49,
  48,
  47,
  46,
  45,
  44,
  43,
  42,
  41,
  40,
  39,
  38,
  37,
  36,
  35,
  34,
  33,
  32,
  31,
  30,
  29,
  28,
  27,
  26,
  25,
  24,
  23,
  22,
  21,
  20,
  19,
  18,
  17,
  16,
  15,
  14,
  13,
  12,
  11,
  10,
  9,
  8,
  7,
  6,
  5,
  4,
  3,
  2,
  1])

In [23]:
words = []
for i in range(50):
    words.append(swc[i][0])
rev_words = words[::-1]
warray = np.array(rev_words)
warray


Out[23]:
array(['at', 'your', 'will', 'down', 'old', 'time', 'land', 'water', 'get',
       'take', 'voyage', 'can', 'more', 'like', 'about', 'it', 'what',
       'were', 'do', 'and', 'why', 'part', 'was', 'have', 'by', 'he', 'if',
       'upon', 'be', 'into', 'his', 'one', 'sea', 'or', 'some', 'from',
       'not', 'go', 'they', 'on', 'with', 'my', 'there', 'but', 'for',
       'this', 'all', 'you', 'me', 'i'], 
      dtype='<U6')

In [25]:
f = plt.figure(figsize=(7,9))
plt.scatter(x, y)
plt.xlim(0,45)
plt.ylim(0,51)
plt.yticks(np.arange(1,51), warray)
plt.grid(axis='y')
plt.title('Moby Dick, Ch. 1: Word Count')
plt.xlabel('Count')
plt.ylabel('Words');



In [26]:
assert True # use this for grading the dotplot