DATASCI W261: Machine Learning at Scale

Week 5, Homework 5

Katrina Adams

kradams@ischool.berkeley.edu
13 October 2015

NOTE: Issues with HW and mrjobs still running

No group involvement: I was assigned to group C, however, after reaching out to my group members several times about working together, I received no response so I had to complete the assignment on my own. I had at first delayed running on the entire data set to coordinate with my group members, until I was forced to move forward myself.
Couldn't start jobs on emr: From Sunday night through Monday night, I was unable to run on emr at all (except one small job after someone graciously told me that they had just terminated two instances) because I was running into quota limits. I later discovered that I may not have been changing regions correctly (I was changing the region in 'aws configure' only until Monday night when I added a region parameter to mrjob.conf).
Still running on entire dataset: With all of these factors, I started running on the entire dataset late, and it is not yet done running. I started 10 instances for the cosine similarity mrjob and it has been running for over 27 hours. At the same time, I started the Jaccard index similarity job on my local machine, so that I did not use too many instances on the shared emr resource. However, the Jaccard index map reduce wrote 117GB to my computer's drive for the first reducer and my computer ran out of space, so I had to cancel it. The cosine similarity job is not done so I have included a pdf of the progress and continued to calculate the precision and recall on a small sample of the data.
Some mrjob/emr output is not shown here: So that I could run sections of the notebook while working on other sections, I created multiple copies of the notebook while I was working on this assignment. Therefore, cells were run in different places. I could not find an easy way to merge those cells along with their output so I have simply copied the code from those cells and sent the final output from the mapreduce tasks to text files and have shown the beginning of those files. I have included pdfs of the other notebooks where certain cells were run that were not run here.


HW5.0:
What is a data warehouse? What is a Star schema? When is it used?

Data warehouse: A data warehouse stores data for an organization. Traditionally, this data has been stored in a relational database but more and more semi-structured and unstructured data is also being stored. Data is stored in the data warehouse from operational systems and extracted for analyses. Hadoop is increasingly popular for data warehousing.

Star schema: A star schema is sometimes used to store data in traditional BI data pipelines. A fact table contains a particular real-time measure, and various dimension tables can be used to slice that measure.

HW5.1:
In the database world What is 3NF? Does machine learning use data in 3NF? If so why?
In what form does ML consume data?
Why would one use log files that are denormalized?

3NF: Third normal form is the normalization process for constructing relational tables. Machine learning does not use data in 3NF because the features that could be used to build the model would be in different places under 3NF.

Data for ML: Machine learning comsumes denormalized data.

Denormalized log files: One may use log files that are denormalized to see combine data from different aspects of the logs. For example, for online ads serving, denormalized log files could be created to examine ad sizes and clicks together.


HW5.2.: Hashside join
Using MRJob, implement a hashside join (memory-backed map-side) for left, right and inner joins. Run your code on the data used in HW 4.4: (Recall HW 4.4: Find the most frequent visitor of each page using mrjob and the output of 4.2 (i.e., transfromed log file). In this output please include the webpage URL, webpageID and Visitor ID.)

Justify which table you chose as the Left table in this hashside join.
Please report the number of rows resulting from:
(1) Left joining Table Left with Table Right
(2) Right joining Table Left with Table Right
(3) Inner joining Table Left with Table Right


In [26]:
'''
    HW 5.2
    Using MRJob, implement a hashside join (memory-backed map-side) for left, 
    right and inner joins. Run your code on the  data used in HW 4.4: 
    (Recall HW 4.4: Find the most frequent visitor of each page using mrjob 
    and the output of 4.2  (i.e., transfromed log file). In this output 
    please include the webpage URL, webpageID and Visitor ID.)

    Justify which table you chose as the Left table in this hashside join.

    Please report the number of rows resulting from:

    (1) Left joining Table Left with Table Right
            all rows from left and matching rows from right
    (2) Right joining Table Left with Table Right
            all rows from right and matching rows from left
    (3) Inner joining Table Left with Table Right
            all rows with match for both

'''

# make directory for problem and change to that dir
!mkdir ~/Documents/W261/hw5/hw5_2/
%cd ~/Documents/W261/hw5/hw5_2/


mkdir: /Users/davidadams/Documents/W261/hw5/hw5_2/: File exists
/Users/davidadams/Documents/W261/hw5/hw5_2

In [16]:
%%writefile hw52_leftjoin.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep

class MRhashsideleftjoin(MRJob):

    # Load left table into memory as a dictionary keyed by page id
    def mapper_init(self):
        self.lefttable = dict()
        leftfilename = '/Users/davidadams/Documents/W261/hw5/hw5_2/anonymous-msweb_urls.txt'
        with open(leftfilename, 'r') as f:
            for row in f.readlines():
                row = row.strip()
                pageid, url = row.split(',')
                self.lefttable[pageid]=url
    
    # read in right table and emit key = page id, value = (url from left table, user id)
    def mapper(self, _, line):
        line = line.strip()
        line = line.split(',')
        pageid_right = line[1]
        userid = line[4]
        
        # keep track of keys that are emitted (i.e. right table keys also in left table)
        self.emittedkeys = list()
        
        # if key from right table is in left table, emit joined row 
        if pageid_right in self.lefttable.keys():
            self.emittedkeys.append(pageid_right)
            yield pageid_right, (self.lefttable[pageid_right], userid)
    
    # output rows that have a page id in the left table, 
    #   with null value for userid if row is missing from right table
    def mapper_final(self):
        for pageid_left in self.lefttable.keys():
            # if the key was not emitted as a joined row, emit the row with user id missing
            if pageid_left not in self.emittedkeys:
                yield pageid_left, (self.lefttable[pageid_left], 'NA')

if __name__ == '__main__':
    MRhashsideleftjoin.run()


Overwriting hw52_leftjoin.py

In [27]:
# run the left join mrjob
!python hw52_leftjoin.py anonymous-msweb_reformatted.txt > leftjoin.txt


using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_leftjoin.davidadams.20151006.031647.223114

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_leftjoin.davidadams.20151006.031647.223114/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_leftjoin.davidadams.20151006.031647.223114/step-0-mapper_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_leftjoin.davidadams.20151006.031647.223114/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_leftjoin.davidadams.20151006.031647.223114/output
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_leftjoin.davidadams.20151006.031647.223114
   98947 leftjoin.txt

In [29]:
# output the number of rows in the joined table
!wc -l leftjoin.txt > leftjoin_numrows.txt
with open('leftjoin_numrows.txt','r') as f:
    line = f.readline()
    line = line.strip()
    numrows = line.split(' ')[0]
    print 'There are '+numrows+' rows in the left joined table'


There are 98947 rows in the left joined table

In [21]:
%%writefile hw52_rightjoin.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep

class MRhashsiderightjoin(MRJob):

    # Load left table into memory as a dictionary keyed by page id
    def mapper_init(self):
        self.lefttable = dict()
        leftfilename = '/Users/davidadams/Documents/W261/hw5/hw5_2/anonymous-msweb_urls.txt'
        with open(leftfilename, 'r') as f:
            for row in f.readlines():
                row = row.strip()
                pageid, url = row.split(',')
                self.lefttable[pageid]=url
    
    # read in right table and emit key = page id, value = (url from left table, user id)
    # output rows that have a page id in the right table, 
    #   with null value for url if row is missing from left table
    def mapper(self, _, line):
        line = line.strip()
        line = line.split(',')
        pageid_right = line[1]
        userid = line[4]
        # if key from right table is in left table, emit joined row
        if pageid_right in self.lefttable.keys():
            yield pageid_right, (self.lefttable[pageid_right], userid)
        # if right table key is not in left table, emit row with url missing
        else:
            yield pageid_right, ('NA', userid)


if __name__ == '__main__':
    MRhashsiderightjoin.run()


Writing hw52_rightjoin.py

In [26]:
# run right join mrjob
!python hw52_rightjoin.py anonymous-msweb_reformatted.txt > rightjoin.txt


using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031627.979587

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031627.979587/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031627.979587/step-0-mapper_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031627.979587/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031627.979587/output
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031627.979587
   98654 rightjoin.txt

In [30]:
# output the number of rows in the joined table
!wc -l rightjoin.txt > rightjoin_numrows.txt 
with open('rightjoin_numrows.txt','r') as f:
    line = f.readline()
    line = line.strip()
    numrows = line.split(' ')[0]
    print 'There are '+numrows+' rows in the right joined table'


There are 98654 rows in the right joined table

In [23]:
%%writefile hw52_innerjoin.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep

class MRhashsideinnerjoin(MRJob):

    # Load left table into memory as a dictionary keyed by page id
    def mapper_init(self):
        self.lefttable = dict()
        leftfilename = '/Users/davidadams/Documents/W261/hw5/hw5_2/anonymous-msweb_urls.txt'
        with open(leftfilename, 'r') as f:
            for row in f.readlines():
                row = row.strip()
                pageid, url = row.split(',')
                self.lefttable[pageid]=url
    
    # read in right table and emit key = page id, value = (url from left table, user id)
    # only output rows that are in both tables
    def mapper(self, _, line):
        line = line.strip()
        line = line.split(',')
        pageid_right = line[1]
        userid = line[4]
        # if key from right table is also in left table, emit joined row
        if pageid_right in self.lefttable.keys():
            yield pageid_right, (self.lefttable[pageid_right], userid)
    

if __name__ == '__main__':
    MRhashsideinnerjoin.run()


Writing hw52_innerjoin.py

In [25]:
# run inner join mrjob
!python hw52_rightjoin.py anonymous-msweb_reformatted.txt > innerjoin.txt


using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031606.147738

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031606.147738/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031606.147738/step-0-mapper_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031606.147738/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031606.147738/output
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/hw52_rightjoin.davidadams.20151006.031606.147738
   98654 innerjoin.txt

In [31]:
# output the number of rows in the joined table
!wc -l innerjoin.txt > innerjoin_numrows.txt 
with open('innerjoin_numrows.txt','r') as f:
    line = f.readline()
    line = line.strip()
    numrows = line.split(' ')[0]
    print 'There are '+numrows+' rows in the inner joined table'


There are 98654 rows in the inner joined table

HW5.3: Gogle n-grams EDA
For the remainder of this assignment you will work with a large subset of the Google n-grams dataset, https://aws.amazon.com/datasets/google-books-ngrams/

which we have placed in a bucket on s3: s3://filtered-5grams/

In particular, this bucket contains (~200) files in the format:

(ngram) \t (count) \t (pages_count) \t (books_count)

Do some EDA on this dataset using mrjob, e.g.,

  • Longest 5-gram (number of characters)
  • Top 10 most frequent words (count), i.e., unigrams
  • Most/Least densely appearing words (count/pages_count) sorted in decreasing order of relative frequency (Hint: save to PART-000* and take the head -n 1000)
  • Distribution of 5-gram sizes (counts) sorted in decreasing order of relative frequency. (Hint: save to PART-000* and take the head -n 1000) OPTIONAL Question:
  • Plot the log-log plot of the frequency distributuion of unigrams. Does it follow power law distribution?

For more background see: https://en.wikipedia.org/wiki/Log%E2%80%93log_plot https://en.wikipedia.org/wiki/Power_law


In [32]:
'''
    HW 5.3 Google n-grams dataset,

    Bucket on s3:    s3://filtered-5grams/

    In particular, this bucket contains (~200) files in the format:

        (ngram) \t (count) \t (pages_count) \t (books_count)

    Do some EDA on this dataset using mrjob, e.g., 

    - Longest 5-gram (number of characters)
    - Top 10 most frequent words (count), i.e., unigrams
    - Most/Least densely appearing words (count/pages_count) 
        sorted in decreasing order of relative frequency 
        (Hint: save to PART-000* and take the head -n 1000)
    - Distribution of 5-gram sizes (counts) sorted in decreasing 
        order of relative frequency. 
        (Hint: save to PART-000* and take the head -n 1000)
    OPTIONAL Question:
    - Plot the log-log plot of the frequency distributuion of unigrams. 
        Does it follow power law distribution?
'''

# make directory for problem and change to that dir
!mkdir ~/Documents/W261/hw5/hw5_3/
%cd ~/Documents/W261/hw5/hw5_3/


mkdir: /Users/davidadams/Documents/W261/hw5/hw5_3/: File exists
/Users/davidadams/Documents/W261/hw5/hw5_3

In [10]:
%%writefile eda_longestngram.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep
import operator

class MR_EDA_Longestngram(MRJob):

    def mapper(self, _, line):
        line = line.strip()
        line = line.split('\t')
        # emit the number of characters in the 5-gram and the 5-gram itself as tuple value
        # None key so that all values go to one reducer
        yield None, (len(line[0]), line[0])
        
    def reducer(self, _, pair):
        # emit the pair with the max length
        yield max(pair)

if __name__ == '__main__':
    MR_EDA_Longestngram.run()


Overwriting eda_longestngram.py

In [4]:
%load_ext autoreload
%autoreload 2

from eda_longestngram import MR_EDA_Longestngram
mrjob = MR_EDA_Longestngram(args=['../googlebooks-eng-all-5gram-20090715-0-filtered.txt'])
with mrjob.make_runner() as runner:
    # run the mapreduce job defined in MRPageVisitCount
    runner.run()
    print 'Page ID\tNumber of Visits'
    # print number of characters in longest 5-gram and longest 5-gram
    for line in runner.stream_output():
        print line


WARNING:mrjob.runner:
WARNING:mrjob.runner:PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols
WARNING:mrjob.runner:
Page ID	Number of Visits
58	"Interpersonal Communication Interpersonal communication is"


In [30]:
!python eda_longestngram.py s3://ucb-mids-mls-katieadams/ngram-sample/* -r emr


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-03e94e1f06830625
using s3://mrjob-03e94e1f06830625/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_longestngram.davidadams.20151006.033157.800242
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_longestngram.davidadams.20151006.033157.800242/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-03e94e1f06830625/tmp/eda_longestngram.davidadams.20151006.033157.800242/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-IR5LF4F56OS4
Created new job flow j-IR5LF4F56OS4
Job launched 30.4s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 60.8s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 91.2s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 121.7s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 152.2s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 182.8s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 213.2s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 243.6s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 274.0s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 304.5s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 334.9s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 365.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job launched 395.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job launched 426.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job launched 456.7s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job launched 487.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job launched 518.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job launched 548.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job launched 579.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151006.033157.800242: Step 1 of 1)
Job completed.
Running time was 255.0s (not counting time spent waiting for the EC2 instances)
ec2_key_pair_file not specified, going to S3
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
  File Input Format Counters :
    Bytes Read: 46215176
  File Output Format Counters :
    Bytes Written: 68
  FileSystemCounters:
    FILE_BYTES_READ: 44143865
    FILE_BYTES_WRITTEN: 66471085
    HDFS_BYTES_READ: 702
    S3_BYTES_READ: 46215176
    S3_BYTES_WRITTEN: 68
  Job Counters :
    Launched map tasks: 5
    Launched reduce tasks: 1
    Rack-local map tasks: 5
    SLOTS_MILLIS_MAPS: 289614
    SLOTS_MILLIS_REDUCES: 122816
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 132920
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 46215176
    Map input records: 1257891
    Map output bytes: 49981374
    Map output materialized bytes: 22168233
    Map output records: 1257891
    Physical memory (bytes) snapshot: 1277239296
    Reduce input groups: 1
    Reduce input records: 1257891
    Reduce output records: 1
    Reduce shuffle bytes: 22168233
    SPLIT_RAW_BYTES: 702
    Spilled Records: 3763673
    Total committed heap usage (bytes): 864436224
    Virtual memory (bytes) snapshot: 3952553984
Streaming final output from s3://mrjob-03e94e1f06830625/tmp/eda_longestngram.davidadams.20151006.033157.800242/output/
62	"Spontane Chromosomenaberrationen bei familiarer Panmyelopathie"
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_longestngram.davidadams.20151006.033157.800242
Removing all files in s3://mrjob-03e94e1f06830625/tmp/eda_longestngram.davidadams.20151006.033157.800242/
Removing all files in s3://mrjob-03e94e1f06830625/tmp/logs/j-IR5LF4F56OS4/
Terminating job flow: j-IR5LF4F56OS4

In [33]:
# run mrjob on emr to find longest n-gram, show results below
!python eda_longestngram.py s3://filtered-5grams/* -r emr --num-ec2-instances 10 --ec2-task-instance-type m1.medium


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-1febc2c04977da79
using s3://mrjob-1febc2c04977da79/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_longestngram.davidadams.20151014.013817.708236
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_longestngram.davidadams.20151014.013817.708236/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-1febc2c04977da79/tmp/eda_longestngram.davidadams.20151014.013817.708236/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-3UYRI8BIZM2EM
Created new job flow j-3UYRI8BIZM2EM
Job launched 32.8s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 64.6s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 95.7s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 126.9s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 158.6s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 189.8s ago, status STARTING: Configuring cluster software
Job launched 222.3s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 253.5s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 284.7s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 316.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 349.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 381.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 413.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 444.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 475.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 507.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 541.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 574.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 610.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 643.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 675.1s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 706.5s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 740.5s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 771.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 803.2s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 834.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 865.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 896.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 928.2s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 959.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 990.7s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1022.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1053.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1084.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1115.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1147.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1180.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1212.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1243.2s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1274.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1305.5s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1337.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1368.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1405.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1436.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1469.5s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1503.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1535.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1566.5s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1600.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1631.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1663.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1695.1s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1728.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1760.2s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1791.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1822.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1853.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1885.2s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1916.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1949.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 1980.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2013.0s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2044.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2075.7s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2107.2s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2138.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2169.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2200.7s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2232.2s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2263.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2294.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2326.1s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2357.6s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2388.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2420.3s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2451.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2482.8s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2513.9s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2545.4s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job launched 2576.5s ago, status RUNNING: Running step (eda_longestngram.davidadams.20151014.013817.708236: Step 1 of 1)
Job completed.
Running time was 2294.0s (not counting time spent waiting for the EC2 instances)
ec2_key_pair_file not specified, going to S3
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
  File Input Format Counters :
    Bytes Read: 2156069116
  File Output Format Counters :
    Bytes Written: 166
  FileSystemCounters:
    FILE_BYTES_READ: 2288200716
    FILE_BYTES_WRITTEN: 3105081914
    HDFS_BYTES_READ: 23640
    S3_BYTES_READ: 2156069116
    S3_BYTES_WRITTEN: 166
  Job Counters :
    Launched map tasks: 198
    Launched reduce tasks: 21
    Rack-local map tasks: 196
    SLOTS_MILLIS_MAPS: 9802675
    SLOTS_MILLIS_REDUCES: 7864463
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 5494920
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 2156069116
    Map input records: 58682266
    Map output bytes: 2331730493
    Map output materialized bytes: 1033176563
    Map output records: 58682266
    Physical memory (bytes) snapshot: 44442079232
    Reduce input groups: 1
    Reduce input records: 58682266
    Reduce output records: 1
    Reduce shuffle bytes: 1033176563
    SPLIT_RAW_BYTES: 23640
    Spilled Records: 176046798
    Total committed heap usage (bytes): 34503680000
    Virtual memory (bytes) snapshot: 132268052480
Streaming final output from s3://mrjob-1febc2c04977da79/tmp/eda_longestngram.davidadams.20151014.013817.708236/output/
159	"ROPLEZIMPREDASTRODONBRASLPKLSON YHROACLMPARCHEYXMMIOUDAVESAURUS PIOFPILOCOWERSURUASOGETSESNEGCP TYRAVOPSIFENGOQUAPIALLOBOSKENUO OWINFUYAIOKENECKSASXHYILPOYNUAT"
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_longestngram.davidadams.20151014.013817.708236
Removing all files in s3://mrjob-1febc2c04977da79/tmp/eda_longestngram.davidadams.20151014.013817.708236/
Removing all files in s3://mrjob-1febc2c04977da79/tmp/logs/j-3UYRI8BIZM2EM/
Terminating job flow: j-3UYRI8BIZM2EM

In [34]:
%%writefile eda_mostusedword.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep
import operator

class MR_EDA_Longestngram(MRJob):

    def mapper(self, _, line):
        # read in n-gram line and emit word and count
        line = line.strip()
        line = line.split('\t')
        unigrams = line[0].split(' ') # ngram
        count = int(line[1]) # ngram count
        for word in unigrams:
            yield word, count
    
    def combiner_counts(self, word, counts):
        # sum counts for each word
        yield word, sum(counts)
        
    def reducer_getcounts(self, word, counts):
        # sum counts for each word
        yield word, sum(counts)
    
    def mapper_makepairs(self, word, count):
        # emit null key so that all values go to the same reducer
        yield None, (count, word)
    
    def reducer_getmax(self, _, countpair):
        # output max count and word with that count
        yield max(countpair)
    
    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   combiner=self.combiner_counts,
                   reducer=self.reducer_getcounts),
            MRStep(mapper=self.mapper_makepairs,
                   reducer=self.reducer_getmax)  
            ]

if __name__ == '__main__':
    MR_EDA_Longestngram.run()


Overwriting eda_mostusedword.py

In [36]:
# test on sample of data locally
!python eda_mostusedword.py ../ngram-sample/*


using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00001
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00002
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00003
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00004
Counters from step 1:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00000 /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00001 /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00002 /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00003 /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-mapper_part-00004
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-1-mapper_part-00000
Counters from step 2:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-1-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-1-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-1-reducer_part-00000
Counters from step 2:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/step-1-reducer_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283/output
572702	"the"
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.035103.036283

In [37]:
# test on sample of data on emr
!python eda_mostusedword.py s3://ucb-mids-mls-katieadams/ngram-sample/* -r emr --num-ec2-instances 3 --ec2-task-instance-type m1.medium


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-03e94e1f06830625
using s3://mrjob-03e94e1f06830625/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.041717.363823
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.041717.363823/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-03e94e1f06830625/tmp/eda_mostusedword.davidadams.20151006.041717.363823/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-19H5BSRFSCXE3
Created new job flow j-19H5BSRFSCXE3
Job launched 30.5s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 61.3s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 91.7s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 122.6s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 153.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 183.9s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 214.3s ago, status STARTING: Configuring cluster software
Job launched 244.8s ago, status STARTING: Configuring cluster software
Job launched 275.3s ago, status STARTING: Configuring cluster software
Job launched 306.1s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 336.5s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 367.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 397.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 428.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 459.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 489.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 520.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 550.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 581.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 612.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 642.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 673.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 703.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 734.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 765.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 1 of 2)
Job launched 795.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 2 of 2)
Job launched 826.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 2 of 2)
Job launched 857.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 2 of 2)
Job launched 887.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 2 of 2)
Job launched 918.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.041717.363823: Step 2 of 2)
Job completed.
Running time was 566.0s (not counting time spent waiting for the EC2 instances)
ec2_key_pair_file not specified, going to S3
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
  File Input Format Counters :
    Bytes Read: 46219644
  File Output Format Counters :
    Bytes Written: 995891
  FileSystemCounters:
    FILE_BYTES_READ: 5698812
    FILE_BYTES_WRITTEN: 7567996
    HDFS_BYTES_READ: 1283
    HDFS_BYTES_WRITTEN: 995891
    S3_BYTES_READ: 46219644
  Job Counters :
    Launched map tasks: 9
    Launched reduce tasks: 4
    Rack-local map tasks: 9
    SLOTS_MILLIS_MAPS: 1171799
    SLOTS_MILLIS_REDUCES: 357870
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 404060
    Combine input records: 6712384
    Combine output records: 675746
    Map input bytes: 46215176
    Map input records: 1257891
    Map output bytes: 58786613
    Map output materialized bytes: 2244241
    Map output records: 6289455
    Physical memory (bytes) snapshot: 2259230720
    Reduce input groups: 75509
    Reduce input records: 252817
    Reduce output records: 75509
    Reduce shuffle bytes: 2244241
    SPLIT_RAW_BYTES: 1283
    Spilled Records: 928563
    Total committed heap usage (bytes): 1517125632
    Virtual memory (bytes) snapshot: 7649865728
Counters from step 2:
  File Input Format Counters :
    Bytes Read: 1035410
  File Output Format Counters :
    Bytes Written: 13
  FileSystemCounters:
    FILE_BYTES_READ: 733292
    FILE_BYTES_WRITTEN: 1785837
    HDFS_BYTES_READ: 1036877
    S3_BYTES_WRITTEN: 13
  Job Counters :
    Data-local map tasks: 5
    Launched map tasks: 9
    Launched reduce tasks: 4
    Rack-local map tasks: 4
    SLOTS_MILLIS_MAPS: 220298
    SLOTS_MILLIS_REDUCES: 133114
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 25760
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 995891
    Map input records: 75509
    Map output bytes: 1599963
    Map output materialized bytes: 733726
    Map output records: 75509
    Physical memory (bytes) snapshot: 2051444736
    Reduce input groups: 1
    Reduce input records: 75509
    Reduce output records: 1
    Reduce shuffle bytes: 733726
    SPLIT_RAW_BYTES: 1467
    Spilled Records: 151018
    Total committed heap usage (bytes): 1420832768
    Virtual memory (bytes) snapshot: 7776030720
Streaming final output from s3://mrjob-03e94e1f06830625/tmp/eda_mostusedword.davidadams.20151006.041717.363823/output/
572702	"the"
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.041717.363823
Removing all files in s3://mrjob-03e94e1f06830625/tmp/eda_mostusedword.davidadams.20151006.041717.363823/
Removing all files in s3://mrjob-03e94e1f06830625/tmp/logs/j-19H5BSRFSCXE3/
Terminating job flow: j-19H5BSRFSCXE3

In [41]:
# run on entire dataset on emr to find the most-used word
!python eda_mostusedword.py s3://filtered-5grams/* -r emr --num-ec2-instances 2 --ec2-task-instance-type m1.medium > mostusedword_output.txt


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-03e94e1f06830625
using s3://mrjob-03e94e1f06830625/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.054048.828589
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.054048.828589/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-03e94e1f06830625/tmp/eda_mostusedword.davidadams.20151006.054048.828589/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-35ZNP3LSATENH
Created new job flow j-35ZNP3LSATENH
Job launched 30.4s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 61.2s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 91.7s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 122.6s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 153.1s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 184.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 214.6s ago, status STARTING: Configuring cluster software
Job launched 245.4s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 275.9s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 306.7s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 337.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 367.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 398.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 429.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 459.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 489.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 520.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 551.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 581.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 612.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 642.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 673.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 704.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 735.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 765.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 796.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 826.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 857.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 888.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 919.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 949.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 980.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1011.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1041.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1072.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1103.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1133.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1164.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1195.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1225.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1256.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1287.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1317.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1348.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1379.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1409.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1440.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1471.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1501.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1532.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1562.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1593.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1624.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1655.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1685.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1716.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1746.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1777.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1807.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1838.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1869.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1900.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1930.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1961.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 1991.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2022.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2053.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2084.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2114.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2145.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2176.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2206.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2237.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2268.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2298.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2329.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2360.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2391.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2421.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2452.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2482.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2513.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2543.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2574.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2605.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2636.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2666.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2697.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2728.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2759.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2789.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2820.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2850.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2881.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2912.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2943.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 2973.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3004.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3035.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3066.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3096.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3127.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3158.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3189.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3219.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3250.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3281.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3311.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3341.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3372.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3403.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3434.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3464.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3495.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3526.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3556.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3587.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3618.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3648.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3679.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3710.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3741.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3771.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3802.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3832.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3863.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3894.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3925.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3955.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 3985.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4016.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4047.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4077.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4108.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4138.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4169.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4200.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4231.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4261.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4292.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4323.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4354.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4384.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4415.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4445.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4476.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4507.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4538.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4568.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4599.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4630.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4661.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4691.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4722.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4752.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4783.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4814.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4845.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4875.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4906.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4937.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4968.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 4998.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5029.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5059.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5090.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5121.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5152.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5182.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5213.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5244.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5275.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5305.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5336.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5366.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5397.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5428.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5459.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5489.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5520.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5551.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5582.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5612.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5643.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5673.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5704.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5735.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5766.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5796.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5827.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5858.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5889.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5919.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5950.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 5981.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6012.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6042.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6073.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6104.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6135.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6165.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6196.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6226.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6257.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6288.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6319.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6349.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6380.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6410.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6441.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6472.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6503.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6533.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6564.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6594.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6625.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6655.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6686.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6717.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6747.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6778.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6809.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6839.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6870.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6901.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6931.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6962.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 6993.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7023.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7054.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7085.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7116.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7146.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7177.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7207.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7238.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7269.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7300.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7330.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7361.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7391.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7422.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7453.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7484.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7514.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7545.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7575.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7606.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7637.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7668.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7698.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7729.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7760.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7790.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7821.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7852.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7882.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7913.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7944.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 7975.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8005.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8036.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8066.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8097.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8128.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8159.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8189.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8220.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8251.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8281.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8312.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8343.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8373.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8404.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8435.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8466.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8496.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8527.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8558.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8589.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8619.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8650.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8680.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8711.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8741.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8772.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8803.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8834.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8864.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8895.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8925.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8956.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 8986.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9017.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9048.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9079.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9109.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9140.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9171.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9202.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9232.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9263.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9293.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9324.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9355.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9386.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9416.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9447.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9478.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9508.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9539.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9570.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9600.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9631.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9662.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9693.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9723.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9754.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9784.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9815.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9846.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9877.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9907.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9938.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9969.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 9999.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10030.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10061.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10091.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10122.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10152.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10183.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10214.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10244.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10274.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10305.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10336.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10367.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10397.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10428.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10458.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10489.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10520.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10551.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10581.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10612.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10643.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10673.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10704.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10735.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10765.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10796.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10827.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10858.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10888.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10919.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10950.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 10980.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11011.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11042.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11072.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11103.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11134.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11165.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11195.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11226.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11256.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11288.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11318.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11349.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11379.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11410.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11441.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11472.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11502.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11533.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11564.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11595.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11625.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11656.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11687.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11718.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11748.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11779.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11809.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11840.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11871.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11901.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11932.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11963.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 11993.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12024.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12054.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12085.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12116.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12147.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12177.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12208.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12239.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12269.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12300.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12331.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12361.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12392.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12423.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12454.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12484.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12515.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12545.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12576.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12606.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12637.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12667.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12698.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12729.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12760.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12790.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12821.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12852.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12883.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12913.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12943.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 12974.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13005.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13035.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13066.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13097.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13127.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13158.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13189.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13219.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13250.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13281.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13312.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13342.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13373.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13403.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13434.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13465.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13496.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13526.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13557.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13588.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13618.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13649.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13680.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13710.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13741.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13772.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13802.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13833.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13864.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13894.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13925.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13956.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 13987.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14017.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14048.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14078.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14110.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14140.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14171.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14202.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14233.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14263.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14294.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14324.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14355.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14386.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14417.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14447.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14478.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14509.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14540.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14570.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14601.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14631.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14662.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14693.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14724.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14754.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14785.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14816.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14847.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14877.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14908.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14939.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 14969.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15000.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15031.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15061.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15092.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15123.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15154.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15184.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15215.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15245.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15276.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15307.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15338.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15368.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15399.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15429.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15460.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15491.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15522.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15552.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15583.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15614.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15645.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15675.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15706.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15736.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15767.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15798.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15829.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15859.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15890.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15921.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15952.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 15982.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16013.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16043.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16074.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16105.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16136.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16167.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16197.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16228.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16259.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16289.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16320.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16351.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16382.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16412.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16443.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16474.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16505.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16535.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16565.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16596.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16627.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16657.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16688.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16718.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16749.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16780.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16810.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16841.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16872.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16902.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16933.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16964.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 16995.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17025.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17056.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17087.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17117.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17148.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17179.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17209.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17240.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17271.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17302.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17332.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17363.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17393.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17424.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17455.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17486.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17516.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17547.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17577.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17608.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17639.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17670.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17700.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17731.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17762.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17793.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17823.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17854.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17884.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17915.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17946.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 17977.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18007.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18038.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18069.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18100.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18130.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18161.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18191.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18222.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18252.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18283.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18313.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18344.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18375.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18406.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18436.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18467.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18498.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18529.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18559.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18590.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18620.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18651.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18682.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18713.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18743.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18774.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18804.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18835.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18866.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18897.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18927.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18958.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 18989.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19020.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19050.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19081.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19111.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19143.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19173.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19204.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19234.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19265.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19296.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19326.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19357.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19388.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19418.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19449.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19479.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19510.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19541.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19572.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19602.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19633.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19663.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19694.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19725.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19756.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19786.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19817.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19848.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19879.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19909.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19940.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 19971.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20002.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20032.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20063.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20093.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20124.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20155.3s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20186.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20216.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20247.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20277.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20308.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20339.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20370.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20400.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20431.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20461.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20492.8s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20523.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20554.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20584.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20614.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20645.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20676.2s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20706.7s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20737.6s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 1 of 2)
Job launched 20768.1s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 2 of 2)
Job launched 20799.0s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 2 of 2)
Job launched 20829.5s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 2 of 2)
Job launched 20860.4s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 2 of 2)
Job launched 20890.9s ago, status RUNNING: Running step (eda_mostusedword.davidadams.20151006.054048.828589: Step 2 of 2)
Job completed.
Running time was 20551.0s (not counting time spent waiting for the EC2 instances)
ec2_key_pair_file not specified, going to S3
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
  File Input Format Counters :
    Bytes Read: 2156069116
  File Output Format Counters :
    Bytes Written: 4672671
  FileSystemCounters:
    FILE_BYTES_READ: 176650706
    FILE_BYTES_WRITTEN: 246553446
    HDFS_BYTES_READ: 23640
    HDFS_BYTES_WRITTEN: 4672671
    S3_BYTES_READ: 2156069116
  Job Counters :
    Launched map tasks: 190
    Launched reduce tasks: 1
    Rack-local map tasks: 188
    SLOTS_MILLIS_MAPS: 39925153
    SLOTS_MILLIS_REDUCES: 19234237
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 17059270
    Combine input records: 313284500
    Combine output records: 27795213
    Map input bytes: 2156069116
    Map input records: 58682266
    Map output bytes: 2742506470
    Map output materialized bytes: 64798599
    Map output records: 293411330
    Physical memory (bytes) snapshot: 42264498176
    Reduce input groups: 343019
    Reduce input records: 7922043
    Reduce output records: 343019
    Reduce shuffle bytes: 64798599
    SPLIT_RAW_BYTES: 23640
    Spilled Records: 35717256
    Total committed heap usage (bytes): 32028372992
    Virtual memory (bytes) snapshot: 123474776064
Counters from step 2:
  File Input Format Counters :
    Bytes Read: 4707115
  File Output Format Counters :
    Bytes Written: 15
  FileSystemCounters:
    FILE_BYTES_READ: 3106926
    FILE_BYTES_WRITTEN: 6347296
    HDFS_BYTES_READ: 4707767
    S3_BYTES_WRITTEN: 15
  Job Counters :
    Data-local map tasks: 4
    Launched map tasks: 4
    Launched reduce tasks: 1
    SLOTS_MILLIS_MAPS: 158426
    SLOTS_MILLIS_REDUCES: 64465
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 48920
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 4672671
    Map input records: 343019
    Map output bytes: 7416823
    Map output materialized bytes: 3107696
    Map output records: 343019
    Physical memory (bytes) snapshot: 930820096
    Reduce input groups: 1
    Reduce input records: 343019
    Reduce output records: 1
    Reduce shuffle bytes: 3107696
    SPLIT_RAW_BYTES: 652
    Spilled Records: 686038
    Total committed heap usage (bytes): 620118016
    Virtual memory (bytes) snapshot: 3229167616
Streaming final output from s3://mrjob-03e94e1f06830625/tmp/eda_mostusedword.davidadams.20151006.054048.828589/output/
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/eda_mostusedword.davidadams.20151006.054048.828589
Removing all files in s3://mrjob-03e94e1f06830625/tmp/eda_mostusedword.davidadams.20151006.054048.828589/
Removing all files in s3://mrjob-03e94e1f06830625/tmp/logs/j-35ZNP3LSATENH/
Terminating job flow: j-35ZNP3LSATENH

In [42]:
# show most used word and its count
!cat mostusedword_output.txt


26769875	"the"

HW5.4

In this part of the assignment we will focus on developing methods for detecting synonyms, using the Google 5-grams dataset. To accomplish this you must script two main tasks using MRJob:

(1) Build stripes of word co-ocurrence for the top 10,000 most frequently appearing words across the entire set of 5-grams, and output to a file in your bucket on s3 (bigram analysis, though the words are non-contiguous).

(2) Using two (symmetric) comparison methods of your choice (e.g., correlations, distances, similarities), pairwise compare all stripes (vectors), and output to a file in your bucket on s3.

==Design notes for (1)==
For this task you will be able to modify the pattern we used in HW 3.2 (feel free to use the solution as reference). To total the word counts across the 5-grams, output the support from the mappers using the total order inversion pattern:

<*word,count>

to ensure that the support arrives before the cooccurrences.

In addition to ensuring the determination of the total word counts, the mapper must also output co-occurrence counts for the pairs of words inside of each 5-gram. Treat these words as a basket, as we have in HW 3, but count all stripes or pairs in both orders, i.e., count both orderings: (word1,word2), and (word2,word1), to preserve symmetry in our output for (2).

==Design notes for (2)==
For this task you will have to determine a method of comparison. Here are a few that you might consider:

  • Spearman correlation
  • Euclidean distance
  • Taxicab (Manhattan) distance
  • Shortest path graph distance (a graph, because our data is symmetric!)
  • Pearson correlation
  • Cosine similarity
  • Kendall correlation
    ...

However, be cautioned that some comparison methods are more difficult to parallelize than others, and do not perform more associations than is necessary, since your choice of association will be symmetric.


In [2]:
'''
    HW 5.4
    
'''

# make directory for problem and change to that dir
!mkdir ~/Documents/W261/hw5/hw5_4/
%cd ~/Documents/W261/hw5/hw5_4/


mkdir: /Users/davidadams/Documents/W261/hw5/hw5_4/: File exists
/Users/davidadams/Documents/W261/hw5/hw5_4

In [ ]:
%%writefile CountWords.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.protocol import RawValueProtocol
from mrjob.step import MRStep
import operator
from itertools import combinations
from collections import defaultdict
from re import match

class MR_CountWords(MRJob):
    
    OUTPUT_PROTOCOL = RawValueProtocol
    
    def mapper_countwords(self, _, line):
        # read in n-grams and emit words in ngram and ngram count
        line = line.strip()
        line = line.split('\t')
        words = line[0].split(' ')
        count = int(line[1])
        for word in words:
            yield word, count
            
    def combiner_countwords(self, word, count):
        # sum counts for each word
        yield word, sum(count)
        
    def reducer_countwords(self, word, count):
        # sum counts for each word
        yield word, sum(count)
    
    def reducer_output(self, word, count):
        for c in count:
            yield None, (word, int(c))
            #yield word, int(c)
    
    def steps(self):
        return [
            MRStep(mapper=self.mapper_countwords,
                   combiner=self.combiner_countwords,
                   reducer=self.reducer_countwords),
            MRStep(reducer=self.reducer_output)
            ]

if __name__ == '__main__':
    MR_CountWords.run()

In [ ]:
# test on filtered sample locally
!python CountWords.py ../ngram-sample/gbooks_filtered_sample.txt > wordcounts_small_local.txt

In [3]:
!head wordcounts_small_local.txt


('A', 74206)
("A's", 165)
('AAR', 99)
('AB', 43)
('ABBREVIATIONS', 106)
('ABC', 44)
('ABD', 177)
('ABM', 59)
('ABOUT', 88)
('ACKNOWLEDGEMENTS', 70)

In [ ]:
# test on filtered sample on emr
!python CountWords.py s3://ucb-mids-mls-katieadams/ngram-sample/gbooks_filtered_sample.txt -r emr --num-ec2-instances 4 --ec2-task-instance-type m1.small > wordcount_small_emr.txt

In [4]:
!head wordcount_small_emr.txt


('AB', 43)	
('ACT', 73)	
('AG', 111)	
('AL', 97)	
('AMENDMENT', 318)	
('AMERICAN', 97)	
('ANCIENT', 43)	
('AND', 3684)	
('ARE', 66)	
('ASIA', 65)	

In [ ]:
# run on full ngram dataset on emr
!python CountWords.py s3://filtered-5grams/* -r emr --num-ec2-instances 5 --ec2-task-instance-type m1.small > wordcounts_emr_all.txt

In [37]:
!head wordcounts_emr_all.txt


('AA', 23451)	
('AAAI', 1623)	
('AAASS', 72)	
('AAAl', 59)	
('AAC', 134)	
('AAHPERD', 44)	
('AAMFT', 246)	
('AAPM', 176)	
('ABABA', 3436)	
('ABANDONED', 124)	

In [ ]:
%%writefile FindMinSupport.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.step import MRStep
import operator
from ast import literal_eval
from mrjob.protocol import RawValueProtocol

class MR_FindMinSupport(MRJob):
    '''
        From Jake's SortingExample
    '''
    
    OUTPUT_PROTOCOL = RawValueProtocol
    
    def jobconf(self):
        # update jobconf for reverse numberical sorting
        orig_jobconf = super(MR_FindMinSupport, self).jobconf()        
        custom_jobconf = {
            'mapred.output.key.comparator.class': 'org.apache.hadoop.mapred.lib.KeyFieldBasedComparator',
            'mapred.text.key.comparator.options': '-k1rn',
        }
        combined_jobconf = orig_jobconf
        combined_jobconf.update(custom_jobconf)
        self.jobconf = combined_jobconf
        return combined_jobconf

    def mapper(self, _, line):
        # emit count of word as key and word as value 
        #   (b/c mapper output sorted by key)
        line = line.strip()
        word, count = literal_eval(line)
        yield int(count), word
    
    def reducer(self, count, values):
        # output sorted counts from mappers
        for word in values:
            yield None, (word, count)
    
    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer)
            ]

if __name__ == '__main__':
    MR_FindMinSupport.run()

In [ ]:
# test on sample of data locally
!python FindMinSupport.py wordcounts_small_local.txt > rankedcounts_small_local.txt
# see attached pdf 5.4.1 for output from run

In [5]:
!head rankedcounts_small_local.txt


('Afterwards', 100)
('Boke', 100)
('Copper', 100)
('Cry', 100)
('Devonian', 100)
('Diet', 100)
("Elyot's", 100)
('Everyone', 100)
('Hymn', 100)
('Kensington', 100)

In [ ]:
# test on sample of data on emr
!python FindMinSupport.py wordcount_small_emr.txt -r emr --num-ec2-instances 3 --ec2-task-instance-type m1.small > rankedcounts_small_emr.txt
# see attached pdf 5.4.1 for output from run

In [6]:
!head rankedcounts_small_emr.txt


('the', 633346)	
('have', 411226)	
('that', 393037)	
('to', 235138)	
('and', 153694)	
('He', 136657)	
('meeting', 104488)	
('for', 50138)	
('with', 45473)	
('And', 44804)	

In [ ]:
# run on entire dataset on emr
!python FindMinSupport.py wordcounts_emr_all.txt -r emr --num-ec2-instances 2 --ec2-task-instance-type m1.small > rankedcounts_all_emr.txt
# see attached pdf 5.4.1 for output from run

In [7]:
!head rankedcounts_all_emr.txt


('the', 5375699242)	
('of', 3691308874)	
('to', 2221164346)	
('in', 1387638591)	
('a', 1342195425)	
('and', 1135779433)	
('that', 798553959)	
('is', 756296656)	
('be', 688053106)	
('as', 481373389)	

In [38]:
%cd ~/Documents/W261/hw5/hw5_4/
def make_top10k_list():
    # from the sorted word count output, 
    #    take the top 10,000 most frequent words
    #    and write to file
    rankfilename = 'rankedoutput_emr_all.txt'
    listfilename = 'top10klist_wcounts.txt'
    linenum = 0
    f2 = open(listfilename, 'w')
    with open(rankfilename, 'r') as f1:
        for line in f1.readlines():
            if len(line)==1:
                continue
            linenum+=1
            if linenum>10000:
                break
            f2.write(line)
    f2.close()

make_top10k_list()


/Users/davidadams/Documents/W261/hw5/hw5_4

In [37]:
%%writefile MakeStripes.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.protocol import RawValueProtocol
from mrjob.step import MRStep
import operator
from itertools import combinations
from collections import defaultdict
from re import match
from ast import literal_eval

class MR_MakeStripes(MRJob):

    OUTPUT_PROTOCOL = RawValueProtocol
    topWords = set()
    
    def mapper_init(self):
        # store list of top 10,000 words in mapper memory
        with open('top10klist_wcounts.txt', 'r') as f:
            for line in f.readlines():
                (word,count)=literal_eval(line)
                self.topWords.add(word)
    
    def mapper(self, _, line):
        # read in ngram data and emit stripe for each ngram
        # initialize stripe
        counts = dict()
        line = line.strip()
        line = line.split('\t')
        words = line[0].split(' ')
        count = int(line[1])
        # make all pairs of words from 5-gram
        combs = list(combinations(words,2))
        # loop over combinations
        for combination in combs:
            word1,word2=combination
            # if both words in pair are in the top 10k words, 
            #     then add to the stripe
            if word1 in self.topWords and word2 in self.topWords:
                counts.setdefault(word1,{})
                counts[word1].setdefault(word2,0)
                counts[word1][word2] += count
                counts.setdefault(word2,{})
                counts[word2].setdefault(word1,0)
                counts[word2][word1] += count
        for word in counts.keys():
            yield word, counts[word]
                
            
    def combiner(self, word, values):
        # sum counts for stripes with the same key
        counts = {}
        for stripe in values:
            for coword in stripe.keys():
                counts.setdefault(coword,0)
                counts[coword] += stripe[coword]
        yield word,counts
    
    def reducer(self, word, values):
        # sum counts for stripes with the same key
        counts = {}
        for stripe in values:
            for coword in stripe.keys():
                counts.setdefault(coword,0)
                counts[coword] += stripe[coword]
        yield None,word+"\t"+str(counts)
        
    
    def steps(self):
        return [MRStep(
                mapper_init = self.mapper_init,
                mapper = self.mapper, 
                combiner = self.combiner,               
                reducer = self.reducer
            )]
    

if __name__ == '__main__':
    MR_MakeStripes.run()


Overwriting MakeStripes.py

In [8]:
# test on sample locally
!python MakeStripes.py ../ngram-sample/gbooks_filtered_sample.txt --file top10klist_wcounts.txt > stripes_small_local.txt


using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466/step-0-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466/step-0-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466/step-0-reducer_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466/output
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035513.138466

In [9]:
!head stripes_small_local.txt


A	{'all': 551, 'issued': 111, 'lack': 85, 'people': 757, 'month': 264, 'consists': 45, 'revealed': 118, 'sung': 52, 'whose': 70, 'young': 120, 'to': 4817, 'program': 49, 'Western': 42, 'smile': 175, 'pleasing': 46, 'presentation': 43, 'worth': 175, 'sent': 213, 'placed': 108, 'division': 52, 'woman': 208, 'seized': 65, 'very': 668, 'Constitutional': 187, 'wave': 148, 'medicine': 108, 'instruments': 82, 'Financial': 62, 'difference': 384, 'dim': 43, 'presented': 42, 'list': 88, 'large': 1097, 'sand': 45, 'small': 252, 'round': 251, 'Select': 118, 'application': 217, 'force': 41, 'findings': 117, 'trend': 626, 'sigh': 46, 'streets': 154, 'Rev': 52, 'direct': 64, 'pulse': 47, 'go': 42, 'second': 739, 'perspective': 127, 'lawyer': 50, 'rigorous': 110, 'further': 408, 'air': 98, 'shining': 43, 'established': 43, 'stands': 55, 'fine': 4722, 'celebrated': 52, 'brief': 88, 'current': 80, 'waiting': 161, 'version': 598, 'new': 6985, 'learned': 53, 'appeal': 57, 'method': 619, 'filled': 46, 'body': 661, 'full': 703, 'led': 42, 'Treatise': 178, 'degree': 55, 'never': 184, 'understanding': 52, 'water': 84, 'pause': 49, 'focused': 83, 'will': 601, 'English': 65, 'path': 51, 'Distribution': 64, 'My': 355, 'teacher': 68, 'change': 119, 'boy': 151, 'great': 470, 'balance': 90, 'difficulty': 108, 'laughter': 185, 'study': 1174, 'reports': 41, 'example': 3345, 'trial': 45, 'amount': 580, 'survey': 154, 'smoke': 140, 'social': 61, 'military': 116, 'adolescents': 85, 'pillow': 108, 'ventured': 65, 'useful': 165, 'secure': 86, 'Three': 294, 'private': 60, 'brought': 49, 'Book': 500, 'names': 106, 'stern': 140, 'glance': 593, 'sugar': 81, 'singing': 49, 'unit': 132, 'Europe': 86, 'Star': 58, 'Word': 172, 'would': 281, 'positive': 87, 'hospital': 84, 'contains': 68, 'two': 265, 'few': 2407, 'examination': 379, 'strike': 118, 'length': 52, 'type': 116, 'more': 653, 'door': 102, 'knows': 130, 'on': 1046, 'company': 58, 'about': 482, 'glass': 57, 'American': 162, 'broke': 175, 'riding': 81, 'known': 50, 'Procedure': 81, 'must': 125, 'account': 47, 'word': 364, 'heard': 54, 'this': 872, 'car': 249, 'work': 282, "patient's": 791, 'values': 398, 'can': 879, 'growing': 397, 'my': 657, 'could': 77, 'history': 325, 'control': 98, 'heart': 134, 'stream': 76, 'give': 6149, 'calculation': 51, 'choice': 77, 'in': 3620, 'organized': 44, 'escaped': 90, 'indicates': 206, 'way': 48, 'council': 1174, 'sense': 230, 'phrase': 77, 'species': 129, 'motion': 51, 'court': 137, 'syndrome': 42, 'rather': 44, 'discussion': 45, 'feature': 488, 'how': 196, 'cluster': 42, 'panic': 65, 'stock': 67, 'A': 180, 'description': 588, 'may': 764, 'after': 208, 'collection': 71, 'diagram': 103, 'wrong': 49, 'coming': 43, 'such': 96, 'Model': 158, 'law': 292, 'data': 276, 'View': 277, 'a': 4685, 'short': 76, 'Letters': 101, 'liquid': 50, 'object': 47, 'light': 525, 'element': 169, 'so': 71, 'allow': 67, 'furious': 196, 'representation': 140, 'order': 51, 'wind': 196, 'Catholic': 48, 'What': 119, 'passages': 231, 'office': 79, 'over': 312, 'soon': 45, 'years': 265, 'produced': 62, 'Popular': 46, 'through': 257, 'committee': 102, 'cold': 42, 'Nation': 99, 'its': 101, 'before': 57, 'style': 48, 'His': 67, 'group': 181, 'manuscript': 64, 'personal': 104, 'late': 50, 'listing': 106, 'weeks': 154, 'areas': 162, 'might': 67, 'finer': 76, 'then': 258, 'good': 1035, 'diseases': 234, 'practice': 108, 'significant': 503, 'Analysis': 336, 'compound': 162, 'communities': 68, 'hitherto': 45, 'day': 165, 'bread': 116, 'term': 52, 'document': 55, 'lies': 86, 'opera': 59, 'India': 75, 'stopped': 41, 'each': 355, 'found': 174, 'bond': 65, 'square': 80, 'From': 272, 'significantly': 133, 'series': 63, 'related': 239, 'society': 78, 'variety': 391, 'entering': 73, 'year': 48, 'girl': 59, 'summary': 117, 'special': 54, 'out': 115, 'shown': 44, 'factors': 57, 'vessels': 56, 'research': 277, 'William': 281, 'studying': 64, 'dominated': 78, 'Report': 82, 'issue': 139, 'red': 206, 'turning': 188, 'theory': 231, 'written': 55, 'given': 169, 'free': 41, 'Record': 607, 'British': 252, 'murmur': 46, 'estimate': 128, 'care': 55, 'definition': 51, 'training': 49, 'language': 71, 'National': 131, 'transition': 53, 'times': 54, 'filter': 45, 'turn': 72, 'conducted': 54, 'place': 123, 'Human': 607, 'stone': 51, 'fallen': 69, 'think': 119, 'precepts': 43, 'first': 72, 'major': 372, 'features': 50, 'Working': 82, 'striking': 492, 'clause': 146, 'number': 3781, 'one': 249, 'randomized': 46, 'Black': 249, 'wages': 118, 'open': 46, 'city': 43, 'story': 85, 'reference': 70, 'Dictionary': 364, 'their': 264, 'introduction': 45, 'twenty': 59, 'needed': 79, 'wonderful': 47, 'white': 274, 'man': 1467, 'Data': 742, 'friend': 580, 'B': 1426, 'Letter': 281, 'that': 1210, 'explanation': 192, 'treasures': 45, 'natural': 250, 'copy': 177, 'than': 71, 'Road': 65, 'History': 1041, 'wide': 95, 'kind': 339, 'third': 47, 'incision': 46, 'elected': 55, 'bits': 97, 'tree': 86, 'State': 139, 'bed': 45, 'matter': 164, 'patients': 148, 'determined': 117, 'Relation': 52, 'historical': 109, 'Series': 111, 'feeling': 74, 'result': 44, 'and': 3988, 'bridge': 71, 'false': 169, 'journey': 47, 'God': 46, 'defect': 45, 'comfortable': 116, 'rat': 72, 'have': 600, 'anger': 44, 'demonstration': 71, "God's": 84, 'relatively': 59, 'Christian': 215, 'built': 71, 'convenient': 123, 'note': 380, 'also': 772, 'potential': 66, 'which': 734, 'meeting': 76, 'finding': 44, 'Spirit': 102, 'Articles': 111, 'multiple': 44, 'fundamental': 70, 'any': 72, 'who': 514, 'said': 55, 'eight': 51, 'printed': 83, 'letter': 168, 'preliminary': 116, 'phase': 45, 'Biological': 52, 'model': 95, 'appear': 404, 'FROM': 319, 'surprising': 111, 'considered': 54, 'later': 2546, 'treaty': 146, 'part': 117, 'Film': 419, 'analogy': 165, 'barrel': 67, 'lying': 49, 'impression': 67, 'selection': 85, 'shot': 52, 'pope': 55, 'jurisdiction': 60, 'Wilson': 41, 'longitudinal': 52, 'relation': 194, 'conclusion': 116, 'essays': 85, 'repugnant': 60, 'European': 46, 'slow': 45, 'based': 81, 'true': 52, 'writer': 235, 'proportion': 96, 'catalogue': 176, 'French': 539, 'peculiar': 435, 'tears': 66, 'score': 59, 'York': 50, 'enthusiasm': 85, 'hope': 47, 'equilibrium': 317, 'do': 264, 'his': 169, 'means': 204, 'H': 66, 'membrane': 53, 'watch': 509, 'beam': 348, 'remedy': 57, 'cannot': 58, 'closely': 100, 'churches': 100, 'buyer': 116, 'during': 483, 'THE': 552, 'du': 65, 'him': 66, 'treatise': 131, 'enemy': 52, 'bar': 206, 'skull': 47, 'held': 1245, 'cry': 105, 'guns': 51, 'book': 101, 'common': 58, 'House': 191, 'view': 118, 'set': 669, 'questions': 210, 'fair': 1291, 'plausible': 86, 'Source': 52, 'computer': 71, 'Handbook': 58, 'are': 153, 'portrait': 137, 'White': 452, 'concern': 87, 'voyage': 251, 'wire': 42, 'compounds': 46, 'review': 994, 'label': 67, 'case': 122, 'state': 223, 'future': 71, 'between': 556, 'marriage': 48, 'approach': 320, 'multiplied': 166, 'C': 147, 'men': 1178, 'iron': 206, 'parent': 70, 'Study': 725, 'were': 884, 'agencies': 43, 'interfere': 57, 'incident': 253, 'rapid': 63, 'key': 383, 'less': 84, 'police': 52, 'AND': 70, 'reaction': 349, 'toward': 52, 'last': 87, 'license': 108, 'many': 56, 'drug': 111, 'whole': 820, 'asked': 66, 'experimental': 44, 'duty': 140, 'among': 122, 'merry': 46, 'point': 559, 'battery': 51, 'effective': 41, 'Mr': 52, 'colony': 43, 'community': 74, 'sampling': 51, 'Music': 85, 'consideration': 210, 'vessel': 84, 'sees': 52, 'Method': 127, 'better': 136, 'source': 117, 'addition': 65, 'been': 194, 'mark': 404, 'taste': 76, 'reduction': 43, 'Plan': 124, 'interest': 131, 'basic': 212, 'proposal': 125, 'hardly': 77, 'treatment': 284, 'wants': 42, 'pity': 52, 'life': 48, 'arises': 190, 'general': 290, 'fire': 43, 'offered': 78, 'systematic': 65, 'demand': 77, 'towns': 182, 'careful': 314, 'former': 68, 'applied': 582, 'Night': 223, 'Old': 131, 'look': 61, 'these': 1324, 'bill': 112, 'pretence': 70, 'single': 44, 'value': 90, 'General': 178, 'rope': 42, 'while': 143, 'College': 54, 'items': 94, 'report': 86, 'situation': 120, 'voice': 48, 'guide': 82, 'procedure': 632, 'layer': 44, 'is': 5518, 'it': 92, 'hardware': 71, 'player': 42, 'tenor': 52, 'comparative': 123, 'protein': 61, 'if': 76, 'different': 538, 'considerable': 219, 'patent': 69, 'pay': 124, 'began': 42, 'bowl': 116, 'same': 155, 'wherein': 47, 'corridor': 56, 'party': 53, 'stepped': 254, 'several': 169, 'higher': 118, 'independent': 75, 'I': 6513, 'comprehensive': 327, 'upon': 201, 'possible': 50, 'driven': 41, 'moment': 88, 'arrived': 41, 'arose': 46, 'student': 54, 'recent': 257, 'ingenious': 53, 'task': 142, "mother's": 514, 'makes': 46, 'cheek': 70, 'analysis': 154, 'thought': 43, 'person': 258, 'Philosophical': 134, 'relief': 148, 'thoroughly': 70, 'the': 19985, 'Dutch': 65, 'Service': 194, 'sentence': 68, 'being': 71, 'telegram': 118, 'Social': 452, 'Labor': 62, 'verses': 40, 'front': 104, 'thanks': 323, 'human': 68, 'alternative': 50, 'speed': 43, 'yet': 50, 'announcement': 65, 'now': 62, 'bibliography': 403, 'majority': 45, 'Far': 100, 'rose': 140, 'bell': 107, 'had': 531, 'simple': 229, 'subjects': 61, 'exhibition': 112, 'has': 822, 'Change': 80, 'On': 49, 'D': 42, 'fate': 67, 'couple': 722, 'Commons': 118, 'game': 54, 'judge': 86, 'world': 92, 'railroad': 58, 'cloud': 96, 'you': 175, 'color': 206, 'desire': 140, 'psychological': 127, 'like': 106, 'particle': 45, 'OF': 371, 'manual': 43, 'benefit': 42, 'individuals': 45, 'officer': 52, 'hung': 99, 'everyone': 54, 'security': 82, 'works': 58, 'page': 59, 'Little': 350, 'deal': 123, 'Army': 76, 'some': 60, 'back': 102, 'Commission': 46, 'growth': 41, 'sight': 46, 'New': 887, 'little': 467, 'Vienna': 131, 'scale': 54, 'leaf': 86, 'for': 2735, 'tube': 46, 'does': 406, 'market': 75, 'be': 784, 'particular': 171, 'blowing': 196, 'patient': 101, 'bold': 65, 'business': 41, 'decline': 77, 'seconds': 1878, 'rock': 56, 'graduates': 88, 'step': 52, 'Second': 84, 'Translated': 176, 'drawn': 64, 'post': 167, 'by': 964, 'First': 44, 'comparison': 290, 'limitation': 90, 'working': 55, 'extension': 113, 'of': 31799, 'newspapers': 269, 'practical': 217, 'union': 65, 'range': 51, 'important': 164, 'act': 44, 'tongue': 380, 'Captain': 46, 'slightly': 63, 'or': 421, 'silence': 41, 'pollution': 98, 'raised': 131, 'commodity': 90, 'communication': 192, 'obtaining': 146, 'satisfactory': 135, 'services': 51, 'Critical': 117, 'Approach': 237, 'your': 62, 'existed': 110, 'crowd': 71, 'her': 953, 'area': 126, 'gleam': 47, 'there': 60, 'question': 224, 'long': 207, 'print': 99, 'start': 123, 'lot': 818, 'scientific': 160, 'was': 3653, 'slight': 121, 'building': 50, 'from': 1530, 'an': 634, "child's": 92, 'bus': 41, 'but': 68, 'sophisticated': 59, 'volume': 157, 'Common': 59, 'cycle': 58, 'line': 87, 'with': 323, 'Guide': 1337, 'he': 57, 'made': 320, 'Ten': 53, 'characteristic': 108, 'inside': 48, 'up': 148, 'us': 46, 'record': 69, 'exploration': 58, 'emotional': 54, 'problem': 142, 'Women': 222, 'similar': 1295, 'called': 147, 'promises': 124, 'constant': 65, 'Introduction': 176, 'certain': 282, 'evidence': 169, 'as': 165, 'at': 2634, 'accounting': 319, 'plan': 55, 'face': 46, 'check': 404, 'no': 52, 'not': 508, 'corporation': 42, 'when': 66, 'wrought': 206, 'TO': 401, 'other': 869, 'role': 360, 'test': 228, 'Story': 49, 'Man': 46, 'Case': 62, 'picture': 180, 'plate': 57, 'behavioral': 60, 'poem': 44, 'occurred': 253, 'Look': 215, 'fell': 41, 'Baltimore': 48, 'HISTORY': 194, 'ago': 215, 'individual': 51, 'longer': 165, 'curve': 41, 'Governments': 139, 'portion': 47, 'came': 139, 'time': 324, 'fresh': 223, 'resolution': 78, 'decision': 218}
AB	{'of': 43, 'is': 43, 'the': 43, 'height': 43}
AND	{'A': 70, 'AND': 362, 'and': 47, 'ON': 52, 'OF': 1161, 'in': 804, 'LIBRARY': 74, 'etiology': 47, 'TO': 42, 'Printed': 804, 'IN': 333, 'NEW': 103, 'THE': 742, 'The': 47}
AT	{'his': 199, 'end': 199, 'OF': 48, 'of': 199, 'THE': 48, 'the': 199}
About	{'two': 291, 'Use': 102, 'there': 40, 'ten': 220, 'seven': 176, 'years': 220, 'four': 76, 'as': 86, 'Book': 178, 'our': 124, 'further': 76, 'Be': 105, 'from': 120, 'away': 47, 'six': 73, 'three': 47, 'also': 79, 'weeks': 40, 'we': 471, 'that': 79, 'I': 183, 'after': 251, 'here': 73, 'hours': 251, 'Against': 105, 'half': 124, 'THE': 108, 'one': 124, 'he': 79, 'ago': 220, 'on': 76, 'of': 402, 'age': 176, 'later': 40, 'State': 178, 'near': 43, 'miles': 196, 'time': 262, 'Our': 178, 'the': 563, 'spend': 108, 'came': 183}
Abuse	{'on': 509, 'Centre': 509, 'Canadian': 509}
Academy	{'a': 237, 'and': 75, 'Art': 75, 'Associate': 157, 'Students': 75, 'of': 545, 'Religion': 99, 'French': 237, 'Foreign': 157, 'Sciences': 237, 'American': 99, 'Collection': 52, 'in': 480, 'the': 764}
According	{'and': 94, 'the': 8698, 'International': 62, 'findings': 54, 'von': 68, 'some': 127, 'it': 93, 'an': 52, 'press': 127, 'proposition': 55, 'same': 259, 'in': 54, 'our': 186, 'person': 104, 'Rules': 94, 'writer': 93, 'ideas': 132, 'handed': 101, 'to': 10013, 'Alexander': 68, 'government': 539, 'who': 104, 'report': 539, 'one': 206, 'tariff': 49, 'tradition': 360, 'a': 717, 'German': 52, 'of': 8137, 'official': 154, 'reports': 127, 'this': 93, 'survey': 134, 'senior': 102, 'Technology': 62, 'nature': 7723, 'view': 87}
Account	{'and': 95, 'United': 735, 'of': 1379, 'An': 830, 'Historical': 95, 'battle': 322, 'the': 1057}
Act	{'and': 1371, 'summer': 400, 'own': 52, 'Reform': 87, 'is': 95, 'Bengal': 55, 'as': 56, 'United': 46, 'taking': 41, 'in': 1006, 'Department': 701, 'made': 207, 'your': 52, 'provided': 95, 'Parliament': 41, 'North': 189, 'for': 182, 'also': 56, 'provide': 46, 'discrimination': 67, 'been': 56, 'to': 623, 'setting': 52, 'passed': 158, 'Child': 113, 'America': 189, 'was': 298, 'important': 46, 'Societies': 93, 'Commerce': 45, 'Land': 55, 'broadly': 67, 'An': 526, 'Age': 126, 'Drug': 46, 'most': 43, 'said': 52, 'British': 189, 'Act': 250, 'such': 46, 'now': 52, 'Left': 113, 'on': 113, 'has': 56, 'of': 3169, 'originally': 56, 'sale': 95, 'or': 189, 'continue': 230, 'Elementary': 166, 'the': 2059, 'signal': 43, 'designed': 45}

In [10]:
# test on sample on emr
!python MakeStripes.py s3://ucb-mids-mls-katieadams/ngram-sample/gbooks_filtered_sample.txt -r emr --file '/Users/davidadams/Documents/W261/hw5/hw5_4/top10klist_wcounts.txt' --num-ec2-instances 2 --ec2-task-instance-type m1.small > stripes_small_emr.txt


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-1febc2c04977da79
using s3://mrjob-1febc2c04977da79/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035645.441002
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035645.441002/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-1febc2c04977da79/tmp/MakeStripes.davidadams.20151014.035645.441002/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-2TBKJW6D7Y2BO
Created new job flow j-2TBKJW6D7Y2BO
Job launched 31.1s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 62.2s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 93.3s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 124.3s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 155.6s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 186.7s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 217.9s ago, status STARTING: Configuring cluster software
Job launched 249.0s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 280.1s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 311.2s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 342.2s ago, status RUNNING: Running step (MakeStripes.davidadams.20151014.035645.441002: Step 1 of 1)
Job launched 373.3s ago, status RUNNING: Running step (MakeStripes.davidadams.20151014.035645.441002: Step 1 of 1)
Job launched 404.5s ago, status RUNNING: Running step (MakeStripes.davidadams.20151014.035645.441002: Step 1 of 1)
Job launched 435.5s ago, status RUNNING: Running step (MakeStripes.davidadams.20151014.035645.441002: Step 1 of 1)
Job launched 466.5s ago, status RUNNING: Running step (MakeStripes.davidadams.20151014.035645.441002: Step 1 of 1)
Job completed.
Running time was 140.0s (not counting time spent waiting for the EC2 instances)
ec2_key_pair_file not specified, going to S3
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
  (no counters found)
Streaming final output from s3://mrjob-1febc2c04977da79/tmp/MakeStripes.davidadams.20151014.035645.441002/output/
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/MakeStripes.davidadams.20151014.035645.441002
Removing all files in s3://mrjob-1febc2c04977da79/tmp/MakeStripes.davidadams.20151014.035645.441002/
Removing all files in s3://mrjob-1febc2c04977da79/tmp/logs/j-2TBKJW6D7Y2BO/
Terminating job flow: j-2TBKJW6D7Y2BO

In [11]:
!head stripes_small_emr.txt


A	{'all': 551, 'evidence': 169, 'issued': 111, 'lack': 85, 'people': 757, 'month': 264, 'consists': 45, 'revealed': 118, 'sung': 52, 'whose': 70, 'young': 120, 'to': 4817, 'program': 49, 'Western': 42, 'buyer': 116, 'smile': 175, 'pleasing': 46, 'presentation': 43, 'worth': 175, 'sent': 213, 'division': 52, 'woman': 208, 'seized': 65, 'very': 668, 'Constitutional': 187, 'wave': 148, 'medicine': 108, 'instruments': 82, 'Financial': 62, 'difference': 384, 'dim': 43, 'and': 3988, 'presented': 42, 'list': 88, 'large': 1097, 'sand': 45, 'small': 252, 'round': 251, 'Select': 118, 'TO': 401, 'force': 41, 'findings': 117, 'trend': 626, 'sigh': 46, 'streets': 154, 'Rev': 52, 'direct': 64, 'pulse': 47, 'go': 42, 'second': 739, 'perspective': 127, 'lawyer': 50, 'rigorous': 110, 'further': 408, 'air': 98, 'shining': 43, 'established': 43, 'appear': 404, 'celebrated': 52, 'brief': 88, 'current': 80, 'waiting': 161, 'version': 598, 'treatise': 131, 'churches': 100, 'learned': 53, 'appeal': 57, 'method': 619, 'filled': 46, 'body': 661, 'full': 703, 'led': 42, 'Treatise': 178, 'degree': 55, 'never': 184, 'understanding': 52, 'water': 84, 'pause': 49, 'focused': 83, 'English': 65, 'consideration': 210, 'Distribution': 64, 'My': 355, 'teacher': 68, 'change': 119, 'boy': 151, 'great': 470, 'voice': 48, 'Record': 607, 'items': 94, 'study': 1174, 'treatment': 284, 'reports': 41, 'example': 3345, 'trial': 45, 'amount': 580, 'survey': 154, 'smoke': 140, 'social': 61, 'military': 116, 'adolescents': 85, 'pillow': 108, 'fell': 41, 'ventured': 65, 'useful': 165, 'secure': 86, 'Three': 294, 'private': 60, 'brought': 49, 'Book': 500, 'names': 106, 'stern': 140, 'glance': 593, 'singing': 49, 'market': 75, 'Europe': 86, 'Star': 58, 'Word': 172, 'working': 55, 'positive': 87, 'hospital': 84, 'contains': 68, 'two': 265, 'few': 2407, 'examination': 379, 'strike': 118, 'type': 116, 'more': 653, 'door': 102, 'knows': 130, 'comparison': 290, 'company': 58, 'about': 482, 'glass': 57, 'American': 162, 'broke': 175, 'particular': 171, 'known': 50, 'Procedure': 81, 'must': 125, 'account': 47, 'word': 364, 'heard': 54, 'this': 872, 'car': 249, 'work': 282, "patient's": 791, 'values': 398, 'can': 879, 'growing': 397, 'laughter': 185, 'could': 77, 'history': 325, 'control': 98, 'heart': 134, 'stream': 76, 'give': 6149, 'calculation': 51, 'choice': 77, 'sense': 230, 'organized': 44, 'escaped': 90, 'indicates': 206, 'council': 1174, 'Dutch': 65, 'phrase': 77, 'species': 129, 'motion': 51, 'court': 137, 'syndrome': 42, 'rather': 44, 'discussion': 45, 'length': 52, 'how': 196, 'H': 66, 'cluster': 42, 'record': 69, 'panic': 65, 'stock': 67, 'A': 180, 'description': 588, 'may': 764, 'after': 208, 'collection': 71, 'diagram': 103, 'wrong': 49, 'coming': 43, 'such': 96, 'Model': 158, 'law': 292, 'data': 276, 'man': 1467, 'a': 4685, 'short': 76, 'Letters': 101, 'effective': 41, 'light': 525, 'element': 169, 'so': 71, 'allow': 67, 'furious': 196, 'representation': 140, 'order': 51, 'wind': 196, 'Catholic': 48, 'What': 119, 'passages': 231, 'office': 79, 'over': 312, 'portion': 47, 'soon': 45, 'feature': 488, 'produced': 62, 'Popular': 46, 'through': 257, 'committee': 102, 'White': 452, 'wherein': 47, 'its': 101, 'before': 57, 'style': 48, 'His': 67, 'group': 181, 'manuscript': 64, 'personal': 104, 'late': 50, 'essays': 85, 'listing': 106, 'weeks': 154, 'might': 67, 'finer': 76, 'then': 258, 'good': 1035, 'diseases': 234, 'practice': 108, 'Analysis': 336, 'compound': 162, 'communities': 68, 'now': 62, 'day': 165, 'bread': 116, 'term': 52, 'document': 55, 'lies': 86, 'opera': 59, 'India': 75, 'applied': 582, 'stopped': 41, 'each': 355, 'found': 174, 'bond': 65, 'square': 80, 'From': 272, 'significantly': 133, 'series': 63, 'related': 239, 'society': 78, 'variety': 391, 'entering': 73, 'year': 48, 'girl': 59, 'special': 54, 'out': 115, 'shown': 44, 'factors': 57, 'vessels': 56, 'research': 277, 'William': 281, 'dominated': 78, 'Report': 82, 'issue': 139, 'red': 206, 'turning': 188, 'theory': 231, 'written': 55, 'story': 85, 'free': 41, 'difficulty': 108, 'British': 252, 'murmur': 46, 'estimate': 128, 'care': 55, 'definition': 51, 'training': 49, 'language': 71, 'question': 224, 'National': 131, 'transition': 53, 'times': 54, 'filter': 45, 'turn': 72, 'conducted': 54, 'place': 123, 'Human': 607, 'fallen': 69, 'think': 119, 'precepts': 43, 'first': 72, 'major': 372, 'features': 50, 'Working': 82, 'striking': 492, 'sentence': 68, 'clause': 146, 'number': 3781, 'one': 249, 'randomized': 46, 'Black': 249, 'wages': 118, 'open': 46, 'riding': 81, 'city': 43, 'given': 169, 'fair': 1291, 'Dictionary': 364, 'needed': 79, 'introduction': 45, 'twenty': 59, 'their': 264, 'wonderful': 47, 'white': 274, 'View': 277, 'Data': 742, 'friend': 580, 'B': 1426, 'lot': 818, 'that': 1210, 'explanation': 192, 'D': 42, 'part': 117, 'natural': 250, 'copy': 177, 'than': 71, 'Road': 65, 'History': 1041, 'wide': 95, 'kind': 339, 'liquid': 50, 'incision': 46, 'elected': 55, 'bits': 97, 'tree': 86, 'State': 139, 'bed': 45, 'peculiar': 435, 'matter': 164, 'patients': 148, 'determined': 117, 'Relation': 52, 'iron': 206, 'Series': 111, 'toward': 52, 'result': 44, 'ingenious': 53, 'bridge': 71, 'sees': 52, 'God': 46, 'defect': 45, 'Story': 49, 'comfortable': 116, 'rat': 72, 'have': 600, 'anger': 44, 'my': 657, 'demonstration': 71, "God's": 84, 'relatively': 59, 'hitherto': 45, 'Christian': 215, 'built': 71, 'thoroughly': 70, 'note': 380, 'also': 772, 'subjects': 61, 'analogy': 165, 'meeting': 76, 'finding': 44, 'Spirit': 102, 'Articles': 111, 'multiple': 44, 'fundamental': 70, 'any': 72, 'object': 47, 'said': 55, 'eight': 51, 'printed': 83, 'significant': 503, 'preliminary': 116, 'phase': 45, 'Biological': 52, 'model': 95, 'stands': 55, 'FROM': 319, 'surprising': 111, 'considered': 54, 'later': 2546, 'treaty': 146, 'treasures': 45, 'Film': 419, 'barrel': 67, 'came': 139, 'impression': 67, 'selection': 85, 'shot': 52, 'pope': 55, 'jurisdiction': 60, 'Wilson': 41, 'longitudinal': 52, 'relation': 194, 'fine': 4722, 'repugnant': 60, 'European': 46, 'slow': 45, 'based': 81, 'with': 323, 'writer': 235, 'proportion': 96, 'state': 223, 'French': 539, 'sugar': 81, 'tears': 66, 'score': 59, 'York': 50, 'enthusiasm': 85, 'hope': 47, 'equilibrium': 317, 'plate': 57, 'his': 169, 'means': 204, 'commodity': 90, 'membrane': 53, 'watch': 509, 'beam': 348, 'remedy': 57, 'cannot': 58, 'closely': 100, 'new': 6985, 'report': 86, 'during': 483, 'THE': 552, 'years': 265, 'him': 66, 'areas': 162, 'enemy': 52, 'bar': 206, 'Ten': 53, 'held': 1245, 'cry': 105, 'summary': 117, 'other': 869, 'common': 58, 'characteristic': 108, 'view': 118, 'set': 669, 'human': 68, 'reference': 70, 'plausible': 86, 'Source': 52, 'computer': 71, 'Handbook': 58, 'are': 153, 'portrait': 137, 'cold': 42, 'battery': 51, 'concern': 87, 'voyage': 251, 'wire': 42, 'compounds': 46, 'review': 994, 'label': 67, 'catalogue': 176, 'future': 71, 'between': 556, 'drawn': 64, 'approach': 320, 'multiplied': 166, 'C': 147, 'men': 1178, 'historical': 109, 'parent': 70, 'Study': 725, 'were': 884, 'agencies': 43, 'interfere': 57, 'rope': 42, 'incident': 253, 'merry': 46, 'limitation': 90, 'police': 52, 'Letter': 281, 'AND': 70, 'reaction': 349, 'feeling': 74, 'last': 87, 'license': 108, 'many': 56, 'drug': 111, 'whole': 820, 'asked': 66, 'experimental': 44, 'duty': 140, 'among': 122, 'key': 383, 'bell': 107, 'simple': 229, 'Mr': 52, 'colony': 43, 'community': 74, 'sampling': 51, 'Music': 85, 'path': 51, 'vessel': 84, 'false': 169, 'Method': 127, 'better': 136, 'addition': 65, 'been': 194, 'mark': 404, 'taste': 76, 'reduction': 43, 'Plan': 124, 'interest': 131, 'basic': 212, 'proposal': 125, 'hardly': 77, 'Women': 222, 'wants': 42, 'pity': 52, 'life': 48, 'arises': 190, 'fire': 43, 'offered': 78, 'systematic': 65, 'demand': 77, 'towns': 182, 'careful': 314, 'former': 68, 'case': 122, 'Night': 223, 'Old': 131, 'look': 61, 'these': 1324, 'bill': 112, 'pretence': 70, 'single': 44, 'value': 90, 'General': 178, 'will': 601, 'while': 143, 'College': 54, 'situation': 120, 'balance': 90, 'guide': 82, 'procedure': 632, 'layer': 44, 'is': 5518, 'it': 92, 'Man': 46, 'hardware': 71, 'player': 42, 'unit': 132, 'tenor': 52, 'comparative': 123, 'protein': 61, 'if': 76, 'different': 538, 'considerable': 219, 'patent': 69, 'pay': 124, 'began': 42, 'bowl': 116, 'same': 155, 'Nation': 99, 'guns': 51, 'corridor': 56, 'party': 53, 'stepped': 254, 'several': 169, 'higher': 118, 'independent': 75, 'du': 65, 'I': 6513, 'comprehensive': 327, 'upon': 201, 'possible': 50, 'driven': 41, 'moment': 88, 'arrived': 41, 'arose': 46, 'student': 54, 'recent': 257, 'task': 142, 'in': 3620, "mother's": 514, 'makes': 46, 'cheek': 70, 'analysis': 154, 'thought': 43, 'person': 258, 'Philosophical': 134, 'relief': 148, 'studying': 64, 'the': 19985, 'Service': 194, 'less': 84, 'being': 71, 'telegram': 118, 'Social': 452, 'Labor': 62, 'verses': 40, 'front': 104, 'thanks': 323, 'questions': 210, 'alternative': 50, 'speed': 43, 'yet': 50, 'announcement': 65, 'rapid': 63, 'bibliography': 403, 'majority': 45, 'Far': 100, 'rose': 140, 'point': 559, 'had': 531, 'source': 117, 'potential': 66, 'exhibition': 112, 'has': 822, 'Change': 80, 'On': 49, 'which': 734, 'fate': 67, 'couple': 722, 'Commons': 118, 'game': 54, 'judge': 86, 'world': 92, 'railroad': 58, 'color': 206, 'desire': 140, 'psychological': 127, 'like': 106, 'particle': 45, 'OF': 371, 'manual': 43, 'benefit': 42, 'individuals': 45, 'officer': 52, 'hung': 99, 'everyone': 54, 'security': 82, 'works': 58, 'page': 59, 'Little': 350, 'who': 514, 'deal': 123, 'Army': 76, 'do': 264, 'some': 60, 'back': 102, 'Commission': 46, 'growth': 41, 'sight': 46, 'New': 887, 'little': 467, 'Vienna': 131, 'scale': 54, 'leaf': 86, 'for': 2735, 'tube': 46, 'does': 406, 'cloud': 96, 'be': 784, 'marriage': 48, 'blowing': 196, 'patient': 101, 'bold': 65, 'business': 41, 'decline': 77, 'seconds': 1878, 'rock': 56, 'graduates': 88, 'step': 52, 'Translated': 176, 'from': 1530, 'post': 167, 'by': 964, 'First': 44, 'on': 1046, 'stone': 51, 'would': 281, 'extension': 113, 'of': 31799, 'newspapers': 269, 'practical': 217, 'convenient': 123, 'range': 51, 'journey': 47, 'act': 44, 'tongue': 380, 'Captain': 46, 'slightly': 63, 'or': 421, 'silence': 41, 'pollution': 98, 'raised': 131, 'letter': 168, 'communication': 192, 'obtaining': 146, 'satisfactory': 135, 'services': 51, 'Critical': 117, 'Approach': 237, 'your': 62, 'existed': 110, 'crowd': 71, 'her': 953, 'area': 126, 'gleam': 47, 'there': 60, 'print': 99, 'long': 207, 'start': 123, 'way': 48, 'scientific': 160, 'was': 3653, 'slight': 121, 'third': 47, 'building': 50, 'an': 634, "child's": 92, 'bus': 41, 'but': 68, 'sophisticated': 59, 'volume': 157, 'Common': 59, 'cycle': 58, 'line': 87, 'true': 52, 'Guide': 1337, 'he': 57, 'made': 320, 'skull': 47, 'House': 191, 'inside': 48, 'up': 148, 'us': 46, 'placed': 108, 'exploration': 58, 'emotional': 54, 'problem': 142, 'Governments': 139, 'similar': 1295, 'called': 147, 'promises': 124, 'constant': 65, 'Introduction': 176, 'certain': 282, 'general': 290, 'as': 165, 'at': 2634, 'accounting': 319, 'plan': 55, 'face': 46, 'check': 404, 'no': 52, 'not': 508, 'corporation': 42, 'when': 66, 'wrought': 206, 'application': 217, 'book': 101, 'role': 360, 'test': 228, 'you': 175, 'conclusion': 116, 'Case': 62, 'picture': 180, 'behavioral': 60, 'poem': 44, 'occurred': 253, 'Look': 215, 'important': 164, 'union': 65, 'Baltimore': 48, 'HISTORY': 194, 'ago': 215, 'individual': 51, 'longer': 165, 'curve': 41, 'Second': 84, 'lying': 49, 'time': 324, 'fresh': 223, 'resolution': 78, 'decision': 218}
AB	{'of': 43, 'is': 43, 'the': 43, 'height': 43}
AND	{'and': 47, 'A': 70, 'ON': 52, 'OF': 1161, 'in': 804, 'AND': 362, 'LIBRARY': 74, 'etiology': 47, 'TO': 42, 'Printed': 804, 'IN': 333, 'NEW': 103, 'The': 47, 'THE': 742}
AT	{'his': 199, 'end': 199, 'OF': 48, 'of': 199, 'THE': 48, 'the': 199}
About	{'seven': 176, 'there': 40, 'ten': 220, 'Use': 102, 'two': 291, 'four': 76, 'as': 86, 'Book': 178, 'further': 76, 'our': 124, 'one': 124, 'Be': 105, 'from': 120, 'away': 47, 'six': 73, 'three': 47, 'also': 79, 'weeks': 40, 'we': 471, 'that': 79, 'I': 183, 'after': 251, 'here': 73, 'hours': 251, 'Against': 105, 'half': 124, 'THE': 108, 'years': 220, 'he': 79, 'ago': 220, 'on': 76, 'of': 402, 'age': 176, 'later': 40, 'State': 178, 'near': 43, 'miles': 196, 'time': 262, 'Our': 178, 'the': 563, 'spend': 108, 'came': 183}
Abuse	{'on': 509, 'Centre': 509, 'Canadian': 509}
Academy	{'and': 75, 'a': 237, 'Art': 75, 'Associate': 157, 'Students': 75, 'of': 545, 'Sciences': 237, 'Collection': 52, 'Foreign': 157, 'Religion': 99, 'American': 99, 'French': 237, 'in': 480, 'the': 764}
According	{'and': 94, 'International': 62, 'findings': 54, 'von': 68, 'some': 127, 'it': 93, 'an': 52, 'press': 127, 'proposition': 55, 'same': 259, 'in': 54, 'our': 186, 'person': 104, 'Rules': 94, 'writer': 93, 'ideas': 132, 'handed': 101, 'to': 10013, 'Alexander': 68, 'government': 539, 'who': 104, 'report': 539, 'one': 206, 'tariff': 49, 'tradition': 360, 'a': 717, 'German': 52, 'of': 8137, 'official': 154, 'reports': 127, 'senior': 102, 'this': 93, 'survey': 134, 'the': 8698, 'Technology': 62, 'nature': 7723, 'view': 87}
Account	{'and': 95, 'United': 735, 'of': 1379, 'An': 830, 'Historical': 95, 'battle': 322, 'the': 1057}
Act	{'and': 1371, 'summer': 400, 'own': 52, 'Reform': 87, 'is': 95, 'Bengal': 55, 'as': 56, 'United': 46, 'in': 1006, 'Department': 701, 'Drug': 46, 'your': 52, 'provided': 95, 'Parliament': 41, 'North': 189, 'for': 182, 'to': 623, 'provide': 46, 'discrimination': 67, 'been': 56, 'also': 56, 'setting': 52, 'America': 189, 'passed': 158, 'Child': 113, 'has': 56, 'was': 298, 'originally': 56, 'important': 46, 'Societies': 93, 'Commerce': 45, 'Land': 55, 'broadly': 67, 'Age': 126, 'British': 189, 'most': 43, 'said': 52, 'designed': 45, 'Act': 250, 'such': 46, 'now': 52, 'Left': 113, 'on': 113, 'made': 207, 'of': 3169, 'signal': 43, 'or': 189, 'sale': 95, 'An': 526, 'continue': 230, 'Elementary': 166, 'the': 2059, 'taking': 41}

In [ ]:
# run on all data on emr
!python MakeStripes.py s3://filtered-5grams/* -r emr --file '/Users/davidadams/Documents/W261/hw5/hw5_4/top10klist_wcounts.txt' --num-ec2-instances 5 --ec2-task-instance-type m1.small > stripes_all_emr.txt
# for output see attached pdf for 5.4.1

In [12]:
!head stripes_all_emr.txt


AIDS	{'limited': 91, 'all': 1387, 'evidence': 142, 'caused': 1399, 'global': 74, 'results': 159, 'four': 43, 'pathogenesis': 935, 'Department': 155, 'children': 1068, 'causes': 176, 'whose': 42, 'suicide': 50, 'disorders': 61, 'young': 124, 'created': 44, 'to': 19694, 'program': 210, 'those': 1444, 'stable': 69, 'Day': 54, 'degeneration': 73, 'Training': 702, 'woman': 68, 'agents': 90, 'risk': 4882, 'very': 137, 'J': 366, 'continues': 90, 'surgeon': 48, 'difference': 172, 'condition': 57, 'states': 66, 'treating': 59, 'Reference': 83, 'presented': 117, 'transmission': 850, 'cause': 1548, 'Global': 4057, 'list': 215, 'solution': 101, 'small': 79, 'Impact': 520, 'prevent': 1555, 'work': 90, 'occurrence': 60, 'direct': 43, 'battle': 52, 'rate': 369, 'cost': 165, 'Organization': 122, 'Red': 51, 'educate': 74, 'stands': 55, 'contracted': 492, 'attitudes': 135, 'issues': 98, 'new': 1788, 'increasing': 114, 'public': 316, 'movement': 48, 'full': 43, 'Foundation': 342, 'Age': 755, 'men': 878, 'understanding': 48, 'lack': 42, 'reported': 4125, 'cultural': 140, 'groups': 48, 'active': 130, 'safer': 75, 'Changes': 55, 'appears': 103, 'State': 253, 'study': 174, 'commonly': 56, 'Children': 251, 'trial': 51, 'simply': 107, 'breakdown': 58, 'diagnosed': 787, 'social': 337, 'action': 251, 'studies': 48, 'punishment': 115, 'control': 232, 'Group': 1400, 'campaign': 48, 'danger': 217, 'When': 100, 'psychological': 129, 'Book': 45, 'Health': 281, 'Treatment': 153, 'counts': 48, 'total': 814, 'crisis': 1540, 'use': 261, 'from': 6219, 'working': 97, 'positive': 880, 'prevalence': 292, "world's": 172, 'next': 161, 'call': 51, 'criteria': 68, 'stage': 123, 'more': 708, 'occurred': 107, 'infection': 4943, 'tested': 419, 'American': 644, 'adult': 257, 'particular': 47, 'known': 190, 'rare': 60, 'midst': 328, 'women': 1237, 'topic': 130, 'rights': 185, 'this': 338, 'local': 52, 'challenge': 198, 'clients': 63, 'halt': 201, 'tissue': 68, 'era': 583, 'December': 54, 'believed': 67, 'meet': 68, 'male': 160, 'National': 2577, 'history': 658, 'Conference': 4831, 'predict': 114, 'Fourth': 63, 'purposes': 52, 'agent': 237, 'high': 1315, 'numbers': 303, 'tract': 78, 'allowed': 99, 'Policy': 495, 'Australian': 130, 'occur': 138, 'information': 229, 'needs': 169, 'end': 478, 'syndrome': 975, 'HIV': 11995, 'carriers': 245, 'Journal': 45, 'get': 254, 'feature': 53, 'immune': 920, 'lesions': 384, 'if': 43, 'III': 95, 'A': 482, 'rise': 149, 'threat': 1218, 'may': 985, 'after': 617, 'Disease': 101, 'per': 400, 'fatigue': 78, 'blood': 187, 'Yale': 101, 'Swiss': 59, 'such': 1877, 'marrow': 66, 'response': 1345, 'man': 489, 'a': 9330, 'refuse': 56, 'organization': 96, 'natural': 247, 'neck': 54, 'third': 123, 'Mexico': 70, 'light': 243, 'Community': 83, 'One': 147, 'Politics': 610, 'Trust': 75, 'order': 108, 'talk': 59, 'restriction': 190, 'Nervous': 457, 'over': 41, 'raised': 99, 'years': 355, 'brain': 356, 'through': 525, 'signs': 67, 'White': 1014, 'gastrointestinal': 150, 'its': 375, 'before': 271, 'Body': 87, 'rapidly': 43, 'how': 204, 'susceptible': 57, "Women's": 153, 'late': 208, 'window': 53, 'vast': 45, 'World': 141, 'Italian': 108, 'might': 88, 'image': 51, 'Sound': 90, 'eventually': 71, 'someone': 54, 'affected': 145, 'diseases': 1577, 'Psychological': 55, 'number': 2542, 'example': 110, 'effects': 549, 'Social': 752, 'not': 2463, 'documented': 54, 'hearing': 40, 'association': 340, 'Control': 223, 'growing': 189, 'San': 882, 'profound': 58, 'advent': 1426, 'identified': 97, 'did': 48, 'magnitude': 160, 'transmitted': 1176, 'Council': 75, 'die': 279, 'found': 457, 'bone': 66, 'referred': 62, 'reactions': 81, 'From': 67, 'primarily': 53, 'weight': 73, 'series': 63, 'reduce': 521, 'related': 688, 'contracting': 949, 'Angeles': 291, 'year': 164, 'relates': 52, 'sexual': 235, 'special': 90, 'living': 1800, 'lovers': 42, 'psychology': 124, 'intercourse': 62, 'factors': 233, 'cerebral': 93, 'since': 68, 'research': 213, 'increase': 254, 'Construction': 76, 'health': 293, 'Report': 436, 'receiving': 111, 'issue': 574, 'announced': 75, 'perception': 293, 'belief': 171, 'interactions': 105, 'Work': 47, 'sensory': 82, 'attending': 46, 'members': 84, 'Studies': 1100, 'care': 1105, 'definition': 1958, 'training': 78, 'service': 52, 'People': 427, 'Federation': 130, 'could': 157, 'times': 69, 'thing': 93, 'conducted': 57, 'Human': 1015, 'consequence': 123, 'first': 1641, 'origin': 321, 'already': 257, 'Working': 75, 'Review': 45, 'primary': 285, 'Asia': 47, 'one': 762, 'Program': 61, 'another': 44, 'changing': 102, 'carry': 103, 'AIDS': 5624, 'quality': 65, 'size': 54, 'little': 50, 'management': 796, 'North': 47, 'likely': 372, 'introduction': 123, 'leading': 238, 'system': 1058, 'their': 211, 'rates': 235, 'listed': 156, 'percentage': 56, 'final': 69, 'Association': 1736, 'prone': 49, 'B': 107, 'relationship': 248, 'that': 5223, 'brains': 162, 'transverse': 45, 'part': 144, 'manifestation': 555, 'believe': 43, 'than': 535, 'Advisory': 52, 'anxiety': 53, 'course': 71, 'second': 107, 'limiting': 80, 'patients': 33134, 'absence': 171, 'risks': 143, 'sole': 147, 'were': 1358, 'treated': 679, 'result': 535, 'and': 49389, 'Information': 128, 'half': 162, 'analysis': 51, 'have': 4262, 'need': 194, 'seen': 1824, 'subjects': 166, "God's": 72, 'relatively': 91, 'States': 69, 'Diseases': 442, 'Resources': 835, 'responsible': 197, 'isolated': 163, 'contact': 58, 'onset': 1703, 'majority': 800, 'potential': 46, 'generalized': 74, 'which': 562, 'transmit': 144, 'With': 501, 'towards': 103, 'pain': 116, 'combat': 115, 'collaboration': 120, 'incidence': 1321, 'sequence': 561, 'most': 520, 'virus': 8483, 'said': 45, 'Crisis': 359, 'representation': 85, 'services': 314, 'The': 4149, 'cases': 12976, 'million': 114, 'especially': 219, 'consequences': 53, 'clear': 123, 'later': 120, 'implications': 232, 'disease': 2177, 'face': 165, 'Table': 81, 'definitely': 91, 'Survey': 74, 'Prevention': 1946, 'notion': 57, 'fact': 251, 'Political': 130, 'Research': 2207, 'suffer': 48, 'fear': 343, 'decade': 429, 'impact': 3653, 'partners': 150, 'slow': 376, 'knowledge': 83, 'became': 123, 'nervous': 427, 'proportion': 744, 'French': 48, 'should': 568, 'tears': 55, 'rural': 147, 'black': 80, 'York': 439, 'developed': 108, 'go': 443, 'thousands': 48, 'mortality': 230, 'do': 177, 'his': 105, 'means': 49, 'stop': 67, 'lead': 52, 'cannot': 105, 'tuberculosis': 279, 'international': 110, 'during': 50, 'THE': 509, 'Once': 59, 'scene': 42, 'countries': 256, 'New': 487, 'conditions': 75, 'aspects': 253, 'experience': 63, 'common': 1934, 'Education': 893, 'where': 59, 'view': 118, 'set': 47, 'aspect': 86, 'reference': 176, 'acquired': 970, 'attached': 43, 'testing': 97, 'sex': 228, 'observed': 62, 'Source': 189, 'are': 3658, 'Sex': 97, 'subject': 240, 'evaluation': 61, 'See': 114, 'tend': 48, 'unable': 50, 'between': 478, 'progress': 50, 'approach': 116, 'alveolar': 52, 'attention': 63, 'terms': 85, 'responses': 50, 'parent': 53, 'nature': 71, 'Study': 1339, 'lover': 100, 'Manual': 47, 'extent': 51, 'African': 322, 'group': 92, 'come': 48, 'problem': 367, 'responded': 99, 'AND': 67, 'received': 46, 'improved': 55, 'money': 322, 'country': 129, 'ill': 45, 'against': 930, 'conference': 174, 'manifestations': 124, 'Statistical': 126, 'expression': 201, 'progression': 1830, 'comes': 51, 'duty': 49, 'among': 5407, 'Death': 63, 'cancer': 683, 'can': 1114, 'appeared': 42, 'period': 293, 'epidemic': 14055, 'Sciences': 92, 'anti': 105, 'recognized': 139, 'pulmonary': 49, 'caring': 581, 'symptoms': 224, 'complication': 48, 'described': 139, 'raise': 250, 'addition': 43, 'know': 159, 'political': 77, 'detected': 62, 'immunodeficiency': 257, 'strategy': 62, 'Early': 236, 'whom': 80, 'California': 50, 'treat': 471, 'maintenance': 192, 'important': 41, 'Women': 1098, 'field': 51, 'life': 252, 'tissues': 78, 'Center': 2131, 'gay': 1247, 'reaction': 44, 'exposed': 59, 'Nations': 150, 'cent': 475, 'wake': 108, 'liver': 67, 'lung': 84, 'child': 507, 'general': 208, 'present': 56, 'case': 2466, 'Force': 1152, 'resulting': 44, 'developing': 823, 'expanded': 251, 'appearance': 295, 'as': 5743, 'will': 935, 'tragedy': 119, 'Clinical': 1800, 'behavior': 140, 'aid': 40, 'property': 79, 'region': 71, 'City': 71, 'VIII': 783, 'is': 12459, 'it': 285, 'strain': 58, 'Task': 1152, 'drug': 782, 'in': 57083, 'partner': 49, 'worry': 148, 'greater': 61, 'confirmed': 41, 'develop': 734, 'severity': 145, 'Sixth': 81, 'began': 187, 'any': 59, 'cope': 69, 'complex': 473, 'outbreak': 249, 'higher': 105, 'development': 2202, 'used': 121, 'context': 528, 'I': 165, 'reporting': 83, 'effect': 519, 'IV': 124, 'director': 112, 'persons': 3385, 'levels': 143, 'two': 46, 'frequently': 64, 'nation': 53, 'hepatitis': 260, 'Project': 544, 'antibodies': 134, 'older': 180, 'adolescents': 2205, 'Is': 54, 'well': 621, 'increases': 48, 'fighting': 48, 'thought': 56, 'person': 850, 'without': 921, 'In': 1266, 'therapy': 60, 'the': 97212, 'prevention': 1246, 'Native': 372, 'Living': 345, 'United': 3522, 'just': 67, 'being': 124, 'when': 164, 'accurate': 46, 'executive': 68, 'communities': 68, 'human': 224, 'world': 533, 'facts': 59, 'yet': 50, 'death': 828, 'Aspects': 64, 'also': 159, 'Service': 47, 'characterization': 113, 'had': 2564, 'spread': 6773, 'parents': 413, 'treatment': 1407, 'has': 3135, 'divided': 42, 'victims': 283, 'increased': 195, 'loss': 342, 'Many': 57, 'survival': 58, 'possible': 107, 'early': 119, 'suffering': 695, 'five': 100, 'community': 199, 'confined': 45, 'using': 255, 'emergence': 99, 'Care': 366, 'usually': 44, 'advanced': 583, 'occurring': 122, 'ON': 45, 'like': 102, 'lost': 53, 'Los': 291, 'OF': 57, 'continue': 182, 'become': 161, 'right': 45, 'old': 192, 'often': 93, 'deal': 429, 'people': 7142, 'West': 440, 'creation': 56, 'some': 104, 'System': 457, 'Commission': 659, 'trends': 84, 'economic': 376, 'Cooperation': 71, 'International': 5311, 'indirect': 45, 'recognition': 50, 'Economy': 130, 'both': 413, 'for': 15123, 'contacts': 154, 'avoid': 97, 'Seventh': 620, 'legal': 47, 'Fund': 2328, 'defined': 70, 'intravenous': 399, 'cord': 73, 'be': 3470, 'who': 4791, 'patient': 880, 'stem': 45, 'deadly': 44, 'behaviors': 79, 'by': 4848, 'First': 83, 'on': 11836, 'about': 1951, 'central': 291, 'origins': 124, 'getting': 285, 'of': 98731, 'violence': 48, 'US': 472, 'costs': 45, 'carrier': 99, 'Americans': 90, 'Committee': 62, 'trials': 149, 'mean': 113, 'or': 7138, 'outset': 85, 'presence': 286, 'into': 380, 'within': 145, 'son': 108, 'chronic': 194, 'because': 536, 'receptor': 980, 'been': 3554, 'female': 88, 'reducing': 57, 'Approach': 92, 'intervention': 46, 'her': 68, 'support': 210, 'initial': 353, 'limit': 138, 'long': 96, 'fight': 817, 'start': 244, 'clinical': 352, 'infected': 939, 'South': 265, 'way': 263, 'resulted': 99, 'was': 3207, 'only': 182, 'form': 49, 'funds': 51, 'regard': 58, 'but': 164, 'contract': 60, 'construction': 151, 'link': 42, 'highest': 118, 'with': 53780, 'Guide': 77, 'he': 508, 'made': 182, 'Effect': 61, 'Office': 632, 'deficiency': 722, 'Africa': 1937, 'up': 47, 'different': 43, 'Society': 321, 'growth': 96, 'complications': 231, 'distribution': 52, 'stages': 201, 'similar': 67, 'medical': 560, 'associated': 1578, 'adults': 517, 'deaths': 311, 'nurses': 46, 'infections': 298, 'sexually': 1838, 'Cross': 51, 'an': 2336, 'single': 100, 'cure': 1626, 'at': 3593, 'home': 105, 'politics': 173, 'education': 330, 'ward': 47, 'Francisco': 809, 'dying': 1142, 'no': 952, 'personnel': 64, 'percent': 925, 'reality': 119, 'preventing': 279, 'TO': 111, 'other': 2880, 'role': 41, 'test': 302, 'you': 101, 'Law': 184, 'Case': 217, 'users': 325, 'problems': 129, 'attributed': 64, 'variation': 54, 'Practical': 92, 'Future': 59, 'Brazil': 132, 'peak': 57, 'coverage': 79, 'Act': 155, 'included': 48, 'Making': 350, 'friends': 108, 'died': 3159, 'Elizabeth': 255, 'ago': 46, 'longer': 62, 'age': 138, 'persistent': 74, 'An': 58, 'As': 87, 'mass': 52, 'presenting': 93, 'diagnosis': 1489, 'time': 1305, 'serious': 123, 'oral': 62, 'having': 212, 'Geography': 118}
Abraham	{'all': 1883, 'writings': 100, 'founder': 344, 'issued': 373, 'souls': 131, 'tested': 222, 'founded': 76, 'hath': 89, 'Take': 47, 'go': 273, 'Moses': 1062, 'saved': 61, 'children': 5895, 'William': 124, 'certainly': 55, 'graves': 54, 'disciples': 226, 'head': 81, 'description': 103, 'father': 13128, 'young': 523, 'send': 696, 'to': 90862, 'asking': 93, 'those': 483, 'under': 296, 'lord': 289, 'Now': 406, 'Day': 182, 'sent': 328, 'case': 1013, 'returned': 681, 'sitting': 244, 'far': 450, 'rise': 51, 'account': 416, 'choice': 232, 'P': 202, 'sons': 1702, 'merit': 141, 'telling': 61, 'yourselves': 42, 'little': 44, 'Jersey': 213, 'Thomas': 1409, 'impressive': 50, 'Just': 62, 'did': 1995, 'forth': 1304, 'Birth': 51, 'brother': 62, 'grandfather': 109, 'leave': 354, 'race': 429, 'form': 63, 'dealt': 66, 'George': 851, 'spiritual': 342, 'work': 288, 'says': 49, 'ten': 150, 'widow': 213, 'sign': 82, 'preacher': 84, 'Department': 53, 'second': 591, 'lawyer': 124, 'equal': 106, 'further': 54, 'even': 1116, 'established': 294, 'what': 1393, 'hide': 1829, 'appear': 40, 'giving': 197, 'hate': 42, 'assisted': 74, 'und': 53, 'anniversary': 63, 'liberty': 58, 'pleased': 46, 'There': 45, 'above': 98, 'new': 47, 'tribute': 49, 'Francis': 112, 'ever': 110, 'told': 966, 'body': 443, 'edited': 111, 'justification': 226, 'hero': 66, 'chose': 177, 'Presidency': 165, 'Brown': 234, 'Light': 61, 'men': 188, 'here': 106, 'hundreds': 44, 'youngest': 125, 'met': 92, 'Arthur': 65, 'them': 517, 'slaughter': 661, 'separation': 154, 'making': 164, 'Times': 91, 'field': 67, 'legend': 87, 'deduced': 72, 'obtained': 44, 'speeches': 58, 'daughter': 400, 'hearts': 40, 'inspired': 42, 'changed': 50, 'experience': 72, 'leaving': 56, 'trial': 100, 'descended': 330, 'proclamation': 94, 'opinion': 116, 'makes': 183, 'honor': 345, 'fulfill': 73, 'reckoned': 212, 'named': 455, 'plays': 98, 'elect': 121, 'Jew': 51, 'family': 2140, 'heart': 429, 'Robert': 41, 'When': 908, 'natural': 48, 'brought': 46, 'ask': 109, 'addressed': 118, 'Book': 801, 'Satan': 67, 'names': 130, 'actions': 46, 'standing': 211, 'plot': 99, 'while': 299, 'from': 17341, 'spoke': 366, 'would': 2588, 'prophets': 330, 'sake': 900, 'beginning': 63, 'angry': 244, 'downfall': 58, 'visit': 97, 'two': 267, 'next': 172, "Christ's": 150, 'doubt': 54, 'call': 3329, 'memory': 1598, 'bowed': 105, 'themselves': 110, 'First': 162, 'angels': 268, 'until': 344, 'more': 84, 'possession': 560, 'biblical': 567, 'on': 7811, 'clearly': 57, 'altar': 214, 'Mount': 263, 'site': 102, 'Services': 80, 'American': 438, 'warm': 58, 'So': 396, 'excellent': 51, 'known': 144, 'cases': 204, 'must': 62, 'me': 228, 'none': 299, 'word': 91, 'debates': 262, 'hour': 99, 'this': 652, 'des': 52, 'soul': 123, 'theories': 49, 'era': 48, 'servants': 173, 'stature': 42, 'believed': 149, 'Our': 194, 'my': 1963, 'example': 808, 'address': 113, 'history': 2114, 'claim': 203, 'Bible': 105, 'grace': 203, 'give': 476, 'household': 125, 'Convention': 148, 'now': 42, 'heard': 83, 'fortunes': 63, 'purchased': 172, 'sense': 76, 'Lord': 3363, 'Jesus': 471, 'offspring': 617, 'biography': 383, 'Walter': 103, 'needs': 685, 'end': 42, 'sit': 110, 'Journal': 738, 'returning': 661, 'stayed': 180, 'very': 403, 'carrier': 257, 'pure': 137, 'swore': 951, 'answer': 194, 'instead': 53, 'ancestor': 497, 'sworn': 188, 'influence': 94, 'stock': 88, 'A': 307, 'tried': 90, 'may': 2056, 'sacrifice': 4983, 'after': 1050, 'spot': 109, 'turned': 142, 'Roman': 78, 'birthday': 527, 'lay': 44, 'Martin': 148, 'ready': 269, 'such': 609, 'Gospel': 317, 'law': 216, 'authentic': 62, 'efforts': 56, 'proclaimed': 46, 'man': 1680, 'a': 15285, 'Letters': 61, 'remember': 501, 'succeeded': 144, 'inheritance': 52, 'lines': 62, 'departure': 126, 'think': 174, 'inform': 262, 'so': 841, 'pay': 59, 'representation': 974, 'dream': 84, 'wine': 45, 'replied': 62, 'What': 93, 'help': 68, 'office': 95, 'hierarchy': 757, 'essence': 60, 'over': 96, 'displayed': 87, 'borne': 97, 'Illinois': 845, 'held': 123, 'through': 631, 'same': 151, 'mentioned': 50, 'its': 60, 'before': 2381, 'His': 731, 'March': 47, 'how': 46, 'chosen': 333, 'bosom': 772, 'late': 345, 'maid': 64, 'presently': 48, 'willing': 205, 'encounter': 64, 'enlisted': 52, 'might': 4947, 'Spain': 56, 'outside': 41, 'then': 570, 'Evolution': 132, 'good': 412, 'holy': 52, 'greater': 220, 'descendants': 6986, 'thee': 463, 'Address': 43, 'ye': 266, 'nation': 79, 'obeyed': 716, 'mention': 51, 'She': 51, 'they': 667, 'hands': 151, 'not': 2614, 'Works': 1478, 'day': 1058, 'nor': 45, 'successor': 51, 'prayed': 40, 'going': 457, 'speaks': 47, 'name': 4947, 'servant': 384, 'always': 102, 'advent': 75, 'slavery': 64, 'bless': 184, 'rock': 97, 'found': 427, 'went': 2830, 'side': 238, 'English': 206, 'mean': 76, 'burial': 247, 'lifted': 179, 'From': 776, 'doing': 148, 'house': 528, 'prophet': 174, 'hurried': 53, 'Quarterly': 114, 'books': 84, 'year': 65, 'our': 7788, 'calling': 569, 'tenth': 49, 'event': 80, 'out': 1666, 'Elizabeth': 193, 'god': 138, 'since': 2625, 'strangers': 52, 'Prophet': 145, 'got': 228, 'cause': 76, 'red': 84, 'Word': 51, 'theory': 108, 'G': 59, 'passes': 44, 'This': 103, 'University': 42, 'converse': 48, 'story': 5778, 'reason': 66, 'surely': 781, 'success': 62, 'put': 172, 'van': 52, 'repose': 90, 'Studies': 92, 'heroic': 44, 'thrown': 98, 'planted': 90, 'Education': 455, 'benefits': 41, 'Boston': 105, 'could': 422, 'days': 3103, 'times': 226, 'conversation': 80, 'place': 1151, 'wrote': 134, 'mortal': 60, 'secretary': 110, 'exhibited': 51, 'first': 891, 'Even': 123, 'plainly': 93, 'striking': 80, 'spoken': 396, 'coming': 136, 'number': 52, 'fancy': 60, 'one': 529, 'feet': 45, 'elder': 46, 'done': 41, 'Speech': 86, 'president': 529, 'vote': 372, 'reached': 99, 'drove': 90, 'Genesis': 334, 'sheep': 200, 'dwelt': 259, 'given': 1209, 'ancient': 173, 'preached': 206, 'plains': 2256, 'introduction': 56, 'revealed': 1071, 'caught': 342, 'slept': 42, 'revive': 58, 'heights': 2238, 'their': 1576, 'station': 170, 'tragic': 66, 'passed': 177, 'convention': 205, 'dwell': 359, 'hundred': 834, 'beneath': 45, 'voted': 126, 'Association': 137, 'sixteenth': 102, 'Lincoln': 65329, 'murder': 170, 'Letter': 59, 'relationship': 190, 'that': 22959, 'exactly': 54, 'took': 2914, 'Roosevelt': 263, 'wisdom': 67, 'part': 156, 'If': 365, 'sincerity': 40, 'off': 731, 'translation': 49, 'believe': 119, 'representing': 46, 'than': 1213, 'History': 1902, 'begins': 42, 'mercy': 61, 'grew': 482, 'third': 654, 'rival': 55, 'older': 40, 'tree': 54, 'State': 53, 'seated': 50, 'nations': 229, 'marriage': 111, 'supposed': 104, 'were': 5459, 'visited': 45, 'country': 56, 'oath': 250, 'declare': 57, 'migration': 308, 'and': 82527, 'pointed': 48, 'willingness': 139, 'Chicago': 186, 'ran': 290, 'God': 16567, 'mind': 49, 'eyes': 652, 'Idea': 145, 'say': 593, 'seed': 8737, 'buried': 687, 'have': 8491, 'seen': 411, 'One': 75, 'saw': 1776, 'any': 47, 'sat': 307, 'lie': 73, 'speaking': 153, 'built': 431, 'Egyptian': 97, 'Jewish': 318, 'self': 42, 'cave': 42, 'able': 210, 'note': 53, 'also': 337, 'take': 194, 'They': 206, 'Faculty': 49, 'commentary': 103, 'subject': 288, 'With': 205, 'Party': 239, 'play': 77, 'sure': 79, 'shall': 2470, 'price': 64, 'who': 2001, 'Frederick': 133, 'paid': 52, 'eldest': 398, 'most': 95, 'Chief': 77, 'amongst': 94, 'letter': 352, 'commanding': 169, 'nothing': 59, 'stood': 488, 'The': 9881, 'the': 163157, 'parting': 122, 'considered': 167, 'That': 216, 'request': 55, 'face': 616, 'looked': 348, 'Table': 45, 'reputation': 42, 'hear': 214, 'left': 88, 'Its': 89, 'came': 1136, 'stones': 329, 'saying': 51, 'celebrated': 41, 'shot': 327, 'answered': 419, 'show': 55, 'agreed': 43, 'Political': 239, 'contemporary': 155, 'nominated': 1480, 'bring': 1363, 'relation': 52, 'expressed': 48, 'Under': 54, 'Palestine': 44, 'tempted': 107, 'find': 45, 'Independence': 54, 'We': 1165, 'resemblance': 80, 'with': 16393, 'explain': 57, 'character': 449, 'should': 744, 'only': 175, 'wood': 259, 'York': 65, 'rich': 494, 'Jacob': 349, 'thousands': 44, 'shrine': 40, 'personality': 71, 'executive': 51, 'do': 2224, 'his': 30754, 'listened': 59, 'means': 59, 'Apostles': 45, 'de': 50, 'Days': 70, 'dealings': 41, 'yielded': 47, 'famous': 179, 'Writings': 552, 'legitimate': 77, 'words': 853, 'During': 51, 'during': 640, 'THE': 69, 'years': 1051, 'banner': 59, 'him': 947, 'For': 320, 'following': 64, 'sacrificed': 96, 'True': 220, 'married': 99, 'Washington': 1094, 'us': 868, 'morning': 275, 'whether': 187, 'she': 59, 'House': 45, 'Meeting': 91, 'gospel': 151, 'tent': 362, 'where': 1063, 'vision': 128, 'view': 56, 'set': 265, 'human': 63, 'reference': 89, 'burst': 103, 'National': 287, 'we': 779, 'elected': 247, 'see': 2604, 'Short': 175, 'are': 2529, 'portrait': 365, 'hastened': 375, 'John': 1014, 'aroused': 57, 'breast': 45, 'wonder': 59, 'Second': 50, 'said': 7569, 'learnt': 59, 'raise': 133, 'temptation': 84, 'pictures': 48, 'away': 205, 'Rise': 52, 'Park': 46, "father's": 104, 'written': 434, 'won': 74, 'between': 2718, 'seven': 141, 'justified': 1731, 'Republican': 422, 'Study': 299, 'bought': 1223, 'arrival': 206, 'ends': 42, 'never': 202, 'sold': 59, 'preach': 67, 'ascribed': 71, 'S': 381, 'succeed': 105, 'spare': 164, 'drew': 169, 'portion': 66, 'come': 4994, 'Honor': 92, 'motivation': 66, 'praying': 67, 'both': 204, 'last': 151, 'obedience': 55, 'reported': 99, 'many': 146, 'taking': 42, 'acknowledge': 50, 'Blessed': 101, 'against': 92, 'fulfilled': 310, 'personal': 99, 'connection': 92, 'became': 574, 'figures': 44, 'Quebec': 42, 'receives': 97, 'expense': 40, 'whole': 50, 'asked': 270, 'comes': 44, 'among': 527, 'Death': 178, 'Christ': 155, 'Center': 147, 'Sir': 1616, 'wall': 43, 'seems': 55, 'Mr': 295, 'period': 324, 'walk': 41, 'running': 105, 'stretched': 1228, 'company': 57, 'news': 63, 'table': 81, 'nomination': 160, 'save': 43, 'Literature': 178, 'described': 144, 'angel': 52, 'basis': 132, 'union': 65, 'remained': 152, 'political': 63, 'three': 269, 'been': 2106, 'whom': 846, 'secret': 45, 'startled': 59, 'practised': 67, 'portraits': 44, 'figure': 237, 'meeting': 292, 'choose': 48, 'wonderful': 105, 'life': 2250, 'Georgia': 60, 'families': 200, 'Adams': 417, 'stationed': 59, 'received': 144, 'Pennsylvania': 41, 'Father': 687, 'concerning': 294, 'blessing': 5995, 'great': 453, 'Nations': 55, 'assert': 96, 'finished': 112, 'lives': 246, 'child': 597, 'thing': 918, 'spirit': 366, 'present': 97, 'quotation': 48, 'And': 4699, 'exception': 188, 'righteous': 233, 'Old': 206, 'presidency': 152, 'these': 562, 'appearance': 152, 'can': 220, 'General': 112, 'will': 1498, 'cast': 163, 'near': 443, 'coffin': 75, 'situation': 50, 'voice': 360, 'study': 46, 'manner': 176, 'resolved': 44, 'kindness': 52, 'is': 8861, 'prayer': 90, 'it': 1866, 'defeated': 99, 'according': 485, 'weighed': 222, 'in': 25734, 'comparable': 45, 'commanded': 1461, 'fame': 82, 'if': 470, 'party': 159, 'memorial': 197, 'Egypt': 143, 'descent': 530, 'things': 263, 'make': 498, 'views': 53, 'administration': 768, 'cross': 52, 'heirs': 339, "God's": 2435, 'discourse': 44, 'speech': 66, 'James': 722, 'President': 4249, 'pupil': 66, 'disciple': 132, 'daughters': 544, 'covenant': 8370, 'Faith': 107, 'used': 78, 'again': 307, 'I': 5661, 'comprehensive': 56, 'conceived': 145, 'upon': 3783, 'effect': 116, 'hand': 1713, 'director': 47, 'kept': 114, 'moment': 97, 'statue': 952, 'tradition': 109, 'practiced': 43, 'unlike': 52, 'thy': 1263, 'blood': 50, 'expression': 99, 'humble': 429, 'well': 680, 'It': 492, 'Son': 178, 'States': 77, 'person': 153, 'command': 462, 'greatest': 46, 'In': 930, 'position': 65, 'model': 280, 'bodies': 72, 'Adam': 131, 'restore': 44, 'United': 178, 'just': 74, 'less': 133, 'being': 253, 'Albert': 44, 'Mother': 88, 'domestic': 40, 'rest': 95, 'schools': 47, 'kill': 162, 'Reconstruction': 72, 'behind': 59, 'encouraged': 46, 'Territory': 85, 'yet': 537, 'unto': 5000, 'Be': 128, 'death': 1066, 'campaign': 51, 'candidate': 84, 'mother': 401, 'rose': 1342, 'assassination': 1085, 'had': 5072, 'except': 102, 'appeared': 1282, 'book': 42, 'Philip': 103, 'has': 710, 'By': 564, 'gave': 3784, 'kingdom': 330, 'On': 1300, 'which': 6393, 'government': 67, 'read': 115, 'Of': 131, 'papers': 43, 'possible': 97, 'early': 1965, 'Solomon': 152, 'know': 641, 'birth': 1764, 'accepted': 86, 'acquaintance': 68, 'descendant': 1352, 'Note': 42, 'appearing': 42, 'like': 717, 'stranger': 66, 'accession': 59, 'audience': 133, 'steps': 150, 'Are': 209, 'guise': 63, 'translated': 51, 'become': 983, 'downward': 49, 'works': 3111, 'dictatorship': 47, 'Before': 1293, 'Little': 57, 'because': 464, 'honored': 56, 'people': 290, 'tells': 231, 'fathers': 1531, 'creation': 49, 'some': 47, 'back': 612, 'immortal': 107, 'ideals': 55, 'born': 1091, 'lips': 287, 'election': 5749, 'home': 644, 'bore': 543, 'divine': 420, 'Mary': 681, 'Vienna': 42, 'delivered': 48, 'Life': 5857, 'friend': 683, 'for': 7594, 'Was': 45, 'recorded': 52, 'generations': 1651, 'best': 42, 'religion': 721, 'does': 462, 'Black': 85, 'native': 120, 'Tradition': 1676, 'He': 553, 'be': 5861, 'School': 308, 'knew': 185, 'bold': 42, 'intimate': 159, 'Complete': 177, 'O': 459, 'David': 673, 'leadership': 54, 'Place': 48, 'posterity': 1636, 'El': 61, 'by': 9946, 'troops': 90, 'gained': 209, 'faith': 5525, 'about': 2241, 'rare': 87, 'of': 196205, 'US': 100, 'remains': 292, 'blessings': 463, 'tomb': 348, 'or': 1252, 'Story': 523, 'own': 118, 'letters': 54, 'Psychology': 216, 'Then': 331, 'enough': 52, 'into': 1281, 'within': 83, 'Scotland': 60, 'blessed': 525, 'intellectual': 69, 'son': 11643, 'down': 677, 'emancipation': 94, 'right': 108, 'promise': 9691, 'But': 104, 'old': 840, 'thence': 143, 'your': 1209, 'Covenant': 417, 'rejoice': 48, 'Gentiles': 82, 'her': 106, 'there': 192, 'Collection': 56, 'start': 61, 'type': 106, 'much': 175, 'way': 200, 'forward': 52, 'Lives': 243, 'was': 27050, 'ghost': 509, 'Baptist': 88, 'himself': 693, 'offering': 463, 'offer': 768, 'aircraft': 257, 'attempted': 55, 'Declaration': 54, 'but': 304, 'state': 44, 'link': 99, 'Legal': 53, 'line': 467, 'removed': 49, 'true': 1163, 'LORD': 190, 'he': 5627, 'admiration': 61, 'made': 13424, 'skull': 96, 'versions': 44, 'submitted': 57, 'hardly': 41, 'up': 3710, 'signed': 652, 'nearest': 68, 'record': 103, 'Jefferson': 1449, 'stories': 493, 'Samuel': 108, 'monument': 121, 'Colonel': 40, 'called': 2836, 'examples': 215, 'gone': 52, 'promises': 2468, 'v': 56, 'ordered': 233, 'certain': 119, 'am': 2077, 'dialogue': 254, 'an': 1906, 'To': 127, 'as': 12144, 'promised': 3746, 'Richard': 134, 'New': 462, 'lifetime': 45, 'physical': 406, 'High': 53, 'compared': 56, 'raised': 223, 'faithful': 352, 'no': 450, 'May': 145, 'administered': 125, 'peace': 127, 'when': 3259, 'Fort': 228, 'thinks': 83, 'fire': 64, 'other': 872, 'role': 47, 'becomes': 96, 'you': 783, 'really': 157, 'offered': 93, 'Maria': 325, 'picture': 898, 'claimed': 92, 'felt': 40, 'pretended': 45, 'prepared': 63, 'Thought': 225, 'fell': 758, 'breathed': 98, 'Testament': 1305, 'died': 47, 'ago': 46, 'younger': 66, 'land': 2354, 'Mexican': 244, 'wife': 1824, 'age': 395, 'lot': 98, 'Thus': 115, 'An': 52, 'As': 150, 'fact': 691, 'time': 6478, 'at': 2128, 'understand': 57, 'having': 47, 'others': 47, 'once': 76}
Administrative	{'Canada': 678, 'all': 645, 'issued': 46, 'Workers': 49, 'Bureau': 110, 'Department': 2420, 'Communist': 78, 'Appeal': 242, 'decisions': 42, 'Columbia': 227, 'Business': 92, 'concerned': 49, 'laid': 55, 'passage': 296, 'to': 5658, 'finally': 59, 'program': 961, 'under': 702, 'League': 61, 'Training': 189, 'jurisdiction': 60, 'appointment': 294, 'Countries': 95, 'pursuant': 134, 'Constitutional': 1116, 'Military': 1400, 'Jersey': 84, 'Nursing': 188, 'did': 116, 'Tax': 205, 'team': 111, 'enjoy': 92, 'Index': 76, 'specially': 178, 'State': 2822, 'Northwest': 181, 'East': 489, 'auspices': 60, 'constitution': 129, 'new': 161, 'appeal': 192, 'method': 54, 'exercise': 62, 'Economics': 74, 'Professor': 353, 'Budget': 164, 'Changes': 270, 'obtained': 52, 'Dean': 86, 'receive': 52, 'Communication': 118, 'Staff': 3160, 'composed': 329, 'Conference': 3298, 'Group': 121, 'members': 531, 'Reform': 1103, 'Justice': 173, 'Hindu': 344, 'enactment': 269, 'Relations': 624, 'Member': 607, 'Basic': 65, 'establish': 71, 'use': 56, 'Australia': 81, 'from': 1350, 'Joint': 93, 'June': 160, 'charge': 110, "People's": 59, 'Land': 47, 'American': 336, 'appointed': 346, 'Procedure': 20851, 'must': 63, 'Financial': 1779, 'Comparative': 61, 'making': 277, 'control': 55, 'Police': 49, 'Federal': 5030, 'share': 107, 'states': 75, 'Policy': 1309, 'Australian': 684, 'Journal': 147, 'located': 331, 'A': 4336, 'Technical': 325, 'British': 560, 'boards': 81, 'such': 148, 'Model': 264, 'law': 226, 'Powers': 80, 'consisting': 43, 'a': 2355, 'All': 67, 'Appeals': 504, 'Civil': 395, 'Mayor': 51, 'inform': 41, 'maintain': 214, 'Politics': 72, 'travel': 56, 'Ancient': 337, 'Colonial': 301, 'over': 234, 'Use': 186, 'held': 57, 'Standing': 119, 'through': 101, 'its': 102, 'Higher': 898, 'group': 897, 'Scientific': 53, 'Rules': 104, 'Cabinet': 54, 'Director': 2914, 'Judge': 378, 'World': 1713, 'Evolution': 63, 'Royal': 72, 'practice': 75, 'Analysis': 506, 'Social': 591, 'not': 383, 'provision': 216, 'enacted': 57, 'Control': 576, 'Supreme': 897, 'name': 74, 'India': 3327, 'Authority': 57, 'Council': 7284, 'Statistics': 292, 'Legal': 683, 'Quarterly': 990, 'operation': 46, 'out': 68, 'Toward': 51, 'profit': 97, 'Clerk': 73, 'Practice': 352, 'Agreement': 135, 'may': 1737, 'acting': 98, 'China': 759, 'Report': 1184, 'Class': 350, 'theory': 75, 'Judicial': 313, 'University': 361, 'USSR': 67, 'formation': 72, 'violation': 154, 'Security': 253, 'Studies': 1231, 'National': 2884, 'Centre': 55, 'conducted': 147, 'Human': 63, 'Assistant': 641, 'retain': 100, 'Impact': 152, 'Review': 6569, 'There': 54, 'Administration': 87, 'Indian': 1755, 'directly': 61, 'Medicine': 46, 'Local': 287, "President's": 5242, 'North': 79, 'Congress': 773, 'Nature': 538, 'Problems': 1188, 'Environmental': 214, 'Data': 74, 'Association': 978, 'B': 60, 'that': 80, 'punish': 49, 'tool': 99, 'serve': 46, 'part': 83, 'because': 60, 'consult': 72, 'Problem': 173, 'Democratic': 128, 'elite': 108, 'History': 6718, 'Advisory': 99, 'Code': 1417, 'accordance': 105, 'second': 78, 'future': 310, 'Relation': 125, 'were': 120, 'Elementary': 94, 'powers': 52, 'and': 33014, 'Information': 375, 'Court': 2496, 'Philadelphia': 58, 'have': 60, 'Main': 714, 'Administrative': 118, 'their': 49, 'Development': 1920, 'Power': 343, 'Egyptian': 76, 'Arab': 44, 'also': 54, 'recommended': 78, 'chairman': 390, 'which': 188, 'Faculty': 99, 'Bank': 103, 'added': 63, 'shall': 1501, 'Reports': 161, 'Chief': 600, 'The': 7885, 'request': 226, 'High': 77, 'Growth': 170, 'senior': 42, 'Its': 72, 'Union': 187, 'meetings': 154, 'Agency': 444, 'Political': 1933, 'Research': 84, 'Role': 338, 'Public': 1590, 'European': 169, 'appealed': 113, 'implementation': 94, 'menu': 174, 'French': 46, 'Records': 57, 'York': 546, 'consent': 46, 'Services': 4071, 'Members': 52, 'permission': 128, 'report': 701, 'Organization': 1880, 'procedures': 111, 'comply': 68, 'Kong': 4055, 'recommendations': 78, 'Austrian': 52, 'Education': 416, 'questions': 267, 'District': 69, 'violated': 72, 'edition': 78, 'Handbook': 358, 'are': 267, 'Wisconsin': 212, 'review': 57, 'Rise': 402, 'Courts': 77, 'between': 119, 'before': 199, 'numerous': 284, 'sold': 64, 'Central': 142, 'Study': 4291, 'Manual': 236, 'Aspects': 120, 'suggested': 69, 'protect': 56, 'according': 117, 'Design': 48, 'Statistical': 57, 'whole': 43, 'Legislative': 596, 'relevant': 59, 'Affairs': 146, 'appeared': 79, 'Guinea': 617, 'Sciences': 6647, 'Board': 2014, 'political': 267, 'been': 53, 'Early': 172, 'California': 183, 'meeting': 665, 'duties': 65, 'Women': 480, 'Center': 66, 'Tokyo': 96, 'Nations': 70, 'Action': 603, 'Associated': 74, 'those': 64, 'Secretary': 485, 'Process': 711, 'General': 707, 'will': 1196, 'College': 3267, 'applies': 92, 'procedure': 48, 'City': 546, 'is': 845, 'expenses': 128, 'in': 21528, 'Division': 488, 'Israel': 238, 'make': 49, 'orders': 54, 'member': 401, 'parts': 100, 'President': 197, 'Tribunal': 2248, 'Official': 61, 'I': 1845, 'Medical': 81, 'upon': 128, 'judgment': 55, 'Order': 257, 'States': 247, 'Economic': 1292, 'Subcommittee': 469, 'In': 81, 'organization': 48, 'the': 127606, 'Port': 144, 'Physical': 150, 'United': 7176, 'Service': 2143, 'hands': 80, 'Permanent': 309, 'Examination': 438, 'Chairman': 1075, 'Branch': 163, 'Papers': 425, 'bibliography': 163, 'Theory': 662, 'headed': 72, 'had': 128, 'Ethics': 265, 'has': 923, 'Change': 214, 'provisions': 1860, 'Project': 118, 'Officer': 658, 'Pakistan': 60, 'like': 69, 'Colony': 71, 'continue': 122, 'London': 142, 'officer': 51, 'Further': 63, 'amendment': 74, 'Management': 3173, 'Democracy': 735, 'staff': 56, 'Influence': 163, 'Section': 201, 'System': 940, 'Commission': 4030, 'New': 1415, 'Economy': 61, 'Life': 115, 'Institute': 304, 'for': 4592, 'Government': 1297, 'decision': 714, 'discretion': 70, 'refer': 55, 'limitations': 60, 'be': 768, 'School': 109, 'Origins': 151, 'Bill': 43, 'Times': 124, 'Materials': 270, 'Place': 123, 'Behavior': 74, 'become': 218, 'Property': 3276, 'by': 4179, 'on': 13223, 'carried': 68, 'of': 103410, 'US': 3135, 'mutually': 71, 'UN': 331, 'Committee': 14192, 'or': 244, 'Swedish': 237, 'Hong': 1082, 'No': 134, 'down': 55, 'appropriate': 91, 'Critical': 51, 'references': 284, 'area': 49, 'transfer': 96, 'editor': 96, 'was': 537, 'head': 264, 'Science': 1175, 'Survey': 107, 'courts': 186, 'filed': 93, 'with': 1455, 'made': 142, 'Office': 18622, 'whether': 92, 'up': 45, 'County': 575, 'Society': 84, 'ruling': 52, 'Special': 7681, 'Modern': 167, 'Space': 322, 'Introduction': 904, 'Some': 195, 'sales': 97, 'an': 1785, 'as': 1389, 'at': 1068, 'International': 5495, 'functions': 91, 'Geneva': 62, 'no': 55, 'Vice': 53, 'other': 281, 'Law': 8471, 'Structure': 554, 'prepared': 100, 'important': 284, 'Act': 21511, 'Making': 340, 'South': 212, 'Executive': 86, 'Principles': 170, 'governed': 44, 'An': 1332, 'Second': 71, 'Empire': 64, 'requires': 160}
Advancement	{'Living': 56, 'Status': 127, 'United': 386, 'among': 71, 'Union': 2150, 'issued': 41, 'meetings': 251, 'filed': 90, 'Political': 125, 'Research': 2654, 'Religion': 2259, 'during': 58, 'Branch': 122, 'Papers': 576, 'I': 53, 'Progress': 637, 'Public': 487, 'Resources': 73, 'Higher': 116, 'devoted': 46, 'Foundation': 36706, 'Theory': 185, 'also': 58, 'Philosophy': 1154, 'had': 80, 'Carnegie': 34757, 'passage': 94, 'better': 391, 'to': 1926, 'book': 975, 'under': 146, 'Ethics': 312, 'has': 57, 'sent': 66, 'Secondary': 547, 'Sound': 158, 'League': 354, 'Training': 125, 'his': 894, 'Judaism': 107, 'Canadian': 67, 'Of': 95, 'Analysis': 60, 'Project': 2547, 'Oxford': 97, 'State': 809, 'Truth': 338, 'Social': 658, 'Organization': 148, 'association': 82, 'Growth': 55, 'Symposium': 296, 'America': 141, 'Liberal': 139, 'presented': 66, 'Washington': 78, 'Episcopal': 137, 'London': 113, 'Council': 3227, 'Black': 78, 'Education': 3118, 'Agriculture': 103, 'Female': 142, 'wrote': 132, 'Arts': 334, 'Management': 234, 'From': 62, 'human': 79, 'For': 1083, 'some': 116, 'Some': 108, 'Christianity': 171, 'elected': 46, 'see': 109, 'books': 184, 'are': 48, 'New': 315, 'Essays': 128, 'John': 50, 'Carolina': 76, 'Life': 43, 'Latin': 120, 'August': 186, 'for': 594824, 'since': 40, 'its': 584, 'looking': 862, 'Fund': 6386, 'Harvard': 89, 'Report': 176, 'Reformation': 1059, 'Francis': 156, 'True': 335, 'University': 85, 'Napoleon': 42, 'System': 129, 'be': 41, 'School': 312, 'Economics': 105, 'Colored': 111157, 'found': 51, 'Study': 573, 'Work': 55, 'met': 112, 'Race': 874, 'Behavior': 112, 'African': 1685, 'from': 301, 'Studies': 726, 'by': 179, 'Confederation': 176, 'Scientific': 1069, 'Commonwealth': 47, 'on': 815, 'Convention': 62, 'Chinese': 144, 'People': 5966, 'September': 178, 'of': 585158, 'National': 1660, 'annual': 147, 'urged': 42, 'Trade': 819, 'Its': 56, 'Materials': 328, 'Americans': 118, 'Committee': 1526, 'published': 328, 'Technology': 342, 'or': 654, 'first': 225, 'Conference': 514, 'Group': 305, 'Knowledge': 338, 'Sir': 87, 'Justice': 202, 'Administration': 236, 'Educational': 325, 'Asia': 128, 'Their': 433, 'Book': 193, 'Health': 282, 'Negro': 226, 'Medicine': 715, 'Technical': 497, "Women's": 282, 'Meeting': 159, 'conference': 100, 'Literature': 494, 'North': 76, 'publication': 80, 'Congress': 102, 'Centre': 217, 'beginning': 53, 'contains': 54, 'two': 117, 'been': 80, 'awarded': 44, 'their': 335, 'Reprinted': 142, 'Control': 93, "People's": 125, 'was': 2125, 'Association': 372590, 'Women': 2080, 'Essay': 167, 'contribute': 45, 'Center': 467, 'that': 397, 'Science': 12611, 'took': 54, 'it': 46, 'American': 2995, 'particular': 40, 'Small': 903, 'translation': 163, 'with': 195, 'he': 419, 'directed': 67, 'Colleges': 996, 'Industrial': 362, 'Financial': 162, 'edition': 46, 'Process': 725, 'meeting': 249, 'work': 160, 'Sciences': 576, 'second': 319, 'General': 129, 'Society': 774, 'See': 72, 'itself': 55, 'College': 1301, 'Military': 140, 'Communication': 189, 'my': 92, 'and': 16609, 'Applied': 67, 'liberal': 61, 'Agricultural': 477, 'Corporation': 72, 'is': 1418, 'Peace': 423, 'organized': 105, 'Learning': 5871, 'an': 58, 'Field': 56, 'as': 425, 'Netherlands': 1262, 'held': 378, 'at': 2112, 'have': 125, 'International': 1369, 'Policy': 152, 'Lord': 188, 'made': 112, 'Division': 3465, 'Israel': 215, 'Christian': 422, 'in': 12481, 'author': 99, 'Food': 90, 'Journal': 1534, 'De': 67, 'England': 140, 'recommended': 46, 'parts': 40, 'which': 210, 'Teaching': 36221, 'touching': 74, 'Zealand': 2870, 'Faith': 222, 'may': 45, 'Art': 2323, 'Institute': 2806, 'Medical': 262, 'who': 50, 'upon': 200, 'Disease': 88, 'Reports': 53, 'Act': 52, 'Rural': 1703, 'The': 7263, 'Nations': 386, 'South': 1052, 'a': 120, 'Mexican': 118, 'Philippine': 117, 'Academy': 1317, 'Private': 116, 'Fellow': 47, 'Jewish': 338, 'Second': 140, 'Economic': 390, 'Moral': 130, 'the': 633233, 'Native': 313, 'began': 101, 'Physical': 77}
Affairs	{'Canada': 103, 'all': 457, 'issued': 497, 'hands': 117, 'Indians': 265, 'felt': 46, 'existing': 495, 'country': 98, 'Bureau': 83450, 'cooperative': 198, 'Department': 85430, 'Communist': 312, 'issues': 53, 'Brown': 954, 'Columbia': 1033, 'Business': 881, 'Fund': 68, 'concerned': 44, 'editors': 105, 'Bulletin': 2833, 'presents': 1399, 'to': 33820, 'charge': 1709, 'program': 53, 'voted': 89, 'under': 3154, 'sent': 217, 'activities': 178, 'Training': 419, 'Agency': 2043, 'Canadian': 633, 'rise': 44, 'Constitutional': 591, 'Standards': 98, 'P': 88, 'fall': 63, 'Military': 15208, 'Graduate': 1861, 'Economy': 65, 'condition': 45, 'Memoirs': 434, 'Ambassador': 582, 'school': 378, 'presented': 196, 'did': 552, 'Global': 272, 'make': 56, 'German': 2333, 'Defense': 5232, 'assurance': 64, 'Congress': 109, 'relation': 198, 'Paul': 118, 'George': 1267, 'Select': 1873, 'Turkey': 49, 'Index': 98, 'says': 69, 'dealing': 107, 'estimates': 50, 'Some': 94, 'sign': 48, 'go': 41, 'State': 28842, 'Northwest': 129, 'Conduct': 768, 'East': 4077, 'chair': 193, 'established': 384, 'Head': 547, 'Latin': 1906, 'Jones': 58, 'troops': 81, 'Governors': 152, 'fact': 61, 'section': 126, 'expressed': 42, 'debate': 135, 'current': 110, 'Cultural': 13066, 'above': 75, 'new': 593, 'America': 54, 'told': 666, 'body': 139, 'Kingdom': 1381, 'full': 48, 'Presidency': 71, 'whose': 45, 'honour': 208, 'Professor': 684, 'men': 56, 'here': 62, 'Planning': 949, 'reported': 419, 'English': 1191, 'consideration': 175, 'David': 48, 'Confederation': 211, 'Commonwealth': 2002, 'sending': 58, 'great': 93, 'Dean': 355, 'Chinese': 3794, 'Communication': 121, 'study': 296, 'Proceedings': 47, 'Quarterly': 61, 'Your': 42, 'Session': 143, 'Reform': 159, 'published': 600, 'Home': 53784, 'Religious': 5435, 'Continent': 62, 'Standing': 3825, 'Group': 1537, 'deputy': 113, 'Justice': 2184, 'Robert': 130, 'When': 40, 'Relations': 1110, 'brought': 51, 'addressed': 346, 'Health': 1864, 'Colonies': 54, 'crisis': 86, 'Europe': 561, 'Australia': 169, 'from': 5354, 'August': 81, 'would': 1103, 'Joint': 889, 'beginning': 181, 'stated': 1270, 'June': 310, 'two': 233, 'Russia': 219, 'few': 54, 'Lives': 710, 'archives': 111, "People's": 10502, 'Russian': 1211, 'until': 109, 'Western': 3734, 'commissioner': 250, 'Land': 564, 'Ministry': 250238, 'division': 54, 'Privy': 53, 'St': 170, 'Banking': 81, 'American': 11820, 'commissioned': 43, 'V': 177, 'effort': 54, 'appointed': 1040, 'behalf': 430, 'must': 48, 'me': 188, 'account': 56, 'Financial': 2895, 'Family': 442, 'this': 1065, 'ministry': 63, 'soul': 142, 'Pacific': 1927, 'Present': 3367, 'v': 196, 'following': 70, 'Our': 140, 'my': 223, 'could': 208, 'Mr': 61, 'Conference': 765, 'Applied': 54, 'Israeli': 1610, 'Police': 49, 'Northern': 7151, 'give': 47, 'Federal': 2675, 'in': 84697, 'Account': 180, 'Parts': 53, 'o': 2551, 'Dutch': 638, 'Policy': 700, 'Lord': 172, 'Australian': 1013, 'Walter': 191, 'respective': 48, 'Parliament': 175, 'end': 149, 'declared': 52, 'departments': 227, 'Journal': 3155, 'get': 100, "state's": 139, 'Paper': 428, 'Friends': 330, 'Zealand': 237, 'Lower': 58, 'Technical': 147, 'Disease': 202, 'British': 1552, 'Ocean': 3796, 'Q': 54, 'lay': 91, 'Paris': 285, 'Martin': 118, 'Swiss': 68, 'such': 177, 'a': 8946, 'Letters': 132, 'Metropolitan': 63, 'Mexico': 250, 'Civil': 7431, 'Ottoman': 234, 'Japanese': 271, 'Community': 6893, 'Middle': 618, 'chief': 252, 'Tennessee': 64, 'responsibility': 58, 'Politics': 133, 'order': 384, 'furnish': 48, 'replied': 109, 'Wilson': 208, 'office': 74, 'Colonial': 2038, 'pointed': 331, 'soon': 86, 'years': 53, 'held': 263, 'through': 328, 'committee': 287, 'Organization': 344, 'Interior': 20448, 'White': 79, 'still': 53, 'its': 1328, 'Texas': 938, 'before': 5680, 'Urban': 5996, 'His': 320, 'March': 379, 'Scientific': 6910, 'Exchange': 76, 'banner': 600, 'Philosophy': 396, 'Monetary': 2383, 'Negro': 1475, "Women's": 3022, 'Director': 2256, 'World': 27044, 'might': 42, 'Spain': 353, 'Sound': 115, 'then': 2472, 'Royal': 4149, 'Register': 116, 'Thomas': 59, 'Youth': 905, 'Rome': 218, 'Documents': 1103, 'Social': 28022, 'not': 1175, 'Academy': 136, 'hearing': 138, 'discuss': 51, 'nor': 65, 'Control': 261, 'articles': 72, 'San': 474, 'notified': 133, 'level': 97, 'India': 720, 'Authority': 168, 'Four': 117, 'Program': 467, 'Council': 15035, 'Scottish': 1189, 'Agricultural': 256, 'went': 209, 'Agriculture': 362, 'A': 1978, 'Arts': 377, 'series': 43, 'Representatives': 745, 'Legal': 2917, 'Medicine': 108, 'Joseph': 41, 'ex': 89, 'Treasury': 80, 'our': 1136, 'Essays': 258, 'Criminal': 63, 'special': 58, 'out': 282, 'Holy': 268, 'Chancellor': 1045, 'since': 483, 'William': 144, 'Institute': 17585, 'acting': 42, 'China': 143, 'Executive': 284, 'Report': 1263, 'got': 65, 'issue': 258, 'announced': 350, 'Commerce': 1036, 'Cuban': 190, 'approached': 86, 'assured': 41, 'Rights': 2213, 'Judicial': 1183, 'University': 4415, 'free': 54, 'Record': 74, 'USSR': 6657, 'members': 55, 'Security': 9534, 'Foreign': 413566, 'Studies': 5358, 'director': 274, 'prepare': 163, 'created': 353, 'September': 221, 'Boston': 201, 'National': 18168, 'Centre': 557, 'called': 45, 'Embassy': 60, 'place': 170, 'Human': 3943, 'Welfare': 585, 'first': 104, 'Universities': 77, 'Ontario': 137, 'Working': 1028, 'Palestinian': 61, 'Review': 4748, 'Administration': 4381, 'primary': 45, 'Asia': 153, 'one': 211, 'clear': 89, 'Indian': 118718, 'commissioners': 62, 'Syria': 230, 'carry': 58, 'Local': 1372, "President's": 93, 'size': 142, 'North': 438, 'Dictionary': 129, 'Eastern': 9141, 'together': 48, 'Commercial': 147, 'name': 61, 'Environmental': 4324, 'statement': 535, 'time': 1087, 'Press': 337, 'Association': 2340, 'Lincoln': 192, 'that': 6079, 'February': 48, 'took': 456, 'rejected': 45, 'released': 81, 'part': 43, 'Minister': 99366, 'Democratic': 758, 'than': 56, 'History': 470, 'Advisory': 425, 'Love': 90, 'accordance': 65, 'second': 76, 'See': 1021, 'officials': 42, 'Relation': 816, 'were': 1158, 'abolished': 109, 'toward': 180, 'and': 236051, 'Affairs': 506, 'Defence': 3829, 'Court': 377, 'Richard': 280, 'remained': 95, 'respecting': 336, 'Philadelphia': 397, 'Current': 513, 'Secret': 201, 'Other': 125, 'have': 647, 'Administrative': 146, 'their': 546, 'Republic': 6216, 'Development': 3432, 'Power': 152, 'Egyptian': 656, 'Jewish': 772, 'UNIVERSITY': 267, 'responsible': 667, 'able': 54, 'instructed': 59, 'Arab': 545, 'also': 1164, 'recommended': 58, 'without': 47, 'Germany': 144, 'chairman': 1954, 'which': 901, 'so': 76, 'Bank': 348, 'Party': 79, 'Articles': 105, 'towards': 116, 'shall': 561, 'collaboration': 84, 'who': 713, 'what': 63, 'Library': 673, 'most': 43, 'Chief': 457, 'Korea': 377, 'sponsored': 45, 'printed': 57, 'honor': 66, 'Rural': 1402, 'The': 30139, 'approved': 87, 'Reserve': 271, 'Foundation': 215, 'considered': 231, 'medical': 64, 'later': 381, 'request': 113, 'Fellow': 46, 'High': 55, 'managing': 47, 'occasion': 90, 'charged': 178, 'Carolina': 259, 'Its': 511, 'came': 117, 'Principal': 101, 'Provincial': 455, 'answered': 73, 'Union': 1750, 'memorandum': 58, 'Political': 6764, 'Ministers': 13967, 'Research': 1686, 'Religion': 165, 'session': 114, 'Role': 851, 'Under': 606, 'Public': 48334, 'Resources': 3363, 'European': 3799, 'Province': 674, 'proposed': 111, 'title': 85, 'true': 93, 'Muslim': 2470, 'state': 442, 'French': 752, 'should': 720, 'Records': 320, 'chiefs': 62, 'York': 325, 'Services': 81, 'Housing': 801, 'Prime': 498, 'over': 124, 'his': 3608, 'permission': 656, 'assistant': 46, 'Notes': 137, 'Annual': 409, 'Nuclear': 136, 'Industrial': 719, 'Reich': 230, 'reply': 62, 'report': 381, 'during': 893, 'Soviet': 1738, 'him': 204, 'Italian': 1756, 'regarding': 110, 'Republican': 414, 'committees': 163, 'Dominion': 50, 'ended': 127, 'Washington': 2985, 'Kong': 804, 'investigate': 46, 'Austrian': 902, 'Education': 1302, 'wrote': 50, 'secretary': 91, 'Educational': 2019, 'set': 50, 'Caribbean': 1085, 'District': 75, 'Area': 202, 'Hospital': 65, 'operates': 173, 'relative': 864, 'edition': 86, 'see': 346, 'decided': 48, 'Handbook': 259, 'are': 545, 'John': 443, 'said': 932, 'Wisconsin': 149, 'Johnson': 3209, 'League': 104, 'federal': 696, 'Oregon': 241, 'Water': 3019, 'Berlin': 353, 'Harvard': 3534, 'cooperation': 219, 'between': 341, 'July': 355, 'Higher': 232, 'clerk': 51, 'C': 48, 'Central': 1223, 'wanted': 55, 'importance': 48, 'Study': 1105, 'Florida': 851, 'S': 301, 'Race': 50, 'African': 5802, 'correspondence': 69, 'article': 2034, 'suggested': 56, 'received': 143, 'Settlement': 219, 'last': 92, 'Greek': 913, 'influential': 118, 'many': 53, 'foreign': 209, 'Indies': 94, 'became': 668, 'dated': 88, 'Statistical': 128, 'Navy': 67, 'El': 41, 'whole': 53, 'asked': 164, 'Legislative': 432, 'afterwards': 113, 'Diego': 336, 'appeared': 44, 'Panama': 77, 'Published': 81, 'Los': 253, 'agents': 100, 'Board': 3586, 'Ireland': 116, 'Islands': 461, 'Cabinet': 1092, 'War': 1503, 'Publishing': 57, 'conference': 69, 'late': 237, 'Assembly': 91, 'Medieval': 69, 'basis': 171, 'west': 105, 'France': 1014, 'Chicago': 207, 'informed': 546, 'Immigration': 81, 'been': 736, 'whom': 48, 'California': 271, 'Modern': 122, 'Schools': 1777, 'territory': 63, 'meeting': 326, 'Women': 361, 'direction': 273, 'Center': 10473, 'Vice': 2996, 'Pennsylvania': 480, 'concerning': 569, 'formed': 54, 'October': 212, 'Nations': 2046, 'Allied': 60, 'minister': 774, 'former': 283, 'present': 487, 'Secretary': 18874, 'case': 142, 'Upper': 333, 'myself': 99, 'Force': 250, 'Portugal': 217, 'Deputy': 728, 'say': 48, 'General': 3961, 'will': 788, 'Culture': 40, 'Persian': 919, 'College': 1683, 'Church': 716, 'situation': 183, 'Taiwan': 279, 'gave': 123, 'Energy': 123, 'Market': 394, 'City': 806, 'Italy': 540, 'note': 54, 'is': 7372, 'it': 160, 'Imperial': 396, 'Literary': 124, 'inform': 485, 'Capital': 178, 'Puerto': 401, 'if': 264, 'Division': 11912, 'Israel': 69, 'Associate': 159, 'inquire': 45, 'began': 198, 'Regional': 510, 'administration': 320, 'connection': 207, 'same': 115, 'tells': 45, 'member': 459, 'speech': 210, 'Oxford': 304, 'President': 1463, 'several': 124, 'entitled': 96, "King's": 81, 'noble': 60, 'Third': 91, 'I': 193, 'Medical': 3988, 'upon': 250, 'Angeles': 472, 'purpose': 45, 'Order': 83, 'recent': 241, 'Superintendent': 7174, 'matters': 77, 'Senate': 5770, 'Employment': 3263, 'well': 746, 'agency': 200, "country's": 45, 'States': 8492, 'Food': 182, 'Economic': 44992, 'Moral': 163, 'Subcommittee': 7302, 'In': 311, 'the': 538433, 'Jerusalem': 2103, 'stating': 154, 'Native': 13275, 'United': 10900, 'Service': 3501, 'being': 195, 'executive': 42, 'Labor': 1654, 'Care': 115, 'Permanent': 191, 'Chairman': 391, 'Branch': 600, 'Papers': 1492, 'Churches': 70, 'Territory': 469, 'yet': 55, 'web': 60, 'now': 330, 'Municipal': 3210, 'Far': 3358, 'Company': 1139, 'Turkish': 127, 'had': 4733, 'except': 77, 'Prevention': 148, "Majesty's": 127, 'Internal': 18774, 'News': 432, 'Ethics': 55, 'has': 3465, 'Marine': 122, 'Change': 383, 'take': 442, 'provisions': 66, 'government': 319, 'Rico': 107, 'replaced': 59, 'Commons': 1366, 'early': 102, 'possibly': 103, 'Officer': 568, 'advanced': 79, 'prominent': 124, 'Member': 336, 'acquainted': 50, 'England': 518, 'spoke': 65, 'Sciences': 676, 'follows': 101, 'Influence': 44, 'Contemporary': 91, 'London': 341, 'officer': 69, 'Polish': 373, 'reduced': 679, 'Before': 164, 'Swedish': 1499, 'Legislature': 107, 'Management': 686, 'superintendent': 255, 'deal': 47, 'Army': 122, 'staff': 47, 'Colony': 76, 'Section': 778, 'System': 57, 'authority': 44, 'Commission': 9760, 'Year': 127, 'Conservation': 173, 'Archives': 110, 'Cooperation': 460, 'recommendation': 147, 'New': 1834, 'Vienna': 143, 'delivered': 116, 'Life': 1487, 'both': 80, 'for': 159066, 'Government': 3184, 'Men': 1179, 'per': 247, 'Southeast': 132, 'does': 165, 'offices': 44, 'Population': 98, 'be': 2973, 'School': 10069, 'Later': 69, 'journal': 697, 'Economics': 470, 'agreement': 67, 'refused': 103, 'Near': 405, 'Vietnam': 55, 'Command': 47, 'Behavior': 304, 'served': 146, 'relating': 398, 'by': 5164, 'von': 53, 'First': 54, 'on': 93664, 'about': 229, 'Revolution': 510, 'Nation': 159, 'of': 842640, 'months': 54, 'year': 183, 'US': 2951, 'Trade': 2147, 'Austria': 92, 'UN': 1412, 'Commissioners': 3702, 'Committee': 64488, 'testimony': 48, 'Japan': 2354, 'or': 1455, 'Information': 6575, 'Hong': 804, 'own': 542, 'formerly': 258, 'No': 267, 'communication': 56, 'into': 1892, 'within': 371, 'Scotland': 363, 'ministers': 99, 'Environment': 580, 'Labour': 1423, 'Pakistan': 118, 'Korean': 583, 'Meeting': 245, 'your': 58, 'Finance': 407, 'her': 188, 'area': 63, 'Assistant': 1721, 'print': 99, 'Approach': 105, 'pages': 264, 'editor': 174, 'West': 394, 'November': 105, 'was': 11728, 'Punjab': 144, 'head': 1160, 'form': 59, 'Science': 2998, 'regard': 311, 'pleased': 1582, 'but': 63, 'Treaty': 53, 'Chamber': 565, 'Survey': 195, 'with': 3920, 'he': 872, 'directed': 68, 'Irish': 202, 'made': 859, 'Office': 13527, 'whether': 515, 'House': 17383, 'Africa': 543, 'up': 50, 'signed': 64, 'Society': 434, 'Britain': 223, 'Jefferson': 59, 'more': 56, 'Crisis': 101, 'Special': 137, 'Sir': 164, 'Charles': 347, 'ad': 56, 'Space': 101, 'Philippine': 1105, 'Six': 217, 'Continental': 109, 'an': 1440, 'continued': 73, 'as': 10106, 'Clinical': 93, 'at': 17408, 'International': 48413, 'again': 100, 'Commissioner': 7515, 'Francisco': 60, 'no': 66, 'May': 134, 'F': 149, 'when': 509, 'Fort': 47, 'handed': 62, 'Islamic': 487, 'other': 855, 'Atlantic': 53, 'becomes': 84, 'department': 146, 'Poems': 945, 'you': 48, 'Law': 1922, 'Poland': 131, 'Man': 46, 'requested': 244, 'Great': 530, 'may': 369, 'Naval': 15273, 'Asiatic': 86, 'wing': 74, 'Country': 119, 'become': 97, 'prepared': 51, 'April': 45, 'ground': 46, 'Act': 400, 'Making': 270, 'N': 50, 'Soviets': 120, 'South': 4176, 'building': 176, 'Mexican': 843, 'vice': 94, 'home': 45, 'age': 98, 'Cuba': 245, 'Private': 1110, 'An': 49, 'As': 147, 'Scots': 90, 'Asian': 6882, 'remarked': 63, 'Empire': 241, 'Southern': 368}
African	{'limited': 201, 'writings': 305, 'similarity': 46, 'dissolution': 101, 'Poetry': 455, 'four': 1655, 'facilities': 42, 'protest': 82, 'woods': 135, 'According': 856, 'Lodge': 423, 'Communist': 3072, 'oldest': 477, 'integrity': 211, 'Until': 48, 'relationships': 153, 'Foundation': 1649, 'feeding': 60, 'segments': 981, 'Western': 3375, 'under': 1438, 'Changing': 410, 'pride': 268, 'sway': 53, 'League': 2868, 'hormone': 47, 'risk': 155, 'Industry': 831, 'rise': 1085, 'railways': 241, "women's": 508, 'every': 970, 'Henry': 279, 'Military': 333, 'bringing': 101, 'encounter': 56, 'Origin': 295, 'Continental': 435, 'school': 1000, 'scholar': 56, 'Reference': 53, 'companies': 496, 'solution': 702, 'Caesar': 52, 'delegate': 48, 'overview': 88, 'reliable': 40, 'triumph': 181, 'Europeans': 167, 'Turkish': 62, 'charter': 162, 'force': 991, 'leaders': 3035, 'estimates': 49, 'direct': 177, 'voting': 347, 'preacher': 222, 'likely': 1049, 'persuade': 63, 'follow': 66, 'subordinate': 40, 'sailed': 330, 'estimated': 131, 'shining': 149, 'blue': 211, 'established': 739, 'Appeal': 614, 'supplied': 182, 'assembled': 119, 'victims': 85, 'Diseases': 129, 'reconstruction': 104, 'conduct': 528, 'new': 10053, 'increasing': 452, 'ever': 205, 'told': 54, 'agreement': 52, 'hero': 92, 'whose': 446, 'join': 545, 'men': 7001, 'here': 54, 'hundreds': 51, 'met': 165, 'protection': 446, 'Evaluation': 220, 'active': 175, 'aftermath': 45, 'celebration': 44, 'obtained': 52, 'monarch': 47, 'voice': 87, 'daughter': 94, 'study': 11574, 'changed': 101, 'reports': 47, 'controversy': 173, 'credit': 49, 'abolition': 2768, 'military': 1782, 'adopting': 52, 'settled': 223, 'Continent': 1108, 'criticism': 778, 'golden': 58, 'Group': 949, 'secure': 86, 'campaign': 4207, 'mobility': 258, 'ground': 87, 'Italian': 485, 'Three': 264, 'Relations': 200, 'highly': 98, 'brought': 2791, 'visible': 435, 'Book': 3256, 'Health': 636, 'Basic': 82, 'total': 1507, 'landscape': 386, 'opponents': 165, 'would': 2310, 'army': 632, 'voters': 628, 'negative': 471, 'music': 1507, 'recommend': 41, 'strike': 67, 'type': 96, 'until': 521, 'Recent': 97, 'females': 134, 'Ministry': 362, 'successful': 72, 'hereby': 112, 'wars': 135, 'Ideas': 130, 'disruption': 40, 'arms': 144, 'teaching': 297, 'midst': 278, 'circumstances': 42, 'me': 258, 'Financial': 111, 'rights': 7268, 'work': 2403, 'Conservation': 62, 'era': 331, 'guise': 40, 'my': 476, 'example': 1149, 'shook': 42, 'Standing': 247, 'indicated': 102, 'give': 372, 'Corporation': 3746, 'Capital': 297, 'organized': 82, 'Training': 375, 'involve': 61, 'currency': 199, 'want': 41, 'times': 71, 'advancing': 45, 'parcel': 86, 'aging': 115, 'autonomy': 59, 'end': 2254, 'provide': 241, 'Journal': 70577, 'feature': 767, 'makes': 74, 'how': 299, 'hot': 293, 'writers': 2166, 'significance': 577, 'ancestor': 112, 'A': 6726, 'fever': 496, 'description': 355, 'beauty': 227, 'after': 1395, 'loyalty': 42, 'demanding': 41, 'Ocean': 91, 'shores': 762, 'wrong': 143, 'induce': 75, 'destined': 162, 'president': 4350, 'law': 700, 'parallel': 58, 'types': 43, 'All': 729, 'attempt': 65, 'third': 418, 'Civil': 1011, 'amid': 203, 'grant': 51, 'revolt': 124, 'headquarters': 53, 'capitalism': 54, 'complexity': 102, 'green': 1923, 'capitalist': 57, 'correlation': 141, 'Trust': 1492, 'order': 1105, 'wind': 188, 'wine': 276, 'operations': 67, 'Brazil': 206, 'Ancient': 1095, 'interpretation': 166, 'Colonial': 1367, 'over': 3152, 'prohibition': 75, 'dared': 46, 'mayor': 593, 'before': 1117, 'Urban': 1285, 'His': 215, 'fit': 109, 'personal': 44, 'Monetary': 1870, 'writing': 428, 'Anthropology': 528, 'production': 601, 'missionary': 156, 'Birth': 139, 'strength': 534, 'Spain': 49, 'then': 252, 'them': 975, 'diseases': 48, 'Berlin': 973, 'prevented': 100, 'weakness': 367, 'Address': 103, 'Thomas': 126, 'Analysis': 364, 'band': 274, 'detachment': 232, 'penetration': 259, 'effects': 277, 'they': 1079, 'plates': 440, 'schools': 645, 'one': 7687, 'Control': 1274, 'suicide': 49, 'indictment': 62, 'Liberal': 271, 'Tax': 350, 'edge': 55, 'princes': 108, 'India': 96, 'slavery': 2537, 'Four': 284, 'Archaeological': 68, 'daughters': 76, 'each': 405, 'went': 369, 'Agriculture': 847, 'side': 3533, 'mean': 475, 'expectancy': 128, 'prohibit': 277, 'preceded': 51, 'financial': 210, 'councils': 122, 'series': 534, 'principles': 249, 'Medicine': 58, 'literature': 4686, 'trading': 484, 'fortnight': 163, 'reached': 107, 'laboratory': 85, 'flock': 42, 'merchants': 155, 'Holy': 105, 'Local': 283, 'tales': 40, 'Wars': 60, 'vessels': 62, 'content': 90, 'encourage': 92, 'adapt': 44, 'enable': 53, 'newly': 1262, 'millions': 703, 'independence': 3659, 'foundation': 259, 'Cuban': 54, 'turning': 48, 'University': 4784, 'Work': 184, 'given': 486, 'free': 1601, 'differing': 41, 'USSR': 204, 'formation': 1083, 'struggle': 1792, 'violation': 1106, 'enlightened': 48, 'wanted': 69, 'Studies': 69143, 'Cambridge': 753, 'created': 104, 'National': 57607, 'Eastern': 16437, 'messages': 41, 'days': 264, 'Its': 501, 'shipped': 98, 'Embassy': 1162, 'Human': 10507, 'kingdom': 339, 'Publications': 154, 'Universities': 1002, 'industrial': 212, 'features': 713, 'render': 42, 'grade': 64, 'Commercial': 520, 'peculiarity': 83, 'adopted': 775, 'another': 146, 'hear': 58, 'AIDS': 322, 'revival': 1463, 'indicate': 187, 'Academy': 227, 'girls': 768, 'approximately': 42, 'fiction': 45, 'needed': 99, 'rates': 285, 'observations': 156, 'worlds': 192, 'Press': 856, 'percentage': 1325, 'genesis': 175, 'urban': 648, 'Search': 540, 'collapse': 572, 'happily': 67, 'villages': 138, 'serve': 1904, 'took': 336, 'wisdom': 70, 'western': 439, 'somewhat': 42, 'wasted': 45, 'peculiar': 198, 'positively': 49, 'Advisory': 235, 'anxiety': 46, 'roles': 185, 'participated': 139, 'immigrants': 1324, 'tree': 121, 'second': 1053, 'nations': 3242, 'project': 208, 'matter': 228, 'See': 952, 'classes': 152, 'Relation': 140, 'historical': 276, 'contrasted': 224, 'powers': 124, 'Biography': 188, 'palace': 55, 'Chicago': 57, 'respecting': 94, 'modern': 1213, 'mind': 294, 'eyes': 230, 'raw': 58, 'purchaser': 102, 'manner': 191, 'seen': 321, 'subjects': 48, 'theology': 92, 'tells': 57, 'dozen': 218, 'forced': 249, 'functioning': 197, 'Christian': 866, 'Egyptian': 162, 'climate': 204, 'responsible': 88, 'recommended': 56, 'materials': 58, 'Minister': 2019, 'forces': 3101, 'Bank': 5421, 'Subcommittee': 61, 'even': 96, 'shall': 551, 'Frederick': 42, 'Reports': 187, 'affirmation': 131, 'carcinoma': 57, 'letter': 143, 'entry': 113, 'phase': 269, 'grave': 58, 'peers': 54, 'professor': 304, 'partition': 164, 'Are': 156, 'accounted': 75, 'principle': 181, 'Institutes': 42, 'Native': 8706, 'consumer': 85, 'notion': 708, 'came': 717, 'incorporate': 56, 'renewed': 46, 'opinions': 73, 'Union': 15157, 'Agency': 147, 'nominated': 40, 'eminent': 56, 'religions': 430, 'attempts': 47, 'abilities': 171, 'radio': 114, 'solutions': 100, 'conclusion': 192, 'thirds': 42, 'Nutrition': 103, 'Resources': 510, 'European': 12462, 'lessons': 78, 'overwhelming': 799, 'resemblance': 86, 'just': 131, 'dawn': 42, 'peasants': 118, 'herself': 72, 'Republicans': 59, 'Free': 1904, 'Records': 429, 'bush': 565, 'version': 1473, 'rich': 44, 'History': 12675, 'personality': 291, 'continuity': 105, 'do': 605, 'Account': 181, 'mixture': 1689, 'compounded': 40, 'air': 218, 'de': 119, 'stop': 124, 'psychiatric': 41, 'coast': 40904, 'altogether': 60, 'report': 299, 'Organization': 12312, 'quarters': 67, 'Soviet': 235, 'comply': 54, 'societies': 3160, 'earn': 322, 'countries': 54124, 'fields': 362, 'Mediterranean': 619, 'method': 43, 'twice': 207, 'ban': 382, 'aesthetic': 54, 'squadron': 212, 'steal': 108, 'secretary': 368, 'headed': 313, 'Caribbean': 1797, 'questions': 40, 'District': 78, 'Christianity': 886, 'observed': 50, 'decided': 69, 'result': 607, 'cattle': 68, 'discussions': 112, 'John': 397, 'best': 175, 'subject': 967, 'voyage': 473, 'said': 349, 'capacity': 352, 'lots': 65, 'away': 216, 'sail': 68, 'continent': 32863, 'Water': 134, 'pressures': 92, 'unable': 64, 'cooperation': 143, 'traits': 90, 'drawn': 158, 'previous': 48, 'discovery': 404, 'preserve': 231, 'we': 401, 'never': 420, 'terms': 1131, 'extend': 47, 'nature': 1179, 'handful': 391, 'ignorance': 75, 'extent': 728, 'news': 51, 'debt': 58, 'improve': 48, 'faced': 641, 'Among': 848, 'protect': 199, 'providing': 41, 'reported': 237, 'country': 4023, 'against': 4626, 'players': 51, 'faces': 40, 'Navy': 319, 'distinction': 282, 'contribution': 796, 'expense': 980, 'confronted': 86, 'admission': 292, 'appeared': 50, 'had': 11693, 'vast': 782, 'advancement': 460, 'represented': 138, 'initiative': 70, 'Board': 3249, 'path': 41, 'wider': 40, 'Movement': 1288, 'conference': 1365, 'Principles': 223, 'Literature': 5664, 'basis': 807, 'union': 1095, 'condemned': 46, 'Southern': 44093, 'three': 4191, 'been': 8003, 'Early': 522, 'extraction': 43, 'much': 1459, 'interest': 2002, 'basic': 96, 'expected': 147, 'figure': 119, 'duties': 59, 'threw': 97, 'life': 8146, 'families': 5068, 'eastern': 386, 'suppress': 625, 'concerning': 61, 'uncommon': 44, 'argument': 127, 'child': 523, 'worked': 522, 'Secretary': 1316, 'applied': 465, 'commerce': 207, 'exception': 355, 'has': 7135, 'Co': 968, 'ahead': 41, 'reciprocal': 48, 'Deputy': 139, 'economies': 1074, 'n': 1624, 'aim': 163, 'Culture': 3118, 'near': 347, 'suppose': 84, 'teachers': 204, 'aid': 86, 'property': 204, 'employees': 78, 'launched': 42, 'natives': 1274, 'seven': 43, 'Queen': 289, 'necessary': 41, 'played': 640, 'is': 34063, 'it': 4636, 'defeated': 58, 'player': 100, 'in': 318811, 'Puerto': 330, 'Tell': 53, 'if': 1211, 'confirmed': 56, 'descent': 21680, 'prior': 105, 'suggest': 715, 'make': 950, 'Regional': 844, 'rebellion': 54, 'complex': 68, 'split': 78, 'unfortunate': 40, 'Of': 50, 'President': 1296, 'colonies': 7270, 'several': 1457, 'grows': 50, 'independent': 5628, 'Institute': 26397, 'swell': 41, 'published': 1601, 'conversion': 75, 'rain': 211, 'hand': 52, 'Angeles': 67, 'Area': 78, 'Oil': 321, 'characters': 49, 'Home': 573, 'workings': 145, 'wives': 58, 'opportunity': 94, 'blamed': 42, 'tune': 82, 'shortly': 114, 'adolescents': 455, 'possessed': 46, 'humble': 77, 'cultural': 2020, 'academic': 477, 'stillness': 58, 'contact': 539, 'greatest': 686, 'incapable': 102, 'mother': 454, 'Table': 141, 'the': 1399402, 'camps': 69, 'corporate': 42, 'musical': 43, 'left': 291, 'leagues': 46, 'agency': 108, 'traditions': 1232, 'quoted': 76, 'unemployment': 248, 'ideas': 56, 'identify': 107, 'farther': 218, 'human': 389, 'facts': 148, 'Churches': 8339, 'plight': 1030, 'yet': 255, 'Jew': 231, 'comparisons': 48, 'hills': 43, 'similarities': 146, 'regarded': 76, 'character': 480, 'ideal': 100, 'belongs': 180, 'collections': 98, 'board': 115, 'easy': 65, 'prison': 113, 'News': 417, 'east': 1418, 'humanity': 169, 'gave': 266, 'Building': 58, 'dignity': 170, 'sailors': 66, 'James': 189, 'survival': 503, 'posed': 44, 'possible': 509, 'possibly': 46, 'fusion': 300, 'unification': 87, 'birth': 746, 'judge': 74, 'shadow': 151, 'unique': 152, 'dreams': 106, 'disadvantage': 111, 'desire': 211, 'psychological': 71, 'strengths': 490, 'Art': 1836, 'gift': 93, 'Providence': 47, 'specific': 60, 'officer': 119, 'night': 169, 'security': 1629, 'Prince': 72, 'towards': 362, 'right': 820, 'old': 1648, 'deal': 305, 'people': 19285, 'understand': 106, 'Colony': 123, "King's": 2153, 'elderly': 245, 'Commission': 4289, 'productions': 70, 'election': 353, 'Between': 126, 'bore': 65, 'enemies': 112, 'Asiatic': 2951, 'Life': 5805, 'proceeding': 80, 'theatre': 358, 'for': 88019, 'bottom': 152, 'establish': 208, 'normal': 57, 'creative': 181, 'individuals': 94, 'unit': 106, 'denied': 108, 'subordination': 131, 'Tradition': 936, 'participation': 1221, 'He': 1741, 'core': 416, 'School': 3129, 'knew': 62, 'Origins': 2737, 'marketing': 109, 'Daily': 53, 'translated': 41, 'Near': 1320, 'Property': 352, 'First': 1910, 'chapter': 52, 'newspapers': 110, 'citizens': 988, 'islands': 828, 'efforts': 585, 'Still': 80, 'Coming': 92, 'Use': 217, 'dilemma': 112, 'formerly': 115, 'presence': 2407, 'civil': 3117, 'prisoners': 249, 'intellectual': 110, 'two': 2280, 'down': 3062, 'valleys': 271, 'doctrine': 43, 'Korean': 408, 'soon': 96, 'existed': 256, 'rely': 41, 'fabric': 162, 'support': 1823, 'Post': 231, 'sovereign': 587, 'why': 489, 'fought': 285, 'editor': 134, 'way': 2269, 'resulted': 40, 'call': 57, 'was': 25825, 'war': 7551, 'synthesis': 298, 'head': 2105, 'medium': 155, 'economics': 699, 'form': 1054, 'offer': 43, 'magnificent': 46, 'becoming': 1255, 'differences': 433, 'landing': 45, 'System': 1389, 'failure': 470, 'heat': 132, 'incorporation': 173, 'astonished': 101, 'Nineteenth': 99, 'stand': 108, 'true': 997, 'especially': 479, 'continuance': 45, 'absent': 53, 'portions': 140, 'born': 595, 'extremity': 306, 'inside': 83, 'attached': 60, 'County': 58, 'considered': 114, 'Engineering': 151, 'Much': 60, 'Governments': 508, 'penal': 59, 'Sir': 234, 'later': 359, 'Publishing': 532, 'Introduction': 1003, 'evidence': 582, 'Century': 99, 'exist': 275, 'promised': 51, 'ship': 105, 'enlist': 55, 'injurious': 99, 'physical': 131, 'Only': 151, 'no': 2096, 'populations': 626, 'when': 3437, 'reality': 339, 'setting': 441, 'role': 3299, 'test': 87, 'picture': 179, 'brothers': 354, 'embodied': 71, 'models': 42, 'felt': 162, 'convincing': 40, 'diet': 42, 'invested': 58, 'scores': 228, 'authorities': 972, 'discharge': 98, 'aware': 63, 'younger': 390, 'outskirts': 48, 'Person': 52, 'assume': 104, 'proposed': 231, 'Mission': 1404, 'daily': 332, 'Second': 1176, 'time': 3404, 'conferred': 60, 'Empire': 1907, 'dust': 60, 'profits': 56, 'concept': 1476, 'managed': 47, 'proletariat': 44, 'impression': 52, 'dance': 389, 'Indians': 1059, 'focus': 180, 'themes': 68, 'manager': 143, 'Republic': 34500, 'grateful': 61, 'skin': 446, 'poetry': 41, 'chair': 102, 'Conflict': 100, 'retention': 157, 'factories': 64, 'Columbia': 62, 'disorders': 496, 'Fund': 494, 'father': 487, 'Standard': 1348, 'environment': 169, 'chances': 54, 'charge': 193, 'Struggle': 589, 'weakening': 53, 'terror': 53, 'suffered': 113, 'must': 1316, 'tidings': 48, 'Prize': 148, 'advantage': 170, 'Countries': 2041, 'cheap': 169, 'keeping': 183, 'choice': 49, 'congregation': 43, 'word': 277, 'Where': 63, 'governments': 2078, 'presented': 1654, 'did': 1597, 'die': 41, 'Global': 63, 'posts': 599, 'brother': 55, 'standards': 113, 'leave': 182, 'Episcopal': 34604, 'settle': 48, 'team': 326, 'prevent': 267, 'spiritual': 74, 'revolution': 628, 'findings': 101, 'dealing': 283, 'Protestant': 122, 'invaded': 63, 'cost': 60, 'Northwest': 740, 'Orange': 158, 'run': 101, 'port': 417, 'Red': 52, 'cargo': 445, 'Head': 138, 'constitution': 435, 'Theatre': 154, 'assistance': 212, 'favour': 170, 'shares': 132, 'supporter': 104, 'current': 238, 'annals': 515, 'decade': 154, 'goes': 95, 'reply': 40, 'appeal': 166, 'Fort': 341, 'satisfy': 41, 'Economics': 6984, 'Presidency': 46, 'heavily': 135, 'Age': 231, 'genes': 57, 'understanding': 1972, 'attainment': 108, 'twentieth': 116, 'groups': 1071, 'English': 4889, 'alone': 62, 'along': 8396, 'Times': 62, 'teaches': 74, 'appears': 112, 'change': 679, 'boy': 219, 'exemplified': 43, 'thirty': 148, 'root': 162, 'exploitation': 1196, 'accomplished': 45, 'chemical': 95, 'descended': 260, 'studies': 2124, 'influx': 210, 'beautiful': 40, 'elect': 43, 'perspectives': 79, 'Justice': 271, 'saints': 48, 'When': 456, 'forefront': 40, 'employer': 50, 'crisis': 409, 'market': 1431, 'everybody': 50, 'Emperor': 84, 'troops': 1500, 'working': 2178, 'positive': 458, 'visit': 333, 'territories': 4134, 'live': 921, 'opposed': 167, 'memory': 108, 'scope': 55, 'today': 53, 'rainfall': 129, 'entrance': 49, 'afford': 124, 'abandonment': 49, 'apparent': 89, "Children's": 91, 'everywhere': 50, 'Time': 68, 'cases': 125, 'effort': 51, 'Procedure': 337, 'organizations': 110, 'Charter': 15244, 'Family': 1447, 'indicates': 150, 'German': 796, 'thousands': 1730, 'originally': 40, 'Pacific': 264, 'cat': 41, 'values': 681, 'can': 747, 'royal': 53, 'growing': 574, 'making': 1014, 'Children': 248, 'slightest': 50, 'attain': 51, 'heart': 793, 'months': 156, 'stream': 56, 'Federal': 52, 'confused': 123, 'agent': 208, 'sample': 356, 'fortunes': 133, 'allowed': 324, 'occur': 59, 'rays': 536, 'requirements': 86, 'discussion': 629, 'Who': 810, 'write': 171, 'till': 45, 'fourth': 99, 'eight': 67, 'economy': 4609, 'map': 52, 'paramount': 693, 'coming': 90, 'critics': 111, 'southern': 2257, 'membership': 363, 'produce': 40, 'designed': 118, 'date': 47, 'representations': 161, 'data': 350, 'eighth': 110, 'View': 307, 'branches': 58, 'outline': 214, 'sectors': 48, 'Coal': 93, 'Series': 421, 'ordinarily': 47, 'inhabitants': 434, 'ones': 66, 'so': 1571, 'representation': 1223, 'travel': 506, 'talk': 43, 'Islam': 1495, 'serving': 168, 'indeed': 107, 'pointed': 211, 'garrison': 65, 'disintegration': 111, 'years': 1118, 'stability': 165, 'ended': 241, 'Trustees': 310, 'argued': 354, 'managers': 42, 'White': 3213, 'still': 550, 'birds': 126, 'Higher': 536, 'constituted': 45, 'group': 2781, 'thank': 55, 'organisation': 156, 'vicinity': 113, 'forms': 284, 'Muslims': 319, 'policy': 2578, 'World': 3779, 'main': 270, 'decades': 61, 'conquest': 304, 'happened': 175, 'non': 268, 'views': 168, 'Socialist': 111, 'reject': 41, 'nation': 2770, 'records': 64, 'She': 206, 'realities': 382, 'half': 530, 'not': 10920, 'now': 410, 'provision': 41, 'killed': 48, 'nor': 88, 'possess': 130, 'accommodate': 51, 'term': 115, 'equality': 156, 'name': 1022, 'corners': 56, 'drop': 53, 'advent': 116, 'possibilities': 58, 'Critical': 745, 'Agricultural': 2111, 'entirely': 154, 'quarter': 114, 'subjection': 87, 'Zealand': 314, 'square': 407, 'coastal': 116, 'Museum': 3866, 'Legal': 409, 'challenges': 107, 'owing': 76, 'begun': 41, 'authenticity': 73, 'girl': 617, 'minority': 127, 'living': 1793, 'shown': 203, 'opened': 152, 'factors': 49, 'tensions': 253, 'increase': 689, 'attracted': 99, 'contained': 78, 'investigation': 45, 'Formation': 63, 'emerged': 48, 'churches': 721, 'assurance': 41, 'Commerce': 498, 'attended': 187, 'theory': 83, 'Rights': 1643, 'cars': 138, 'million': 447, 'possibility': 265, 'quite': 275, 'Record': 416, 'advantages': 85, 'Security': 290, 'customs': 285, 'care': 397, 'advance': 223, 'training': 494, 'derivation': 82, 'language': 960, 'transition': 651, 'kingdoms': 408, 'motion': 72, 'turn': 135, 'place': 1456, 'massive': 175, 'invitation': 149, 'routes': 45, 'During': 47, 'promotion': 388, 'preach': 67, 'think': 205, 'first': 31371, 'origin': 4139, 'Even': 49, 'flying': 196, 'surviving': 40, 'spoken': 475, 'There': 286, 'Royal': 7454, 'declared': 269, 'Asia': 44, 'yourself': 126, 'Black': 7065, 'specifically': 44, 'directly': 204, 'specificity': 119, 'vote': 651, 'impossible': 534, 'message': 91, 'fight': 277, 'open': 240, 'size': 470, 'city': 1928, 'little': 113, 'bite': 325, 'breed': 54, 'Problems': 216, 'households': 210, 'plains': 65, 'teacher': 71, 'conservation': 68, 'draft': 81, 'convention': 184, 'white': 8793, 'Data': 62, 'friend': 92, 'mining': 587, 'that': 55065, 'suit': 63, 'released': 40, 'undoubtedly': 42, 'than': 9115, 'Earth': 884, 'population': 16378, 'wide': 245, 'television': 43, 'effective': 55, 'accordance': 279, 'rival': 43, 'diplomatic': 72, 'frontier': 104, 'future': 1733, 'were': 20743, 'Settlement': 61, 'and': 371526, 'Court': 2520, 'locations': 67, 'voices': 205, 'Field': 253, 'say': 774, 'Inquiry': 143, 'DNA': 47, 'Valley': 539, 'occupied': 58, 'any': 2588, 'prosperity': 128, 'speakers': 69, 'Jewish': 1348, 'warriors': 187, 'aside': 318, 'note': 215, 'emphasis': 56, 'potential': 100, 'take': 326, 'interior': 2142, 'performance': 736, 'wonder': 254, 'urge': 54, 'altered': 63, 'fundamental': 110, 'trace': 279, 'opposite': 295, 'price': 86, 'enter': 101, 'Slavery': 549, 'assault': 103, 'Wisconsin': 123, 'Theology': 316, 'hardships': 49, 'Portuguese': 805, 'operate': 79, 'bishops': 298, 'surprising': 317, 'feminine': 55, 'diversity': 187, 'average': 558, 'proud': 123, 'implications': 76, 'sale': 58, 'slaves': 17809, 'Preface': 93, 'Survey': 800, 'senior': 109, 'Guinea': 103, 'Fifth': 93, 'laws': 1199, "People's": 2812, 'shot': 99, 'show': 132, 'Political': 2857, 'contemporary': 499, 'Research': 8233, 'discovered': 43, 'Role': 515, 'Systems': 1261, 'Test': 54, 'corner': 528, 'geography': 190, 'colours': 57, 'Woman': 258, 'ratio': 166, 'gulf': 55, 'title': 62, 'proportion': 1614, 'state': 5018, 'liberation': 2316, 'only': 5609, 'going': 115, 'black': 3026, 'essence': 141, 'Services': 668, 'enthusiasm': 42, 'reminiscent': 50, 'dispute': 46, 'Applied': 124, 'Portugal': 320, 'viewpoint': 92, 'get': 82, 'contracts': 46, 'Annual': 3302, 'truly': 116, 'cannot': 368, 'nearly': 117, 'distinctly': 48, 'Interpretation': 458, 'prime': 79, 'Comparative': 871, 'Methods': 146, 'artist': 271, 'Genesis': 79, 'seldom': 47, 'median': 237, 'effectual': 563, 'miles': 1453, 'Manchester': 271, 'liable': 84, 'Education': 3826, 'where': 2076, 'vision': 265, 'staff': 102, 'Educational': 146, 'homes': 54, 'committee': 44, 'relative': 176, 'elected': 1786, 'college': 136, 'declares': 50, 'Grand': 321, 'encouraged': 45, 'Iowa': 107, 'voluntary': 92, 'That': 199, 'ways': 318, 'review': 59, 'representatives': 1066, 'boundaries': 50, 'outside': 399, 'Twentieth': 380, 'Members': 177, 'between': 16674, 'import': 109, 'across': 671, 'arrival': 276, 'Bible': 146, 'Natural': 867, 'Johnson': 86, 'Catholics': 98, 'legitimacy': 125, 'killing': 55, 'surrounded': 112, 'Directors': 47, 'implemented': 69, 'Each': 78, 'article': 304, 'cities': 705, 'come': 342, 'concentrated': 42, 'stages': 567, 'many': 21094, 'region': 2549, 'priests': 42, 'resided': 71, 'tour': 352, 'Statistical': 230, 'railway': 348, 'expression': 407, 'Investment': 86, 'comes': 251, 'duty': 107, 'among': 9398, 'historians': 49, 'cancer': 374, 'passenger': 55, 'color': 299, 'continuation': 194, 'mandate': 211, 'colony': 705, 'period': 732, 'Sciences': 960, 'maintained': 43, 'strengthening': 126, 'approval': 67, 'Hundred': 476, 'save': 40, 'arts': 84, 'Assembly': 105, 'capable': 233, 'stretch': 285, 'west': 1789, 'territorial': 62, 'immunodeficiency': 416, 'tropical': 156, 'deposited': 58, 'Schools': 262, 'linguistic': 287, 'borders': 541, 'wants': 84, 'direction': 79, 'educated': 496, 'thousand': 188, 'formed': 438, 'observers': 216, 'wake': 86, 'Dominion': 145, 'Action': 1151, 'spirit': 614, 'those': 6892, 'case': 3864, 'developing': 200, 'these': 845, 'mount': 40, 'polls': 58, 'cast': 48, 'policies': 181, 'newspaper': 519, 'situation': 7729, 'Application': 626, 'margin': 193, 'invited': 40, 'eventually': 51, 'determination': 209, 'characteristics': 185, 'canon': 121, 'coffee': 47, 'commander': 45, 'middle': 2151, 'embrace': 76, 'fame': 45, 'movements': 584, 'Division': 481, 'Israel': 205, 'Towards': 463, 'handled': 48, 'author': 151, 'media': 45, 'granted': 124, 'seeking': 44, 'reminded': 49, 'same': 236, 'Historical': 11050, 'speech': 363, 'Blood': 801, 'regime': 1586, 'events': 109, 'tried': 148, 'status': 5139, 'oil': 189, 'centuries': 64, 'sickness': 500, 'I': 851, 'assist': 117, 'IV': 78, 'director': 564, 'persons': 5578, 'fruit': 152, 'Iron': 616, 'intended': 52, 'Tribes': 346, 'Physics': 140, 'tradition': 3950, 'vantage': 78, 'theater': 273, 'modes': 53, 'bondage': 46, 'Employment': 81, 'constitutional': 73, 'It': 462, 'Philosophical': 1008, 'without': 159, 'solve': 53, 'bottle': 50, 'In': 7292, 'inability': 148, 'model': 828, 'otherwise': 43, 'Nazi': 70, 'dimension': 52, 'researchers': 53, 'scholarship': 43, 'summer': 97, 'United': 6595, 'Service': 1218, 'being': 1213, 'tip': 205, 'rest': 350, 'communities': 2444, 'civilized': 47, 'Examination': 91, 'aspect': 1364, 'touch': 240, 'speed': 57, 'captured': 42, 'death': 435, 'Municipal': 86, 'thinking': 119, 'seems': 148, 'except': 67, 'improvement': 184, 'instrument': 75, 'interested': 317, 'treatment': 1641, 'Although': 313, 'overthrow': 150, 'republic': 167, 'extensive': 55, 'Change': 2749, 'real': 84, 'aspects': 1504, 'around': 252, 'read': 126, 'Many': 374, 'specimen': 51, 'Nile': 53, 'confines': 49, 'dark': 265, 'traffic': 750, 'pages': 82, 'confined': 127, 'world': 1818, 'accepted': 216, 'acquainted': 43, 'identifying': 42, 'Old': 121, 'collective': 62, 'Contemporary': 707, 'annually': 59, 'benefit': 417, 'grants': 65, 'London': 2572, 'either': 535, 'served': 2669, 'reduced': 51, 'Democracy': 323, 'Army': 360, 'West': 72652, 'Since': 122, 'competition': 46, 'service': 104, 'Archives': 109, 'respect': 157, 'images': 951, 'New': 5521, 'racial': 767, 'Mary': 47, 'provided': 164, 'dimensions': 125, 'Government': 15829, 'memories': 44, 'Men': 424, 'extensively': 153, 'legal': 228, 'Customs': 2164, 'Southeast': 144, 'statements': 44, 'imply': 54, 'restoring': 45, 'freed': 100, 'Chapter': 56, 'contingent': 143, 'power': 1545, 'intimate': 77, 'equivalent': 160, "nation's": 134, 'leadership': 164, 'Behavior': 143, 'found': 3589, 'refers': 116, 'earlier': 101, 'on': 76352, 'stone': 68, 'Revolution': 281, 'central': 655, 'origins': 668, 'stations': 187, 'zealous': 145, 'of': 1087955, 'industry': 425, 'violence': 279, 'generated': 41, 'slender': 90, 'Trade': 9324, 'tremendous': 46, 'Paper': 682, 'nothing': 85, 'evident': 42, 'mixed': 1183, 'or': 9892, 'tribe': 487, 'Story': 1392, 'lands': 204, 'burning': 61, 'inclusion': 52, 'whites': 3149, 'instruments': 97, 'image': 740, 'skilled': 84, 'Advancement': 1685, 'persecution': 122, 'involvement': 752, 'But': 50, 'parties': 245, 'Meeting': 2369, 'intervention': 269, 'legacy': 746, 'her': 1442, 'area': 564, 'strictly': 136, 'there': 2802, 'Collection': 174, 'sides': 250, 'low': 105, 'lot': 202, 'cultures': 1019, 'valley': 370, 'complete': 49, 'enough': 99, 'petty': 407, 'regard': 190, 'amongst': 165, 'Treaty': 1186, 'poet': 350, 'statesman': 215, 'survived': 40, 'mice': 48, 'trying': 70, 'with': 31607, 'Guide': 420, 'buying': 49, 'happening': 91, 'hire': 181, 'Writings': 58, 'romantic': 171, 'embedded': 59, 'monopoly': 445, 'House': 981, 'objective': 149, 'stake': 113, 'using': 42, 'Society': 11126, 'organize': 44, 'beliefs': 259, 'strongly': 100, 'money': 166, 'citizenship': 815, 'USA': 130, 'Space': 235, 'certain': 320, 'describe': 177, 'moved': 412, 'sales': 127, 'deep': 87, 'fellow': 48, 'How': 176, 'as': 41506, 'Production': 300, 'hypertension': 168, 'together': 104, 'formulation': 51, 'politics': 1882, 'horse': 53, 'bishop': 921, 'film': 236, 'Criminal': 432, 'administered': 205, 'graduate': 911, 'field': 3745, 'Atlantic': 319, 'you': 425, 'Law': 2006, 'poor': 1148, 'Great': 947, 'separate': 149, 'students': 10353, 'symbol': 154, 'teeth': 47, 'Presbyterian': 60, 'important': 1794, 'nucleus': 57, 'included': 209, 'sponsored': 263, 'Essay': 855, 'Mexican': 193, 'calls': 43, 'wife': 246, 'invest': 94, 'Modern': 2781, "city's": 882, 'directors': 59, 'mass': 299, 'resembles': 45, 'remarkable': 43, 'oral': 1153, 'original': 44, 'refused': 141, 'minister': 805, 'Canada': 90, 'represent': 110, 'all': 10522, 'Practice': 137, 'founder': 254, 'caused': 92, 'Workers': 968, 'lack': 245, 'concern': 222, 'founded': 428, 'coach': 52, 'welfare': 546, 'customary': 157, 'settlement': 113, 'religious': 999, 'children': 8907, 'causes': 206, 'repetition': 101, 'referring': 53, 'hunting': 147, 'lining': 44, 'opportunities': 1437, 'former': 2332, 'to': 151287, 'program': 189, 'presentation': 62, 'Christians': 106, 'activities': 408, 'belonging': 108, 'woman': 11599, 'appointment': 600, 'returned': 107, 'sitting': 41, 'very': 1214, 'translations': 71, 'fat': 71, 'resistance': 211, 'sons': 76, 'difference': 282, 'condition': 1296, 'treating': 43, 'illustrated': 44, 'Nursing': 604, 'mothers': 61, 'deficient': 52, 'Arabs': 113, 'list': 48, 'joined': 809, 'occurs': 53, 'large': 6429, 'small': 2079, 'Impact': 237, 'neighborhood': 165, 'ten': 48, 'past': 1068, 'go': 242, 'rate': 1312, 'perspective': 1481, 'lawyer': 50, 'further': 158, 'East': 73252, 'investment': 270, 'what': 618, 'Latin': 3817, 'imported': 1272, 'darkness': 157, 'sun': 1427, 'section': 2723, 'brief': 78, 'Cultural': 582, 'colonists': 303, 'attitudes': 183, 'Cape': 2642, 'learned': 45, 'public': 1048, 'contrast': 126, 'movement': 2175, 'edited': 88, 'full': 251, 'estate': 64, 'component': 270, 'provides': 45, 'Mental': 49, 'operating': 53, 'strong': 274, 'tragedy': 67, 'arena': 57, 'search': 629, 'Jews': 853, 'conviction': 102, 'Communication': 222, 'experience': 7110, 'evil': 82, 'soldier': 261, 'simply': 49, 'social': 2387, 'cessation': 91, 'Constitution': 699, 'family': 2614, 'Reform': 84, 'Richmond': 284, 'put': 87, 'regiment': 43, 'armed': 368, 'select': 91, 'readily': 51, 'Europe': 111, 'literary': 2056, 'eye': 131, 'distinct': 128, 'finally': 125, 'soil': 1464, 'taken': 568, 'achieving': 98, 'more': 6554, 'rhythm': 45, 'definition': 58, 'engaged': 397, 'company': 337, 'St': 416, 'American': 241683, 'Encyclopedia': 884, 'broke': 904, 'particular': 440, 'known': 817, 'foundations': 107, 'sleeping': 677, 'Oral': 1085, 'analogous': 41, 'town': 300, 'respected': 47, 'none': 276, 'Now': 87, 'strain': 72, 'science': 139, 'launch': 71, 'guards': 62, 'remain': 56, 'nine': 179, 'sent': 226, 'evolved': 43, 'learn': 289, 'abandon': 88, 'male': 946, 'history': 14514, 'Conference': 12515, 'stated': 163, 'share': 41, 'accept': 163, 'states': 18662, 'collision': 111, 'numbers': 3949, 'sense': 651, 'exclude': 476, 'species': 1167, 'firmly': 50, 'influenced': 303, 'needs': 4152, 'Parliament': 2404, 'court': 197, 'Federation': 8830, 'rather': 440, 'breaking': 57, 'surpassed': 92, 'influences': 163, 'Fall': 328, 'expecting': 42, 'plant': 89, 'plans': 65, 'advice': 68, 'derived': 324, 'different': 747, 'Foreign': 2008, 'replace': 140, 'tries': 68, 'ambassador': 52, 'blood': 1395, 'faculty': 55, 'Swiss': 111, 'invasion': 334, 'appealed': 117, 'derives': 49, 'response': 817, 'a': 76660, 'register': 137, 'shore': 937, 'deterioration': 73, 'banks': 139, 'pleasure': 79, 'Armed': 134, 'privileges': 42, 'Film': 682, 'infant': 139, 'help': 285, 'developed': 473, 'mission': 231, 'trade': 15302, 'held': 1134, 'attitude': 628, 'embassy': 1131, 'through': 1314, 'Forces': 69, 'Income': 500, 'existence': 1600, 'suffer': 88, 'its': 5408, 'roots': 1143, 'level': 236, 'gentlemen': 70, 'style': 54, 'consent': 104, 'Working': 1268, 'possessions': 954, 'Philosophy': 984, 'late': 584, 'absence': 277, 'systems': 267, 'founders': 285, 'seventeenth': 52, 'might': 313, 'Sound': 78, 'Evolution': 122, 'good': 47, 'return': 446, 'Ambassador': 206, 'food': 180, 'males': 2209, 'socially': 73, 'reflected': 107, 'Youth': 1890, 'framework': 62, 'Victorian': 61, 'party': 260, 'foot': 1270, 'association': 231, 'sympathy': 56, 'always': 274, 'intervene': 47, 'Authority': 262, 'habits': 106, 'Council': 19538, 'Scottish': 64, 'inherent': 102, 'residing': 103, 'friendship': 150, 'ports': 866, 'characterized': 113, 'referred': 87, 'mental': 519, 'generation': 1746, 'jaws': 433, 'oppressed': 220, 'idea': 3144, 'police': 2174, 'Quarterly': 350, 'extended': 215, 'countrymen': 98, 'Essays': 847, 'beyond': 153, 'event': 268, 'travels': 72, 'flower': 81, 'Orleans': 222, 'blacks': 275, 'item': 47, 'since': 580, 'Lakes': 2880, 'publish': 54, 'research': 223, 'Renaissance': 466, 'participants': 112, 'Defense': 560, 'health': 1373, 'Report': 2149, 'issue': 756, 'Class': 145, 'Town': 353, 'None': 89, 'refusal': 60, 'belief': 656, 'circumstance': 43, 'confidence': 159, 'differ': 56, 'reason': 233, 'members': 5614, 'backed': 93, 'teach': 108, 'earliest': 343, 'beginning': 355, 'II': 60, 'bring': 303, 'Congress': 53715, 'People': 2592, 'Boston': 386, 'bourgeoisie': 449, 'Years': 1894, 'conducted': 164, 'prejudice': 105, 'threat': 383, 'leading': 848, 'undergo': 47, 'concluded': 109, 'major': 646, 'rooted': 321, 'slipped': 40, 'feel': 86, 'relate': 71, 'Spaniards': 48, 'number': 22672, 'preservation': 64, 'feet': 93, 'sisters': 268, 'Indian': 11253, 'done': 288, 'embraced': 160, 'causing': 46, 'story': 1385, 'heads': 720, 'Dictionary': 167, 'utilization': 50, 'introduction': 1098, 'twenty': 66, 'financed': 107, 'coasts': 829, 'revive': 229, 'least': 554, 'Order': 191, 'Environmental': 157, 'station': 220, 'saint': 103, 'master': 48, 'scheme': 95, 'too': 142, 'selling': 95, 'passed': 224, 'Language': 838, 'immediate': 83, 'employ': 42, 'part': 17050, 'differentiate': 47, 'authors': 46, 'Problem': 939, 'believe': 514, 'Road': 144, 'reflection': 84, 'king': 102, 'financing': 75, 'hundred': 485, 'Cancer': 91, 'mortality': 108, 'largely': 83, 'grey': 70, 'levels': 70, 'youth': 1611, 'nineteenth': 50, 'marriage': 114, 'toward': 136, 'treated': 114, 'Affairs': 5802, 'Defence': 983, 'Welfare': 75, 'strengthen': 271, 'imports': 82, 'ages': 224, 'Current': 180, 'substantial': 245, 'alike': 57, 'concentration': 275, 'option': 74, 'States': 10411, 'relationship': 774, 'Power': 392, 'Food': 732, 'self': 91, 'cave': 150, 'Virginia': 107, 'Arab': 1841, 'majority': 11081, 'internal': 277, 'Germany': 44, 'chairman': 51, 'populous': 109, 'With': 91, 'Spirit': 146, 'province': 41, 'play': 113, 'experienced': 491, 'reach': 338, 'educate': 44, 'Governor': 167, 'depths': 61, 'most': 9828, 'virus': 622, 'experiences': 4208, 'Common': 1078, 'significant': 863, 'undertaken': 153, 'services': 486, 'The': 42114, 'achievement': 351, 'appear': 101, 'Form': 66, 'appreciation': 123, 'Philadelphia': 276, 'Publication': 85, 'clear': 748, 'popularity': 60, 'traditional': 4527, 'concessions': 53, 'institutions': 601, 'lying': 122, 'joining': 60, 'sector': 189, 'particularly': 776, 'phenomenon': 49, 'gold': 876, 'fond': 137, 'scattered': 82, 'Ministers': 1810, 'relation': 409, 'fine': 46, 'find': 104, 'occupation': 590, 'Kingdom': 1160, 'cell': 650, 'giant': 97, 'northern': 352, 'justice': 221, 'writer': 1194, 'French': 3822, 'beat': 54, 'failed': 46, 'factor': 362, 'columns': 166, 'Prime': 779, 'his': 4688, 'founding': 483, 'gains': 86, 'distance': 407, 'dependent': 267, 'connections': 63, 'kind': 102, 'famous': 598, 'While': 140, 'grew': 530, 'actions': 99, 'closely': 151, 'international': 339, 'during': 4749, 'banner': 114, 'him': 897, 'grievances': 47, 'sources': 247, 'remove': 195, 'brush': 59, 'common': 3888, 'withdrawal': 1945, 'peasantry': 304, 'river': 67, 'List': 114, 'wrote': 223, 'disappearance': 75, 'set': 1476, 'art': 889, 'For': 312, 'carrying': 79, 'instituted': 51, 'tended': 64, 'France': 610, 'culture': 8108, 'see': 1090, 'defense': 99, 'are': 15366, 'sea': 197, 'close': 606, 'bark': 121, 'Sex': 1107, 'breast': 175, 'expert': 114, 'mine': 195, 'distinctive': 197, 'unions': 1356, 'Part': 44, 'various': 1076, 'Light': 424, 'probably': 131, 'conditions': 1660, 'beasts': 53, 'available': 609, 'Sea': 120, 'sold': 162, 'attention': 250, 'Florida': 120, 'respondents': 66, 'succeed': 48, 'premise': 41, 'African': 9484, 'bibliography': 193, 'pathogenesis': 58, 'both': 7743, 'prospects': 198, 'feeling': 207, 'last': 227, 'license': 56, 'Greek': 132, 'influential': 63, 'annual': 235, 'foreign': 495, 'connection': 93, 'became': 2011, 'acquisition': 60, 'context': 9631, 'poverty': 836, 'whole': 6896, 'Legislative': 1335, 'point': 2567, 'reasons': 126, 'loan': 67, 'Published': 51, 'sweep': 41, 'community': 24373, 'coincidence': 55, 'newborn': 182, 'Sociology': 944, 'agents': 253, 'adaptation': 40, 'Way': 283, 'village': 499, 'vessel': 201, 'throughout': 1184, 'Distribution': 58, 'belt': 172, 'decline': 287, 'described': 211, 'willingness': 146, 'Students': 3293, 'Changes': 127, 'create': 89, 'acceptance': 185, 'political': 5233, 'Economic': 6726, 'Most': 498, 'reduction': 82, 'California': 321, 'Plan': 157, 'God': 66, 'collected': 63, 'territory': 642, 'meeting': 1313, 'Women': 6150, 'firm': 70, 'dialogue': 125, 'lived': 1398, 'partly': 74, 'champion': 140, 'gay': 713, 'fire': 343, 'cruelty': 176, 'gap': 493, 'Allied': 716, 'lives': 7555, 'representative': 326, 'systematic': 185, 'demand': 440, 'domination': 101, 'towns': 392, 'redemption': 66, 'Late': 50, 'conception': 404, 'look': 142, 'socialist': 49, 'governor': 93, 'By': 128, 'landed': 363, 'while': 219, 'socialism': 45, 'College': 1513, 'behavior': 487, 'Church': 15359, 'Cold': 91, 'Market': 838, 'City': 1595, 'pound': 151, 'Italy': 43, 'century': 146, 'Mother': 270, 'rulers': 149, 'itself': 113, 'ready': 109, 'seem': 86, 'currents': 41, 'Stock': 52, 'expulsion': 50, 'unworthy': 69, 'fate': 175, 'Associate': 178, 'belong': 410, 'Nation': 126, 'discourse': 277, 'Session': 347, 'virtually': 88, 'widely': 126, 'peoples': 11722, 'products': 204, 'composition': 226, 'conflict': 248, 'higher': 1796, 'development': 10467, 'About': 98, 'used': 966, 'affairs': 1792, 'Naples': 52, 'negotiate': 199, 'moment': 118, 'traditionally': 51, 'arrived': 84, 'purpose': 46, 'realm': 91, 'residents': 436, 'predecessors': 42, 'recent': 496, 'Project': 143, 'lower': 104, 'older': 564, 'Technology': 304, 'spent': 58, 'analysis': 880, 'theme': 225, 'person': 1079, 'Congo': 268, 'organization': 359, 'things': 86, 'abstain': 61, 'settlements': 122, 'networks': 47, 'Labor': 363, 'Reconstruction': 180, 'Papers': 383, 'Deputies': 62, 'Territory': 74, 'prince': 123, 'wage': 56, 'Theory': 729, 'also': 2930, 'workers': 1711, 'exclusively': 43, 'source': 125, 'labour': 1955, 'parents': 298, 'exhibition': 49, "Majesty's": 80, 'relevance': 48, 'prominent': 705, 'Marine': 359, 'transformation': 785, 'build': 92, 'associations': 92, 'On': 521, 'Last': 132, 'They': 68, 'demands': 46, 'big': 57, 'couple': 168, 'dominions': 92, 'Air': 1475, 'Truth': 729, 'donor': 55, 'damage': 89, 'emergence': 1651, 'imposed': 97, 'lost': 136, 'Short': 118, 'Methodist': 37817, 'communications': 48, 'continue': 632, 'Missionary': 914, 'popular': 407, 'tribes': 2012, 'Treatment': 738, 'Revolutionary': 239, 'Management': 395, 'often': 215, 'Gulf': 62, 'fathers': 65, 'creation': 2782, 'some': 6161, 'back': 652, 'delegation': 470, 'urgent': 54, 'economic': 2172, 'palm': 189, 'sight': 175, 'Economy': 2609, 'facing': 200, 'scale': 49, 'affects': 61, 'though': 103, 'integration': 128, 'per': 527, 'religion': 1328, 'pen': 59, 'civilization': 322, 'superiority': 70, 'be': 11967, 'varying': 123, 'replaced': 157, 'patient': 70, 'Look': 128, 'peculiarly': 170, 'Bill': 97, 'continuing': 85, 'lakes': 614, 'David': 48, 'Command': 117, 'universe': 69, 'everyday': 49, 'by': 49238, 'wealth': 109, 'faith': 120, 'goods': 338, 'god': 150, 'Chief': 74, 'drama': 81, 'range': 633, 'involving': 433, 'militia': 58, 'Committee': 3784, 'mouth': 191, 'farmers': 278, 'silence': 85, 'plan': 44, 'enduring': 499, 'into': 4292, 'within': 1478, 'Young': 231, 'Their': 84, 'because': 1509, 'instruction': 109, 'statistics': 91, 'verbal': 47, 'renewal': 55, 'exposed': 44, 'question': 1312, 'vigour': 41, 'long': 1020, 'custom': 131, 'heritage': 2443, 'sections': 1185, 'Section': 269, 'Steel': 489, 'Baptist': 1445, 'himself': 387, 'an': 70841, 'elsewhere': 111, 'registered': 89, 'criteria': 50, 'immigration': 156, 'boys': 454, 'link': 169, 'hopes': 87, 'line': 82, 'considerable': 535, 'directed': 125, 'Irish': 167, 'skull': 151, 'characteristic': 808, 'Africa': 6418, 'up': 3366, 'us': 289, 'Egypt': 322, 'Books': 65, 'exploration': 559, 'similar': 576, 'called': 954, 'sort': 44, 'mediation': 80, 'associated': 455, 'adults': 191, 'expedition': 424, 'influence': 1172, 'commissioner': 61, 'To': 995, 'single': 785, 'diverse': 49, 'International': 23912, 'Sierra': 249, 'nationalism': 2331, 'kidney': 1182, 'brethren': 327, 'Commissioner': 124, 'ancestors': 63, 'peace': 75, 'fears': 71, 'politicians': 231, 'points': 151, 'statement': 103, 'recognition': 566, 'income': 561, 'department': 231, 'Case': 588, 'draw': 65, 'elements': 582, 'beginnings': 208, 'problems': 1888, 'pretended': 56, 'prepared': 74, 'Thought': 2587, 'meaning': 114, 'restoration': 254, 'audience': 82, 'suppression': 560, 'Making': 1257, 'Railway': 561, 'desert': 758, 'structure': 1219, 'mood': 75, 'ago': 48, 'urged': 221, 'land': 643, 'lead': 135, 'practically': 71, 'officers': 331, 'age': 370, 'An': 4897, 'As': 687, 'summit': 42, 'Asian': 32927, 'enrolled': 323, 'far': 239, 'management': 90, 'footing': 52, 'partial': 42, 'saying': 66, 'issued': 588, 'results': 67, 'alien': 140, 'efficient': 97, 'Bureau': 2499, 'hepatitis': 58, 'broader': 72, 'Department': 4563, 'Progress': 611, 'adjustment': 344, 'contributions': 368, 'issues': 296, 'seemed': 172, 'Business': 117, 'concerned': 155, 'young': 3965, 'appointed': 1153, 'languages': 4614, 'Joint': 98, 'topic': 49, 'behalf': 1707, 'excluded': 98, 'include': 350, 'friendly': 46, 'resources': 673, 'seized': 60, 'mainstream': 126, 'Canadian': 11374, 'reserved': 58, 'continues': 50, 'Constitutional': 158, 'wave': 272, 'wishes': 81, 'ending': 116, 'continued': 414, 'entire': 1222, 'treaties': 107, 'transmission': 61, 'positions': 44, 'deals': 52, 'race': 8917, 'realization': 51, 'refer': 499, 'settlers': 111, 'Select': 107, "government's": 526, 'Turkey': 45, 'smaller': 51, 'rivers': 434, 'State': 3186, 'dynamics': 283, 'folk': 281, 'auspices': 1546, 'universities': 327, 'reconciliation': 63, 'Experience': 1346, 'business': 52, 'Governors': 266, 'saw': 52, 'practices': 375, 'access': 235, 'experiment': 71, 'eating': 43, 'capital': 511, 'America': 3029, 'body': 171, 'impact': 1048, 'focuses': 59, 'led': 914, 'chose': 71, 'degree': 68, 'Freedom': 1318, 'Professor': 1027, 'commercial': 40, 'believed': 526, 'explore': 160, 'Marriage': 1044, 'others': 368, 'sing': 87, 'invented': 72, 'Confederation': 541, 'Commonwealth': 189, 'great': 3581, 'Convention': 565, 'Chinese': 91, 'receive': 1381, 'involved': 518, 'larger': 244, 'shades': 95, 'murdered': 53, 'suggests': 45, 'survey': 247, 'Arabia': 54, 'opinion': 130, 'implement': 42, 'Out': 222, 'Religious': 1455, 'honor': 44, 'involves': 203, 'composed': 74, 'named': 267, 'Muslim': 73, 'nearest': 107, 'win': 2126, 'Robert': 216, 'private': 99, 'outcomes': 113, 'Safety': 71, 'names': 285, 'apply': 140, 'Colonies': 846, 'route': 107, 'Status': 201, 'use': 1976, 'from': 31476, 'Northern': 829, 'freedoms': 110, 'remains': 194, 'illegal': 197, 'ministers': 85, 'few': 3493, 'doubt': 253, 'examination': 228, 'vehicle': 55, 'themselves': 948, 'possession': 219, 'Land': 1298, 'victory': 110, 'comparison': 369, 'becomes': 51, 'savage': 42, 'occurred': 42, 'cooperate': 53, 'train': 104, 'Bulletin': 1327, 'women': 24966, 'escaped': 113, 'account': 523, 'this': 3360, 'challenge': 93, 'Revenue': 50, 'Library': 5357, 'roar': 50, 'reserves': 142, 'crossed': 42, 'servants': 57, 'island': 573, 'closing': 69, 'drops': 47, 'withdrew': 50, 'proof': 85, 'control': 426, 'ordained': 79, 'Israeli': 131, 'Police': 1569, 'links': 275, 'plate': 239, 'process': 330, 'proportions': 132, 'exports': 238, 'high': 1000, 'Liberation': 194, 'Netherlands': 758, 'recognized': 416, 'Policy': 1586, 'Party': 6018, 'Australian': 571, 'native': 1454, 'educational': 187, 'six': 108, 'syndrome': 314, 'democracy': 42, 'sit': 168, 'permission': 114, 'struggles': 390, 'arrangement': 63, 'regions': 452, 'located': 72, 'responsibilities': 82, 'forest': 506, 'instead': 86, 'mysteries': 44, 'establishment': 2897, 'Notes': 100, 'stock': 40, 'profile': 52, 'tension': 133, 'farm': 59, 'waters': 122, 'philosophy': 351, 'British': 10569, 'chiefs': 120, 'concentrations': 144, 'abuse': 93, 'ethnic': 428, 'ties': 195, 'Gospel': 80, 'prostate': 66, 'solidarity': 541, 'varied': 77, 'emperor': 139, 'purely': 226, 'manufacturing': 75, 'light': 349, 'lines': 53, 'Community': 11638, 'One': 154, 'Middle': 10432, 'chief': 331, 'Politics': 3627, 'looking': 60, 'divided': 56, 'preparing': 49, 'Catholic': 845, 'holds': 52, 'producer': 42, 'Future': 91, "king's": 55, 'alliance': 48, 'Oriental': 3912, 'gender': 83, 'mentioned': 45, 'agricultural': 181, 'fortunate': 49, 'Body': 149, 'superior': 133, 'negotiated': 58, 'Scientific': 2008, 'chosen': 83, 'masses': 591, 'No': 74, 'ships': 288, 'Negro': 1718, 'Cabinet': 170, 'Director': 568, 'willing': 56, 'degrees': 283, 'criminal': 85, 'Task': 84, 'greater': 593, 'descendants': 3033, 'material': 80, 'hands': 1648, 'front': 45, 'Works': 105, 'day': 165, 'articles': 113, 'Supreme': 145, 'Friends': 100, 'university': 149, 'traced': 186, 'identified': 51, 'sufficiently': 43, 'mode': 349, 'shortage': 68, 'crossing': 41, 'accounts': 111, 'traces': 553, 'upward': 147, 'Arts': 461, 'From': 390, 'doing': 42, 'Washington': 635, 'adventure': 125, 'lawful': 75, 'Representatives': 56, 'related': 453, 'society': 6086, 'books': 501, 'our': 2447, 'sexual': 178, 'special': 528, 'out': 2460, 'Rio': 76, 'cerebral': 114, 'wholly': 221, 'may': 1712, 'Office': 403, 'cause': 2054, 'annexed': 57, 'announced': 317, 'achievements': 334, 'classification': 504, 'perish': 51, 'This': 238, 'activity': 45, 'delegates': 237, 'completely': 144, 'collection': 323, 'Narrative': 336, 'early': 1411, 'quo': 110, 'Standards': 71, 'interaction': 58, 'mainland': 524, 'Harvard': 92, 'Point': 156, 'deprived': 142, 'could': 1813, 'keep': 822, 'length': 196, 'allied': 87, 'Gothic': 56, 'explored': 46, 'Danube': 41, 'south': 1310, 'finest': 57, 'blown': 126, 'Review': 1788, 'Administration': 1605, 'scene': 986, 'owned': 76, 'improving': 189, 'Program': 453, 'such': 4585, 'owner': 61, 'Arabic': 251, 'quality': 269, 'legislative': 140, 'ancient': 47, 'North': 35016, 'reference': 540, 'Centre': 5453, 'flag': 419, 'system': 1464, 'relations': 2807, 'Coast': 3767, 'their': 7879, 'attack': 289, 'man': 2799, 'final': 150, 'Association': 13865, 'interests': 6281, 'negro': 860, 'environmental': 111, 'natural': 196, 'Democratic': 1898, 'elite': 365, 'institution': 681, 'succeeded': 144, 'Mass': 572, 'isolation': 127, 'lion': 125, 'claim': 104, 'patients': 741, 'individual': 250, 'visited': 260, 'Peter': 53, 'distinguished': 115, 'migration': 1447, 'behaviors': 56, 'Information': 191, 'Richard': 101, 'Other': 580, 'variable': 51, 'have': 11310, 'portrait': 131, 'need': 1135, 'element': 143, 'border': 280, 'viewed': 80, 'Development': 27156, 'Independent': 5803, 'documents': 139, 'studying': 74, 'After': 182, 'able': 239, 'mid': 180, 'purchasing': 40, 'disposition': 50, 'mix': 49, 'concerns': 109, 'which': 12245, 'competence': 95, 'vegetation': 51, 'savages': 153, 'unless': 45, 'clash': 57, 'centre': 896, 'who': 17003, 'connected': 76, 'demise': 47, 'Kings': 132, 'emancipation': 284, 'Rural': 1550, 'cabinet': 54, 'segment': 205, 'class': 3475, 'Reserve': 747, 'deny': 54, 'placement': 146, 'consequences': 50, 'bred': 41, 'Some': 777, 'face': 877, 'High': 576, 'pure': 93, 'Growth': 2290, 'pictures': 72, 'wounded': 48, 'typical': 102, 'Bishop': 175, 'fact': 2520, 'gain': 138, 'Provincial': 51, 'emerging': 142, 'Rise': 1443, 'Paris': 725, 'agreed': 373, 'supported': 292, 'Religion': 2678, 'Under': 139, 'soldiers': 922, 'fear': 144, 'Public': 886, 'Independence': 200, 'principal': 104, 'nearer': 57, 'promote': 339, 'based': 590, 'knowledge': 444, 'earned': 54, 'should': 2136, 'bases': 202, 'York': 2050, 'employee': 44, 'employed': 718, 'partially': 147, 'local': 622, 'inferior': 325, 'meant': 168, 'means': 437, 'lobe': 50, 'familiar': 48, 'Egyptians': 92, 'background': 322, 'bear': 108, 'joint': 58, 'made': 2701, 'words': 298, 'argues': 68, 'procedures': 69, 'shaping': 107, 'areas': 1509, 'following': 44, 'evolution': 1351, 'Because': 40, 'places': 143, 'organ': 522, 'course': 985, 'married': 365, 'suburbs': 41, 'calling': 126, 'Both': 132, 'she': 305, 'contain': 123, 'including': 409, 'generations': 243, 'view': 2858, 'exists': 140, 'practised': 42, 'acquired': 89, 'national': 921, 'humans': 111, 'C': 57, 'Source': 70, 'officials': 217, 'Handbook': 162, 'changes': 496, 'operated': 183, 'Central': 37875, 'responses': 110, 'closer': 103, 'exploited': 54, 'Slave': 7762, 'Interior': 68, 'placed': 137, 'pattern': 152, 'below': 64, 'tend': 600, 'favor': 214, 'written': 994, 'difficulties': 53, 'supernatural': 93, 'injustice': 46, 'correctly': 49, 'crude': 60, 'progress': 1026, 'neither': 131, 'comparable': 119, 'ability': 1337, 'opening': 251, 'importance': 2079, 'joy': 68, 'Study': 9224, 'infants': 73, 'Race': 371, 'key': 154, 'Islands': 201, 'distribution': 96, 'exclusion': 1248, 'drum': 46, 'career': 202, 'taking': 176, 'equal': 474, 'minds': 465, 'Crisis': 682, 'admit': 103, 'manifestations': 84, 'figures': 184, 'grounds': 168, 'relevant': 144, 'opposition': 1243, 'Her': 80, 'conclude': 336, 'removal': 244, 'resulting': 43, 'colleges': 208, 'Los': 67, 'Music': 1390, 'kings': 47, 'labor': 790, 'meantime': 42, 'provinces': 358, 'persistence': 100, 'addition': 362, 'discrimination': 633, 'Immigration': 396, 'cent': 569, 'genetic': 53, 'offers': 85, 'slowly': 42, 'treat': 59, 'Communities': 471, 'suggested': 111, 'shoulders': 213, 'controlled': 109, 'Center': 2521, 'powerful': 320, 'received': 44, 'am': 291, 'sufficient': 243, 'essentially': 42, 'Nations': 863, 'bulk': 50, 'accident': 40, 'expenditure': 61, 'ours': 64, "Women's": 1588, 'general': 1516, 'present': 488, 'divisions': 41, 'novel': 352, 'Force': 1908, 'imagination': 58, 'appearance': 227, 'value': 346, 'General': 1199, 'will': 2243, 'secrets': 46, 'deem': 41, 'wild': 126, 'balance': 236, 'discoveries': 103, 'at': 24094, 'almost': 515, 'thus': 126, 'Mines': 167, 'Imperial': 1103, 'Literary': 1131, 'watched': 46, 'claimed': 251, 'minorities': 172, 'let': 43, 'Garden': 60, 'perhaps': 103, 'began': 573, 'richness': 249, 'administration': 1055, 'cross': 45, 'unite': 105, 'member': 3136, 'unity': 1744, 'parts': 3893, 'geographical': 55, 'largest': 1044, 'Oxford': 1161, 'outbreak': 2277, 'difficult': 446, 'colonial': 237, 'practice': 234, 'slave': 15946, 'Third': 1679, 'novels': 46, 'Prophet': 128, 'Medical': 3528, 'logic': 126, 'upon': 1150, 'effect': 78, 'proprietors': 44, 'student': 321, 'fierce': 119, 'frequently': 164, 'indigenous': 416, 'judgment': 65, 'scholars': 192, 'identity': 875, 'destruction': 146, 'argue': 43, 'off': 2119, 'center': 746, 'Senate': 159, 'colour': 59, 'well': 4995, 'Idea': 171, 'thought': 549, 'command': 107, 'position': 2187, 'reside': 160, 'Provinces': 251, 'less': 1278, 'increasingly': 41, 'proximity': 147, 'executive': 107, 'domestic': 76, 'obtain': 53, 'aspirations': 986, 'Chairman': 542, 'Fourth': 48, 'seats': 85, 'Negroes': 240, 'Texas': 127, 'abolish': 52, 'rapid': 237, 'Aspects': 405, 'Far': 42, 'Company': 8387, 'supply': 322, 'sky': 211, 'lake': 81, 'realize': 127, 'book': 255, 'developments': 145, 'citizen': 65, 'Institution': 772, 'ought': 52, 'identical': 45, 'increased': 336, 'provisions': 426, 'government': 21939, 'judiciary': 50, 'republics': 303, 'built': 51, 'five': 308, 'know': 108, 'press': 1197, 'securities': 94, 'Nobel': 45, 'descendant': 244, 'loss': 52, 'England': 1333, 'like': 840, 'success': 916, 'admitted': 391, 'loses': 44, 'payments': 126, 'lose': 179, 'become': 1251, 'works': 742, 'Forest': 136, 'Documents': 248, 'belonged': 114, 'church': 2193, 'authority': 92, 'ideals': 166, 'Fellow': 51, 'growth': 2059, 'export': 57, 'Cooperation': 169, 'home': 215, 'empire': 625, 'employment': 612, 'transport': 155, 'happens': 119, 'Green': 46, 'nuclear': 119, 'operation': 594, 'broad': 52, 'mines': 1208, 'War': 13829, 'does': 269, 'warfare': 90, 'speak': 353, 'chains': 44, 'leader': 665, 'Population': 816, 'poets': 109, 'location': 72, 'journal': 1023, 'monetary': 55, 'expansion': 234, 'paper': 359, 'pressure': 405, 'host': 103, 'retaining': 40, 'prohibited': 227, 'although': 64, 'importation': 1088, 'Toward': 115, 'Archaeology': 1165, 'stage': 404, 'gained': 54, 'about': 3347, 'rare': 68, 'carried': 113, 'extension': 438, 'getting': 54, 'column': 42, 'freedom': 853, 'consciousness': 232, 'dependence': 193, 'US': 859, 'UN': 612, 'Americans': 159670, 'introduced': 145, 'norm': 44, 'Arctic': 185, 'own': 862, 'letters': 84, 'Psychology': 2110, 'Social': 2001, 'Two': 748, 'parliament': 1173, 'guard': 240, 'Labour': 1466, 'esteem': 48, 'female': 693, 'artists': 506, 'Approach': 91, 'significantly': 48, 'descriptions': 44, 'accused': 56, 'Finance': 417, 'adolescent': 639, 'whom': 928, 'Assistant': 1014, 'Upon': 294, 'awarded': 169, 'spread': 47, 'inner': 100, 'Child': 56, 'biggest': 52, 'Societies': 1965, 'function': 104, 'building': 163, 'buy': 111, 'north': 1127, 'Science': 713, 'but': 1577, 'Chamber': 434, 'construction': 200, 'Liberty': 58, 'courts': 315, 'counterparts': 396, 'highest': 364, 'he': 2310, 'Industrial': 1350, 'tactics': 65, 'prevalent': 594, 'Gold': 919, 'whether': 399, 'dangerous': 109, 'cells': 46, 'official': 48, 'record': 158, 'Britain': 982, 'stories': 77, 'ruling': 432, 'cake': 46, 'demonstrate': 120, 'problem': 1748, 'piece': 44, 'Insurance': 51, 'deaths': 44, 'nurses': 111, 'recognize': 116, 'year': 160, 'periods': 43, 'relates': 106, 'flight': 53, 'education': 3813, 'campus': 105, 'functions': 98, 'compared': 422, 'variety': 720, 'agriculture': 220, 'Vice': 61, 'percent': 2261, 'forests': 406, 'Islamic': 136, 'other': 28162, 'Five': 181, 'details': 86, 'branch': 441, 'better': 358, 'Carolina': 666, 'Structure': 319, 'wished': 47, 'Like': 188, 'pastor': 174, 'oppression': 1028, 'Year': 225, 'variation': 108, 'chance': 88, 'peasant': 92, 'genus': 433, 'Act': 1705, 'pertaining': 40, 'else': 118, 'friends': 176, 'South': 438925, 'appoint': 271, 'Corps': 444, 'ranks': 411, 'resolution': 66, 'space': 232, 'Thus': 49, 'rule': 304, 'portion': 1646, 'races': 440, 'Spanish': 745, 'Critique': 111, 'rural': 652, 'Geography': 617}
Agreement	{'Canada': 1715, 'all': 1527, 'results': 45, 'Lakes': 99, 'Continental': 123, 'hath': 49, 'settle': 130, 'Republic': 274, 'during': 328, 'Department': 55, 'Appendix': 82, 'Communist': 242, 'settlement': 65, 'whose': 130, 'eligible': 56, 'contained': 2646, 'Round': 1176, 'to': 52541, 'must': 392, 'present': 6658, 'under': 10967, 'Not': 65, 'breach': 1366, 'include': 93, 'sent': 67, 'jurisdiction': 46, 'far': 391, 'pursuant': 1346, 'void': 59, 'regional': 127, 'Charter': 273, 'verb': 522, 'fall': 54, 'affect': 48, 'ending': 133, 'enter': 1127, 'Origin': 52, 'entire': 2003, 'bring': 48, 'did': 442, 'Global': 186, 'Tax': 225, 'applicable': 377, 'German': 539, 'convenience': 77, 'Defense': 1069, 'round': 488, 'Systems': 62, 'concluded': 944, 'prevent': 174, 'work': 57, 'force': 4420, 'construed': 2085, 'consistent': 730, 'participate': 162, 'sign': 167, 'second': 82, 'Organization': 83, 'implemented': 50, 'Red': 179, 'established': 562, 'appear': 58, 'Form': 1773, 'giving': 438, 'sum': 97, 'expressed': 82, 'debate': 55, 'consistently': 195, 'Cultural': 198, 'version': 84, 'Resources': 116, 'goes': 50, 'basic': 50, 'new': 463, 'America': 813, 'public': 146, 'VIII': 116, 'body': 66, 'deemed': 46, 'full': 250, 'led': 154, 'Foundation': 60, 'got': 60, 'understanding': 1089, 'reported': 42, 'Mental': 212, 'them': 46, 'concluding': 90, 'Commonwealth': 1055, 'Associated': 74, 'compliance': 496, 'example': 122, 'published': 45, 'prior': 739, 'amount': 88, 'products': 66, 'social': 187, 'cessation': 429, 'implement': 164, 'successors': 91, 'Technology': 134, 'elect': 82, 'Justice': 161, 'Relations': 1774, 'replace': 158, 'Member': 361, 'EU': 583, 'Book': 2017, 'Health': 1708, 'Basic': 809, 'apply': 2087, 'Standard': 132, 'establish': 80, 'Europe': 458, 'Australia': 182, 'from': 999, 'August': 285, 'would': 549, 'Joint': 1140, 'remains': 218, 'beginning': 55, 'objectives': 328, 'contains': 316, 'two': 1209, 'Russia': 116, 'comparing': 71, 'Declaration': 943, 'Russian': 89, 'scope': 849, 'until': 136, 'more': 131, 'Land': 140, 'becomes': 467, 'hereby': 64, 'occurred': 42, 'validity': 220, 'granted': 179, 'American': 5977, 'carrying': 139, 'virtue': 172, 'particular': 47, 'known': 1014, 'behalf': 713, 'presumed': 49, 'states': 204, 'account': 133, 'Financial': 123, 'organization': 69, 'rights': 398, 'this': 109084, 'avoiding': 43, 'People': 1144, 'modified': 254, 'measures': 224, 'remain': 1210, 'paragraph': 88, 'can': 288, 'December': 644, 'following': 285, 'Asian': 419, 'conformity': 269, 'could': 89, 'reserved': 55, 'Northern': 79, 'give': 267, 'Federal': 75, 'in': 77006, 'purposes': 1317, 'accept': 919, 'involve': 150, 'currency': 320, 'Dutch': 71, 'Policy': 339, 'Party': 238, 'Without': 56, 'Interstate': 231, 'respective': 43, 'admission': 70, 'requirements': 199, 'Federation': 179, 'provide': 447, 'feature': 81, 'ratified': 162, 'Jordan': 94, 'Environmental': 123, 'III': 214, 'A': 480, 'rise': 51, 'intended': 100, 'protection': 123, 'Technical': 5671, 'annexed': 816, 'after': 1694, 'British': 244, 'Foreign': 81, 'designed': 47, 'Paris': 1758, 'date': 2558, 'such': 728, 'a': 10807, 'All': 229, 'entry': 60, 'attempt': 102, 'succeeded': 55, 'Mexico': 245, 'Civil': 1401, 'Series': 94, 'Community': 180, 'so': 632, 'Parties': 1048, 'Trust': 947, 'order': 44, 'our': 260, 'executed': 1603, 'interpretation': 725, 'furnished': 337, 'consent': 219, 'over': 866, 'govern': 75, 'soon': 198, 'trade': 89, 'produced': 44, 'through': 169, 'Forces': 1314, 'Military': 579, 'mentioned': 86, 'its': 1764, 'before': 241, 'negotiated': 192, 'His': 56, 'March': 156, 'Scientific': 345, 'Exchange': 97, 'Rules': 117, 'Delhi': 183, 'Monetary': 177, 'absence': 91, 'constitutes': 1087, 'covered': 727, 'Italian': 66, 'defining': 47, 'then': 70, 'non': 159, 'affected': 49, 'propose': 118, 'material': 265, 'Analysis': 138, 'Bay': 107, 'framework': 711, 'Rome': 44, 'Documents': 132, 'they': 96, 'Social': 588, 'not': 5135, 'Works': 206, 'provision': 1295, 'day': 629, 'association': 316, 'Control': 245, 'articles': 400, 'term': 1773, 'Natural': 618, 'San': 380, 'lies': 54, 'establishment': 761, 'preclude': 57, 'VII': 208, 'India': 383, 'establishing': 1434, 'Four': 359, 'Council': 410, 'each': 237, 'found': 154, 'entitled': 170, 'Agriculture': 3967, 'mean': 188, 'Zealand': 175, 'signing': 5286, 'From': 64, 'another': 2426, 'principles': 60, 'related': 472, 'constitute': 432, 'contracting': 208, 'entering': 128, 'Treasury': 109, 'clauses': 105, 'operation': 341, 'Kong': 101, 'now': 42, 'event': 132, 'special': 45, 'out': 1046, 'by': 16227, 'join': 494, 'Contracting': 54, 'Chancellor': 128, 'since': 46, 'Agreement': 540, 'adhere': 49, 'may': 3785, 'laid': 662, 'Construction': 218, 'health': 77, 'Article': 1889, 'transfer': 54, 'forth': 1117, 'Town': 183, 'working': 272, 'theory': 545, 'G': 66, 'Rights': 609, 'This': 6894, 'withdrawal': 103, 'According': 245, 'negotiate': 102, 'reason': 80, 'disputes': 61, 'formation': 80, 'put': 98, 'Security': 127, 'hand': 51, 'remainder': 778, 'allies': 93, 'language': 397, 'Point': 100, 'September': 786, 'National': 639, 'Nature': 105, 'enforce': 63, 'ratification': 488, 'prejudice': 348, 'incorporated': 199, 'signature': 1492, 'retain': 435, 'assign': 614, 'first': 190, 'suspension': 188, 'clause': 42, 'symbolic': 423, 'number': 83, 'thereof': 124, 'one': 346, 'instances': 61, 'Indian': 77, 'War': 1112, 'suspended': 76, 'carry': 177, 'reached': 804, 'open': 269, "President's": 130, 'given': 199, 'North': 1738, 'stuck': 43, 'leading': 149, 'Commercial': 87, 'system': 49, 'their': 464, 'draft': 168, 'statement': 68, 'final': 84, 'Association': 2235, 'gives': 51, 'interests': 62, 'enforcement': 83, 'Search': 116, 'relationship': 116, 'that': 7115, 'completed': 41, 'took': 89, 'No': 149, 'part': 2940, 'copy': 506, 'than': 470, 'specify': 120, 'third': 155, 'accordance': 2849, 'framers': 53, 'require': 150, 'State': 79, 'submitted': 411, 'nations': 50, 'patients': 52, 'future': 54, 'contrary': 264, 'were': 719, 'providing': 141, 'represented': 42, 'result': 890, 'and': 199298, 'Information': 231, 'Defence': 684, 'Peace': 2991, 'manner': 47, 'have': 1688, 'null': 189, 'Administrative': 135, 'any': 4716, 'Development': 251, 'East': 68, 'Power': 315, 'Egyptian': 69, 'After': 296, 'Arab': 111, 'also': 336, 'entirety': 55, 'without': 1065, 'take': 806, 'which': 3336, 'January': 222, 'objective': 136, 'performance': 389, 'intent': 203, 'Articles': 16101, 'begin': 83, 'J': 44, 'unless': 45, 'shall': 35827, 'who': 53, 'Governor': 67, 'said': 717, 'Common': 251, 'letter': 201, 'undertaken': 76, 'services': 536, 'The': 8741, 'inserted': 257, 'approved': 518, 'stands': 90, 'observation': 42, 'considered': 262, 'arising': 244, 'calculated': 45, 'That': 219, 'brought': 44, 'treaty': 55, 'China': 51, 'Its': 68, 'fact': 546, 'wording': 45, "People's": 66, 'particularly': 51, 'violation': 1488, 'text': 1170, 'agreed': 82, 'Agency': 45, 'terminate': 3020, 'Research': 156, 'Religion': 167, 'Role': 221, 'inconsistent': 158, 'Under': 852, 'determined': 427, 'Public': 1708, 'Independence': 42, 'European': 1660, 'Kingdom': 359, 'execution': 1703, 'Democratic': 60, 'based': 254, 'precedence': 674, 'refers': 46, 'implementation': 1168, 'Author': 187, 'Berlin': 122, 'Free': 13364, 'should': 151, 'failed': 63, 'only': 234, 'experiment': 503, 'York': 78, 'essence': 80, 'Services': 21094, 'constant': 77, 'pointed': 99, 'do': 261, 'his': 96, 'cooperation': 357, 'means': 281, 'Notes': 229, 'allotted': 129, 'joint': 144, 'remedy': 113, 'Industrial': 121, 'negative': 61, 'Cape': 183, 'X': 57, 'THE': 448, 'interpreted': 412, 'Soviet': 71, 'areas': 48, 'Methods': 44, 'countries': 101, 'Washington': 166, 'method': 126, 'Straits': 53, 'common': 45, 'contain': 56, 'including': 102, 'fixed': 70, 'valid': 134, 'view': 58, 'declared': 113, 'set': 996, 'Caribbean': 169, 'reference': 135, 'Area': 230, 'attached': 125, 'observed': 111, 'see': 430, 'decided': 185, 'are': 3905, 'Central': 825, 'John': 70, 'subject': 3237, 'Good': 2401, 'reform': 51, 'subsequent': 46, 'review': 227, 'Water': 99, 'written': 1239, 'Part': 553, 'Members': 174, 'between': 13942, 'affecting': 79, 'drawn': 290, 'July': 709, 'conditions': 532, 'available': 51, 'notice': 669, 'terms': 9112, 'extend': 73, 'nature': 58, 'sole': 49, 'extent': 233, 'article': 84, 'come': 2125, 'remaining': 409, 'aim': 66, 'both': 126, 'interpretations': 79, 'Settlement': 248, 'according': 421, 'against': 572, 'called': 322, 'Human': 86, 'connection': 2434, 'became': 50, 'dated': 47, 'context': 140, 'reply': 68, 'whole': 77, 'Investment': 1014, 'comes': 287, 'grounds': 72, 'negotiation': 225, 'among': 169, 'adjusted': 58, 'relevant': 564, 'World': 4271, 'conclude': 845, 'simple': 65, 'effective': 2220, 'period': 721, 'subscribe': 51, 'End': 51, 'Union': 930, 'approval': 51, 'respect': 1804, 'Method': 2175, 'Publishing': 222, 'light': 49, 'Assembly': 170, 'USA': 407, 'basis': 436, 'addition': 60, 'create': 46, 'acceptance': 57, 'copies': 85, 'been': 2199, 'quickly': 101, 'deposited': 119, 'Plan': 754, 'interest': 96, 'certain': 82, 'women': 52, 'entered': 3567, 'meeting': 96, 'important': 52, 'treatment': 75, 'arises': 48, 'concerning': 1027, 'Universal': 157, 'Nations': 1069, 'child': 71, 'altered': 70, 'spirit': 1159, 'those': 368, 'Secretary': 104, 'exception': 77, 'Force': 208, 'contemplated': 647, 'value': 134, 'General': 67180, 'will': 1950, 'while': 59, 'expiration': 295, 'balance': 79, 'decides': 65, 'supply': 120, 'Building': 266, 'importation': 55, 'is': 10074, 'binding': 376, 'it': 699, 'reads': 79, 'itself': 45, 'Convention': 687, 'Assessment': 65, 'if': 729, 'Israel': 164, 'grant': 176, 'perform': 120, 'privileges': 47, 'Regional': 390, 'administration': 193, 'same': 57, 'rules': 183, 'modification': 148, 'party': 2833, 'conflict': 221, 'Quebec': 79, 'status': 103, 'used': 1565, 'I': 262, 'VI': 1641, 'defeat': 67, 'upon': 1565, 'effect': 974, 'IV': 413, 'II': 555, 'action': 92, 'Oil': 205, 'arrived': 98, 'purpose': 45, 'recent': 58, 'prescribed': 116, 'stipulated': 43, 'settled': 769, 'well': 908, 'Coal': 395, 'States': 4080, 'person': 133, 'Economic': 3386, 'experimental': 45, 'In': 952, 'termination': 5606, 'the': 307687, 'otherwise': 77, 'Quality': 213, 'If': 84, 'Status': 1953, 'United': 6025, 'understood': 84, 'proposed': 519, 'being': 98, 'rest': 114, 'Labor': 66, 'Care': 122, 'Death': 167, 'cease': 290, 'questions': 116, 'Baltic': 55, 'years': 53, 'yet': 321, 'enters': 1248, 'Theory': 123, 'Islamic': 266, 'Far': 68, 'Service': 150, 'Friday': 2401, 'had': 1174, 'except': 266, 'governmental': 142, 'concerns': 64, 'nor': 46, 'Although': 55, 'continuation': 44, 'violations': 59, 'has': 2470, 'maintenance': 149, 'resolved': 129, 'On': 105, 'aspects': 49, 'around': 44, 'provisions': 16390, 'represents': 282, 'read': 69, 'James': 107, 'unlimited': 68, 'execute': 104, 'Nile': 148, 'Air': 114, 'Project': 465, 'matters': 114, 'immediately': 190, 'accepted': 216, 'Prevention': 824, 'name': 49, 'Pakistan': 203, 'equity': 60, 'imposed': 101, 'like': 41, 'success': 101, 'follows': 73, 'performing': 320, 'specific': 54, 'benefit': 101, 'continue': 83, 'London': 1259, 'Bank': 64, 'fully': 133, 'become': 155, 'revision': 169, 'amendment': 122, 'pertaining': 162, 'exceed': 42, 'right': 360, 'old': 87, 'methods': 219, 'West': 963, 'creation': 284, 'Application': 287, 'Conservation': 151, 'Master': 218, 'pursuance': 527, 'Cooperation': 3096, 'specified': 364, 'duration': 244, 'International': 8404, 'Marx': 66, 'recognition': 66, 'Economy': 69, 'provided': 1578, 'for': 15859, 'Government': 1826, 'Evaluation': 553, 'Moscow': 137, 'transactions': 647, 'Customs': 372, 'Fund': 1067, 'does': 1414, 'provides': 1351, 'Nothing': 79, 'refer': 265, 'be': 25157, 'object': 62, 'agreement': 278, 'expansion': 46, 'David': 601, 'Vietnam': 80, 'Agricultural': 2893, 'described': 51, 'relating': 2304, 'Property': 732, 'dictated': 98, 'permitted': 106, 'stage': 66, 'on': 260029, 'carried': 319, 'anything': 255, 'of': 211482, 'year': 263, 'US': 1134, 'Trade': 205625, 'compatible': 73, 'extension': 108, 'UN': 85, 'ensure': 59, 'UK': 60, 'Japan': 140, 'referred': 2835, 'or': 16937, 'Lord': 41, 'Hong': 101, 'letters': 40, 'presence': 63, 'into': 9567, 'within': 657, 'integral': 894, 'bound': 76, 'due': 88, 'down': 756, 'nothing': 2240, 'Internal': 61, 'parties': 1754, 'observance': 93, 'Korean': 167, 'rounds': 206, 'terminated': 333, 'renewal': 84, 'measured': 45, 'there': 152, 'long': 224, 'strict': 170, 'authorized': 80, 'Company': 72, 'resulted': 47, 'November': 77, 'was': 7903, 'war': 76, 'agreements': 67, 'elsewhere': 295, 'form': 813, 'duly': 83, 'Science': 122, 'registered': 363, 'regard': 532, 'failure': 47, 'construction': 116, 'survive': 419, 'line': 58, 'with': 24573, 'longer': 55, 'continuance': 64, 'Irish': 64, 'made': 2494, 'applying': 228, 'Office': 222, 'Commission': 150, 'default': 148, 'House': 97, 'Africa': 186, 'up': 372, 'signed': 6861, 'lead': 48, 'Egypt': 318, 'structure': 62, 'Governments': 346, 'agree': 211, 'Special': 106, 'amendments': 172, 'Executive': 225, 'associated': 50, 'undertake': 42, 'bind': 74, 'defined': 504, 'Nuclear': 264, 'Between': 2884, 'an': 5991, 'To': 375, 'as': 10876, 'Production': 83, 'exist': 65, 'at': 2301, 'obligations': 2088, 'New': 224, 'negotiations': 1103, 'functions': 127, 'Geneva': 590, 'no': 739, 'when': 151, 'end': 658, 'application': 2548, 'other': 3132, 'role': 63, 'make': 77, 'you': 553, 'intention': 239, 'conclusion': 2193, 'requested': 43, 'February': 62, 'elements': 67, 'Naval': 498, 'amended': 574, 'April': 209, 'cumulative': 42, 'ground': 90, 'Act': 4383, 'included': 309, 'Railway': 94, 'South': 605, 'bilateral': 76, 'required': 758, 'Principles': 425, 'governed': 408, 'An': 677, 'As': 1081, 'came': 866, 'diagnosis': 71, 'time': 929, 'recognize': 41, 'conferred': 46, 'joining': 265, 'requires': 120, 'indirectly': 250}
Alabama	{'all': 475, 'Indians': 299, 'alien': 64, 'sunk': 152, 'woods': 73, 'Bureau': 236, 'skin': 66, 'go': 43, 'settlement': 853, 'Louisiana': 634, 'children': 136, 'Conflict': 66, 'seemed': 128, 'young': 164, 'to': 10322, 'charge': 69, 'under': 689, 'sent': 80, 'sinking': 412, 'League': 117, 'woman': 63, 'returned': 218, 'homes': 74, 'Constitutional': 233, 'Co': 96, 'founded': 50, 'school': 43, 'large': 99, 'sand': 175, 'participation': 84, 'small': 538, 'George': 1584, 'upper': 61, 'force': 51, 'exposed': 104, 'malice': 102, 'd': 97, 'Department': 5830, 'State': 18438, 'invention': 45, 'General': 1107, 'what': 151, 'constitution': 231, 'section': 50, 'version': 43, 'capital': 75, 'new': 92, 'public': 375, 'told': 66, 'men': 62, 'English': 62, 'lands': 268, 'along': 112, 'change': 63, 'addresses': 76, 'Associated': 80, 'colored': 71, 'Session': 129, 'action': 110, 'opinion': 43, 'residents': 205, 'settled': 95, 'punishment': 94, 'followed': 61, 'Circuit': 122, 'Reform': 102, 'Justice': 58, 'When': 110, 'Three': 66, 'Relations': 709, 'Richmond': 46, 'regiment': 47, 'Safety': 416, 'stern': 89, 'from': 5802, 'would': 141, 'two': 50, 'Houses': 49, 'live': 86, 'upheld': 244, 'equity': 42, "Children's": 177, 'known': 290, 'central': 104, 'cases': 245, 'town': 584, 'rights': 55, 'this': 302, 'challenge': 41, 'work': 87, 'Library': 295, 'can': 81, 'my': 210, 'could': 99, 'history': 218, 'Conference': 488, 'heart': 319, 'Northern': 300, 'Convention': 865, 'states': 3543, 'Speaker': 51, 'Party': 289, 'made': 47, 'native': 797, 'court': 141, 'discussion': 45, 'regions': 74, 'vital': 45, 'Technical': 155, 'waters': 223, 'after': 231, 'southern': 797, 'membership': 282, 'lay': 104, 'president': 646, 'law': 788, 'types': 118, 'a': 5503, 'short': 106, 'Letters': 84, 'Civil': 476, 'Series': 110, 'Middle': 49, 'Guard': 211, 'Tennessee': 1232, 'enter': 243, 'Politics': 464, 'order': 48, 'Blue': 122, 'Catholic': 59, 'Mississippi': 4663, 'over': 58, 'residence': 65, 'course': 53, 'Trustees': 80, 'including': 53, 'Forces': 213, 'mentioned': 45, 'White': 111, 'its': 203, 'Texas': 182, 'before': 275, 'gentlemen': 81, 'but': 116, 'Scientific': 147, 'Transactions': 634, 'chosen': 44, 'Negro': 49, 'acres': 117, 'criminal': 46, 'happened': 205, 'then': 129, 'Bar': 400, 'Register': 59, 'Analysis': 77, 'She': 127, 'half': 203, 'not': 299, 'now': 349, 'day': 411, 'bank': 296, 'Supreme': 2014, 'name': 45, 'university': 79, 'Archaeological': 67, 'Council': 919, 'found': 135, 'went': 139, 'Agriculture': 94, 'side': 528, 'Superior': 47, 'Arts': 80, 'From': 44, 'house': 88, 'related': 50, 'vote': 57, "enemy's": 45, 'our': 65, 'Criminal': 120, 'Carolina': 1904, 'Presbyterian': 90, 'Institute': 3173, 'Report': 62, 'Railroad': 732, 'Class': 132, 'delegate': 59, 'attended': 52, 'Rights': 261, 'University': 54391, 'Work': 55, 'delegates': 193, 'houses': 76, 'base': 50, 'Historical': 634, 'Senator': 1445, 'beginning': 68, 'owners': 45, 'river': 86, 'language': 89, 'People': 80, 'National': 506, 'Years': 404, 'place': 61, 'Human': 970, 'fallen': 106, 'Nineteenth': 61, 'first': 292, 'Universities': 135, 'there': 62, 'one': 538, 'Program': 70, 'Black': 231, 'such': 416, 'Medicine': 785, 'reached': 90, 'open': 54, 'contended': 59, 'arbitration': 878, 'city': 96, 'North': 1009, 'Dictionary': 1262, 'coasts': 52, 'their': 159, 'passed': 144, 'convention': 90, 'Press': 24276, 'white': 123, 'final': 72, 'Association': 400, 'that': 2051, 'forests': 48, 'part': 1097, 'doctors': 53, 'Superintendent': 83, 'Democratic': 252, 'than': 105, 'History': 447, 'steel': 43, 'grew': 176, 'sailing': 58, 'likely': 188, 'classes': 59, 'friend': 90, 'iron': 340, 'were': 237, 'Settlement': 44, 'distinguished': 91, 'Biography': 1262, 'result': 110, 'and': 29491, 'Court': 2457, 'Richard': 132, 'well': 88, 'Peace': 61, 'head': 51, 'Other': 200, 'out': 83, 'have': 160, 'Valley': 44, 'States': 1915, 'East': 126, 'Christian': 1622, 'built': 49, 'Power': 1514, 'coal': 278, 'Virginia': 59, 'Congo': 75, 'take': 64, 'which': 159, 'interior': 106, 'With': 65, 'Bank': 176, 'Legislature': 879, 'who': 492, 'Kentucky': 282, 'confluence': 323, 'most': 60, 'invoked': 135, 'mouth': 102, 'Common': 50, 'letter': 73, 'The': 3957, 'statute': 194, 'Attorney': 229, 'later': 41, 'cover': 54, 'playing': 87, 'points': 45, 'slaves': 90, 'came': 325, 'laws': 880, 'affair': 72, 'Union': 374, 'Political': 119, 'Research': 119, 'colors': 67, 'session': 140, 'Hill': 48, 'battle': 70, 'Under': 107, 'Public': 506, 'Woman': 172, 'northern': 1423, 'French': 66, 'should': 82, 'only': 42, 'black': 388, 'York': 1374, 'Confederate': 1190, 'executive': 135, 'his': 515, 'Annual': 129, 'coast': 108, 'famous': 101, 'cannot': 42, 'closely': 50, 'during': 258, 'River': 1130, 'him': 81, 'areas': 42, 'bar': 61, 'Cancer': 92, 'held': 402, 'fields': 46, 'morning': 97, 'she': 111, 'forces': 106, 'through': 165, 'Education': 1973, 'where': 46, 'declared': 53, 'Fourth': 67, 'Hospital': 272, 'transferred': 119, 'elected': 408, 'are': 412, 'Grand': 70, 'John': 1005, 'outward': 101, 'said': 106, 'served': 68, 'purpose': 247, 'case': 866, 'state': 2903, 'future': 49, 'cooperation': 196, 'between': 392, 'affecting': 58, 'probably': 42, 'boundary': 49, 'across': 41, 'King': 44, 'we': 185, 'Central': 147, 'Colored': 192, 'Johnson': 57, 'Manual': 59, 'sold': 95, 'Florida': 1229, 'Islands': 69, 'come': 240, 'legislature': 664, 'protect': 65, 'career': 122, 'many': 63, 'annual': 199, 'against': 110, 'foreign': 63, 'admit': 103, 'became': 285, 'Statistical': 59, 'admission': 238, 'Her': 45, 'conclude': 185, 'Delaware': 71, 'represented': 79, 'Board': 1924, 'throughout': 42, 'War': 176, 'belt': 468, 'unusual': 50, 'Assembly': 832, 'union': 54, 'west': 183, 'Southern': 1118, 'been': 476, 'Protection': 49, 'California': 200, 'Schools': 77, 'extends': 64, 'entered': 722, 'meeting': 273, 'Georgia': 3277, 'Center': 284, 'eastern': 135, 'Pennsylvania': 210, 'offered': 48, 'gas': 218, 'scrutiny': 74, 'recalled': 73, 'general': 166, 'former': 74, 'present': 426, 'applied': 349, 'commerce': 72, 'east': 153, 'Force': 390, 'presidency': 68, 'plain': 112, 'governor': 481, 'air': 64, 'will': 92, 'near': 87, 'College': 1899, 'Church': 152, 'Trent': 45, 'property': 50, 'resident': 117, 'is': 1063, 'it': 370, 'surface': 101, 'commander': 148, 'middle': 97, 'in': 24870, 'Stock': 50, 'governors': 332, 'make': 65, 'member': 2545, 'parts': 717, 'unfortunate': 73, 'largest': 96, 'I': 131, 'Medical': 542, 'upon': 84, 'director': 241, 'running': 61, 'Teachers': 209, 'judgment': 142, 'lower': 88, 'off': 150, 'Senate': 75, 'constitutional': 164, 'analysis': 58, 'Maine': 59, 'thought': 45, 'Economic': 213, 'In': 778, 'claims': 1463, 'the': 123882, 'summer': 42, 'United': 514, 'proposed': 99, 'Social': 108, 'officers': 168, 'schools': 343, 'Reconstruction': 66, 'succeeding': 63, 'Constitution': 54, 'hills': 165, 'Company': 1181, 'had': 711, 'Connecticut': 71, 'adoption': 89, 'justices': 65, 'citizen': 79, 'has': 905, 'By': 60, 'chairman': 41, 'They': 60, 'fate': 47, 'term': 41, 'Air': 522, 'early': 193, 'ruled': 362, 'brought': 88, 'like': 157, 'submission': 123, 'relied': 97, 'admitted': 281, 'Germans': 43, 'Labor': 53, 'tribes': 124, 'Fine': 80, 'attorney': 166, 'valleys': 62, 'Gulf': 79, 'people': 1773, 'West': 227, 'some': 167, 'back': 56, 'delegation': 135, 'Commission': 72, 'Conservation': 285, 'Archives': 2213, 'escape': 1254, 'New': 1449, 'Life': 175, 'for': 4055, 'Government': 143, 'decision': 105, 'per': 137, 'Appeals': 151, 'asking': 81, 'provides': 225, 'Movement': 1855, 'demonstrated': 72, 'be': 216, 'School': 1515, 'And': 92, 'Times': 40, 'Governor': 1974, 'Agricultural': 671, 'graduated': 73, 'by': 2304, 'First': 90, 'hearing': 46, 'on': 2641, 'actual': 102, 'Fair': 76, 'of': 157922, 'industry': 105, 'citizens': 256, 'hostilities': 60, 'shipping': 51, 'Commissioners': 493, 'done': 185, 'or': 400, 'block': 47, 'Journal': 51, 'raised': 63, 'own': 42, 'Psychology': 146, 'No': 50, 'civil': 198, 'into': 775, 'within': 94, 'Two': 473, 'down': 123, 'Internal': 102, 'cent': 137, 'coastal': 112, 'registration': 63, 'Alabama': 132, 'her': 597, 'diocese': 107, 'question': 266, 'editor': 51, 'way': 262, 'was': 3222, 'Baptist': 558, 'elsewhere': 73, 'regard': 480, 'strikes': 87, 'collected': 55, 'construction': 48, 'courts': 520, 'sustained': 65, 'line': 266, 'with': 1671, 'he': 355, 'count': 42, 'Industrial': 147, 'Office': 109, 'born': 359, 'whether': 144, 'House': 296, 'up': 347, 'Society': 214, 'Britain': 126, 'Surgery': 291, 'similar': 40, 'supreme': 91, 'infections': 130, 'moved': 324, 'Lake': 47, 'an': 452, 'as': 1988, 'at': 4433, 'home': 46, 'campus': 382, 'cotton': 111, 'corporation': 63, 'when': 156, 'Fort': 57, 'members': 214, 'other': 1699, 'branch': 60, 'Law': 384, 'Great': 832, 'may': 51, 'includes': 60, 'Act': 60, 'included': 44, 'Railway': 80, 'South': 2531, 'building': 268, 'land': 325, 'resolution': 48, 'Thus': 43, 'An': 185, 'fact': 48, 'time': 203, 'requires': 63, 'legislatures': 50}
Albert	{'writings': 322, 'managed': 131, 'hands': 88, 'Poetry': 335, 'hath': 118, 'protest': 43, 'Saint': 63, 'go': 86, 'Van': 49, 'children': 59, 'seemed': 47, 'Brown': 158, 'apartment': 68, 'concerned': 70, 'young': 180, 'Bulletin': 112, 'to': 8226, 'under': 1993, 'sent': 146, 'case': 64, 'appointment': 42, 'returned': 40, 'sitting': 41, 'far': 227, 'end': 181, 'eldest': 127, 'P': 46, 'sons': 41, 'Henry': 400, 'awful': 63, 'vast': 41, 'Memoirs': 79, 'believed': 41, 'Thomas': 472, 'proceeded': 66, 'did': 159, 'brother': 533, 'large': 46, 'sang': 63, 'Paul': 1257, 'George': 1451, 'says': 59, 'School': 67, 'Department': 65, 'second': 116, 'design': 52, 'blue': 134, 'what': 45, 'developed': 46, 'Theatre': 71, 'desirous': 65, 'fact': 208, 'experiment': 46, 'scientists': 71, 'supplies': 46, 'Francis': 42, 'told': 165, 'edited': 707, 'Professor': 323, 'men': 178, 'here': 166, 'reported': 90, 'Arthur': 83, 'let': 54, 'English': 68, 'represented': 44, 'sing': 44, 'along': 86, 'teacher': 82, 'obtained': 53, 'great': 158, 'daughter': 327, 'climbed': 50, 'reports': 104, 'example': 101, 'honor': 85, 'Canal': 1063, 'named': 231, 'Robert': 356, 'explained': 45, 'When': 43, 'private': 93, 'Book': 149, 'Tom': 256, 'beheld': 111, 'Wales': 164, 'Europe': 158, 'eye': 202, 'troops': 50, 'would': 72, 'army': 196, 'contains': 94, 'visit': 133, 'two': 65, 'next': 68, 'doubt': 53, 'Lives': 117, 'means': 113, 'markets': 65, 'tell': 70, 'Recent': 83, 'Count': 351, 'flat': 61, 'entrance': 43, 'Land': 102, 'conductor': 42, 'Eve': 71, 'it': 289, 'American': 248, 'V': 161, 'appointed': 43, 'circumstances': 40, 'me': 221, 'Duke': 527, 'none': 63, 'wore': 164, 'this': 186, 'work': 1349, 'Library': 109, 'theories': 56, 'remain': 41, 'can': 41, 'den': 197, 'stature': 40, 'following': 50, 'my': 510, 'National': 497, 'history': 60, 'escaped': 62, 'Netherlands': 63, 'Speaker': 63, 'something': 59, 'Lord': 177, 'story': 42, 'dress': 62, 'Walter': 84, 'information': 181, 'court': 66, 'permission': 107, 'write': 61, 'animal': 51, 'Fall': 76, 'III': 182, 'A': 510, 'plans': 68, 'may': 336, 'waters': 83, 'after': 58, 'southern': 181, 'British': 250, 'shores': 211, 'designed': 52, 'faculty': 42, 'such': 508, 'man': 206, 'a': 3736, 'introduced': 64, 'Letters': 407, 'succeeded': 126, 'coat': 1335, 'interrupted': 137, 'One': 96, 'shore': 144, 'Tennessee': 575, 'so': 105, 'south': 66, 'banks': 85, 'subsequently': 42, 'indeed': 67, 'over': 561, 'Minnesota': 51, 'held': 476, 'scientist': 44, 'including': 149, 'still': 45, 'Texas': 48, 'before': 242, 'perfect': 92, 'bridges': 384, 'style': 127, 'His': 2216, 'March': 65, 'thank': 50, 'la': 468, 'interesting': 76, 'Philosophy': 92, 'vicinity': 87, 'late': 172, 'platform': 159, 'condition': 60, 'happened': 45, 'designs': 111, 'evening': 94, 'Dominican': 85, 'Michael': 209, 'school': 61, 'therapy': 167, 'Adam': 112, 'Mother': 50, 'not': 182, 'now': 553, 'bank': 62, 'several': 61, 'name': 438, 'always': 123, 'each': 52, 'found': 420, 'went': 191, 'heavy': 151, 'Arts': 154, 'just': 58, 'fired': 69, 'Museum': 82872, 'reduce': 104, 'Jefferson': 102, 'entering': 68, 'Treasury': 135, 'et': 136, 'neighbourhood': 55, 'really': 119, 'Chancellor': 195, 'Samuel': 115, 'Lakes': 315, 'William': 402, 'I': 868, 'lecture': 53, 'cause': 43, 'advice': 42, 'theory': 380, 'assured': 52, 'University': 264, 'Work': 86, 'given': 291, 'collection': 110, 'Historical': 53, 'Senator': 960, 'Studies': 127, 'organ': 111, 'service': 64, 'Boston': 42, 'could': 67, 'days': 106, 'conversation': 135, 'wrote': 159, 'think': 300, 'waited': 115, 'first': 340, 'emotion': 111, 'student': 46, 'Royal': 4660, 'owned': 61, 'one': 147, 'feet': 49, 'done': 46, 'president': 43, 'Medicine': 1577, 'reached': 103, 'little': 46, 'district': 155, 'introduction': 580, 'top': 45, 'their': 59, 'intelligence': 64, 'speaking': 59, 'Alexander': 109, 'Lincoln': 44, 'B': 306, 'way': 45, 'that': 2193, 'took': 68, 'part': 46, 'Thames': 45, 'translation': 68, 'than': 232, 'History': 41, 'colleagues': 61, 'showed': 48, 'novel': 51, 'See': 185, 'marriage': 103, 'friend': 59, 'were': 1345, 'and': 153608, 'Chicago': 46, 'ran': 69, 'Prussia': 298, 'Search': 129, 'Field': 207, 'rat': 52, 'murder': 98, 'manner': 148, 'have': 93, 'seem': 51, 'saw': 52, 'Christian': 1175, 'After': 284, 'Albert': 1236, 'note': 76, 'Minister': 65, 'which': 317, 'performance': 89, 'Hall': 8851, 'Jones': 56, 'play': 113, 'collaboration': 355, 'who': 111, 'M': 165, 'Kentucky': 99, 'most': 52, 'Chief': 90, 'Good': 95, 'mouth': 182, 'reverence': 66, 'The': 8973, 'Abraham': 44, 'gallery': 64, 'Reprinted': 107, 'paintings': 61, 'later': 133, 'face': 222, 'High': 84, 'proceedings': 74, 'R': 112, 'enterprise': 54, 'came': 175, 'show': 43, 'Union': 65, 'Political': 104, 'accession': 124, 'Research': 108, 'corner': 41, 'heavens': 48, 'lessons': 76, 'envy': 45, 'quoted': 52, 'title': 56, 'explain': 66, 'Kansas': 58, 'French': 121, 'should': 58, 'only': 92, 'going': 47, 'black': 332, 'pretty': 46, 'Confederate': 131, 'Prime': 65, 'his': 1392, 'Marquis': 49, 'get': 55, 'Louis': 43, 'de': 632, 'famous': 46, 'made': 143, 'words': 566, 'THE': 48, 'silk': 41, 'him': 186, 'celebrate': 40, 'tobacco': 101, 'married': 448, 'Washington': 60, 'morning': 69, 'Trustees': 59, 'she': 44, 'wish': 50, 'ruins': 56, 'husband': 170, 'secretary': 93, 'set': 117, 'concert': 881, 'For': 181, 'Colonel': 539, 'Hospital': 989, 'C': 131, 'see': 47, 'Handbook': 106, 'are': 646, 'Grand': 136, 'John': 533, 'L': 49, 'said': 682, 'currently': 42, 'Park': 419, 'LIBRARY': 45, 'favor': 218, 'outside': 48, 'between': 222, 'July': 56, 'arrival': 42, 'King': 1526, 'we': 94, 'Johnson': 109, 'ushered': 77, 'attention': 51, 'dome': 140, 'S': 301, 'death': 2592, 'wear': 108, 'smiled': 48, 'cousin': 129, 'come': 158, 'Honor': 118, 'AND': 59, 'league': 50, 'against': 104, 'assure': 68, 'became': 52, 'figures': 56, 'whole': 41, 'asked': 67, 'comment': 122, 'Death': 196, 'supply': 43, 'Music': 52, 'likes': 74, 'church': 40, 'War': 201, 'better': 50, 'described': 54, 'west': 66, 'remained': 66, 'Southern': 94, 'three': 43, 'been': 457, 'whom': 133, 'startled': 137, 'entered': 42, 'meeting': 490, 'Princess': 129, 'life': 497, 'Center': 383, 'received': 324, 'mind': 103, 'wretched': 61, 'child': 48, 'general': 432, 'spirit': 43, 'those': 154, 'Secretary': 188, 'sound': 137, 'east': 42, 'look': 147, 'General': 2708, 'will': 64, 'while': 52, 'College': 4778, 'Queen': 9245, 'played': 46, 'is': 1348, 'them': 96, 'defeated': 131, 'in': 45146, 'commanded': 115, 'if': 72, 'things': 61, 'make': 44, 'radiation': 41, 'same': 414, 'speech': 579, 'party': 68, 'gets': 48, 'disciple': 48, 'ball': 104, 'Third': 74, 'Institute': 183, 'Medical': 779, 'upon': 129, 'hand': 177, 'II': 110, 'persons': 40, 'moment': 64, 'arose': 67, 'statue': 288, 'President': 282, 'It': 52, 'person': 138, 'command': 722, 'greatest': 48, 'In': 60, 'direction': 995, 'the': 140067, 'Nazi': 112, 'drawing': 44, 'left': 68, 'proposed': 106, 'less': 54, 'being': 52, 'able': 61, 'Social': 74, 'rest': 143, 'Papers': 1138, 'sprang': 49, 'renewed': 60, 'Company': 64, 'had': 1427, 'Victoria': 120984, 'parents': 42, 'exhibition': 137, 'beloved': 236, 'Philip': 182, 'has': 365, 'hat': 245, 'On': 127, 'James': 520, 'Commons': 44, 'Nile': 66, 'early': 73, 'know': 141, 'birth': 58, 'acquainted': 67, 'Art': 94, 'like': 301, 'stranger': 45, 'OF': 45, 'Gallery': 45, 'London': 1145, 'warned': 84, 'translated': 122, 'become': 47, 'works': 49, 'Prince': 20878, 'old': 63, 'often': 45, 'people': 123, 'stern': 68, 'some': 94, 'born': 249, 'Master': 186, 'election': 434, 'dear': 136, 'home': 161, 'Mary': 41, 'delivered': 46, 'Life': 1713, 'Lee': 637, 'for': 1377, 'does': 51, 'He': 117, 'be': 491, 'Protestant': 52, 'bold': 69, 'Frederick': 152, 'Bill': 73, 'lakes': 127, 'O': 181, 'Times': 140, 'substituted': 49, 'paid': 130, 'worthy': 48, 'from': 1571, 'Emperor': 199, 'by': 6040, 'throne': 107, 'on': 1726, 'about': 181, 'carried': 61, 'of': 41002, 'le': 241, 'Austria': 214, 'rule': 45, 'emperor': 64, 'persuaded': 61, 'or': 598, 'road': 57, 'Story': 102, 'letter': 101, 'owe': 50, 'June': 80, 'into': 314, 'within': 64, 'son': 653, 'down': 101, 'Speech': 176, 'But': 67, 'artists': 40, 'your': 114, 'van': 255, 'her': 690, 'museum': 428, 'there': 218, 'long': 377, 'awarded': 150, 'accordingly': 136, 'much': 46, 'Letter': 114, 'was': 3440, 'war': 129, 'happy': 63, 'offer': 63, 'January': 72, 'registered': 183, 'J': 495, 'uncle': 103, 'line': 71, 'with': 2679, 'he': 509, 'Irish': 65, 'Writings': 1278, 'apology': 64, 'Gold': 61, 'whether': 224, 'House': 63, 'Africa': 109, 'up': 162, 'Society': 130, 'Joseph': 58, 'more': 294, 'Sir': 97, 'Modern': 41, 'Introduction': 43, 'year': 60, 'influence': 209, 'Lake': 905, 'an': 620, 'as': 1882, 'Edward': 958, 'at': 8469, 'watched': 44, 'kissed': 59, 'prevail': 55, 'looked': 52, 'no': 166, 'Vice': 221, 'when': 340, 'other': 186, 'poor': 211, 'Great': 1657, 'E': 47, 'brothers': 40, 'remarked': 50, 'visits': 63, 'Thought': 197, 'fell': 82, 'Lodge': 82, 'St': 112, 'South': 233, 'building': 84, 'Majesty': 484, 'wife': 400, 'Charles': 2248, 'Private': 117, 'An': 64, 'As': 77, 'Street': 41, 'mass': 141, 'pupil': 166, 'time': 1041, 'once': 375}
Alexander	{'writings': 154, 'child': 58, 'Poetry': 256, 'hath': 82, 'sleep': 51, 'whose': 113, 'Plato': 98, 'under': 4639, 'teaching': 48, 'rescue': 42, 'rise': 584, 'every': 202, 'Henry': 509, 'Military': 177, 'vast': 41, 'Thomas': 1599, 'presidency': 247, 'succession': 58, 'Paul': 767, 'triumph': 109, 'charter': 62, 'second': 1009, 'sailed': 41, 'even': 141, 'established': 286, 'selected': 42, 'poison': 71, 'conduct': 247, 'new': 211, 'ever': 126, 'told': 507, 'Foundation': 387, 'never': 148, 'here': 266, 'reported': 187, 'aftermath': 133, 'obtained': 161, 'generals': 297, 'daughter': 2005, 'study': 173, 'Queen': 120, 'military': 89, 'settled': 79, 'secure': 56, 'campaign': 114, 'erected': 47, 'brought': 337, 'Satan': 67, 'Russian': 729, 'spoke': 54, 'would': 1178, 'army': 639, 'coins': 113, 'arms': 281, 'until': 1548, 'type': 93, 'tell': 60, 'wars': 410, 'reign': 11519, 'it': 2886, 'V': 365, 'must': 232, 'me': 162, 'room': 175, 'pursue': 46, 'work': 1426, 'worn': 103, 'era': 337, 'guise': 81, 'akin': 44, 'my': 135, 'route': 168, 'estate': 186, 'organized': 51, 'want': 150, 'Lord': 97, 'end': 500, 'thing': 207, 'Journal': 71, 'how': 146, 'significance': 168, 'interview': 801, 'ancestor': 77, 'III': 9331, 'A': 439, 'fever': 89, 'minority': 447, 'beauty': 44, 'after': 14961, 'modest': 97, 'lay': 218, 'president': 143, 'law': 132, 'attempt': 227, 'third': 290, 'things': 40, 'order': 869, 'office': 58, 'over': 494, 'reforms': 377, 'satisfied': 83, 'before': 2817, 'Urban': 69, 'His': 912, 'personal': 139, 'crew': 51, 'destroyed': 354, 'Birth': 167, 'then': 269, 'them': 733, 'thee': 43, 'school': 50, 'band': 48, 'they': 577, 'prize': 51, 'grammar': 89, 'India': 3685, 'victory': 573, 'went': 380, 'side': 245, 'burial': 47, 'preceded': 59, 'fairly': 64, 'Bar': 126, 'taught': 107, 'borne': 127, 'decree': 243, 'god': 44, 'Wars': 175, 'William': 1853, 'fellowship': 257, 'got': 210, 'Napoleon': 2407, 'foundation': 46, 'University': 703, 'given': 595, 'free': 101, 'Liberal': 138, 'mistress': 57, 'Studies': 183, 'Cambridge': 76, 'National': 200, 'days': 4277, 'already': 216, 'adopted': 200, 'another': 90, 'service': 112, 'historian': 382, 'master': 213, 'Alexander': 3926, 'murder': 1206, 'took': 520, 'showed': 47, 'seated': 199, 'matter': 57, 'See': 307, 'guilt': 66, 'friend': 558, 'palace': 61, 'ran': 40, 'portraits': 488, 'mind': 369, 'manner': 407, 'seen': 219, 'seem': 45, 'forced': 87, 'strength': 64, 'Egyptian': 72, 'latter': 99, 'Albert': 109, 'Minister': 91, 'Jones': 78, 'though': 68, 'object': 69, 'Frederick': 78, 'letter': 3372, 'coin': 117, 'alarm': 60, 'camp': 81, 'treaty': 44, 'came': 792, 'saying': 43, 'pope': 136, 'European': 54, 'guidance': 162, 'do': 248, 'Account': 43, 'de': 127, 'Days': 171, 'wedding': 49, 'report': 96, 'X': 130, 'reviewed': 42, 'Washington': 240, 'public': 54, 'steal': 45, 'ears': 43, 'reference': 265, 'decided': 93, 'John': 3159, 'best': 177, 'subject': 66, 'said': 1852, 'Rise': 128, 'unable': 51, 'drawn': 231, 'approach': 43, 'discovery': 293, 'we': 174, 'men': 471, 'Central': 236, 'handful': 42, 'weak': 64, 'drew': 102, 'news': 41, 'received': 572, 'protect': 74, 'met': 126, 'country': 333, 'adventures': 285, 'against': 722, 'und': 132, 'expense': 131, 'regarded': 160, 'appeared': 65, 'had': 11339, 'initiative': 106, 'path': 40, 'trust': 90, 'Method': 151, 'speak': 152, 'proceeded': 58, 'condemned': 58, 'three': 49, 'been': 4636, 'much': 78, 'entered': 195, 'life': 2884, 'Georgia': 78, 'Adams': 833, 'Vice': 232, 'turned': 131, 'concerning': 62, 'enterprises': 51, 'Stuart': 94, 'Secretary': 1503, 'applied': 92, 'physician': 48, 'exception': 430, 'publicly': 70, 'Deputy': 91, 'apprehension': 42, 'near': 71, 'Persian': 495, 'suppose': 50, 'teachers': 43, 'stopping': 43, 'property': 70, 'mistake': 67, 'is': 5001, 'Ideas': 174, 'defeated': 682, 'shame': 72, 'in': 27442, 'if': 1011, 'grown': 41, 'confirmed': 192, 'descent': 46, 'perform': 51, 'saw': 316, 'make': 181, 'beside': 57, 'President': 198, 'entitled': 251, 'meets': 153, 'Institute': 466, 'VI': 9873, 'hand': 192, 'marched': 195, 'Home': 102, 'insisted': 79, 'opportunity': 40, 'kept': 40, 'mother': 873, 'the': 329136, 'left': 273, 'opinions': 41, 'just': 236, 'gods': 41, 'repose': 44, 'aside': 84, 'thanks': 159, 'ease': 41, 'assassination': 4641, 'character': 499, 'dreams': 41, 'News': 44, 'Philip': 3125, 'has': 1055, 'Eighth': 66, 'gave': 620, 'James': 1048, 'possible': 58, 'birth': 536, 'feelings': 47, 'advanced': 227, 'vanity': 43, 'desire': 200, 'performing': 48, 'offices': 91, 'officer': 69, 'Aristotle': 1208, 'night': 482, 'security': 44, 'Prince': 1208, 'old': 379, 'deal': 74, 'dead': 103, 'born': 146, 'election': 1370, 'dear': 40, 'rapidity': 46, 'congratulate': 100, 'Life': 4046, 'Lee': 51, 'for': 3326, 'companions': 513, 'honoured': 42, 'steps': 71, 'He': 446, 'School': 116, 'Origins': 189, 'Near': 365, 'memorable': 78, 'Emperor': 12794, 'First': 541, 'months': 45, 'Eighteenth': 84, 'Austria': 110, 'efforts': 195, 'raised': 110, 'formerly': 174, 'Greeks': 119, 'prisoners': 46, 'Scotland': 559, 'two': 385, 'down': 302, 'support': 232, 'flying': 47, 'fought': 155, 'way': 732, 'was': 24712, 'war': 181, 'head': 168, 'attempted': 80, 'failure': 47, 'true': 297, 'admiration': 60, 'attached': 64, 'ruined': 52, 'economic': 67, 'Sir': 11302, 'later': 432, 'proved': 45, 'Introduction': 113, 'Fellow': 352, 'negotiations': 88, 'constructed': 91, 'High': 116, 'no': 843, 'when': 2985, 'interested': 84, 'role': 68, 'holding': 94, 'papers': 113, 'picture': 50, 'brothers': 50, 'Asiatic': 89, 'surprise': 51, 'welcome': 66, 'fell': 90, 'died': 1166, 'younger': 279, 'Charles': 359, 'together': 67, 'Second': 249, 'time': 28792, 'Empire': 329, 'decision': 88, 'Times': 506, 'eldest': 898, 'grateful': 446, 'battle': 461, 'certainly': 108, 'praises': 163, 'father': 4602, 'passage': 64, 'answered': 58, 'charge': 415, 'marks': 116, 'division': 108, 'Prize': 52, 'advantage': 105, 'choice': 77, 'advised': 102, 'presented': 266, 'did': 1300, 'die': 69, 'brother': 1041, 'leave': 56, 'quick': 51, 'round': 60, 'photograph': 101, 'prevent': 274, 'unexpected': 139, 'regret': 51, 'invaded': 364, 'constitution': 105, 'assistance': 498, 'celebrated': 68, 'favour': 89, 'During': 1446, 'Francis': 61, 'honour': 231, 'funeral': 162, 'availed': 55, 'understanding': 96, 'English': 289, 'alone': 50, 'along': 120, 'David': 65, 'teacher': 214, 'change': 43, 'boy': 46, 'exemption': 48, 'studied': 93, 'example': 326, 'trial': 86, 'descended': 87, 'suggestion': 95, 'When': 1057, 'troops': 254, 'sake': 56, 'visit': 257, 'by': 39714, 'memory': 213, 'associated': 49, 'afford': 101, 'virtue': 116, 'Time': 635, 'cases': 94, 'Duke': 400, 'Family': 73, 'German': 266, 'soul': 58, 'inferior': 54, 'believed': 63, 'deceased': 64, 'claim': 46, 'dedicated': 54, 'figure': 47, 'agent': 51, 'heard': 541, 'allowed': 174, 'Walter': 118, 'divided': 442, 'write': 177, 'till': 1915, 'sword': 343, 'swore': 74, 'Why': 42, 'undertook': 133, 'armies': 112, 'may': 310, 'fed': 46, 'Martin': 57, 'such': 818, 'revealed': 40, 'man': 895, 'natural': 138, 'neck': 101, 'succeeded': 1767, 'tale': 197, 'so': 1084, 'talk': 45, 'shield': 124, 'furnished': 93, 'pointed': 221, 'years': 1367, 'course': 52, 'White': 87, 'still': 157, 'Lectures': 367, 'group': 43, 'interesting': 172, 'attraction': 56, 'policy': 713, 'World': 163, 'into': 1460, 'happened': 81, 'introduce': 67, 'answer': 102, 'Rome': 42, 'conquer': 279, 'half': 48, 'not': 5148, 'now': 407, 'killed': 413, 'nor': 181, 'wont': 57, 'name': 3792, 'resignation': 115, 'advent': 447, 'Museum': 312, 'hurried': 51, 'preaching': 76, 'year': 408, 'et': 146, 'shown': 133, 'opened': 49, 'space': 56, 'attracted': 43, 'dominions': 127, 'Commerce': 48, 'attended': 59, 'siege': 42, 'quite': 181, 'advantages': 105, 'poems': 223, 'care': 176, 'turn': 42, 'place': 463, 'invitation': 50, 'think': 156, 'first': 1378, 'origin': 42, 'accorded': 48, 'Asia': 699, 'one': 1085, 'invasion': 1039, 'vote': 62, 'open': 44, 'city': 750, 'little': 109, 'checked': 59, 'convent': 51, 'caught': 51, 'victorious': 46, 'Such': 42, 'speaking': 54, 'that': 21462, 'than': 1458, 'History': 875, 'wide': 43, 'future': 299, 'were': 4221, 'and': 81646, 'Court': 199, 'Prussia': 140, 'say': 279, 'conspiracy': 66, 'anger': 40, 'occupied': 103, 'any': 293, 'ideas': 135, 'note': 127, 'take': 194, 'Dr': 86, 'begin': 50, 'Name': 249, 'trace': 46, 'track': 43, 'knew': 56, 'printed': 60, 'considered': 50, 'adopt': 91, 'bade': 66, 'walking': 48, 'show': 164, 'Political': 97, 'accession': 1364, 'Research': 285, 'discovered': 393, 'ground': 40, 'We': 43, 'Woman': 97, 'only': 970, 'dispute': 75, 'Louis': 148, 'cannot': 116, 'conjunction': 75, 'Having': 44, 'Admiralty': 51, 'Right': 89, 'naked': 40, 'priest': 210, 'where': 490, 'declared': 236, 'Colonel': 270, 'eighteenth': 75, 'seat': 191, 'elected': 155, 'Grand': 355, 'hastened': 40, 'Johnson': 130, 'That': 73, 'enough': 67, 'between': 2116, 'reading': 209, 'arrival': 207, 'notice': 43, 'Natural': 134, 'FOR': 82, 'ascribed': 247, 'killing': 57, 'reigns': 615, 'cities': 246, 'come': 227, 'residence': 84, 'according': 57, 'somewhere': 76, 'among': 522, 'Death': 491, 'afterwards': 84, 'period': 1079, 'learning': 71, 'liberal': 53, 'better': 169, 'Medieval': 101, 'informed': 240, 'But': 331, 'breath': 77, 'direction': 941, 'thousand': 127, 'wake': 334, 'declaration': 68, 'spirit': 220, 'those': 1515, 'case': 865, 'these': 320, 'mount': 44, 'cash': 44, 'associates': 46, 'policies': 100, 'many': 68, 'telephone': 585, 'sudden': 62, 'commanded': 46, 'Israel': 121, 'Complete': 129, 'author': 73, 'Sixth': 477, 'granted': 275, 'heirs': 102, 'epoch': 183, 'speech': 46, 'noble': 314, 'centuries': 91, 'I': 7914, 'IV': 1175, 'driven': 81, 'persons': 198, 'statue': 1785, 'tradition': 44, 'It': 1824, 'without': 206, 'In': 1577, 'very': 111, 'model': 78, 'reward': 41, 'If': 782, 'summer': 51, 'being': 273, 'actions': 90, 'Daniel': 70, 'kill': 85, 'touch': 96, 'captured': 425, 'blow': 73, 'death': 35963, 'measure': 100, 'seems': 309, 'shrine': 47, 'Although': 47, 'Ethics': 97, 'real': 93, 'easily': 45, 'read': 273, 'Island': 318, 'early': 476, 'Solomon': 102, 'confined': 106, 'world': 436, 'execution': 102, 'recipient': 48, 'fortune': 144, 'Contemporary': 200, 'London': 208, 'either': 119, 'served': 41, 'Before': 68, 'Army': 87, 'Since': 112, 'Master': 109, 'table': 63, 'New': 484, 'Marx': 73, 'Mary': 200, 'Vienna': 57, 'passions': 43, 'recorded': 139, 'expressing': 41, 'freed': 151, 'Earl': 614, 'power': 164, 'leadership': 178, 'on': 6451, 'of': 310098, 'favorite': 48, 'stand': 40, 'testimony': 48, 'or': 2918, 'Story': 92, 'lands': 51, 'No': 98, 'image': 170, 'lively': 53, 'heated': 124, 'legacy': 114, 'prepare': 57, 'area': 53, 'assumed': 64, 'there': 407, 'alleged': 97, 'Letter': 512, 'J': 102, 'Treaty': 178, 'with': 10511, 'Guide': 81, 'Writings': 764, 'House': 53, 'Joseph': 115, 'strongly': 70, 'moved': 47, 'deep': 77, 'general': 270, 'How': 44, 'as': 8341, 'at': 9109, 'lifetime': 301, 'horse': 48, 'bishop': 41, 'again': 167, 'field': 64, 'Poems': 1041, 'you': 73, 'Lady': 540, 'offered': 94, 'poor': 65, 'Great': 88671, 'Essay': 362, 'Majesty': 504, 'wife': 467, 'contemporary': 533, 'Street': 1262, 'original': 78, 'Canada': 132, 'all': 435, 'founder': 41, 'caused': 142, 'month': 47, 'founded': 3769, 'Caesar': 2383, 'follow': 43, 'children': 47, 'Brown': 77, 'former': 44, 'to': 41250, 'painter': 105, 'woman': 46, 'returned': 48, 'far': 280, 'induce': 44, 'sons': 311, 'diary': 51, 'fall': 60, 'dictatorship': 133, 'joined': 106, 'past': 53, 'invention': 73, 'further': 108, 'East': 657, 'Theological': 99, 'what': 507, 'stood': 47, 'sum': 45, 'brief': 48, 'Cultural': 56, 'version': 232, 'learned': 108, 'method': 144, 'edited': 459, 'full': 46, 'Columbia': 59, 'observing': 40, 'legend': 133, 'Jews': 90, 'experience': 41, 'social': 113, 'successors': 3763, 'followed': 298, 'family': 826, 'Reform': 72, 'beheld': 189, 'Europe': 348, 'takes': 184, 'finally': 83, 'Romans': 107, 'Lives': 71, 'taken': 926, 'more': 167, 'initiated': 42, 'El': 47, 'St': 1059, 'American': 538, 'broke': 153, 'known': 760, 'Fleet': 54, 'presumed': 64, 'town': 40, 'none': 68, 'science': 44, 'der': 124, 'guards': 74, 'den': 216, 'v': 47, 'marble': 53, 'history': 1012, 'compare': 58, 'imitation': 97, 'purchased': 97, 'phrase': 42, 'biography': 267, 'information': 43, 'court': 708, 'rather': 154, 'acts': 79, 'ratified': 45, 'Fall': 74, 'Don': 80, 'intended': 42, 'advice': 90, 'different': 56, 'Foreign': 152, 'blood': 308, 'coming': 94, 'Syria': 254, 'appealed': 52, 'a': 12899, 'short': 42, 'resemble': 53, 'departure': 89, 'supervision': 293, 'pleasure': 40, 'dream': 211, 'playing': 40, 'replied': 61, 'infant': 49, 'help': 101, 'developed': 75, 'soon': 515, 'held': 45, 'embassy': 301, 'through': 777, 'same': 702, 'existence': 116, 'suffer': 47, 'its': 242, 'romance': 348, 'style': 54, 'March': 380, 'deeds': 449, 'late': 1208, 'absence': 232, 'might': 189, 'ally': 47, 'good': 191, 'return': 269, 'Michael': 52, 'feet': 88, 'likeness': 93, 'imprisoned': 54, 'ashes': 52, 'Scottish': 49, 'found': 709, 'friendship': 153, 'ports': 209, 'harm': 82, 'generation': 93, 'house': 491, 'thirteenth': 57, 'reduce': 58, 'idea': 55, 'Jefferson': 928, 'Essays': 165, 'beyond': 98, 'inquired': 68, 'wondered': 55, 'since': 120, 'Report': 203, 'induced': 47, 'Town': 132, 'confidence': 58, 'According': 147, 'difficulty': 87, 'reason': 291, 'members': 42, 'put': 412, 'rises': 50, 'beginning': 305, 'II': 22879, 'Congress': 57, 'but': 387, 'George': 1009, 'conducted': 119, 'indebted': 240, 'number': 126, 'juncture': 42, 'Indian': 40, 'done': 519, 'story': 1951, 'heads': 113, 'guest': 47, 'introduction': 85, 'least': 40, 'station': 50, 'passed': 338, 'hundred': 172, 'relationship': 109, 'immediate': 233, 'part': 549, 'translation': 185, 'believe': 129, 'convinced': 164, 'king': 436, 'kind': 216, 'instruction': 61, 'youth': 59, 'architect': 43, 'supposed': 94, 'treated': 51, 'Peace': 96, 'ages': 56, 'alike': 70, 'orders': 196, 'built': 1569, 'officers': 106, 'Arab': 158, 'also': 255, 'finding': 64, 'commentary': 194, 'With': 102, 'play': 42, 'towards': 67, 'monastery': 121, 'M': 189, 'Governor': 57, 'most': 113, 'Chief': 129, 'experiences': 43, 'nothing': 234, 'The': 13254, 'approved': 239, 'appear': 193, 'Egypt': 1107, 'clear': 299, 'charged': 40, 'joining': 179, 'particularly': 72, 'Had': 190, 'Palestine': 75, 'find': 265, 'occupation': 43, 'writes': 50, 'BC': 522, 'French': 100, 'Confederate': 92, 'his': 14108, 'founding': 51, 'permission': 181, 'famous': 184, 'Reich': 61, 'during': 2015, 'banner': 127, 'him': 1629, 'enemy': 47, 'sacrifices': 52, 'wrote': 1186, 'set': 345, 'For': 78, 'throwing': 82, 'see': 465, 'are': 1074, 'surrendered': 164, 'portrait': 798, 'Neither': 70, 'feared': 55, 'probable': 114, 'Park': 116, 'won': 105, 'marriage': 199, 'burned': 62, 'Age': 978, 'King': 6191, 'sole': 51, 'opposition': 192, 'both': 373, 'last': 152, 'Greek': 232, 'became': 755, 'let': 76, 'whole': 178, 'point': 138, 'Mr': 174, 'agents': 75, 'likes': 51, 'unsuccessful': 71, 'throughout': 222, 'War': 146, 'described': 252, 'raise': 42, 'Students': 53, 'appears': 43, 'whom': 541, 'meeting': 365, 'firm': 138, 'lived': 41, 'cruelty': 60, 'else': 121, 'lives': 179, 'struck': 51, 'fur': 75, 'Memoirs': 437, 'governor': 94, 'while': 246, 'replaced': 151, 'pressed': 65, 'fleet': 289, 'acquaintance': 522, 'kingdom': 282, 'century': 546, 'von': 4989, 'pronounced': 47, 'grant': 90, 'discourse': 152, 'esteemed': 50, 'daughters': 91, 'used': 696, 'affairs': 59, 'moment': 106, 'arrived': 224, 'purpose': 43, 'dared': 46, 'person': 524, 'Economic': 68, 'Jerusalem': 60, 'Virginia': 44, 'openly': 48, 'Papers': 3769, 'prince': 155, 'unto': 134, 'cut': 246, 'Theory': 93, 'hated': 61, 'source': 89, "Majesty's": 109, 'surprised': 70, 'On': 2178, 'march': 42, 'Did': 57, 'Of': 161, 'showing': 40, 'Commons': 53, 'Air': 86, 'matters': 79, 'submission': 57, 'signal': 42, 'translated': 391, 'foolish': 48, 'often': 55, 'senate': 62, 'obliged': 130, 'some': 273, 'back': 653, 'added': 42, 'sight': 144, 'delivered': 61, 'gradually': 43, 'scale': 73, 'Moscow': 88, 'religion': 148, 'pen': 269, 'dating': 41, 'temple': 44, 'be': 2496, 'remembered': 136, 'agreement': 47, 'refused': 318, 'become': 212, 'Translated': 77, 'Russia': 2933, 'Captain': 141, 'seeing': 72, 'conquest': 2517, 'within': 59, 'Young': 67, 'impressed': 61, 'Norman': 226, 'long': 241, 'forward': 51, 'opens': 77, 'mountains': 41, 'himself': 1767, 'hoped': 44, 'authority': 282, 'considerable': 42, 'up': 677, 'us': 115, 'Books': 83, 'called': 400, 'Works': 3636, 'ordered': 67, 'expedition': 1005, 'influence': 1035, 'presided': 98, 'To': 172, 'single': 76, 'diverse': 41, 'peace': 134, 'application': 44, 'Athens': 88, 'Man': 46, 'Case': 169, 'AD': 69, 'favourite': 89, 'breasts': 54, 'Thought': 210, 'rested': 47, 'lent': 64, 'vice': 92, 'age': 3219, 'An': 267, 'As': 300, 'Scott': 394, 'At': 121, 'fresh': 78, 'having': 111, 'once': 129, 'issued': 493, 'Saint': 58, 'go': 60, 'seemed': 116, 'young': 642, 'send': 206, 'dislike': 58, 'sent': 1517, 'confirmation': 126, 'seized': 94, 'continues': 86, 'Pope': 17455, 'anxious': 49, 'procession': 61, 'State': 1926, 'auspices': 48, 'giving': 59, 'assisted': 51, 'can': 121, 'body': 496, 'led': 553, 'Professor': 188, 'youngest': 234, 'desired': 41, 'Marriage': 515, 'others': 231, 'invented': 187, 'great': 778, 'Palace': 233, 'defeat': 57, 'opinion': 152, 'Arabic': 60, 'makes': 97, 'honor': 946, 'named': 881, 'danger': 47, 'Robert': 826, 'addressed': 77, 'names': 393, 'standing': 94, 'use': 100, 'from': 10139, 'ministers': 91, 'few': 221, 'doubt': 142, 'possession': 78, 'heir': 226, 'sister': 59, 'train': 259, 'appointed': 91, 'women': 126, 'account': 177, 'this': 664, 'Library': 742, 'crossed': 622, 'VIII': 47, 'meet': 75, 'reserved': 82, 'rode': 70, 'sit': 135, 'poetry': 114, 'regions': 46, 'establishment': 149, 'greatly': 41, 'Prayer': 194, 'collection': 92, 'Roman': 951, 'emperor': 892, 'Letters': 1495, 'One': 102, 'Middle': 43, 'chief': 53, 'Greece': 197, 'counted': 41, 'Wilson': 213, 'despite': 43, 'Smith': 55, 'doubtful': 45, 'whilst': 44, 'rendered': 69, 'including': 94, 'mentioned': 76, 'chosen': 162, 'physicians': 62, 'fourth': 285, 'descendants': 164, 'Can': 51, 'ambition': 103, 'front': 56, 'kindness': 160, 'fled': 179, 'successor': 790, 'worlds': 41, 'VII': 966, 'crossing': 65, 'accompanied': 661, 'eastern': 208, 'From': 101, 'Mediterranean': 285, 'related': 52, 'sands': 53, 'Which': 70, 'Epistle': 41, 'relates': 75, 'out': 963, 'messenger': 90, 'cause': 51, 'announced': 82, 'achievements': 58, 'This': 74, 'Senator': 44, 'her': 353, 'could': 905, 'times': 947, 'length': 46, 'secretary': 196, 'south': 92, 'Hotel': 67, 'manuscript': 103, 'Yale': 49, 'reached': 129, 'ancient': 118, 'North': 109, 'fair': 180, 'Eastern': 49, 'journals': 243, 'relations': 112, 'their': 555, 'Lincoln': 87, 'completed': 64, 'institution': 53, 'colleagues': 61, 'visited': 530, 'abolished': 45, 'Peter': 291, 'distinguished': 54, 'Richard': 294, 'unlikely': 62, 'have': 2723, 'need': 45, 'pursued': 139, 'After': 4830, 'able': 482, 'battles': 85, 'instance': 154, 'which': 4919, 'regard': 136, 'collaboration': 47, 'who': 1964, 'courtesy': 103, 'connected': 43, 'demise': 56, 'prisoner': 91, 'why': 190, 'bred': 107, 'request': 435, 'disease': 49, 'face': 127, 'looked': 48, 'H': 541, 'wounded': 111, 'Bishop': 427, 'fact': 304, 'son': 9307, 'Paris': 50, 'text': 78, 'agreed': 62, 'bring': 56, 'Under': 45, 'soldiers': 877, 'fear': 82, 'pleased': 281, 'staff': 116, 'knowledge': 96, 'should': 297, 'York': 238, 'Jacob': 44, 'hope': 69, 'meant': 50, 'beat': 44, 'bear': 135, 'words': 385, 'THE': 43, 'River': 59, 'ascended': 226, 'following': 271, 'evident': 57, 'ended': 42, 'married': 462, 'she': 376, 'widow': 270, 'strengthened': 42, 'C': 69, 'accused': 107, 'Good': 86, 'Wisconsin': 54, 'genius': 585, 'written': 568, 'neither': 99, 'Republican': 44, 'tent': 261, 'attention': 230, 'Study': 72, 'monarchy': 158, 'compelled': 113, 'historians': 113, 'approval': 144, 'Religion': 1359, 'career': 1332, 'invited': 143, 'equal': 46, 'attributed': 95, 'figures': 261, 'conquered': 1740, 'glorious': 79, 'walk': 43, 'cousin': 42, 'respect': 43, 'poem': 73, 'immense': 75, 'treat': 102, 'poet': 464, 'Among': 45, 'am': 568, 'sufficient': 74, 'depend': 55, 'finished': 63, 'bull': 1294, 'an': 3049, 'present': 300, 'And': 154, 'unlike': 45, 'appearance': 77, 'General': 1212, 'will': 325, 'instantly': 56, 'thus': 46, 'Imperial': 684, 'Literary': 292, 'demanded': 92, 'greater': 358, 'Garden': 132, 'perhaps': 134, 'began': 98, 'cross': 45, 'member': 60, 'disciple': 54, 'difficult': 61, 'Third': 662, 'novels': 52, 'upon': 654, 'Teachers': 473, 'dust': 357, 'Order': 241, 'destruction': 54, 'well': 465, 'Idea': 897, 'asked': 266, 'command': 695, 'undertaken': 45, 'restore': 60, 'cried': 41, 'less': 40, 'accurate': 43, 'hands': 82, 'domestic': 70, 'Fourth': 67, 'Be': 200, 'generous': 98, 'rapid': 45, 'day': 447, 'arrest': 66, 'other': 409, 'By': 117, 'identical': 50, 'government': 369, 'know': 93, 'descendant': 115, 'necessary': 40, 'like': 987, 'lost': 156, 'Influence': 70, 'Polish': 140, 'works': 336, 'belonged': 114, 'alive': 62, 'home': 307, 'empire': 4415, 'corpse': 51, 'Seventh': 118, 'philosopher': 132, 'does': 183, 'demonstrated': 73, 'Official': 119, 'worthy': 115, 'Chemistry': 70, 'swept': 40, 'throne': 760, 'about': 1029, 'actual': 43, 'US': 738, 'tomb': 714, 'introduced': 363, 'persuaded': 62, 'Arctic': 47, 'own': 94, 'letters': 239, 'Juan': 184, 'guard': 91, 'Mountains': 87, 'Finance': 48, 'intention': 53, 'preparations': 75, 'Following': 96, 'he': 3429, 'made': 1473, 'glory': 88, 'instructive': 72, 'whether': 245, 'wish': 88, 'stories': 67, 'monument': 45, 'urging': 97, 'bearing': 145, 'deaths': 92, 'Treasury': 2060, 'Lake': 53, 'Private': 163, 'education': 91, 'compared': 307, 'contest': 40, 'attacked': 75, 'book': 176, 'conclusion': 43, 'chance': 53, 'Testament': 105, 'friends': 41, 'South': 100, 'rule': 442, 'pupil': 392, 'Spanish': 54, 'understand': 125}

Cosine Similarity


In [104]:
from collections import defaultdict
from math import sqrt
from ast import literal_eval

def make_cosinesim_normalizations():
    '''
        Calculate the cosine similarity normalization for each word
    '''
    
    # file containing stripes with cooccurrence counts
    #stripesfilename = 'stripes_small_local.txt'
    #stripesfilename = 'stripes_small_emr.txt'
    stripesfilename = 'stripes_all_emr.txt'
    
    # file to store normalization values
    #normsfilename = 'cosinesim_normalizations_small_emr.txt'
    normsfilename = 'cosinesim_normalizations_all_emr.txt'
    normalizations = defaultdict(int)
    with open(stripesfilename, 'r') as f1:
        # loop over all stripes
        for line in f1.readlines():
            line = line.strip()
            line = line.split('\t')
            word = line[0]
            cooccurrences = literal_eval(line[1])
            # loop over all words in stripe
            for w2,count in cooccurrences.iteritems():
                # sum the squared counts
                normalizations[word]+=count**2

    with open(normsfilename, 'w') as f2:
        for word,sqsum in normalizations.iteritems():
            # take the squareroot of the squared sum
            normalizations[word]=sqrt(sqsum)
            # write the word and the normalization
            line = word+'\t'+str(normalizations[word])+'\n'
            f2.write(line)


make_cosinesim_normalizations()

In [ ]:


In [32]:
%%writefile CosineSim.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.protocol import RawValueProtocol
from mrjob.step import MRStep
import operator
from itertools import combinations
from collections import defaultdict
from re import match
from ast import literal_eval
from math import sqrt

class MR_CosineSim(MRJob):

    OUTPUT_PROTOCOL = RawValueProtocol
    
    def mapper(self, _, line):
        # for each stripe, emit each cooccurrence as the key
        #    and the input key word (the "document") and the coocurrence count as the value
        line = line.strip()
        line = line.split('\t')
        doc = line[0]
        posting = literal_eval(line[1])
        for word in posting.keys():
            yield word, (doc,posting[word])

    def reducer_cross(self, key, values):
        # find all pairs from the cross of the words that cooccur with the key
        #    and calculate the product of the coocurrence counts for that pair
        #    from the common key word
        counts = list(values)
        for i in range(len(counts)):
            doc1 = counts[i][0]
            c1 = counts[i][1]
            for j in range(i+1,len(counts)):
                doc2 = counts[j][0]
                c2 = counts[j][1]
                v = int(c1)*int(c2)
                # emit both orderings of the words in the pair
                if doc1<doc2:
                    yield (doc1, doc2), v
                else:
                    yield (doc2, doc1), v

    def reducer_total(self, pair, values):
        # sum the intermediate values for each pair of words and emit to one reducer
        yield None, (pair, sum(values))

    def reducer_init_calcsim(self):
        # load the normalizations for each word into memory in a map
        #     from the file that was created above
        self.normalizations = dict()
        #with open('cosinesim_normalizations_test.txt') as f:
        with open('cosinesim_normalizations_small_emr.txt') as f:
        #with open('cosinesim_normalizations_all_emr.txt') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split('\t')
                word = line[0]
                normval = float(line[1])
                self.normalizations[word]=normval
    
    def reducer_calcsim(self, _, pairsum):
        # calculate the cosine similarity for each pair of words
        #    using the aggregated numerator from the previous reducer
        #    and the normalizations
        for kv in pairsum:
            pair = kv[0]
            numerator = kv[1]
            word1 = pair[0]
            word2 = pair[1]
            sim = 1.0*numerator/(self.normalizations[word1]*self.normalizations[word2])
            yield None, (pair, sim)
    
    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer_cross),
            MRStep(reducer=self.reducer_total),
            MRStep(reducer_init=self.reducer_init_calcsim,
                  reducer=self.reducer_calcsim)
            ]

if __name__ == '__main__':
    MR_CosineSim.run()


Overwriting CosineSim.py

In [ ]:
# test on toy problem locally
!python CosineSim.py /Users/davidadams/Documents/W261/hw5/hw5_4/stripes_test.txt --file cosinesim_normalizations_test.txt > pairsims_test.txt
# for output see attached pdf 5.4.2.cosine

In [13]:
!head pairsims_test.txt


(['aardvark', 'spark'], 0.5547001962258429)
(['aardvark', 'zebra'], 0.5547001962248173)
(['cheetah', 'lion'], 0.9877295966495229)
(['spark', 'zebra'], 1.000000000000364)

In [81]:
# test on small sample locally
!python CosineSim.py stripes_small_local.txt --file cosinesim_normalizations_small.txt > pairsims_small_local.txt


using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-0-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-0-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-1-mapper_part-00000
Counters from step 2:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-1-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-1-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-1-reducer_part-00000
Counters from step 2:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-2-mapper_part-00000
Counters from step 3:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-2-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-2-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-2-reducer_part-00000
Counters from step 3:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/step-2-reducer_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641/output
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.094925.815641

In [16]:
!head pairsims_small_local.txt


null	[["A", "AB"], 0.6747976561521299]
null	[["A", "AND"], 0.05051986783459755]
null	[["A", "AT"], 0.6056821814172901]
null	[["A", "About"], 0.49559253211024035]
null	[["A", "Abuse"], 0.014223446316548911]
null	[["A", "Academy"], 0.7226783437808204]
null	[["A", "According"], 0.6560288007622058]
null	[["A", "Account"], 0.736336879085416]
null	[["A", "Act"], 0.8432067710050359]
null	[["A", "Action"], 0.31907972126627154]

In [33]:
# test on small sample on emr
!python CosineSim.py stripes_small_emr.txt --file '/Users/davidadams/Documents/W261/hw5/hw5_4/cosinesim_normalizations_small_emr.txt' -r emr --num-ec2-instances 10 --ec2-task-instance-type m1.medium > pairsims_small_emr.txt


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-1febc2c04977da79
using s3://mrjob-1febc2c04977da79/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151014.050848.316219
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151014.050848.316219/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-1febc2c04977da79/tmp/CosineSim.davidadams.20151014.050848.316219/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-25NY72PDDA4I1
Created new job flow j-25NY72PDDA4I1
Job launched 31.4s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 62.4s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 94.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 125.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 156.2s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 187.3s ago, status STARTING: Configuring cluster software
Job launched 218.3s ago, status STARTING: Configuring cluster software
Job launched 249.3s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 280.4s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 311.5s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 342.6s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 373.6s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 404.7s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 435.8s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 466.8s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 498.0s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 529.1s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 560.2s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 591.3s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 622.4s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 1 of 3)
Job launched 653.5s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 684.6s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 715.6s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 746.8s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 778.1s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 809.0s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 840.1s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 871.2s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 902.3s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 2 of 3)
Job launched 933.4s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 964.5s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 995.5s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1026.6s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1057.7s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1088.7s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1119.8s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1150.9s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1181.9s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1212.9s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1244.0s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1275.2s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1306.3s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1337.5s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1368.6s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1399.8s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job launched 1430.8s ago, status RUNNING: Running step (CosineSim.davidadams.20151014.050848.316219: Step 3 of 3)
Job completed.
Running time was 1143.0s (not counting time spent waiting for the EC2 instances)
ec2_key_pair_file not specified, going to S3
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
  File Input Format Counters :
    Bytes Read: 1389241
  File Output Format Counters :
    Bytes Written: 384264823
  FileSystemCounters:
    FILE_BYTES_READ: 999891
    FILE_BYTES_WRITTEN: 3646239
    HDFS_BYTES_READ: 5616
    HDFS_BYTES_WRITTEN: 384264823
    S3_BYTES_READ: 1389241
  Job Counters :
    Launched map tasks: 46
    Launched reduce tasks: 15
    Rack-local map tasks: 46
    SLOTS_MILLIS_MAPS: 1002119
    SLOTS_MILLIS_REDUCES: 1301107
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 864600
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 1179530
    Map input records: 4871
    Map output bytes: 1888554
    Map output materialized bytes: 1246834
    Map output records: 85797
    Physical memory (bytes) snapshot: 8522506240
    Reduce input groups: 4871
    Reduce input records: 85797
    Reduce output records: 14312911
    Reduce shuffle bytes: 1246834
    SPLIT_RAW_BYTES: 5616
    Spilled Records: 171594
    Total committed heap usage (bytes): 5730930688
    Virtual memory (bytes) snapshot: 32433438720
Counters from step 2:
  File Input Format Counters :
    Bytes Read: 384490174
  File Output Format Counters :
    Bytes Written: 240583558
  FileSystemCounters:
    FILE_BYTES_READ: 391738016
    FILE_BYTES_WRITTEN: 569870990
    HDFS_BYTES_READ: 384496768
    HDFS_BYTES_WRITTEN: 240583558
  Job Counters :
    Data-local map tasks: 37
    Launched map tasks: 46
    Launched reduce tasks: 18
    Rack-local map tasks: 9
    SLOTS_MILLIS_MAPS: 1241538
    SLOTS_MILLIS_REDUCES: 1390294
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 1246220
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 384264823
    Map input records: 14312911
    Map output bytes: 384264823
    Map output materialized bytes: 204747790
    Map output records: 14312911
    Physical memory (bytes) snapshot: 10555609088
    Reduce input groups: 6684097
    Reduce input records: 14312911
    Reduce output records: 6684097
    Reduce shuffle bytes: 204747790
    SPLIT_RAW_BYTES: 6594
    Spilled Records: 41774213
    Total committed heap usage (bytes): 7333056512
    Virtual memory (bytes) snapshot: 36501614592
Counters from step 3:
  File Input Format Counters :
    Bytes Read: 240637453
  File Output Format Counters :
    Bytes Written: 303850271
  FileSystemCounters:
    FILE_BYTES_READ: 101359408
    FILE_BYTES_WRITTEN: 204366616
    HDFS_BYTES_READ: 240644518
    S3_BYTES_WRITTEN: 303850271
  Job Counters :
    Data-local map tasks: 42
    Launched map tasks: 52
    Launched reduce tasks: 22
    Rack-local map tasks: 10
    SLOTS_MILLIS_MAPS: 1017166
    SLOTS_MILLIS_REDUCES: 866318
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 527760
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 240583558
    Map input records: 6684097
    Map output bytes: 240583558
    Map output materialized bytes: 101363387
    Map output records: 6684097
    Physical memory (bytes) snapshot: 10905907200
    Reduce input groups: 1
    Reduce input records: 6684097
    Reduce output records: 6684097
    Reduce shuffle bytes: 101363387
    SPLIT_RAW_BYTES: 7065
    Spilled Records: 13368194
    Total committed heap usage (bytes): 7471865856
    Virtual memory (bytes) snapshot: 38698094592
Streaming final output from s3://mrjob-1febc2c04977da79/tmp/CosineSim.davidadams.20151014.050848.316219/output/
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151014.050848.316219
Removing all files in s3://mrjob-1febc2c04977da79/tmp/CosineSim.davidadams.20151014.050848.316219/
Removing all files in s3://mrjob-1febc2c04977da79/tmp/logs/j-25NY72PDDA4I1/
Terminating job flow: j-25NY72PDDA4I1

In [34]:
!head pairsims_small_emr.txt


(['affecting', 'recover'], 0.35684652282655421)	
(['affecting', 'reports'], 0.28057011749009614)	
(['affecting', 'river'], 0.40824829046300043)	
(['affecting', 'round'], 0.29697153251304026)	
(['affecting', 'said'], 0.12090718310550932)	
(['affecting', 'sands'], 0.28867513459438765)	
(['affecting', 'satisfied'], 0.032962495512797924)	
(['affecting', 'sees'], 0.44715747382295384)	
(['affecting', 'sets'], 0.67180079807281434)	
(['affecting', 'shared'], 0.21650635094579074)	

In [ ]:
# run on entire dataset on emr
!python CosineSim.py stripes_all_emr.txt --file '/Users/davidadams/Documents/W261/hw5/hw5_4/cosinesim_normalizations.txt' -r emr --num-ec2-instances 6 --ec2-task-instance-type m1.medium > pairsims_all_emr.txt
# STILL RUNNING AT TIME OF SUBMISSION
# for output see attached pdf 5.4.2.cosine


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-03e94e1f06830625
using s3://mrjob-03e94e1f06830625/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.223205.678385
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/CosineSim.davidadams.20151012.223205.678385/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-03e94e1f06830625/tmp/CosineSim.davidadams.20151012.223205.678385/files/

In [42]:
%%writefile SortSims.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.step import MRStep
import operator
from ast import literal_eval
from mrjob.protocol import RawValueProtocol

class MR_SortSims(MRJob):
    '''
        From Jake's SortingExample
    '''
    
    OUTPUT_PROTOCOL = RawValueProtocol
    
    def jobconf(self):
        # update jobconf for reverse numberical sorting
        orig_jobconf = super(MR_SortSims, self).jobconf()        
        custom_jobconf = {
            'mapred.output.key.comparator.class': 'org.apache.hadoop.mapred.lib.KeyFieldBasedComparator',
            'mapred.text.key.comparator.options': '-k1rn',
        }
        combined_jobconf = orig_jobconf
        combined_jobconf.update(custom_jobconf)
        self.jobconf = combined_jobconf
        return combined_jobconf

    def mapper(self, _, line):
        # emit similarity of pair as key and pair as value 
        #   (b/c mapper output sorted by key)
        line = line.strip()
        pairsim = literal_eval(line)
        #pairsim = literal_eval(line.split('\t')[1])
        pair = pairsim[0]
        sim = pairsim[1]*1000
        yield float(sim), pair
    
    def reducer(self, sim, values):
        # output sorted similarities from mappers
        for pair in values:
            yield None, (pair, sim/1000.0)
    
    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer)
            ]

if __name__ == '__main__':
    MR_SortSims.run()


Overwriting SortSims.py

In [28]:
!python SortSims.py pairsims_small_local.txt > sortedpairims_small_local.txt


using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631/step-0-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631/step-0-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631/step-0-reducer_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631/output
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.045455.653631

In [29]:
!head sortedpairims_small_local.txt


(['He', 'your'], 0.00010001232606782967)
(['Before', 'myself'], 0.00010022142072932997)
(['An', 'Because'], 0.0001003452943403656)
(['I', 'ask'], 0.00010036120771238924)
(['Administrative', 'a'], 0.000100398134956016)
(['I', 'various'], 0.0001003989470529988)
(['History', 'on'], 0.00010041805126285983)
(['At', 'sort'], 0.00010044103883908315)
(['California', 'his'], 0.00010047093942007364)
(['But', 'bent'], 0.00010052034892576802)

In [43]:
!python SortSims.py pairsims_small_emr.txt -r emr --num-ec2-instances 10 --ec2-task-instance-type m1.medium > sortedpairsims_small_emr1.txt


using configs in /Users/davidadams/.mrjob.conf
using existing scratch bucket mrjob-1febc2c04977da79
using s3://mrjob-1febc2c04977da79/tmp/ as our scratch dir on S3
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.062627.439867
writing master bootstrap script to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.062627.439867/b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

Copying non-input files into s3://mrjob-1febc2c04977da79/tmp/SortSims.davidadams.20151014.062627.439867/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-2PJHMBNTIABA7
Created new job flow j-2PJHMBNTIABA7
Job launched 31.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 62.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 93.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 124.5s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 156.3s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 187.9s ago, status STARTING: Configuring cluster software
Job launched 218.9s ago, status STARTING: Configuring cluster software
Job launched 250.4s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 281.5s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 313.1s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 344.2s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 375.3s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 406.3s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 437.5s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 468.6s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 500.0s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 531.1s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 562.6s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 593.7s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 625.1s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 656.2s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job launched 687.6s ago, status RUNNING: Running step (SortSims.davidadams.20151014.062627.439867: Step 1 of 1)
Job completed.
Running time was 376.0s (not counting time spent waiting for the EC2 instances)
ec2_key_pair_file not specified, going to S3
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
  File Input Format Counters :
    Bytes Read: 304025738
  File Output Format Counters :
    Bytes Written: 303850592
  FileSystemCounters:
    FILE_BYTES_READ: 154324265
    FILE_BYTES_WRITTEN: 337180565
    HDFS_BYTES_READ: 5616
    S3_BYTES_READ: 304025738
    S3_BYTES_WRITTEN: 303850592
  Job Counters :
    Launched map tasks: 38
    Launched reduce tasks: 17
    Rack-local map tasks: 38
    SLOTS_MILLIS_MAPS: 3776736
    SLOTS_MILLIS_REDUCES: 1877286
    Total time spent by all maps waiting after reserving slots (ms): 0
    Total time spent by all reduces waiting after reserving slots (ms): 0
  Map-Reduce Framework:
    CPU time spent (ms): 1815990
    Combine input records: 0
    Combine output records: 0
    Map input bytes: 303850271
    Map input records: 6684097
    Map output bytes: 269095167
    Map output materialized bytes: 181476354
    Map output records: 6684097
    Physical memory (bytes) snapshot: 10024701952
    Reduce input groups: 3925146
    Reduce input records: 6684097
    Reduce output records: 6684097
    Reduce shuffle bytes: 181476354
    SPLIT_RAW_BYTES: 5616
    Spilled Records: 13368194
    Total committed heap usage (bytes): 6576730112
    Virtual memory (bytes) snapshot: 32944414720
Streaming final output from s3://mrjob-1febc2c04977da79/tmp/SortSims.davidadams.20151014.062627.439867/output/
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/SortSims.davidadams.20151014.062627.439867
Removing all files in s3://mrjob-1febc2c04977da79/tmp/SortSims.davidadams.20151014.062627.439867/
Removing all files in s3://mrjob-1febc2c04977da79/tmp/logs/j-2PJHMBNTIABA7/
Terminating job flow: j-2PJHMBNTIABA7

In [44]:
!head sortedpairsims_small_emr1.txt


(['Lives', 'Practical'], 1.0000000000056593)	
(['Presbyterian', 'proximal'], 1.0000000000013252)	
(['goals', 'rooms'], 1.0000000000007947)	
(['Valley', 'autumn'], 1.0000000000006399)	
(['Battle', 'aortic'], 1.000000000000393)	
(['optic', 'proximal'], 0.99999999999906453)	
(['Treatment', 'fluctuations'], 0.98569780243333371)	
(['Richard', 'la'], 0.98018693946650393)	
(['rising', 'wording'], 0.97340659133006679)	
(['Lives', 'battle'], 0.95540983478281549)	

Jaccard Similarity


In [58]:
from collections import defaultdict
from ast import literal_eval
def make_jaccard_cooccur():
    #stripesfilename = 'stripes_small_emr.txt'
    stripesfilename = 'stripes_test.txt'
    cooccurcountsfilename = 'cooccur_counts_test.txt'
    cooccurcounts = defaultdict(int)
    f2 = open(cooccurcountsfilename, 'w')
    with open(stripesfilename, 'r') as f1:
        for line in f1.readlines():
            line = line.strip()
            line = line.split('\t')
            word = line[0]
            cooccurrences = literal_eval(line[1])
            cooccurcounts_line = word+'\t'+str(len(cooccurrences.keys()))+'\n'
            f2.write(cooccurcounts_line)


make_jaccard_cooccur()

In [12]:
%%writefile Jaccard.py
from mrjob.conf import combine_dicts
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep
import operator
from itertools import combinations
from collections import defaultdict
from re import match
from ast import literal_eval
from math import sqrt

class MR_Jaccard(MRJob):

    def mapper(self, _, line):
        # for each stripe, emit each cooccurrence as the key
        #    and the input key word (the "document") as the value
        line = line.strip()
        line = line.split('\t')
        doc = line[0]
        posting = literal_eval(line[1])
        for word in posting.keys():
            yield word, doc

    def reducer_cross(self, key, values):
        # find all pairs from the cross of the words that cooccur with the key
        #    and emit that pair and 1
        val = list(values)
        for i in range(len(val)):
            w1=val[i]
            for j in range(i+1,len(val)):
                w2=val[j]
                # emit both orderings of the words in the pair
                if w1<w2:
                    yield (w1, w2), 1
                else:
                    yield (w2, w1), 1

    def reducer_total(self, pair, values):
        # sum the intermediate values for each pair of words and emit to one reducer
        yield None, (pair, sum(values))

    def reducer_init_calcsim(self):
        # load the cooccurrence counts for each word into memory in a map
        #     from the file that was created above
        self.cooccur_counts = dict()
        filename = 'cooccur_counts_small_emr.txt'
        #filename = 'cooccur_counts_all_emr.txt'
        with open(filename, 'r') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split('\t')
                word = line[0]
                count = int(line[1])
                self.cooccur_counts[word]=count
    
    def reducer_calcsim(self, _, pairsum):
        # calculate the Jaccard index for each pair of words
        #    using the aggregated numerator from the previous reducer
        #    and the cooccurrence counts
        for kv in pairsum:
            pair = kv[0]
            intersect = kv[1]
            word1 = pair[0]
            word2 = pair[1]
            union = self.cooccur_counts[word1]+self.cooccur_counts[word2]-intersect
            sim = 1.0*intersect/union
            yield None, (pair, sim)
    
    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer_cross),
            MRStep(reducer=self.reducer_total),
            MRStep(reducer_init=self.reducer_init_calcsim,
                  reducer=self.reducer_calcsim)
            ]

if __name__ == '__main__':
    MR_Jaccard.run()


Overwriting Jaccard.py

In [13]:
# test on toy problem locally
%cd /Users/davidadams/Documents/W261/hw5/hw5_4
!python Jaccard.py /Users/davidadams/Documents/W261/hw5/hw5_4/stripes_test.txt --file cooccur_counts_test.txt > jaccardsims_test.txt


/Users/davidadams/Documents/W261/hw5/hw5_4
using configs in /Users/davidadams/.mrjob.conf
creating tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171/step-0-mapper_part-00000
Counters from step 1:
  (no counters found)
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171/step-0-mapper-sorted
> sort /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171/step-0-mapper_part-00000
writing to /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
Moving /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171/step-0-reducer_part-00000 -> /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171/output/part-00000
Streaming final output from /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171/output
removing tmp directory /var/folders/7t/6bhz6vw52k52g_3jqj57mz6r0000gn/T/Jaccard.davidadams.20151012.235904.084171

In [ ]:

HW5.5 Evaluate synonym detector

In this part of the assignment you will evaluate the success of you synonym detector.
Take the top 1,000 closest/most similar/correlative pairs of words as determined by your measure in (2), and use the synonyms function in the accompanying python code:
nltk_synonyms.py

Note: This will require installing the python nltk package:
http://www.nltk.org/install.html
and downloading its data with nltk.download().

For each (word1,word2) pair, check to see if word1 is in the list, synonyms(word2), and vice-versa. If one of the two is a synonym of the other, then consider this pair a 'hit', and then report the precision, recall, and F1 measure of your detector across your 1,000 best guesses. Report the macro averages of these measures.


In [5]:
'''
    HW 5.5
    Take the top 1,000 closest/most similar/correlative pairs of words 
    as determined by your measure in 5.4(2), and use the synonyms function 
    in nltk_synonyms.py

    For each (word1,word2) pair, check to see if word1 is in the list, 
    synonyms(word2), and vice-versa. If one of the two is a synonym 
    of the other, then consider this pair a 'hit', and then report 
    the precision, recall, and F1 measure  of your detector across 
    your 1,000 best guesses. Report the macro averages of these measures.

'''

# make directory for problem and change to that dir
!mkdir ~/Documents/W261/hw5/hw5_5/
%cd ~/Documents/W261/hw5/hw5_5/


mkdir: /Users/davidadams/Documents/W261/hw4/hw4_3/: File exists
/Users/davidadams/Documents/W261/hw4/hw4_3

In [45]:
from ast import literal_eval
from collections import defaultdict

def preprocess_synonyms():
    '''
        preprocess the similarity file to take the top 1,000 pairs
        and output all of the identified synonyms for each word
    '''
    synlists = defaultdict(list)
    #synfile = 'pairsims_small_localemr3.txt'
    synfile = 'sortedpairsims_small_emr1.txt'
    preprocessfile = 'synonymlists_small_emr.txt'
    linecount = 0
    with open(synfile, 'r') as f1:
        for line in f1:
            linecount+=1
            if linecount>=1000:
                break
            pairsim = literal_eval(line)
            word1, word2 = pairsim[0]
            synlists[word1].append(word2)
            synlists[word2].append(word1)
    
    with open(preprocessfile, 'w') as f2:
        for word in synlists.keys():
            line = word+'\t'+str(synlists[word])+'\n'
            f2.write(line)

            
preprocess_synonyms()

In [46]:
!head synonymlists_small_emr.txt


essay	['Spaniards']
represent	['behave']
managed	['work']
Indians	['battle', 'Session', 'Act', 'rights', 'Annals', 'Fleet', 'depths', 'description', 'chief']
Poetry	['apostle', 'surface', 'strength', 'Annual', 'Changing', 'Deputy', 'chamber', 'rooms', 'Garden', 'latter', 'lack', 'most', 'points', 'summer', 'war', 'recognition']
Annals	['Academy', 'Lives', 'vitro', 'Treatment', 'Geography', 'Indians', 'Growth', 'chief', 'A', 'Communist', 'pulse', 'American']
Republic	['aortic', 'Diseases', 'principle', 'and', 'An', 'King', 'French', 'division', 'increased']
controversial	['Given']
Department	['activities', 'personnel', 'Duke', 'Side', 'View', 'tomb', 'features', 'Appeal', 'Men']
Communist	['Annals']

In [54]:
from nltk.corpus import wordnet as wn
from ast import literal_eval

def calc_precisionrecall():
    synfile = 'synonymlists_small_emr.txt'
    precision_list = list()
    recall_list = list()
    with open(synfile, 'r') as f:
        for line in f:
            line = line.strip()
            word,synonyms = line.split('\t')
            simsynlist = literal_eval(synonyms)
            simsynlist = [w.lower() for w in simsynlist]
            simsyn = set(simsynlist)
            wnsynlist = list()
            for w in wn.synsets(word):
                for l in w.lemmas():
                    wnsynlist.append(l.name())
            wnsynlist = [w.lower() for w in wnsynlist]
            wnsyn = set(wnsynlist)

            num_true_pos = len(wnsyn.intersection(simsyn))
            num_labeled_pos = len(simsyn)
            num_all_pos = len(wnsyn)
            
            if num_all_pos>0 and num_labeled_pos>0:
                precision = 1.0*num_true_pos/num_labeled_pos
                recall = 1.0*num_true_pos/num_all_pos
            else:
                continue
            
            precision_list.append(precision)
            recall_list.append(recall)
    
    overall_precision = 1.0*sum(precision_list)/len(precision_list)
    overall_recall = 1.0*sum(recall_list)/len(recall_list)
    
    print "Overall Precision: ", overall_precision
    print "Overall Recall: ", overall_recall

            
calc_precisionrecall()


Overall Precision:  0.0036433626254
Overall Recall:  0.00103542914172