In [3]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import mrjobs as mr

Homework 4

Copy this notebook. Rename it as: YOURNAME-HW4-mapreduce-XX

with your name replacing YOURNAME and XX replaced with the date you submit or copy this HW.

Upload your completed Jupyter notebook to the elearning site as your homework submission. Do not put this notebook on your GitHub.

Do all the homework problems below. As noted, doing the homework earns a 3 out of 5. To earn more, create tutorials on how to use MapReduce for different analyses that were not assigned in this HW.

Use the data/bible+shakes.nonpunc.txt file as the source of your analysis in this homework.

Homework 4.1

A bigram is a combination of two consecutive words. Find the 10 most common bigrams in the text. Word order counts in the bigram: for example, "in the" is not the same bigram as "the in".
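To make the definition concrete, here is a minimal sketch of ordered bigram counting on two made-up lines. It uses plain Python functions as stand-ins for the map and reduce steps rather than any MapReduce library; the names `map_bigrams`, `reduce_counts`, and `toy_lines` are hypothetical, and the sketch assumes bigrams do not span line boundaries.

```python
from collections import Counter

def map_bigrams(line):
    """Map step (sketch): emit ((w1, w2), 1) for each adjacent word pair in a line."""
    words = line.lower().split()
    for first, second in zip(words, words[1:]):
        yield (first, second), 1

def reduce_counts(pairs):
    """Reduce step (sketch): sum the emitted counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

# Made-up sample lines, not taken from the homework data file.
toy_lines = [
    "in the beginning the word was in the world",
    "the in crowd",
]

pairs = (pair for line in toy_lines for pair in map_bigrams(line))
ordered_counts = reduce_counts(pairs)

# ("in", "the") and ("the", "in") are kept as separate keys.
print(ordered_counts.most_common(3))
```

On the real data, the same idea would be applied to every line of the file, and `most_common(10)` would give the ten most frequent bigrams.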


In [ ]:

Homework 4.2

Now do the same analysis, but make the word order not count, so that "in the" == "the in". Find the 10 most common unordered bigrams in the text.
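One possible way to ignore word order is to sort the two words before using them as a key, so that both orderings collapse to the same tuple. The sketch below shows this on made-up lines; `unordered_bigrams` and `toy_lines` are hypothetical names, and plain Python stands in for the MapReduce framework.

```python
from collections import Counter

def unordered_bigrams(line):
    """Yield each adjacent word pair as a sorted tuple so word order is ignored."""
    words = line.lower().split()
    for first, second in zip(words, words[1:]):
        yield tuple(sorted((first, second)))

# Made-up sample lines, not taken from the homework data file.
toy_lines = [
    "in the beginning the word was in the world",
    "the in crowd",
]

unordered_counts = Counter(
    key for line in toy_lines for key in unordered_bigrams(line)
)

# "in the" and "the in" now share the key ("in", "the").
print(unordered_counts.most_common(3))
```

Sorting is only one choice of canonical form; a `frozenset` would also work for pairs of distinct words, but it would turn a repeated pair such as "the the" into a single-element key.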


In [ ]:

Homework 4.3

A trigram is a combination of three consecutive words. Find the 10 most common unordered trigrams in the text. Make it so that the order of the words does not count in the trigram: for example, "in the air" is the same trigram as "the air in" or "air in the".
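The same key-normalization idea extends to three words. The sketch below slides a window of three consecutive words over made-up lines and sorts each window into a canonical tuple; `unordered_trigrams` and `toy_lines` are hypothetical names used only for illustration.

```python
from collections import Counter

def unordered_trigrams(line):
    """Yield each run of three consecutive words as a sorted tuple."""
    words = line.lower().split()
    for triple in zip(words, words[1:], words[2:]):
        yield tuple(sorted(triple))

# Made-up sample lines, not taken from the homework data file.
toy_lines = ["in the air", "the air in", "air in the sky"]

trigram_counts = Counter(
    key for line in toy_lines for key in unordered_trigrams(line)
)

# All three orderings of "in", "the", "air" share the key ("air", "in", "the").
print(trigram_counts.most_common(2))
```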


In [ ]:

Homework 4.4

Create graphs to explain the relationship of the frequency of monograms (words) to bigram and trigram frequencies.
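One way to compare the three distributions is to look at frequency against rank for each n-gram size. The sketch below only prepares the numbers on a made-up word list; `ngram_counts`, `rank_frequency`, and `toy_words` are hypothetical names, and a rank-frequency view is just one of several reasonable choices for this comparison.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count ordered n-grams (n consecutive words) in a list of words."""
    return Counter(zip(*(words[i:] for i in range(n))))

def rank_frequency(counts):
    """Return frequencies sorted from most to least common; index + 1 is the rank."""
    return sorted(counts.values(), reverse=True)

# Made-up sample text, not taken from the homework data file.
toy_words = "the cat and the dog and the cat and the bird".split()

# One frequency series per n-gram size: 1 = words, 2 = bigrams, 3 = trigrams.
series = {n: rank_frequency(ngram_counts(toy_words, n)) for n in (1, 2, 3)}

for n, freqs in series.items():
    print(n, freqs)
```

Each list in `series` could then be drawn as its own line, for example with `plt.loglog(range(1, len(freqs) + 1), freqs)`, to see how quickly frequencies fall off as n grows.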


In [ ]:


In [ ]: