Now You Code 4: Movie Goers Zipcode Lookup

A movie company has hired you to help enhance their data set. They would like to know which US state each respondent in their movie goers survey comes from, and they ask you to produce a list of states with a count of movie goers from each state.

The movie goers dataset 'NYC1-moviegoers.csv' from NYC1 contains a 'zip_code' column but no city or state columns.

We will load a second dataset into pandas, the Zipcode Database, found here: 'https://raw.githubusercontent.com/mafudge/datasets/master/zipcodes/free-zipcode-database-Primary.csv'. This dataset contains zip codes with their primary city, state, and approximate location.

Your goal is to figure out how to use the DataFrame.merge() method to combine these two data sets on matching zip code values. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

After you merge the datasets, you can complete the task and produce a count of movie goers by state.


In [35]:
# import pandas
import pandas as pd

# this turns off warning messages
import warnings
warnings.filterwarnings('ignore')

Part 1: Load the movie goers dataset into a Pandas DataFrame

Write code to load the movie goers dataset (in csv format) into the variable moviegoers and then print the first few rows.
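As a hedged illustration of the read_csv() and head() calls this part relies on, the sketch below reads a tiny in-memory CSV instead of the real file. The user_id and age columns and all values are hypothetical; only the zip_code column name comes from the assignment. The made-up value 'A1B2C' shows that a single non-numeric entry makes pandas keep every value in that column as a string.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'NYC1-moviegoers.csv'; only the zip_code
# column name comes from the assignment, the rest is made up.
sample_csv = io.StringIO(
    "user_id,age,zip_code\n"
    "1,24,85711\n"
    "2,53,94043\n"
    "3,23,A1B2C\n"
)

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
moviegoers = pd.read_csv(sample_csv)

# head() returns the first 5 rows by default
print(moviegoers.head())
```

With the real file, you would pass its path to read_csv() in place of sample_csv.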


In [ ]:

Part 2: Load the zip code database into a Pandas DataFrame

Write code to load the zip code database (in csv format) into the variable zipcodes and then print the first few rows.

The database (in csv format) can be found here: 'https://raw.githubusercontent.com/mafudge/datasets/master/zipcodes/free-zipcode-database-Primary.csv'

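HINT: You must pass the keyword argument dtype={'Zipcode': object} to the read_csv() method. This forces the Zipcode column to be read as strings, the same type as the zip code column in the moviegoers DataFrame, and it also preserves leading zeros that would otherwise be lost if the column were parsed as integers.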
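To see why the dtype hint matters, here is a minimal sketch using a two-row, in-memory CSV. Only the Zipcode column name comes from the hint; the City and State columns and their values are illustrative assumptions about the database's layout.

```python
import io
import pandas as pd

# Hypothetical two-row extract shaped like the zip code database
raw = "Zipcode,City,State\n02138,CAMBRIDGE,MA\n10001,NEW YORK,NY\n"

# Without dtype, pandas infers an integer column and the leading zero is lost
as_int = pd.read_csv(io.StringIO(raw))
print(as_int["Zipcode"].tolist())   # [2138, 10001]

# With dtype={'Zipcode': object}, the values stay as the original strings
as_str = pd.read_csv(io.StringIO(raw), dtype={"Zipcode": object})
print(as_str["Zipcode"].tolist())   # ['02138', '10001']
```

An integer key such as 2138 would never match a string key such as '02138', so reading both zip code columns as strings is what lets the merge in Part 3 find matches.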


In [ ]:

Part 3: Merge both data sets into a single combined DataFrame

Next we must merge the moviegoers DataFrame with the zipcodes DataFrame. To do this, you must specify which zip code column from moviegoers matches the zip code column from zipcodes (as you can see, they have different names).

Help on method merge in module pandas.core.frame:

merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False) method of pandas.core.frame.DataFrame instance
    Merge DataFrame objects by performing a database-style join operation by
    columns or indexes.

We will perform an inner merge, because we only want rows where the zip code appears in both DataFrames. In set terms, the inner merge keeps the intersection of the two sets of key values.
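A small sketch can make the "intersection" idea concrete. The two frames below are hypothetical toys that share a key column name, so on= is used here; comparing how="inner" with how="outer" shows which key values survive each kind of merge.

```python
import pandas as pd

# Hypothetical toy frames: only keys 'A' and 'B' appear in both
left = pd.DataFrame({"key": ["A", "B", "C"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["A", "B", "D"], "y": [10, 20, 40]})

inner = left.merge(right, on="key", how="inner")
outer = left.merge(right, on="key", how="outer")

print(sorted(inner["key"]))   # ['A', 'B']            intersection of keys
print(sorted(outer["key"]))   # ['A', 'B', 'C', 'D']  union of keys
```

One caveat: the intersection is over key values, not rows. If a key value appears more than once on either side (for example, several respondents sharing a zip code), the inner merge produces one row per matching pair.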

To complete the merge, we must specify the key column names from the left and right DataFrames. Most of the code has been written for you; your task is to fill in the merge columns by replacing ???? with the appropriate column names.
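When the key columns have different names, merge() takes left_on= and right_on= instead of on=. The sketch below uses hypothetical miniature versions of both DataFrames; the zip_code and Zipcode names mirror the assignment, while the user_id and State columns and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature DataFrames whose key columns are named differently
moviegoers = pd.DataFrame({"user_id": [1, 2, 3, 4],
                           "zip_code": ["10001", "94043", "10001", "99999"]})
zipcodes = pd.DataFrame({"Zipcode": ["10001", "94043"],
                         "State": ["NY", "CA"]})

# Inner merge: match moviegoers.zip_code against zipcodes.Zipcode
combined = moviegoers.merge(zipcodes, how="inner",
                            left_on="zip_code", right_on="Zipcode")
print(combined)
```

User 4's zip code has no match in the toy zipcodes frame, so that row is dropped. With left_on/right_on, both key columns appear in the result; you can remove the redundant one with combined.drop(columns="Zipcode") if you prefer.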


In [ ]:

Part 4: Count movie goers by state

Finally, produce the desired output: a list of states and the number of movie goers from the survey in each state.

Here are the top 5 states for reference:

CA    116
MN     78
NY     60
TX     51
IL     50
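One way to produce a state-by-count listing like the one above is Series.value_counts(), which tallies each distinct value and sorts by count in descending order. This is a minimal sketch on a hypothetical merged DataFrame; the State column name is an assumption about the zip code database.

```python
import pandas as pd

# Hypothetical merged result standing in for the DataFrame built in Part 3
combined = pd.DataFrame({"user_id": [1, 2, 3, 4, 5, 6],
                         "State": ["CA", "NY", "CA", "TX", "CA", "NY"]})

# Count rows per state, largest count first
state_counts = combined["State"].value_counts()
print(state_counts.head(5))
```

An equivalent alternative is combined.groupby("State").size().sort_values(ascending=False); value_counts() is simply the shorter spelling when you only need counts of a single column.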

Step 5: Questions

  1. Pandas programs are different from typical Python programs. Explain the process you followed to arrive at the solution.

Answer:

  2. What was the most difficult aspect of this assignment?

Answer:

Step 6: Reflection

Reflect upon your experience completing this assignment. This should be a personal narrative, in your own voice, and should cite specifics relevant to the activity so as to help the grader understand how you arrived at the code you submitted. Things to consider touching upon: Elaborate on the process itself. Did your original problem analysis work as designed? How many iterations did you go through before you arrived at the solution? Where did you struggle along the way, and how did you overcome it? What did you learn from completing the assignment? What do you need to work on to get better? What was most valuable and least valuable about this exercise? Do you have any suggestions for improvements?

To make a good reflection, you should journal your thoughts, questions and comments while you complete the exercise.

Keep your response to between 100 and 250 words.

--== Write Your Reflection Below Here ==--