A movie company has hired you to help them enhance their data set. They would like to know which US state each of the respondents in their movie goers survey comes from, and have asked you to produce a list of states with a count of movie goers from each state.
The movie goers dataset 'NYC1-moviegoers.csv' from NYC1 contains 'zip_code' but not city and state.
We will load another pandas dataset, the Zipcode Database here:
'https://raw.githubusercontent.com/mafudge/datasets/master/zipcodes/free-zipcode-database-Primary.csv'
This data set contains Zip codes with primary city, state and approximate location.
Your goal is to figure out how to use the DataFrame.merge() method to combine these two data sets on matching zip code values.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
After you merge the two datasets, you can complete the task and provide a count of movie goers by state.
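As a rough illustration of that last counting step, here is a minimal sketch on a tiny hand-made DataFrame. The rows and the 'State' column name are assumptions made up for this example; check the actual column names after you complete your own merge.

```python
import pandas as pd

# Hypothetical stand-in for a merged DataFrame (made-up rows, assumed 'State' column)
merged = pd.DataFrame({
    'zip_code': ['13244', '13210', '07030', '06510', '10001'],
    'State': ['NY', 'NY', 'NJ', 'CT', 'NY'],
})

# value_counts() tallies how many rows carry each distinct state value
state_counts = merged['State'].value_counts()
print(state_counts)
```

Calling groupby('State').size() on the same frame would be another reasonable way to get the same tallies.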
In [35]:
# import pandas
import pandas as pd
# this turns off warning messages
import warnings
warnings.filterwarnings('ignore')
In [ ]:
Write code to load the zip code database (in csv format) into the variable zipcodes and then print the first few rows.
The database (in csv format) can be found here: 'https://raw.githubusercontent.com/mafudge/datasets/master/zipcodes/free-zipcode-database-Primary.csv'
HINT: You must pass the named argument dtype={'Zipcode': object} to the read_csv() method to force the Zipcode series to be the same type as the zip code series in the moviegoers dataframe.
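To see why the hint matters, the sketch below reads the same small CSV text twice, once with default type inference and once with the dtype argument. The CSV text is made up for illustration, and the City and State column names are assumptions; only 'Zipcode' is taken from the hint above.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the real file
csv_text = "Zipcode,City,State\n00501,HOLTSVILLE,NY\n13244,SYRACUSE,NY\n"

# Default inference reads the column as integers, so leading zeros are lost
as_numbers = pd.read_csv(io.StringIO(csv_text))

# dtype={'Zipcode': object} keeps each zip code as text, exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'Zipcode': object})

print(as_numbers['Zipcode'].tolist())
print(as_text['Zipcode'].tolist())
```

Text values such as '00501' can only match other text values during a merge, which is why both zip code columns need to share one type.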
In [ ]:
Next we must merge the moviegoers DataFrame with the zipcodes DataFrame. To do this you must specify which zip code column from moviegoers matches the zip code column from zipcodes (as you can see, they have different names).
Help on method merge in module pandas.core.frame:
merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False) method of pandas.core.frame.DataFrame instance
Merge DataFrame objects by performing a database-style join operation by
columns or indexes.
The type of merge we will do is an inner merge, because we only want rows where the zip codes match. This is called an intersection.
To complete a merge we must specify the column names from the left and right DataFrames. Most of the code has been written for you. Your task is to complete the columns for the merge, replacing ???? with the appropriate column names.
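For reference, here is a minimal sketch of an inner merge on two toy DataFrames whose key columns have different names. The frames, their data, and the column names 'customer_zip' and 'zip' are all hypothetical and chosen so that they do not give away the answer to the exercise.

```python
import pandas as pd

# Hypothetical left and right DataFrames with differently named key columns
orders = pd.DataFrame({
    'customer_zip': ['13244', '10001', '99999'],
    'amount': [10, 20, 30],
})
places = pd.DataFrame({
    'zip': ['13244', '10001', '90210'],
    'state': ['NY', 'NY', 'CA'],
})

# left_on names the key in the calling (left) frame, right_on the key in the
# frame passed as the argument; how='inner' keeps only keys present in both
combined = orders.merge(places, how='inner', left_on='customer_zip', right_on='zip')
print(combined)
```

The row with '99999' has no match in places and the row with '90210' has no match in orders, so neither appears in the result.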
In [ ]:
Reflect upon your experience completing this assignment. This should be a personal narrative, in your own voice, that cites specifics relevant to the activity to help the grader understand how you arrived at the code you submitted. Things to consider touching upon: Elaborate on the process itself. Did your original problem analysis work as designed? How many iterations did you go through before you arrived at the solution? Where did you struggle along the way and how did you overcome it? What did you learn from completing the assignment? What do you need to work on to get better? What was most valuable and least valuable about this exercise? Do you have any suggestions for improvements?
To make a good reflection, you should journal your thoughts, questions and comments while you complete the exercise.
Keep your response to between 100 and 250 words.
--== Write Your Reflection Below Here ==--