In [1]:
import pandas as pd

Income dataset

https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t

The US Census generates a 'Median Household Income in the Past 12 Months (In 2014 Inflation-Adjusted Dollars)' from data collected by the American Community Survey. This report is based on 5-year estimates, from 2010-2014. The report I pulled is the most recent income-related information available by the US Census.

  • I'm currently reading the data collection methodology for this project, so will update the group about how the data was collected when done.

In [74]:
income = pd.read_excel("../data/unique/ACS_14_5YR_B19013.xls")

In [75]:
income = income.loc[8:]
income.head()


Out[75]:
B19013: MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Universe: Households Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
8 ZCTA5 06390 NaN Estimate NaN 151,333
9 ZCTA5 10001 NaN Estimate NaN 81,445
10 ZCTA5 10001 NaN Margin of Error NaN +/-8,743
11 ZCTA5 10002 NaN Estimate NaN 32,966
12 ZCTA5 10002 NaN Margin of Error NaN +/-2,301

In [76]:
income = income.drop(['Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3'], axis=1)

In [77]:
income = income.rename(columns={'B19013: MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Universe: Households': 'Zip_Code', 'Unnamed: 4': 'Median_Househould_Income', '$b': 'b'})

In [78]:
zips = []
for elem in income['Zip_Code']:
    zips.append(str(elem))

In [79]:
zips2 = []
for elem in zips:
    zips2.append(elem[6:])

In [80]:
income['Zip_Code'] = zips2
income['Zip_Code'] = pd.to_numeric(income['Zip_Code'])

In [81]:
income.head()


Out[81]:
Zip_Code Median_Househould_Income
8 6390.0 151,333
9 10001.0 81,445
10 10001.0 +/-8,743
11 10002.0 32,966
12 10002.0 +/-2,301

In [82]:
income["Zip"] = income["Zip_Code"].dropna().astype('int')
income.drop('Zip_Code', axis=1, inplace=True)

Merged income, zipcode, and station id for final dataframe


In [83]:
stations = pd.read_csv('../data/processed/stations.csv')

In [84]:
# left join of stations and income_zips
stations_income = stations.merge(income, how='inner', on='Zip')

In [85]:
stations_income.head()


Out[85]:
Station_id Station_Name Location Latitude Longitude Zip Median_Househould_Income
0 72 W 52 St & 11 Ave W 52 St & 11 Ave 40.767272 -73.993929 10019 90,174
1 72 W 52 St & 11 Ave W 52 St & 11 Ave 40.767272 -73.993929 10019 +/-8,895
2 173 Broadway & W 49 St Broadway & W 49 St 40.760683 -73.984527 10019 90,174
3 173 Broadway & W 49 St Broadway & W 49 St 40.760683 -73.984527 10019 +/-8,895
4 281 Grand Army Plaza & Central Park S Grand Army Plaza & Central Park S 40.764397 -73.973715 10019 90,174

In [86]:
print(type(stations_income))


<class 'pandas.core.frame.DataFrame'>

In [87]:
print(len(stations_income))


1198

In [89]:
stations_income.to_csv("../data/processed/stations-income.csv")