Annie Jankowsi and Jamil Uddin
Detroit has had a lot of media hype in the last 10 years about claims of gentrification. We thought it would be interesting to compare American Community Survey data from three well known neighborhoods in Brooklyn, New York that have been well known for gentrification in the past decade: Williamsburg, Bed Stuy and Bushwick. American Community Survey data was taken in the 2013 census.
In [2]:
import sys # system module
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics module
import datetime as dt # date and time module
import numpy as np # foundation for Pandas
import requests as r
from io import StringIO
import requests, io # internet and input tools
import zipfile as zf # zip file tools
import seaborn as sns
import sys
%matplotlib inline
# check versions
print('Python version: ', sys.version)
print('Pandas version: ', pd.__version__)
print('Today: ', dt.date.today())
In [3]:
# get "response" from url
url = 'http://www2.census.gov/acs2013_1yr/pums/csv_hus.zip'
r = requests.get(url)
# describe response
print('Response status code:', r.status_code)
print('Response type:', type(r))
print('Response .content:', type(r.content))
print('Response headers:\n', r.headers, sep='')
In [4]:
# get "response" from url
url1 = 'http://www2.census.gov/acs2013_1yr/pums/csv_pus.zip'
r1 = requests.get(url1)
# describe response
print('Response status code:', r1.status_code)
print('Response type:', type(r1))
print('Response .content:', type(r1.content))
print('Response headers:\n', r1.headers, sep='')
In [5]:
# convert bytes to zip file for housing data
mlz = zf.ZipFile(io.BytesIO(r.content))
print('Type of zipfile object:', type(mlz))
In [6]:
# convert bytes to zip file for population data
mlz1 = zf.ZipFile(io.BytesIO(r1.content))
print('Type of zipfile object:', type(mlz1))
In [7]:
# what's in the zip file?
mlz.namelist()
Out[7]:
In [8]:
# what's in the zip file?
mlz1.namelist()
Out[8]:
In [9]:
# extract and read csv's. These two files contain the housing data set:
housing1 = pd.read_csv(mlz.open(mlz.namelist()[0]))
housing2 = pd.read_csv(mlz.open(mlz.namelist()[1]))
In [10]:
# extract and read csv's. These two files contain the population data set:
population1 = pd.read_csv(mlz1.open(mlz1.namelist()[0]))
population2 = pd.read_csv(mlz1.open(mlz1.namelist()[1]))
Data dictionary can be found here: http://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
PUMA info: http://www2.census.gov/geo/pdfs/reference/puma/2010_PUMA_Names.pdf
In [11]:
#Get select columns
smallhousing = housing1[["SERIALNO","ST", "PUMA","NP", "BATH", "BDSP", "BLD", "CONP", "FS", "MRGP", "RMSP", "RNTP", "VACS",
"VALP", "HINCP", "MV", "TAXP", "YBL"]]
smallhousing2 = housing2[["SERIALNO","ST", "PUMA","NP", "BATH", "BDSP", "BLD", "CONP", "FS", "MRGP", "RMSP", "RNTP", "VACS",
"VALP", "HINCP", "MV", "TAXP", "YBL"]]
#Combine both dataframes into a single housing Dataframe:
housingdfs = [smallhousing, smallhousing2]
housingComplete = pd.concat(housingdfs)
#rename columns so we understand what they are:
housingComplete = housingComplete.rename(columns={'ST':'State', 'NP':'Number of Persons','BATH':'Bathrooms','BDSP':'Bedrooms',
'BLD':'Units','CONP':'Condo_Fee','FS':'Food_Stamps?','MRGP':'Mortgage_Pmt',
'RMSP':'Number_Rooms','RNTP':'Monthly_Rent', 'VACS':'Vacancy_Status',
'VALP':'Property_Value','HINCP':'Household_Income', 'MV':'When_Moved',
'TAXP':'Property_Tax', 'YBL':'Year_Built'})
#Get select columns
smallpopulation1 = population1[["SERIALNO","PUMA","ST","DECADE","AGEP","SEX","SCHL"]]
smallpopulation2 = population2[["SERIALNO","PUMA","ST","DECADE","AGEP","SEX","SCHL"]]
#Combine both dataframes into a single population Dataframe:
populationdfs = [smallpopulation1, smallpopulation2]
populationComplete = pd.concat(populationdfs)
populationComplete = populationComplete.rename(columns={'ST':'State','AGEP':'Age'})
In [12]:
#Boolean Logic to refine data for Brooklyn
statecodesNYC = [36] #state code for NY
pumacodesNYC = [4003,4001,4002] #codes for Brooklyn (Bushwick, Bed Stuy, Williamsburg)
smallhousingNYC = housingComplete[(housingComplete['State'].isin(statecodesNYC) & housingComplete['PUMA'].isin(pumacodesNYC))]
smallpopulationNYC = populationComplete[(populationComplete['State'].isin(statecodesNYC)
& populationComplete['PUMA'].isin(pumacodesNYC))]
smallpopulationNYC.shape
Out[12]:
In [13]:
#Boolean Logic to refine data for Detroit
statecodesMI = [26] #state code for MI
pumacodesDetroit = [3209,3210,3211,3212] #codes for Detroit
smallhousingDetroit = housingComplete[(housingComplete['State'].isin(statecodesMI) & housingComplete['PUMA'].isin(pumacodesDetroit))]
smallpopulationDetroit = populationComplete[(populationComplete['State'].isin(statecodesMI)
& populationComplete['PUMA'].isin(pumacodesDetroit))]
smallpopulationDetroit.shape
Out[13]:
In [14]:
#Boolean Logic to refine data for Detroit and Brooklyn
statecodes = [26,36] #state code for Michigan and Brooklyn
pumacodes = [3209,3210,3211,3212,4003,4001,4002] #codes for detroit and Bushwick,Bed Stuy, and Williamsburg
smallhousingAll = housingComplete[(housingComplete['State'].isin(statecodes) & housingComplete['PUMA'].isin(pumacodes))]
smallpopulationAll = populationComplete[(populationComplete['State'].isin(statecodes) & populationComplete['PUMA'].isin(pumacodes))]
smallhousingAll.shape
Out[14]:
In [15]:
smallhousingAll.head()
Out[15]:
In [16]:
smallpopulationAll.head()
Out[16]:
In [17]:
#Merge housing and population files to do analysis on specific variables that are in different dataframes
combo = pd.merge(smallhousingAll, smallpopulationAll, # left and right df's
how='left', # add to left
on='SERIALNO' # link with this variable/column
)
combo1 = combo[["Household_Income","DECADE"]].dropna()
combo2 = combo[["State_x","Household_Income","Age","SEX","SCHL"]].dropna()
In [22]:
comboNYC = pd.merge(smallhousingNYC, smallpopulationNYC, # left and right df's
how='left', # add to left
on='SERIALNO' # link with this variable/column
)
comboNYC1 = comboNYC[["Household_Income","DECADE"]].dropna()
comboNYC2 = comboNYC[["State_x","Household_Income","Age","SEX","SCHL"]].dropna()
In [29]:
comboDetroit = pd.merge(smallhousingDetroit, smallpopulationDetroit, # left and right df's
how='left', # add to left
on='SERIALNO' # link with this variable/column
)
comboDetroit1 = comboDetroit[["Household_Income","DECADE"]].dropna()
comboDetroit2 = comboDetroit[["State_x","Household_Income","Age","SEX","SCHL"]].dropna()
These two scatter plots below compare Detroit and Brooklyn residents income by age. Income in Brooklyn is clearly much higher than in Detroit. These three neighborhoods in Brooklyn clearly have more residents as well, indicating Detroit is less densly populated.
In [19]:
g4 = sns.FacetGrid(combo2, col="State_x")
g4.map(sns.regplot, "Age", "Household_Income")
plt.xlim(0, 100)
plt.ylim(0, 1600000)
Out[19]:
The below joint plots indicate a younger population base in Brooklyn vs. Detroit, with a higher income range as well.
In [31]:
#combo2.plot.scatter("Age","Household_Income")
#Joint Plot showing Age vs. Household Income in NYC
g1 = sns.jointplot("Age","Household_Income", data=comboNYC2, kind="reg",
xlim=(0, 100), ylim=(0, 1600000), color="r", size=7)
In [32]:
#combo2.plot.scatter("Age","Household_Income")
#Joint Plot showing Age vs. Household Income in Detroit
g2 = sns.jointplot("Age","Household_Income", data=comboDetroit2, kind="reg",
xlim=(0, 100), ylim=(0, 1600000), color="r", size=7)
The below plot indicates the residents that moved to Detroit by decade and their respective incomes. A 1 indicates the residents moved in before 1950 and the numbers increase to 7 which is later than 2000.
In [33]:
combo1.plot.scatter("Household_Income","DECADE")
Out[33]:
The below plot indicates the residents that moved to Williamsburg, Bed Stuy, and Bushwick by decade and their respective incomes.
In [54]:
comboNYC.plot.scatter("Household_Income","DECADE")
Out[54]:
The below histogram demonstrates the number of people in Detroit that are paying different levels of mortgage payments.
In [35]:
fig, ax = plt.subplots()
sns.kdeplot(smallhousingDetroit["Mortgage_Pmt"], ax=ax)
ax.hist(smallhousingDetroit["Mortgage_Pmt"].dropna(), bins=25, alpha=0.25, normed=True, label="Mortgage Payment")
ax.legend()
fig.suptitle("Histogram of Mortgage Payments DETROIT")
plt.show()
The below histogram demonstrates the number of people in the three Brooklyn neighborhoods that are paying different levels of mortgage payments.
In [36]:
fig, ax = plt.subplots()
sns.kdeplot(smallhousingNYC["Mortgage_Pmt"], ax=ax)
ax.hist(smallhousingNYC["Mortgage_Pmt"].dropna(), bins=25, alpha=0.25, normed=True, label="Mortgage Payment")
ax.legend()
fig.suptitle("Histogram of Mortgage Payments BROOKLYN")
plt.show()
The histograms below reflect the number of people at varying income levels for Detroit ont he left and Brooklyn on the right. The concentration of lower income residents in Detroit is much higher.
In [38]:
g2 = sns.FacetGrid(smallhousingAll, col="State")
g2.map(sns.distplot, "Household_Income")
plt.xlim(0, 250000)
#plt.ylim(0, .00003)
Out[38]:
The below plot compares the number of people who own homes with varying numbers of bedrooms at different mortgage payment levels on food stamps (1) and not on food stamps(2) in Detroit.
In [39]:
sns.set(style="whitegrid")
# Draw a nested barplot to show mortgage payments by number of bedrooms, for people who have or don't have foodstamps
g = sns.factorplot(x="Bedrooms", y="Mortgage_Pmt", hue="Food_Stamps?", data=smallhousingDetroit,
size=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_ylabels("Mortgage Payment")
Out[39]:
The below plot compares the number of people who own homes with varying numbers of bedrooms at different mortgage payment levels on food stamps (1) and not on food stamps (2) in WIlliamsburg, Bushwick and Bed Stuy. There are a higher level of residents in Detroit than Brooklyn on foodstamps
In [40]:
sns.set(style="whitegrid")
# Draw a nested barplot to show mortgage payments by number of bedrooms, for people who have or don't have foodstamps
g = sns.factorplot(x="Bedrooms", y="Mortgage_Pmt", hue="Food_Stamps?", data=smallhousingNYC,
size=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_ylabels("Mortgage Payment")
Out[40]:
The below bar chart demonstrates how rent levels vary by the number of bedrooms for a Detroit housing unit.
In [41]:
averageRent = smallhousingAll[["Monthly_Rent","Bedrooms"]]
fig1 = pd.pivot_table(averageRent,index=["Bedrooms"])
fig1.plot(kind = "bar")
Out[41]:
The below bar chart demonstrates how rent levels vary by the number of bedrooms for a Brooklyn housing unit.
In [42]:
averageRent = smallhousingNYC[["Monthly_Rent","Bedrooms"]]
fig1 = pd.pivot_table(averageRent,index=["Bedrooms"])
fig1.plot(kind = "bar")
Out[42]:
The below histogram indicates how many buildings were built in these time periods in Brooklyn. The yearly data after 2005 indicates more buildings are being built per year in the last decade.
01 .1939 or earlier
02 .1940 to 1949
03 .1950 to 1959
04 .1960 to 1969
05 .1970 to 1979
06 .1980 to 1989
07 .1990 to 1999
08 .2000 to 2004
09 .2005
10 .2006
11 .2007
12 .2008
13 .2009
14 .2010
15 .2011
16 .2012
17 .2013
In [43]:
fig, ax = plt.subplots()
sns.kdeplot(smallhousingNYC["Year_Built"], ax=ax)
ax.hist(smallhousingNYC["Year_Built"].dropna(), bins=25, alpha=0.25, normed=True, label="Year Built")
ax.legend()
fig.suptitle("Histogram of Building Age BROOKLYN")
plt.show()
The below histogram indicates how many buildings were built in these time periods in Detroit. The number of new builds is much lower in Detroit than Brooklyn.
01 .1939 or earlier
02 .1940 to 1949
03 .1950 to 1959
04 .1960 to 1969
05 .1970 to 1979
06 .1980 to 1989
07 .1990 to 1999
08 .2000 to 2004
09 .2005
10 .2006
11 .2007
12 .2008
13 .2009
14 .2010
15 .2011
16 .2012
17 .2013
In [44]:
fig, ax = plt.subplots()
sns.kdeplot(smallhousingDetroit["Year_Built"], ax=ax)
ax.hist(smallhousingDetroit["Year_Built"].dropna(), bins=25, alpha=0.25, normed=True, label="Year Built")
ax.legend()
fig.suptitle("Histogram of Building Age DETROIT")
plt.show()