May 2017
Written by Divya Sasidharan at NYU Stern
Contact: ds5151@nyu.edu
Real estate is an active area in both investment market and academic research. Foreclosure auction properties are special properties that have different transaction and payment procedures from the regular sales, and the home buyers take much higher risk. In this project, we built a scalable analytics workflow to extract, clean and analyze foreclosure property data from multiple sources using Python tools and lib. We analyzed the correlation of foreclosure property price with crime score, walk score and school rating. We have that crime and school is major factor influencing the property price and demand.
In [3]:
import pandas as pd # Importing necessary data package
import matplotlib.pyplot as plt # pyplot module
import numpy as np
In [8]:
Zillow = pd.ExcelFile("Properties_philly_Kraggle_v2.xlsx")
zz = Zillow.parse('Properties_philly_Kraggle_v2')
zz
Out[8]:
In [9]:
print('Dimensions: ', zz.shape) # looking at the categories I can work with
In [10]:
print('Column labels: ', zz.columns) #Listing out the Column headings
In [11]:
print('Row labels: ', zz.index)
In [12]:
z = zz.dropna() #Dropped empty rows that does have any data
print('Dimensions: ', z.shape)
Following are the observation based on analyzing average House price by Crime rate, walkscore and school score against postal code. The data that affected the price change most was crime for the city of Philadelphia. School rating also influenced the price of the property. Walkscore did not have a strong impact it was overridden by crime score but it had combined effect at the city center.
In [13]:
plt.scatter(z['Postal Code'], z[' Violent Crime Rate '])
plt.show()
In [14]:
plt.scatter(z['Postal Code'], z[' Avg Walk&Transit score '])
plt.show()
In [44]:
plt.scatter(z['Postal Code'], z[' School Score '])
plt.show()
In [45]:
plt.scatter(z['Postal Code'], z['Sale Price/bid price'])
plt.show()
In [46]:
z1 = pd.DataFrame(z, columns = ['Address',
'Zillow Address',
'Sale Date',
'Opening Bid',
'Sale Price/bid price',
'Book/Writ',
'OPA',
'Postal Code',
'Attorney',
'Ward',
'Seller',
'Buyer',
'Sheriff Cost',
'Advertising',
'Other',
'Record Deed',
'Water',
'PGW',
' Avg Walk&Transit score ',
' Violent Crime Rate ',
' School Score ',
'Zillow Estimate',
'Rent Estimate',
'taxAssessment',
'yearBuilt',
'finished \n(SqFt)',
' bathrooms ',
' bedrooms ',
'PropType',
'Average comps'])
In [47]:
z1['Avg. Price'] = z1[['Zillow Estimate', 'taxAssessment', 'Average comps']].mean(axis=1)
In [48]:
list(z1)
Out[48]:
In [49]:
z1['Avg. Price'].head()
Out[49]:
In [50]:
z1['Avg. price per sq. ft'] = z1['Avg. Price']/z1['finished \n(SqFt)']
In [51]:
z1['Avg. price per sq. ft'].head()
Out[51]:
In [52]:
z1[' Violent Crime Rate '].median()
Out[52]:
In [53]:
z1[' Violent Crime Rate '].min()
Out[53]:
In [54]:
z1[' Violent Crime Rate '].max()
Out[54]:
In [55]:
crimerank = []
for row in z1[' Violent Crime Rate ']:
if row<0.344:
crimerank.append(1)
elif row>=0.344 and row<0.688:
crimerank.append(2)
elif row>=0.688 and row<1.032:
crimerank.append(3)
elif row>=1.032 and row<1.376:
crimerank.append(4)
else:
crimerank.append(5)
z1['Crime Rank'] = crimerank
In [56]:
z1['Crime Rank'].head()
Out[56]:
In [57]:
zcrime = z1.groupby(['Crime Rank'])['Avg. Price'].mean()
zcrime
Out[57]:
There are several factors that influences the house value and the most important is the crime for the city of Philadelphia. The same factor may not be the most influential factor in other cities for example, Manhattan, NY. The following chart shows that higher the rank (lower crime) higher the price of the property.
In [58]:
plt.figure(figsize = (16,7)) # plotting our data
plt.plot(zcrime, color = 'red', marker= '*')
plt.suptitle('Average price by crime', fontsize=18)
plt.xlabel('Crime Rank', fontsize=12)
plt.ylabel('Average price of houses', fontsize=12)
plt.show()
In [59]:
Walkrank = []
for row in z1[' Avg Walk&Transit score ']:
if row>88:
Walkrank.append(1)
elif row>77 and row<=88:
Walkrank.append(2)
elif row>66 and row<=77:
Walkrank.append(3)
elif row>55 and row<=66:
Walkrank.append(4)
else:
Walkrank.append(5)
z1['Walk Rank'] = Walkrank
In [60]:
z1['Walk Rank'].head()
Out[60]:
In [61]:
zwalk = z1.groupby(['Walk Rank'])['Avg. Price'].mean()
zwalk
Out[61]:
Philadelphia has the 6th best walkscore after New York, Jersey City, San Francisco, Boston and Newark. Analysis carried out by John I. Gilderbloom, William W. Riggs and Wesley L. Meares shows that walkability has a positive impact not only on neighborhood housing valuation but also on reduction of crime and foreclosure[8]. Our research analysis of Philadelphia did not find strong correlation between Walkscore and property price except in the city center.
In [62]:
plt.figure(figsize = (16,7)) # plotting our data
plt.plot(zwalk, color = 'blue', marker= '*')
plt.suptitle('Average price by walkrank', fontsize=18)
plt.xlabel('Walk Rank', fontsize=12)
plt.ylabel('Average price of houses', fontsize=12)
plt.show()
In [63]:
Schoolrank = []
for row in z1[' School Score ']:
if row>57.308:
Schoolrank.append(1)
elif row>43.816 and row<=57.308:
Schoolrank.append(2)
elif row>30.324 and row<=43.816:
Schoolrank.append(3)
elif row>16.832 and row<=30.324:
Schoolrank.append(4)
else:
Schoolrank.append(5)
z1['School Rank'] = Schoolrank
In [64]:
zschool = z1.groupby(['School Rank'])['Avg. Price'].mean()
zschool
Out[64]:
In [65]:
plt.figure(figsize = (16,7)) # plotting our data
plt.plot(zschool, color = 'blue', marker= '*')
plt.suptitle('Average price by schoolrank', fontsize=18)
plt.xlabel('School Rank', fontsize=12)
plt.ylabel('Average price of houses', fontsize=12)
plt.show()
In [66]:
z1.head()
Out[66]:
z1['Avg. price per sq. ft'] = z1['Avg. Price']/z1['finished \n(SqFt)']
In [69]:
z1['Closing Cost'] = z1['Avg. Price']*.085
In [70]:
z1['Rehab Cost'] = z1['finished \n(SqFt)']*25
In [71]:
z1['Estimated Max Bid Price']= z1['Avg. Price']-z1['Rehab Cost']
In [72]:
z1.head()
Out[72]:
Based on the analysis founding we used crime as our core factor to eliminate properties that are not good for investment. By only taking properties in the range below the crime median. Then making an assumption that all property needs rehab. including the average tax, closing and rehab cost we came up with a safe ceiling bid price, which we compared with the actual sales price. Most of them were below the ceiling price. Note ceiling price can be adjusted based on the percentage of profit required, if it’s an investment property
In [ ]: