In this assignment, you'll analyze a collection of data sets from the San Francisco Open Data Portal and Zillow. The data sets have been stored in the SQLite database sf_data.sqlite, which you can download here. The database contains the following tables:
| Table | Description |
|---|---|
| crime | Crime reports dating back to 2010. |
| mobile_food_locations | List of all locations where mobile food vendors sell. |
| mobile_food_permits | List of all mobile food vendor permits. More details here. |
| mobile_food_schedule | Schedules for mobile food vendors. |
| noise | Noise complaints dating back to August 2015. |
| parking | List of all parking lots. |
| parks | List of all parks. |
| schools | List of all schools. |
| zillow | Zillow rent and housing statistics dating back to 1996. More details here. |
The mobile_food_ tables are explicitly connected through the locationid and permit columns. The other tables are not connected, but you may be able to connect them using dates, latitude/longitude, or postal codes.
Shapefiles for US postal codes are available here. These may be useful for converting latitude/longitude to postal codes.
Shapefiles for San Francisco Neighborhoods are available here.
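For questions that need postal codes, one way to connect the point-based tables to the zillow table is a point-in-polygon spatial join against the ZCTA shapefile mentioned above. This is only a hedged sketch: it assumes the shapefile has been unzipped next to the notebook as cb_2015_us_zcta510_500k.shp, that it exposes a GEOID10 field (the field used by the mapping code later on), and that a reasonably recent geopandas is installed.
In [ ]:
import sqlite3
import pandas as pd
import geopandas as gpd

conn = sqlite3.connect('sf_data.sqlite')

# Noise complaints come with Lat/Lon columns; drop rows without coordinates.
noise = pd.read_sql_query("SELECT * FROM noise", conn).dropna(subset=['Lat', 'Lon'])

# Turn each complaint into a point geometry in the same CRS as the shapefile.
points = gpd.GeoDataFrame(
    noise, geometry=gpd.points_from_xy(noise['Lon'], noise['Lat']), crs='EPSG:4326')

# Load the postal-code polygons and attach the containing GEOID10 to every point.
zips = gpd.read_file('cb_2015_us_zcta510_500k.shp').to_crs('EPSG:4326')
noise_by_zip = gpd.sjoin(points, zips[['GEOID10', 'geometry']], how='left', predicate='within')
print(noise_by_zip['GEOID10'].value_counts().head())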
Exercise 1.1. Which mobile food vendor(s) sells at the most locations?
In [21]:
import pandas as pd
from matplotlib import pyplot as plt
import sqlalchemy as sqla
import sqlite3
%matplotlib inline
plt.style.use('ggplot')
In [22]:
conn0 = sqlite3.connect('sf_data.sqlite')
In [23]:
c = conn0.cursor()
In [24]:
c.execute("""select * from mobile_food_locations""")
c.fetchone()
Out[24]:
In [25]:
c.execute("""select * from mobile_food_permits""")
c.fetchone()
Out[25]:
In [26]:
c.description
Out[26]:
In [31]:
pd.read_sql_query("""select * from mobile_food_permits where Applicant = 'Liang Bai Ping' """, conn0)
Out[31]:
In [299]:
pd.read_sql_query("""select "permit","Status","Applicant","FacilityType",count(*) as "loc.number" from
(select * from mobile_food_schedule join mobile_food_permits
on mobile_food_schedule."permit" = mobile_food_permits."permit")
group by "permit"
order by "loc.number" desc
""", conn0).iloc[0:10]
Out[299]:
In [300]:
pd.read_sql_query("""select "permit","Status","FacilityType","Applicant",count(*) as "loc.number" from
(select * from mobile_food_schedule join mobile_food_permits
on mobile_food_schedule."permit" = mobile_food_permits."permit"
where mobile_food_permits.Status = 'APPROVED')
group by "permit"
order by "loc.number" desc
""", conn0).iloc[0:10]
Out[300]:
Counting all permits ever issued, regardless of current status, May's Catering (permit 17MFF-0110) sells at the most locations.
Among vendors whose permits are currently approved, Parks' Catering (permit 16MFF-0051) sells at the most locations.
Exercise 1.2. Ask and use the database to analyze 5 questions about San Francisco. For each question, write at least 150 words and support your answer with plots. Make a map for at least 2 of the 5 questions.
You should try to come up with some questions on your own; the five questions analyzed below are examples of reasonable ones.
Please make sure to clearly state each of your questions in your submission.
In [8]:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import scipy.stats as ss
import numpy as np
from matplotlib.patches import Circle, Wedge, Polygon
from matplotlib.colors import LinearSegmentedColormap
import geopandas as gpd
1.2.1-Which parts of the city are the most and least expensive?
I found the most and least expensive parts of the city by building a pandas table, avgHprice, that averages ZriPerSqft_AllHomes and MedianSoldPricePerSqft_AllHomes from the zillow table after grouping by RegionName. Based on avgHprice, I made a scatter plot of the four variables and labeled the RegionName of the points with the maximum and minimum values of each variable. By median sold price alone, the most expensive RegionName is 94104.
However, the other data for this region are missing. Taking the other variables into account, the most expensive RegionName is 94105 and the least expensive is 94124. To check these results, I mapped the average value of each of the four variables by region, and the maps support the same conclusion.
94105 has the highest ZriPerSqft_AllHomes, MedianSoldPricePerSqft_AllHomes, and Turnover, although its PriceToRentRatio is not the highest.
94124 has the lowest ZriPerSqft_AllHomes, MedianSoldPricePerSqft_AllHomes, and Turnover, while its PriceToRentRatio is about average.
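As a quick cross-check on the pandas approach described above, the same per-region averages can also be computed directly in SQL. This is just a sketch; the column names come from the zillow table, and SQLite's AVG skips NULLs, so regions with no sold-price data end up at the bottom of the descending sort.
In [ ]:
# Average the two price measures per postal code and rank by sold price.
check = pd.read_sql_query("""
    SELECT RegionName,
           AVG(ZriPerSqft_AllHomes)             AS avg_rent_per_sqft,
           AVG(MedianSoldPricePerSqft_AllHomes) AS avg_sold_per_sqft
    FROM zillow
    GROUP BY RegionName
    ORDER BY avg_sold_per_sqft DESC
""", conn0)
print(check.head())   # most expensive regions by median sold price
print(check.tail())   # least expensive regions (all-NULL regions sort last)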
In [5]:
c.execute("""select * from zillow
""")
c.fetchone() #zipcode, time,
Out[5]:
In [10]:
Hprice = pd.read_sql_query("select * from zillow",conn0)
Hprice.describe()
Out[10]:
In [11]:
avgHprice = Hprice.groupby(['RegionName'])[['RegionName', 'ZriPerSqft_AllHomes', 'MedianSoldPricePerSqft_AllHomes',
                                            'PriceToRentRatio_AllHomes', 'Turnover_AllHomes']].mean()
avgHprice.head()
Out[11]:
In [18]:
avgHprice.sort_values(['ZriPerSqft_AllHomes'], ascending=False).head()
Out[18]:
In [19]:
avgHprice.sort_values(['MedianSoldPricePerSqft_AllHomes'], ascending=False).head()
Out[19]:
In [20]:
avgHprice.sort_values(['PriceToRentRatio_AllHomes'], ascending=False).head()
Out[20]:
In [12]:
plt.scatter(avgHprice['ZriPerSqft_AllHomes'], avgHprice['PriceToRentRatio_AllHomes'],
alpha=0.5,
s=30*avgHprice['Turnover_AllHomes'],
c=avgHprice['MedianSoldPricePerSqft_AllHomes'],
label='Turnover_AllHomes')
plt.xlabel('ZriPerSqft_AllHomes')
plt.ylabel('PriceToRentRatio_AllHomes')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15),
fancybox=True, shadow=True, ncol=5)
#plt.colorbar()
plt.title('Comparison of ZriPerSqft, Turnover,\n PriceToRentRatio and MedianSoldPricePerSqft \n in each Region')
cbar = plt.colorbar(ticks = [300,400,500,600,700,800,900])
cbar.ax.set_yticklabels(['Low: 300','400', '500','600', '700', '800','900'])
cbar.set_label(r'MedianSoldPricePerSqft');
for i, txt in zip(range(0, len(avgHprice['RegionName'])), avgHprice['RegionName']):
    if i in list([0, 19, 24, 18, 4]):
        plt.annotate(txt, (avgHprice['ZriPerSqft_AllHomes'].iloc[i], avgHprice['PriceToRentRatio_AllHomes'].iloc[i] + 0.5))
In [13]:
avgHprice.iloc[[0,19,24,18,4]]
Out[13]:
In [128]:
def draw_sf_zip(width=2e7, proj='merc'):
    figsize = [10, 10]; dpi = 80
    fig = plt.figure(1, figsize=figsize, dpi=dpi)
    ax = fig.add_subplot(111)
    bmap = Basemap(width=width, height=width, projection=proj,
                   llcrnrlon=-122.6, llcrnrlat=37.6, urcrnrlon=-122.3, urcrnrlat=37.9,
                   resolution='h',  # 'f' for full
                   lat_0=37.7,
                   lon_0=-122.4, ax=ax)
    bmap.drawcoastlines()
    bmap.drawmapboundary()
    bmap.readshapefile('cb_2015_us_zcta510_500k', 'zipcode')
    return ax, fig, bmap
In [23]:
def find(lst=bmap.zipcode_info[0:100], key="GEOID10", value=['84536', '94929']):
    # Return the indices of the shapefile records whose `key` field is in `value`.
    index = []
    for i, dic in enumerate(lst):
        if dic[key] in value:
            index.append(i)
    return index
In [24]:
list0 = avgHprice['RegionName'].tolist()
list1 = [str(i) for i in list0]
Zindex = find(lst = bmap.zipcode_info[:], value =list1[:])
In [34]:
ax, fig, bmap = draw_sf_zip()
ax.set_title('SF divided by Zipcode')
list1[4] #list([0,19,24,18,4])
Out[34]:
In [130]:
def mapHprice(value='ZriPerSqft_AllHomes'):
    table0 = avgHprice[[value, 'RegionName']]
    table0 = table0.dropna()
    list0 = avgHprice['RegionName'].tolist()
    rank = ss.rankdata(avgHprice[value].tolist())
    n = len(rank)
    rank = rank / float(n)
    ax, fig, zmap = draw_sf_zip()
    for (i, z) in zip(range(0, 25), Zindex):
        pp = Polygon(zmap.zipcode[z], facecolor=(1., 1. - rank[i]**2., 0.),
                     alpha=.3, zorder=1)
        ax.add_patch(pp)
        annotx = []
        annoty = []
        if i in list([0, 19, 24, 18, 4]):  # add annotation to the most and least expensive regions
            for t in zmap.zipcode[z][:]:  # calculate the center of the region's polygon
                annotx.append(t[0])
                annoty.append(t[1])
            zipx = sum(annotx) / float(len(annotx))
            zipy = sum(annoty) / float(len(annoty))
            plt.annotate(list1[i], xy=(zipx - 950, zipy), size=10, weight='black')
    plt.title(value + ' in SF')
    #plt.show()
In [131]:
mapHprice()
plt.show()
In [132]:
mapHprice('MedianSoldPricePerSqft_AllHomes')
In [133]:
mapHprice('Turnover_AllHomes')
plt.show()
1.2.2-Which parts of the city are the most dangerous (and at what times)?
To find the most dangerous parts of the city, I mapped a heatmap of crime counts. As the plot below shows, the Southern district is the most dangerous area in SF, with close to 200,000 reported crimes. The bar plot also shows that the crime count in the Southern district is much higher than in any other district.
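To look at the time dimension per district rather than only city-wide, the crime reports can be cross-tabulated by district and hour of day. A minimal sketch, using the Datetime and PdDistrict columns of the crime table and the conn0 connection opened earlier:
In [ ]:
# Count crimes per (district, hour-of-day) cell and plot one line per district.
crime_times = pd.read_sql_query("SELECT Datetime, PdDistrict FROM crime", conn0)
hour_of_day = pd.DatetimeIndex(crime_times['Datetime']).hour
by_district_hour = pd.crosstab(crime_times['PdDistrict'], hour_of_day)
by_district_hour.T.plot(figsize=(10, 6), title='Crimes per Hour by District')
plt.xlabel('Hour of day')
plt.ylabel('Number of crimes')
plt.show()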
In [56]:
Crime = pd.read_sql_query("select * from crime",conn0)
In [57]:
Crime.head()
Out[57]:
In [135]:
def draw_sf(width=2e7, proj='merc'):
    figsize = [10, 10]; dpi = 80
    fig = plt.figure(1, figsize=figsize, dpi=dpi)
    ax = fig.add_subplot(111)
    bmap = Basemap(width=width, height=width, projection=proj,
                   llcrnrlon=-122.53, llcrnrlat=37.67, urcrnrlon=-122.33, urcrnrlat=37.85,
                   resolution='h',  # 'f' for full
                   lat_0=37.5,
                   lon_0=-122.5)
    bmap.drawcoastlines()
    bmap.drawmapboundary(fill_color='white')
    #bmap.fillcontinents(color='white',lake_color='white')
    bmap.readshapefile('geo_export_772ba198-1646-427d-8190-94c324b75364', 'SFFindNeighborhoods')
    return ax, fig, bmap
In [140]:
ax, fig, bmap = draw_sf()
lons = list(Crime['Lon'])
lats = list(Crime['Lat'])
x,y = bmap(lons, lats)
#bmap.plot(x, y, 'o', markersize=5 ,markeredgecolor="none", alpha=0.33)
db = 0.003 # bin padding
lon_bins = np.linspace(min(lons)-db, max(lons)+db, 10+1) # 10 bins
lat_bins = np.linspace(min(lats)-db, max(lats)+db, 13+1) # 13 bins
density, _, _ = np.histogram2d(lats, lons, [lat_bins, lon_bins])
# Turn the lon/lat of the bins into 2 dimensional arrays ready
# for conversion into projected coordinates
lon_bins_2d, lat_bins_2d = np.meshgrid(lon_bins, lat_bins)
# convert the bin mesh to map coordinates:
xs, ys = bmap(lon_bins_2d, lat_bins_2d) # will be plotted using pcolormesh
# ######################################################################
# define custom colormap, white -> nicered, #E6072A = RGB(0.9,0.03,0.16)
cdict = {'red': ( (0.0, 1.0, 1.0),
(1.0, 0.9, 1.0) ),
'green':( (0.0, 1.0, 1.0),
(1.0, 0.03, 0.0) ),
'blue': ( (0.0, 1.0, 1.0),
(1.0, 0.16, 0.0) ) }
custom_map = LinearSegmentedColormap('custom_map', cdict)
plt.register_cmap(cmap=custom_map)
# add histogram squares and a corresponding colorbar to the map:
plt.pcolormesh(xs, ys, density, cmap="custom_map")
cbar = plt.colorbar(orientation='horizontal', shrink=0.625, aspect=20, fraction=0.2,pad=0.02)
cbar.set_label('Number of Crimes',size=18)
#plt.clim([0,100])
bmap.plot(x, y, 'o', markersize=1,#zorder=6,
markerfacecolor='#424FA4',markeredgecolor="none", alpha=0.03)
# http://matplotlib.org/basemap/api/basemap_api.html#mpl_toolkits.basemap.Basemap.drawmapscale
#bmap.drawmapscale(-119-6, 37-7.2, -119-6, 37-7.2, 500, barstyle='fancy', yoffset=20000)
plt.title('Distribution of Crimes in San Francisco')
plt.annotate('Southern', xy=bmap.SFFindNeighborhoods[21][1],
color ='black',weight = 'black', size = 13, alpha =0.7)
plt.show()
In [79]:
Crime.groupby(['PdDistrict']).size().plot(kind='bar')
Out[79]:
As for the most dangerous time of day in SF, I plotted a histogram of crime report times. The histogram shows that the most dangerous time is around 6 pm: the number of crimes starts to increase around 5 am, peaks at about 18:00-19:00, and then drops.
In [81]:
times = pd.DatetimeIndex(Crime.Datetime)
grouped = Crime.groupby([times.hour, times.minute])
In [87]:
plt.hist(times.hour, bins = 12)
plt.title('Histogram of Crime Time in One Day')
plt.xlabel('Hours in one Day')
plt.ylabel('Frequency')
plt.show()
1.2.3-Are noise complaints and mobile food vendors related?
Noise complaints do not appear to be strongly related to mobile food vendors.
The bar plot of complaint types shows that only a few complaints are about mobile food vendors. The heatmap of complaint density overlaid with vendor locations also shows that high complaint density does not closely follow the distribution of vendors, especially in the southern part of the city. Areas with many complaints do tend to have many vendors, but the reverse does not hold: areas with many vendors do not always have a high density of complaints. Overall, the association between the two is weak.
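To put a rough number on this visual impression, one can count complaints and vendor locations in the same latitude/longitude grid cells and correlate the two counts. A minimal sketch (the bin edges roughly cover the city and are my own choice; Lat/Lon and Latitude/Longitude are the column names used in the cells below):
In [ ]:
# Grid both point sets with the same bins and correlate the per-cell counts.
noise_pts = pd.read_sql_query("SELECT Lat, Lon FROM noise", conn0).dropna()
vendor_pts = pd.read_sql_query(
    "SELECT Latitude, Longitude FROM mobile_food_locations", conn0).dropna()
vendor_pts = vendor_pts[(vendor_pts['Latitude'] != 0) & (vendor_pts['Longitude'] != 0)]

lat_bins = np.linspace(37.70, 37.82, 14)
lon_bins = np.linspace(-122.52, -122.36, 14)
noise_grid, _, _ = np.histogram2d(noise_pts['Lat'], noise_pts['Lon'], [lat_bins, lon_bins])
vendor_grid, _, _ = np.histogram2d(vendor_pts['Latitude'], vendor_pts['Longitude'],
                                   [lat_bins, lon_bins])

# A weak Spearman rank correlation between cell counts would support the
# conclusion that the two distributions are only loosely related.
rho, p = ss.spearmanr(noise_grid.ravel(), vendor_grid.ravel())
print(rho, p)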
In [91]:
c.execute("""select * from mobile_food_locations
""")
c.fetchone()
Out[91]:
In [92]:
c.description
Out[92]:
In [5]:
Noise = pd.read_sql_query("select * from noise",conn0)
In [6]:
Noise = Noise.dropna()
In [7]:
Noise.head()
Out[7]:
In [8]:
Noise.groupby(['Type']).size()
Out[8]:
In [10]:
Noise.groupby(['Type']).size().plot(kind='bar')
Out[10]:
In [99]:
Vendors = pd.read_sql_query("select * from mobile_food_locations",conn0)
In [100]:
Vendors = Vendors.dropna()
In [101]:
Vendors.head()
Out[101]:
In [149]:
ax, fig, bmap = draw_sf()
lons = list(Noise['Lon'])
lats = list(Noise['Lat'])
x,y = bmap(lons, lats) #for density
lons1 = list(Vendors['Longitude'])
lats1 = list(Vendors['Latitude'])
x1,y1 = bmap(lons1, lats1) #for points
db = 0.003 # bin padding
lon_bins = np.linspace(min(lons)-db, max(lons)+db, 10+1) # 10 bins
lat_bins = np.linspace(min(lats)-db, max(lats)+db, 13+1) # 13 bins
density, _, _ = np.histogram2d(lats, lons, [lat_bins, lon_bins])
# Turn the lon/lat of the bins into 2 dimensional arrays ready
# for conversion into projected coordinates
lon_bins_2d, lat_bins_2d = np.meshgrid(lon_bins, lat_bins)
# convert the bin mesh to map coordinates:
xs, ys = bmap(lon_bins_2d, lat_bins_2d) # will be plotted using pcolormesh
# ######################################################################
# define custom colormap, white -> nicered, #E6072A = RGB(0.9,0.03,0.16)
cdict = {'red': ( (0.0, 1.0, 1.0),
(1.0, 0.9, 1.0) ),
'green':( (0.0, 1.0, 1.0),
(1.0, 0.03, 0.0) ),
'blue': ( (0.0, 1.0, 1.0),
(1.0, 0.16, 0.0) ) }
custom_map = LinearSegmentedColormap('custom_map', cdict)
plt.register_cmap(cmap=custom_map)
# add histogram squares and a corresponding colorbar to the map:
plt.pcolormesh(xs, ys, density, cmap="custom_map")
cbar = plt.colorbar(orientation='horizontal', shrink=0.625, aspect=20, fraction=0.2,pad=0.02)
cbar.set_label('Density of Noise Complaints',size=18)
#plt.clim([0,100])
# translucent blue scatter plot of epicenters above histogram:
#x,y = bmap(lons, lats)
bmap.plot(x, y, 'o', markersize=8,zorder=6, markerfacecolor='#990000',markeredgecolor="none", alpha=0.01)
# make image bigger:
#plt.gcf().set_size_inches(15,15)
plt.scatter(x[1], y[1], label = 'Noise Complaints',alpha = 0.5, color = '#990000')
plt.scatter(x1, y1, label = 'Vendors', alpha =0.5, color = '#80FF00')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15),
fancybox=True, shadow=True, ncol=5)
plt.title('Distribution of Noise Complaints and \n Mobile Food Vendor in San Francisco')
plt.show()
1.2.4-What are the best times and places to find food trucks?
To find the best times and places to find food trucks, I built tables of food-truck schedules from mobile_food_schedule and mobile_food_locations, selected trucks by their opening hours in four time slots (0-6, 7-12, 13-18, and 19-24), and plotted the truck locations for each slot on a map of SF.
On the map, the area around Potrero Hill and Dogpatch has food trucks open in every time slot, so the marked area is the best place to find food trucks.
The best time of day to find a food truck is between 7:00 and 12:00.
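Before mapping, the claim about the best time slot can be checked with one query per window: count the approved-truck schedule entries whose hours overlap each of the four slots. A minimal sketch; note that it uses a simple StartHour/EndHour overlap test, which is slightly different from the conditions in the LocTime cells below.
In [ ]:
# Count approved-truck schedule entries overlapping each time window.
slots = {'0-6': (0, 6), '7-12': (7, 12), '13-18': (13, 18), '19-24': (19, 24)}
for name, (lo, hi) in slots.items():
    n = pd.read_sql_query("""
        SELECT COUNT(*) AS n
        FROM mobile_food_schedule AS s
        JOIN mobile_food_permits AS p ON s.permit = p.permit
        WHERE p.FacilityType = 'Truck' AND p.Status = 'APPROVED'
          AND s.StartHour < ? AND s.EndHour > ?
    """, conn0, params=(hi, lo)).iloc[0, 0]
    print(name, n)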
In [204]:
c.execute("""select * from mobile_food_schedule""")
c.fetchone()
Out[204]:
In [221]:
c.execute("DROP TABLE Truck")
Out[221]:
In [222]:
#create a table of permit, FacilityType and locationid
c.execute("""CREATE TABLE Truck
as select * from mobile_food_schedule
join mobile_food_permits
on mobile_food_schedule.permit = mobile_food_permits.permit
where mobile_food_permits.FacilityType = 'Truck' and mobile_food_permits.Status = 'APPROVED' """)
c.fetchone()
In [225]:
c.execute("DROP TABLE LocTime1")
Out[225]:
In [226]:
#LocTime1 is vendors open from 19 to 24
c.execute("""CREATE TABLE LocTime1
AS SELECT * FROM Truck
join mobile_food_locations
on Truck.locationid = mobile_food_locations.locationid
where (StartHour >19 and StartHour<=24) or (EndHour >19 and EndHour<=24) or (StartHour <19 and EndHour >=24)
""")
#c.execute("DROP TABLE LocTime1")
Out[226]:
In [227]:
LocTime1 = pd.read_sql("SELECT * FROM LocTime1", conn0)
In [229]:
LocTime1 = LocTime1[(LocTime1.T !=float(0)).all()]
In [230]:
LocTime1.head()
Out[230]:
In [231]:
c.execute("DROP TABLE LocTime2")
Out[231]:
In [232]:
c.execute("DROP TABLE LocTime3")
Out[232]:
In [233]:
c.execute("DROP TABLE LocTime4")
Out[233]:
In [234]:
#LocTime2 is vendors open from 13 to 18
c.execute("""CREATE TABLE LocTime2
AS SELECT * FROM Truck
join mobile_food_locations
on Truck.locationid = mobile_food_locations.locationid
where (StartHour >13 and StartHour<=18) or (EndHour >13 and EndHour<=18) or (StartHour <13 and EndHour >=18)
""")
#c.execute("DROP TABLE LocTime1")
LocTime2 = pd.read_sql("SELECT * FROM LocTime2", conn0)
LocTime2 = LocTime2[(LocTime2.T !=float(0)).all()]
In [235]:
#LocTime3 is vendors open from 7 to 12
c.execute("""CREATE TABLE LocTime3
AS SELECT * FROM Truck
join mobile_food_locations
on Truck.locationid = mobile_food_locations.locationid
where (StartHour >7 and StartHour<=12) or (EndHour >7 and EndHour<=12) or (StartHour <7 and EndHour >=12)
""")
#c.execute("DROP TABLE LocTime1")
LocTime3 = pd.read_sql("SELECT * FROM LocTime3", conn0)
LocTime3 = LocTime3[(LocTime3.T !=float(0)).all()]
In [236]:
#LocTime4 is vendors open from 0 to 6
c.execute("""CREATE TABLE LocTime4
AS SELECT * FROM Truck
join mobile_food_locations
on Truck.locationid = mobile_food_locations.locationid
where (StartHour >0 and StartHour<=6) or (EndHour >0 and EndHour<=6) or (StartHour <0 and EndHour >=6)
""")
#c.execute("DROP TABLE LocTime1")
LocTime4 = pd.read_sql("SELECT * FROM LocTime4", conn0)
LocTime4 = LocTime4[(LocTime4.T !=float(0)).all()]
In [291]:
size = 100
a =0.5
ax, fig, bmap = draw_sf()
lons3 = list(LocTime3['Longitude'])
lats3 = list(LocTime3['Latitude'])
x3,y3 = bmap(lons3, lats3) #for points
#bmap.plot(x1, y1, 'o', markersize=10,#zorder=6, markerfacecolor='#990000',markeredgecolor="none", alpha=0.15)
plt.scatter(x3, y3, label = 'Open in 7-12pm', alpha =a, color = '#990000', s =size)
lons2 = list(LocTime2['Longitude'])
lats2 = list(LocTime2['Latitude'])
x2,y2 = bmap(lons2, lats2) #for points
plt.scatter(x2, y2, label ='Open in 13-18pm', alpha =a, color = '#00ff00', s =size)
lons4 = list(LocTime4['Longitude'])
lats4 = list(LocTime4['Latitude'])
x4,y4 = bmap(lons4, lats4) #for points
plt.scatter(x4, y4, label = 'Open in 0-6am', alpha =a, color = '#0080ff', s =size)
lons1 = list(LocTime1['Longitude'])
lats1 = list(LocTime1['Latitude'])
x1,y1 = bmap(lons1, lats1) #for points
plt.scatter(x1, y1, label = 'Open in 19-24pm', alpha =a, color = '#ffff00', s =size)
plt.annotate('Potrero Hill\n Dogpatch ', xy = (max(x4), y4[1]+700),
color ='black',weight = 'black', size = 13, alpha =0.9)
plt.plot([min(x4)+2000, max(x4)], [y4[1],y4[1]+700], c = 'black')
plt.legend()
plt.title('Distribution of Mobile Food Vendors \n in San Francisco during One Day')
plt.show()
1.2.5-Is there a relationship between housing prices and any of the other tables?
Relating the earlier maps to housing prices, the areas with low crime rates and few noise complaints tend to have high housing prices. Housing prices do not appear to be strongly related to the number or availability of mobile food vendors. Now let's look at the relationship between housing prices and the other tables.
By plotting the locations of parks, schools, and parking lots on top of the SF housing-price heatmap, we can see that housing prices are high where there are many parks and schools, and relatively low where there are many parking lots.
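To make the visual comparison a bit more quantitative, one can count parks per postal code with a spatial join against the same ZCTA shapefile used for the maps and correlate the counts with the average sold price. A hedged sketch, assuming the shapefile is available as cb_2015_us_zcta510_500k.shp with a GEOID10 field, that a recent geopandas is installed, and that RegionName values convert cleanly to five-digit strings matching GEOID10:
In [ ]:
# Count parks per ZCTA polygon and correlate with average sold price per sqft.
parks_pts = pd.read_sql_query("SELECT Lat, Lon FROM parks", conn0).dropna()
parks_gdf = gpd.GeoDataFrame(
    parks_pts, geometry=gpd.points_from_xy(parks_pts['Lon'], parks_pts['Lat']),
    crs='EPSG:4326')
zcta = gpd.read_file('cb_2015_us_zcta510_500k.shp').to_crs('EPSG:4326')

parks_per_zip = (gpd.sjoin(parks_gdf, zcta[['GEOID10', 'geometry']], predicate='within')
                 .groupby('GEOID10').size().rename('n_parks'))

price = avgHprice[['MedianSoldPricePerSqft_AllHomes']].copy()
price.index = avgHprice['RegionName'].astype(int).astype(str)   # match GEOID10 strings
joined = price.join(parks_per_zip, how='inner').dropna()
print(joined.corr(method='spearman'))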
In [314]:
Parks = pd.read_sql("SELECT * FROM parks", conn0).dropna()
In [305]:
c.execute("select * from zillow")
c.description
Out[305]:
In [318]:
# Add a marker column so that Hprice_factor below can use the last column name as the legend label.
Parks['Parks'] = pd.Series('NA', index=Parks.index)
Parks.head()
Out[318]:
In [322]:
list(Parks.columns.values)
Out[322]:
In [379]:
def Hprice_factor(value='MedianSoldPricePerSqft_AllHomes', factor=Parks, alpha0=0.33, size=8):
    table0 = avgHprice[[value, 'RegionName']]
    table0 = table0.dropna()
    list0 = avgHprice['RegionName'].tolist()
    rank = ss.rankdata(avgHprice[value].tolist())
    n = len(rank)
    rank = rank / float(n)
    ax, fig, zmap = draw_sf_zip()
    lons = list(factor['Lon'])
    lats = list(factor['Lat'])
    x, y = zmap(lons, lats)  # mark the factor locations (parks, schools, parking lots)
    zmap.plot(x, y, 'o', markersize=size, markerfacecolor='#ff0000',
              markeredgecolor="none", alpha=alpha0)
    # one opaque point so the legend entry is visible
    plt.scatter(x[1], y[1], label=list(factor.columns.values)[-1], alpha=0.8, s=50, color='#990000')
    for (i, z) in zip(range(0, 25), Zindex):
        pp = Polygon(zmap.zipcode[z], facecolor=(1., 1. - rank[i]**2., 0.),
                     alpha=.3, zorder=1)
        ax.add_patch(pp)
        annotx = []
        annoty = []
        if i in list([0, 19, 24, 18, 4]):  # add annotation to the most and least expensive regions
            for t in zmap.zipcode[z][:]:  # calculate the center of the region's polygon
                annotx.append(t[0])
                annoty.append(t[1])
            zipx = sum(annotx) / float(len(annotx))
            zipy = sum(annoty) / float(len(annoty))
            plt.annotate(list1[i], xy=(zipx - 950, zipy), size=10, weight='black')
    plt.legend()
    plt.title(value + ' \n and Number of ' + list(factor.columns.values)[-1] + ' in SF')
    plt.show()
    #return ax, fig, zmap
In [381]:
Hprice_factor()
In [372]:
Schools = pd.read_sql("SELECT * FROM schools", conn0).dropna()
Schools['Schools']= pd.Series('NA', index=Schools.index)
Schools.head()
Out[372]:
In [382]:
Hprice_factor(value = 'MedianSoldPricePerSqft_AllHomes', factor = Schools, alpha0=0.05)
In [366]:
Parkings = pd.read_sql("SELECT * FROM parking", conn0).dropna()
Parkings['Parkings']= pd.Series('NA', index=Parkings.index)
Parkings.head()
Out[366]:
In [383]:
Hprice_factor(factor = Parkings, alpha0 = 0.005, size=5)