A real-world building list can be generated from the parcel map data downloaded from data.detroitmi.gov: click the 'Export' button near the upper-right corner of the page, then select the 'CSV' option.
Each valid row contains a multipolygon defined by a list of coordinates. From these coordinates, we can construct a rectangle from the lower-left and upper-right corners of the polygon's extent. Each rectangle then represents a building.
Notes:
One can in principle assume a grid to represent buildings. However, there are two problems with this approach:
Since building information is available from data.detroitmi.gov, the real-world building data is extracted from that file and combined with the .csv files provided by the course to give a more meaningful analysis.
In [1]:
import numpy as np
import pandas as pd
In [2]:
parcels = pd.read_csv("../data/Parcel_Map/PARCELS.csv")
In [3]:
parcels.columns
Out[3]:
In [4]:
parcels = parcels[['the_geom', 'PARCELNO','PROPADDR']]
print("original number of parcels: %d" % parcels.shape[0])
In [5]:
# reassign instead of dropping in place: parcels is a column subset of the original frame
parcels = parcels.dropna(axis=0)
print("after dropna: %d" % parcels.shape[0])
In [6]:
parcels.iloc[0,0]
Out[6]:
The first column ('the_geom') contains the coordinates that form a polygon. For simplicity, we derive a rectangle from the lower-left and upper-right corners to represent the area.
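The bounding-box idea can be sketched in isolation as follows. This is a minimal illustration, not the parsing code used in the cells of this notebook: the function name `wkt_bbox` and the sample string are hypothetical, and the sketch assumes the geometry is WKT-style text with coordinates written as `lon lat` pairs.

```python
import re

def wkt_bbox(wkt):
    """Return (llcrnrlon, llcrnrlat, urcrnrlon, urcrnrlat) for a WKT-style string.

    Hypothetical helper: assumes coordinates appear as 'lon lat' pairs.
    Returns None when no coordinates are found (e.g. 'MULTIPOLYGON EMPTY').
    """
    # Pull out every signed decimal number in order of appearance.
    numbers = [float(n) for n in re.findall(r'-?\d+(?:\.\d+)?', wkt)]
    if len(numbers) < 2:
        return None
    lons = numbers[0::2]  # even positions: longitudes
    lats = numbers[1::2]  # odd positions: latitudes
    return (min(lons), min(lats), max(lons), max(lats))

# Made-up sample polygon, not taken from the parcel file.
sample = ("MULTIPOLYGON (((-83.05 42.33, -83.04 42.33, "
          "-83.04 42.34, -83.05 42.34, -83.05 42.33)))")
print(wkt_bbox(sample))
```

The minimum and maximum of each coordinate give the two corners directly, so the order of the polygon's vertices does not matter.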
In [7]:
parcels.reset_index(inplace=True)
In [8]:
geoms = parcels['the_geom'].astype(str).apply(lambda x: x.split())
geoms = geoms.apply(lambda x: x[1:])
geoms = geoms.apply(lambda x: [x[0].lstrip('[').lstrip('(')]+x[1:-1]+[x[-1].rstrip(']').rstrip(')')])
In [9]:
def extract_lons(list_of_geom):
    '''select longitudes from a list of multipolygon'''
    np_list_of_geom = np.array(list_of_geom)
    np_lons = np_list_of_geom[::2]  # even positions hold longitudes
    if not np_lons.size:
        return []
    np_lons = np.char.lstrip(np_lons, '(')
    np_lons = np.char.replace(np_lons, 'EMPTY', 'Nan')  # empty geometry becomes NaN
    np_lons = np_lons.astype(float)
    return np_lons

def extract_lats(list_of_geom):
    '''select latitudes from a list of multipolygon'''
    np_list_of_geom = np.array(list_of_geom)
    np_lats = np_list_of_geom[1::2]  # odd positions hold latitudes
    if not np_lats.size:
        return []
    np_lats = np.char.rstrip(np_lats, ',')
    np_lats = np.char.rstrip(np_lats, ')')
    np_lats = np.char.replace(np_lats, 'EMPTY', 'Nan')  # empty geometry becomes NaN
    np_lats = np_lats.astype(float)
    return np_lats
In [10]:
parcels['lons'] = geoms.apply(lambda x: extract_lons(x))
parcels['lats'] = geoms.apply(lambda x: extract_lats(x))
In [11]:
parcels['llcrnrlon'] = parcels['lons'].apply(lambda x: min(x))
parcels['llcrnrlat'] = parcels['lats'].apply(lambda x: min(x))
parcels['urcrnrlon'] = parcels['lons'].apply(lambda x: max(x))
parcels['urcrnrlat'] = parcels['lats'].apply(lambda x: max(x))
parcels['lon'] = parcels['lons'].apply(lambda x: x.mean()) # center of rectangle
parcels['lat'] = parcels['lats'].apply(lambda x: x.mean()) # center of rectangle
parcels.dropna(inplace=True)
In [12]:
parcels['addr'] = parcels['PROPADDR'].apply(lambda x: x.lower())
In [2]:
from matplotlib import pyplot as plt
%matplotlib inline
In [14]:
fig, ax = plt.subplots(1)
ax.scatter(parcels['lon'], parcels['lat'], s = 2, color='darkblue', alpha = 0.1)
ax.set_xlabel('longitude')
ax.set_xlim(min(parcels['lon'])-0.01,max(parcels['lon']+0.01))
ax.set_ylabel('latitude')
ax.set_ylim(min(parcels['lat'])-0.01,max(parcels['lat']+0.01))
plt.title('Distribution of Buildings')
plt.show()
In [15]:
parcels['building_id'] = np.arange(0,parcels.shape[0])
In [16]:
parcels = parcels[['building_id','lon','lat','llcrnrlon','llcrnrlat','urcrnrlon','urcrnrlat','addr','PARCELNO']]
In [17]:
parcels['length'] = parcels['urcrnrlon'] - parcels['llcrnrlon']
parcels['width'] = parcels['urcrnrlat'] - parcels['llcrnrlat']
In [19]:
parcels = parcels[parcels['length'].notnull()]
parcels = parcels[parcels['width'].notnull()]
In [20]:
# drop=True keeps the old index from being written out as an extra 'index' column
parcels.reset_index(drop=True, inplace=True)
In [21]:
parcels.tail()
Out[21]:
In [22]:
parcels.to_csv('../data/buildings_step_0.csv', index=False)
In [3]:
parcels = pd.read_csv('../data/buildings_step_0.csv')
bboxes = parcels[['llcrnrlon','llcrnrlat','urcrnrlon','urcrnrlat']]
bboxes = bboxes.values  # DataFrame.as_matrix() was removed in pandas 1.0
In [24]:
from bbox import draw_screen_bbox # self-defined helper function
A subset of the real buildings, for illustration:
In [25]:
fig = plt.figure(figsize=(8,6))
for box in bboxes[0:1000]:
    draw_screen_bbox(box, fig)
plt.xlim(-83.08,-83.03)
plt.ylim(42.32,42.375)
plt.show()
In [38]:
lengths = parcels['length'].values
widths = parcels['width'].values
indices = np.random.randint(0,lengths.shape[0],int(lengths.shape[0]/10) - 1)
lengths_sample = lengths[indices]
widths_sample = widths[indices]
In [40]:
fig = plt.figure(figsize=(10,4))
ax1 = plt.subplot2grid((1,2),(0,0))
ax1.hist(lengths_sample,200,facecolor='blue', alpha=0.75)
ax1.set_title("Distribution of Lengths")
ax1.set_xlim(0,0.003)
ax2 = plt.subplot2grid((1,2),(0,1))
ax2.hist(widths_sample,200,facecolor='red', alpha=0.75)
ax2.set_title("Distribution of Widths")
ax2.set_xlim(0,0.0015)
plt.show()
In [50]:
fig = plt.figure(figsize=(10,4))
ax1 = plt.subplot2grid((1,2),(0,0))
ax1.boxplot(lengths,0,"")
ax1.set_title("Distribution of Lengths")
ax1.set_ylim(0,0.001)
ax2 = plt.subplot2grid((1,2),(0,1))
ax2.boxplot(widths,0,"")
ax2.set_title("Distribution of Widths")
ax2.set_ylim(0,0.0008)
plt.show()
The lengths and widths of the rectangular boundaries are both concentrated around their median values. The length distribution is roughly unimodal and close to normal, while the widths are slightly right-skewed.
In both cases, the median should be sufficient as a representative value.
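One rough way to put a number on how tightly values cluster around their median is to compare the interquartile range with the median. The sketch below uses only the standard library; the helper name `median_and_relative_iqr` and the toy values are hypothetical and do not come from the parcel data.

```python
import statistics

def median_and_relative_iqr(values):
    """Return (median, IQR / median) for a sequence of positive numbers.

    Hypothetical helper: a small ratio suggests the median summarizes
    the data well; a large ratio suggests a wide spread.
    """
    med = statistics.median(values)
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return med, (q3 - q1) / med

# Toy values for illustration only.
toy = [1, 2, 3, 4, 5, 6, 7]
med, rel_iqr = median_and_relative_iqr(toy)
print(med, rel_iqr)
```

The same function could be applied to `lengths` and `widths` after converting them to plain lists, for example with `lengths.tolist()`.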
In [51]:
median_length = np.median(lengths)
median_width = np.median(widths)
In [52]:
print("Median of lengths: %.6f" % median_length)
print("Median of widths: %.6f" % median_width)