This is my fourth attempt at creating a model using sklearn alogithms

In this iteration of analysis we'll be looking at breaking out categorical varaibles and making them binary, and seeing if that makes our model more accurate.

My last three attempts at this are below:

https://github.com/rileyrustad/CLCrawler/blob/master/First_Analysis.ipynb

https://github.com/rileyrustad/CLCrawler/blob/master/Second_Analysis.ipynb

https://github.com/rileyrustad/CLCrawler/blob/master/Third_Analysis.ipynb

Start with the Imports



In [1]:

    
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import json
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline









    



/Users/mac28/anaconda/lib/python2.7/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))

Load the data from our JSON file.

The data is stored as a dictionary of dictionaries in the json file. We store it that way beacause it's easy to add data to the existing master data file. Also, I haven't figured out how to get it in a database yet.



In [3]:

    
with open('/Users/mac28/src/pdxapartmentfinder/pipeline/data/MasterApartmentData.json') as f:
    my_dict = json.load(f)
dframe = DataFrame(my_dict)

dframe = dframe.T
dframe.describe()









    Out[3]:






  
    
      
      available
      bath
      bed
      cat
      content
      date
      dog
      feet
      furnished
      getphotos
      ...
      housingtype
      lastseen
      lat
      laundry
      long
      parking
      price
      smoking
      time
      wheelchair
    
  
  
    
      count
      9772
      37393
      38393
      38683
      38683
      38683
      38683
      34045
      281
      38683
      ...
      38663
      2792
      36640.0000
      36138
      36640.0000
      28827
      38436
      23750
      38683
      7596
    
    
      unique
      204
      17
      9
      2
      3981
      137
      2
      1548
      1
      24
      ...
      11
      2
      6744.0000
      5
      6918.0000
      7
      2119
      1
      25586
      1
    
    
      top
      jun 01
      1
      1
      1
      967
      2016-05-16
      1
      700
      furnished
      8
      ...
      apartment
      2016-05-27
      45.5142
      w/d in unit
      -122.6854
      off-street parking
      995
      no smoking
      10:57:57
      wheelchair accessible
    
    
      freq
      574
      25752
      14689
      26188
      219
      510
      24603
      680
      281
      3818
      ...
      30178
      2601
      437.0000
      23491
      398.0000
      9755
      844
      23750
      8
      7596
    
  

4 rows × 21 columns

Clean up the data a bit

Right now the 'shared' and 'split' are included in number of bathrooms. If I were to convert that to a number I would consider a shared/split bathroom to be half or 0.5 of a bathroom.



In [3]:

    
dframe.bath = dframe.bath.replace('shared',0.5)
dframe.bath = dframe.bath.replace('split',0.5)

Let's get a look at what the prices look like

To visualize it we need to get rid of null values. I haven't figured out the best way to clean this up yet. For now I'm going to drop any rows that have a null value, though I recognize that this is not a good analysis practice. We ended up dropping ~15% of data points.

😬

Also there were some CRAZY outliers, and this analysis is focused on finding a model for apartments for the 99% of us that can't afford crazy extravigant apartments



In [4]:

    
df = dframe[dframe.price < 10000][['bath','bed','feet','price']].dropna()
sns.distplot(df.price)









    Out[4]:





<matplotlib.axes._subplots.AxesSubplot at 0x115ba93d0>



In [5]:

    
data = dframe[dframe.lat > 45.4][dframe.lat < 45.6][dframe.long < -122.0][dframe.long > -123.5]
plt.figure(figsize=(15,10))
plt.scatter(data = data, x = 'long',y='lat')









    



/Users/mac28/anaconda/lib/python2.7/site-packages/pandas/core/frame.py:1997: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  "DataFrame index.", UserWarning)






    Out[5]:





<matplotlib.collections.PathCollection at 0x11ab1c750>

It looks like Portland!!!

Let's cluster the data. Start by creating a list of [['lat','long'], ...]



In [6]:

    
XYdf = dframe[dframe.lat > 45.4][dframe.lat < 45.6][dframe.long < -122.0][dframe.long > -123.5]
data = [[XYdf['lat'][i],XYdf['long'][i]] for i in XYdf.index]

We'll use K Means Clustering because that's the clustering method I recently learned in class! There may be others that work better, but this is the tool that I know



In [9]:

    
from sklearn.cluster import KMeans
km = KMeans(n_clusters=40)
km.fit(data)
neighborhoods = km.cluster_centers_



In [8]:

    
%pylab inline
figure(1,figsize=(20,12))
plot([row[1] for row in data],[row[0] for row in data],'b.')
for i in km.cluster_centers_:  
    plot(i[1],i[0], 'g*',ms=25)
'''Note to Riley: come back and make it look pretty'''









    



Populating the interactive namespace from numpy and matplotlib






    



WARNING: pylab import has clobbered these variables: ['f']
`%matplotlib` prevents importing * from pylab and numpy






    Out[8]:





'Note to Riley: come back and make it look pretty'

We chose our neighborhoods!

I've found that every once in a while the centers end up in different points, but are fairly consistant. Now let's process our data points and figure out where the closest neighborhood center is to it!



In [10]:

    
neighborhoods = neighborhoods.tolist()
for i in enumerate(neighborhoods):
    i[1].append(i[0])
print neighborhoods









    



[[45.552123420408165, -122.67131269489796, 0], [45.51092562287105, -122.45307534793187, 1], [45.506688980830674, -122.62617021405751, 2], [45.51290160229575, -122.68343233254558, 3], [45.587227178666666, -122.743129376, 4], [45.52450939197531, -122.55208463888889, 5], [45.452784070281126, -122.71596119076305, 6], [45.47244749687109, -122.56570525907384, 7], [45.51805015181518, -122.50170698019802, 8], [45.54218268181818, -122.83868146363636, 9], [45.48445444117647, -122.40939808823529, 10], [45.532534597305386, -122.70121312874251, 11], [45.478594089908256, -122.67784825504587, 12], [45.57852603030303, -122.68672382386364, 13], [45.534753504540866, -122.63718108476287, 14], [45.486881816, -122.6372868544, 15], [45.529539962154296, -122.6856359297671, 16], [45.49200958955224, -122.7456509925373, 17], [45.51650509148265, -122.40908712513144, 18], [45.515484257077276, -122.64211889671002, 19], [45.539150457831326, -122.77935245783132, 20], [45.545689939759036, -122.59564720783132, 21], [45.55325973846154, -122.53604236153846, 22], [45.52118124132232, -122.60366510082645, 23], [45.503803229067934, -122.57324186729858, 24], [45.563277824833705, -122.64478785809312, 25], [45.49269709178744, -122.5121487294686, 26], [45.5548645, -123.14423149999999, 27], [45.53120244650206, -122.4440400144033, 28], [45.46414142290749, -122.64893099559471, 29], [45.498214272401434, -122.67299030376344, 30], [45.490130411504424, -122.46661110176991, 31], [45.54266479098361, -122.49373927459017, 32], [45.52965521960785, -122.65906658529411, 33], [45.482160706632655, -122.60198717091836, 34], [45.49606003015075, -122.693913400335, 35], [45.414157103448275, -122.61978655172415, 36], [45.4689539245283, -122.8112003018868, 37], [45.59738002307692, -122.66026826153846, 38], [45.52132444802495, -122.69402066008315, 39]]

Create a function that will label each point with a number coresponding to it's neighborhood



In [11]:

    
def clusterer(X, Y,neighborhoods):
    neighbors = []
    for i in neighborhoods:
        distance = ((i[0]-X)**2 + (i[1]-Y)**2)
        neighbors.append(distance)
    closest = min(neighbors)
    return neighbors.index(closest)



In [12]:

    
neighborhoodlist = []
for i in dframe.index:
    neighborhoodlist.append(clusterer(dframe['lat'][i],dframe['long'][i],neighborhoods))
dframe['neighborhood'] = neighborhoodlist



In [13]:

    
dframe









    Out[13]:






  
    
      
      bath
      bed
      cat
      content
      date
      dog
      feet
      getphotos
      hasmap
      housingtype
      lat
      laundry
      long
      parking
      price
      smoking
      time
      wheelchair
      neighborhood
    
  
  
    
      5399866740
      1
      1
      0
      754
      2016-01-12
      0
      750
      8
      0
      apartment
      NaN
      w/d in unit
      NaN
      off-street parking
      1400
      NaN
      12:22:07
      NaN
      0
    
    
      5401772970
      1
      1
      1
      2632
      2016-01-13
      1
      659
      7
      1
      apartment
      45.531
      w/d in unit
      -122.664
      attached garage
      1350
      no smoking
      16:24:11
      wheelchair accessible
      33
    
    
      5402562933
      1.5
      NaN
      0
      1001
      2016-01-14
      0
      1
      2
      1
      apartment
      45.5333
      laundry on site
      -122.709
      carport
      1500
      no smoking
      09:12:40
      NaN
      11
    
    
      5402607488
      1
      2
      0
      2259
      2016-01-14
      0
      936
      12
      1
      condo
      45.5328
      w/d in unit
      -122.699
      attached garage
      1995
      NaN
      09:36:16
      NaN
      11
    
    
      5402822514
      1
      1
      0
      1110
      2016-01-14
      0
      624
      16
      1
      apartment
      45.5053
      w/d in unit
      -122.618
      street parking
      1495
      NaN
      11:31:03
      NaN
      2
    
    
      5402918870
      2.5
      3
      0
      1318
      2016-01-14
      0
      1684
      22
      1
      townhouse
      45.602
      w/d in unit
      -122.667
      attached garage
      1800
      no smoking
      12:24:52
      NaN
      38
    
    
      5403011764
      1
      1
      1
      1649
      2016-01-14
      1
      750
      14
      1
      apartment
      45.5555
      w/d hookups
      -122.658
      street parking
      1340
      NaN
      13:19:56
      NaN
      0
    
    
      5403019783
      1
      1
      1
      1324
      2016-01-14
      1
      640
      5
      1
      apartment
      45.5198
      laundry on site
      -122.687
      street parking
      1095
      no smoking
      13:24:43
      NaN
      39
    
    
      5403242242
      1
      0
      1
      2862
      2016-01-14
      1
      NaN
      12
      1
      apartment
      45.5413
      laundry on site
      -122.676
      street parking
      1235
      no smoking
      15:54:38
      NaN
      0
    
    
      5403320258
      1.5
      3
      0
      1598
      2016-01-14
      0
      1200
      17
      1
      house
      45.4079
      w/d in unit
      -122.762
      attached garage
      1725
      no smoking
      16:58:11
      wheelchair accessible
      6
    
    
      5404034182
      2
      2
      1
      4880
      2016-01-15
      1
      1010
      19
      1
      apartment
      45.464
      w/d in unit
      -122.642
      street parking
      1995
      no smoking
      08:57:41
      NaN
      29
    
    
      5404362542
      1
      2
      1
      1662
      2016-01-15
      1
      850
      8
      1
      apartment
      45.5664
      laundry on site
      -122.696
      off-street parking
      1395
      NaN
      11:53:28
      NaN
      13
    
    
      5404431092
      1
      1
      1
      1877
      2016-01-15
      1
      700
      14
      1
      apartment
      45.5855
      w/d in unit
      -122.732
      off-street parking
      1195
      NaN
      12:32:22
      NaN
      4
    
    
      5404439790
      1
      2
      1
      1860
      2016-01-15
      1
      900
      14
      1
      apartment
      45.5855
      w/d in unit
      -122.732
      off-street parking
      1395
      NaN
      12:37:14
      NaN
      4
    
    
      5404442485
      1
      1
      1
      1435
      2016-01-15
      1
      700
      11
      1
      apartment
      45.5855
      w/d in unit
      -122.732
      off-street parking
      1195
      NaN
      12:38:46
      NaN
      4
    
    
      5404447075
      1
      2
      1
      2603
      2016-01-15
      1
      850
      24
      1
      apartment
      45.4784
      w/d in unit
      -122.609
      street parking
      1395
      NaN
      12:41:25
      NaN
      34
    
    
      5404478114
      1
      2
      1
      2375
      2016-01-15
      1
      800
      17
      1
      apartment
      45.61
      w/d in unit
      -122.73
      off-street parking
      1295
      NaN
      12:59:13
      NaN
      4
    
    
      5404512932
      1
      2
      1
      2564
      2016-01-15
      1
      850
      23
      1
      apartment
      45.4784
      w/d in unit
      -122.609
      off-street parking
      1395
      NaN
      13:19:34
      NaN
      34
    
    
      5404543909
      1
      2
      0
      2626
      2016-01-15
      0
      825
      24
      1
      apartment
      45.4784
      w/d in unit
      -122.609
      off-street parking
      1395
      NaN
      13:37:48
      NaN
      34
    
    
      5404549721
      1
      1
      0
      722
      2016-01-15
      0
      650
      4
      1
      apartment
      45.5627
      w/d in unit
      -122.64
      street parking
      1150
      NaN
      13:41:21
      NaN
      25
    
    
      5404650486
      1
      2
      0
      3193
      2016-01-15
      0
      1000
      18
      1
      apartment
      45.5353
      laundry in bldg
      -122.643
      detached garage
      1695
      no smoking
      14:44:29
      NaN
      14
    
    
      5404727169
      1.5
      3
      0
      2625
      2016-01-15
      0
      1589
      15
      1
      house
      45.504
      w/d in unit
      -122.649
      off-street parking
      2795
      no smoking
      15:38:48
      NaN
      19
    
    
      5404782936
      1
      2
      0
      879
      2016-01-15
      0
      NaN
      10
      1
      duplex
      45.5193
      w/d hookups
      -122.6
      attached garage
      1450
      NaN
      16:22:51
      NaN
      23
    
    
      5404834100
      NaN
      0
      0
      1695
      2016-01-15
      0
      NaN
      2
      0
      apartment
      NaN
      NaN
      NaN
      NaN
      1300
      NaN
      17:06:13
      NaN
      0
    
    
      5404855534
      2
      2
      0
      2465
      2016-01-15
      0
      875
      6
      1
      apartment
      45.5218
      laundry in bldg
      -122.686
      NaN
      1695
      no smoking
      17:26:31
      NaN
      16
    
    
      5404994129
      1
      2
      0
      699
      2016-01-15
      0
      NaN
      0
      1
      apartment
      45.4941
      NaN
      -122.399
      NaN
      1100
      NaN
      19:56:47
      NaN
      10
    
    
      5404995096
      1
      2
      0
      678
      2016-01-15
      0
      NaN
      0
      1
      apartment
      45.4941
      NaN
      -122.399
      NaN
      1100
      NaN
      19:58:02
      NaN
      10
    
    
      5405638400
      1
      1
      1
      2163
      2016-01-16
      1
      298
      16
      1
      apartment
      45.5413
      laundry in bldg
      -122.676
      street parking
      1235
      no smoking
      10:05:11
      NaN
      0
    
    
      5405717413
      1.5
      2
      0
      1088
      2016-01-16
      0
      1172
      8
      1
      apartment
      45.5318
      w/d hookups
      -122.421
      attached garage
      1295
      no smoking
      10:48:07
      NaN
      18
    
    
      5407108078
      1
      1
      0
      2319
      2016-01-17
      0
      881
      21
      1
      condo
      45.5314
      w/d in unit
      -122.677
      detached garage
      2100
      no smoking
      11:46:56
      NaN
      16
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      5556757532
      2
      2
      0
      2530
      2016-04-25
      0
      1079
      10
      1
      apartment
      45.5473
      w/d in unit
      -122.441
      carport
      1370
      NaN
      17:34:15
      NaN
      28
    
    
      5556760594
      2
      2
      0
      663
      2016-04-25
      0
      1160
      4
      1
      apartment
      45.4352
      w/d in unit
      -122.723
      attached garage
      1700
      no smoking
      17:36:37
      NaN
      6
    
    
      5556761697
      1
      1
      1
      2480
      2016-04-25
      1
      NaN
      21
      1
      apartment
      45.5002
      w/d in unit
      -122.672
      off-street parking
      1485
      no smoking
      17:37:29
      NaN
      30
    
    
      5556762484
      1
      1
      0
      1064
      2016-04-25
      0
      751
      5
      1
      apartment
      45.5742
      w/d in unit
      -122.684
      off-street parking
      1175
      NaN
      17:38:06
      NaN
      13
    
    
      5556763065
      2
      2
      1
      1391
      2016-04-25
      1
      966
      16
      1
      apartment
      45.5224
      w/d in unit
      -122.655
      NaN
      2439
      NaN
      17:38:32
      NaN
      33
    
    
      5556765449
      2
      3
      1
      2070
      2016-04-25
      1
      1320
      24
      1
      townhouse
      45.4762
      w/d in unit
      -122.56
      attached garage
      2000
      no smoking
      17:40:22
      NaN
      7
    
    
      5556765534
      1
      1
      1
      2101
      2016-04-25
      1
      768
      8
      0
      apartment
      NaN
      w/d in unit
      NaN
      off-street parking
      1745
      no smoking
      17:40:25
      NaN
      0
    
    
      5556765620
      1
      0
      1
      2465
      2016-04-25
      1
      810
      6
      1
      apartment
      45.5266
      w/d in unit
      -122.679
      NaN
      1700
      NaN
      17:40:29
      NaN
      16
    
    
      5556768834
      1
      2
      1
      2557
      2016-04-25
      1
      860
      6
      1
      apartment
      45.7655
      laundry on site
      -122.893
      off-street parking
      1170
      NaN
      17:42:56
      NaN
      9
    
    
      5556771928
      1
      1
      1
      2963
      2016-04-25
      1
      810
      21
      1
      apartment
      45.5333
      w/d in unit
      -122.684
      attached garage
      2064
      no smoking
      17:45:18
      wheelchair accessible
      16
    
    
      5556775928
      1
      2
      0
      693
      2016-04-25
      0
      1060
      16
      1
      duplex
      45.5342
      w/d hookups
      -122.589
      attached garage
      1595
      no smoking
      17:48:22
      NaN
      21
    
    
      5556776523
      1
      2
      1
      1968
      2016-04-25
      1
      892
      17
      1
      apartment
      45.5183
      w/d in unit
      -122.695
      off-street parking
      2230
      no smoking
      17:48:48
      NaN
      39
    
    
      5556780176
      1
      1
      1
      2541
      2016-04-25
      1
      722
      9
      1
      apartment
      45.5308
      w/d in unit
      -122.683
      off-street parking
      1799
      no smoking
      17:51:37
      wheelchair accessible
      16
    
    
      5556782322
      3.5
      3
      1
      2018
      2016-04-25
      1
      1800
      18
      1
      townhouse
      45.5187
      w/d hookups
      -122.532
      attached garage
      2000
      no smoking
      17:53:11
      NaN
      5
    
    
      5556783479
      1
      1
      1
      2361
      2016-04-25
      1
      902
      11
      1
      apartment
      45.5165
      w/d in unit
      -122.644
      attached garage
      1867
      no smoking
      17:54:03
      wheelchair accessible
      19
    
    
      5556785128
      2
      2
      1
      1303
      2016-04-25
      1
      937
      18
      1
      apartment
      45.5505
      w/d in unit
      -122.676
      attached garage
      2112
      no smoking
      17:55:17
      NaN
      0
    
    
      5556789183
      1
      1
      1
      1650
      2016-04-25
      1
      NaN
      8
      1
      apartment
      45.5441
      w/d in unit
      -122.642
      attached garage
      1300
      no smoking
      17:58:20
      NaN
      14
    
    
      5556790409
      1
      1
      1
      1379
      2016-04-25
      1
      NaN
      9
      1
      apartment
      45.5006
      laundry on site
      -122.69
      NaN
      1195
      NaN
      17:59:19
      NaN
      35
    
    
      5556792321
      1
      1
      1
      1321
      2016-04-25
      1
      NaN
      8
      1
      apartment
      45.5006
      NaN
      -122.69
      NaN
      1125
      NaN
      18:00:50
      NaN
      35
    
    
      5556799674
      1
      1
      1
      1843
      2016-04-25
      1
      911
      11
      1
      apartment
      45.5165
      w/d in unit
      -122.644
      attached garage
      1313
      no smoking
      18:06:36
      wheelchair accessible
      19
    
    
      5556802007
      1
      2
      1
      1607
      2016-04-25
      1
      NaN
      10
      1
      apartment
      45.5121
      w/d in unit
      -122.635
      street parking
      1795
      no smoking
      18:08:26
      NaN
      19
    
    
      5556803288
      1
      1
      0
      1408
      2016-04-25
      0
      NaN
      9
      0
      house
      NaN
      laundry on site
      NaN
      street parking
      760
      no smoking
      18:09:29
      NaN
      0
    
    
      5556804706
      2
      2
      1
      2200
      2016-04-25
      1
      870
      9
      1
      apartment
      45.5145
      w/d in unit
      -122.687
      attached garage
      2581
      no smoking
      18:10:39
      wheelchair accessible
      3
    
    
      5556806112
      1
      1
      1
      1625
      2016-04-25
      1
      640
      12
      1
      apartment
      45.5118
      w/d in unit
      -122.645
      attached garage
      1622
      no smoking
      18:11:48
      NaN
      19
    
    
      5556809642
      1
      1
      1
      2263
      2016-04-25
      1
      643
      5
      1
      apartment
      45.4983
      w/d in unit
      -122.691
      off-street parking
      1615
      no smoking
      18:14:38
      NaN
      35
    
    
      5556811885
      1
      1
      1
      1700
      2016-04-25
      1
      NaN
      13
      1
      apartment
      45.5179
      w/d in unit
      -122.664
      off-street parking
      1250
      no smoking
      18:16:30
      NaN
      33
    
    
      5556814062
      2
      2
      1
      2102
      2016-04-25
      1
      951
      6
      1
      apartment
      45.4667
      NaN
      -122.565
      NaN
      1320
      NaN
      18:18:15
      NaN
      7
    
    
      5556814296
      1
      1
      1
      1780
      2016-04-25
      1
      614
      10
      1
      apartment
      45.5293
      w/d in unit
      -122.704
      detached garage
      1500
      no smoking
      18:18:25
      NaN
      11
    
    
      5556817078
      1
      0
      0
      826
      2016-04-25
      0
      NaN
      8
      1
      house
      45.484
      w/d in unit
      -122.636
      off-street parking
      2400
      no smoking
      18:20:42
      NaN
      15
    
    
      5556824878
      1
      2
      0
      639
      2016-04-25
      0
      1000
      8
      1
      apartment
      45.5219
      laundry in bldg
      -122.697
      off-street parking
      1795
      NaN
      18:27:02
      NaN
      39
    
  

27391 rows × 19 columns

Here's the new Part. We're breaking out the neighborhood values into their own columns. Now the algorithms can read them as categorical data rather than continuous data.



In [ ]:



In [14]:

    
from sklearn import preprocessing
def CategoricalToBinary(dframe,column_name):
    le = preprocessing.LabelEncoder()
    listy = le.fit_transform(dframe[column_name])
    dframe[column_name] = listy
    unique = dframe[column_name].unique()
    serieslist = [list() for _ in xrange(len(unique))]
    
    
    for column, _ in enumerate(serieslist):
        for i, item in enumerate(dframe[column_name]):
            if item == column:
                serieslist[column].append(1)
            else:
                serieslist[column].append(0)
        dframe[column_name+str(column)] = serieslist[column]

   
    return dframe



In [15]:

    
pd.set_option('max_columns', 100)
dframe = CategoricalToBinary(dframe,'housingtype')
dframe = CategoricalToBinary(dframe,'parking')
dframe = CategoricalToBinary(dframe,'laundry')
dframe = CategoricalToBinary(dframe,'smoking')
dframe = CategoricalToBinary(dframe,'wheelchair')
dframe = CategoricalToBinary(dframe,'neighborhood')
dframe









    



/Users/mac28/anaconda/lib/python2.7/site-packages/numpy/lib/arraysetops.py:200: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  flag = np.concatenate(([True], aux[1:] != aux[:-1]))






    Out[15]:






  
    
      
      bath
      bed
      cat
      content
      date
      dog
      feet
      getphotos
      hasmap
      housingtype
      lat
      laundry
      long
      parking
      price
      smoking
      time
      wheelchair
      neighborhood
      housingtype0
      housingtype1
      housingtype2
      housingtype3
      housingtype4
      housingtype5
      housingtype6
      housingtype7
      housingtype8
      housingtype9
      housingtype10
      housingtype11
      parking0
      parking1
      parking2
      parking3
      parking4
      parking5
      parking6
      parking7
      laundry0
      laundry1
      laundry2
      laundry3
      laundry4
      laundry5
      smoking0
      smoking1
      wheelchair0
      wheelchair1
      neighborhood0
      neighborhood1
      neighborhood2
      neighborhood3
      neighborhood4
      neighborhood5
      neighborhood6
      neighborhood7
      neighborhood8
      neighborhood9
      neighborhood10
      neighborhood11
      neighborhood12
      neighborhood13
      neighborhood14
      neighborhood15
      neighborhood16
      neighborhood17
      neighborhood18
      neighborhood19
      neighborhood20
      neighborhood21
      neighborhood22
      neighborhood23
      neighborhood24
      neighborhood25
      neighborhood26
      neighborhood27
      neighborhood28
      neighborhood29
      neighborhood30
      neighborhood31
      neighborhood32
      neighborhood33
      neighborhood34
      neighborhood35
      neighborhood36
      neighborhood37
      neighborhood38
      neighborhood39
    
  
  
    
      5399866740
      1
      1
      0
      754
      2016-01-12
      0
      750
      8
      0
      1
      NaN
      5
      NaN
      5
      1400
      0
      12:22:07
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5401772970
      1
      1
      1
      2632
      2016-01-13
      1
      659
      7
      1
      1
      45.531
      5
      -122.664
      1
      1350
      1
      16:24:11
      1
      33
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
    
    
      5402562933
      1.5
      NaN
      0
      1001
      2016-01-14
      0
      1
      2
      1
      1
      45.5333
      2
      -122.709
      2
      1500
      1
      09:12:40
      0
      11
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5402607488
      1
      2
      0
      2259
      2016-01-14
      0
      936
      12
      1
      2
      45.5328
      5
      -122.699
      1
      1995
      0
      09:36:16
      0
      11
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5402822514
      1
      1
      0
      1110
      2016-01-14
      0
      624
      16
      1
      1
      45.5053
      5
      -122.618
      6
      1495
      0
      11:31:03
      0
      2
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5402918870
      2.5
      3
      0
      1318
      2016-01-14
      0
      1684
      22
      1
      11
      45.602
      5
      -122.667
      1
      1800
      1
      12:24:52
      0
      38
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
    
    
      5403011764
      1
      1
      1
      1649
      2016-01-14
      1
      750
      14
      1
      1
      45.5555
      4
      -122.658
      6
      1340
      0
      13:19:56
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5403019783
      1
      1
      1
      1324
      2016-01-14
      1
      640
      5
      1
      1
      45.5198
      2
      -122.687
      6
      1095
      1
      13:24:43
      0
      39
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      1
      0
      0
      0
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
    
      5403242242
      1
      0
      1
      2862
      2016-01-14
      1
      NaN
      12
      1
      1
      45.5413
      2
      -122.676
      6
      1235
      1
      15:54:38
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      1
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5403320258
      1.5
      3
      0
      1598
      2016-01-14
      0
      1200
      17
      1
      6
      45.4079
      5
      -122.762
      1
      1725
      1
      16:58:11
      1
      6
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404034182
      2
      2
      1
      4880
      2016-01-15
      1
      1010
      19
      1
      1
      45.464
      5
      -122.642
      6
      1995
      1
      08:57:41
      0
      29
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404362542
      1
      2
      1
      1662
      2016-01-15
      1
      850
      8
      1
      1
      45.5664
      2
      -122.696
      5
      1395
      0
      11:53:28
      0
      13
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      1
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404431092
      1
      1
      1
      1877
      2016-01-15
      1
      700
      14
      1
      1
      45.5855
      5
      -122.732
      5
      1195
      0
      12:32:22
      0
      4
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404439790
      1
      2
      1
      1860
      2016-01-15
      1
      900
      14
      1
      1
      45.5855
      5
      -122.732
      5
      1395
      0
      12:37:14
      0
      4
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404442485
      1
      1
      1
      1435
      2016-01-15
      1
      700
      11
      1
      1
      45.5855
      5
      -122.732
      5
      1195
      0
      12:38:46
      0
      4
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404447075
      1
      2
      1
      2603
      2016-01-15
      1
      850
      24
      1
      1
      45.4784
      5
      -122.609
      6
      1395
      0
      12:41:25
      0
      34
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
    
    
      5404478114
      1
      2
      1
      2375
      2016-01-15
      1
      800
      17
      1
      1
      45.61
      5
      -122.73
      5
      1295
      0
      12:59:13
      0
      4
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404512932
      1
      2
      1
      2564
      2016-01-15
      1
      850
      23
      1
      1
      45.4784
      5
      -122.609
      5
      1395
      0
      13:19:34
      0
      34
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
    
    
      5404543909
      1
      2
      0
      2626
      2016-01-15
      0
      825
      24
      1
      1
      45.4784
      5
      -122.609
      5
      1395
      0
      13:37:48
      0
      34
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
    
    
      5404549721
      1
      1
      0
      722
      2016-01-15
      0
      650
      4
      1
      1
      45.5627
      5
      -122.64
      6
      1150
      0
      13:41:21
      0
      25
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404650486
      1
      2
      0
      3193
      2016-01-15
      0
      1000
      18
      1
      1
      45.5353
      1
      -122.643
      3
      1695
      1
      14:44:29
      0
      14
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404727169
      1.5
      3
      0
      2625
      2016-01-15
      0
      1589
      15
      1
      6
      45.504
      5
      -122.649
      5
      2795
      1
      15:38:48
      0
      19
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404782936
      1
      2
      0
      879
      2016-01-15
      0
      NaN
      10
      1
      4
      45.5193
      4
      -122.6
      1
      1450
      0
      16:22:51
      0
      23
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404834100
      NaN
      0
      0
      1695
      2016-01-15
      0
      NaN
      2
      0
      1
      NaN
      0
      NaN
      0
      1300
      0
      17:06:13
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404855534
      2
      2
      0
      2465
      2016-01-15
      0
      875
      6
      1
      1
      45.5218
      1
      -122.686
      0
      1695
      1
      17:26:31
      0
      16
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404994129
      1
      2
      0
      699
      2016-01-15
      0
      NaN
      0
      1
      1
      45.4941
      0
      -122.399
      0
      1100
      0
      19:56:47
      0
      10
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5404995096
      1
      2
      0
      678
      2016-01-15
      0
      NaN
      0
      1
      1
      45.4941
      0
      -122.399
      0
      1100
      0
      19:58:02
      0
      10
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5405638400
      1
      1
      1
      2163
      2016-01-16
      1
      298
      16
      1
      1
      45.5413
      1
      -122.676
      6
      1235
      1
      10:05:11
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      1
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5405717413
      1.5
      2
      0
      1088
      2016-01-16
      0
      1172
      8
      1
      1
      45.5318
      4
      -122.421
      1
      1295
      1
      10:48:07
      0
      18
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5407108078
      1
      1
      0
      2319
      2016-01-17
      0
      881
      21
      1
      2
      45.5314
      5
      -122.677
      3
      2100
      1
      11:46:56
      0
      16
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      5556757532
      2
      2
      0
      2530
      2016-04-25
      0
      1079
      10
      1
      1
      45.5473
      5
      -122.441
      2
      1370
      0
      17:34:15
      0
      28
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556760594
      2
      2
      0
      663
      2016-04-25
      0
      1160
      4
      1
      1
      45.4352
      5
      -122.723
      1
      1700
      1
      17:36:37
      0
      6
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556761697
      1
      1
      1
      2480
      2016-04-25
      1
      NaN
      21
      1
      1
      45.5002
      5
      -122.672
      5
      1485
      1
      17:37:29
      0
      30
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556762484
      1
      1
      0
      1064
      2016-04-25
      0
      751
      5
      1
      1
      45.5742
      5
      -122.684
      5
      1175
      0
      17:38:06
      0
      13
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556763065
      2
      2
      1
      1391
      2016-04-25
      1
      966
      16
      1
      1
      45.5224
      5
      -122.655
      0
      2439
      0
      17:38:32
      0
      33
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
    
    
      5556765449
      2
      3
      1
      2070
      2016-04-25
      1
      1320
      24
      1
      11
      45.4762
      5
      -122.56
      1
      2000
      1
      17:40:22
      0
      7
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556765534
      1
      1
      1
      2101
      2016-04-25
      1
      768
      8
      0
      1
      NaN
      5
      NaN
      5
      1745
      1
      17:40:25
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556765620
      1
      0
      1
      2465
      2016-04-25
      1
      810
      6
      1
      1
      45.5266
      5
      -122.679
      0
      1700
      0
      17:40:29
      0
      16
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556768834
      1
      2
      1
      2557
      2016-04-25
      1
      860
      6
      1
      1
      45.7655
      2
      -122.893
      5
      1170
      0
      17:42:56
      0
      9
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      1
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556771928
      1
      1
      1
      2963
      2016-04-25
      1
      810
      21
      1
      1
      45.5333
      5
      -122.684
      1
      2064
      1
      17:45:18
      1
      16
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556775928
      1
      2
      0
      693
      2016-04-25
      0
      1060
      16
      1
      4
      45.5342
      4
      -122.589
      1
      1595
      1
      17:48:22
      0
      21
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556776523
      1
      2
      1
      1968
      2016-04-25
      1
      892
      17
      1
      1
      45.5183
      5
      -122.695
      5
      2230
      1
      17:48:48
      0
      39
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
    
      5556780176
      1
      1
      1
      2541
      2016-04-25
      1
      722
      9
      1
      1
      45.5308
      5
      -122.683
      5
      1799
      1
      17:51:37
      1
      16
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556782322
      3.5
      3
      1
      2018
      2016-04-25
      1
      1800
      18
      1
      11
      45.5187
      4
      -122.532
      1
      2000
      1
      17:53:11
      0
      5
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      1
      1
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556783479
      1
      1
      1
      2361
      2016-04-25
      1
      902
      11
      1
      1
      45.5165
      5
      -122.644
      1
      1867
      1
      17:54:03
      1
      19
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556785128
      2
      2
      1
      1303
      2016-04-25
      1
      937
      18
      1
      1
      45.5505
      5
      -122.676
      1
      2112
      1
      17:55:17
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556789183
      1
      1
      1
      1650
      2016-04-25
      1
      NaN
      8
      1
      1
      45.5441
      5
      -122.642
      1
      1300
      1
      17:58:20
      0
      14
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556790409
      1
      1
      1
      1379
      2016-04-25
      1
      NaN
      9
      1
      1
      45.5006
      2
      -122.69
      0
      1195
      0
      17:59:19
      0
      35
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
    
    
      5556792321
      1
      1
      1
      1321
      2016-04-25
      1
      NaN
      8
      1
      1
      45.5006
      0
      -122.69
      0
      1125
      0
      18:00:50
      0
      35
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
    
    
      5556799674
      1
      1
      1
      1843
      2016-04-25
      1
      911
      11
      1
      1
      45.5165
      5
      -122.644
      1
      1313
      1
      18:06:36
      1
      19
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556802007
      1
      2
      1
      1607
      2016-04-25
      1
      NaN
      10
      1
      1
      45.5121
      5
      -122.635
      6
      1795
      1
      18:08:26
      0
      19
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556803288
      1
      1
      0
      1408
      2016-04-25
      0
      NaN
      9
      0
      6
      NaN
      2
      NaN
      6
      760
      1
      18:09:29
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      1
      0
      0
      0
      0
      1
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556804706
      2
      2
      1
      2200
      2016-04-25
      1
      870
      9
      1
      1
      45.5145
      5
      -122.687
      1
      2581
      1
      18:10:39
      1
      3
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      0
      1
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556806112
      1
      1
      1
      1625
      2016-04-25
      1
      640
      12
      1
      1
      45.5118
      5
      -122.645
      1
      1622
      1
      18:11:48
      0
      19
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556809642
      1
      1
      1
      2263
      2016-04-25
      1
      643
      5
      1
      1
      45.4983
      5
      -122.691
      5
      1615
      1
      18:14:38
      0
      35
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
    
    
      5556811885
      1
      1
      1
      1700
      2016-04-25
      1
      NaN
      13
      1
      1
      45.5179
      5
      -122.664
      5
      1250
      1
      18:16:30
      0
      33
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
    
    
      5556814062
      2
      2
      1
      2102
      2016-04-25
      1
      951
      6
      1
      1
      45.4667
      0
      -122.565
      0
      1320
      0
      18:18:15
      0
      7
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556814296
      1
      1
      1
      1780
      2016-04-25
      1
      614
      10
      1
      1
      45.5293
      5
      -122.704
      3
      1500
      1
      18:18:25
      0
      11
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556817078
      1
      0
      0
      826
      2016-04-25
      0
      NaN
      8
      1
      6
      45.484
      5
      -122.636
      5
      2400
      1
      18:20:42
      0
      15
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      1
      0
      1
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5556824878
      1
      2
      0
      639
      2016-04-25
      0
      1000
      8
      1
      1
      45.5219
      1
      -122.697
      5
      1795
      0
      18:27:02
      0
      39
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
      0
      0
      0
      1
      0
      0
      0
      0
      1
      0
      1
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
  

27391 rows × 89 columns



In [16]:

    
dframe = dframe.drop('date',1)
dframe = dframe.drop('housingtype',1)
dframe = dframe.drop('parking',1)
dframe = dframe.drop('laundry',1)
dframe = dframe.drop('smoking',1)
dframe = dframe.drop('wheelchair',1)
dframe = dframe.drop('neighborhood',1)
dframe = dframe.drop('time',1)



In [17]:

    
columns=list(dframe.columns)



In [18]:

    
from __future__ import division
print len(dframe)
df2 = dframe[dframe.price < 10000][columns].dropna()
print len(df2)
print len(df2)/len(dframe)

price = df2[['price']].values
columns.pop(columns.index('price'))
features = df2[columns].values

from sklearn.cross_validation import train_test_split
features_train, features_test, price_train, price_test = train_test_split(features, price, test_size=0.1, random_state=42)









    



27391
22164
0.80917089555

Ok, lets put it through Decision Tree!

What about Random Forest?



In [19]:

    
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
reg = RandomForestRegressor()
reg = reg.fit(features_train, price_train)









    



/Users/mac28/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py:4: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().



In [ ]:

    
forest_pred = reg.predict(features_test)
forest_pred = np.array([[item] for item in forest_pred])



In [ ]:

    
print r2_score(forest_pred, price_test)
plt.scatter(forest_pred,price_test)



In [ ]:

    
df2['predictions'] = reg.predict(df2[columns])



In [ ]:

    
df2['predictions_diff'] = df2['predictions']-df2['price']



In [ ]:

    
sd = np.std(df2['predictions_diff'])
sns.kdeplot(df2['predictions_diff'][df2['predictions_diff']>-150][df2['predictions_diff']<150])
sns.plt.xlim(-150,150)



In [ ]:

    
data = df2[dframe.lat > 45.45][df2.lat < 45.6][df2.long < -122.4][df2.long > -122.8][df2['predictions_diff']>-150][df2['predictions_diff']<150]
plt.figure(figsize=(15,10))
plt.scatter(data = data, x = 'long',y='lat', c = 'predictions_diff',s=10,cmap='coolwarm')



In [ ]:

    
dframe



In [ ]:

    
print np.mean([1,2,34,np.nan])



In [ ]:

    
def averager(dframe):
    dframe = dframe.T
    dframe.dropna()
    averages = {}
    for listing in dframe:
        try:
            key = str(dframe[listing]['bed'])+','+str(dframe[listing]['bath'])+','+str(dframe[listing]['neighborhood'])+','+str(dframe[listing]['feet']-dframe[listing]['feet']%50)
            if key not in averages:
                averages[key] = {'average_list':[dframe[listing]['price']], 'average':0}
            elif key in averages:
                averages[key]['average_list'].append(dframe[listing]['price'])
        except TypeError:
            continue
    for entry in averages:
        averages[entry]['average'] = np.mean(averages[entry]['average_list'])
    return averages



In [ ]:

    
averages = averager(dframe)
print averages



In [ ]:

    
dframe['averages']= averages[str(dframe['bed'])+','+str(dframe['bath'])+','+str(dframe['neighborhood'])+','+str(dframe['feet']-dframe['feet']%50)]



In [ ]:

    
dframe.T

Wow! up to .87! That's our best yet! What if we add more trees???



In [ ]:

    
reg = RandomForestRegressor(n_estimators = 100)
reg = reg.fit(features_train, price_train)



In [ ]:

    
forest_pred = reg.predict(features_test)
forest_pred = np.array([[item] for item in forest_pred])



In [ ]:

    
print r2_score(forest_pred, price_test)
print plt.scatter(pred,price_test)



In [ ]:



In [36]:

    
from sklearn.tree import DecisionTreeRegressor
reg = DecisionTreeRegressor(max_depth = 5)
reg.fit(features_train, price_train)
print len(features_train[0])
columns = [str(x) for x in columns]
print columns
from sklearn.tree import export_graphviz
export_graphviz(reg,feature_names=columns)









    



80
['bath', 'bed', 'cat', 'content', 'dog', 'feet', 'getphotos', 'hasmap', 'lat', 'long', 'housingtype0', 'housingtype1', 'housingtype2', 'housingtype3', 'housingtype4', 'housingtype5', 'housingtype6', 'housingtype7', 'housingtype8', 'housingtype9', 'housingtype10', 'housingtype11', 'parking0', 'parking1', 'parking2', 'parking3', 'parking4', 'parking5', 'parking6', 'parking7', 'laundry0', 'laundry1', 'laundry2', 'laundry3', 'laundry4', 'laundry5', 'smoking0', 'smoking1', 'wheelchair0', 'wheelchair1', 'neighborhood0', 'neighborhood1', 'neighborhood2', 'neighborhood3', 'neighborhood4', 'neighborhood5', 'neighborhood6', 'neighborhood7', 'neighborhood8', 'neighborhood9', 'neighborhood10', 'neighborhood11', 'neighborhood12', 'neighborhood13', 'neighborhood14', 'neighborhood15', 'neighborhood16', 'neighborhood17', 'neighborhood18', 'neighborhood19', 'neighborhood20', 'neighborhood21', 'neighborhood22', 'neighborhood23', 'neighborhood24', 'neighborhood25', 'neighborhood26', 'neighborhood27', 'neighborhood28', 'neighborhood29', 'neighborhood30', 'neighborhood31', 'neighborhood32', 'neighborhood33', 'neighborhood34', 'neighborhood35', 'neighborhood36', 'neighborhood37', 'neighborhood38', 'neighborhood39']

Up to .88!

So what is our goal now? I'd like to see if adjusting the number of neighborhoods increases the accuracy. same for the affect with the number of trees



In [ ]:

    
def neighborhood_optimizer(dframe,neighborhood_number_range, counter_num):
    XYdf = dframe[dframe.lat > 45.4][dframe.lat < 45.6][dframe.long < -122.0][dframe.long > -123.5]
    data = [[XYdf['lat'][i],XYdf['long'][i]] for i in XYdf.index]
    r2_dict = []
    for i in neighborhood_number_range:
        counter = counter_num
        average_accuracy_list = []
        while counter > 0:
            km = KMeans(n_clusters=i)
            km.fit(data)
            neighborhoods = km.cluster_centers_
            neighborhoods = neighborhoods.tolist()
            for x in enumerate(neighborhoods):
                x[1].append(x[0])
            neighborhoodlist = []
            for z in dframe.index:
                neighborhoodlist.append(clusterer(dframe['lat'][z],dframe['long'][z],neighborhoods))
            dframecopy = dframe.copy()
            dframecopy['neighborhood'] = Series((neighborhoodlist), index=dframe.index)
            df2 = dframecopy[dframe.price < 10000][['bath','bed','feet','dog','cat','content','getphotos', 'hasmap', 'price','neighborhood']].dropna()
            features = df2[['bath','bed','feet','dog','cat','content','getphotos', 'hasmap', 'neighborhood']].values
            price = df2[['price']].values
            features_train, features_test, price_train, price_test = train_test_split(features, price, test_size=0.1)
            reg = RandomForestRegressor()
            reg = reg.fit(features_train, price_train)
            forest_pred = reg.predict(features_test)
            forest_pred = np.array([[item] for item in forest_pred])
            counter -= 1
            average_accuracy_list.append(r2_score(forest_pred, price_test))
        total = 0
        for entry in average_accuracy_list:
            total += entry
        r2_accuracy = total/len(average_accuracy_list)
        r2_dict.append((i,r2_accuracy))
    print r2_dict
    return r2_dict



In [ ]:

    
neighborhood_number_range = [i for _,i in enumerate(range(2,31,2))]
neighborhood_number_range



In [ ]:

    
r2_dict = neighborhood_optimizer(dframe,neighborhood_number_range,10)



In [ ]:

    
r2_dict[:][0]



In [ ]:

    
plt.scatter([x[0] for x in r2_dict],[x[1] for x in r2_dict])

Looks like the optimum is right around 10 or 11, and then starts to drop off. Let's get a little more granular and look at a smaller range



In [ ]:

    
neighborhood_number_range = [i for _,i in enumerate(range(7,15))]
neighborhood_number_range



In [ ]:

    
r2_dict = neighborhood_optimizer(dframe,neighborhood_number_range,10)



In [ ]:

    
print r2_dict
plt.scatter([x[0] for x in r2_dict],[x[1] for x in r2_dict])

Trying a few times, it looks like 10, 11 and 12 get the best results at ~.85. Of course, we'll need to redo some of these optomizations after we properly process our data. Hopefully we'll see some more consistency then too.



In [ ]:

    
r2_dict = neighborhood_optimizer(dframe,[10,11,12],25)

Note #1 to Riley: (From Last time) Perhaps look into another regressor? see if there's one that's inherantly better at this kind of thing.

Note #2 to Riley: Figure out how to process data so that you don't have to drop null values

Note #3 to Riley: convert categorical data into binary

Note #4 to Riley: I wonder if increasing the number of neighborhoods would become more accurate as we collect more data? like you could create a bunch of little accurate models instead of a bunch of bigger ones.

Learned: If you plan on using Decision Tree/Random Forest from SKLearn, make sure you collect your discrete variables in separate columns and make them binary yes or no(0 or 1).

	available	bath	bed	cat	content	date	dog	feet	furnished	getphotos	...	housingtype	lastseen	lat	laundry	long	parking	price	smoking	time	wheelchair
count	9772	37393	38393	38683	38683	38683	38683	34045	281	38683	...	38663	2792	36640.0000	36138	36640.0000	28827	38436	23750	38683	7596
unique	204	17	9	2	3981	137	2	1548	1	24	...	11	2	6744.0000	5	6918.0000	7	2119	1	25586	1
top	jun 01	1	1	1	967	2016-05-16	1	700	furnished	8	...	apartment	2016-05-27	45.5142	w/d in unit	-122.6854	off-street parking	995	no smoking	10:57:57	wheelchair accessible
freq	574	25752	14689	26188	219	510	24603	680	281	3818	...	30178	2601	437.0000	23491	398.0000	9755	844	23750	8	7596

	bath	bed	cat	content	date	dog	feet	getphotos	hasmap	housingtype	lat	laundry	long	parking	price	smoking	time	wheelchair	neighborhood
5399866740	1	1	0	754	2016-01-12	0	750	8	0	apartment	NaN	w/d in unit	NaN	off-street parking	1400	NaN	12:22:07	NaN	0
5401772970	1	1	1	2632	2016-01-13	1	659	7	1	apartment	45.531	w/d in unit	-122.664	attached garage	1350	no smoking	16:24:11	wheelchair accessible	33
5402562933	1.5	NaN	0	1001	2016-01-14	0	1	2	1	apartment	45.5333	laundry on site	-122.709	carport	1500	no smoking	09:12:40	NaN	11
5402607488	1	2	0	2259	2016-01-14	0	936	12	1	condo	45.5328	w/d in unit	-122.699	attached garage	1995	NaN	09:36:16	NaN	11
5402822514	1	1	0	1110	2016-01-14	0	624	16	1	apartment	45.5053	w/d in unit	-122.618	street parking	1495	NaN	11:31:03	NaN	2
5402918870	2.5	3	0	1318	2016-01-14	0	1684	22	1	townhouse	45.602	w/d in unit	-122.667	attached garage	1800	no smoking	12:24:52	NaN	38
5403011764	1	1	1	1649	2016-01-14	1	750	14	1	apartment	45.5555	w/d hookups	-122.658	street parking	1340	NaN	13:19:56	NaN	0
5403019783	1	1	1	1324	2016-01-14	1	640	5	1	apartment	45.5198	laundry on site	-122.687	street parking	1095	no smoking	13:24:43	NaN	39
5403242242	1	0	1	2862	2016-01-14	1	NaN	12	1	apartment	45.5413	laundry on site	-122.676	street parking	1235	no smoking	15:54:38	NaN	0
5403320258	1.5	3	0	1598	2016-01-14	0	1200	17	1	house	45.4079	w/d in unit	-122.762	attached garage	1725	no smoking	16:58:11	wheelchair accessible	6
5404034182	2	2	1	4880	2016-01-15	1	1010	19	1	apartment	45.464	w/d in unit	-122.642	street parking	1995	no smoking	08:57:41	NaN	29
5404362542	1	2	1	1662	2016-01-15	1	850	8	1	apartment	45.5664	laundry on site	-122.696	off-street parking	1395	NaN	11:53:28	NaN	13
5404431092	1	1	1	1877	2016-01-15	1	700	14	1	apartment	45.5855	w/d in unit	-122.732	off-street parking	1195	NaN	12:32:22	NaN	4
5404439790	1	2	1	1860	2016-01-15	1	900	14	1	apartment	45.5855	w/d in unit	-122.732	off-street parking	1395	NaN	12:37:14	NaN	4
5404442485	1	1	1	1435	2016-01-15	1	700	11	1	apartment	45.5855	w/d in unit	-122.732	off-street parking	1195	NaN	12:38:46	NaN	4
5404447075	1	2	1	2603	2016-01-15	1	850	24	1	apartment	45.4784	w/d in unit	-122.609	street parking	1395	NaN	12:41:25	NaN	34
5404478114	1	2	1	2375	2016-01-15	1	800	17	1	apartment	45.61	w/d in unit	-122.73	off-street parking	1295	NaN	12:59:13	NaN	4
5404512932	1	2	1	2564	2016-01-15	1	850	23	1	apartment	45.4784	w/d in unit	-122.609	off-street parking	1395	NaN	13:19:34	NaN	34
5404543909	1	2	0	2626	2016-01-15	0	825	24	1	apartment	45.4784	w/d in unit	-122.609	off-street parking	1395	NaN	13:37:48	NaN	34
5404549721	1	1	0	722	2016-01-15	0	650	4	1	apartment	45.5627	w/d in unit	-122.64	street parking	1150	NaN	13:41:21	NaN	25
5404650486	1	2	0	3193	2016-01-15	0	1000	18	1	apartment	45.5353	laundry in bldg	-122.643	detached garage	1695	no smoking	14:44:29	NaN	14
5404727169	1.5	3	0	2625	2016-01-15	0	1589	15	1	house	45.504	w/d in unit	-122.649	off-street parking	2795	no smoking	15:38:48	NaN	19
5404782936	1	2	0	879	2016-01-15	0	NaN	10	1	duplex	45.5193	w/d hookups	-122.6	attached garage	1450	NaN	16:22:51	NaN	23
5404834100	NaN	0	0	1695	2016-01-15	0	NaN	2	0	apartment	NaN	NaN	NaN	NaN	1300	NaN	17:06:13	NaN	0
5404855534	2	2	0	2465	2016-01-15	0	875	6	1	apartment	45.5218	laundry in bldg	-122.686	NaN	1695	no smoking	17:26:31	NaN	16
5404994129	1	2	0	699	2016-01-15	0	NaN	0	1	apartment	45.4941	NaN	-122.399	NaN	1100	NaN	19:56:47	NaN	10
5404995096	1	2	0	678	2016-01-15	0	NaN	0	1	apartment	45.4941	NaN	-122.399	NaN	1100	NaN	19:58:02	NaN	10
5405638400	1	1	1	2163	2016-01-16	1	298	16	1	apartment	45.5413	laundry in bldg	-122.676	street parking	1235	no smoking	10:05:11	NaN	0
5405717413	1.5	2	0	1088	2016-01-16	0	1172	8	1	apartment	45.5318	w/d hookups	-122.421	attached garage	1295	no smoking	10:48:07	NaN	18
5407108078	1	1	0	2319	2016-01-17	0	881	21	1	condo	45.5314	w/d in unit	-122.677	detached garage	2100	no smoking	11:46:56	NaN	16
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
5556757532	2	2	0	2530	2016-04-25	0	1079	10	1	apartment	45.5473	w/d in unit	-122.441	carport	1370	NaN	17:34:15	NaN	28
5556760594	2	2	0	663	2016-04-25	0	1160	4	1	apartment	45.4352	w/d in unit	-122.723	attached garage	1700	no smoking	17:36:37	NaN	6
5556761697	1	1	1	2480	2016-04-25	1	NaN	21	1	apartment	45.5002	w/d in unit	-122.672	off-street parking	1485	no smoking	17:37:29	NaN	30
5556762484	1	1	0	1064	2016-04-25	0	751	5	1	apartment	45.5742	w/d in unit	-122.684	off-street parking	1175	NaN	17:38:06	NaN	13
5556763065	2	2	1	1391	2016-04-25	1	966	16	1	apartment	45.5224	w/d in unit	-122.655	NaN	2439	NaN	17:38:32	NaN	33
5556765449	2	3	1	2070	2016-04-25	1	1320	24	1	townhouse	45.4762	w/d in unit	-122.56	attached garage	2000	no smoking	17:40:22	NaN	7
5556765534	1	1	1	2101	2016-04-25	1	768	8	0	apartment	NaN	w/d in unit	NaN	off-street parking	1745	no smoking	17:40:25	NaN	0
5556765620	1	0	1	2465	2016-04-25	1	810	6	1	apartment	45.5266	w/d in unit	-122.679	NaN	1700	NaN	17:40:29	NaN	16
5556768834	1	2	1	2557	2016-04-25	1	860	6	1	apartment	45.7655	laundry on site	-122.893	off-street parking	1170	NaN	17:42:56	NaN	9
5556771928	1	1	1	2963	2016-04-25	1	810	21	1	apartment	45.5333	w/d in unit	-122.684	attached garage	2064	no smoking	17:45:18	wheelchair accessible	16
5556775928	1	2	0	693	2016-04-25	0	1060	16	1	duplex	45.5342	w/d hookups	-122.589	attached garage	1595	no smoking	17:48:22	NaN	21
5556776523	1	2	1	1968	2016-04-25	1	892	17	1	apartment	45.5183	w/d in unit	-122.695	off-street parking	2230	no smoking	17:48:48	NaN	39
5556780176	1	1	1	2541	2016-04-25	1	722	9	1	apartment	45.5308	w/d in unit	-122.683	off-street parking	1799	no smoking	17:51:37	wheelchair accessible	16
5556782322	3.5	3	1	2018	2016-04-25	1	1800	18	1	townhouse	45.5187	w/d hookups	-122.532	attached garage	2000	no smoking	17:53:11	NaN	5
5556783479	1	1	1	2361	2016-04-25	1	902	11	1	apartment	45.5165	w/d in unit	-122.644	attached garage	1867	no smoking	17:54:03	wheelchair accessible	19
5556785128	2	2	1	1303	2016-04-25	1	937	18	1	apartment	45.5505	w/d in unit	-122.676	attached garage	2112	no smoking	17:55:17	NaN	0
5556789183	1	1	1	1650	2016-04-25	1	NaN	8	1	apartment	45.5441	w/d in unit	-122.642	attached garage	1300	no smoking	17:58:20	NaN	14
5556790409	1	1	1	1379	2016-04-25	1	NaN	9	1	apartment	45.5006	laundry on site	-122.69	NaN	1195	NaN	17:59:19	NaN	35
5556792321	1	1	1	1321	2016-04-25	1	NaN	8	1	apartment	45.5006	NaN	-122.69	NaN	1125	NaN	18:00:50	NaN	35
5556799674	1	1	1	1843	2016-04-25	1	911	11	1	apartment	45.5165	w/d in unit	-122.644	attached garage	1313	no smoking	18:06:36	wheelchair accessible	19
5556802007	1	2	1	1607	2016-04-25	1	NaN	10	1	apartment	45.5121	w/d in unit	-122.635	street parking	1795	no smoking	18:08:26	NaN	19
5556803288	1	1	0	1408	2016-04-25	0	NaN	9	0	house	NaN	laundry on site	NaN	street parking	760	no smoking	18:09:29	NaN	0
5556804706	2	2	1	2200	2016-04-25	1	870	9	1	apartment	45.5145	w/d in unit	-122.687	attached garage	2581	no smoking	18:10:39	wheelchair accessible	3
5556806112	1	1	1	1625	2016-04-25	1	640	12	1	apartment	45.5118	w/d in unit	-122.645	attached garage	1622	no smoking	18:11:48	NaN	19
5556809642	1	1	1	2263	2016-04-25	1	643	5	1	apartment	45.4983	w/d in unit	-122.691	off-street parking	1615	no smoking	18:14:38	NaN	35
5556811885	1	1	1	1700	2016-04-25	1	NaN	13	1	apartment	45.5179	w/d in unit	-122.664	off-street parking	1250	no smoking	18:16:30	NaN	33
5556814062	2	2	1	2102	2016-04-25	1	951	6	1	apartment	45.4667	NaN	-122.565	NaN	1320	NaN	18:18:15	NaN	7
5556814296	1	1	1	1780	2016-04-25	1	614	10	1	apartment	45.5293	w/d in unit	-122.704	detached garage	1500	no smoking	18:18:25	NaN	11
5556817078	1	0	0	826	2016-04-25	0	NaN	8	1	house	45.484	w/d in unit	-122.636	off-street parking	2400	no smoking	18:20:42	NaN	15
5556824878	1	2	0	639	2016-04-25	0	1000	8	1	apartment	45.5219	laundry in bldg	-122.697	off-street parking	1795	NaN	18:27:02	NaN	39