This is my fourth attempt at creating a model using sklearn algorithms.

In this iteration of the analysis we'll break the categorical variables out into binary columns and see whether that makes our model more accurate.

My last three attempts at this are below:

https://github.com/rileyrustad/CLCrawler/blob/master/First_Analysis.ipynb

https://github.com/rileyrustad/CLCrawler/blob/master/Second_Analysis.ipynb

https://github.com/rileyrustad/CLCrawler/blob/master/Third_Analysis.ipynb

Start with the imports


In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import json
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


/Users/mac28/anaconda/lib/python2.7/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))

Load the data from our JSON file.

The data is stored as a dictionary of dictionaries in the JSON file. We store it that way because it's easy to add data to the existing master data file. Also, I haven't figured out how to get it into a database yet.
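
(For reference, the append step looks something like this. A minimal sketch: the real file lives in the crawler pipeline, and the new_listings dict here is made up.)

import json

# Merge newly scraped listings into the master file. The file maps
# Craigslist listing IDs to dicts of listing attributes.
with open('MasterApartmentData.json') as f:
    master = json.load(f)

new_listings = {'5557000000': {'price': 1200, 'bed': 1, 'bath': 1}}  # hypothetical
master.update(new_listings)  # existing IDs are overwritten, new ones are added

with open('MasterApartmentData.json', 'w') as f:
    json.dump(master, f)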


In [3]:
with open('/Users/mac28/src/pdxapartmentfinder/pipeline/data/MasterApartmentData.json') as f:
    my_dict = json.load(f)
dframe = DataFrame(my_dict)

dframe = dframe.T
dframe.describe()


Out[3]:
available bath bed cat content date dog feet furnished getphotos ... housingtype lastseen lat laundry long parking price smoking time wheelchair
count 9772 37393 38393 38683 38683 38683 38683 34045 281 38683 ... 38663 2792 36640.0000 36138 36640.0000 28827 38436 23750 38683 7596
unique 204 17 9 2 3981 137 2 1548 1 24 ... 11 2 6744.0000 5 6918.0000 7 2119 1 25586 1
top jun 01 1 1 1 967 2016-05-16 1 700 furnished 8 ... apartment 2016-05-27 45.5142 w/d in unit -122.6854 off-street parking 995 no smoking 10:57:57 wheelchair accessible
freq 574 25752 14689 26188 219 510 24603 680 281 3818 ... 30178 2601 437.0000 23491 398.0000 9755 844 23750 8 7596

4 rows × 21 columns

Clean up the data a bit

Right now 'shared' and 'split' appear as values in the number-of-bathrooms column. If I were to convert those to numbers, I'd consider a shared/split bathroom to be half (0.5) of a bathroom.


In [3]:
dframe.bath = dframe.bath.replace('shared',0.5)
dframe.bath = dframe.bath.replace('split',0.5)
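
(Same thing in one call, using a mapping dict. Coercing the column to numeric afterward is a sketch of an extra step I'm not doing here; it assumes the remaining values are numbers or numeric strings.)

dframe.bath = dframe.bath.replace({'shared': 0.5, 'split': 0.5})
dframe.bath = pd.to_numeric(dframe.bath, errors='coerce')  # unparseable values become NaN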

Let's take a look at what the prices look like

To visualize them we need to get rid of null values. I haven't figured out the best way to clean this up yet, so for now I'm going to drop any rows that have a null value, though I recognize that this is not good analysis practice. We end up dropping ~15% of the data points.

😬

Also, there were some CRAZY outliers, and this analysis is focused on finding a model for apartments for the 99% of us who can't afford crazy extravagant apartments.
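
(One possible alternative to dropping rows, sketched here for later: impute the missing numeric values instead, e.g. with column medians. Not what this notebook does yet.)

numeric = dframe[['bath', 'bed', 'feet', 'price']].apply(pd.to_numeric, errors='coerce')
filled = numeric.fillna(numeric.median())  # fill each column's NaNs with its median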


In [4]:
df = dframe[dframe.price < 10000][['bath','bed','feet','price']].dropna()
sns.distplot(df.price)


Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x115ba93d0>

In [5]:
data = dframe[(dframe.lat > 45.4) & (dframe.lat < 45.6) & (dframe.long < -122.0) & (dframe.long > -123.5)]
plt.figure(figsize=(15,10))
plt.scatter(data = data, x = 'long',y='lat')


Out[5]:
<matplotlib.collections.PathCollection at 0x11ab1c750>

It looks like Portland!!!

Let's cluster the data. Start by creating a list of [lat, long] pairs.


In [6]:
XYdf = dframe[(dframe.lat > 45.4) & (dframe.lat < 45.6) & (dframe.long < -122.0) & (dframe.long > -123.5)]
data = [[XYdf['lat'][i],XYdf['long'][i]] for i in XYdf.index]

We'll use K-Means clustering because that's the clustering method I recently learned in class! There may be others that work better, but this is the tool that I know.


In [9]:
from sklearn.cluster import KMeans
km = KMeans(n_clusters=40)
km.fit(data)
neighborhoods = km.cluster_centers_

In [8]:
%pylab inline
figure(1,figsize=(20,12))
plot([row[1] for row in data],[row[0] for row in data],'b.')
for i in km.cluster_centers_:  
    plot(i[1],i[0], 'g*',ms=25)
'''Note to Riley: come back and make it look pretty'''


Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['f']
`%matplotlib` prevents importing * from pylab and numpy
Out[8]:
'Note to Riley: come back and make it look pretty'

We chose our neighborhoods!

I've found that every once in a while the centers end up in slightly different spots, but they're fairly consistent. Now let's process our data points and find the closest neighborhood center to each one!
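
(If run-to-run consistency matters, KMeans takes a random_state seed so the centers come out the same every time:)

km = KMeans(n_clusters=40, random_state=42)  # seeded for reproducible centers
km.fit(data)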


In [10]:
neighborhoods = neighborhoods.tolist()
for i in enumerate(neighborhoods):
    i[1].append(i[0])  # tag each center with its cluster index
print neighborhoods


[[45.552123420408165, -122.67131269489796, 0], [45.51092562287105, -122.45307534793187, 1], [45.506688980830674, -122.62617021405751, 2], [45.51290160229575, -122.68343233254558, 3], [45.587227178666666, -122.743129376, 4], [45.52450939197531, -122.55208463888889, 5], [45.452784070281126, -122.71596119076305, 6], [45.47244749687109, -122.56570525907384, 7], [45.51805015181518, -122.50170698019802, 8], [45.54218268181818, -122.83868146363636, 9], [45.48445444117647, -122.40939808823529, 10], [45.532534597305386, -122.70121312874251, 11], [45.478594089908256, -122.67784825504587, 12], [45.57852603030303, -122.68672382386364, 13], [45.534753504540866, -122.63718108476287, 14], [45.486881816, -122.6372868544, 15], [45.529539962154296, -122.6856359297671, 16], [45.49200958955224, -122.7456509925373, 17], [45.51650509148265, -122.40908712513144, 18], [45.515484257077276, -122.64211889671002, 19], [45.539150457831326, -122.77935245783132, 20], [45.545689939759036, -122.59564720783132, 21], [45.55325973846154, -122.53604236153846, 22], [45.52118124132232, -122.60366510082645, 23], [45.503803229067934, -122.57324186729858, 24], [45.563277824833705, -122.64478785809312, 25], [45.49269709178744, -122.5121487294686, 26], [45.5548645, -123.14423149999999, 27], [45.53120244650206, -122.4440400144033, 28], [45.46414142290749, -122.64893099559471, 29], [45.498214272401434, -122.67299030376344, 30], [45.490130411504424, -122.46661110176991, 31], [45.54266479098361, -122.49373927459017, 32], [45.52965521960785, -122.65906658529411, 33], [45.482160706632655, -122.60198717091836, 34], [45.49606003015075, -122.693913400335, 35], [45.414157103448275, -122.61978655172415, 36], [45.4689539245283, -122.8112003018868, 37], [45.59738002307692, -122.66026826153846, 38], [45.52132444802495, -122.69402066008315, 39]]

Create a function that will label each point with a number corresponding to its neighborhood


In [11]:
def clusterer(X, Y, neighborhoods):
    # Squared Euclidean distance from (X, Y) to each neighborhood center;
    # return the index of the nearest one.
    neighbors = []
    for i in neighborhoods:
        distance = ((i[0]-X)**2 + (i[1]-Y)**2)
        neighbors.append(distance)
    closest = min(neighbors)
    return neighbors.index(closest)
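
(Worth noting: the fitted model can do this assignment itself. KMeans.predict labels each sample with the index of its nearest center. The NaN lat/long rows would need separate handling, though; the loop above quietly sends them to neighborhood 0, since every distance comparison against NaN falls through to the first center.)

# Equivalent assignment using the fitted estimator directly.
valid = dframe[['lat', 'long']].dropna()
labels = km.predict(valid[['lat', 'long']].values.astype(float))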

In [12]:
neighborhoodlist = []
for i in dframe.index:
    neighborhoodlist.append(clusterer(dframe['lat'][i],dframe['long'][i],neighborhoods))
dframe['neighborhood'] = neighborhoodlist

In [13]:
dframe


Out[13]:
bath bed cat content date dog feet getphotos hasmap housingtype lat laundry long parking price smoking time wheelchair neighborhood
5399866740 1 1 0 754 2016-01-12 0 750 8 0 apartment NaN w/d in unit NaN off-street parking 1400 NaN 12:22:07 NaN 0
5401772970 1 1 1 2632 2016-01-13 1 659 7 1 apartment 45.531 w/d in unit -122.664 attached garage 1350 no smoking 16:24:11 wheelchair accessible 33
5402562933 1.5 NaN 0 1001 2016-01-14 0 1 2 1 apartment 45.5333 laundry on site -122.709 carport 1500 no smoking 09:12:40 NaN 11
5402607488 1 2 0 2259 2016-01-14 0 936 12 1 condo 45.5328 w/d in unit -122.699 attached garage 1995 NaN 09:36:16 NaN 11
5402822514 1 1 0 1110 2016-01-14 0 624 16 1 apartment 45.5053 w/d in unit -122.618 street parking 1495 NaN 11:31:03 NaN 2
5402918870 2.5 3 0 1318 2016-01-14 0 1684 22 1 townhouse 45.602 w/d in unit -122.667 attached garage 1800 no smoking 12:24:52 NaN 38
5403011764 1 1 1 1649 2016-01-14 1 750 14 1 apartment 45.5555 w/d hookups -122.658 street parking 1340 NaN 13:19:56 NaN 0
5403019783 1 1 1 1324 2016-01-14 1 640 5 1 apartment 45.5198 laundry on site -122.687 street parking 1095 no smoking 13:24:43 NaN 39
5403242242 1 0 1 2862 2016-01-14 1 NaN 12 1 apartment 45.5413 laundry on site -122.676 street parking 1235 no smoking 15:54:38 NaN 0
5403320258 1.5 3 0 1598 2016-01-14 0 1200 17 1 house 45.4079 w/d in unit -122.762 attached garage 1725 no smoking 16:58:11 wheelchair accessible 6
5404034182 2 2 1 4880 2016-01-15 1 1010 19 1 apartment 45.464 w/d in unit -122.642 street parking 1995 no smoking 08:57:41 NaN 29
5404362542 1 2 1 1662 2016-01-15 1 850 8 1 apartment 45.5664 laundry on site -122.696 off-street parking 1395 NaN 11:53:28 NaN 13
5404431092 1 1 1 1877 2016-01-15 1 700 14 1 apartment 45.5855 w/d in unit -122.732 off-street parking 1195 NaN 12:32:22 NaN 4
5404439790 1 2 1 1860 2016-01-15 1 900 14 1 apartment 45.5855 w/d in unit -122.732 off-street parking 1395 NaN 12:37:14 NaN 4
5404442485 1 1 1 1435 2016-01-15 1 700 11 1 apartment 45.5855 w/d in unit -122.732 off-street parking 1195 NaN 12:38:46 NaN 4
5404447075 1 2 1 2603 2016-01-15 1 850 24 1 apartment 45.4784 w/d in unit -122.609 street parking 1395 NaN 12:41:25 NaN 34
5404478114 1 2 1 2375 2016-01-15 1 800 17 1 apartment 45.61 w/d in unit -122.73 off-street parking 1295 NaN 12:59:13 NaN 4
5404512932 1 2 1 2564 2016-01-15 1 850 23 1 apartment 45.4784 w/d in unit -122.609 off-street parking 1395 NaN 13:19:34 NaN 34
5404543909 1 2 0 2626 2016-01-15 0 825 24 1 apartment 45.4784 w/d in unit -122.609 off-street parking 1395 NaN 13:37:48 NaN 34
5404549721 1 1 0 722 2016-01-15 0 650 4 1 apartment 45.5627 w/d in unit -122.64 street parking 1150 NaN 13:41:21 NaN 25
5404650486 1 2 0 3193 2016-01-15 0 1000 18 1 apartment 45.5353 laundry in bldg -122.643 detached garage 1695 no smoking 14:44:29 NaN 14
5404727169 1.5 3 0 2625 2016-01-15 0 1589 15 1 house 45.504 w/d in unit -122.649 off-street parking 2795 no smoking 15:38:48 NaN 19
5404782936 1 2 0 879 2016-01-15 0 NaN 10 1 duplex 45.5193 w/d hookups -122.6 attached garage 1450 NaN 16:22:51 NaN 23
5404834100 NaN 0 0 1695 2016-01-15 0 NaN 2 0 apartment NaN NaN NaN NaN 1300 NaN 17:06:13 NaN 0
5404855534 2 2 0 2465 2016-01-15 0 875 6 1 apartment 45.5218 laundry in bldg -122.686 NaN 1695 no smoking 17:26:31 NaN 16
5404994129 1 2 0 699 2016-01-15 0 NaN 0 1 apartment 45.4941 NaN -122.399 NaN 1100 NaN 19:56:47 NaN 10
5404995096 1 2 0 678 2016-01-15 0 NaN 0 1 apartment 45.4941 NaN -122.399 NaN 1100 NaN 19:58:02 NaN 10
5405638400 1 1 1 2163 2016-01-16 1 298 16 1 apartment 45.5413 laundry in bldg -122.676 street parking 1235 no smoking 10:05:11 NaN 0
5405717413 1.5 2 0 1088 2016-01-16 0 1172 8 1 apartment 45.5318 w/d hookups -122.421 attached garage 1295 no smoking 10:48:07 NaN 18
5407108078 1 1 0 2319 2016-01-17 0 881 21 1 condo 45.5314 w/d in unit -122.677 detached garage 2100 no smoking 11:46:56 NaN 16
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5556757532 2 2 0 2530 2016-04-25 0 1079 10 1 apartment 45.5473 w/d in unit -122.441 carport 1370 NaN 17:34:15 NaN 28
5556760594 2 2 0 663 2016-04-25 0 1160 4 1 apartment 45.4352 w/d in unit -122.723 attached garage 1700 no smoking 17:36:37 NaN 6
5556761697 1 1 1 2480 2016-04-25 1 NaN 21 1 apartment 45.5002 w/d in unit -122.672 off-street parking 1485 no smoking 17:37:29 NaN 30
5556762484 1 1 0 1064 2016-04-25 0 751 5 1 apartment 45.5742 w/d in unit -122.684 off-street parking 1175 NaN 17:38:06 NaN 13
5556763065 2 2 1 1391 2016-04-25 1 966 16 1 apartment 45.5224 w/d in unit -122.655 NaN 2439 NaN 17:38:32 NaN 33
5556765449 2 3 1 2070 2016-04-25 1 1320 24 1 townhouse 45.4762 w/d in unit -122.56 attached garage 2000 no smoking 17:40:22 NaN 7
5556765534 1 1 1 2101 2016-04-25 1 768 8 0 apartment NaN w/d in unit NaN off-street parking 1745 no smoking 17:40:25 NaN 0
5556765620 1 0 1 2465 2016-04-25 1 810 6 1 apartment 45.5266 w/d in unit -122.679 NaN 1700 NaN 17:40:29 NaN 16
5556768834 1 2 1 2557 2016-04-25 1 860 6 1 apartment 45.7655 laundry on site -122.893 off-street parking 1170 NaN 17:42:56 NaN 9
5556771928 1 1 1 2963 2016-04-25 1 810 21 1 apartment 45.5333 w/d in unit -122.684 attached garage 2064 no smoking 17:45:18 wheelchair accessible 16
5556775928 1 2 0 693 2016-04-25 0 1060 16 1 duplex 45.5342 w/d hookups -122.589 attached garage 1595 no smoking 17:48:22 NaN 21
5556776523 1 2 1 1968 2016-04-25 1 892 17 1 apartment 45.5183 w/d in unit -122.695 off-street parking 2230 no smoking 17:48:48 NaN 39
5556780176 1 1 1 2541 2016-04-25 1 722 9 1 apartment 45.5308 w/d in unit -122.683 off-street parking 1799 no smoking 17:51:37 wheelchair accessible 16
5556782322 3.5 3 1 2018 2016-04-25 1 1800 18 1 townhouse 45.5187 w/d hookups -122.532 attached garage 2000 no smoking 17:53:11 NaN 5
5556783479 1 1 1 2361 2016-04-25 1 902 11 1 apartment 45.5165 w/d in unit -122.644 attached garage 1867 no smoking 17:54:03 wheelchair accessible 19
5556785128 2 2 1 1303 2016-04-25 1 937 18 1 apartment 45.5505 w/d in unit -122.676 attached garage 2112 no smoking 17:55:17 NaN 0
5556789183 1 1 1 1650 2016-04-25 1 NaN 8 1 apartment 45.5441 w/d in unit -122.642 attached garage 1300 no smoking 17:58:20 NaN 14
5556790409 1 1 1 1379 2016-04-25 1 NaN 9 1 apartment 45.5006 laundry on site -122.69 NaN 1195 NaN 17:59:19 NaN 35
5556792321 1 1 1 1321 2016-04-25 1 NaN 8 1 apartment 45.5006 NaN -122.69 NaN 1125 NaN 18:00:50 NaN 35
5556799674 1 1 1 1843 2016-04-25 1 911 11 1 apartment 45.5165 w/d in unit -122.644 attached garage 1313 no smoking 18:06:36 wheelchair accessible 19
5556802007 1 2 1 1607 2016-04-25 1 NaN 10 1 apartment 45.5121 w/d in unit -122.635 street parking 1795 no smoking 18:08:26 NaN 19
5556803288 1 1 0 1408 2016-04-25 0 NaN 9 0 house NaN laundry on site NaN street parking 760 no smoking 18:09:29 NaN 0
5556804706 2 2 1 2200 2016-04-25 1 870 9 1 apartment 45.5145 w/d in unit -122.687 attached garage 2581 no smoking 18:10:39 wheelchair accessible 3
5556806112 1 1 1 1625 2016-04-25 1 640 12 1 apartment 45.5118 w/d in unit -122.645 attached garage 1622 no smoking 18:11:48 NaN 19
5556809642 1 1 1 2263 2016-04-25 1 643 5 1 apartment 45.4983 w/d in unit -122.691 off-street parking 1615 no smoking 18:14:38 NaN 35
5556811885 1 1 1 1700 2016-04-25 1 NaN 13 1 apartment 45.5179 w/d in unit -122.664 off-street parking 1250 no smoking 18:16:30 NaN 33
5556814062 2 2 1 2102 2016-04-25 1 951 6 1 apartment 45.4667 NaN -122.565 NaN 1320 NaN 18:18:15 NaN 7
5556814296 1 1 1 1780 2016-04-25 1 614 10 1 apartment 45.5293 w/d in unit -122.704 detached garage 1500 no smoking 18:18:25 NaN 11
5556817078 1 0 0 826 2016-04-25 0 NaN 8 1 house 45.484 w/d in unit -122.636 off-street parking 2400 no smoking 18:20:42 NaN 15
5556824878 1 2 0 639 2016-04-25 0 1000 8 1 apartment 45.5219 laundry in bldg -122.697 off-street parking 1795 NaN 18:27:02 NaN 39

27391 rows × 19 columns

Here's the new part. We're breaking the categorical values out into their own columns, so the algorithms can read them as categorical data rather than continuous data.



In [14]:
from sklearn import preprocessing
def CategoricalToBinary(dframe, column_name):
    # Encode the column's string labels as integers 0..n-1.
    le = preprocessing.LabelEncoder()
    dframe[column_name] = le.fit_transform(dframe[column_name])
    unique = dframe[column_name].unique()
    serieslist = [list() for _ in xrange(len(unique))]

    # For each encoded label, build a 0/1 indicator column.
    for column, _ in enumerate(serieslist):
        for i, item in enumerate(dframe[column_name]):
            if item == column:
                serieslist[column].append(1)
            else:
                serieslist[column].append(0)
        dframe[column_name+str(column)] = serieslist[column]

    return dframe
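
(For what it's worth, pandas can do this in one call: pd.get_dummies builds the same kind of 0/1 indicator columns, keyed by the original label strings instead of integer codes. A sketch for one column:)

dummies = pd.get_dummies(dframe['housingtype'], prefix='housingtype')
dframe = pd.concat([dframe, dummies], axis=1)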

In [15]:
pd.set_option('max_columns', 100)
dframe = CategoricalToBinary(dframe,'housingtype')
dframe = CategoricalToBinary(dframe,'parking')
dframe = CategoricalToBinary(dframe,'laundry')
dframe = CategoricalToBinary(dframe,'smoking')
dframe = CategoricalToBinary(dframe,'wheelchair')
dframe = CategoricalToBinary(dframe,'neighborhood')
dframe


/Users/mac28/anaconda/lib/python2.7/site-packages/numpy/lib/arraysetops.py:200: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  flag = np.concatenate(([True], aux[1:] != aux[:-1]))
Out[15]:
bath bed cat content date dog feet getphotos hasmap housingtype lat laundry long parking price smoking time wheelchair neighborhood housingtype0 housingtype1 housingtype2 housingtype3 housingtype4 housingtype5 housingtype6 housingtype7 housingtype8 housingtype9 housingtype10 housingtype11 parking0 parking1 parking2 parking3 parking4 parking5 parking6 parking7 laundry0 laundry1 laundry2 laundry3 laundry4 laundry5 smoking0 smoking1 wheelchair0 wheelchair1 neighborhood0 neighborhood1 neighborhood2 neighborhood3 neighborhood4 neighborhood5 neighborhood6 neighborhood7 neighborhood8 neighborhood9 neighborhood10 neighborhood11 neighborhood12 neighborhood13 neighborhood14 neighborhood15 neighborhood16 neighborhood17 neighborhood18 neighborhood19 neighborhood20 neighborhood21 neighborhood22 neighborhood23 neighborhood24 neighborhood25 neighborhood26 neighborhood27 neighborhood28 neighborhood29 neighborhood30 neighborhood31 neighborhood32 neighborhood33 neighborhood34 neighborhood35 neighborhood36 neighborhood37 neighborhood38 neighborhood39
5399866740 1 1 0 754 2016-01-12 0 750 8 0 1 NaN 5 NaN 5 1400 0 12:22:07 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5401772970 1 1 1 2632 2016-01-13 1 659 7 1 1 45.531 5 -122.664 1 1350 1 16:24:11 1 33 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
5402562933 1.5 NaN 0 1001 2016-01-14 0 1 2 1 1 45.5333 2 -122.709 2 1500 1 09:12:40 0 11 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5402607488 1 2 0 2259 2016-01-14 0 936 12 1 2 45.5328 5 -122.699 1 1995 0 09:36:16 0 11 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5402822514 1 1 0 1110 2016-01-14 0 624 16 1 1 45.5053 5 -122.618 6 1495 0 11:31:03 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5402918870 2.5 3 0 1318 2016-01-14 0 1684 22 1 11 45.602 5 -122.667 1 1800 1 12:24:52 0 38 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
5403011764 1 1 1 1649 2016-01-14 1 750 14 1 1 45.5555 4 -122.658 6 1340 0 13:19:56 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5403019783 1 1 1 1324 2016-01-14 1 640 5 1 1 45.5198 2 -122.687 6 1095 1 13:24:43 0 39 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
5403242242 1 0 1 2862 2016-01-14 1 NaN 12 1 1 45.5413 2 -122.676 6 1235 1 15:54:38 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5403320258 1.5 3 0 1598 2016-01-14 0 1200 17 1 6 45.4079 5 -122.762 1 1725 1 16:58:11 1 6 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404034182 2 2 1 4880 2016-01-15 1 1010 19 1 1 45.464 5 -122.642 6 1995 1 08:57:41 0 29 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
5404362542 1 2 1 1662 2016-01-15 1 850 8 1 1 45.5664 2 -122.696 5 1395 0 11:53:28 0 13 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404431092 1 1 1 1877 2016-01-15 1 700 14 1 1 45.5855 5 -122.732 5 1195 0 12:32:22 0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404439790 1 2 1 1860 2016-01-15 1 900 14 1 1 45.5855 5 -122.732 5 1395 0 12:37:14 0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404442485 1 1 1 1435 2016-01-15 1 700 11 1 1 45.5855 5 -122.732 5 1195 0 12:38:46 0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404447075 1 2 1 2603 2016-01-15 1 850 24 1 1 45.4784 5 -122.609 6 1395 0 12:41:25 0 34 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
5404478114 1 2 1 2375 2016-01-15 1 800 17 1 1 45.61 5 -122.73 5 1295 0 12:59:13 0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404512932 1 2 1 2564 2016-01-15 1 850 23 1 1 45.4784 5 -122.609 5 1395 0 13:19:34 0 34 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
5404543909 1 2 0 2626 2016-01-15 0 825 24 1 1 45.4784 5 -122.609 5 1395 0 13:37:48 0 34 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
5404549721 1 1 0 722 2016-01-15 0 650 4 1 1 45.5627 5 -122.64 6 1150 0 13:41:21 0 25 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404650486 1 2 0 3193 2016-01-15 0 1000 18 1 1 45.5353 1 -122.643 3 1695 1 14:44:29 0 14 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404727169 1.5 3 0 2625 2016-01-15 0 1589 15 1 6 45.504 5 -122.649 5 2795 1 15:38:48 0 19 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404782936 1 2 0 879 2016-01-15 0 NaN 10 1 4 45.5193 4 -122.6 1 1450 0 16:22:51 0 23 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404834100 NaN 0 0 1695 2016-01-15 0 NaN 2 0 1 NaN 0 NaN 0 1300 0 17:06:13 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404855534 2 2 0 2465 2016-01-15 0 875 6 1 1 45.5218 1 -122.686 0 1695 1 17:26:31 0 16 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404994129 1 2 0 699 2016-01-15 0 NaN 0 1 1 45.4941 0 -122.399 0 1100 0 19:56:47 0 10 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5404995096 1 2 0 678 2016-01-15 0 NaN 0 1 1 45.4941 0 -122.399 0 1100 0 19:58:02 0 10 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5405638400 1 1 1 2163 2016-01-16 1 298 16 1 1 45.5413 1 -122.676 6 1235 1 10:05:11 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5405717413 1.5 2 0 1088 2016-01-16 0 1172 8 1 1 45.5318 4 -122.421 1 1295 1 10:48:07 0 18 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5407108078 1 1 0 2319 2016-01-17 0 881 21 1 2 45.5314 5 -122.677 3 2100 1 11:46:56 0 16 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5556757532 2 2 0 2530 2016-04-25 0 1079 10 1 1 45.5473 5 -122.441 2 1370 0 17:34:15 0 28 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
5556760594 2 2 0 663 2016-04-25 0 1160 4 1 1 45.4352 5 -122.723 1 1700 1 17:36:37 0 6 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556761697 1 1 1 2480 2016-04-25 1 NaN 21 1 1 45.5002 5 -122.672 5 1485 1 17:37:29 0 30 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
5556762484 1 1 0 1064 2016-04-25 0 751 5 1 1 45.5742 5 -122.684 5 1175 0 17:38:06 0 13 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556763065 2 2 1 1391 2016-04-25 1 966 16 1 1 45.5224 5 -122.655 0 2439 0 17:38:32 0 33 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
5556765449 2 3 1 2070 2016-04-25 1 1320 24 1 11 45.4762 5 -122.56 1 2000 1 17:40:22 0 7 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556765534 1 1 1 2101 2016-04-25 1 768 8 0 1 NaN 5 NaN 5 1745 1 17:40:25 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556765620 1 0 1 2465 2016-04-25 1 810 6 1 1 45.5266 5 -122.679 0 1700 0 17:40:29 0 16 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556768834 1 2 1 2557 2016-04-25 1 860 6 1 1 45.7655 2 -122.893 5 1170 0 17:42:56 0 9 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556771928 1 1 1 2963 2016-04-25 1 810 21 1 1 45.5333 5 -122.684 1 2064 1 17:45:18 1 16 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556775928 1 2 0 693 2016-04-25 0 1060 16 1 4 45.5342 4 -122.589 1 1595 1 17:48:22 0 21 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556776523 1 2 1 1968 2016-04-25 1 892 17 1 1 45.5183 5 -122.695 5 2230 1 17:48:48 0 39 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
5556780176 1 1 1 2541 2016-04-25 1 722 9 1 1 45.5308 5 -122.683 5 1799 1 17:51:37 1 16 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556782322 3.5 3 1 2018 2016-04-25 1 1800 18 1 11 45.5187 4 -122.532 1 2000 1 17:53:11 0 5 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556783479 1 1 1 2361 2016-04-25 1 902 11 1 1 45.5165 5 -122.644 1 1867 1 17:54:03 1 19 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556785128 2 2 1 1303 2016-04-25 1 937 18 1 1 45.5505 5 -122.676 1 2112 1 17:55:17 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556789183 1 1 1 1650 2016-04-25 1 NaN 8 1 1 45.5441 5 -122.642 1 1300 1 17:58:20 0 14 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556790409 1 1 1 1379 2016-04-25 1 NaN 9 1 1 45.5006 2 -122.69 0 1195 0 17:59:19 0 35 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
5556792321 1 1 1 1321 2016-04-25 1 NaN 8 1 1 45.5006 0 -122.69 0 1125 0 18:00:50 0 35 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
5556799674 1 1 1 1843 2016-04-25 1 911 11 1 1 45.5165 5 -122.644 1 1313 1 18:06:36 1 19 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556802007 1 2 1 1607 2016-04-25 1 NaN 10 1 1 45.5121 5 -122.635 6 1795 1 18:08:26 0 19 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556803288 1 1 0 1408 2016-04-25 0 NaN 9 0 6 NaN 2 NaN 6 760 1 18:09:29 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556804706 2 2 1 2200 2016-04-25 1 870 9 1 1 45.5145 5 -122.687 1 2581 1 18:10:39 1 3 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556806112 1 1 1 1625 2016-04-25 1 640 12 1 1 45.5118 5 -122.645 1 1622 1 18:11:48 0 19 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556809642 1 1 1 2263 2016-04-25 1 643 5 1 1 45.4983 5 -122.691 5 1615 1 18:14:38 0 35 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
5556811885 1 1 1 1700 2016-04-25 1 NaN 13 1 1 45.5179 5 -122.664 5 1250 1 18:16:30 0 33 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
5556814062 2 2 1 2102 2016-04-25 1 951 6 1 1 45.4667 0 -122.565 0 1320 0 18:18:15 0 7 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556814296 1 1 1 1780 2016-04-25 1 614 10 1 1 45.5293 5 -122.704 3 1500 1 18:18:25 0 11 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556817078 1 0 0 826 2016-04-25 0 NaN 8 1 6 45.484 5 -122.636 5 2400 1 18:20:42 0 15 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5556824878 1 2 0 639 2016-04-25 0 1000 8 1 1 45.5219 1 -122.697 5 1795 0 18:27:02 0 39 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

27391 rows × 89 columns


In [16]:
dframe = dframe.drop(['date', 'housingtype', 'parking', 'laundry',
                      'smoking', 'wheelchair', 'neighborhood', 'time'], axis=1)

In [17]:
columns=list(dframe.columns)

In [18]:
from __future__ import division
print len(dframe)
df2 = dframe[dframe.price < 10000][columns].dropna()
print len(df2)
print len(df2)/len(dframe)

price = df2[['price']].values
columns.pop(columns.index('price'))
features = df2[columns].values

from sklearn.cross_validation import train_test_split
features_train, features_test, price_train, price_test = train_test_split(features, price, test_size=0.1, random_state=42)


27391
22164
0.80917089555

OK, let's put it through a model. Last time we tried a Decision Tree.

What about a Random Forest?


In [19]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
reg = RandomForestRegressor()
reg = reg.fit(features_train, price_train.ravel())  # ravel: sklearn expects a 1d target array



In [ ]:
forest_pred = reg.predict(features_test)
forest_pred = np.array([[item] for item in forest_pred])

In [ ]:
print r2_score(price_test, forest_pred)  # r2_score expects (y_true, y_pred)
plt.scatter(forest_pred,price_test)

In [ ]:
df2['predictions'] = reg.predict(df2[columns])

In [ ]:
df2['predictions_diff'] = df2['predictions']-df2['price']

In [ ]:
sd = np.std(df2['predictions_diff'])
diffs = df2['predictions_diff']
sns.kdeplot(diffs[(diffs > -150) & (diffs < 150)])
plt.xlim(-150, 150)

In [ ]:
data = df2[(df2.lat > 45.45) & (df2.lat < 45.6) & (df2.long < -122.4) & (df2.long > -122.8) &
           (df2['predictions_diff'] > -150) & (df2['predictions_diff'] < 150)]
plt.figure(figsize=(15,10))
plt.scatter(data = data, x = 'long',y='lat', c = 'predictions_diff',s=10,cmap='coolwarm')

In [ ]:
dframe

In [ ]:
print np.mean([1,2,34,np.nan])  # prints nan: np.mean propagates NaN

In [ ]:
def averager(dframe):
    # Bucket listings by bed, bath, neighborhood, and 50 sq ft increments of
    # square footage, then average price within each bucket. Listings with
    # missing fields fall through to the except clause and are skipped.
    dframe = dframe.T
    averages = {}
    for listing in dframe:
        try:
            key = str(dframe[listing]['bed'])+','+str(dframe[listing]['bath'])+','+str(dframe[listing]['neighborhood'])+','+str(dframe[listing]['feet']-dframe[listing]['feet']%50)
            if key not in averages:
                averages[key] = {'average_list':[dframe[listing]['price']], 'average':0}
            else:
                averages[key]['average_list'].append(dframe[listing]['price'])
        except TypeError:
            continue
    for entry in averages:
        averages[entry]['average'] = np.mean(averages[entry]['average_list'])
    return averages
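
(The same bucketing can be written with a pandas groupby. A sketch, assuming a frame that still has a raw 'neighborhood' column and numeric 'feet' and 'price':)

tmp = dframe.copy()
tmp['feet_bucket'] = tmp['feet'] - tmp['feet'] % 50  # 50 sq ft bins
group_means = tmp.groupby(['bed', 'bath', 'neighborhood', 'feet_bucket'])['price'].mean()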

In [ ]:
averages = averager(dframe)
print averages

In [ ]:
# Row-wise lookup of the bucket average computed above (the original line
# indexed `averages` with a whole stringified Series, which raises a KeyError).
key = lambda row: str(row['bed'])+','+str(row['bath'])+','+str(row['neighborhood'])+','+str(row['feet']-row['feet']%50)
dframe['averages'] = dframe.apply(lambda row: averages.get(key(row), {'average': np.nan})['average'], axis=1)

In [ ]:
dframe.T

Wow! Up to .87! That's our best yet! What if we add more trees???


In [ ]:
reg = RandomForestRegressor(n_estimators = 100)
reg = reg.fit(features_train, price_train.ravel())

In [ ]:
forest_pred = reg.predict(features_test)
forest_pred = np.array([[item] for item in forest_pred])

In [ ]:
print r2_score(price_test, forest_pred)
plt.scatter(forest_pred, price_test)


In [36]:
from sklearn.tree import DecisionTreeRegressor
reg = DecisionTreeRegressor(max_depth = 5)
reg.fit(features_train, price_train)
print len(features_train[0])
columns = [str(x) for x in columns]
print columns
from sklearn.tree import export_graphviz
export_graphviz(reg,feature_names=columns)


80
['bath', 'bed', 'cat', 'content', 'dog', 'feet', 'getphotos', 'hasmap', 'lat', 'long', 'housingtype0', 'housingtype1', 'housingtype2', 'housingtype3', 'housingtype4', 'housingtype5', 'housingtype6', 'housingtype7', 'housingtype8', 'housingtype9', 'housingtype10', 'housingtype11', 'parking0', 'parking1', 'parking2', 'parking3', 'parking4', 'parking5', 'parking6', 'parking7', 'laundry0', 'laundry1', 'laundry2', 'laundry3', 'laundry4', 'laundry5', 'smoking0', 'smoking1', 'wheelchair0', 'wheelchair1', 'neighborhood0', 'neighborhood1', 'neighborhood2', 'neighborhood3', 'neighborhood4', 'neighborhood5', 'neighborhood6', 'neighborhood7', 'neighborhood8', 'neighborhood9', 'neighborhood10', 'neighborhood11', 'neighborhood12', 'neighborhood13', 'neighborhood14', 'neighborhood15', 'neighborhood16', 'neighborhood17', 'neighborhood18', 'neighborhood19', 'neighborhood20', 'neighborhood21', 'neighborhood22', 'neighborhood23', 'neighborhood24', 'neighborhood25', 'neighborhood26', 'neighborhood27', 'neighborhood28', 'neighborhood29', 'neighborhood30', 'neighborhood31', 'neighborhood32', 'neighborhood33', 'neighborhood34', 'neighborhood35', 'neighborhood36', 'neighborhood37', 'neighborhood38', 'neighborhood39']
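
(In sklearn versions of this vintage, export_graphviz writes a tree.dot file by default. With Graphviz installed, it can be rendered to an image; a sketch that assumes the dot binary is on the PATH:)

import subprocess
subprocess.check_call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png'])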

Up to .88!

So what is our goal now? I'd like to see if adjusting the number of neighborhoods increases the accuracy, and the same for the effect of the number of trees.
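
(For the tree count, the sweep would look something like this sketch, reusing the train/test split from above; the candidate sizes are arbitrary:)

for n in [10, 50, 100, 200]:
    reg = RandomForestRegressor(n_estimators=n)
    reg = reg.fit(features_train, price_train.ravel())
    print n, r2_score(price_test, reg.predict(features_test))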


In [ ]:
def neighborhood_optimizer(dframe, neighborhood_number_range, counter_num):
    # For each candidate cluster count, re-cluster, relabel, retrain a
    # Random Forest counter_num times, and average the held-out R^2 scores.
    XYdf = dframe[(dframe.lat > 45.4) & (dframe.lat < 45.6) & (dframe.long < -122.0) & (dframe.long > -123.5)]
    data = [[XYdf['lat'][i], XYdf['long'][i]] for i in XYdf.index]
    r2_dict = []  # list of (n_clusters, mean R^2) pairs
    for i in neighborhood_number_range:
        counter = counter_num
        average_accuracy_list = []
        while counter > 0:
            km = KMeans(n_clusters=i)
            km.fit(data)
            neighborhoods = km.cluster_centers_.tolist()
            for x in enumerate(neighborhoods):
                x[1].append(x[0])  # tag each center with its cluster index
            neighborhoodlist = []
            for z in dframe.index:
                neighborhoodlist.append(clusterer(dframe['lat'][z], dframe['long'][z], neighborhoods))
            dframecopy = dframe.copy()
            dframecopy['neighborhood'] = Series(neighborhoodlist, index=dframe.index)
            df2 = dframecopy[dframecopy.price < 10000][['bath','bed','feet','dog','cat','content','getphotos', 'hasmap', 'price','neighborhood']].dropna()
            features = df2[['bath','bed','feet','dog','cat','content','getphotos', 'hasmap', 'neighborhood']].values
            price = df2[['price']].values
            features_train, features_test, price_train, price_test = train_test_split(features, price, test_size=0.1)
            reg = RandomForestRegressor()
            reg = reg.fit(features_train, price_train.ravel())
            forest_pred = reg.predict(features_test)
            counter -= 1
            average_accuracy_list.append(r2_score(price_test, forest_pred))
        r2_accuracy = np.mean(average_accuracy_list)
        r2_dict.append((i, r2_accuracy))
    print r2_dict
    return r2_dict

In [ ]:
neighborhood_number_range = range(2, 31, 2)  # Python 2: range returns a list
neighborhood_number_range

In [ ]:
r2_dict = neighborhood_optimizer(dframe,neighborhood_number_range,10)

In [ ]:
r2_dict[0]  # first (n_clusters, mean R^2) pair

In [ ]:
plt.scatter([x[0] for x in r2_dict],[x[1] for x in r2_dict])

Looks like the optimum is right around 10 or 11, and then accuracy starts to drop off. Let's get a little more granular and look at a smaller range.


In [ ]:
neighborhood_number_range = range(7, 15)
neighborhood_number_range

In [ ]:
r2_dict = neighborhood_optimizer(dframe,neighborhood_number_range,10)

In [ ]:
print r2_dict
plt.scatter([x[0] for x in r2_dict],[x[1] for x in r2_dict])

Trying it a few times, it looks like 10, 11, and 12 get the best results at ~.85. Of course, we'll need to redo some of these optimizations after we properly process our data. Hopefully we'll see some more consistency then too.


In [ ]:
r2_dict = neighborhood_optimizer(dframe,[10,11,12],25)

Note #1 to Riley: (From last time) Perhaps look into another regressor? See if there's one that's inherently better at this kind of thing.

Note #2 to Riley: Figure out how to process data so that you don't have to drop null values

Note #3 to Riley: convert categorical data into binary

Note #4 to Riley: I wonder if increasing the number of neighborhoods would become more accurate as we collect more data? Like you could create a bunch of little accurate models instead of a bunch of bigger ones.

Learned: If you plan on using Decision Tree/Random Forest from sklearn, make sure you break your discrete variables out into separate columns and make them binary yes or no (0 or 1).