In this assignment your challenge is to do some basic analysis for Airbnb. Provided in hw/data/ there are 2 data files, bookings.csv and listings.csv. The objective is to practice data munging and begin our exploration of regression.
In [483]:
# Standard imports for data analysis packages in Python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import Image
# This enables inline Plots
%matplotlib inline
In [484]:
# Read the two raw data files into DataFrames for analysis.
bookings = pd.read_csv('../data/bookings.csv')
listings = pd.read_csv('../data/listings.csv')
In [488]:
# Peek at the first rows of listings to see its columns and sample values.
listings.head(n=5)
Out[488]:
In [489]:
# Peek at the first rows of bookings to see its columns and sample values.
bookings.head(n=5)
Out[489]:
In [490]:
# Summary statistics (count, mean, std, quartiles) for price, person capacity,
# picture count, description length and tenure of the properties.
numeric_listing_cols = listings.iloc[:, 3:8]
numeric_listing_cols.describe()
Out[490]:
In [480]:
# Median of the same numeric columns (describe() does not include it).
numeric_listing_cols = listings.iloc[:, 3:8]
numeric_listing_cols.median()
Out[480]:
In [491]:
# Mean price, person capacity, picture count, description length and tenure
# months, grouped by property type.
# BUG FIX: selecting multiple columns on a GroupBy requires a list of names;
# the bare comma form ["a","b"] passes a tuple and was removed in modern pandas.
listings.groupby(["prop_type"])[["price", "person_capacity", "picture_count", "description_length", "tenure_months"]].mean()
Out[491]:
In [492]:
# Same means, grouped by property type AND neighborhood.
# BUG FIX: multi-column selection on a GroupBy needs a list (double brackets);
# the tuple form after groupby was removed in modern pandas.
listings.groupby(["prop_type", "neighborhood"])[["price", "person_capacity", "picture_count", "description_length", "tenure_months"]].mean()
Out[492]:
In [590]:
# Line plot of bookings per day using value_counts().
# BUG FIX: value_counts() returns counts sorted by frequency, so the dates on
# the x-axis were in descending-count order; sort by date before plotting.
bookings.booking_date.value_counts().sort_index().plot(kind='line')
Out[590]:
In [494]:
# Same daily-bookings line plot, built with groupby instead of value_counts
# (groupby sorts by the key, so no extra sorting is needed).
daily_counts = bookings.groupby("booking_date")["booking_date"].count()
daily_counts.plot(kind='line')
Out[494]:
In [495]:
#listmerge.groupby(["neighborhood","booking_date"])["prop_id"].agg(['count']).unstack(0)
In [496]:
# Bookings per property: one row per prop_id, single column number_of_bookings.
# BUG FIX: the original counted the "prop_id" column, then renamed a
# nonexistent "booking_date" column (a no-op), then dropped "prop_id" —
# leaving an empty frame. size() counts rows per group directly and
# to_frame() gives the count column the required name.
nbookings = bookings.groupby("prop_id").size().to_frame("number_of_bookings")
In [509]:
#nbookings
In [499]:
# Index listings by prop_id (under the index name "prop_index") while keeping
# the prop_id column itself, so we can join with nbookings on the index.
# Passing a renamed Series to set_index replaces the original
# temporary-column + inplace two-step with a single idempotent statement.
listings = listings.set_index(listings["prop_id"].rename("prop_index"))
In [507]:
#listings.prop_id
In [503]:
# Attach the per-property booking counts to listings; both frames are
# indexed by prop_id, so this is an index-on-index (left) join.
listings = listings.join(nbookings, how='left')
In [510]:
#listings.head(100)
In [511]:
# New calculated column: bookings per month of tenure for each property.
listings["booking_rate"] = listings["number_of_bookings"].div(listings["tenure_months"])
In [513]:
# Spot-check a single row after the join and the new column.
listings.head(n=1)
Out[513]:
In [473]:
# Well-established properties: at least 10 months of tenure.
# BUG FIX: .copy() makes this an explicit copy rather than a view, so the
# later column assignments on listings_we (dummies, booking_rate_log) do not
# trigger SettingWithCopyWarning or write back into `listings`.
listings_we = listings[listings.tenure_months >= 10].copy()
prop_type and neighborhood are categorical variables; use get_dummies() (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.reshape.get_dummies.html) to transform these columns of categorical data into many columns of boolean values (after applying this function correctly there should be 1 column for every prop_type and 1 column for every neighborhood category).
In [515]:
# One boolean indicator column per property type.
dummies_prop_type = pd.get_dummies(listings_we.prop_type)
In [516]:
# One boolean indicator column per neighborhood.
dummies_neighborhood = pd.get_dummies(listings_we.neighborhood)
In [589]:
# Sanity-check both dummy frames: column names, dtypes and row counts.
for dummies in (dummies_neighborhood, dummies_prop_type):
    dummies.info()
In [530]:
# Merge the property-type indicator columns into the main frame (index join).
listings_we = listings_we.join(dummies_prop_type, how='left')
In [531]:
# Merge the neighborhood indicator columns into the main frame (index join).
listings_we = listings_we.join(dummies_neighborhood, how='left')
In [533]:
listings_we.columns
# Columns that still contain null values after the join:
# booking_rate, number_of_bookings — presumably properties with no rows in
# bookings.csv; verified (by count) in the next cell.
Out[533]:
In [534]:
# Number of rows with a null booking_rate — these must be removed before
# fitting, since it is our target variable.
listings_we.booking_rate.isnull().sum()
Out[534]:
In [535]:
# booking_rate is going to be our predicted variable, so drop the rows where
# it is NaN before building the train/test sets.
listings_we = listings_we.dropna(subset=["booking_rate"])
The predictor (y) is booking_rate; the regressors (X) are everything else, except prop_id, booking_rate, prop_type, neighborhood and number_of_bookings.
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
http://pandas.pydata.org/pandas-docs/stable/basics.html#dropping-labels-from-an-axis
In [536]:
# BUG FIX: sklearn.cross_validation was deprecated in 0.18 and removed in
# 0.20; train_test_split now lives in sklearn.model_selection.
from sklearn.model_selection import train_test_split
In [537]:
# Build the design matrix X (features) and target vector y as numpy arrays.
# The dummy-column names come straight from the data values, hence the spaces.
feature_cols = [
    "price", "person_capacity", "picture_count", "description_length",
    "tenure_months", "Property type 1", "Property type 2", "Property type 3",
    "Neighborhood 11", "Neighborhood 12", "Neighborhood 13", "Neighborhood 14",
    "Neighborhood 15", "Neighborhood 16", "Neighborhood 17", "Neighborhood 18",
    "Neighborhood 19", "Neighborhood 20", "Neighborhood 21", "Neighborhood 4",
    "Neighborhood 5", "Neighborhood 9",
]
features = listings_we[feature_cols].values
target = listings_we["booking_rate"].values
In [538]:
# features.shape -> (n_samples, n_features) of the design matrix X
features.shape
Out[538]:
In [539]:
# target.shape -> (n_samples,): one booking_rate value per listing
target.shape
Out[539]:
In [540]:
# Quick look at the raw target values
target
Out[540]:
In [562]:
# Train/test split with 33% held out for evaluation.
# A fixed random_state makes the split — and every score computed below —
# reproducible across kernel restarts (the original split was unseeded).
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.33, random_state=42)
In [563]:
# Ordinary least squares linear model (default settings)
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
In [564]:
# Fit the linear model on the training split.
lr.fit(X=features_train, y=target_train)
Out[564]:
In [565]:
# Predict booking_rate for the held-out test features using the fitted model.
target_pred = lr.predict(X=features_test)
In [566]:
# Residual sum of squares on the test set.
residuals = target_test - target_pred
sum_sq_model = (residuals ** 2).sum()
sum_sq_model
Out[566]:
In [567]:
# R^2 of the model on the held-out test data (1.0 is perfect; lower is worse).
lr.score(features_test, target_test)
Out[567]:
In [568]:
# Predicted vs actual booking_rate on the test set.
fig, ax = plt.subplots(1, 1)
ax.scatter(target_pred, target_test)
ax.set(xlabel='Predicted', ylabel='Actual')
# Ideal y = x reference line, drawn over the full range of the target.
ax.plot(target, target, 'r')
Out[568]:
The score method gives us the R-squared value: it returns the R² of self.predict(X) with respect to y. The best possible score is 1.0; lower values are worse. Since the score in this case is low, this tells us that the model is not good.
In [568]:
In [569]:
# Explore which columns correlate with booking_rate.
# BUG FIX: pandas.tools.plotting was removed; scatter_matrix now lives in
# pandas.plotting.
from pandas.plotting import scatter_matrix
# Only a small column subset — the full feature list makes an unreadable grid.
scat = scatter_matrix(listings_we[["booking_rate", "person_capacity"]])
In [576]:
# The log of booking_rate looks more normally distributed than the raw rate
# (compare with the histogram of the untransformed rate in the next cell).
booking_rate_log = np.log(listings_we["booking_rate"])
booking_rate_log.hist()
Out[576]:
In [571]:
# Histogram of the original (untransformed) booking rate, for comparison
listings_we["booking_rate"].hist()
Out[571]:
In [585]:
# log(booking_rate) is negative for rates below 1, so shift the values up by
# 5 points to keep the transformed target on the positive axis.
listings_we["booking_rate_log"] = np.log(listings_we["booking_rate"]) + 5
In [586]:
# The shifted histogram (log + 5) below is more normal and on the positive axis
listings_we["booking_rate_log"].hist()
Out[586]:
In [587]:
# Same pipeline as above, but fitting against the shifted log booking rate.
lr_log = LinearRegression()
features_log = listings_we[["price","person_capacity","picture_count","description_length","tenure_months","Property type 1","Property type 2", u'Property type 3', u'Neighborhood 11', u'Neighborhood 12', u'Neighborhood 13', u'Neighborhood 14', u'Neighborhood 15', u'Neighborhood 16', u'Neighborhood 17', u'Neighborhood 18', u'Neighborhood 19', u'Neighborhood 20', u'Neighborhood 21', u'Neighborhood 4', u'Neighborhood 5', u'Neighborhood 9']].values
target_log = listings_we["booking_rate_log"].values
# random_state keeps the split (and the scores below) reproducible.
features_log_train, features_log_test, target_log_train, target_log_test = train_test_split(
    features_log, target_log, test_size=0.33, random_state=42)
lr_log.fit(features_log_train, target_log_train)
target_log_pred = lr_log.predict(features_log_test)
# Residual sum of squares on the held-out set.
sum_sq_model_log = np.sum((target_log_test - target_log_pred) ** 2)
# BUG FIX: Python 2 `print x` statements are syntax errors in Python 3.
print(sum_sq_model_log)
print(lr_log.score(features_log_test, target_log_test))
In [588]:
# Predicted vs actual for the log-transformed model.
fig, ax = plt.subplots(1, 1)
ax.scatter(target_log_pred, target_log_test)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
# BUG FIX: the ideal y = x line must be drawn with the log-scale target;
# the original used `target`, which is on the raw booking_rate scale and
# places the reference line in the wrong part of the axes.
ax.plot(target_log, target_log, 'r')
Out[588]:
In [464]:
# Try an arcsine transform of booking_rate.
# BUG FIX: `np.(...)` was a syntax error; the variable name implies np.arcsin.
# NOTE(review): np.arcsin is only defined on [-1, 1] — booking rates above 1
# will produce NaN; confirm the data range (or clip/rescale) before relying
# on this transform.
booking_rate_arcsin = np.arcsin(listings_we.booking_rate)
In [465]:
# Histogram of the arcsine-transformed booking rate
booking_rate_arcsin.hist()
Out[465]:
In [ ]: