Homework 1 - Data Analysis and Regression

In this assignment your challenge is to do some basic analysis for Airbnb. Provided in hw/data/ there are 2 data files, bookings.csv and listings.csv. The objective is to practice data munging and begin our exploration of regression.



In [2]:

    
# Standard imports for data analysis packages in Python
import pandas as pd
import numpy as np
import seaborn as sns  # for pretty layout of plots
import matplotlib.pyplot as plt

# This enables inline Plots
%matplotlib inline

Part 1 - Data exploration

First, create 2 data frames: `listings` and `bookings` from their respective data files



In [3]:

    
# Load airbnb data from data folder under hw/data/
bookings = pd.read_csv('../data/bookings.csv')
listings = pd.read_csv('../data/listings.csv')

bookings.head(5)
listings.head(5)









    Out[3]:






  
    
      
      prop_id
      prop_type
      neighborhood
      price
      person_capacity
      picture_count
      description_length
      tenure_months
    
  
  
    
      0
       1
       Property type 1
       Neighborhood 14
       140
       3
       11
       232
       30
    
    
      1
       2
       Property type 1
       Neighborhood 14
        95
       2
        3
        37
       29
    
    
      2
       3
       Property type 2
       Neighborhood 16
        95
       2
       16
       172
       29
    
    
      3
       4
       Property type 2
       Neighborhood 13
        90
       2
       19
       472
       28
    
    
      4
       5
       Property type 1
       Neighborhood 15
       125
       5
       21
       442
       28

What is the mean, median and standard deviation of price, person capacity, picture count, description length and tenure of the properties?



In [4]:

    
# Describe the dataset - This gives you a summary of numerical columns

listings.describe()









    Out[4]:






  
    
      
      prop_id
      price
      person_capacity
      picture_count
      description_length
      tenure_months
    
  
  
    
      count
       408.000000
        408.000000
       408.000000
       408.000000
        408.000000
       408.000000
    
    
      mean
       204.500000
        187.806373
         2.997549
        14.389706
        309.159314
         8.487745
    
    
      std
       117.923704
        353.050858
         1.594676
        10.477428
        228.021684
         5.872088
    
    
      min
         1.000000
         39.000000
         1.000000
         1.000000
          0.000000
         1.000000
    
    
      25%
       102.750000
         90.000000
         2.000000
         6.000000
        179.000000
         4.000000
    
    
      50%
       204.500000
        125.000000
         2.000000
        12.000000
        250.000000
         7.000000
    
    
      75%
       306.250000
        199.000000
         4.000000
        20.000000
        389.500000
        13.000000
    
    
      max
       408.000000
       5000.000000
        10.000000
        71.000000
       1969.000000
        30.000000

What what are the mean price, person capacity, picture count, description length and tenure of the properties grouped by property type?



In [5]:

    
# There are 3 different property types. Group by prop_type first and then describe.

listings_prop_type = listings.groupby(listings.prop_type)
listings_prop_type.describe()









    Out[5]:






  
    
      
      
      description_length
      person_capacity
      picture_count
      price
      prop_id
      tenure_months
    
    
      prop_type
      
      
      
      
      
      
      
    
  
  
    
      Property type 1
      count
        269.000000
       269.000000
       269.000000
        269.000000
       269.000000
       269.000000
    
    
      mean
        313.171004
         3.516729
        14.695167
        237.085502
       204.754647
         8.464684
    
    
      std
        214.769141
         1.644955
        10.623651
        425.710534
       119.643061
         5.773367
    
    
      min
         17.000000
         1.000000
         1.000000
         40.000000
         1.000000
         1.000000
    
    
      25%
        193.000000
         2.000000
         6.000000
        120.000000
        95.000000
         4.000000
    
    
      50%
        266.000000
         3.000000
        12.000000
        150.000000
       208.000000
         7.000000
    
    
      75%
        388.000000
         4.000000
        20.000000
        229.000000
       310.000000
        14.000000
    
    
      max
       1719.000000
        10.000000
        71.000000
       5000.000000
       408.000000
        30.000000
    
    
      Property type 2
      count
        135.000000
       135.000000
       135.000000
        135.000000
       135.000000
       135.000000
    
    
      mean
        304.851852
         2.000000
        13.948148
         93.288889
       206.392593
         8.377778
    
    
      std
        255.135332
         0.846415
        10.255191
         42.261246
       114.074805
         5.963654
    
    
      min
          0.000000
         1.000000
         1.000000
         39.000000
         3.000000
         1.000000
    
    
      25%
        150.500000
         2.000000
         6.500000
         69.000000
       120.500000
         4.000000
    
    
      50%
        239.000000
         2.000000
        11.000000
         89.000000
       200.000000
         7.000000
    
    
      75%
        402.500000
         2.000000
        19.500000
         99.000000
       299.500000
        10.500000
    
    
      max
       1969.000000
         6.000000
        56.000000
        350.000000
       405.000000
        29.000000
    
    
      Property type 3
      count
          4.000000
         4.000000
         4.000000
          4.000000
         4.000000
         4.000000
    
    
      mean
        184.750000
         1.750000
         8.750000
         63.750000
       123.500000
        13.750000
    
    
      std
         53.093471
         0.500000
         7.320064
         16.520190
       132.879645
         8.616844
    
    
      min
        113.000000
         1.000000
         1.000000
         40.000000
        10.000000
         5.000000
    
    
      25%
        170.000000
         1.750000
         3.250000
         58.750000
        17.500000
         7.250000
    
    
      50%
        192.500000
         2.000000
         9.500000
         70.000000
        99.000000
        13.500000
    
    
      75%
        207.250000
         2.000000
        15.000000
         75.000000
       205.000000
        20.000000
    
    
      max
        241.000000
         2.000000
        15.000000
         75.000000
       286.000000
        23.000000

Same, but by property type per neighborhood?



In [6]:

    
# There are 3 different property types and 22 neighborhoods. Group by prop_type and neighborhood first and then describe.

listings_prop_type = listings.groupby([listings.prop_type, listings.neighborhood])
listings_prop_type.describe()









    Out[6]:






  
    
      
      
      
      description_length
      person_capacity
      picture_count
      price
      prop_id
      tenure_months
    
    
      prop_type
      neighborhood
      
      
      
      
      
      
      
    
  
  
    
      Property type 1
      Neighborhood 1
      count
          1.000000
        1.000000
        1.000000
         1.000000
         1.000000
        1.000000
    
    
      mean
        209.000000
        2.000000
       26.000000
        85.000000
       235.000000
        6.000000
    
    
      std
               NaN
             NaN
             NaN
              NaN
              NaN
             NaN
    
    
      min
        209.000000
        2.000000
       26.000000
        85.000000
       235.000000
        6.000000
    
    
      25%
        209.000000
        2.000000
       26.000000
        85.000000
       235.000000
        6.000000
    
    
      50%
        209.000000
        2.000000
       26.000000
        85.000000
       235.000000
        6.000000
    
    
      75%
        209.000000
        2.000000
       26.000000
        85.000000
       235.000000
        6.000000
    
    
      max
        209.000000
        2.000000
       26.000000
        85.000000
       235.000000
        6.000000
    
    
      Neighborhood 10
      count
          6.000000
        6.000000
        6.000000
         6.000000
         6.000000
        6.000000
    
    
      mean
        391.000000
        3.500000
       13.333333
       142.500000
       307.500000
        3.833333
    
    
      std
        146.929915
        1.224745
        8.571270
        36.979724
        55.301899
        1.602082
    
    
      min
        160.000000
        2.000000
        4.000000
        90.000000
       255.000000
        1.000000
    
    
      25%
        312.250000
        2.500000
        6.750000
       135.000000
       265.000000
        3.250000
    
    
      50%
        425.500000
        4.000000
       12.000000
       137.500000
       297.000000
        4.500000
    
    
      75%
        499.000000
        4.000000
       20.250000
       147.500000
       330.500000
        5.000000
    
    
      max
        537.000000
        5.000000
       24.000000
       205.000000
       400.000000
        5.000000
    
    
      Neighborhood 11
      count
         14.000000
       14.000000
       14.000000
        14.000000
        14.000000
       14.000000
    
    
      mean
        379.000000
        3.214286
        9.928571
       159.428571
       174.000000
        9.642857
    
    
      std
        396.956111
        1.311404
        5.928605
        70.962302
       110.198422
        5.212517
    
    
      min
         82.000000
        2.000000
        5.000000
        95.000000
        52.000000
        1.000000
    
    
      25%
        242.500000
        2.000000
        6.000000
       103.750000
        70.500000
        6.000000
    
    
      50%
        295.500000
        3.000000
        8.000000
       130.000000
       142.500000
        9.500000
    
    
      75%
        373.000000
        3.750000
        9.750000
       196.500000
       243.500000
       15.500000
    
    
      max
       1719.000000
        6.000000
       23.000000
       319.000000
       401.000000
       16.000000
    
    
      Neighborhood 12
      count
         39.000000
       39.000000
       39.000000
        39.000000
        39.000000
       39.000000
    
    
      mean
        267.205128
        3.435897
       10.820513
       365.615385
       211.307692
        7.897436
    
    
      std
        137.867820
        1.874972
        7.118810
       686.086484
       118.670520
        5.290482
    
    
      min
         45.000000
        1.000000
        1.000000
        60.000000
        22.000000
        1.000000
    
    
      25%
        182.000000
        2.000000
        6.000000
       125.000000
       105.500000
        3.000000
    
    
      50%
        251.000000
        3.000000
        8.000000
       150.000000
       223.000000
        6.000000
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      Property type 3
      Neighborhood 11
      std
               NaN
             NaN
             NaN
              NaN
              NaN
             NaN
    
    
      min
        196.000000
        2.000000
       15.000000
        75.000000
       178.000000
        8.000000
    
    
      25%
        196.000000
        2.000000
       15.000000
        75.000000
       178.000000
        8.000000
    
    
      50%
        196.000000
        2.000000
       15.000000
        75.000000
       178.000000
        8.000000
    
    
      75%
        196.000000
        2.000000
       15.000000
        75.000000
       178.000000
        8.000000
    
    
      max
        196.000000
        2.000000
       15.000000
        75.000000
       178.000000
        8.000000
    
    
      Neighborhood 14
      count
          1.000000
        1.000000
        1.000000
         1.000000
         1.000000
        1.000000
    
    
      mean
        113.000000
        1.000000
        1.000000
        75.000000
       286.000000
        5.000000
    
    
      std
               NaN
             NaN
             NaN
              NaN
              NaN
             NaN
    
    
      min
        113.000000
        1.000000
        1.000000
        75.000000
       286.000000
        5.000000
    
    
      25%
        113.000000
        1.000000
        1.000000
        75.000000
       286.000000
        5.000000
    
    
      50%
        113.000000
        1.000000
        1.000000
        75.000000
       286.000000
        5.000000
    
    
      75%
        113.000000
        1.000000
        1.000000
        75.000000
       286.000000
        5.000000
    
    
      max
        113.000000
        1.000000
        1.000000
        75.000000
       286.000000
        5.000000
    
    
      Neighborhood 17
      count
          1.000000
        1.000000
        1.000000
         1.000000
         1.000000
        1.000000
    
    
      mean
        189.000000
        2.000000
       15.000000
        65.000000
        10.000000
       23.000000
    
    
      std
               NaN
             NaN
             NaN
              NaN
              NaN
             NaN
    
    
      min
        189.000000
        2.000000
       15.000000
        65.000000
        10.000000
       23.000000
    
    
      25%
        189.000000
        2.000000
       15.000000
        65.000000
        10.000000
       23.000000
    
    
      50%
        189.000000
        2.000000
       15.000000
        65.000000
        10.000000
       23.000000
    
    
      75%
        189.000000
        2.000000
       15.000000
        65.000000
        10.000000
       23.000000
    
    
      max
        189.000000
        2.000000
       15.000000
        65.000000
        10.000000
       23.000000
    
    
      Neighborhood 4
      count
          1.000000
        1.000000
        1.000000
         1.000000
         1.000000
        1.000000
    
    
      mean
        241.000000
        2.000000
        4.000000
        40.000000
        20.000000
       19.000000
    
    
      std
               NaN
             NaN
             NaN
              NaN
              NaN
             NaN
    
    
      min
        241.000000
        2.000000
        4.000000
        40.000000
        20.000000
       19.000000
    
    
      25%
        241.000000
        2.000000
        4.000000
        40.000000
        20.000000
       19.000000
    
    
      50%
        241.000000
        2.000000
        4.000000
        40.000000
        20.000000
       19.000000
    
    
      75%
        241.000000
        2.000000
        4.000000
        40.000000
        20.000000
       19.000000
    
    
      max
        241.000000
        2.000000
        4.000000
        40.000000
        20.000000
       19.000000
    
  

320 rows × 6 columns

Plot daily bookings:



In [88]:

    
# datetime field is imported as "object" which is the same as strings
bookings.booking_date = pd.to_datetime(bookings.booking_date)
# create new data frame with date and number of bookings
plot_bookings = bookings['booking_date'].value_counts()
# sort by date
plot_bookings.sort_index()
plot_bookings.plot()









    Out[88]:





<matplotlib.axes._subplots.AxesSubplot at 0x10a341310>



In [70]:

    
###Plot the daily bookings per neighborhood (provide a legend)



In [91]:

    
listings.neighborhood.value_counts()









    Out[91]:





Neighborhood 13    72
Neighborhood 12    58
Neighborhood 14    56
Neighborhood 15    40
Neighborhood 17    35
Neighborhood 18    31
Neighborhood 16    30
Neighborhood 11    19
Neighborhood 19    16
Neighborhood 20    10
Neighborhood 9      9
Neighborhood 10     8
Neighborhood 8      5
Neighborhood 7      4
Neighborhood 21     4
Neighborhood 6      3
Neighborhood 4      2
Neighborhood 5      2
Neighborhood 22     1
Neighborhood 2      1
Neighborhood 3      1
Neighborhood 1      1
dtype: int64

Part 2 - Develop a data set



In [ ]:

Add the columns `number_of_bookings` and `booking_rate` (number_of_bookings/tenure_months) to your `listings` data frame



In [36]:

    
bookings_by_prop = bookings.groupby('prop_id')[['prop_id']].count()
bookings_by_prop.rename(columns={'prop_id': 'number_of_bookings'}, inplace=True)
bookings_by_prop.reset_index(inplace=True)
bookings_by_prop.head()

listings.merge(bookings_by_prop, on='prop_id', how='left')

#def get_booking_rate(numb):   
    #return numb
    
#listings['booking_rate'] = listings['number_of_bookings'].map(get_booking_rate)









    Out[36]:






  
    
      
      prop_id
      prop_type
      neighborhood
      price
      person_capacity
      picture_count
      description_length
      tenure_months
      number_of_bookings
    
  
  
    
      0  
         1
       Property type 1
       Neighborhood 14
       140
       3
       11
        232
       30
        4
    
    
      1  
         2
       Property type 1
       Neighborhood 14
        95
       2
        3
         37
       29
      NaN
    
    
      2  
         3
       Property type 2
       Neighborhood 16
        95
       2
       16
        172
       29
        1
    
    
      3  
         4
       Property type 2
       Neighborhood 13
        90
       2
       19
        472
       28
       27
    
    
      4  
         5
       Property type 1
       Neighborhood 15
       125
       5
       21
        442
       28
      NaN
    
    
      5  
         6
       Property type 2
       Neighborhood 13
        89
       2
       10
        886
       28
       88
    
    
      6  
         7
       Property type 2
       Neighborhood 13
        85
       1
       11
         58
       24
        2
    
    
      7  
         8
       Property type 1
       Neighborhood 18
       120
       2
       13
        685
       23
       28
    
    
      8  
         9
       Property type 1
       Neighborhood 13
       210
       6
       27
        180
       23
        3
    
    
      9  
        10
       Property type 3
       Neighborhood 17
        65
       2
       15
        189
       23
       26
    
    
      10 
        11
       Property type 1
       Neighborhood 13
       145
       3
        9
        140
       22
        7
    
    
      11 
        12
       Property type 2
       Neighborhood 13
        89
       2
       11
        153
       22
       45
    
    
      12 
        13
       Property type 2
       Neighborhood 14
        96
       2
       10
        245
       20
       57
    
    
      13 
        14
       Property type 2
       Neighborhood 17
        95
       2
        8
        139
       20
       31
    
    
      14 
        15
       Property type 2
       Neighborhood 11
        95
       2
       15
        255
       19
       12
    
    
      15 
        16
       Property type 2
       Neighborhood 14
        95
       2
       39
        334
       19
       44
    
    
      16 
        17
       Property type 2
       Neighborhood 12
        65
       2
        7
        333
       19
        7
    
    
      17 
        18
       Property type 2
       Neighborhood 14
       100
       2
        5
         39
       19
      NaN
    
    
      18 
        19
       Property type 2
       Neighborhood 12
        65
       2
        9
        448
       19
        1
    
    
      19 
        20
       Property type 3
        Neighborhood 4
        40
       2
        4
        241
       19
        9
    
    
      20 
        21
       Property type 1
       Neighborhood 13
       295
       5
       22
        228
       18
        9
    
    
      21 
        22
       Property type 1
       Neighborhood 12
       275
       4
       20
        523
       18
      NaN
    
    
      22 
        23
       Property type 2
       Neighborhood 14
        90
       3
       15
        411
       18
       52
    
    
      23 
        24
       Property type 1
       Neighborhood 16
       110
       2
       10
        495
       17
       26
    
    
      24 
        25
       Property type 1
       Neighborhood 18
       215
       2
       16
        190
       17
        3
    
    
      25 
        26
       Property type 1
       Neighborhood 14
       139
       2
       20
        395
       17
        2
    
    
      26 
        27
       Property type 1
       Neighborhood 15
       180
       6
       26
        325
       17
       68
    
    
      27 
        28
       Property type 2
       Neighborhood 12
        95
       2
        8
        137
       16
        8
    
    
      28 
        29
       Property type 1
       Neighborhood 17
       125
       3
       31
        327
       16
        2
    
    
      29 
        30
       Property type 2
       Neighborhood 17
        90
       3
        5
        582
       16
        9
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      378
       379
       Property type 1
       Neighborhood 18
        79
       2
        5
        348
        2
        3
    
    
      379
       380
       Property type 1
       Neighborhood 14
       145
       4
        9
        253
        2
       33
    
    
      380
       381
       Property type 1
       Neighborhood 16
       159
       3
       17
        206
        2
       23
    
    
      381
       382
       Property type 1
       Neighborhood 19
       120
       4
       19
        203
        2
        4
    
    
      382
       383
       Property type 1
       Neighborhood 14
       200
       4
       13
        116
        2
      NaN
    
    
      383
       384
       Property type 1
       Neighborhood 14
       425
       6
       13
        449
        2
        4
    
    
      384
       385
       Property type 1
       Neighborhood 13
       110
       4
       22
        464
        1
        9
    
    
      385
       386
       Property type 1
       Neighborhood 14
        97
       2
        5
        195
        1
       52
    
    
      386
       387
       Property type 1
       Neighborhood 14
       125
       2
        6
        425
        1
       43
    
    
      387
       388
       Property type 1
       Neighborhood 15
       229
       4
        5
        301
        1
      NaN
    
    
      388
       389
       Property type 2
       Neighborhood 17
        39
       1
       21
        247
        1
        8
    
    
      389
       390
       Property type 1
       Neighborhood 12
       150
       2
        9
        141
        1
       10
    
    
      390
       391
       Property type 1
       Neighborhood 14
        95
       2
        8
        217
        1
       33
    
    
      391
       392
       Property type 2
       Neighborhood 17
       105
       2
       26
        224
        1
       18
    
    
      392
       393
       Property type 1
       Neighborhood 19
       319
       6
        8
        213
        1
       16
    
    
      393
       394
       Property type 2
       Neighborhood 15
       190
       4
       12
        179
        1
        6
    
    
      394
       395
       Property type 1
       Neighborhood 15
       120
       2
        5
        192
        1
       12
    
    
      395
       396
       Property type 1
       Neighborhood 13
       120
       4
        4
        374
        1
      NaN
    
    
      396
       397
       Property type 1
       Neighborhood 13
       139
       4
       13
        303
        1
       24
    
    
      397
       398
       Property type 1
       Neighborhood 12
        60
       2
       10
        102
        1
        1
    
    
      398
       399
       Property type 2
       Neighborhood 16
       100
       1
        6
         81
        1
      NaN
    
    
      399
       400
       Property type 1
       Neighborhood 10
        90
       2
       23
        523
        1
        9
    
    
      400
       401
       Property type 1
       Neighborhood 11
       150
       4
        8
         82
        1
        2
    
    
      401
       402
       Property type 1
        Neighborhood 7
       300
       4
       19
        550
        1
        5
    
    
      402
       403
       Property type 1
       Neighborhood 19
       100
       4
        2
         69
        1
        1
    
    
      403
       404
       Property type 2
       Neighborhood 14
       100
       1
        8
        235
        1
        3
    
    
      404
       405
       Property type 2
       Neighborhood 13
        85
       2
       27
       1048
        1
       19
    
    
      405
       406
       Property type 1
        Neighborhood 9
        70
       3
       18
        153
        1
       19
    
    
      406
       407
       Property type 1
       Neighborhood 13
       129
       2
       13
        370
        1
       15
    
    
      407
       408
       Property type 1
       Neighborhood 14
       100
       3
       21
        707
        1
       54
    
  

408 rows × 9 columns

We only want to analyze well established properties, so let's filter out any properties that have a tenure less than 10 months



In [35]:

    
listings_filtered = listings[listings.tenure_months >= 10]
listings_filtered.describe()









    Out[35]:






  
    
      
      prop_id
      price
      person_capacity
      picture_count
      description_length
      tenure_months
    
  
  
    
      count
       144.000000
        144.000000
       144.000000
       144.000000
        144.000000
       144.000000
    
    
      mean
        72.500000
        188.895833
         3.354167
        14.104167
        308.645833
        15.173611
    
    
      std
        41.713307
        215.771464
         1.837712
        10.890246
        177.770044
         4.215074
    
    
      min
         1.000000
         40.000000
         1.000000
         1.000000
          0.000000
        10.000000
    
    
      25%
        36.750000
         93.750000
         2.000000
         6.000000
        198.500000
        12.000000
    
    
      50%
        72.500000
        135.000000
         3.000000
        11.000000
        282.500000
        15.000000
    
    
      75%
       108.250000
        245.250000
         4.000000
        18.250000
        388.000000
        16.000000
    
    
      max
       144.000000
       2394.000000
         8.000000
        71.000000
       1111.000000
        30.000000

`prop_type` and `neighborhood` are categorical variables, use `get_dummies()` (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.reshape.get_dummies.html) to transform this column of categorical data to many columns of boolean values (after applying this function correctly there should be 1 column for every prop_type and 1 column for every neighborhood category.



In [81]:

    
import pandas as pd
dummies1 = pd.get_dummies(listings['neighborhood'])
dummies1

dummies2 = pd.get_dummies(listings['prop_type'])
dummies2









    Out[81]:






  
    
      
      Property type 1
      Property type 2
      Property type 3
    
  
  
    
      0  
       1
       0
       0
    
    
      1  
       1
       0
       0
    
    
      2  
       0
       1
       0
    
    
      3  
       0
       1
       0
    
    
      4  
       1
       0
       0
    
    
      5  
       0
       1
       0
    
    
      6  
       0
       1
       0
    
    
      7  
       1
       0
       0
    
    
      8  
       1
       0
       0
    
    
      9  
       0
       0
       1
    
    
      10 
       1
       0
       0
    
    
      11 
       0
       1
       0
    
    
      12 
       0
       1
       0
    
    
      13 
       0
       1
       0
    
    
      14 
       0
       1
       0
    
    
      15 
       0
       1
       0
    
    
      16 
       0
       1
       0
    
    
      17 
       0
       1
       0
    
    
      18 
       0
       1
       0
    
    
      19 
       0
       0
       1
    
    
      20 
       1
       0
       0
    
    
      21 
       1
       0
       0
    
    
      22 
       0
       1
       0
    
    
      23 
       1
       0
       0
    
    
      24 
       1
       0
       0
    
    
      25 
       1
       0
       0
    
    
      26 
       1
       0
       0
    
    
      27 
       0
       1
       0
    
    
      28 
       1
       0
       0
    
    
      29 
       0
       1
       0
    
    
      ...
      ...
      ...
      ...
    
    
      378
       1
       0
       0
    
    
      379
       1
       0
       0
    
    
      380
       1
       0
       0
    
    
      381
       1
       0
       0
    
    
      382
       1
       0
       0
    
    
      383
       1
       0
       0
    
    
      384
       1
       0
       0
    
    
      385
       1
       0
       0
    
    
      386
       1
       0
       0
    
    
      387
       1
       0
       0
    
    
      388
       0
       1
       0
    
    
      389
       1
       0
       0
    
    
      390
       1
       0
       0
    
    
      391
       0
       1
       0
    
    
      392
       1
       0
       0
    
    
      393
       0
       1
       0
    
    
      394
       1
       0
       0
    
    
      395
       1
       0
       0
    
    
      396
       1
       0
       0
    
    
      397
       1
       0
       0
    
    
      398
       0
       1
       0
    
    
      399
       1
       0
       0
    
    
      400
       1
       0
       0
    
    
      401
       1
       0
       0
    
    
      402
       1
       0
       0
    
    
      403
       0
       1
       0
    
    
      404
       0
       1
       0
    
    
      405
       1
       0
       0
    
    
      406
       1
       0
       0
    
    
      407
       1
       0
       0
    
  

408 rows × 3 columns

create test and training sets for your regressors and predictors

predictor (y) is booking_rate, regressors (X) are everything else, except prop_id,booking_rate,prop_type,neighborhood and number_of_bookings
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
http://pandas.pydata.org/pandas-docs/stable/basics.html#dropping-labels-from-an-axis



In [ ]:

    
from sklearn.cross_validation import train_test_split



In [ ]:

Part 3 - Model `booking_rate`

Create a linear regression model of your listings



In [ ]:

    
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

fit your model with your test sets



In [ ]:

report the score

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.score



In [ ]:

Interpret the results of the above model:

What does the score method do?
What does this tell us about our model?

...type here...

Optional - Iterate

Create an alternative predictor (e.g. monthly revenue) and use the same modeling pattern in Part 3 to



In [ ]:

	prop_id	prop_type	neighborhood	price	person_capacity	picture_count	description_length	tenure_months
0	1	Property type 1	Neighborhood 14	140	3	11	232	30
1	2	Property type 1	Neighborhood 14	95	2	3	37	29
2	3	Property type 2	Neighborhood 16	95	2	16	172	29
3	4	Property type 2	Neighborhood 13	90	2	19	472	28
4	5	Property type 1	Neighborhood 15	125	5	21	442	28

	prop_id	price	person_capacity	picture_count	description_length	tenure_months
count	408.000000	408.000000	408.000000	408.000000	408.000000	408.000000
mean	204.500000	187.806373	2.997549	14.389706	309.159314	8.487745
std	117.923704	353.050858	1.594676	10.477428	228.021684	5.872088
min	1.000000	39.000000	1.000000	1.000000	0.000000	1.000000
25%	102.750000	90.000000	2.000000	6.000000	179.000000	4.000000
50%	204.500000	125.000000	2.000000	12.000000	250.000000	7.000000
75%	306.250000	199.000000	4.000000	20.000000	389.500000	13.000000
max	408.000000	5000.000000	10.000000	71.000000	1969.000000	30.000000

		description_length	person_capacity	picture_count	price	prop_id	tenure_months
prop_type
Property type 1	count	269.000000	269.000000	269.000000	269.000000	269.000000	269.000000
	mean	313.171004	3.516729	14.695167	237.085502	204.754647	8.464684
	std	214.769141	1.644955	10.623651	425.710534	119.643061	5.773367
	min	17.000000	1.000000	1.000000	40.000000	1.000000	1.000000
	25%	193.000000	2.000000	6.000000	120.000000	95.000000	4.000000
	50%	266.000000	3.000000	12.000000	150.000000	208.000000	7.000000
	75%	388.000000	4.000000	20.000000	229.000000	310.000000	14.000000
	max	1719.000000	10.000000	71.000000	5000.000000	408.000000	30.000000
Property type 2	count	135.000000	135.000000	135.000000	135.000000	135.000000	135.000000
	mean	304.851852	2.000000	13.948148	93.288889	206.392593	8.377778
	std	255.135332	0.846415	10.255191	42.261246	114.074805	5.963654
	min	0.000000	1.000000	1.000000	39.000000	3.000000	1.000000
	25%	150.500000	2.000000	6.500000	69.000000	120.500000	4.000000
	50%	239.000000	2.000000	11.000000	89.000000	200.000000	7.000000
	75%	402.500000	2.000000	19.500000	99.000000	299.500000	10.500000
	max	1969.000000	6.000000	56.000000	350.000000	405.000000	29.000000
Property type 3	count	4.000000	4.000000	4.000000	4.000000	4.000000	4.000000
	mean	184.750000	1.750000	8.750000	63.750000	123.500000	13.750000
	std	53.093471	0.500000	7.320064	16.520190	132.879645	8.616844
	min	113.000000	1.000000	1.000000	40.000000	10.000000	5.000000
	25%	170.000000	1.750000	3.250000	58.750000	17.500000	7.250000
	50%	192.500000	2.000000	9.500000	70.000000	99.000000	13.500000
	75%	207.250000	2.000000	15.000000	75.000000	205.000000	20.000000
	max	241.000000	2.000000	15.000000	75.000000	286.000000	23.000000

			description_length	person_capacity	picture_count	price	prop_id	tenure_months
prop_type	neighborhood
Property type 1	Neighborhood 1	count	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
		mean	209.000000	2.000000	26.000000	85.000000	235.000000	6.000000
		std	NaN	NaN	NaN	NaN	NaN	NaN
		min	209.000000	2.000000	26.000000	85.000000	235.000000	6.000000
		25%	209.000000	2.000000	26.000000	85.000000	235.000000	6.000000
		50%	209.000000	2.000000	26.000000	85.000000	235.000000	6.000000
		75%	209.000000	2.000000	26.000000	85.000000	235.000000	6.000000
		max	209.000000	2.000000	26.000000	85.000000	235.000000	6.000000
	Neighborhood 10	count	6.000000	6.000000	6.000000	6.000000	6.000000	6.000000
		mean	391.000000	3.500000	13.333333	142.500000	307.500000	3.833333
		std	146.929915	1.224745	8.571270	36.979724	55.301899	1.602082
		min	160.000000	2.000000	4.000000	90.000000	255.000000	1.000000
		25%	312.250000	2.500000	6.750000	135.000000	265.000000	3.250000
		50%	425.500000	4.000000	12.000000	137.500000	297.000000	4.500000
		75%	499.000000	4.000000	20.250000	147.500000	330.500000	5.000000
		max	537.000000	5.000000	24.000000	205.000000	400.000000	5.000000
	Neighborhood 11	count	14.000000	14.000000	14.000000	14.000000	14.000000	14.000000
		mean	379.000000	3.214286	9.928571	159.428571	174.000000	9.642857
		std	396.956111	1.311404	5.928605	70.962302	110.198422	5.212517
		min	82.000000	2.000000	5.000000	95.000000	52.000000	1.000000
		25%	242.500000	2.000000	6.000000	103.750000	70.500000	6.000000
		50%	295.500000	3.000000	8.000000	130.000000	142.500000	9.500000
		75%	373.000000	3.750000	9.750000	196.500000	243.500000	15.500000
		max	1719.000000	6.000000	23.000000	319.000000	401.000000	16.000000
	Neighborhood 12	count	39.000000	39.000000	39.000000	39.000000	39.000000	39.000000
		mean	267.205128	3.435897	10.820513	365.615385	211.307692	7.897436
		std	137.867820	1.874972	7.118810	686.086484	118.670520	5.290482
		min	45.000000	1.000000	1.000000	60.000000	22.000000	1.000000
		25%	182.000000	2.000000	6.000000	125.000000	105.500000	3.000000
		50%	251.000000	3.000000	8.000000	150.000000	223.000000	6.000000
...	...	...	...	...	...	...	...	...
Property type 3	Neighborhood 11	std	NaN	NaN	NaN	NaN	NaN	NaN
		min	196.000000	2.000000	15.000000	75.000000	178.000000	8.000000
		25%	196.000000	2.000000	15.000000	75.000000	178.000000	8.000000
		50%	196.000000	2.000000	15.000000	75.000000	178.000000	8.000000
		75%	196.000000	2.000000	15.000000	75.000000	178.000000	8.000000
		max	196.000000	2.000000	15.000000	75.000000	178.000000	8.000000
	Neighborhood 14	count	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
		mean	113.000000	1.000000	1.000000	75.000000	286.000000	5.000000
		std	NaN	NaN	NaN	NaN	NaN	NaN
		min	113.000000	1.000000	1.000000	75.000000	286.000000	5.000000
		25%	113.000000	1.000000	1.000000	75.000000	286.000000	5.000000
		50%	113.000000	1.000000	1.000000	75.000000	286.000000	5.000000
		75%	113.000000	1.000000	1.000000	75.000000	286.000000	5.000000
		max	113.000000	1.000000	1.000000	75.000000	286.000000	5.000000
	Neighborhood 17	count	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
		mean	189.000000	2.000000	15.000000	65.000000	10.000000	23.000000
		std	NaN	NaN	NaN	NaN	NaN	NaN
		min	189.000000	2.000000	15.000000	65.000000	10.000000	23.000000
		25%	189.000000	2.000000	15.000000	65.000000	10.000000	23.000000
		50%	189.000000	2.000000	15.000000	65.000000	10.000000	23.000000
		75%	189.000000	2.000000	15.000000	65.000000	10.000000	23.000000
		max	189.000000	2.000000	15.000000	65.000000	10.000000	23.000000
	Neighborhood 4	count	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
		mean	241.000000	2.000000	4.000000	40.000000	20.000000	19.000000
		std	NaN	NaN	NaN	NaN	NaN	NaN
		min	241.000000	2.000000	4.000000	40.000000	20.000000	19.000000
		25%	241.000000	2.000000	4.000000	40.000000	20.000000	19.000000
		50%	241.000000	2.000000	4.000000	40.000000	20.000000	19.000000
		75%	241.000000	2.000000	4.000000	40.000000	20.000000	19.000000
		max	241.000000	2.000000	4.000000	40.000000	20.000000	19.000000

	prop_id	price	person_capacity	picture_count	description_length	tenure_months
count	144.000000	144.000000	144.000000	144.000000	144.000000	144.000000
mean	72.500000	188.895833	3.354167	14.104167	308.645833	15.173611
std	41.713307	215.771464	1.837712	10.890246	177.770044	4.215074
min	1.000000	40.000000	1.000000	1.000000	0.000000	10.000000
25%	36.750000	93.750000	2.000000	6.000000	198.500000	12.000000
50%	72.500000	135.000000	3.000000	11.000000	282.500000	15.000000
75%	108.250000	245.250000	4.000000	18.250000	388.000000	16.000000
max	144.000000	2394.000000	8.000000	71.000000	1111.000000	30.000000

	Property type 1	Property type 2	Property type 3
0	1	0	0
1	1	0	0
2	0	1	0
3	0	1	0
4	1	0	0
5	0	1	0
6	0	1	0
7	1	0	0
8	1	0	0
9	0	0	1
10	1	0	0
11	0	1	0
12	0	1	0
13	0	1	0
14	0	1	0
15	0	1	0
16	0	1	0
17	0	1	0
18	0	1	0
19	0	0	1
20	1	0	0
21	1	0	0
22	0	1	0
23	1	0	0
24	1	0	0
25	1	0	0
26	1	0	0
27	0	1	0
28	1	0	0
29	0	1	0
...	...	...	...
378	1	0	0
379	1	0	0
380	1	0	0
381	1	0	0
382	1	0	0
383	1	0	0
384	1	0	0
385	1	0	0
386	1	0	0
387	1	0	0
388	0	1	0
389	1	0	0
390	1	0	0
391	0	1	0
392	1	0	0
393	0	1	0
394	1	0	0
395	1	0	0
396	1	0	0
397	1	0	0
398	0	1	0
399	1	0	0
400	1	0	0
401	1	0	0
402	1	0	0
403	0	1	0
404	0	1	0
405	1	0	0
406	1	0	0
407	1	0	0

	Property type 1	Property type 2	Property type 3
0	1	0	0
1	1	0	0
2	0	1	0
3	0	1	0
4	1	0	0
5	0	1	0
6	0	1	0
7	1	0	0
8	1	0	0
9	0	0	1
10	1	0	0
11	0	1	0
12	0	1	0
13	0	1	0
14	0	1	0
15	0	1	0
16	0	1	0
17	0	1	0
18	0	1	0
19	0	0	1
20	1	0	0
21	1	0	0
22	0	1	0
23	1	0	0
24	1	0	0
25	1	0	0
26	1	0	0
27	0	1	0
28	1	0	0
29	0	1	0
...	...	...	...
378	1	0	0
379	1	0	0
380	1	0	0
381	1	0	0
382	1	0	0
383	1	0	0
384	1	0	0
385	1	0	0
386	1	0	0
387	1	0	0
388	0	1	0
389	1	0	0
390	1	0	0
391	0	1	0
392	1	0	0
393	0	1	0
394	1	0	0
395	1	0	0
396	1	0	0
397	1	0	0
398	0	1	0
399	1	0	0
400	1	0	0
401	1	0	0
402	1	0	0
403	0	1	0
404	0	1	0
405	1	0	0
406	1	0	0
407	1	0	0