Homework 1 - Data Analysis and Regression

In this assignment your challenge is to do some basic analysis for Airbnb. Provided in hw/data/ there are 2 data files, bookings.csv and listings.csv. The objective is to practice data munging and begin our exploration of regression.


In [1]:
!ls ../data
# Standard imports for data analysis packages in Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint  # for pretty printing

# This enables inline Plots
%matplotlib inline

# Limit rows displayed in notebook
pd.set_option('display.max_rows', 10)
pd.set_option('display.precision', 2)


bookings.csv listings.csv

Part 1 - Data exploration

First, create 2 data frames: listings and bookings from their respective data files


In [2]:
listings = pd.read_csv('../data/listings.csv')
listings


Out[2]:
prop_id prop_type neighborhood price person_capacity picture_count description_length tenure_months
0 1 Property type 1 Neighborhood 14 140 3 11 232 30
1 2 Property type 1 Neighborhood 14 95 2 3 37 29
2 3 Property type 2 Neighborhood 16 95 2 16 172 29
3 4 Property type 2 Neighborhood 13 90 2 19 472 28
4 5 Property type 1 Neighborhood 15 125 5 21 442 28
... ... ... ... ... ... ... ... ...
403 404 Property type 2 Neighborhood 14 100 1 8 235 1
404 405 Property type 2 Neighborhood 13 85 2 27 1048 1
405 406 Property type 1 Neighborhood 9 70 3 18 153 1
406 407 Property type 1 Neighborhood 13 129 2 13 370 1
407 408 Property type 1 Neighborhood 14 100 3 21 707 1

408 rows × 8 columns


In [3]:
bookings = pd.read_csv('../data/bookings.csv', parse_dates=['booking_date'])
bookings


Out[3]:
prop_id booking_date
0 9 2011-06-17
1 13 2011-08-12
2 21 2011-06-20
3 28 2011-05-05
4 29 2011-11-17
... ... ...
6071 408 2011-06-02
6072 408 2011-08-22
6073 408 2011-07-24
6074 408 2011-01-12
6075 408 2011-09-08

6076 rows × 2 columns

What are the mean, median and standard deviation of price, person capacity, picture count, description length and tenure of the properties?


In [4]:
listings.describe()


Out[4]:
prop_id price person_capacity picture_count description_length tenure_months
count 408.0 408.0 408.0 408.0 408.0 408.0
mean 204.5 187.8 3.0 14.4 309.2 8.5
std 117.9 353.1 1.6 10.5 228.0 5.9
min 1.0 39.0 1.0 1.0 0.0 1.0
25% 102.8 90.0 2.0 6.0 179.0 4.0
50% 204.5 125.0 2.0 12.0 250.0 7.0
75% 306.2 199.0 4.0 20.0 389.5 13.0
max 408.0 5000.0 10.0 71.0 1969.0 30.0
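
describe() reports the median only as the 50% row. As a minimal sketch, the three requested statistics can also be asked for by name with agg(). The frame below is a stand-in built from the first five rows shown in Out[2]; the names toy_listings and summary are illustrative and not part of the assignment.

```python
import pandas as pd

# Stand-in for listings.csv: the first five rows shown in Out[2].
toy_listings = pd.DataFrame({
    "price": [140, 95, 95, 90, 125],
    "person_capacity": [3, 2, 2, 2, 5],
    "picture_count": [11, 3, 16, 19, 21],
    "description_length": [232, 37, 172, 472, 442],
    "tenure_months": [30, 29, 29, 28, 28],
})

# One row per statistic, one column per variable.
summary = toy_listings.agg(["mean", "median", "std"])
print(summary)
```

On the real data the same call would be applied to listings after dropping prop_id, which is an identifier rather than a measurement.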

What are the mean price, person capacity, picture count, description length and tenure of the properties grouped by property type?


In [5]:
pd.set_option('display.max_rows', 30)
listings.groupby('prop_type').describe()


Out[5]:
description_length person_capacity picture_count price prop_id tenure_months
prop_type
Property type 1 count 269.0 269.0 269.0 269.0 269.0 269.0
mean 313.2 3.5 14.7 237.1 204.8 8.5
std 214.8 1.6 10.6 425.7 119.6 5.8
min 17.0 1.0 1.0 40.0 1.0 1.0
25% 193.0 2.0 6.0 120.0 95.0 4.0
50% 266.0 3.0 12.0 150.0 208.0 7.0
75% 388.0 4.0 20.0 229.0 310.0 14.0
max 1719.0 10.0 71.0 5000.0 408.0 30.0
Property type 2 count 135.0 135.0 135.0 135.0 135.0 135.0
mean 304.9 2.0 13.9 93.3 206.4 8.4
std 255.1 0.8 10.3 42.3 114.1 6.0
min 0.0 1.0 1.0 39.0 3.0 1.0
25% 150.5 2.0 6.5 69.0 120.5 4.0
50% 239.0 2.0 11.0 89.0 200.0 7.0
75% 402.5 2.0 19.5 99.0 299.5 10.5
max 1969.0 6.0 56.0 350.0 405.0 29.0
Property type 3 count 4.0 4.0 4.0 4.0 4.0 4.0
mean 184.8 1.8 8.8 63.8 123.5 13.8
std 53.1 0.5 7.3 16.5 132.9 8.6
min 113.0 1.0 1.0 40.0 10.0 5.0
25% 170.0 1.8 3.2 58.8 17.5 7.2
50% 192.5 2.0 9.5 70.0 99.0 13.5
75% 207.2 2.0 15.0 75.0 205.0 20.0
max 241.0 2.0 15.0 75.0 286.0 23.0
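
The question asks only for means, so the full describe() output is more than is needed. A minimal sketch, again using the first five rows from Out[2] as a stand-in (toy_listings, numeric_cols and means_by_type are illustrative names):

```python
import pandas as pd

# Stand-in for listings.csv: the first five rows shown in Out[2].
toy_listings = pd.DataFrame({
    "prop_type": ["Property type 1", "Property type 1", "Property type 2",
                  "Property type 2", "Property type 1"],
    "price": [140, 95, 95, 90, 125],
    "person_capacity": [3, 2, 2, 2, 5],
})

# Select the numeric columns explicitly so identifiers and text columns
# are not averaged by accident.
numeric_cols = ["price", "person_capacity"]
means_by_type = toy_listings.groupby("prop_type")[numeric_cols].mean()
print(means_by_type)
```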

Same, but by property type per neighborhood?


In [6]:
pd.set_option('display.max_rows', 40)
listings.groupby(['neighborhood','prop_type']).agg('mean')


Out[6]:
prop_id price person_capacity picture_count description_length tenure_months
neighborhood prop_type
Neighborhood 1 Property type 1 235.0 85.0 2.0 26.0 209.0 6.0
Neighborhood 10 Property type 1 307.5 142.5 3.5 13.3 391.0 3.8
Property type 2 327.0 137.5 2.0 20.0 126.0 3.5
Neighborhood 11 Property type 1 174.0 159.4 3.2 9.9 379.0 9.6
Property type 2 146.2 78.8 2.0 16.8 161.2 11.2
Property type 3 178.0 75.0 2.0 15.0 196.0 8.0
Neighborhood 12 Property type 1 211.3 365.6 3.4 10.8 267.2 7.9
Property type 2 164.3 96.9 1.9 10.5 244.5 9.8
Neighborhood 13 Property type 1 190.1 241.9 4.1 15.7 290.4 9.1
Property type 2 199.0 81.1 1.8 16.7 418.6 9.7
Neighborhood 14 Property type 1 220.8 164.7 3.2 14.8 317.2 8.4
Property type 2 195.0 83.8 1.9 15.9 348.6 8.7
Property type 3 286.0 75.0 1.0 1.0 113.0 5.0
Neighborhood 15 Property type 1 191.6 178.9 3.7 14.3 321.8 9.3
Property type 2 194.7 95.0 2.3 11.7 301.7 8.2
Neighborhood 16 Property type 1 233.0 158.9 2.9 21.6 310.7 7.1
Property type 2 251.6 83.6 2.1 15.4 246.2 6.7
Neighborhood 17 Property type 1 166.0 189.9 3.5 16.1 317.3 9.9
Property type 2 242.2 102.5 2.0 15.5 308.3 7.2
Property type 3 10.0 65.0 2.0 15.0 189.0 23.0
Neighborhood 18 Property type 1 210.0 173.6 3.0 16.1 369.2 8.2
Property type 2 179.3 120.7 2.2 12.3 297.8 9.2
Neighborhood 19 Property type 1 253.2 222.4 3.6 11.0 254.5 6.5
Property type 2 256.8 88.9 2.0 15.1 383.4 5.5
Neighborhood 2 Property type 1 244.0 250.0 6.0 8.0 423.0 6.0
Neighborhood 20 Property type 1 174.1 804.3 2.8 9.4 223.6 9.7
Property type 2 230.0 60.0 1.0 3.0 101.0 6.0
Neighborhood 21 Property type 1 79.2 362.5 4.2 49.0 306.2 14.8
Neighborhood 22 Property type 1 162.0 225.0 3.0 19.0 500.0 9.0
Neighborhood 3 Property type 2 166.0 60.0 2.0 7.0 264.0 9.0
Neighborhood 4 Property type 2 118.0 60.0 2.0 10.0 95.0 11.0
Property type 3 20.0 40.0 2.0 4.0 241.0 19.0
Neighborhood 5 Property type 1 132.5 194.5 2.5 8.5 266.5 11.5
Neighborhood 6 Property type 1 291.3 146.0 3.3 12.7 290.7 4.0
Neighborhood 7 Property type 1 273.3 161.0 3.7 14.3 343.0 5.3
Property type 2 365.0 100.0 2.0 3.0 148.0 2.0
Neighborhood 8 Property type 1 218.2 174.8 5.0 11.0 300.0 6.8
Property type 2 343.0 350.0 4.0 5.0 223.0 3.0
Neighborhood 9 Property type 1 265.9 151.1 4.3 13.4 471.4 5.7
Property type 2 165.5 110.0 2.0 3.5 114.5 9.0

Plot daily bookings:


In [7]:
dailybookings = bookings.copy()  # copy so the original bookings frame is not modified
dailybookings['day'] = dailybookings.booking_date.dt.strftime("%Y-%m-%d")
dailybookings['month'] = dailybookings.booking_date.dt.strftime("%Y-%m")
dailybookings.head(2)


Out[7]:
prop_id booking_date day month
0 9 2011-06-17 2011-06-17 2011-06
1 13 2011-08-12 2011-08-12 2011-08

In [8]:
dailytotals = dailybookings.groupby('day').count()
dailytotals.prop_id.plot()


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x107713b50>

In [9]:
#more simply
bookings.booking_date.value_counts().plot()


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x1078d7490>
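
value_counts() orders its result by count, not by date, so sorting the index before plotting keeps the time axis explicit. The sketch below uses made-up booking dates; toy_bookings, daily_counts and full_range are illustrative names. It also fills days without any booking with zero, which may or may not be wanted for the real plot.

```python
import pandas as pd

# Made-up bookings for illustration only.
toy_bookings = pd.DataFrame({
    "prop_id": [9, 13, 21, 28, 29],
    "booking_date": pd.to_datetime(
        ["2011-06-17", "2011-06-20", "2011-06-17", "2011-05-05", "2011-06-17"]
    ),
})

# Count bookings per day and put the days in chronological order.
daily_counts = toy_bookings["booking_date"].value_counts().sort_index()

# Optionally insert the missing calendar days with a count of zero.
all_days = pd.date_range(daily_counts.index.min(), daily_counts.index.max(), freq="D")
full_range = daily_counts.reindex(all_days, fill_value=0)

# full_range.plot() would then draw one point per calendar day.
```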

Plot the daily bookings per neighborhood (provide a legend)


In [10]:
dailylistings = pd.merge(dailybookings, listings, on='prop_id')
dailylistings.head(2)
listingtotals =dailylistings.groupby(['day','neighborhood']).count()
listingtotals.head(2)


Out[10]:
prop_id booking_date month prop_type price person_capacity picture_count description_length tenure_months
day neighborhood
2011-01-01 Neighborhood 13 4 4 4 4 4 4 4 4 4
Neighborhood 14 3 3 3 3 3 3 3 3 3

In [11]:
bookingPivot = pd.pivot_table(dailylistings, values='prop_id', index='booking_date', 
                              columns='neighborhood', aggfunc='count')
bookingPivot.head(2)
#?bookingPivot.columns

#listingtotals.head(2)


Out[11]:
neighborhood Neighborhood 1 Neighborhood 10 Neighborhood 11 Neighborhood 12 Neighborhood 13 Neighborhood 14 Neighborhood 15 Neighborhood 16 Neighborhood 17 Neighborhood 18 ... Neighborhood 20 Neighborhood 21 Neighborhood 22 Neighborhood 3 Neighborhood 4 Neighborhood 5 Neighborhood 6 Neighborhood 7 Neighborhood 8 Neighborhood 9
booking_date
2011-01-01 NaN NaN NaN NaN 4 3 1 NaN 1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2
2011-01-02 NaN NaN 1 NaN 1 3 1 1 NaN 2 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

2 rows × 21 columns


In [12]:
plt.figure(); bookingPivot.plot(figsize=(20,10)); plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)


Out[12]:
<matplotlib.legend.Legend at 0x107c6a890>
<matplotlib.figure.Figure at 0x1077ce250>
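
The NaN cells in the pivot above mark days on which a neighborhood had no bookings. One option, sketched here on made-up data, is to pass fill_value=0 so those days are drawn as zero instead of leaving gaps in the lines. The names toy and per_neighborhood are illustrative.

```python
import pandas as pd

# Made-up merged bookings/listings rows for illustration only.
toy = pd.DataFrame({
    "prop_id": [1, 2, 3, 1, 4],
    "booking_date": pd.to_datetime(
        ["2011-01-01", "2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02"]
    ),
    "neighborhood": ["Neighborhood 13", "Neighborhood 13", "Neighborhood 14",
                     "Neighborhood 14", "Neighborhood 14"],
})

# One row per day, one column per neighborhood, zero where nothing was booked.
per_neighborhood = pd.pivot_table(
    toy, values="prop_id", index="booking_date",
    columns="neighborhood", aggfunc="count", fill_value=0,
)
print(per_neighborhood)

# per_neighborhood.plot(figsize=(20, 10)) would draw one line per neighborhood
# with a legend entry for each column.
```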

Part 2 - Develop a data set


In [13]:
monthlyBookings = pd.DataFrame(dailybookings,columns= ['month','prop_id'])
#monthlyMerge = pd.merge(monthlyTotals, listings, on='prop_id')
monthlyBookings['number_of_bookings'] = 1


MonthTotals = monthlyBookings.groupby(['month','prop_id'], as_index=False).agg('sum')

monthlyMerge = pd.merge(MonthTotals, listings, on='prop_id')
monthlyMerge['booking_rate'] = (monthlyMerge.number_of_bookings +0.0)  / monthlyMerge.tenure_months 
monthlyMerge.describe()


Out[13]:
prop_id number_of_bookings price person_capacity picture_count description_length tenure_months booking_rate
count 2008.0 2008.0 2008.0 2008.0 2008.0 2008.0 2008.0 2008.0
mean 210.7 3.0 131.8 2.8 17.3 342.7 8.1 0.7
std 113.6 2.5 128.5 1.4 10.5 238.7 5.6 0.9
min 1.0 1.0 39.0 1.0 2.0 0.0 1.0 0.0
25% 123.0 1.0 85.0 2.0 9.0 196.0 4.0 0.2
50% 210.0 2.0 100.0 2.0 16.0 264.5 7.0 0.3
75% 309.0 4.0 149.0 3.0 23.0 425.0 10.0 0.8
max 408.0 19.0 3050.0 8.0 71.0 1969.0 30.0 10.0

Add the columns number_of_bookings and booking_rate (number_of_bookings/tenure_months) to your listings data frame
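
The cell for this step was left empty, and the cell above computes bookings per property per month rather than per property. As a hedged sketch of the per-listing version the instruction describes, using made-up data and illustrative names (counts, enriched): count bookings per prop_id, left-join onto listings so that listings without bookings are kept, then divide by tenure.

```python
import pandas as pd

# Made-up stand-ins for listings.csv and bookings.csv.
toy_listings = pd.DataFrame({
    "prop_id": [1, 2, 3],
    "tenure_months": [30, 29, 10],
})
toy_bookings = pd.DataFrame({"prop_id": [1, 1, 1, 3, 3]})

# Number of bookings per property.
counts = (toy_bookings.groupby("prop_id").size()
          .rename("number_of_bookings").reset_index())

# Left join keeps listings that were never booked; they get a count of 0.
enriched = toy_listings.merge(counts, on="prop_id", how="left")
enriched["number_of_bookings"] = enriched["number_of_bookings"].fillna(0).astype(int)
enriched["booking_rate"] = enriched["number_of_bookings"] / enriched["tenure_months"]
print(enriched)
```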


In [ ]:

We only want to analyze well established properties, so let's filter out any properties that have a tenure less than 10 months


In [14]:
monthlyFilter = monthlyMerge[monthlyMerge.tenure_months  > 9]
monthlyFilter.describe()


Out[14]:
prop_id number_of_bookings price person_capacity picture_count description_length tenure_months booking_rate
count 661.0 661.0 661.0 661.0 661.0 661.0 661.0 661.0
mean 81.0 2.9 140.6 2.9 15.7 328.0 14.6 0.2
std 45.2 2.4 174.2 1.6 10.0 189.6 4.5 0.2
min 1.0 1.0 40.0 1.0 4.0 0.0 10.0 0.0
25% 30.0 1.0 80.0 2.0 9.0 197.0 10.0 0.1
50% 90.0 2.0 96.0 2.0 15.0 288.0 14.0 0.2
75% 122.0 4.0 145.0 3.0 20.0 418.0 16.0 0.3
max 143.0 16.0 2394.0 8.0 71.0 1111.0 30.0 1.2

prop_type and neighborhood are categorical variables. Use get_dummies() (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.reshape.get_dummies.html) to transform each column of categorical data into many columns of boolean values (after applying this function correctly there should be 1 column for every prop_type and 1 column for every neighborhood category).


In [15]:
#pd.get_dummies(monthlyFilter, prefix=['prop_type'])

for x in ['prop_type','neighborhood']:
    just_dummies = pd.get_dummies(monthlyFilter[x])
    step_1 = pd.concat([monthlyFilter, just_dummies], axis=1) 
    step_1.drop([x], inplace=True, axis=1)
    monthlyFilter = step_1
monthlyFilter.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 661 entries, 0 to 2004
Data columns (total 26 columns):
month                 661 non-null object
prop_id               661 non-null int64
number_of_bookings    661 non-null int64
price                 661 non-null int64
person_capacity       661 non-null int64
picture_count         661 non-null int64
description_length    661 non-null int64
tenure_months         661 non-null int64
booking_rate          661 non-null float64
Property type 1       661 non-null float64
Property type 2       661 non-null float64
Property type 3       661 non-null float64
Neighborhood 11       661 non-null float64
Neighborhood 12       661 non-null float64
Neighborhood 13       661 non-null float64
Neighborhood 14       661 non-null float64
Neighborhood 15       661 non-null float64
Neighborhood 16       661 non-null float64
Neighborhood 17       661 non-null float64
Neighborhood 18       661 non-null float64
Neighborhood 19       661 non-null float64
Neighborhood 20       661 non-null float64
Neighborhood 21       661 non-null float64
Neighborhood 4        661 non-null float64
Neighborhood 8        661 non-null float64
Neighborhood 9        661 non-null float64
dtypes: float64(18), int64(7), object(1)
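
The loop above can also be written as a single get_dummies() call with the columns argument, which encodes the named columns and drops the originals in one step. This is a sketch on made-up rows; toy and encoded are illustrative names. By default the new columns are prefixed with the source column name, unlike the unprefixed names produced by the loop.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "price": [140, 95, 70],
    "prop_type": ["Property type 1", "Property type 2", "Property type 1"],
    "neighborhood": ["Neighborhood 14", "Neighborhood 16", "Neighborhood 9"],
})

# Encode both categorical columns; the original columns are removed.
encoded = pd.get_dummies(toy, columns=["prop_type", "neighborhood"])
print(encoded.columns.tolist())
```

Keeping one indicator column for every category makes the indicators within each group sum to 1. For a linear model with an intercept, passing drop_first=True is one way to avoid that redundancy.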

In [17]:
booking_rate = monthlyFilter.booking_rate 
booking_rate


Out[17]:
0     0.5
1     0.2
2     0.2
3     0.3
4     0.1
5     0.3
6     0.5
7     0.5
8     0.1
9     0.1
10    0.2
11    0.1
12    0.1
13    0.4
14    0.1
...
1979    0.1
1980    0.1
1987    0.1
1988    0.1
1989    0.1
1990    0.3
1991    0.4
1992    0.1
1993    0.1
1994    0.1
1995    0.1
1996    0.2
1997    0.1
2003    0.1
2004    0.1
Name: booking_rate, Length: 661, dtype: float64

In [18]:
killme =  ['prop_id','booking_rate','month','number_of_bookings']
#killme =  ['month']
monthlyFilter.drop(killme, inplace=True, axis=1)
monthlyFilter.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 661 entries, 0 to 2004
Data columns (total 22 columns):
price                 661 non-null int64
person_capacity       661 non-null int64
picture_count         661 non-null int64
description_length    661 non-null int64
tenure_months         661 non-null int64
Property type 1       661 non-null float64
Property type 2       661 non-null float64
Property type 3       661 non-null float64
Neighborhood 11       661 non-null float64
Neighborhood 12       661 non-null float64
Neighborhood 13       661 non-null float64
Neighborhood 14       661 non-null float64
Neighborhood 15       661 non-null float64
Neighborhood 16       661 non-null float64
Neighborhood 17       661 non-null float64
Neighborhood 18       661 non-null float64
Neighborhood 19       661 non-null float64
Neighborhood 20       661 non-null float64
Neighborhood 21       661 non-null float64
Neighborhood 4        661 non-null float64
Neighborhood 8        661 non-null float64
Neighborhood 9        661 non-null float64
dtypes: float64(17), int64(5)

Create test and training sets for your regressors and predictors.

The predictor (y) is booking_rate; the regressors (X) are everything else except prop_id, booking_rate, prop_type, neighborhood and number_of_bookings.
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
http://pandas.pydata.org/pandas-docs/stable/basics.html#dropping-labels-from-an-axis


In [19]:
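# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split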

In [20]:
#a_train, a_test = train_test_split(monthlyFilter,test_size=0.4)
X_train, X_test, y_train, y_test = train_test_split(monthlyFilter, booking_rate, test_size=0.4)
#a_train.shape

In [21]:
print(y_test.shape, X_test.shape)


(265,) (265, 22)
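
The split above is random, so the score changes from run to run. A minimal sketch of a reproducible split, assuming a recent scikit-learn where train_test_split is imported from sklearn.model_selection; the data is randomly generated and the seed values are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data, for illustration only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "price": rng.integers(40, 300, size=10),
    "tenure_months": rng.integers(10, 31, size=10),
})
y = pd.Series(rng.random(10), name="booking_rate")

# random_state fixes the shuffle so the same rows land in each set every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
print(X_train.shape, X_test.shape)
```

Rows of X and y stay aligned after the split, which can be checked by comparing their indexes.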

Part 3 - Model booking_rate

Create a linear regression model of your listings


In [22]:
from sklearn.linear_model import LinearRegression
#lr = LinearRegression()
#X_train = a_train[:,range(1,23)]
#X_train.shape

In [141]:
#Y_train = a_train[:,0]
#Y_train.shape


Out[141]:
(396,)

Fit your model with your training sets


In [23]:
lr = LinearRegression()

In [25]:
lr.fit(X_train, y_train)


Out[25]:
LinearRegression(copy_X=True, fit_intercept=True, normalize=False)

In [146]:
#X_test = a_test[:,range(1,23)]
#X_test.shape


Out[146]:
(265, 22)

In [150]:
#Y_test = a_test[:,0]
#Y_test.shape


Out[150]:
(265,)

In [26]:
lr.score(X_test, y_test)


Out[26]:
0.20580260972997833

In [40]:
?lr.score

Interpret the results of the above model:

  • What does the score method do?
  • What does this tell us about our model?

In [ ]:

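The score method returns the coefficient of determination (R^2) of the model's predictions on the data passed to it, here the test set. A score of about 0.2 means the model explains only around 20% of the variance in booking_rate, so the fit is weak; a value closer to 1 would indicate a better fit. We need to find better factors.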
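
To make the R^2 definition concrete, here is a small sketch that computes it by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, which is the quantity LinearRegression.score reports. The true values are the first five booking rates shown in Out[17]; the predictions are made up, and r2 is a hypothetical helper name.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([0.5, 0.2, 0.2, 0.3, 0.1])    # first five rates from Out[17]
y_pred = np.array([0.4, 0.25, 0.2, 0.2, 0.2])   # made-up predictions

r_squared = r2(y_true, y_pred)

# A model that always predicts the mean scores 0; worse models can go negative.
baseline = r2(y_true, np.full_like(y_true, y_true.mean()))
print(r_squared, baseline)
```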

Optional - Iterate

Create an alternative predictor (e.g. monthly revenue) and use the same modeling pattern in Part 3 to


In [38]:
# I am running late here.
# What I would want to do, and don't have time for, is to build a measure of how long a listing has been open and
# then see the growth.

import datetime 
def addmonths(date,months):
    targetmonth=months+date.month
    try:
        date.replace(year=date.year+int(targetmonth/12),month=(targetmonth%12))
    except:
        # There is an exception if the day of the month we're in does not exist in the target month
        # Go to the FIRST of the month AFTER, then go back one day.
        date.replace(year=date.year+int((targetmonth+1)/12),month=((targetmonth+1)%12),day=1)
        date+=datetime.timedelta(days=-1)
        
monthlyMerge = pd.merge(MonthTotals, listings, on='prop_id')
monthlyMerge['booking_rate'] = (monthlyMerge.number_of_bookings +0.0)  / monthlyMerge.tenure_months 
monthlyMerge.month.max

monthlyMerge['month_time'] = pd.to_datetime(monthlyMerge.month)
from datetime import timedelta
monthlyMerge['startTime'] = monthlyMerge.tenure_months.map(lambda dt: addmonths(pd.to_datetime("2011-12-01"),-dt))
monthlyMerge.startTime


Out[38]:
0     None
1     None
2     None
3     None
4     None
5     None
6     None
7     None
8     None
9     None
10    None
11    None
12    None
13    None
14    None
...
1993    None
1994    None
1995    None
1996    None
1997    None
1998    None
1999    None
2000    None
2001    None
2002    None
2003    None
2004    None
2005    None
2006    None
2007    None
Name: startTime, Length: 2008, dtype: object
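
Every value above is None because addmonths never returns anything, and because date.replace returns a new object instead of changing date in place. A possible working version, sketched with pandas' DateOffset in place of the manual month arithmetic (a swapped-in technique, not the original approach): add_months, reference, tenure and start_time are illustrative names, and the tenure values are made up. The reference date is the one used in the cell above.

```python
import pandas as pd

def add_months(date, months):
    """Return `date` shifted by `months` calendar months (negative shifts back)."""
    # DateOffset clips to the last valid day when the target month is shorter.
    return pd.Timestamp(date) + pd.DateOffset(months=int(months))

reference = pd.Timestamp("2011-12-01")
tenure = pd.Series([30, 29, 1], name="tenure_months")  # made-up tenures

# Estimated listing start: the reference date minus the tenure in months.
start_time = tenure.map(lambda m: add_months(reference, -m))
print(start_time)
```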

In [36]:
?timedelta

In [30]:
#I have run out of time to make this work...


Out[30]:
number_of_bookings price person_capacity picture_count description_length tenure_months Property type 1 Property type 2 Property type 3 Neighborhood 11 ... Neighborhood 15 Neighborhood 16 Neighborhood 17 Neighborhood 18 Neighborhood 19 Neighborhood 20 Neighborhood 21 Neighborhood 4 Neighborhood 8 Neighborhood 9
number_of_bookings 1.0e+00 -1.7e-01 -8.1e-02 1.4e-02 1.4e-01 5.6e-02 -3.3e-01 3.4e-01 -3.7e-02 -4.7e-02 ... 0.2 3.0e-03 0.1 -7.0e-03 6.0e-02 -7.1e-02 -9.0e-02 -4.8e-02 -3.1e-02 -2.9e-02
price -1.7e-01 1.0e+00 5.3e-01 5.7e-02 -7.2e-03 7.2e-03 3.2e-01 -3.0e-01 -7.3e-02 -3.4e-03 ... -0.1 -5.8e-02 -0.1 -7.7e-02 -5.6e-03 5.5e-02 1.4e-01 -6.4e-02 2.4e-02 -6.0e-02
person_capacity -8.1e-02 5.3e-01 1.0e+00 1.5e-01 -2.2e-02 -8.4e-02 3.7e-01 -3.5e-01 -8.7e-02 -8.5e-03 ... 0.2 -1.3e-01 -0.1 -6.3e-02 -2.0e-02 5.3e-03 9.4e-02 -7.1e-02 7.6e-02 -4.4e-02
picture_count 1.4e-02 5.7e-02 1.5e-01 1.0e+00 2.2e-01 -1.1e-01 1.8e-01 -1.6e-01 -6.8e-02 -6.6e-02 ... 0.0 -5.6e-03 -0.1 -6.0e-02 1.5e-01 -7.2e-02 4.0e-01 -1.1e-01 -3.0e-02 -1.5e-01
description_length 1.4e-01 -7.2e-03 -2.2e-02 2.2e-01 1.0e+00 9.1e-02 -2.7e-02 5.7e-02 -9.8e-02 -2.1e-01 ... -0.1 -1.0e-01 -0.1 1.7e-01 2.0e-01 -5.1e-02 -9.3e-02 -1.0e-01 -3.9e-02 -9.3e-02
tenure_months 5.6e-02 7.2e-03 -8.4e-02 -1.1e-01 9.1e-02 1.0e+00 -1.8e-01 1.1e-01 2.4e-01 -1.1e-01 ... -0.1 -4.1e-02 0.1 -1.7e-02 -1.5e-01 -1.2e-02 7.2e-03 9.9e-03 -4.0e-02 -1.2e-01
Property type 1 -3.3e-01 3.2e-01 3.7e-01 1.8e-01 -2.7e-02 -1.8e-01 1.0e+00 -9.6e-01 -1.7e-01 1.3e-01 ... -0.1 2.1e-02 -0.2 -3.1e-02 -1.4e-01 7.9e-02 1.1e-01 -1.4e-01 3.5e-02 1.6e-01
Property type 2 3.4e-01 -3.0e-01 -3.5e-01 -1.6e-01 5.7e-02 1.1e-01 -9.6e-01 1.0e+00 -1.3e-01 -1.2e-01 ... 0.1 -1.1e-02 0.0 4.6e-02 1.5e-01 -7.6e-02 -1.1e-01 1.8e-02 -3.4e-02 -1.6e-01
Property type 3 -3.7e-02 -7.3e-02 -8.7e-02 -6.8e-02 -9.8e-02 2.4e-01 -1.7e-01 -1.3e-01 1.0e+00 -4.3e-02 ... -0.1 -3.4e-02 0.4 -4.9e-02 -2.9e-02 -1.3e-02 -1.9e-02 4.0e-01 -5.9e-03 -2.8e-02
Neighborhood 11 -4.7e-02 -3.4e-03 -8.5e-03 -6.6e-02 -2.1e-01 -1.1e-01 1.3e-01 -1.2e-01 -4.3e-02 1.0e+00 ... -0.1 -6.4e-02 -0.1 -9.1e-02 -5.4e-02 -2.5e-02 -3.5e-02 -3.5e-02 -1.1e-02 -5.1e-02
Neighborhood 12 -1.4e-01 4.8e-02 1.8e-02 -5.5e-02 9.5e-03 -1.2e-01 1.7e-01 -1.5e-01 -5.3e-02 -9.8e-02 ... -0.1 -7.8e-02 -0.1 -1.1e-01 -6.6e-02 -3.0e-02 -4.3e-02 -4.3e-02 -1.4e-02 -6.3e-02
Neighborhood 13 -2.6e-02 1.7e-01 1.0e-01 -7.6e-02 1.3e-01 2.5e-01 -6.3e-02 8.8e-02 -8.2e-02 -1.5e-01 ... -0.2 -1.2e-01 -0.1 -1.7e-01 -1.0e-01 -4.7e-02 -6.7e-02 -6.7e-02 -2.1e-02 -9.8e-02
Neighborhood 14 6.9e-03 -5.5e-02 -7.8e-02 2.3e-01 6.5e-02 7.4e-02 9.3e-03 9.2e-03 -6.2e-02 -1.1e-01 ... -0.2 -9.1e-02 -0.1 -1.3e-01 -7.7e-02 -3.5e-02 -5.0e-02 -5.0e-02 -1.6e-02 -7.3e-02
Neighborhood 15 1.9e-01 -5.2e-02 1.6e-01 1.9e-02 -7.1e-02 -1.0e-01 -8.2e-02 1.0e-01 -6.0e-02 -1.1e-01 ... 1.0 -8.8e-02 -0.1 -1.3e-01 -7.4e-02 -3.4e-02 -4.9e-02 -4.9e-02 -1.5e-02 -7.1e-02
Neighborhood 16 3.0e-03 -5.8e-02 -1.3e-01 -5.6e-03 -1.0e-01 -4.1e-02 2.1e-02 -1.1e-02 -3.4e-02 -6.4e-02 ... -0.1 1.0e+00 -0.1 -7.3e-02 -4.3e-02 -2.0e-02 -2.8e-02 -2.8e-02 -8.8e-03 -4.1e-02
Neighborhood 17 7.1e-02 -6.7e-02 -9.2e-02 -1.1e-01 -7.5e-02 1.3e-01 -1.5e-01 4.5e-02 3.5e-01 -7.8e-02 ... -0.1 -6.2e-02 1.0 -8.9e-02 -5.3e-02 -2.4e-02 -3.4e-02 -3.4e-02 -1.1e-02 -5.0e-02
Neighborhood 18 -7.0e-03 -7.7e-02 -6.3e-02 -6.0e-02 1.7e-01 -1.7e-02 -3.1e-02 4.6e-02 -4.9e-02 -9.1e-02 ... -0.1 -7.3e-02 -0.1 1.0e+00 -6.1e-02 -2.8e-02 -4.0e-02 -4.0e-02 -1.3e-02 -5.8e-02
Neighborhood 19 6.0e-02 -5.6e-03 -2.0e-02 1.5e-01 2.0e-01 -1.5e-01 -1.4e-01 1.5e-01 -2.9e-02 -5.4e-02 ... -0.1 -4.3e-02 -0.1 -6.1e-02 1.0e+00 -1.7e-02 -2.4e-02 -2.4e-02 -7.4e-03 -3.4e-02
Neighborhood 20 -7.1e-02 5.5e-02 5.3e-03 -7.2e-02 -5.1e-02 -1.2e-02 7.9e-02 -7.6e-02 -1.3e-02 -2.5e-02 ... -0.0 -2.0e-02 -0.0 -2.8e-02 -1.7e-02 1.0e+00 -1.1e-02 -1.1e-02 -3.4e-03 -1.6e-02
Neighborhood 21 -9.0e-02 1.4e-01 9.4e-02 4.0e-01 -9.3e-02 7.2e-03 1.1e-01 -1.1e-01 -1.9e-02 -3.5e-02 ... -0.0 -2.8e-02 -0.0 -4.0e-02 -2.4e-02 -1.1e-02 1.0e+00 -1.5e-02 -4.8e-03 -2.2e-02
Neighborhood 4 -4.8e-02 -6.4e-02 -7.1e-02 -1.1e-01 -1.0e-01 9.9e-03 -1.4e-01 1.8e-02 4.0e-01 -3.5e-02 ... -0.0 -2.8e-02 -0.0 -4.0e-02 -2.4e-02 -1.1e-02 -1.5e-02 1.0e+00 -4.8e-03 -2.2e-02
Neighborhood 8 -3.1e-02 2.4e-02 7.6e-02 -3.0e-02 -3.9e-02 -4.0e-02 3.5e-02 -3.4e-02 -5.9e-03 -1.1e-02 ... -0.0 -8.8e-03 -0.0 -1.3e-02 -7.4e-03 -3.4e-03 -4.8e-03 -4.8e-03 1.0e+00 -7.1e-03
Neighborhood 9 -2.9e-02 -6.0e-02 -4.4e-02 -1.5e-01 -9.3e-02 -1.2e-01 1.6e-01 -1.6e-01 -2.8e-02 -5.1e-02 ... -0.1 -4.1e-02 -0.1 -5.8e-02 -3.4e-02 -1.6e-02 -2.2e-02 -2.2e-02 -7.1e-03 1.0e+00

23 rows × 23 columns


In [32]:
check = foo["price"]

In [35]:
check


Out[35]:
number_of_bookings   -1.7e-01
price                 1.0e+00
person_capacity       5.3e-01
picture_count         5.7e-02
description_length   -7.2e-03
tenure_months         7.2e-03
Property type 1       3.2e-01
Property type 2      -3.0e-01
Property type 3      -7.3e-02
Neighborhood 11      -3.4e-03
Neighborhood 12       4.8e-02
Neighborhood 13       1.7e-01
Neighborhood 14      -5.5e-02
Neighborhood 15      -5.2e-02
Neighborhood 16      -5.8e-02
Neighborhood 17      -6.7e-02
Neighborhood 18      -7.7e-02
Neighborhood 19      -5.6e-03
Neighborhood 20       5.5e-02
Neighborhood 21       1.4e-01
Neighborhood 4       -6.4e-02
Neighborhood 8        2.4e-02
Neighborhood 9       -6.0e-02
Name: price, dtype: float64
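
The cell that defined foo is not shown. Assuming it held a correlation matrix like the one in Out[30] (this is an assumption), a table of that shape can be produced with DataFrame.corr(), and one column of it then gives the correlations with a single variable. The sketch uses made-up booking counts alongside the prices and capacities of the first five listings; toy, corr_matrix and price_corr are illustrative names.

```python
import pandas as pd

# Illustrative data: number_of_bookings is made up.
toy = pd.DataFrame({
    "number_of_bookings": [2, 4, 3, 5, 1],
    "price": [140, 95, 95, 90, 125],
    "person_capacity": [3, 2, 2, 2, 5],
})

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = toy.corr()

# Correlation of every column with price, strongest positive first.
price_corr = corr_matrix["price"].sort_values(ascending=False)
print(price_corr)
```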

In [ ]: