We're going to use a real Kaggle competition data set to explore Pandas dataframes. Grab the rent.csv.zip file and unzip it.
In [1]:
import pandas as pd
df = pd.read_csv("data/rent.csv", parse_dates=['created'])
df.head(2)
Out[1]:
In [2]:
df.head(2).T
Out[2]:
In [3]:
df.info()
In [4]:
df.describe()
Out[4]:
In [5]:
df.price.sort_values(ascending=False).head(10)
Out[5]:
In [6]:
df.price.head(5) # attribute access; works only on the right-hand side of an assignment or in an expression
Out[6]:
In [7]:
df['price'].head(5)
Out[7]:
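Attribute access like df.price is just a read shortcut; creating a new column requires bracket syntax. A minimal sketch with a toy frame (the column names here are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({'price': [1000, 2000]})

print(toy.price.mean())   # attribute access is fine for reading: 1500.0

# Creating a NEW column must use brackets; attribute assignment would
# just set a Python attribute on the object, not add a column.
toy['discounted'] = toy['price'] * 0.9
print(list(toy.columns))  # ['price', 'discounted']
```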
In [8]:
prices = df['price']
avg_rent = prices.mean()
print(f"Average rent is ${avg_rent:.0f}")
In [9]:
df.latitude.min(), df.latitude.max()
Out[9]:
In [10]:
bybaths = df.groupby(['bathrooms']).mean()
bybaths
Out[10]:
In [11]:
bybaths = bybaths.reset_index() # groupby made 'bathrooms' the index; move it back to a regular column
bybaths.head(3)
Out[11]:
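The "quirk" is that groupby moves the grouping column into the index, and reset_index() turns it back into an ordinary column. A small sketch with toy data:

```python
import pandas as pd

toy = pd.DataFrame({'bathrooms': [1, 1, 2], 'price': [1000, 2000, 3000]})

g = toy.groupby('bathrooms').mean()
print(g.index.name)      # 'bathrooms' -- the grouping key became the index

g = g.reset_index()
print(list(g.columns))   # ['bathrooms', 'price'] -- back to ordinary columns
```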
In [12]:
bybaths[['bathrooms','price']] # print just num baths, avg price
Out[12]:
In [13]:
# get column
df['bedrooms'].head(5)
Out[13]:
In [14]:
df[['bedrooms','bathrooms']].head(3)
Out[14]:
In [15]:
# get index 3 row
df.iloc[3] # same as df.iloc[3,:]
Out[15]:
In [16]:
df.iloc[0:2] # first two rows
Out[16]:
In [17]:
df.iloc[0:2][['created','features']] # first two rows, show 2 columns
Out[17]:
In [18]:
df = df.set_index('listing_id')
df.head(3)
Out[18]:
In [19]:
df.loc[7150865]
Out[19]:
In [20]:
df = df.reset_index()
df.head(3)
Out[20]:
Pandas allows non-unique indexes:
In [21]:
df_beds = df.set_index('bedrooms')
df_beds.loc[3].head(3)
Out[21]:
In [22]:
df.isnull().head(5)
Out[22]:
In [23]:
df.isnull().any()
Out[23]:
Find all rows in data frame where description is missing:
In [24]:
df.description.isnull().head(5)
Out[24]:
In [25]:
df[df.description.isnull()].head(5)
Out[25]:
Another query: get all apartment rows with price above $1M:
In [26]:
(df.price>1000000).head(5)
Out[26]:
In [27]:
df[df.price>1000000]
Out[27]:
In [28]:
df[(df.price>1000) & (df.price<10_000)].head(3) # parentheses are required in query!!!!
Out[28]:
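The parentheses are required because Python's & binds more tightly than the comparison operators; without them the expression parses as a chained comparison and fails. A toy sketch:

```python
import pandas as pd

s = pd.Series([500, 1500, 5000, 20000])

# Correct: compare first, then combine the boolean masks with &
mask = (s > 1000) & (s < 10_000)
print(mask.tolist())  # [False, True, True, False]

# Without parentheses this parses as s > (1000 & s) < 10_000, a chained
# comparison, which tries to call bool() on a Series and raises ValueError.
try:
    bad = s > 1000 & s < 10_000
    raised = False
except ValueError:
    raised = True
print("raised:", raised)
```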
In [29]:
df.bathrooms.value_counts()
Out[29]:
In [30]:
import matplotlib.pyplot as plt
plt.xlabel('Num Bathrooms')
plt.ylabel('Num Apts')
plt.hist(df.bathrooms, bins=20)
plt.show()
In [31]:
plt.xlabel('Num Bedrooms')
plt.ylabel('Num Apts')
plt.hist(df.bedrooms, bins=20)
plt.show()
In [32]:
plt.xlabel('Price')
plt.ylabel('Num Apts at that price')
plt.hist(df.price, bins=45) # not useful since loooong right tail
plt.show()
In [33]:
import numpy as np
df_log = df.copy()
df_log["price"] = np.log( df["price"] )
#OR:
#df_log["price"] = df["price"].apply(np.log)
df_log.price.head(3)
Out[33]:
In [34]:
plt.xlabel('Price')
plt.ylabel('Num Apts at that price')
plt.hist(df_log.price, bins=45) # log scale tames the long right tail
plt.show()
In [35]:
bybaths.plot.line('bathrooms','price', style='-o')
plt.show()
In [36]:
# OR, can do directly
plt.plot(bybaths.bathrooms, bybaths.price, marker='o') # note slightly different arguments
plt.show()
Get the column of bathrooms from df
Iterate through list of columns and print them out
Get row 6 from df
Set the index of df to bedrooms then get all apartments with 3 bedrooms using the index
Get all rows where price > 100_000 per month
Drop column building_id from df
Get all rows where price is between 1000 and 2000.
Apply the np.log() function to the price and store it in a new column called log_price
Get columns latitude and longitude into their own dataframe
In [68]:
df.bathrooms                                # get the bathrooms column
for colname in df:                          # iterate through columns
    print(colname)
df.loc[5]                                   # get the row at index 5
df = df.set_index('bedrooms')               # index by bedrooms...
df.loc[3]                                   # ...then get all 3-bedroom apartments
df[df.price > 100_000]                      # price > $100,000/month
# del df['building_id'] also works, but modifies df in place
df = df.drop('building_id', axis=1)
df[(df.price>1000) & (df.price<2000)]
df['log_price'] = np.log(df.price)
df[['longitude','latitude']]                # lat/long into their own dataframe
In [38]:
df_clean = df[(df.price>1_000) & (df.price<10_000)]
In [39]:
plt.xlabel('Price')
plt.ylabel('Num Apts at that price')
plt.hist(df_clean.price, bins=45)
plt.show()
In [40]:
plt.scatter(df_clean.bedrooms, df_clean.price, alpha=0.1)
plt.xlabel("Bedrooms", fontsize=12)
plt.ylabel("Rent price", fontsize=12)
plt.show()
In [41]:
df_missing = df_clean[(df_clean.longitude==0) |
(df_clean.latitude==0)]
len(df_missing)
Out[41]:
In [42]:
# only 11 rows; filter them out (both coordinates must be nonzero, so use &)
df_clean = df_clean[(df_clean.longitude!=0) &
(df_clean.latitude!=0)]
Use & to get "and" whereas | is "or".
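When negating a combined condition, De Morgan's laws apply: dropping rows where either coordinate is zero means keeping rows where both are nonzero. A toy sketch:

```python
import pandas as pd

df_xy = pd.DataFrame({'longitude': [0.0, -73.9, 0.0, -74.0],
                      'latitude':  [40.7,  0.0, 0.0,  40.8]})

# Rows with a missing (zero) coordinate: either one is zero, so use |
missing = df_xy[(df_xy.longitude == 0) | (df_xy.latitude == 0)]

# Keep the rest: both coordinates must be nonzero, so use &
kept = df_xy[(df_xy.longitude != 0) & (df_xy.latitude != 0)]
print(len(missing), len(kept))  # 3 1
```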
Using GPS checker gives a rough outline for New York City of latitude, longitude 40.55, -74.1 on the lower left and 40.94, -73.67 on the upper right. Let's filter others out.
In [43]:
df_clean = df_clean[(df_clean['latitude']>40.55) &
(df_clean['latitude']<40.94) &
(df_clean['longitude']>-74.1) &
(df_clean['longitude']<-73.67)]
print(len(df_clean), len(df))
In [44]:
from matplotlib import colors
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111)
plt.scatter(df_clean.longitude,
df_clean.latitude,
alpha=0.8,
s=1.5, # marker size
vmin=1000, vmax=4000, # limit price range so 4000 is red
c=df_clean['price'],
cmap='rainbow', # color map
marker='.')
plt.colorbar()
plt.show()
In [45]:
df_train = df_clean[['bathrooms', 'bedrooms', 'longitude', 'latitude', 'price']]
df_train.head(5)
Out[45]:
In [46]:
X_train = df_train[['bedrooms','bathrooms','latitude','longitude']]
y_train = df_train['price']
In [47]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, oob_score=True)
rf.fit(X_train, y_train)
print(f"OOB R^2 score is {rf.oob_score_:.3f} (range is -infinity to 1.0; 1.0 is perfect)")
In [48]:
# pip install rfpimp
from rfpimp import *
I = oob_importances(rf, X_train, y_train)
I.plot(kind='barh', legend=False)
plt.show()
In [49]:
df['one'] = 1
df.head(3)
Out[49]:
Add random column of appropriate length
In [50]:
df2 = df.copy()
df2['random'] = np.random.random(size=len(df))
df2.head(2).T
Out[50]:
In [51]:
df2['i'] = [i for i in range(len(df))]
df2.head(2).T
Out[51]:
The data set has a features attribute (of type string) with a list of features about the apartment.
In [52]:
df.features.head(5)
Out[52]:
Let's create three new boolean columns that indicate whether the apartment has a doorman, parking, or laundry. Start by making a copy of the data frame because we'll be modifying it (otherwise we'll get error "A value is trying to be set on a copy of a slice from a DataFrame"):
In [53]:
df_aug = df[['bedrooms','bathrooms','latitude','longitude',
'features','price']].copy()
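A minimal sketch of why the copy matters: an explicit .copy() makes the new frame independent of the original, so assignments into it cannot trigger the warning (or silently change the original). The toy frame here is made up for illustration:

```python
import pandas as pd

df_toy = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# An explicit copy is its own frame, not a view of df_toy,
# so modifying it is unambiguous and leaves the original alone.
sub = df_toy[['a']].copy()
sub['a'] = sub['a'] * 10

print(df_toy['a'].tolist())  # [1, 2, 3] -- original untouched
print(sub['a'].tolist())     # [10, 20, 30]
```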
Then we normalize the features column so that missing features values become blanks and we lowercase all of the strings.
In [54]:
# rewrite features column
df_aug['features'] = df_aug['features'].fillna('') # fill missing w/blanks
df_aug['features'] = df_aug['features'].str.lower() # normalize to lower case
Create the three boolean columns by checking for the presence or absence of a string in the features column.
In [55]:
df_aug['doorman'] = df_aug['features'].str.contains("doorman")
df_aug['parking'] = df_aug['features'].str.contains("parking|garage")
df_aug['laundry'] = df_aug['features'].str.contains("laundry")
del df_aug['features'] # don't need this anymore
df_aug.head(3)
Out[55]:
The other way to drop a column, besides del, is the drop() function:
In [56]:
df2 = df.drop('description',axis=1) # drop doesn't affect df in place, returns new one
df2.head(2).T # kill this column, return new df without that column
Out[56]:
Let's do some numerical feature engineering.
In [57]:
df_aug["beds_to_baths"] = df_aug["bedrooms"]/(df_aug["bathrooms"]+1) # +1 avoids division by zero
df_aug.head(3)
Out[57]:
Beyond our scope here, but let's retrain model to see if it improves OOB score.
In [58]:
df_clean = df_aug[(df_aug.price>1_000) & (df_aug.price<10_000)]
df_train = df_clean[['bathrooms', 'bedrooms', 'longitude', 'latitude',
'doorman', 'parking', 'laundry', 'beds_to_baths', 'price']]
X_train = df_train[['bedrooms','bathrooms','latitude','longitude',
'doorman', 'parking', 'laundry', 'beds_to_baths']]
y_train = df_train['price']
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, oob_score=True)
rf.fit(X_train, y_train)
print(f"OOB R^2 score is {rf.oob_score_:.3f} (range is -infinity to 1.0; 1.0 is perfect)")
In [114]:
I = pd.DataFrame(data={'Feature':X_train.columns, 'Importance':rf.feature_importances_})
In [117]:
I.sort_values('Importance',ascending=False)
Out[117]:
That score is slightly better but not by much.
This is not general, but it works for a small set of categories:
In [59]:
df['interest_level'] = df['interest_level'].map({'low':1,'medium':2,'high':3})
In [60]:
df[['interest_level']].head(5)
Out[60]:
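A more general alternative (a sketch, not what this notebook does) is Pandas' categorical dtype, which assigns integer codes automatically; by default string categories are ordered alphabetically:

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'])
cat = s.astype('category')

print(list(cat.cat.categories))  # ['high', 'low', 'medium'] (alphabetical)
print(cat.cat.codes.tolist())    # [1, 2, 0, 1] -- one integer code per row
```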
In [118]:
df['some_boolean'] = True
In [120]:
df['some_boolean'] = df['some_boolean'].astype('int8')
df.some_boolean.head()
Out[120]:
In [64]:
df['dayofweek'] = df['created'].dt.dayofweek # add dow column
df['day'] = df['created'].dt.day
df['month'] = df['created'].dt.month
df.head(1).T
Out[64]:
In [65]:
df.to_feather("/tmp/rent.feather")
In [66]:
%time df = pd.read_feather("/tmp/rent.feather")
Compare to loading the CSV, which is roughly 5x slower:
In [67]:
%time df = pd.read_csv("data/rent.csv")
Filter out all rows with more than 5 bathrooms and put the result back into the same dataframe
Show a scatter plot of bathrooms vs price
Show a heatmap whose color is a function of number of bedrooms.
Insert a column called avg_price that is the average of all prices
Using a dictionary, convert all original interest_level values to 10, 20, 30 for the low, medium, and high categories
Convert the manager_id column to be categorical rather than string type
Create a new column called day_of_year from the created field, with the day of year (1-365) for each row