


In [1]:

    
import h2o
import time
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.deeplearning import H2ODeepLearningEstimator



In [2]:

    
# Explore a typical Data Science workflow with H2O and Python
#
# Goal: assist the manager of CitiBike of NYC to load-balance the bicycles
# across the CitiBike network of stations, by predicting the number of bike
# trips taken from the station every day.  Use 10 million rows of historical
# data, and eventually add weather data.


# Connect to a cluster
h2o.init()









    



Warning: Version mismatch. H2O is version (unknown), but the python package is version UNKNOWN.






    




H2O cluster uptime:         7 minutes 51 seconds 28 milliseconds
H2O cluster version:        (unknown)
H2O cluster name:           spIdea
H2O cluster total nodes:    1
H2O cluster total memory:   12.44 GB
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster healthy:        True
H2O Connection ip:          127.0.0.1
H2O Connection port:        54321



In [3]:

    
from h2o.h2o import _locate # private function. used to find files within h2o git project directory.

# Set this to True if you want to fetch the data directly from S3.
# This is useful if your cluster is running in EC2.
data_source_is_s3 = False

def mylocate(s):
    if data_source_is_s3:
        return "s3n://h2o-public-test-data/" + s
    else:
        return _locate(s)



In [4]:

    
# Pick either the big or the small demo.
# Big data is 10M rows
small_test = [mylocate("bigdata/laptop/citibike-nyc/2013-10.csv")]
big_test =   [mylocate("bigdata/laptop/citibike-nyc/2013-07.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2013-08.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2013-09.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2013-10.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2013-11.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2013-12.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-01.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-02.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-03.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-04.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-05.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-06.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-07.csv"),
              mylocate("bigdata/laptop/citibike-nyc/2014-08.csv")]

# ----------

# 1- Load data - 1 row per bicycle trip.  Has columns showing the start and end
# station, trip duration and trip start time and day.  The larger dataset
# totals about 10 million rows
print "Import and Parse bike data"
data = h2o.import_file(path=small_test)









    



Import and Parse bike data

Parse Progress: [##################################################] 100%



In [5]:

    
# ----------

# 2- Light data munging: group the bike starts per day, converting the 10M rows
# of trips to about 140,000 station & day combos.  The target is the number of
# trip starts per station per day.

# Convert start time to: Day since the Epoch.
# starttime is in milliseconds, so secsPerDay actually holds milliseconds per day.
startime = data["starttime"]
secsPerDay=1000*60*60*24
data["Days"] = (startime/secsPerDay).floor()
data.describe()









    



Rows:1,037,712 Cols:15

Chunk compression summary:






    




chunk_type  chunk_name                 count  count_percentage  size       size_percentage
C0L         Constant Integers          17     2.2135415         1.3 KB     0.0
CBS         Bits                       48     6.25              130.0 KB   0.2
C1N         1-Byte Integers (w/o NAs)  48     6.25              1016.6 KB  1.6816467
C1S         1-Byte Fractions           79     10.286459         1.6 MB     2.7740283
C2          2-Byte Integers            243    31.640625         10.0 MB    16.99867
C2S         2-Byte Fractions           49     6.3802085         2.0 MB     3.429456
C4          4-Byte Integers            32     4.166667          2.6 MB     4.4884815
C8          64-bit Integers            60     7.8125            9.9 MB     16.745453
C8D         64-bit Reals               192    25.0              31.7 MB    53.665054






    



Frame distribution summary:






    





                   size     number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.52:54321  59.0 MB  1037712.0       48.0                         768.0
mean               59.0 MB  1037712.0       48.0                         768.0
min                59.0 MB  1037712.0       48.0                         768.0
max                59.0 MB  1037712.0       48.0                         768.0
stddev             0 B      0.0             0.0                          0.0
total              59.0 MB  1037712.0       48.0                         768.0






    










    





       tripduration  starttime        stoptime         start station id  start station name      start station latitude  start station longitude  end station id  end station name              end station latitude  end station longitude  bikeid       usertype      birth year   gender        Days         
type   int           time             time             int               enum                    real                    real                     int             enum                          real                  real                   int          enum          int          int           int          
mins   60.0          1.380610868e+12  1.380611083e+12  72.0              0.0                     40.680342423            -74.01713445             72.0            0.0                           40.680342423          -74.01713445           14529.0      0.0           1899.0       0.0           15979.0      
mean   825.614754383 1.38191371692e+12 1.38191454253e+12 443.714212614     NaN                     40.7345188586           -73.9911328848           443.207421712   NaN                           40.7342847885         -73.9912702982         17644.0716451 0.906095332809 1975.77839486 1.12375591686 15993.8523906
maxs   1259480.0     1.383289197e+12  1.38341851e+12   3002.0            329.0                   40.770513               -73.9500479759           3002.0          329.0                         40.770513             -73.9500479759         20757.0      1.0           1997.0       2.0           16010.0      
sigma  2000.3732323  778871729.132    778847387.503    354.434325075     NaN                     0.0195734073053         0.0123161234106          357.398217058   NaN                           0.0195578458116       0.0123855811965        1717.68112134 0.291696182123 11.1314906238 0.544380593291 9.02215033588
zeros  0             0                0                0                 5239                    0                       0                        0               5449                          0                     0                      0            97446         0            97498         0            
missing 0             0                0                0                 0                       0                       0                        0               0                             0                     0                      0            0             97445        0             0            
0      326.0         1.380610868e+12  1.380611194e+12  239.0             Willoughby St & Fleet St 40.69196566             -73.9813018              366.0           Clinton Ave & Myrtle Ave      40.693261             -73.968896             16052.0      Subscriber    1982.0       1.0           15979.0      
1      729.0         1.380610881e+12  1.38061161e+12   322.0             Clinton St & Tillary St 40.696192               -73.991218               398.0           Atlantic Ave & Furman St      40.69165183           -73.9999786            19412.0      Customer      nan          0.0           15979.0      
2      520.0         1.380610884e+12  1.380611404e+12  174.0             E 25 St & 1 Ave         40.7381765              -73.97738662             403.0           E 2 St & 2 Ave                40.72502876           -73.99069656           19645.0      Subscriber    1984.0       1.0           15979.0      
3      281.0         1.380610885e+12  1.380611166e+12  430.0             York St & Jay St        40.7014851              -73.98656928             323.0           Lawrence St & Willoughby St   40.69236178           -73.98631746           16992.0      Subscriber    1985.0       1.0           15979.0      
4      196.0         1.380610887e+12  1.380611083e+12  403.0             E 2 St & 2 Ave          40.72502876             -73.99069656             401.0           Allen St & Rivington St       40.72019576           -73.98997825           15690.0      Subscriber    1986.0       1.0           15979.0      
5      1948.0        1.380610908e+12  1.380612856e+12  369.0             Washington Pl & 6 Ave   40.73224119             -74.00026394             307.0           Canal St & Rutgers St         40.71427487           -73.98990025           19846.0      Subscriber    1977.0       1.0           15979.0      
6      1327.0        1.380610908e+12  1.380612235e+12  254.0             W 11 St & 6 Ave         40.73532427             -73.99800419             539.0           Metropolitan Ave & Bedford Ave 40.71534825           -73.96024116           14563.0      Subscriber    1986.0       2.0           15979.0      
7      1146.0        1.380610917e+12  1.380612063e+12  490.0             8 Ave & W 33 St         40.751551               -73.993934               438.0           St Marks Pl & 1 Ave           40.72779126           -73.98564945           16793.0      Subscriber    1959.0       1.0           15979.0      
8      380.0         1.380610918e+12  1.380611298e+12  468.0             Broadway & W 55 St      40.7652654              -73.98192338             385.0           E 55 St & 2 Ave               40.75797322           -73.96603308           16600.0      Customer      nan          0.0           15979.0      
9      682.0         1.380610925e+12  1.380611607e+12  300.0             Shevchenko Pl & E 6 St  40.728145               -73.990214               519.0           Pershing Square N             40.75188406           -73.97770164           15204.0      Subscriber    1992.0       1.0           15979.0
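
The day-bucketing step in In [5] can be checked by hand. The sketch below is plain Python rather than H2O. It assumes `starttime` holds milliseconds since the Unix epoch, which is consistent with the magnitudes in the summary above, and the helper name `ms_to_epoch_day` is hypothetical.

```python
import datetime

MS_PER_DAY = 1000 * 60 * 60 * 24  # same value the notebook stores in secsPerDay

def ms_to_epoch_day(ms):
    """Whole days since 1970-01-01 (UTC) for a millisecond timestamp."""
    return int(ms // MS_PER_DAY)

# starttime of row 0 as displayed (rounded) in the summary above
first_start = 1380610868000
day = ms_to_epoch_day(first_start)
calendar_date = datetime.date(1970, 1, 1) + datetime.timedelta(days=day)

print(day)            # 15979, matching the Days column for row 0
print(calendar_date)  # 2013-10-01
```

Flooring rather than rounding keeps every trip that starts on the same UTC day in the same bucket, which is what the later group-by relies on.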



In [6]:

    
# Now do a monster Group-By.  Count bike starts per-station per-day.  With the
# big dataset this ends up with about 340 stations times 400 days (140,000 rows);
# the small demo yields far fewer.  This count is what we want to predict.
grouped = data.group_by(["Days","start station name"])
bpd = grouped.count().get_frame() # Compute bikes-per-day
bpd.set_name(2,"bikes")
bpd.show()
bpd.describe()
bpd.dim









    





  Days start station name       bikes
 15980 9 Ave & W 18 St            137
 15989 Allen St & Hester St       110
 16003 Centre St & Chambers St     142
 15995 Concord St & Bridge St      21
 15987 E 14 St & Avenue B         113
 16005 8 Ave & W 52 St            129
 16009 South St & Whitehall St      70
 15989 Pike St & E Broadway        55
 15991 Watts St & Greenwich St     101
 15985 Monroe St & Bedford Ave      15







    



Rows:10,450 Cols:3

Chunk compression summary:






    




chunk_type  chunk_name        count  count_percentage  size     size_percentage
C1S         1-Byte Fractions  32     33.333336         12.8 KB  22.15888
C2          2-Byte Integers   64     66.66667          45.1 KB  77.84112






    



Frame distribution summary:






    





                   size     number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.52:54321  57.9 KB  10450.0         32.0                         96.0
mean               57.9 KB  10450.0         32.0                         96.0
min                57.9 KB  10450.0         32.0                         96.0
max                57.9 KB  10450.0         32.0                         96.0
stddev             0 B      0.0             0.0                          0.0
total              57.9 KB  10450.0         32.0                         96.0






    










    





       Days         start station name     bikes        
type   int          enum                   int          
mins   15979.0      0.0                    1.0          
mean   15994.4415311 NaN                    99.3025837321
maxs   16010.0      329.0                  553.0        
sigma  9.23370172444 NaN                    72.9721964301
zeros  0            32                     0            
missing 0            0                      0            
0      15980.0      9 Ave & W 18 St        137.0        
1      15989.0      Allen St & Hester St   110.0        
2      16003.0      Centre St & Chambers St 142.0        
3      15995.0      Concord St & Bridge St 21.0         
4      15987.0      E 14 St & Avenue B     113.0        
5      16005.0      8 Ave & W 52 St        129.0        
6      16009.0      South St & Whitehall St 70.0         
7      15989.0      Pike St & E Broadway   55.0         
8      15991.0      Watts St & Greenwich St 101.0        
9      15985.0      Monroe St & Bedford Ave 15.0         







    Out[6]:





[10450, 3]
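
The group-by in In [6] counts rows per (Days, start station name) pair. As a rough illustration of the same idea outside H2O, the sketch below uses `collections.Counter` on a handful of made-up trips. The trip list is invented for illustration and is not taken from the CitiBike files.

```python
from collections import Counter

# One (day, start station name) pair per trip; values are made up
trips = [
    (15979, "E 2 St & 2 Ave"),
    (15979, "E 2 St & 2 Ave"),
    (15979, "York St & Jay St"),
    (15980, "E 2 St & 2 Ave"),
]

# Count trips per (day, station) key, then flatten to (day, station, bikes) rows
bikes_per_station_day = Counter(trips)
rows = sorted((day, station, n)
              for (day, station), n in bikes_per_station_day.items())

for row in rows:
    print(row)
```

Each output row plays the role of one row of `bpd`: a day, a station, and the number of trip starts.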



In [7]:

    
# Quantiles: the data is fairly unbalanced; some station/day combos are wildly
# more popular than others.
print "Quantiles of bikes-per-day"
bpd["bikes"].quantile().show()









    



Quantiles of bikes-per-day






    





  Probs   bikesQuantiles
  0.01             4.49
  0.1             19   
  0.25            43   
  0.333            57   
  0.5             87   
  0.667           118   
  0.75           137   
  0.9            192   
  0.99           334.51
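
To make the quantile table concrete, here is a minimal sketch of a quantile computed by linear interpolation between neighbouring sorted values. The interpolation rule is an assumption chosen for illustration; H2O's `quantile()` may differ in detail. The ten counts are the rows printed by `bpd.show()` above.

```python
def quantile(values, prob):
    """Quantile by linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile of empty data")
    pos = prob * (len(ordered) - 1)      # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

bikes = [137, 110, 142, 21, 113, 129, 70, 55, 101, 15]
median = quantile(bikes, 0.5)
print(median)  # 105.5, halfway between the 5th and 6th sorted values
```

Ten rows are far too few to reproduce the table above; the point is only how a probability maps to a position in the sorted counts.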



In [8]:

    
# A little feature engineering
# Convert Days back to a timestamp; despite the names, secs and secsPerDay are in milliseconds
secs = bpd["Days"]*secsPerDay
# Add in month-of-year (seasonality; fewer bike rides in winter than summer)
bpd["Month"]     = secs.month().asfactor()
# Add in day-of-week (work-week; more bike rides on Sunday than Monday)
bpd["DayOfWeek"] = secs.dayOfWeek()
print "Bikes-Per-Day"
bpd.describe()









    



Bikes-Per-Day
Rows:10,450 Cols:3

Chunk compression summary:






    




chunk_type  chunk_name                 count  count_percentage  size     size_percentage
CBS         Bits                       32     20.0              3.5 KB   4.709156
C1N         1-Byte Integers (w/o NAs)  32     20.0              12.3 KB  16.729826
C1S         1-Byte Fractions           32     20.0              12.8 KB  17.408241
C2          2-Byte Integers            64     40.0              45.1 KB  61.152775






    



Frame distribution summary:






    





                   size     number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.52:54321  73.7 KB  10450.0         32.0                         160.0
mean               73.7 KB  10450.0         32.0                         160.0
min                73.7 KB  10450.0         32.0                         160.0
max                73.7 KB  10450.0         32.0                         160.0
stddev             0 B      0.0             0.0                          0.0
total              73.7 KB  10450.0         32.0                         160.0






    










    





       Days         start station name     bikes        Month         DayOfWeek  
type   int          enum                   int          enum          enum       
mins   15979.0      0.0                    1.0          0.0           0.0        
mean   15994.4415311 NaN                    99.3025837321 0.968612440191 NaN        
maxs   16010.0      329.0                  553.0        1.0           6.0        
sigma  9.23370172444 NaN                    72.9721964301 0.174371128617 NaN        
zeros  0            32                     0            328           1635       
missing 0            0                      0            0             0          
0      15980.0      9 Ave & W 18 St        137.0        10            Tue        
1      15989.0      Allen St & Hester St   110.0        10            Thu        
2      16003.0      Centre St & Chambers St 142.0        10            Thu        
3      15995.0      Concord St & Bridge St 21.0         10            Wed        
4      15987.0      E 14 St & Avenue B     113.0        10            Tue        
5      16005.0      8 Ave & W 52 St        129.0        10            Sat        
6      16009.0      South St & Whitehall St 70.0         10            Wed        
7      15989.0      Pike St & E Broadway   55.0         10            Thu        
8      15991.0      Watts St & Greenwich St 101.0        10            Sat        
9      15985.0      Monroe St & Bedford Ave 15.0         10            Sun
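
The month and day-of-week features can be reproduced from the day number with the standard library. This is a sketch under the assumption that days are counted in UTC; the helper name `calendar_features` is hypothetical. Note that it gives Wed for day 15980, while the H2O output above shows Tue. That is consistent with the cluster interpreting timestamps in a time zone west of UTC, though the notebook does not say which zone was in effect.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def calendar_features(days_since_epoch):
    """Return (month, weekday name) for a day number counted in UTC."""
    d = EPOCH + datetime.timedelta(days=int(days_since_epoch))
    return d.month, WEEKDAYS[d.weekday()]  # weekday(): Monday is 0

month, weekday = calendar_features(15980)
print(month)    # 10
print(weekday)  # Wed (UTC); the H2O output above shows Tue
```

If the features must match H2O's exactly, the time zone used by the cluster would need to be applied before extracting the calendar fields.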



In [9]:

    
# ----------
# 3- Fit models on train, using test as validation

# Function for doing a classic train/test/holdout split
def split_fit_predict(data):
  global gbm0,drf0,glm0,dl0
  # Classic Test/Train split
  r = data['Days'].runif()   # Random UNIForm numbers, one per row
  train = data[  r  < 0.6]
  test  = data[(0.6 <= r) & (r < 0.9)]
  hold  = data[ 0.9 <= r ]
  print "Training data has",train.ncol,"columns and",train.nrow,"rows, test has",test.nrow,"rows, holdout has",hold.nrow
  bike_names_x = data.names
  bike_names_x.remove("bikes")
  
  # Run GBM
  s = time.time()
  
  gbm0 = H2OGradientBoostingEstimator(ntrees=500, # 500 works well
                                      max_depth=6,
                                      learn_rate=0.1)
    

  gbm0.train(x               =bike_names_x,
             y               ="bikes",
             training_frame  =train,
             validation_frame=test)

  gbm_elapsed = time.time() - s

  # Run DRF
  s = time.time()
    
  drf0 = H2ORandomForestEstimator(ntrees=250, max_depth=30)

  drf0.train(x               =bike_names_x,
             y               ="bikes",
             training_frame  =train,
             validation_frame=test)
    
  drf_elapsed = time.time() - s 
    
    
  # Run GLM
  if "WC1" in bike_names_x: bike_names_x.remove("WC1")
  s = time.time()

  glm0 = H2OGeneralizedLinearEstimator(Lambda=[1e-5], family="poisson")
    
  glm0.train(x               =bike_names_x,
             y               ="bikes",
             training_frame  =train,
             validation_frame=test)

  glm_elapsed = time.time() - s
  
  # Run DL
  s = time.time()

  dl0 = H2ODeepLearningEstimator(hidden=[50,50,50,50], epochs=50)
    
  dl0.train(x               =bike_names_x,
            y               ="bikes",
            training_frame  =train,
            validation_frame=test)
    
  dl_elapsed = time.time() - s
  
  # ----------
  # 4- Score on holdout set & report
  train_r2_gbm = gbm0.model_performance(train).r2()
  test_r2_gbm  = gbm0.model_performance(test ).r2()
  hold_r2_gbm  = gbm0.model_performance(hold ).r2()
#   print "GBM R2 TRAIN=",train_r2_gbm,", R2 TEST=",test_r2_gbm,", R2 HOLDOUT=",hold_r2_gbm
  
  train_r2_drf = drf0.model_performance(train).r2()
  test_r2_drf  = drf0.model_performance(test ).r2()
  hold_r2_drf  = drf0.model_performance(hold ).r2()
#   print "DRF R2 TRAIN=",train_r2_drf,", R2 TEST=",test_r2_drf,", R2 HOLDOUT=",hold_r2_drf
  
  train_r2_glm = glm0.model_performance(train).r2()
  test_r2_glm  = glm0.model_performance(test ).r2()
  hold_r2_glm  = glm0.model_performance(hold ).r2()
#   print "GLM R2 TRAIN=",train_r2_glm,", R2 TEST=",test_r2_glm,", R2 HOLDOUT=",hold_r2_glm
    
  train_r2_dl = dl0.model_performance(train).r2()
  test_r2_dl  = dl0.model_performance(test ).r2()
  hold_r2_dl  = dl0.model_performance(hold ).r2()
#   print " DL R2 TRAIN=",train_r2_dl,", R2 TEST=",test_r2_dl,", R2 HOLDOUT=",hold_r2_dl
    
  # make a pretty HTML table printout of the results

  header = ["Model", "R2 TRAIN", "R2 TEST", "R2 HOLDOUT", "Model Training Time (s)"]
  table  = [
            ["GBM", train_r2_gbm, test_r2_gbm, hold_r2_gbm, round(gbm_elapsed,3)],
            ["DRF", train_r2_drf, test_r2_drf, hold_r2_drf, round(drf_elapsed,3)],
            ["GLM", train_r2_glm, test_r2_glm, hold_r2_glm, round(glm_elapsed,3)],
            ["DL ", train_r2_dl,  test_r2_dl,  hold_r2_dl , round(dl_elapsed,3) ],
           ]
  h2o.H2ODisplay(table,header)
  # --------------
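
The split inside `split_fit_predict` draws one uniform random number per row and routes the row by thresholds at 0.6 and 0.9. A minimal plain-Python sketch of the same idea follows. The function name `three_way_split` and the fixed seed are assumptions for illustration; the notebook itself uses H2O's `runif()`.

```python
import random

def three_way_split(rows, seed=42):
    """Assign each row to train/test/holdout using one uniform draw per row."""
    rng = random.Random(seed)
    train, test, hold = [], [], []
    for row in rows:
        r = rng.random()      # uniform in [0, 1)
        if r < 0.6:
            train.append(row)   # about 60% of rows
        elif r < 0.9:
            test.append(row)    # about 30% of rows
        else:
            hold.append(row)    # about 10% of rows
    return train, test, hold

train, test, hold = three_way_split(range(10450))
print(len(train), len(test), len(hold))
```

Because membership is decided row by row, the three sizes are only approximately 60/30/10 and change from run to run unless the seed is fixed.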



In [10]:

    
# Split the data (into train, test & holdout), fit some models and score them on the holdout data
split_fit_predict(bpd)
# Here we see a holdout r^2 of about 0.9 for GBM and about 0.8 for GLM.  This means given just
# the day, the station, the month, and the day-of-week we can predict roughly 90% of the
# variance of the bike-trip-starts.









    



Training data has 5 columns and 6289 rows, test has 3124 rows, holdout has 1037

gbm Model Build Progress: [##################################################] 100%

drf Model Build Progress: [##################################################] 100%

glm Model Build Progress: [##################################################] 100%

deeplearning Model Build Progress: [##################################################] 100%






    




Model  R2 TRAIN  R2 TEST  R2 HOLDOUT  Model Training Time (s)
GBM    1.0       0.9      0.9         6.753
DRF    0.8       0.8      0.8         5.624
GLM    0.9       0.8      0.8         0.144
DL     1.0       0.9      0.9         7.885
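
For readers unfamiliar with the metric in this table, the sketch below computes R2 using the standard definition, one minus the ratio of residual to total sum of squares. It is assumed, not verified here, that this matches what `model_performance(...).r2()` reports. The actual values are bike counts from the first rows of `bpd`; the predictions are made up for illustration.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / float(len(actual))
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual    = [137, 110, 142, 21, 113]   # bike counts from bpd
predicted = [130, 115, 150, 30, 100]   # hypothetical model output
score = r2_score(actual, predicted)
print(round(score, 2))  # about 0.96
```

A score of 1.0 means the predictions match exactly, and a model that always predicts the mean of the actual values scores 0.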



In [11]:

    
# ----------
# 5- Now let's add some weather
# Load weather data
wthr1 = h2o.import_file(path=[mylocate("bigdata/laptop/citibike-nyc/31081_New_York_City__Hourly_2013.csv"),
                               mylocate("bigdata/laptop/citibike-nyc/31081_New_York_City__Hourly_2014.csv")])
# Peek at the data
wthr1.describe()









    



Parse Progress: [##################################################] 100%
Rows:17,520 Cols:50

Chunk compression summary:






    




chunk_type  chunk_name                 count  count_percentage  size      size_percentage
C0L         Constant Integers          107    6.294118          8.4 KB    0.7889721
C0D         Constant Reals             436    25.647058         34.1 KB   3.2148771
CXI         Sparse Integers            17     1.0               1.5 KB    0.1
C1          1-Byte Integers            346    20.352942         197.4 KB  18.634672
C1N         1-Byte Integers (w/o NAs)  214    12.588236         122.3 KB  11.544063
C1S         1-Byte Fractions           214    12.588236         125.3 KB  11.822968
C2S         2-Byte Fractions           196    11.529412         214.5 KB  20.242111
C4S         4-Byte Fractions           170    10.0              356.1 KB  33.612423






    



Frame distribution summary:






    





                   size    number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.52:54321  1.0 MB  17520.0         34.0                         1700.0
mean               1.0 MB  17520.0         34.0                         1700.0
min                1.0 MB  17520.0         34.0                         1700.0
max                1.0 MB  17520.0         34.0                         1700.0
stddev             0 B     0.0             0.0                          0.0
total              1.0 MB  17520.0         34.0                         1700.0






    










    





       Year Local    Month Local  Day Local    Hour Local   Year UTC      Month UTC    Day UTC      Hour UTC     Cavok Reported  Cloud Ceiling (m)  Cloud Cover Fraction  Cloud Cover Fraction 1  Cloud Cover Fraction 2  Cloud Cover Fraction 3  Cloud Cover Fraction 4  Cloud Cover Fraction 5  Cloud Cover Fraction 6  Cloud Height (m) 1  Cloud Height (m) 2  Cloud Height (m) 3  Cloud Height (m) 4  Cloud Height (m) 5  Cloud Height (m) 6  Dew Point (C)  Humidity Fraction  Precipitation One Hour (mm)  Pressure Altimeter (mbar)  Pressure Sea Level (mbar)  Pressure Station (mbar)  Snow Depth (cm)  Temperature (C)  Visibility (km)  Weather Code 1  Weather Code 1/ Description  Weather Code 2  Weather Code 2/ Description  Weather Code 3  Weather Code 3/ Description  Weather Code 4  Weather Code 4/ Description  Weather Code 5  Weather Code 5/ Description  Weather Code 6  Weather Code 6/ Description  Weather Code Most Severe / Icon Code  Weather Code Most Severe  Weather Code Most Severe / Description  Wind Direction (degrees)  Wind Gust (m/s)  Wind Speed (m/s)  
type   int           int          int          int          int           int          int          int          int             real               real                  real                    real                    real                    int                     int                     int                     real                real                real                int                 int                 int                 real           real               real                         real                       int                        int                      int              real             real             int             enum                         int             enum                         int             enum                         int             enum                         int             enum                         int             enum                         int                                   int                       enum                                    int                       real             real              
mins   2013.0        1.0          1.0          0.0          2013.0        1.0          1.0          0.0          0.0             61.0               0.0                   0.0                     0.25                    0.5                     NaN                     NaN                     NaN                     60.96               213.36              365.76              NaN                 NaN                 NaN                 -26.7          0.1251             0.0                          983.2949                   NaN                        NaN                      NaN              -15.6            0.001            1.0             0.0                          1.0             0.0                          1.0             0.0                          1.0             0.0                          1.0             0.0                          3.0             0.0                          0.0                                   1.0                       0.0                                     10.0                      7.2              0.0               
mean   2013.5        6.52602739726 15.7205479452 11.5         2013.50057078 6.52511415525 15.721347032 11.5001141553 0.0             1306.31195846      0.416742490522        0.361207349081          0.872445384073          0.963045685279          0.0                     0.0                     0.0                     1293.9822682        1643.73900166       2084.89386376       0.0                 0.0                 0.0                 4.31304646766  0.596736389159     1.37993010753                1017.82581441              0.0                        0.0                      0.0              12.5789090701    14.3914429682    4.84251968504   NaN                          3.65867689358   NaN                          2.84660766962   NaN                          2.01149425287   NaN                          4.125           NaN                          3.0             0.0                          1.37848173516                         4.84251968504             NaN                                     194.69525682              9.42216948073    2.41032887849     
maxs   2014.0        12.0         31.0         23.0         2015.0        12.0         31.0         23.0         0.0             3657.6             1.0                   1.0                     1.0                     1.0                     NaN                     NaN                     NaN                     3657.5999           3657.5999           3657.5999           NaN                 NaN                 NaN                 24.4           1.0                26.924                       1042.2113                  NaN                        NaN                      NaN              36.1             16.0934          60.0            11.0                         60.0            10.0                         36.0            7.0                          27.0            4.0                          27.0            2.0                          3.0             0.0                          16.0                                  60.0                      11.0                                    360.0                     20.58            10.8              
sigma  0.500014270017 3.44794972385 8.79649804852 6.92238411188 0.500584411716 3.44782405458 8.79561488868 6.92230165203 0.0             995.339856966      0.462720830993        0.42770569708           0.197155690367          0.0861015598104         -0.0                    -0.0                    -0.0                    962.743095854       916.73861349        887.215847511       -0.0                -0.0                -0.0                10.9731282097  0.185792011866     2.56215129179                7.46451697179              -0.0                       -0.0                     -0.0             10.0396739531    3.69893623033    5.70486576983   NaN                          6.13386253912   NaN                          5.80553286364   NaN                          3.12340844261   NaN                          6.15223536611   NaN                          0.0             0.0                          4.07386062702                         5.70486576983             NaN                                     106.350000031             1.81511871115    1.61469790524     
zeros  0             0            0            730          0             0            0            730          17455           0                  8758                  8758                    0                       0                       -17520                  -17520                  -17520                  0                   0                   0                   -17520              -17520              -17520              268            0                  501                          0                          -17520                     -17520                   -17520           269              0                0               17                           0               30                           0               13                           -5044           -5024                        -11241          -11229                       -17030          -17028                       14980                                 0                         17                                      0                         0                2768              
missing 0             0            0            0            0             0            0            0            65              10780              375                   375                     14682                   16535                   17520                   17520                   17520                   9103                14683               16535               17520               17520               17520               67             67                 15660                        360                        17520                      17520                    17520            67               412              14980           14980                        16477           16477                        17181           17181                        17433           17433                        17504           17504                        17518           17518                        0                                     14980                     14980                                   9382                      14381            1283              
0      2013.0        1.0          1.0          0.0          2013.0        1.0          1.0          5.0          0.0             2895.6             1.0                   0.9                     1.0                     nan                     nan                     nan                     nan                     2895.5999           3352.8              nan                 nan                 nan                 nan                 -5.0           0.5447             nan                          1013.0917                  nan                        nan                      nan              3.3              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               nan                       nan              2.57              
1      2013.0        1.0          1.0          1.0          2013.0        1.0          1.0          6.0          0.0             3048.0             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     3048.0              nan                 nan                 nan                 nan                 nan                 -4.4           0.5463             nan                          1012.0759                  nan                        nan                      nan              3.9              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               260.0                     9.77             4.63              
2      2013.0        1.0          1.0          2.0          2013.0        1.0          1.0          7.0          0.0             1828.8             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     1828.7999           nan                 nan                 nan                 nan                 nan                 -3.3           0.619              nan                          1012.4145                  nan                        nan                      nan              3.3              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               nan                       7.72             1.54              
3      2013.0        1.0          1.0          3.0          2013.0        1.0          1.0          8.0          0.0             1463.0             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     1463.04             nan                 nan                 nan                 nan                 nan                 -2.8           0.6159             nan                          1012.4145                  nan                        nan                      nan              3.9              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               nan                       nan              3.09              
4      2013.0        1.0          1.0          4.0          2013.0        1.0          1.0          9.0          0.0             1402.1             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     1402.08             nan                 nan                 nan                 nan                 nan                 -2.8           0.6159             nan                          1012.7531                  nan                        nan                      nan              3.9              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               260.0                     nan              4.12              
5      2013.0        1.0          1.0          5.0          2013.0        1.0          1.0          10.0         0.0             1524.0             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     1524.0              nan                 nan                 nan                 nan                 nan                 -2.8           0.6159             nan                          1012.4145                  nan                        nan                      nan              3.9              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               nan                       nan              3.09              
6      2013.0        1.0          1.0          6.0          2013.0        1.0          1.0          11.0         0.0             1524.0             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     1524.0              nan                 nan                 nan                 nan                 nan                 -3.3           0.5934             nan                          1012.0759                  nan                        nan                      nan              3.9              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               nan                       9.26             3.09              
7      2013.0        1.0          1.0          7.0          2013.0        1.0          1.0          12.0         0.0             1524.0             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     1524.0              nan                 nan                 nan                 nan                 nan                 -3.3           0.5934             nan                          1012.4145                  nan                        nan                      nan              3.9              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               260.0                     9.26             4.63              
8      2013.0        1.0          1.0          8.0          2013.0        1.0          1.0          13.0         0.0             1524.0             1.0                   1.0                     nan                     nan                     nan                     nan                     nan                     1524.0              nan                 nan                 nan                 nan                 nan                 -2.8           0.6425             nan                          1012.4145                  nan                        nan                      nan              3.3              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               260.0                     nan              3.09              
9      2013.0        1.0          1.0          9.0          2013.0        1.0          1.0          14.0         0.0             1524.0             1.0                   0.9                     1.0                     nan                     nan                     nan                     nan                     1524.0              3657.5999           nan                 nan                 nan                 nan                 -2.8           0.6159             nan                          1012.4145                  nan                        nan                      nan              3.9              16.0934          nan                                          nan                                          nan                                          nan                                          nan                                          nan                                          0.0                                   nan                                                               nan                       9.26             3.09



In [12]:

    
# Lots of columns in there!  Let's plan on converting to time-since-epoch to do
# a 'join' with the bike data, plus gather weather info that might affect
# cyclists - rain, snow, temperature.  Alas, drop the "snow" column since it's
# all NAs.  Also add in dew point and humidity just in case.  Slice out just
# the columns of interest and drop the rest.
wthr2 = wthr1[["Year Local","Month Local","Day Local","Hour Local","Dew Point (C)","Humidity Fraction","Precipitation One Hour (mm)","Temperature (C)","Weather Code 1/ Description"]]

wthr2.set_name(wthr2.names.index("Precipitation One Hour (mm)"), "Rain (mm)")
wthr2.set_name(wthr2.names.index("Weather Code 1/ Description"), "WC1")
wthr2.describe()
# Much better!









    



Rows:17,520 Cols:9

Chunk compression summary:






    




chunk_type  chunk_name                 count  count_percentage  size      size_percentage
C0L         Constant Integers          46     15.0              3.6 KB    1.0267857
C1          1-Byte Integers            34     11.111112         19.4 KB   5.533482
C1N         1-Byte Integers (w/o NAs)  90     29.411766         51.5 KB   14.706473
CUD         Unique Reals               103    33.660133         140.3 KB  40.09375
C8D         64-bit Reals               33     10.784314         135.2 KB  38.63951






    



Frame distribution summary:






    





                   size      number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.52:54321  350.0 KB  17520.0         34.0                         306.0
mean               350.0 KB  17520.0         34.0                         306.0
min                350.0 KB  17520.0         34.0                         306.0
max                350.0 KB  17520.0         34.0                         306.0
stddev             0 B       0.0             0.0                          0.0
total              350.0 KB  17520.0         34.0                         306.0






    










    





       Year Local    Month Local  Day Local    Hour Local   Dew Point (C)  Humidity Fraction  Rain (mm)    Temperature (C)  WC1  
type   int           int          int          int          real           real               real         real             enum 
mins   2013.0        1.0          1.0          0.0          -26.7          0.1251             0.0          -15.6            0.0  
mean   2013.5        6.52602739726 15.7205479452 11.5         4.31304646766  0.596736389159     1.37993010753 12.5789090701    NaN  
maxs   2014.0        12.0         31.0         23.0         24.4           1.0                26.924       36.1             11.0 
sigma  0.500014270017 3.44794972385 8.79649804852 6.92238411188 10.9731282097  0.185792011866     2.56215129179 10.0396739531    NaN  
zeros  0             0            0            730          268            0                  501          269              17   
missing 0             0            0            0            67             67                 15660        67               14980
0      2013.0        1.0          1.0          0.0          -5.0           0.5447             nan          3.3                   
1      2013.0        1.0          1.0          1.0          -4.4           0.5463             nan          3.9                   
2      2013.0        1.0          1.0          2.0          -3.3           0.619              nan          3.3                   
3      2013.0        1.0          1.0          3.0          -2.8           0.6159             nan          3.9                   
4      2013.0        1.0          1.0          4.0          -2.8           0.6159             nan          3.9                   
5      2013.0        1.0          1.0          5.0          -2.8           0.6159             nan          3.9                   
6      2013.0        1.0          1.0          6.0          -3.3           0.5934             nan          3.9                   
7      2013.0        1.0          1.0          7.0          -3.3           0.5934             nan          3.9                   
8      2013.0        1.0          1.0          8.0          -2.8           0.6425             nan          3.3                   
9      2013.0        1.0          1.0          9.0          -2.8           0.6159             nan          3.9
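
The slice-and-rename step in cell 12 can be mirrored without a cluster. The following is a minimal sketch in plain Python, assuming each weather row is a dict; the sample rows, the `Snow Depth (cm)` column name, and the helper `select_and_rename` are hypothetical and are not part of the H2O API.

```python
# Hypothetical in-memory stand-in for two hourly weather rows.
# "Snow Depth (cm)" is an invented name for a column we want to discard.
rows = [
    {"Hour Local": 0, "Temperature (C)": 3.3,
     "Precipitation One Hour (mm)": None, "Snow Depth (cm)": None},
    {"Hour Local": 1, "Temperature (C)": 3.9,
     "Precipitation One Hour (mm)": None, "Snow Depth (cm)": None},
]

# Columns to keep, in output order, and the new names for some of them.
KEEP = ["Hour Local", "Precipitation One Hour (mm)", "Temperature (C)"]
RENAME = {"Precipitation One Hour (mm)": "Rain (mm)"}


def select_and_rename(rows, keep, rename):
    """Keep only the `keep` columns and apply the `rename` mapping."""
    return [{rename.get(col, col): row[col] for col in keep} for row in rows]


slim = select_and_rename(rows, KEEP, RENAME)
print(list(slim[0].keys()))
```

Selecting first and renaming second keeps the rename mapping small, since it only has to mention the columns that survive the slice.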



In [13]:

    
# Filter down to the weather at Noon
wthr3 = wthr2[ wthr2["Hour Local"]==12 ]
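
The row counts in the surrounding output (17,520 hourly rows before the filter, 730 after) are what two full years of hourly readings would give. A small sketch to check that arithmetic, assuming one reading per hour for all of 2013 and 2014 with no gaps (the real feed may not be that regular):

```python
from datetime import datetime, timedelta

# Assumed coverage: every hour from 2013-01-01 00:00 up to, but not
# including, 2015-01-01 00:00.
start = datetime(2013, 1, 1, 0)
end = datetime(2015, 1, 1, 0)

hours = []
t = start
while t < end:
    hours.append(t)
    t += timedelta(hours=1)

# Same idea as wthr2[wthr2["Hour Local"] == 12]: keep one reading per day.
noon = [t for t in hours if t.hour == 12]

print(len(hours), len(noon))
```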



In [14]:

    
# Let's now get days since the epoch... we'll convert year/month/day into epoch
# time in msec, and then floor that down to epoch days.  Need zero-based month
# and day for mktime, but the data is 1-based.
wthr3["msec"] = h2o.H2OFrame.mktime(year=wthr3["Year Local"], month=wthr3["Month Local"]-1, day=wthr3["Day Local"]-1, hour=wthr3["Hour Local"])
msecPerDay=1000*60*60*24  # milliseconds per day
wthr3["Days"] = (wthr3["msec"]/msecPerDay).floor()
wthr3.describe()
# msec looks sane (numbers like 1.3e12 are in the correct range for msec since
# 1970).  Epoch Days matches closely with the epoch day numbers from the
# CitiBike dataset.









    



Rows:730 Cols:10

Chunk compression summary:






    




chunk_type  chunk_name                 count  count_percentage  size     size_percentage
C0L         Constant Integers          80     21.390373         6.3 KB   12.498779
C0D         Constant Reals             13     3.4759357         1.0 KB   2.0310516
C1          1-Byte Integers            30     8.021391          2.6 KB   5.2455816
C1N         1-Byte Integers (w/o NAs)  56     14.973262         4.9 KB   9.801778
C1S         1-Byte Fractions           34     9.090909          3.5 KB   7.0032225
C2S         2-Byte Fractions           34     9.090909          4.2 KB   8.4288645
CUD         Unique Reals               25     6.6844916         3.6 KB   7.2297626
C8D         64-bit Reals               102    27.272728         23.9 KB  47.76096






    



Frame distribution summary:






    





                   size     number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.52:54321  50.0 KB  730.0           34.0                         374.0
mean               50.0 KB  730.0           34.0                         374.0
min                50.0 KB  730.0           34.0                         374.0
max                50.0 KB  730.0           34.0                         374.0
stddev             0 B      0.0             0.0                          0.0
total              50.0 KB  730.0           34.0                         374.0






    










    





       Year Local    Month Local  Day Local    Hour Local  Dew Point (C)  Humidity Fraction  Rain (mm)    Temperature (C)  WC1  msec            Days         
type   int           int          int          int         real           real               real         real             enum int             int          
mins   2013.0        1.0          1.0          12.0        -26.7          0.1723             0.0          -13.9            0.0  1.3570704e+12   15706.0      
mean   2013.5        6.52602739726 15.7205479452 12.0        4.23012379642  0.539728198074     1.53125714286 14.0687757909    NaN  1.3885608526e+12 16070.5      
maxs   2014.0        12.0         31.0         12.0        23.3           1.0                12.446       34.4             10.0 1.420056e+12    16435.0      
sigma  0.500342818004 3.45021529307 8.80227802701 0.0         11.1062964725  0.179945027923     2.36064248615 10.3989855149    NaN  18219740080.4   210.877136425
zeros  0             0            0            0           14             0                  -174         7                -83  0               0            
missing 0             0            0            0           3              3                  660          3                620  0               0            
0      2013.0        1.0          1.0          12.0        -3.3           0.5934             nan          3.9                   1.3570704e+12   15706.0      
1      2013.0        1.0          2.0          12.0        -11.7          0.4806             nan          -2.2                  1.3571568e+12   15707.0      
2      2013.0        1.0          3.0          12.0        -10.6          0.5248             nan          -2.2                  1.3572432e+12   15708.0      
3      2013.0        1.0          4.0          12.0        -7.2           0.4976             nan          2.2                   1.3573296e+12   15709.0      
4      2013.0        1.0          5.0          12.0        -7.2           0.426              nan          4.4                   1.357416e+12    15710.0      
5      2013.0        1.0          6.0          12.0        -1.7           0.6451             nan          4.4              haze 1.3575024e+12   15711.0      
6      2013.0        1.0          7.0          12.0        -6.1           0.4119             nan          6.1                   1.3575888e+12   15712.0      
7      2013.0        1.0          8.0          12.0        -1.7           0.5314             nan          7.2                   1.3576752e+12   15713.0      
8      2013.0        1.0          9.0          12.0        0.6            0.56               nan          8.9              haze 1.3577616e+12   15714.0      
9      2013.0        1.0          10.0         12.0        -6.1           0.3952             nan          6.7                   1.357848e+12    15715.0
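
The epoch-day arithmetic from cell 14 can be checked with the standard library alone. This is a sketch rather than the H2O implementation: the helper names are hypothetical, it takes ordinary 1-based month and day values, and it works in UTC, so its msec values can differ from the `msec` column above by a time-zone offset while the floored day numbers still agree.

```python
import calendar

MSEC_PER_DAY = 1000 * 60 * 60 * 24


def epoch_msec_utc(year, month, day, hour):
    """Milliseconds since 1970-01-01 00:00 UTC (month and day are 1-based)."""
    return calendar.timegm((year, month, day, hour, 0, 0)) * 1000


def epoch_days(msec):
    """Whole days since the epoch, i.e. floor(msec / msec-per-day)."""
    return msec // MSEC_PER_DAY


first = epoch_days(epoch_msec_utc(2013, 1, 1, 12))
last = epoch_days(epoch_msec_utc(2014, 12, 31, 12))
print(first, last)
```

The two results line up with the `mins` and `maxs` of the `Days` column in the output above (15706 and 16435).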



In [15]:

    
# Let's drop the extra time columns to make an easy-to-handle dataset.
wthr4 = wthr3.drop("Year Local").drop("Month Local").drop("Day Local").drop("Hour Local").drop("msec")



In [16]:

    
# Also, most rain numbers are missing - let's assume those are zero-rain days
rain = wthr4["Rain (mm)"]
rain[ rain.isna() ] = 0
wthr4["Rain (mm)"] = rain
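
The zero-fill in cell 16 amounts to replacing every missing reading with 0. A minimal sketch on a plain list, assuming missing values show up as `None` or NaN; the sample readings and the helper `fill_missing` are hypothetical:

```python
import math


def fill_missing(values, fill=0.0):
    """Return a copy of `values` with None/NaN entries replaced by `fill`."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [fill if is_missing(v) else v for v in values]


# Hypothetical hourly rain readings in mm; NaN and None mark "no report".
rain = [math.nan, 0.0, 2.5, None]
filled = fill_missing(rain)
print(filled)
```

Treating "no report" as "no rain" is the notebook's stated assumption; if missing values could also mean a sensor outage, a separate indicator column would preserve that distinction.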



In [17]:

    
# ----------
# 6 - Join the weather data-per-day to the bike-starts-per-day
print("Merge Daily Weather with Bikes-Per-Day")
bpd_with_weather = bpd.merge(wthr4,allLeft=True,allRite=False)
bpd_with_weather.describe()
bpd_with_weather.show()









    



Merge Daily Weather with Bikes-Per-Day
Rows:10,450 Cols:10

Chunk compression summary:






    




chunk_type  chunk_name                 count  count_percentage  size     size_percentage
C0L         Constant Integers          32     10.0              2.5 KB   1.4274563
CBS         Bits                       32     10.0              3.5 KB   1.9817107
C1          1-Byte Integers            32     10.0              12.3 KB  7.0402584
C1N         1-Byte Integers (w/o NAs)  32     10.0              12.3 KB  7.0402584
C1S         1-Byte Fractions           32     10.0              12.8 KB  7.32575
C2          2-Byte Integers            64     20.0              45.1 KB  25.73436
CUD         Unique Reals               96     30.000002         86.6 KB  49.450207






    



Frame distribution summary:






    





                   size      number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.52:54321  175.1 KB  10450.0         32.0                         320.0
mean               175.1 KB  10450.0         32.0                         320.0
min                175.1 KB  10450.0         32.0                         320.0
max                175.1 KB  10450.0         32.0                         320.0
stddev             0 B       0.0             0.0                          0.0
total              175.1 KB  10450.0         32.0                         320.0






    










    





       Days         start station name     bikes        Month         DayOfWeek  Humidity Fraction  Rain (mm)  Temperature (C)  WC1       Dew Point (C)  
type   int          enum                   int          enum          enum       real               int        real             enum      real           
mins   15979.0      0.0                    1.0          0.0           0.0        0.3485             0.0        9.4              2.0       -2.2           
mean   15994.4415311 NaN                    99.3025837321 0.968612440191 NaN        0.562374191388     0.0        16.9630717703    NaN       7.77999043062  
maxs   16010.0      329.0                  553.0        1.0           6.0        0.8718             0.0        26.1             8.0       19.4           
sigma  9.23370172444 NaN                    72.9721964301 0.174371128617 NaN        0.149631413472     0.0        4.29746634617    NaN       6.49151146664  
zeros  0            32                     0            328           1635       0                  10450      0                0         0              
missing 0            0                      0            0             0          0                  0          0                9134      0              
0      15980.0      9 Ave & W 18 St        137.0        10            Tue        0.5019             0.0        25.0                       13.9           
1      15989.0      Allen St & Hester St   110.0        10            Thu        0.8631             0.0        16.7             light rain 14.4           
2      16003.0      Centre St & Chambers St 142.0        10            Thu        0.4578             0.0        9.4                        -1.7           
3      15995.0      Concord St & Bridge St 21.0         10            Wed        0.6765             0.0        20.6                       14.4           
4      15987.0      E 14 St & Avenue B     113.0        10            Tue        0.6455             0.0        14.4                       7.8            
5      16005.0      8 Ave & W 52 St        129.0        10            Sat        0.3818             0.0        12.8                       -1.1           
6      16009.0      South St & Whitehall St 70.0         10            Wed        0.7265             0.0        18.3                       13.3           
7      15989.0      Pike St & E Broadway   55.0         10            Thu        0.8631             0.0        16.7             light rain 14.4           
8      15991.0      Watts St & Greenwich St 101.0        10            Sat        0.6659             0.0        15.6                       9.4            
9      15985.0      Monroe St & Bedford Ave 15.0         10            Sun        0.842              0.0        22.2                       19.4           







    





  Days start station name       bikes   Month DayOfWeek    Humidity Fraction   Rain (mm)   Temperature (C) WC1         Dew Point (C)
 15980 9 Ave & W 18 St            137      10 Tue                     0.5019           0              25                       13.9
 15989 Allen St & Hester St       110      10 Thu                     0.8631           0              16.7 light rain            14.4
 16003 Centre St & Chambers St     142      10 Thu                     0.4578           0               9.4                      -1.7
 15995 Concord St & Bridge St      21      10 Wed                     0.6765           0              20.6                      14.4
 15987 E 14 St & Avenue B         113      10 Tue                     0.6455           0              14.4                       7.8
 16005 8 Ave & W 52 St            129      10 Sat                     0.3818           0              12.8                      -1.1
 16009 South St & Whitehall St      70      10 Wed                     0.7265           0              18.3                      13.3
 15989 Pike St & E Broadway        55      10 Thu                     0.8631           0              16.7 light rain            14.4
 15991 Watts St & Greenwich St     101      10 Sat                     0.6659           0              15.6                       9.4
 15985 Monroe St & Bedford Ave      15      10 Sun                     0.842           0              22.2                      19.4
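
The merge in cell 17 keeps every bikes-per-day row and attaches the weather for that day, presumably matching on the shared `Days` column. A plain-Python sketch of such a left join, assuming `Days` is unique on the weather side; the helper `left_join` and the unmatched row with `Days` 99999 are hypothetical, while the other values are taken from the output above:

```python
def left_join(left, right, key):
    """Keep every row of `left`; add the columns of the matching `right` row.

    Rows of `left` with no match get None for the right-hand columns.
    Assumes `key` is unique within `right`.
    """
    right_cols = [c for c in right[0] if c != key] if right else []
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        merged = dict(row)
        for col in right_cols:
            merged[col] = match[col] if match is not None else None
        joined.append(merged)
    return joined


bikes = [
    {"Days": 15980, "start station name": "9 Ave & W 18 St", "bikes": 137},
    {"Days": 15989, "start station name": "Allen St & Hester St", "bikes": 110},
    {"Days": 99999, "start station name": "(hypothetical)", "bikes": 1},
]
weather = [
    {"Days": 15980, "Temperature (C)": 25.0},
    {"Days": 15989, "Temperature (C)": 16.7},
]

for row in left_join(bikes, weather, "Days"):
    print(row)
```

Because the join is one-sided, the result always has as many rows as the bike table, so weather gaps surface as missing values rather than as silently dropped days.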



In [18]:

    
# 7 - Test/Train split again, model build again, this time with weather
split_fit_predict(bpd_with_weather)









    



Training data has 10 columns and 6210 rows, test has 3167 rows, holdout has 1073

gbm Model Build Progress: [##################################################] 100%

drf Model Build Progress: [##################################################] 100%

glm Model Build Progress: [##################################################] 100%

deeplearning Model Build Progress: [##################################################] 100%






    




Model  R2 TRAIN  R2 TEST  R2 HOLDOUT  Model Training Time (s)
GBM    1.0       0.9      0.9         12.294
DRF    0.9       0.7      0.7         11.171
GLM    0.9       0.8      0.8         0.144
DL     1.0       0.9      0.9         9.004
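
For reading the R2 columns above: the usual coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition (the notebook does not show how `split_fit_predict` computes its scores), and the predictions are made-up numbers used only for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Bike counts from the first rows of the merged frame, with hypothetical
# model predictions alongside them.
actual = [137, 110, 142, 21]
predicted = [130, 115, 140, 30]

r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

A score of 1.0 means the predictions match exactly, and 0.0 is what a model that always predicts the mean would get, which is why the gap between the train and test columns is a quick overfitting check.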

chunk_type	chunk_name	count	count_percentage	size	size_percentage
C0L	Constant Integers	17	2.2135415	1.3 KB	0.0
CBS	Bits	48	6.25	130.0 KB	0.2
C1N	1-Byte Integers (w/o NAs)	48	6.25	1016.6 KB	1.6816467
C1S	1-Byte Fractions	79	10.286459	1.6 MB	2.7740283
C2	2-Byte Integers	243	31.640625	10.0 MB	16.99867
C2S	2-Byte Fractions	49	6.3802085	2.0 MB	3.429456
C4	4-Byte Integers	32	4.166667	2.6 MB	4.4884815
C8	64-bit Integers	60	7.8125	9.9 MB	16.745453
C8D	64-bit Reals	192	25.0	31.7 MB	53.665054

	size	number_of_rows	number_of_chunks_per_column	number_of_chunks
172.16.2.52:54321	59.0 MB	1037712.0	48.0	768.0
mean	59.0 MB	1037712.0	48.0	768.0
min	59.0 MB	1037712.0	48.0	768.0
max	59.0 MB	1037712.0	48.0	768.0
stddev	0 B	0.0	0.0	0.0
total	59.0 MB	1037712.0	48.0	768.0

	tripduration	starttime	stoptime	start station id	start station name	start station latitude	start station longitude	end station id	end station name	end station latitude	end station longitude	bikeid	usertype	birth year	gender	Days
type	int	time	time	int	enum	real	real	int	enum	real	real	int	enum	int	int	int
mins	60.0	1.380610868e+12	1.380611083e+12	72.0	0.0	40.680342423	-74.01713445	72.0	0.0	40.680342423	-74.01713445	14529.0	0.0	1899.0	0.0	15979.0
mean	825.614754383	1.38191371692e+12	1.38191454253e+12	443.714212614	NaN	40.7345188586	-73.9911328848	443.207421712	NaN	40.7342847885	-73.9912702982	17644.0716451	0.906095332809	1975.77839486	1.12375591686	15993.8523906
maxs	1259480.0	1.383289197e+12	1.38341851e+12	3002.0	329.0	40.770513	-73.9500479759	3002.0	329.0	40.770513	-73.9500479759	20757.0	1.0	1997.0	2.0	16010.0
sigma	2000.3732323	778871729.132	778847387.503	354.434325075	NaN	0.0195734073053	0.0123161234106	357.398217058	NaN	0.0195578458116	0.0123855811965	1717.68112134	0.291696182123	11.1314906238	0.544380593291	9.02215033588
zeros	0	0	0	0	5239	0	0	0	5449	0	0	0	97446	0	97498	0
missing	0	0	0	0	0	0	0	0	0	0	0	0	0	97445	0	0
0	326.0	1.380610868e+12	1.380611194e+12	239.0	Willoughby St & Fleet St	40.69196566	-73.9813018	366.0	Clinton Ave & Myrtle Ave	40.693261	-73.968896	16052.0	Subscriber	1982.0	1.0	15979.0
1	729.0	1.380610881e+12	1.38061161e+12	322.0	Clinton St & Tillary St	40.696192	-73.991218	398.0	Atlantic Ave & Furman St	40.69165183	-73.9999786	19412.0	Customer	nan	0.0	15979.0
2	520.0	1.380610884e+12	1.380611404e+12	174.0	E 25 St & 1 Ave	40.7381765	-73.97738662	403.0	E 2 St & 2 Ave	40.72502876	-73.99069656	19645.0	Subscriber	1984.0	1.0	15979.0
3	281.0	1.380610885e+12	1.380611166e+12	430.0	York St & Jay St	40.7014851	-73.98656928	323.0	Lawrence St & Willoughby St	40.69236178	-73.98631746	16992.0	Subscriber	1985.0	1.0	15979.0
4	196.0	1.380610887e+12	1.380611083e+12	403.0	E 2 St & 2 Ave	40.72502876	-73.99069656	401.0	Allen St & Rivington St	40.72019576	-73.98997825	15690.0	Subscriber	1986.0	1.0	15979.0
5	1948.0	1.380610908e+12	1.380612856e+12	369.0	Washington Pl & 6 Ave	40.73224119	-74.00026394	307.0	Canal St & Rutgers St	40.71427487	-73.98990025	19846.0	Subscriber	1977.0	1.0	15979.0
6	1327.0	1.380610908e+12	1.380612235e+12	254.0	W 11 St & 6 Ave	40.73532427	-73.99800419	539.0	Metropolitan Ave & Bedford Ave	40.71534825	-73.96024116	14563.0	Subscriber	1986.0	2.0	15979.0
7	1146.0	1.380610917e+12	1.380612063e+12	490.0	8 Ave & W 33 St	40.751551	-73.993934	438.0	St Marks Pl & 1 Ave	40.72779126	-73.98564945	16793.0	Subscriber	1959.0	1.0	15979.0
8	380.0	1.380610918e+12	1.380611298e+12	468.0	Broadway & W 55 St	40.7652654	-73.98192338	385.0	E 55 St & 2 Ave	40.75797322	-73.96603308	16600.0	Customer	nan	0.0	15979.0
9	682.0	1.380610925e+12	1.380611607e+12	300.0	Shevchenko Pl & E 6 St	40.728145	-73.990214	519.0	Pershing Square N	40.75188406	-73.97770164	15204.0	Subscriber	1992.0	1.0	15979.0

Days	start station name	bikes
15980	9 Ave & W 18 St	137
15989	Allen St & Hester St	110
16003	Centre St & Chambers St	142
15995	Concord St & Bridge St	21
15987	E 14 St & Avenue B	113
16005	8 Ave & W 52 St	129
16009	South St & Whitehall St	70
15989	Pike St & E Broadway	55
15991	Watts St & Greenwich St	101
15985	Monroe St & Bedford Ave	15

Probs	bikesQuantiles
0.01	4.49
0.1	19
0.25	43
0.333	57
0.5	87
0.667	118
0.75	137
0.9	192
0.99	334.51

Model	R2 TRAIN	R2 TEST	R2 HOLDOUT	Model Training Time (s)
GBM	1.0	0.9	0.9	6.753
DRF	0.8	0.8	0.8	5.624
GLM	0.9	0.8	0.8	0.144
DL	1.0	0.9	0.9	7.885

Days	start station name	bikes	Month	DayOfWeek	Humidity Fraction	Temperature (C)	WC1	Dew Point (C)
15980	9 Ave & W 18 St	137	10	Tue	0.5019	25		13.9
15989	Allen St & Hester St	110	10	Thu	0.8631	16.7	light rain	14.4
16003	Centre St & Chambers St	142	10	Thu	0.4578	9.4		-1.7
15995	Concord St & Bridge St	21	10	Wed	0.6765	20.6		14.4
15987	E 14 St & Avenue B	113	10	Tue	0.6455	14.4		7.8
16005	8 Ave & W 52 St	129	10	Sat	0.3818	12.8		-1.1
16009	South St & Whitehall St	70	10	Wed	0.7265	18.3		13.3
15989	Pike St & E Broadway	55	10	Thu	0.8631	16.7	light rain	14.4
15991	Watts St & Greenwich St	101	10	Sat	0.6659	15.6		9.4
15985	Monroe St & Bedford Ave	15	10	Sun	0.842	22.2		19.4