Course Logistics

functional

In [4]:

knitr::opts_chunk$set(warning=FALSE, message=FALSE, fig.align = 'center')

In [5]:

options(jupiter.rich_display=FALSE)

In [6]:

library(dplyr)
library(stringr)
taxi_url <- "http://alizaidi.blob.core.windows.net/training/taxi_df.rds"
taxi_df  <- readRDS(gzcon(url(taxi_url)))
(taxi_df <- tbl_df(taxi_df))

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

VendorID passenger_count trip_distance RateCodeID store_and_fwd_flag payment_type fare_amount tip_amount tolls_amount pickup_hour pickup_dow dropoff_hour dropoff_dow pickup_nhood dropoff_nhood kSplits

	1                  1                   1.80              1                  N                  2                   9.5                0.00              0.00               6-10               Sat                6-10               Sat                Morningside Heights Hamilton Heights   A                  
	1                  2                   0.90              1                  N                  1                   6.5                1.55              0.00               6-10               Sat                6-10               Sat                Midtown            Midtown            A                  
	1                  1                   0.90              1                  N                  1                   7.0                1.66              0.00               6-10               Sat                6-10               Sat                Lower East Side    Soho               A                  
	1                  1                   0.30              1                  N                  2                   3.0                0.00              0.00               6-10               Sat                6-10               Sat                Financial District Financial District A                  
	2                  1                   0.96              1                  N                  1                   5.5                1.30              0.00               6-10               Thu                6-10               Thu                Chelsea            West Village       A                  
	2                  1                   2.01              1                  N                  1                   9.5                2.16              0.00               10-5               Sun                10-5               Sun                Upper East Side    Harlem             A                  
	2                  3                   3.14              1                  N                  1                  12.5                2.50              0.00               12-4               Sun                12-4               Sun                Fort Green         Soho               A                  
	1                  1                   0.50              1                  N                  1                   4.0                1.00              0.00               12-4               Sun                12-4               Sun                Upper East Side    Upper East Side    A                  
	2                  1                   0.67              1                  N                  1                   5.0                1.00              0.00               12-4               Thu                12-4               Thu                Upper West Side    Upper West Side    A                  
	2                  1                  15.20              2                  N                  1                  52.0               14.33              5.33               12-4               Thu                12-4               Thu                NA                 Clinton            A                  
	2                  5                   2.96              1                  N                  2                  20.5                0.00              0.00               12-4               Thu                12-4               Thu                Upper East Side    Garment District   A                  
	1                  1                   0.70              1                  N                  2                   6.0                0.00              0.00               9-12               Mon                9-12               Mon                Upper East Side    Upper East Side    A                  
	1                  1                   2.60              1                  N                  1                  16.0                3.35              0.00               9-12               Thu                9-12               Thu                Upper East Side    Gramercy           A                  
	2                  2                   0.79              1                  N                  2                   5.0                0.00              0.00               12-4               Wed                12-4               Wed                NA                 NA                 A                  
	2                  1                   3.37              1                  N                  1                  18.0                3.60              0.00               12-4               Wed                12-4               Wed                Upper East Side    Chelsea            A                  
	1                  3                   2.40              1                  N                  2                  11.0                0.00              0.00               6-10               Tue                10-5               Tue                East Village       Garment District   A                  
	1                  1                  16.30              1                  Y                  1                  45.0               11.57              0.00               6-10               Tue                10-5               Tue                NA                 NA                 A                  
	1                  1                   5.70              1                  N                  1                  25.0                5.16              0.00               12-4               Mon                12-4               Mon                Midtown            NA                 A                  
	1                  1                   3.20              1                  N                  2                  16.0                0.00              0.00               12-4               Mon                12-4               Mon                Midtown            Upper West Side    A                  
	1                  1                   0.70              1                  N                  2                   4.5                0.00              0.00               12-4               Mon                12-4               Mon                Upper West Side    Harlem             A                  
	1                  1                   1.00              1                  N                  2                   6.0                0.00              0.00               10-5               Tue                10-5               Tue                Midtown            Upper West Side    A                  
	1                  1                   1.50              1                  N                  1                   8.5                1.00              0.00               10-5               Tue                10-5               Tue                West Village       East Village       A                  
	1                  2                   5.00              1                  N                  2                  21.5                0.00              0.00               12-4               Mon                12-4               Mon                Midtown            Jackson Heights    A                  
	1                  1                   2.00              1                  N                  1                  11.5                3.69              0.00               12-4               Mon                12-4               Mon                Downtown           Fort Green         A                  
	1                  1                   1.40              1                  N                  1                   8.0                1.75              0.00               12-4               Mon                12-4               Mon                Upper East Side    Upper East Side    A                  
	1                  1                   1.40              1                  N                  1                   8.5                1.85              0.00               12-4               Mon                12-4               Mon                Midtown            Gramercy           A                  
	1                  1                   1.80              1                  N                  2                   9.5                0.00              0.00               12-4               Mon                12-4               Mon                Upper West Side    Midtown            A                  
	2                  1                   6.08              1                  N                  1                  18.5                3.70              0.00               6-10               Sun                6-10               Sun                Financial District Upper East Side    A                  
	2                  1                   1.96              1                  N                  1                   8.5                2.12              0.00               6-10               Sun                6-10               Sun                Chelsea            Midtown            A                  
	2                  6                   1.26              1                  N                  2                   7.0                0.00              0.00               6-10               Sun                6-10               Sun                Garment District   Murray Hill        A                  
	⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
	2                  1                  14.21              1                  N                  2                  39.5                0.00              0.00               6-10               Tue                6-10               Tue                NA                 Sunny Side         A                  
	2                  1                   1.79              1                  N                  2                  10.0                0.00              0.00               6-10               Tue                6-10               Tue                Upper East Side    Upper West Side    A                  
	2                  1                   5.34              1                  N                  2                  20.0                0.00              0.00               6-10               Tue                6-10               Tue                Upper West Side    Battery Park       A                  
	2                  1                   1.51              1                  N                  1                   8.0                1.86              0.00               6-10               Tue                6-10               Tue                Midtown            Upper East Side    A                  
	2                  6                   2.92              1                  N                  1                  12.5                1.00              0.00               4-6                Sat                4-6                Sat                Upper East Side    Midtown            A                  
	2                  6                   1.41              1                  N                  2                   7.5                0.00              0.00               4-6                Sun                4-6                Sun                West Village       Soho               A                  
	2                  2                   4.91              1                  N                  1                  17.0                3.00              0.00               10-5               Tue                10-5               Tue                NA                 Upper East Side    A                  
	2                  3                   0.92              1                  N                  2                   5.0                0.00              0.00               10-5               Tue                10-5               Tue                East Village       Gramercy           A                  
	2                  2                  13.31              1                  N                  2                  37.0                0.00              0.00               10-5               Tue                10-5               Tue                NA                 NA                 A                  
	2                  1                   3.46              1                  N                  2                  12.5                0.00              0.00               10-5               Tue                10-5               Tue                East Village       Clinton            A                  
	2                  5                   6.08              1                  N                  1                  20.5                4.36              0.00               10-5               Tue                10-5               Tue                Garment District   Cobble Hill        A                  
	2                  1                   1.41              1                  N                  1                   7.0                1.66              0.00               10-5               Tue                10-5               Tue                Soho               East Village       A                  
	2                  1                   0.83              1                  N                  1                   5.5                1.36              0.00               10-5               Fri                10-5               Fri                Lower East Side    East Village       A                  
	2                  3                   1.44              1                  N                  1                   8.5                1.86              0.00               12-4               Thu                12-4               Thu                Upper West Side    Clinton            A                  
	2                  1                   1.00              1                  N                  2                   7.5                0.00              0.00               12-4               Thu                12-4               Thu                Upper East Side    Upper West Side    A                  
	2                  1                   5.72              1                  N                  1                  24.0                4.96              0.00               5-9                Tue                9-12               Tue                Financial District Upper East Side    A                  
	2                  1                   8.83              1                  N                  1                  34.0               10.03              5.33               5-9                Tue                9-12               Tue                NA                 Midtown            A                  
	2                  2                   1.38              1                  N                  1                   6.5                1.46              0.00               5-9                Tue                5-9                Tue                Upper East Side    Yorkville          A                  
	2                  1                   1.61              1                  N                  1                  14.5                3.06              0.00               5-9                Tue                9-12               Tue                Gramercy           Midtown            A                  
	2                  1                   2.28              1                  N                  1                  15.0                3.16              0.00               5-9                Tue                9-12               Tue                Upper East Side    Murray Hill        A                  
	2                  1                   1.00              1                  N                  1                   6.0                0.00              0.00               5-9                Tue                5-9                Tue                Chelsea            Chelsea            A                  
	2                  1                   3.24              1                  N                  2                  13.5                0.00              0.00               10-5               Sat                10-5               Sat                NA                 Fort Green         A                  
	2                  1                   1.13              1                  N                  1                   6.5                1.95              0.00               10-5               Sat                10-5               Sat                Midtown            Midtown            A                  
	2                  1                   2.27              1                  N                  2                   9.5                0.00              0.00               10-5               Sat                10-5               Sat                Upper East Side    Harlem             A                  
	2                  1                   1.87              1                  N                  1                   8.5                1.00              0.00               10-5               Sat                10-5               Sat                Morningside Heights Hamilton Heights   A                  
	2                  5                   1.57              1                  N                  2                   7.0                0.00              0.00               10-5               Sun                10-5               Sun                Midtown            Chelsea            A                  
	2                  2                   2.75              1                  N                  2                  12.0                0.00              0.00               10-5               Sun                10-5               Sun                Tribeca            Garment District   A                  
	2                  5                   9.18              1                  N                  1                  30.5                6.36              0.00               10-5               Sun                10-5               Sun                Midtown            Park Slope         A                  
	2                  1                   0.79              1                  N                  1                   5.0                1.16              0.00               6-10               Sat                6-10               Sat                Upper West Side    Upper West Side    A                  
	2                  1                   0.75              1                  N                  1                   4.5                1.00              0.00               6-10               Sat                6-10               Sat                Chelsea            Chelsea            A                  

In [7]:

class(taxi_df)

	'tbl_df'
	'tbl'
	'data.frame'

In [8]:

filter(taxi_df,
       dropoff_dow %in% c("Fri", "Sat", "Sun"),
       tip_amount > 1)

VendorID passenger_count trip_distance RateCodeID store_and_fwd_flag payment_type fare_amount tip_amount tolls_amount pickup_hour pickup_dow dropoff_hour dropoff_dow pickup_nhood dropoff_nhood kSplits

	1                 2                 0.90              1                 N                 1                  6.5              1.55              0.00              6-10              Sat               6-10              Sat               Midtown           Midtown           A                 
	1                 1                 0.90              1                 N                 1                  7.0              1.66              0.00              6-10              Sat               6-10              Sat               Lower East Side   Soho              A                 
	2                 1                 2.01              1                 N                 1                  9.5              2.16              0.00              10-5              Sun               10-5              Sun               Upper East Side   Harlem            A                 
	2                 3                 3.14              1                 N                 1                 12.5              2.50              0.00              12-4              Sun               12-4              Sun               Fort Green        Soho              A                 
	2                 1                 6.08              1                 N                 1                 18.5              3.70              0.00              6-10              Sun               6-10              Sun               Financial District Upper East Side   A                 
	2                 1                 1.96              1                 N                 1                  8.5              2.12              0.00              6-10              Sun               6-10              Sun               Chelsea           Midtown           A                 
	1                 1                 0.40              1                 N                 1                  4.0              1.06              0.00              10-5              Sat               10-5              Sat               Greenwich Village East Village      A                 
	1                 2                 1.70              1                 N                 1                 10.0              2.00              0.00              12-4              Fri               4-6               Fri               Midtown           Upper East Side   A                 
	1                 2                 0.60              1                 N                 1                  4.5              1.70              0.00              6-10              Sat               6-10              Sat               Tribeca           Tribeca           A                 
	1                 2                 9.30              1                 N                 1                 28.5              5.86              0.00              6-10              Sat               6-10              Sat               NA                Upper East Side   A                 
	1                 2                 5.20              1                 N                 1                 17.5              3.85              0.00              6-10              Fri               6-10              Fri               Upper East Side   Soho              A                 
	1                 1                 8.50              1                 N                 1                 24.5              6.30              5.33              6-10              Fri               6-10              Fri               NA                Upper East Side   A                 
	1                 1                 2.20              1                 N                 1                 15.5              4.07              0.00              6-10              Sat               6-10              Sat               Garment District  East Village      A                 
	1                 1                 5.30              1                 N                 1                 20.5              6.39              0.00              6-10              Sat               6-10              Sat               Financial District Midtown           A                 
	2                 1                 1.01              1                 N                 1                  5.5              1.58              0.00              4-6               Sun               4-6               Sun               Yorkville         Upper East Side   A                 
	1                 1                 0.40              1                 N                 1                  4.5              1.58              0.00              6-10              Fri               6-10              Fri               East Village      East Village      A                 
	1                 1                 7.70              1                 N                 1                 22.0              5.83              5.33              6-10              Fri               6-10              Fri               NA                Harlem            A                 
	2                 1                 1.01              1                 N                 1                  7.0              1.95              0.00              5-9               Fri               5-9               Fri               Gramercy          Murray Hill       A                 
	2                 1                 0.79              1                 N                 1                  8.0              2.20              0.00              5-9               Fri               5-9               Fri               Midtown           Midtown           A                 
	1                 1                 0.50              1                 N                 1                  4.0              1.20              0.00              5-9               Sat               5-9               Sat               Upper East Side   Upper East Side   A                 
	1                 1                 2.90              1                 N                 1                 10.5              2.25              0.00              5-9               Sat               5-9               Sat               Greenwich Village Upper East Side   A                 
	2                 1                 1.16              1                 N                 1                  6.0              1.82              0.00              10-5              Fri               10-5              Fri               Financial District Battery Park      A                 
	1                 1                 0.60              1                 N                 1                  5.5              1.35              0.00              6-10              Sat               6-10              Sat               Lower East Side   East Village      A                 
	2                 1                 1.09              1                 N                 1                 14.0              2.80              0.00              6-10              Sat               6-10              Sat               Greenwich Village East Village      A                 
	1                 1                 2.40              1                 N                 1                 12.5              2.00              0.00              4-6               Fri               4-6               Fri               Midtown           Chelsea           A                 
	2                 1                 1.28              1                 N                 1                  5.5              1.20              0.00              6-10              Sun               6-10              Sun               Midtown           Upper East Side   A                 
	2                 1                 5.08              1                 N                 1                 17.5              3.60              0.00              6-10              Sun               6-10              Sun               East Village      Upper West Side   A                 
	2                 5                 1.29              1                 N                 1                  7.0              1.88              0.00              6-10              Sun               6-10              Sun               Upper East Side   Upper East Side   A                 
	2                 1                 2.83              1                 N                 1                 10.0              2.10              0.00              6-10              Sun               6-10              Sun               Upper East Side   Gramercy          A                 
	1                 1                 2.50              1                 N                 1                 10.5              2.00              0.00              6-10              Sat               10-5              Sat               Midtown           Upper East Side   A                 
	⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
	2                 5                  2.36             1                 N                 1                 10.0              3.24              0                 5-9               Fri               5-9               Fri               Gramercy          Tribeca           A                 
	2                 1                  1.24             1                 N                 1                  6.0              1.46              0                 10-5              Sat               10-5              Sat               Garment District  Midtown           A                 
	2                 2                  1.24             1                 N                 1                  7.0              1.66              0                 6-10              Sat               6-10              Sat               Gramercy          Midtown           A                 
	2                 1                  2.63             1                 N                 1                 11.0              2.46              0                 6-10              Sat               6-10              Sat               Garment District  Upper East Side   A                 
	2                 5                  3.14             1                 N                 1                 20.0              4.16              0                 5-9               Fri               5-9               Fri               Sunny Side        Upper East Side   A                 
	2                 1                  2.25             1                 N                 1                 11.0              3.54              0                 5-9               Fri               5-9               Fri               Upper East Side   Upper West Side   A                 
	2                 1                  0.71             1                 N                 1                  8.5              1.86              0                 5-9               Fri               5-9               Fri               Upper East Side   Upper East Side   A                 
	2                 1                  3.04             1                 N                 1                 10.0              2.26              0                 6-10              Sat               6-10              Sat               Financial District Lower East Side   A                 
	2                 3                 10.36             1                 N                 1                 31.0              4.00              0                 6-10              Sat               6-10              Sat               Chelsea           Inwood            A                 
	2                 2                  1.56             1                 N                 1                  8.5              1.50              0                 12-4              Sat               12-4              Sat               Upper East Side   Upper West Side   A                 
	2                 2                  2.04             1                 N                 1                  8.5              1.86              0                 4-6               Sun               4-6               Sun               Upper East Side   Yorkville         A                 
	2                 1                  7.44             1                 N                 1                 27.5              5.66              0                 6-10              Sat               6-10              Sat               Chelsea           Bushwick          A                 
	2                 1                  1.96             1                 N                 1                  9.5              2.06              0                 12-4              Sat               12-4              Sat               Financial District Greenwich Village A                 
	2                 3                  3.30             1                 N                 1                 13.5              3.58              0                 12-4              Sat               4-6               Sat               Garment District  Upper West Side   A                 
	2                 1                  1.32             1                 N                 1                  9.0              1.96              0                 12-4              Sat               4-6               Sat               Midtown           Garment District  A                 
	2                 1                  2.85             1                 N                 1                 12.0              2.56              0                 12-4              Sat               4-6               Sat               Upper East Side   Midtown           A                 
	2                 2                  0.83             1                 N                 1                  6.0              1.36              0                 12-4              Sat               12-4              Sat               Clinton           Garment District  A                 
	2                 1                  2.96             1                 N                 1                 12.0              2.00              0                 4-6               Sun               4-6               Sun               Yorkville         North Sutton Area A                 
	2                 1                  2.64             1                 N                 1                 11.5              2.46              0                 12-4              Sat               12-4              Sat               Garment District  Central Park      A                 
	2                 5                  1.40             1                 N                 1                  7.5              1.66              0                 9-12              Sat               9-12              Sat               Garment District  Midtown           A                 
	2                 1                  0.59             1                 N                 1                  4.5              1.32              0                 9-12              Sun               9-12              Sun               Chelsea           Chelsea           A                 
	2                 1                  1.34             1                 N                 1                  7.5              1.76              0                 10-5              Fri               10-5              Sat               East Village      Greenwich Village A                 
	2                 1                  4.62             1                 N                 1                 20.0              4.26              0                 10-5              Fri               10-5              Sat               Soho              Bushwick          A                 
	2                 2                  0.76             1                 N                 1                  4.5              1.16              0                 10-5              Sat               10-5              Sat               Yorkville         Yorkville         A                 
	2                 1                  7.03             1                 N                 1                 24.0              1.70              0                 10-5              Sat               10-5              Sat               Little Italy      Upper West Side   A                 
	2                 4                  1.25             1                 N                 1                  6.5              1.56              0                 10-5              Sat               10-5              Sat               Little Italy      West Village      A                 
	2                 1                  0.83             1                 N                 1                  5.5              1.36              0                 10-5              Fri               10-5              Fri               Lower East Side   East Village      A                 
	2                 1                  1.13             1                 N                 1                  6.5              1.95              0                 10-5              Sat               10-5              Sat               Midtown           Midtown           A                 
	2                 5                  9.18             1                 N                 1                 30.5              6.36              0                 10-5              Sun               10-5              Sun               Midtown           Park Slope        A                 
	2                 1                  0.79             1                 N                 1                  5.0              1.16              0                 6-10              Sat               6-10              Sat               Upper West Side   Upper West Side   A                 

In [9]:

library(stringr)
#table(taxi_df$pickup_nhood)
#harlem_pickups <- filter(taxi_df, str_detect(pickup_nhood, "Harlem")) #str_detect: equivalent of 'Like' in SQL
  harlem_pickups <- filter(taxi_df, pickup_nhood == "Harlem" | pickup_nhood == "East Harlem")
nrow(harlem_pickups)
#findistr_dropoffs <- filter(harlem_pickups, str_detect(dropoff_nhood, "Financial District"))
    findistr_dropoffs <- filter(harlem_pickups, dropoff_nhood == "Financial District")
nrow(findistr_dropoffs)

# or all together (without creating additional objects == memory)
nrow(filter(taxi_df, str_detect(pickup_nhood, "Harlem"), dropoff_nhood == "Financial District"))

32025

99

99

In [10]:

select(taxi_df, pickup_nhood, dropoff_nhood,
       fare_amount, dropoff_hour, trip_distance)

pickup_nhood dropoff_nhood fare_amount dropoff_hour trip_distance

	Morningside Heights Hamilton Heights    9.5               6-10                1.80              
	Midtown            Midtown             6.5               6-10                0.90              
	Lower East Side    Soho                7.0               6-10                0.90              
	Financial District Financial District  3.0               6-10                0.30              
	Chelsea            West Village        5.5               6-10                0.96              
	Upper East Side    Harlem              9.5               10-5                2.01              
	Fort Green         Soho               12.5               12-4                3.14              
	Upper East Side    Upper East Side     4.0               12-4                0.50              
	Upper West Side    Upper West Side     5.0               12-4                0.67              
	NA                 Clinton            52.0               12-4               15.20              
	Upper East Side    Garment District   20.5               12-4                2.96              
	Upper East Side    Upper East Side     6.0               9-12                0.70              
	Upper East Side    Gramercy           16.0               9-12                2.60              
	NA                 NA                  5.0               12-4                0.79              
	Upper East Side    Chelsea            18.0               12-4                3.37              
	East Village       Garment District   11.0               10-5                2.40              
	NA                 NA                 45.0               10-5               16.30              
	Midtown            NA                 25.0               12-4                5.70              
	Midtown            Upper West Side    16.0               12-4                3.20              
	Upper West Side    Harlem              4.5               12-4                0.70              
	Midtown            Upper West Side     6.0               10-5                1.00              
	West Village       East Village        8.5               10-5                1.50              
	Midtown            Jackson Heights    21.5               12-4                5.00              
	Downtown           Fort Green         11.5               12-4                2.00              
	Upper East Side    Upper East Side     8.0               12-4                1.40              
	Midtown            Gramercy            8.5               12-4                1.40              
	Upper West Side    Midtown             9.5               12-4                1.80              
	Financial District Upper East Side    18.5               6-10                6.08              
	Chelsea            Midtown             8.5               6-10                1.96              
	Garment District   Murray Hill         7.0               6-10                1.26              
	⋮ ⋮ ⋮ ⋮ ⋮
	NA                 Sunny Side         39.5               6-10               14.21              
	Upper East Side    Upper West Side    10.0               6-10                1.79              
	Upper West Side    Battery Park       20.0               6-10                5.34              
	Midtown            Upper East Side     8.0               6-10                1.51              
	Upper East Side    Midtown            12.5               4-6                 2.92              
	West Village       Soho                7.5               4-6                 1.41              
	NA                 Upper East Side    17.0               10-5                4.91              
	East Village       Gramercy            5.0               10-5                0.92              
	NA                 NA                 37.0               10-5               13.31              
	East Village       Clinton            12.5               10-5                3.46              
	Garment District   Cobble Hill        20.5               10-5                6.08              
	Soho               East Village        7.0               10-5                1.41              
	Lower East Side    East Village        5.5               10-5                0.83              
	Upper West Side    Clinton             8.5               12-4                1.44              
	Upper East Side    Upper West Side     7.5               12-4                1.00              
	Financial District Upper East Side    24.0               9-12                5.72              
	NA                 Midtown            34.0               9-12                8.83              
	Upper East Side    Yorkville           6.5               5-9                 1.38              
	Gramercy           Midtown            14.5               9-12                1.61              
	Upper East Side    Murray Hill        15.0               9-12                2.28              
	Chelsea            Chelsea             6.0               5-9                 1.00              
	NA                 Fort Green         13.5               10-5                3.24              
	Midtown            Midtown             6.5               10-5                1.13              
	Upper East Side    Harlem              9.5               10-5                2.27              
	Morningside Heights Hamilton Heights    8.5               10-5                1.87              
	Midtown            Chelsea             7.0               10-5                1.57              
	Tribeca            Garment District   12.0               10-5                2.75              
	Midtown            Park Slope         30.5               10-5                9.18              
	Upper West Side    Upper West Side     5.0               6-10                0.79              
	Chelsea            Chelsea             4.5               6-10                0.75              

In [11]:

#select(arrange(taxi_df, desc(fare_amount), pickup_nhood),
#       fare_amount, pickup_nhood)

head(select(arrange(taxi_df, desc(fare_amount), pickup_nhood, dropoff_nhood),
       fare_amount, pickup_nhood, dropoff_nhood), 10)

fare_amount pickup_nhood dropoff_nhood

	3130.30        Borough Park   Borough Park   
	3130.30        Upper West Side Upper West Side
	 990.00        NA             NA             
	 900.00        Little Italy   Little Italy   
	 900.00        NA             NA             
	 630.01        Chelsea        Murray Hill    
	 600.00        Throggs Neck   City Island    
	 500.00        Chinatown      Chinatown      
	 500.00        Chinatown      Chinatown      
	 500.00        Clinton        NA             

In [12]:

#Exercise
head(select(arrange(taxi_df, desc(tip_amount), dropoff_nhood, pickup_nhood), tip_amount, dropoff_nhood, pickup_nhood))

tip_amount dropoff_nhood pickup_nhood

	454.00            Bedford-Stuyvesant Bedford-Stuyvesant
	300.00            Greenwich Village Chelsea           
	239.07            Clinton           Greenwich Village 
	221.20            Clinton           Central Park      
	202.00            Lower East Side   Upper East Side   
	173.00            NA                NA                

In [13]:

taxi_df <- mutate(taxi_df, tip_pct = tip_amount/fare_amount)
head(select(taxi_df, tip_pct, fare_amount, tip_amount))
head(transmute(taxi_df, tip_pct = tip_amount/fare_amount))

tip_pct fare_amount tip_amount

	0.0000000 9.5      0.00     
	0.2384615 6.5      1.55     
	0.2371429 7.0      1.66     
	0.0000000 3.0      0.00     
	0.2363636 5.5      1.30     
	0.2273684 9.5      2.16     

tip_pct

	0.0000000
	0.2384615
	0.2371429
	0.0000000
	0.2363636
	0.2273684

In [14]:

str(taxi_df)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	3770319 obs. of  17 variables:
 $ VendorID          : chr  "1" "1" "1" "1" ...
 $ passenger_count   : int  1 2 1 1 1 1 3 1 1 1 ...
 $ trip_distance     : num  1.8 0.9 0.9 0.3 0.96 2.01 3.14 0.5 0.67 15.2 ...
 $ RateCodeID        : chr  "1" "1" "1" "1" ...
 $ store_and_fwd_flag: chr  "N" "N" "N" "N" ...
 $ payment_type      : chr  "2" "1" "1" "2" ...
 $ fare_amount       : num  9.5 6.5 7 3 5.5 9.5 12.5 4 5 52 ...
 $ tip_amount        : num  0 1.55 1.66 0 1.3 ...
 $ tolls_amount      : num  0 0 0 0 0 0 0 0 0 5.33 ...
 $ pickup_hour       : chr  "6-10" "6-10" "6-10" "6-10" ...
 $ pickup_dow        : chr  "Sat" "Sat" "Sat" "Sat" ...
 $ dropoff_hour      : chr  "6-10" "6-10" "6-10" "6-10" ...
 $ dropoff_dow       : chr  "Sat" "Sat" "Sat" "Sat" ...
 $ pickup_nhood      : chr  "Morningside Heights" "Midtown" "Lower East Side" "Financial District" ...
 $ dropoff_nhood     : chr  "Hamilton Heights" "Midtown" "Soho" "Financial District" ...
 $ kSplits           : chr  "A" "A" "A" "A" ...
 $ tip_pct           : num  0 0.238 0.237 0 0.236 ...

In [15]:

class(taxi_df)

	'tbl_df'
	'tbl'
	'data.frame'

In [16]:

grouped_taxi <- group_by(taxi_df, dropoff_nhood)
class(grouped_taxi)
head(grouped_taxi)#grouping is not visible, but the class knows about the grouping

	'grouped_df'
	'tbl_df'
	'tbl'
	'data.frame'

VendorID passenger_count trip_distance RateCodeID store_and_fwd_flag payment_type fare_amount tip_amount tolls_amount pickup_hour pickup_dow dropoff_hour dropoff_dow pickup_nhood dropoff_nhood kSplits tip_pct

	1                  1                  1.80               1                  N                  2                  9.5                0.00               0                  6-10               Sat                6-10               Sat                Morningside Heights Hamilton Heights   A                  0.0000000          
	1                  2                  0.90               1                  N                  1                  6.5                1.55               0                  6-10               Sat                6-10               Sat                Midtown            Midtown            A                  0.2384615          
	1                  1                  0.90               1                  N                  1                  7.0                1.66               0                  6-10               Sat                6-10               Sat                Lower East Side    Soho               A                  0.2371429          
	1                  1                  0.30               1                  N                  2                  3.0                0.00               0                  6-10               Sat                6-10               Sat                Financial District Financial District A                  0.0000000          
	2                  1                  0.96               1                  N                  1                  5.5                1.30               0                  6-10               Thu                6-10               Thu                Chelsea            West Village       A                  0.2363636          
	2                  1                  2.01               1                  N                  1                  9.5                2.16               0                  10-5               Sun                10-5               Sun                Upper East Side    Harlem             A                  0.2273684          

In [17]:

nrow(summarize(group_by(taxi_df, dropoff_nhood),
          Num = n(), ave_tip_pct = mean(tip_pct)))

nrow(summarize(grouped_taxi, Num=n(), ave_tip_pct=mean(tip_pct)))  #calculated above

122

122

In [18]:

summarise(group_by(taxi_df, pickup_nhood, dropoff_nhood),
          Num = n(), ave_tip_pct = mean(tip_pct))

pickup_nhood dropoff_nhood Num ave_tip_pct

	Ardon Heights           Ardon Heights              1                    0.00000000              
	Astoria-Long Island City Astoria-Long Island City 6714                    0.09712743              
	Astoria-Long Island City Auburndale                13                    0.06520367              
	Astoria-Long Island City Battery Park              14                    0.10140865              
	Astoria-Long Island City Bay Ridge                  5                    0.15388657              
	Astoria-Long Island City Bedford Park               3                    0.05550679              
	Astoria-Long Island City Bedford-Stuyvesant        68                    0.09819024              
	Astoria-Long Island City Bensonhurst                2                    0.00000000              
	Astoria-Long Island City Boerum Hill               12                    0.09941085              
	Astoria-Long Island City Borough Park               9                    0.08205749              
	Astoria-Long Island City Brownsville                4                    0.10376598              
	Astoria-Long Island City Bushwick                  53                    0.05616446              
	Astoria-Long Island City Canarsie                   3                    0.08080808              
	Astoria-Long Island City Carnegie Hill             29                    0.10760679              
	Astoria-Long Island City Carroll Gardens           19                    0.15358826              
	Astoria-Long Island City Central Park              72                    0.12755575              
	Astoria-Long Island City Chelsea                  224                    0.11327865              
	Astoria-Long Island City Chinatown                  8                    0.09015545              
	Astoria-Long Island City Clearview                  7                    0.02959184              
	Astoria-Long Island City Clinton                  163                    0.11348404              
	Astoria-Long Island City Cobble Hill                2                    0.10235294              
	Astoria-Long Island City Douglastown-Little Neck    2                    0.00000000              
	Astoria-Long Island City Downtown                  22                    0.13353090              
	Astoria-Long Island City Dyker Heights              1                    0.00000000              
	Astoria-Long Island City East Brooklyn              8                    0.04740602              
	Astoria-Long Island City East Harlem               79                    0.07145051              
	Astoria-Long Island City East Village             171                    0.14311855              
	Astoria-Long Island City Financial District       117                    0.13240868              
	Astoria-Long Island City Flushing                  54                    0.08736876              
	Astoria-Long Island City Fordham                    5                    0.04092308              
	⋮ ⋮ ⋮ ⋮
	NA                      Soundview                 114                   0.02754077              
	NA                      South Beach                15                   0.11433528              
	NA                      South Bronx               236                   0.05923333              
	NA                      Springfield Gardens       655                   0.06109878              
	NA                      Spuyten Duyvil            131                   0.11685827              
	NA                      Sunny Side               2678                   0.12713336              
	NA                      Sunset Park               288                   0.11079174              
	NA                      The Rockaways             259                   0.06733444              
	NA                      Throggs Neck               80                   0.08751933              
	NA                      Todt Hill                  12                   0.03739845              
	NA                      Tottensville                2                   0.14492754              
	NA                      Tremont                   119                   0.02535450              
	NA                      Tribeca                  1931                   0.16457058              
	NA                      Union Port                 64                   0.04806241              
	NA                      University Heights         82                   0.03717917              
	NA                      Upper East Side         10544                   0.16246302              
	NA                      Upper West Side          7967                   0.16120439              
	NA                      Utopia                    677                   0.06395129              
	NA                      Wakefield-Williamsbridge   107                   0.04609067              
	NA                      Washington Heights       1241                   0.13664189              
	NA                      West Village             1726                   0.16814002              
	NA                      Westerleigh-Castleton      11                   0.04190773              
	NA                      Whitestone                  1                   0.00000000              
	NA                      Williams Bridge            65                   0.06279349              
	NA                      Williamsburg             5111                   0.14403349              
	NA                      Woodhaven-Richmond Hill   480                   0.03536936              
	NA                      Woodlawn-Nordwood          24                   0.08135023              
	NA                      Woodside                  765                   0.08492267              
	NA                      Yorkville                 957                   0.15686904              
	NA                      NA                      27834                   0.47254372              

In [19]:

filter(arrange(summarise(group_by(taxi_df, pickup_nhood, dropoff_nhood), Num = n(), ave_tip_pct = mean(tip_pct)), desc(ave_tip_pct)), Num >= 10)

pickup_nhood dropoff_nhood Num ave_tip_pct

	Kings Bridge            Kings Bridge               11                   1.2256871               
	Upper East Side         Brownsville                23                   0.9447013               
	Bay Ridge               Bay Ridge                  60                   0.7915420               
	The Rockaways           The Rockaways              33                   0.7828405               
	Gravesend-Sheepshead Bay Gravesend-Sheepshead Bay    80                   0.5122171               
	NA                      NA                      27834                   0.4725437               
	Bensonhurst             Bensonhurst                59                   0.4512018               
	Upper East Side         Spuyten Duyvil             65                   0.3463543               
	Soho                    Mott Haven                 12                   0.3239580               
	Bedford Park            Bedford Park               30                   0.3078915               
	Midtown                 Westerleigh-Castleton      10                   0.3017644               
	Bay Ridge               Gravesend-Sheepshead Bay    10                   0.2951905               
	Clinton                 Bay Ridge                  90                   0.2893189               
	Midtown                 South Beach                17                   0.2473338               
	Battery Park            Greenwood                  15                   0.2396805               
	Fordham                 Fordham                    15                   0.2326797               
	Bedford-Stuyvesant      Bedford-Stuyvesant        936                   0.2293516               
	Greenwood               Financial District         11                   0.2273952               
	Carnegie Hill           Carroll Gardens            13                   0.2223942               
	Lower East Side         Spuyten Duyvil             10                   0.2105018               
	Murray Hill             Clearview                  10                   0.2102742               
	Nkew Gardens            Midtown                    10                   0.2101923               
	Battery Park            Park Slope                134                   0.2032499               
	Hunts Point             Hunts Point                13                   0.2014353               
	Washington Heights      NA                        150                   0.1995337               
	Forest Hills            Gramercy                   26                   0.1990748               
	Battery Park            Battery Park             1145                   0.1987076               
	Jackson Heights         Financial District         13                   0.1970842               
	Carroll Gardens         West Village               51                   0.1966033               
	Financial District      Spuyten Duyvil             10                   0.1964487               
	⋮ ⋮ ⋮ ⋮
	Yorkville               South Bronx             127                     0.018011226             
	Parkchester             Parkchester              28                     0.017944170             
	East Harlem             Bedford Park             17                     0.017535651             
	Clinton                 Soundview                27                     0.017476971             
	Sunny Side              Woodhaven-Richmond Hill  13                     0.017094017             
	Yorkville               Woodlawn-Nordwood        12                     0.016944444             
	Williamsburg            Brownsville              11                     0.016161616             
	Greenwich Village       Soundview                21                     0.015615496             
	Brownsville             Bedford-Stuyvesant       14                     0.015566502             
	Harlem                  Morris Heights           31                     0.015263697             
	South Bronx             East Harlem              12                     0.015151515             
	Brownsville             East Brooklyn            12                     0.014800000             
	Mott Haven              Tremont                  18                     0.014422658             
	Jamaica                 Utopia                   11                     0.013722127             
	West Village            High Bridge              15                     0.013529412             
	Fordham                 Tremont                  12                     0.013333333             
	Utopia                  Jamaica                  21                     0.012786596             
	Harlem                  Parkchester              23                     0.012385605             
	Garment District        East Brooklyn            37                     0.012137732             
	Harlem                  Wakefield-Williamsbridge  11                     0.006060606             
	East Harlem             Hunts Point              25                     0.005934066             
	High Bridge             Mott Haven               43                     0.004651163             
	Woodhaven-Richmond Hill Jamaica                  16                     0.004166667             
	Yorkville               Soundview                26                     0.003870043             
	Hamilton Heights        Morris Heights           16                     0.000000000             
	Jamaica                 Woodhaven-Richmond Hill  13                     0.000000000             
	Mott Haven              Soundview                16                     0.000000000             
	Mott Haven              University Heights       10                     0.000000000             
	Tremont                 Fordham                  11                     0.000000000             
	University Heights      Inwood                   11                     0.000000000             

In [20]:

filter(
  arrange(
    summarise(
      group_by(taxi_df,
               pickup_nhood, dropoff_nhood),
      Num = n(),
      ave_tip_pct = mean(tip_pct)),
    desc(ave_tip_pct)),
  Num >= 10)

pickup_nhood dropoff_nhood Num ave_tip_pct

	Kings Bridge            Kings Bridge               11                   1.2256871               
	Upper East Side         Brownsville                23                   0.9447013               
	Bay Ridge               Bay Ridge                  60                   0.7915420               
	The Rockaways           The Rockaways              33                   0.7828405               
	Gravesend-Sheepshead Bay Gravesend-Sheepshead Bay    80                   0.5122171               
	NA                      NA                      27834                   0.4725437               
	Bensonhurst             Bensonhurst                59                   0.4512018               
	Upper East Side         Spuyten Duyvil             65                   0.3463543               
	Soho                    Mott Haven                 12                   0.3239580               
	Bedford Park            Bedford Park               30                   0.3078915               
	Midtown                 Westerleigh-Castleton      10                   0.3017644               
	Bay Ridge               Gravesend-Sheepshead Bay    10                   0.2951905               
	Clinton                 Bay Ridge                  90                   0.2893189               
	Midtown                 South Beach                17                   0.2473338               
	Battery Park            Greenwood                  15                   0.2396805               
	Fordham                 Fordham                    15                   0.2326797               
	Bedford-Stuyvesant      Bedford-Stuyvesant        936                   0.2293516               
	Greenwood               Financial District         11                   0.2273952               
	Carnegie Hill           Carroll Gardens            13                   0.2223942               
	Lower East Side         Spuyten Duyvil             10                   0.2105018               
	Murray Hill             Clearview                  10                   0.2102742               
	Nkew Gardens            Midtown                    10                   0.2101923               
	Battery Park            Park Slope                134                   0.2032499               
	Hunts Point             Hunts Point                13                   0.2014353               
	Washington Heights      NA                        150                   0.1995337               
	Forest Hills            Gramercy                   26                   0.1990748               
	Battery Park            Battery Park             1145                   0.1987076               
	Jackson Heights         Financial District         13                   0.1970842               
	Carroll Gardens         West Village               51                   0.1966033               
	Financial District      Spuyten Duyvil             10                   0.1964487               
	⋮ ⋮ ⋮ ⋮
	Yorkville               South Bronx             127                     0.018011226             
	Parkchester             Parkchester              28                     0.017944170             
	East Harlem             Bedford Park             17                     0.017535651             
	Clinton                 Soundview                27                     0.017476971             
	Sunny Side              Woodhaven-Richmond Hill  13                     0.017094017             
	Yorkville               Woodlawn-Nordwood        12                     0.016944444             
	Williamsburg            Brownsville              11                     0.016161616             
	Greenwich Village       Soundview                21                     0.015615496             
	Brownsville             Bedford-Stuyvesant       14                     0.015566502             
	Harlem                  Morris Heights           31                     0.015263697             
	South Bronx             East Harlem              12                     0.015151515             
	Brownsville             East Brooklyn            12                     0.014800000             
	Mott Haven              Tremont                  18                     0.014422658             
	Jamaica                 Utopia                   11                     0.013722127             
	West Village            High Bridge              15                     0.013529412             
	Fordham                 Tremont                  12                     0.013333333             
	Utopia                  Jamaica                  21                     0.012786596             
	Harlem                  Parkchester              23                     0.012385605             
	Garment District        East Brooklyn            37                     0.012137732             
	Harlem                  Wakefield-Williamsbridge  11                     0.006060606             
	East Harlem             Hunts Point              25                     0.005934066             
	High Bridge             Mott Haven               43                     0.004651163             
	Woodhaven-Richmond Hill Jamaica                  16                     0.004166667             
	Yorkville               Soundview                26                     0.003870043             
	Hamilton Heights        Morris Heights           16                     0.000000000             
	Jamaica                 Woodhaven-Richmond Hill  13                     0.000000000             
	Mott Haven              Soundview                16                     0.000000000             
	Mott Haven              University Heights       10                     0.000000000             
	Tremont                 Fordham                  11                     0.000000000             
	University Heights      Inwood                   11                     0.000000000             

In [21]:

taxi_df %>%
  group_by(pickup_nhood, dropoff_nhood) %>%
  summarize(Num = n(),
            ave_tip_pct = mean(tip_pct)) %>%
  arrange(desc(ave_tip_pct)) %>%
  filter(Num >= 10)

pickup_nhood dropoff_nhood Num ave_tip_pct

	Kings Bridge            Kings Bridge               11                   1.2256871               
	Upper East Side         Brownsville                23                   0.9447013               
	Bay Ridge               Bay Ridge                  60                   0.7915420               
	The Rockaways           The Rockaways              33                   0.7828405               
	Gravesend-Sheepshead Bay Gravesend-Sheepshead Bay    80                   0.5122171               
	NA                      NA                      27834                   0.4725437               
	Bensonhurst             Bensonhurst                59                   0.4512018               
	Upper East Side         Spuyten Duyvil             65                   0.3463543               
	Soho                    Mott Haven                 12                   0.3239580               
	Bedford Park            Bedford Park               30                   0.3078915               
	Midtown                 Westerleigh-Castleton      10                   0.3017644               
	Bay Ridge               Gravesend-Sheepshead Bay    10                   0.2951905               
	Clinton                 Bay Ridge                  90                   0.2893189               
	Midtown                 South Beach                17                   0.2473338               
	Battery Park            Greenwood                  15                   0.2396805               
	Fordham                 Fordham                    15                   0.2326797               
	Bedford-Stuyvesant      Bedford-Stuyvesant        936                   0.2293516               
	Greenwood               Financial District         11                   0.2273952               
	Carnegie Hill           Carroll Gardens            13                   0.2223942               
	Lower East Side         Spuyten Duyvil             10                   0.2105018               
	Murray Hill             Clearview                  10                   0.2102742               
	Nkew Gardens            Midtown                    10                   0.2101923               
	Battery Park            Park Slope                134                   0.2032499               
	Hunts Point             Hunts Point                13                   0.2014353               
	Washington Heights      NA                        150                   0.1995337               
	Forest Hills            Gramercy                   26                   0.1990748               
	Battery Park            Battery Park             1145                   0.1987076               
	Jackson Heights         Financial District         13                   0.1970842               
	Carroll Gardens         West Village               51                   0.1966033               
	Financial District      Spuyten Duyvil             10                   0.1964487               
	⋮ ⋮ ⋮ ⋮
	Yorkville               South Bronx             127                     0.018011226             
	Parkchester             Parkchester              28                     0.017944170             
	East Harlem             Bedford Park             17                     0.017535651             
	Clinton                 Soundview                27                     0.017476971             
	Sunny Side              Woodhaven-Richmond Hill  13                     0.017094017             
	Yorkville               Woodlawn-Nordwood        12                     0.016944444             
	Williamsburg            Brownsville              11                     0.016161616             
	Greenwich Village       Soundview                21                     0.015615496             
	Brownsville             Bedford-Stuyvesant       14                     0.015566502             
	Harlem                  Morris Heights           31                     0.015263697             
	South Bronx             East Harlem              12                     0.015151515             
	Brownsville             East Brooklyn            12                     0.014800000             
	Mott Haven              Tremont                  18                     0.014422658             
	Jamaica                 Utopia                   11                     0.013722127             
	West Village            High Bridge              15                     0.013529412             
	Fordham                 Tremont                  12                     0.013333333             
	Utopia                  Jamaica                  21                     0.012786596             
	Harlem                  Parkchester              23                     0.012385605             
	Garment District        East Brooklyn            37                     0.012137732             
	Harlem                  Wakefield-Williamsbridge  11                     0.006060606             
	East Harlem             Hunts Point              25                     0.005934066             
	High Bridge             Mott Haven               43                     0.004651163             
	Woodhaven-Richmond Hill Jamaica                  16                     0.004166667             
	Yorkville               Soundview                26                     0.003870043             
	Hamilton Heights        Morris Heights           16                     0.000000000             
	Jamaica                 Woodhaven-Richmond Hill  13                     0.000000000             
	Mott Haven              Soundview                16                     0.000000000             
	Mott Haven              University Heights       10                     0.000000000             
	Tremont                 Fordham                  11                     0.000000000             
	University Heights      Inwood                   11                     0.000000000             

In [22]:

mht_url <- "http://alizaidi.blob.core.windows.net/training/manhattan.rds"
manhattan_hoods <- readRDS(gzcon(url(mht_url)))
taxi_df %>%
  filter(pickup_nhood %in% manhattan_hoods,
         dropoff_nhood %in% manhattan_hoods) %>%
  group_by(dropoff_nhood, pickup_nhood) %>%
  summarize(ave_tip = mean(tip_pct),
            ave_dist = mean(trip_distance)) %>%
  filter(ave_dist > 3, ave_tip > 0.05)

dropoff_nhood pickup_nhood ave_tip ave_dist

	Battery Park       Central Park       0.12694563          6.149281          
	Battery Park       Clinton            0.11996579          4.016902          
	Battery Park       East Harlem        0.07116177         10.124000          
	Battery Park       East Village       0.14717019          3.537367          
	Battery Park       Garment District   0.13463903          3.965532          
	Battery Park       Gramercy           0.14396885          4.153174          
	Battery Park       Hamilton Heights   0.06770436          8.843571          
	Battery Park       Harlem             0.13829591          9.039286          
	Battery Park       Inwood             0.18235294         11.950000          
	Battery Park       Midtown            0.13428280          5.496734          
	Battery Park       Morningside Heights 0.15595534          7.947419          
	Battery Park       Murray Hill        0.14230493          5.457552          
	Battery Park       North Sutton Area  0.13318941          6.866011          
	Battery Park       Upper East Side    0.13112513          7.677720          
	Battery Park       Upper West Side    0.13287111          5.773742          
	Battery Park       Washington Heights 0.12779408         10.206923          
	Battery Park       Yorkville          0.14815491          8.544762          
	Central Park       Battery Park       0.11153630          5.983153          
	Central Park       Chinatown          0.12276618          5.425789          
	Central Park       East Village       0.13408203          4.644099          
	Central Park       Financial District 0.09623395          6.620338          
	Central Park       Gramercy           0.12756278          3.222990          
	Central Park       Greenwich Village  0.12932688          3.919407          
	Central Park       Little Italy       0.11914271          4.660275          
	Central Park       Lower East Side    0.11894056          5.403105          
	Central Park       Soho               0.10882615          4.381541          
	Central Park       Tribeca            0.13259363          5.202881          
	Central Park       Washington Heights 0.09936008          4.700000          
	Central Park       West Village       0.13151131          3.813692          
	Chelsea            Central Park       0.13334925          3.164956          
	⋮ ⋮ ⋮ ⋮
	Washington Heights Yorkville          0.08562844          5.209286          
	West Village       Central Park       0.11439411          3.897739          
	West Village       East Harlem        0.12856174          7.203200          
	West Village       Hamilton Heights   0.11837929          7.470263          
	West Village       Harlem             0.10529889          7.113514          
	West Village       Inwood             0.20812500         10.460000          
	West Village       Morningside Heights 0.14703183          6.389273          
	West Village       North Sutton Area  0.13483326          3.603191          
	West Village       Upper East Side    0.13367464          4.646157          
	West Village       Upper West Side    0.13723914          3.919866          
	West Village       Washington Heights 0.14651280          8.680526          
	West Village       Yorkville          0.13939565          6.126538          
	Yorkville          Battery Park       0.14375157          8.927209          
	Yorkville          Chelsea            0.12630881          4.988345          
	Yorkville          Chinatown          0.08176344          7.073214          
	Yorkville          Clinton            0.11385074          4.278752          
	Yorkville          East Village       0.12104080          5.233184          
	Yorkville          Financial District 0.11838887          8.146121          
	Yorkville          Garment District   0.10531276          4.162365          
	Yorkville          Gramercy           0.12129103          4.255384          
	Yorkville          Greenwich Village  0.12414755          5.513012          
	Yorkville          Inwood             0.07585581          6.710000          
	Yorkville          Little Italy       0.18410847          5.886102          
	Yorkville          Lower East Side    0.11105280          6.343524          
	Yorkville          Midtown            0.12631612          3.006321          
	Yorkville          Murray Hill        0.12157790          3.433576          
	Yorkville          Soho               0.14101288          6.410496          
	Yorkville          Tribeca            0.13259729          7.757073          
	Yorkville          Washington Heights 0.09903927          4.951351          
	Yorkville          West Village       0.12226814          6.025485          

In [23]:

library(ggplot2)
taxi_df %>%
  filter(pickup_nhood %in% manhattan_hoods,
         dropoff_nhood %in% manhattan_hoods) %>%
  group_by(dropoff_nhood, pickup_nhood) %>%
  summarize(ave_tip = mean(tip_pct),
            ave_dist = mean(trip_distance)) %>%
  filter(ave_dist > 3, ave_tip > 0.05) %>%
  ggplot(aes(x = pickup_nhood, y = dropoff_nhood)) +
    geom_tile(aes(fill = ave_tip), colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = 'bottom') +
    scale_fill_gradient(low = "white", high = "steelblue")

In [24]:

library(ggplot2)
taxi_df %>%
  filter(pickup_nhood %in% manhattan_hoods,
         dropoff_nhood %in% manhattan_hoods) %>%
  group_by(dropoff_nhood, pickup_nhood) %>%
  summarize(ave_tip = mean(tip_pct),
            ave_dist = mean(trip_distance)) %>%
  filter(ave_dist > 3, ave_tip > 0.05) %>%
  ggplot(aes(x = pickup_nhood, y = dropoff_nhood)) +
    geom_tile(aes(fill = ave_tip), colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = 'bottom') +
    scale_fill_gradient(low = "white", high = "steelblue")

In [25]:

taxi_df %>%
  filter(pickup_nhood %in% manhattan_hoods,
         dropoff_nhood %in% manhattan_hoods) %>%
  group_by(dropoff_nhood, pickup_nhood) %>%
  summarize(ave_tip = mean(tip_pct),
            ave_dist = mean(trip_distance)) %>%
  lm(ave_tip ~ ave_dist, data = .) -> taxi_model         # -> to assign left hand size to the right hand size
summary(taxi_model)

#OR 

#taxi_df %>%
#  filter(pickup_nhood %in% manhattan_hoods,
#         dropoff_nhood %in% manhattan_hoods) %>%
#  group_by(dropoff_nhood, pickup_nhood) %>%
#  summarize(ave_tip = mean(tip_pct),
#            ave_dist = mean(trip_distance)) %>%
#  ungroup() %>%                                       # <-
#  lm(ave_tip ~ ., data = .) -> taxi_model             # <-
#summary(taxi_model)

Call:
lm(formula = ave_tip ~ ave_dist, data = .)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.112258 -0.010882  0.002727  0.014168  0.140976 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.1324307  0.0016071  82.402  < 2e-16 ***
ave_dist    -0.0017345  0.0003004  -5.773 1.15e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02468 on 724 degrees of freedom
Multiple R-squared:  0.04401,	Adjusted R-squared:  0.04269 
F-statistic: 33.33 on 1 and 724 DF,  p-value: 1.153e-08

In [26]:

str(manhattan_hoods)

 chr [1:28] "Chinatown" "Little Italy" "Tribeca" "Lower East Side" ...

In [27]:

library(ggplot2)
taxi_df %>%
  filter(dropoff_nhood %in% manhattan_hoods) %>%
  group_by(dropoff_nhood, pickup_dow) %>%
  summarize(ave_fare = mean(fare_amount)) %>%
  ggplot(aes(x = pickup_dow, y = dropoff_nhood)) +
    geom_tile(aes(fill = ave_fare), colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = 'bottom') +
    scale_fill_gradient(low = "white", high = "steelblue")

In [28]:

taxi_hood_sum <- function(taxi_data = taxi_df) {

  mht_url <- "http://alizaidi.blob.core.windows.net/training/manhattan.rds"

  manhattan_hoods <- readRDS(gzcon(url(mht_url)))
  taxi_data %>%
    filter(pickup_nhood %in% manhattan_hoods,
           dropoff_nhood %in% manhattan_hoods) %>%
    group_by(dropoff_nhood, pickup_nhood) %>%
    summarize(ave_tip = mean(tip_pct),
              ave_dist = mean(trip_distance)) %>%
    filter(ave_dist > 3, ave_tip > 0.05) -> sum_df

  return(sum_df)

}

In [29]:

tile_plot_hood <- function(df = taxi_hood_sum()) {

  library(ggplot2)

  ggplot(data = df, aes(x = pickup_nhood, y = dropoff_nhood)) +
    geom_tile(aes(fill = ave_tip), colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = 'bottom') +
    scale_fill_gradient(low = "white", high = "steelblue") -> gplot

  return(gplot)
}

In [30]:

#library(plotly)

#taxi_hood_sum(taxi_df) %>% tile_plot_hood 
#OR
tile_plot_hood()

In [ ]:

In [31]:

library(plotly)

In [32]:

embed_notebook(ggplotly(tile_plot_hood()))

In [ ]:

taxi_df %>% group_by(dropoff_dow) %>%
  filter(!is.na(dropoff_nhood), !is.na(pickup_nhood)) %>%
  arrange(desc(tip_pct)) %>%
  do(slice(., 1:2)) %>%
  select(dropoff_dow, tip_amount, tip_pct,
         fare_amount, dropoff_nhood, pickup_nhood)

In [ ]:

dow_lms <- taxi_df %>% sample_n(10^4) %>%
  group_by(dropoff_dow) %>%
  do(lm_tip = lm(tip_pct ~ pickup_nhood + passenger_count + pickup_hour,
     data = .))

In [ ]:

dow_lms

In [ ]:

summary(dow_lms$lm_tip[[1]])
library(broom)
dow_lms %>% tidy(lm_tip)

In [ ]:

library(broom)
taxi_df %>% sample_n(10^5) %>%
  group_by(dropoff_dow) %>%
  do(glance(lm(tip_pct ~ pickup_nhood + passenger_count + pickup_hour,
     data = .)))

In [ ]:

taxi_df %>% sample_n(10^5) %>%
  group_by(dropoff_dow) %>%
  do(tidy(lm(tip_pct ~ pickup_nhood + passenger_count + pickup_hour,
     data = .)))

Symbol	Meaning
`<-`	assignment operator
`>`	ready for a new command
`+`	awaiting the completion of an existing command
`?`	get help for following function

	Homogeneous	Heterogeneous
1d	Atomic vector	List
2d	Matrix	Data frame
nd	Array

VendorID	passenger_count	trip_distance	RateCodeID	store_and_fwd_flag	payment_type	fare_amount	tip_amount	tolls_amount	pickup_hour	pickup_dow	dropoff_hour	dropoff_dow	pickup_nhood	dropoff_nhood	kSplits
1	1	1.80	1	N	2	9.5	0.00	0.00	6-10	Sat	6-10	Sat	Morningside Heights	Hamilton Heights	A
1	2	0.90	1	N	1	6.5	1.55	0.00	6-10	Sat	6-10	Sat	Midtown	Midtown	A
1	1	0.90	1	N	1	7.0	1.66	0.00	6-10	Sat	6-10	Sat	Lower East Side	Soho	A
1	1	0.30	1	N	2	3.0	0.00	0.00	6-10	Sat	6-10	Sat	Financial District	Financial District	A
2	1	0.96	1	N	1	5.5	1.30	0.00	6-10	Thu	6-10	Thu	Chelsea	West Village	A
2	1	2.01	1	N	1	9.5	2.16	0.00	10-5	Sun	10-5	Sun	Upper East Side	Harlem	A
2	3	3.14	1	N	1	12.5	2.50	0.00	12-4	Sun	12-4	Sun	Fort Green	Soho	A
1	1	0.50	1	N	1	4.0	1.00	0.00	12-4	Sun	12-4	Sun	Upper East Side	Upper East Side	A
2	1	0.67	1	N	1	5.0	1.00	0.00	12-4	Thu	12-4	Thu	Upper West Side	Upper West Side	A
2	1	15.20	2	N	1	52.0	14.33	5.33	12-4	Thu	12-4	Thu	NA	Clinton	A
2	5	2.96	1	N	2	20.5	0.00	0.00	12-4	Thu	12-4	Thu	Upper East Side	Garment District	A
1	1	0.70	1	N	2	6.0	0.00	0.00	9-12	Mon	9-12	Mon	Upper East Side	Upper East Side	A
1	1	2.60	1	N	1	16.0	3.35	0.00	9-12	Thu	9-12	Thu	Upper East Side	Gramercy	A
2	2	0.79	1	N	2	5.0	0.00	0.00	12-4	Wed	12-4	Wed	NA	NA	A
2	1	3.37	1	N	1	18.0	3.60	0.00	12-4	Wed	12-4	Wed	Upper East Side	Chelsea	A
1	3	2.40	1	N	2	11.0	0.00	0.00	6-10	Tue	10-5	Tue	East Village	Garment District	A
1	1	16.30	1	Y	1	45.0	11.57	0.00	6-10	Tue	10-5	Tue	NA	NA	A
1	1	5.70	1	N	1	25.0	5.16	0.00	12-4	Mon	12-4	Mon	Midtown	NA	A
1	1	3.20	1	N	2	16.0	0.00	0.00	12-4	Mon	12-4	Mon	Midtown	Upper West Side	A
1	1	0.70	1	N	2	4.5	0.00	0.00	12-4	Mon	12-4	Mon	Upper West Side	Harlem	A
1	1	1.00	1	N	2	6.0	0.00	0.00	10-5	Tue	10-5	Tue	Midtown	Upper West Side	A
1	1	1.50	1	N	1	8.5	1.00	0.00	10-5	Tue	10-5	Tue	West Village	East Village	A
1	2	5.00	1	N	2	21.5	0.00	0.00	12-4	Mon	12-4	Mon	Midtown	Jackson Heights	A
1	1	2.00	1	N	1	11.5	3.69	0.00	12-4	Mon	12-4	Mon	Downtown	Fort Green	A
1	1	1.40	1	N	1	8.0	1.75	0.00	12-4	Mon	12-4	Mon	Upper East Side	Upper East Side	A
1	1	1.40	1	N	1	8.5	1.85	0.00	12-4	Mon	12-4	Mon	Midtown	Gramercy	A
1	1	1.80	1	N	2	9.5	0.00	0.00	12-4	Mon	12-4	Mon	Upper West Side	Midtown	A
2	1	6.08	1	N	1	18.5	3.70	0.00	6-10	Sun	6-10	Sun	Financial District	Upper East Side	A
2	1	1.96	1	N	1	8.5	2.12	0.00	6-10	Sun	6-10	Sun	Chelsea	Midtown	A
2	6	1.26	1	N	2	7.0	0.00	0.00	6-10	Sun	6-10	Sun	Garment District	Murray Hill	A
⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
2	1	14.21	1	N	2	39.5	0.00	0.00	6-10	Tue	6-10	Tue	NA	Sunny Side	A
2	1	1.79	1	N	2	10.0	0.00	0.00	6-10	Tue	6-10	Tue	Upper East Side	Upper West Side	A
2	1	5.34	1	N	2	20.0	0.00	0.00	6-10	Tue	6-10	Tue	Upper West Side	Battery Park	A
2	1	1.51	1	N	1	8.0	1.86	0.00	6-10	Tue	6-10	Tue	Midtown	Upper East Side	A
2	6	2.92	1	N	1	12.5	1.00	0.00	4-6	Sat	4-6	Sat	Upper East Side	Midtown	A
2	6	1.41	1	N	2	7.5	0.00	0.00	4-6	Sun	4-6	Sun	West Village	Soho	A
2	2	4.91	1	N	1	17.0	3.00	0.00	10-5	Tue	10-5	Tue	NA	Upper East Side	A
2	3	0.92	1	N	2	5.0	0.00	0.00	10-5	Tue	10-5	Tue	East Village	Gramercy	A
2	2	13.31	1	N	2	37.0	0.00	0.00	10-5	Tue	10-5	Tue	NA	NA	A
2	1	3.46	1	N	2	12.5	0.00	0.00	10-5	Tue	10-5	Tue	East Village	Clinton	A
2	5	6.08	1	N	1	20.5	4.36	0.00	10-5	Tue	10-5	Tue	Garment District	Cobble Hill	A
2	1	1.41	1	N	1	7.0	1.66	0.00	10-5	Tue	10-5	Tue	Soho	East Village	A
2	1	0.83	1	N	1	5.5	1.36	0.00	10-5	Fri	10-5	Fri	Lower East Side	East Village	A
2	3	1.44	1	N	1	8.5	1.86	0.00	12-4	Thu	12-4	Thu	Upper West Side	Clinton	A
2	1	1.00	1	N	2	7.5	0.00	0.00	12-4	Thu	12-4	Thu	Upper East Side	Upper West Side	A
2	1	5.72	1	N	1	24.0	4.96	0.00	5-9	Tue	9-12	Tue	Financial District	Upper East Side	A
2	1	8.83	1	N	1	34.0	10.03	5.33	5-9	Tue	9-12	Tue	NA	Midtown	A
2	2	1.38	1	N	1	6.5	1.46	0.00	5-9	Tue	5-9	Tue	Upper East Side	Yorkville	A
2	1	1.61	1	N	1	14.5	3.06	0.00	5-9	Tue	9-12	Tue	Gramercy	Midtown	A
2	1	2.28	1	N	1	15.0	3.16	0.00	5-9	Tue	9-12	Tue	Upper East Side	Murray Hill	A
2	1	1.00	1	N	1	6.0	0.00	0.00	5-9	Tue	5-9	Tue	Chelsea	Chelsea	A
2	1	3.24	1	N	2	13.5	0.00	0.00	10-5	Sat	10-5	Sat	NA	Fort Green	A
2	1	1.13	1	N	1	6.5	1.95	0.00	10-5	Sat	10-5	Sat	Midtown	Midtown	A
2	1	2.27	1	N	2	9.5	0.00	0.00	10-5	Sat	10-5	Sat	Upper East Side	Harlem	A
2	1	1.87	1	N	1	8.5	1.00	0.00	10-5	Sat	10-5	Sat	Morningside Heights	Hamilton Heights	A
2	5	1.57	1	N	2	7.0	0.00	0.00	10-5	Sun	10-5	Sun	Midtown	Chelsea	A
2	2	2.75	1	N	2	12.0	0.00	0.00	10-5	Sun	10-5	Sun	Tribeca	Garment District	A
2	5	9.18	1	N	1	30.5	6.36	0.00	10-5	Sun	10-5	Sun	Midtown	Park Slope	A
2	1	0.79	1	N	1	5.0	1.16	0.00	6-10	Sat	6-10	Sat	Upper West Side	Upper West Side	A
2	1	0.75	1	N	1	4.5	1.00	0.00	6-10	Sat	6-10	Sat	Chelsea	Chelsea	A

fare_amount	pickup_nhood	dropoff_nhood
3130.30	Borough Park	Borough Park
3130.30	Upper West Side	Upper West Side
990.00	NA	NA
900.00	Little Italy	Little Italy
900.00	NA	NA
630.01	Chelsea	Murray Hill
600.00	Throggs Neck	City Island
500.00	Chinatown	Chinatown
500.00	Chinatown	Chinatown
500.00	Clinton	NA

tip_amount	dropoff_nhood	pickup_nhood
454.00	Bedford-Stuyvesant	Bedford-Stuyvesant
300.00	Greenwich Village	Chelsea
239.07	Clinton	Greenwich Village
221.20	Clinton	Central Park
202.00	Lower East Side	Upper East Side
173.00	NA	NA

tip_pct	fare_amount	tip_amount
0.0000000	9.5	0.00
0.2384615	6.5	1.55
0.2371429	7.0	1.66
0.0000000	3.0	0.00
0.2363636	5.5	1.30
0.2273684	9.5	2.16

pickup_nhood	dropoff_nhood	Num	ave_tip_pct
Ardon Heights	Ardon Heights	1	0.00000000
Astoria-Long Island City	Astoria-Long Island City	6714	0.09712743
Astoria-Long Island City	Auburndale	13	0.06520367
Astoria-Long Island City	Battery Park	14	0.10140865
Astoria-Long Island City	Bay Ridge	5	0.15388657
Astoria-Long Island City	Bedford Park	3	0.05550679
Astoria-Long Island City	Bedford-Stuyvesant	68	0.09819024
Astoria-Long Island City	Bensonhurst	2	0.00000000
Astoria-Long Island City	Boerum Hill	12	0.09941085
Astoria-Long Island City	Borough Park	9	0.08205749
Astoria-Long Island City	Brownsville	4	0.10376598
Astoria-Long Island City	Bushwick	53	0.05616446
Astoria-Long Island City	Canarsie	3	0.08080808
Astoria-Long Island City	Carnegie Hill	29	0.10760679
Astoria-Long Island City	Carroll Gardens	19	0.15358826
Astoria-Long Island City	Central Park	72	0.12755575
Astoria-Long Island City	Chelsea	224	0.11327865
Astoria-Long Island City	Chinatown	8	0.09015545
Astoria-Long Island City	Clearview	7	0.02959184
Astoria-Long Island City	Clinton	163	0.11348404
Astoria-Long Island City	Cobble Hill	2	0.10235294
Astoria-Long Island City	Douglastown-Little Neck	2	0.00000000
Astoria-Long Island City	Downtown	22	0.13353090
Astoria-Long Island City	Dyker Heights	1	0.00000000
Astoria-Long Island City	East Brooklyn	8	0.04740602
Astoria-Long Island City	East Harlem	79	0.07145051
Astoria-Long Island City	East Village	171	0.14311855
Astoria-Long Island City	Financial District	117	0.13240868
Astoria-Long Island City	Flushing	54	0.08736876
Astoria-Long Island City	Fordham	5	0.04092308
⋮	⋮	⋮	⋮
NA	Soundview	114	0.02754077
NA	South Beach	15	0.11433528
NA	South Bronx	236	0.05923333
NA	Springfield Gardens	655	0.06109878
NA	Spuyten Duyvil	131	0.11685827
NA	Sunny Side	2678	0.12713336
NA	Sunset Park	288	0.11079174
NA	The Rockaways	259	0.06733444
NA	Throggs Neck	80	0.08751933
NA	Todt Hill	12	0.03739845
NA	Tottensville	2	0.14492754
NA	Tremont	119	0.02535450
NA	Tribeca	1931	0.16457058
NA	Union Port	64	0.04806241
NA	University Heights	82	0.03717917
NA	Upper East Side	10544	0.16246302
NA	Upper West Side	7967	0.16120439
NA	Utopia	677	0.06395129
NA	Wakefield-Williamsbridge	107	0.04609067
NA	Washington Heights	1241	0.13664189
NA	West Village	1726	0.16814002
NA	Westerleigh-Castleton	11	0.04190773
NA	Whitestone	1	0.00000000
NA	Williams Bridge	65	0.06279349
NA	Williamsburg	5111	0.14403349
NA	Woodhaven-Richmond Hill	480	0.03536936
NA	Woodlawn-Nordwood	24	0.08135023
NA	Woodside	765	0.08492267
NA	Yorkville	957	0.15686904
NA	NA	27834	0.47254372

pickup_nhood	dropoff_nhood	Num	ave_tip_pct
Kings Bridge	Kings Bridge	11	1.2256871
Upper East Side	Brownsville	23	0.9447013
Bay Ridge	Bay Ridge	60	0.7915420
The Rockaways	The Rockaways	33	0.7828405
Gravesend-Sheepshead Bay	Gravesend-Sheepshead Bay	80	0.5122171
NA	NA	27834	0.4725437
Bensonhurst	Bensonhurst	59	0.4512018
Upper East Side	Spuyten Duyvil	65	0.3463543
Soho	Mott Haven	12	0.3239580
Bedford Park	Bedford Park	30	0.3078915
Midtown	Westerleigh-Castleton	10	0.3017644
Bay Ridge	Gravesend-Sheepshead Bay	10	0.2951905
Clinton	Bay Ridge	90	0.2893189
Midtown	South Beach	17	0.2473338
Battery Park	Greenwood	15	0.2396805
Fordham	Fordham	15	0.2326797
Bedford-Stuyvesant	Bedford-Stuyvesant	936	0.2293516
Greenwood	Financial District	11	0.2273952
Carnegie Hill	Carroll Gardens	13	0.2223942
Lower East Side	Spuyten Duyvil	10	0.2105018
Murray Hill	Clearview	10	0.2102742
Nkew Gardens	Midtown	10	0.2101923
Battery Park	Park Slope	134	0.2032499
Hunts Point	Hunts Point	13	0.2014353
Washington Heights	NA	150	0.1995337
Forest Hills	Gramercy	26	0.1990748
Battery Park	Battery Park	1145	0.1987076
Jackson Heights	Financial District	13	0.1970842
Carroll Gardens	West Village	51	0.1966033
Financial District	Spuyten Duyvil	10	0.1964487
⋮	⋮	⋮	⋮
Yorkville	South Bronx	127	0.018011226
Parkchester	Parkchester	28	0.017944170
East Harlem	Bedford Park	17	0.017535651
Clinton	Soundview	27	0.017476971
Sunny Side	Woodhaven-Richmond Hill	13	0.017094017
Yorkville	Woodlawn-Nordwood	12	0.016944444
Williamsburg	Brownsville	11	0.016161616
Greenwich Village	Soundview	21	0.015615496
Brownsville	Bedford-Stuyvesant	14	0.015566502
Harlem	Morris Heights	31	0.015263697
South Bronx	East Harlem	12	0.015151515
Brownsville	East Brooklyn	12	0.014800000
Mott Haven	Tremont	18	0.014422658
Jamaica	Utopia	11	0.013722127
West Village	High Bridge	15	0.013529412
Fordham	Tremont	12	0.013333333
Utopia	Jamaica	21	0.012786596
Harlem	Parkchester	23	0.012385605
Garment District	East Brooklyn	37	0.012137732
Harlem	Wakefield-Williamsbridge	11	0.006060606
East Harlem	Hunts Point	25	0.005934066
High Bridge	Mott Haven	43	0.004651163
Woodhaven-Richmond Hill	Jamaica	16	0.004166667
Yorkville	Soundview	26	0.003870043
Hamilton Heights	Morris Heights	16	0.000000000
Jamaica	Woodhaven-Richmond Hill	13	0.000000000
Mott Haven	Soundview	16	0.000000000
Mott Haven	University Heights	10	0.000000000
Tremont	Fordham	11	0.000000000
University Heights	Inwood	11	0.000000000

dropoff_nhood	pickup_nhood	ave_tip	ave_dist
Battery Park	Central Park	0.12694563	6.149281
Battery Park	Clinton	0.11996579	4.016902
Battery Park	East Harlem	0.07116177	10.124000
Battery Park	East Village	0.14717019	3.537367
Battery Park	Garment District	0.13463903	3.965532
Battery Park	Gramercy	0.14396885	4.153174
Battery Park	Hamilton Heights	0.06770436	8.843571
Battery Park	Harlem	0.13829591	9.039286
Battery Park	Inwood	0.18235294	11.950000
Battery Park	Midtown	0.13428280	5.496734
Battery Park	Morningside Heights	0.15595534	7.947419
Battery Park	Murray Hill	0.14230493	5.457552
Battery Park	North Sutton Area	0.13318941	6.866011
Battery Park	Upper East Side	0.13112513	7.677720
Battery Park	Upper West Side	0.13287111	5.773742
Battery Park	Washington Heights	0.12779408	10.206923
Battery Park	Yorkville	0.14815491	8.544762
Central Park	Battery Park	0.11153630	5.983153
Central Park	Chinatown	0.12276618	5.425789
Central Park	East Village	0.13408203	4.644099
Central Park	Financial District	0.09623395	6.620338
Central Park	Gramercy	0.12756278	3.222990
Central Park	Greenwich Village	0.12932688	3.919407
Central Park	Little Italy	0.11914271	4.660275
Central Park	Lower East Side	0.11894056	5.403105
Central Park	Soho	0.10882615	4.381541
Central Park	Tribeca	0.13259363	5.202881
Central Park	Washington Heights	0.09936008	4.700000
Central Park	West Village	0.13151131	3.813692
Chelsea	Central Park	0.13334925	3.164956
⋮	⋮	⋮	⋮
Washington Heights	Yorkville	0.08562844	5.209286
West Village	Central Park	0.11439411	3.897739
West Village	East Harlem	0.12856174	7.203200
West Village	Hamilton Heights	0.11837929	7.470263
West Village	Harlem	0.10529889	7.113514
West Village	Inwood	0.20812500	10.460000
West Village	Morningside Heights	0.14703183	6.389273
West Village	North Sutton Area	0.13483326	3.603191
West Village	Upper East Side	0.13367464	4.646157
West Village	Upper West Side	0.13723914	3.919866
West Village	Washington Heights	0.14651280	8.680526
West Village	Yorkville	0.13939565	6.126538
Yorkville	Battery Park	0.14375157	8.927209
Yorkville	Chelsea	0.12630881	4.988345
Yorkville	Chinatown	0.08176344	7.073214
Yorkville	Clinton	0.11385074	4.278752
Yorkville	East Village	0.12104080	5.233184
Yorkville	Financial District	0.11838887	8.146121
Yorkville	Garment District	0.10531276	4.162365
Yorkville	Gramercy	0.12129103	4.255384
Yorkville	Greenwich Village	0.12414755	5.513012
Yorkville	Inwood	0.07585581	6.710000
Yorkville	Little Italy	0.18410847	5.886102
Yorkville	Lower East Side	0.11105280	6.343524
Yorkville	Midtown	0.12631612	3.006321
Yorkville	Murray Hill	0.12157790	3.433576
Yorkville	Soho	0.14101288	6.410496
Yorkville	Tribeca	0.13259729	7.757073
Yorkville	Washington Heights	0.09903927	4.951351
Yorkville	West Village	0.12226814	6.025485

Course Logistics

Day One

R U Ready?

Day Two

Scalable Data Analysis with Microsoft R

Day Three

Distributing Computing on Spark Clusters with R

Prerequisites

Computing Environments

Development Environments

Where to Write R Code

What is R?

Why should I care?

R's Philosophy

What R Thou?

The aRt of Being Lazy

Lazy Evaluation in R

R's Programming Paradigm

Keys to R

Strengths of R

Where R Succeeds

Weaknesses of R

Where R Falls Short

Some Essential Open Source Packages

R Foundations

Command line prompts

I'm Lost!

Getting Help for R

Quick Tour of Things You Need to Know

Data Structures

Quick Tour of Things You Need to Know

Data Types

Manipulating Data Structures

Subsetting Operators

Object Representation

Data Manipulation with the dplyr Package

Overview

Why use dplyr?

The Grammar of Data Manipulation

Tidy Data and Happier Coding

Premature Optimization

Manipulation verbs

Aggregation verbs

NYC Taxi Data

Data for Class

Viewing Data

tibble

Filtering and Reordering Data

Subsetting Data

Filter

Exercise

Solution

Select a set of columns

Select Example

Select: Other Options

Reordering Data

Arrange

Exercise

Summary

Data Aggregations and Transformations

Transformations

Summarise Data by Groups

Group By Neighborhoods Example

Chaining/Piping

Standard Code

Reformatted

Magrittr

Put that Function in Your Pipe and...

Pipe + group_by()

Pipe and Plot

Piping to other arguments

Exercise

Functional Programming

Creating Functional Pipelines

Too Many Pipes?

Reusable code

Functional Pipelines

Summarization

Functional Pipelines

Plotting Function

Data Manipulation with the `dplyr` Package