In [3]:
%%sql -d standard
SELECT
*
FROM
`nyc-tlc.yellow.trips`
LIMIT
5
Out[3]:
Let's look at the table schema:
In [4]:
%bigquery schema --table nyc-tlc:yellow.trips
Out[4]:
In [8]:
%%bq query -n pickup_time
WITH subquery AS (
SELECT
EXTRACT(HOUR FROM pickup_datetime) AS hour
FROM
`nyc-tlc.yellow.trips`)
SELECT
Hour,
COUNT(Hour) AS count
FROM
subquery
GROUP BY
Hour
ORDER BY
count DESC
The -n flag names this query pickup_time, so we can reference it to create the chart below.
In [9]:
# Let's visualize the pick-up time distribution
%chart columns --data pickup_time
Out[9]:
Pick-ups peak during the 7:00 PM hour (hour 19).
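The hour-of-day aggregation above can also be reproduced in plain Python to sanity-check the logic on a handful of rows. This is only a minimal sketch: the timestamps in sample_pickups are made up for illustration and are not rows from nyc-tlc.yellow.trips, and pickups_per_hour is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample timestamps (illustrative only, not real rows from the trips table).
sample_pickups = [
    "2015-03-01 19:05:00",
    "2015-03-01 19:40:00",
    "2015-03-01 08:15:00",
    "2015-03-02 19:59:59",
    "2015-03-02 08:01:00",
    "2015-03-02 23:30:00",
]


def pickups_per_hour(timestamps):
    """Count pick-ups per hour of day, most frequent hour first.

    Mirrors the query's EXTRACT(HOUR FROM pickup_datetime), GROUP BY hour,
    and ORDER BY count DESC steps.
    """
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps)
    return Counter(hours).most_common()


hourly_counts = pickups_per_hour(sample_pickups)
print(hourly_counts)
```

On this toy sample the first entry is hour 19, matching the shape of the result charted above; the real distribution comes only from the BigQuery query.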
In [10]:
%%sql -d legacy -m vendor
SELECT
TOP(vendor_id) AS vendor,
COUNT(*) AS count
FROM
[nyc-tlc:yellow.trips]
Let's label this query result vendor and reference it to create the following pie chart.
In [11]:
%chart pie --data vendor
Out[11]:
In [12]:
%%sql -d legacy
SELECT
QUANTILES(trip_distance, 5) AS quantile,
MIN(trip_distance) AS min,
MAX(trip_distance) AS max,
AVG(trip_distance) AS avg,
STDDEV(trip_distance) AS std_dev
FROM
[nyc-tlc:yellow.trips]
Out[12]:
Datalab also supports LaTeX rendering. The minimum distance is $-4.08\times10^7$ miles (interesting, since a distance cannot be negative!), $Q_1$ is 0.9 miles, and $Q_3$ is 2.7 miles. The trip distance distribution is skewed to the right, since the mean is greater than the median (1.54 miles).
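The same summary statistics and the mean-versus-median skew check can be sketched with Python's standard statistics module. The values in sample_distances are hypothetical and chosen only to illustrate a right-skewed sample, and summarize is an illustrative helper name. BigQuery's legacy QUANTILES function returns approximate values, so an exact computation like this one need not match it digit for digit on real data.

```python
import statistics

# Hypothetical trip distances in miles (illustrative only).
sample_distances = [0.5, 0.9, 1.2, 1.5, 1.6, 2.7, 3.4, 9.8, 18.0]


def summarize(distances):
    """Return min, quartiles, max, mean, and sample standard deviation."""
    q1, median, q3 = statistics.quantiles(distances, n=4, method="inclusive")
    return {
        "min": min(distances),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(distances),
        "mean": statistics.mean(distances),
        "std_dev": statistics.stdev(distances),
    }


summary = summarize(sample_distances)
# A mean above the median is a common (though not guaranteed) sign of right skew.
right_skewed = summary["mean"] > summary["median"]
print(summary)
print("right-skewed:", right_skewed)
```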
In [13]:
%%bq query -n pickup_location
SELECT
pickup_latitude,
pickup_longitude
FROM
`nyc-tlc.yellow.trips`
LIMIT
10
In [25]:
%%chart map --data pickup_location
Out[25]:
In [28]:
%%bq query -n dispute
SELECT
trip_distance,
fare_amount
FROM
`nyc-tlc.yellow.trips`
WHERE
rate_code = "2"
AND payment_type = "DIS"
In [29]:
%%chart scatter --data dispute
height: 400
hAxis:
  title: Distance
vAxis:
  title: Fare Amount
trendlines:
  0:
    type: line
    color: green
    showR2: true
    visibleInLegend: true
Out[29]:
There seems to be a weak positive relationship ($r = +\sqrt{r^2} = 0.145$) between trip distance and fare amount for airport-rate trips (rate_code = "2") that had payment disputes.
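The step from the trendline's $r^2$ to $r$ relies on the fact that, for a simple linear fit, $r$ takes the sign of the slope. A minimal sketch of that relationship follows, assuming made-up (distance, fare) pairs in sample_trips; pearson_r, fit_slope, and r_from_r_squared are hypothetical helper names, not part of Datalab.

```python
import math

# Hypothetical (distance in miles, fare in dollars) pairs, illustrative only.
sample_trips = [(1.0, 52.0), (5.0, 40.0), (10.0, 52.0), (15.0, 60.0), (20.0, 52.0)]


def _centered_sums(pairs):
    """Return (covariance sum, x variance sum, y variance sum) about the means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov, var_x, var_y


def pearson_r(pairs):
    """Pearson correlation coefficient of the (x, y) pairs."""
    cov, var_x, var_y = _centered_sums(pairs)
    return cov / math.sqrt(var_x * var_y)


def fit_slope(pairs):
    """Least-squares slope of y on x."""
    cov, var_x, _ = _centered_sums(pairs)
    return cov / var_x


def r_from_r_squared(r_squared, slope):
    """Recover r from r^2 for a simple linear fit; the sign follows the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


r = pearson_r(sample_trips)
slope = fit_slope(sample_trips)
recovered = r_from_r_squared(r ** 2, slope)
print(r, recovered)
```

Taking the positive root, as in the text above, is therefore appropriate only when the fitted trendline slopes upward.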
In [ ]:
!git add *
!git commit -m "your message"
!git push
Delete your Datalab VM instance to avoid incurring charges to your account.