This sample notebook demonstrates working with Google Analytics page views and session data exported to Google BigQuery.
Google Analytics offers BigQuery export as part of its premium offering. If you're a premium user, you have the ability to export any of your analytics views to a BigQuery dataset that you own. If you're not, you can use the Analytics API to retrieve and import the data used to generate the default Analytics dashboards.
The sample data used in this notebook shares the same schema as the Google Analytics BigQuery export, but it is from a sample, pubicly available account. It is also small in size. This notebook demonstrates one possible custom analytics scenario, and is not based upon actual data.
Related Links:
In [1]:
import google.datalab.bigquery as bq
In [2]:
%%bq tables describe -n "google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910"
Out[2]:
The Google Analytics dataset has a large schema. It should be interesting to inspect some of the data in important columns.
In [3]:
%%bq query -n sessions
SELECT fullVisitorId, visitId, hit.hitNumber as hitNumber, hit.page.pagePath as path
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
CROSS JOIN UNNEST(hits) as hit
ORDER BY visitStartTime, hitNumber
In [4]:
%bq execute --query sessions
Out[4]:
The data is organized as a set of visits (or sessions), with each visit containing a set of hits (or page views), in succession. Each hit has a URL path associated with it. Here is another query that shows paths and the number of hits across sessions.
In [5]:
%%bq query -n hits
SELECT hit.page.pagePath as path, COUNT(visitId) as hitCount
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
CROSS JOIN UNNEST(hits) as hit
GROUP BY path
ORDER BY hitCount DESC
In [6]:
%%bq execute -q hits
Out[6]:
For the purposes of this sample, the question to be answered is "Which path leads to higher conversion ratio: users landing on the landing page (path = '/') or users landing on a product page (eg. '/vests/yellow.html')?" "Conversion" is defined as the user loading the '/confirm/' page within a single session.
In [7]:
%%bq query -n conversions
WITH
AnnotatedVisits AS (
SELECT
visitId,
hit.page.pagePath AS path,
hit.hitNumber AS hitNumber,
'/confirm.html' IN (SELECT page.pagePath FROM UNNEST(hits)) AS transacted
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
CROSS JOIN UNNEST(hits) AS hit
ORDER BY visitStartTime, hitNumber)
SELECT
IF (path = '/', 'home', 'product') AS start,
IF (transacted, 'completed', 'abandoned') AS outcome,
COUNT(*) AS count
FROM AnnotatedVisits
WHERE hitNumber = 1
GROUP BY start, outcome
ORDER BY outcome, start
In [8]:
%%bq execute -q conversions
Out[8]:
The matrix tells us a bit about completed visits vs. abandoned visits, depending on the starting point. However, this is more easily seen in a sankey diagram, which is provided by the Google Charting API.
In [9]:
%%chart sankey --data conversions
{
"sankey": {
"node": {
"colors": [ "black", "red", "black", "green" ]
}
}
}
Out[9]:
This sample demonstrated how you can apply the power of BigQuery to your Google Analytics, and more generally to any analytics telemetry data. Essentially, you can build custom metrics and visualizations, and track the behavior and engagement of your audience more effectively by joining analytics data with your own data.