Google BigQuery Standard SQL supports query parameterization, which makes it possible to use Python variables defined in the notebook as parameter values in SQL.
This notebook is an example of how to use parameterized queries.
In [2]:
%%bq query -n logs_query
SELECT * FROM `cloud-datalab-samples.httplogs.logs_20140615`
In [3]:
%bq sample -q logs_query --count 10
Out[3]:
In [6]:
%%bq query
SELECT endpoint FROM `cloud-datalab-samples.httplogs.logs_20140615` GROUP BY endpoint
Out[6]:
Parameters are declared in SQL queries using the @name
syntax, and name
's value is supplied when the query is executed. Note that you have to define the query and execute it in two separate cells: the shorthand way of running queries (using %%bq query
without --name
) gives you little control over the execution of the query.
In [7]:
%%bq query -n endpoint_stats
SELECT *
FROM `cloud-datalab-samples.httplogs.logs_20140615`
WHERE endpoint = @endpoint
LIMIT 10
In [8]:
%%bq execute -q endpoint_stats
parameters:
- name: endpoint
type: STRING
value: Interact2
Out[8]:
This defined a SQL query with a string parameter named endpoint
, whose value can be supplied when executing the query. Let's give it a value in a separate cell:
In [9]:
endpoint_val = 'Interact3'
To reference the variable defined above, Google Cloud Datalab offers the $var
syntax, which can be used in the magic command:
In [10]:
%%bq execute -q endpoint_stats
parameters:
- name: endpoint
type: STRING
value: $endpoint_val
Out[10]:
The same can also be achieved using the Python API instead of the magic commands (%%bq
). This is how to create and execute a parameterized query using the API:
In [11]:
import google.datalab.bigquery as bq

endpoint_stats2 = bq.Query(sql='''
SELECT *
FROM `cloud-datalab-samples.httplogs.logs_20140615`
WHERE endpoint = @endpoint
LIMIT 10
''')

endpoint_value = 'Interact3'

query_parameters = [
  {
    'name': 'endpoint',
    'parameterType': {'type': 'STRING'},
    'parameterValue': {'value': endpoint_value}
  }
]

job = endpoint_stats2.execute(query_params=query_parameters)
job.result()
Out[11]:
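The parameter list passed to execute() follows the shape of the BigQuery REST API (name, parameterType, parameterValue). As a sketch, a small helper (hypothetical, not part of the Datalab API) could build that structure from plain Python values, assuming only the scalar types listed in its type map:

```python
# Hypothetical helper (not part of the Datalab API) that builds
# BigQuery query-parameter dictionaries from plain Python values.
def make_query_parameters(**values):
    # Map Python types to BigQuery Standard SQL parameter types.
    type_map = {bool: 'BOOL', int: 'INT64', float: 'FLOAT64', str: 'STRING'}
    params = []
    for name, value in values.items():
        params.append({
            'name': name,
            'parameterType': {'type': type_map[type(value)]},
            'parameterValue': {'value': value},
        })
    return params

# Builds the same structure as the query_parameters list above.
print(make_query_parameters(endpoint='Interact3'))
```

This would let the execute() call above be written as endpoint_stats2.execute(query_params=make_query_parameters(endpoint=endpoint_value)).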
Parameterization enables one direction of the SQL and Python integration: taking values from Python code in the notebook and passing them into the query when retrieving data from BigQuery.
The next notebook will cover the other part of the SQL and Python integration: retrieving query results into the notebook for use with Python code.