BQX Jupyter Notebook Demo

This example shows you how to play with data along with pandas.

Communication between this code and BigQuery API is fully automated thanks to pandas.

Initialize


In [1]:
from bqx.query import Query as Q
from bqx.parts import Table as T, Column as C
from bqx.func import SUM
import pandas as pd
import pandas.io.gbq as gbq

Create query


In [2]:
shakespeare = T('publicdata:samples.shakespeare')
count = C('word_count')
corpus = C('corpus')
count_sum = SUM(count).AS('count_sum')

q = (
    Q()
    .SELECT(corpus, count_sum)
    .FROM(shakespeare)
    .GROUP_BY(corpus)
    .ORDER_BY(count_sum).DESC())
print(q.getq())


SELECT corpus, SUM(word_count) AS count_sum
FROM publicdata:samples.shakespeare
GROUP BY corpus
ORDER BY count_sum
DESC

Execute query

Excecuting query and retrieving result are really easy. Just call pandas.io.gbq.read_gbq() with Query.getq() inside and you'll get a DataFrame.

That's it!


In [6]:
gbq.read_gbq(q.getq(), 'YOUR_PROJECT_ID', reauth=True)


Waiting for job to complete...
Out[6]:
corpus count_sum
0 hamlet 32446
1 kingrichardiii 31868
2 coriolanus 29535
3 cymbeline 29231
4 2kinghenryiv 28241
5 othello 28076
6 kinglear 28001
7 kinghenryv 27894
8 troilusandcressida 27837
9 antonyandcleopatra 27310
10 2kinghenryvi 27165
11 1kinghenryiv 26401
12 3kinghenryvi 26288
13 kinghenryviii 26265
14 romeoandjuliet 26192
15 winterstale 26181
16 allswellthatendswell 24622
17 kingrichardii 24150
18 merrywivesofwindsor 24033
19 measureforemeasure 23303
20 1kinghenryvi 23272
21 loveslabourslost 23189
22 asyoulikeit 23087
23 muchadoaboutnothing 22760
24 merchantofvenice 22448
25 tamingoftheshrew 22358
26 kingjohn 21983
27 titusandronicus 21911
28 twelfthnight 21633
29 juliuscaesar 21052
30 periclesprinceoftyre 19846
31 timonofathens 19841
32 twogentlemenofverona 18493
33 macbeth 18439
34 sonnets 17805
35 tempest 17593
36 midsummersnightsdream 17348
37 comedyoferrors 16361
38 rapeoflucrece 15221
39 venusandadonis 10035
40 various 3545
41 loverscomplaint 2586