Before you begin

Use the Cloud Resource Manager to Create a Cloud Platform project if you do not already have one.
Enable billing for the project.
Enable BigQuery APIs for the project.

Provide your credentials to the runtime



In [0]:

    
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Optional: Enable data table display

Colab includes the google.colab.data_table package that can be used to display large pandas dataframes as an interactive data table. It can be enabled with:



In [0]:

    
%load_ext google.colab.data_table

If you would prefer to return to the classic Pandas dataframe display, you can disable this by running:

%unload_ext google.colab.data_table

Use BigQuery via magics

The google.cloud.bigquery library also includes a magic command which runs a query and either displays the result or saves it to a variable as a DataFrame.



In [0]:

    
# Display query output immediately

%%bigquery --project yourprojectid
SELECT 
  COUNT(*) as total_rows
FROM `bigquery-public-data.samples.gsod`



In [0]:

    
# Save output in a variable `df`

%%bigquery --project yourprojectid df
SELECT 
  COUNT(*) as total_rows
FROM `bigquery-public-data.samples.gsod`



In [0]:

    
df

Use BigQuery through google-cloud-bigquery

See BigQuery documentation and library reference documentation.

The GSOD sample table contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.

Declare the Cloud project ID which will be used throughout this notebook



In [0]:

    
project_id = '[your project ID]'

Sample approximately 2000 random rows



In [0]:

    
from google.cloud import bigquery

client = bigquery.Client(project=project_id)

sample_count = 2000
row_count = client.query('''
  SELECT 
    COUNT(*) as total
  FROM `bigquery-public-data.samples.gsod`''').to_dataframe().total[0]

df = client.query('''
  SELECT
    *
  FROM
    `bigquery-public-data.samples.gsod`
  WHERE RAND() < %d/%d
''' % (sample_count, row_count)).to_dataframe()

print('Full dataset has %d rows' % row_count)









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-1c78a0ab70ae> in <module>()
      1 from google.cloud import bigquery
      2 
----> 3 client = bigquery.Client(project=project_id)
      4 
      5 sample_count = 2000

NameError: name 'project_id' is not defined

Describe the sampled data



In [0]:

    
df.describe()









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-ea8415b8a3ee> in <module>()
----> 1 df.describe()

NameError: name 'df' is not defined

View the first 10 rows



In [0]:

    
df.head(10)









    Out[0]:







  
    
      
      station_number
      wban_number
      year
      month
      day
      mean_temp
      num_mean_temp_samples
      mean_dew_point
      num_mean_dew_point_samples
      mean_sealevel_pressure
      num_mean_sealevel_pressure_samples
      mean_station_pressure
      num_mean_station_pressure_samples
      mean_visibility
      num_mean_visibility_samples
      mean_wind_speed
      num_mean_wind_speed_samples
      max_sustained_wind_speed
      max_gust_wind_speed
      max_temperature
      max_temperature_explicit
      min_temperature
      min_temperature_explicit
      total_precipitation
      snow_depth
      fog
      rain
      snow
      hail
      thunder
      tornado
    
  
  
    
      0
      105780
      99999
      1968
      9
      13
      46.000000
      8
      44.200001
      8.0
      NaN
      NaN
      NaN
      NaN
      4.700000
      8.0
      15.3
      8.0
      21.0
      NaN
      43.000000
      False
      None
      None
      0.02
      NaN
      True
      True
      True
      True
      True
      True
    
    
      1
      25710
      99999
      1989
      5
      2
      51.299999
      23
      44.900002
      23.0
      NaN
      NaN
      NaN
      NaN
      6.600000
      23.0
      7.5
      23.0
      12.0
      NaN
      42.799999
      True
      None
      None
      0.00
      NaN
      False
      False
      False
      False
      False
      False
    
    
      2
      475160
      99999
      2003
      4
      26
      45.200001
      16
      44.500000
      16.0
      NaN
      NaN
      NaN
      NaN
      4.500000
      16.0
      7.4
      16.0
      13.0
      NaN
      39.200001
      True
      None
      None
      NaN
      NaN
      False
      False
      False
      False
      False
      False
    
    
      3
      476720
      99999
      1989
      12
      8
      51.599998
      4
      34.000000
      4.0
      1005.400024
      4.0
      NaN
      NaN
      9.000000
      4.0
      5.0
      4.0
      8.0
      NaN
      41.900002
      True
      None
      None
      0.00
      NaN
      False
      False
      False
      False
      False
      False
    
    
      4
      940040
      99999
      1991
      6
      9
      84.300003
      4
      75.199997
      4.0
      1009.900024
      4.0
      1009.099976
      4.0
      24.900000
      4.0
      6.0
      4.0
      9.9
      NaN
      79.199997
      True
      None
      None
      0.00
      NaN
      False
      False
      False
      False
      False
      False
    
    
      5
      103250
      99999
      1976
      7
      23
      63.599998
      13
      48.599998
      13.0
      1022.099976
      5.0
      NaN
      NaN
      24.600000
      13.0
      3.0
      12.0
      6.0
      NaN
      48.200001
      False
      None
      None
      0.00
      NaN
      False
      False
      False
      False
      False
      False
    
    
      6
      919280
      99999
      1981
      2
      21
      83.800003
      5
      75.900002
      5.0
      1007.900024
      5.0
      NaN
      NaN
      17.100000
      4.0
      8.5
      5.0
      8.9
      NaN
      77.000000
      False
      None
      None
      0.00
      NaN
      False
      False
      False
      False
      False
      False
    
    
      7
      961710
      99999
      2004
      6
      23
      81.900002
      6
      74.900002
      6.0
      1010.500000
      6.0
      1008.000000
      6.0
      4.800000
      6.0
      1.5
      6.0
      5.1
      NaN
      74.800003
      False
      None
      None
      0.00
      NaN
      False
      False
      False
      False
      False
      False
    
    
      8
      172400
      99999
      1990
      6
      6
      65.400002
      7
      43.900002
      7.0
      1018.200012
      7.0
      906.099976
      7.0
      18.600000
      7.0
      2.0
      7.0
      4.1
      NaN
      45.000000
      False
      None
      None
      0.00
      NaN
      False
      False
      False
      False
      False
      False
    
    
      9
      38790
      99999
      1973
      4
      7
      44.799999
      8
      32.900002
      8.0
      1018.299988
      8.0
      NaN
      NaN
      17.200001
      8.0
      9.1
      8.0
      15.0
      NaN
      35.599998
      True
      None
      None
      NaN
      NaN
      False
      False
      False
      False
      False
      False



In [0]:

    
# 10 highest total_precipitation samples
df.sort_values('total_precipitation', ascending=False).head(10)[['station_number', 'year', 'month', 'day', 'total_precipitation']]









    Out[0]:







  
    
      
      station_number
      year
      month
      day
      total_precipitation
    
  
  
    
      644
      230220
      1964
      7
      15
      5.91
    
    
      1155
      985430
      2008
      12
      8
      3.46
    
    
      1196
      248260
      1961
      11
      1
      2.95
    
    
      1588
      257670
      1959
      8
      9
      2.95
    
    
      980
      299150
      1962
      3
      1
      2.95
    
    
      1325
      470250
      1965
      11
      25
      2.95
    
    
      1917
      288380
      1994
      8
      6
      2.32
    
    
      1211
      585190
      1995
      4
      14
      2.32
    
    
      250
      647000
      2005
      8
      19
      2.20
    
    
      1418
      964710
      1975
      9
      8
      1.97

Use BigQuery through pandas-gbq

The pandas-gbq library is a community led project by the pandas community. It covers basic functionality, such as writing a DataFrame to BigQuery and running a query, but as a third-party library it may not handle all BigQuery features or use cases.

Pandas GBQ Documentation



In [0]:

    
import pandas as pd

sample_count = 2000
df = pd.io.gbq.read_gbq('''
  SELECT name, SUM(number) as count
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE state = 'TX'
  GROUP BY name
  ORDER BY count DESC
  LIMIT 100
''', project_id=project_id, dialect='standard')

df.head()

	name	count
0	James	272793
1	John	235139
2	Michael	225320
3	Robert	220399
4	David	219028

	station_number	wban_number	year	month	day	mean_temp	num_mean_temp_samples	mean_dew_point	num_mean_dew_point_samples	mean_sealevel_pressure	num_mean_sealevel_pressure_samples	mean_station_pressure	num_mean_station_pressure_samples	mean_visibility	num_mean_visibility_samples	mean_wind_speed	num_mean_wind_speed_samples	max_sustained_wind_speed	max_gust_wind_speed	max_temperature	max_temperature_explicit	min_temperature	min_temperature_explicit	total_precipitation	snow_depth	fog	rain	snow	hail	thunder	tornado
0	105780	99999	1968	9	13	46.000000	8	44.200001	8.0	NaN	NaN	NaN	NaN	4.700000	8.0	15.3	8.0	21.0	NaN	43.000000	False	None	None	0.02	NaN	True	True	True	True	True	True
1	25710	99999	1989	5	2	51.299999	23	44.900002	23.0	NaN	NaN	NaN	NaN	6.600000	23.0	7.5	23.0	12.0	NaN	42.799999	True	None	None	0.00	NaN	False	False	False	False	False	False
2	475160	99999	2003	4	26	45.200001	16	44.500000	16.0	NaN	NaN	NaN	NaN	4.500000	16.0	7.4	16.0	13.0	NaN	39.200001	True	None	None	NaN	NaN	False	False	False	False	False	False
3	476720	99999	1989	12	8	51.599998	4	34.000000	4.0	1005.400024	4.0	NaN	NaN	9.000000	4.0	5.0	4.0	8.0	NaN	41.900002	True	None	None	0.00	NaN	False	False	False	False	False	False
4	940040	99999	1991	6	9	84.300003	4	75.199997	4.0	1009.900024	4.0	1009.099976	4.0	24.900000	4.0	6.0	4.0	9.9	NaN	79.199997	True	None	None	0.00	NaN	False	False	False	False	False	False
5	103250	99999	1976	7	23	63.599998	13	48.599998	13.0	1022.099976	5.0	NaN	NaN	24.600000	13.0	3.0	12.0	6.0	NaN	48.200001	False	None	None	0.00	NaN	False	False	False	False	False	False
6	919280	99999	1981	2	21	83.800003	5	75.900002	5.0	1007.900024	5.0	NaN	NaN	17.100000	4.0	8.5	5.0	8.9	NaN	77.000000	False	None	None	0.00	NaN	False	False	False	False	False	False
7	961710	99999	2004	6	23	81.900002	6	74.900002	6.0	1010.500000	6.0	1008.000000	6.0	4.800000	6.0	1.5	6.0	5.1	NaN	74.800003	False	None	None	0.00	NaN	False	False	False	False	False	False
8	172400	99999	1990	6	6	65.400002	7	43.900002	7.0	1018.200012	7.0	906.099976	7.0	18.600000	7.0	2.0	7.0	4.1	NaN	45.000000	False	None	None	0.00	NaN	False	False	False	False	False	False
9	38790	99999	1973	4	7	44.799999	8	32.900002	8.0	1018.299988	8.0	NaN	NaN	17.200001	8.0	9.1	8.0	15.0	NaN	35.599998	True	None	None	NaN	NaN	False	False	False	False	False	False

	station_number	year	month	day	total_precipitation
644	230220	1964	7	15	5.91
1155	985430	2008	12	8	3.46
1196	248260	1961	11	1	2.95
1588	257670	1959	8	9	2.95
980	299150	1962	3	1	2.95
1325	470250	1965	11	25	2.95
1917	288380	1994	8	6	2.32
1211	585190	1995	4	14	2.32
250	647000	2005	8	19	2.20
1418	964710	1975	9	8	1.97