Notebook: using jsonstat.py with eurostat api

This Jupyter notebook shows the python library jsonstat.py in action. It shows how to explore dataset downloaded from a data provider. This notebook uses some datasets from Eurostat. Eurostat provides a rest api to download its datasets. You can find details about the api here It is possible to use a query builder for discovering the rest api parameters. The following image shows the query builder:



In [1]:

    
# all import here
from __future__ import print_function
import os
import pandas as pd
import jsonstat

import matplotlib as plt
%matplotlib inline

1 - Exploring data with one dimension (time) with size > 1

Following cell downloads a datataset from eurostat. If the file is already downloaded use the copy presents on the disk. Caching file is useful to avoid downloading dataset every time notebook runs. Caching can speed the development, and provides consistent results. You can see the raw data here



In [2]:

    
url_1 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&unit=EUR_HAB&indic_na=B1GM'
file_name_1 = "eurostat-name_gpd_c-geo_IT.json"

file_path_1 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_1))
if os.path.exists(file_path_1):
    print("using already donwloaded file {}".format(file_path_1))
else:
    print("download file")
    jsonstat.download(url_1, file_name_1)
    file_path_1 = file_name_1









    



using already donwloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.ec.europa.eu_eurostat/eurostat-name_gpd_c-geo_IT.json

Initialize JsonStatCollection with eurostat data and print some info about the collection.



In [3]:

    
collection_1 = jsonstat.from_file(file_path_1)
collection_1









    Out[3]:




JsonstatCollection contains the following JsonStatDataSet:pos dataset
0 'nama_gdp_c'

Previous collection contains only a dataset named 'nama_gdp_c'



In [4]:

    
nama_gdp_c_1 = collection_1.dataset('nama_gdp_c')
nama_gdp_c_1









    Out[4]:




name:   'nama_gdp_c'title:  'GDP and main components - Current prices'size: 4pos id label size role
0 unit unit 1 
1 indic_na indic_na 1 
2 geo geo 1 
3 time time 69

All dimensions of the dataset 'nama_gdp_c' are of size 1 with exception of time dimension. Let's explore the time dimension.



In [5]:

    
nama_gdp_c_1.dimension('time')









    Out[5]:




pos idx label
0 '1946' '1946'
1 '1947' '1947'
2 '1948' '1948'
3 '1949' '1949'
... ... ...

Get value for year 2012.



In [6]:

    
nama_gdp_c_1.value(time='2012')









    Out[6]:





25700

Convert the jsonstat data into a pandas dataframe.



In [7]:

    
df_1 = nama_gdp_c_1.to_data_frame('time', content='id')
df_1.tail()









    Out[7]:






  
    
      
      unit
      indic_na
      geo
      Value
    
    
      time
      
      
      
      
    
  
  
    
      2010
      EUR_HAB
      B1GM
      IT
      25700.0
    
    
      2011
      EUR_HAB
      B1GM
      IT
      26000.0
    
    
      2012
      EUR_HAB
      B1GM
      IT
      25700.0
    
    
      2013
      EUR_HAB
      B1GM
      IT
      25600.0
    
    
      2014
      EUR_HAB
      B1GM
      IT
      NaN

Adding a simple plot



In [8]:

    
df_1 = df_1.dropna() # remove rows with NaN values
df_1.plot(grid=True, figsize=(20,5))









    Out[8]:





<matplotlib.axes._subplots.AxesSubplot at 0x114bc12b0>

2 - Exploring data with two dimensions (geo, time) with size > 1

Download or use the jsonstat file cached on disk. The cache is used to avoid internet download during the devolopment to make the things a bit faster. You can see the raw data here



In [9]:

    
url_2 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&geo=FR&unit=EUR_HAB&indic_na=B1GM'
file_name_2 = "eurostat-name_gpd_c-geo_IT_FR.json"

file_path_2 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_2))
if os.path.exists(file_path_2):
    print("using alredy donwloaded file {}".format(file_path_2))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name_2)
    file_path_2 = file_name_2









    



using alredy donwloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.ec.europa.eu_eurostat/eurostat-name_gpd_c-geo_IT_FR.json



In [10]:

    
collection_2 = jsonstat.from_file(file_path_2)
nama_gdp_c_2 = collection_2.dataset('nama_gdp_c')
nama_gdp_c_2









    Out[10]:




name:   'nama_gdp_c'title:  'GDP and main components - Current prices'size: 4pos id label size role
0 unit unit 1 
1 indic_na indic_na 1 
2 geo geo 2 
3 time time 69



In [11]:

    
nama_gdp_c_2.dimension('geo')









    Out[11]:




pos idx label
0 'FR' 'France'
1 'IT' 'Italy'



In [12]:

    
nama_gdp_c_2.value(time='2012',geo='IT')









    Out[12]:





25700



In [13]:

    
nama_gdp_c_2.value(time='2012',geo='FR')









    Out[13]:





31100



In [14]:

    
df_2 = nama_gdp_c_2.to_table(content='id',rtype=pd.DataFrame)
df_2.tail()



In [15]:

    
df_FR_IT = df_2.dropna()[['time', 'geo', 'Value']]
df_FR_IT = df_FR_IT.pivot('time', 'geo', 'Value')
df_FR_IT.plot(grid=True, figsize=(20,5))









    Out[15]:





<matplotlib.axes._subplots.AxesSubplot at 0x114c0f0b8>



In [16]:

    
df_3 = nama_gdp_c_2.to_data_frame('time', content='id', blocked_dims={'geo':'FR'})
df_3 = df_3.dropna()
df_3.plot(grid=True,figsize=(20,5))









    Out[16]:





<matplotlib.axes._subplots.AxesSubplot at 0x1178e7d30>



In [17]:

    
df_4 = nama_gdp_c_2.to_data_frame('time', content='id', blocked_dims={'geo':'IT'})
df_4 = df_4.dropna()
df_4.plot(grid=True,figsize=(20,5))









    Out[17]:





<matplotlib.axes._subplots.AxesSubplot at 0x117947630>

pos	id	label	size	role
0	unit	unit	1
1	indic_na	indic_na	1
2	geo	geo	1
3	time	time	69

pos	idx	label
0	'1946'	'1946'
1	'1947'	'1947'
2	'1948'	'1948'
3	'1949'	'1949'
...	...	...

	unit	indic_na	geo	Value
time
2010	EUR_HAB	B1GM	IT	25700.0
2011	EUR_HAB	B1GM	IT	26000.0
2012	EUR_HAB	B1GM	IT	25700.0
2013	EUR_HAB	B1GM	IT	25600.0
2014	EUR_HAB	B1GM	IT	NaN