Notebook: using the jsonstat.py Python library with JSON-stat format version 2.

This Jupyter notebook shows the Python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format. For more information about the format, see the official site.

This notebook uses the data file oecd-canada-col.json from the json-stat.org site. The file complies with version 2 of JSON-stat. This notebook is otherwise identical to the version 1 notebook; the only difference is the data source.



In [1]:

    
# all imports here
from __future__ import print_function
import os
import pandas as pd  # pandas is used to convert a jsonstat dataset to a DataFrame
import jsonstat      # import the jsonstat.py package

import matplotlib.pyplot as plt  # for plotting
%matplotlib inline

Download the file oecd-canada-col.json, or use a cached copy. Caching the file on disk makes it possible to work offline and speeds up exploration of the data.



In [2]:

    
url = 'http://json-stat.org/samples/oecd-canada-col.json'
file_name = "oecd-canada-col.json"

file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("downloading file and storing it on disk")
    jsonstat.download(url, file_name)
    file_path = file_name









    



using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada-col.json
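
The download-or-reuse pattern above can be sketched without jsonstat.py. This is only an illustration under assumptions: it targets Python 3, and `cached_download` and its `fetch` hook are hypothetical names introduced here so the logic can be tried without network access.

```python
import os
import tempfile
from urllib.request import urlopen


def cached_download(url, path, fetch=None):
    """Return `path`, fetching `url` into it only if the file is missing.

    `fetch` is a hypothetical hook (url -> bytes); it defaults to urllib.
    """
    if os.path.exists(path):
        return path  # cache hit: no network access needed
    if fetch is None:
        def fetch(u):
            with urlopen(u) as response:
                return response.read()
    data = fetch(url)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


# Offline demonstration with a stand-in fetcher that records its calls.
calls = []


def fake_fetch(u):
    calls.append(u)
    return b'{"version": "2.0", "class": "collection"}'


sample_url = "http://json-stat.org/samples/oecd-canada-col.json"
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "oecd-canada-col.json")
    cached_download(sample_url, target, fake_fetch)  # first call fetches
    cached_download(sample_url, target, fake_fetch)  # second call reuses the file
    fetch_count = len(calls)

print(fetch_count)  # 1
```

The second call returns immediately because the file already exists, which is what makes offline work possible once the data has been fetched.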

Initialize a JsonStatCollection from the file and print the list of datasets contained in the collection.



In [3]:

    
collection = jsonstat.from_file(file_path)
collection









    Out[3]:




JsonstatCollection contains the following JsonStatDataSet:
pos dataset
0 'Unemployment rate in the OECD countries 2003-2014'
1 'Population by sex and age group. Canada. 2012'

Select the first dataset. The OECD dataset has three dimensions (concept, area, year) and contains 432 values.



In [4]:

    
oecd = collection.dataset(0)
oecd









    Out[4]:




name:   'Unemployment rate in the OECD countries 2003-2014'
label:  'Unemployment rate in the OECD countries 2003-2014'
size: 432
pos id label size role
0 concept indicator 1 metric
1 area OECD countries, EU15 and total 36 geo
2 year 2003-2014 12 time
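
The figure of 432 values follows from the dimension sizes: a JSON-stat 2.0 dataset lists its dimension ids and sizes, and the value array holds one cell per combination of categories. A minimal sketch, using a hand-written skeleton dictionary rather than the real file:

```python
from functools import reduce
from operator import mul

# Skeleton of a JSON-stat 2.0 dataset with the same shape as the OECD one.
# Only "id" and "size" are filled in; the real file also carries
# "dimension" and "value" members.
dataset = {
    "version": "2.0",
    "class": "dataset",
    "id": ["concept", "area", "year"],
    "size": [1, 36, 12],
}


def expected_value_count(ds):
    """Number of cells implied by the dimension sizes."""
    return reduce(mul, ds["size"], 1)


n_values = expected_value_count(dataset)
print(n_values)  # 1 * 36 * 12 = 432
```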



In [5]:

    
oecd.dimension('concept')









    Out[5]:




pos idx label
0 'UNR' 'unemployment rate'



In [6]:

    
oecd.dimension('area')









    Out[6]:




pos idx label
0 'AU' 'Australia'
1 'AT' 'Austria'
2 'BE' 'Belgium'
3 'CA' 'Canada'
... ... ...



In [7]:

    
oecd.dimension('year')









    Out[7]:




pos idx label
0 '2003' ''
1 '2004' ''
2 '2005' ''
3 '2006' ''
... ... ...

The cells above show detailed information about each dimension.

Accessing values in the dataset

Print the value in the oecd dataset for area = IT and year = 2012.



In [8]:

    
oecd.data(area='IT', year='2012')









    Out[8]:





JsonStatValue(idx=201, value=10.55546863, status=None)
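
The `idx=201` in the output is the position of the cell in the flat value array. JSON-stat stores values in row-major order, so the last dimension (year) varies fastest. The sketch below reproduces the arithmetic; `flat_index` is a hypothetical helper, and the position 16 for 'IT' is an assumption inferred from the reported index, not read from the file.

```python
def flat_index(positions, sizes):
    """Row-major offset: the last dimension varies fastest."""
    idx = 0
    for pos, size in zip(positions, sizes):
        if not 0 <= pos < size:
            raise IndexError("position {} out of range for size {}".format(pos, size))
        idx = idx * size + pos
    return idx


sizes = [1, 36, 12]      # concept, area, year
year_pos = 2012 - 2003   # 9, since the years run from 2003 to 2014
area_pos = 16            # assumed position of 'IT', inferred from idx=201
it_2012 = flat_index([0, area_pos, year_pos], sizes)
print(it_2012)  # 16 * 12 + 9 = 201
```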



In [9]:

    
oecd.value(area='IT', year='2012')









    Out[9]:





10.55546863



In [10]:

    
oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.39663128









    Out[10]:





5.39663128



In [11]:

    
oecd.value(concept='UNR',area='AU',year='2004')









    Out[11]:





5.39663128
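
The last two cells return the same value because a category can be addressed either by its id ('AU') or by its label ('Australia'). One way such a lookup could work is sketched below; `resolve_category` is a hypothetical helper and is not taken from jsonstat.py. The dictionaries mimic the `category.label` member of a JSON-stat dimension.

```python
def resolve_category(dimension, key):
    """Return the category id for `key`, which may be an id or a label."""
    labels = dimension["category"]["label"]  # maps id -> label
    if key in labels:
        return key  # already an id
    matches = [cid for cid, label in labels.items() if label == key]
    if len(matches) != 1:
        raise KeyError("unknown or ambiguous category: {!r}".format(key))
    return matches[0]


# Small excerpts of the dimensions shown earlier in this notebook.
area = {"category": {"label": {"AU": "Australia", "AT": "Austria"}}}
concept = {"category": {"label": {"UNR": "unemployment rate"}}}

print(resolve_category(area, "Australia"))             # AU
print(resolve_category(area, "AU"))                    # AU
print(resolve_category(concept, "unemployment rate"))  # UNR
```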

Transforming a dataset into a pandas DataFrame



In [12]:

    
df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()



In [13]:

    
df_oecd['area'].describe() # area contains 36 values









    Out[13]:





count     432
unique     36
top        IT
freq       12
Name: area, dtype: object

Extract a subset of the data from the jsonstat dataset into a pandas DataFrame. We can transform the dataset while freezing the area dimension to a specific country (Canada).



In [14]:

    
df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()



In [15]:

    
df_oecd_ca['area'].describe()  # area contains only one value (CA)









    Out[15]:





count     12
unique     1
top       CA
freq      12
Name: area, dtype: object
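
Freezing a dimension amounts to keeping only the cells whose coordinate on that dimension equals the chosen category. The pure-Python sketch below illustrates the idea; it is not the jsonstat.py implementation, `select_rows` is a hypothetical helper, and the numeric values are placeholders rather than real data.

```python
from itertools import product


def select_rows(ids, categories, values, blocked=None):
    """Return (coords..., value) tuples, keeping only rows that match `blocked`."""
    blocked = blocked or {}
    rows = []
    # product() varies the last dimension fastest, matching the flat value order.
    for flat, coords in enumerate(product(*(categories[d] for d in ids))):
        point = dict(zip(ids, coords))
        if all(point[d] == c for d, c in blocked.items()):
            rows.append(coords + (values[flat],))
    return rows


ids = ["area", "year"]
categories = {"area": ["AU", "CA"], "year": ["2003", "2004"]}
values = [1.0, 2.0, 3.0, 4.0]  # placeholder numbers, not real unemployment rates

ca_rows = select_rows(ids, categories, values, blocked={"area": "CA"})
print(ca_rows)  # [('CA', '2003', 3.0), ('CA', '2004', 4.0)]
```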



In [16]:

    
df_oecd_ca.plot(grid=True)









    Out[16]:





<matplotlib.axes._subplots.AxesSubplot at 0x11409a908>

Transforming a dataset into a Python list



In [17]:

    
oecd.to_table()[:5]









    Out[17]:





[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Australia', '2004', 5.39663128],
 ['unemployment rate', 'Australia', '2005', 5.044790587],
 ['unemployment rate', 'Australia', '2006', 4.789362794]]

It is possible to transform jsonstat data into a table using a different dimension order.



In [18]:

    
order = [i.did for i in oecd.dimensions()]
order = order[::-1]  # reverse list
table = oecd.to_table(order=order)
table[:5]









    Out[18]:





[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Austria', '2003', 4.278559338],
 ['unemployment rate', 'Belgium', '2003', 8.158333333],
 ['unemployment rate', 'Canada', '2003', 7.594616751]]
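
In the output above the columns keep their original order while the rows are traversed differently: with the reversed order, area changes from row to row and year changes slowest. The sketch below reproduces that behaviour on a toy dataset. It is an interpretation of the observed output, not the jsonstat.py code; `table_rows` is a hypothetical helper and the values are placeholders.

```python
from itertools import product


def table_rows(ids, categories, values, order=None):
    """Build [header, row, ...]; `order` controls which dimension varies fastest."""
    order = list(order) if order else list(ids)
    sizes = [len(categories[d]) for d in ids]
    rows = [list(ids) + ["Value"]]
    # The last dimension named in `order` varies fastest.
    for combo in product(*(range(len(categories[d])) for d in order)):
        pos = dict(zip(order, combo))
        flat = 0
        for d, size in zip(ids, sizes):  # row-major offset in the stored order
            flat = flat * size + pos[d]
        rows.append([categories[d][pos[d]] for d in ids] + [values[flat]])
    return rows


ids = ["area", "year"]
categories = {"area": ["AU", "AT"], "year": ["2003", "2004"]}
values = [1.0, 2.0, 3.0, 4.0]  # placeholder numbers, not real data

default_table = table_rows(ids, categories, values)
reversed_table = table_rows(ids, categories, values, order=ids[::-1])
print(default_table[1:3])   # [['AU', '2003', 1.0], ['AU', '2004', 2.0]]
print(reversed_table[1:3])  # [['AU', '2003', 1.0], ['AT', '2003', 3.0]]
```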

Output of df_oecd.head() from In [12]:

	concept	area	Value
year
2003	UNR	AU	5.943826
2004	UNR	AU	5.396631
2005	UNR	AU	5.044791
2006	UNR	AU	4.789363
2007	UNR	AU	4.379649

Output of df_oecd_ca.tail() from In [14]:

	concept	area	Value
year
2010	UNR	CA	7.988900
2011	UNR	CA	7.453610
2012	UNR	CA	7.323584
2013	UNR	CA	7.169742
2014	UNR	CA	6.881227