oecd-canada-jsonstat_v2


Notebook: using jsonstat.py python library with jsonstat format version 2.

This Jupyter notebook shows the python library jsonstat.py in action. The JSON-stat is a simple lightweight JSON dissemination format. For more information about the format see the official site.

In this notebook it is used the data file oecd-canada-col.json from json-stat.org site. This file is compliant to the version 2 of jsonstat. This notebook is equal to version 1. The only difference is the datasource.


In [1]:
# all import here
from __future__ import print_function
import os
import pandas as ps # using panda to convert jsonstat dataset to pandas dataframe
import jsonstat     # import jsonstat.py package

import matplotlib as plt  # for plotting 
%matplotlib inline

Download or use cached file oecd-canada-col.json. Caching file on disk permits to work off-line and to speed up the exploration of the data.


In [2]:
url = 'http://json-stat.org/samples/oecd-canada-col.json'
file_name = "oecd-canada-col.json"

file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name


using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada-col.json

Initialize JsonStatCollection from the file and print the list of dataset contained into the collection.


In [3]:
collection = jsonstat.from_file(file_path)
collection


Out[3]:
JsonstatCollection contains the following JsonStatDataSet:
posdataset
0'Unemployment rate in the OECD countries 2003-2014'
1'Population by sex and age group. Canada. 2012'

Select the firt dataset. Oecd dataset has three dimensions (concept, area, year), and contains 432 values.


In [4]:
oecd = collection.dataset(0)
oecd


Out[4]:
name: 'Unemployment rate in the OECD countries 2003-2014'label: 'Unemployment rate in the OECD countries 2003-2014'size: 432
posidlabelsizerole
0conceptindicator1metric
1areaOECD countries, EU15 and total36geo
2year2003-201412time

In [5]:
oecd.dimension('concept')


Out[5]:
posidxlabel
0'UNR''unemployment rate'

In [6]:
oecd.dimension('area')


Out[6]:
posidxlabel
0'AU''Australia'
1'AT''Austria'
2'BE''Belgium'
3'CA''Canada'
.........

In [7]:
oecd.dimension('year')


Out[7]:
posidxlabel
0'2003'''
1'2004'''
2'2005'''
3'2006'''
.........

Shows some detailed info about dimensions.

Accessing value in the dataset

Print the value in oecd dataset for area = IT and year = 2012


In [8]:
oecd.data(area='IT', year='2012')


Out[8]:
JsonStatValue(idx=201, value=10.55546863, status=None)

In [9]:
oecd.value(area='IT', year='2012')


Out[9]:
10.55546863

In [10]:
oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.39663128


Out[10]:
5.39663128

In [11]:
oecd.value(concept='UNR',area='AU',year='2004')


Out[11]:
5.39663128

Trasforming dataset into pandas DataFrame


In [12]:
df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()


Out[12]:
concept area Value
year
2003 UNR AU 5.943826
2004 UNR AU 5.396631
2005 UNR AU 5.044791
2006 UNR AU 4.789363
2007 UNR AU 4.379649

In [13]:
df_oecd['area'].describe() # area contains 36 values


Out[13]:
count     432
unique     36
top        IT
freq       12
Name: area, dtype: object

Extract a subset of data in a pandas dataframe from the jsonstat dataset. We can trasform dataset freezing the dimension area to a specific country (Canada)


In [14]:
df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()


Out[14]:
concept area Value
year
2010 UNR CA 7.988900
2011 UNR CA 7.453610
2012 UNR CA 7.323584
2013 UNR CA 7.169742
2014 UNR CA 6.881227

In [15]:
df_oecd_ca['area'].describe()  # area contains only one value (CA)


Out[15]:
count     12
unique     1
top       CA
freq      12
Name: area, dtype: object

In [16]:
df_oecd_ca.plot(grid=True)


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x11409a908>

Trasforming a dataset into a python list


In [17]:
oecd.to_table()[:5]


Out[17]:
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Australia', '2004', 5.39663128],
 ['unemployment rate', 'Australia', '2005', 5.044790587],
 ['unemployment rate', 'Australia', '2006', 4.789362794]]

It is possible to trasform jsonstat data into table in different order


In [18]:
order = [i.did for i in oecd.dimensions()]
order = order[::-1]  # reverse list
table = oecd.to_table(order=order)
table[:5]


Out[18]:
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Austria', '2003', 4.278559338],
 ['unemployment rate', 'Belgium', '2003', 8.158333333],
 ['unemployment rate', 'Canada', '2003', 7.594616751]]