This Jupyter notebook shows the Python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format. For more information about the format, see the official site.
This notebook uses the data file oecd-canada-col.json from the json-stat.org site. The file complies with version 2 of JSON-stat. The notebook is otherwise identical to the one for version 1; the only difference is the data source.
In [1]:
# all imports here
from __future__ import print_function
import os
import pandas as pd # pandas is used to convert a jsonstat dataset to a pandas dataframe
import jsonstat # import jsonstat.py package
import matplotlib.pyplot as plt # for plotting
%matplotlib inline
Download the file oecd-canada-col.json, or use the cached copy if it already exists. Caching the file on disk makes it possible to work offline and speeds up exploration of the data.
In [2]:
url = 'http://json-stat.org/samples/oecd-canada-col.json'
file_name = "oecd-canada-col.json"
file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name
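The download-or-reuse pattern in the cell above can be sketched with the standard library alone. This is a minimal illustration, not how `jsonstat.download` is implemented: `fetch_bytes`, `cached_download`, `fake_fetch`, and the `example.invalid` URL are hypothetical names introduced here, and the fake fetcher lets the example run offline.

```python
import os
import tempfile
from urllib.request import urlopen


def fetch_bytes(url):
    """Download the body of `url` (hypothetical helper built on urllib)."""
    with urlopen(url) as response:
        return response.read()


def cached_download(url, path, fetch=fetch_bytes):
    """Return `path`, downloading `url` into it only when the file is missing."""
    if not os.path.exists(path):
        data = fetch(url)
        with open(path, "wb") as handle:
            handle.write(data)
    return path


# Offline demonstration: a fake fetcher that records every call it receives.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b'{"class": "collection"}'


demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
cached_download("http://example.invalid/demo.json", demo_path, fetch=fake_fetch)
cached_download("http://example.invalid/demo.json", demo_path, fetch=fake_fetch)
print(len(calls))  # the second call finds the file on disk and does not fetch
```

Passing the fetcher as a parameter keeps the caching logic testable without network access.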
Initialize a JsonStatCollection from the file and print the list of datasets contained in the collection.
In [3]:
collection = jsonstat.from_file(file_path)
collection
Out[3]:
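To give a feel for what a collection holds, here is a rough sketch that lists the datasets embedded in a JSON-stat 2.0 collection using only the `json` module. The document below is a toy made up for illustration (labels and values are invented, and the `dimension` objects are omitted for brevity); it is not the content of oecd-canada-col.json and does not reflect how jsonstat.py parses a file.

```python
import json

# Toy JSON-stat 2.0 collection; every label and value here is invented.
raw = """
{"version": "2.0", "class": "collection", "label": "toy collection",
 "link": {"item": [
   {"class": "dataset", "label": "toy dataset A", "id": ["area", "year"],
    "size": [2, 3], "value": [1, 2, 3, 4, 5, 6]},
   {"class": "dataset", "label": "toy dataset B", "id": ["area"],
    "size": [2], "value": [7, 8]}
 ]}}
"""

doc = json.loads(raw)

# A collection lists its members under link.item; keep only the datasets.
datasets = [item for item in doc["link"]["item"] if item.get("class") == "dataset"]

# Position, label, and number of values for each dataset.
summary = [(pos, ds["label"], len(ds["value"])) for pos, ds in enumerate(datasets)]
for row in summary:
    print(row)
```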
Select the first dataset. The OECD dataset has three dimensions (concept, area, year) and contains 432 values.
In [4]:
oecd = collection.dataset(0)
oecd
Out[4]:
In [5]:
oecd.dimension('concept')
Out[5]:
In [6]:
oecd.dimension('area')
Out[6]:
In [7]:
oecd.dimension('year')
Out[7]:
The three cells above show detailed information about each dimension.
In [8]:
oecd.data(area='IT', year='2012')
Out[8]:
In [9]:
oecd.value(area='IT', year='2012')
Out[9]:
In [10]:
oecd.value(concept='unemployment rate', area='Australia', year='2004') # 5.39663128
Out[10]:
In [11]:
oecd.value(concept='UNR',area='AU',year='2004')
Out[11]:
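The lookups above address one cell by giving a category for each dimension, either by id (`'AU'`) or by label (`'Australia'`). In the JSON-stat format, values are stored as a flat array in row-major order over the dimensions listed in `id`, so such a lookup reduces to computing an offset. The sketch below illustrates that idea on a toy dataset with invented numbers. The helpers `to_id` and `lookup` are hypothetical, and defaulting an omitted single-category dimension to position 0 is an assumption made here, not a statement about how jsonstat.py behaves.

```python
# Toy dataset in JSON-stat 2.0 layout; all numbers are invented.
dataset = {
    "id": ["concept", "area", "year"],
    "size": [1, 2, 3],
    "dimension": {
        "concept": {"category": {"index": {"UNR": 0},
                                 "label": {"UNR": "unemployment rate"}}},
        "area": {"category": {"index": {"AU": 0, "IT": 1},
                              "label": {"AU": "Australia", "IT": "Italy"}}},
        "year": {"category": {"index": {"2003": 0, "2004": 1, "2005": 2}}},
    },
    "value": [5.9, 5.4, 5.0, 8.4, 8.0, 7.7],
}


def to_id(dataset, dim, key):
    """Accept either a category id or its label and return the id."""
    category = dataset["dimension"][dim]["category"]
    if key in category["index"]:
        return key
    for cat_id, label in category.get("label", {}).items():
        if label == key:
            return cat_id
    raise KeyError(key)


def lookup(dataset, **selection):
    """Return the single value addressed by one category per dimension."""
    offset = 0
    for dim, size in zip(dataset["id"], dataset["size"]):
        if dim not in selection and size == 1:
            position = 0  # assumption: a one-category dimension may be omitted
        else:
            cat_id = to_id(dataset, dim, selection[dim])
            position = dataset["dimension"][dim]["category"]["index"][cat_id]
        # Row-major layout: the last dimension varies fastest.
        offset = offset * size + position
    return dataset["value"][offset]


by_id = lookup(dataset, concept="UNR", area="AU", year="2004")
by_label = lookup(dataset, concept="unemployment rate", area="Australia", year="2004")
without_concept = lookup(dataset, area="IT", year="2005")
print(by_id, by_label, without_concept)
```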
In [12]:
df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()
Out[12]:
In [13]:
df_oecd['area'].describe() # area contains 36 values
Out[13]:
Extract a subset of the data from the jsonstat dataset into a pandas dataframe. We can transform the dataset by freezing the dimension area to a specific country (Canada).
In [14]:
df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()
Out[14]:
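Freezing a dimension amounts to keeping only the cells whose category for that dimension matches the chosen one. A minimal sketch of that filter, on toy data with invented numbers and using only the standard library, could look like the following. The function `rows` and its `blocked` parameter are hypothetical and only mirror the idea behind `blocked_dims`; they are not jsonstat.py's implementation.

```python
from itertools import product

# Toy categories and invented values, stored row-major: area first, then year.
dims = {"area": ["AU", "CA"], "year": ["2003", "2004", "2005"]}
values = [5.9, 5.4, 5.0, 7.6, 7.2, 6.8]


def rows(dims, values, blocked=None):
    """Yield one dict per cell, keeping only cells that match `blocked`."""
    blocked = blocked or {}
    names = list(dims)
    # product() varies the last dimension fastest, matching row-major order.
    for offset, combo in enumerate(product(*dims.values())):
        cell = dict(zip(names, combo))
        if all(cell[dim] == wanted for dim, wanted in blocked.items()):
            cell["value"] = values[offset]
            yield cell


canada = list(rows(dims, values, blocked={"area": "CA"}))
for cell in canada:
    print(cell)
```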
In [15]:
df_oecd_ca['area'].describe() # area contains only one value (CA)
Out[15]:
In [16]:
df_oecd_ca.plot(grid=True)
Out[16]:
In [17]:
oecd.to_table()[:5]
Out[17]:
It is possible to transform jsonstat data into a table with the dimensions in a different order.
In [18]:
order = [i.did for i in oecd.dimensions()]
order = order[::-1] # reverse list
table = oecd.to_table(order=order)
table[:5]
Out[18]:
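Reordering the columns changes the order in which cells are visited, not the values themselves: each row still has to be mapped back to its offset in the original row-major value array. The sketch below shows one way to do that on toy data with invented numbers. The standalone `to_table` function and the `"value"` header are hypothetical choices for this example and are not taken from jsonstat.py.

```python
from itertools import product

# Toy categories and invented values, stored row-major: area first, then year.
dims = {"area": ["AU", "CA"], "year": ["2003", "2004"]}
values = [5.9, 5.4, 7.6, 7.2]


def to_table(dims, values, order=None):
    """Return a header row plus one row per cell, with columns in `order`."""
    names = list(dims)              # storage order of the dimensions
    order = list(order or names)    # requested column order
    sizes = [len(dims[name]) for name in names]
    table = [order + ["value"]]
    for combo in product(*(dims[name] for name in order)):
        chosen = dict(zip(order, combo))
        # Recompute the offset in storage order, whatever the column order is.
        offset = 0
        for name, size in zip(names, sizes):
            offset = offset * size + dims[name].index(chosen[name])
        table.append(list(combo) + [values[offset]])
    return table


reversed_order = list(dims)[::-1]   # ['year', 'area']
table = to_table(dims, values, order=reversed_order)
for row in table:
    print(row)
```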