This Jupyter notebook shows how to use the jsonstat.py Python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a REST API for querying Italian statistics.
We start by importing some modules.
In [1]:
from __future__ import print_function
import os
import istat
from IPython.core.display import HTML
The following code sets a cache directory in which to store the JSON files downloaded through the Istat API. Storing the files on disk speeds up development and ensures consistent results over time. You can delete the files at any time to download a fresh copy.
In [2]:
cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))
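To illustrate why a disk cache gives fast and repeatable results, here is a minimal sketch of the general technique. It is not the implementation used by the istat module: the helper `cached_json`, the `fake_download` stand-in, and the file naming are all hypothetical and exist only for this example.

```python
import json
import os
import tempfile


def cached_json(name, cache_dir, download):
    """Return parsed JSON for `name`, downloading only on a cache miss.

    `download` is a hypothetical callable that returns raw JSON text;
    it stands in for whatever HTTP request a real client would make.
    """
    path = os.path.join(cache_dir, name + ".json")
    if os.path.exists(path):
        # cache hit: reuse the file saved by an earlier run
        with open(path) as f:
            return json.load(f)
    # cache miss: download, save to disk, then parse
    text = download()
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    with open(path, "w") as f:
        f.write(text)
    return json.loads(text)


calls = []  # records how many times the fake download runs


def fake_download():
    # invented payload, not real Istat output
    calls.append(1)
    return '{"areas": ["Prices"]}'


demo_dir = tempfile.mkdtemp()
first = cached_json("areas", demo_dir, fake_download)   # downloads and saves
second = cached_json("areas", demo_dir, fake_download)  # read back from disk
print(first == second, len(calls))
```

Deleting the cached file forces the next call to download again, which mirrors the "delete to refresh" behaviour described above.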
Using the istat API, we can show the Istat areas used to categorize the datasets.
In [3]:
istat.areas()
Out[3]:
The following code lists all datasets contained in the area Prices.
In [4]:
istat_area_prices = istat.area('Prices')
istat_area_prices.datasets()
Out[4]:
List all dimensions of the dataset DCSP_IPAB (House price index).
In [5]:
istat_dataset_dcsp_ipab = istat_area_prices.dataset('DCSP_IPAB')
istat_dataset_dcsp_ipab
Out[5]:
Finally, from the Istat dataset we extract data in JSON-stat format by specifying the dimensions we are interested in.
In [6]:
spec = {
"Territory": 1, "Index type": 18,
# "Measure": 0, # "Purchases of dwelling": 0, # "Time and frequency": 0
}
# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_dcsp_ipab.getvalues(spec)
collection
Out[6]:
The previous call is equivalent to calling the Istat API with the string of numbers "1,18,0,0,0". The mapping between the numbers and the dimensions is shown below:
Dimension | Code | Meaning |
---|---|---|
Territory | 1 | Italy |
Type | 18 | house price index (base 2010=100) - quarterly data |
Measure | 0 | ALL |
Purchase of dwelling | 0 | ALL |
Time and frequency | 0 | ALL |
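
The correspondence between the dictionary form and the string form can be sketched as follows. This is only an illustration under stated assumptions: the helper `spec_to_string` is hypothetical, the dimension order and names are taken from the table and the `spec` dictionary above, and the rule that an omitted dimension becomes 0 (ALL) is inferred from the equivalence described here rather than from the library source.

```python
# Dimension order assumed from the table above; names follow the `spec` dictionary.
DIMENSION_ORDER = [
    "Territory",
    "Index type",
    "Measure",
    "Purchases of dwelling",
    "Time and frequency",
]


def spec_to_string(spec, order=DIMENSION_ORDER):
    """Hypothetical helper: build the comma-separated code string.

    Dimensions missing from `spec` are assumed to default to 0 (ALL).
    """
    unknown = set(spec) - set(order)
    if unknown:
        raise KeyError("unknown dimensions: {}".format(sorted(unknown)))
    return ",".join(str(spec.get(name, 0)) for name in order)


print(spec_to_string({"Territory": 1, "Index type": 18}))
```

With these assumptions the dictionary `{"Territory": 1, "Index type": 18}` yields the string `1,18,0,0,0`.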
In [7]:
json_stat_data = istat_dataset_dcsp_ipab.getvalues("1,18,0,0,0")
json_stat_data
Out[7]:
Now that we have a JSON-stat collection, let's explore it with the jsonstat.py API.
Print some info about one of the datasets contained in the above collection.
In [8]:
jsonstat_dataset = collection.dataset('IDMISURA1*IDTYPPURCH*IDTIME')
jsonstat_dataset
Out[8]:
Print info about the dimensions to get an idea about the data
In [9]:
jsonstat_dataset.dimension('IDMISURA1')
Out[9]:
In [10]:
jsonstat_dataset.dimension('IDTYPPURCH')
Out[10]:
In [11]:
jsonstat_dataset.dimension('IDTIME')
Out[11]:
In [12]:
import pandas as pd
df = jsonstat_dataset.to_table(rtype=pd.DataFrame)
df.head()
Out[12]:
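To give an idea of what a conversion to a table involves, here is a minimal sketch in plain Python. In the JSON-stat format the values are stored as one flat list in row-major order, with the last dimension varying fastest. The `toy` dictionary below is invented and uses a simplified layout (real JSON-stat nests categories inside each dimension), and `to_rows` is a hypothetical helper. Whether jsonstat.py builds its table this way internally is an assumption.

```python
from itertools import product

# Toy data invented for illustration; it is not Istat output.
toy = {
    "id": ["measure", "time"],   # dimension order
    "size": [2, 3],              # number of categories per dimension
    "categories": {
        "measure": ["index", "change"],
        "time": ["2010-Q1", "2010-Q2", "2010-Q3"],
    },
    # flat list of 2 * 3 values, last dimension varying fastest
    "value": [100.0, 100.5, 101.2, None, 0.5, 0.7],
}


def to_rows(ds):
    """Expand the flat value list into one row per category combination."""
    labels = [ds["categories"][dim] for dim in ds["id"]]
    # product() also varies the last dimension fastest, so it lines up
    # with the row-major order of the value list
    return [combo + (value,) for combo, value in zip(product(*labels), ds["value"])]


rows = to_rows(toy)
for row in rows:
    print(row)
```

Each resulting row holds one category label per dimension followed by the value, which is the same shape as the DataFrame shown above.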
In [13]:
filtered = df.loc[
(df['Measure'] == 'index number') & (df['Purchases of dwellings'] == 'H1 - all items'),
['Time and frequency', 'Value']
]
filtered.set_index('Time and frequency')
Out[13]:
In [14]:
%matplotlib inline
import matplotlib.pyplot as plt
values = filtered['Value'].tolist()
labels = filtered['Time and frequency'].tolist()
# one bar per period, placed at integer x positions
xs = list(range(len(values)))
plt.figure(figsize=(15, 4))
# align='center' is passed explicitly because the default bar alignment
# differs between matplotlib versions; each bar is centered on its x position
plt.bar(xs, values, width=0.8, align='center')
plt.ylabel("value")
plt.title("house index")
# label the x-axis with the time periods at the bar centers
plt.xticks(xs, labels, rotation='vertical')
plt.show()