Notebook: using jsonstat.py to explore ISTAT data (house price index)

This Jupyter notebook shows how to use jsonstat.py python library to explore Istat data. Istat is Italian National Institute of Statistics. It publishs a rest api for querying italian statistics.

We starts importing some modules.


In [1]:
from __future__ import print_function
import os
import istat
from IPython.core.display import HTML

Step 1: using istat module to get a jsonstat collection

Following code sets a cache dir where to store json files download by Istat api. Storing file on disk speed up development, and assures consistent results over time. Anyway you can delete file to donwload a fresh copy.


In [2]:
cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))


cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'

Using istat api, we can shows the istat areas used to categorize the datasets


In [3]:
istat.areas()


Out[3]:
iddesc
32011 Population and housing census
4Enterprises
7Environment and Energy
8Population and Households
9Households Economic Conditions and Disparities
10Health statistics
11Social Security and Welfare
12Education and training
13Communication, culture and leisure
14Justice and Security
15Citizens' opinions and satisfaction with life
16Social participation
17National Accounts
19Agriculture
20Industry and Construction
21Services
22Public Administrations and Private Institutions
24External Trade and Internationalisation
25Prices
26Labour

Following code list all datasets contained into area Prices.


In [4]:
istat_area_prices = istat.area('Prices')
istat_area_prices.datasets()


Out[4]:
codnamedim
DCSC_FABBRESID_1Construction costs index - monthly data5
DCSC_PREZPRODSERV_1Services producer prices index5
DCSC_PREZZPIND_1Producer price index for industrial products - monthly data6
DCSP_FOI1FOI – Monthly data until 20105
DCSP_FOI1B2010FOI - Monthly data from 2011 to 20155
DCSP_FOI1B2015FOI - Monthly data from 2016 onwards5
DCSP_FOI2FOI – Annual average until 20105
DCSP_FOI2B2010FOI – Annual average from 2011 onwards5
DCSP_FOI2B2015FOI - Annual average from 2016 onwards5
DCSP_FOI3FOI – Weights until 20104
DCSP_FOI3B2010FOI - Weights from 2011 to 20154
DCSP_FOI3B2015FOI - Weights from 2016 onwards4
DCSP_IPABHouse price index 5
DCSP_IPCA1HICP - Monthly data from 2001 to 2015 (base 2005=100)5
DCSP_IPCA1B2015HICP - Monthly data from 2001 onwards (base 2015=100)5
DCSP_IPCA2HICP - Annual average from 2001 to 2015 (base 2005=100)5
DCSP_IPCA2B2015HICP - Annual average from 2001 onwards (base 2015=100)5
DCSP_IPCA3HICP – Weights from 2001 onwards4
DCSP_IPCATC1HICP at constant tax rates - Monthly data from 2002 to 2015 (base 2005=100)5
DCSP_IPCATC1B2015HICP at constant tax rates - Monthly data from 2002 onwards (base 2015=100)5
DCSP_IPCATC2HICP at constant tax rates - Annual average from 2002 to 2015 (base 2005=100)5
DCSP_IPCATC2B2015HICP at constant tax rates - Annual average from 2002 onwards (base 2015=100)5
DCSP_NIC1B2015NIC - Monthly data from 2016 onwards5
DCSP_NIC3B2015NIC - Weights from 2016 onwards4
DCSP_NICDUENIC – Annual average until 20105
DCSP_NICDUEB2010NIC – Annual average from 2011 onwards5
DCSP_NICTRENIC – Weights until 20104
DCSP_NICTREB2010NIC - Weights from 2011 to 20154
DCSP_NICUNOBNIC – Monthly data until 20105
DCSP_NICUNOBB2010NIC - Monthly data from 2011 to 20155

List all dimension for dataset DCSP_IPAB (House price index)


In [5]:
istat_dataset_dcsp_ipab = istat_area_prices.dataset('DCSP_IPAB')
istat_dataset_dcsp_ipab


Out[5]:
DCSP_IPAB(5):House price index
nrnamenr. valuesvalues (first 3 values)
0Territory11:'Italy'
1Index type318:'house price index (base 2010=100) - quarterly data', 19:'house price index (base 2010=100) - annual average', 20:'house price index (base 2010=100) - weights' ...
2Measure58:'annual average rate of change', 4:'index number', 22:'not applicable' ...
3Purchases of dwellings34:'H1 - all items', 5:'H11 - new dwellings', 6:'H12 - existing dwellings' ...
4Time and frequency292112:'Q1-2011', 2178:'Q3-2014', 2116:'Q2-2011' ...

Finally from istat dataset we extracts data in jsonstat format by specifying dimensions we are interested.


In [6]:
spec = { 
    "Territory": 1, "Index type": 18, 
    # "Measure": 0, # "Purchases of dwelling": 0, # "Time and frequency": 0
}
# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_dcsp_ipab.getvalues(spec)
collection


Out[6]:
JsonstatCollection contains the following JsonStatDataSet:
posdataset
0'IDMISURA1*IDTYPPURCH*IDTIME'

The previous call is equivalent to call istat api with a "1,18,0,0,0" string of number. Below is the mapping from the number and dimensions:

dimension
Territory 1 Italy
Type 18 house price index (base 2010=100) - quarterly data'
Measure 0 ALL
Purchase of dwelling 0 ALL
Time and frequency 0 ALL

In [7]:
json_stat_data = istat_dataset_dcsp_ipab.getvalues("1,18,0,0,0")
json_stat_data


Out[7]:
JsonstatCollection contains the following JsonStatDataSet:
posdataset
0'IDMISURA1*IDTYPPURCH*IDTIME'

step2: using jsonstat.py api.

Now we have a jsonstat collection, let expore it with the api of jsonstat.py

Print some info of one dataset contained into the above jsonstat collection


In [8]:
jsonstat_dataset = collection.dataset('IDMISURA1*IDTYPPURCH*IDTIME')
jsonstat_dataset


Out[8]:
name: 'IDMISURA1*IDTYPPURCH*IDTIME'label: 'House price index by Measure, Purchases of dwellings and Time and frequency - Italy - house price index (base 2010=100) - quarterly data'size: 207
posidlabelsizerole
0IDMISURA1Measure3
1IDTYPPURCHPurchases of dwellings3
2IDTIMETime and frequency23

Print info about the dimensions to get an idea about the data


In [9]:
jsonstat_dataset.dimension('IDMISURA1')


Out[9]:
posidxlabel
0'4''index number'
1'6''percentage changes on the previous period'
2'7''percentage changes on the same period of the previous year'

In [10]:
jsonstat_dataset.dimension('IDTYPPURCH')


Out[10]:
posidxlabel
0'4''H1 - all items'
1'5''H11 - new dwellings'
2'6''H12 - existing dwellings'

In [11]:
jsonstat_dataset.dimension('IDTIME')


Out[11]:
posidxlabel
0'2093''Q1-2010'
1'2097''Q2-2010'
2'2102''Q3-2010'
3'2106''Q4-2010'
.........

In [12]:
import pandas as pd
df = jsonstat_dataset.to_table(rtype=pd.DataFrame)
df.head()


Out[12]:
Measure Purchases of dwellings Time and frequency Value
0 index number H1 - all items Q1-2010 99.5
1 index number H1 - all items Q2-2010 100.0
2 index number H1 - all items Q3-2010 100.3
3 index number H1 - all items Q4-2010 100.2
4 index number H1 - all items Q1-2011 100.1

In [13]:
filtered = df.loc[
    (df['Measure'] == 'index number') & (df['Purchases of dwellings'] == 'H1 - all items'), 
    ['Time and frequency', 'Value']
]
filtered.set_index('Time and frequency')


Out[13]:
Value
Time and frequency
Q1-2010 99.5
Q2-2010 100.0
Q3-2010 100.3
Q4-2010 100.2
Q1-2011 100.1
Q2-2011 101.2
Q3-2011 101.2
Q4-2011 100.5
Q1-2012 99.9
Q2-2012 99.1
Q3-2012 97.4
Q4-2012 95.3
Q1-2013 93.9
Q2-2013 93.3
Q3-2013 91.9
Q4-2013 90.2
Q1-2014 89.3
Q2-2014 88.7
Q3-2014 88.3
Q4-2014 86.9
Q1-2015 86.1
Q2-2015 86.1
Q3-2015 86.3

In [14]:
%matplotlib inline
import matplotlib.pyplot as plt

values = filtered['Value'].tolist()
labels = filtered['Time and frequency']

xs = [i + 0.1 for i, _ in enumerate(values)]
# bars are by default width 0.8, so we'll add 0.1 to the left coordinates 
# so that each bar is centered

# plot bars with left x-coordinates [xs], heights [num_oscars]
plt.figure(figsize=(15,4))
plt.bar(xs, values)
plt.ylabel("value")
plt.title("house index")

# label x-axis with movie names at bar centers
plt.xticks([i + 0.5 for i, _ in enumerate(labels)], labels, rotation='vertical') 
plt.show()