This experiment takes a dataset on divorces by duration of marriage (link: https://www.data.gv.at/katalog/dataset/7fa00c8b-6189-42b8-af93-cc1ebff0a818) and plots the number of divorces per year from 1985 to 2014 for marriages that lasted at least ten but less than eleven years. The experiment consists of three steps: loading the data from MongoDB, transforming it, and plotting it.
In [1]:
import json
import re

import pandas as pd
from pymongo import MongoClient

# Render pandas/matplotlib plots inside the notebook
%matplotlib inline
In [2]:
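# Connect to the MongoDB server at host 'mongodb' and select the divorce collection of the dp database
client = MongoClient('mongodb')
db = client.dp
collection = db.divorce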
Perform the following steps for the transformation:
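- Take the divorce count from the NUMBER field of the values object, store it as DIVORCES, and drop the values object.
- Drop the NUTS1 and NUTS2 fields.
- The DURATION field contains a string of the form "x to under y years". Parse the first value x and store it as DURATION.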
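The DURATION parsing rule can be sketched as a small standalone function that is easy to check on individual labels. This is a minimal sketch under assumptions: the name `duration_lower_bound` is hypothetical, `"under 1 year"` is an assumed example of a label with a single number (only the `"x to under y years"` form is given above), and raising on a label without any number is a design choice of this sketch.

```python
import re


def duration_lower_bound(label: str) -> int:
    """Return the lower bound x of a label like '10 to under 11 years'.

    Assumption: a label with only one number (e.g. a hypothetical
    'under 1 year') is the first interval and maps to 0, mirroring the
    notebook's handling of single-number labels.
    """
    numbers = re.findall(r"\d+", label)
    if not numbers:
        raise ValueError(f"no number in duration label: {label!r}")
    if len(numbers) == 1:
        return 0
    return int(numbers[0])


print(duration_lower_bound("10 to under 11 years"))  # → 10
print(duration_lower_bound("under 1 year"))          # → 0
```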
In [3]:
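# The first document of the collection holds all records in its 'data' list
data = collection.find_one()['data']
for entry in data:
    # Flatten the values object into a DIVORCES field
    entry['DIVORCES'] = entry['values'][0]['NUMBER']
    del entry['values']
    # Drop the regional fields, which are not used here
    del entry['NUTS1']
    del entry['NUTS2']
    # Keep the lower bound x of "x to under y years" as an integer;
    # labels with a single number (presumably "under 1 year") map to 0
    numbers = re.findall(r'\d+', entry['DURATION'])
    entry['DURATION'] = 0 if len(numbers) == 1 else int(numbers[0])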
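The same transformation can also be written as a function that returns a new record instead of modifying `data` in place, which makes it easy to test on a single entry. A sketch under assumptions: `flatten_entry` is a hypothetical name, and the sample entry is invented to match the field names the loop uses (`values`, `NUMBER`, `DURATION`, `NUTS1`, `NUTS2`, `REF_YEAR`); its values are not taken from the dataset.

```python
import re


def flatten_entry(entry: dict) -> dict:
    """Build a flat record from one raw entry without mutating it.

    Assumes the layout the notebook's loop relies on: a 'values' list whose
    first element holds 'NUMBER', a 'DURATION' label, and NUTS1/NUTS2 fields
    that are dropped.
    """
    numbers = re.findall(r"\d+", entry["DURATION"])
    # Copy every field except the ones the loop deletes
    record = {k: v for k, v in entry.items() if k not in ("values", "NUTS1", "NUTS2")}
    record["DIVORCES"] = entry["values"][0]["NUMBER"]
    # Single-number labels map to 0, as in the loop
    record["DURATION"] = 0 if len(numbers) == 1 else int(numbers[0])
    return record


# Hypothetical sample entry; real values come from the MongoDB document
raw = {
    "REF_YEAR": 1985,
    "NUTS1": "AT1",
    "NUTS2": "AT11",
    "DURATION": "10 to under 11 years",
    "values": [{"NUMBER": 123}],
}
print(flatten_entry(raw))
# → {'REF_YEAR': 1985, 'DURATION': 10, 'DIVORCES': 123}
```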
Transform to JSON for pandas import:
In [4]:
data_json = json.dumps(data)
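The JSON round trip is one way to hand the records to pandas; a list of flat dicts can also be passed to `pd.DataFrame` directly. The sketch below, using hypothetical records with invented numbers, builds the frame both ways and checks that they agree. Wrapping the JSON string in `io.StringIO` matters because recent pandas versions deprecate passing a literal JSON string to `read_json`.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical flattened records, shaped like the output of the transformation step
records = [
    {"REF_YEAR": 1985, "DURATION": 10, "DIVORCES": 123},
    {"REF_YEAR": 1985, "DURATION": 0, "DIVORCES": 45},
]

# Route used in the notebook: serialize to JSON, then parse it with pandas
via_json = pd.read_json(StringIO(json.dumps(records)))

# Direct route: each dict becomes one DataFrame row
direct = pd.DataFrame(records)

# Same columns and values, ignoring column order
pd.testing.assert_frame_equal(via_json, direct, check_like=True)
print(direct.dtypes)
```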
In [5]:
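from io import StringIO

# Recent pandas versions expect a file-like object rather than a literal JSON string
df = pd.read_json(StringIO(data_json))
# Keep marriages of 10 to under 11 years and the two columns to plot
filtered = df[df.DURATION == 10].filter(items=['DIVORCES', 'REF_YEAR'])
filtered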
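An equivalent selection can be written with a boolean mask and `.loc`, which filters rows and picks columns in one step. Sorting by `REF_YEAR` is an extra step not present in the original cell; it keeps the bars in chronological order regardless of the order of the source records. The frame below is a hypothetical stand-in with invented numbers.

```python
import pandas as pd

# Hypothetical frame with the same columns as df; the numbers are invented
df = pd.DataFrame({
    "REF_YEAR": [1986, 1985, 1985, 1986],
    "DURATION": [10, 10, 0, 0],
    "DIVORCES": [130, 123, 45, 50],
})

# Rows for marriages of 10 to under 11 years, two columns, ordered by year
filtered = (
    df.loc[df["DURATION"] == 10, ["REF_YEAR", "DIVORCES"]]
    .sort_values("REF_YEAR")
    .reset_index(drop=True)
)
print(filtered)
```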
In [6]:
filtered.plot.bar(x='REF_YEAR', y='DIVORCES')
The resulting bar chart shows the number of divorces per year between 1985 and 2014 for all marriages that lasted at least ten but less than eleven years.
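The chart can be given axis labels and a title so that it reads on its own outside the notebook. A minimal sketch, assuming matplotlib is the pandas plotting backend (the default); the frame is a hypothetical stand-in, and the label texts are suggestions rather than part of the original notebook. `legend=False` drops the single-entry `DIVORCES` legend that `plot.bar` draws by default.

```python
import pandas as pd

# Hypothetical filtered data; the real values come from the notebook's `filtered` frame
filtered = pd.DataFrame({"REF_YEAR": [1985, 1986, 1987], "DIVORCES": [123, 130, 128]})

# One bar per year; plot.bar returns the matplotlib Axes it drew on
ax = filtered.plot.bar(x="REF_YEAR", y="DIVORCES", legend=False)
ax.set_xlabel("Year")
ax.set_ylabel("Divorces")
ax.set_title("Divorces after 10 to under 11 years of marriage")
```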