Author: Stephan, stephan.gabler@bayesimpact.org
This data is available on the Pôle emploi website in several formats; so far we only use the XML versions. At a high level, the dataset describes jobs together with their skills, activities, work environments and requirements. Jobs, activities and skills are organized in hierarchical structures, and the jobs themselves are classified according to two different systems. Jobs, skills and activities each have a unique identifier.
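To make the role of the unique identifiers concrete, here is a minimal sketch of how such records could be indexed. Everything in it is an assumption for illustration: the field names (`id`, `name`), the identifier values and the helper `index_by_id` are hypothetical and do not come from the XML files or from `read_data`.

```python
# Hypothetical, simplified records. The real field names and identifier
# formats are defined by the XML files and may differ.
job_groups = [
    {
        'id': 'JG-1',
        'titles': ['title a', 'title b'],
        'skills': [{'id': 'S-1', 'name': 'skill one'}],
        'activities': [
            {'id': 'A-1', 'name': 'activity one'},
            {'id': 'A-2', 'name': 'activity two'},
        ],
    },
]


def index_by_id(items):
    """Build an id -> item lookup and fail loudly on duplicate identifiers."""
    index = {}
    for item in items:
        if item['id'] in index:
            raise ValueError('duplicate identifier: %s' % item['id'])
        index[item['id']] = item
    return index


# Because identifiers are unique, each entity can be looked up directly.
job_groups_by_id = index_by_id(job_groups)
skills_by_id = index_by_id(
    skill for group in job_groups for skill in group['skills'])
```

Indexing by identifier also doubles as a sanity check: the helper raises as soon as two records share an identifier.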
In [1]:
from __future__ import division
import os
import matplotlib.pyplot as plt
import seaborn as sns
from bob_emploi.data_analysis.lib import read_data
data_folder = os.getenv('DATA_FOLDER')
In [2]:
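# Load the job descriptions from the XML files as dictionaries.
fiche_dicts = read_data.load_fiches_from_xml(os.path.join(data_folder, 'rome/ficheMetierXml'))
# Extract the fields analyzed below (skills, activities, titles) from each one.
rome = [read_data.fiche_extractor(f) for f in fiche_dicts]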
In [3]:
n_skills = [len(x['skills']) for x in rome]
ax = sns.distplot(n_skills, kde=False)
ax.set_title('number of skills per job_group');
In [4]:
n_activities = [len(x['activities']) for x in rome]
ax = sns.distplot(n_activities, kde=False)
ax.set_title('number of activities per job_group');
In [5]:
n_titles = [len(x['titles']) for x in rome]
ax = sns.distplot(n_titles, kde=False)
ax.set_title('number of job titles per job_group');
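The three histograms above all follow the same pattern: count the entries of one list-valued field per job group. As a complement to the plots, the sketch below computes a few summary statistics for such counts. It is an illustration only: the helpers `count_per_job_group` and `summarize` are hypothetical, and `toy_rome` is made-up stand-in data that merely mimics the `skills`, `activities` and `titles` keys used above.

```python
import statistics


def count_per_job_group(records, field):
    """Return the number of entries in the list-valued field of each record."""
    return [len(record[field]) for record in records]


def summarize(counts):
    """Return min, median, max and mean of a non-empty list of counts."""
    if not counts:
        raise ValueError('cannot summarize an empty list of counts')
    return {
        'min': min(counts),
        'median': statistics.median(counts),
        'max': max(counts),
        'mean': statistics.mean(counts),
    }


# Made-up stand-in for the `rome` list; not real data.
toy_rome = [
    {'skills': ['s1', 's2'], 'activities': ['a1'], 'titles': ['t1', 't2', 't3']},
    {'skills': ['s1'], 'activities': ['a1', 'a2'], 'titles': ['t1']},
    {'skills': ['s1', 's2', 's3'], 'activities': [], 'titles': ['t1', 't2']},
]

for field in ('skills', 'activities', 'titles'):
    print(field, summarize(count_per_job_group(toy_rome, field)))
```

With the real data, passing `rome` instead of `toy_rome` should give the numbers behind each histogram, assuming every record carries all three keys.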