Author: Pascal, pascal@bayesimpact.org

ROME-FAP Mapping

The run test is skipped for this notebook because the ROME version has to be updated before it can work in the exported repository. TODO: Update ROME and remove the skiptest flag.

There are two distinct classifications of French jobs: ROME (Répertoire Opérationnel des Métiers et des Emplois) and FAP (FAmilles Professionnelles). This notebook tries to connect them reliably.

The mapping is explained in detail on the FAP website (see above). However, some ROME groups do not map to a single FAP: different jobs within the same ROME group might belong to different FAPs.

This notebook prepares a spreadsheet that can be used to quickly select, by hand, the correct FAP for the ambiguous cases.
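To make the ambiguity concrete, here is a minimal sketch on toy data. The A1204 and A1301 rows reuse codes that appear in the outputs of this notebook; the Z9999 / Z9Z99 pair is made up to stand for an unambiguous ROME group, and the variable names are only illustrative.

```python
import pandas

# Toy (ROME, FAP) pairs. Z9999 -> Z9Z99 is a made-up, unambiguous example.
toy_mapping = pandas.DataFrame({
    'rome': ['A1204', 'A1204', 'A1204', 'A1301', 'A1301', 'Z9999'],
    'fap': ['A0Z42', 'G1Z70', 'H0Z91', 'A2Z70', 'A2Z90', 'Z9Z99'],
})

# Count the distinct FAP codes attached to each ROME group.
fap_count = toy_mapping.groupby('rome')['fap'].nunique()

# A ROME group is ambiguous when it maps to more than one FAP.
ambiguous_romes = sorted(fap_count[fap_count > 1].index)
print(ambiguous_romes)
```

Only the ROME groups listed in ambiguous_romes would need a manual decision; the others can take their single FAP directly.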


In [1]:
from os import path
import codecs
import os
import pandas
from bob_emploi.data_analysis.lib import read_data

data_folder = os.getenv('DATA_FOLDER')
appellations = pandas.read_csv(path.join(data_folder, 'rome/csv/unix_referentiel_appellation_v330_utf8.csv'))
rome_names = pandas.read_csv(path.join(data_folder, 'rome/csv/unix_referentiel_code_rome_v330_utf8.csv'))
fap_names = read_data.parse_intitule_fap(path.join(data_folder, 'intitule_fap2009.txt'))
with codecs.open(path.join(data_folder, 'crosswalks/passage_fap2009_romev3.txt'), 'r', 'latin-1') as fap_file:
    fap_romeq_mapping = read_data.parse_fap_rome_crosswalk(fap_file.readlines())
# parse_fap_rome_crosswalk actually returns qualified ROME codes, so rename the column accordingly.
fap_romeq_mapping = fap_romeq_mapping.rename(columns={'rome': 'romeQ'})

In [2]:
fap_romeq_mapping['rome'] = fap_romeq_mapping['romeQ'].apply(lambda s: s[:5])
fap_mapping = fap_romeq_mapping.groupby(['rome','fap'], as_index=False).first()
del fap_mapping['romeQ']
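
The cell above truncates each qualified ROME code to its first five characters, then keeps one row per (ROME, FAP) pair. As a sketch of what that does, the toy example below uses made-up qualifier suffixes (the real format of the qualified codes in the crosswalk file is not shown in this notebook) and compares the groupby/first approach with a plain drop_duplicates.

```python
import pandas

# Toy crosswalk rows; the 'Q1', 'Q2', 'Q3' suffixes are hypothetical.
toy = pandas.DataFrame({
    'fap': ['A2Z70', 'A2Z70', 'A2Z90'],
    'romeQ': ['A1301Q1', 'A1301Q2', 'A1301Q3'],
})
# Keep only the 5-character ROME group code.
toy['rome'] = toy['romeQ'].str[:5]

# Same approach as the notebook: one row per (rome, fap) pair.
via_groupby = toy.groupby(['rome', 'fap'], as_index=False).first()
del via_groupby['romeQ']

# Alternative: drop duplicated (rome, fap) pairs directly.
via_dedup = (
    toy[['rome', 'fap']]
    .drop_duplicates()
    .sort_values(['rome', 'fap'])
    .reset_index(drop=True))

print(via_groupby)
print(via_dedup)
```

On this toy input both frames hold the same two pairs, which suggests the shorter drop_duplicates form could replace the groupby if the romeQ column is not needed afterwards.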

In [3]:
flatten_mapping = fap_mapping.groupby('rome', as_index=False).agg({'fap': lambda x: sorted(x.tolist())})
flatten_mapping['fap1'] = flatten_mapping['fap'].apply(lambda x: x[0])
flatten_mapping['fap2'] = flatten_mapping['fap'].apply(lambda x: x[1] if len(x) > 1 else '')
flatten_mapping['fap3'] = flatten_mapping['fap'].apply(lambda x: x[2] if len(x) > 2 else '')
print('There is maximum {:d} FAP codes per ROME.'.format(flatten_mapping['fap'].str.len().max()))
del flatten_mapping['fap']
flatten_mapping.head()

# Keep only the ambiguous ROME codes, i.e. those with at least two FAP codes.
ambiguous_romes_fap_mapping = flatten_mapping[flatten_mapping['fap2'] != '']
ambiguous_romes_fap_mapping.head()


There is maximum 3 FAP codes per ROME.
Out[3]:
    rome   fap1   fap2   fap3
4   A1204  A0Z42  G1Z70  H0Z91
6   A1301  A2Z70  A2Z90
7   A1302  A2Z70  A2Z90
8   A1303  A2Z70  A2Z90
20  A1412  E0Z21  E1Z42
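
The fap1, fap2 and fap3 columns above are built one by one. An alternative, sketched here on toy data, ranks the FAP codes within each ROME group and pivots the ranks into columns. This is not what the notebook does; it is only a possible variant, and the Z9999 / Z9Z99 pair is a made-up unambiguous example.

```python
import pandas

# Toy (ROME, FAP) pairs, deliberately unsorted. Z9999 -> Z9Z99 is made up.
fap_mapping = pandas.DataFrame({
    'rome': ['A1204', 'A1204', 'A1204', 'A1301', 'A1301', 'Z9999'],
    'fap': ['H0Z91', 'A0Z42', 'G1Z70', 'A2Z90', 'A2Z70', 'Z9Z99'],
})

# Rank the FAP codes alphabetically within each ROME group: 1, 2, 3...
ordered = fap_mapping.sort_values(['rome', 'fap'])
ordered['rank'] = ordered.groupby('rome').cumcount() + 1

# One row per ROME group, one column per rank.
wide = ordered.pivot(index='rome', columns='rank', values='fap')
wide.columns = [f'fap{rank}' for rank in wide.columns]
wide = wide.fillna('').reset_index()

# Keep only the ambiguous ROME groups.
ambiguous = wide[wide['fap2'] != '']
print(ambiguous)
```

The number of fap columns then follows the data instead of being fixed at three, which may help if a later crosswalk version has ROME groups with more FAP codes.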

In [4]:
named_mapping = ambiguous_romes_fap_mapping
named_mapping = pandas.merge(
    named_mapping, fap_names,
    left_on='fap1', right_on='fap_code', how='left')
named_mapping.rename(columns={'fap_code': 'fap_code_1', 'fap_name': 'fap_name_1'}, inplace=True)
named_mapping = pandas.merge(
    named_mapping, fap_names,
    left_on='fap2', right_on='fap_code', how='left')
named_mapping.rename(columns={'fap_code': 'fap_code_2', 'fap_name': 'fap_name_2'}, inplace=True)
named_mapping = pandas.merge(
    named_mapping, fap_names,
    left_on='fap3', right_on='fap_code', how='left')
named_mapping.rename(columns={'fap_code': 'fap_code_3', 'fap_name': 'fap_name_3'}, inplace=True)

rome_name_clean = rome_names.rename(columns={'code_rome': 'rome', 'libelle_rome': 'rome_name', 'code_ogr': 'code_ogr_rome'})

named_mapping = pandas.merge(named_mapping, rome_name_clean, on='rome').fillna('')
named_mapping.head(2).transpose()


Out[4]:
               0                                                  1
rome           A1204                                              A1301
fap1           A0Z42                                              A2Z70
fap2           G1Z70                                              A2Z90
fap3           H0Z91
fap_code_1     A0Z42                                              A2Z70
fap_name_1     Bûcherons, sylviculteurs salariés et agents fo...  Techniciens et agents d'encadrement d'exploita...
fap_code_2     G1Z70                                              A2Z90
fap_name_2     Techniciens et agents de maîtrise de la mainte...  Ingénieurs, cadres techniques de l'agriculture
fap_code_3     H0Z91
fap_name_3     Cadres techniques de la maintenance et de l'en...
code_fiche_em  13                                                 27
code_ogr_rome  10                                                 12
rome_name      Protection du patrimoine naturel                   Conseil et assistance technique en agriculture
statut         1                                                  1
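
The three merges in cell 4 follow the same pattern, so they could also be written as a loop. The sketch below shows the idea on toy data; the FAP names are placeholders rather than the real labels, and the variable names are only illustrative.

```python
import pandas

# Toy ambiguous mapping, using codes shown in Out[3].
ambiguous = pandas.DataFrame({
    'rome': ['A1204', 'A1301'],
    'fap1': ['A0Z42', 'A2Z70'],
    'fap2': ['G1Z70', 'A2Z90'],
    'fap3': ['H0Z91', ''],
})

# Placeholder FAP names (not the real labels).
fap_codes = ['A0Z42', 'A2Z70', 'A2Z90', 'G1Z70', 'H0Z91']
fap_names = pandas.DataFrame({
    'fap_code': fap_codes,
    'fap_name': ['placeholder name for ' + code for code in fap_codes],
})

named = ambiguous
for rank in (1, 2, 3):
    # Rename before merging so that each pass adds its own pair of columns.
    suffixed = fap_names.rename(columns={
        'fap_code': f'fap_code_{rank}',
        'fap_name': f'fap_name_{rank}',
    })
    named = named.merge(
        suffixed, left_on=f'fap{rank}', right_on=f'fap_code_{rank}', how='left')

# ROME groups with fewer than three FAP codes get empty cells.
named = named.fillna('')
print(named.transpose())
```

Renaming the columns of fap_names before each merge avoids the in-place rename after the merge, and the left join keeps ROME groups whose fap3 is empty.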

In [5]:
appellations_clean = appellations.rename(columns={'code_rome': 'rome'})

to_resolve = pandas.DataFrame(
    pandas.merge(appellations_clean, named_mapping, on='rome'),
    columns=[
        'rome', 'rome_name',
        'code_ogr', 'libelle_appellation_court',
        'fap_code_1', 'fap_name_1',
        'fap_code_2', 'fap_name_2',
        'fap_code_3', 'fap_name_3',
    ]).fillna('')

to_resolve.to_csv(path.join(data_folder, 'ambiguous_rome_fap.csv'))

to_resolve.head()


Out[5]:
rome rome_name code_ogr libelle_appellation_court fap_code_1 fap_name_1 fap_code_2 fap_name_2 fap_code_3 fap_name_3
0 F1402 Extraction solide 10200 Abatteur / Abatteuse de carrière B0Z20 Ouvriers non qualifiés des travaux publics, du... B1Z40 Ouvriers qualifiés des travaux publics, du bét...
1 F1402 Extraction solide 11149 Artificier / Artificière de carrière B0Z20 Ouvriers non qualifiés des travaux publics, du... B1Z40 Ouvriers qualifiés des travaux publics, du bét...
2 F1402 Extraction solide 11581 Boutefeu B0Z20 Ouvriers non qualifiés des travaux publics, du... B1Z40 Ouvriers qualifiés des travaux publics, du bét...
3 F1402 Extraction solide 11694 Carrier / Carrière B0Z20 Ouvriers non qualifiés des travaux publics, du... B1Z40 Ouvriers qualifiés des travaux publics, du bét...
4 F1402 Extraction solide 11695 Carrier ébaucheur / Carrière ébaucheuse B0Z20 Ouvriers non qualifiés des travaux publics, du... B1Z40 Ouvriers qualifiés des travaux publics, du bét...

You can find the ambiguous ROME-FAP mapping file in /data_analysis/notebooks/data/ambiguous_rome_fap.csv or in a Google Spreadsheet.