Author: Pascal, pascal@bayesimpact.org
Updated: 2018-06-14
BMO stands for "Besoin en Main d'Oeuvre" (workforce needs): a yearly survey, run by calling employers all over France, that measures how many hires they plan. See the official website for more details.
In Bob Emploi we want to display BMO data that is useful to the user. One way to provide context is to show the values over several years. Before doing that, we want to make sure that no user would be surprised by the data for their specific case.
In [1]:
import glob
import os
from os import path
import re

import pandas
import seaborn as sns

sns.set()

# Load the yearly BMO files (bmo/bmo_<year>…) into a dict keyed by year.
bmo_file_names = glob.glob(path.join(os.getenv('DATA_FOLDER'), 'bmo/bmo_*'))
bmo_df_dict = {}
for bmo_file_name in sorted(bmo_file_names):
    df = pandas.read_csv(bmo_file_name, dtype={'DEPARTEMENT_CODE': str})
    # The year is the only number in the file name.
    year = int(re.search(r'\d+', bmo_file_name).group())
    df['year'] = year
    bmo_df_dict[year] = df
sorted(bmo_df_dict.keys())
Out[1]:
Check that all the CSV files have the same structure.
In [2]:
def assert_all_the_same(dataframes, select_data_func, name='Values'):
    """Print the diff of select_data_func's result between the first key and each later one."""
    first = None
    for key, df in sorted(dataframes.items(), key=lambda kv: kv[0]):
        if first is None:
            first = select_data_func(df)
            continue
        other = select_data_func(df)
        if first - other:
            print('{} removed in {}:\n{}'.format(name, key, sorted(first - other)))
        if other - first:
            print('{} added in {}:\n{}'.format(name, key, sorted(other - first)))


assert_all_the_same(bmo_df_dict, lambda df: set(df.columns), name='Columns')
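As a side note, to make the helper's output format concrete, here is a toy illustration (hypothetical frames, not BMO data):
In [ ]:
# Toy illustration: relative to the first (smallest) key, the helper
# reports column 'b' as removed in 2001 and column 'c' as added.
toy = {
    2000: pandas.DataFrame(columns=['a', 'b']),
    2001: pandas.DataFrame(columns=['a', 'c']),
}
assert_all_the_same(toy, lambda df: set(df.columns), name='Columns')
Back to the real check above on the BMO files: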
Hmm, this is fishy: why would the BMO be organized by ROME? Let's check the values:
In [3]:
bmo_df_dict[2014].ROME_PROFESSION_CARD_CODE.unique()
Out[3]:
OK, those are actually FAP codes, so we only need to rename some columns.
In [4]:
for df in bmo_df_dict.values():
    df.rename(columns={
        'ROME_PROFESSION_CARD_CODE': 'FAP_CODE',
        'ROME_PROFESSION_CARD_NAME': 'FAP_NAME',
    }, inplace=True)

assert_all_the_same(bmo_df_dict, lambda df: set(df.columns), name='Columns')
Let's check the column types:
In [5]:
for column in bmo_df_dict[2017].columns:
    assert_all_the_same(
        bmo_df_dict,
        lambda df: {df.dtypes[column]},
        name='Column Type for {}'.format(column))
Ouch, those numbers should all be floats, not objects. Let's check some values:
In [6]:
bmo_df_dict[2014].NB_RECRUT_PROJECTS.head()
Out[6]:
In [7]:
# Numbers use the French format: comma as decimal separator, space as
# thousands separator. The '-' and '*' placeholder values are treated as 0.
for field in ('NB_RECRUT_PROJECTS', 'NB_DIFF_RECRUT_PROJECTS', 'NB_SEASON_RECRUT_PROJECTS'):
    for bmo_df in bmo_df_dict.values():
        bmo_df[field] = bmo_df[field].astype(str)\
            .str.replace(',', '.')\
            .str.replace(' ', '')\
            .replace('-', '0')\
            .replace('*', '0')\
            .astype(float)
bmo_df_dict[2014].NB_RECRUT_PROJECTS.head()
Out[7]:
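Just to be safe, here is a minimal sanity-check sketch verifying that the conversion did not silently produce NaNs or negative values:
In [ ]:
# After the cleanup, every value should be a non-null, non-negative float;
# a NaN here would mean a placeholder we missed.
for field in ('NB_RECRUT_PROJECTS', 'NB_DIFF_RECRUT_PROJECTS', 'NB_SEASON_RECRUT_PROJECTS'):
    for bmo_df in bmo_df_dict.values():
        assert bmo_df[field].notnull().all(), field
        assert (bmo_df[field] >= 0).all(), field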
After this cleanup, let's check again whether the column types are consistent.
In [8]:
for column in bmo_df_dict[2017].columns:
    assert_all_the_same(
        bmo_df_dict,
        lambda df: {df.dtypes[column]},
        name='Column Type for {}'.format(column))
Cool, now column types are the same for each year.
Let's check whether we have the same values in key fields, and do some spot checks that were known to fail in previous versions:
In [9]:
assert_all_the_same(bmo_df_dict, lambda df: set(df.FAP_CODE.tolist()), name='FAP Codes')
assert_all_the_same(bmo_df_dict, lambda df: set(df.DEPARTEMENT_CODE.tolist()), name='Departement Codes')

fap_codes = set(bmo_df_dict[2016].FAP_CODE.tolist())
if 'A0Z40' not in fap_codes:
    print('FAP A0Z40 is missing.')

departement_codes = set(bmo_df_dict[2016].DEPARTEMENT_CODE.tolist())
if '2A' not in departement_codes:
    print('Département 2A is missing.')
if '01' not in departement_codes:
    print('Département 01 is missing.')
So it seems that some département codes are not set correctly (the leading zeros look like they were dropped). And there's a new FAP code in 2018.
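Before fixing anything, here is a quick hypothetical diagnostic, assuming the issue is leading zeros dropped somewhere upstream:
In [ ]:
# Any département code shorter than 2 characters is suspicious: French
# codes are '01'…'95', '2A', '2B' or 3-digit DOM codes.
{year: sorted(code for code in df.DEPARTEMENT_CODE.dropna().unique() if len(code) < 2)
 for year, df in bmo_df_dict.items()}
Now let's pad the codes back to at least 2 characters: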
In [10]:
for df in bmo_df_dict.values():
    # Restore dropped leading zeros: '1' becomes '01', '2A' stays unchanged.
    df['DEPARTEMENT_CODE'] = df.DEPARTEMENT_CODE.str.pad(2, fillchar='0')

assert_all_the_same(bmo_df_dict, lambda df: set(df['DEPARTEMENT_CODE'].tolist()), name='Departement Codes')

departement_codes = set(bmo_df_dict[2016].DEPARTEMENT_CODE.tolist())
assert '2A' in departement_codes
assert '01' in departement_codes
OK, we've fixed the départements.
What about the extra FAP code?
In [11]:
bmo_2018 = bmo_df_dict[2018]
v5z_counts = bmo_2018[bmo_2018.FAP_CODE.str.startswith('V5')]\
    .groupby('FAP_CODE').NB_RECRUT_PROJECTS.sum().to_frame()
v5z_counts['name'] = bmo_2018[['FAP_CODE', 'FAP_NAME']]\
    .drop_duplicates().set_index('FAP_CODE').FAP_NAME
v5z_counts
Out[11]:
Hmm, OK, this does not seem to be a bug, just a rare type of job: very few hires are planned for it in the whole of France.
All seems good now: the DataFrames have the same columns and the same sets of values in the important columns.
Let's compare global stats.
In [12]:
bmo_df = pandas.concat(bmo_df_dict[year] for year in sorted(bmo_df_dict.keys()))
In [13]:
bmo_df.groupby(['year']).count()
Out[13]:
In [14]:
bmo_df.groupby(['year']).sum()
Out[14]:
When BMO 2017 was first released in early 2017, it had the same values as BMO 2016. This is obviously not the case for BMO 2018, so we can use the data.
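To guard against a repeat of that 2017 incident, here is a minimal sketch of an automated check, assuming identical totals are a good proxy for a duplicated release:
In [ ]:
# Consecutive years should not have exactly the same total of hiring
# projects; the early-2017 duplicate release would have tripped this.
totals = bmo_df.groupby('year').NB_RECRUT_PROJECTS.sum()
for previous_year, year in zip(totals.index, totals.index[1:]):
    assert totals[year] != totals[previous_year], \
        'BMO {} has the same total as BMO {}'.format(year, previous_year)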
Let's get an idea of how the BMO is evolving locally.
Let's try merging the data per local market. First, let's check whether we can merge at the "bassin d'emploi" (catchment area) level:
In [15]:
assert_all_the_same(bmo_df_dict, lambda df: set(df.CATCHMENT_AREA_CODE.tolist()), name='Catchment Area Codes')
Oops, no: it looks like the subdivision of départements into catchment areas has changed over the years, so we will first aggregate the data at the département level:
In [16]:
number_columns = ['NB_DIFF_RECRUT_PROJECTS', 'NB_RECRUT_PROJECTS', 'NB_SEASON_RECRUT_PROJECTS']
# Group on every descriptive column except the catchment area ones.
columns = list(set(bmo_df.columns) - set(number_columns) - {'CATCHMENT_AREA_CODE', 'CATCHMENT_AREA_NAME'})
bmo_df_by_departement = bmo_df.groupby(columns).sum()[number_columns].reset_index()
bmo_df_by_departement.head()
Out[16]:
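As a quick sanity check (a sketch, assuming the grouping columns contain no NaN, since groupby drops such rows), the aggregation should preserve the grand total:
In [ ]:
import numpy

# Aggregating to the département level should not change the total
# number of hiring projects.
assert numpy.isclose(
    bmo_df_by_departement.NB_RECRUT_PROJECTS.sum(),
    bmo_df.NB_RECRUT_PROJECTS.sum())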
OK, now we can merge data per local market:
In [17]:
bmo_evolution = bmo_df_by_departement[bmo_df_by_departement.year == 2018]\
    .merge(
        bmo_df_by_departement[bmo_df_by_departement.year == 2017],
        on=['DEPARTEMENT_CODE', 'FAP_CODE'], how='outer', suffixes=['_2018', '_2017'])\
    .merge(
        bmo_df_by_departement[bmo_df_by_departement.year == 2016],
        on=['DEPARTEMENT_CODE', 'FAP_CODE'], how='outer', suffixes=['', '_2016'])\
    .merge(
        bmo_df_by_departement[bmo_df_by_departement.year == 2015],
        on=['DEPARTEMENT_CODE', 'FAP_CODE'], how='outer', suffixes=['', '_2015'])\
    .merge(
        bmo_df_by_departement[bmo_df_by_departement.year == 2014],
        on=['DEPARTEMENT_CODE', 'FAP_CODE'], how='outer', suffixes=['', '_2014'])
In [18]:
bmo_per_departement = bmo_evolution.groupby(['DEPARTEMENT_CODE', 'DEPARTEMENT_NAME']).sum()
# Year-over-year ratio of hiring projects, then expressed as a percentage
# of change: e.g. a ratio of 1.1 becomes +10%.
bmo_per_departement['evolution_2018'] = bmo_per_departement.NB_RECRUT_PROJECTS_2018.div(
    bmo_per_departement.NB_RECRUT_PROJECTS_2017)
bmo_per_departement['percent_evolution_2018'] = (bmo_per_departement.evolution_2018 - 1) * 100
bmo_per_departement.plot(kind='scatter', x='NB_RECRUT_PROJECTS_2018', y='percent_evolution_2018');
There are a few outliers. Let's first check the départements with the most hiring, to understand the outliers on the right:
In [19]:
dimensions_2018 = ['percent_evolution_2018', 'NB_RECRUT_PROJECTS_2018']
bmo_per_departement.sort_values('NB_RECRUT_PROJECTS_2018', ascending=False)[dimensions_2018].head(8)
Out[19]:
Obviously Paris is way bigger than the rest but still shows a very strong evolution (+18.6%). The others all contain large cities and metropolitan areas, which corroborates the France Stratégie report.
Now let's check the extreme changes:
In [20]:
extreme_changes_2018 = bmo_per_departement\
    .sort_values('percent_evolution_2018', ascending=False)[dimensions_2018]
display(extreme_changes_2018.head())
extreme_changes_2018.tail()
Out[20]:
Nice: a lot of départements about 1h30 from Paris by train are growing a lot (although they might simply have had a bad year in 2017). At the bottom, we find some DOM départements again. It is a bit depressing to see that an already bad situation is getting even worse. :-(
Show the distribution of hiring growth and volumes for job families:
In [21]:
bmo_per_fap = bmo_evolution.groupby(['FAP_CODE', 'FAP_NAME']).sum()
bmo_per_fap['evolution_2018'] = bmo_per_fap.NB_RECRUT_PROJECTS_2018.div(bmo_per_fap.NB_RECRUT_PROJECTS_2017)
bmo_per_fap['percent_evolution_2018'] = (bmo_per_fap.evolution_2018 - 1) * 100
bmo_per_fap.plot(kind='scatter', x='NB_RECRUT_PROJECTS_2018', y='percent_evolution_2018');
A first conclusion is that only job families with little hiring show huge swings. Still, let's look at the job family that got +150%:
In [22]:
bmo_per_fap.sort_values('percent_evolution_2018', ascending=False)[dimensions_2018].head()
Out[22]:
So "domotics" is a real hype in 2018…
Let's check the distribution again, focusing only on job families with more than 20k hiring projects in the year:
In [23]:
bmo_per_large_fap = bmo_per_fap[bmo_per_fap.NB_RECRUT_PROJECTS_2018 >= 20000]
bmo_per_large_fap.plot(kind='scatter', x='NB_RECRUT_PROJECTS_2018', y='percent_evolution_2018');
That looks reasonable; let's check the top growth:
In [24]:
bmo_per_large_fap.sort_values('percent_evolution_2018', ascending=False)[dimensions_2018].head()
Out[24]:
It looks like drivers are not a thing of the past yet…
So the BMO 2018 data looks pretty good, and after a quick clean-up we were able to compare it with the previous years (2014, 2015, 2016 & 2017).
Some high-level insights can be drawn from comparing these datasets. For instance, hiring is in really good shape in Paris and in the main metropolitan areas. Hiring is also going really well for transportation drivers as well as, ironically, self-service shops.
Possible next steps: