WorkUp Events

Author: pascal@bayes.org

Date: 2017-08-13

In July 2017, the WorkUp team gave us programmatic access to their events. This notebook explores how we could use that data to show our users examples of professional events and motivate them to attend some.

The whole dataset is directly downloadable at https://www.workuper.com/events/index_json.json and it is also available with the command docker-compose run --rm data-analysis-prepare data/workuper.json.

Loading and General View

First, let's load the JSON file:


In [1]:
import json
import os
from os import path

import pandas as pd

DATA_FOLDER = os.getenv('DATA_FOLDER')

events = pd.read_json(path.join(DATA_FOLDER, 'workup.json'))
events.head()


Out[1]:
address category created_at date dateend description favorite id latitude longitude organiser price slug status subscription_link time title updated_at user_id website
0 Bordeaux, France ["Trouver sa voie", "Trouver un job", "Changer... 2017-06-20 15:25:40 2017-08-29 2017-08-29 <p>Chaque mois, MADIRCOM organise un événement... NaN 238 44.837789 -0.579180 MADIRCOM 0 ap-heros-candidats-madircom-bordeaux approved https://www.eventbrite.fr/e/billets-ap-heros-c... 2000-01-01T18:00:00Z AP HEROS CANDIDATS MADIRCOM - BORDEAUX 2017-07-07 10:05:39 2505 www.madircom.com
1 La Grande Halle de la Villette, 211 Avenue Jea... ["Trouver sa voie", "Trouver un job", "Changer... 2017-03-24 15:30:09 2018-01-19 2018-01-20 <b>Trouver un emploi</b>, <b>créer son entrepr... NaN 53 48.891172 2.390472 Altice Media Events 0 le-salon-du-travail-et-de-la-mobilite-professi... approved http://www.salondutravail.fr/ 2000-01-01T10:00:00Z Le Salon du Travail et de la Mobilité Professi... 2017-03-25 14:40:19 2063 www.salondutravail.fr
2 50 Quai Charles de Gaulle, Lyon, France ["Trouver un job", "Changer de métier"] 2017-05-16 10:15:31 2017-09-20 2017-09-20 <blockquote><p>Le salon des 10 000 emplois met... NaN 161 45.785001 4.854624 Job Rencontres 0 salon-des-10-000-emplois-lyon approved http://www.jobrencontres.fr/salon-recrutement-... 2000-01-01T08:30:00Z Salon des 10 000 emplois - Lyon 2017-05-16 10:15:52 2117 http://www.jobrencontres.fr/
3 Quai des Chartrons, Bordeaux, France ["Trouver un job", "Changer de métier"] 2017-05-16 10:18:50 2017-09-28 2017-09-28 <blockquote><p>Le salon des 10 000 emplois met... NaN 162 44.853082 -0.566987 Job Rencontres 0 salon-des-10-000-emplois-bordeaux approved http://www.jobrencontres.fr/salon-recrutement-... 2000-01-01T08:30:00Z Salon des 10 000 emplois - Bordeaux 2017-05-16 10:19:04 2117 http://www.jobrencontres.fr/
4 Rond-Point du Prado, Marseille, France ["Trouver un job", "Changer de métier"] 2017-05-16 10:37:05 2017-10-06 2017-10-06 <blockquote><p>Le salon des 10 000 emplois met... NaN 163 43.272516 5.391503 Job Rencontres 0 salon-des-10-000-emplois-marseille approved http://www.jobrencontres.fr/salon-recrutement-... 2000-01-01T08:30:00Z Salon des 10 000 emplois - Marseille 2017-05-16 10:37:27 2117 http://www.jobrencontres.fr/

Cool! Before exploring each individual field, let's see how many events there are, and whether those fields are always set:


In [2]:
events.describe(include='all').head(3)


Out[2]:
address category created_at date dateend description favorite id latitude longitude organiser price slug status subscription_link time title updated_at user_id website
count 13 13 13 13 13 13 0.0 13.0 13.0 13.0 13 13.0 13 13 13 13 13 13 13.0 13
unique 13 5 13 13 13 12 NaN NaN NaN NaN 8 NaN 13 1 11 6 12 13 NaN 8
top 26 RUE SERPOLLET, 26 RUE SERPOLLET, 75020 PAR... ["Trouver sa voie", "Trouver un job", "Changer... 2017-08-08 21:34:51 2017-09-30 00:00:00 2017-09-28 <div>Notre devise : ensemble, transformons le ... NaN NaN NaN NaN Activ'Action NaN job-boost-3-atelier-coaching-emploi-et-reconve... approved http://www.jobrencontres.fr/salon-recrutement-... 2000-01-01T12:45:00Z Activ'Boost @Paris 12 ème (12h45-16h) 2017-08-03 09:38:18 NaN activaction.org

Hmm, there are not that many rows: only 13. However, all fields seem to be set, except for favorite, which is never set.

Extracting Useful Info

From a quick glance at the data above, we can classify the fields as useful, irrelevant, or needing further exploration.

The ones that seem directly useful:

  • title
  • address combined with latitude and longitude
  • date and dateend
  • organiser

And then, to a lesser extent (they hold too much detail for what we want to do with them):

  • description
  • subscription_link
  • website
  • time

However the following ones are irrelevant to us as they seem only useful for the WorkUp database:

  • favorite
  • id
  • status
  • created_at
  • updated_at
  • user_id

That leaves three fields that we should explore a bit more: category, price and slug.

Obvious Fields

Let's quickly check that the obvious fields have useful values. The titles:


In [3]:
pd.options.display.max_colwidth = 100
events.title.to_frame()


Out[3]:
title
0 AP HEROS CANDIDATS MADIRCOM - BORDEAUX
1 Le Salon du Travail et de la Mobilité Professionnelle
2 Salon des 10 000 emplois - Lyon
3 Salon des 10 000 emplois - Bordeaux
4 Salon des 10 000 emplois - Marseille
5 Activ'Boost @Paris 12 ème (12h45-16h)
6 Activ'Jump @Paris 12 (12h45-16h)
7 Activ'Boost @Paris 12 ème (12h45-16h)
8 Activ'Boost @Paris 20ème (12h45-16h)
9 Freelance Day
10 JOB BOOST 3 - ATELIER COACHING EMPLOI et RECONVERSION
11 Atelier et Conférence au CIDJ - TROUVER SA VOIE by 4 coachs
12 Les Jeudis d'Amélie : Maîtrisez Linkedin dans votre recherche d'emploi @PARIS

Perfect, we can use it directly as a title to show to our users. Note that the title frequently includes the city, and sometimes the time slot. The capitalization is less than ideal, though: some titles are written almost entirely in upper case.
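
One way to spot the titles whose capitalization would need cleaning is to measure the share of upper-case letters. The sketch below is only an illustration: the helper name uppercase_ratio and the 0.5 threshold are assumptions made here, not something defined by WorkUp or elsewhere in this notebook.

```python
def uppercase_ratio(title):
    """Share of the letters in title that are upper case (0.0 if no letters)."""
    letters = [char for char in title if char.isalpha()]
    if not letters:
        return 0.0
    return sum(char.isupper() for char in letters) / len(letters)


# Hypothetical threshold: above it, we consider that the title is "shouting".
SHOUTING_THRESHOLD = 0.5

# Titles copied from the table above.
titles = [
    'AP HEROS CANDIDATS MADIRCOM - BORDEAUX',
    'Salon des 10 000 emplois - Lyon',
    'JOB BOOST 3 - ATELIER COACHING EMPLOI et RECONVERSION',
    'Freelance Day',
]
shouting_titles = [t for t in titles if uppercase_ratio(t) > SHOUTING_THRESHOLD]
```

On the full dataset the same helper could be applied with events.title.apply(uppercase_ratio). It only flags titles: rewriting them automatically would risk damaging acronyms and organisation names.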

The addresses:


In [4]:
events[['address', 'latitude', 'longitude']]


Out[4]:
address latitude longitude
0 Bordeaux, France 44.837789 -0.579180
1 La Grande Halle de la Villette, 211 Avenue Jean Jaurès, 75019 Paris 48.891172 2.390472
2 50 Quai Charles de Gaulle, Lyon, France 45.785001 4.854624
3 Quai des Chartrons, Bordeaux, France 44.853082 -0.566987
4 Rond-Point du Prado, Marseille, France 43.272516 5.391503
5 PARIS ANIM' - MAISON DES ENSEMBLES, 3 RUE D'ALIGRE, 75012 PARIS, FRANCE 48.848123 2.377220
6 MAISON DES ENSEMBLES, 3 RUE D'ALIGRE, 75012 PARIS, FRANCE 48.848123 2.377220
7 PARIS ANIM' - MAISON DES ENSEMBLES, 3 RUE D'ALIGRE, 75012 PARIS, FRANCE 48.848123 2.377220
8 26 RUE SERPOLLET, 26 RUE SERPOLLET, 75020 PARIS, FRANCE 48.860908 2.413105
9 37 Rue de Turenne, 75003 Paris, France 48.856946 2.364321
10 33 Rue Berger, Paris, France 48.861721 2.344056
11 101 Quai Branly, Paris, France 48.854771 2.289524
12 12 Villa de Lourcine, 75014 Paris, France 48.830706 2.339032

Comparing the two addresses in Bordeaux, we can see that their latitude and longitude differ, so the coordinates are probably those of the exact address rather than of the city (pretty cool). For our application we could keep only the events that are not too far from the user's target city. The address is not always formatted the same way, so we will mainly use the lat/lng.
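
A minimal sketch of such a distance filter, using the haversine formula on the lat/lng pairs. The 50 km threshold and the names distance_km and MAX_DISTANCE_KM are assumptions for illustration; the actual radius would be a product decision.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Mean Earth radius.


def distance_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lng2 - lng1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates copied from rows 0 and 3 of the table above (both in Bordeaux).
bordeaux = (44.837789, -0.579180)
quai_des_chartrons = (44.853082, -0.566987)
# Row 2 (Lyon), for comparison.
lyon_event = (45.785001, 4.854624)

# Hypothetical threshold for "not too far from the target city".
MAX_DISTANCE_KM = 50

is_chartrons_close = distance_km(*bordeaux, *quai_des_chartrons) <= MAX_DISTANCE_KM
is_lyon_close = distance_km(*bordeaux, *lyon_event) <= MAX_DISTANCE_KM
```

With these values the Quai des Chartrons event is kept for a user targeting Bordeaux, while the Lyon event is filtered out.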

The dates:


In [5]:
events[['date', 'dateend']]


Out[5]:
date dateend
0 2017-08-29 2017-08-29
1 2018-01-19 2018-01-20
2 2017-09-20 2017-09-20
3 2017-09-28 2017-09-28
4 2017-10-06 2017-10-06
5 2017-08-17 2017-08-17
6 2017-08-21 2017-08-21
7 2017-08-24 2017-08-24
8 2017-08-28 2017-08-28
9 2017-09-14 2017-09-14
10 2017-09-26 2017-09-29
11 2017-09-30 2017-09-30
12 2017-09-21 2017-09-21

The dates are all in the future, which probably indicates that WorkUp already filters out the past ones. Note that dateend is almost always the same as date, which probably indicates that most events last only one day. For our purpose we will ignore dateend for now.
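
Both observations can be checked mechanically. The sketch below works on ISO date strings as they are displayed in the table; the helper names duration_days and is_upcoming are made up for this example, and the notebook date (2017-08-13) is used as "today".

```python
from datetime import date


def duration_days(start, end):
    """Number of calendar days covered by an event, both ends included."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days + 1


def is_upcoming(start, today):
    """Whether an event starts on or after the given day."""
    return date.fromisoformat(start) >= today


# (date, dateend) pairs copied from rows 0, 1 and 10 of the table above.
date_ranges = [
    ('2017-08-29', '2017-08-29'),
    ('2018-01-19', '2018-01-20'),
    ('2017-09-26', '2017-09-29'),
]
durations = [duration_days(start, end) for start, end in date_ranges]

# The date of this notebook, used as "today".
notebook_day = date(2017, 8, 13)
all_upcoming = all(is_upcoming(start, notebook_day) for start, _ in date_ranges)
```

Rows 1 and 10 are the only events in the table that span more than one day.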

The organisers:


In [6]:
events.organiser.to_frame()


Out[6]:
organiser
0 MADIRCOM
1 Altice Media Events
2 Job Rencontres
3 Job Rencontres
4 Job Rencontres
5 Activ'Action
6 Activ'Action
7 Activ'Action
8 Activ'Action
9 Cohome
10 AlexforjoB et CoWanted
11 AlexforjoB
12 Amélie Favre Guittet

As with the title, this is pretty clean and useful.

Richer Fields

The description, the exact time, the website and the direct subscription_link are not required for what we want. Our goal is not to speed up the process of our users subscribing to a specific event, but to make them realize that there are many such events and that they should dig a bit more into this way of enlarging their network or improving their job search.

However, it would be useful for our users to be able to reach all those details on a secondary page if they want them. On the WorkUp website, pages with full details are accessible with a URL like this one: https://www.workuper.com/events/salon-des-10-000-emplois-marseille. The good thing is that the dataset contains the last part of the URL in the slug field. So we will keep this field to rebuild the full page URLs.
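
Rebuilding the URL is a simple concatenation. In the sketch below the prefix comes from the example URL above, while the function name event_url is an assumption; percent-encoding the slug is only a precaution, as the slugs seen so far contain nothing but lower-case letters, digits and hyphens.

```python
from urllib.parse import quote

# Prefix taken from the example URL on the WorkUp website.
EVENT_URL_PREFIX = 'https://www.workuper.com/events/'


def event_url(slug):
    """Full URL of the WorkUp page describing the event with this slug."""
    return EVENT_URL_PREFIX + quote(slug)


marseille_url = event_url('salon-des-10-000-emplois-marseille')
```

On the full dataset this could be applied with events.slug.apply(event_url).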

Others

Let's check the price field: we could decide to hide events that are not free, or at least warn our users early.


In [7]:
events[['title', 'price']]


Out[7]:
title price
0 AP HEROS CANDIDATS MADIRCOM - BORDEAUX 0
1 Le Salon du Travail et de la Mobilité Professionnelle 0
2 Salon des 10 000 emplois - Lyon 0
3 Salon des 10 000 emplois - Bordeaux 0
4 Salon des 10 000 emplois - Marseille 0
5 Activ'Boost @Paris 12 ème (12h45-16h) 0
6 Activ'Jump @Paris 12 (12h45-16h) 0
7 Activ'Boost @Paris 12 ème (12h45-16h) 0
8 Activ'Boost @Paris 20ème (12h45-16h) 0
9 Freelance Day 12
10 JOB BOOST 3 - ATELIER COACHING EMPLOI et RECONVERSION 300
11 Atelier et Conférence au CIDJ - TROUVER SA VOIE by 4 coachs 0
12 Les Jeudis d'Amélie : Maîtrisez Linkedin dans votre recherche d'emploi @PARIS 21

Cool! Most of them are free. However, three of them are not (prices of 12, 21 and 300), and the last one is really expensive. Here we would need a product decision on whether we show them and how.
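
To make the options concrete, here is a sketch of one possible rule that labels each event by price so that the application can hide it or warn the user. The labels, the function name price_label and the cheap_limit value of 25 are all hypothetical; the actual rule is the product decision mentioned above.

```python
def price_label(price, cheap_limit=25):
    """Classify an event price as 'free', 'cheap' or 'expensive'.

    cheap_limit is a hypothetical threshold, in the same unit as the price
    field.
    """
    if price == 0:
        return 'free'
    if price <= cheap_limit:
        return 'cheap'
    return 'expensive'


# The four distinct prices found in the table above.
labels = {price: price_label(price) for price in (0, 12, 21, 300)}
```

With this threshold, "Freelance Day" and "Les Jeudis d'Amélie" would be shown with a warning, and "JOB BOOST 3" could be hidden.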

Finally, let's check the category field:


In [8]:
events.category.iloc[0]


Out[8]:
'["Trouver sa voie", "Trouver un job", "Changer de métier"]'

Whoops, this looks like a JSON-encoded string (which means there was a double JSON encoding, as we already decoded once to create the dataset). Let's decode it:


In [9]:
events['categories'] = events.category.apply(json.loads)
events.categories.iloc[0]


Out[9]:
['Trouver sa voie', 'Trouver un job', 'Changer de métier']

Now let's list all the available categories:


In [10]:
all_categories = set(c for categories in events.categories.tolist() for c in categories)
all_categories


Out[10]:
{'Changer de boite',
 'Changer de métier',
 'Créer sa boite',
 'Trouver sa voie',
 'Trouver un job'}

Great! Some of them exactly match questions that our users have already answered, so we could directly filter on the events that might interest them.
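
A minimal sketch of that filter, assuming the user's answers have already been mapped to a set of WorkUp category names. That mapping, the variable user_goals and the function matches_goals are hypothetical and not part of the dataset.

```python
def matches_goals(event_categories, user_goals):
    """Whether an event has at least one category among the user's goals."""
    return not set(event_categories).isdisjoint(user_goals)


# Decoded categories of rows 0 and 2, copied from the outputs above.
categories_by_title = {
    'AP HEROS CANDIDATS MADIRCOM - BORDEAUX': [
        'Trouver sa voie', 'Trouver un job', 'Changer de métier'],
    'Salon des 10 000 emplois - Lyon': [
        'Trouver un job', 'Changer de métier'],
}

# Hypothetical goal derived from the answers of a user.
user_goals = {'Trouver sa voie'}

matching_titles = [
    title for title, categories in categories_by_title.items()
    if matches_goals(categories, user_goals)
]
```

Only the first event is kept here, as the Lyon event is not tagged "Trouver sa voie".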

Conclusion

Despite the very small amount of data available today, the API provided by WorkUp is perfect for us and could easily be integrated into Bob Emploi.

A few things that WorkUp could fix (apart from getting more events across the country):

  • Make sure the capitalization of the title field is clean.
  • Fix the double-JSON-encoding of the category field.

After that, there are many fields that we can use out of the box to display the events, and a few others that can be used to filter them as appropriate for a given user:

  • latitude, longitude, to narrow the list of events to the ones close to the user.
  • price, to select only the free or cheap ones.
  • category, to select the ones that match what the user is trying to do.

Ultimately, we might also want to filter out events that are tied to certain industries or certain kinds of jobs (e.g. "Freelance Day"), but as most events are fairly generic for now, this is not a priority.