To view our notebook's maps inside the notebook, please go to :

http://nbviewer.jupyter.org/github/gregunz/ada2017/blob/master/03%20-%20Interactive%20Viz/homework_03.ipynb

Please note that the maps are still saved here (in the map folder) in case nbviewer would throw any error (like 404, which happened to us sometimes)


In [1]:
import numpy as np
import os
import pandas as pd
import json
import folium
import branca
import seaborn as sns
import matplotlib.pyplot as plt

from IPython.display import IFrame, Image
from ipywidgets import interact

%matplotlib inline

In [2]:
folium.__version__ == '0.5.0'


Out[2]:
True

We were given the instruction that the unemployed rate of Switzerland on https://www.amstat.ch/v2/index.jsp also contained the people that were not jobless but looking for a job, which didn't make sense with the documentation on https://www.amstat.ch/v2/index.jsp, the raw numbers and the comments on Mattermost.

For those reasons, we will assume that unemployed rate for Switzerland are simply "number of jobless people/Total of active population" and not "number of people looking for a job/Total of active population"

Choice of the datasets

Regarding the European dataset on unemployment rate, we chose to use the latest annual complete dataset which is the one from January to December 2016. We could have chosen the latest complete dataset for a month, but we want to avoid the bias induced by the seasons.

To make sure that we compare rates that are comparable, we chose the Swiss dataset on unemployment rate for the same period. Moreover, we used the interface provided on https://www.amstat.ch/ to rename and select precisely the data that we are interested in. Also note we store the rates per nationalities in a separate file because if we start splitting the datas in a lot of categories (age category, nationalities, etc.) it is then cumbersome to aggregate them.

We show the steps taken to obtain each dataset with pictures at the end of this notebook as an appendice.

Part 1. Comparison of Switzerland's unemployed rate with the rest of Europe's

After downloading each data files, we clean them to get a cleaner and easier to read data file. Such cleaning can be done using a program such as Excel or Libreoffice. For larger files that cannot be open by these programs, we can use a script. We describe the cleaning below:

  • For the european dataset:
    • We only keep the name of the country and the unemployment rate.
    • Remove the parentheses and its content from the name of the country of Germany.
  • For the swiss dataset:
    • We remove the columns with the details of each month to keep only the total
    • The file contains a 2 rows header, we merge it into a single row header and translate the header names into english.
    • We move the "Total" name from the "Language region" column to the "Canton" column

Loading data and cleaning them.


In [3]:
#The europe data file
europe_data_file = 'data/european_rate_2016.csv'
europe_df = pd.read_csv(europe_data_file)

#The swiss data file excluding nationalities
swiss_data_file_age = 'data/swiss_rate_tot+age+lang.csv'
swiss_df = pd.read_csv(swiss_data_file_age, encoding="UTF-16")

#The swiss data file with nationalities
swiss_data_file_nationality = 'data/swiss_rate_2016.csv'
swiss_nat_df = pd.read_csv(swiss_data_file_nationality)

#Clean the number so that they are represented as an int
swiss_df['Jobless'] = swiss_df['Jobless'].apply(lambda x: int(x.replace("'", "")))
swiss_df['Total'] = swiss_df['Total'].apply(lambda x: int(x.replace("'", "")))

#Let's get the rate per canton by aggregating the data
swiss_df["tmp"] = swiss_df["Rate"] * swiss_df["Total"]
swiss_df_canton_grouped = swiss_df[["Canton", "Jobless", "Total", "tmp"]].groupby("Canton").sum()
swiss_df_canton_grouped["Rate"] = pd.DataFrame(swiss_df_canton_grouped["tmp"] / swiss_df_canton_grouped["Total"])
swiss_df_canton_grouped = swiss_df_canton_grouped.reset_index()
del swiss_df["tmp"]
del swiss_df_canton_grouped["tmp"]

In [4]:
#Add the nationalities to the swiss dataframe grouped by canton
swiss_nat_df = swiss_nat_df.pivot(index='Canton', columns='Nationality')['Rate']
swiss_df_canton_grouped = pd.merge(swiss_df_canton_grouped, swiss_nat_df.reset_index(), on='Canton')
del swiss_nat_df

Europe VS Switzerland

We decided to add the Swiss data inside the Europe dataframe in order to compare Switzerland with the rest of Europe


In [5]:
europe_df = europe_df[['Name', 'Rate']]
#Create a new single row dataframe for Switzerland
swiss_country_df = pd.DataFrame(
    [['Switzerland', swiss_df_canton_grouped.loc[swiss_df_canton_grouped['Canton'] == 'Total']['Rate'].values[0]]],
    columns=['Name', 'Rate'])

#Add the single row dataframe of Switzerland to the Europe dataframe
europe_df = europe_df.append(swiss_country_df)

In [6]:
europe_topo_path = 'topojson/europe.topojson.json'
europe_json_data = json.load(open(europe_topo_path, encoding="UTF-8"))

Choropleth

For the europe, as we couldn't personalize the styling function using the choropleth function we decided to use the TopoJson function. We did so in order to style differently when the data is not unknown which is the case for europe.


In [7]:
rate_min = min(europe_df['Rate'])
rate_max = max(europe_df['Rate'])

# color scale from min rate to max rate
color_scale = branca.colormap.linear.OrRd.scale(rate_min, rate_max)
color_scale = color_scale.to_step(n=8)

In [8]:
def style_function(country):
    rate = europe_df.loc[europe_df['Name'] == country['properties']['NAME']]['Rate'].values
    if len(rate) > 0: 
        # country is in the dataframe
        return {
            'fillOpacity': 0.8,
            'weight': 0.5,
            'color': 'black',
            'fillColor': color_scale(rate)
        }
    else:
        # country is not in the dataframe, hence we put its color as black
        return {
            'fillOpacity': 0.2,
            'weight': 0.2,
            'color': 'black',
            'fillColor': 'black'
        }

In [9]:
europe_unemployment_map = folium.Map([53,22], tiles='cartodbpositron', zoom_start=4)
g = folium.TopoJson(europe_json_data,
                'objects.europe',
                style_function=style_function,
               ).add_to(europe_unemployment_map)

color_scale.caption = 'Unemployment Rate (%)'
color_scale.add_to(europe_unemployment_map)

europe_title = "europe_unemployement_rate"
europe_unemployment_map.save("maps/{}.html".format(europe_title))

print('https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/{}.html' \
      .format(europe_title.replace(" ", "%20").replace("%", "%25")))

europe_unemployment_map


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/europe_unemployement_rate.html
Out[9]:

We used a linear scale because it's the one that is more intuitive. By intuitive we mean that a high color difference is proportional to the rate difference as we usually expect.

But because of the two outliers (Spain and Greece) we cannot really distinguish differences between countries with low rate.

Hence we decided to still use a linear scale but this time the outliers are aggregated together in the last color of the scale (which now means rate of 15 and +) :


In [10]:
rate_max = europe_df['Rate'].quantile(.95)
color_scale = branca.colormap.linear.OrRd.scale(rate_min, rate_max)        
color_scale = color_scale.to_step(n=8)
color_scale.index[-1] = np.float(max(europe_df['Rate']))

europe_unemployment_map_2 = folium.Map([53,22], tiles='cartodbpositron', zoom_start=4)

g = folium.TopoJson(europe_json_data,
                'objects.europe',
                style_function=style_function,
               ).add_to(europe_unemployment_map_2)

color_scale.caption = 'Unemployment Rate (scale without outliers) (%)'
color_scale.add_to(europe_unemployment_map_2)

europe_title_2 = "europe_unemployement_rate_2"
europe_unemployment_map_2.save("maps/{}.html".format(europe_title_2))

print('https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/{}.html' \
      .format(europe_title_2.replace(" ", "%20").replace("%", "%25")))

europe_unemployment_map_2


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/europe_unemployement_rate_2.html
Out[10]:

Comments on the plots above:

As we can see the recorded unemployment rate in Europe goes from around 3% to approximatelly 24%.

We can clearly notice that Greece and Spain rates are really high, this is obviously expected with the last events that happened in those countries.

Switzerland rate is the lowest, swiss people should be proud of this!

When a country is greyed out, it means that the data was not provided.

Part 2. Comparison of unemployed rates in Switzerland at the canton level

Choropleth

With the swiss data, we have data for all cantons. Hence we can use the choropleth function which is easier and more convienient than the "TopoJson" one.


In [11]:
swiss_topo_path = r'topojson/ch-cantons.topojson.json'
swiss_json_data = json.load(open(swiss_topo_path, encoding="utf-8"))

In [12]:
# roughly the center of Switzerland
geo_swiss_center = [46.8, 8.35]

# function to draw a map without duplicating lots of code
def draw_swiss_map(df, title, column_name, **args):
    swiss_map = folium.Map(geo_swiss_center, tiles='cartodbpositron', zoom_start=7.5)

    swiss_map.choropleth(
        geo_data=swiss_json_data,
        data=df,
        topojson='objects.cantons',
        columns=['Canton', column_name],
        fill_color='OrRd',
        key_on='properties.name',
        legend_name=title,
        **args
    )
    swiss_map.save("maps/{}.html".format(title))
    print('https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/{}.html' \
          .format(title.replace("%", "%25").replace(" ", "%20")))
    return swiss_map

Unemployment rate in Switzerland at the Canton level


In [13]:
draw_swiss_map(swiss_df_canton_grouped, 'Unemployment Rate (%)', 'Rate')


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Unemployment%20Rate%20(%25).html
Out[13]:

As we can see above, Eastern and Central Cantons have a lower unemployement rate than the Western Cantons on average.

All people looking for a job, jobless or not

As mentionned above, the rate we have corresponds to the proportion of people without jobs. We will visualize below the rate of people looking for a job.

Note that it is not the same scale as for the previous map. Here what we want to show is the visualization of the rate of people looking for a job and not compare both rates, if it would have been the case, we would have used the same scale.


In [14]:
# Adding the people looking for a new job that already have a job to the rate
swiss_df_canton_grouped['Looking for a job rate'] = swiss_df_canton_grouped['Rate'] * swiss_df_canton_grouped['Total'] / swiss_df_canton_grouped['Jobless']

In [15]:
draw_swiss_map(swiss_df_canton_grouped, 'Looking for a job (%)', 'Looking for a job rate')


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Looking%20for%20a%20job%20(%25).html
Out[15]:

Observations

Both maps follow pretty much the same trends but with a bigger contrast between the Center/East and the West/Border.

Part 3. Difference in the rate between Swiss and foreign workers

To make sure that we use can use the colors to compare the rates between the Swiss and foreign citizens, we use the same colormap for the two following maps.


In [16]:
threshold_scale = np.linspace(
    min([swiss_df_canton_grouped['Suisses'].min(), swiss_df_canton_grouped['Etrangers'].min()]),
    max([swiss_df_canton_grouped['Suisses'].max(), swiss_df_canton_grouped['Etrangers'].max()]),
    6,
    dtype=int
).tolist()

In [17]:
draw_swiss_map(swiss_df_canton_grouped,
               'Unemployment Rate (%) of Swiss Citizen',
               'Suisses',
               threshold_scale=threshold_scale)


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Unemployment%20Rate%20(%25)%20of%20Swiss%20Citizen.html
Out[17]:

In [18]:
draw_swiss_map(
    swiss_df_canton_grouped,
    "Unemployment Rate (%) of foreign Citizen",
    "Etrangers",
    threshold_scale=threshold_scale
)


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Unemployment%20Rate%20(%25)%20of%20foreign%20Citizen.html
Out[18]:

Observations

We can see that more foreign people (in proportion) are unemployed than swiss people in general.

Part 3 bis

We also wanted to show the difference on the same map by showing the difference of rates between Swiss and foreigners


In [19]:
swiss_df_canton_grouped['Foreign - Swiss'] = (swiss_df_canton_grouped['Etrangers'] - swiss_df_canton_grouped['Suisses']) / swiss_df_canton_grouped['Etrangers']

In [20]:
draw_swiss_map(
    swiss_df_canton_grouped,
    'Differerence Swiss Foreign Unemployment Rate (%) of Citizen',
    'Foreign - Swiss'
)


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Differerence%20Swiss%20Foreign%20Unemployment%20Rate%20(%25)%20of%20Citizen.html
Out[20]:

Observations

This map shows in which cantons there are the biggest differences between swiss and foreign citizen in the unemployment rate. The closer we are to 1 means that

Part 3 bisbis

We also wanted to show the difference on the same map, but this time by showing the ratio of the rates between swiss and foreigners


In [21]:
swiss_df_canton_grouped['Foreign / Swiss'] = swiss_df_canton_grouped['Etrangers'] / swiss_df_canton_grouped['Suisses']

In [22]:
draw_swiss_map(
    swiss_df_canton_grouped,
    'Ratio Swiss Foreign Unemployment Rate (%) of Citizen',
    'Foreign / Swiss'
)


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Ratio%20Swiss%20Foreign%20Unemployment%20Rate%20(%25)%20of%20Citizen.html
Out[22]:

Here, we can see that where the unemployment rate was the worse, foreigners and Swiss have almost equal chances to be unemployed. Whereas in the center of Switzerland, foreigners can be up to 7 times more likely to be unemployed.

This could come from a more conservative state of mind present in Central Switzerland and can be observed frequently if we look a the votation results.

Part 3 : age category

We use the predifined 3 ages categories provided by Amstat, that is: 15-24, 25-49 and 50+


In [23]:
swiss_age_canton_df = swiss_df.groupby(["Canton", "Age category"]).sum().reset_index()
fig, ax = plt.subplots(figsize=(20,10))
plot = sns.barplot(x="Canton", y="Rate", hue="Age category", data=swiss_age_canton_df);
plot.set_xticklabels(plot.get_xticklabels(), rotation=90, size=20)
plt.show()


Even though it is not asked, let's plot three different maps. (One for each age category)


In [24]:
ages = ["15-24 ans", "25-49 ans", "50 ans et plus"]
threshold_scale_age = np.linspace(
    min([swiss_age_canton_df[swiss_age_canton_df["Age category"] == age]["Rate"].min() for age in ages]),
    max([swiss_age_canton_df[swiss_age_canton_df["Age category"] == age]["Rate"].max() for age in ages]),
    6,
    dtype=float
).tolist()
maps = list()

for age in ages:
    maps.append(draw_swiss_map(
        swiss_age_canton_df[swiss_age_canton_df["Age category"] == age],
        "Unemployement rate (of {})".format(age),
        "Rate",
        threshold_scale = threshold_scale_age
    ))

def draw_map_age(age):
    return maps[ages.index(age)]


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Unemployement%20rate%20(of%2015-24%20ans).html
https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Unemployement%20rate%20(of%2025-49%20ans).html
https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Unemployement%20rate%20(of%2050%20ans%20et%20plus).html

In [25]:
interact(draw_map_age, age=ages);


Unfortunatly, the maps above won't show because we are using an interactive library, but here are some screenshots:


In [26]:
Image('./img/interact1.png')


Out[26]:

In [28]:
Image('./img/interact2.png')


Out[28]:

In [29]:
Image('./img/interact3.png')


Out[29]:

Observations:

More young people are unemployed and it's decreasing with the age. Suprisingly enough people 50+ have good rates!

BONUS

Note that we let the canton Ticino be in neither side because it is generally omited from the Röstigraben separation. Also we do not separate "upper Valais" and "Lower Valais", we do it that way to be consistent with the data Amstat provides us.

In other words, this is a simplified version separating Latin* cantons from the Germanic* ones

*(we look at the majority, we don't take into account billingual cantons)


In [30]:
# copy of swiss df without row "Total"
rosti_df = swiss_df[swiss_df['Canton'] != 'Total'].copy()

# adding a column with the weighted rate in order to compute the average rate per "röstigraben part"
rosti_df["Weighted rate"] = rosti_df["Rate"] * rosti_df["Total"]

# aggregating weighted rates with respect to their "röstigraben part" 
rosti_df = rosti_df.groupby(["Language region"])[["Weighted rate", "Total"]].sum().head()

# adding a column with the rate per "röstigraben part"
rosti_df["Rate by rösti part"] = rosti_df["Weighted rate"] / rosti_df["Total"]

swiss_df_with_rosti = pd.merge(swiss_df, rosti_df.reset_index(), on="Language region")

In [31]:
draw_swiss_map(
    swiss_df_with_rosti,
    'Unemployment Rate (%) in Röstigraben parts',
    'Rate by rösti part',
    line_weight=0,
    fill_opacity=0.7,
    smooth_factor=0.1,
)


https://github.com/gregunz/ada2017/tree/master/03%20-%20Interactive%20Viz/maps/Unemployment%20Rate%20(%25)%20in%20Röstigraben%20parts.html
Out[31]:

We aggregated the cantons together depending on the language spoken and computed the weighted average of the unemployed rate depending on the number of unemployed people. We can clearly see a huge difference between the French (& Italian) speaking part and the rest (almost a factor of 2), this separation is also called the "Röstigraben" (Ticcino might be handle separately but we decided to show germanic vs latin).

Appendice

Europe data


In [32]:
Image('./img/get_europe_rates.png')


Out[32]:

Swiss data


In [33]:
Image('./img/get_swiss_rates_1.png')


Out[33]:

In [34]:
Image('./img/get_swiss_rates_2.png')


Out[34]:

In [35]:
Image('./img/get_swiss_rates_3.png')


Out[35]:

In [36]:
Image('./img/get_swiss_rates_4.png')


Out[36]:

In [37]:
Image('./img/get_swiss_rates_5.png')


Out[37]: