In [1]:
#As always, we import everything
import os
import pandas as pd
import json
import folium
import math
%matplotlib inline
import matplotlib.pyplot as plt
from geopy.geocoders import Nominatim
import base64
We define a RESET constant that controls whether all graphs and mappings are regenerated. Its use is detailed later in the notebook.
In [2]:
RESET = False
First, let's load the data into a dataframe. The .tsv file comes from the Eurostat website. We had to modify the data to make it better suited to our analysis. The dataset we found contains entries for entities that are not countries, as well as 2 countries that are not in Europe, which we removed. We also changed the country code for Greece, which is EL in this dataset, to the more commonly used GR. Moreover, the data lacked information for Switzerland, so we added it manually: using the other website with Swiss statistics, we computed annual averages of the unemployment rate to match the format of this dataset, then inserted them into the file.
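The manual clean-up described above could also be scripted. The sketch below is only an illustration on plain Python records rather than on the actual file: the codes `EU28`, `EA19`, `US` and `JP` and all rates are placeholders assumed for the example, since the notebook does not list which rows were removed.

```python
# Hypothetical rows: (geo code, 2016 rate). Values are placeholders, not Eurostat figures.
rows = [("EU28", 1.0), ("EL", 2.0), ("FR", 3.0), ("US", 4.0)]

# Codes to drop (assumed examples of aggregates and non-European countries).
NOT_EUROPEAN_COUNTRIES = {"EU28", "EA19", "US", "JP"}
# Codes to rename so that they match the ids used by the map.
RENAME = {"EL": "GR"}

def clean(rows, extra=()):
    """Drop unwanted codes, rename the rest, then append manually sourced rows."""
    kept = [(RENAME.get(code, code), rate)
            for code, rate in rows
            if code not in NOT_EUROPEAN_COUNTRIES]
    return kept + list(extra)

# The Swiss row stands in for the values that were added by hand.
cleaned = clean(rows, extra=[("CH", 5.0)])
print(cleaned)
```

Keeping the rules in two small tables (codes to drop, codes to rename) would make the clean-up repeatable if the source file is downloaded again.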
In [3]:
eu_data = pd.read_table('data/tsdec450.tsv', na_values=': ')
Now, we can process it to get the id for each country.
In [4]:
eu_data['info'] = eu_data['info'].apply(lambda x: x.split(",")[3])
eu_data
Out[4]:
Now we can define a function that returns the color of a country based on its unemployment rate. For this, we need a color palette: we used ColorBrewer to pick one that is color-blind friendly. To get the color for a country, we simply bin its unemployment rate into one of the colors.
We decided to use a linear scale from the lowest unemployment value to the highest in our data.
In [5]:
colors_unemployement = ['#fef0d9','#fdd49e','#fdbb84','#fc8d59','#e34a33','#b30000']
min_unemployement_eu = eu_data['2016'].min()
max_unemployement_eu = eu_data['2016'].max()
def get_color_eu(country, properties):
    values = eu_data.loc[eu_data['info'] == country, '2016'].values
    if len(values) == 0:
        return '#000000'
    unemployement_rate = values[0]
    ratio = (unemployement_rate - min_unemployement_eu) / (max_unemployement_eu - min_unemployement_eu)
    index = math.floor(ratio * len(colors_unemployement))
    if index == len(colors_unemployement):
        index = index - 1
    return colors_unemployement[index]
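To make the binning concrete, here is a minimal standalone sketch of the same linear scale. The helper name `bin_index` and the bounds 4.0 and 24.0 are assumptions chosen for illustration; they are not the real minimum and maximum of the data.

```python
import math

def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to a bin index in 0..n_bins-1 on a linear scale."""
    ratio = (value - lo) / (hi - lo)
    # floor(ratio * n_bins) equals n_bins when value == hi, so clamp it.
    return min(math.floor(ratio * n_bins), n_bins - 1)

palette = ['#fef0d9', '#fdd49e', '#fdbb84', '#fc8d59', '#e34a33', '#b30000']
lo, hi = 4.0, 24.0  # assumed example bounds
for rate in (4.0, 10.0, 24.0):
    print(rate, palette[bin_index(rate, lo, hi, len(palette))])
```

The lowest rate falls in the first (lightest) color and the highest rate is clamped into the last (darkest) one.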
We also define a function that checks whether a country is in the dataframe; if it is not, we make its overlay transparent.
In [6]:
def get_opacity_eu(country):
    values = eu_data.loc[eu_data['info'] == country, '2016'].values
    return 1 if len(values) > 0 else 0
Now we can create the Europe map, and add the overlay, using the previous functions.
The second map is only used to display the key. We used the following website to convert the topojson file to geojson: https://jeffpaine.github.io/geojson-topojson/
Note that we cannot use only the choropleth map, since there are countries for which we do not have data. We would like these countries to be transparent, so we use the choropleth map only for the key.
In [7]:
map_eu = folium.Map([51,15], tiles='cartodbpositron', zoom_start=4)
# Color of the country
folium.TopoJson(
    open('topojson/europe.topojson.json'),
    object_path='objects.europe',
    style_function=lambda feature: {
        'fillOpacity' : get_opacity_eu(feature['id']), #opacity for the fill color
        'opacity' : get_opacity_eu(feature['id']), #opacity for the borders
        'fillColor': get_color_eu(feature['id'], feature['properties']),
        'color' : 'black',
        'weight' : 1
    }
).add_to(map_eu)
map_eu.choropleth(geo_data='topojson/europe.geojson.json',data=eu_data,
                  columns=['info', '2016'],
                  key_on='feature.id',
                  fill_color='OrRd',
                  fill_opacity=0,
                  line_opacity=0.0,
                  legend_name='Percentage of unemployment in country')
For interactivity, we decided to place, for each country, a popup displaying the graph of unemployment from 1990 to 2016. Folium provides a way to add a marker that displays a popup when clicked.
Thus, we first need to get the center of each country in order to place the marker at the right coordinates. We used the geopy library to do this.
As this operation is quite slow, and not 100% accurate for every country, we decided to create the mapping using the library, save it to a JSON file, and then manually edit this JSON wherever the coordinates were inaccurate. Once this is done, we only need to reload the JSON rather than recreate it each time the notebook is run.
The file containing the mapping between country ID and name was also generated beforehand, but we chose never to regenerate it, because at this point in the program it cannot be obtained efficiently. We need it to display the country name corresponding to each code.
In [8]:
with open('id_country_mapping.json', 'r') as infile:
    id_country_mapping = json.load(infile)
if RESET:
    position_mapping = {}
    geolocator = Nominatim()
    for country_id in eu_data['info']:
        if country_id in id_country_mapping:
            location = geolocator.geocode(id_country_mapping[country_id])
            if (location):
                position_mapping[country_id] = [location.latitude,location.longitude]
    with open('position_mapping.json', 'w') as outfile:
        json.dump(position_mapping, outfile)
else:
    with open('position_mapping.json', 'r') as infile:
        position_mapping = json.load(infile)
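The "compute once, save to JSON, reload afterwards" pattern used here can be factored into a small helper. This is a sketch, not the notebook's own code: `load_or_build` and `slow_lookup` are hypothetical names, and unlike the cell above the helper also rebuilds when the cache file is missing.

```python
import json
import os
import tempfile

def load_or_build(path, build, reset=False):
    """Return the cached JSON at path, or call build() and cache its result."""
    if reset or not os.path.exists(path):
        data = build()
        with open(path, 'w') as outfile:
            json.dump(data, outfile)
        return data
    with open(path, 'r') as infile:
        return json.load(infile)

calls = []

def slow_lookup():
    # Stand-in for the slow geocoding loop; the coordinates are placeholders.
    calls.append(1)
    return {"XX": [0.0, 0.0]}

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'positions.json')
    first = load_or_build(cache, slow_lookup)   # builds and writes the file
    second = load_or_build(cache, slow_lookup)  # only reads the file

print(first == second, len(calls))
```

Because the cache is a plain JSON file, it can still be corrected by hand between runs, as long as `reset` stays False.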
Now that we have the positions, we create and save a graph for each country. We use the previously defined id_country_mapping to get the full name of each country.
In [9]:
if RESET:
    for index, row in eu_data.iterrows():
        if row.values[0] in id_country_mapping:
            row_values = row.iloc[1:].astype(float)
            plot = row_values.transpose().plot()
            plt.ylim(0,30)
            plt.xlabel('Year')
            plt.ylabel('% of unemployed')
            plt.title('Graph of unemployment in ' + id_country_mapping[row.values[0]])
            plt.savefig('eu_graphs/graph_' + row.values[0] + '.png')
            plt.close()
We decided to host all the graphs on our personal server in order to keep the size of the maps small.
Now that we have both the graphs and the positions, we can, for each country, place a custom marker (the country's flag) that displays the corresponding graph when clicked.
You can see the map here :
In [10]:
all_markers = folium.FeatureGroup("Markers")
for country_id in eu_data['info']:
    if country_id in position_mapping:
        location = position_mapping[country_id]
        if location:
            html = '<img src="http://feudal-ambitions.com/ADA/HW3/eu_graphs/graph_' + country_id + '" width=450 height=300>'
            iframe = folium.IFrame(html, width=470, height=320)
            popup = folium.Popup(iframe, max_width=2650)
            custom_icon = folium.features.CustomIcon("http://feudal-ambitions.com/ADA/HW3/eu_flags/" + country_id.lower() + ".png", icon_size=(25, 25))
            all_markers.add_child(folium.Marker(location, popup=popup, icon=custom_icon))
map_eu.add_child(all_markers)
map_eu
Out[10]:
In this map, we can see that southern and western European countries have the highest unemployment on the continent. Some of these countries, such as Greece or Spain, recently experienced financial crises, which explains these large numbers. Moreover, thanks to our interactive graphs, we can visualise the evolution of many countries, such as Poland, which today has a low unemployment rate but historically experienced very high unemployment.
When it comes to Switzerland, we can see that it is in the lower unemployment bracket. In fact, the Swiss unemployment rate is among the lowest and has historically been very low, even during crises that affected most other European countries.
For this exercise, we generate a .csv file from the amstat Swiss statistics website. The file we get from the site is the one showing unemployment rates. The website defines these rates as the number of unemployed people divided by the active population of the canton, multiplied by 100. We manually modified the file to remove some useless title columns and lines, and replaced the names of the cantons with their codes.
In [11]:
ch_data = pd.read_csv('data/unemployed_switzerland.csv')
In [12]:
ch_data
Out[12]:
We define similar functions for constructing the mappings from cantons to their IDs and to their locations. Almost everything is the same as for the EU countries. As for Europe, we generate a geojson to display the choropleth map. For the markers on the Swiss map, we use each canton's coat of arms.
In [13]:
colors_unemployement = ['#fef0d9','#fdd49e','#fdbb84','#fc8d59','#e34a33','#b30000']
In [14]:
map_ch = folium.Map([46.8,8.3], tiles='cartodbpositron', zoom_start=8)
map_ch.choropleth(geo_data='topojson/ch-cantons.geojson.json',data=ch_data,
                  columns=['Canton', '2017'],
                  key_on='feature.id',
                  fill_color='OrRd',
                  fill_opacity=1,
                  line_opacity=1,
                  legend_name='Percentage of unemployment in canton')
In [15]:
with open('id_canton_mapping.json', 'r') as infile:
    id_canton_mapping = json.load(infile)
if RESET:
    position_canton_mapping = {}
    geolocator = Nominatim()
    for canton_id in id_canton_mapping:
        location = geolocator.geocode(id_canton_mapping[canton_id])
        if (location):
            position_canton_mapping[canton_id] = [location.latitude,location.longitude]
    with open('position_canton_mapping.json', 'w') as outfile:
        json.dump(position_canton_mapping, outfile)
else:
    with open('position_canton_mapping.json', 'r') as infile:
        position_canton_mapping = json.load(infile)
In [16]:
if RESET:
    for index, row in ch_data.iterrows():
        if row.values[0] in id_canton_mapping:
            row_values = row.iloc[1:].astype(float)
            plot = row_values.transpose().plot()
            plt.ylim(0,30)
            plt.xlabel('Year')
            plt.ylabel('% of unemployed')
            plt.title('Graph of unemployment in ' + id_canton_mapping[row.values[0]])
            plt.savefig('ch_graphs/graph_' + row.values[0] + '.png')
            plt.close()
You can see the map here :
In [17]:
all_markers = folium.FeatureGroup("Markers")
for canton_id in id_canton_mapping:
    location = position_canton_mapping[canton_id]
    html = '<img src="http://feudal-ambitions.com/ADA/HW3/ch_graphs/graph_' + canton_id + '" width=450 height=300>'
    iframe = folium.IFrame(html, width=470, height=320)
    popup = folium.Popup(iframe, max_width=2650)
    custom_icon = folium.features.CustomIcon("http://feudal-ambitions.com/ADA/HW3/ch_flags/" + canton_id + ".png", icon_size=(25, 30))
    all_markers.add_child(folium.Marker(location, popup=popup, icon=custom_icon))
map_ch.add_child(all_markers)
map_ch
Out[17]:
As before, we added unemployment evolution graphs for interactivity.
We can note on this map that western regions seem to have higher unemployment rates than eastern regions. As a general rule, the cantons of French-speaking Romandie seem to experience higher unemployment rates. Zurich, which is home to the largest city of the country, also has an elevated unemployment rate. The cantons with the least unemployment are mostly in the inner or mountainous Swiss regions, and all have small, rural populations.
Now we would like to view statistics that also include the jobseekers who are currently employed. Unfortunately, the website does not provide the rate of all jobseekers in the population, so we calculate these rates ourselves.
First we take the number of all jobseekers and the number of jobless jobseekers (counts, not rates). Using the number of unemployed people and the rates from before, we calculate the active population of each canton. Then we can calculate the rate of all jobseekers (employed or not) in each canton.
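The arithmetic behind this step can be checked on a single made-up canton. In the sketch below the function names and the figures (3000 unemployed, a 3.0% rate, 4500 jobseekers) are assumptions for illustration only, not values from the amstat files.

```python
def active_population(unemployed, unemployment_rate):
    """Invert rate = unemployed / active_population * 100."""
    return unemployed * 100 / unemployment_rate

def jobseeker_rate(jobseekers, unemployed, unemployment_rate):
    """Rate of all jobseekers, using the active population derived above."""
    return jobseekers * 100 / active_population(unemployed, unemployment_rate)

# Made-up canton: 3000 unemployed at a 3.0% rate, 4500 jobseekers in total.
print(active_population(3000, 3.0))     # 100000.0
print(jobseeker_rate(4500, 3000, 3.0))  # 4.5
```

Since every jobless jobseeker is also a jobseeker, the resulting rate can never be lower than the unemployment rate it was derived from.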
In [18]:
ch_data2 = pd.read_csv('data/unemployed_num_switzerland.csv', thousands="'")
ch_data3 = pd.read_csv('data/jobseekers_switzerland.csv', thousands="'")
ch_data4 = pd.read_csv('data/unemployed_foreigner_swiss.csv', thousands="'")
In [19]:
active_pop = ((ch_data2.iloc[:, 1:] * 100) / ch_data.iloc[:, 1:])
ch_jobseeker_rate = (ch_data3.iloc[:, 1:] * 100) / active_pop
ch_jobseeker_rate = pd.concat((ch_data['Canton'], ch_jobseeker_rate), axis=1)
ch_jobseeker_rate
Out[19]:
As before, we redefine the necessary functions with small changes and plot our map.
In [20]:
colors_unemployement = ['#fef0d9','#fdd49e','#fdbb84','#fc8d59','#e34a33','#b30000']
In [21]:
map_ch_j = folium.Map([46.8,8.3], tiles='cartodbpositron', zoom_start=8)
map_ch_j.choropleth(geo_data='topojson/ch-cantons.geojson.json',data=ch_jobseeker_rate,
                    columns=['Canton', '2017'],
                    key_on='feature.id',
                    fill_color='OrRd',
                    fill_opacity=1,
                    line_opacity=1,
                    legend_name='Percentage of jobseekers in canton')
In [22]:
if RESET:
    for index, row in ch_jobseeker_rate.iterrows():
        row_values = row.iloc[1:].astype(float)
        plot = row_values.transpose().plot()
        plt.ylim(0,30)
        plt.xlabel('Year')
        plt.ylabel('% of jobseekers')
        plt.title('Graph of jobseekers in ' + id_canton_mapping[row.values[0]])
        plt.savefig('ch_graphs/graph_j_' + row.values[0] + '.png')
        plt.close()
In [23]:
all_markersj = folium.FeatureGroup("Markers")
for canton_id in id_canton_mapping:
    location = position_canton_mapping[canton_id]
    html = '<img src="http://feudal-ambitions.com/ADA/HW3/ch_graphs/graph_j_' + canton_id + '" width=450 height=300>'
    iframe = folium.IFrame(html, width=470, height=320)
    popup = folium.Popup(iframe, max_width=2650)
    custom_icon = folium.features.CustomIcon("http://feudal-ambitions.com/ADA/HW3/ch_flags/" + canton_id + ".png", icon_size=(25, 30))
    all_markersj.add_child(folium.Marker(location, popup=popup, icon=custom_icon))
map_ch_j.add_child(all_markersj)
map_ch_j
Out[23]:
In this case we see that the rates are generally higher, which makes sense since we add the employed jobseekers to the unemployed people. Moreover, Romandie still has the highest rates, although this can be explained by its already high unemployment figures. We can also see some changes in how the cantons rank, for example for Valais.
For this exercise, we again use the amstat Swiss statistics website to generate the data we need. This time we generate unemployment numbers split by nationality, and later further split by age.
In [24]:
ch_data4
ch_foreigner = ch_data4[ch_data4.Nationalité == 'Etrangers']
ch_foreigner.index = range(0,len(ch_foreigner))
ch_foreigner = ch_foreigner.drop('Nationalité', axis=1)
ch_swiss = ch_data4[ch_data4.Nationalité == 'Suisses']
ch_swiss.index = range(0,len(ch_swiss))
ch_swiss = ch_swiss.drop('Nationalité', axis=1)
In [25]:
ch_swiss_rate = (ch_swiss.iloc[:, 1:] * 100) / active_pop
ch_swiss_rate = pd.concat((ch_data['Canton'], ch_swiss_rate), axis=1)
ch_foreigner_rate = (ch_foreigner.iloc[:, 1:] * 100) / active_pop
ch_foreigner_rate = pd.concat((ch_data['Canton'], ch_foreigner_rate), axis=1)
In [26]:
ch_swiss
Out[26]:
As always, we decided to show the graphs of unemployment, this time, for both Swiss and foreigners.
In [27]:
if RESET:
    for index, row in ch_swiss_rate.iterrows():
        row_values = row.iloc[2:].astype(float)
        plot = row_values.transpose().plot()
        row_values = ch_foreigner_rate.iloc[index][2:].astype(float)
        plot = row_values.transpose().plot()
        plt.ylim(0,10)
        plt.legend(['Swiss', 'Foreigners'])
        plt.xlabel('Year')
        plt.ylabel('% of jobseekers')
        plt.title('Graph of jobseekers in ' + id_canton_mapping[row.values[0]])
        plt.savefig('ch_graphs/graph_s_f_' + row.values[0] + '.png')
        plt.close()
In order to display as much information as possible in a single map, we decided to show a small bar plot for each canton with the unemployment rates of Swiss and foreign nationals, as well as a color for each canton indicating the difference.
First, we need to create these small bar plots and save them. The resulting graphs are then hosted on our personal server. As always, we only create them if RESET is set to True.
In [28]:
if RESET:
    for index, row in ch_swiss_rate.iterrows():
        row_values = [row.iloc[-1], ch_foreigner_rate.iloc[index][-1]]
        plot = plt.bar([0,1], row_values)
        plot[0].set_color('b')
        plot[1].set_color('orange')
        plt.ylim(0,6)
        plt.axis('off')
        plt.savefig('icon_ch/icon_' + row.values[0] + '.png')
        plt.close()
Now, we can write the function that outputs a color based on the difference between the two unemployment rates. We decided to use two linear scales, one for the negative values and one for the positive ones. This ensures that near-white colors correspond to a small difference, and saturated ones to a large difference, either positive or negative.
Blue colors indicate that the unemployment rate is higher for Swiss nationals, while red colors indicate that the rate is higher for foreign nationals.
In [29]:
colors_s_f_neg = ['#b2182b','#d6604d','#f0b572','#fddbc7']
colors_s_f_pos = ['#d1e5f0','#92c5de','#4393c3','#2166ac']
diff_ch_for = ch_swiss_rate.drop('Canton', axis=1).subtract(ch_foreigner_rate.drop('Canton', axis=1))
diff_ch_for = pd.concat((ch_data['Canton'], diff_ch_for), axis=1)
min_s_f_ch = diff_ch_for['2017'].min()
max_s_f_ch = diff_ch_for['2017'].max()
def get_color_ch_s_f(canton, properties):
    # Take the single matching value as a scalar rather than a one-element array
    diff_ch_s_f = diff_ch_for.loc[diff_ch_for['Canton'] == canton, '2017'].values[0]
    if (diff_ch_s_f < 0):
        ratio = (diff_ch_s_f - min_s_f_ch) / (-min_s_f_ch)
        colors = colors_s_f_neg
    else:
        ratio = (diff_ch_s_f) / (max_s_f_ch)
        colors = colors_s_f_pos
    index = math.floor(ratio * len(colors))
    if index == len(colors):
        index = index - 1
    return colors[index]
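The two-scale idea can be tried in isolation. The following sketch reuses the two palettes from the cell above but takes the bounds as arguments; `diverging_color` is a hypothetical name, and the sketch assumes that the lower bound is negative and the upper bound positive.

```python
import math

colors_neg = ['#b2182b', '#d6604d', '#f0b572', '#fddbc7']  # dark red to light
colors_pos = ['#d1e5f0', '#92c5de', '#4393c3', '#2166ac']  # light to dark blue

def diverging_color(diff, lo, hi):
    """Color for diff in [lo, hi], assuming lo < 0 < hi."""
    if diff < 0:
        ratio = (diff - lo) / (-lo)  # 0 at lo, approaching 1 near zero
        colors = colors_neg
    else:
        ratio = diff / hi            # 0 at zero, 1 at hi
        colors = colors_pos
    # Clamp so that diff == hi lands in the last bin instead of overflowing.
    index = min(math.floor(ratio * len(colors)), len(colors) - 1)
    return colors[index]

for diff in (-2.0, -0.1, 0.0, 1.0):
    print(diff, diverging_color(diff, -2.0, 1.0))
```

Scaling each side by its own extreme keeps the lightest colors next to zero on both sides, even when the negative and positive ranges have different widths.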
As always, we can now create the map.
In [30]:
map_ch_s_f = folium.Map([46.8,8.3], tiles='cartodbpositron', zoom_start=8)
# Color of the cantons
folium.TopoJson(
    open('topojson/ch-cantons.topojson.json'),
    object_path='objects.cantons',
    style_function=lambda feature: {
        'fillOpacity' : 1,
        'opacity' : 1,
        'fillColor': get_color_ch_s_f(feature['id'], feature['properties']),
        'color' : 'black',
        'weight' : 1
    }
).add_to(map_ch_s_f)
Out[30]:
And now we add the 'markers' (that are the small bar plots), that display the graphs.
In [31]:
all_markers_s_f = folium.FeatureGroup("Markers")
for canton_id in id_canton_mapping:
    location = position_canton_mapping[canton_id]
    html = '<img src="http://feudal-ambitions.com/ADA/HW3/ch_graphs/graph_s_f_' + canton_id + '" width=450 height=300>'
    iframe = folium.IFrame(html, width=470, height=320)
    popup = folium.Popup(iframe, max_width=2650)
    custom_icon = folium.features.CustomIcon("http://feudal-ambitions.com/ADA/HW3/icon_ch/icon_"+ canton_id +".png", icon_size=(30, 60))
    all_markers_s_f.add_child(folium.Marker(location, popup=popup, icon=custom_icon))
map_ch_s_f.add_child(all_markers_s_f)
map_ch_s_f
Out[31]:
We can see that there is some variability between cantons: although slightly more cantons have a higher Swiss unemployment rate, there is no clear rule as to which cantons have more unemployed people of either category. The biggest differences are seen in Valais and in Jura. This could be due to the fact that many French people work in Switzerland's border cantons, leaving fewer job placements for Swiss nationals. On the other hand, Valais is fairly difficult to access from neighboring countries, meaning that Swiss nationals can fill most of the placement opportunities in the canton.
For this second part, we used the amstat data containing the age information for January 2017.
First we clean the data by removing unnecessary titles and columns, as before. After that we split the data into 6 dataframes: one for each combination of the 2 nationalities and the 3 age categories.
We then plot the data of all the dataframes grouped by canton.
In [32]:
ch_jobseeker_age = pd.read_csv('data/jobseeker_ch_for_age.csv', thousands="'")
In [33]:
# clean the data
ch_jobseeker_age = ch_jobseeker_age[ch_jobseeker_age['Classes dâge 15-24, 25-49, 50 ans et plus'] != 'Total']
ch_jobseeker_age = ch_jobseeker_age[ch_jobseeker_age['Nationalité'] != 'Total']
ch_jobseeker_age = ch_jobseeker_age[ch_jobseeker_age['Canton'] != 'Total']
# Split by Nationality
ch_jobseeker_age_foreigner = ch_jobseeker_age[ch_jobseeker_age['Nationalité'] == 'Etrangers']
ch_jobseeker_age_swiss = ch_jobseeker_age[ch_jobseeker_age['Nationalité'] == 'Suisses']
In [34]:
ch_jobseeker_age_foreigner_1 = (ch_jobseeker_age_foreigner[ch_jobseeker_age_foreigner['Classes dâge 15-24, 25-49, 50 ans et plus'] == '1'])
ch_jobseeker_age_foreigner_1 = ch_jobseeker_age_foreigner_1.drop(['Nationalité','Classes dâge 15-24, 25-49, 50 ans et plus', 'Unnamed: 3'], axis=1)
ch_jobseeker_age_foreigner_2 = (ch_jobseeker_age_foreigner[ch_jobseeker_age_foreigner['Classes dâge 15-24, 25-49, 50 ans et plus'] == '2'])
ch_jobseeker_age_foreigner_2 = ch_jobseeker_age_foreigner_2.drop(['Nationalité','Classes dâge 15-24, 25-49, 50 ans et plus', 'Unnamed: 3'], axis=1)
ch_jobseeker_age_foreigner_3 = (ch_jobseeker_age_foreigner[ch_jobseeker_age_foreigner['Classes dâge 15-24, 25-49, 50 ans et plus'] == '3'])
ch_jobseeker_age_foreigner_3 = ch_jobseeker_age_foreigner_3.drop(['Nationalité','Classes dâge 15-24, 25-49, 50 ans et plus', 'Unnamed: 3'], axis=1)
In [35]:
ch_jobseeker_age_swiss_1 = (ch_jobseeker_age_swiss[ch_jobseeker_age_swiss['Classes dâge 15-24, 25-49, 50 ans et plus'] == '1'])
ch_jobseeker_age_swiss_1 = ch_jobseeker_age_swiss_1.drop(['Nationalité','Classes dâge 15-24, 25-49, 50 ans et plus', 'Unnamed: 3'], axis=1)
ch_jobseeker_age_swiss_2 = (ch_jobseeker_age_swiss[ch_jobseeker_age_swiss['Classes dâge 15-24, 25-49, 50 ans et plus'] == '2'])
ch_jobseeker_age_swiss_2 = ch_jobseeker_age_swiss_2.drop(['Nationalité','Classes dâge 15-24, 25-49, 50 ans et plus', 'Unnamed: 3'], axis=1)
ch_jobseeker_age_swiss_3 = (ch_jobseeker_age_swiss[ch_jobseeker_age_swiss['Classes dâge 15-24, 25-49, 50 ans et plus'] == '3'])
ch_jobseeker_age_swiss_3 = ch_jobseeker_age_swiss_3.drop(['Nationalité','Classes dâge 15-24, 25-49, 50 ans et plus', 'Unnamed: 3'], axis=1)
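The six near-identical filters above can also be expressed as a single grouping step. The following is a minimal sketch on plain Python records rather than on the actual dataframe; the helper `split_by`, the field names `nationality` and `age_class`, the canton codes and the counts are all placeholders chosen for the example.

```python
from collections import defaultdict

# Hypothetical records standing in for the rows of the cleaned table.
records = [
    {"Canton": "AA", "nationality": "Etrangers", "age_class": "1", "2017": 10},
    {"Canton": "AA", "nationality": "Suisses", "age_class": "1", "2017": 20},
    {"Canton": "BB", "nationality": "Etrangers", "age_class": "1", "2017": 30},
    {"Canton": "BB", "nationality": "Suisses", "age_class": "2", "2017": 40},
]

def split_by(records, *keys):
    """Group records by the tuple of values found under the given keys."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[key] for key in keys)].append(record)
    return dict(groups)

groups = split_by(records, "nationality", "age_class")
for key, rows in sorted(groups.items()):
    print(key, [row["Canton"] for row in rows])
```

Each (nationality, age class) pair then plays the role of one of the six named dataframes, and adding a category requires no new code.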
In [36]:
width = 1/8
colors_foreigners = ['#b2182b','#d6604d','#f0b572']
colors_swiss = ['#2166ac', '#4393c3', '#92c5de']
fig, ax = plt.subplots(figsize=(20,7))
pos = list(range(len(ch_jobseeker_age_foreigner_1['2017'])))
pop = active_pop['2017']
values = ch_jobseeker_age_foreigner_1['2017'].astype(float).values
plt.bar(pos, values/pop*100, width, color=colors_foreigners[0])
values = ch_jobseeker_age_foreigner_2['2017'].astype(float).values
plt.bar([p + width for p in pos], values/pop*100, width, color=colors_foreigners[1])
values = ch_jobseeker_age_foreigner_3['2017'].astype(float).values
plt.bar([p + width*2 for p in pos], values/pop*100, width, color=colors_foreigners[2])
values = ch_jobseeker_age_swiss_1['2017'].astype(float).values
plt.bar([p + width*3.5 for p in pos], values/pop*100, width, color=colors_swiss[0])
values = ch_jobseeker_age_swiss_2['2017'].astype(float).values
plt.bar([p + width*4.5 for p in pos], values/pop*100, width, color=colors_swiss[1])
values = ch_jobseeker_age_swiss_3['2017'].astype(float).values
plt.bar([p + width*5.5 for p in pos], values/pop*100, width, color=colors_swiss[2])
ax.set_ylabel('Jobseekers (%)')
ax.set_title('Jobseekers by nationality and age class')
ax.set_xticks([p + 3 * width for p in pos])
plt.legend(['Foreigner: 15-24', 'Foreigner: 25-49', 'Foreigner: 50 and more', 'Swiss: 15-24', 'Swiss: 25-49', 'Swiss: 50 and more',], loc='upper left')
canton_names = ch_jobseeker_age_foreigner_1['Canton'].values
ax.set_xticklabels(canton_names)
for tick in ax.get_xticklabels(): #We rotate the ticks for better readability
    tick.set_rotation(90)
plt.xlim(min(pos)-width, max(pos)+width*6)
plt.grid(axis='y')
plt.show()
We can see that the difference between the age categories '25-49' and '50 and more' is bigger among foreigners than among the Swiss. This could be because foreigners over 50 years old tend to go back to their countries of origin.