The macroeconomic goal of full employment comes with the negative externality of gender inequality, as female workers generally have fewer employment opportunities than their male counterparts. This project analyses how female unemployment rates evolved over the 20 years from 1995 to 2014 in South America, a continent characterized by its machista mentality and an example comparable to many developing regions around the world. The questions we address are as follows:
We tackle these questions in three steps. First, we produce regression plots to analyse the gender gap in unemployment across five South American countries with different macroeconomic situations: Argentina, Colombia, Ecuador, Peru and Uruguay. Second, we create a choropleth map of female unemployment rates in South America for every year. Finally, we generate a timelapse of the map to show how female unemployment rates have changed over the years.
Our aim is to raise awareness of the ongoing issue of female labor force participation and economic gender inequality in the subcontinent. The analysis also shows which countries have their female labor force relatively more involved in the economy. This opens the door to further research on the factors that increase or decrease female labor force participation, whether across South America or in other, similar developing countries.
In [1]:
import sys                       # system module
import pandas as pd              # data package
import matplotlib.pyplot as plt  # graphics module
import datetime as dt            # date and time module
import numpy as np               # foundation for pandas
import seaborn as sns            # more matplotlib graphics (seaborn.apionly no longer exists)
from IPython.display import YouTubeVideo  # visualizing YouTube videos
from IPython.display import VimeoVideo    # visualizing Vimeo videos
from IPython.display import display
# plotly imports for the choropleth map
from plotly.offline import iplot  # offline plotting function
import plotly.graph_objs as go    # figure objects
import plotly                     # to init notebook mode
# online API, used only to save the map frames
try:
    import chart_studio.plotly as py  # plotly >= 4 moved it to the chart-studio package
except ImportError:
    import plotly.plotly as py        # older plotly versions
# these lines make our graphics show up in the notebook
plotly.offline.init_notebook_mode()
%matplotlib inline
print('Today: ', dt.date.today())
About the Sources
The data was accessed from two sources: the Inter-American Development Bank and the World Bank. We obtained the female and male unemployment rates from the former, and the GDP growth rates from the latter.
The Inter-American Development Bank only allowed us to download 5 countries and 5 years at a time, so we downloaded the data as many times as necessary to cover 1990-2014 for 10 South American countries and saved it as one Excel file. The World Bank file covered every country in the world; since we only needed the same 10 South American countries, we kept only those countries in the downloaded Excel file.
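Stitching the IDB chunks together by hand is error-prone. A minimal sketch of how it could be done in pandas instead, assuming every chunk has a `Country` column, an `Indicator Name` column and one column per year (the layout the notebook reads below). The helper name `combine_chunks` is hypothetical and the numbers are placeholders, not real data.

```python
import pandas as pd

def combine_chunks(chunks):
    """Stitch IDB-style download chunks (some countries x some years) into one wide table.

    Assumes each chunk has 'Country' and 'Indicator Name' columns plus one
    column per year (an assumption based on how the notebook reads the file).
    """
    keys = ["Country", "Indicator Name"]
    # index each chunk by country/indicator so chunks covering different years align
    indexed = [c.set_index(keys) for c in chunks]
    # stack the chunks, then collapse duplicate country/indicator rows,
    # keeping the first non-missing value in each year column
    combined = pd.concat(indexed).groupby(level=keys).first()
    # order the year columns chronologically
    return combined[sorted(combined.columns)].reset_index()

# two hypothetical chunks: same country, different year ranges (placeholder values)
a = pd.DataFrame({"Country": ["ARGENTINA"], "Indicator Name": ["Unemployment, female"],
                  1995: [1.0], 1996: [2.0]})
b = pd.DataFrame({"Country": ["ARGENTINA"], "Indicator Name": ["Unemployment, female"],
                  1997: [3.0]})
combined = combine_chunks([a, b])
print(combined)
```

Grouping on the country/indicator keys means the order in which the chunks were downloaded does not matter.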
Accessing the Specific Excel Files
The data files are available in our GitHub repository. The data has not been edited beyond restricting it to the South American countries; all data shaping is done in this Python project.
In [2]:
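# Links for the data (served through RawGit, which has since shut down;
# the links may need replacing with raw GitHub URLs to the same files)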
GrowthRates = 'https://rawgit.com/malusg/DataBootcamp_FinalProject/master/Growth_Rates_SA.xlsx'
UnemploymentRates = 'https://rawgit.com/malusg/DataBootcamp_FinalProject/master/Unemployment_Rates_SA1.xls'
In [3]:
GDP = pd.read_excel(GrowthRates)
GDP = GDP.drop(['Indicator Code', 'Indicator Name'], axis=1) # Irrelevant variables
# set the index to be country name/code so that just the years are columns
GDP = GDP.set_index(["Country Name", "Country Code"])
# make sure columns are now integers as this is needed to merge the data (below)
GDP.columns = GDP.columns.astype(int)
# only use data starting at 1995
GDP = GDP[list(range(1995, 2015))]
# transpose (years as index), unstack into a long Series indexed by (name, code, year),
# then pop country code out to be a column
GDP = GDP.T.unstack().reset_index(level="Country Code")
# clean up index level names and column names
GDP.index.names = ["Country", "Year"]
GDP.columns = ["ISO", "GrowthRate"]
GDP.head()
Out[3]:
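The transpose/unstack/reset_index chain above turns the World Bank's wide table (one column per year) into a long one (one row per country-year). As a sketch, the same reshape can be written with `DataFrame.melt`, which may be easier to follow; the tiny frame below uses placeholder values, with the column names of the World Bank file as used above.

```python
import pandas as pd

# placeholder wide table in the World Bank layout: one row per country, one column per year
wide = pd.DataFrame({"Country Name": ["Argentina", "Peru"],
                     "Country Code": ["ARG", "PER"],
                     1995: [1.0, 2.0],
                     1996: [3.0, 4.0]})

# melt the year columns into rows: one row per (country, year)
long = wide.melt(id_vars=["Country Name", "Country Code"],
                 var_name="Year", value_name="GrowthRate")
long["Year"] = long["Year"].astype(int)  # integer years, as in the notebook
# mirror the notebook's final shape: (Country, Year) index, ISO + GrowthRate columns
long = (long.rename(columns={"Country Name": "Country", "Country Code": "ISO"})
            .set_index(["Country", "Year"])
            .sort_index())
print(long)
```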
In [4]:
UNEMP = pd.read_excel(UnemploymentRates)
# fix capitalization of country names
UNEMP['Country'] = UNEMP['Country'].str.capitalize()
# set index to be Country/variable
UNEMP = UNEMP.set_index(['Country', 'Indicator Name'])
# extract only the columns that are years starting from 1995
UNEMP = UNEMP[list(range(1995, 2015))]
# pull years into index, then rotate the indicators up top
UNEMP = UNEMP.stack().unstack(level="Indicator Name")
# clean up index level names
UNEMP.index.names = ["Country", "Year"]
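# clean up column names (assumes the female indicator sorts before the male one,
# since unstack places the indicators in sorted order)
UNEMP.columns = ["UnemploymentFemale", "UnemploymentMale"]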
UNEMP.head()
Out[4]:
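Since the project centres on the gender gap, it can help to compute it explicitly. A minimal sketch, assuming a frame shaped like `UNEMP` (a (Country, Year) index with `UnemploymentFemale` and `UnemploymentMale` in percent). The gap is measured here in percentage points and as a female-to-male ratio; these are two common definitions, not the only possible ones. The values are placeholders.

```python
import pandas as pd

# placeholder frame with the same shape as UNEMP: (Country, Year) index, rates in %
idx = pd.MultiIndex.from_tuples([("Peru", 1995), ("Peru", 1996)], names=["Country", "Year"])
unemp = pd.DataFrame({"UnemploymentFemale": [10.0, 9.0],
                      "UnemploymentMale": [8.0, 6.0]}, index=idx)

# absolute gap in percentage points: positive means women face higher unemployment
unemp["GapPP"] = unemp["UnemploymentFemale"] - unemp["UnemploymentMale"]
# relative gap: how many times higher the female rate is than the male rate
unemp["GapRatio"] = unemp["UnemploymentFemale"] / unemp["UnemploymentMale"]
print(unemp)
```

The two measures can disagree: a gap of 2 points is large when male unemployment is 4% but small when it is 20%.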
In [5]:
merged = pd.merge(UNEMP, GDP,
                  how="outer",      # keep all rows (country-years) from both tables
                  left_index=True,  # use the (Country, Year) index as merge keys
                  right_index=True)
merged.head()
Out[5]:
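Because the merge above matches on country names that come from two different institutions, a spelling difference between them would silently produce two half-empty rows instead of one complete row. A sketch of how `indicator=True` can flag such mismatches; the two small frames and the mismatched spelling are hypothetical.

```python
import pandas as pd

# hypothetical IDB-style and World Bank-style frames indexed by (Country, Year)
left = pd.DataFrame({"UnemploymentFemale": [10.0, 12.0]},
                    index=pd.MultiIndex.from_tuples([("Peru", 2000), ("Venezuela", 2000)],
                                                    names=["Country", "Year"]))
right = pd.DataFrame({"GrowthRate": [3.0, 1.0]},
                     index=pd.MultiIndex.from_tuples([("Peru", 2000), ("Venezuela, RB", 2000)],
                                                     names=["Country", "Year"]))

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
check = pd.merge(left, right, how="outer", left_index=True, right_index=True, indicator=True)
# rows without a partner point to naming mismatches between the sources
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```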
Presenting the relationship between female and male unemployment rates and economic growth rates in a sample of 5 South American countries: Argentina, Colombia, Ecuador, Peru and Uruguay.
We chose this sample from the subcontinent because of data availability, the countries' similar culture, their shared status as emerging-market economies, and their geographical proximity.
In [6]:
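# matplotlib 3.6 renamed the seaborn styles to "seaborn-v0_8-*"; use whichever name exists
style = "seaborn-v0_8-muted" if "seaborn-v0_8-muted" in plt.style.available else "seaborn-muted"
plt.style.use(style)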
In [7]:
# Scatter plots with fitted regression lines for the 5 selected countries
countries = {"ARG": "ARGENTINA", "COL": "COLOMBIA", "ECU": "ECUADOR",
             "PER": "PERU", "URY": "URUGUAY"}
for iso, name in countries.items():
    country = merged[merged["ISO"] == iso]  # all years for one country
    fig, ax = plt.subplots()
    sns.regplot(x="GrowthRate", y="UnemploymentFemale", data=country, color='orange',
                ax=ax, ci=None, label="Female Unemployment Rate (%)")
    sns.regplot(x="GrowthRate", y="UnemploymentMale", data=country, color='blue',
                ax=ax, ci=None, label="Male Unemployment Rate (%)")
    ax.set_title(name, fontsize=20)
    ax.set_xlabel('GDP Growth Rate (%)')
    ax.set_ylabel('Unemployment Rate (%)')
    # place the legend outside the plot, to the right
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True)
Out[7]:
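`sns.regplot` draws an ordinary least-squares line but does not report its coefficients. A sketch of how the slope behind each plotted line could be computed with `numpy.polyfit` (degree 1), assuming a frame with the notebook's column names. The helper `ols_slopes` is hypothetical, and the placeholder data lie exactly on straight lines so the result is easy to check.

```python
import numpy as np
import pandas as pd

def ols_slopes(df):
    """Return the OLS slope of each unemployment series on GDP growth.

    A slope of -0.5 reads as: one extra point of GDP growth is associated with
    half a point lower unemployment (an association, not a causal effect).
    """
    d = df.dropna(subset=["GrowthRate", "UnemploymentFemale", "UnemploymentMale"])
    slopes = {}
    for col in ["UnemploymentFemale", "UnemploymentMale"]:
        slope, intercept = np.polyfit(d["GrowthRate"], d[col], deg=1)
        slopes[col] = slope
    return slopes

# placeholder data on exact lines: female = 12 - 0.5*g, male = 8 - 0.25*g
g = np.array([-2.0, 0.0, 2.0, 4.0])
demo = pd.DataFrame({"GrowthRate": g,
                     "UnemploymentFemale": 12 - 0.5 * g,
                     "UnemploymentMale": 8 - 0.25 * g})
print(ols_slopes(demo))
```

Comparing the female and male slopes for a country indicates whether growth is associated with falling unemployment equally for both groups.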
In [8]:
Years = {}  # maps each year to that year's cross-section; used by the choropleth function
for year in range(1995, 2015):  # loop to extract every year
    Years[year] = merged.xs(year, level="Year", axis=0)
In [9]:
Years[2002] # choose the year you wish to see
Out[9]:
In [10]:
def map_for_year(year):
    """Return a plotly choropleth of female unemployment in South America for one year."""
    # sequential color scale, from pale yellow (low) to orange (high)
    scl = [[0.0, 'rgb(238, 232, 205)'], [0.2, 'rgb(238, 232, 170)'], [0.4, 'rgb(238, 221, 130)'],
           [0.6, 'rgb(238, 201, 0)'], [0.8, 'rgb(238, 173, 14)'], [1.0, 'rgb(238, 136, 51)']]
    data = Years[year]  # cross-section for the chosen year, indexed by country
    # create a trace for the map
    trace = dict(type="choropleth",
                 locations=data["ISO"],  # ISO-3 country codes
                 z=data["UnemploymentFemale"],
                 colorscale=scl,
                 autocolorscale=False,
                 text=data.index,  # country names shown on hover
                 colorbar=dict(title="Female Unemployment (%)"),
                 marker=dict(line=dict(color='rgb(105,105,105)', width=1)))  # grey borders
    # map layout restricted to South America
    layout = dict(geo={"scope": "south america", "resolution": 50},
                  width=750, height=550,
                  title='Female Unemployment Rate in South America ({0})'.format(year))
    map_fig = go.Figure(data=[trace], layout=layout)
    return map_fig
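As written, `map_for_year` lets plotly pick the color range from each year's own data, so the same shade can stand for different rates in different frames of the timelapse. A sketch of one way to fix the range across all years, using the choropleth trace's `zmin`/`zmax` attributes. The helper `shared_color_range` and the placeholder data are illustrative; only the trace dictionary is built here, so no plotly import is needed.

```python
import pandas as pd

def shared_color_range(years):
    """Return (zmin, zmax) of female unemployment across every yearly frame.

    `years` is assumed to map year -> DataFrame with an 'UnemploymentFemale'
    column, like the notebook's Years dictionary.
    """
    values = pd.concat([df["UnemploymentFemale"] for df in years.values()])
    return float(values.min()), float(values.max())  # missing values are skipped

# placeholder stand-in for Years (two years, two countries)
years = {
    1995: pd.DataFrame({"ISO": ["ARG", "PER"], "UnemploymentFemale": [15.0, 9.0]}),
    1996: pd.DataFrame({"ISO": ["ARG", "PER"], "UnemploymentFemale": [20.0, 8.0]}),
}
zmin, zmax = shared_color_range(years)

# fixing zmin/zmax makes a given color mean the same rate in every frame;
# these keys could be added to the trace dict built in map_for_year
trace = dict(type="choropleth",
             locations=years[1996]["ISO"],
             z=years[1996]["UnemploymentFemale"],
             zmin=zmin, zmax=zmax, autocolorscale=False)
print(zmin, zmax)
```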
In [11]:
fig_year = map_for_year(1998) #Choose the year you wish to see
iplot(fig_year, link_text="")
Presenting the changes in female unemployment rates across the subcontinent over the years.
We built the timelapse from one map image (frame) per year. Saving the frames requires a plotly account whose credentials are configured in this notebook. The resulting timelapse is shown below.
In [12]:
# save one PNG frame of the map for each year
for year in Years.keys():
    fig = map_for_year(year)
    py.image.save_as(fig, filename='map_year{0}.png'.format(year))
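Saving PNG frames through plotly's online service needs an account. As an alternative sketch (not the method used for the videos below), a slider-driven animation can be built offline by giving the figure one plotly frame per year. The helper `animated_figure` is hypothetical: it takes any per-year trace builder and returns a plain figure dictionary, which `iplot` should be able to render.

```python
def animated_figure(years, make_trace, title="Female Unemployment Rate in South America"):
    """Build a plotly figure dict with one animation frame per year.

    years      : iterable of years, e.g. sorted(Years)
    make_trace : function year -> choropleth trace dict (for instance the trace
                 built inside map_for_year; this pairing is an assumption)
    """
    years = sorted(years)
    frames = [{"name": str(y), "data": [make_trace(y)]} for y in years]
    # one slider step per year; each step animates to the frame with that name
    steps = [{"label": str(y), "method": "animate",
              "args": [[str(y)], {"mode": "immediate",
                                  "frame": {"duration": 500, "redraw": True}}]}
             for y in years]
    layout = {"title": title,
              "geo": {"scope": "south america"},
              "sliders": [{"steps": steps, "currentvalue": {"prefix": "Year: "}}]}
    # the initial view shows the first year
    return {"data": frames[0]["data"], "layout": layout, "frames": frames}

# placeholder trace builder standing in for the real per-year data
def demo_trace(year):
    return {"type": "choropleth", "locations": ["ARG", "PER"],
            "z": [10.0, 8.0], "zmin": 0, "zmax": 20}

fig_anim = animated_figure(range(1995, 1998), demo_trace)
print(len(fig_anim["frames"]))
```

`redraw` is set to `True` because map traces generally need a full redraw between animation frames.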
In [13]:
VimeoVideo("165192103", width=800, height=500)
Out[13]:
In [14]:
YouTubeVideo("JPsYa8szLUk", width=800, height=500)
Out[14]:
These graphics reveal some key issues: persistent economic gender inequality, high female unemployment rates irrespective of GDP growth, and little sign of improvement in either. The topic therefore deserves more attention if the global economy, and developing countries in particular, are to reach desired levels of employment.
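The claim that the gap has been slow to improve can be checked numerically. A sketch, assuming a frame shaped like `merged` (a (Country, Year) index with `UnemploymentFemale` and `UnemploymentMale` columns), that fits a linear trend of the female-minus-male gap on the year for each country. A negative slope would indicate a narrowing gap. The helper `gap_trend` is hypothetical and the data below are placeholders, not results.

```python
import numpy as np
import pandas as pd

def gap_trend(df):
    """Linear trend of the female-minus-male unemployment gap, per country.

    Returns the slope in percentage points per year; negative means narrowing.
    Assumes a (Country, Year) index and the notebook's column names.
    """
    gap = (df["UnemploymentFemale"] - df["UnemploymentMale"]).dropna()
    trends = {}
    for country, series in gap.groupby(level="Country"):
        if len(series) < 2:  # a trend needs at least two years of data
            continue
        years = series.index.get_level_values("Year").to_numpy(dtype=float)
        years = years - years.mean()  # centring the years keeps the fit well conditioned
        slope, _ = np.polyfit(years, series.to_numpy(), deg=1)
        trends[country] = slope
    return pd.Series(trends, name="GapTrendPPperYear")

# placeholder data: the gap shrinks by 0.5 points a year in A and is flat in B
idx = pd.MultiIndex.from_product([["A", "B"], [2000, 2001, 2002]], names=["Country", "Year"])
demo = pd.DataFrame({"UnemploymentFemale": [10.0, 9.5, 9.0, 7.0, 7.0, 7.0],
                     "UnemploymentMale": [6.0, 6.0, 6.0, 5.0, 5.0, 5.0]}, index=idx)
print(gap_trend(demo))
```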