Under the Boot: A Study of Female Unemployment Rates in South America

NYU Stern School of Business | Data Bootcamp Final Project

Project Overview

Progress towards the macroeconomic goal of full employment has not been shared equally between genders: female workers generally have fewer employment opportunities than their male counterparts. This project analyses how female unemployment rates in South America evolved over the 20 years from 1995 to 2014. South America is a continent characterized by its machista mentality, and it is comparable to many developing regions around the world. The questions we address are as follows:

  • What is the relationship between unemployment rates and economic growth?
  • How large is the gender gap in unemployment in South America?
  • What has been the direction of female unemployment rates throughout the years?

We tackle these questions by first producing regression plots to analyse the gender gap in unemployment across 5 South American countries with different macroeconomic situations: Argentina, Colombia, Ecuador, Peru and Uruguay. Secondly, we create a choropleth map presenting the female unemployment rates in South America for every year. Finally, we generate a timelapse of the map to portray how the female unemployment rates have changed throughout the years.

Our aim is to raise awareness of the ongoing issue of female labor force participation and economic gender inequality in the subcontinent. The analysis also shows which countries have female labor forces that are relatively more involved in the economy. This opens the way for further research into the factors that increase or decrease female labor force participation, whether across South America or in other similar developing countries.

Importing the necessary packages


In [1]:
import pandas as pd                    # data package
import matplotlib.pyplot as plt        # graphics module
import datetime as dt                  # date and time module
import numpy as np                     # numerical foundation for pandas
import seaborn as sns                  # statistical plots built on matplotlib

from IPython.display import YouTubeVideo   # embed YouTube videos
from IPython.display import VimeoVideo     # embed Vimeo videos

# plotly imports for the choropleth map
from plotly.offline import iplot       # offline plotting inside the notebook
import plotly.graph_objs as go         # figure objects
import plotly                          # to initialise offline notebook mode
import plotly.plotly as py             # online API used to save map frames (chart_studio.plotly in plotly >= 4)

# these lines make our graphics show up in the notebook           
plotly.offline.init_notebook_mode()
%matplotlib inline 

print('Today: ', dt.date.today())


Today:  2016-05-05

The Data

  • Variables - GDP Growth Rate (%), Female Unemployment, ages 15-64 (%), Male Unemployment, ages 15-64 (%)
  • Region - South America
  • Countries - Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, and Venezuela.
  • Years of Data - 1990 to 2014 (the analysis below uses 1995 to 2014)

About the Sources

The data was accessed from two sources: the Inter-American Development Bank and the World Bank. We obtained the female and male unemployment rates from the former and the GDP growth rates from the latter.

The Inter-American Development Bank only allowed us to download 5 countries and 5 years at a time, so we downloaded the data as many times as necessary to cover 1990-2014 for 10 South American countries and saved it as one Excel file. The World Bank file covered every country in the world, but we only needed the South American countries, so we kept only the countries we were going to work with in the downloaded Excel file.
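The chunked download can also be combined in pandas instead of in a spreadsheet. The sketch below shows one way to do this. It assumes each chunk is loaded as a DataFrame with (Country, Indicator Name) rows and one column per year. The helper `make_chunk`, the indicator label and the fill values are hypothetical stand-ins, not the IDB's actual file layout.

```python
import pandas as pd

def make_chunk(countries, years, fill):
    # Hypothetical stand-in for one download: rows are (Country, Indicator Name), columns are years
    index = pd.MultiIndex.from_product([countries, ["Unemployment, female"]],
                                       names=["Country", "Indicator Name"])
    return pd.DataFrame(fill, index=index, columns=list(years))

# Four toy chunks: two country groups x two 5-year windows (made-up values)
chunks = [
    make_chunk(["ARGENTINA", "PERU"], range(1990, 1995), 1.0),
    make_chunk(["ARGENTINA", "PERU"], range(1995, 2000), 2.0),
    make_chunk(["URUGUAY"], range(1990, 1995), 3.0),
    make_chunk(["URUGUAY"], range(1995, 2000), 4.0),
]

# Stack all chunks; years a chunk does not cover become NaN
combined = pd.concat(chunks)
# Collapse duplicate (Country, Indicator) rows, keeping the first non-missing value per year
combined = combined.groupby(level=["Country", "Indicator Name"]).first()
combined = combined.sort_index(axis=1)   # years in chronological order
print(combined)
```

`GroupBy.first` takes the first non-null entry in each column. Each country-year therefore ends up with the value from whichever chunk actually covered it.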

Accessing the Specific Excel Files

The data files are available in our GitHub repository. The data is unedited apart from being limited to the South American countries; all data shaping is done in this Python project.


In [2]:
# Links for the data

GrowthRates = 'https://rawgit.com/malusg/DataBootcamp_FinalProject/master/Growth_Rates_SA.xlsx'
UnemploymentRates = 'https://rawgit.com/malusg/DataBootcamp_FinalProject/master/Unemployment_Rates_SA1.xls'

Data set 1: Shaping and Cleaning


In [3]:
GDP = pd.read_excel(GrowthRates)
GDP = GDP.drop(['Indicator Code', 'Indicator Name'], axis=1) # Irrelevant variables

# set the index to be country name/code so that just the years are columns
GDP = GDP.set_index(["Country Name", "Country Code"])

# make sure columns are now integers as this is needed to merge the data (below)
GDP.columns = GDP.columns.astype(int)

# only use data starting at 1995
GDP = GDP[list(range(1995, 2015))]

# transpose (years as index), unstack into a long Series indexed by (country name, country code, year), then pop Country Code out into a column
GDP = GDP.T.unstack().reset_index(level="Country Code")

# clean up index level names and column names
GDP.index.names = ["Country", "Year"]
GDP.columns = ["ISO", "GrowthRate"]
GDP.head()


Out[3]:
                ISO  GrowthRate
Country   Year
Argentina 1995  ARG   -2.845210
          1996  ARG    5.526690
          1997  ARG    8.111047
          1998  ARG    3.850179
          1999  ARG   -3.385457
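The transpose-then-unstack step in the cell above produces the same long table as stacking the year columns directly with `DataFrame.stack`. The minimal sketch below checks this equivalence on a toy two-country frame. The numbers are made up, and the frame is shaped like the World Bank sheet after its index has been set.

```python
import pandas as pd

# Toy stand-in for the World Bank sheet (made-up values)
wide = pd.DataFrame(
    {"Country Name": ["Argentina", "Peru"],
     "Country Code": ["ARG", "PER"],
     "1995": [1.0, 2.0],
     "1996": [3.0, 4.0]},
).set_index(["Country Name", "Country Code"])
wide.columns = wide.columns.astype(int)   # year labels as integers, as in the notebook

# Route used in the notebook: transpose, unstack to a long Series, pop the code out
long_a = wide.T.unstack().reset_index(level="Country Code")
long_a.index.names = ["Country", "Year"]
long_a.columns = ["ISO", "GrowthRate"]

# Equivalent route: stack the year columns straight into the row index
long_b = wide.stack().reset_index(level="Country Code")
long_b.index.names = ["Country", "Year"]
long_b.columns = ["ISO", "GrowthRate"]

print(long_a)
```

Either form gives one row per (Country, Year). The `stack` version is arguably easier to read, but both feed the merge below identically.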

Data set 2: Shaping and Cleaning


In [4]:
UNEMP = pd.read_excel(UnemploymentRates)

# fix capitalization of country names
UNEMP['Country'] = UNEMP['Country'].str.capitalize()

# set index to be Country/variable
UNEMP = UNEMP.set_index(['Country', 'Indicator Name'])

# extract only the columns that are years starting from 1995
UNEMP = UNEMP[list(range(1995, 2015))]

# pull years into index, then rotate the indicators up top
UNEMP = UNEMP.stack().unstack(level="Indicator Name")

# clean up index level names
UNEMP.index.names = ["Country", "Year"]

# clean up column names
UNEMP.columns = ["UnemploymentFemale", "UnemploymentMale"]
UNEMP.head()


Out[4]:
                UnemploymentFemale  UnemploymentMale
Country   Year
Argentina 1995           19.350000            15.446
          1996           20.077999            15.405
          1997           16.594999            11.833
          1998           14.196000            11.414
          1999           15.546000            12.943
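The cell above renames the pivoted columns by position. That works only if the female indicator name sorts before the male one after `unstack`. The sketch below is a slightly more defensive variant that maps labels explicitly. The indicator labels and numbers are hypothetical placeholders, not the IDB's actual labels.

```python
import pandas as pd

# Toy sheet shaped like the IDB download; numbers and indicator labels are made up
raw = pd.DataFrame({
    "Country": ["ARGENTINA", "ARGENTINA"],
    "Indicator Name": ["Unemployment rate, female", "Unemployment rate, male"],
    1995: [19.0, 15.0],
    1996: [20.0, 15.5],
})

# Hypothetical mapping from source labels to the notebook's column names
LABELS = {"Unemployment rate, female": "UnemploymentFemale",
          "Unemployment rate, male": "UnemploymentMale"}

raw["Country"] = raw["Country"].str.capitalize()
tidy = raw.set_index(["Country", "Indicator Name"])[[1995, 1996]]
tidy = tidy.stack().unstack(level="Indicator Name")   # years into rows, indicators into columns
tidy.index.names = ["Country", "Year"]

# Rename by label rather than by position, and fail loudly on an unrecognised label
unknown = set(tidy.columns) - set(LABELS)
if unknown:
    raise ValueError(f"unexpected indicator labels: {unknown}")
tidy = tidy.rename(columns=LABELS)
tidy.columns.name = None
print(tidy)
```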

Merging the Data


In [5]:
merged = pd.merge(UNEMP, GDP, 
                  how="outer",       # keep every country-year present in either table
                  left_index=True,   # use index as merge keys
                  right_index=True)

merged.head()


Out[5]:
                UnemploymentFemale  UnemploymentMale  ISO  GrowthRate
Country   Year
Argentina 1995           19.350000            15.446  ARG   -2.845210
          1996           20.077999            15.405  ARG    5.526690
          1997           16.594999            11.833  ARG    8.111047
          1998           14.196000            11.414  ARG    3.850179
          1999           15.546000            12.943  ARG   -3.385457
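With `how="outer"`, the merge keeps the union of the two indexes. A country-year that has a growth rate but no unemployment figure is therefore kept, with NaN in the unemployment columns. The toy sketch below uses made-up values and pandas' `indicator=True` option to show where each row came from, and contrasts the result with an inner merge.

```python
import pandas as pd

# Toy tables with made-up values, indexed like UNEMP and GDP in the notebook
unemp = pd.DataFrame(
    {"UnemploymentFemale": [10.0, 8.0]},
    index=pd.MultiIndex.from_tuples([("Argentina", 1995), ("Peru", 1995)], names=["Country", "Year"]),
)
gdp = pd.DataFrame(
    {"GrowthRate": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples([("Argentina", 1995), ("Suriname", 1995)], names=["Country", "Year"]),
)

# outer: union of index keys; indicator=True adds a _merge column recording each row's origin
outer = pd.merge(unemp, gdp, how="outer", left_index=True, right_index=True, indicator=True)
# inner: only keys present in both tables survive
inner = pd.merge(unemp, gdp, how="inner", left_index=True, right_index=True)
print(outer)
```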

Presenting the relationship between female and male unemployment rates and economic growth in a sample of 5 South American countries: Argentina, Colombia, Ecuador, Peru and Uruguay.

This sample was chosen from the subcontinent because of data availability, the countries' similar cultures, the shared emerging-market status of their economies, and their geographical proximity.


In [6]:
plt.style.use("seaborn-muted")

In [7]:
# Regression plots for the 5 selected countries (seaborn regplot drawn on matplotlib axes)
sample = [("ARG", "ARGENTINA"), ("COL", "COLOMBIA"), ("ECU", "ECUADOR"),
          ("PER", "PERU"), ("URY", "URUGUAY")]

for iso, title in sample:
    country = merged[merged["ISO"] == iso]
    fig, ax = plt.subplots()
    # female and male unemployment against GDP growth, each with a least-squares fit line (no confidence band)
    sns.regplot(x="GrowthRate", y="UnemploymentFemale", data=country, color='orange', ax=ax, ci=None, label="Female Unemployment Rate (%)")
    sns.regplot(x="GrowthRate", y="UnemploymentMale", data=country, color='blue', ax=ax, ci=None, label="Male Unemployment Rate (%)")
    ax.set_title(title, fontsize=20)
    ax.set_xlabel('GDP Growth Rate (%)')
    ax.set_ylabel('Unemployment Rate (%)')
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True)
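By default, the lines drawn by `regplot` are ordinary least-squares fits. As an illustrative check only (five points say nothing conclusive), the sketch below fits the same kind of line with NumPy to Argentina's 1995-1999 rows from the merged table (Out[5]). It also computes the year-by-year gender gap in percentage points.

```python
import numpy as np

# Argentina, 1995-1999, copied from the merged table (Out[5])
growth = np.array([-2.845210, 5.526690, 8.111047, 3.850179, -3.385457])
female = np.array([19.350000, 20.077999, 16.594999, 14.196000, 15.546000])
male = np.array([15.446, 15.405, 11.833, 11.414, 12.943])

def ols_line(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

f_slope, f_intercept = ols_line(growth, female)
m_slope, m_intercept = ols_line(growth, male)

# Gender gap: female minus male unemployment, in percentage points
gap = female - male
print("female slope %.3f, male slope %.3f" % (f_slope, m_slope))
print("mean gap %.2f pp" % gap.mean())
```

A negative slope would be consistent with unemployment falling as growth rises. The gap shows how far female unemployment sits above male unemployment at the same point in the cycle.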

Selecting the data for specific years


In [8]:
# one cross-section (every country) per year, used by the choropleth map function
Years = {year: merged.xs(year, level="Year") for year in range(1995, 2015)}

In [9]:
Years[2002]                                                  # choose the year you wish to see


Out[9]:
           UnemploymentFemale  UnemploymentMale  ISO  GrowthRate
Country
Argentina           18.228001         17.863001  ARG  -10.894485
Bolivia              5.570000          3.235000  BOL    2.485566
Brazil              11.812000          7.475000  BRA    3.053161
Chile                     NaN               NaN  CHL    2.166909
Colombia            20.106001         12.430000  COL    2.503980
Ecuador                   NaN               NaN  ECU    4.096777
Guyana                    NaN               NaN  GUY    1.051001
Paraguay            14.125000          9.017000  PRY   -0.021402
Peru                 6.019000          5.336000  PER    5.453765
Suriname                  NaN               NaN  SUR    4.299944
Uruguay             21.457001         13.761000  URY   -7.732007
Venezuela           18.471001         14.420000  VEN   -8.855647
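A yearly cross-section like `Years[2002]` makes it easy to rank countries by the size of the gender gap. The sketch below uses a hypothetical helper, `gender_gap`, applied to the 2002 values printed above. Countries without unemployment data are dropped rather than treated as zero.

```python
import numpy as np
import pandas as pd

def gender_gap(frame):
    """Female minus male unemployment (percentage points), largest first; rows lacking either rate are dropped."""
    gap = frame["UnemploymentFemale"] - frame["UnemploymentMale"]
    return gap.dropna().sort_values(ascending=False)

nan = np.nan
# 2002 cross-section copied from the table above
year_2002 = pd.DataFrame(
    {"UnemploymentFemale": [18.228001, 5.570000, 11.812000, nan, 20.106001, nan,
                            nan, 14.125000, 6.019000, nan, 21.457001, 18.471001],
     "UnemploymentMale":   [17.863001, 3.235000, 7.475000, nan, 12.430000, nan,
                            nan, 9.017000, 5.336000, nan, 13.761000, 14.420000]},
    index=pd.Index(["Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador",
                    "Guyana", "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela"],
                   name="Country"),
)
gap_2002 = gender_gap(year_2002)
print(gap_2002)
```

In the notebook itself, the same helper could be called as `gender_gap(Years[2002])`.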

Choropleth Map: Annual Female Unemployment Rates in South America


In [10]:
def map_for_year(year):
    """Return a plotly choropleth figure of female unemployment in South America for one year."""

    # colour scale: light cream (low) to orange (high)
    scl = [[0.0, 'rgb(238, 232, 205)'], [0.2, 'rgb(238, 232, 170)'], [0.4, 'rgb(238, 221, 130)'],
           [0.6, 'rgb(238, 201, 0)'], [0.8, 'rgb(238, 173, 14)'], [1.0, 'rgb(238, 136, 51)']]

    # create a trace for the map
    trace = dict(type="choropleth",
                 locations=Years[year]["ISO"],           # ISO 3-letter country codes
                 z=Years[year]["UnemploymentFemale"],    # value that sets each country's colour
                 colorscale=scl,
                 autocolorscale=False,
                 text=Years[year].index,                 # country names shown on hover
                 colorbar=dict(title="Female Unemployment (%)"),
                 marker=dict(line=dict(color='rgb(105,105,105)', width=1)))  # grey country borders

    # map layout
    layout = dict(geo={"scope": "south america", "resolution": 50},
                  width=750, height=550,
                  title='Female Unemployment Rate in South America ({0})'.format(year))

    map_fig = go.Figure(data=[trace], layout=layout)
    return map_fig
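Because `zmin` and `zmax` are not set, plotly scales each map's colour bar to that year's own minimum and maximum (`zauto` is in effect). The same shade can then stand for different rates in different frames of the timelapse. The sketch below pins every frame to one shared range using the choropleth trace attributes `zauto`, `zmin` and `zmax`. The helper names and the toy frames (made-up values shaped like `Years[year]`) are assumptions for illustration.

```python
import pandas as pd

def colour_range(years):
    """Smallest and largest female unemployment rate across all yearly frames, ignoring NaN."""
    values = pd.concat([frame["UnemploymentFemale"] for frame in years.values()])
    return float(values.min()), float(values.max())

def trace_with_fixed_range(years, year, zmin, zmax):
    """Choropleth trace like the one in map_for_year, but pinned to a shared colour range."""
    frame = years[year]
    return dict(type="choropleth",
                locations=frame["ISO"],
                z=frame["UnemploymentFemale"],
                zauto=False, zmin=zmin, zmax=zmax,   # same colour scale in every frame
                text=frame.index,
                colorbar=dict(title="Female Unemployment (%)"))

# Toy frames with made-up numbers, shaped like Years[year]
toy_years = {
    1995: pd.DataFrame({"ISO": ["ARG", "PER"], "UnemploymentFemale": [19.0, 8.0]},
                       index=["Argentina", "Peru"]),
    1996: pd.DataFrame({"ISO": ["ARG", "PER"], "UnemploymentFemale": [21.0, float("nan")]},
                       index=["Argentina", "Peru"]),
}
zmin, zmax = colour_range(toy_years)
trace_1996 = trace_with_fixed_range(toy_years, 1996, zmin, zmax)
```

In the notebook, the equivalent change would be computing the range once from `Years` and adding the same three keys to the trace dictionary inside `map_for_year`.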

In [11]:
fig_year = map_for_year(1998) #Choose the year you wish to see
iplot(fig_year, link_text="")


Timelapse: Female Unemployment Trend in South America

Presenting the changes in female unemployment rates across the subcontinent over the years.

Our timelapse was assembled from one map frame per year. Saving the frames with py.image.save_as requires a plotly account linked to this IPython notebook. The results are shown in the timelapse below.


In [12]:
# save one PNG map frame per year (requires a linked plotly account)

for year in Years.keys():
    fig = map_for_year(year)
    py.image.save_as(fig, filename='map_year{0}.png'.format(year))

In [13]:
VimeoVideo("165192103", width=800, height=500)


Out[13]:

Case Study (Colombia): Reports agree with our data


In [14]:
YouTubeVideo("JPsYa8szLUk", width=800, height=500)


Out[14]:

Concluding Remarks

These graphics reveal some key issues: persistent economic gender inequality, high female unemployment rates irrespective of GDP growth, and slow improvement on both fronts. This suggests that the topic deserves more attention if desired levels of employment are to be reached in the global economy, and particularly in developing countries.