Latin Female

Under the Boot: A Study of Female Unemployment Rates in South America

NYU Stern School of Business | Data Bootcamp Final Project

Created by Gracious Nyamupachitu and Maria Lucia Sanchez

Browse by Content:

Project Overview
Final Dataset - Merged
Regplots - Unemployment Trends
Loop - Variables per Year
Choropleth Map of South America
Timelapse of Female Unemployment Rates
Concluding Case Study

Project Overview

The macroeconomic goal of full employment is complicated by gender inequality, as female workers generally have fewer employment opportunities than their male counterparts. This project analyses how female unemployment rates in South America have evolved over the last 20 years (1995-2014). South America is a continent often characterized by its machista mentality, and it is comparable to many developing regions around the world. The questions we address are as follows:

What is the relationship between unemployment rates and economic growth?
How substantial is the unemployment gender inequality gap in South America?
What has been the direction of female unemployment rates throughout the years?

We tackle these questions by first producing regression plots to analyse the gender gap in unemployment across 5 South American countries with different macroeconomic situations: Argentina, Colombia, Ecuador, Peru and Uruguay. Secondly, we create a choropleth map presenting the female unemployment rates in South America for every year. Finally, we generate a timelapse of the map to portray how the female unemployment rates have changed throughout the years.

Our aim is to raise awareness of the ongoing issues of female labor force participation and economic gender inequality in the subcontinent. Additionally, we are able to identify which countries have their female labor force relatively more involved in the economy. This opens the door to further research for anyone interested in the factors that increase or decrease female labor force participation, whether in South America or in other similar developing countries.

Importing the necessary packages



In [1]:

    
import sys                             # system module
import pandas as pd                    # data package
import matplotlib.pyplot as plt        # graphics module
import datetime as dt                  # date and time module
import numpy as np                     # foundation for pandas
import seaborn as sns                  # more matplotlib graphics
                                       # (seaborn.apionly was removed in seaborn 0.9)

from IPython.display import YouTubeVideo # visualizing YouTube videos
from IPython.display import VimeoVideo   # visualizing Vimeo videos
from IPython.display import display

# plotly imports for choropleth map
from plotly.offline import iplot             # offline plotting function
import plotly.graph_objs as go               # figure objects
import plotly                                # to print version and init notebook
import chart_studio.plotly as py             # image export; was plotly.plotly before plotly 4

# these lines make our graphics show up in the notebook
plotly.offline.init_notebook_mode()
%matplotlib inline

print('Today: ', dt.date.today())

Today:  2016-05-05

The Data

Variables - GDP Growth Rate (%), Female Unemployment 15-64 yrs (%), Male Unemployment 15-64 yrs (%)
Region - South America
Countries - Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, and Venezuela.
Years of Data - 1990 to 2014 (the analysis below uses 1995 to 2014)

About the Sources

The data was obtained from two sources: the Inter-American Development Bank (IDB) and the World Bank. We took the female and male unemployment rates from the former and the GDP growth rates from the latter.

The Inter-American Development Bank only allowed us to download 5 countries and 5 years at a time, so we downloaded the data as many times as necessary to cover 1990-2014 for 10 South American countries, and saved it as one Excel file. The World Bank file covered every country in the world; since we only needed the South American countries, we kept only those in the downloaded Excel file.
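Because of the download limits, the IDB data arrived as several 5-country, 5-year chunks, which we combined by hand in Excel. As an alternative, here is a minimal sketch of doing the same in pandas, assuming each chunk is a wide table with `Country` and `Indicator Name` columns plus one column per year (the layout the unemployment-shaping code relies on). The helper `combine_chunks` is hypothetical, and the test values are dummies, not IDB data.

```python
import pandas as pd


def combine_chunks(chunks):
    """Combine wide IDB-style download chunks into one wide table.

    Each chunk has 'Country' and 'Indicator Name' columns plus one column
    per year. Chunks may cover different countries and/or different years.
    """
    keyed = [chunk.set_index(["Country", "Indicator Name"]) for chunk in chunks]
    # stack the chunks row-wise; sort=True puts the union of year columns in order
    stacked = pd.concat(keyed, axis=0, sort=True)
    # a country/indicator pair downloaded in several year-chunks now has several
    # rows; keep the first non-missing value per year column to collapse them
    combined = stacked.groupby(level=["Country", "Indicator Name"]).first()
    return combined.reset_index()
```

Doing the combination in code keeps the step reproducible if the data is downloaded again.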

Accessing the Specific Excel Files

The data files are available in our GitHub repository. The data has not been edited, only limited to the South American countries; all data shaping is done in this notebook.



In [2]:

    
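# Links for the data (RawGit, which served these files, shut down in 2019;
# the same files are kept in the project's GitHub repository)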

GrowthRates = 'https://rawgit.com/malusg/DataBootcamp_FinalProject/master/Growth_Rates_SA.xlsx'
UnemploymentRates = 'https://rawgit.com/malusg/DataBootcamp_FinalProject/master/Unemployment_Rates_SA1.xls'

Data set 1: Shaping and Cleaning



In [3]:

    
GDP = pd.read_excel(GrowthRates)
GDP = GDP.drop(['Indicator Code', 'Indicator Name'], axis=1)  # irrelevant variables

# set the index to be country name/code so that just the years are columns
GDP = GDP.set_index(["Country Name", "Country Code"])

# make sure the year columns are integers, as this is needed to merge the data (below)
GDP.columns = GDP.columns.astype(int)

# only use data from 1995 to 2014
GDP = GDP[list(range(1995, 2015))]

# transpose (years as index), then unstack into one long Series indexed by
# (Country Name, Country Code, year); finally pop Country Code out into a column
GDP = GDP.T.unstack().reset_index(level="Country Code")

# clean up index level names and column names
GDP.index.names = ["Country", "Year"]
GDP.columns = ["ISO", "GrowthRate"]
GDP.head()
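The transpose-then-unstack line above is the least obvious step. On a toy table with the same layout (dummy values, not World Bank figures), it yields the same long table as the more direct `stack()`. One difference on real data: `unstack` keeps missing years as NaN rows, whereas `stack()` in pandas versions before 3.0 drops them by default. The helper `tidy` is hypothetical.

```python
import pandas as pd

# Toy table shaped like the World Bank sheet after set_index (dummy values):
# one row per (Country Name, Country Code), one column per year.
wide = pd.DataFrame(
    {1995: [1.0, 3.0], 1996: [2.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [("Argentina", "ARG"), ("Peru", "PER")],
        names=["Country Name", "Country Code"],
    ),
)


def tidy(long):
    """Move Country Code into a column and apply the notebook's names."""
    long = long.reset_index(level="Country Code")
    long.index.names = ["Country", "Year"]
    long.columns = ["ISO", "GrowthRate"]
    return long


# route used in the notebook: transpose, then unstack to a long Series
via_unstack = tidy(wide.T.unstack())
# more direct route: stack the year columns into the index
via_stack = tidy(wide.stack())
print(via_unstack)
```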

Data set 2: Shaping and Cleaning



In [4]:

    
UNEMP = pd.read_excel(UnemploymentRates)

# fix capitalization of country names
UNEMP['Country'] = UNEMP['Country'].str.capitalize()

# set index to be Country/variable
UNEMP = UNEMP.set_index(['Country', 'Indicator Name'])

# keep only the year columns from 1995 to 2014
UNEMP = UNEMP[list(range(1995, 2015))]

# pull years into the index, then rotate the indicators up top as columns
UNEMP = UNEMP.stack().unstack(level="Indicator Name")

# clean up index level names
UNEMP.index.names = ["Country", "Year"]

# clean up column names; this assumes unstack sorted the female indicator
# before the male one, so check UNEMP.columns before renaming
UNEMP.columns = ["UnemploymentFemale", "UnemploymentMale"]
UNEMP.head()

Out[4]:

                UnemploymentFemale  UnemploymentMale
Country   Year
Argentina 1995           19.350000            15.446
          1996           20.077999            15.405
          1997           16.594999            11.833
          1998           14.196000            11.414
          1999           15.546000            12.943
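The positional rename `UNEMP.columns = ["UnemploymentFemale", "UnemploymentMale"]` silently swaps the two series if the indicators ever sort the other way. A slightly more defensive sketch renames by label instead. The indicator strings below are hypothetical stand-ins, since the exact IDB labels are not shown in this notebook, and `rename_indicators` is a hypothetical helper.

```python
import pandas as pd

# HYPOTHETICAL indicator labels: replace them with the exact strings found in
# the 'Indicator Name' column of the IDB file.
INDICATOR_NAMES = {
    "Unemployment rate, female (15-64)": "UnemploymentFemale",
    "Unemployment rate, male (15-64)": "UnemploymentMale",
}


def rename_indicators(df, mapping=INDICATOR_NAMES):
    """Rename indicator columns by label (not by position) and fix their order.

    Raises KeyError if any expected indicator column is missing.
    """
    missing = sorted(set(mapping) - set(df.columns))
    if missing:
        raise KeyError(f"indicator columns not found: {missing}")
    return df.rename(columns=mapping)[list(mapping.values())]
```

In the notebook, `UNEMP = rename_indicators(UNEMP)` would replace the positional assignment.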

Merging the Data



In [5]:

    
merged = pd.merge(UNEMP, GDP,
                  how="outer",       # keep every (Country, Year) row found in either table
                  left_index=True,   # use the (Country, Year) index as merge keys
                  right_index=True)

merged.head()

Out[5]:

                UnemploymentFemale  UnemploymentMale  ISO  GrowthRate
Country   Year
Argentina 1995           19.350000            15.446  ARG   -2.845210
          1996           20.077999            15.405  ARG    5.526690
          1997           16.594999            11.833  ARG    8.111047
          1998           14.196000            11.414  ARG    3.850179
          1999           15.546000            12.943  ARG   -3.385457
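With `how="outer"`, a (Country, Year) pair that appears in only one of the two tables is still kept, and the columns from the other table are filled with NaN. This is why some countries can have a growth rate but no unemployment figures. A small sketch with dummy values shows the behavior; `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

idx = ["Country", "Year"]
# dummy values, shaped like UNEMP and GDP above
unemp = pd.DataFrame({"Country": ["Argentina", "Argentina"], "Year": [1995, 1996],
                      "UnemploymentFemale": [1.0, 2.0]}).set_index(idx)
gdp = pd.DataFrame({"Country": ["Argentina", "Chile"], "Year": [1995, 1995],
                    "GrowthRate": [3.0, 4.0]}).set_index(idx)

demo = pd.merge(unemp, gdp, how="outer", left_index=True, right_index=True,
                indicator=True)   # _merge: both / left_only / right_only
print(demo)
```

Running `merged.isna().sum()` on the real table is a quick way to see how many values the outer merge had to fill.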

Country Plots: Unemployment Trends

Presenting the relationship between female and male unemployment rates and GDP growth rates in a sample of 5 South American countries: Argentina, Colombia, Ecuador, Peru and Uruguay. Each plot shows one point per year and a fitted linear regression line for each gender.

We chose this sample from the subcontinent based on data availability, similar culture, similar emerging-market status, and geographical proximity.



In [6]:

    
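# matplotlib >= 3.6 renamed the seaborn styles to "seaborn-v0_8-*"
style = "seaborn-v0_8-muted" if "seaborn-v0_8-muted" in plt.style.available else "seaborn-muted"
plt.style.use(style)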



In [7]:

    
# Regression plots (scatter + fitted line) of unemployment vs. GDP growth,
# one figure per selected country, using seaborn on matplotlib axes
countries = {"ARG": "ARGENTINA", "COL": "COLOMBIA", "ECU": "ECUADOR",
             "PER": "PERU", "URY": "URUGUAY"}

for iso, name in countries.items():
    data = merged[merged["ISO"] == iso]
    fig, ax = plt.subplots()
    sns.regplot(x="GrowthRate", y="UnemploymentFemale", data=data, color='orange',
                ax=ax, ci=None, label="Female Unemployment Rate (%)")
    sns.regplot(x="GrowthRate", y="UnemploymentMale", data=data, color='blue',
                ax=ax, ci=None, label="Male Unemployment Rate (%)")
    ax.set_title(name, fontsize=20)
    ax.set_xlabel('GDP Growth Rate (%)')
    ax.set_ylabel('Unemployment Rate (%)')
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True)
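The straight lines drawn by `sns.regplot` (default `order=1`) are ordinary least-squares fits. Computing the slopes directly makes question 1 quantitative: the slope is the change in unemployment, in percentage points, per extra point of GDP growth. A minimal sketch; `ols_line` is a hypothetical helper, and the sample uses only the five Argentina years shown in the `merged.head()` output, so it illustrates the calculation rather than the full 1995-2014 result.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for y = a + b*x (one predictor)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                       # b = Sxy / Sxx
    return mean_y - slope * mean_x, slope   # a = mean(y) - b * mean(x)


# Argentina 1995-1999, copied from the merged.head() output
growth = [-2.845210, 5.526690, 8.111047, 3.850179, -3.385457]
female = [19.350000, 20.077999, 16.594999, 14.196000, 15.546000]
male = [15.446, 15.405, 11.833, 11.414, 12.943]

_, slope_f = ols_line(growth, female)
_, slope_m = ols_line(growth, male)
print(f"female slope {slope_f:.3f}, male slope {slope_m:.3f}")
```

In the notebook, the full series could be passed as `ols_line(data["GrowthRate"].tolist(), data["UnemploymentFemale"].tolist())` after dropping rows with missing values.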

Selecting variables for specific years



In [8]:

    
# dictionary of yearly cross-sections (all countries for one year),
# used by the choropleth map function below
Years = {year: merged.xs(year, level="Year", axis=0)
         for year in range(1995, 2015)}



In [9]:

    
Years[2002]   # choose the year you wish to see

Out[9]:

           UnemploymentFemale  UnemploymentMale  ISO  GrowthRate
Country
Argentina           18.228001         17.863001  ARG  -10.894485
Bolivia              5.570000          3.235000  BOL    2.485566
Brazil              11.812000          7.475000  BRA    3.053161
Chile                     NaN               NaN  CHL    2.166909
Colombia            20.106001         12.430000  COL    2.503980
Ecuador                   NaN               NaN  ECU    4.096777
Guyana                    NaN               NaN  GUY    1.051001
Paraguay            14.125000          9.017000  PRY   -0.021402
Peru                 6.019000          5.336000  PER    5.453765
Suriname                  NaN               NaN  SUR    4.299944
Uruguay             21.457001         13.761000  URY   -7.732007
Venezuela           18.471001         14.420000  VEN   -8.855647
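Question 2 asks how large the gender gap is. For a single year this is a subtraction: the female rate minus the male rate, in percentage points, skipping countries without IDB data. A minimal sketch using the 2002 values printed above; in the notebook the same two lines would work directly on `Years[2002]`.

```python
import pandas as pd

nan = float("nan")
# 2002 values copied from the Years[2002] output (NaN = no unemployment data)
y2002 = pd.DataFrame(
    {"UnemploymentFemale": [18.228001, 5.570000, 11.812000, nan, 20.106001, nan,
                            nan, 14.125000, 6.019000, nan, 21.457001, 18.471001],
     "UnemploymentMale": [17.863001, 3.235000, 7.475000, nan, 12.430000, nan,
                          nan, 9.017000, 5.336000, nan, 13.761000, 14.420000]},
    index=["Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador",
           "Guyana", "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela"],
)

# gender gap in percentage points; dropna() removes countries without data
gap = (y2002["UnemploymentFemale"] - y2002["UnemploymentMale"]).dropna()
print(gap.sort_values(ascending=False).round(2))
```

In 2002, every country with data had a higher female than male unemployment rate, with the gap ranging from about 0.4 points (Argentina) to about 7.7 points (Uruguay).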

Choropleth Map: Annual Female Unemployment Rates in South America



In [10]:

    
def map_for_year(year):
    """Return a Plotly choropleth figure of female unemployment for `year`."""

    # color scale (light yellow to orange)
    scl = [[0.0, 'rgb(238, 232, 205)'], [0.2, 'rgb(238, 232, 170)'], [0.4, 'rgb(238, 221, 130)'],
           [0.6, 'rgb(238, 201, 0)'], [0.8, 'rgb(238, 173, 14)'], [1.0, 'rgb(238, 136, 51)']]

    # create a trace for the map
    trace = dict(type="choropleth",
                 locations=Years[year]["ISO"],           # ISO-3 country codes
                 z=Years[year]["UnemploymentFemale"],    # value that sets each country's color
                 text=Years[year].index,                 # country names shown on hover
                 colorscale=scl,
                 autocolorscale=False,
                 colorbar=dict(title="Female Unemployment (%)"),
                 marker=dict(line=dict(color='rgb(105,105,105)', width=1)))  # gray borders

    # map layout
    layout = dict(geo={"scope": "south america", "resolution": 50},
                  width=750, height=550,
                  title='Female Unemployment Rate in South America ({0})'.format(year))

    map_fig = go.Figure(data=[trace], layout=layout)
    return map_fig
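In `map_for_year`, Plotly takes the color range from each year's own data, so the same shade can stand for different rates in different frames of the timelapse. One way to make frames comparable, sketched here under that assumption, is to pin the trace's `zmin`/`zmax` to the range over all years. Both helpers are hypothetical and build a plain trace dict, as the original function does.

```python
import math


def fixed_color_range(values):
    """Global (min, max) of all non-missing values, for zmin/zmax."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if not clean:
        raise ValueError("no non-missing values")
    return min(clean), max(clean)


def choropleth_trace(iso_codes, z, names, zmin, zmax, colorscale):
    """Plain-dict choropleth trace, as in map_for_year, with a pinned range."""
    return dict(type="choropleth",
                locations=list(iso_codes), z=list(z), text=list(names),
                colorscale=colorscale, autocolorscale=False,
                zmin=zmin, zmax=zmax,   # same color range in every yearly frame
                colorbar=dict(title=dict(text="Female Unemployment (%)")))
```

In the notebook, `zmin, zmax = fixed_color_range(merged["UnemploymentFemale"])` would be computed once, and `zmin=zmin, zmax=zmax` added to the trace inside `map_for_year`.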



In [11]:

    
fig_year = map_for_year(1998) #Choose the year you wish to see
iplot(fig_year, link_text="")

Timelapse: Female Unemployment Trend in South America

Presenting the changes in female unemployment rates across the subcontinent over the years.

Our timelapse was assembled from one map frame per year. Saving the frames with `py.image.save_as` requires a Plotly account linked to this notebook. The resulting timelapse is shown below.



In [12]:

    
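# Save one PNG frame of the map for each year (needs Plotly account credentials;
# recent plotly versions can instead export locally with fig.write_image(...)
# if the kaleido package is installed)
for year in Years.keys():
    fig = map_for_year(year)
    py.image.save_as(fig, filename='map_year{0}.png'.format(year))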
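As an alternative to saving PNG frames and stitching them into a video (a different technique from the one used here), a Plotly figure can carry animation `frames` plus a year slider, which plays the timelapse directly in the notebook without an account. A minimal sketch that builds the figure as a plain dict; `animated_figure` is a hypothetical helper.

```python
def animated_figure(frames_by_year, title="Female Unemployment Rate in South America"):
    """Build a Plotly figure dict with one animation frame per year.

    frames_by_year: {year: trace_dict}, e.g. the choropleth trace that
    map_for_year builds for that year.
    """
    years = sorted(frames_by_year)
    frames = [{"name": str(y), "data": [frames_by_year[y]]} for y in years]
    # one slider step per year; each step animates to the frame of that name
    steps = [{"label": str(y), "method": "animate",
              "args": [[str(y)], {"mode": "immediate",
                                  "frame": {"duration": 500, "redraw": True}}]}
             for y in years]
    layout = {"title": {"text": title},
              "geo": {"scope": "south america", "resolution": 50},
              "sliders": [{"steps": steps, "currentvalue": {"prefix": "Year: "}}]}
    # the first year is shown before the slider is moved
    return {"data": [frames_by_year[years[0]]], "layout": layout, "frames": frames}
```

In the notebook, the traces could come from the existing function, e.g. `frames = {y: map_for_year(y).data[0].to_plotly_json() for y in Years}`, and the result be displayed with `iplot(go.Figure(animated_figure(frames)))`.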



In [13]:

    
VimeoVideo("165192103", width=800, height=500)

Case Study (Colombia): Reports agree with our data



In [14]:

    
YouTubeVideo("JPsYa8szLUk", width=800, height=500)

Concluding Remarks

These graphics reveal some key issues: persistent economic gender inequality, high female unemployment rates regardless of GDP growth, and slow improvement on either front. This suggests that the topic deserves more attention if the global economy, and developing countries in particular, are to reach desired levels of employment.
