Gender Wage Gap

Satya Gatiganti

Fall 2016

A ubiquitous issue that still exists in modern society is the gender wage gap. This project analyzes the issue by observing how wage gaps differ between countries around the world and asking what political and economic factors could contribute to a country's wage gap. It then takes a closer look at the gender wage gap in the United States and how it varies across professions, as well as with other demographic factors such as race. There are many possible reasons for the gender wage gap; this project focuses on a few interesting factors that could be correlated with it.

Part One - Comparing the gender wage gap between countries

This first part of the analysis requires the use of the data source: https://data.oecd.org/earnwage/gender-wage-gap.htm and www.ilo.org/gwr-figures to construct visual depictions of the gender wage gap around the world.

Part Two - What factors affect the gender wage gap in a global context?

Women's Political and Economic Rights:

Using the data source http://www.humanrightsdata.com/p/data-documentation.html

Human Development Index:

Using the data source http://hdr.undp.org/en/data

GDP Per Capita:

Using the data source http://data.worldbank.org/data-catalog/world-development-indicators

Part Three - Segmenting the Gender Wage Gap in the US

US Wage Gap Segmented by Profession:

Using the data source http://www.bls.gov/cps/cpsaat39.xlsx

US Wage Gap Segmented by Race:

Using the data source http://www.bls.gov/cps/cpsaat37.xlsx

Importing Packages

I first imported all packages necessary to extract my data and to plot my graphs.



In [4]:

    
%matplotlib inline
import sys
import pandas as pd
import pandas_datareader as pdr  # alias was cut off in the original; pdr is an assumed name
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt
import pycountry
import seaborn.apionly as sns

from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly                                # just to print version and init notebook
import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

import plotly.plotly as py

# these lines make our graphics show up in the notebook
%matplotlib inline             
plotly.offline.init_notebook_mode(connected=True)
print('Python version: ', sys.version)
print('Pandas version: ', pd.__version__)
print('Today: ', dt.date.today())


Python version:  3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Today:  2016-12-22

Part One

Cleaning Up the Data

The first data source (https://data.oecd.org/earnwage/gender-wage-gap.htm) reports the gender wage gap, in percentage terms, for 26 different countries. The other data source (www.ilo.org/gwr-figures) covers similar countries but also includes developing countries that the first data source lacks. I will be using both of these data sources for my analysis.

Data Source - OECD



In [5]:

    
url = "https://stats.oecd.org/sdmx-json/data/DP_LIVE/.WAGEGAP.../OECD?contentType=csv&detail=code&separator=comma&csv-lang=en"
wg = pd.read_csv(url)
wg = wg[wg.SUBJECT != 'SELFEMPLOYED']  # only interested in total employment, not just self-employment
wg = wg[wg.TIME == 2010]               # only looking at the wage gaps for the most recent year: 2010
# drop all columns that are not needed
wg = wg.drop(['INDICATOR', 'MEASURE', 'FREQUENCY', 'Flag Codes', 'SUBJECT', 'TIME'], axis=1)
wg.columns = ["ISO", "Gender Wage Gap in % Difference"]
wg.head()









Out[5]:

     ISO  Gender Wage Gap in % Difference
34   AUS                        14.042933
49   AUT                        19.188862
65   BEL                         7.043796
83   CAN                        18.977470
101  CZE                        15.798503
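
The cell above keeps a single fixed year (2010). If some countries lack an observation for that year, one alternative is to keep each country's most recent available year instead. A minimal sketch on made-up rows, assuming the raw OECD CSV uses LOCATION, TIME and Value as column names (only TIME is confirmed by the code above; the numbers are invented for illustration and are not OECD data):

```python
import pandas as pd

# Toy rows in the assumed OECD layout (values are invented for illustration).
toy = pd.DataFrame({
    "LOCATION": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "TIME":     [2009,  2010,  2008,  2009,  2010],
    "Value":    [15.0,  14.0,  20.0,  19.5,  7.0],
})

# Sort by year, then keep the last (most recent) row for each country.
latest = (toy.sort_values("TIME")
             .drop_duplicates("LOCATION", keep="last")
             .sort_values("LOCATION")
             .reset_index(drop=True))
print(latest)
```

With this approach the year differs by country, so the TIME column would need to be kept for reference rather than dropped.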

Data Source - ILO

The ILO spreadsheet only contained country names and not the ISO codes. To convert the country names to ISO codes, I used the following code to look up the ISO code for each country name and store it in a new ISO column.



In [6]:

    
file2 = 'data/GWG.xlsx'
gw = pd.read_excel(file2)  # no encoding argument is needed for an .xlsx file
gw = gw.drop('Explained wage gap', axis=1)
gw.columns = ['Country', 'Gender Wage Gap in % Difference']

import csv

dic={}


# with open("wikipedia-iso-country-codes.csv") as f:
#     file= csv.DictReader(f, delimiter=',')
#     for line in file:
#         dic[line['English short name lower case']]=line['Alpha-3 code']
        

# countries=gw['Country']

# [dic[x] for x in countries]

#Copied and pasted the values that were obtained from the line of code above 
  
gw['ISO'] = ['USA',
 'IRL',
 'GBR',
 'EST',
 'ISL',
 'CZE',
 'CYP',
 'NOR',
 'AUT',
 'NLD',
 'DEU',
 'GRC',
 'SVK',
 'BEL',
 'FIN',
 'BGR',
 'FRA',
 'ITA',
 'ESP',
 'LUX',
 'DNK',
 'LVA',
 'ROU',
 'PRT',
 'HUN',
 'POL',
 'SVN',
 'LTU',
 'SWE',
 'RUS',
 'ARG',
 'URY',
 'BRA',
 'CHL',
 'CHN',
 'PER',
 'MEX',
 'VNM', 
 'IND']

# keep only the ISO code and the wage gap
gw = gw[['ISO', 'Gender Wage Gap in % Difference']]
gw.head()









Out[6]:

   ISO  Gender Wage Gap in % Difference
0  USA                         35.79502
1  IRL                         29.10423
2  GBR                         29.05460
3  EST                         28.94645
4  ISL                         27.83397
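
The commented-out lookup in the cell above builds a name-to-code dictionary from a CSV file. A hedged sketch of the same idea, using a tiny inline table in place of wikipedia-iso-country-codes.csv (the three rows and the helper name to_iso are illustrative assumptions) and reporting names that fail to match instead of raising a KeyError:

```python
import csv
import io

# Stand-in for the CSV file; column names follow the commented-out code above.
sample_csv = io.StringIO(
    "English short name lower case,Alpha-3 code\n"
    "Ireland,IRL\n"
    "Estonia,EST\n"
    "Iceland,ISL\n"
)
name_to_iso = {row["English short name lower case"]: row["Alpha-3 code"]
               for row in csv.DictReader(sample_csv)}

def to_iso(names, lookup):
    """Return (codes, unmatched); codes holds None where a name is unknown."""
    codes = [lookup.get(name) for name in names]
    unmatched = [name for name in names if name not in lookup]
    return codes, unmatched

codes, unmatched = to_iso(["Ireland", "Estonia", "Russian Federation"], name_to_iso)
print(codes)
print(unmatched)
```

Names that end up in unmatched can then be corrected by hand before the ISO column is assigned.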

Merging the two data sets together

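I combined the two data sets using the code below and dropped all duplicated countries. Because the OECD rows come first in the concatenation and drop_duplicates keeps the first occurrence, the OECD value is kept whenever a country appears in both sources.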



In [7]:

    
combination = pd.concat([wg, gw,], axis = 0)
combination = combination.drop_duplicates('ISO')
combination.head()









Out[7]:

     ISO  Gender Wage Gap in % Difference
34   AUS                        14.042933
49   AUT                        19.188862
65   BEL                         7.043796
83   CAN                        18.977470
101  CZE                        15.798503
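
Since drop_duplicates keeps the first occurrence by default, the order of the frames passed to pd.concat decides which source wins for a country present in both. A small sketch using values that appear in the outputs of this notebook (the frame names oecd and ilo are illustrative):

```python
import pandas as pd

col = "Gender Wage Gap in % Difference"
oecd = pd.DataFrame({"ISO": ["AUS", "USA"], col: [14.042933, 18.810680]})
ilo = pd.DataFrame({"ISO": ["USA", "IRL"], col: [35.79502, 29.10423]})

# OECD rows come first, so they win when a country appears in both sources.
merged = pd.concat([oecd, ilo], axis=0).drop_duplicates("ISO")
print(merged)

# Countries reported by both sources, useful for checking how far they disagree.
overlap = sorted(set(oecd["ISO"]) & set(ilo["ISO"]))
print(overlap)
```

Listing the overlap makes it easy to see how much the two sources differ for the same country before one of them is discarded.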

Mapping out the Gender Wage Gaps in the World



In [8]:

    
layout = dict(geo={"scope": "world", "resolution": 110},  # plotly accepts 110 or 50 here
        title = 'Gender Wage Gaps',
         width=750, height=550)



In [9]:

    
trace = dict(type="choropleth",
             locations=combination["ISO"],   # use ISO names
             z=combination["Gender Wage Gap in % Difference"], # defines the color, 
             colorscale=[[0,"rgb(5, 10, 172)"],[0.35,"rgb(40, 60, 190)"],[0.5,"rgb(70, 100, 245)"],\
            [0.6,"rgb(90, 120, 245)"],[0.7,"rgb(106, 137, 247)"],[1,"rgb(220, 220, 220)"]],  
             text=combination.index,     
             colorbar = dict(title = 'Gender Wage Gap in % Difference between Men and Women'),
            ) 
             


iplot(go.Figure(data=[trace], layout=layout), link_text="")

Looking at the map, we can make several general observations about how gender wage gaps differ across the world. Europe, in general, seems to have lower wage gaps than the rest of the world. South America and Eastern Asia both show generally higher wage gaps, as the two areas largely consist of developing countries. These developing countries could place more limits on the advancement of women in the workplace, or may have only recently begun to encourage women's advancement. Korea has the highest gap on this map at 39%, while Sweden has the lowest at 4%.

Part 2 - CIRI Data

I used the CIRI data source to retrieve scores for women's political and economic rights in each country in order to check their correlation with the country's gender wage gap. I extracted only the variables I need: the country, the year, and the scores for women's economic and political rights. Again, I needed to convert the country names into ISO codes, so I used the code below to do so. A score of 1.0 indicates limited rights while a score of 3.0 indicates full rights. I expect countries with lower economic and political rights scores to have higher wage gaps.



In [7]:

    
url = "http://unstats.un.org/unsd/methods/m49/m49alpha.htm"
iso = pd.read_html(url, attrs={"border": "0", "cellpadding": "2"}, header=0)[0]
iso = iso.rename(columns={"ISO ALPHA-3 code": "ISO", 
                          "Country or area name": "Country"})
iso = iso.drop("Numerical  code", axis=1)
iso = iso.set_index("Country")
iso.head()









Out[7]:

                ISO
Country
Afghanistan     AFG
Åland Islands   ALA
Albania         ALB
Algeria         DZA
American Samoa  ASM



In [8]:

    
#Women's Political and Economic Rights
url1 = 'https://drive.google.com/uc?export=download&id=0BxDpF6GQ-6fbbEdZYmRXekhGMFE'
hr = pd.read_csv(url1)
hr = hr[['CTRY', 'YEAR', 'WECON', 'WOPOL']]
hr.columns = ['Country', 'Year', 'Economic Rights', 'Political Rights']
hr = hr[hr.Year == 2011] #Only need data from the most recent year, which is 2011
hr.head()









Out[8]:

         Country  Year  Economic Rights  Political Rights
30   Afghanistan  2011              0.0               2.0
61       Albania  2011              1.0               2.0
92       Algeria  2011              1.0               2.0
123      Andorra  2011              3.0               3.0
154       Angola  2011              1.0               3.0



In [9]:

    
iso_hr = iso[iso.index.isin(hr['Country'])]
iso_hr.head()









Out[9]:

             ISO
Country
Afghanistan  AFG
Albania      ALB
Algeria      DZA
Andorra      AND
Angola       AGO



In [10]:

    
hr = hr.set_index('Country')



In [11]:

    
hr = hr.merge(iso, left_index=True, right_index=True)
hr.head()









Out[11]:

             Year  Economic Rights  Political Rights  ISO
Country
Afghanistan  2011              0.0               2.0  AFG
Albania      2011              1.0               2.0  ALB
Algeria      2011              1.0               2.0  DZA
Andorra      2011              3.0               3.0  AND
Angola       2011              1.0               3.0  AGO
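
An index-on-index merge like the one above is an inner join, so any country whose name is spelled differently in the two tables drops out silently. A hedged sketch of how the unmatched names could be listed with the indicator option of merge; the two spellings of Korea are only an illustrative assumption about how the sources might differ:

```python
import pandas as pd

# Rights scores keyed by country name. Albania and Algeria values come from the
# output above; the third row is a hypothetical name spelled differently elsewhere.
scores = pd.DataFrame(
    {"Economic Rights": [1.0, 1.0, 2.0]},
    index=pd.Index(["Albania", "Algeria", "Korea, Republic of"], name="Country"),
)
iso_codes = pd.DataFrame(
    {"ISO": ["ALB", "DZA", "KOR"]},
    index=pd.Index(["Albania", "Algeria", "Republic of Korea"], name="Country"),
)

# A left join with indicator=True marks rows that found no partner.
checked = scores.merge(iso_codes, how="left", left_index=True,
                       right_index=True, indicator=True)
unmatched = checked[checked["_merge"] == "left_only"].index.tolist()
print(unmatched)
```

Any names reported this way can be renamed in one of the tables before the final merge, so that those countries are not lost from the analysis.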



In [12]:

    
hr = hr.set_index('ISO')



In [14]:

    
combination = combination.set_index('ISO')

Gender Wage Gap in % Difference vs Economic Rights Swarmplot



In [15]:

    
combination1 = combination.merge(hr, left_index=True, right_index=True)
combination1









Out[15]:

     Gender Wage Gap in % Difference  Year  Economic Rights  Political Rights
ISO
ARG                        27.200000  2011              2.0               3.0
AUS                        14.042933  2011              3.0               2.0
AUT                        19.188862  2011              3.0               3.0
BEL                         7.043796  2011              2.0               3.0
BRA                        24.351110  2011              1.0               2.0
BGR                        18.323460  2011              2.0               2.0
CAN                        18.977470  2011              2.0               2.0
CHL                        23.231090  2011              1.0               2.0
CHN                        22.893980  2011              1.0               2.0
COL                         6.430555  2011              1.0               2.0
CYP                        25.660240  2011              1.0               2.0
DNK                         8.895099  2011              3.0               3.0
EST                        26.601547  2011              2.0               2.0
FIN                        18.876999  2011              3.0               3.0
FRA                        14.054337  2011              2.0               2.0
DEU                        16.750813  2011              3.0               3.0
GRC                        12.172841  2011              2.0               2.0
HUN                         6.381712  2011              1.0               2.0
ISL                        14.314784  2011              3.0               3.0
IND                        24.800000  2011              1.0               3.0
IRL                        12.844729  2011              3.0               2.0
ISR                        20.700092  2011              1.0               2.0
ITA                         9.940335  2011              3.0               2.0
JPN                        28.684301  2011              1.0               2.0
LVA                        13.333333  2011              1.0               2.0
LTU                         6.956522  2011              2.0               2.0
LUX                         4.968271  2011              3.0               3.0
MEX                        11.627907  2011              2.0               2.0
NLD                        18.597561  2011              3.0               3.0
NZL                         7.011236  2011              3.0               3.0
NOR                         8.059702  2011              3.0               3.0
PER                        22.629050  2011              1.0               2.0
POL                         7.190207  2011              1.0               2.0
PRT                        13.450867  2011              2.0               2.0
ROU                        15.370390  2011              1.0               2.0
SVN                        11.633663  2011              2.0               2.0
ESP                         6.585059  2011              2.0               3.0
SWE                        14.321260  2011              2.0               3.0
CHE                        20.053595  2011              3.0               2.0
TUR                        20.064724  2011              1.0               2.0
USA                        18.810680  2011              3.0               2.0
URY                        27.168630  2011              2.0               2.0



In [16]:

    
ax = sns.swarmplot(x="Economic Rights", y="Gender Wage Gap in % Difference", data=combination1)
fig_mpl = ax.get_figure()

I used a swarmplot for this data set because the countries in the merged data take only three values on the scale of women's economic rights: 1.0, 2.0 or 3.0. In general, at a score of 1.0 there is a larger cluster of countries with higher gender wage gaps, between 20% and 25%. Examples include Brazil, Peru, and China, whose gender wage gaps are roughly 24%, 23%, and 23% respectively. Women's economic rights include equal pay for equal work, job security, and equality in hiring and promotion practices. This pattern is consistent with the current economic environment of developing countries. The outliers include Poland, Hungary, and Colombia, all of which have relatively low wage gaps. It's important to keep in mind that a score of 1.0 means economic rights exist under law but are not necessarily strictly enforced by the government. In addition, the economic score encompasses less commonly considered factors such as the right to work in occupations classified as dangerous and the right to work at night. Either point could explain a low score paired with a low wage gap.

Less of a distinction can be made between a score of 2 and a score of 3. Overall, though, a trend can be noted in which women's economic rights are associated with the size of the wage gap.



In [17]:

    
fig, ax = plt.subplots()
ax.scatter(combination1['Political Rights'],                 # x variable
           combination1['Gender Wage Gap in % Difference'],  # y variable
           alpha=0.5)                                        # semi-transparent markers
ax.set_title('Gender Wage Gap vs Political Rights', loc='left', fontsize=14)
ax.set_xlabel('Political Rights')
ax.set_ylabel('Gender Wage Gap in % Difference')









    Out[17]:





<matplotlib.text.Text at 0x11d5bd198>

Political rights include a woman's right to vote, hold political office, and petition government officials. At a score of 3.0, there is a larger cluster of countries with lower gender wage gaps, between 5% and 10%. However, the trend between the political rights score and the gender wage gap is less pronounced than it was for economic rights. Perhaps women's political rights in a country do not necessarily correlate with their economic rights?

Correlation between women's political rights and economic rights



In [19]:

    
np.corrcoef(hr['Economic Rights'], hr['Political Rights'])[0, 1]









    Out[19]:





0.98981153519249365

There appears to be a very high correlation coefficient between women's political and economic rights. Even so, judging by the two plots above, women's political rights are less indicative of the gender wage gap than their economic rights are.
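
A correlation this close to 1 between two ordinal scores is worth double-checking. One possible cause, stated here as an assumption to verify against the CIRI codebook, is that missing observations are stored as large negative sentinel codes; if both columns carry those codes in the same rows, they dominate the coefficient. A minimal sketch on invented scores that compares the raw correlation with one restricted to the 0 to 3 range seen in the output above:

```python
import numpy as np
import pandas as pd

# Invented scores; -999 stands in for an assumed missing-value code.
toy = pd.DataFrame({
    "Economic Rights":  [1, 2, 3, 1, 2, 3, -999, -999],
    "Political Rights": [2, 2, 3, 3, 2, 2, -999, -999],
})

raw_r = np.corrcoef(toy["Economic Rights"], toy["Political Rights"])[0, 1]

# Keep only rows where both scores lie between 0 and 3.
valid = toy[(toy >= 0).all(axis=1) & (toy <= 3).all(axis=1)]
clean_r = np.corrcoef(valid["Economic Rights"], valid["Political Rights"])[0, 1]

print(round(raw_r, 3), round(clean_r, 3))
```

In this toy example the raw coefficient is close to 1 only because of the two sentinel rows, while the valid rows on their own are uncorrelated. Whether the same thing happens in the real file depends on how it encodes missing values.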

Average gender wage gaps for countries with a political rights score of 2 and 3

In this section, the mean gender wage gap is calculated separately for countries with a political rights score of 2 and of 3, to check whether a lower political rights score goes with a higher gender wage gap.



In [130]:

    
# Gender Wage Gap in % Difference when Political Rights is 2.0 
scoretwopolitical = [2.0]
political2 = combination1[combination1['Political Rights'].isin(scoretwopolitical)].drop('Economic Rights', axis =1).drop('Year', axis =1)
political2.head()









Out[130]:

     Gender Wage Gap in % Difference  Political Rights
ISO
AUS                        14.042933               2.0
BRA                        24.351110               2.0
BGR                        18.323460               2.0
CAN                        18.977470               2.0
CHL                        23.231090               2.0



In [131]:

    
politicalrights2 = political2['Gender Wage Gap in % Difference']
politicalrights2.mean()









    Out[131]:





17.301709266666666



In [128]:

    
# Gender Wage Gap in % Difference when Political Rights is 3.0
scorethreepolitical = [3.0]
political3 = combination1[combination1['Political Rights'].isin(scorethreepolitical)].drop(['Economic Rights', 'Year'], axis=1)
political3.head()









Out[128]:

     Gender Wage Gap in % Difference  Political Rights
ISO
ARG                        27.200000               3.0
AUT                        19.188862               3.0
BEL                         7.043796               3.0
DNK                         8.895099               3.0
FIN                        18.876999               3.0



In [132]:

    
politicalrights3 = political3['Gender Wage Gap in % Difference']
politicalrights3.mean()









    Out[132]:





14.043817242857141

The hypothesis that a lower political rights score would indicate a higher gender wage gap seems to be consistent with the calculated means. However, it is important to keep in mind that the sample size for countries with a score of 3 is rather small.
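
The two means can also be computed in one step, together with the number of countries behind each, which makes the sample-size caveat explicit. A sketch using only the ten rows displayed in the two head() outputs above, so the resulting numbers are not the full-sample means:

```python
import pandas as pd

gap = "Gender Wage Gap in % Difference"
rows = pd.DataFrame({
    "ISO": ["AUS", "BRA", "BGR", "CAN", "CHL", "ARG", "AUT", "BEL", "DNK", "FIN"],
    gap: [14.042933, 24.351110, 18.323460, 18.977470, 23.231090,
          27.200000, 19.188862, 7.043796, 8.895099, 18.876999],
    "Political Rights": [2.0] * 5 + [3.0] * 5,
}).set_index("ISO")

# Mean gap and number of countries for each political rights score.
summary = rows.groupby("Political Rights")[gap].agg(["mean", "count"])
print(summary)
```

Applied to combination1 instead of these ten rows, the same groupby call would return both means and both group sizes in a single table.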

Part 2 - HDI Data

For this data set, I will compare the Human Development Index (HDI) of a country with its gender wage gap. The HDI is a composite statistic of life expectancy, education, and per capita income indicators, which is used to rank countries into four tiers of human development. I expect countries with a lower HDI to have higher wage gaps. For the first step, I am converting the country names to their appropriate ISO codes.



In [20]:

    
url = "http://unstats.un.org/unsd/methods/m49/m49alpha.htm"
iso = pd.read_html(url, attrs={"border": "0", "cellpadding": "2"}, header=0)[0]
iso = iso.rename(columns={"ISO ALPHA-3 code": "ISO", 
                          "Country or area name": "Country"})
iso = iso.drop("Numerical  code", axis=1)
iso = iso.set_index("Country")
iso.head()









Out[20]:

                ISO
Country
Afghanistan     AFG
Åland Islands   ALA
Albania         ALB
Algeria         DZA
American Samoa  ASM



In [21]:

    
file2 = 'data/2015_Statistical_Annex_Table_1.xls'
hdi = pd.read_excel(file2)
hdi = hdi[['Country','Human Development Index']]
hdi.head()









Out[21]:

       Country  Human Development Index
0       Norway                 0.943877
1    Australia                 0.934958
2  Switzerland                 0.929613
3      Denmark                 0.923328
4  Netherlands                 0.921794



In [22]:

    
iso_hdi = iso[iso.index.isin(hdi['Country'])]

iso_hdi.head()









Out[22]:

             ISO
Country
Afghanistan  AFG
Albania      ALB
Algeria      DZA
Andorra      AND
Angola       AGO



In [23]:

    
hdi = hdi.set_index('Country')



In [24]:

    
hdi = hdi.merge(iso, left_index=True, right_index=True)
hdi.head()









Out[24]:

             Human Development Index  ISO
Country
Afghanistan                 0.465264  AFG
Albania                     0.732766  ALB
Algeria                     0.735624  DZA
Andorra                     0.844642  AND
Angola                      0.531591  AGO



In [25]:

    
hdi = hdi.set_index('ISO')



In [26]:

    
combination2 = combination.merge(hdi, left_index=True, right_index=True)
combination2.head()









Out[26]:

     Gender Wage Gap in % Difference  Human Development Index
ISO
ARG                        27.200000                 0.835572
AUS                        14.042933                 0.934958
AUT                        19.188862                 0.885027
BEL                         7.043796                 0.890263
BRA                        24.351110                 0.755292



In [27]:

    
fig, ax = plt.subplots()
ax.scatter(combination2['Human Development Index'],          # x variable
           combination2['Gender Wage Gap in % Difference'],  # y variable
           alpha=0.5)                                        # semi-transparent markers
ax.set_title('Gender Wage Gap vs HDI', loc='left', fontsize=14)
ax.set_xlabel('HDI')
ax.set_ylabel('Gender Wage Gap in % Difference')









    Out[27]:





<matplotlib.text.Text at 0x11dc6c0b8>



In [29]:

    
np.corrcoef(combination2['Gender Wage Gap in % Difference'], combination2['Human Development Index'])[0, 1]









    Out[29]:





-0.29077952683532698

Analysis - HDI Data

Taking the scatter plot and the correlation coefficient (-0.29) together, there is only a weak negative correlation between countries' gender wage gaps and their HDI values. Perhaps the HDI combines too many broad factors, each of which has a different relation with the gender wage gap.
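
For reference, the coefficient returned by np.corrcoef is Pearson's r, the covariance of the two series divided by the product of their standard deviations. A short sketch that computes it from that definition on the five rows shown in the merged output above (five rows only, so the value differs from the full-sample coefficient):

```python
import numpy as np

gap = np.array([27.200000, 14.042933, 19.188862, 7.043796, 24.351110])
hdi = np.array([0.835572, 0.934958, 0.885027, 0.890263, 0.755292])

# Pearson's r: sum of products of deviations over the product of their norms.
dx = gap - gap.mean()
dy = hdi - hdi.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_numpy = np.corrcoef(gap, hdi)[0, 1]
print(r_manual, r_numpy)
```

The two printed values agree up to floating-point rounding, which confirms that the one-line np.corrcoef call measures linear association only.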

Part 2 - GDP Per Capita Data

For this data set, I will compare the GDP per capita of a country with its gender wage gap. GDP per capita is an indicator of a country's economic activity as well as the purchasing power of its residents. It reflects living standards and how advanced a particular economy may be. I would expect countries with lower GDP per capita to have higher gender wage gaps. For the first step, I am converting the country names to their appropriate ISO codes.



In [44]:

    
url = "http://unstats.un.org/unsd/methods/m49/m49alpha.htm"
iso = pd.read_html(url, attrs={"border": "0", "cellpadding": "2"}, header=0)[0]
iso = iso.rename(columns={"ISO ALPHA-3 code": "ISO", 
                          "Country or area name": "Country"})
iso = iso.drop("Numerical  code", axis=1)
iso = iso.set_index("Country")
iso.head()









Out[44]:

                ISO
Country
Afghanistan     AFG
Åland Islands   ALA
Albania         ALB
Algeria         DZA
American Samoa  ASM



In [45]:

    
# World Bank Data 
file2 = '/Users/satyagatiganti/Desktop/Data_Bootcamp/WDI_Data.csv'
wb = pd.read_csv(file2, encoding = 'latin-1')
wb = wb.rename(columns={'Indicator Name': 'Factor'})
vlist = ['GDP per capita (constant 2010 US$)']
wb = wb[wb['Factor'].isin(vlist)]
wb = wb[['Country Name', 'Country Code', 'Factor', '2015']]
wb.columns = ['Country', 'ISO', 'Factor', 'GDP Per Capita']
wb = wb.drop('Factor', axis=1)
wb.head(5)









Out[45]:

                             Country  ISO  GDP Per Capita
497                       Arab World  ARB     6400.320671
1943          Caribbean small states  CSS     9004.192587
3389  Central Europe and the Baltics  CEB    14163.027829
4835      Early-demographic dividend  EAR     3339.790563
6281             East Asia & Pacific  EAS     9041.731421



In [46]:

    
iso_wb = iso[iso.index.isin(wb['Country'])]

iso_wb.head()









Out[46]:

                ISO
Country
Afghanistan     AFG
Albania         ALB
Algeria         DZA
American Samoa  ASM
Andorra         AND



In [47]:

    
wb = wb.set_index('Country')



In [48]:

    
wb = wb.merge(iso, left_index=True, right_index =True)
wb.head()









Out[48]:

               ISO_x  GDP Per Capita ISO_y
Country
Afghanistan      AFG      623.925524   AFG
Albania          ALB     4541.386209   ALB
Algeria          DZA     4794.048900   DZA
American Samoa   ASM             NaN   ASM
Andorra          ADO             NaN   AND



In [49]:

    
wb = wb.set_index('ISO_x')



In [51]:

    
combination3 = combination.merge(wb, left_index=True, right_index=True)
combination3.head()









Out[51]:

     Gender Wage Gap in % Difference  GDP Per Capita ISO_y
ARG                        27.200000    10514.587895   ARG
AUS                        14.042933    54717.706705   AUS
AUT                        19.188862    47667.805610   AUT
BEL                         7.043796    44863.088182   BEL
BRA                        24.351110    11159.254155   BRA



In [52]:

    
combination3 = combination3.sort_values(by='GDP Per Capita', ascending=True)
combination3.head()







Out[52]:

     Gender Wage Gap in % Difference  GDP Per Capita ISO_y
IND                        24.800000     1805.579625   IND
PER                        22.629050     5974.477245   PER
CHN                        22.893980     6416.183355   CHN
COL                         6.430555     7447.779147   COL
BGR                        18.323460     7502.436143   BGR



In [53]:

    
fig, ax = plt.subplots(figsize = (20,8))  
combination3['Gender Wage Gap in % Difference'].plot(kind = 'bar', ax=ax)
ax.set_title('Gender Wage Gap by GDP Per Capita', loc='center', fontsize=14)
ax.set_xlabel('Countries in Ascending GDP Per Capita', fontsize = 12)
ax.set_ylabel('Gender Wage Gap in % Difference', fontsize = 12)









    Out[53]:





<matplotlib.text.Text at 0x1177661d0>

The countries on the x-axis in the graph above are listed in ascending order of GDP per capita. Looking at the graph, there appears to be little correlation between GDP per capita and the gender wage gap in % difference. As with the HDI, perhaps GDP per capita is too broad a factor to use in evaluating the gender wage gap.
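
Reading a correlation off a sorted bar chart is hard, so a rank correlation can back up the visual impression. A sketch that computes Spearman's coefficient as the Pearson correlation of the ranks, using only the five lowest-GDP rows displayed above (an illustration of the method, not a result for the full sample):

```python
import pandas as pd

shown = pd.DataFrame({
    "Gender Wage Gap in % Difference": [24.800000, 22.629050, 22.893980,
                                        6.430555, 18.323460],
    "GDP Per Capita": [1805.579625, 5974.477245, 6416.183355,
                       7447.779147, 7502.436143],
}, index=["IND", "PER", "CHN", "COL", "BGR"])

# Spearman's rho is the Pearson correlation computed on ranks.
ranks = shown.rank()
rho = ranks["Gender Wage Gap in % Difference"].corr(ranks["GDP Per Capita"])
print(round(rho, 2))
```

Running the same two lines on the full combination3 frame would give a single number to set beside the bar chart.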

Part 2 - Maternity Leave Data

For this data set, I will compare the length of maternity leave in weeks in a country with its gender wage gap. I would expect countries with shorter maternity leave to have higher gender wage gaps because this might indicate discrimination in the workplace. For the first step, I am converting the country names to their appropriate ISO codes.



In [127]:

    
url = "http://unstats.un.org/unsd/methods/m49/m49alpha.htm"
iso = pd.read_html(url, attrs={"border": "0", "cellpadding": "2"}, header=0)[0]
iso = iso.rename(columns={"ISO ALPHA-3 code": "ISO", 
                          "Country or area name": "Country"})
iso = iso.drop("Numerical  code", axis=1)
iso = iso.set_index("Country")
iso.head()









Out[127]:

                ISO
Country
Afghanistan     AFG
Åland Islands   ALA
Albania         ALB
Algeria         DZA
American Samoa  ASM



In [128]:

    
file3 = 'data/Maternity Leave Data.xlsx'
ml = pd.read_excel(file3)
ml = ml[['Country', '2015']]
ml.columns = ['Country', 'Maternity Leave in Weeks']
ml = ml.drop(0).drop(36)
ml.head()









Out[128]:

     Country  Maternity Leave in Weeks
1  Australia                       6.0
2    Austria                      16.0
3    Belgium                      15.0
4     Canada                      17.0
5      Chile                      18.0



In [129]:

    
iso_ml = iso[iso.index.isin(ml['Country'])]

iso_ml.head()









Out[129]:

           ISO
Country
Australia  AUS
Austria    AUT
Belgium    BEL
Canada     CAN
Chile      CHL



In [130]:

    
ml = ml.set_index('Country')



In [131]:

    
ml = ml.merge(iso, left_index=True, right_index =True)
ml.head()









Out[131]:

           Maternity Leave in Weeks  ISO
Country
Australia                       6.0  AUS
Austria                        16.0  AUT
Belgium                        15.0  BEL
Canada                         17.0  CAN
Chile                          18.0  CHL



In [132]:

    
ml = ml.set_index('ISO')



In [133]:

    
combination4 = combination.merge(ml, left_index=True, right_index=True)
combination4.head()









Out[133]:

     Gender Wage Gap in % Difference  Maternity Leave in Weeks
ISO
AUS                        14.042933                       6.0
AUT                        19.188862                      16.0
BEL                         7.043796                      15.0
CAN                        18.977470                      17.0
CHL                        23.231090                      18.0



In [134]:

    
np.corrcoef(combination4['Maternity Leave in Weeks'], combination4['Gender Wage Gap in % Difference'])[0, 1]









    Out[134]:





-0.076387039957295663

No figure was charted for this analysis because the correlation coefficient between maternity leave in weeks and the gender wage gap in % difference is close to zero (about -0.08), indicating a very weak correlation. Maternity leave length therefore appears to have little linear relationship with the wage gap in this sample, so it does not offer evidence of workplace discrimination one way or the other.

Part 3 - Segmenting the Gender Wage Gap in the US

The current gender wage gap in the US stands at nearly 20%. I will be segmenting this percentage by profession and by race.

For the BLS data set, I will extract a variety of professions, each of which comes with men's and women's median weekly earnings. I want to compare the difference in wage gaps across occupations and consider the reasons for these differences.



In [10]:

    
#US earnings by occupation and sex
file4 = 'http://www.bls.gov/cps/cpsaat39.xlsx'
us = pd.read_excel(file4)
us = us.drop([0, 1, 2, 3, 5])  # drop rows 0-3 and 5, which are not needed
us = us.drop(['Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 5'], axis=1)
us.columns = ['Occupation', 'Men median weekly earnings', 'Women median weekly earnings']
us.head()









    Out[10]:

   Occupation                                          Men median weekly earnings  Women median weekly earnings
4  NaN                                                 Median weekly earnings      Median weekly earnings
6  Total, full-time wage and salary workers            895                         726
7  NaN                                                 NaN                         NaN
8  Management, professional, and related occupations   1383                        996
9  Management, business, and financial operations...   1436                        1073

I extracted the median weekly earnings of men and women for 12 occupation groups, chosen to span a wide range of industries.



In [11]:

    
occupationlist = ['Management occupations', 'Business and financial operations occupations', 'Architecture and engineering occupations', 'Community and social service occupations','Legal occupations', 'Arts, design, entertainment, sports, and media occupations', 'Healthcare practitioners and technical occupations', 'Food preparation and serving related occupations', 'Sales and related occupations', 'Office and administrative support occupations', 'Construction and extraction occupations', 'Education, training, and library occupations']
us = us[us['Occupation'].isin(occupationlist)]
us









    Out[11]:

     Occupation                                          Men median weekly earnings  Women median weekly earnings
10   Management occupations                              1486                        1139
41   Business and financial operations occupations       1327                        1004
88   Architecture and engineering occupations            1452                        1257
134  Community and social service occupations            973                         845
143  Legal occupations                                   1877                        1135
149  Education, training, and library occupations        1144                        907
161  Arts, design, entertainment, sports, and media...   1088                        942
181  Healthcare practitioners and technical occupat...   1272                        991
248  Food preparation and serving related occupations    481                         414
292  Sales and related occupations                       880                         578
311  Office and administrative support occupations       693                         646
376  Construction and extraction occupations             751                         704



In [12]:

    
# cleaning up the data and making the names of the occupations shorter
# (DataFrame.set_value was removed in pandas 1.0, so assign by row label instead)
short_names = {10: 'Management', 41: 'Business & financial ops', 88: 'Architecture and Engineering',
               134: 'Community & social service', 143: 'Legal', 149: 'Education',
               161: 'Arts, entertainment, sports', 181: 'Healthcare', 248: 'Food Prep',
               292: 'Sales', 311: 'Office & admin support', 376: 'Construction'}
# the Series is aligned on the row labels 10, 41, ... of us
us = us.assign(Occupation=pd.Series(short_names))
us









    Out[12]:

     Occupation                    Men median weekly earnings  Women median weekly earnings
10   Management                    1486                        1139
41   Business & financial ops      1327                        1004
88   Architecture and Engineering  1452                        1257
134  Community & social service    973                         845
143  Legal                         1877                        1135
149  Education                     1144                        907
161  Arts, entertainment, sports   1088                        942
181  Healthcare                    1272                        991
248  Food Prep                     481                         414
292  Sales                         880                         578
311  Office & admin support        693                         646
376  Construction                  751                         704



In [13]:

    
us = us.set_index('Occupation')



In [14]:

    
men = dict(type="scatter", 
            name="Men", 
            mode="markers",                       # draw dots
            x=us["Men median weekly earnings"],                    # x data
            y=us.index,                    # y data
            marker={"color": "Blue", "size": 12}  # dot color/size
           )
women = dict(type="scatter", 
             name="Women", 
             mode="markers",
             x=us['Women median weekly earnings'], 
             y=us.index,
             marker={"color": "Pink", "size": 12}
            )

def draw_line(row):
    sc = row.name
    line = dict(type="scatter",                # trace type
                x=[row["Women median weekly earnings"], row["Men median weekly earnings"]],  # x data
                y=[sc, sc],                    # y data flat
                mode="lines",                  # draw line
                name=sc,                       # name trace
                showlegend=False,              # no legend entry
                line={"color": "gray"}         # line color
               )
    return line
lines = list(us.apply(draw_line, axis=1))

layout = dict(width=600, height=750,                        # plot width/height
              yaxis={"title": "Occupation"},                    # yaxis label
              title="Gender earnings disparity by profession",            # title
              xaxis={"title": "Median Weekly Earnings"}     # xaxis label
             )

# use + for two lists
data = [men, women] + lines  

# build and display the figure
fig = go.Figure(data=data, layout=layout)
iplot(fig)

According to the graph above, the largest wage gap is in the legal profession. This could be due to the practice areas women tend to take up. For example, women are more likely to practice family or employment law, both of which pay lower salaries than the areas of law that men are more likely to enter. However, further research indicates that women lawyers are paid less regardless of how much they work. Some attribute the problem to differences in how women and men negotiate. The smallest wage gaps are in construction and admin support. The nature of these occupations is such that they offer less flexibility in salary negotiation.
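The ranking described above can be checked numerically. The sketch below is illustrative rather than part of the original notebook: it copies the earnings from the occupation table and assumes the gap is measured as the difference between men's and women's median earnings, taken as a percentage of men's median earnings. The names `earnings`, `Gap in %`, and `ranked` are introduced here for the example.

```python
import pandas as pd

# Median weekly earnings copied from the occupation table above
earnings = pd.DataFrame(
    {
        "Men median weekly earnings": [1486, 1327, 1452, 973, 1877, 1144,
                                       1088, 1272, 481, 880, 693, 751],
        "Women median weekly earnings": [1139, 1004, 1257, 845, 1135, 907,
                                         942, 991, 414, 578, 646, 704],
    },
    index=pd.Index(
        ["Management", "Business & financial ops", "Architecture and Engineering",
         "Community & social service", "Legal", "Education",
         "Arts, entertainment, sports", "Healthcare", "Food Prep", "Sales",
         "Office & admin support", "Construction"],
        name="Occupation",
    ),
)

men = earnings["Men median weekly earnings"]
women = earnings["Women median weekly earnings"]

# Assumed definition: gap as a percentage of men's median earnings
earnings["Gap in %"] = (men - women) / men * 100

# Largest gap first
ranked = earnings.sort_values("Gap in %", ascending=False)
print(ranked["Gap in %"].round(1))
```

Under this definition, legal and sales occupations sit at the top of the ranking, and construction and office and admin support sit at the bottom, which matches the reading of the dot plot.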

Part Three - US Wage Gap Segmented by Race



In [18]:

    
# US earnings by race and sex
file5 = 'http://www.bls.gov/cps/cpsaat37.xlsx'
demographics = pd.read_excel(file5)
# drop the header rows and the rows that are not needed
demographics = demographics.drop(list(range(17)) + [19, 20, 23, 24, 27, 28, 31, 32])
demographics = demographics.drop(['Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3'], axis=1)
demographics.columns = ['Demographics', 'Median weekly earnings']
demographics = demographics.drop('Demographics', axis=1)
# keep the four rows whose value matches the men's median weekly earnings for each race
vlist4 = [920.0, 680.0, 1129.0, 631.0]
demographics = demographics[demographics['Median weekly earnings'].isin(vlist4)]
demographics['Race'] = ['White', 'Black or African American', 'Asian', 'Hispanic or Latino']
demographics['Women Median Weekly Earnings'] = [743.0, 615.0, 877.0, 556.0]
demographics['Men Median Weekly Earnings'] = [920.0, 680.0, 1129.0, 631.0]
demographics = demographics.drop('Median weekly earnings', axis=1)
demographics = demographics.set_index('Race')
demographics









    Out[18]:

                           Women Median Weekly Earnings  Men Median Weekly Earnings
Race
White                                             743.0                       920.0
Black or African American                         615.0                       680.0
Asian                                             877.0                      1129.0
Hispanic or Latino                                556.0                       631.0



In [19]:

    
Women = dict(type="bar",                                      # trace type
           orientation="h",                                 # make bars horizontal
           name="Women",                                      # legend entry
           x=demographics["Women Median Weekly Earnings"],                               # x data
           y=demographics.index,                                # y data
           marker={"color": "Pink"}                         # pink bars
          )
Men = dict(type="bar",                                    # trace type
             orientation="h",                               # horizontal bars
             name="Men",                                  # legend entry
             x=demographics["Men Median Weekly Earnings"],                           # x data
             y=demographics.index,                              # y data
             marker={"color": "Blue"}                       # blue bars
            )
layout = dict(width=650, height=750,                        # plot width/height
              yaxis={"title": "Race"},                    # yaxis label
              title="Earnings by Gender and Race",            # title
              xaxis={"title": "Median Weekly Earnings"}  # xaxis label
             )

iplot(go.Figure(data=[Men, Women], layout=layout))

According to the bar graph above, the largest gender wage gap among races is observed between Asian women and Asian men. Studies have shown that this wage gap can be attributed to discrimination in the workplace, especially against Asian women. Other contributing factors, according to Pew Research, are risk aversion and negotiation skills, which Asian workers are said to draw on less heavily than workers of other races. Interestingly, negotiation plays a significant role in explaining the wage gap, whether segmented by occupation or by race. Another theory is that women do not necessarily seek out the same high-paying STEM fields as men because of experiences rooted in prejudice, such as greater encouragement for men to pursue these professions. An important distinction is that both Asian men and Asian women earn more than their Black, Hispanic, or white counterparts. However, as Asian women climb the career ladder and earn higher salaries, they face far more discrimination at the executive levels. This could account for the larger wage gap among Asians.
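Two of the claims above can be read directly off the table: that the gap is largest for Asian workers, and that Asian men and women have the highest median earnings of the four groups. The following is a small illustrative check, not part of the original notebook. It again assumes the gap is expressed as a percentage of men's median earnings, and `by_race`, `gap_pct`, and `largest_gap` are names made up for the example.

```python
import pandas as pd

# Median weekly earnings copied from the race table above
by_race = pd.DataFrame(
    {
        "Women Median Weekly Earnings": [743.0, 615.0, 877.0, 556.0],
        "Men Median Weekly Earnings": [920.0, 680.0, 1129.0, 631.0],
    },
    index=pd.Index(
        ["White", "Black or African American", "Asian", "Hispanic or Latino"],
        name="Race",
    ),
)

women = by_race["Women Median Weekly Earnings"]
men = by_race["Men Median Weekly Earnings"]

# Assumed definition: gap as a percentage of men's median earnings
gap_pct = (men - women) / men * 100

largest_gap = gap_pct.idxmax()   # group with the widest gap
top_women = women.idxmax()       # group whose women earn the most
top_men = men.idxmax()           # group whose men earn the most

print(gap_pct.round(1))
print(largest_gap, top_women, top_men)
```

Expressing the gap as a percentage rather than in dollars matters here, because groups with higher earnings would otherwise show larger gaps simply because the amounts involved are larger.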

Conclusion

Of the factors I compared against gender wage gaps around the world, only women's economic rights in a country seem to be correlated with the gender wage gap. This indicates that the gender wage gap is a highly complex issue with no single, consistent cause. The gender wage gap is attributable to an individual country's culture, government, and economic opportunities.

Segmenting the wage gap in the United States allows us to observe where women are making headway and where there is still progress to be made. Professions such as food prep, construction, and admin support show smaller wage gaps, whereas legal and sales show larger ones. This points to the skill level, work culture, and long-lasting prejudices embedded within these professions. In terms of race, while women on average are earning more than they did in the past, there still exist high barriers at more senior and executive levels.

Overall, this analysis demonstrates that women are certainly making headway in closing the gender wage gap but there is still room for improvement.


