Final Project

Income Inequality and Voter-turnout Rate

Fiona Wang, Elizabeth He, Jeremy Muhia

Abstract

Class divide and wealth inequality in America are at an all-time high and have increased dramatically since the 1970s. Following the most recent presidential election, one might wonder whether income and voter turnout are related.

Numerous economists have argued that “voters are virtually a carbon copy of the citizen population” (Wolfinger and Rosenstone 1980). However, more recent research is starting to question this claim. It is not that the earlier economists and researchers were wrong, but that “the preferences of voters and nonvoters are becoming increasingly divergent” (McElwee 2015). This is particularly relevant to the recent election, where many said that in order to win, both parties needed to know the voting demographics of the swing-state population. Many analysts broke down the population by race, gender, and education; however, we believe that income inequality plays a role too. In fact, there is strong evidence pointing to such a conclusion: in the 2012 election, 80.2 percent of people with a yearly income above \$150,000 voted, while only 46.9 percent of people with a yearly income below \$10,000 voted (McElwee 2015). This class-voting bias was observed in the 2008 and 2010 elections too. Hence we would like to understand whether there really is a voting disparity between the rich and the poor in America and around the world.

We started with voter turnout data from the International Institute for Democracy and Electoral Assistance (IDEA), a strong favorite among researchers looking into voting patterns. IDEA provides comprehensive voter turnout data by country and year. We also use Gini coefficient data from CLIO, a reputable database with worldwide data on social, economic, and institutional indicators for the past five centuries. We believe that the Gini coefficient is a reliable measure of the income distribution within a country, as it is the most commonly used measure of inequality.
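For readers unfamiliar with the measure, here is a minimal sketch of how a Gini coefficient can be computed from a list of individual incomes. It is not the method CLIO used to build its series. The function name `gini_coefficient` and the sample incomes are illustrative assumptions. The function returns a value between 0 (perfect equality) and 1, whereas the CLIO values used in this notebook appear to be on a 0 to 100 scale.

```python
def gini_coefficient(incomes):
    """Return the Gini coefficient (0 = perfect equality) of non-negative incomes.

    Hypothetical helper for illustration only; uses the standard
    sorted-rank formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError('need at least one positive income')
    # ranks are 1-indexed in the formula
    weighted_sum = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted_sum / (n * total) - (n + 1.0) / n


# made-up incomes, purely for demonstration
equal_society = [30000, 30000, 30000, 30000]
unequal_society = [0, 0, 0, 120000]

print(gini_coefficient(equal_society))    # everyone earns the same
print(gini_coefficient(unequal_society))  # one person earns everything
```

With only four people, the most unequal case reaches (n - 1) / n rather than exactly 1, which is a known property of this finite-sample formula.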

Our interest lies particularly in the effect of class-voter bias on the recent election, but there is not yet enough data for the 2016 election, so we chose to focus on Gini coefficient (income inequality) data from 2000. We believe the social and political climate back then was the closest to the recent election, given the dot-com bubble and the Gore vs. Bush election recount. The economic climate was also similar, as the NASDAQ index was near the same values then as it is now.



In [3]:

    
import sys                             # system module
import pandas as pd                    # data package
import matplotlib.pyplot as plt        # graphics module  
import datetime as dt                  # date and time module
import numpy as np                     # foundation for Pandas
import seaborn.apionly as sns          # fancy matplotlib graphics (no styling)
from pandas_datareader import wb, data as web  # worldbank data
import quandl
import json
import time

# plotly imports
from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly                                # just to print version and init notebook
import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

# these lines make our graphics show up in the notebook
%matplotlib inline             
plotly.offline.init_notebook_mode(connected=True)

# check versions (overkill, but why not?)
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())

Python version: 3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Plotly version:  1.12.11
Today:  2016-12-22

JSON Helper Function

We keep the API keys and data URLs in a local JSON file so that this sensitive information stays out of the notebook.
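As a rough illustration of the layout such a file might have, the sketch below writes a placeholder `keys.json` to a temporary directory and reads it back. The key names `QUANDL_KEY` and `VOTER_DATA` are the ones looked up later in this notebook. The values and the URL are placeholders, not the real ones.

```python
import json
import os
import tempfile

# Placeholder values only: the real key and URL are deliberately not published.
example_secrets = {
    'QUANDL_KEY': 'your-quandl-api-key',
    'VOTER_DATA': 'https://example.com/voter_turnout.csv',
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'keys.json')
    # write the example file
    with open(path, 'w') as handle:
        json.dump(example_secrets, handle, indent=2)
    # read it back the same way the notebook does
    with open(path, 'r') as handle:
        loaded = json.load(handle)

print(sorted(loaded))
```

Keeping the real file out of version control (for example via `.gitignore`) is what actually keeps the key private.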



In [4]:

    
# the function below loads a JSON file into a Python dictionary
def convert_json_file(filename):
    # the with-statement closes the file automatically, so no explicit close() is needed
    with open(filename, 'r') as full_json_file:
        return json.load(full_json_file)



In [5]:

    
# translate the JSON file containing secret API keys and other info into a dictionary for use throughout the program
secrets = convert_json_file('data/keys.json')

Get Income Data from Quandl and Voter Data from IDEA

Getting data from Quandl was straightforward: make an HTTP GET request to their API using their easy-to-use module, and the data comes back as a dataframe. Downloading the .xls file of voter data, on the other hand, was tricky. The file had to be converted to a .csv file and hosted on GitHub so that the notebook does not depend on a local copy.



In [6]:

    
# this gets the US Income Inequality data from Quandl as a dataframe
us_income_ineq = quandl.get('CLIO/USA_II', authtoken=secrets['QUANDL_KEY'])



In [7]:

    
us_income_ineq



In [8]:

    
# this gets the UK Income Inequality data as a dataframe
uk_income_ineq = quandl.get('CLIO/GBR_II', authtoken=secrets['QUANDL_KEY'])



In [9]:

    
uk_income_ineq



In [10]:

    
# this gets the Australia Income Inequality data as a dataframe
australia_income_ineq = quandl.get('CLIO/AUS_II', authtoken=secrets['QUANDL_KEY'])



In [11]:

    
australia_income_ineq



In [12]:

    
# this gets the Canada Income Inequality data as a dataframe
canada_income_ineq = quandl.get('CLIO/CAN_II', authtoken=secrets['QUANDL_KEY'])



In [13]:

    
canada_income_ineq



In [14]:

    
# read the hosted .csv file, drop any rows with missing values, and store the result as a dataframe
voter_data = pd.read_csv(secrets['VOTER_DATA']).dropna(axis=0)



In [15]:

    
# check for any rows with a blank column 'Country'
voter_data.tail(15)









    Out[15]:

      Country                                        Year  Data
3056  Yugoslavia, FR/Union of Serbia and Montenegro  2000  64.17
3057  Yugoslavia, FR/Union of Serbia and Montenegro  1996  53.29
3058  Yugoslavia, FR/Union of Serbia and Montenegro  1993  67.39
3063  Zambia                                         2016  56.03
3064  Zambia                                         2011  53.65
3065  Zambia                                         2006  70.74
3066  Zambia                                         2001  68.55
3067  Zambia                                         1996  78.49
3068  Zambia                                         1991  44.44
3069  Zambia                                         1968  82.47
3070  Zimbabwe                                       2008  40.81
3071  Zimbabwe                                       2005  47.66
3072  Zimbabwe                                       2000  48.33
3073  Zimbabwe                                       1995  30.81
3074  Zimbabwe                                       1979  63.89



In [17]:

    
# this isolates and stores the US voter turnout data
us_voter_data = voter_data.loc[voter_data['Country'].isin(['United States'])].drop_duplicates(['Year', 'Data'])
us_voter_data = us_voter_data.set_index(['Year'])
us_voter_data.index = pd.to_datetime(us_voter_data.index, format='%Y')
us_voter_data









    Out[17]:

            Country        Data
Year
2014-01-01  United States  42.50
2012-01-01  United States  64.44
2010-01-01  United States  48.59
2008-01-01  United States  64.36
2006-01-01  United States  47.52
2004-01-01  United States  68.75
2002-01-01  United States  45.31
2000-01-01  United States  63.76
1998-01-01  United States  51.55
1996-01-01  United States  65.97
1994-01-01  United States  57.64
1992-01-01  United States  78.02
1990-01-01  United States  56.03
1988-01-01  United States  72.48
1986-01-01  United States  54.89
1984-01-01  United States  74.63
1982-01-01  United States  61.10
1980-01-01  United States  76.53
1978-01-01  United States  57.04
1976-01-01  United States  77.64
1974-01-01  United States  58.15
1972-01-01  United States  79.85
1970-01-01  United States  70.32
1968-01-01  United States  89.66



In [18]:

    
# this isolates and stores the UK voter turnout data
uk_voter_data = voter_data.loc[voter_data['Country'].isin(['United Kingdom'])].drop_duplicates(['Year', 'Data'])
uk_voter_data = uk_voter_data.set_index(['Year'])
uk_voter_data.index = pd.to_datetime(uk_voter_data.index, format='%Y')
uk_voter_data









    Out[18]:

            Country         Data
Year
2015-01-01  United Kingdom  66.12
2010-01-01  United Kingdom  65.77
2005-01-01  United Kingdom  61.36
2001-01-01  United Kingdom  59.38
1997-01-01  United Kingdom  71.46
1992-01-01  United Kingdom  77.83
1987-01-01  United Kingdom  75.42
1983-01-01  United Kingdom  72.81
1979-01-01  United Kingdom  76.00
1974-01-01  United Kingdom  72.93
1970-01-01  United Kingdom  72.15
1966-01-01  United Kingdom  75.96
1964-01-01  United Kingdom  77.17
1959-01-01  United Kingdom  78.71
1955-01-01  United Kingdom  76.78
1951-01-01  United Kingdom  81.89
1950-01-01  United Kingdom  83.61
1945-01-01  United Kingdom  72.55



In [19]:

    
# this isolates and stores the Australia voter turnout data
australia_voter_data = voter_data.loc[voter_data['Country'].isin(['Australia'])].drop_duplicates(['Year', 'Data'])
australia_voter_data = australia_voter_data.set_index(['Year'])
australia_voter_data.index = pd.to_datetime(australia_voter_data.index, format='%Y')
australia_voter_data









    Out[19]:

            Country    Data
Year
2016-01-01  Australia  91.01
2013-01-01  Australia  93.23
2010-01-01  Australia  93.22
2007-01-01  Australia  94.76
2004-01-01  Australia  94.32
2001-01-01  Australia  94.85
1998-01-01  Australia  94.99
1996-01-01  Australia  95.77
1993-01-01  Australia  95.75
1990-01-01  Australia  95.31
1987-01-01  Australia  93.84
1984-01-01  Australia  94.19
1983-01-01  Australia  94.64
1980-01-01  Australia  94.35
1977-01-01  Australia  95.08
1975-01-01  Australia  95.39
1974-01-01  Australia  95.40
1972-01-01  Australia  95.38
1969-01-01  Australia  94.97
1966-01-01  Australia  95.13
1963-01-01  Australia  95.71
1961-01-01  Australia  95.22
1958-01-01  Australia  95.44
1955-01-01  Australia  95.00
1954-01-01  Australia  96.05
1951-01-01  Australia  95.97
1949-01-01  Australia  95.94
1946-01-01  Australia  93.95



In [20]:

    
# this isolates and stores the Canada voter turnout data
canada_voter_data = voter_data.loc[voter_data['Country'].isin(['Canada'])].drop_duplicates(['Year', 'Data'])
canada_voter_data = canada_voter_data.set_index(['Year'])
canada_voter_data.index = pd.to_datetime(canada_voter_data.index, format='%Y')
canada_voter_data



In [22]:

    
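# this creates a dictionary mapping each country to [voter turnout, Gini value]
# note: iloc[0] is the most recent election in the voter data, while iloc[-1] is the latest (2000) income inequality value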
summary_dictionary = {canada_voter_data['Country'].iloc[0]:[canada_voter_data['Data'].iloc[0],
                                                            canada_income_ineq['Value'].iloc[-1]],
                      australia_voter_data['Country'].iloc[0]:[australia_voter_data['Data'].iloc[0],
                                                              australia_income_ineq['Value'].iloc[-1]],
                      us_voter_data['Country'].iloc[0]:[us_voter_data['Data'].iloc[0],
                                                        us_income_ineq['Value'].iloc[-1]],
                      uk_voter_data['Country'].iloc[0]:[uk_voter_data['Data'].iloc[0],
                                                        uk_income_ineq['Value'].iloc[-1]]
                     }



In [23]:

    
summary_dictionary









    Out[23]:





{'Australia': [91.010000000000005, 44.219727407400001],
 'Canada': [68.280000000000001, 41.0614293446],
 'United Kingdom': [66.120000000000005, 40.465648999400003],
 'United States': [42.5, 43.862823822000003]}



In [24]:

    
# from the dictionary, we can create and shape a dataframe of the country data of interest
summary_df = pd.DataFrame.from_dict(summary_dictionary, orient='index')
summary_df = summary_df.reset_index()
summary_df.columns = ['Country', 'Voter Turnout Rate', 'Gini Points']
summary_df









    Out[24]:

   Country         Voter Turnout Rate  Gini Points
0  Canada          68.28               41.061429
1  United States   42.50               43.862824
2  Australia       91.01               44.219727
3  United Kingdom  66.12               40.465649

Plotting

After collecting the relevant data and shaping it, we can now start plotting the information. These visualizations help us understand the data that's been gathered.

US Voting and Income Data

Here, we plot the percentage of eligible individuals participating in the political process against income inequality in the United States. Interestingly, beginning around 1970, political participation began to decline as income inequality began to increase. Of course, voter participation changes dramatically between presidential and non-presidential elections: from 2002 to 2004, for example, the turnout rate rose by more than 20 percentage points (45.31% to 68.75%).
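To make the size of that midterm-to-presidential swing concrete, the short sketch below works out the 2002 to 2004 change both in percentage points and as a relative increase. The two turnout figures are copied from the US voter table above. The variable names are illustrative.

```python
# US turnout rates (percent) copied from the voter table above
turnout_2002 = 45.31  # midterm election
turnout_2004 = 68.75  # presidential election

# absolute change, measured in percentage points
point_change = turnout_2004 - turnout_2002

# relative change, measured as a percent of the 2002 rate
relative_change = point_change / turnout_2002 * 100

print('Change in percentage points: {:.2f}'.format(point_change))
print('Relative increase: {:.1f}%'.format(relative_change))
```

The distinction matters when reading the plots: a swing of a little over 23 points is a relative increase of roughly half, because the midterm baseline is so low.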



In [25]:

    
# create traces of both the income inequality and voter turnout data for the US
us_income_trace = go.Scatter(
    x = us_income_ineq.index.to_series(),
    y = us_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

us_voter_trace = go.Scatter(
    x = us_voter_data.index.to_series(),
    y = us_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

us_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'US Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)



In [26]:

    
us_summary_data = [us_income_trace, us_voter_trace]



In [27]:

    
iplot(go.Figure(data=us_summary_data, layout=us_summary_layout))

UK Income and Voting Data

The United Kingdom shows no immediately discernible pattern. However, growing income inequality in the last few decades could be contributing to the growth in political participation in the 2000s, as globalism and the prominence of the EU prompt greater participation, for better or worse.



In [28]:

    
# create traces of both the income inequality and voter turnout data for the UK
uk_income_trace = go.Scatter(
    x = uk_income_ineq.index.to_series(),
    y = uk_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

uk_voter_trace = go.Scatter(
    x = uk_voter_data.index.to_series(),
    y = uk_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

uk_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'UK Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)



In [29]:

    
uk_summary_data = [uk_income_trace, uk_voter_trace]



In [30]:

    
iplot(go.Figure(data=uk_summary_data, layout=uk_summary_layout))

Australia Income and Voting Data

Australia is quite consistent in eligible voter participation. Still, growing income inequality, coupled with slightly falling voter participation, may lead to a turnaround.



In [31]:

    
# create traces of both the income inequality and voter turnout data for Australia
australia_income_trace = go.Scatter(
    x = australia_income_ineq.index.to_series(),
    y = australia_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

australia_voter_trace = go.Scatter(
    x = australia_voter_data.index.to_series(),
    y = australia_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

australia_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Australia Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)



In [32]:

    
australia_summary_data = [australia_income_trace, australia_voter_trace]



In [33]:

    
iplot(go.Figure(data=australia_summary_data, layout=australia_summary_layout))

Canada Income and Voting Data

Interestingly, Canadian citizens seem to have reacted quite quickly to a sharp increase in income inequality over the 1990s. The Gini value rose from 31.9 in 1990 to 41.1 in 2000, and over roughly the same period voter turnout fell from 75.29% (1988) to 61.18% (2000), so the decrease in voter participation appears to have coincided with the widening of the income gap.



In [34]:

    
# create traces of both the income inequality and voter turnout data for Canada
canada_income_trace = go.Scatter(
    x = canada_income_ineq.index.to_series(),
    y = canada_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

canada_voter_trace = go.Scatter(
    x = canada_voter_data.index.to_series(),
    y = canada_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

canada_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Canada Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)



In [35]:

    
canada_summary_data = [canada_income_trace, canada_voter_trace]



In [36]:

    
iplot(go.Figure(data=canada_summary_data, layout=canada_summary_layout))

Summary

After plotting the four countries against each other, there does not seem to be much correlation between income inequality and voter turnout across the group.
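One way to put a number on that impression is a Pearson correlation coefficient over the four summary points. The sketch below is an illustrative check rather than part of the original analysis. The helper `pearson_r` is a hypothetical name, and the values are copied from the summary dataframe above. With only four observations the result is indicative at best.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two sequences of equal length >= 2')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # sums of cross-products and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# order: Canada, United States, Australia, United Kingdom
turnout = [68.28, 42.50, 91.01, 66.12]
gini = [41.061429, 43.862824, 44.219727, 40.465649]

r = pearson_r(turnout, gini)
print('Pearson r: {:.3f}'.format(r))
```

A coefficient small in magnitude is consistent with the scatter plot below showing no clear trend.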



In [40]:

    
# create a trace of the data that will be used to create an informative plot later
summary_trace = go.Scatter(
    x = summary_df['Voter Turnout Rate'],
    y = summary_df['Gini Points'],
    mode = 'markers',
    text = summary_df['Country'],
    marker = dict(
        size = 14,
        color = np.random.randn(len(summary_df))  # one random color value per country
    )
)

# this layout will create meaningful formatting to help understand the data better
summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Income Inequality vs. Voter Turnout',
    xaxis = {'title':'Voter Turnout Rate'},
    yaxis = {'title':'Gini Points'}
)



In [41]:

    
# store the trace in an array for plotting
data = [summary_trace]



In [42]:

    
# finally plot the data
iplot(go.Figure(data=data, layout=summary_layout))

Bibliography/Data Source



In [ ]:

US income inequality, Gini points (us_income_ineq, output of In [7]):
	Value
Year
1820-12-31	57.007260
1850-12-31	43.797350
1870-12-31	51.340740
1890-12-31	45.523170
1910-12-31	51.090485
1929-12-31	54.274109
1950-12-31	39.424993
1960-12-31	38.023718
1970-12-31	36.064610
1980-12-31	36.814285
1990-12-31	39.719876
2000-12-31	43.862824

UK income inequality, Gini points (uk_income_ineq, output of In [9]):
	Value
Year
1820-12-31	59.270000
1850-12-31	43.491390
1870-12-31	48.970000
1890-12-31	37.366840
1910-12-31	41.861650
1929-12-31	42.512820
1950-12-31	30.497877
1960-12-31	28.501914
1970-12-31	28.900000
1980-12-31	33.998850
1990-12-31	38.624790
2000-12-31	40.465649

Australia income inequality, Gini points (australia_income_ineq, output of In [11]):
	Value
Year
1850-12-31	41.307390
1870-12-31	47.545060
1890-12-31	39.354040
1910-12-31	40.727470
1929-12-31	36.270165
1950-12-31	37.933493
1960-12-31	35.016606
1970-12-31	31.818443
1980-12-31	39.336425
1990-12-31	41.571508
2000-12-31	44.219727

Canada income inequality, Gini points (canada_income_ineq, output of In [13]):
	Value
Year
1820-12-31	45.136050
1850-12-31	26.737010
1870-12-31	43.792240
1890-12-31	41.229970
1910-12-31	40.669061
1929-12-31	41.948842
1950-12-31	36.273740
1960-12-31	34.571472
1970-12-31	33.780417
1980-12-31	33.515236
1990-12-31	31.880740
2000-12-31	41.061429

Canada voter turnout (canada_voter_data, output of In [20]):
	Country	Data
Year
2015-01-01	Canada	68.28
2011-01-01	Canada	61.11
2008-01-01	Canada	59.52
2006-01-01	Canada	64.67
2004-01-01	Canada	60.91
2000-01-01	Canada	61.18
1997-01-01	Canada	67.00
1993-01-01	Canada	69.64
1988-01-01	Canada	75.29
1984-01-01	Canada	75.34
1980-01-01	Canada	69.32
1979-01-01	Canada	75.69
1974-01-01	Canada	71.00
1972-01-01	Canada	77.20
1968-01-01	Canada	75.67
1965-01-01	Canada	75.88
1963-01-01	Canada	80.30
1962-01-01	Canada	80.13
1958-01-01	Canada	80.57
1957-01-01	Canada	75.05
1953-01-01	Canada	67.87
1949-01-01	Canada	74.79
1945-01-01	Canada	76.31