Final Project

Income Inequality and Voter-turnout Rate

Fiona Wang, Elizabeth He, Jeremy Muhia

Abstract

Wealth inequality and the class divide in America are at their highest levels in decades and have increased dramatically since the 1970s. Following the most recent presidential election, one might wonder whether income and voter turnout are related.

Researchers have long argued that “voters are virtually a carbon copy of the citizen population” (Wolfinger and Rosenstone 1980). However, more recent research has begun to question this view. It is not that the earlier researchers were wrong, but that “the preferences of voters and nonvoters are becoming increasingly divergent” (McElwee 2015). This is particularly relevant to the recent election, in which many commentators argued that, to win, both parties needed to understand the voting demographics of the swing-state population. Many analysts broke down the population by race, gender, and education; however, we believe that income inequality plays a role too. There is in fact strong evidence for this: in the 2012 election, 80.2 percent of people with a yearly income above $150,000 voted, compared with only 46.9 percent of people with a yearly income below $10,000 (McElwee 2015). The same class bias in voting was observed in the 2008 and 2010 elections. We therefore want to understand whether there really is a voting disparity between the rich and the poor, in America and around the world.
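To put the 2012 figures quoted above in perspective, the gap can be expressed both in percentage points and as a ratio. This is a minimal sketch that uses only the two numbers cited from McElwee (2015); the variable names are our own.

```python
# 2012 turnout rates cited above (McElwee 2015), in percent
turnout_high_income = 80.2   # yearly income above $150,000
turnout_low_income = 46.9    # yearly income below $10,000

# Absolute gap in percentage points, and how many times more likely to vote
gap_points = turnout_high_income - turnout_low_income
ratio = turnout_high_income / turnout_low_income

print(f"Gap: {gap_points:.1f} percentage points")    # Gap: 33.3 percentage points
print(f"Ratio of turnout rates: {ratio:.2f}")         # Ratio of turnout rates: 1.71
```

In other words, the highest-income group voted at roughly 1.7 times the rate of the lowest-income group.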

We started with voter turnout data from the International Institute for Democracy and Electoral Assistance (IDEA), a widely used source among researchers studying voting patterns, which provides comprehensive voter turnout data by country and year. We also use Gini coefficient data from CLIO (accessed through Quandl), a reputable database with worldwide data on social, economic, and institutional indicators for the past five centuries. We consider the Gini coefficient a reliable measure of the income distribution within a country, as it is the most commonly used measure of inequality.
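As a reminder of what the Gini coefficient measures, here is a minimal sketch of the standard mean-absolute-difference formula: the sum of |x_i - x_j| over all ordered pairs of people, divided by 2 n^2 times the mean income. It runs on made-up incomes, not on the CLIO data. The CLIO values used later in this notebook are in the 40s, so we assume they are on a 0 to 100 scale ("Gini points") rather than 0 to 1; the function name `gini` is our own.

```python
def gini(incomes):
    """Gini coefficient: 0 means perfect equality, values near 1 mean extreme inequality.

    Mean absolute difference over all ordered pairs:
    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean)
    """
    n = len(incomes)
    if n == 0:
        raise ValueError("need at least one income")
    mean = sum(incomes) / n
    if mean <= 0:
        raise ValueError("mean income must be positive")
    abs_diffs = sum(abs(a - b) for a in incomes for b in incomes)
    return abs_diffs / (2 * n * n * mean)

# Hypothetical incomes, for illustration only
equal = [50_000, 50_000, 50_000, 50_000]
unequal = [0, 0, 0, 200_000]

print(gini(equal))          # 0.0
print(gini(unequal))        # 0.75, the maximum possible for 4 people
print(100 * gini(unequal))  # 75.0, the same value in "Gini points"
```

For n people the maximum is (n - 1)/n, which approaches 1 only for large populations.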

Our interest lies particularly in the effect of class-based voting bias on the recent election, but there is not yet enough data for the 2016 election, so we chose to focus on Gini coefficient (income inequality) data from 2000. We believe the social and political climate then was the closest to that of the recent election, given the dot-com bubble and the Gore vs. Bush recount. The economic climate was also similar, as the NASDAQ index was at similar levels then and now.


In [3]:
import sys                             # system module
import pandas as pd                    # data package
import matplotlib.pyplot as plt        # graphics module  
import datetime as dt                  # date and time module
import numpy as np                     # foundation for Pandas
import seaborn.apionly as sns          # fancy matplotlib graphics (no styling)
from pandas_datareader import wb, data as web  # worldbank data
import quandl
import json
import time

# plotly imports
from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly                                # just to print version and init notebook
import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

# these lines make our graphics show up in the notebook
%matplotlib inline             
plotly.offline.init_notebook_mode(connected=True)

# check versions (overkill, but why not?)
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())


Python version: 3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Plotly version:  1.12.11
Today:  2016-12-22

JSON Helper Function

We store API keys and data URLs in a local JSON file to keep that sensitive information confidential.


In [4]:
# the function below translates a JSON file into a Python dictionary
def convert_json_file(filename):
    # the with block closes the file automatically, so no explicit close() is needed
    with open(filename, 'r') as full_json_file:
        return json.load(full_json_file)

In [5]:
# translate the JSON file containing secret API keys and other info into a dictionary for use throughout the program
secrets = convert_json_file('data/keys.json')
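For reference, the rest of the notebook reads two entries from this file: `QUANDL_KEY` (the Quandl API key) and `VOTER_DATA` (the URL of the voter turnout .csv). The sketch below shows the expected layout by writing a temporary file with placeholder values; the values and the up-front check for missing keys are our own assumptions, not part of the original workflow.

```python
import json
import os
import tempfile

# Placeholder values only; the real data/keys.json holds the actual key and URL
example_secrets = {
    "QUANDL_KEY": "YOUR-QUANDL-API-KEY",
    "VOTER_DATA": "https://example.com/voter_turnout.csv",
}

def convert_json_file(filename):
    """Load a JSON file into a Python dictionary."""
    with open(filename, "r") as json_file:
        return json.load(json_file)

# Write the placeholder file to a temporary folder and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "keys.json")
    with open(path, "w") as f:
        json.dump(example_secrets, f)
    secrets = convert_json_file(path)

# Fail early, with a clear message, if a key used later in the notebook is absent
missing = {"QUANDL_KEY", "VOTER_DATA"} - set(secrets)
if missing:
    raise KeyError("keys.json is missing: " + ", ".join(sorted(missing)))
print(sorted(secrets))  # ['QUANDL_KEY', 'VOTER_DATA']
```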

Get Income Data from Quandl and Voter Data from IDEA

Getting data from Quandl was straightforward: their easy-to-use module makes an HTTP GET request to the Quandl API and returns the data as a DataFrame. Downloading the IDEA .xls file, on the other hand, was trickier. We converted it to a .csv file and hosted it on GitHub so that the notebook does not depend on a file stored on one machine.


In [6]:
# this gets the US Income Inequality data from Quandl as a dataframe
us_income_ineq = quandl.get('CLIO/USA_II', authtoken=secrets['QUANDL_KEY'])

In [7]:
us_income_ineq


Out[7]:
Value
Year
1820-12-31 57.007260
1850-12-31 43.797350
1870-12-31 51.340740
1890-12-31 45.523170
1910-12-31 51.090485
1929-12-31 54.274109
1950-12-31 39.424993
1960-12-31 38.023718
1970-12-31 36.064610
1980-12-31 36.814285
1990-12-31 39.719876
2000-12-31 43.862824

In [8]:
# this gets the UK Income Inequality data as a dataframe
uk_income_ineq = quandl.get('CLIO/GBR_II', authtoken=secrets['QUANDL_KEY'])

In [9]:
uk_income_ineq


Out[9]:
Value
Year
1820-12-31 59.270000
1850-12-31 43.491390
1870-12-31 48.970000
1890-12-31 37.366840
1910-12-31 41.861650
1929-12-31 42.512820
1950-12-31 30.497877
1960-12-31 28.501914
1970-12-31 28.900000
1980-12-31 33.998850
1990-12-31 38.624790
2000-12-31 40.465649

In [10]:
# this gets the Australia Income Inequality data as a dataframe
australia_income_ineq = quandl.get('CLIO/AUS_II', authtoken=secrets['QUANDL_KEY'])

In [11]:
australia_income_ineq


Out[11]:
Value
Year
1850-12-31 41.307390
1870-12-31 47.545060
1890-12-31 39.354040
1910-12-31 40.727470
1929-12-31 36.270165
1950-12-31 37.933493
1960-12-31 35.016606
1970-12-31 31.818443
1980-12-31 39.336425
1990-12-31 41.571508
2000-12-31 44.219727

In [12]:
# this gets the Canada Income Inequality data as a dataframe
canada_income_ineq = quandl.get('CLIO/CAN_II', authtoken=secrets['QUANDL_KEY'])

In [13]:
canada_income_ineq


Out[13]:
Value
Year
1820-12-31 45.136050
1850-12-31 26.737010
1870-12-31 43.792240
1890-12-31 41.229970
1910-12-31 40.669061
1929-12-31 41.948842
1950-12-31 36.273740
1960-12-31 34.571472
1970-12-31 33.780417
1980-12-31 33.515236
1990-12-31 31.880740
2000-12-31 41.061429

In [14]:
# read the linked .csv file and drop every row that has a missing value
# (axis passed by keyword: recent pandas versions make dropna's arguments keyword-only)
voter_data = pd.read_csv(secrets['VOTER_DATA']).dropna(axis=0)

In [15]:
# inspect the last rows to confirm that no row with a blank 'Country' remains
voter_data.tail(15)


Out[15]:
Country Year Data
3056 Yugoslavia, FR/Union of Serbia and Montenegro 2000 64.17
3057 Yugoslavia, FR/Union of Serbia and Montenegro 1996 53.29
3058 Yugoslavia, FR/Union of Serbia and Montenegro 1993 67.39
3063 Zambia 2016 56.03
3064 Zambia 2011 53.65
3065 Zambia 2006 70.74
3066 Zambia 2001 68.55
3067 Zambia 1996 78.49
3068 Zambia 1991 44.44
3069 Zambia 1968 82.47
3070 Zimbabwe 2008 40.81
3071 Zimbabwe 2005 47.66
3072 Zimbabwe 2000 48.33
3073 Zimbabwe 1995 30.81
3074 Zimbabwe 1979 63.89
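`dropna` with its default settings removes every row that has a missing value in any column, which is how blank rows in the exported spreadsheet disappear. This small sketch uses two real Zimbabwe rows from the output above plus a hypothetical empty row, to show the effect of the cleaning step.

```python
import io
import pandas as pd

# Two rows from the output above in the same Country/Year/Data layout,
# followed by a hypothetical empty row such as a spreadsheet export can leave behind
raw_csv = io.StringIO(
    "Country,Year,Data\n"
    "Zimbabwe,2008,40.81\n"
    "Zimbabwe,2005,47.66\n"
    ",,\n"
)

voter_sample = pd.read_csv(raw_csv)
print(len(voter_sample))  # 3: the empty row is read in as all-NaN

# axis=0 drops rows (not columns) that contain any NaN
cleaned = voter_sample.dropna(axis=0)
print(len(cleaned))                     # 2
print(cleaned['Country'].isna().any())  # False
```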

In [17]:
# this isolates and stores the US voter turnout data
us_voter_data = voter_data.loc[voter_data['Country'].isin(['United States'])].drop_duplicates(['Year', 'Data'])
us_voter_data = us_voter_data.set_index(['Year'])
us_voter_data.index = pd.to_datetime(us_voter_data.index, format='%Y')
us_voter_data


Out[17]:
Country Data
Year
2014-01-01 United States 42.50
2012-01-01 United States 64.44
2010-01-01 United States 48.59
2008-01-01 United States 64.36
2006-01-01 United States 47.52
2004-01-01 United States 68.75
2002-01-01 United States 45.31
2000-01-01 United States 63.76
1998-01-01 United States 51.55
1996-01-01 United States 65.97
1994-01-01 United States 57.64
1992-01-01 United States 78.02
1990-01-01 United States 56.03
1988-01-01 United States 72.48
1986-01-01 United States 54.89
1984-01-01 United States 74.63
1982-01-01 United States 61.10
1980-01-01 United States 76.53
1978-01-01 United States 57.04
1976-01-01 United States 77.64
1974-01-01 United States 58.15
1972-01-01 United States 79.85
1970-01-01 United States 70.32
1968-01-01 United States 89.66

In [18]:
# this isolates and stores the UK voter turnout data
uk_voter_data = voter_data.loc[voter_data['Country'].isin(['United Kingdom'])].drop_duplicates(['Year', 'Data'])
uk_voter_data = uk_voter_data.set_index(['Year'])
uk_voter_data.index = pd.to_datetime(uk_voter_data.index, format='%Y')
uk_voter_data


Out[18]:
Country Data
Year
2015-01-01 United Kingdom 66.12
2010-01-01 United Kingdom 65.77
2005-01-01 United Kingdom 61.36
2001-01-01 United Kingdom 59.38
1997-01-01 United Kingdom 71.46
1992-01-01 United Kingdom 77.83
1987-01-01 United Kingdom 75.42
1983-01-01 United Kingdom 72.81
1979-01-01 United Kingdom 76.00
1974-01-01 United Kingdom 72.93
1970-01-01 United Kingdom 72.15
1966-01-01 United Kingdom 75.96
1964-01-01 United Kingdom 77.17
1959-01-01 United Kingdom 78.71
1955-01-01 United Kingdom 76.78
1951-01-01 United Kingdom 81.89
1950-01-01 United Kingdom 83.61
1945-01-01 United Kingdom 72.55

In [19]:
# this isolates and stores the Australia voter turnout data
australia_voter_data = voter_data.loc[voter_data['Country'].isin(['Australia'])].drop_duplicates(['Year', 'Data'])
australia_voter_data = australia_voter_data.set_index(['Year'])
australia_voter_data.index = pd.to_datetime(australia_voter_data.index, format='%Y')
australia_voter_data


Out[19]:
Country Data
Year
2016-01-01 Australia 91.01
2013-01-01 Australia 93.23
2010-01-01 Australia 93.22
2007-01-01 Australia 94.76
2004-01-01 Australia 94.32
2001-01-01 Australia 94.85
1998-01-01 Australia 94.99
1996-01-01 Australia 95.77
1993-01-01 Australia 95.75
1990-01-01 Australia 95.31
1987-01-01 Australia 93.84
1984-01-01 Australia 94.19
1983-01-01 Australia 94.64
1980-01-01 Australia 94.35
1977-01-01 Australia 95.08
1975-01-01 Australia 95.39
1974-01-01 Australia 95.40
1972-01-01 Australia 95.38
1969-01-01 Australia 94.97
1966-01-01 Australia 95.13
1963-01-01 Australia 95.71
1961-01-01 Australia 95.22
1958-01-01 Australia 95.44
1955-01-01 Australia 95.00
1954-01-01 Australia 96.05
1951-01-01 Australia 95.97
1949-01-01 Australia 95.94
1946-01-01 Australia 93.95

In [20]:
# this isolates and stores the Canada voter turnout data
canada_voter_data = voter_data.loc[voter_data['Country'].isin(['Canada'])].drop_duplicates(['Year', 'Data'])
canada_voter_data = canada_voter_data.set_index(['Year'])
canada_voter_data.index = pd.to_datetime(canada_voter_data.index, format='%Y')
canada_voter_data


Out[20]:
Country Data
Year
2015-01-01 Canada 68.28
2011-01-01 Canada 61.11
2008-01-01 Canada 59.52
2006-01-01 Canada 64.67
2004-01-01 Canada 60.91
2000-01-01 Canada 61.18
1997-01-01 Canada 67.00
1993-01-01 Canada 69.64
1988-01-01 Canada 75.29
1984-01-01 Canada 75.34
1980-01-01 Canada 69.32
1979-01-01 Canada 75.69
1974-01-01 Canada 71.00
1972-01-01 Canada 77.20
1968-01-01 Canada 75.67
1965-01-01 Canada 75.88
1963-01-01 Canada 80.30
1962-01-01 Canada 80.13
1958-01-01 Canada 80.57
1957-01-01 Canada 75.05
1953-01-01 Canada 67.87
1949-01-01 Canada 74.79
1945-01-01 Canada 76.31
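The four cells above repeat the same three steps for each country: filter by name, drop duplicate elections, and convert the year into a datetime index. A hypothetical helper, `country_turnout`, could factor that out. This is a sketch of our own, shown on a tiny stand-in frame rather than the full IDEA data.

```python
import pandas as pd

def country_turnout(voter_data, country):
    """Return one country's turnout rows, indexed by election year as datetimes.

    Same steps as the per-country cells: filter by name, drop repeated
    (Year, Data) rows, and turn the year into a datetime index.
    """
    subset = voter_data.loc[voter_data['Country'] == country]
    subset = subset.drop_duplicates(subset=['Year', 'Data']).set_index('Year')
    # Converting via int then str also copes with years read in as floats (e.g. 2015.0)
    subset.index = pd.to_datetime(subset.index.astype(int).astype(str), format='%Y')
    return subset

# Stand-in rows copied from the outputs above, with one duplicated election
sample = pd.DataFrame({
    'Country': ['Canada', 'Canada', 'Canada', 'Australia'],
    'Year': [2015, 2011, 2011, 2016],
    'Data': [68.28, 61.11, 61.11, 91.01],
})

canada = country_turnout(sample, 'Canada')
print(canada)
```

Each of the four cells would then reduce to a single call, for example `us_voter_data = country_turnout(voter_data, 'United States')`.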

In [22]:
# map each country to [most recent voter turnout rate, most recent (2000) Gini coefficient]
summary_dictionary = {canada_voter_data['Country'].iloc[0]:[canada_voter_data['Data'].iloc[0],
                                                            canada_income_ineq['Value'].iloc[-1]],
                      australia_voter_data['Country'].iloc[0]:[australia_voter_data['Data'].iloc[0],
                                                              australia_income_ineq['Value'].iloc[-1]],
                      us_voter_data['Country'].iloc[0]:[us_voter_data['Data'].iloc[0],
                                                        us_income_ineq['Value'].iloc[-1]],
                      uk_voter_data['Country'].iloc[0]:[uk_voter_data['Data'].iloc[0],
                                                        uk_income_ineq['Value'].iloc[-1]]
                     }

In [23]:
summary_dictionary


Out[23]:
{'Australia': [91.010000000000005, 44.219727407400001],
 'Canada': [68.280000000000001, 41.0614293446],
 'United Kingdom': [66.120000000000005, 40.465648999400003],
 'United States': [42.5, 43.862823822000003]}

In [24]:
# from the dictionary, we can create and shape a dataframe of the country data of interest
summary_df = pd.DataFrame.from_dict(summary_dictionary, orient='index')
summary_df = summary_df.reset_index()
summary_df.columns = ['Country', 'Voter Turnout Rate', 'Gini Points']
summary_df


Out[24]:
Country Voter Turnout Rate Gini Points
0 Canada 68.28 41.061429
1 United States 42.50 43.862824
2 Australia 91.01 44.219727
3 United Kingdom 66.12 40.465649
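Note that `summary_dictionary` pairs each country's most recent election (2014 to 2016) with its most recent Gini value, which is from 2000. If one wanted both measures from roughly the same period, one possible alternative is to take the election closest to 2000. This is a minimal sketch; `turnout_nearest` is a hypothetical helper, and the sample rows are copied from the Australia output above.

```python
import pandas as pd

def turnout_nearest(turnout_df, year):
    """Return the turnout ('Data') of the election closest to `year`.

    Expects a DatetimeIndex like the per-country frames above. On a tie,
    argmin picks the first row, i.e. the more recent election when the
    frame is sorted newest first.
    """
    distance = abs(turnout_df.index.year - year)
    position = distance.argmin()
    return turnout_df['Data'].iloc[position]

# Australia rows around 2000, copied from Out[19]
australia_sample = pd.DataFrame(
    {'Data': [94.32, 94.85, 94.99, 95.77]},
    index=pd.to_datetime(['2004', '2001', '1998', '1996']),
)

print(turnout_nearest(australia_sample, 2000))  # 94.85 (the 2001 election)
```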

Plotting

With the relevant data collected and shaped, we can now plot it. These visualizations help us understand the data we have gathered.

US Voting and Income Data

Here, we plot voter turnout (as reported by IDEA) against income inequality in the United States. Interestingly, political participation began to decline around 1970, just as income inequality began to rise. Of course, turnout swings sharply between presidential and midterm elections: from 2002 to 2004, for example, it rose by more than 20 percentage points (from 45.31% to 68.75%).
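The presidential/midterm cycle is easy to see by splitting the US series on whether the election year is divisible by 4. This is a minimal sketch using the 2000 to 2014 values from Out[17]; the divisible-by-4 rule is our simplifying assumption for identifying presidential years.

```python
from statistics import mean

# US turnout (%) for 2000-2014, copied from Out[17]
us_turnout = {
    2014: 42.50, 2012: 64.44, 2010: 48.59, 2008: 64.36,
    2006: 47.52, 2004: 68.75, 2002: 45.31, 2000: 63.76,
}

# Assumption: presidential elections fall in years divisible by 4
presidential = [t for y, t in us_turnout.items() if y % 4 == 0]
midterm = [t for y, t in us_turnout.items() if y % 4 != 0]

print(f"Presidential mean: {mean(presidential):.2f}%")  # 65.33%
print(f"Midterm mean:      {mean(midterm):.2f}%")       # 45.98%

# Change from the 2002 midterm to the 2004 presidential election
jump = us_turnout[2004] - us_turnout[2002]
print(f"2002 to 2004: {jump:+.2f} percentage points, "
      f"a {jump / us_turnout[2002]:.0%} relative increase")  # +23.44, 52%
```

On these numbers, presidential elections average roughly 19 points higher than midterms, which is why the turnout line in the chart below zigzags.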


In [25]:
# create traces of both the income inequality and voter turnout data for the US
us_income_trace = go.Scatter(
    x = us_income_ineq.index.to_series(),
    y = us_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

us_voter_trace = go.Scatter(
    x = us_voter_data.index.to_series(),
    y = us_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

us_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'US Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)

In [26]:
us_summary_data = [us_income_trace, us_voter_trace]

In [27]:
iplot(go.Figure(data=us_summary_data, layout=us_summary_layout))


UK Income and Voting Data

The United Kingdom shows no immediately discernible pattern. One possibility is that rising income inequality over the last few decades, together with globalization and the growing prominence of the EU, has contributed, for better or worse, to the rise in turnout during the 2000s (from 59.38% in 2001 to 65.77% in 2010), although these data alone cannot establish such a link.


In [28]:
# create traces of both the income inequality and voter turnout data for the UK
uk_income_trace = go.Scatter(
    x = uk_income_ineq.index.to_series(),
    y = uk_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

uk_voter_trace = go.Scatter(
    x = uk_voter_data.index.to_series(),
    y = uk_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

uk_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'UK Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)

In [29]:
uk_summary_data = [uk_income_trace, uk_voter_trace]

In [30]:
iplot(go.Figure(data=uk_summary_data, layout=uk_summary_layout))


Australia Income and Voting Data

Australian turnout is remarkably consistent, staying above 90% in every election shown, most likely because voting in Australia is compulsory. Still, rising income inequality coupled with slightly falling turnout (from 94.85% in 2001 to 91.01% in 2016) may signal a turnaround.


In [31]:
# create traces of both the income inequality and voter turnout data for Australia
australia_income_trace = go.Scatter(
    x = australia_income_ineq.index.to_series(),
    y = australia_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

australia_voter_trace = go.Scatter(
    x = australia_voter_data.index.to_series(),
    y = australia_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

australia_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Australia Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)

In [32]:
australia_summary_data = [australia_income_trace, australia_voter_trace]

In [33]:
iplot(go.Figure(data=australia_summary_data, layout=australia_summary_layout))


Canada Income and Voting Data

Interestingly, Canadian citizens seem to have reacted quite quickly to the sharp increase in income inequality over the 1990s: turnout declined at each election over the same decade, moving in the opposite direction to the widening income gap.
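The Canadian comparison can be made concrete with the values printed earlier. Because the CLIO series has only one value per decade, this sketch pairs the 1990 and 2000 Gini values with the 1988 and 2000 elections, the closest available turnout figures; that pairing is our own choice.

```python
# Canadian values copied from Out[13] (Gini points) and Out[20] (turnout %)
canada_gini = {1990: 31.880740, 2000: 41.061429}
canada_turnout = {1988: 75.29, 1993: 69.64, 1997: 67.00, 2000: 61.18}

gini_change = canada_gini[2000] - canada_gini[1990]
turnout_change = canada_turnout[2000] - canada_turnout[1988]
print(f"Gini, 1990 to 2000: {gini_change:+.2f} points")                   # +9.18
print(f"Turnout, 1988 to 2000: {turnout_change:+.2f} percentage points")  # -14.11

# Did turnout fall at every election in between?
years = sorted(canada_turnout)
steadily_falling = all(canada_turnout[a] > canada_turnout[b]
                       for a, b in zip(years, years[1:]))
print(steadily_falling)  # True
```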


In [34]:
# create traces of both the income inequality and voter turnout data for Canada
canada_income_trace = go.Scatter(
    x = canada_income_ineq.index.to_series(),
    y = canada_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)

canada_voter_trace = go.Scatter(
    x = canada_voter_data.index.to_series(),
    y = canada_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)

canada_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Canada Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)

In [35]:
canada_summary_data = [canada_income_trace, canada_voter_trace]

In [36]:
iplot(go.Figure(data=canada_summary_data, layout=canada_summary_layout))
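In the four charts above, Gini points and turnout percentages share one y-axis even though they measure different things. Plotly supports a second y-axis: a trace with `yaxis='y2'` is drawn against a `yaxis2` layout entry that has `overlaying='y'` and `side='right'`. Below is a minimal sketch of a hypothetical helper, `dual_axis_figure`, that builds such a figure as a plain dictionary, which `iplot` also accepts; it is our own suggestion, not part of the original analysis.

```python
def dual_axis_figure(country, income_x, income_y, voter_x, voter_y):
    """Hypothetical helper: Plotly figure dict with Gini on the left y-axis
    and voter turnout on a second y-axis on the right."""
    income_trace = {
        'type': 'scatter', 'mode': 'lines+markers', 'name': 'Gini Points',
        'x': list(income_x), 'y': list(income_y),
    }
    voter_trace = {
        'type': 'scatter', 'mode': 'lines+markers', 'name': 'Voter Turnout Rate',
        'x': list(voter_x), 'y': list(voter_y),
        'yaxis': 'y2',  # draw this trace against yaxis2
    }
    layout = {
        'width': 750, 'height': 450,
        'title': country + ' Income Inequality and Voting Patterns',
        'xaxis': {'title': 'Year'},
        'yaxis': {'title': 'Gini Points'},
        # overlay the second axis on the first and put its labels on the right
        'yaxis2': {'title': 'Voter Turnout Rate (%)', 'overlaying': 'y', 'side': 'right'},
    }
    return {'data': [income_trace, voter_trace], 'layout': layout}

# Small example with Canadian values from Out[13] and Out[20]
fig = dual_axis_figure('Canada', [1990, 2000], [31.880740, 41.061429],
                       [1988, 2000], [75.29, 61.18])
print(fig['layout']['yaxis2'])

# In the notebook one would pass the real series, e.g.
# iplot(dual_axis_figure('Canada', canada_income_ineq.index, canada_income_ineq['Value'],
#                        canada_voter_data.index, canada_voter_data['Data']))
```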


Summary

After plotting the four countries against each other, there appears to be little correlation between income inequality and voter turnout across them.
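With only four countries, a correlation coefficient says very little, but computing it makes the observation concrete. This is a minimal sketch using the values from Out[24] and a hand-written Pearson formula (standard library only).

```python
from math import sqrt

# Values from Out[24]: (voter turnout %, Gini points)
summary = {
    'Canada': (68.28, 41.061429),
    'United States': (42.50, 43.862824),
    'Australia': (91.01, 44.219727),
    'United Kingdom': (66.12, 40.465649),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

turnout = [t for t, _ in summary.values()]
gini_points = [g for _, g in summary.values()]
r = pearson(turnout, gini_points)
print(f"Pearson r = {r:.2f}")
```

On these four points r comes out close to zero (about 0.07). That is consistent with the plot, but a sample of four is far too small to support a conclusion either way.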


In [40]:
# create a trace of the data that will be used to create an informative plot later
summary_trace = go.Scatter(
    x = summary_df['Voter Turnout Rate'],
    y = summary_df['Gini Points'],
    mode = 'markers',
    text = summary_df['Country'],
    marker = dict(
        size = 14,
        color = np.arange(len(summary_df))  # one distinct color value per country
    )
)

# this layout will create meaningful formatting to help understand the data better
summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Income Inequality vs. Voter Turnout',
    xaxis = {'title':'Voter Turnout Rate'},
    yaxis = {'title':'Gini Points'}
)

In [41]:
# store the trace in an array for plotting
data = [summary_trace]

In [42]:
# finally plot the data
iplot(go.Figure(data=data, layout=summary_layout))


Bibliography/Data Source

