Class division and wealth inequality in America are at an all-time high and have increased dramatically since the 1970s. Following the most recent presidential election, one might wonder whether income and voter turnout are related.
Numerous economists have argued that “voters are virtually a carbon copy of the citizen population” (Wolfinger and Rosenstone 1980). However, more recent research is starting to question this view. It is not that the earlier researchers were wrong, but that “the preferences of voters and nonvoters are becoming increasingly divergent” (McElwee 2015). This is particularly relevant to the recent election, where many argued that, in order to win, both parties needed to know the voting demographics of the swing-state population. Many analysts broke down the population by race, gender, and education; however, we believe that income inequality plays a role too. In fact, there is strong evidence pointing to such a conclusion: in the 2012 election, 80.2 percent of people whose yearly income was above $150,000 voted, while only 46.9 percent of people whose yearly income fell below $10,000 voted (McElwee 2015). This class-voting bias was also observed in the 2008 and 2010 elections. Hence we would like to understand whether there really is a voting disparity between the rich and the poor in America and around the world.
We started with voter turnout data from the International Institute for Democracy and Electoral Assistance (IDEA), a strong favorite among researchers studying voting patterns. IDEA provides comprehensive voter turnout data by country and year. We also use Gini coefficient data from CLIO, a reputable database with worldwide data on social, economic, and institutional indicators for the past five centuries. We believe that the Gini coefficient is a reliable measure of the income distribution within a country, as it is the most commonly used measure of inequality.
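To make the Gini coefficient concrete, here is a minimal sketch of how it can be computed from a list of individual incomes. The function name `gini` and the sample incomes are assumptions made for illustration; the CLIO data used in this notebook arrives already aggregated per country, so this is not how those figures were produced.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes: 0 means perfect equality."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form of the mean-absolute-difference definition
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Made-up incomes: everyone earns the same vs. one person earns everything
equal_society = gini([40000, 40000, 40000, 40000])
unequal_society = gini([0, 0, 0, 160000])
```

The first society yields 0 and the second 0.75, which is the maximum for four people. Multiply by 100 when comparing against a series reported on a 0 to 100 scale.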
Our interest lies particularly in the effect of class-voter bias on the recent election, but there is not yet enough data for the 2016 election, so we chose to focus on Gini coefficient (income inequality) data from 2000. We believe the social and political climate back then was the closest to that of the recent election, given the dot-com bubble and the Gore vs. Bush election recount. The economic climate was also similar, as the NASDAQ index was near the same values then as it is now.
In [3]:
import sys # system module
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics module
import datetime as dt # date and time module
import numpy as np # foundation for Pandas
import seaborn as sns # fancy matplotlib graphics (seaborn.apionly is not available in current releases)
from pandas_datareader import wb, data as web # worldbank data
import quandl
import json
import time
# plotly imports
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go # ditto
import plotly # just to print version and init notebook
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
# these lines make our graphics show up in the notebook
%matplotlib inline
plotly.offline.init_notebook_mode(connected=True)
# check versions (overkill, but why not?)
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())
We keep the API keys and data URLs in a local JSON file so that this sensitive information stays out of the notebook itself.
In [4]:
# the function below loads a JSON file into a Python dictionary
def convert_json_file(filename):
    # the with block closes the file automatically, so no explicit close() is needed
    with open(filename, 'r') as full_json_file:
        json_data = json.load(full_json_file)
    return json_data
In [5]:
# translate the JSON file containing secret API keys and other info into a dictionary for use throughout the program
secrets = convert_json_file('data/keys.json')
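As a sketch of what the secrets file might contain, the example below writes a throwaway `keys.json` with the two keys this notebook reads (`QUANDL_KEY` and `VOTER_DATA`) and loads it with a check for missing entries. The helper name `load_secrets`, the placeholder values, and the example URL are assumptions for illustration only.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ('QUANDL_KEY', 'VOTER_DATA')  # the two keys this notebook reads

def load_secrets(filename, required=REQUIRED_KEYS):
    """Load a JSON secrets file and fail early if a required key is missing."""
    with open(filename, 'r') as secrets_file:
        loaded = json.load(secrets_file)
    missing = [key for key in required if key not in loaded]
    if missing:
        raise KeyError('missing keys in {}: {}'.format(filename, ', '.join(missing)))
    return loaded

# Demonstration with a temporary file; both values are placeholders, not real credentials
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'keys.json')
    with open(demo_path, 'w') as demo_file:
        json.dump({'QUANDL_KEY': 'your-api-key',
                   'VOTER_DATA': 'https://example.com/voter_turnout.csv'}, demo_file)
    demo_secrets = load_secrets(demo_path)
```

Failing early on a missing key gives a clearer error than a `KeyError` raised several cells later. The real file should also be excluded from version control.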
Getting data from Quandl was straightforward: a single call to their Python module makes an HTTP GET request to the API and returns the data as a dataframe. Working with the .xls download, on the other hand, was trickier. We converted the file to .csv and hosted it on GitHub so that the notebook does not depend on a local copy.
In [6]:
# this gets the US Income Inequality data from Quandl as a dataframe
us_income_ineq = quandl.get('CLIO/USA_II', authtoken=secrets['QUANDL_KEY'])
In [7]:
us_income_ineq
Out[7]:
In [8]:
# this gets the UK Income Inequality data as a dataframe
uk_income_ineq = quandl.get('CLIO/GBR_II', authtoken=secrets['QUANDL_KEY'])
In [9]:
uk_income_ineq
Out[9]:
In [10]:
# this gets the Australia Income Inequality data as a dataframe
australia_income_ineq = quandl.get('CLIO/AUS_II', authtoken=secrets['QUANDL_KEY'])
In [11]:
australia_income_ineq
Out[11]:
In [12]:
# this gets the Canada Income Inequality data as a dataframe
canada_income_ineq = quandl.get('CLIO/CAN_II', authtoken=secrets['QUANDL_KEY'])
In [13]:
canada_income_ineq
Out[13]:
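The four download cells differ only in the dataset code, so they could be collapsed into one loop. The sketch below assumes a hypothetical helper, `fetch_income_data`, that receives the download function as an argument. In the notebook that argument would be `quandl.get`. Here a stand-in function is used so the example runs offline.

```python
# Dataset codes as used in the cells above
INCOME_DATASETS = {
    'United States': 'CLIO/USA_II',
    'United Kingdom': 'CLIO/GBR_II',
    'Australia': 'CLIO/AUS_II',
    'Canada': 'CLIO/CAN_II',
}

def fetch_income_data(fetch, api_key, datasets=INCOME_DATASETS):
    """Call fetch(code, authtoken=api_key) once per country and collect the results."""
    return {country: fetch(code, authtoken=api_key)
            for country, code in datasets.items()}

def fake_fetch(code, authtoken):
    """Stand-in for quandl.get; it only echoes the requested code."""
    return 'frame for ' + code

income_by_country = fetch_income_data(fake_fetch, api_key='placeholder')
```

With the real module, the call would presumably be `fetch_income_data(quandl.get, secrets['QUANDL_KEY'])`, after which `income_by_country['Canada']` plays the role of `canada_income_ineq`.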
In [14]:
# read and clean up the .csv file that is linked to and store it as a dataframe
voter_data = pd.read_csv(secrets['VOTER_DATA']).dropna(axis=0)
In [15]:
# check for any rows with a blank column 'Country'
voter_data.tail(15)
Out[15]:
In [17]:
# this isolates and stores the US voter turnout data
us_voter_data = voter_data.loc[voter_data['Country'].isin(['United States'])].drop_duplicates(['Year', 'Data'])
us_voter_data = us_voter_data.set_index(['Year'])
us_voter_data.index = pd.to_datetime(us_voter_data.index, format='%Y')
us_voter_data
Out[17]:
In [18]:
# this isolates and stores the UK voter turnout data
uk_voter_data = voter_data.loc[voter_data['Country'].isin(['United Kingdom'])].drop_duplicates(['Year', 'Data'])
uk_voter_data = uk_voter_data.set_index(['Year'])
uk_voter_data.index = pd.to_datetime(uk_voter_data.index, format='%Y')
uk_voter_data
Out[18]:
In [19]:
# this isolates and stores the Australia voter turnout data
australia_voter_data = voter_data.loc[voter_data['Country'].isin(['Australia'])].drop_duplicates(['Year', 'Data'])
australia_voter_data = australia_voter_data.set_index(['Year'])
australia_voter_data.index = pd.to_datetime(australia_voter_data.index, format='%Y')
australia_voter_data
Out[19]:
In [20]:
# this isolates and stores the Canada voter turnout data
canada_voter_data = voter_data.loc[voter_data['Country'].isin(['Canada'])].drop_duplicates(['Year', 'Data'])
canada_voter_data = canada_voter_data.set_index(['Year'])
canada_voter_data.index = pd.to_datetime(canada_voter_data.index, format='%Y')
canada_voter_data
Out[20]:
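The four isolation cells above follow one pattern: filter by country, drop duplicate rows, and index by year. A minimal sketch of that pattern as a reusable function follows. The name `country_turnout` and the tiny sample frame are assumptions for illustration, and the turnout numbers are placeholders rather than IDEA values.

```python
import pandas as pd

def country_turnout(voter_data, country):
    """Return one country's turnout rows, de-duplicated and indexed by election year."""
    rows = voter_data.loc[voter_data['Country'] == country]
    rows = rows.drop_duplicates(['Year', 'Data']).set_index('Year')
    # Convert the year labels to timestamps, as the cells above do
    rows.index = pd.to_datetime(rows.index.astype(str), format='%Y')
    return rows

# Placeholder rows with the same column names as the voter data
sample = pd.DataFrame({
    'Country': ['Canada', 'Canada', 'Canada', 'Australia'],
    'Year': [2015, 2015, 2011, 2016],
    'Data': [60.0, 60.0, 65.0, 90.0],
})
canada_sample = country_turnout(sample, 'Canada')
```

The duplicated 2015 row is dropped, leaving two Canadian rows with the most recent election first, which matches the `.iloc[0]` lookup used later for the summary.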
In [22]:
# this creates a dictionary of countries along with their respective voter and income inequality data
summary_dictionary = {
    canada_voter_data['Country'].iloc[0]: [canada_voter_data['Data'].iloc[0],
                                           canada_income_ineq['Value'].iloc[-1]],
    australia_voter_data['Country'].iloc[0]: [australia_voter_data['Data'].iloc[0],
                                              australia_income_ineq['Value'].iloc[-1]],
    us_voter_data['Country'].iloc[0]: [us_voter_data['Data'].iloc[0],
                                       us_income_ineq['Value'].iloc[-1]],
    uk_voter_data['Country'].iloc[0]: [uk_voter_data['Data'].iloc[0],
                                       uk_income_ineq['Value'].iloc[-1]]
}
In [23]:
summary_dictionary
Out[23]:
In [24]:
# from the dictionary, we can create and shape a dataframe of the country data of interest
summary_df = pd.DataFrame.from_dict(summary_dictionary, orient='index')
summary_df = summary_df.reset_index()
summary_df.columns = ['Country', 'Voter Turnout Rate', 'Gini Points']
summary_df
Out[24]:
After collecting the relevant data and shaping it, we can now start plotting the information. These visualizations help us understand the data that's been gathered.
Here, we plot the percentage of eligible individuals participating in the political process alongside the distribution of wealth in the United States. Interestingly, beginning around 1970, political participation began to decline as income inequality began to increase. Of course, voter participation also changes dramatically between presidential and non-presidential elections: between 2002 and 2004, for example, the voter turnout rate rose by about 20%.
In [25]:
# create traces of both the income inequality and voter turnout data for the US
us_income_trace = go.Scatter(
    x = us_income_ineq.index.to_series(),
    y = us_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)
us_voter_trace = go.Scatter(
    x = us_voter_data.index.to_series(),
    y = us_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)
us_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'US Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)
In [26]:
us_summary_data = [us_income_trace, us_voter_trace]
In [27]:
iplot(go.Figure(data=us_summary_data, layout=us_summary_layout))
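Because US turnout swings between presidential and non-presidential elections, the long-run trend is easier to read when the two kinds of election are separated. The sketch below splits a year-to-turnout mapping using the fact that US presidential elections fall in years divisible by four. The function name and the sample rates are assumptions for illustration, and the rates are placeholders rather than IDEA values.

```python
def split_by_election_type(turnout_by_year):
    """Separate presidential years (divisible by 4) from all other election years."""
    presidential = {}
    other = {}
    for year, rate in sorted(turnout_by_year.items()):
        target = presidential if year % 4 == 0 else other
        target[year] = rate
    return presidential, other

# Illustrative placeholder rates, not values from the IDEA data
sample_turnout = {2000: 50.0, 2002: 40.0, 2004: 60.0, 2006: 40.0}
presidential, midterm = split_by_election_type(sample_turnout)
```

With the notebook's data, the input could presumably be built as `dict(zip(us_voter_data.index.year, us_voter_data['Data']))`, and each half plotted as its own trace.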
The United Kingdom shows no immediately discernible pattern. However, growing income inequality in the last few decades could be contributing to the growth in political participation in the 2000s, as globalism and the prominence of the EU prompt greater participation, for better or worse.
In [28]:
# create traces of both the income inequality and voter turnout data for the UK
uk_income_trace = go.Scatter(
    x = uk_income_ineq.index.to_series(),
    y = uk_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)
uk_voter_trace = go.Scatter(
    x = uk_voter_data.index.to_series(),
    y = uk_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)
uk_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'UK Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)
In [29]:
uk_summary_data = [uk_income_trace, uk_voter_trace]
In [30]:
iplot(go.Figure(data=uk_summary_data, layout=uk_summary_layout))
In [31]:
# create traces of both the income inequality and voter turnout data for Australia
australia_income_trace = go.Scatter(
    x = australia_income_ineq.index.to_series(),
    y = australia_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)
australia_voter_trace = go.Scatter(
    x = australia_voter_data.index.to_series(),
    y = australia_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)
australia_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Australia Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)
In [32]:
australia_summary_data = [australia_income_trace, australia_voter_trace]
In [33]:
iplot(go.Figure(data=australia_summary_data, layout=australia_summary_layout))
In [34]:
# create traces of both the income inequality and voter turnout data for Canada
canada_income_trace = go.Scatter(
    x = canada_income_ineq.index.to_series(),
    y = canada_income_ineq['Value'],
    name = 'Income Data',
    mode = 'lines+markers'
)
canada_voter_trace = go.Scatter(
    x = canada_voter_data.index.to_series(),
    y = canada_voter_data['Data'],
    name = 'Voter Data',
    mode = 'lines+markers'
)
canada_summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Canada Income Inequality and Voting Patterns',
    xaxis = {'title':'Year'},
    yaxis = {'title':'Gini Points & Political Participation Rate'}
)
In [35]:
canada_summary_data = [canada_income_trace, canada_voter_trace]
In [36]:
iplot(go.Figure(data=canada_summary_data, layout=canada_summary_layout))
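The four plotting cells above differ only in the country name and the data passed in. As a sketch, the shared structure can be captured in one function that returns a plain dictionary with `data` and `layout` entries. The name `build_country_figure` is hypothetical, the numbers in the demonstration are placeholders, and the example assumes that `iplot` accepts such a dictionary in place of a `go.Figure`.

```python
def build_country_figure(country, income_years, income_values, voter_years, voter_values):
    """Return a plain-dict figure with the same traces and layout as the cells above."""
    def line_trace(name, years, values):
        return {'type': 'scatter', 'x': list(years), 'y': list(values),
                'name': name, 'mode': 'lines+markers'}

    return {
        'data': [line_trace('Income Data', income_years, income_values),
                 line_trace('Voter Data', voter_years, voter_values)],
        'layout': {
            'width': 750,
            'height': 450,
            'title': country + ' Income Inequality and Voting Patterns',
            'xaxis': {'title': 'Year'},
            'yaxis': {'title': 'Gini Points & Political Participation Rate'},
        },
    }

# Placeholder numbers only, to show the shape of the result
demo_figure = build_country_figure('Canada', [1990, 2000], [30, 32], [1993, 2000], [70, 61])
```

In the notebook, a call such as `iplot(build_country_figure('Canada', canada_income_ineq.index, canada_income_ineq['Value'], canada_voter_data.index, canada_voter_data['Data']))` would then stand in for the three Canada cells.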
In [40]:
# create a trace of the summary data for the scatter plot below
summary_trace = go.Scatter(
    x = summary_df['Voter Turnout Rate'],
    y = summary_df['Gini Points'],
    mode = 'markers',
    text = summary_df['Country'],
    marker = dict(
        size = 14,
        # one random color value per country, rather than a fixed 700 values
        color = np.random.randn(len(summary_df))
    )
)
# this layout adds a title and axis labels to help understand the data better
summary_layout = dict(
    width = 750,
    height = 450,
    title = 'Income Inequality vs. Voter Turnout',
    xaxis = {'title':'Voter Turnout Rate'},
    yaxis = {'title':'Gini Points'}
)
In [41]:
# store the trace in a list for plotting
data = [summary_trace]
In [42]:
# finally plot the data
iplot(go.Figure(data=data, layout=summary_layout))
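One way to put a number on the scatter plot above is a Pearson correlation coefficient between the two columns. The sketch below is a plain-Python version. The function name `pearson_r` and the input values are assumptions for illustration. With only four countries, any coefficient computed this way is weak evidence at best.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError('need two sequences of equal length with at least two values')
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Made-up values where one series falls exactly as the other rises
perfectly_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

For the notebook's data, the call would presumably be `pearson_r(summary_df['Voter Turnout Rate'].tolist(), summary_df['Gini Points'].tolist())`. A value near -1 would indicate that higher turnout goes with lower inequality, and a value near 0 would indicate no linear relationship.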