Data Bootcamp UG Project

A closer look at the 2016 Presidential Election

This notebook was created by Mashal Moeen & Abhishek Dalal

Introduction

For our project we decided to study some of the demographic and economic factors on a county level basis across the United States and its correlation with the corresponding Vote Share for the Republican Presidential nominee Donald Trump during last month's general election. We start off by examining overall national trends and eventually look at the individual vote shares within the key rust belt states of Ohio, Wisconsin, Pennsylvania and Michigan, which were the states which flipped from Blue to Red in this election

In order to organize our project we will be particularly interested in identifying the correlation between a county's level of education, age breakdown, education levels and median income with the corresponding voteshare for Trump within the county. In order to run a comprehensive analysis of these effects, we first imported the data from the sources into Python.

Importing Packages


In [1]:
# import packages 
import pandas as pd                          
import matplotlib.pyplot as plt              
import matplotlib as mpl                     
import numpy as np                           
import seaborn as sns
import sys
import os
from plotly.offline import iplot, iplot_mpl  
import plotly.graph_objs as go               
import plotly                                
import cufflinks as cf                       
cf.set_config_file(offline=True, offline_show_link=False)

# IPython command, puts plots in notebook 
%matplotlib inline

# check Python version 
import datetime as dt 
import sys
print('Today is', dt.date.today())
print('What version of Python are we running? \n', sys.version, sep='')


/Users/sglyon/anaconda3/lib/python3.5/site-packages/matplotlib/__init__.py:878: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))
Today is 2016-12-22
What version of Python are we running? 
3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]

Data Sources and Description

Our primary data source for this project is the Github repository titled: 'Did-China-Cause-Trump'. The repository has compiled election data as well as data from demographic surveys from public websites, and their links have been attached in the Bibliography section. In order to run any meaningful analysis we must first import the data into Pandas.

Election Data: The first part of the importing process involves importing election data from the Github repository into the notebook. We then create a Dataframe that displays the total number of votes and Trump's Vote Count on a county level basis. Some additional code has been written to clean up the data.


In [2]:
# vote count data

url = 'https://raw.githubusercontent.com/mwaugh0328/Did-China-Cause-Trump/master/us-election-2016-results-by-county.csv'
election = pd.read_csv(url)

election.drop(election.columns[[0,2,5]], axis=1, inplace=True)              # drop superflous data
election = election.rename(columns={'CountyName': 'County'})                # rename column
election['County'] =  election['County'].astype(str) + ' ' + 'County'       # append enteries in the series

election = election[election.Candidate == 'Trump']                          # only look at Trump data
election['StateName'] = election['StateName'].str.capitalize()              # this will simplify merging
election['Geography'] = election['County'] + ", " + election['StateName']   # this will simplify merging
election = election.set_index('Geography')                                  # set index
election = election.drop(['StateName','County'], axis = 1)                  # drop superflous data
election.head()


Out[2]:
CountyTotalVote Candidate VoteCount
Geography
Alaska County, Alaska 246588 Trump 130415.0
Macon County, Alabama 8748 Trump 1394.0
Wilcox County, Alabama 6095 Trump 1737.0
Coosa County, Alabama 5223 Trump 3376.0
Blount County, Alabama 25384 Trump 22808.0

Demographic Data: We now import data based on the census surveys. We are interested in identifying education levels, age and gender break downs and median earnings within a county. Once again we create a dataframe in order to import this data into the notebook. Some additional code has been written to clean up the data. Once again this data is organized on a county level basis.


In [3]:
# demographic data

url1 = "https://raw.githubusercontent.com/mwaugh0328/Did-China-Cause-Trump/master/acs_education_fine.csv"
eco = pd.read_csv(url1,
                 skiprows = 1,                    # skip the column codes
                 na_values = ["*****",'(X)'],     # missing values
                 )
eco = eco.set_index('Geography')                  # set index
eco= eco.iloc[:,2:224:2]                          # get rid of margin of error columns
eco.head(5)


Out[3]:
Total; Estimate; Population 18 to 24 years Male; Estimate; Population 18 to 24 years Female; Estimate; Population 18 to 24 years Total; Estimate; Less than high school graduate Male; Estimate; Less than high school graduate Female; Estimate; Less than high school graduate Total; Estimate; High school graduate (includes equivalency) Male; Estimate; High school graduate (includes equivalency) Female; Estimate; High school graduate (includes equivalency) Total; Estimate; Some college or associate's degree ... Female; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - High school graduate (includes equivalency) Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Some college or associate's degree Male; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Some college or associate's degree Female; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Some college or associate's degree Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Bachelor's degree Male; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Bachelor's degree Female; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Bachelor's degree Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Graduate or professional degree Male; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Graduate or professional degree Female; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Graduate or professional degree
Geography
Autauga County, Alabama 4689 2375 2314 21.1 20.2 22.1 37.7 38.8 36.6 36.8 ... 22659 32378 41746 25674 50595 61019 39280 55806 73224 47300
Baldwin County, Alabama 14752 7544 7208 20.7 24.4 16.9 32.3 35.4 29.1 38.4 ... 19094 30206 40313 23639 46113 63092 36975 54309 74827 50388
Barbour County, Alabama 2421 1464 957 24.3 32.9 11.1 35.7 32.3 40.9 37.1 ... 18923 25463 37359 20144 41996 47808 28984 48446 37794 50163
Bibb County, Alabama 2067 1133 934 27.0 29.7 23.8 33.2 37.8 27.7 38.2 ... 16531 27366 38995 21664 31357 45417 18229 42451 40351 51597
Blount County, Alabama 4716 2446 2270 24.3 24.4 24.1 34.6 35.4 33.7 38.7 ... 22577 34954 42305 30373 47889 57917 39883 52156 52443 52072

5 rows × 111 columns

Upon further inspection of the dataframe we thought it would be wiser to rename some of the columns in the second data frame so that it would be easier to run further analysis. We also decided to parse the data frame for columns that were most important for our analysis and only retain those for the remainder of the project.


In [4]:
# rename columns

eco = eco.rename(columns={'Total; Estimate; Percent high school graduate or higher': 'Total Population (%) > HS',
                          "Total; Estimate; Percent bachelor's degree or higher" : 'Total Population (%) > BS',
                          'Total; Estimate; Population 18 to 24 years': 'Total Age 18-24',
                           'Male; Estimate; Population 18 to 24 years': 'Male Age 18-24',
                           'Female; Estimate; Population 18 to 24 years': 'Female Age 18-24',
                          'Total; Estimate; Population 25 to 34 years': 'Total Age 25-34',
                           'Male; Estimate; Population 25 to 34 years': 'Male Age 25-34',
                           'Female; Estimate; Population 25 to 34 years': 'Female Age 25-34',
                          'Total; Estimate; Population 35 to 44 years': 'Total Age 35-44',
                           'Male; Estimate; Population 35 to 44 years': 'Male Age 35-44',
                           'Female; Estimate; Population 35 to 44 years': 'Female Age 35-44',
                          'Total; Estimate; Population 45 to 64 years': 'Total Age 45-64',
                           'Male; Estimate; Population 45 to 64 years': 'Male Age 45-64',
                          'Female; Estimate; Population 45 to 64 years': 'Female Age 45-64',
                          'Total; Estimate; Population 65 years and over':'Total Age 65+',
                           'Male; Estimate; Population 65 years and over': 'Male Age 65+',
                          'Female; Estimate; Population 65 years and over': 'Female Age 65+',
                          'Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings': 'Median Earnings Age >25',
                          "Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Less than high school graduate":'Median Earnings Age >25, No HS',
                          "Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - High school graduate (includes equivalency)": 'Median Earnings Age >25, HS',
                          "Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Some college or associate's degree":'Median Earnings Age >25, Some BS',
                          "Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Bachelor's degree":'Median Earnings Age >25, BS',
                          "Total; Estimate; MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2014 INFLATION-ADJUSTED DOLLARS) - Population 25 years and over with earnings - Graduate or professional degree":'Median Earnings Age >25, Graduate'
                         })

In [35]:
# keep columns of interest

eco = eco[['Total Population (%) > HS',
           'Total Population (%) > BS',
           'Total Age 18-24',
           'Male Age 18-24',
           'Female Age 18-24',
           'Total Age 25-34',
           'Male Age 25-34',
           'Female Age 25-34',
           'Total Age 35-44',
           'Male Age 35-44',
           'Female Age 35-44',
           'Total Age 45-64',
           'Male Age 45-64',
           'Female Age 45-64',
           'Total Age 65+',
           'Male Age 65+',
           'Female Age 65+',
           'Median Earnings Age >25',
           'Median Earnings Age >25, No HS',
           'Median Earnings Age >25, HS',
           'Median Earnings Age >25, Some BS',
           'Median Earnings Age >25, BS',
           'Median Earnings Age >25, Graduate'
          ]]
eco.shape


Out[35]:
(3142, 23)

The next step involved combining the two dataframes so that we could easily draw graphics displaying salient trends and drawing important conclusions about correlations between demographic data and voting patterns. Additionally, we decided to create further columns to standardize variables across counties by taking percentages.


In [6]:
# combine data frames and create new columns

df = pd.concat([election, eco],                        # merge both data frames
              axis = 1,                                # merge on columns 
             join_axes = [election.index]              # take the row labels from election data frame
             )

# create % columns for further analysis 
df['VoteShare'] = (df['VoteCount'] / df ['CountyTotalVote'])*100
df['Total Population'] = df['Total Age 18-24'] + df['Total Age 25-34'] + df['Total Age 35-44'] + df['Total Age 45-64'] + df['Total Age 65+']
df['Population % 18-24'] = (df['Total Age 18-24'] / df['Total Population'])*100
df['Population % 25-34'] = (df['Total Age 25-34'] / df['Total Population'])*100
df['Population % 35-44'] = (df['Total Age 35-44'] / df['Total Population'])*100
df['Population % 45-64'] = (df['Total Age 45-64'] / df['Total Population'])*100
df['Population % 65+'] = (df['Total Age 65+'] / df['Total Population'])*100
df['% Male 18-24'] = (df['Male Age 18-24'] / df['Total Population'])*100
df['% Female 18-24'] = (df['Female Age 18-24'] / df['Total Population'])*100
df['% Male 25-34'] = (df['Male Age 25-34'] / df['Total Population'])*100
df['% Female 25-34'] = (df['Female Age 25-34'] / df['Total Population'])*100
df['% Male 35-44'] = (df['Male Age 35-44'] / df['Total Population'])*100
df['% Female 35-44'] = (df['Female Age 35-44'] / df['Total Population'])*100
df['% Male 45-64'] = (df['Male Age 45-64'] / df['Total Population'])*100
df['% Female 45-64'] = (df['Female Age 45-64'] / df['Total Population'])*100
df['% Male 65+'] = (df['Male Age 65+'] / df['Total Population'])*100
df['% Female 65+'] = (df['Male Age 65+'] / df['Total Population'])*100
df['Total Male'] = df['Male Age 18-24'] + df['Male Age 25-34'] + df['Male Age 35-44'] + df['Male Age 45-64'] + df['Male Age 65+']
df['% Male'] = (df['Total Male'] / df['Total Population'])*100
df['Total Female'] = df['Female Age 18-24'] + df['Female Age 25-34'] + df['Female Age 35-44'] + df['Female Age 45-64'] + df['Female Age 65+']
df['% Female'] = (df['Total Female'] / df['Total Population'])*100
df = df.dropna()

df.head(5)


Out[6]:
CountyTotalVote Candidate VoteCount Total Age 18-24 Male Age 18-24 Female Age 18-24 Total; Estimate; Less than high school graduate Male; Estimate; Less than high school graduate Female; Estimate; Less than high school graduate Total; Estimate; High school graduate (includes equivalency) ... % Male 35-44 % Female 35-44 % Male 45-64 % Female 45-64 % Male 65+ % Female 65+ Total Male % Male Total Female % Female
Geography
Macon County, Alabama 8748 Trump 1394.0 3970.0 1739.0 2231.0 10.0 15.2 6 20.7 ... 5.724109 6.622836 14.988841 17.419627 7.847277 7.847277 7425.0 44.785572 9154.0 55.214428
Wilcox County, Alabama 6095 Trump 1737.0 967.0 515.0 452.0 23.7 24.1 23.2 48.9 ... 6.427132 8.526029 17.585675 19.340685 9.024072 9.024072 3864.0 45.819993 4569.0 54.180007
Coosa County, Alabama 5223 Trump 3376.0 890.0 493.0 397.0 19.1 25.2 11.6 25.8 ... 8.465086 7.575758 20.575318 19.224857 9.574001 9.574001 4609.0 50.603865 4499.0 49.396135
Blount County, Alabama 25384 Trump 22808.0 4716.0 2446.0 2270.0 24.3 24.4 24.1 34.6 ... 8.609029 8.657010 17.604186 18.008591 9.335588 9.335588 21425.0 48.951289 22343.0 51.048711
Winston County, Alabama 10255 Trump 9225.0 1781.0 912.0 869.0 22.9 22.7 23 36.9 ... 7.762820 7.846629 18.458960 19.208004 10.575664 10.575664 9227.0 48.331675 9864.0 51.668325

5 rows × 135 columns

General Nationwide Trends:

First we look at the countrywide correlationship between demographic patterns within counties and the corresponding Vote Share for Trump.

Education and Voting Patterns

We plot two scatter diagrams comparing the correlationship between the percentage of High School graduates within a county and the corresponding Vote Share for Trump. In the second case we plot the percentage of Bachelors Degree or higher holders and the corresponding Vote Share for Trump.


In [7]:
plt.style.use("classic")

fig, ax = plt.subplots(2)

df.plot(x="Total Population (%) > HS", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
df.plot("Total Population (%) > BS", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="#F52887")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis titles
ax[0].set_title("Trump Vote Share: Population with High School Degrees or higher")
ax[1].set_title("Trump Vote Share: Population with Bachelor's Degrees or higher")
figure_title = "National Education Levels and Voting Patterns"

#set figure title
plt.text(0.5, 1.18,                                # location of title
         figure_title,                             # title
         horizontalalignment='center',             # title alignment
         fontsize=16,
         transform = ax[0].transAxes               # title placement
        )

fig.tight_layout()


Comparing the two scatter graphs above, we can infer some of the general voting patterns nationwide. While the first plot shows a more clustered data set the second plot shows a declining pattern between the total county population that has acquired a bachelors degree the corresponding vote share for Trump. Voters with just a high school diploma were more likely to vote for Trump and counties with a less educated population generally supported Trump.

Age and Voting Patterns

We now plot two scatter diagrams comparing the Vote Share for Trump with the percentage of county population between 18-24 and over 65.


In [8]:
fig, ax = plt.subplots(2)

df.plot(x="Population % 18-24", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
df.plot(x="Population % 65+", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Age and Support for Trump", fontsize = 16)


fig.tight_layout()


Both scatter plots display a significant amount of clustering. Although there is a discernible negative trend between the percentage of people in a county aged 18-24 and the corresponding Trump Vote Share. On the other hand there is a positive trend between the percentage of a county aged over 65 and the corresponding voteshare for Trump.

Gender and Voting Patterns

Next we compare the correlationship between the percentage of Males and then Females within a county and the corresponding Vote Share for Trump.


In [9]:
fig, ax = plt.subplots(2)

df.plot(x="% Male", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
df.plot(x="% Female", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Gender and Trump", fontsize = 16)


fig.tight_layout()


The two scatters do not provide much information about the relationship between gender and support for Trump. Interestingly enough, there appears to be a lot more counties with a Male population of greater than 50% than counties with Females comprising more than 50%. Similarly, there are more counties with fewer than 30% Females than counties with fewer than 30% Males.

Income and Voting Patterns

Finally we draw a regression plot of the county level Median Income on the corresponding Voteshare for Trump.


In [10]:
# regression plot showing relation between income and suport for Trump

sns.regplot(x="Median Earnings Age >25", y="VoteShare", data=df)


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x11c35a588>

The preliminary reading of the graph above clearly shows that as the Median Earnings in a county increase, Trump's Vote Share within the county decreases.

Nationwide data shows an ambiguous relationship between support for Trump and our chosen estimators. It is probably more fruitful to look at trends within the four specific rust-belt states that swung from Blue to Red from 2012: Michigan, Pennsylvania, Ohio and Wisconsin.

Swing States


In [11]:
dfmi = df[df.index.str.contains("Michigan")==True]           # Michigan
dfpa = df[df.index.str.contains("Pennsylvania")==True]       # Pennsylvania
dfwi = df[df.index.str.contains('Wisconsin')==True]          # Wisconsin
dfoh = df[df.index.str.contains("Ohio")==True]               # Ohio

We will now repeat the analysis we conducted for the overall Nationwide trend for all the four states mentioned above and note our observations.

Michigan

We are going to study the county wide support for Trump in Michigan first.

Michigan: Education and Support for Trump


In [12]:
fig, ax = plt.subplots(2)

dfmi.plot(x="Total Population (%) > HS", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfmi.plot(x="Total Population (%) > BS", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Education and Support for Trump", fontsize = 16)


fig.tight_layout()


As you can interpret from the graphs, most Michigan counties have a higher proportion of population with High School diplomas; a demographic group more likely to vote for Trump. Population with Bachelors or higher degrees form a smaller percentage and even though this demographic shows a negative trend and were less likely to vote for Trump, their relative size is dominated by the less educated demographic helping explain the election outcome in Michigan.

Michigan: Age and Support for Trump


In [13]:
fig, ax = plt.subplots(2)

dfmi.plot(x="Population % 18-24", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfmi.plot(x="Population % 65+", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Age and Support for Trump", fontsize = 16)


fig.tight_layout()


As can be seen from the plot above, counties with a greater proportion of younger population are less likely to vote for Trump. However, counties with an older population base are more likely to vote for Trump. There is a visible downward trend in the first figure implying that counties with younger voters generally did not support Trump, hoowever, this relationship is weakened (more variable) as the percentage of young voters increases. On the other hand, counties with older voters favored Trump and demonstrated a strong positive relation between percentage of old voters and support for Trump. The two effects combined are sufficient to explain the election results.

Michigan: Gender and Support for Trump


In [14]:
m = sns.regplot("% Male", "VoteShare", data=dfmi, scatter = False, label = "Males")
f = sns.regplot("% Female", "VoteShare", data=dfmi, scatter = False, label = "Females")
plt.legend(loc='upper right')
plt.xlabel('Percentage of Population by Gender')
plt.ylabel('Trump Vote Share')


Out[14]:
<matplotlib.text.Text at 0x11dba8d68>

We see a striking contrast between voting patterns in Michigan counties as a greater percentage of population tends to either male or female. Counties with more than 50% men tended to be more supportive of Trump, whereas when the percentage of women in a county increased, support for Trump fell considerably. We may infer from the lection results that most Michigan counties have more male citizens compared to females.

Michigan: Income and Support for Trump


In [15]:
g = sns.regplot("Median Earnings Age >25", "VoteShare", data=dfmi)


Michigan follows the trend seen across the US and there appears to be a strong negative relationship between Median Eearnings Age >25 and the corresponding county's support for Trump in Michigan.

Ohio

Ohio: Education and Support for Trump


In [16]:
fig, ax = plt.subplots(2)

dfoh.plot(x="Total Population (%) > HS", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfoh.plot(x="Total Population (%) > BS", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Education and Support for Trump", fontsize = 16)


fig.tight_layout()


The scatter on top indicates a high amount of clustering thus implying that there was generally strong support for Trump amongst counties with a high percentage of High School Graduates. The scatter below shows a discernible but weak negative trend, wherein the county was less likely to support Trump as the percentage of people with Bachelors degrees increased. However, the negative relationship isn't strong enough to dominate the high school educatedd demographic group which forms the majoroity of Ohio population and is hence responsible for turning the state red this election day.

Ohio: Age and Support For Trump


In [17]:
fig, ax = plt.subplots(2)

dfoh.plot(x="Population % 18-24", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfoh.plot(x="Population % 65+", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Age and Support for Trump", fontsize = 16)


fig.tight_layout()


As can be seen from the plot above, there is a negative but weak trend between support for Trump and the percentage of younger people within the county. The older demographic, on the other hand, reveals a strong psoitive relationship helping explain the flip from blue to red.

Ohio : Gender and Support for Trump


In [18]:
m = sns.regplot("% Male", "VoteShare", data=dfmi, scatter = False, label = "Male")
f = sns.regplot("% Female", "VoteShare", data=dfmi, scatter = False, label = "Female")
plt.legend(loc='upper right')
plt.xlabel('Percentage of Population by Gender')
plt.ylabel('Trump Vote Share')


Out[18]:
<matplotlib.text.Text at 0x11c2e69e8>

We see the same contrast between voting patterns in Ohio counties as a greater percentage of population is either male or female. Counties with more than 50% men tended to be more supportive of Trump, whereas when the percentage of women in a county increased, support for Trump fell considerably. It must be noted that there were more counties with more than a 50% male population which may help explain why Trump won the state of Ohio.

Ohio : Income and Support for Trump


In [19]:
g = sns.regplot("Median Earnings Age >25", "VoteShare", data=dfoh)


There is a strong negative corelationship between Median Earnings and Support for Trump, as observed in other states and the nationwide vote.

Pennslyvania

Pennsylvania: Education and Support For Trump


In [20]:
fig, ax = plt.subplots(2)

dfpa.plot(x="Total Population (%) > HS", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfpa.plot(x="Total Population (%) > BS", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Education and Support for Trump", fontsize = 16)


fig.tight_layout()


We can observe that the scatter above is clustered, indicating a high level support for Trump within counties that have a high population of High School graduates, with a little more variance. The scatter below shows a clear negative relationship, support for Trump falls as the percentage of people with college degrees increases. But since the high school educated population far exceeds higher educated population, we can clearly see why Trump won the state of Pennsylvania, a trend we have observed in other states as well.

Pennsylvania: Age and Support for Trump


In [21]:
fig, ax = plt.subplots(2)

dfpa.plot(x="Population % 18-24", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfpa.plot(x="Population % 65+", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Age and Support for Trump", fontsize = 16)


fig.tight_layout()


Young voters in Pennsylvania demonstrate a weakly negative relationship with support for Trump while older voters show a strong positive relation with support for Trump. older voters were vehement in showing their support for Trump but young voters in Pennsylvania do not seem to have showcased a similar interest in the lecetion. This may help explain why the blue state turned red this election day.

Pennsylvania: Gender and Support for Trump


In [22]:
g = sns.regplot("% Male", "VoteShare", data=dfpa, scatter = False, label = "Male")
b = sns.regplot("% Female", "VoteShare", data=dfpa, scatter = False, label = "Female")
plt.legend(loc='upper right')
plt.xlabel('Percentage of Population by Gender')
plt.ylabel('Trump Vote Share')


Out[22]:
<matplotlib.text.Text at 0x11c2bd6d8>

The gender divide is less pronounced in Pennsylvania, however the general trend holds true, and there is an increase in support as Male population increases, whereas the opposite occurs when the female population increases. We can see from the shaded area that the female vote was a lot more variable in Pennsylvania and edged towards supporting Trump.

Pennsylvania: Income and Support for Trump


In [23]:
g = sns.regplot("Median Earnings Age >25", "VoteShare", data=dfpa)


There is a strong negative relationship between the Median Earnings of a county and general support for Trump.

Wisconsin

Education and Support For Trump


In [24]:
fig, ax = plt.subplots(2)

dfwi.plot(x="Total Population (%) > HS", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfwi.plot(x="Total Population (%) > BS", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Education and Support for Trump", fontsize = 16)


fig.tight_layout()


We can see that the scatter above is clustered again, indicating a high level support for Trump within counties that have a high population of High School graduates, with a little more variance. The scatter below shows a weak negative relationship, support for Trump falls as the percentage of people with college degrees increases. This implies that while less educated people strongly supported Trump, the other segment of Wisconsin population wasn't vehement in showing their disapproval of the candidate.

Wisonsin: Age and Support for Trump


In [25]:
fig, ax = plt.subplots(2)

dfwi.plot(x="Population % 18-24", y="VoteShare", ax=ax[0], kind="scatter",
                                 color="blue")
dfwi.plot(x="Population % 65+", y="VoteShare", ax=ax[1], kind="scatter",
                                   color="red")

# set axis limits
ax[0].set_xlim(0, 100)
ax[1].set_xlim(0, 100)
ax[0].set_ylim(0, 100)
ax[1].set_ylim(0, 100)

# set axis title
ax[0].set_title("Age and Support for Trump", fontsize = 16)


fig.tight_layout()


As the population percentage of 18-24 year olds increases there is a slight decline in support for Trump. The opposite holds true for the older demographics with a slightly stronger increase in support as their proportion increases. Putting the two together, we can see why the state voted red in this election.

Wisonsin: Gender and Support for Trump


In [26]:
g = sns.regplot("% Male", "VoteShare", data=dfwi, scatter = False, label = "Male")
b = sns.regplot("% Female", "VoteShare", data=dfwi, scatter = False, label = "Female")
plt.legend(loc='upper right')
plt.xlabel('Percentage of Population by Gender')
plt.ylabel('Trump Vote Share')


Out[26]:
<matplotlib.text.Text at 0x11e76d208>

We see the same contrast between voting patterns in Wisconsin counties as a greater percentage of population is either male or female. Counties with more than 50% men tended to be more supportive of Trump, whereas the percentage of women in a county increased support for Trump fell considerably.

Wisonsin: Median Earnings and Support For Trump


In [27]:
g = sns.regplot("Median Earnings Age >25", "VoteShare", data=dfwi)


There is a negative relationship between Median Earnings and Support for Trump. However the trend is relatively flat compared to the other states, and there is a high amount of variance from the trendline, helping explain the change from blue to red.

Conclusion

While observing the general nationwide trend for factors that correlated with high support for Trump, it was hard to identify any discernible pattern for the Education, Age and Gender variables. However there was a negative trend between Median Earnings and Support for Trump. To explore the effects of these factors further, we decided to drill down on the four key rust-belt states that flipped from Blue to Red: Pennsylvania, Ohio, Wisconsin and Michigan. As observed from those datas there were indeed distinct demographic and economic factors within these states that correlated with high support for Trump. It was interesting to note that Median Earnings tended to have a negative relationship with support for Trump across all these states. The gender divide was also clearly observable in all states. The level of education within a county also tended to correlate with support for Trump with an observable negative relationship between the percentage of the county with a college degree and the corresponding voteshare for Trump. The correlation of age and support for Trump is also visible across all states, with a clear divide between the percentage of Younger and Older voters within a county and the corresponding support for Trump.

Bibliography