notebook.community

Edit and run



In [1]:

    
import pandas as pd

Identify your problem statement, find all your datasets, identify the questions you want to answer, reach out to polling/consulting firms to work with.

Potential question--Why did these counties flip to Trump?

Explore your data to understand it--drop data that is not relevant

Look to predict something (next presidential election outcome).

Think about what would happen if more people became UNINSURED and the result that could have.

Should slcie by margin of county flip. First-fourth quartiles

Look at population counts per county.

Margin of victory/voting which way (Trump/Clinton) is more important to predict than simply whcih flipped (make that a subset)

A listing of the specific counties that flipped: http://www.npr.org/2016/11/15/502032052/lots-of-people-voted-for-obama-and-trump-heres-where-in-3-charts

Nate Silver postulates that education level is a key predctor. http://fivethirtyeight.com/features/education-not-income-predicted-who-would-vote-for-trump/?ex_cid=story-twitter

Daily Kos article: http://www.dailykos.com/story/2017/1/30/1627319/-Daily-Kos-Elections-presents-the-2016-presidential-election-results-by-congressional-district

Diversity Index scource: https://www.kaggle.com/mikejohnsonjr/us-counties-diversity-index



In [2]:

    
election = pd.read_csv('2016_election.csv')



In [3]:

    
prev_election = pd.read_csv('2012_election.csv')



In [4]:

    
div = pd.read_csv('diversityindex.csv')



In [5]:

    
edu = pd.read_excel('education_25_older_filt.xls')

Change in education the past 10 years--find the difference between them for each county



In [6]:

    
pop = pd.read_excel('us county populations.xls')



In [56]:

    
ue_rates = pd.read_excel('Unemployment Rates.xlsx')
ue_rates = ue_rates.drop(ue_rates[[0,1,2,4,5]],axis=1)
ue_rates = ue_rates.rename(columns={'Unnamed: 3':'county_state','Unnamed: 6':'labor_force', 'Unnamed: 7':'employed','Unnamed: 8':'unemployed','Unnamed: 9':'ue_rate'})
ue_rates = ue_rates.drop(ue_rates.index[[0,1,2,3,4]])



In [7]:

    
len(edu)









    Out[7]:





3283



In [8]:

    
len(pop)









    Out[8]:





3145



In [9]:

    
pop.dtypes









    Out[9]:





state              object
county             object
est_pop_2015        int64
pop_change_2015     int64
int_mig_2015        int64
dom_mig_2015        int64
mig_2015            int64
dtype: object



In [10]:

    
div.head()









    Out[10]:






  
    
      
      Location
      Diversity-Index
      Black or African American alone, percent, 2013
      American Indian and Alaska Native alone, percent, 2013
      Asian alone, percent, 2013
      Native Hawaiian and Other Pacific Islander alone, percent,
      Two or More Races, percent, 2013
      Hispanic or Latino, percent, 2013
      White alone, not Hispanic or Latino, percent, 2013
    
  
  
    
      0
      Aleutians West Census Area, AK
      0.769346
      7.4
      13.8
      31.1
      2.3
      4.8
      14.6
      29.2
    
    
      1
      Queens County, NY
      0.742224
      20.9
      1.3
      25.2
      0.2
      2.7
      28.0
      26.7
    
    
      2
      Maui County, HI
      0.740757
      0.8
      0.6
      28.8
      10.6
      23.3
      10.7
      31.5
    
    
      3
      Alameda County, CA
      0.740399
      12.4
      1.2
      28.2
      1.0
      5.2
      22.7
      33.2
    
    
      4
      Aleutians East Borough, AK
      0.738867
      7.7
      21.8
      41.4
      0.7
      3.7
      13.5
      12.9



In [11]:

    
div = div.rename(columns={'Location':'county_state','Diversity-Index':'div_index','Black or African American alone, percent, 2013':'af_am','American Indian and Alaska Native alone, percent, 2013':'native_2013','Asian alone, percent, 2013':'asian_am','Native Hawaiian and Other Pacific Islander alone, percent,':'pac_am','Two or More Races, percent, 2013':'two_or_more_races','Hispanic or Latino, percent, 2013':'hisp_lat_am','White alone, not Hispanic or Latino, percent, 2013':'white_am'})



In [12]:

    
div.head()









    Out[12]:






  
    
      
      county_state
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      hisp_lat_am
      white_am
    
  
  
    
      0
      Aleutians West Census Area, AK
      0.769346
      7.4
      13.8
      31.1
      2.3
      4.8
      14.6
      29.2
    
    
      1
      Queens County, NY
      0.742224
      20.9
      1.3
      25.2
      0.2
      2.7
      28.0
      26.7
    
    
      2
      Maui County, HI
      0.740757
      0.8
      0.6
      28.8
      10.6
      23.3
      10.7
      31.5
    
    
      3
      Alameda County, CA
      0.740399
      12.4
      1.2
      28.2
      1.0
      5.2
      22.7
      33.2
    
    
      4
      Aleutians East Borough, AK
      0.738867
      7.7
      21.8
      41.4
      0.7
      3.7
      13.5
      12.9



In [13]:

    
len(div)









    Out[13]:





3195



In [14]:

    
election.head()









    Out[14]:






  
    
      
      Unnamed: 0
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      diff
      per_point_diff
      state_abbr
      county_name
      combined_fips
    
  
  
    
      0
      0
      93003.0
      130413.0
      246588.0
      0.377159
      0.52887
      37,410
      15.17%
      AK
      Alaska
      2013
    
    
      1
      1
      93003.0
      130413.0
      246588.0
      0.377159
      0.52887
      37,410
      15.17%
      AK
      Alaska
      2016
    
    
      2
      2
      93003.0
      130413.0
      246588.0
      0.377159
      0.52887
      37,410
      15.17%
      AK
      Alaska
      2020
    
    
      3
      3
      93003.0
      130413.0
      246588.0
      0.377159
      0.52887
      37,410
      15.17%
      AK
      Alaska
      2050
    
    
      4
      4
      93003.0
      130413.0
      246588.0
      0.377159
      0.52887
      37,410
      15.17%
      AK
      Alaska
      2060



In [15]:

    
election.county_name.count()









    Out[15]:





3141



In [16]:

    
#Need to drop Alaska as it doesn't have any county names
election = election[election.county_name!='Alaska']
pop = pop[pop.county!='Alaska']



In [17]:

    
election.head()









    Out[17]:






  
    
      
      Unnamed: 0
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      diff
      per_point_diff
      state_abbr
      county_name
      combined_fips
    
  
  
    
      29
      29
      5908.0
      18110.0
      24661.0
      0.239569
      0.734358
      12,202
      49.48%
      AL
      Autauga County
      1001
    
    
      30
      30
      18409.0
      72780.0
      94090.0
      0.195653
      0.773515
      54,371
      57.79%
      AL
      Baldwin County
      1003
    
    
      31
      31
      4848.0
      5431.0
      10390.0
      0.466603
      0.522714
      583
      5.61%
      AL
      Barbour County
      1005
    
    
      32
      32
      1874.0
      6733.0
      8748.0
      0.214220
      0.769662
      4,859
      55.54%
      AL
      Bibb County
      1007
    
    
      33
      33
      2150.0
      22808.0
      25384.0
      0.084699
      0.898519
      20,658
      81.38%
      AL
      Blount County
      1009



In [18]:

    
election = election.drop(election[[0,10]], axis=1)



In [19]:

    
election['county_state'] = election['county_name'] + ', ' + election['state_abbr']



In [20]:

    
prev_election['county_state'] = prev_election['county_name'] + ', ' + prev_election['state_abbr']



In [21]:

    
pop.head()









    Out[21]:






  
    
      
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
    
  
  
    
      0
      AL
      Alabama
      4858979
      12568
      5726
      -2268
      3458
    
    
      1
      AL
      Autauga County
      55347
      57
      19
      -140
      -121
    
    
      2
      AL
      Baldwin County
      203709
      3996
      221
      3469
      3690
    
    
      3
      AL
      Barbour County
      26489
      -326
      0
      -281
      -281
    
    
      4
      AL
      Bibb County
      22583
      34
      21
      4
      25



In [22]:

    
pop['county_state'] = pop['county'] + ', ' + pop['state']



In [23]:

    
election.head()









    Out[23]:






  
    
      
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      diff
      per_point_diff
      state_abbr
      county_name
      county_state
    
  
  
    
      29
      5908.0
      18110.0
      24661.0
      0.239569
      0.734358
      12,202
      49.48%
      AL
      Autauga County
      Autauga County, AL
    
    
      30
      18409.0
      72780.0
      94090.0
      0.195653
      0.773515
      54,371
      57.79%
      AL
      Baldwin County
      Baldwin County, AL
    
    
      31
      4848.0
      5431.0
      10390.0
      0.466603
      0.522714
      583
      5.61%
      AL
      Barbour County
      Barbour County, AL
    
    
      32
      1874.0
      6733.0
      8748.0
      0.214220
      0.769662
      4,859
      55.54%
      AL
      Bibb County
      Bibb County, AL
    
    
      33
      2150.0
      22808.0
      25384.0
      0.084699
      0.898519
      20,658
      81.38%
      AL
      Blount County
      Blount County, AL



In [24]:

    
edu.head()









    Out[24]:






  
    
      
      FIPS Code
      State
      Area name
      less_hs_diploma_2000
      hs_diploma_only_2000
      less_4_years_2000
      four_or_ higher_2000
      per_less_high_school diploma_2000
      per_hs_diploma_only_2000
      per_less_4_years_2000
      per_four_or_ higher_2000
      less_high_school_diploma_2011_15
      hs_diploma_only_2011_15
      less_4_years_2011_15
      four_or_ higher_2011_15
      per_less_high_school_diploma_2011_15
      per_hs_diploma_only_2011_15
      per_less_4_years_2011_15
      per_four_or_higher_2011_15
    
  
  
    
      0
      0
      US
      United States
      35715625.0
      52168981.0
      49864428.0
      44462605.0
      19.6
      28.6
      27.4
      24.4
      28229094.0
      58722528.0
      61558628.0
      62952272.0
      13.3
      27.8
      29.1
      29.8
    
    
      1
      1000
      AL
      Alabama
      714081.0
      877216.0
      746495.0
      549608.0
      24.7
      30.4
      25.9
      19.0
      509891.0
      1005295.0
      962515.0
      761650.0
      15.7
      31.0
      29.7
      23.5
    
    
      2
      1001
      AL
      Autauga County
      5872.0
      9332.0
      7413.0
      4972.0
      21.3
      33.8
      26.9
      18.0
      4656.0
      12182.0
      11044.0
      8437.0
      12.8
      33.5
      30.4
      23.2
    
    
      3
      1003
      AL
      Baldwin County
      17258.0
      28428.0
      28178.0
      22146.0
      18.0
      29.6
      29.3
      23.1
      14360.0
      39431.0
      43500.0
      39710.0
      10.5
      28.8
      31.8
      29.0
    
    
      4
      1005
      AL
      Barbour County
      6679.0
      6124.0
      4025.0
      2068.0
      35.3
      32.4
      21.3
      10.9
      5021.0
      6490.0
      4943.0
      2354.0
      26.7
      34.5
      26.3
      12.5



In [25]:

    
pop.head()









    Out[25]:






  
    
      
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      county_state
    
  
  
    
      0
      AL
      Alabama
      4858979
      12568
      5726
      -2268
      3458
      Alabama, AL
    
    
      1
      AL
      Autauga County
      55347
      57
      19
      -140
      -121
      Autauga County, AL
    
    
      2
      AL
      Baldwin County
      203709
      3996
      221
      3469
      3690
      Baldwin County, AL
    
    
      3
      AL
      Barbour County
      26489
      -326
      0
      -281
      -281
      Barbour County, AL
    
    
      4
      AL
      Bibb County
      22583
      34
      21
      4
      25
      Bibb County, AL



In [26]:

    
edu['county_state'] = edu['Area name'] + ', ' + edu['State']



In [27]:

    
edu.dtypes









    Out[27]:





FIPS Code                                 int64
State                                    object
Area name                                object
less_hs_diploma_2000                    float64
hs_diploma_only_2000                    float64
less_4_years_2000                       float64
four_or_ higher_2000                    float64
per_less_high_school diploma_2000       float64
per_hs_diploma_only_2000                float64
per_less_4_years_2000                   float64
per_four_or_ higher_2000                float64
less_high_school_diploma_2011_15        float64
hs_diploma_only_2011_15                 float64
less_4_years_2011_15                    float64
four_or_ higher_2011_15                 float64
per_less_high_school_diploma_2011_15    float64
per_hs_diploma_only_2011_15             float64
per_less_4_years_2011_15                float64
per_four_or_higher_2011_15              float64
county_state                             object
dtype: object



In [28]:

    
edu.isnull().sum()









    Out[28]:





FIPS Code                                0
State                                    0
Area name                                0
less_hs_diploma_2000                    11
hs_diploma_only_2000                    11
less_4_years_2000                       11
four_or_ higher_2000                    11
per_less_high_school diploma_2000       11
per_hs_diploma_only_2000                11
per_less_4_years_2000                   11
per_four_or_ higher_2000                11
less_high_school_diploma_2011_15        10
hs_diploma_only_2011_15                 10
less_4_years_2011_15                    10
four_or_ higher_2011_15                 10
per_less_high_school_diploma_2011_15    10
per_hs_diploma_only_2011_15             10
per_less_4_years_2011_15                10
per_four_or_higher_2011_15              10
county_state                             0
dtype: int64



In [29]:

    
edu = edu.dropna()



In [30]:

    
edu.isnull().sum()









    Out[30]:





FIPS Code                               0
State                                   0
Area name                               0
less_hs_diploma_2000                    0
hs_diploma_only_2000                    0
less_4_years_2000                       0
four_or_ higher_2000                    0
per_less_high_school diploma_2000       0
per_hs_diploma_only_2000                0
per_less_4_years_2000                   0
per_four_or_ higher_2000                0
less_high_school_diploma_2011_15        0
hs_diploma_only_2011_15                 0
less_4_years_2011_15                    0
four_or_ higher_2011_15                 0
per_less_high_school_diploma_2011_15    0
per_hs_diploma_only_2011_15             0
per_less_4_years_2011_15                0
per_four_or_higher_2011_15              0
county_state                            0
dtype: int64



In [31]:

    
len(edu)









    Out[31]:





3267



In [32]:

    
import seaborn as sns
import matplotlib.pyplot as plt
ax = sns.distplot(edu.per_less_high_school_diploma_2011_15, kde=False)
ax.set(xlabel='Percentage per county with less than a High School Diploma, 2011-2015', ylabel='Count')
ax.set_title('Education Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()









    



/Applications/anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:1288: UserWarning: findfont: Font family [u'Ubuntu'] not found. Falling back to Bitstream Vera Sans
  (prop.get_family(), self.defaultFamily[fontext]))



In [33]:

    
ax = sns.distplot(edu.per_hs_diploma_only_2011_15, kde=False)
ax.set(xlabel='Percentage per county with only High School Diploma, 2011-2015', ylabel='Count')
ax.set_title('Education Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()



In [34]:

    
ax = sns.distplot(edu.per_less_4_years_2011_15, kde=False)
ax.set(xlabel='Percentage per county with less than four years of college, 2011-2015', ylabel='Count')
ax.set_title('Education Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()



In [35]:

    
ax = sns.distplot(edu.per_four_or_higher_2011_15, kde=False)
ax.set(xlabel='Percentage per county with four or more years of college, 2011-2015', ylabel='Count')
ax.set_title('Education Across All US Counties', fontsize=16, fontname='Ubuntu')
plt.show()



In [36]:

    
election['per_dem'] = election['per_dem'].apply(lambda x: x*100)
election['per_gop'] = election['per_gop'].apply(lambda x: x*100)



In [37]:

    
prev_election['per_dem_2012'] = prev_election['per_dem_2012'].apply(lambda x: x*100)
prev_election['per_gop_2012'] = prev_election['per_gop_2012'].apply(lambda x: x*100)



In [38]:

    
election['per_point_diff'] = election['per_point_diff'].apply(lambda x: float(x.strip('%')))



In [39]:

    
# Making a new column for positive and negative--if per_dem is below 50%, negative. If
# above 50%, positive.



In [40]:

    
election.head()









    Out[40]:






  
    
      
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      diff
      per_point_diff
      state_abbr
      county_name
      county_state
    
  
  
    
      29
      5908.0
      18110.0
      24661.0
      23.956855
      73.435789
      12,202
      49.48
      AL
      Autauga County
      Autauga County, AL
    
    
      30
      18409.0
      72780.0
      94090.0
      19.565310
      77.351472
      54,371
      57.79
      AL
      Baldwin County
      Baldwin County, AL
    
    
      31
      4848.0
      5431.0
      10390.0
      46.660250
      52.271415
      583
      5.61
      AL
      Barbour County
      Barbour County, AL
    
    
      32
      1874.0
      6733.0
      8748.0
      21.422039
      76.966164
      4,859
      55.54
      AL
      Bibb County
      Bibb County, AL
    
    
      33
      2150.0
      22808.0
      25384.0
      8.469902
      89.851875
      20,658
      81.38
      AL
      Blount County
      Blount County, AL



In [41]:

    
election['election_range'] = election['per_dem'] - election['per_gop']



In [42]:

    
prev_election['election_range'] = prev_election['per_dem_2012'] - prev_election['per_gop_2012']



In [43]:

    
prev_election.dtypes









    Out[43]:





state_abbr              object
county_name             object
total_votes_2012         int64
votes_dem_2012           int64
votes_gop_2012           int64
county_fips              int64
state_fips               int64
per_dem_2012           float64
per_gop_2012           float64
diff_2012                int64
per_point_diff_2012    float64
county_state            object
election_range         float64
dtype: object



In [44]:

    
import seaborn as sns
import matplotlib.pyplot as plt
ax = sns.distplot(election.per_point_diff, kde=False)
ax.set(xlabel = "Percent Difference", ylabel='Count')
ax.set_title('2016 Election Margins in All US Counties', fontsize=16)
plt.show()



In [45]:

    
prev_election.head()









    Out[45]:






  
    
      
      state_abbr
      county_name
      total_votes_2012
      votes_dem_2012
      votes_gop_2012
      county_fips
      state_fips
      per_dem_2012
      per_gop_2012
      diff_2012
      per_point_diff_2012
      county_state
      election_range
    
  
  
    
      0
      AL
      Autauga County
      23909
      6354
      17366
      1
      1
      26.575766
      72.633736
      11012
      -0.460580
      Autauga County, AL
      -46.057970
    
    
      1
      AL
      Baldwin County
      84988
      18329
      65772
      3
      1
      21.566574
      77.389749
      47443
      -0.558232
      Baldwin County, AL
      -55.823175
    
    
      2
      AL
      Barbour County
      11459
      5873
      5539
      5
      1
      51.252291
      48.337551
      334
      0.029147
      Barbour County, AL
      2.914739
    
    
      3
      AL
      Bibb County
      8391
      2200
      6131
      7
      1
      26.218567
      73.066381
      3931
      -0.468478
      Bibb County, AL
      -46.847813
    
    
      4
      AL
      Blount County
      23980
      2961
      20741
      9
      1
      12.347790
      86.492911
      17780
      -0.741451
      Blount County, AL
      -74.145121



In [46]:

    
ax = sns.distplot(prev_election.per_point_diff_2012, kde=False)
ax.set(xlabel = "Percent Difference 2012", ylabel='Count')
ax.set_title('2016 Election Margins in All US Counties', fontsize=16)
plt.show()



In [47]:

    
import matplotlib.pyplot



In [48]:

    
ax = sns.distplot(election.election_range, kde=False)
ax.set(xlabel = "(negative=Republican, positive=Democrat, %)", ylabel='Count')
ax.set_title('Partisan Pattern per All US Counties, 2016', fontsize=16, fontname='Ubuntu')
plt.show()
# Democrats are in HUGE trouble. Of course, this distribution doesn't mean that they're 
# necessarily losing counties, but of those they held onto in 2016, they have a far, far
# weaker grasp on them than Republicans do on their side. Also, many of the Republican 
# counties are in Red States with few electoral votes. However, for Congressional voting
# this is still a dangerous sign.



In [49]:

    
# What was it like in 2012? 
ax = sns.distplot(prev_election.election_range, kde=False)
ax.set(xlabel = "(negative=Republican, positive=Democrat, %)", ylabel='Count')
ax.set_title('Partisan Degrees per County, 2012', fontsize=15, fontname='Ubuntu')
plt.show()
# It was already bad. But it's clearly gotten worse for Democrats.



In [50]:

    
election.describe()









    Out[50]:






  
    
      
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      per_point_diff
      election_range
    
  
  
    
      count
      3.112000e+03
      3112.000000
      3.112000e+03
      3112.000000
      3112.000000
      3112.000000
      3112.000000
    
    
      mean
      2.006065e+04
      19622.378856
      4.174537e+04
      31.708228
      63.613409
      39.233014
      -31.905181
    
    
      std
      7.199807e+04
      40442.737492
      1.134048e+05
      15.358601
      15.651728
      20.793041
      30.883786
    
    
      min
      4.000000e+00
      57.000000
      6.400000e+01
      3.144654
      4.122067
      0.040000
      -91.636364
    
    
      25%
      1.166000e+03
      3206.000000
      4.820500e+03
      20.475924
      54.947846
      22.467500
      -54.689887
    
    
      50%
      3.153000e+03
      7164.500000
      1.094700e+04
      28.473862
      66.743096
      40.315000
      -38.217390
    
    
      75%
      9.608500e+03
      17448.250000
      2.879650e+04
      39.999326
      75.147062
      55.462500
      -14.876874
    
    
      max
      1.893770e+06
      620285.000000
      2.652072e+06
      92.846592
      95.272727
      91.640000
      88.724525



In [51]:

    
election['slight_dem'] = election['election_range'].apply(lambda x: 0< x <= 10)
election['slight_gop'] = election['election_range'].apply(lambda x: -10 <= x < 0)
election['med_dem'] = election['election_range'].apply(lambda x: 10< x <= 25)
election['med_gop'] = election['election_range'].apply(lambda x: -25 <= x < -10)
election['strong_dem'] = election['election_range'].apply(lambda x: 25 < x <= 50)
election['strong_gop'] = election['election_range'].apply(lambda x: -50 <= x < -25)



In [52]:

    
election.head()









    Out[52]:






  
    
      
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      diff
      per_point_diff
      state_abbr
      county_name
      county_state
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      29
      5908.0
      18110.0
      24661.0
      23.956855
      73.435789
      12,202
      49.48
      AL
      Autauga County
      Autauga County, AL
      -49.478934
      False
      False
      False
      False
      False
      True
    
    
      30
      18409.0
      72780.0
      94090.0
      19.565310
      77.351472
      54,371
      57.79
      AL
      Baldwin County
      Baldwin County, AL
      -57.786162
      False
      False
      False
      False
      False
      False
    
    
      31
      4848.0
      5431.0
      10390.0
      46.660250
      52.271415
      583
      5.61
      AL
      Barbour County
      Barbour County, AL
      -5.611165
      False
      True
      False
      False
      False
      False
    
    
      32
      1874.0
      6733.0
      8748.0
      21.422039
      76.966164
      4,859
      55.54
      AL
      Bibb County
      Bibb County, AL
      -55.544124
      False
      False
      False
      False
      False
      False
    
    
      33
      2150.0
      22808.0
      25384.0
      8.469902
      89.851875
      20,658
      81.38
      AL
      Blount County
      Blount County, AL
      -81.381973
      False
      False
      False
      False
      False
      False



In [53]:

    
#Combine the states and counties into a single column.
# Have to find a way to join the dfs by matching up those with the same 
# county names AND the same state (there are countys with the same name)
# Simply concatenating them won't work.



In [54]:

    
#http://www.cnbc.com/heres-a-map-of-the-us-counties-that-flipped-to-trump-from-democrats/



In [57]:

    
ue_rates.labor_force = ue_rates.labor_force.astype(float)
ue_rates.employed =  ue_rates.employed.astype(float)
ue_rates.unemployed =  ue_rates.unemployed.astype(float)
ue_rates.ue_rate =  ue_rates.ue_rate.astype(float)



In [58]:

    
ue_rates.dtypes









    Out[58]:





county_state     object
labor_force     float64
employed        float64
unemployed      float64
ue_rate         float64
dtype: object



In [111]:

    
right = election.set_index('county_state')
left = ue_rates.set_index('county_state')
combined = left.join(right, lsuffix='', rsuffix='_r')
combined = combined_1.reset_index()



In [112]:

    
right = combined.set_index('county_state')
left = div.set_index('county_state')
combined_2 = left.join(right, lsuffix='', rsuffix = '_r')
combined_2 = combined_2.reset_index()



In [120]:

    
edu.head()









    Out[120]:






  
    
      
      FIPS Code
      State
      Area name
      less_hs_diploma_2000
      hs_diploma_only_2000
      less_4_years_2000
      four_or_ higher_2000
      per_less_high_school diploma_2000
      per_hs_diploma_only_2000
      per_less_4_years_2000
      per_four_or_ higher_2000
      less_high_school_diploma_2011_15
      hs_diploma_only_2011_15
      less_4_years_2011_15
      four_or_ higher_2011_15
      per_less_high_school_diploma_2011_15
      per_hs_diploma_only_2011_15
      per_less_4_years_2011_15
      per_four_or_higher_2011_15
      county_state
    
  
  
    
      0
      0
      US
      United States
      35715625.0
      52168981.0
      49864428.0
      44462605.0
      19.6
      28.6
      27.4
      24.4
      28229094.0
      58722528.0
      61558628.0
      62952272.0
      13.3
      27.8
      29.1
      29.8
      United States, US
    
    
      1
      1000
      AL
      Alabama
      714081.0
      877216.0
      746495.0
      549608.0
      24.7
      30.4
      25.9
      19.0
      509891.0
      1005295.0
      962515.0
      761650.0
      15.7
      31.0
      29.7
      23.5
      Alabama, AL
    
    
      2
      1001
      AL
      Autauga County
      5872.0
      9332.0
      7413.0
      4972.0
      21.3
      33.8
      26.9
      18.0
      4656.0
      12182.0
      11044.0
      8437.0
      12.8
      33.5
      30.4
      23.2
      Autauga County, AL
    
    
      3
      1003
      AL
      Baldwin County
      17258.0
      28428.0
      28178.0
      22146.0
      18.0
      29.6
      29.3
      23.1
      14360.0
      39431.0
      43500.0
      39710.0
      10.5
      28.8
      31.8
      29.0
      Baldwin County, AL
    
    
      4
      1005
      AL
      Barbour County
      6679.0
      6124.0
      4025.0
      2068.0
      35.3
      32.4
      21.3
      10.9
      5021.0
      6490.0
      4943.0
      2354.0
      26.7
      34.5
      26.3
      12.5
      Barbour County, AL



In [118]:

    
combined_2.columns









    Out[118]:





Index([u'county_state', u'div_index', u'af_am', u'native_2013', u'asian_am',
       u'pac_am', u'two_or_more_races', u'hisp_lat_am', u'white_am', u'index',
       u'labor_force', u'employed', u'unemployed', u'ue_rate', u'votes_dem',
       u'votes_gop', u'total_votes', u'per_dem', u'per_gop', u'diff',
       u'per_point_diff', u'state_abbr', u'county_name', u'election_range',
       u'slight_dem', u'slight_gop', u'med_dem', u'med_gop', u'strong_dem',
       u'strong_gop'],
      dtype='object')



In [113]:

    
right = combined_2.set_index('county_state')
left = edu.set_index('county_state')
combined_3 = left.join(right, lsuffix='', rsuffix = '_r')
combined_3 = combined_2.reset_index()



In [117]:

    
combined_3.columns









    Out[117]:





Index([u'level_0', u'county_state', u'div_index', u'af_am', u'native_2013',
       u'asian_am', u'pac_am', u'two_or_more_races', u'hisp_lat_am',
       u'white_am', u'index', u'labor_force', u'employed', u'unemployed',
       u'ue_rate', u'votes_dem', u'votes_gop', u'total_votes', u'per_dem',
       u'per_gop', u'diff', u'per_point_diff', u'state_abbr', u'county_name',
       u'election_range', u'slight_dem', u'slight_gop', u'med_dem', u'med_gop',
       u'strong_dem', u'strong_gop'],
      dtype='object')



In [114]:

    
right = combined_3.set_index('county_state')
left = pop.set_index('county_state')
combined_4 = left.join(right, lsuffix='', rsuffix = '_r')
combined_4 = combined_4.reset_index()



In [115]:

    
combined_4.isnull().sum()









    Out[115]:





county_state          0
state                 0
county                0
est_pop_2015          0
pop_change_2015       0
int_mig_2015          0
dom_mig_2015          0
mig_2015              0
level_0               7
div_index             7
af_am                 7
native_2013           7
asian_am              7
pac_am                7
two_or_more_races     7
hisp_lat_am           7
white_am              7
index                21
labor_force          21
employed             21
unemployed           21
ue_rate              21
votes_dem            43
votes_gop            43
total_votes          43
per_dem              43
per_gop              43
diff                 43
per_point_diff       43
state_abbr           43
county_name          43
election_range       43
slight_dem           43
slight_gop           43
med_dem              43
med_gop              43
strong_dem           43
strong_gop           43
dtype: int64



In [71]:

    
combined_4.dropna(inplace=True)



In [76]:

    
combined_4= combined_4[combined_4.county_name!='Alaska']
#Just making sure Alaska isn't included



In [77]:

    
combined_4.head()









    Out[77]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      level_0
      div_index
      ...
      per_point_diff
      state_abbr
      county_name
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      0
      Abbeville County, SC
      SC
      Abbeville County
      24932
      6
      22
      -12
      10
      4.0
      0.445417
      ...
      28.25
      SC
      Abbeville County
      -28.254383
      False
      False
      False
      False
      False
      True
    
    
      1
      Acadia Parish, LA
      LA
      Acadia Parish
      62577
      79
      32
      -281
      -249
      5.0
      0.355956
      ...
      56.67
      LA
      Acadia Parish
      -56.674943
      False
      False
      False
      False
      False
      False
    
    
      2
      Accomack County, VA
      VA
      Accomack County
      32973
      -25
      81
      -53
      28
      6.0
      0.539878
      ...
      11.71
      VA
      Accomack County
      -11.710568
      False
      False
      False
      True
      False
      False
    
    
      3
      Ada County, ID
      ID
      Ada County
      434211
      7364
      933
      3838
      4771
      7.0
      0.256622
      ...
      9.24
      ID
      Ada County
      -9.239878
      False
      True
      False
      False
      False
      False
    
    
      4
      Adair County, IA
      IA
      Adair County
      7228
      -189
      0
      -161
      -161
      8.0
      0.054921
      ...
      35.36
      IA
      Adair County
      -35.355148
      False
      False
      False
      False
      False
      True
    
  

5 rows × 38 columns



In [78]:

    
election.describe()









    Out[78]:






  
    
      
      votes_dem
      votes_gop
      total_votes
      per_dem
      per_gop
      per_point_diff
      election_range
    
  
  
    
      count
      3.112000e+03
      3112.000000
      3.112000e+03
      3112.000000
      3112.000000
      3112.000000
      3112.000000
    
    
      mean
      2.006065e+04
      19622.378856
      4.174537e+04
      31.708228
      63.613409
      39.233014
      -31.905181
    
    
      std
      7.199807e+04
      40442.737492
      1.134048e+05
      15.358601
      15.651728
      20.793041
      30.883786
    
    
      min
      4.000000e+00
      57.000000
      6.400000e+01
      3.144654
      4.122067
      0.040000
      -91.636364
    
    
      25%
      1.166000e+03
      3206.000000
      4.820500e+03
      20.475924
      54.947846
      22.467500
      -54.689887
    
    
      50%
      3.153000e+03
      7164.500000
      1.094700e+04
      28.473862
      66.743096
      40.315000
      -38.217390
    
    
      75%
      9.608500e+03
      17448.250000
      2.879650e+04
      39.999326
      75.147062
      55.462500
      -14.876874
    
    
      max
      1.893770e+06
      620285.000000
      2.652072e+06
      92.846592
      95.272727
      91.640000
      88.724525



In [80]:

    
# Set up range variables
ax = sns.distplot(combined_4.election_range, kde=False)
ax.set(xlabel = "(negative=Republican, positive=Democrat, %)", ylabel='Count')
ax.set_title('Partisan Pattern per All US Counties, 2016', fontsize=16, fontname='Ubuntu')
plt.show()

Visualizations were a little dense as far as the content they were showing, make sure you slow down better explain or use visualizations whose labels make them a lot easier for the to understand.

Be sure to frame your results in a positive light! Saying “it’s not a great score” and “score of only 66,” you convey to your audience a sense of disappointment and they likely come away with a negative outlook. If you frame it in a positive way, then you have more control about what your audience takes away!
You should seek to include additional advanced classification metrics: ROC AUC and confusion matrices are critical to the assessment of a model, and omitting them due to technical errors is insufficient for a capstone presentation



In [ ]:

    
# All counties, not including those in Alaska.



In [83]:

    
VA = combined_4[combined_4.state_abbr=='VA']
VA.head()









    Out[83]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      level_0
      div_index
      ...
      per_point_diff
      state_abbr
      county_name
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      2
      Accomack County, VA
      VA
      Accomack County
      32973
      -25
      81
      -53
      28
      6.0
      0.539878
      ...
      11.71
      VA
      Accomack County
      -11.710568
      False
      False
      False
      True
      False
      False
    
    
      30
      Albemarle County, VA
      VA
      Albemarle County
      105703
      1352
      410
      675
      1085
      33.0
      0.379061
      ...
      25.06
      VA
      Albemarle County
      25.056116
      False
      False
      False
      False
      True
      False
    
    
      37
      Alexandria city, VA
      VA
      Alexandria city
      153511
      2071
      2334
      -2139
      195
      40.0
      0.641095
      ...
      59.03
      VA
      Alexandria city
      59.026135
      False
      False
      False
      False
      False
      False
    
    
      45
      Alleghany County, VA
      VA
      Alleghany County
      15677
      -207
      2
      -85
      -83
      48.0
      0.154440
      ...
      37.07
      VA
      Alleghany County
      -37.065426
      False
      False
      False
      False
      False
      True
    
    
      56
      Amelia County, VA
      VA
      Amelia County
      12903
      118
      8
      123
      131
      59.0
      0.420531
      ...
      36.30
      VA
      Amelia County
      -36.304193
      False
      False
      False
      False
      False
      True
    
  

5 rows × 38 columns

Virginia EDA



In [84]:

    
ax = sns.distplot(VA.election_range, kde=False)
ax.set(xlabel = "Negative=Republican, Positive=Democrat (%)", ylabel='Count')
ax.set_title('Partisan Degree in Virginia Counties, 2016', fontsize=16, fontname='Ubuntu')
plt.show()



In [85]:

    
import matplotlib.pyplot as plt
import seaborn as sns



In [89]:

    
ax = sns.regplot(VA.div_index, VA.per_dem)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("Diversity's Contribution to Democratic Votes in Virginia Counties", fontsize=20)
plt.show()



In [90]:

    
ax = sns.regplot(VA.div_index, VA.per_gop)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Diversity's Contribution to Republican Votes in Virginia Counties", fontsize=20)
plt.show()



In [100]:

    
ax = sns.regplot(VA.est_pop_2015, VA.per_dem)
ax.set(xlabel = 'Population Per County', ylabel = 'County Vote Percent Democrats(%)')
ax.set_title("Population Level's Contribution to Democratic Votes in Virginia Counties", fontsize=20)
plt.show()
# Not much of a contribution at all in VAb



In [99]:

    
ax = sns.regplot(VA.pop_change_2015, VA.per_dem)
ax.set(xlabel = 'Population Change Per County', ylabel = 'County Vote Percent Democrats(%)')
ax.set_title("Population Change's Contribution to Democratic Votes in Virginia Counties", fontsize=20)
plt.show()
# Again, not much



In [105]:

    
ax = sns.regplot(VA.white_am, VA.per_dem)
ax.set(xlabel = 'Whites Per County(%)', ylabel = 'County Vote For Democrats(%)')
ax.set_title("White Americans' Contribution to Democratic Votes in Virginia Counties", fontsize=20)
plt.show()



In [104]:

    
ax = sns.regplot(VA.white_am, VA.per_gop)
ax.set(xlabel = 'Whites Per County(%)', ylabel = 'County Vote For Republicans(%)')
ax.set_title("White Americans' Contribution to Republican Votes in Virginia Counties", fontsize=20)
plt.show()



In [107]:

    
ax = sns.regplot(VA.af_am, VA.per_dem)
ax.set(xlabel = 'African Americans Per County(%)', ylabel = 'County Vote For Democrats(%)')
ax.set_title("African Americans' Contribution to Democratic Votes in Virginia Counties", fontsize=20)
plt.show()



In [108]:

    
ax = sns.regplot(VA.af_am, VA.per_gop)
ax.set(xlabel = 'African Americans Per County(%)', ylabel = 'County Vote For Republicans(%)')
ax.set_title("African Americans' Contribution to Republican Votes in Virginia Counties", fontsize=20)
plt.show()



In [110]:

    
ax = sns.regplot(VA.asian_am, VA.per_dem)
ax.set(xlabel = 'Asian Americans Per County(%)', ylabel = 'County Vote For Democrats(%)')
ax.set_title("Asian Americans' Contribution to Democratic Votes in Virginia Counties", fontsize=20)
plt.show()
#There's a slight correlation, but nothing too substantive. Need to id these specific counties.



In [ ]:

    
ax = sns.regplot(VA., VA.)
ax.set(xlabel = 'Asian Americans Per County(%)', ylabel = 'County Vote For Democrats(%)')
ax.set_title("Asian Americans' Contribution to Democratic Votes in Virginia Counties", fontsize=20)
plt.show()



In [85]:

    
# Making swing state list based on the crucial swing states this election.

IA = combined_5[combined_5['state_abbr']==('IA')]
WI = combined_5[combined_5['state_abbr']==('WI')]
MI = combined_5[combined_5['state_abbr']==('MI')]
PA = combined_5[combined_5['state_abbr']==('PA')]
FL = combined_5[combined_5['state_abbr']==('FL')]
NC = combined_5[combined_5['state_abbr']==('NC')]
OH = combined_5[combined_5['state_abbr']==('OH')]
MN = combined_5[combined_5['state_abbr']==('MN')]
swing_states= pd.concat([IA, WI, MI, PA, FL, NC, OH, MN])
# 'IA', 'WI','MI','PA','FL','NC','OH','MN'



In [86]:

    
swing_states.head()









    Out[86]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      4
      Adair County, IA
      IA
      Adair County
      7228
      -189
      0
      -161
      -161
      19001.0
      IA
      ...
      35.36
      IA
      Adair County
      -35.355148
      False
      False
      False
      False
      False
      True
    
    
      9
      Adams County, IA
      IA
      Adams County
      3796
      -75
      0
      -80
      -80
      19003.0
      IA
      ...
      39.77
      IA
      Adams County
      -39.769452
      False
      False
      False
      False
      False
      True
    
    
      40
      Allamakee County, IA
      IA
      Allamakee County
      13886
      -175
      21
      -216
      -195
      19005.0
      IA
      ...
      24.32
      IA
      Allamakee County
      -24.323534
      False
      False
      False
      True
      False
      False
    
    
      75
      Appanoose County, IA
      IA
      Appanoose County
      12529
      -99
      -2
      -61
      -63
      19007.0
      IA
      ...
      36.38
      IA
      Appanoose County
      -36.384514
      False
      False
      False
      False
      False
      True
    
    
      106
      Audubon County, IA
      IA
      Audubon County
      5773
      -20
      0
      -19
      -19
      19009.0
      IA
      ...
      31.25
      IA
      Audubon County
      -31.251850
      False
      False
      False
      False
      False
      True
    
  

5 rows × 61 columns



In [87]:

    
ax = sns.distplot(swing_states.election_range, kde=False)
ax.set(xlabel = "Negative=Republican, Positive=Democrat (%)", ylabel='Count')
ax.set_title('Partisan Degree in All Swing State Counties, 2016', fontsize=16, fontname='Ubuntu')
plt.show()
# As expected, in swing states it's not AS bad for Democrats compared to the rest of the 
# country but still quite dire.

Influence of Ethnicity



In [96]:

    
ax = sns.regplot(combined_5.white_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Democrat(%)')
plt.show()



In [97]:

    
ax = sns.regplot(combined_5.white_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Republican(%)')
plt.show()
# It's scattered, but there is stil a strong correlation between percentage white 
# population and Republican vote.



In [98]:

    
ax = sns.regplot(combined_5.af_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title('African American Influence on 2016 Democrtic Vote in All US Counties', fontsize=15)
plt.show()



In [99]:

    
ax = sns.regplot(combined_5.af_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('African American Influence on 2016 Republican Vote in All US Counties', fontsize=15)
plt.show()



In [100]:

    
ax = sns.regplot(combined_5.hisp_lat_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage Hispanic/Latino(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title('Hispanic/Latino Influence on 2016 Democratic Vote in All US Counties', fontsize=15)
plt.show()



In [101]:

    
ax = sns.regplot(combined_5.hisp_lat_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage Hispanic/Latino(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('Hispanic/Latino Influence on 2016 Republican Vote in All US Counties', fontsize=15)
plt.show()
# A correlation is there, but it's not that strong due to the sheer amount of 
# counties with little hispanic/latino population.



In [102]:

    
ax = sns.regplot(combined_5.asian_am, combined_5.per_dem)
ax.set(xlabel = 'Percentage Asian American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('Asian American Influence on 2016 Democratic Vote in All US Counties', fontsize=15)
plt.show()



In [103]:

    
ax = sns.regplot(combined_5.asian_am, combined_5.per_gop)
ax.set(xlabel = 'Percentage Asian American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('Asian American Influence on 2016 Republican Vote in All US Counties', fontsize=15)
plt.show()



In [ ]:

Swing States



In [104]:

    
ax = sns.regplot(swing_states.div_index, swing_states.election_range)
ax.set(xlabel = 'Diversity Index', ylabel = 'Election Range, Neg=Republican, Pos=Democrat(%)')
ax.set_title("Diversity's Effect on Swing State Votes", fontsize=20, fontname='Ubuntu')
plt.show()



In [105]:

    
ax = sns.regplot(swing_states.div_index, swing_states.per_dem)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("Diversity's Effect on Democratic Vote in Swing States", fontsize=20, fontname='Ubuntu')
plt.show()



In [106]:

    
ax = sns.regplot(swing_states.div_index, swing_states.per_gop)
ax.set(xlabel = 'Diversity Index', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Diversity's Effect on Republican Vote in Swing States", fontsize=20, fontname='Ubuntu')
plt.show()



In [107]:

    
ax = sns.regplot(swing_states.ue_rate, swing_states.election_range)
ax.set(xlabel = 'Unemployment Rate(%)', ylabel = 'Election Range(%)')
plt.show()



In [108]:

    
# No discernable realtionship for unemployment in the swing states, just as in the overall dataset.



In [109]:

    
ax = sns.regplot(swing_states.white_am, swing_states.per_dem)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("White Americans' Contribtuion to 2016 Swing State Democratic Vote", fontsize=16)
plt.show()



In [110]:

    
ax = sns.regplot(swing_states.white_am, swing_states.per_gop)
ax.set(xlabel = 'Percentage White American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("White Americans' Contribtuion to 2016 Swing State Republican Vote", fontsize=16)
plt.show()



In [111]:

    
# Look for how incomes of white americans influence how they vote.



In [112]:

    
ax = sns.regplot(swing_states.af_am, swing_states.per_dem)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title('African American Influence on 2016 Democratic Vote in Swing State Counties', fontsize=15)
plt.show()



In [113]:

    
ax = sns.regplot(swing_states.af_am, swing_states.per_gop)
ax.set(xlabel = 'Percentage African American(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title('African American Influence on 2016 Republican Vote in Swing State Counties', fontsize=15)
plt.show()



In [114]:

    
ax = sns.regplot(swing_states.hisp_lat_am, swing_states.per_dem)
ax.set(xlabel = 'Percentage Hispanic/Latino(%)', ylabel = 'County Vote Percent Democrat(%)')
plt.show()



In [115]:

    
# Again, a scattered, but string correlation.



In [116]:

    
# The change in the uninsured rate does not appear to have benefitted Democrats, 
# but does appear to have benefitted Republicans.

Influence of Education



In [117]:

    
edu.columns









    Out[117]:





Index([                           u'FIPS Code',
                                      u'State',
                                  u'Area name',
                       u'less_hs_diploma_2000',
                       u'hs_diploma_only_2000',
                          u'less_4_years_2000',
                       u'four_or_ higher_2000',
          u'per_less_high_school diploma_2000',
                   u'per_hs_diploma_only_2000',
                      u'per_less_4_years_2000',
                   u'per_four_or_ higher_2000',
           u'less_high_school_diploma_2011_15',
                    u'hs_diploma_only_2011_15',
                       u'less_4_years_2011_15',
                    u'four_or_ higher_2011_15',
       u'per_less_high_school_diploma_2011_15',
                u'per_hs_diploma_only_2011_15',
                   u'per_less_4_years_2011_15',
                 u'per_four_or_higher_2011_15',
                               u'county_state'],
      dtype='object')



In [118]:

    
ax = sns.regplot(combined_5.per_hs_diploma_only_2011_15, combined_5.per_gop)
ax.set(xlabel = 'High School Diploma Only(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Lower Education's Contribution to 2016 Republican Vote in All US Counties", fontsize=16)
plt.show()



In [119]:

    
ax = sns.regplot(combined_5.per_four_or_higher_2011_15, combined_5.per_gop)
ax.set(xlabel = 'Four or more University Years(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Higher Education's Contribution to 2016 Republican Vote in All US Counties", fontsize=16)
plt.show()



In [120]:

    
ax = sns.regplot(combined_5.per_hs_diploma_only_2011_15, combined_5.per_dem)
ax.set(xlabel = 'High School Diploma Only(%)', ylabel = 'County Vote Percent Democrat(%)')
ax.set_title("Lower Education's Contribution to 2016 Democratic Vote", fontsize=16)
plt.show()



In [121]:

    
ax = sns.regplot(combined_5.per_four_or_higher_2011_15, combined_5.per_dem)
ax.set(xlabel = 'Four or more University Years(%)', ylabel = 'County Vote Percent Republican(%)')
ax.set_title("Higher Education's Contribution to 2016 Democratic Vote in All US Counties", fontsize=16)
plt.show()



In [122]:

    
ax = sns.regplot(combined_5.per_hs_diploma_only_2011_15, combined_5.election_range)
ax.set(xlabel = 'High School Diploma Only per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Lower Education's Contribution to 2016 Vote", fontsize=16)
plt.show()



In [123]:

    
ax = sns.regplot(combined_5.per_four_or_higher_2011_15, combined_5.election_range)
ax.set(xlabel = 'Four or more University Years per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Higher Education's Contribution to 2016 Vote in All US Counties", fontsize=16)
plt.show()

Swing States



In [124]:

    
ax = sns.regplot(swing_states.per_hs_diploma_only_2011_15, swing_states.election_range)
ax.set(xlabel = 'High School Diploma Only per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Lower Education's Contribution to 2016 Vote in Swing State Counties", fontsize=16)
plt.show()



In [125]:

    
ax = sns.regplot(swing_states.per_four_or_higher_2011_15, swing_states.election_range)
ax.set(xlabel = 'Four or more University Years per County(%)', ylabel = 'Election Range (neg=Rep, pos=Dem, %)')
ax.set_title("Higher Education's Contribution to 2016 Vote in Swing State Counties", fontsize=16)
plt.show()



In [126]:

    
# If a county has a higher percentage of people with only a hs diploma, then more likely
# to vote Republican. If a county has a higher proportion of 4+ college degrees, then 
# more likely to go Democrat. Pretty much aligns with Nat Silver's argument.



In [127]:

    
combined_5.labor_force.head()









    Out[127]:





0     10423.0
1     26186.0
2     15972.0
3    217281.0
4      4266.0
Name: labor_force, dtype: float64

Labor Force



In [128]:

    
ax = sns.regplot(combined_5.labor_force, combined_5.election_range)
ax.set(xlabel = 'Labor Force Body per County', ylabel = 'Election Range(neg=Rep, pos=Dem, %)')
ax.set_title("Labor Force Contribution to Votes in All counties", fontsize=16)
plt.show()

Population



In [129]:

    
combined_5.head(1)









    Out[129]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      0
      Abbeville County, SC
      SC
      Abbeville County
      24932
      6
      22
      -12
      10
      45001.0
      SC
      ...
      28.25
      SC
      Abbeville County
      -28.254383
      False
      False
      False
      False
      False
      True
    
  

1 rows × 61 columns



In [130]:

    
ax = sns.regplot(combined_5.est_pop_2015, combined_5.election_range)
ax.set(xlabel = 'Popularion per County (2015)', ylabel = '2016 Election Range(neg=Rep, pos=Dem, %)')
ax.set_title("Population Contribution to Votes in All Counties", fontsize=16)
plt.show()



In [131]:

    
# Population size per county does correlate with vote.



In [132]:

    
ax = sns.regplot(combined_5.pop_change_2015, combined_5.election_range)                                                                                              
ax.set(xlabel = 'Population Change per County(2015)', ylabel = '2016 Election Range(neg=Rep, pos=Dem, %)')
ax.set_title("Population Change Contribution to Votes in All Counties", fontsize=16)
plt.show()



In [133]:

    
# Counties that experienced a positve change in population saw a boost for Dems.



In [134]:

    
# Although there is that cluster towards zero, and the correlation is broad, there
# is still something there.

Modeling

Regression

Most predictive features for counties' vote found through EDA:

(note that these variables, sometimes by their nature, don't necessarily follow a normal distribution)

Percentage White American population

Percentage African American population

Percentage Asian American population

Percentage High School Diploma only

Percentage Four or more years of University



In [135]:

    
combined_5.columns









    Out[135]:





Index([                        u'county_state',
                                      u'state',
                                     u'county',
                               u'est_pop_2015',
                            u'pop_change_2015',
                               u'int_mig_2015',
                               u'dom_mig_2015',
                                   u'mig_2015',
                                  u'FIPS Code',
                                      u'State',
                                  u'Area name',
                       u'less_hs_diploma_2000',
                       u'hs_diploma_only_2000',
                          u'less_4_years_2000',
                       u'four_or_ higher_2000',
          u'per_less_high_school diploma_2000',
                   u'per_hs_diploma_only_2000',
                      u'per_less_4_years_2000',
                   u'per_four_or_ higher_2000',
           u'less_high_school_diploma_2011_15',
                    u'hs_diploma_only_2011_15',
                       u'less_4_years_2011_15',
                    u'four_or_ higher_2011_15',
       u'per_less_high_school_diploma_2011_15',
                u'per_hs_diploma_only_2011_15',
                   u'per_less_4_years_2011_15',
                 u'per_four_or_higher_2011_15',
                                  u'div_index',
                                      u'af_am',
                                u'native_2013',
                                   u'asian_am',
                                     u'pac_am',
                          u'two_or_more_races',
                                u'hisp_lat_am',
                                   u'white_am',
                                u'county_fips',
                                u'county_name',
                               u'state_abbrev',
                               u'2013_ui_rate',
                               u'2016_ui_rate',
                                   u'ui_delta',
                                u'labor_force',
                                   u'employed',
                                 u'unemployed',
                                    u'ue_rate',
                                  u'votes_dem',
                                  u'votes_gop',
                                u'total_votes',
                                    u'per_dem',
                                    u'per_gop',
                                       u'diff',
                             u'per_point_diff',
                                 u'state_abbr',
                              u'county_name_r',
                             u'election_range',
                                 u'slight_dem',
                                 u'slight_gop',
                                    u'med_dem',
                                    u'med_gop',
                                 u'strong_dem',
                                 u'strong_gop'],
      dtype='object')



In [136]:

    
modeling = combined_5.drop(combined_5[[0,1,2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,25,34,35,36,37,38,39,40,52,53]], axis=1)



In [137]:

    
modeling.head()









    Out[137]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      ...
      per_gop
      diff
      per_point_diff
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      0
      24932
      6
      37.5
      12.3
      0.445417
      28.2
      0.3
      0.4
      0.0
      1.3
      ...
      62.868333
      3,030
      28.25
      -28.254383
      False
      False
      False
      False
      False
      True
    
    
      1
      62577
      79
      39.2
      10.5
      0.355956
      18.3
      0.3
      0.4
      0.0
      1.3
      ...
      77.262105
      15,521
      56.67
      -56.674943
      False
      False
      False
      False
      False
      False
    
    
      2
      32973
      -25
      39.9
      18.8
      0.539878
      28.0
      0.6
      0.6
      0.2
      1.5
      ...
      54.471596
      1,845
      11.71
      -11.710568
      False
      False
      False
      True
      False
      False
    
    
      3
      434211
      7364
      21.4
      37.1
      0.256622
      1.3
      0.8
      2.6
      0.2
      2.6
      ...
      47.931611
      18,072
      9.24
      -9.239878
      False
      True
      False
      False
      False
      False
    
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      ...
      65.336526
      1,329
      35.36
      -35.355148
      False
      False
      False
      False
      False
      True
    
  

5 rows × 29 columns



In [138]:

    
modeling.isnull().sum()









    Out[138]:





est_pop_2015                   0
pop_change_2015                0
per_hs_diploma_only_2011_15    0
per_four_or_higher_2011_15     0
div_index                      0
af_am                          0
native_2013                    0
asian_am                       0
pac_am                         0
two_or_more_races              0
hisp_lat_am                    0
labor_force                    0
employed                       0
unemployed                     0
ue_rate                        0
votes_dem                      0
votes_gop                      0
total_votes                    0
per_dem                        0
per_gop                        0
diff                           0
per_point_diff                 0
election_range                 0
slight_dem                     0
slight_gop                     0
med_dem                        0
med_gop                        0
strong_dem                     0
strong_gop                     0
dtype: int64



In [139]:

    
modeling.dropna(inplace=True)
#Only 46 isn't too significant.



In [140]:

    
from sklearn.cross_validation import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.cross_validation import cross_val_score
from sklearn.metrics import confusion_matrix, mean_squared_error



In [141]:

    
lr = LinearRegression()



In [142]:

    
modeling.columns









    Out[142]:





Index([               u'est_pop_2015',             u'pop_change_2015',
       u'per_hs_diploma_only_2011_15',  u'per_four_or_higher_2011_15',
                         u'div_index',                       u'af_am',
                       u'native_2013',                    u'asian_am',
                            u'pac_am',           u'two_or_more_races',
                       u'hisp_lat_am',                 u'labor_force',
                          u'employed',                  u'unemployed',
                           u'ue_rate',                   u'votes_dem',
                         u'votes_gop',                 u'total_votes',
                           u'per_dem',                     u'per_gop',
                              u'diff',              u'per_point_diff',
                    u'election_range',                  u'slight_dem',
                        u'slight_gop',                     u'med_dem',
                           u'med_gop',                  u'strong_dem',
                        u'strong_gop'],
      dtype='object')



In [143]:

    
X = modeling[[0,1,2,3,4,5,6,7,8,9,10,11]] 
y = modeling['election_range']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=99)



In [144]:

    
X.head(0)









    Out[144]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      hisp_lat_am
      labor_force



In [145]:

    
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)



In [146]:

    
ax = sns.regplot(y_test, y_pred)
ax.set(xlabel = 'Predicted Election Range (neg=Rep, pos=Dem)', ylabel = 'Actual Election Range(neg=Rep, pos=Dem)')
ax.set_title("Predicted vs. Actual Election Ranges for All Counties", fontsize=16)
plt.show()



In [147]:

    
lr.score(X_train, y_train)









    Out[147]:





0.66806575723520978

Model Swing States



In [246]:

    
s_modeling = swing_states.drop(swing_states[[0,1,2,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,25,34,35,36,37,38,39,40,52,53]], axis=1)



In [247]:

    
swing_states.head(0)









    Out[247]:






  
    
      
      county_state
      state
      county
      est_pop_2015
      pop_change_2015
      int_mig_2015
      dom_mig_2015
      mig_2015
      FIPS Code
      State
      ...
      per_point_diff
      state_abbr
      county_name_r
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
  

0 rows × 61 columns



In [248]:

    
s_modeling.head(0)









    Out[248]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      ...
      per_gop
      diff
      per_point_diff
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
  

0 rows × 29 columns



In [249]:

    
X = s_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]] 
y = s_modeling['election_range']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=99)



In [250]:

    
X.head()









    Out[250]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      hisp_lat_am
      labor_force
    
  
  
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      1.5
      4266.0
    
    
      9
      3796
      -75
      39.1
      15.1
      0.058873
      0.3
      0.5
      0.6
      0.0
      0.6
      1.1
      2300.0
    
    
      40
      13886
      -175
      42.1
      16.3
      0.159016
      1.5
      0.6
      0.5
      0.3
      1.0
      5.8
      7727.0
    
    
      75
      12529
      -99
      36.3
      17.6
      0.074125
      0.6
      0.3
      0.3
      0.0
      1.1
      1.6
      6255.0
    
    
      106
      5773
      -20
      42.3
      14.3
      0.049200
      0.4
      0.2
      0.5
      0.0
      0.7
      0.9
      3251.0



In [251]:

    
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)



In [253]:

    
ax = sns.regplot(y_test, y_pred)
ax.set(xlabel = 'Predicted Election Range (neg=Rep, pos=Dem)', ylabel = 'Actual Election Range(neg=Rep, pos=Dem)')
ax.set_title("Predicted vs. Actual Election Ranges for Swing State Counties", fontsize=16)
plt.show()



In [254]:

    
lr.score(X_train, y_train)
# Right around the same R^2 score as all counties.









    Out[254]:





0.66209473643341399

Classification

Now we want to see what features classify a county into being "slight dem", "slight gop, "med_dem", "med_gop", "strong_dem", and "strong_gop."



In [206]:

    
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report, precision_score, recall_score, roc_curve, auc



In [159]:

    
# Setting the number of neighbors to the square root of number of instances is a good 
# rule of thumb.
knn = KNeighborsClassifier(n_neighbors = 55)
rfc = RandomForestClassifier(max_depth = 5)



In [160]:

    
modeling.head()









    Out[160]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      ...
      per_gop
      diff
      per_point_diff
      election_range
      slight_dem
      slight_gop
      med_dem
      med_gop
      strong_dem
      strong_gop
    
  
  
    
      0
      24932
      6
      37.5
      12.3
      0.445417
      28.2
      0.3
      0.4
      0.0
      1.3
      ...
      62.868333
      3,030
      28.25
      -28.254383
      False
      False
      False
      False
      False
      True
    
    
      1
      62577
      79
      39.2
      10.5
      0.355956
      18.3
      0.3
      0.4
      0.0
      1.3
      ...
      77.262105
      15,521
      56.67
      -56.674943
      False
      False
      False
      False
      False
      False
    
    
      2
      32973
      -25
      39.9
      18.8
      0.539878
      28.0
      0.6
      0.6
      0.2
      1.5
      ...
      54.471596
      1,845
      11.71
      -11.710568
      False
      False
      False
      True
      False
      False
    
    
      3
      434211
      7364
      21.4
      37.1
      0.256622
      1.3
      0.8
      2.6
      0.2
      2.6
      ...
      47.931611
      18,072
      9.24
      -9.239878
      False
      True
      False
      False
      False
      False
    
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      ...
      65.336526
      1,329
      35.36
      -35.355148
      False
      False
      False
      False
      False
      True
    
  

5 rows × 29 columns



In [161]:



In [162]:

    
c_modeling = modeling.join(dummies)
c_modeling = c_modeling.reset_index()
c_modeling = c_modeling.drop(c_modeling[[0]], axis=1)



In [163]:

    
c_modeling.head()









    Out[163]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      ...
      slight_gop_False
      slight_gop_True
      med_dem_False
      med_dem_True
      med_gop_False
      med_gop_True
      strong_dem_False
      strong_dem_True
      strong_gop_False
      strong_gop_True
    
  
  
    
      0
      24932
      6
      37.5
      12.3
      0.445417
      28.2
      0.3
      0.4
      0.0
      1.3
      ...
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
      0.0
      1.0
    
    
      1
      62577
      79
      39.2
      10.5
      0.355956
      18.3
      0.3
      0.4
      0.0
      1.3
      ...
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
    
    
      2
      32973
      -25
      39.9
      18.8
      0.539878
      28.0
      0.6
      0.6
      0.2
      1.5
      ...
      1.0
      0.0
      1.0
      0.0
      0.0
      1.0
      1.0
      0.0
      1.0
      0.0
    
    
      3
      434211
      7364
      21.4
      37.1
      0.256622
      1.3
      0.8
      2.6
      0.2
      2.6
      ...
      0.0
      1.0
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
    
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      ...
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
      1.0
      0.0
      0.0
      1.0
    
  

5 rows × 41 columns



In [164]:

    
c_modeling.columns









    Out[164]:





Index([               u'est_pop_2015',             u'pop_change_2015',
       u'per_hs_diploma_only_2011_15',  u'per_four_or_higher_2011_15',
                         u'div_index',                       u'af_am',
                       u'native_2013',                    u'asian_am',
                            u'pac_am',           u'two_or_more_races',
                       u'hisp_lat_am',                 u'labor_force',
                          u'employed',                  u'unemployed',
                           u'ue_rate',                   u'votes_dem',
                         u'votes_gop',                 u'total_votes',
                           u'per_dem',                     u'per_gop',
                              u'diff',              u'per_point_diff',
                    u'election_range',                  u'slight_dem',
                        u'slight_gop',                     u'med_dem',
                           u'med_gop',                  u'strong_dem',
                        u'strong_gop',            u'slight_dem_False',
                   u'slight_dem_True',            u'slight_gop_False',
                   u'slight_gop_True',               u'med_dem_False',
                      u'med_dem_True',               u'med_gop_False',
                      u'med_gop_True',            u'strong_dem_False',
                   u'strong_dem_True',            u'strong_gop_False',
                   u'strong_gop_True'],
      dtype='object')



In [255]:

    
# Swing State Classifiers
dummies = pd.get_dummies(s_modeling[['slight_dem','slight_gop','med_dem','med_gop','strong_dem','strong_gop']])
cs_modeling = s_modeling.join(dummies)
cs_modeling = cs_modeling.reset_index()
cs_modeling = cs_modeling.drop(c_modeling[[0]], axis=1)

First test for slight dem and slight gop.



In [265]:

    
# First try KNN for just slight dem and slight gop.
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[29,30,31,32]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [266]:

    
X.head()









    Out[266]:






  
    
      
      est_pop_2015
      pop_change_2015
      per_hs_diploma_only_2011_15
      per_four_or_higher_2011_15
      div_index
      af_am
      native_2013
      asian_am
      pac_am
      two_or_more_races
      hisp_lat_am
      labor_force
    
  
  
    
      0
      24932
      6
      37.5
      12.3
      0.445417
      28.2
      0.3
      0.4
      0.0
      1.3
      1.2
      10423.0
    
    
      1
      62577
      79
      39.2
      10.5
      0.355956
      18.3
      0.3
      0.4
      0.0
      1.3
      2.0
      26186.0
    
    
      2
      32973
      -25
      39.9
      18.8
      0.539878
      28.0
      0.6
      0.6
      0.2
      1.5
      9.0
      15972.0
    
    
      3
      434211
      7364
      21.4
      37.1
      0.256622
      1.3
      0.8
      2.6
      0.2
      2.6
      7.5
      217281.0
    
    
      4
      7228
      -189
      44.7
      15.3
      0.054921
      0.2
      0.1
      0.4
      0.0
      0.7
      1.5
      4266.0



In [267]:

    
y.head()









    Out[267]:






  
    
      
      slight_dem_False
      slight_dem_True
      slight_gop_False
      slight_gop_True
    
  
  
    
      0
      1.0
      0.0
      1.0
      0.0
    
    
      1
      1.0
      0.0
      1.0
      0.0
    
    
      2
      1.0
      0.0
      1.0
      0.0
    
    
      3
      1.0
      0.0
      0.0
      1.0
    
    
      4
      1.0
      0.0
      1.0
      0.0



In [268]:

    
knn.fit(X_train, y_train)









    Out[268]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [269]:

    
y_pred = knn.predict(X_test)



In [270]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.892871526379
0.901771336554
[ 0.90342052  0.90140845  0.8832998   0.875       0.90120968]
             precision    recall  f1-score   support

          0       0.95      1.00      0.98       591
          1       0.00      0.00      0.00        30
          2       0.95      1.00      0.97       590
          3       0.00      0.00      0.00        31

avg / total       0.90      0.95      0.93      1242



In [ ]:

Now test for medium gop and medium dem.



In [215]:

    
#KNN for med_dem and med_gop
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[33,34,35,36]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)



In [216]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.826016915022
0.811594202899
[ 0.82293763  0.81891348  0.81488934  0.83870968  0.83467742]
             precision    recall  f1-score   support

          0       0.95      1.00      0.97       589
          1       0.00      0.00      0.00        32
          2       0.86      1.00      0.93       536
          3       0.00      0.00      0.00        85

avg / total       0.82      0.91      0.86      1242

Now test for strong gop and strong dem.



In [ ]:

    
#KNN for strong dem and stronggop
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[37,38,39,40]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)



In [218]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.627064035441
0.631239935588
[ 0.56740443  0.64788732  0.64788732  0.64717742  0.60685484]
             precision    recall  f1-score   support

          0       0.95      1.00      0.98       593
          1       0.00      0.00      0.00        28
          2       0.68      1.00      0.81       420
          3       0.00      0.00      0.00       201

avg / total       0.68      0.82      0.74      1242

Swing States Classifiers



In [256]:

    
#First slight dem and slight gop
X = cs_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = cs_modeling[[29,30,31,32]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
knn.fit(X_train, y_train)









    Out[256]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [257]:

    
y_pred = knn.predict(X_test)



In [258]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.856332703214
0.87969924812
[ 0.85849057  0.87735849  0.83962264  0.86792453  0.83809524]
             precision    recall  f1-score   support

          0       0.93      1.00      0.96       124
          1       0.00      0.00      0.00         9
          2       0.95      1.00      0.97       126
          3       0.00      0.00      0.00         7

avg / total       0.88      0.94      0.91       266

Medium Dem and GOP



In [259]:

    
#KNN for med_dem and med_gop
X = cs_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = cs_modeling[[33,34,35,36]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)



In [260]:

    
knn.fit(X_train,y_train)









    Out[260]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [261]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.744801512287
0.781954887218
[ 0.80188679  0.69811321  0.74528302  0.73584906  0.74285714]
             precision    recall  f1-score   support

          0       0.98      1.00      0.99       130
          1       0.00      0.00      0.00         3
          2       0.80      1.00      0.89       107
          3       0.00      0.00      0.00        26

avg / total       0.80      0.89      0.84       266

Strong Dem and GOP



In [262]:

    
X = cs_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = cs_modeling[[37,38,39,40]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = knn.predict(X_test)
knn.fit(X_train,y_train)









    Out[262]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=55, p=2,
           weights='uniform')



In [263]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.576559546314
0.428571428571
[ 0.47169811  0.59433962  0.5754717   0.56603774  0.59047619]
             precision    recall  f1-score   support

          0       0.93      1.00      0.96       124
          1       0.00      0.00      0.00         9
          2       0.50      1.00      0.66        66
          3       0.00      0.00      0.00        67

avg / total       0.56      0.71      0.61       266

Modeling for the "strong" counties of 25-50% is not that predictive.



In [178]:

    
## Random Forests

RFC for slight dem and slight gop



In [224]:

    
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[29,30,31,32]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [225]:

    
rfc.fit(X_train, y_train)









    Out[225]:





RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=5, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)



In [226]:

    
y_pred = rfc.predict(X_test)



In [227]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.892871526379
0.901771336554
[ 0.90342052  0.90140845  0.8832998   0.875       0.90120968]
             precision    recall  f1-score   support

          0       0.95      1.00      0.98       591
          1       0.00      0.00      0.00        30
          2       0.95      1.00      0.97       590
          3       0.00      0.00      0.00        31

avg / total       0.90      0.95      0.93      1242

RFC for medium dem and medium gop



In [228]:

    
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[33,34,35,36]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [229]:

    
rfc.fit(X_train, y_train)









    Out[229]:





RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=5, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)



In [230]:

    
y_pred = rfc.predict(X_test)



In [231]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.826016915022
0.80998389694
[ 0.82293763  0.81891348  0.81488934  0.83870968  0.83467742]
             precision    recall  f1-score   support

          0       0.95      1.00      0.97       589
          1       0.00      0.00      0.00        32
          2       0.86      1.00      0.93       536
          3       1.00      0.01      0.02        85

avg / total       0.89      0.90      0.86      1242

RFC for strong dem and strong gop



In [232]:

    
X = c_modeling[[0,1,2,3,4,5,6,7,8,9,10,11]]
y = c_modeling[[37,38,39,40]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)



In [233]:

    
rfc.fit(X_train, y_train)









    Out[233]:





RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=5, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)



In [234]:

    
y_pred = rfc.predict(X_test)



In [235]:

    
print knn.score(X_train,y_train)
print accuracy_score(y_test, y_pred)
print cross_val_score(knn, X_train, y_train, cv=5)
print(classification_report(y_test,y_pred))









    



0.627064035441
0.642512077295
[ 0.56740443  0.64788732  0.64788732  0.64717742  0.60685484]
             precision    recall  f1-score   support

          0       0.95      1.00      0.98       593
          1       0.00      0.00      0.00        28
          2       0.69      0.98      0.81       420
          3       0.64      0.08      0.14       201

avg / total       0.79      0.82      0.76      1242



In [191]:

    
# Just like in KNN, not the best classifier for "strong counties."



In [192]:

    
## Problem statement: What are the economic and demographic factors we can use to predict
## whether a county votes Democrat or Republican? More specifically, how do these factors 
## affect the margin of a Democrat or Republican winning the vote in a swing state county?
## Furthermore, are the parties becoming racial identity parties--how much does the data 
## convey this?



In [193]:

    
## Potential questions down the line:
## Look closely at the election week's coverage and how to build off that
## How many misleading data driven stories have their been? Atlantic--said most predicitive 
## question was whether Obama was born here (bunch of false positives)==precision vs recall 
## problem. Look at HOW METRICS HAVE BEEN ABUSED. 

## DEBUNK these stories.

	Location	Diversity-Index	Black or African American alone, percent, 2013	American Indian and Alaska Native alone, percent, 2013	Asian alone, percent, 2013	Native Hawaiian and Other Pacific Islander alone, percent,	Two or More Races, percent, 2013	Hispanic or Latino, percent, 2013	White alone, not Hispanic or Latino, percent, 2013
0	Aleutians West Census Area, AK	0.769346	7.4	13.8	31.1	2.3	4.8	14.6	29.2
1	Queens County, NY	0.742224	20.9	1.3	25.2	0.2	2.7	28.0	26.7
2	Maui County, HI	0.740757	0.8	0.6	28.8	10.6	23.3	10.7	31.5
3	Alameda County, CA	0.740399	12.4	1.2	28.2	1.0	5.2	22.7	33.2
4	Aleutians East Borough, AK	0.738867	7.7	21.8	41.4	0.7	3.7	13.5	12.9

	Unnamed: 0	votes_dem	votes_gop	total_votes	per_dem	per_gop	diff	per_point_diff	state_abbr	county_name	combined_fips
0	0	93003.0	130413.0	246588.0	0.377159	0.52887	37,410	15.17%	AK	Alaska	2013
1	1	93003.0	130413.0	246588.0	0.377159	0.52887	37,410	15.17%	AK	Alaska	2016
2	2	93003.0	130413.0	246588.0	0.377159	0.52887	37,410	15.17%	AK	Alaska	2020
3	3	93003.0	130413.0	246588.0	0.377159	0.52887	37,410	15.17%	AK	Alaska	2050
4	4	93003.0	130413.0	246588.0	0.377159	0.52887	37,410	15.17%	AK	Alaska	2060

	Unnamed: 0	votes_dem	votes_gop	total_votes	per_dem	per_gop	diff	per_point_diff	state_abbr	county_name	combined_fips
29	29	5908.0	18110.0	24661.0	0.239569	0.734358	12,202	49.48%	AL	Autauga County	1001
30	30	18409.0	72780.0	94090.0	0.195653	0.773515	54,371	57.79%	AL	Baldwin County	1003
31	31	4848.0	5431.0	10390.0	0.466603	0.522714	583	5.61%	AL	Barbour County	1005
32	32	1874.0	6733.0	8748.0	0.214220	0.769662	4,859	55.54%	AL	Bibb County	1007
33	33	2150.0	22808.0	25384.0	0.084699	0.898519	20,658	81.38%	AL	Blount County	1009

	state	county	est_pop_2015	pop_change_2015	int_mig_2015	dom_mig_2015	mig_2015
0	AL	Alabama	4858979	12568	5726	-2268	3458
1	AL	Autauga County	55347	57	19	-140	-121
2	AL	Baldwin County	203709	3996	221	3469	3690
3	AL	Barbour County	26489	-326	0	-281	-281
4	AL	Bibb County	22583	34	21	4	25

	FIPS Code	State	Area name	less_hs_diploma_2000	hs_diploma_only_2000	less_4_years_2000	four_or_ higher_2000	per_less_high_school diploma_2000	per_hs_diploma_only_2000	per_less_4_years_2000	per_four_or_ higher_2000	less_high_school_diploma_2011_15	hs_diploma_only_2011_15	less_4_years_2011_15	four_or_ higher_2011_15	per_less_high_school_diploma_2011_15	per_hs_diploma_only_2011_15	per_less_4_years_2011_15	per_four_or_higher_2011_15
0	0	US	United States	35715625.0	52168981.0	49864428.0	44462605.0	19.6	28.6	27.4	24.4	28229094.0	58722528.0	61558628.0	62952272.0	13.3	27.8	29.1	29.8
1	1000	AL	Alabama	714081.0	877216.0	746495.0	549608.0	24.7	30.4	25.9	19.0	509891.0	1005295.0	962515.0	761650.0	15.7	31.0	29.7	23.5
2	1001	AL	Autauga County	5872.0	9332.0	7413.0	4972.0	21.3	33.8	26.9	18.0	4656.0	12182.0	11044.0	8437.0	12.8	33.5	30.4	23.2
3	1003	AL	Baldwin County	17258.0	28428.0	28178.0	22146.0	18.0	29.6	29.3	23.1	14360.0	39431.0	43500.0	39710.0	10.5	28.8	31.8	29.0
4	1005	AL	Barbour County	6679.0	6124.0	4025.0	2068.0	35.3	32.4	21.3	10.9	5021.0	6490.0	4943.0	2354.0	26.7	34.5	26.3	12.5

	state_abbr	county_name	total_votes_2012	votes_dem_2012	votes_gop_2012	county_fips	state_fips	per_dem_2012	per_gop_2012	diff_2012	per_point_diff_2012	county_state	election_range
0	AL	Autauga County	23909	6354	17366	1	1	26.575766	72.633736	11012	-0.460580	Autauga County, AL	-46.057970
1	AL	Baldwin County	84988	18329	65772	3	1	21.566574	77.389749	47443	-0.558232	Baldwin County, AL	-55.823175
2	AL	Barbour County	11459	5873	5539	5	1	51.252291	48.337551	334	0.029147	Barbour County, AL	2.914739
3	AL	Bibb County	8391	2200	6131	7	1	26.218567	73.066381	3931	-0.468478	Bibb County, AL	-46.847813
4	AL	Blount County	23980	2961	20741	9	1	12.347790	86.492911	17780	-0.741451	Blount County, AL	-74.145121

	votes_dem	votes_gop	total_votes	per_dem	per_gop	per_point_diff	election_range
count	3.112000e+03	3112.000000	3.112000e+03	3112.000000	3112.000000	3112.000000	3112.000000
mean	2.006065e+04	19622.378856	4.174537e+04	31.708228	63.613409	39.233014	-31.905181
std	7.199807e+04	40442.737492	1.134048e+05	15.358601	15.651728	20.793041	30.883786
min	4.000000e+00	57.000000	6.400000e+01	3.144654	4.122067	0.040000	-91.636364
25%	1.166000e+03	3206.000000	4.820500e+03	20.475924	54.947846	22.467500	-54.689887
50%	3.153000e+03	7164.500000	1.094700e+04	28.473862	66.743096	40.315000	-38.217390
75%	9.608500e+03	17448.250000	2.879650e+04	39.999326	75.147062	55.462500	-14.876874
max	1.893770e+06	620285.000000	2.652072e+06	92.846592	95.272727	91.640000	88.724525

	county_state	state	county	est_pop_2015	pop_change_2015	int_mig_2015	dom_mig_2015	mig_2015	level_0	div_index	...	per_point_diff	state_abbr	county_name	election_range	slight_dem	slight_gop	med_dem	med_gop	strong_dem	strong_gop
0	Abbeville County, SC	SC	Abbeville County	24932	6	22	-12	10	4.0	0.445417	...	28.25	SC	Abbeville County	-28.254383	False	False	False	False	False	True
1	Acadia Parish, LA	LA	Acadia Parish	62577	79	32	-281	-249	5.0	0.355956	...	56.67	LA	Acadia Parish	-56.674943	False	False	False	False	False	False
2	Accomack County, VA	VA	Accomack County	32973	-25	81	-53	28	6.0	0.539878	...	11.71	VA	Accomack County	-11.710568	False	False	False	True	False	False
3	Ada County, ID	ID	Ada County	434211	7364	933	3838	4771	7.0	0.256622	...	9.24	ID	Ada County	-9.239878	False	True	False	False	False	False
4	Adair County, IA	IA	Adair County	7228	-189	0	-161	-161	8.0	0.054921	...	35.36	IA	Adair County	-35.355148	False	False	False	False	False	True

	est_pop_2015	pop_change_2015	per_hs_diploma_only_2011_15	per_four_or_higher_2011_15	div_index	af_am	native_2013	asian_am	pac_am	two_or_more_races	...	per_gop	diff	per_point_diff	election_range	slight_dem	slight_gop	med_dem	med_gop	strong_dem	strong_gop
0	24932	6	37.5	12.3	0.445417	28.2	0.3	0.4	0.0	1.3	...	62.868333	3,030	28.25	-28.254383	False	False	False	False	False	True
1	62577	79	39.2	10.5	0.355956	18.3	0.3	0.4	0.0	1.3	...	77.262105	15,521	56.67	-56.674943	False	False	False	False	False	False
2	32973	-25	39.9	18.8	0.539878	28.0	0.6	0.6	0.2	1.5	...	54.471596	1,845	11.71	-11.710568	False	False	False	True	False	False
3	434211	7364	21.4	37.1	0.256622	1.3	0.8	2.6	0.2	2.6	...	47.931611	18,072	9.24	-9.239878	False	True	False	False	False	False
4	7228	-189	44.7	15.3	0.054921	0.2	0.1	0.4	0.0	0.7	...	65.336526	1,329	35.36	-35.355148	False	False	False	False	False	True

	slight_dem_False	slight_gop_False	slight_gop_True
0	1.0	1.0	0.0
1	1.0	1.0	0.0
2	1.0	1.0	0.0
3	1.0	0.0	1.0
4	1.0	1.0	0.0

	slight_dem_False	slight_gop_False	slight_gop_True
0	1.0	1.0	0.0
1	1.0	1.0	0.0
2	1.0	1.0	0.0
3	1.0	0.0	1.0
4	1.0	1.0	0.0

	slight_dem_False	slight_gop_False	slight_gop_True
0	1.0	1.0	0.0
1	1.0	1.0	0.0
2	1.0	1.0	0.0
3	1.0	0.0	1.0
4	1.0	1.0	0.0