Finding similar states - clustering using scikit-learn and pandas

On this notebook we'll use pandas to load and process data from the US states, after that we'll use scikit-learn to cluster states based on census statistics.



In [2]:

    
#First import pandas and KMeans from scikit-learn
import pandas as pd
from sklearn.cluster import KMeans

#Configure the plotting library
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 5)



In [3]:

    
#Load county data - this also contains state-level data

#The first file contains the actual statistics, but the columns have codes
county_facts = pd.read_csv('../kaggle-data/county_facts.csv', index_col=['fips', 'area_name'])
#The second file translates from codes to the meaning of each column
county_facts_columns = pd.read_csv('../kaggle-data/county_facts_dictionary.csv')



In [4]:

    
#Take a look at the columns
county_facts_columns









    Out[4]:






  
    
      
      column_name
      description
    
  
  
    
      0
      PST045214
      Population, 2014 estimate
    
    
      1
      PST040210
      Population, 2010 (April 1) estimates base
    
    
      2
      PST120214
      Population, percent change - April 1, 2010 to ...
    
    
      3
      POP010210
      Population, 2010
    
    
      4
      AGE135214
      Persons under 5 years, percent, 2014
    
    
      5
      AGE295214
      Persons under 18 years, percent, 2014
    
    
      6
      AGE775214
      Persons 65 years and over, percent, 2014
    
    
      7
      SEX255214
      Female persons, percent, 2014
    
    
      8
      RHI125214
      White alone, percent, 2014
    
    
      9
      RHI225214
      Black or African American alone, percent, 2014
    
    
      10
      RHI325214
      American Indian and Alaska Native alone, perce...
    
    
      11
      RHI425214
      Asian alone, percent, 2014
    
    
      12
      RHI525214
      Native Hawaiian and Other Pacific Islander alo...
    
    
      13
      RHI625214
      Two or More Races, percent, 2014
    
    
      14
      RHI725214
      Hispanic or Latino, percent, 2014
    
    
      15
      RHI825214
      White alone, not Hispanic or Latino, percent, ...
    
    
      16
      POP715213
      Living in same house 1 year & over, percent, 2...
    
    
      17
      POP645213
      Foreign born persons, percent, 2009-2013
    
    
      18
      POP815213
      Language other than English spoken at home, pc...
    
    
      19
      EDU635213
      High school graduate or higher, percent of per...
    
    
      20
      EDU685213
      Bachelor's degree or higher, percent of person...
    
    
      21
      VET605213
      Veterans, 2009-2013
    
    
      22
      LFE305213
      Mean travel time to work (minutes), workers ag...
    
    
      23
      HSG010214
      Housing units, 2014
    
    
      24
      HSG445213
      Homeownership rate, 2009-2013
    
    
      25
      HSG096213
      Housing units in multi-unit structures, percen...
    
    
      26
      HSG495213
      Median value of owner-occupied housing units, ...
    
    
      27
      HSD410213
      Households, 2009-2013
    
    
      28
      HSD310213
      Persons per household, 2009-2013
    
    
      29
      INC910213
      Per capita money income in past 12 months (201...
    
    
      30
      INC110213
      Median household income, 2009-2013
    
    
      31
      PVY020213
      Persons below poverty level, percent, 2009-2013
    
    
      32
      BZA010213
      Private nonfarm establishments, 2013
    
    
      33
      BZA110213
      Private nonfarm employment,  2013
    
    
      34
      BZA115213
      Private nonfarm employment, percent change, 20...
    
    
      35
      NES010213
      Nonemployer establishments, 2013
    
    
      36
      SBO001207
      Total number of firms, 2007
    
    
      37
      SBO315207
      Black-owned firms, percent, 2007
    
    
      38
      SBO115207
      American Indian- and Alaska Native-owned firms...
    
    
      39
      SBO215207
      Asian-owned firms, percent, 2007
    
    
      40
      SBO515207
      Native Hawaiian- and Other Pacific Islander-ow...
    
    
      41
      SBO415207
      Hispanic-owned firms, percent, 2007
    
    
      42
      SBO015207
      Women-owned firms, percent, 2007
    
    
      43
      MAN450207
      Manufacturers shipments, 2007 ($1,000)
    
    
      44
      WTN220207
      Merchant wholesaler sales, 2007 ($1,000)
    
    
      45
      RTN130207
      Retail sales, 2007 ($1,000)
    
    
      46
      RTN131207
      Retail sales per capita, 2007
    
    
      47
      AFN120207
      Accommodation and food services sales, 2007 ($...
    
    
      48
      BPS030214
      Building permits, 2014
    
    
      49
      LND110210
      Land area in square miles, 2010
    
    
      50
      POP060210
      Population per square mile, 2010



In [5]:

    
#Let's take a look at some of the rows we have using head() pandas method
county_facts.head()









    Out[5]:






  
    
      
      
      state_abbreviation
      PST045214
      PST040210
      PST120214
      POP010210
      AGE135214
      AGE295214
      AGE775214
      SEX255214
      RHI125214
      ...
      SBO415207
      SBO015207
      MAN450207
      WTN220207
      RTN130207
      RTN131207
      AFN120207
      BPS030214
      LND110210
      POP060210
    
    
      fips
      area_name
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      0
      United States
      NaN
      318857056
      308758105
      3.3
      308745538
      6.2
      23.1
      14.5
      50.8
      77.4
      ...
      8.3
      28.8
      5319456312
      4174286516
      3917663456
      12990
      613795732
      1046363
      3531905.43
      87.4
    
    
      1000
      Alabama
      NaN
      4849377
      4780127
      1.4
      4779736
      6.1
      22.8
      15.3
      51.5
      69.7
      ...
      1.2
      28.1
      112858843
      52252752
      57344851
      12364
      6426342
      13369
      50645.33
      94.4
    
    
      1001
      Autauga County
      AL
      55395
      54571
      1.5
      54571
      6.0
      25.2
      13.8
      51.4
      77.9
      ...
      0.7
      31.7
      0
      0
      598175
      12003
      88157
      131
      594.44
      91.8
    
    
      1003
      Baldwin County
      AL
      200111
      182265
      9.8
      182265
      5.6
      22.2
      18.7
      51.2
      87.1
      ...
      1.3
      27.3
      1410273
      0
      2966489
      17166
      436955
      1384
      1589.78
      114.6
    
    
      1005
      Barbour County
      AL
      26887
      27457
      -2.1
      27457
      5.7
      21.2
      16.5
      46.6
      50.2
      ...
      0.0
      27.0
      0
      0
      188337
      6334
      0
      8
      884.88
      31.0
    
  

5 rows × 52 columns

Data processing

county_facts contains data at the county and state level. Since we only want state-level data, we need to filter the rows. As you can see from the previous cell, county-level rows contain the state_abbreviation, the rest of the rows contain null. To get only state-level data we'll filter by rows that have a null value in state_abbreviation column but leave the first row out since it contains data for the entire country.



In [6]:

    
#Subselect rows, let's just keep state level data
df = county_facts[county_facts.state_abbreviation.isnull()][1:]
df.head()









    Out[6]:






  
    
      
      
      state_abbreviation
      PST045214
      PST040210
      PST120214
      POP010210
      AGE135214
      AGE295214
      AGE775214
      SEX255214
      RHI125214
      ...
      SBO415207
      SBO015207
      MAN450207
      WTN220207
      RTN130207
      RTN131207
      AFN120207
      BPS030214
      LND110210
      POP060210
    
    
      fips
      area_name
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      1000
      Alabama
      NaN
      4849377
      4780127
      1.4
      4779736
      6.1
      22.8
      15.3
      51.5
      69.7
      ...
      1.2
      28.1
      112858843
      52252752
      57344851
      12364
      6426342
      13369
      50645.33
      94.4
    
    
      2000
      Alaska
      NaN
      736732
      710249
      3.7
      710231
      7.4
      25.3
      9.4
      47.4
      66.9
      ...
      0.0
      25.9
      8204030
      4563605
      9303387
      13635
      1851293
      1518
      570640.95
      1.2
    
    
      4000
      Arizona
      NaN
      6731484
      6392310
      5.3
      6392017
      6.4
      24.1
      15.9
      50.3
      83.7
      ...
      10.7
      28.1
      57977827
      57573459
      86758801
      13637
      13268514
      26997
      113594.08
      56.3
    
    
      5000
      Arkansas
      NaN
      2966369
      2915958
      1.7
      2915918
      6.5
      23.8
      15.7
      50.9
      79.7
      ...
      2.3
      24.5
      60735582
      29659789
      32974282
      11602
      3559795
      7666
      52035.48
      56.0
    
    
      6000
      California
      NaN
      38802500
      37254503
      4.2
      37253956
      6.5
      23.6
      12.9
      50.3
      73.2
      ...
      16.5
      30.3
      491372092
      598456486
      455032270
      12561
      80852787
      83645
      155779.22
      239.1
    
  

5 rows × 52 columns

While we can cluster using all columns, let's subset them for now. Feel free to modify this piece of code and experiment with different column combinations!



In [7]:

    
#Select columns that have to do with etnicity proportion
df = df.filter(regex='RHI*')

The column names in our data are coded, since we are interested in the meaning of each column we are using, we need to replace the code for the actual meaning. The following cell achieves that.



In [8]:

    
#Rename columns to use their meaning instead their codename
col_names = dict((k, v) for k,v in county_facts_columns.itertuples(index=False, name=None))
df.rename(columns=col_names, inplace=True)
df.head()









    Out[8]:






  
    
      
      
      White alone, percent, 2014
      Black or African American alone, percent, 2014
      American Indian and Alaska Native alone, percent, 2014
      Asian alone, percent, 2014
      Native Hawaiian and Other Pacific Islander alone, percent, 2014
      Two or More Races, percent, 2014
      Hispanic or Latino, percent, 2014
      White alone, not Hispanic or Latino, percent, 2014
    
    
      fips
      area_name
      
      
      
      
      
      
      
      
    
  
  
    
      1000
      Alabama
      69.7
      26.7
      0.7
      1.3
      0.1
      1.5
      4.1
      66.2
    
    
      2000
      Alaska
      66.9
      3.9
      14.8
      6.1
      1.3
      7.1
      6.8
      61.9
    
    
      4000
      Arizona
      83.7
      4.7
      5.3
      3.3
      0.3
      2.7
      30.5
      56.2
    
    
      5000
      Arkansas
      79.7
      15.6
      1.0
      1.5
      0.3
      1.9
      7.0
      73.4
    
    
      6000
      California
      73.2
      6.5
      1.7
      14.4
      0.5
      3.7
      38.6
      38.5

Now let's run a KMeans algorithm with 4 clusters, feel free to experiment with different clustering algorithms and different parameters (parameter change depending on the algorithm).

Running a clustering algorithm with scikit-learn



In [9]:

    
#Run a clustering algorithm, group in 4 clusters
model = KMeans(n_clusters=4)
results = model.fit_predict(df.values)



In [10]:

    
#Assign cluster number to our dataframe, this will help us identify which cluster was assigned to
#every state
df['cluster'] = results

Data visualization



In [11]:

    
#Count the number of states assigned to each cluster
df.cluster.value_counts().plot.bar()









    Out[11]:





<matplotlib.axes._subplots.AxesSubplot at 0x10f65c490>



In [12]:

    
#Let's take a look at how the algorithm clustered states
df.groupby(df.cluster).mean().transpose().plot.bar()









    Out[12]:





<matplotlib.axes._subplots.AxesSubplot at 0x10f756e10>

We can se that cluster 0 has a much higher proportion of hispanics, cluster 3 of asians and clister 1 african americanas.



In [13]:

    
#Let's see which states are in each cluster
df[df.cluster==0]









    Out[13]:






  
    
      
      
      White alone, percent, 2014
      Black or African American alone, percent, 2014
      American Indian and Alaska Native alone, percent, 2014
      Asian alone, percent, 2014
      Native Hawaiian and Other Pacific Islander alone, percent, 2014
      Two or More Races, percent, 2014
      Hispanic or Latino, percent, 2014
      White alone, not Hispanic or Latino, percent, 2014
      cluster
    
    
      fips
      area_name
      
      
      
      
      
      
      
      
      
    
  
  
    
      1000
      Alabama
      69.7
      26.7
      0.7
      1.3
      0.1
      1.5
      4.1
      66.2
      0
    
    
      2000
      Alaska
      66.9
      3.9
      14.8
      6.1
      1.3
      7.1
      6.8
      61.9
      0
    
    
      10000
      Delaware
      70.8
      22.2
      0.7
      3.8
      0.1
      2.5
      8.9
      63.7
      0
    
    
      11000
      District Of Columbia
      43.6
      49.0
      0.6
      4.0
      0.2
      2.6
      10.4
      35.8
      0
    
    
      13000
      Georgia
      62.1
      31.5
      0.5
      3.8
      0.1
      2.0
      9.3
      54.3
      0
    
    
      17000
      Illinois
      77.5
      14.7
      0.6
      5.3
      0.1
      1.8
      16.7
      62.3
      0
    
    
      22000
      Louisiana
      63.4
      32.5
      0.8
      1.8
      0.1
      1.5
      4.8
      59.3
      0
    
    
      24000
      Maryland
      60.1
      30.3
      0.6
      6.4
      0.1
      2.6
      9.3
      52.6
      0
    
    
      28000
      Mississippi
      59.7
      37.5
      0.6
      1.0
      0.1
      1.2
      3.0
      57.3
      0
    
    
      34000
      New Jersey
      73.0
      14.8
      0.6
      9.4
      0.1
      2.1
      19.3
      56.8
      0
    
    
      36000
      New York
      70.4
      17.6
      1.0
      8.5
      0.1
      2.4
      18.6
      56.5
      0
    
    
      37000
      North Carolina
      71.5
      22.1
      1.6
      2.7
      0.1
      2.1
      9.0
      64.1
      0
    
    
      45000
      South Carolina
      68.3
      27.8
      0.5
      1.5
      0.1
      1.7
      5.4
      63.9
      0
    
    
      51000
      Virginia
      70.5
      19.7
      0.5
      6.3
      0.1
      2.8
      8.9
      63.1
      0



In [14]:

    
df[df.cluster==3]









    Out[14]:






  
    
      
      
      White alone, percent, 2014
      Black or African American alone, percent, 2014
      American Indian and Alaska Native alone, percent, 2014
      Asian alone, percent, 2014
      Native Hawaiian and Other Pacific Islander alone, percent, 2014
      Two or More Races, percent, 2014
      Hispanic or Latino, percent, 2014
      White alone, not Hispanic or Latino, percent, 2014
      cluster
    
    
      fips
      area_name
      
      
      
      
      
      
      
      
      
    
  
  
    
      4000
      Arizona
      83.7
      4.7
      5.3
      3.3
      0.3
      2.7
      30.5
      56.2
      3
    
    
      6000
      California
      73.2
      6.5
      1.7
      14.4
      0.5
      3.7
      38.6
      38.5
      3
    
    
      12000
      Florida
      77.8
      16.8
      0.5
      2.8
      0.1
      2.0
      24.1
      55.8
      3
    
    
      32000
      Nevada
      76.2
      9.1
      1.6
      8.3
      0.7
      4.0
      27.8
      51.5
      3
    
    
      35000
      New Mexico
      82.8
      2.5
      10.4
      1.7
      0.2
      2.5
      47.7
      38.9
      3
    
    
      48000
      Texas
      80.0
      12.5
      1.0
      4.5
      0.1
      1.8
      38.6
      43.5
      3



In [15]:

    
df[df.cluster==2]









    Out[15]:






  
    
      
      
      White alone, percent, 2014
      Black or African American alone, percent, 2014
      American Indian and Alaska Native alone, percent, 2014
      Asian alone, percent, 2014
      Native Hawaiian and Other Pacific Islander alone, percent, 2014
      Two or More Races, percent, 2014
      Hispanic or Latino, percent, 2014
      White alone, not Hispanic or Latino, percent, 2014
      cluster
    
    
      fips
      area_name
      
      
      
      
      
      
      
      
      
    
  
  
    
      5000
      Arkansas
      79.7
      15.6
      1.0
      1.5
      0.3
      1.9
      7.0
      73.4
      2
    
    
      8000
      Colorado
      87.7
      4.5
      1.6
      3.1
      0.2
      2.9
      21.2
      69.0
      2
    
    
      9000
      Connecticut
      81.2
      11.5
      0.5
      4.5
      0.1
      2.2
      15.0
      68.8
      2
    
    
      16000
      Idaho
      93.5
      0.8
      1.7
      1.4
      0.2
      2.3
      12.0
      82.8
      2
    
    
      18000
      Indiana
      86.1
      9.6
      0.4
      2.0
      0.1
      1.9
      6.6
      80.3
      2
    
    
      19000
      Iowa
      92.1
      3.4
      0.5
      2.2
      0.1
      1.7
      5.6
      87.1
      2
    
    
      20000
      Kansas
      86.8
      6.3
      1.2
      2.8
      0.1
      2.8
      11.4
      76.8
      2
    
    
      21000
      Kentucky
      88.3
      8.2
      0.3
      1.4
      0.1
      1.8
      3.4
      85.4
      2
    
    
      23000
      Maine
      95.0
      1.4
      0.7
      1.2
      0.0
      1.6
      1.5
      93.8
      2
    
    
      25000
      Massachusetts
      82.6
      8.3
      0.5
      6.3
      0.1
      2.2
      10.8
      74.3
      2
    
    
      26000
      Michigan
      79.9
      14.2
      0.7
      2.9
      0.0
      2.3
      4.8
      75.8
      2
    
    
      27000
      Minnesota
      85.7
      5.9
      1.3
      4.7
      0.1
      2.3
      5.1
      81.4
      2
    
    
      29000
      Missouri
      83.5
      11.8
      0.5
      1.9
      0.1
      2.1
      4.0
      80.1
      2
    
    
      30000
      Montana
      89.4
      0.6
      6.6
      0.8
      0.1
      2.6
      3.5
      86.7
      2
    
    
      31000
      Nebraska
      89.4
      4.9
      1.4
      2.2
      0.1
      2.0
      10.2
      80.5
      2
    
    
      33000
      New Hampshire
      94.0
      1.5
      0.3
      2.5
      0.0
      1.6
      3.3
      91.3
      2
    
    
      38000
      North Dakota
      89.1
      2.1
      5.4
      1.3
      0.1
      2.0
      3.2
      86.6
      2
    
    
      39000
      Ohio
      83.0
      12.6
      0.3
      2.0
      0.1
      2.1
      3.5
      80.1
      2
    
    
      40000
      Oklahoma
      75.1
      7.7
      9.0
      2.1
      0.2
      5.9
      9.8
      67.0
      2
    
    
      41000
      Oregon
      87.9
      2.0
      1.8
      4.3
      0.4
      3.6
      12.5
      77.0
      2
    
    
      42000
      Pennsylvania
      82.9
      11.6
      0.3
      3.3
      0.1
      1.8
      6.6
      77.9
      2
    
    
      44000
      Rhode Island
      85.1
      7.7
      0.9
      3.5
      0.2
      2.6
      14.0
      74.5
      2
    
    
      46000
      South Dakota
      85.7
      1.9
      8.9
      1.3
      0.1
      2.2
      3.6
      83.0
      2
    
    
      47000
      Tennessee
      78.9
      17.1
      0.4
      1.7
      0.1
      1.7
      5.0
      74.6
      2
    
    
      49000
      Utah
      91.4
      1.3
      1.5
      2.4
      1.0
      2.4
      13.5
      79.3
      2
    
    
      50000
      Vermont
      95.0
      1.2
      0.4
      1.6
      0.0
      1.8
      1.8
      93.5
      2
    
    
      53000
      Washington
      80.7
      4.1
      1.9
      8.2
      0.7
      4.5
      12.2
      70.4
      2
    
    
      54000
      West Virginia
      93.7
      3.6
      0.2
      0.8
      0.0
      1.6
      1.5
      92.5
      2
    
    
      55000
      Wisconsin
      87.8
      6.6
      1.1
      2.6
      0.0
      1.8
      6.5
      82.2
      2
    
    
      56000
      Wyoming
      92.7
      1.6
      2.7
      1.0
      0.1
      2.0
      9.8
      84.1
      2

	column_name	description
0	PST045214	Population, 2014 estimate
1	PST040210	Population, 2010 (April 1) estimates base
2	PST120214	Population, percent change - April 1, 2010 to ...
3	POP010210	Population, 2010
4	AGE135214	Persons under 5 years, percent, 2014
5	AGE295214	Persons under 18 years, percent, 2014
6	AGE775214	Persons 65 years and over, percent, 2014
7	SEX255214	Female persons, percent, 2014
8	RHI125214	White alone, percent, 2014
9	RHI225214	Black or African American alone, percent, 2014
10	RHI325214	American Indian and Alaska Native alone, perce...
11	RHI425214	Asian alone, percent, 2014
12	RHI525214	Native Hawaiian and Other Pacific Islander alo...
13	RHI625214	Two or More Races, percent, 2014
14	RHI725214	Hispanic or Latino, percent, 2014
15	RHI825214	White alone, not Hispanic or Latino, percent, ...
16	POP715213	Living in same house 1 year & over, percent, 2...
17	POP645213	Foreign born persons, percent, 2009-2013
18	POP815213	Language other than English spoken at home, pc...
19	EDU635213	High school graduate or higher, percent of per...
20	EDU685213	Bachelor's degree or higher, percent of person...
21	VET605213	Veterans, 2009-2013
22	LFE305213	Mean travel time to work (minutes), workers ag...
23	HSG010214	Housing units, 2014
24	HSG445213	Homeownership rate, 2009-2013
25	HSG096213	Housing units in multi-unit structures, percen...
26	HSG495213	Median value of owner-occupied housing units, ...
27	HSD410213	Households, 2009-2013
28	HSD310213	Persons per household, 2009-2013
29	INC910213	Per capita money income in past 12 months (201...
30	INC110213	Median household income, 2009-2013
31	PVY020213	Persons below poverty level, percent, 2009-2013
32	BZA010213	Private nonfarm establishments, 2013
33	BZA110213	Private nonfarm employment, 2013
34	BZA115213	Private nonfarm employment, percent change, 20...
35	NES010213	Nonemployer establishments, 2013
36	SBO001207	Total number of firms, 2007
37	SBO315207	Black-owned firms, percent, 2007
38	SBO115207	American Indian- and Alaska Native-owned firms...
39	SBO215207	Asian-owned firms, percent, 2007
40	SBO515207	Native Hawaiian- and Other Pacific Islander-ow...
41	SBO415207	Hispanic-owned firms, percent, 2007
42	SBO015207	Women-owned firms, percent, 2007
43	MAN450207	Manufacturers shipments, 2007 ($1,000)
44	WTN220207	Merchant wholesaler sales, 2007 ($1,000)
45	RTN130207	Retail sales, 2007 ($1,000)
46	RTN131207	Retail sales per capita, 2007
47	AFN120207	Accommodation and food services sales, 2007 ($...
48	BPS030214	Building permits, 2014
49	LND110210	Land area in square miles, 2010
50	POP060210	Population per square mile, 2010

		state_abbreviation	PST045214	PST040210	PST120214	POP010210	AGE135214	AGE295214	AGE775214	SEX255214	RHI125214	...	SBO415207	SBO015207	MAN450207	WTN220207	RTN130207	RTN131207	AFN120207	BPS030214	LND110210	POP060210
fips	area_name
0	United States	NaN	318857056	308758105	3.3	308745538	6.2	23.1	14.5	50.8	77.4	...	8.3	28.8	5319456312	4174286516	3917663456	12990	613795732	1046363	3531905.43	87.4
1000	Alabama	NaN	4849377	4780127	1.4	4779736	6.1	22.8	15.3	51.5	69.7	...	1.2	28.1	112858843	52252752	57344851	12364	6426342	13369	50645.33	94.4
1001	Autauga County	AL	55395	54571	1.5	54571	6.0	25.2	13.8	51.4	77.9	...	0.7	31.7	0	0	598175	12003	88157	131	594.44	91.8
1003	Baldwin County	AL	200111	182265	9.8	182265	5.6	22.2	18.7	51.2	87.1	...	1.3	27.3	1410273	0	2966489	17166	436955	1384	1589.78	114.6
1005	Barbour County	AL	26887	27457	-2.1	27457	5.7	21.2	16.5	46.6	50.2	...	0.0	27.0	0	0	188337	6334	0	8	884.88	31.0

		White alone, percent, 2014	Black or African American alone, percent, 2014	American Indian and Alaska Native alone, percent, 2014	Asian alone, percent, 2014	Native Hawaiian and Other Pacific Islander alone, percent, 2014	Two or More Races, percent, 2014	Hispanic or Latino, percent, 2014	White alone, not Hispanic or Latino, percent, 2014	cluster
fips	area_name
4000	Arizona	83.7	4.7	5.3	3.3	0.3	2.7	30.5	56.2	3
6000	California	73.2	6.5	1.7	14.4	0.5	3.7	38.6	38.5	3
12000	Florida	77.8	16.8	0.5	2.8	0.1	2.0	24.1	55.8	3
32000	Nevada	76.2	9.1	1.6	8.3	0.7	4.0	27.8	51.5	3
35000	New Mexico	82.8	2.5	10.4	1.7	0.2	2.5	47.7	38.9	3
48000	Texas	80.0	12.5	1.0	4.5	0.1	1.8	38.6	43.5	3

		White alone, percent, 2014	Black or African American alone, percent, 2014	American Indian and Alaska Native alone, percent, 2014	Asian alone, percent, 2014	Native Hawaiian and Other Pacific Islander alone, percent, 2014	Two or More Races, percent, 2014	Hispanic or Latino, percent, 2014	White alone, not Hispanic or Latino, percent, 2014	cluster
fips	area_name
5000	Arkansas	79.7	15.6	1.0	1.5	0.3	1.9	7.0	73.4	2
8000	Colorado	87.7	4.5	1.6	3.1	0.2	2.9	21.2	69.0	2
9000	Connecticut	81.2	11.5	0.5	4.5	0.1	2.2	15.0	68.8	2
16000	Idaho	93.5	0.8	1.7	1.4	0.2	2.3	12.0	82.8	2
18000	Indiana	86.1	9.6	0.4	2.0	0.1	1.9	6.6	80.3	2
19000	Iowa	92.1	3.4	0.5	2.2	0.1	1.7	5.6	87.1	2
20000	Kansas	86.8	6.3	1.2	2.8	0.1	2.8	11.4	76.8	2
21000	Kentucky	88.3	8.2	0.3	1.4	0.1	1.8	3.4	85.4	2
23000	Maine	95.0	1.4	0.7	1.2	0.0	1.6	1.5	93.8	2
25000	Massachusetts	82.6	8.3	0.5	6.3	0.1	2.2	10.8	74.3	2
26000	Michigan	79.9	14.2	0.7	2.9	0.0	2.3	4.8	75.8	2
27000	Minnesota	85.7	5.9	1.3	4.7	0.1	2.3	5.1	81.4	2
29000	Missouri	83.5	11.8	0.5	1.9	0.1	2.1	4.0	80.1	2
30000	Montana	89.4	0.6	6.6	0.8	0.1	2.6	3.5	86.7	2
31000	Nebraska	89.4	4.9	1.4	2.2	0.1	2.0	10.2	80.5	2
33000	New Hampshire	94.0	1.5	0.3	2.5	0.0	1.6	3.3	91.3	2
38000	North Dakota	89.1	2.1	5.4	1.3	0.1	2.0	3.2	86.6	2
39000	Ohio	83.0	12.6	0.3	2.0	0.1	2.1	3.5	80.1	2
40000	Oklahoma	75.1	7.7	9.0	2.1	0.2	5.9	9.8	67.0	2
41000	Oregon	87.9	2.0	1.8	4.3	0.4	3.6	12.5	77.0	2
42000	Pennsylvania	82.9	11.6	0.3	3.3	0.1	1.8	6.6	77.9	2
44000	Rhode Island	85.1	7.7	0.9	3.5	0.2	2.6	14.0	74.5	2
46000	South Dakota	85.7	1.9	8.9	1.3	0.1	2.2	3.6	83.0	2
47000	Tennessee	78.9	17.1	0.4	1.7	0.1	1.7	5.0	74.6	2
49000	Utah	91.4	1.3	1.5	2.4	1.0	2.4	13.5	79.3	2
50000	Vermont	95.0	1.2	0.4	1.6	0.0	1.8	1.8	93.5	2
53000	Washington	80.7	4.1	1.9	8.2	0.7	4.5	12.2	70.4	2
54000	West Virginia	93.7	3.6	0.2	0.8	0.0	1.6	1.5	92.5	2
55000	Wisconsin	87.8	6.6	1.1	2.6	0.0	1.8	6.5	82.2	2
56000	Wyoming	92.7	1.6	2.7	1.0	0.1	2.0	9.8	84.1	2