In [1]:
import pandas as pd
import matplotlib as mpl
import datetime as dt
%matplotlib inline


/Users/johntemplon/.virtualenvs/data-analysis/lib/python2.7/site-packages/pandas/io/excel.py:626: UserWarning: Installed openpyxl is not supported at this time. Use >=1.6.1 and <2.0.0.
  .format(openpyxl_compat.start_ver, openpyxl_compat.stop_ver))

Load Data:

The data is split across two CSVs. Load both, clean up the header names, and concatenate them into a single data frame:


In [2]:
f1 = "../data/FTC_2011-43014.csv"
f2 = "../data/FTC_5114-71014.csv"

In [3]:
column_names=["Reference Number", "Created Date", "Consumer Address, City", "Consumer Address, State Code", "Company Name", 
              "Company Phone Area Code", "Company Phone, Number", "Company Phone, Extension", 
              "Complaint Info Initial Contact Method", "Complaint Info Product Service Code"]

In [4]:
df_1 = pd.io.parsers.read_csv(f1, names=column_names, index_col=0, skiprows=1)
df_2 = pd.io.parsers.read_csv(f2, names=column_names, index_col=0, skiprows=1)

In [5]:
df = pd.concat([df_1, df_2])

In [6]:
df.head()


Out[6]:
Created Date Consumer Address, City Consumer Address, State Code Company Name Company Phone Area Code Company Phone, Number Company Phone, Extension Complaint Info Initial Contact Method Complaint Info Product Service Code
Reference Number
28741989 01/04/2011 Waianae HI IRS NaN NaN NaN Phone 2143|9043
28803726 01/10/2011 Marietta GA Georgia State IRS 770 9448020 NaN Phone 9043
28827558 01/12/2011 Fort Worth TX IRS 872 4796242 NaN Phone 2142|9043
28842181 01/13/2011 Orangeburg SC IRS 800 8297650 NaN Phone 9043
28890200 01/19/2011 Morton MN IRS|Readers Digest 876| 4355471| | Phone 4190|9043

Clean Up Created Dates and State Names:

Convert the created dates to datetime objects and clean up the state names. A few of the columns sometimes contain | to delimit multiple options selected on a complaint; those delimiters need to be stripped. The state-cleaning function assumes that the first non-empty state listed is the correct one.

Converting to a datetime allows us to see the huge uptick in IRS scam complaints to the FTC over time.


In [7]:
def clean_state_names(state_name):
    """Return the first non-empty state code, dropping any | delimiters."""
    state_name = str(state_name)
    if "|" in state_name:
        pieces = state_name.split("|")
        if pieces[0] != "":
            state = pieces[0]
        elif pieces[1] != "":
            state = pieces[1]
        else:
            state = None
    else:
        state = state_name
    return state
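The take-the-first-non-empty-piece logic can be checked in isolation. A minimal sketch (the sample values below are illustrative, not taken from the data; unlike the notebook's version, this checks every piece rather than just the first two):

```python
def first_nonempty_state(value):
    # Split on the | delimiter and return the first non-empty piece, if any.
    for piece in str(value).split("|"):
        if piece:
            return piece
    return None

print(first_nonempty_state("HI|"))  # HI
print(first_nonempty_state("|TX"))  # TX
print(first_nonempty_state("GA"))   # GA
```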

In [8]:
df["clean_state"] = df["Consumer Address, State Code"].apply(lambda x: clean_state_names(x))

In [9]:
df["created_date"] = df["Created Date"].apply(lambda x: dt.datetime.strptime(str(x), "%m/%d/%Y") if not pd.isnull(x) else None)

In [10]:
df.dropna(subset=["created_date"]).set_index("created_date")["Company Name"].resample("M", how="count").plot()


Out[10]:
<matplotlib.axes.AxesSubplot at 0x1106c83d0>
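The how= argument to resample was removed in later pandas releases; an equivalent monthly count can be sketched with a groupby on monthly periods (the dates and values below are placeholders, not the actual complaint data):

```python
import pandas as pd

# A tiny stand-in series indexed by complaint date.
idx = pd.to_datetime(["2011-01-04", "2011-01-10", "2011-02-01"])
s = pd.Series(1, index=idx)

# Version-stable equivalent of resample("M", how="count"):
# bucket the index into monthly periods, then count per bucket.
monthly = s.groupby(s.index.to_period("M")).count()
print(monthly.tolist())  # [2, 1]
```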

Where Are The Complaints Coming From?

The majority of the scam complaints arrived by phone.

Phone complaints follow a similar pattern to the complaints overall.


In [12]:
df.dropna(subset=["created_date"])['Complaint Info Initial Contact Method'].value_counts()


Out[12]:
9043                       7930
Phone Call: Landline       5094
Phone                      1591
Phone Call: Mobile/Cell    1282
Internet/E-mail             207
Mail                         54
Mobile: Text/Email/IM        38
I Initiated Contact          32
Internet Web Site            24
Unknown                      24
9043|9001                    11
Internet (Other)             11
In Person                    10
9001|9043                     8
9043|0350                     7
0350|9043                     5
9000|9043                     5
0323|9043                     4
Fax                           3
6100|9043                     3
Wireless                      3
9043|9046                     2
9043|2142                     2
9043|0323                     1
TV/Radio                      1
9050|9043                     1
9043|6001                     1
9043|9000                     1
9043|6101                     1
Print                         1
dtype: int64

In [13]:
df["phone"] = df['Complaint Info Initial Contact Method'].apply(lambda x: 1 if ("Phone" in str(x)) else 0)

In [14]:
df.dropna(subset=["created_date"]).set_index("created_date")["phone"].resample("M", how="count").plot()


Out[14]:
<matplotlib.axes.AxesSubplot at 0x110850b50>

Finding the Change Over Time:

The trend in the graph is obviously a huge spike. July appears lower only because the data runs through July 10, 2014. Let's restrict to months with a full set of data and see just how big the change has been over time.


In [15]:
df["month"] = df["created_date"].apply(lambda x: x.month if not pd.isnull(x) else None)
df["year"] = df["created_date"].apply(lambda x: x.year if not pd.isnull(x) else None)

In [16]:
complaints_by_month = pd.DataFrame(df.groupby(["year", "month"])["created_date"].count())

In [17]:
first_six_months_2011 = complaints_by_month.ix[2011].ix[1:6]["created_date"].sum()
first_six_months_2012 = complaints_by_month.ix[2012].ix[1:6]["created_date"].sum()
first_six_months_2013 = complaints_by_month.ix[2013].ix[1:6]["created_date"].sum()
first_six_months_2014 = complaints_by_month.ix[2014].ix[1:6]["created_date"].sum()

In [18]:
print "Pct. Change 2012 vs. 2011: {pct:,.2f}%".format(pct=(first_six_months_2012-first_six_months_2011)/float(first_six_months_2011)*100)
print "Pct. Change 2013 vs. 2012: {pct:,.2f}%".format(pct=(first_six_months_2013-first_six_months_2012)/float(first_six_months_2012)*100)
print "Pct. Change 2014 vs. 2013: {pct:,.2f}%".format(pct=(first_six_months_2014-first_six_months_2013)/float(first_six_months_2013)*100)


Pct. Change 2012 vs. 2011: 5.63%
Pct. Change 2013 vs. 2012: 133.33%
Pct. Change 2014 vs. 2013: 3,529.14%
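The figures above are the standard percent-change formula applied to the first-half totals; as a minimal sketch (the counts here are placeholders, not the actual complaint totals):

```python
def pct_change(new, old):
    # Percent change from old to new.
    return (new - old) / float(old) * 100.0

print("{0:,.2f}%".format(pct_change(120, 100)))  # 20.00%
```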

Where Are These Phone Calls Coming From?


In [19]:
df.dropna(subset=["created_date"])['Company Phone Area Code'].value_counts()


Out[19]:
202    6113
415    1549
585     818
888     674
206     416
530     401
347     368
800     332
866     320
716     305
877     150
844     145
571     134
213     124
917     107
...
202|602|405    1
210|           1
312|           1
|||            1
316            1
251||          1
702|           1
253|202        1
607|202        1
614|516        1
610|732        1
951|325        1
206|206        1
385            1
404|702        1
Length: 503, dtype: int64

In [20]:
top_phone_numbers = pd.DataFrame(df.groupby(['Company Phone Area Code', 'Company Phone, Number'])\
                                 .count()\
                                 .sort("created_date", ascending=False))

In [25]:
top_phone_numbers[:25]


Out[25]:
Created Date Consumer Address, City Consumer Address, State Code Company Name Company Phone, Extension Complaint Info Initial Contact Method Complaint Info Product Service Code clean_state created_date phone month year
Company Phone Area Code Company Phone, Number
585 6331814 165 139 145 165 102 161 58 165 165 165 165 165
6331792 148 108 126 148 55 138 91 148 148 148 148 148
415 2514089 131 108 117 131 62 125 60 131 131 131 131 131
2511037 126 107 115 126 118 126 5 126 126 126 126 126
800 8291040 103 92 94 103 49 101 51 103 103 103 103 103
415 2510522 96 77 84 96 0 92 96 96 96 96 96 96
202 2413122 94 83 86 94 66 94 24 94 94 94 94 94
5069044 89 75 80 89 3 86 86 89 89 89 89 89
2412158 84 64 75 84 78 84 0 84 84 84 84 84
2411435 80 68 76 80 66 80 8 80 80 80 80 80
415 4848569 78 71 72 78 1 75 77 78 78 78 78 78
202 2411375 78 68 72 78 75 78 0 78 78 78 78 78
347 5597963 78 68 68 78 0 75 78 78 78 78 78 78
202 5069319 75 66 68 75 57 75 15 75 75 75 75 75
347 5590378 71 67 69 71 43 69 28 71 71 71 71 71
202 2411818 68 56 56 68 4 60 66 68 68 68 68 68
5068708 67 57 64 67 62 67 0 67 67 67 67 67
2807395 66 57 60 66 0 62 66 66 66 66 66 66
530 2385813 64 55 57 64 2 60 64 64 64 64 64 64
206 4144646 60 50 49 60 39 59 18 60 60 60 60 60
415 2519738 60 55 58 60 54 60 0 60 60 60 60 60
202 5068602 58 51 51 58 56 58 0 58 58 58 58 58
585 4446953 57 45 48 57 51 57 4 57 57 57 57 57
202 5069461 57 45 46 57 55 57 0 57 57 57 57 57
716 2063218 57 53 55 57 7 56 50 57 57 57 57 57

The majority of the phone calls come from the 202 area code, which covers Washington, DC. Digging further into just those phone numbers:


In [26]:
df_202 = df[df["Company Phone Area Code"] == "202"]

In [28]:
df_202.dropna(subset=["created_date"]).set_index("created_date")["phone"].resample("M", how="sum").plot()


Out[28]:
<matplotlib.axes.AxesSubplot at 0x110b2ded0>

In [29]:
complaints_from_202_by_month = pd.DataFrame(df_202.groupby(["year", "month"])\
                                            ["created_date"]\
                                            .count())

In [30]:
complaints_from_202_by_month.head()


Out[30]:
created_date
year month
2011 3 1
4 1
5 2
6 2
7 1

In [31]:
first_six_months_2014_from_202 = complaints_from_202_by_month.ix[2014].ix[1:6]["created_date"].sum()

In [35]:
print "The percentage of phone calls from 202 area during the first six months of 2014 was {0:,.2f}%".format(float(first_six_months_2014_from_202/float(first_six_months_2014)*100))
print "There was just {0} reported scam call from 202 area code in June".format(complaints_from_202_by_month.ix[2014].ix[6]["created_date"])


The percentage of phone calls from 202 area during the first six months of 2014 was 38.87%
There was just 1 reported scam call from 202 area code in June

Where did they go? Looking into the 202 numbers revealed that many of them were magicJack numbers. The company has been aggressively shutting those scam numbers down.


In [36]:
df.dropna(subset=["created_date"])['clean_state'].value_counts()


Out[36]:
CA     2471
nan    1901
NY     1189
FL      965
TX      942
NJ      663
PA      555
VA      545
IL      532
MA      503
GA      437
OH      414
AZ      413
WA      408
MD      360
MI      348
NC      339
IN      287
CO      280
OR      228
NV      223
MN      218
CT      207
WI      203
TN      191
MO      172
SC      145
AL      135
NM      130
KY      129
LA      114
HI      112
DE      102
UT       87
ID       77
WV       74
IA       72
AR       71
MS       70
KS       69
OK       66
DC       65
NH       63
ME       55
NE       53
RI       37
MT       30
WY       18
AK       18
VT       13
SD       12
ND        8
ON        6
PR        3
427       2
430       1
Length: 56, dtype: int64

In [37]:
df.dropna(subset=["created_date"])['Consumer Address, City'].value_counts()


Out[37]:
Las Vegas        130
Houston          112
New York         106
Brooklyn         103
San Jose         100
Los Angeles       95
Phoenix           86
San Diego         83
Chicago           75
San Francisco     71
Charlotte         66
Washington        66
Austin            65
Albuquerque       58
Denver            55
...
Plano|              1
Kettering           1
Radcliff            1
Bogart              1
West Valley City    1
Coon Rapids         1
CHICAGO             1
Sunland             1
Smyrna|             1
Nogales             1
Lauderhill          1
Micco               1
Flagstaff           1
Oakridge            1
Savage              1
Length: 5422, dtype: int64

Normalizing the 2014 Complaints by State Using 2013 Population:


In [38]:
df_2014 = df[df["created_date"] >= dt.datetime(year=2014, month=1, day=1)]
states_2014 = df_2014.dropna(subset=["created_date"])\
              .groupby(['clean_state'])

In [39]:
states_df = pd.DataFrame(states_2014.size().order(ascending=False))

In [40]:
states_df["state_abbrev"] = states_df.index

In [41]:
state_abbrev_to_name = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
        'IN': 'Indiana',
        'KS': 'Kansas',
        'KY': 'Kentucky',
        'LA': 'Louisiana',
        'MA': 'Massachusetts',
        'MD': 'Maryland',
        'ME': 'Maine',
        'MI': 'Michigan',
        'MN': 'Minnesota',
        'MO': 'Missouri',
        'MP': 'Northern Mariana Islands',
        'MS': 'Mississippi',
        'MT': 'Montana',
        'NA': 'National',
        'NC': 'North Carolina',
        'ND': 'North Dakota',
        'NE': 'Nebraska',
        'NH': 'New Hampshire',
        'NJ': 'New Jersey',
        'NM': 'New Mexico',
        'NV': 'Nevada',
        'NY': 'New York',
        'OH': 'Ohio',
        'OK': 'Oklahoma',
        'OR': 'Oregon',
        'PA': 'Pennsylvania',
        'PR': 'Puerto Rico',
        'RI': 'Rhode Island',
        'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VI': 'Virgin Islands',
        'VT': 'Vermont',
        'WA': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming',
}

In [42]:
import csv
state_pop_dict = {}
with open('../data/state_pop_2013.csv', 'rU') as f:
    no_heads = f.readlines()[1:]  # drop the header row
    reader = csv.reader(no_heads)
    for row in reader:
        # Strip stray leading \xca bytes (Mac-Roman non-breaking spaces)
        # from the state names and parse the comma-formatted populations.
        state_pop_dict[row[0].lstrip("\xca")] = int(row[1].replace(",", ""))

In [43]:
def find_state_pop(abbrev):
    try:
        full_name = state_abbrev_to_name[abbrev]
    except KeyError:
        full_name = None
    if full_name is not None:
        population = state_pop_dict[full_name]
    else:
        population = None
    return population

In [44]:
states_df["state_pop"] = states_df["state_abbrev"].apply(lambda x: find_state_pop(x))

In [45]:
states_df["complaints_per_100000"] = states_df.apply(lambda x: round(x[0] / float(x["state_pop"]) * 100000, 1), axis=1)

In [46]:
states_df.sort("complaints_per_100000", ascending=False)


Out[46]:
0 state_abbrev state_pop complaints_per_100000
clean_state
DE 83 DE 925749 9.0
DC 50 DC 646449 7.7
NV 197 NV 2790136 7.1
HI 97 HI 1404054 6.9
MA 424 MA 6692824 6.3
NJ 555 NJ 8899339 6.2
VA 469 VA 8260405 5.7
CA 2150 CA 38332521 5.6
AZ 353 AZ 6626624 5.3
NY 999 NY 19651127 5.1
NM 107 NM 2085287 5.1
WA 350 WA 6971406 5.0
OR 197 OR 3930065 5.0
CT 171 CT 3596080 4.8
MD 287 MD 5928814 4.8
CO 241 CO 5268367 4.6
NH 55 NH 1323459 4.2
FL 763 FL 19552860 3.9
ID 61 ID 1612136 3.8
GA 366 GA 9992167 3.7
IN 234 IN 6570902 3.6
IL 446 IL 12882135 3.5
PA 436 PA 12773801 3.4
ME 43 ME 1328302 3.2
MN 176 MN 5420380 3.2
RI 30 RI 1051511 2.9
TX 770 TX 26448193 2.9
MI 290 MI 9895622 2.9
WV 53 WV 1854304 2.9
NC 280 NC 9848060 2.8
OH 320 OH 11570808 2.8
WI 157 WI 5742713 2.7
MT 26 MT 1015165 2.6
MO 149 MO 6044171 2.5
WY 14 WY 582658 2.4
SC 106 SC 4774839 2.2
UT 63 UT 2900872 2.2
KY 97 KY 4395295 2.2
TN 143 TN 6495978 2.2
NE 40 NE 1868516 2.1
AL 97 AL 4833722 2.0
LA 88 LA 4625470 1.9
KS 51 KS 2893957 1.8
AK 13 AK 735132 1.8
IA 54 IA 3090416 1.7
MS 48 MS 2991207 1.6
AR 48 AR 2959373 1.6
VT 10 VT 626630 1.6
SD 11 SD 844877 1.3
OK 45 OK 3850568 1.2
ND 7 ND 723393 1.0
PR 2 PR 3615086 0.1
nan 1720 nan NaN NaN
ON 4 ON NaN NaN
427 1 427 NaN NaN

In [ ]: