A Jupyter notebook (Python 3) by Peter Donovan, info@soilcarboncoalition.org
Open data is not just a thing or a tool. It's a behavior, based on beliefs. This notebook is a way of sharing my methods and assumptions, and if you use the same or similar tools (R instead of Python, for example) you can retrace these steps. I hope it can also serve as a guide for me as well as for others who want to do similar things.
With crop insurance, as with any data set, looking at the data is a good way of learning about its particulars if not its intentions. Some knowledge of the context or domain of the data is usually required.
For background on federal crop insurance, the following may be a start:
Dennis Shields' 2015 report from the Congressional Research Service: https://fas.org/sgp/crs/misc/R40532.pdf
Environmental Working Group's material on crop insurance, which includes interactive maps showing rate of return (payouts compared to premiums) on some crops by county from 2001 through 2014: http://www.ewg.org/research/crop-insurance-lottery. The average federal subsidy for crop insurance premiums is about 60%.
The Natural Resources Defense Council has a 2013 paper on crop insurance, https://www.nrdc.org/sites/default/files/soil-matters-IP.pdf. This paper suggests that crop insurance could be reformed to reward low-risk farming that also provides environmental benefits.
A starting hypothesis: federally subsidized crop insurance, while it sustains the economic viability of many farm businesses, might also tend to replace soil health and function as the foundation of a viable agriculture.
To investigate the hypothesis, we'll start by compiling data.
First, we get data. Download and unzip the data files from the USDA Risk Management Agency website: http://www.rma.usda.gov/data/cause.html. The complete data for each year is under the "Summary of Business with Month of Loss" header. So far I am using the 2014 through 2016 data. You can get the column headers from the same web page as a Word or PDF document.
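If you prefer to fetch and extract an archive from within Python, here is a minimal sketch; the exact zip file name and URL path are assumptions, so copy the real link from the page above.
#hedged sketch: download and extract one year's archive
#the URL below is an assumption -- copy the actual link from the RMA page
import urllib.request
import zipfile

url = 'http://www.rma.usda.gov/data/col/colsom_2014.zip'  #assumed; verify on the site
local_zip = 'colsom_2014.zip'
urllib.request.urlretrieve(url, local_zip)
with zipfile.ZipFile(local_zip) as z:
    z.extractall('.')  #should yield a text file such as colsom14.txt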
In [1]:
#some usual imports, including some options for displaying large currency amounts with commas and only 2 decimals
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option('display.float_format', '{:,.2f}'.format)
pd.set_option('display.precision',2)
From http://www.rma.usda.gov/data/cause we see that years 2010 through 2016 are available as zip archives under Summary of Business. With a slower connection it is better to download and extract the zip archives outside of this notebook. Each archive contains a text file such as colsom14.txt, which will serve as the example for this notebook.
Unzip the file and inspect it with a text editor. Pipe characters separate the fields, sometimes with runs of spaces before or after them. There are no column headers; we'll add those next.
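You can also preview the first few raw lines in Python instead of a text editor (the path is where I extracted my copy; adjust it for yours):
#peek at the first few raw lines to confirm the pipe-delimited layout
with open('/Users/Peter/Documents/atlas/RMA/colsom14.txt') as f:
    for _ in range(3):
        print(f.readline().rstrip())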
In [2]:
df = pd.read_csv('/Users/Peter/Documents/atlas/RMA/colsom14.txt',sep='|',header=None)
df.shape #this counts rows, columns in our dataframe
Out[2]:
The column headers are supplied in a Word document (Record layout: Word) from the same web page. They differ for 2010-2014 and from 2015 forward. Format them as a Python list of strings as follows, and add them to the dataframe.
In [3]:
the_columns_2014 = ['Crop Year Identifier', 'State Code', 'State Abbreviation ',
'County Code', 'County Name', 'Crop Code', 'Crop Name',
'Insurance Plan Code', 'Insurance Plan Name Abbreviation',
'Coverage Category', 'Stage Code', 'Cause of Loss Code',
'Cause of Loss Description', 'Month of Loss', 'Month of Loss Name',
'Policies Earning Premium', 'Policies Indemnified', 'Net Planted Acres',
'Liability', 'Total Premium', 'Subsidy',
'Determined Acres', 'Indemnity Amount', 'Loss Ratio']
the_columns_15_16 = ['Crop Year Identifier', 'State Code', 'State Abbreviation ',
'County Code', 'County Name', 'Crop Code', 'Crop Name',
'Insurance Plan Code', 'Insurance Plan Name Abbreviation',
'Coverage Category', 'Stage Code', 'Cause of Loss Code',
'Cause of Loss Description', 'Month of Loss', 'Month of Loss Name',
'Policies Earning Premium', 'Policies Indemnified', 'Net Planted Acres',
'Net Endorsed Acres', 'Liability', 'Total Premium', 'Subsidy',
'Determined Acres', 'Indemnity Amount', 'Loss Ratio']
df.columns = the_columns_2014 #this adds our column headers
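If you load several years in one pass, one way to pick the right header list is to match on the number of columns in the file; the only layout difference is the extra 'Net Endorsed Acres' column from 2015 on. A sketch:
#choose the header list that matches the number of columns actually read
if df.shape[1] == len(the_columns_2014):
    df.columns = the_columns_2014
elif df.shape[1] == len(the_columns_15_16):
    df.columns = the_columns_15_16
else:
    raise ValueError('unexpected number of columns: {}'.format(df.shape[1]))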
There are spaces on either side of some of the fields. We can use str.strip() to remove them.
In [4]:
#strip excess white space from the string columns (numeric columns don't need it)
cols_w_spaces = ['County Name','Crop Name','Insurance Plan Name Abbreviation','Cause of Loss Description']
for item in cols_w_spaces:
    df[item] = df[item].str.strip()
#check the result
print(list(df.loc[1187]))
The state and county location codes are numeric (int64). FIPS (Federal Information Processing Standard) codes for counties are 5-digit strings, so we'll pad the codes with zeros using str.zfill(). This will come in handy for mapping, when we will want to merge or join our data with county boundaries using the FIPS code.
In [5]:
#convert to strings and pad with zeros: 2 digits for state, 3 for county
df['State Code'] = df['State Code'].astype(str).str.zfill(2)
df['County Code'] = df['County Code'].astype(str).str.zfill(3)
#add FIPS or id column and test
df['FIPS'] = df['State Code'] + df['County Code']
df['FIPS'][10] #to make sure we have a 5-digit string, not a number
Out[5]:
In [6]:
#group by county and sum the indemnities, largest first
counties = df.groupby(['FIPS','County Name'])
aggregated = counties.agg({'Indemnity Amount': np.sum})
aggregated.sort_values('Indemnity Amount',ascending=False)
Out[6]:
In [7]:
#convert both index levels (FIPS, County Name) back into ordinary columns
aggregated.reset_index(inplace=True)
In [8]:
#rename columns for convenience
aggregated.rename(columns={'County Name': 'name', 'FIPS': 'id', 'Indemnity Amount': 'indemnity'}, inplace=True)
#convert to $millions
aggregated['indemnity']=aggregated['indemnity']/1000000
#reorder columns and write to tab-separated tsv file for d3 mapping
aggregated = aggregated[['id','name','indemnity']]
aggregated.to_csv('/Users/Peter/Documents/atlas/RMA/indemnity2014.tsv', sep='\t', index=False)
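The FIPS string in the id column is what a map will join on. If you want to sanity-check that join in pandas first, here is a sketch with a tiny stand-in table in place of real county boundary data:
#hedged sketch: county_fips_table stands in for real county boundary/attribute data keyed by FIPS
county_fips_table = pd.DataFrame({'FIPS': ['01001', '01003'], 'county': ['Autauga', 'Baldwin']})
county_fips_table.merge(aggregated, left_on='FIPS', right_on='id', how='left')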
In [9]:
#total indemnities by cause of loss, largest first
df.groupby('Cause of Loss Description').agg({'Indemnity Amount':np.sum}).sort_values('Indemnity Amount',ascending=False)
Out[9]:
In [10]:
causes_2014 = df.groupby('Cause of Loss Description')['Indemnity Amount'].sum()
causes_2014.sort_values(ascending=False)
Out[10]:
In [11]:
#to generate a table of total indemnities by Cause of Loss, you can export a csv
causes_2014.to_csv('/Users/Peter/Documents/atlas/RMA/causes_2014.csv')
'Excess Moisture/Precip/Rain' and 'Drought' account for by far the largest indemnity totals. Let's filter the dataframe by these two causes, so we can see which counties had indemnities for both, and how much.
In [12]:
rain = df[df['Cause of Loss Description']=='Excess Moisture/Precip/Rain']
drought = df[df['Cause of Loss Description']=='Drought']
print(rain.shape, drought.shape)
Now group each dataframe by county, summing the indemnity amounts.
In [13]:
g_rain = rain.groupby(['FIPS','County Name']).agg({'Indemnity Amount':np.sum})
g_drought = drought.groupby(['FIPS','County Name']).agg({'Indemnity Amount':np.sum})
together=pd.concat([g_rain,g_drought],axis=1)
together.columns = ['moisture','drought']
together.head()
Out[13]:
Let's add two columns, a total, and a ratio of moisture to drought.
In [14]:
#counties with indemnities for only one of the two causes get NaN here, and drop out of the ratio filter below
together['total'] = together.moisture + together.drought
together['ratio'] = together.moisture / together.drought
together.head(20)
Out[14]:
In [15]:
#keep counties where neither cause dominates: moisture-to-drought ratio between 0.25 and 4
mixed = together[(together.ratio < 4) & (together.ratio > .25)]
mixed.shape
Out[15]:
In [16]:
#convert both index levels back into ordinary columns
mixed.reset_index(inplace=True)
In [17]:
mixed = mixed.rename(columns={'total':'indemnity'})
In [18]:
#convert to $millions, as before
mixed.indemnity = mixed.indemnity/1000000
In [19]:
mixed.to_csv('/Users/Peter/Documents/atlas/RMA/moisture_plus_drought_2014.tsv', sep='\t', index=False)
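The 2015 and 2016 archives follow the same steps with the longer column list. A sketch of reading all three years into one dataframe, assuming the extracted text files follow the colsomYY.txt naming pattern:
#hedged sketch: read 2014-2016 into one dataframe, matching each file to its header list
#2014 rows will carry NaN in 'Net Endorsed Acres', which the 2014 layout does not include
frames = []
for year, cols in [(14, the_columns_2014), (15, the_columns_15_16), (16, the_columns_15_16)]:
    d = pd.read_csv('/Users/Peter/Documents/atlas/RMA/colsom{}.txt'.format(year), sep='|', header=None)
    d.columns = cols
    frames.append(d)
all_years = pd.concat(frames, ignore_index=True)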