In [1]:
import numpy as np
import pandas as pd

Why?

From 1993 - 2010, the U.S. Census Bureau published the Consolidated Federal Funds Report, a dataset that contained total federal spending and how money flowed to each state, county, and congressional district.

When they stopped, the country was left with no information about the federal dollars in the states.

Federal Dollars in States

  1. Money directly to individuals
    • Social Security
    • Medicare
    • Food Stamps (SNAP)
    • etc.
  2. Money to state and local governments
    • Transportation
    • General welfare programs (TANF, WIC)
    • Medicaid
  3. Federal contracts performed in a state
  4. Salaries/benefits of a state's federal workers

1. Money to Individuals

source: Bureau of Economic Analysis, Annual State Personal Income and Employment, Personal Current Transfer Receipts (SA35)
Totals include the following LineCodes from the 2000 Current transfer receipts of individuals from governments series:

(list the codes here)


In [2]:
#money to individuals
indv_cols = ['year', 'state_id', 'federal', 'total']
indv = pd.read_csv(
    'data/individualreceiptgovtstate.txt',
    usecols=indv_cols,
    index_col=['year','state_id']
    )
indv = indv.rename(columns = {
    'total' : 'total_indv_receipt',
    'federal' : 'fed_indv_receipt'
})
indv.head()


Out[2]:
total_indv_receipt fed_indv_receipt
year state_id
2002 1 19682018000 15313883000
2 3189286000 2257451000
3 20486993000 15221625000
4 12298894000 9380832000
5 139072970000 93878242000

2. Money to State and Local Governments

source: Census State and Local Government Finances
state revenue totals = general revenue (i.e., exclude liquor, utility, and insurance trust revenues)


In [3]:
#money to state and local governments
govt_cols = ['year', 'state_id', 'level', 'general_rev', 'fed_rev']
govt = pd.read_csv(
    'data/revenuegeneralstate.txt',
    usecols = govt_cols,
    index_col=['year','state_id']
)
#we only want the subset of data where level = 1 (combination of state and local governments)
govt = govt[govt['level']==1][['general_rev', 'fed_rev']]
govt = govt.rename(columns = {
    'general_rev' : 'total_rev'
})
govt.head()


Out[3]:
total_rev fed_rev
year state_id
2002 1 23113713000 6263380000
2 7208865000 1789266000
3 25523515000 5718580000
4 13209552000 3645274000
5 226639222000 48249715000

3. Federal Contracts

source: USASpending.gov
primary awards (no sub-contracts) by place of performance state


In [4]:
#federal contracts
contract_cols = ['year', 'state_id', 'total_amount', 'dod_amount']
contracts = pd.read_csv(
    'data/federalcontractstate.txt',
    usecols = contract_cols,
    index_col=['year','state_id']
)
contracts = contracts.rename(columns= {
    'total_amount' : 'fed_contracts',
    'dod_amount' : 'dod_contracts'
})
contracts.head()


Out[4]:
fed_contracts dod_contracts
year state_id
2011 1 1.138446e+10 9.759450e+09
2012 1 1.073141e+10 9.095490e+09
2009 1 1.109927e+10 9.531534e+09
2008 1 1.000130e+10 8.548404e+09
2010 1 9.966980e+09 8.157725e+09

4. Federal Compensation

source: Bureau of Economic Analysis
Annual State Personal Income and Employment, 2002-2012
Line 2001 = Federal civilian employees
Line 2002 = Military employees


In [5]:
#federal employee salaries/benefits
emp_cols = ['year', 'state_id', 'emp_fed_total', 'comp_fed_total', 'emp_total', 'comp_total']
emp = pd.read_csv(
    'data/federaljobstate.txt',
    usecols = emp_cols,
    index_col=['year','state_id']
)
emp = emp.rename(columns = {
    'emp_total' : 'total_emp',
    'comp_total' : 'total_comp',
    'emp_fed_total' : 'fed_emp',
    'comp_fed_total' : 'fed_comp'
})
emp.head()


Out[5]:
total_emp total_comp fed_emp fed_comp
year state_id
2002 62 136364000 6135061000000 4911000 276494000000
2003 62 136072000 6356044000000 4933000 304410000000
2004 62 137601000 6730345000000 4880000 323937000000
2005 62 139559000 7078761000000 4816000 343384000000
2006 62 141908000 7491672000000 4825000 355886000000

Create the grand totals

federal receipts to individuals + federal money to state/local governments + contract $ + federal compensation


In [6]:
#stick everything together
all = (indv 
    .join(govt)
    .join(contracts)
    .join(emp))
all.reset_index(inplace=True)
#get rid of years < 2004
#all = all[all['year']>2003]
all.head()


Out[6]:
year state_id total_indv_receipt fed_indv_receipt total_rev fed_rev fed_contracts dod_contracts total_emp total_comp fed_emp fed_comp
0 2002 1 19682018000 15313883000 2.311371e+10 6263380000 NaN NaN 1973403 74291247000 88400 4907355000
1 2002 2 3189286000 2257451000 7.208865e+09 1789266000 NaN NaN 319714 15774610000 39647 2275956000
2 2002 3 20486993000 15221625000 2.552352e+10 5718580000 NaN NaN 2356727 97906942000 82234 4530946000
3 2002 4 12298894000 9380832000 1.320955e+10 3645274000 NaN NaN 1203665 41601200000 39789 1831098000
4 2002 5 139072970000 93878242000 2.266392e+11 48249715000 NaN NaN 15642910 794904524000 483628 27210827000

In [7]:
#total federal dollars in state = 
# federal receipts to individuals + 
# federal money to state/local governments + 
# contract $ + 
# federal compensation

all['total'] = (
    all['total_indv_receipt'] + 
    all['fed_rev'] + 
    all['fed_contracts'] +
    all['fed_comp']
)

In [8]:
#merge in state names
states = pd.read_csv(
    'data/state.txt')
all = pd.merge(all, states[['id', 'state_ansi', 'state_abbr', 'state_name']], left_on='state_id', right_on='id')

Inflation-Adjust

Adjust totals to FY 2012 dollars.
Deflator source, Office of Management and Budget:
10.1--Gross Domestic Product and Deflators Used in the Historical Tables: 1940-2019 from the FY 2015 President's Budget Request.


In [9]:
#grab the deflators published by OMB as part of the FY 2015 budget request
gdp = pd.read_excel('data/hist10z1.xls', 0, parse_cols=[0,1,2])
gdp.columns = ['year', 'gdp', 'gdp_price_index']
def clean_year(x):
    try:
        return int(x[:4])
    except:
        return np.nan
gdp['year'] = gdp['year'].map(lambda x: clean_year(x))
gdp = gdp.dropna(subset=['year'])
gdp['year'] = gdp['year'].astype(np.int64)
#adjust deflators to base year 2012
denom = gdp[gdp['year']==2012]['gdp_price_index'].values[0]
gdp['gdp_price_index'] = gdp['gdp_price_index']/denom

In [10]:
#merge deflators into main data and inflation-adjust
all = pd.merge(all, gdp[['year','gdp_price_index']], left_on='year', right_on='year')
all['total_adj'] = all['total']/all['gdp_price_index']

Per-Capita

Get population estimates for per-capita numbers
source: Census Population Estimates, State Characteristics, Vintage 2013


In [11]:
#population
pop = pd.read_csv(
    'data/populationstate.txt'
    )
pop = pop.rename(columns = {'value' : 'pop'})
all = pd.merge(all, pop, left_on=['year','state_id'], right_on=['year','state_id'])

Visualize

Note: need v0.4.0 (Sept. 2014) version of Seaborn to fix plotting issue when series has null values


In [12]:
%matplotlib inline
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

#tweak dataset format for Seaborn time series plotting
viz2 = all
viz2 = viz2[['state_abbr', 'year', 'fed_indv_receipt', 'fed_contracts', 'fed_comp', 'fed_rev', 'total', 'gdp_price_index', 'pop']]
viz2['Individual Aid'] = viz2['fed_indv_receipt'] / viz2['gdp_price_index']
viz2['Per-Person Individual Aid'] = viz2['Individual Aid'] / viz2['pop']
viz2['Contracts'] = viz2['fed_contracts'] / viz2['gdp_price_index']
viz2['Per-Person Contracts'] = viz2['Contracts'] / viz2['pop']
viz2['State Revenue'] = viz2['fed_rev'] / viz2['gdp_price_index']
viz2['Per-Person State Revenue'] = viz2['State Revenue'] / viz2['pop']
viz2['Compensation'] = viz2['fed_comp'] / viz2['gdp_price_index'] 
viz2['Per-Person Compensation'] = viz2['Compensation'] / viz2['pop']
viz2['Total'] = viz2['total'] / viz2['gdp_price_index'] 
viz2['Per-Person Total'] = viz2['Total'] / viz2['pop']
viz2 = viz2.rename(columns = {'state_abbr' : 'State', 'year' : 'Year'})
spendtype = viz2[['State', 'Year', 'Individual Aid','Contracts', 'State Revenue', 'Compensation']]
spendtype_pc = viz2[['State', 'Year', 'Per-Person Individual Aid','Per-Person Contracts', 'Per-Person State Revenue', 'Per-Person Compensation']]

spendtype = pd.melt(spendtype, id_vars=['State', 'Year'], var_name = 'Fed $ Type', value_name = 'Amount')
spendtype_pc = pd.melt(spendtype_pc, id_vars=['State', 'Year'], var_name = 'Fed $ Type', value_name = 'Amount')
spendtype_pc.head()


Out[12]:
State Year Fed $ Type Amount
0 AL 2002 Per-Person Individual Aid 4218.937
1 AK 2002 Per-Person Individual Aid 4337.702
2 AZ 2002 Per-Person Individual Aid 3481.552
3 AR 2002 Per-Person Individual Aid 4278.875
4 CA 2002 Per-Person Individual Aid 3322.725

Prep Data


In [13]:
#tweak dataset format for Seaborn time series plotting
viz2 = all
viz2 = viz2[['state_abbr', 'year', 'fed_indv_receipt', 'fed_contracts', 'fed_comp', 'fed_rev', 'total', 'gdp_price_index', 'pop']]
viz2['Individual Aid'] = viz2['fed_indv_receipt'] / viz2['gdp_price_index']
viz2['Per-Person Individual Aid'] = viz2['Individual Aid'] / viz2['pop']
viz2['Contracts'] = viz2['fed_contracts'] / viz2['gdp_price_index']
viz2['Per-Person Contracts'] = viz2['Contracts'] / viz2['pop']
viz2['State Revenue'] = viz2['fed_rev'] / viz2['gdp_price_index']
viz2['Per-Person State Revenue'] = viz2['State Revenue'] / viz2['pop']
viz2['Compensation'] = viz2['fed_comp'] / viz2['gdp_price_index'] 
viz2['Per-Person Compensation'] = viz2['Compensation'] / viz2['pop']
viz2['Total'] = viz2['total'] / viz2['gdp_price_index'] 
viz2['Per-Person Total'] = viz2['Total'] / viz2['pop']
viz2 = viz2.rename(columns = {'state_abbr' : 'State', 'year' : 'Year'})

spendtype = viz2[['State', 'Year', 'Individual Aid','Contracts', 'State Revenue', 'Compensation']]
spendtype = pd.melt(spendtype, id_vars=['State', 'Year'], var_name = 'Fed $ Type', value_name = 'Amount')

spendtype_pc = viz2[['State', 'Year', 'Per-Person Individual Aid','Per-Person Contracts', 'Per-Person State Revenue', 'Per-Person Compensation']]
spendtype_pc = pd.melt(spendtype_pc, id_vars=['State', 'Year'], var_name = 'Fed $ Type', value_name = 'Amount')
spendtype_pc.head()


Out[13]:
State Year Fed $ Type Amount
0 AL 2002 Per-Person Individual Aid 4218.937
1 AK 2002 Per-Person Individual Aid 4337.702
2 AZ 2002 Per-Person Individual Aid 3481.552
3 AR 2002 Per-Person Individual Aid 4278.875
4 CA 2002 Per-Person Individual Aid 3322.725

Federal Per-Capita Spending, U.S.

Federal Per-Capita $ Trends, by State


In [21]:
sns.factorplot(
    'Year', 'Amount', 'Fed $ Type',
    spendtype_pc[spendtype_pc['State']=='US'],kind='point',size=5
    )


Out[21]:
<seaborn.axisgrid.FacetGrid at 0x28379550>

In [15]:
#plot per-capita federal spending (by type) in the U.S.
#exclude Washington, DC because its federal compensation numbers skew so high
sns.set(style="white")
flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
sns.factorplot(
    'Year', 'Amount', 'Fed $ Type',
    spendtype_pc[spendtype_pc['State']<>'DC'],
    col='State', col_wrap=4, kind='point',
    palette=sns.color_palette(flatui)
    )


Out[15]:
<seaborn.axisgrid.FacetGrid at 0x17612a90>

Compare Each State's Per-Capita Fed. $ To U.S. Per-Capita


In [16]:
stateus_pc = viz2[['State', 'Year', 'Per-Person Total']]
#limit DataFrame to years where we have data for all 4 ways that federal $ go to states
#get rid of DC
stateus_pc = stateus_pc[(stateus_pc['Year'].isin([2007, 2008, 2009, 2010, 2011])) & (~stateus_pc['State'].isin(['DC']))]
us = stateus_pc[stateus_pc['State']=='US']
us = us.rename(columns={'Per-Person Total' : 'US Per-Person Total'})
stateus_pc = stateus_pc[stateus_pc['State']<>'US']
stateus_pc = pd.merge(
    stateus_pc, us,
    left_on=['Year'], right_on=['Year'],
    suffixes=('', '_US'))
stateus_pc = stateus_pc[['State', 'Year', 'Per-Person Total', 'US Per-Person Total']]
stateus_pc = pd.melt(
    stateus_pc,
    id_vars=['Year', 'State'],
    var_name = 'Type', value_name = 'Amount')
sns.set(style="white")
sns.factorplot(
    'Year', 'Amount', 'Type',
    stateus_pc,
    col='State', col_wrap=4, kind='bar'
    )


Out[16]:
<seaborn.axisgrid.FacetGrid at 0x213154e0>

Federal Contracts


In [17]:
contracts = pd.read_csv('data/fed_contract_totals.txt')
contracts.head()


Out[17]:
state_abbr year total total_adj dod dod_adj non-dod non-dod_adj total_per_capita total_per_capita_adj dod_per_capita dod_per_capita_adj other_per_capita other_per_capita_adj dod_percent_total other_percent_total update_date
0 AK 2012 2.449410e+09 2.566977e+09 2.021896e+09 2.118944e+09 4.275133e+08 4.480333e+08 3353.95 3514.933976 2768.56 2901.446238 585.39 613.487738 82.55 17.45 9/5/2014 17:30:30.164745
1 AK 2013 1.625632e+09 1.678331e+09 1.279484e+09 1.320963e+09 3.461474e+08 3.573687e+08 2211.35 2283.037374 1740.48 1796.902746 470.86 486.124303 78.71 21.29 9/5/2014 17:30:30.171559
2 AK 2011 2.262005e+09 2.412034e+09 1.821573e+09 1.942390e+09 4.404323e+08 4.696442e+08 3127.02 3334.420985 2518.16 2685.178076 608.86 649.242909 80.53 19.47 9/5/2014 17:30:30.178061
3 AK 2007 2.360058e+09 2.672470e+09 1.876305e+09 2.124680e+09 4.837532e+08 5.477898e+08 3469.14 3928.365983 2758.06 3123.157060 711.09 805.220247 79.50 20.50 9/5/2014 17:30:30.184181
4 AK 2009 2.814503e+09 3.086416e+09 2.194235e+09 2.406223e+09 6.202682e+08 6.801932e+08 4027.08 4416.142121 3139.58 3442.899441 887.50 973.242680 77.96 22.04 9/5/2014 17:30:30.190148

In [ ]:
contracts = contracts[~contracts['state_abbr'].isin(['DC', 'PR', 'US'])]
g = sns.FacetGrid(contracts, col='state_abbr', col_wrap=4)
g.map(sns.pointplot, 'year', 'total_adj')
g.set_axis_labels("Year", "Per-Capita Fed. Contract $")