EITC Housing Benefits

In this notebook, we try to estimate the cost of a hypothetical EITC program to aid with rental housing payments. The aim is to supply a family with enough EITC to have them pay no more than 30% of their income for housing. Namely, the algorithm we implement is as follows:

Let $y_i$ denote the income of family $i$. Furthermore, suppose that family $i$ has characteristics married $s \in \{0,1\}=S$ and children $c \in \{0,\dots, 3\}=C$. For convenience, let the family type $t\in C \times S$. Suppose $y_i$ is such that it qualifies family $i$ for $e_i = e(y_i,s,c)$ amount of EITC benefits. Finally, suppose that family $i$ lives in location $l$ and pays rent $r$ for a $b\in \{0,\dots,4\}$ bedroom housing unit. Then, under our scheme, family $i$ will receive additional EITC housing benefits $h$ of:

$$h_{l,t,y} = max(r_{l,b} - .3(y_i+e_i(y_i,s,c)), 0 )$$

Data

We have 3 datasets containing the following information:

Number of EITC recipients in 356 metros per income bracket.
Number of EITC recipients in 356 metros per number of children.
Amount of EITC benefits one receives for various income levels and family characteristics (ie marriage and number of childre)
The FMRs in 356 metros per number of bedrooms of the housing unit.

Assumptions

Two datasets, taxes and FMRs, are merged based on their CDSA codes but not all of the codes match. Namely, the tax data contains 381 metros but only 351 of them match to the FMR data. We are able to match an additional 5 CDSA codes for a total of 356 metros. However, there is no guarantee that the additional 5 metros match precisely with the metro of the Brookings data. For example, the FMR data contain a CDSA as 'Santa Barbara-Santa Maria-Goleta' while the tax has CDSA 'Santa Maria-Santa Barbara'. clearly, the FMR area is larger but in adding the 5 metros, we disregard this discrepancy.
We have a distribution of files incomes in each metro area. The distributions are binned in increments of \$5k up to \$40k and then are binned at \$10k increments. Since we only have the number of people who filed within each bin, we assume a uniform distribution of incomes in each bin and use the bin mean as the representative income for that bin. However, the highest income for which EITC benefits are given is \$52400k but we must rely on \$55k. The latter will overestimate the income and number of EITC recipients in that bin. One proposal to mediate that is to divide the bin \$50-\$60k in half and to use \$52500 as the median and half the number of \$50-\$60k as the filers.
We do not have any figures on marriage but need to differentiate between married and unmarried families. Consequently, we employ 1-50.2% -figure from some Christian Science article*- as the proportion of adults married. In doing so, we must assume that marriage rates across the income distribution, number of children, and metros do not vary.
Both FMR and EITC data are for year 2014 whereas the tax data is for year 2013. Thus, we assume that FMRs and EITC benefits do not differ drastically between 2013 and 2014.
We only have numbers aggregated based on children and based on income bins but not across both; therefore, to estimate the proportion of families with each child type in each income bin, we assume that the number of children a family has does not vary across the income distributions.
Consider the following variables:
- the number $n_{c,l}$ of EITC eligible families in metro $l$ and with number of children $c\in \{0,1,2,3+\}$
- the number $m_{b,l}$ of EITC filers in each metro $l$ and income bin $b \in \{1,2,\dots, 10\}$
The tax data contains two sets of numbers data: 1) the number $n_l = \sum_{c} n_{c,l}$ of EITC eligible people in each metro for each family type 2) the number $m_l = \sum_{b} m_{b,l}$ of families who filed for EITC in each income bracket. Note $n_l \geq m_l \quad \forall l$. To acquire the number of family types in each income bracket, we calculate

$$p_{l,c} = \frac{n_{l,c}}{n_{l}}$$

then compute

$$k_{c,l,b} = p_{l,c} \times m_{b,l}$$

Where $k_{c,l,b}$ is the number of families with children $c$ and in income bin $b$ in metro $l$. Of course, for this we must assume that the proportions among the eligible hold equally among those who actually file. Finally, let $T(p_{l,c})$ be the total cost calculated using eligible counts and let $T(p_{l,b})$ be that total cost calculated using bin proportions then $$T(p_{l,c}) \geq T(p_{l,b})$$
Unlike in the tax data, in which all cbsa codes are unique, the FMR data sometimes contains multiple entries for the same cbsa, albeit over different jurisdictions. Most of the repeats of cbsa codes in the FMR data are also duplicates but, on occasion, there are deviations within a particular cbsa code such as NY - 35620. To give some weight to differentials within cbsa FMR entries, we take the mean within duplicated cbsa codes.



In [1]:

    
##Load modules and set data path:
import pandas as pd
import numpy as np
import numpy.ma as ma
import re
data_path = "C:/Users/SpiffyApple/Documents/USC/RaphaelBostic/EITChousing"
output_container = {}



In [2]:

    
#################################################################
################### load tax data ###############################
#upload tax data
tx_o = pd.read_csv("/".join([data_path, "metro_TY13.csv"]))

#there is some weird formatting in the data due to excel -> fix:
tx_o['cbsa'] = tx_o.cbsa.apply(lambda x: re.sub("=|\"", "",x))

#numerous numerical column entries have commas -> fix:
tx_o.iloc[:,3:] = tx_o.iloc[:,3:].replace(",", "",regex=True).astype(np.int64)

#set the cbsa as the dataframe index:
tx_o.set_index('cbsa', inplace=True)

#for convenience, extract cols corresponding with number of ppl filed for EITC in each income bracket
eagi_cols = tx_o.columns[tx_o.columns.str.contains("EAGI")]

#cut the 50-60 bin number in half according to assumptions
tx_o['EAGI50_13'] = tx_o['EAGI50_13']/2
#tx_o['EAGI40_13'] = tx_o['EAGI40_13']/2

Caveat:

The cell below is the longest running part of this code because we must interact with excel, individually load 9 sheets, and then concatenate them into a single dataset.



In [3]:

    
#################################################################
###################### Read EITC data ###########################
##parse the EITC data:
eitc = pd.ExcelFile("/".join([data_path, "EITC Calculator-2-14.xlsx"]))
sheets = eitc.sheet_names
eitc.close()

eitc_dict = pd.read_excel("/".join([data_path, "EITC Calculator-2-14.xlsx"]), sheetname =  sheets[9:17], skiprows = 14 )

eitc_o = pd.concat(eitc_dict)

#################################################################
################### Process eitc_o data ###########################
eitc_o = eitc_o.iloc[:,[0,40]]
eitc_o.dropna(inplace=True)
eitc_o = eitc_o.loc[eitc_o[2014]>0,:]
eitc_o.reset_index(level=0, inplace=True)
eitc_o.reset_index(drop=True, inplace=True)
#eitc_o['num_kids'] = eitc_o.level_0.str.extract("(\d)", expand=False)
#eitc_o['married'] = eitc_o.level_0.str.contains("Married").astype(int)



In [ ]:

    
#eitc_o.to_csv("./EITC_benefits.csv")



In [4]:

    
#################################################################
################# Read & process fmr data  ######################
#read in fmr data
fmr_o1 = pd.read_excel("/".join([data_path, "FY2014_4050_RevFinal.xls"]))

#drop non Metro areas:
fmr_o1 = fmr_o1[fmr_o1.Metro_code.str.contains("METRO")]

#extract cbsa code:
fmr_o1['cbsa'] = fmr_o1.Metro_code.str.extract("RO(\d{4,5})[MN]", expand=False)

#edit FMR CBSA codes to match tax CBSA codes:
cbsa_chng_map = {'14060':'14010', '29140':'29200', '31100':'31080', '42060':'42200', '44600':'48260'}#, '36061':'35620'}
fmr_o1.cbsa.replace(cbsa_chng_map, inplace=True)

#fetch columns that pertain to FMR figures
fmr_cols = fmr_o1.columns[fmr_o1.columns.str.contains("fmr\d")]

#clean up the area names by removing "MSA and HUD Metro FMR Area"
fmr_o1['Areaname'] = fmr_o1.Areaname.apply(lambda x: re.sub(" MSA| HUD Metro FMR Area", "", x))

#drop duplicates based on cbsa code:
#fmr_o = fmr_o.drop_duplicates('cbsa')
fmr_o2 = fmr_o1.groupby("cbsa").mean()[fmr_cols]
fmr_o = pd.concat([fmr_o2, fmr_o1[['cbsa', 'Areaname']].drop_duplicates("cbsa").set_index("cbsa")],axis=1)


#set an interpratable index
#fmr_o.set_index("cbsa", inplace=True)

#subset to only matching cbsa codes between tax and fmr data
common_cbsa = fmr_o.index.intersection(tx_o.index)

fmr_o = fmr_o.loc[common_cbsa]
tx_o = tx_o.loc[common_cbsa]

fmr_o[fmr_cols] = (fmr_o[fmr_cols])*12

print("The number of CBSAs between tax and fmr data matches?:", fmr_o.shape[0] == tx_o.shape[0])









    



The number of CBSAs between tax and fmr data matches?: True



In [5]:

    
######################################
##0. Define function to calculate eitc
######################################
def calc_haus_eitc(group, income, fmr):
    """
    INPUT:
        1. group - (pd.df) subset of data frame by family type
        2. income - (int64) total income the family earns
    OUTPUT:
        1. aid - (pd.Series) a series containing eitc housing aid for each family type for a given income
    DESCRIPTION:
        The function is basically the max(r - (y+e)*.3, 0) but if we are at income levels that don't qualify
        a family type for EITC benefits then we need to output a corresponding vector of NAN values so that
        the groupby operation doesn't error out on its concatenation step.
    """
    details = group[group['Nominal Earnings'] == income]
    if details.shape[0] > 0:
        aid = fmr[details.r_type]-details.haus_share.iloc[0]
        aid[aid<0] = 0
    else:
        aid = pd.DataFrame(np.array([np.nan]*fmr.shape[0]))
        aid.index = fmr.index
        #aid.columns = [group.r_type.iloc[0]]
    aid.columns = ['aid']
    return(aid)

EITC data contain the amount of EITC benefits a family receives for a given income level. The goal here is to get a family's total income and what would be considered a fair share of income for housing. This is where we perform following aforementioned calculation:

$$\text{total income} = y_i + e_i(s,c)$$$$\text{housing share} = .3\times \text{ total income} $$



In [6]:

    
#calculate fair share of income for housing
eitc = eitc_o[['Nominal Earnings','level_0', 2014]].copy()
eitc['total_income'] = eitc['Nominal Earnings']+eitc[2014]
percent_of_income=.3
eitc['haus_share'] = eitc.total_income*percent_of_income

#remove "Nominal" from family description.
eitc.level_0.replace(", Nominal", "", regex=True, inplace=True)



In [7]:

    
######################################
##I. Make a vector of income bin means
######################################
min_earn = 2500
mid_earn = 37500
step = 5000
income_vect = np.linspace(min_earn,mid_earn,((mid_earn-min_earn)/step+1))

add_vect = [45000,52400]
income_vect = np.concatenate([income_vect, add_vect])

Mapping children to bedrooms

In order to estimate relate FMR data to family types, we must map the number of bedrooms to the number of children and marriage. Here, we assume that marriage does not make a difference, thus, the mapping only varies on the number of children in the family. Presently, the following is implemented:

$$ \begin{array}{ccc} \text{num of children} & & \text{num of bedrooms} \\ 0 & \to & 1 \\ 1 & \vdots & 2 \\ 2 & & 2 \\ 3+ & \to & 3 \\ \end{array} $$



In [8]:

    
##Map FMR data to EITC data (Variable)
#assigned bedrooms to child counts
repl_dict ={'Married, 0 Kid':'fmr1', 'Married, 1 Kid':'fmr2', 'Married, 2 Kids':'fmr2',
       'Married, 3 Kids':'fmr3', 'Single, 0 Kid':'fmr1', 'Single, 1 Kid':'fmr2',
       'Single, 2 Kids':'fmr2', 'Single, 3 Kids':'fmr3'}

eitc['r_type'] = eitc.level_0.replace(repl_dict)

haus_share = eitc[['level_0', 'haus_share', 'r_type', 'Nominal Earnings']]



In [9]:

    
#reformat monthly fmr to annual cost of rent
fmr_discount = 1

Calculate amount of EITC housing aid each family type in each metro receives



In [10]:

    
################################################
##II. Group by family type and loop over incomes
################################################
groups = haus_share.groupby(by = 'level_0')

#place eachincome is placed into a dictionary entry
aid_incomes = {}
for income in income_vect:
    aid_per_income = groups.apply(calc_haus_eitc, income=income, fmr=fmr_o*fmr_discount)
    aid_incomes[income] = aid_per_income.unstack(level=0)
    
#concatenate dictionary into a 2-indexed data frame (flattened 3D)    
one_family_aid = pd.concat(aid_incomes)
one_family_aid.columns = one_family_aid.columns.levels[1]

Calculate families of each type in each income bin



In [11]:

    
#################################################################
############### 0. pre-checks and params ########################
#it doesn't seem that the total eligible for eitc matches the distributional count of incomes
print("Prop accounted for in income distribution counts\n",(tx_o[eagi_cols].sum(axis=1)/tx_o.eitc13).quantile(np.linspace(0,1,5)))









    



Prop accounted for in income distribution counts
 0.00    0.876222
0.25    0.972034
0.50    0.984475
0.75    0.990173
1.00    0.998135
dtype: float64



In [12]:

    
tx = tx_o.copy()
#################################################################
################ I. compute proportions #########################
#set proportion of ppl married - some Christian Science article (not sure if credible source)
prop_married = 1-50.2/100  

eqc_cols = tx.columns[tx.columns.str.contains("EQC\d_")]

chld_prop = tx[eqc_cols].div(tx.eitc13,axis=0)
m_chld_prop = chld_prop*prop_married
s_chld_prop = chld_prop - m_chld_prop

m_chld_prop.columns = m_chld_prop.columns + "_married"
s_chld_prop.columns = s_chld_prop.columns + "_single"

tx = pd.concat([tx, m_chld_prop,s_chld_prop],axis=1)
eqc_cols = tx.columns[tx.columns.str.contains('EQC\d_13_married|EQC\d_13_single', regex=True)]



In [13]:

    
#################################################################
############### II. multiply to across ##########################
#here I make a 3D matrix with metros, bins, types on each axis 
#then flatten it into a 2D data frame. 

#Implicit broadcasting across two 2D matrices into a 3D matrix
C_3D=np.einsum('ij,ik->jik',tx[eagi_cols],tx[eqc_cols])

#flatten into a pandas dataframe
C_2D=pd.Panel(np.rollaxis(C_3D,2)).to_frame()

C_2D.columns = one_family_aid.columns

C_2D.index = one_family_aid.index



In [41]:

    
pop_figs = C_2D.sum(axis=1).groupby(level=1).sum().round(0)

Calculate total cost



In [14]:

    
##################################################################
############### aggregate aid and filers #########################
disaggregated =np.multiply(C_2D, one_family_aid)

#summing once gives us metro-level totals -> summing that gives us total
total = disaggregated.sum(axis=1).sum()
output_container['base_sim'] = total



In [15]:

    
print("Total EITC Housing Aid cost: %.2f billion" %(output_container['base_sim']/1e9))









    



Total EITC Housing Aid cost: 121.05 billion

Supplemental Questions

varying parameters on original sim
number metros where families earning 52,400 recieve EITC housing aid
what if no one receives more aid than the median income earning family

Varying parameters



In [16]:

    
def vary_params(prcnt_of_income = .3,fmr_discount = 1,prop_married=prop_married, repl_dict = repl_dict, fmr=fmr_o[fmr_cols], tx = tx_o):

    ######################################
    ##I. Make a vector of income bin means
    ######################################
    min_earn = 2500
    mid_earn = 37500
    step = 5000
    income_vect = np.linspace(min_earn,mid_earn,((mid_earn-min_earn)/step+1))

    add_vect = [45000,52400]
    income_vect = np.concatenate([income_vect, add_vect])

    #calculate fair share of income for housing
    eitc = eitc_o[['Nominal Earnings','level_0', 2014]].copy()
    eitc['total_income'] = eitc['Nominal Earnings']+eitc[2014]
    eitc['haus_share'] = eitc.total_income*prcnt_of_income

    #remove "Nominal" from family description.
    eitc.level_0.replace(", Nominal", "", regex=True, inplace=True)

    eitc['r_type'] = eitc.level_0.replace(repl_dict)

    haus_share = eitc[['level_0', 'haus_share', 'r_type', 'Nominal Earnings']]

    #reformat monthly fmr to annual cost of rent
    #fmr = fmr*fmr_discount

    ################################################
    ##II. Group by family type and loop over incomes
    ################################################
    groups = haus_share.groupby(by = 'level_0')

    #place eachincome is placed into a dictionary entry
    aid_incomes = {}
    for income in income_vect:
        aid_per_income = groups.apply(calc_haus_eitc, income=income, fmr = fmr*fmr_discount)
        aid_incomes[income] = aid_per_income.unstack(level=0)

    #concatenate dictionary into a 2-indexed data frame (flattened 3D)    
    one_family_aid = pd.concat(aid_incomes)
    one_family_aid.columns = one_family_aid.columns.levels[1]

    #################################################################
    ################ I. compute proportions #########################
    eqc_cols = tx.columns[tx.columns.str.contains("EQC\d_")]

    chld_prop = tx[eqc_cols].div(tx.eitc13,axis=0)
    m_chld_prop = chld_prop*prop_married
    s_chld_prop = chld_prop - m_chld_prop

    m_chld_prop.columns = m_chld_prop.columns + "_married"
    s_chld_prop.columns = s_chld_prop.columns + "_single"

    tx = pd.concat([tx, m_chld_prop,s_chld_prop],axis=1)
    eqc_cols = tx.columns[tx.columns.str.contains('EQC\d_13_married|EQC\d_13_single', regex=True)]

    #################################################################
    ############### II. multiply to across ##########################
    #here I make a 3D matrix with metros, bins, types on each axis 
    #then flatten it into a 2D data frame. 

    #Implicit broadcasting across two 2D matrices into a 3D matrix
    C_3D=np.einsum('ij,ik->jik',tx[eagi_cols],tx[eqc_cols])

    #flatten into a pandas dataframe
    C_2D=pd.Panel(np.rollaxis(C_3D,2)).to_frame()

    C_2D.columns = one_family_aid.columns

    C_2D.index = one_family_aid.index

    ##################################################################
    ############### aggregate aid and filers #########################
    disaggregated =np.multiply(C_2D, one_family_aid)

    #summing once gives us metro-level totals -> summing that gives us total
    total = disaggregated.sum(axis=1).sum()
    return(total)



In [ ]:

    
#Check NY area which is '35620'
disaggregated.groupby(level=1).sum().loc['35620'].sum()



In [17]:

    
#test out sim function:
print("%.2f" %(vary_params()/1e9))

Test various cases:

marriage proportion is 40%
support only 80% and 90% of FMR
allow a higher income proportion to go to housing: 40 and 45
put singles with no children into studios



In [18]:

    
output_container['low_marr_prop'] = vary_params(prop_married = .4)
output_container['sub_fmr_rent_80'] = vary_params(fmr_discount=.8)
output_container['sub_fmr_rent_90'] = vary_params(fmr_discount=.9)

output_container['haus_share_40'] = vary_params(prcnt_of_income = .4)
output_container['haus_share_50'] = vary_params(prcnt_of_income = .5)

repl_dict_studio ={'Married, 0 Kid':'fmr0', 'Married, 1 Kid':'fmr2', 'Married, 2 Kids':'fmr2',
       'Married, 3 Kids':'fmr3', 'Single, 0 Kid':'fmr0', 'Single, 1 Kid':'fmr2',
       'Single, 2 Kids':'fmr2', 'Single, 3 Kids':'fmr3'}
output_container['tight_living'] = vary_params(repl_dict = repl_dict_studio)

output_container['sub_fmr_rent_90_haus_share_40'] = vary_params(prcnt_of_income = .4, fmr_discount = .9)



In [19]:

    
output_container









    Out[19]:





{'base_sim': 121048065013.81049,
 'haus_share_40': 93715802445.02422,
 'haus_share_50': 73102699122.02,
 'low_marr_prop': 120861615671.89114,
 'sub_fmr_rent_80': 79891037223.741,
 'sub_fmr_rent_90': 99999745586.42644,
 'sub_fmr_rent_90_haus_share_40': 75420693594.25081,
 'tight_living': 117088153961.4179}

Capping EITC benefits at the city median income



In [21]:

    
quantiles = np.divide(C_3D,np.cumsum(C_3D, axis=0))



In [22]:

    
#find the median income values
"""
Cool stuff: in essence, we have the following algorithm
-consider only values greater than .5 in the cumulative percentiles of each income bin
-find the minimum on the masked array -> these are the medians
-use the rows set as true as the identifiers of the median income in the index
"""
masked_quantiles = ma.masked_where(quantiles<=.5,quantiles)
median_idx = masked_quantiles.argmin(0)
median_idx = np.where(masked_quantiles == masked_quantiles.min(axis=0))
median_mat = np.zeros(quantiles.shape, dtype=bool)
median_mat[median_idx] = True



In [23]:

    
median_mat.shape









    Out[23]:





(10, 357, 8)



In [24]:

    
#flatten into a pandas dataframe
median_2D=pd.Panel(np.rollaxis(median_mat,2)).to_frame()

median_2D.columns = one_family_aid.columns

median_2D.index = one_family_aid.index



In [25]:

    
#just double check the results from above
np.divide(tx[eagi_cols].cumsum(axis=1),tx[eagi_cols].sum(axis=1).reshape(357,1)).head(n=3)



In [26]:

    
#the amount of aid the median income family in each metro is getting
median_metro_aid = one_family_aid[median_2D].reset_index(level=0, drop=True).dropna()



In [27]:

    
#the amount of aid the median income family in each metro is getting
median_metro_aid = one_family_aid[median_2D].reset_index(level=0, drop=True).dropna()
comparison = pd.concat([median_metro_aid]*10)

#inefficient way of making the comparison matrix the same size as the original matrix
comparison.index = one_family_aid.index

#also inefficient but copy in order to not lose the original matrix
X = one_family_aid.copy()

#set the values that are above the median to np.nan
X[comparison <= one_family_aid] = np.nan

#fill nan values with the median values
X.fillna(comparison, inplace=True)



In [28]:

    
##################################################################
############### aggregate aid and filers #########################
disaggregated_med =np.multiply(C_2D, X)

#summing once gives us metro-level totals -> summing that gives us total
output_container['median_capped'] = disaggregated_med.sum(axis=1).sum()



In [29]:

    
print("Total EITC Housing Aid cost: %.2f billion" %(output_container['median_capped']/1e9))









    



Total EITC Housing Aid cost: 100.94 billion



In [30]:

    
outcome_series = (pd.Series(output_container)/1e9).round(2).to_frame()
outcome_series.columns = ["total cost"]



In [31]:

    
outcome_series.sort_values(by='total cost')









    Out[31]:






  
    
      
      total cost
    
  
  
    
      haus_share_50
      73.10
    
    
      sub_fmr_rent_90_haus_share_40
      75.42
    
    
      sub_fmr_rent_80
      79.89
    
    
      haus_share_40
      93.72
    
    
      sub_fmr_rent_90
      100.00
    
    
      median_capped
      100.94
    
    
      tight_living
      117.09
    
    
      low_marr_prop
      120.86
    
    
      base_sim
      121.05

Get Number of metros where families earn 52,400



In [32]:

    
#Cacl number of metros where family earning ... 
max_inc_aid = (one_family_aid.loc[52400]>0).sum(axis=1).sum()
print("Number of metros where family earning 52400 gets EITC housing aid: %d" %max_inc_aid)

expens_cbsas = one_family_aid.loc[52400,'Married, 3 Kids'].loc[(one_family_aid.loc[52400, "Married, 3 Kids"]>0)].index









    



Number of metros where family earning 52400 gets EITC housing aid: 55



In [33]:

    
fmr_o.loc[expens_cbsas].Areaname[:5]









    Out[33]:





11260                       Anchorage, AK
12100         Atlantic City-Hammonton, NJ
12420    Austin-Round Rock-San Marcos, TX
12580                Baltimore-Towson, MD
12700                 Barnstable Town, MA
Name: Areaname, dtype: object

Make some additional output tables

20 top and bottom cities that contribute most in terms of cost



In [52]:

    
##make the top 20 and bottom 20 table of expenditures
cost_per_metro = disaggregated.sum(axis=1).unstack(level=0).sum(axis=1)



In [53]:

    
cost_per_metro = pd.concat([cost_per_metro, pop_figs], axis=1)
cost_per_metro.columns = ['cost', 'recipients']
cost_per_metro['cost_per_recipient'] = (cost_per_metro.cost/cost_per_metro.recipients).round(0)



In [61]:

    
cost_per_metro.head()
cost_per_metro.sort_values("cost", inplace=True)



In [62]:

    
bottom_20_cbsa = cost_per_metro.iloc[:20].index
bottom_20_costs = pd.concat([fmr_o.loc[bottom_20_cbsa].Areaname, cost_per_metro.iloc[:20][['cost','cost_per_recipient']]],axis=1)
bottom_20_costs.columns = ['Metro','Cost', 'CPR']
bottom_20_costs.Cost =(bottom_20_costs.Cost/1e6).round(2)
bottom_20_costs.rename(columns = {'Cost':"Cost in millions"},inplace=True)



In [63]:

    
bottom_20_costs.head()









    Out[63]:






  
    
      
      Metro
      Cost in millions
      CPR
    
  
  
    
      11180
      Ames, IA
      10.49
      3268.0
    
    
      30300
      Lewiston, ID-WA
      11.26
      2429.0
    
    
      22540
      Fond du Lac, WI
      13.51
      2469.0
    
    
      24500
      Great Falls, MT
      15.74
      2344.0
    
    
      24220
      Grand Forks, ND-MN
      16.16
      2989.0



In [64]:

    
top_20_cbsa = cost_per_metro.iloc[-20:].index
top_20_costs = pd.concat([fmr_o.loc[top_20_cbsa].Areaname, cost_per_metro.iloc[-20:][['cost','cost_per_recipient']]],axis=1)
top_20_costs.columns = ['Metro','Cost', 'CPR']
top_20_costs.Cost = (top_20_costs.Cost/1e9).round(2)
top_20_costs.rename(columns = {'Cost':"Cost in billions"},inplace=True)
pd.concat([top_20_costs.reset_index(drop=True),bottom_20_costs.reset_index(drop=True)],axis=1)









    Out[64]:






  
    
      
      Metro
      Cost in billions
      CPR
      Metro
      Cost in millions
      CPR
    
  
  
    
      0
      Las Vegas-Paradise, NV
      1.09
      5976.0
      Ames, IA
      10.49
      3268.0
    
    
      1
      Seattle-Bellevue, WA
      1.27
      6523.0
      Lewiston, ID-WA
      11.26
      2429.0
    
    
      2
      Tampa-St. Petersburg-Clearwater, FL
      1.40
      5204.0
      Fond du Lac, WI
      13.51
      2469.0
    
    
      3
      Orlando-Kissimmee-Sanford, FL
      1.44
      5506.0
      Great Falls, MT
      15.74
      2344.0
    
    
      4
      Detroit-Warren-Livonia, MI
      1.62
      4431.0
      Grand Forks, ND-MN
      16.16
      2989.0
    
    
      5
      Baltimore-Towson, MD
      1.66
      8541.0
      Logan, UT-ID
      16.83
      1977.0
    
    
      6
      Phoenix-Mesa-Glendale, AZ
      1.81
      5297.0
      Pocatello, ID
      17.00
      2430.0
    
    
      7
      Boston-Cambridge-Quincy, MA-NH
      2.16
      8632.0
      Bismarck, ND
      17.84
      2917.0
    
    
      8
      San Diego-Carlsbad-San Marcos, CA
      2.21
      9284.0
      Carson City, NV
      18.01
      4172.0
    
    
      9
      Atlanta-Sandy Springs-Marietta, GA
      2.41
      4357.0
      Manhattan, KS
      18.18
      3199.0
    
    
      10
      Austin County, TX
      2.53
      4466.0
      Cumberland, MD-WV
      18.28
      2435.0
    
    
      11
      Dallas, TX
      2.67
      4454.0
      Dubuque, IA
      18.30
      3091.0
    
    
      12
      Riverside-San Bernardino-Ontario, CA
      2.94
      7032.0
      Mankato-North Mankato, MN
      18.46
      3221.0
    
    
      13
      Philadelphia-Camden-Wilmington, PA-NJ-DE-MD
      3.03
      6979.0
      Corvallis, OR
      18.86
      4378.0
    
    
      14
      Oakland-Fremont, CA
      3.13
      13749.0
      Parkersburg-Marietta-Vienna, WV-OH
      18.99
      2257.0
    
    
      15
      Washington-Arlington-Alexandria, DC-VA-MD
      3.44
      9881.0
      Casper, WY
      19.38
      3476.0
    
    
      16
      Chicago-Joliet-Naperville, IL
      3.51
      4756.0
      Gadsden, AL
      19.81
      1757.0
    
    
      17
      Fort Lauderdale, FL
      5.90
      7965.0
      Cape Girardeau-Jackson, MO-IL
      19.91
      2674.0
    
    
      18
      Los Angeles-Long Beach, CA
      12.48
      11120.0
      Kokomo, IN
      20.26
      2611.0
    
    
      19
      Bergen-Passaic, NJ
      16.75
      9955.0
      Wausau, WI
      20.46
      2282.0

A metric of EITC housing recipients. Using indexers defined at the top: $$ \sum_{t\in T} = \frac { \sum_y I(h_{l,t,y}=0) }{ \sum_y I(h_{l,t,y} \geq 0 ) } $$



In [65]:

    
yes_numbers = np.multiply(C_2D, one_family_aid>0)



In [66]:

    
non_receivers = 1-(np.divide(yes_numbers,C_2D).fillna(0).sum(axis=1)/8).unstack(level=0).sum(axis=1)/10
non_receivers.sort_values(inplace=True)
print("Average proportion of non-qualifiers: %.02f" %non_receivers.mean())









    



Average proportion of non-qualifiers: 0.43



In [67]:

    
top_low_prop=pd.concat([fmr_o.loc[non_receivers.iloc[-20:].index].Areaname, non_receivers.iloc[-20:]], axis=1).reset_index(drop=True)



In [68]:

    
top_low_prop.columns = ['metro','prop of non-recipients']



In [69]:

    
top_low_prop









    Out[69]:






  
    
      
      metro
      prop of non-recipients
    
  
  
    
      0
      Mansfield, OH
      0.5500
    
    
      1
      Hickory-Lenoir-Morganton, NC
      0.5500
    
    
      2
      Cumberland, MD-WV
      0.5500
    
    
      3
      Youngstown-Warren-Boardman, OH
      0.5500
    
    
      4
      Darlington County, SC
      0.5500
    
    
      5
      Morristown, TN
      0.5500
    
    
      6
      Pine Bluff, AR
      0.5500
    
    
      7
      Jonesboro, AR
      0.5625
    
    
      8
      Joplin, MO
      0.5625
    
    
      9
      Parkersburg-Marietta-Vienna, WV-OH
      0.5625
    
    
      10
      Steubenville-Weirton, OH-WV
      0.5625
    
    
      11
      Decatur, AL
      0.5625
    
    
      12
      Elizabethtown, KY
      0.5625
    
    
      13
      Benton County, IA
      0.5750
    
    
      14
      Kingsport-Bristol-Bristol, TN-VA
      0.5750
    
    
      15
      Owensboro, KY
      0.5750
    
    
      16
      Wheeling, WV-OH
      0.5875
    
    
      17
      Gadsden, AL
      0.5875
    
    
      18
      Fort Smith, AR-OK
      0.5875
    
    
      19
      Dothan, AL
      0.6250



In [ ]:



In [ ]:

	EAGI0_13	EAGI5_13	EAGI10_13	EAGI15_13	EAGI20_13	EAGI25_13	EAGI30_13	EAGI35_13	EAGI40_13	EAGI50_13
10180	0.093470	0.257948	0.456057	0.589849	0.704124	0.802934	0.884988	0.945133	1.000000	1.0
10420	0.115991	0.304029	0.528612	0.651629	0.745072	0.831303	0.905655	0.958074	0.999991	1.0
10500	0.057985	0.235216	0.504913	0.675505	0.797185	0.884118	0.940377	0.978090	1.000000	1.0

	total cost
haus_share_50	73.10
sub_fmr_rent_90_haus_share_40	75.42
sub_fmr_rent_80	79.89
haus_share_40	93.72
sub_fmr_rent_90	100.00
median_capped	100.94
tight_living	117.09
low_marr_prop	120.86
base_sim	121.05

	Metro	Cost in millions	CPR
11180	Ames, IA	10.49	3268.0
30300	Lewiston, ID-WA	11.26	2429.0
22540	Fond du Lac, WI	13.51	2469.0
24500	Great Falls, MT	15.74	2344.0
24220	Grand Forks, ND-MN	16.16	2989.0

	Metro	Cost in billions	CPR	Metro	Cost in millions	CPR
0	Las Vegas-Paradise, NV	1.09	5976.0	Ames, IA	10.49	3268.0
1	Seattle-Bellevue, WA	1.27	6523.0	Lewiston, ID-WA	11.26	2429.0
2	Tampa-St. Petersburg-Clearwater, FL	1.40	5204.0	Fond du Lac, WI	13.51	2469.0
3	Orlando-Kissimmee-Sanford, FL	1.44	5506.0	Great Falls, MT	15.74	2344.0
4	Detroit-Warren-Livonia, MI	1.62	4431.0	Grand Forks, ND-MN	16.16	2989.0
5	Baltimore-Towson, MD	1.66	8541.0	Logan, UT-ID	16.83	1977.0
6	Phoenix-Mesa-Glendale, AZ	1.81	5297.0	Pocatello, ID	17.00	2430.0
7	Boston-Cambridge-Quincy, MA-NH	2.16	8632.0	Bismarck, ND	17.84	2917.0
8	San Diego-Carlsbad-San Marcos, CA	2.21	9284.0	Carson City, NV	18.01	4172.0
9	Atlanta-Sandy Springs-Marietta, GA	2.41	4357.0	Manhattan, KS	18.18	3199.0
10	Austin County, TX	2.53	4466.0	Cumberland, MD-WV	18.28	2435.0
11	Dallas, TX	2.67	4454.0	Dubuque, IA	18.30	3091.0
12	Riverside-San Bernardino-Ontario, CA	2.94	7032.0	Mankato-North Mankato, MN	18.46	3221.0
13	Philadelphia-Camden-Wilmington, PA-NJ-DE-MD	3.03	6979.0	Corvallis, OR	18.86	4378.0
14	Oakland-Fremont, CA	3.13	13749.0	Parkersburg-Marietta-Vienna, WV-OH	18.99	2257.0
15	Washington-Arlington-Alexandria, DC-VA-MD	3.44	9881.0	Casper, WY	19.38	3476.0
16	Chicago-Joliet-Naperville, IL	3.51	4756.0	Gadsden, AL	19.81	1757.0
17	Fort Lauderdale, FL	5.90	7965.0	Cape Girardeau-Jackson, MO-IL	19.91	2674.0
18	Los Angeles-Long Beach, CA	12.48	11120.0	Kokomo, IN	20.26	2611.0
19	Bergen-Passaic, NJ	16.75	9955.0	Wausau, WI	20.46	2282.0

	metro	prop of non-recipients
0	Mansfield, OH	0.5500
1	Hickory-Lenoir-Morganton, NC	0.5500
2	Cumberland, MD-WV	0.5500
3	Youngstown-Warren-Boardman, OH	0.5500
4	Darlington County, SC	0.5500
5	Morristown, TN	0.5500
6	Pine Bluff, AR	0.5500
7	Jonesboro, AR	0.5625
8	Joplin, MO	0.5625
9	Parkersburg-Marietta-Vienna, WV-OH	0.5625
10	Steubenville-Weirton, OH-WV	0.5625
11	Decatur, AL	0.5625
12	Elizabethtown, KY	0.5625
13	Benton County, IA	0.5750
14	Kingsport-Bristol-Bristol, TN-VA	0.5750
15	Owensboro, KY	0.5750
16	Wheeling, WV-OH	0.5875
17	Gadsden, AL	0.5875
18	Fort Smith, AR-OK	0.5875
19	Dothan, AL	0.6250