In [0]:

    
#@title Copyright 2019 The Empirical Calibration Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

Causal Inference of Lalonde

View source on GitHub

Causal Inference of Lalonde

Background

Imports

Data

Experiment benchmark

Difference Estimator

Regression Estimator

Observational controls

Empirical Calibration

Prepare Balancing Constraints

Entropy Balancing

Quadratic Balancing

Imposing an Upper Bound on Weights

We illustrate the use of empirical calibration on the observational part of the LaLonde data to evaluate the effect of a job training program.

Background

The LaLonde (1986) data is a canonical benchmark in the causal inference literature. It consists of three groups for evaluating the effect of a large scale job training program — the National Supported Work Demonstration (NSW):

An experimental treatment group with 185 observations.
An experimental control group with 260 observations.
An observational control group drawn from the Current Population Survey (CPS), with 15,992 observations.

The outcome variable is the post-intervention earnings in 1978. The following pre-intervention covariates are available for all three groups.

Name	Description
age	age.
education	years of schooling.
black	1 if black, 0 otherwise.
hispanic	1 if hispanic, 0 otherwise.
married	1 if married, 0 otherwise.
nodegree	1 if not completed high school degree, 0 otherwise.
earnings1974	pre-intervention earnings in 1974.
earnings1975	pre-intervention earnings in 1975.

Imports



In [0]:

    
import numpy as np
import pandas as pd
import patsy
import seaborn as sns
import statsmodels.api as sm

# install and import ec
!pip install -q git+https://github.com/google/empirical_calibration
import empirical_calibration as ec

sns.set_style('whitegrid')
%config InlineBackend.figure_format='retina'
%precision 3

Data

The data are downloaded from Dehejia and Wahba (1999)'s data hosting website via functions under ec.data.lalonde.

Experiment benchmark

The first column is the treatment assignment. The last column is the post-intervention outcome variable. The remaining 8 columns are pre-intervention covariates.



In [0]:

    
#@title experimentl treatment group
treated = ec.data.lalonde.experimental_treated()
treated.head()









    Out[0]:







  
    
      
      treatment
      age
      education
      black
      hispanic
      married
      nodegree
      earnings1974
      earnings1975
      earnings1978
    
  
  
    
      0
      1.0
      37.0
      11.0
      1.0
      0.0
      1.0
      1.0
      0.0
      0.0
      9930.0460
    
    
      1
      1.0
      22.0
      9.0
      0.0
      1.0
      0.0
      1.0
      0.0
      0.0
      3595.8940
    
    
      2
      1.0
      30.0
      12.0
      1.0
      0.0
      0.0
      0.0
      0.0
      0.0
      24909.4500
    
    
      3
      1.0
      27.0
      11.0
      1.0
      0.0
      0.0
      1.0
      0.0
      0.0
      7506.1460
    
    
      4
      1.0
      33.0
      8.0
      1.0
      0.0
      0.0
      1.0
      0.0
      0.0
      289.7899



In [0]:

    
#@title experimental control group
control = ec.data.lalonde.experimental_control()
control.head()









    Out[0]:







  
    
      
      treatment
      age
      education
      black
      hispanic
      married
      nodegree
      earnings1974
      earnings1975
      earnings1978
    
  
  
    
      0
      0.0
      23.0
      10.0
      1.0
      0.0
      0.0
      1.0
      0.0
      0.0
      0.00
    
    
      1
      0.0
      26.0
      12.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      12383.68
    
    
      2
      0.0
      22.0
      9.0
      1.0
      0.0
      0.0
      1.0
      0.0
      0.0
      0.00
    
    
      3
      0.0
      18.0
      9.0
      1.0
      0.0
      0.0
      1.0
      0.0
      0.0
      10740.08
    
    
      4
      0.0
      45.0
      11.0
      1.0
      0.0
      0.0
      1.0
      0.0
      0.0
      11796.47

We use two methods to estimate the average treatment effects on the treated.

Difference Estimator

Using a simple difference in means, we estimate that the program increased the average earnings of the treated by $1794.



In [0]:

    
diff = treated["earnings1978"].mean() - control["earnings1978"].mean()
diff









    Out[0]:





1794.342

Using the estimator for the sampling variance (see formula 6.18 of Imbens and Rubin (2014)), we estimate the 95% $z$-interval of the difference as [479, 3109].



In [0]:

    
sd = np.sqrt(
    treated["earnings1978"].var(ddof=1) / len(treated["earnings1978"]) +
    control["earnings1978"].var(ddof=1) / len(control["earnings1978"]))
sd









    Out[0]:





670.997

Regression Estimator

The experimental treated and experimental control group are not perfectly balanced.



In [0]:

    
df = pd.concat([treated, control])
print(df.groupby("treatment").mean().T)









    



treatment             0.0          1.0
age             25.053846    25.816216
education       10.088462    10.345946
black            0.826923     0.843243
hispanic         0.107692     0.059459
married          0.153846     0.189189
nodegree         0.834615     0.708108
earnings1974  2107.026658  2095.573689
earnings1975  1266.909002  1532.055314
earnings1978  4554.801126  6349.143530

Thus, we can fit a linear outcome regression model to "clean up" the imbalance. It yields an estimated average treatment effect on the treated of $1698 with a slightly more efficient 95% confidence interval [458, 2938].



In [0]:

    
lm = sm.OLS(df["earnings1978"], df.iloc[:, :-1])
print(lm.fit().summary())









    



                            OLS Regression Results                            
==============================================================================
Dep. Variable:           earnings1978   R-squared:                       0.424
Model:                            OLS   Adj. R-squared:                  0.412
Method:                 Least Squares   F-statistic:                     35.62
Date:                Thu, 11 Jul 2019   Prob (F-statistic):           4.51e-47
Time:                        16:09:58   Log-Likelihood:                -4534.2
No. Observations:                 445   AIC:                             9086.
Df Residuals:                     436   BIC:                             9123.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
treatment     1698.3166    630.972      2.692      0.007     458.192    2938.441
age             59.7197     41.093      1.453      0.147     -21.046     140.486
education      441.4534    114.275      3.863      0.000     216.855     666.052
black        -2088.4813   1127.214     -1.853      0.065   -4303.930     126.967
hispanic       273.2172   1475.048      0.185      0.853   -2625.871    3172.306
married       -162.1316    873.006     -0.186      0.853   -1877.954    1553.691
nodegree        89.9847    728.406      0.124      0.902   -1341.639    1521.609
earnings1974     0.0840      0.077      1.093      0.275      -0.067       0.235
earnings1975     0.0506      0.135      0.375      0.708      -0.215       0.316
==============================================================================
Omnibus:                      283.309   Durbin-Watson:                   2.060
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             3713.374
Skew:                           2.531   Prob(JB):                         0.00
Kurtosis:                      16.216   Cond. No.                     3.58e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.58e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

Observational controls



In [0]:

    
control = ec.data.lalonde.observational_control()
control.head()









    Out[0]:







  
    
      
      treatment
      age
      education
      black
      hispanic
      married
      nodegree
      earnings1974
      earnings1975
      earnings1978
    
  
  
    
      0
      0.0
      45.0
      11.0
      0.0
      0.0
      1.0
      1.0
      21516.670
      25243.550
      25564.670
    
    
      1
      0.0
      21.0
      14.0
      0.0
      0.0
      0.0
      0.0
      3175.971
      5852.565
      13496.080
    
    
      2
      0.0
      38.0
      12.0
      0.0
      0.0
      1.0
      0.0
      23039.020
      25130.760
      25564.670
    
    
      3
      0.0
      48.0
      6.0
      0.0
      0.0
      1.0
      1.0
      24994.370
      25243.550
      25564.670
    
    
      4
      0.0
      18.0
      8.0
      0.0
      0.0
      1.0
      1.0
      1669.295
      10727.610
      9860.869

We create two additional indicators umemployed1974 and umemployed1975 based on the earnings.



In [0]:

    
treated["unemployed1974"] = treated["earnings1974"] == 0
treated["unemployed1975"] = treated["earnings1975"] == 0
control["unemployed1974"] = control["earnings1974"] == 0
control["unemployed1975"] = control["earnings1975"] == 0

The observational control group is significantly different from the experimental treatment group.



In [0]:

    
df = pd.concat([treated, control])
print(df.groupby("treatment").mean().T)









    



treatment                0.0          1.0
age                33.225238    25.816216
education          12.027514    10.345946
black               0.073537     0.843243
hispanic            0.072036     0.059459
married             0.711731     0.189189
nodegree            0.295835     0.708108
earnings1974    14016.800304  2095.573689
earnings1975    13650.803376  1532.055314
earnings1978    14846.659673  6349.143530
unemployed1974      0.119622     0.708108
unemployed1975      0.109305     0.600000

The experimental treated subjects tend to be younger, with less education, black, single, without high school degree, and have smaller pre-intervention earnings.



In [0]:

    
for name in df.columns[1:]:
  axes = df.hist(column=name, by=df["treatment"], figsize=(10, 2))
  axes[0].set_xlabel(name)
  axes[1].set_xlabel(name)

Empirical Calibration

Empirical calibration assigns weights to the observational control subjects such that the reweighted control group equates the treated group on moments of covariates. The weights are also sought to be as uniform as possible.

Prepare Balancing Constraints

There are 8 raw pre-intervention covariates. Simply balancing on their first moment might not be sufficient. We follow Hainmueller (2011)'s analysis and consider a total of 52 balancing constraints, which include single terms, selected one-way interactions, and selected squared terms. Note that balancing on all 52 transformed covariates might be excessive, we choose them simply to reproduce Hainmueller's results.

We use patsy's formula API to construct the covariate matrices, where ":" indicates interaction, and "-" means removes the term on the right.



In [0]:

    
formula = (
    "(age + education + black + hispanic + married + nodegree + earnings1974 +"
    " earnings1975 + unemployed1974 + unemployed1975)**2 + I(age**2) + "
    "I(education**2) - black:hispanic - education:nodegree - "
    "earnings1974:earnings1975 - earnings1974:unemployed1974 - "
    "earnings1975:unemployed1975"
)



In [0]:

    
treated2 = patsy.dmatrix(formula, treated, return_type="dataframe").iloc[:, 1:]
control2 = patsy.dmatrix(formula, control, return_type="dataframe").iloc[:, 1:]

The full list of the 52 balancing constraints is shown below.



In [0]:

    
for i, name in enumerate(control2.columns):
  print("%s: %s" % (i, name))









    



0: unemployed1974[T.True]
1: unemployed1975[T.True]
2: unemployed1974[T.True]:unemployed1975[T.True]
3: age
4: age:unemployed1974[T.True]
5: age:unemployed1975[T.True]
6: education
7: education:unemployed1974[T.True]
8: education:unemployed1975[T.True]
9: black
10: black:unemployed1974[T.True]
11: black:unemployed1975[T.True]
12: hispanic
13: hispanic:unemployed1974[T.True]
14: hispanic:unemployed1975[T.True]
15: married
16: married:unemployed1974[T.True]
17: married:unemployed1975[T.True]
18: nodegree
19: nodegree:unemployed1974[T.True]
20: nodegree:unemployed1975[T.True]
21: earnings1974
22: earnings1974:unemployed1975[T.True]
23: earnings1975
24: earnings1975:unemployed1974[T.True]
25: age:education
26: age:black
27: age:hispanic
28: age:married
29: age:nodegree
30: age:earnings1974
31: age:earnings1975
32: education:black
33: education:hispanic
34: education:married
35: education:earnings1974
36: education:earnings1975
37: black:married
38: black:nodegree
39: black:earnings1974
40: black:earnings1975
41: hispanic:married
42: hispanic:nodegree
43: hispanic:earnings1974
44: hispanic:earnings1975
45: married:nodegree
46: married:earnings1974
47: married:earnings1975
48: nodegree:earnings1974
49: nodegree:earnings1975
50: I(age ** 2)
51: I(education ** 2)

Entropy Balancing

We first use the default ENTROPY objective which mimizes the Kullback-Leibler divergence between the balancing weights and uniform weights.



In [0]:

    
weights, success = ec.calibrate(covariates=control2.values,
                                target_covariates=treated2.values)
success









    Out[0]:





False

It turns out that we need to relax the equality constraint a tiny bit to ensure convergence.



In [0]:

    
weights, success = ec.calibrate(covariates=control2.values,
                                target_covariates=treated2.values,
                                l2_norm=1e-5)
success









    Out[0]:





True

Top 3 control observations account for 12% of total weight.



In [0]:

    
pd.Series(weights).plot(style='o', figsize=(12, 3))









    Out[0]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fbf31da0bd0>

The effective sample size of the control group is merely 103.4, a huge reduction from the original sample size 15992.



In [0]:

    
effective_control_size = 1.0 / np.square(weights).sum()
effective_control_size









    Out[0]:





103.351

We confirm that the means of the raw covariates are matched after weighting.



In [0]:

    
pd.concat([treated.mean(axis=0),
           pd.Series(np.matmul(control.astype('float64').T, weights),
                     index=treated.columns)], axis=1)









    Out[0]:







  
    
      
      0
      1
    
  
  
    
      treatment
      1.000000
      0.000000
    
    
      age
      25.816216
      25.816215
    
    
      education
      10.345946
      10.345945
    
    
      black
      0.843243
      0.843237
    
    
      hispanic
      0.059459
      0.059458
    
    
      married
      0.189189
      0.189194
    
    
      nodegree
      0.708108
      0.708108
    
    
      earnings1974
      2095.573689
      2095.573689
    
    
      earnings1975
      1532.055314
      1532.055314
    
    
      earnings1978
      6349.143530
      4777.895179
    
    
      unemployed1974
      0.708108
      0.708109
    
    
      unemployed1975
      0.600000
      0.600000

The difference in means between the treated group and the reweighted control group yields an average treatment effect on the treated of $1571, which reproduces Hainmueller (2011)'s result.



In [0]:

    
diff = treated["earnings1978"].mean() - np.sum(control["earnings1978"] * weights)
diff









    Out[0]:





1571.248

Under the homoskedastic assumption (i.e., equal unit-level conditional variance), we estimate the variance using formula prescribed in section 19.7 of Imbens and Rubin (2015).



In [0]:

    
sd = np.sqrt(treated["earnings1978"].var(ddof=1) *
             (1 / len(treated["earnings1978"]) + 1 / effective_control_size))
sd









    Out[0]:





773.882

The 95% confidence interval is [55, 3088], wider than the reported [97, 3044] by Hainmueller (2011) (it is unclear what method was used there).



In [0]:

    
(diff - 1.96 * sd, diff + 1.96 * sd)









    Out[0]:





(54.440, 3088.057)

Quadratic Balancing

We then use the QUADRATIC objective which minimizes the Euclidean distance between the balancing weights and the uniform weights.



In [0]:

    
weights, success = ec.calibrate(
    covariates=control2.values,
    target_covariates=treated2.values,
    objective=ec.Objective.QUADRATIC,
    l2_norm=1e-2)
success









    Out[0]:





True

Due to being the explicit optimizing objective, the effective control sample size increases.



In [0]:

    
effective_control_size = 1.0 / np.square(weights).sum()
effective_control_size









    Out[0]:





150.193



In [0]:

    
pd.Series(weights).plot(style='o', figsize=(12, 3))









    Out[0]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fbf31604a90>

The point estimate $1713 is getting closer to the experimental benchmark $1794.



In [0]:

    
diff = treated["earnings1978"].mean() - np.sum(control["earnings1978"] * weights)
diff









    Out[0]:





1713.270

The 95% confidence interval becomes narrower due to increased effective control sample size.



In [0]:

    
sd = np.sqrt(treated["earnings1978"].var(ddof=1) *
             (1 / len(treated["earnings1978"]) + 1 / effective_control_size))
sd









    Out[0]:





641.958



In [0]:

    
(diff - 1.96 * sd, diff + 1.96 * sd)









    Out[0]:





(455.032, 2971.507)

Imposing an Upper Bound on Weights

For both entropy balancing and quadratic balancing, one can optionally impose an upper bound on the weights to avoid extreme individual weights. For illustration, we impose an upper bound 0.015 on quadratic balancing weights, i.e., no single control individual accounts for 1.5% or more of total weights.



In [0]:

    
weights, success = ec.calibrate(
    covariates=control2.values,
    target_covariates=treated2.values,
    objective=ec.Objective.QUADRATIC,
    max_weight=0.015,
    l2_norm=1e-2)
success









    Out[0]:





True

We can confirm the resulting weights are indeed capped at 0.015.



In [0]:

    
pd.Series(weights).plot(style='o', figsize=(12, 3))









    Out[0]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fbf310f4710>

The cost of this additional constraint is a small loss in effective sample size.



In [0]:

    
effective_control_size = 1.0 / np.square(weights).sum()
effective_control_size









    Out[0]:





146.060

Note that this additional weights constraint is an integral part of the optimization, as opposed to a separate trimming step used with propensity score weighting type method. We don't sacrifice the balancing conditions.



In [0]:

    
pd.concat([treated.mean(axis=0),
           pd.Series(np.matmul(control.astype('float64').T, weights),
                     index=treated.columns)], axis=1)









    Out[0]:







  
    
      
      0
      1
    
  
  
    
      treatment
      1.000000
      0.000000
    
    
      age
      25.816216
      25.814217
    
    
      education
      10.345946
      10.343364
    
    
      black
      0.843243
      0.843853
    
    
      hispanic
      0.059459
      0.059537
    
    
      married
      0.189189
      0.191219
    
    
      nodegree
      0.708108
      0.713018
    
    
      earnings1974
      2095.573689
      2095.573688
    
    
      earnings1975
      1532.055314
      1532.055313
    
    
      earnings1978
      6349.143530
      4510.605801
    
    
      unemployed1974
      0.708108
      0.710463
    
    
      unemployed1975
      0.600000
      0.601687

The point estimate is very close to the experimental estimate $1794.



In [0]:

    
diff = treated["earnings1978"].mean() - np.sum(control["earnings1978"] * weights)
diff









    Out[0]:





1838.538

	treatment	age	education	black	hispanic	married	nodegree	earnings1978
0	1.0	37.0	11.0	1.0	0.0	1.0	1.0	9930.0460
1	1.0	22.0	9.0	0.0	1.0	0.0	1.0	3595.8940
2	1.0	30.0	12.0	1.0	0.0	0.0	0.0	24909.4500
3	1.0	27.0	11.0	1.0	0.0	0.0	1.0	7506.1460
4	1.0	33.0	8.0	1.0	0.0	0.0	1.0	289.7899

	age	education	black	nodegree	earnings1978
0	23.0	10.0	1.0	1.0	0.00
1	26.0	12.0	0.0	0.0	12383.68
2	22.0	9.0	1.0	1.0	0.00
3	18.0	9.0	1.0	1.0	10740.08
4	45.0	11.0	1.0	1.0	11796.47

	age	education	married	nodegree	earnings1974	earnings1975	earnings1978
0	45.0	11.0	1.0	1.0	21516.670	25243.550	25564.670
1	21.0	14.0	0.0	0.0	3175.971	5852.565	13496.080
2	38.0	12.0	1.0	0.0	23039.020	25130.760	25564.670
3	48.0	6.0	1.0	1.0	24994.370	25243.550	25564.670
4	18.0	8.0	1.0	1.0	1669.295	10727.610	9860.869

	0	1
treatment	1.000000	0.000000
age	25.816216	25.816215
education	10.345946	10.345945
black	0.843243	0.843237
hispanic	0.059459	0.059458
married	0.189189	0.189194
nodegree	0.708108	0.708108
earnings1974	2095.573689	2095.573689
earnings1975	1532.055314	1532.055314
earnings1978	6349.143530	4777.895179
unemployed1974	0.708108	0.708109
unemployed1975	0.600000	0.600000