In [0]:
#@title Copyright 2019 The Empirical Calibration Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
We illustrate the use of empirical calibration on the observational part of the LaLonde data to evaluate the effect of a job training program.
The LaLonde (1986) data is a canonical benchmark in the causal inference literature. It consists of three groups for evaluating the effect of a large scale job training program — the National Supported Work Demonstration (NSW):

* an experimental treatment group of 185 program participants;
* an experimental control group of 260 subjects from the same randomized evaluation;
* an observational control group of 15,992 subjects drawn from the Current Population Survey (CPS), none of whom participated in the program.
The outcome variable is the post-intervention earnings in 1978. The following pre-intervention covariates are available for all three groups:

Name | Description
---|---
age | age in years.
education | years of schooling.
black | 1 if black, 0 otherwise.
hispanic | 1 if hispanic, 0 otherwise.
married | 1 if married, 0 otherwise.
nodegree | 1 if no high school degree, 0 otherwise.
earnings1974 | pre-intervention earnings in 1974.
earnings1975 | pre-intervention earnings in 1975.
In [0]:
import numpy as np
import pandas as pd
import patsy
import seaborn as sns
import statsmodels.api as sm
# install and import ec
!pip install -q git+https://github.com/google/empirical_calibration
import empirical_calibration as ec
sns.set_style('whitegrid')
%config InlineBackend.figure_format='retina'
%precision 3
The data are downloaded from Dehejia and Wahba (1999)'s data hosting website via functions under `ec.data.lalonde`.
In [0]:
#@title experimental treatment group
treated = ec.data.lalonde.experimental_treated()
treated.head()
Out[0]:
In [0]:
#@title experimental control group
control = ec.data.lalonde.experimental_control()
control.head()
Out[0]:
We use two methods to estimate the average treatment effect on the treated: a simple difference in means, and a linear regression adjustment.
In [0]:
diff = treated["earnings1978"].mean() - control["earnings1978"].mean()
diff
Out[0]:
Using the estimator for the sampling variance (see formula 6.18 of Imbens and Rubin (2015)), $\widehat{\mathbb{V}} = s_t^2/N_t + s_c^2/N_c$, we estimate the 95% $z$-interval of the difference as [479, 3109].
In [0]:
# standard error of the difference in means: sqrt(s_t^2 / N_t + s_c^2 / N_c)
sd = np.sqrt(
    treated["earnings1978"].var(ddof=1) / len(treated["earnings1978"]) +
    control["earnings1978"].var(ddof=1) / len(control["earnings1978"]))
sd
Out[0]:
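The interval follows directly from diff and sd; a minimal sketch of the computation (1.96 is the standard normal 97.5% quantile):
In [0]:
# 95% z-interval for the simple difference in means: diff +/- 1.96 * sd
(diff - 1.96 * sd, diff + 1.96 * sd)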
In [0]:
df = pd.concat([treated, control])
print(df.groupby("treatment").mean().T)
As expected from randomization, the two experimental groups are roughly balanced, though small differences in covariate means remain. Thus, we can fit a linear outcome regression model to "clean up" the residual imbalance. It yields an estimated average treatment effect on the treated of $1698, with a slightly more efficient 95% confidence interval [458, 2938].
In [0]:
# regress 1978 earnings on all remaining columns (treatment indicator and covariates)
lm = sm.OLS(df["earnings1978"], df.iloc[:, :-1])
print(lm.fit().summary())
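The same estimate and interval can be read off the fitted model directly; a sketch, assuming the treatment indicator column treatment is included in the design matrix built above:
In [0]:
# coefficient on the treatment indicator and its 95% confidence interval
fit = lm.fit()
print(fit.params["treatment"])
print(fit.conf_int().loc["treatment"])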
We now replace the experimental control group with the much larger observational control group.
In [0]:
#@title observational control group
control = ec.data.lalonde.observational_control()
control.head()
Out[0]:
We create two additional indicators, `unemployed1974` and `unemployed1975`, based on the earnings.
In [0]:
treated["unemployed1974"] = treated["earnings1974"] == 0
treated["unemployed1975"] = treated["earnings1975"] == 0
control["unemployed1974"] = control["earnings1974"] == 0
control["unemployed1975"] = control["earnings1975"] == 0
The observational control group is significantly different from the experimental treatment group.
In [0]:
df = pd.concat([treated, control])
print(df.groupby("treatment").mean().T)
The experimental treated subjects tend to be younger, less educated, more likely to be black, single, and without a high school degree, and have much lower pre-intervention earnings.
In [0]:
for name in df.columns[1:]:
  axes = df.hist(column=name, by=df["treatment"], figsize=(10, 2))
  axes[0].set_xlabel(name)
  axes[1].set_xlabel(name)
Empirical calibration assigns weights to the observational control subjects such that the reweighted control group matches the treated group on moments of the covariates. Among all weights satisfying these balance conditions, it seeks the ones that are as close to uniform as possible.
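Formally, entropy balancing (the default ENTROPY objective used below) solves the program given in Hainmueller (2011). With $n_c$ control units, covariate vectors $X_i$, and treated covariate means $\bar{X}_t$:

$$\min_{w} \sum_{i=1}^{n_c} w_i \log(n_c \, w_i) \quad \text{subject to} \quad \sum_{i=1}^{n_c} w_i X_i = \bar{X}_t, \quad \sum_{i=1}^{n_c} w_i = 1, \quad w_i \ge 0.$$

The objective is the Kullback-Leibler divergence from the uniform weights $1/n_c$.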
There are 8 raw pre-intervention covariates. Simply balancing on their first moments might not be sufficient. We follow Hainmueller (2011)'s analysis and consider a total of 52 balancing constraints, which include single terms, selected one-way interactions, and selected squared terms. Note that balancing on all 52 transformed covariates might be excessive; we choose them simply to reproduce Hainmueller's results.
We use `patsy`'s formula API to construct the covariate matrices, where `:` indicates an interaction and `-` removes the term on its right.
In [0]:
formula = (
    "(age + education + black + hispanic + married + nodegree + earnings1974 +"
    " earnings1975 + unemployed1974 + unemployed1975)**2 + I(age**2) + "
    "I(education**2) - black:hispanic - education:nodegree - "
    "earnings1974:earnings1975 - earnings1974:unemployed1974 - "
    "earnings1975:unemployed1975"
)
In [0]:
# construct design matrices and drop the intercept column added by patsy
treated2 = patsy.dmatrix(formula, treated, return_type="dataframe").iloc[:, 1:]
control2 = patsy.dmatrix(formula, control, return_type="dataframe").iloc[:, 1:]
The full list of the 52 balancing constraints is shown below.
In [0]:
for i, name in enumerate(control2.columns):
  print("%s: %s" % (i, name))
We first use the default `ENTROPY` objective, which minimizes the Kullback-Leibler divergence between the balancing weights and the uniform weights.
In [0]:
weights, success = ec.calibrate(covariates=control2.values,
                                target_covariates=treated2.values)
success
Out[0]:
It turns out that we need to relax the exact equality constraint a tiny bit to ensure convergence: the l2_norm argument allows the weighted covariate means to deviate from the target means by up to that amount in $\ell_2$ norm.
In [0]:
weights, success = ec.calibrate(covariates=control2.values,
                                target_covariates=treated2.values,
                                l2_norm=1e-5)
success
Out[0]:
The top 3 control observations account for 12% of the total weight.
In [0]:
pd.Series(weights).plot(style='o', figsize=(12, 3))
Out[0]:
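The share held by the largest weights can be checked directly; a quick sanity check (normalizing by the total in case the weights do not sum exactly to 1):
In [0]:
# fraction of total weight carried by the three largest weights
np.sort(weights)[-3:].sum() / weights.sum()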
The effective sample size of the control group, computed as $1/\sum_i w_i^2$ for weights that sum to 1 (Kish's formula), is merely 103.4, a huge reduction from the original sample size of 15,992.
In [0]:
effective_control_size = 1.0 / np.square(weights).sum()
effective_control_size
Out[0]:
We confirm that the means of the raw covariates are matched after weighting.
In [0]:
# unweighted treated means next to the weighted control means
pd.concat([treated.mean(axis=0),
           pd.Series(np.matmul(control.astype('float64').T, weights),
                     index=treated.columns)], axis=1)
Out[0]:
The difference in means between the treated group and the reweighted control group yields an average treatment effect on the treated of $1571, which reproduces Hainmueller (2011)'s result.
In [0]:
diff = treated["earnings1978"].mean() - np.sum(control["earnings1978"] * weights)
diff
Out[0]:
Under the homoskedasticity assumption (i.e., equal unit-level conditional variance), we estimate the variance using the formula prescribed in Section 19.7 of Imbens and Rubin (2015).
In [0]:
sd = np.sqrt(treated["earnings1978"].var(ddof=1) *
             (1 / len(treated["earnings1978"]) + 1 / effective_control_size))
sd
Out[0]:
The 95% confidence interval is [55, 3088], wider than the [97, 3044] reported by Hainmueller (2011) (it is unclear what method was used there).
In [0]:
(diff - 1.96 * sd, diff + 1.96 * sd)
Out[0]:
We then use the `QUADRATIC` objective, which minimizes the Euclidean distance between the balancing weights and the uniform weights.
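In the same notation as above, the quadratic objective replaces the Kullback-Leibler divergence with the squared Euclidean distance to the uniform weights, keeping the balancing constraints unchanged:

$$\min_{w} \sum_{i=1}^{n_c} \left(w_i - \frac{1}{n_c}\right)^2 \quad \text{subject to} \quad \sum_{i=1}^{n_c} w_i X_i = \bar{X}_t, \quad \sum_{i=1}^{n_c} w_i = 1, \quad w_i \ge 0.$$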
In [0]:
weights, success = ec.calibrate(
    covariates=control2.values,
    target_covariates=treated2.values,
    objective=ec.Objective.QUADRATIC,
    l2_norm=1e-2)
success
Out[0]:
Since for weights summing to 1 the quadratic objective equals $\sum_i w_i^2 - 1/n_c$, minimizing it is the same as maximizing the effective sample size $1/\sum_i w_i^2$. Now that it is the explicit optimization objective, the effective control sample size increases.
In [0]:
effective_control_size = 1.0 / np.square(weights).sum()
effective_control_size
Out[0]:
In [0]:
pd.Series(weights).plot(style='o', figsize=(12, 3))
Out[0]:
The point estimate of $1713 is closer to the experimental benchmark of $1794.
In [0]:
diff = treated["earnings1978"].mean() - np.sum(control["earnings1978"] * weights)
diff
Out[0]:
The 95% confidence interval becomes narrower due to the increased effective control sample size.
In [0]:
sd = np.sqrt(treated["earnings1978"].var(ddof=1) *
             (1 / len(treated["earnings1978"]) + 1 / effective_control_size))
sd
Out[0]:
In [0]:
(diff - 1.96 * sd, diff + 1.96 * sd)
Out[0]:
For both entropy balancing and quadratic balancing, one can optionally impose an upper bound on the weights to avoid extreme individual weights. For illustration, we impose an upper bound of 0.015 on the quadratic balancing weights, i.e., no single control individual accounts for more than 1.5% of the total weight.
In [0]:
weights, success = ec.calibrate(
    covariates=control2.values,
    target_covariates=treated2.values,
    objective=ec.Objective.QUADRATIC,
    max_weight=0.015,
    l2_norm=1e-2)
success
Out[0]:
We can confirm the resulting weights are indeed capped at 0.015.
In [0]:
pd.Series(weights).plot(style='o', figsize=(12, 3))
Out[0]:
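Numerically, the largest weight should not exceed the bound:
In [0]:
# maximum individual weight; should be at most 0.015
weights.max()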
The cost of this additional constraint is a small loss in effective sample size.
In [0]:
effective_control_size = 1.0 / np.square(weights).sum()
effective_control_size
Out[0]:
Note that this additional weight constraint is an integral part of the optimization, as opposed to the separate trimming step typically used with propensity-score weighting methods; the balancing conditions are not sacrificed.
In [0]:
pd.concat([treated.mean(axis=0),
           pd.Series(np.matmul(control.astype('float64').T, weights),
                     index=treated.columns)], axis=1)
Out[0]:
The point estimate is very close to the experimental estimate of $1794.
In [0]:
diff = treated["earnings1978"].mean() - np.sum(control["earnings1978"] * weights)
diff
Out[0]:
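For completeness, the same homoskedastic variance formula used above yields an interval around this final estimate; a sketch mirroring the earlier cells (effective_control_size here is the one computed for the capped weights):
In [0]:
# 95% z-interval under the homoskedastic formula of Section 19.7
sd = np.sqrt(treated["earnings1978"].var(ddof=1) *
             (1 / len(treated["earnings1978"]) + 1 / effective_control_size))
(diff - 1.96 * sd, diff + 1.96 * sd)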