Decision support systems (DSS) are information systems that help people make decisions in a particular context, such as medical diagnosis, loan granting, or hiring. DSSs have traditionally been built on expert-derived, rule-based methods, but as machine learning (ML) is integrated into these systems, we need better tools to measure and mitigate discriminatory patterns in both the training data and the predictions made by ML models.
This tutorial introduces themis-ml [1], an open source Python library for measuring and reducing potential discrimination (PD) in machine learning systems.
At a high level, themis-ml defines discrimination as something that occurs when an action is based on biases that systematically benefit one group of people over another with respect to certain social attributes (legally known as protected classes, such as race, gender, and religion).
In the machine learning context, this means that an ML model is discriminatory if it generates predictions that systematically benefit one social group over another. Themis-ml builds on the sklearn API [2] to provide a fairness-aware machine learning interface (FMLI), which defines an interface that incorporates discrimination discovery and fairness-aware methods into a typical ML workflow.
You can find the original source code for this document here.
In this tutorial we'll use the German Credit Dataset to build intuition around a few of the main concepts in discrimination discovery and fairness-aware machine learning. Specifically, this tutorial will go over how to:

1. Measure potential discrimination in a dataset with respect to a binary target variable credit risk and two socially sensitive attributes sex and foreigner.
2. Compare the fairness-aware estimators provided by themis-ml with a baseline fairness-unaware model.

Currently Python 2.7 and 3.6 are supported. You can use conda or pip to install themis-ml.
pip install themis-ml
It's highly recommended that you use conda. To install conda, follow the instructions here.
# create virtual environment
conda create -n themis-ml python=<version> # <version> = 2.7 or 3.6
# install themis-ml
conda install -c cosmicbboy themis-ml
In [1]:
from themis_ml.datasets import german_credit
raw_data = german_credit(raw=True)
raw_data.head()
Out[1]:
For the purposes of this tutorial, we'll use mean difference as our measure of potential discrimination with respect to a binary target variable credit risk and two protected classes sex and immigration status.
This metric belongs to a class of group-level discrimination measures that captures differences in outcome between populations, e.g. female vs. male [3]. In contrast, individual-level measures capture differences in outcome between an individual and a set of similar peers, where similarity is formalized as some distance function parameterized by the feature space.
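To make the group-level idea concrete, here is a minimal sketch of the mean difference point estimate (themis-ml's mean_difference also returns a confidence interval), assuming the encoding used below, where s = 1 marks the potentially disadvantaged group and y = 1 is the beneficial outcome. The helper name is hypothetical, not part of the library:

import numpy as np

def mean_difference_point_estimate(y, s):
    # mean beneficial outcome for the advantaged group (s == 0) minus that
    # of the disadvantaged group (s == 1); positive values suggest the
    # advantaged group receives the beneficial outcome more often
    y, s = np.asarray(y), np.asarray(s)
    return y[s == 0].mean() - y[s == 1].mean()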
The assumptions that we'll make in measuring PD in this exercise are that a low credit risk (credit_risk = 1) is the beneficial outcome, and that women and foreign workers are the potentially disadvantaged groups (encoded as 1 in the protected attribute variables below).
In [2]:
from themis_ml.metrics import mean_difference
# target variable
# values: 1 = low credit risk, 0 = high credit risk
credit_risk = raw_data["credit_risk"]
# display frequency counts of each value
credit_risk.value_counts()
Out[2]:
In [3]:
# get sex of the individual from the "personal_status_and_sex" column.
# values: 1 = female, 0 = male
sex = raw_data["personal_status_and_sex"].map(
lambda x: {"male": 0, "female": 1}[x.split("_")[0]])
# display frequency counts of each value
sex.value_counts()
Out[3]:
In [4]:
# get foreign worker status: 1 = yes, 0 = no
foreign = raw_data["foreign_worker"]
# display frequency counts of each value
foreign.value_counts()
Out[4]:
In [5]:
print("Mean difference scores:")
print("protected class = sex: %0.03f, 95%% CI [%0.03f-%0.03f]" %
mean_difference(credit_risk, sex))
# 0.0748013090229
print("protected class = foreign: %0.03f, 95%% CI [%0.03f-%0.03f]" %
mean_difference(credit_risk, foreign))
# 0.199264685246
The mean differences above suggest that men and citizen workers are more likely to have low credit risks compared to women and foreign workers, respectively.
The themis-ml metrics functions compute confidence intervals by default, since it's important to come up with confidence bounds for our estimate of potential discrimination.
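As a rough sketch of where such bounds can come from (not necessarily the exact method themis-ml uses internally), a 95% confidence interval for mean difference can be approximated with the normal approximation to a difference of two proportions. The function name here is hypothetical:

import numpy as np

def mean_difference_ci(y, s, z=1.96):
    # rough 95% CI via the normal approximation to a difference of proportions
    y, s = np.asarray(y), np.asarray(s)
    p0, n0 = y[s == 0].mean(), (s == 0).sum()
    p1, n1 = y[s == 1].mean(), (s == 1).sum()
    md = p0 - p1
    se = np.sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)
    return md, md - z * se, md + z * se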
In this experiment, we compare a baseline fairness-unaware model (B) against three fairness-aware conditions, all using LogisticRegression as the classifier to keep things simple: removing protected attributes (RPA), reject-option classification (ROC), and additive counterfactually fair modeling (ACF).
In this toy example, we'll be using mean_difference as our "fairness" metric, and area under the curve (auc) as our "utility" metric.
ROC works by training an initial classifier on your dataset $D$, generating predicted probabilities on the test set, and then computing the proximity of each prediction to the decision boundary learned by the classifier [5].
Predictions that fall within the critical region defined by the threshold $\theta$, where $0.5 < \theta < 1$, are then relabeled: disadvantaged observations $X_d$ are assigned the label $y = 1$ and advantaged observations $X_a$ are assigned $y = 0$.
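The following is a minimal sketch of this relabeling rule (not the SingleROClassifier implementation used below), assuming s == 1 marks the disadvantaged group; the function name and the theta value are hypothetical:

import numpy as np

def reject_option_relabel(proba, s, theta=0.6):
    # proba: predicted probability of the beneficial outcome y == 1
    # s: protected attribute, where s == 1 marks the disadvantaged group
    # theta: critical region threshold, 0.5 < theta < 1
    proba, s = np.asarray(proba), np.asarray(s)
    # default decision: standard 0.5 threshold
    y_pred = (proba > 0.5).astype(int)
    # predictions close to the decision boundary fall in the critical region
    in_critical_region = np.abs(proba - 0.5) < (theta - 0.5)
    # within the critical region, assign the beneficial label to the
    # disadvantaged group and the non-beneficial label to the advantaged group
    y_pred[in_critical_region & (s == 1)] = 1
    y_pred[in_critical_region & (s == 0)] = 0
    return y_pred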
ACF, as described by [6] within the framework of counterfactual fairness, models the correlations between $s$ and the features in $X$ by training a linear model to predict each feature $X_j$ using $s$ as input.
Then, we can compute the residuals $\epsilon_{ij}$ between predicted and true feature values for each observation $i$ and feature $j$. The final model is then trained on $\epsilon_{ij}$ as features to predict $y$.
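Here is a minimal sketch of that idea, using plain linear regressions for every feature; the function name is hypothetical, and LinearACFClassifier's internal handling of binary features and residual types may differ:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def fit_acf_sketch(X, y, s):
    # regress each feature on the protected attribute s and keep the residuals
    X = np.asarray(X, dtype=float)
    s = np.asarray(s).reshape(-1, 1)
    residuals = np.empty_like(X)
    feature_models = []
    for j in range(X.shape[1]):
        fm = LinearRegression().fit(s, X[:, j])
        feature_models.append(fm)
        # epsilon_ij = observed feature value minus value predicted from s
        residuals[:, j] = X[:, j] - fm.predict(s)
    # the target model only sees the part of X not linearly explained by s
    target_model = LogisticRegression().fit(residuals, y)
    return feature_models, target_model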
Before training our models, we first need to set up our experiment harness, which includes the feature sets, inputs $X$, output $y$, and protected attributes $s_{sex}$ and $s_{foreigner}$. Here we'll use 5-fold, 20x RepeatedStratifiedKFold cross-validation in order to obtain a better estimate of test performance and fairness.
In [6]:
from sklearn.model_selection import train_test_split
# load model-ready data
training_data = german_credit()
training_data.head()
# define feature sets:
# 1. including protected attributes
features = [
'duration_in_month', 'credit_amount', 'installment_rate_in_percentage_of_disposable_income',
'present_residence_since', 'age_in_years', 'number_of_existing_credits_at_this_bank',
'number_of_people_being_liable_to_provide_maintenance_for', 'status_of_existing_checking_account',
'savings_account/bonds', 'present_employment_since', 'job', 'telephone', 'foreign_worker',
'credit_history_all_credits_at_this_bank_paid_back_duly',
'credit_history_critical_account/other_credits_existing_not_at_this_bank',
'credit_history_delay_in_paying_off_in_the_past', 'credit_history_existing_credits_paid_back_duly_till_now',
'credit_history_no_credits_taken/all_credits_paid_back_duly', 'purpose_business',
'purpose_car_(new)', 'purpose_car_(used)', 'purpose_domestic_appliances', 'purpose_education',
'purpose_furniture/equipment', 'purpose_others', 'purpose_radio/television',
'purpose_repairs', 'purpose_retraining', 'personal_status_and_sex_female_divorced/separated/married',
'personal_status_and_sex_male_divorced/separated', 'personal_status_and_sex_male_married/widowed',
'personal_status_and_sex_male_single', 'other_debtors/guarantors_co-applicant',
'other_debtors/guarantors_guarantor', 'other_debtors/guarantors_none',
'property_building_society_savings_agreement/life_insurance',
'property_car_or_other', 'property_real_estate', 'property_unknown/no_property',
'other_installment_plans_bank', 'other_installment_plans_none',
'other_installment_plans_stores', 'housing_for free', 'housing_own', 'housing_rent',
]
# 2. removing variables related to sex
features_no_sex = [
f for f in features if f not in [
'personal_status_and_sex_female_divorced/separated/married',
'personal_status_and_sex_male_divorced/separated',
'personal_status_and_sex_male_married/widowed',
'personal_status_and_sex_male_single']]
# 3. removing variables related to immigration status
features_no_frn = [f for f in features if f != "foreign_worker"]
X = training_data[features].values
X_no_sex = training_data[features_no_sex].values
X_no_frn = training_data[features_no_frn].values
y = training_data["credit_risk"].values
s_sex = sex.values
s_frn = foreign.values
Now we're ready to train our models.
In [12]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.base import clone
from themis_ml.postprocessing.reject_option_classification import \
SingleROClassifier
from themis_ml.linear_model import LinearACFClassifier
METRICS_COLUMNS = [
"mean_diff_sex", "mean_diff_foreign", "auc_sex", "auc_foreign"]
def run_experiment_iteration(
X, X_no_sex, X_no_frn, y, s_sex, s_frn, train, test):
"""Run the experiment on a particular set of train and test indices."""
# store our metrics here. This will be a list of lists, where the inner
# list contains the following metadata:
# - "name"
# - fairness metric with respect to sex
# - fairness metric with respect to foreign status
# - utility metric with respect to sex
# - utility metric with respect to foreign status
metrics = []
# define our models. Use clone so that each condition gets its own
# independent copy of the base logistic regression estimator.
logistic_clf = LogisticRegression(penalty="l2", C=0.001, class_weight="balanced")
baseline_clf = clone(logistic_clf)
rpa_clf = clone(logistic_clf)
roc_clf = SingleROClassifier(estimator=logistic_clf)
acf_clf = LinearACFClassifier(
target_estimator=logistic_clf,
binary_residual_type="absolute")
# train baseline model
baseline_clf.fit(X[train], y[train])
baseline_preds = baseline_clf.predict(X[test])
baseline_auc = roc_auc_score(y[test], baseline_preds)
metrics.append([
"B",
mean_difference(baseline_preds, s_sex[test])[0],
mean_difference(baseline_preds, s_frn[test])[0],
baseline_auc, baseline_auc # repeated because the two AUC values are the
# same in the baseline case
])
# train "remove protected attributes" model. Here we have to train two
# separate ones for sex and foreign status.
# model trained with no explicitly sex-related variables
rpa_preds_no_sex = rpa_clf.fit(
X_no_sex[train], y[train]).predict(X_no_sex[test])
# model trained with no explicitly foreign-related variables
rpa_preds_no_frn = rpa_clf.fit(
X_no_frn[train], y[train]).predict(X_no_frn[test])
metrics.append([
"RPA",
mean_difference(rpa_preds_no_sex, s_sex[test])[0],
mean_difference(rpa_preds_no_frn, s_frn[test])[0],
roc_auc_score(y[test], rpa_preds_no_sex),
roc_auc_score(y[test], rpa_preds_no_frn),
])
# train reject-option classification model.
roc_clf.fit(X[train], y[train])
roc_preds_sex = roc_clf.predict(X[test], s_sex[test])
roc_preds_frn = roc_clf.predict(X[test], s_frn[test])
metrics.append([
"ROC",
mean_difference(roc_preds_sex, s_sex[test])[0],
mean_difference(roc_preds_frn, s_frn[test])[0],
roc_auc_score(y[test], roc_preds_sex),
roc_auc_score(y[test], roc_preds_frn),
])
# train additive counterfactually fair model.
acf_preds_sex = acf_clf.fit(
X[train], y[train], s_sex[train]).predict(X[test], s_sex[test])
acf_preds_frn = acf_clf.fit(
X[train], y[train], s_frn[train]).predict(X[test], s_frn[test])
metrics.append([
"ACF",
mean_difference(acf_preds_sex, s_sex[test])[0],
mean_difference(acf_preds_frn, s_frn[test])[0],
roc_auc_score(y[test], acf_preds_sex),
roc_auc_score(y[test], acf_preds_frn),
])
# convert metrics list of lists into dataframe
return pd.DataFrame(
metrics, columns=["condition"] + METRICS_COLUMNS)
In [16]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
N_SPLITS = 5
N_REPEATS = 20
# add these two binary variables so that we can stratify the observations
# by protected class
groups = np.add(s_sex, s_frn)
# do 5-fold, 20x repeated cross-validation so that we can quantify the
# uncertainty around our metric estimates.
cv = RepeatedStratifiedKFold(
n_splits=N_SPLITS, n_repeats=N_REPEATS, random_state=41)
metrics = []
print("Running cross-validation experiment...")
for i, (train_idx, test_idx) in enumerate(cv.split(X, y, groups=groups)):
metrics.append(
run_experiment_iteration(
X, X_no_sex, X_no_frn, y, s_sex, s_frn, train_idx, test_idx)
.assign(rep_fold=i))
# concatenate metrics from all cv-folds
metrics = pd.concat(metrics)
# compute mean point estimate for each metric and each condition
group_df = metrics.groupby("condition")
mean_metrics = (
group_df
[METRICS_COLUMNS].mean()
)
# compute standard error of the mean
stderr_metrics = (
group_df
[METRICS_COLUMNS].std()
) / np.sqrt(N_REPEATS * N_SPLITS)
In [17]:
# plot horizontal bar chart
ax = mean_metrics.loc[reversed(["B", "RPA", "ROC", "ACF"])].plot(
kind="barh", figsize=(10, 6),
xerr=stderr_metrics.loc[reversed(["B", "RPA", "ROC", "ACF"])],
legend=False);
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 0.6))
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.tick_params(axis='y', which='both', left=False)
ax.set_title(
"Fairness (mean diff) and Utility (auc) Metrics", fontsize=16);
In [20]:
mean_metrics.loc[["B", "RPA", "ROC", "ACF"]].rename(
columns=lambda x: "mean(%s)" % x)
Out[20]:
We'll conclude this tutorial by interpreting the results in the de-biasing experiment we just ran.
In the plot that we just made, we can note a few interesting things:
- Removing the explicitly sex-related variables (RPA) reduces mean difference by roughly 2% points compared to baseline (B), while mean auc values are approximately the same between the two conditions.
- There is little to no change in mean difference when removing the $s_{foreigner}$ variable (RPA) compared to baseline (B), highlighting the fact that the naive fairness-aware approach of removing sensitive attributes doesn't necessarily result in a fairer model.
- The reject-option classification (ROC) models substantially reduce mean difference compared to baseline (~13% points for $s_{sex}$ and ~20% points for $s_{foreigner}$), meaning that the predictions made by these models reduce potential discrimination, but at the cost of about 5% points of auc.
- ROC in fact produces a slightly negative mean difference with respect to $s_{sex}$, meaning that we're actually now slightly favoring women over men when predicting the beneficial low credit risk outcome.
- The additive counterfactually fair (ACF) model also reduces mean difference with respect to $s_{sex}$, and achieves a ~19% point reduction with respect to $s_{foreigner}$. However, unlike ROC, we maintain an auc of about 61%, even though we're making fairer predictions.

These observations highlight the fact that with certain methods like ROC, we see evidence of the fairness-utility tradeoff, but with others, like ACF, it's possible to produce a model that reduces potential discrimination in the predictions while preserving its utility with respect to some measure of predictive power (in this case, auc).
themis-ml is designed to be a flexible tool for measuring and reducing potential discrimination in the supervised learning setting for classification tasks on arbitrary datasets. Future development will add support for fairness-aware regression estimators, as well as other cases such as multi-class classification and multi-valued protected class attributes.
You can read in your data as a pandas.DataFrame or numpy.array, and you should be able to use the Estimator and Scorer APIs as you just did with the German Credit dataset.
In this exercise, you looked at a fairly simplistic example of how one might de-bias a classifier, but the real world is clearly much more complicated. If you'd like to contribute to this project, please feel free to submit issues in the GitHub repo; pull requests are welcome!