Licensed under the Apache License, Version 2.0 (the 'License'); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This colab contains TensorFlow code for implementing the constrained optimization methods presented in the paper:

Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Serena Wang, 'Pairwise Fairness for Ranking and Regression', AAAI 2020. [link]

First, let's install and import the relevant libraries.


In [0]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
import sys
from sklearn import model_selection
import tensorflow as tf

We will need the TensorFlow Constrained Optimization (TFCO) library.


In [0]:
!pip install git+https://github.com/google-research/tensorflow_constrained_optimization

In [0]:
import tensorflow_constrained_optimization as tfco

Pairwise Ranking Fairness

We will be training a linear ranking model $f(x) = w^\top x$ where $x \in \mathbb{R}^d$ is a set of features for a query-document pair. Our goal is to train the model such that it accurately ranks the positive documents in a query above the negative ones.

Specifically, for the ranking model $f$, we denote:

  • $err(f)$ as the pairwise ranking error for model $f$ over all pairs of positive and negative documents $$ err(f) = \mathbf{E}\big[\mathbb{I}\big(f(x) < f(x')\big) \,\big|\, y = 1,~ y' = 0\big] $$
  • $err_{i,j}(f)$ as the pairwise ranking error over positive-negative document pairs where the pos. document is from group $i$, and the neg. document is from group $j$.
$$ err_{i, j}(f) = \mathbf{E}\big[\mathbb{I}\big(f(x) < f(x')\big) \,\big|\, y = 1,~ y' = 0,~ grp(x) = i, ~grp(x') = j\big] $$


We then wish to solve the following constrained problem: $$min_f\; err(f)$$ $$\text{ s.t. } |err_{i,j}(f) - err_{k,\ell}(f)| \leq \epsilon \;\;\; \forall ((i,j), (k,\ell)) \in \mathcal{G},$$

where $\mathcal{G}$ contains the pairs we are interested in constraining.

Load Communities & Crime Data

We will use the benchmark Communities and Crimes dataset from the UCI Machine Learning repository for our illustration. This dataset contains various demographic and racial distribution details (aggregated from census and law enforcement data sources) about different communities in the US, along with the per capita crime rate in each commmunity. As is commonly done in the literature, we will bin the crime rate attribute into two categories: "low crime" and "high crime", and formulate the task of ranking the communities such that the high crime ones are above the low crime ones. We consider communities where the percentage of black population is above the 70-th percentile as the protected group.


In [0]:
# We will divide the data into 10 batches, and treat each of them as a query.
num_queries = 10

# List of column names in the dataset.
column_names = ["state", "county", "community", "communityname", "fold", "population", "householdsize", "racepctblack", "racePctWhite", "racePctAsian", "racePctHisp", "agePct12t21", "agePct12t29", "agePct16t24", "agePct65up", "numbUrban", "pctUrban", "medIncome", "pctWWage", "pctWFarmSelf", "pctWInvInc", "pctWSocSec", "pctWPubAsst", "pctWRetire", "medFamInc", "perCapInc", "whitePerCap", "blackPerCap", "indianPerCap", "AsianPerCap", "OtherPerCap", "HispPerCap", "NumUnderPov", "PctPopUnderPov", "PctLess9thGrade", "PctNotHSGrad", "PctBSorMore", "PctUnemployed", "PctEmploy", "PctEmplManu", "PctEmplProfServ", "PctOccupManu", "PctOccupMgmtProf", "MalePctDivorce", "MalePctNevMarr", "FemalePctDiv", "TotalPctDiv", "PersPerFam", "PctFam2Par", "PctKids2Par", "PctYoungKids2Par", "PctTeen2Par", "PctWorkMomYoungKids", "PctWorkMom", "NumIlleg", "PctIlleg", "NumImmig", "PctImmigRecent", "PctImmigRec5", "PctImmigRec8", "PctImmigRec10", "PctRecentImmig", "PctRecImmig5", "PctRecImmig8", "PctRecImmig10", "PctSpeakEnglOnly", "PctNotSpeakEnglWell", "PctLargHouseFam", "PctLargHouseOccup", "PersPerOccupHous", "PersPerOwnOccHous", "PersPerRentOccHous", "PctPersOwnOccup", "PctPersDenseHous", "PctHousLess3BR", "MedNumBR", "HousVacant", "PctHousOccup", "PctHousOwnOcc", "PctVacantBoarded", "PctVacMore6Mos", "MedYrHousBuilt", "PctHousNoPhone", "PctWOFullPlumb", "OwnOccLowQuart", "OwnOccMedVal", "OwnOccHiQuart", "RentLowQ", "RentMedian", "RentHighQ", "MedRent", "MedRentPctHousInc", "MedOwnCostPctInc", "MedOwnCostPctIncNoMtg", "NumInShelters", "NumStreet", "PctForeignBorn", "PctBornSameState", "PctSameHouse85", "PctSameCity85", "PctSameState85", "LemasSwornFT", "LemasSwFTPerPop", "LemasSwFTFieldOps", "LemasSwFTFieldPerPop", "LemasTotalReq", "LemasTotReqPerPop", "PolicReqPerOffic", "PolicPerPop", "RacialMatchCommPol", "PctPolicWhite", "PctPolicBlack", "PctPolicHisp", "PctPolicAsian", "PctPolicMinor", "OfficAssgnDrugUnits", "NumKindsDrugsSeiz", "PolicAveOTWorked", "LandArea", "PopDens", "PctUsePubTrans", "PolicCars", "PolicOperBudg", "LemasPctPolicOnPatr", "LemasGangUnitDeploy", "LemasPctOfficDrugUn", "PolicBudgPerPop", "ViolentCrimesPerPop"]

dataset_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data"

# Read dataset from the UCI web repository and assign column names.
data_df = pd.read_csv(dataset_url, sep=",", names=column_names,
                      na_values="?")

# Make sure that there are no missing values in the "ViolentCrimesPerPop" column.
assert(not data_df["ViolentCrimesPerPop"].isna().any())

# Binarize the "ViolentCrimesPerPop" column and obtain labels.
crime_rate_70_percentile = data_df["ViolentCrimesPerPop"].quantile(q=0.7)
labels_df = (data_df["ViolentCrimesPerPop"] >= crime_rate_70_percentile)

# Now that we have assigned binary labels, 
# we drop the "ViolentCrimesPerPop" column from the data frame.
data_df.drop(columns="ViolentCrimesPerPop", inplace=True)

# Group features.
race_black_70_percentile = data_df["racepctblack"].quantile(q=0.7)
groups_df = (data_df["racepctblack"] >= race_black_70_percentile)

# Drop categorical features.
data_df.drop(columns=["state", "county", "community", "communityname", "fold"],
             inplace=True)

# Handle missing features.
feature_names = data_df.columns
for feature_name in feature_names:  
    missing_rows = data_df[feature_name].isna()  # Which rows have missing values?
    if missing_rows.any():  # Check if at least one row has a missing value.
        data_df[feature_name].fillna(0.0, inplace=True)  # Fill NaN with 0.
        missing_rows.rename(feature_name + "_is_missing", inplace=True)
        data_df = data_df.join(missing_rows)  # Append boolean "is_missing" feature.

labels = labels_df.values.astype(np.float32)
groups = groups_df.values.astype(np.float32)
features = data_df.values.astype(np.float32)

# Set random seed so that the results are reproducible.
np.random.seed(123456)

# We randomly divide the examples into 'num_queries' queries.
queries = np.random.randint(0, num_queries, size=features.shape[0])

# Train, vali and test indices.
train_indices, test_indices = model_selection.train_test_split(
    range(features.shape[0]), test_size=0.4)

# Train features, labels and protected groups.
train_set = {
  'features': features[train_indices, :],
  'labels': labels[train_indices],
  'groups': groups[train_indices],
  'queries': queries[train_indices],
  'dimension': features.shape[-1],
  'num_queries': num_queries
}

# Test features, labels and protected groups.
test_set = {
  'features': features[test_indices, :],
  'labels': labels[test_indices],
  'groups': groups[test_indices],
  'queries': queries[test_indices],
  'dimension': features.shape[-1],
  'num_queries': num_queries
}

Evaluation Metrics

We will need functions to convert labeled data into paired data.


In [0]:
def pair_pos_neg_docs(data):
  # Returns a DataFrame of pairs of positive-negative docs from given DataFrame.
  # Separate pos and neg docs.
  pos_docs = data[data.label == 1]
  if pos_docs.empty:
    return
  neg_docs = data[data.label == 0]
  if neg_docs.empty:
    return

  # Include a merge key.
  pos_docs.insert(0, 'merge_key', 0)
  neg_docs.insert(0, 'merge_key', 0)

  # Merge docs and drop merge key column.
  pairs = pos_docs.merge(neg_docs, on='merge_key', how='outer',
                         suffixes=('_pos', '_neg'))
  pairs.drop(columns=['merge_key'], inplace=True)
  return pairs


def convert_labeled_to_paired_data(data_dict, index=None):
  # Forms pairs of examples from each batch/query.

  # Converts data arrays to pandas DataFrame with required column names and
  # makes a call to convert_df_to_pairs and returns a dictionary.
  features = data_dict['features']
  labels = data_dict['labels']
  groups = data_dict['groups']
  queries = data_dict['queries']

  if index is not None:
    data_df = pd.DataFrame(features[queries == index, :])
    data_df = data_df.assign(label=pd.DataFrame(labels[queries == index]))
    data_df = data_df.assign(group=pd.DataFrame(groups[queries == index]))
    data_df = data_df.assign(query_id=pd.DataFrame(queries[queries == index]))
  else:
    data_df = pd.DataFrame(features)
    data_df = data_df.assign(label=pd.DataFrame(labels))
    data_df = data_df.assign(group=pd.DataFrame(groups))
    data_df = data_df.assign(query_id=pd.DataFrame(queries))

  # Forms pairs of positive-negative docs for each query in given DataFrame
  # if the DataFrame has a query_id column. Otherise forms pairs from all rows
  # of the DataFrame.
  data_pairs = data_df.groupby('query_id').apply(pair_pos_neg_docs)

  # Create groups ndarray.
  pos_groups = data_pairs['group_pos'].values.reshape(-1, 1)
  neg_groups = data_pairs['group_neg'].values.reshape(-1, 1)
  group_pairs = np.concatenate((pos_groups, neg_groups), axis=1)

  # Create queries ndarray.
  queries = data_pairs['query_id_pos'].values.reshape(-1,)

  # Create features ndarray.
  feature_names = data_df.columns
  feature_names = feature_names.drop(['query_id', 'label'])
  feature_names = feature_names.drop(['group'])

  pos_features = data_pairs[[str(s) + '_pos' for s in feature_names]].values
  pos_features = pos_features.reshape(-1, 1, len(feature_names))

  neg_features = data_pairs[[str(s) + '_neg' for s in feature_names]].values
  neg_features = neg_features.reshape(-1, 1, len(feature_names))

  features_pairs = np.concatenate((pos_features, neg_features), axis=1)

  # Paired data dict.
  paired_data = {
      'features': features_pairs, 
      'groups': group_pairs, 
      'queries': queries,
      'dimension': data_dict['dimension'],
      'num_queries': data_dict['num_queries']
  }

  return paired_data

We will also need functions to evaluate the pairwise error rates for a linear model.


In [0]:
def get_mask(groups, pos_group, neg_group=None):
  # Returns a boolean mask selecting positive-negative document pairs where 
  # the protected group for  the positive document is pos_group and 
  # the protected group for the negative document (if specified) is neg_group.
  # Repeat group membership positive docs as many times as negative docs.
  mask_pos = groups[:, 0] == pos_group
  
  if neg_group is None:
    return mask_pos
  else:
    mask_neg = groups[:, 1] == neg_group
    return mask_pos & mask_neg


def error_rate(model, dataset):
  # Returns error rate for Keras model on dataset.
  d = dataset['dimension']
  scores0 = model.predict(dataset['features'][:, 0, 0:d].reshape(-1, d))
  scores1 = model.predict(dataset['features'][:, 1, 0:d].reshape(-1, d))
  diff = scores0 - scores1  
  return np.mean(diff.reshape((-1)) < 0)


def group_error_rate(model, dataset, pos_group, neg_group=None):
  # Returns error rate for Keras model on data set, considering only document 
  # pairs where the protected group for the positive document is pos_group, and  
  # the protected group for the negative document (if specified) is neg_group.
  d = dataset['dimension']
  scores0 = model.predict(dataset['features'][:, 0, :].reshape(-1, d))
  scores1 = model.predict(dataset['features'][:, 1, :].reshape(-1, d))
  mask = get_mask(dataset['groups'], pos_group, neg_group)
  diff = scores0 - scores1
  diff = diff[mask > 0].reshape((-1))
  return np.mean(diff < 0)

Create Linear Model

We then write a function to create the linear ranking model.


In [0]:
def create_ranking_model(features, dimension):
  # Returns a linear Keras ranking model, and returns a nullary function 
  # returning predictions on the features.

  # Linear ranking model with no hidden layers.
  # No bias included as this is a ranking problem.
  layers = []
  # Input layer takes `dimension` inputs.
  layers.append(tf.keras.Input(shape=(dimension,)))
  layers.append(tf.keras.layers.Dense(1, use_bias=False)) 
  ranking_model = tf.keras.Sequential(layers)

  # Create a nullary function that returns applies the linear model to the 
  # features and returns the tensor with the predictions.
  def predictions():
    scores0 = ranking_model(features()[:, 0, :].reshape(-1, dimension))
    scores1 = ranking_model(features()[:, 1, :].reshape(-1, dimension))
    return tf.reshape(scores0 - scores1, (-1,))

  return ranking_model, predictions

Formulate Optimization Problem

We are ready to formulate the constrained optimization problem using the TFCO library.


In [0]:
def group_mask_fn(groups, pos_group, neg_group=None):
  # Returns a nullary function returning group mask.
  group_mask = lambda: np.reshape(
      get_mask(groups(), pos_group, neg_group), (-1))
  return group_mask


def formulate_problem(
    features, groups, dimension, constraint_groups=[], constraint_slack=None):
  # Formulates a constrained problem that optimizes the error rate for a linear
  # model on the specified dataset, subject to pairwise fairness constraints 
  # specified by the constraint_groups and the constraint_slack.
  # 
  # Args:
  #   features: Nullary function returning features
  #   groups: Nullary function returning groups
  #   labels: Nullary function returning labels
  #   dimension: Input dimension for ranking model
  #   constraint_groups: List containing tuples of the form 
  #     ((pos_group0, neg_group0), (pos_group1, neg_group1)), specifying the 
  #     group memberships for the document pairs to compare in the constraints.
  #   constraint_slack: slackness '\epsilon' allowed in the constraints.
  # Returns:
  #   A RateMinimizationProblem object, and a Keras ranking model.

  # Set random seed for reproducibility.
  random.seed(333333)
  np.random.seed(121212)
  tf.random.set_seed(212121)

  # Create linear ranking model: we get back a Keras model and a nullary  
  # function returning predictions on the features.
  ranking_model, predictions = create_ranking_model(features, dimension)
  
  # Context for the optimization objective.
  context = tfco.rate_context(predictions)
  
  # Constraint set.
  constraint_set = []
  
  # Context for the constraints.
  for ((pos_group0, neg_group0), (pos_group1, neg_group1)) in constraint_groups:
    # Context for group 0.
    group_mask0 = group_mask_fn(groups, pos_group0, neg_group0)
    context_group0 = context.subset(group_mask0)

    # Context for group 1.
    group_mask1 = group_mask_fn(groups, pos_group1, neg_group1)
    context_group1 = context.subset(group_mask1)

    # Add constraints to constraint set.
    constraint_set.append(
        tfco.negative_prediction_rate(context_group0) <= (
            tfco.negative_prediction_rate(context_group1) + constraint_slack))
    constraint_set.append(
        tfco.negative_prediction_rate(context_group1) <= (
            tfco.negative_prediction_rate(context_group0) + constraint_slack))
  
  # Formulate constrained minimization problem.
  problem = tfco.RateMinimizationProblem(
      tfco.negative_prediction_rate(context), constraint_set)
  
  return problem, ranking_model

Train Model

The following function then trains the linear model by solving the above constrained optimization problem. We first provide a training function that performs one gradient update per query. There are three types of pairwise fairness criterion we handle (specified by 'constraint_type'), and assign the (pos_group, neg_group) pairs to compare accordingly.


In [0]:
def train_model(train_set, params):
  # Trains the model with stochastic updates (one query per updates).
  #
  # Args:
  #   train_set: Dictionary of "paired" training data.
  #   params: Dictionary of hyper-paramters for training.
  #
  # Returns:
  #   Trained model, list of objectives, list of group constraint violations.

  # Set up problem and model.
  if params['constrained']:
    # Constrained optimization.
    if params['constraint_type'] == 'marginal_equal_opportunity':
      constraint_groups = [((0, None), (1, None))]
    elif params['constraint_type'] == 'cross_group_equal_opportunity':
      constraint_groups = [((0, 1), (1, 0))]
    else:
      constraint_groups = [((0, 1), (1, 0)), ((0, 0), (1, 1))]
  else:
    # Unconstrained optimization.
    constraint_groups = []

  # Dictionary that will hold batch features pairs, group pairs and labels for 
  # current batch. We include one query per-batch. 
  paired_batch = {}
  batch_index = 0  # Index of current query.

  # Data functions.
  features = lambda: paired_batch['features']
  groups = lambda: paired_batch['groups'] 

  # Create ranking model and constrained optimization problem.
  problem, ranking_model = formulate_problem(
      features, groups, train_set['dimension'], constraint_groups, 
      params['constraint_slack'])
  
  # Create a loss function for the problem.
  lagrangian_loss, update_ops, multipliers_variables = (
      tfco.create_lagrangian_loss(problem, dual_scale=params['dual_scale']))

  # Create optimizer
  optimizer = tf.keras.optimizers.Adagrad(learning_rate=params['learning_rate'])
  
  # List of trainable variables.
  var_list = (
      ranking_model.trainable_weights + problem.trainable_variables + 
      [multipliers_variables])
  
  # List of objectives, group constraint violations.
  # violations, and snapshot of models during course of training.
  objectives = []
  group_violations = []
  models = []

  features = train_set['features']
  queries = train_set['queries']
  groups = train_set['groups']

  print()
  # Run loops * iterations_per_loop full batch iterations.
  for ii in range(params['loops']):
    for jj in range(params['iterations_per_loop']):
      # Populate paired_batch dict with all pairs for current query. The batch
      # index is the same as the current query index.
      paired_batch = {
          'features': features[queries == batch_index],
          'groups': groups[queries == batch_index]
      }

      # Optimize loss.
      update_ops()
      optimizer.minimize(lagrangian_loss, var_list=var_list)

      # Update batch_index, and cycle back once last query is reached.
      batch_index = (batch_index + 1) % train_set['num_queries']
    
    # Snap shot current model.
    model_copy = tf.keras.models.clone_model(ranking_model)
    model_copy.set_weights(ranking_model.get_weights())
    models.append(model_copy)

    # Evaluate metrics for snapshotted model. 
    error, gerr, group_viol = evaluate_results(
        ranking_model, train_set, params)
    objectives.append(error)
    group_violations.append(
        [x - params['constraint_slack'] for x in group_viol])

    sys.stdout.write(
        '\r Loop %d: error = %.3f, max constraint violation = %.3f' % 
        (ii, objectives[-1], max(group_violations[-1])))
  print()
  
  if params['constrained']:
    # Find model iterate that trades-off between objective and group violations.
    best_index = tfco.find_best_candidate_index(
        np.array(objectives), np.array(group_violations), rank_objectives=False)
  else:
    # Find model iterate that achieves lowest objective.
    best_index = np.argmin(objectives)

  return models[best_index]

Summarize and Plot Results

Having trained a model, we will need functions to summarize the various evaluation metrics.


In [0]:
def evaluate_results(model, test_set, params):
  # Returns overall, group error rates, group-level constraint violations.
  if params['constraint_type'] == 'marginal_equal_opportunity':
    g0_error = group_error_rate(model, test_set, 0)
    g1_error = group_error_rate(model, test_set, 1)
    group_violations = [g0_error - g1_error, g1_error - g0_error]
    return (error_rate(model, test_set), [g0_error, g1_error], 
            group_violations)
  else:
    g00_error = group_error_rate(model, test_set, 0, 0)
    g01_error = group_error_rate(model, test_set, 0, 1)
    g10_error = group_error_rate(model, test_set, 1, 1)
    g11_error = group_error_rate(model, test_set, 1, 1)
    group_violations_offdiag = [g01_error - g10_error, g10_error - g01_error]
    group_violations_diag = [g00_error - g11_error, g11_error - g00_error]

    if params['constraint_type'] == 'cross_group_equal_opportunity':
      return (error_rate(model, test_set), 
              [[g00_error, g01_error], [g10_error, g11_error]], 
              group_violations_offdiag)
    else:
      return (error_rate(model, test_set), 
              [[g00_error, g01_error], [g10_error, g11_error]], 
              group_violations_offdiag + group_violations_diag)
    

def display_results(
    model, test_set, params, method, error_type, show_header=False):
  # Prints evaluation results for model on test data.
  error, group_error, diffs = evaluate_results(model, test_set, params)

  if params['constraint_type'] == 'marginal_equal_opportunity':
    if show_header:
      print('\nMethod\t\t\tError\t\tOverall\t\tGroup 0\t\tGroup 1\t\tDiff')
    print('%s\t%s\t\t%.3f\t\t%.3f\t\t%.3f\t\t%.3f' % (
        method, error_type, error, group_error[0], group_error[1], 
        np.max(diffs)))
  elif params['constraint_type'] == 'cross_group_equal_opportunity':
    if show_header:
      print('\nMethod\t\t\tError\t\tOverall\t\tGroup 0/1\tGroup 1/0\tDiff')
    print('%s\t%s\t\t%.3f\t\t%.3f\t\t%.3f\t\t%.3f' % (
        method, error_type, error, group_error[0][1], group_error[1][0], 
        np.max(diffs)))
  else:
    if show_header:
      print('\nMethod\t\t\tError\t\tOverall\t\tGroup 0/1\tGroup 1/0\t' +
            'Group 0/0\tGroup 1/1\tDiff')
    print('%s\t%s\t\t%.3f\t\t%.3f\t\t%.3f\t\t%.3f\t\t%.3f\t\t%.3f' % (
        method, error_type, error, group_error[0][1], group_error[1][0], 
        group_error[0][0], group_error[1][1], np.max(diffs)))

Experimental Results

We now run experiments with two types of pairwise fairness criteria: (1) marginal_equal_opportunity and (2) pairwise equal opportunity. In each case, we compare an unconstrained model trained to optimize the error rate and a constrained model trained with pairwise fairness constraints.


In [0]:
# Convert train/test set to paired data for later evaluation.
paired_train_set = convert_labeled_to_paired_data(train_set)
paired_test_set = convert_labeled_to_paired_data(test_set)

(1) Marginal Equal Opportunity

For a ranking model $f: \mathbb{R}^d \rightarrow \mathbb{R}$, recall:

  • $err(f)$ as the pairwise ranking error for model $f$ over all pairs of positive and negative documents $$ err(f) ~=~ \mathbf{E}\big[\mathbb{I}\big(f(x) < f(x')\big) \,\big|\, y = 1,~ y' = 0\big] $$

and we additionally define:

  • $err_i(f)$ as the row-marginal pairwise error over positive-negative document pairs where the pos. document is from group $i$, and the neg. document is from either groups
$$ err_i(f) = \mathbf{E}\big[\mathbb{I}\big(f(x) < f(x')\big) \,\big|\, y = 1,~ y' = 0,~ grp(x) = i\big] $$

The constrained optimization problem we solve constraints the row-marginal pairwise errors to be similar:

$$min_f\;err(f)$$$$\text{s.t. }\;|err_0(f) - err_1(f)| \leq 0.05$$

In [44]:
# Model hyper-parameters.
model_params = {
    'loops': 10, 
    'iterations_per_loop': 250, 
    'learning_rate': 0.1,
    'constraint_type': 'marginal_equal_opportunity', 
    'constraint_slack': 0.05,
    'dual_scale': 0.1}

# Unconstrained optimization.
model_params['constrained'] = False
model_unc  = train_model(paired_train_set, model_params)
display_results(model_unc, paired_train_set, model_params, 'Unconstrained     ', 
                'Train', show_header=True)
display_results(model_unc, paired_test_set, model_params,  'Unconstrained     ', 
                'Test')

# Constrained optimization with TFCO.
model_params['constrained'] = True
model_con  = train_model(paired_train_set, model_params)
display_results(model_con, paired_train_set, model_params, 'Constrained     ', 
                'Train', show_header=True)
display_results(model_con, paired_test_set, model_params, 'Constrained     ', 
                'Test')


 Loop 9: error = 0.057, max constraint violation = 0.007

Method			Error		Overall		Group 0		Group 1		Diff
Unconstrained     	Train		0.057		0.093		0.036		0.057
Unconstrained     	Test		0.079		0.149		0.043		0.106

 Loop 9: error = 0.056, max constraint violation = 0.002

Method			Error		Overall		Group 0		Group 1		Diff
Constrained     	Train		0.063		0.093		0.044		0.049
Constrained     	Test		0.078		0.140		0.047		0.093

(2) Pairwise Equal Opportunity

Recall that we denote $err_{i,j}(f)$ as the ranking error over positive-negative document pairs where the pos. document is from group $i$, and the neg. document is from group $j$. $$ err_{i, j}(f) ~=~ \mathbf{E}\big[\mathbb{I}\big(f(x) < f(x')\big) \,\big|\, y = 1,~ y' = 0,~ grp(x) = i, ~grp(x') = j\big] $$

We first constrain only the cross-group errors, highlighted below.


Negative
Group 0 Group 1
Positive Group 0 $err_{0,0}$ $\mathbf{err_{0,1}}$
Group 1 $\mathbf{err_{1,0}}$ $err_{1,1}$

The optimization problem we solve constraints the cross-group pairwise errors to be similar:

$$min_f\; err(f)$$$$\text{s.t. }\;\; |err_{0,1}(f) - err_{1,0}(f)| \leq 0.05$$

In [45]:
# Model hyper-parameters.
model_params = {
    'loops': 10, 
    'iterations_per_loop': 250, 
    'learning_rate': 0.1,
    'constraint_type': 'cross_group_equal_opportunity', 
    'constraint_slack': 0.05,
    'dual_scale': 0.1}

# Unconstrained optimization.
model_params['constrained'] = False
model_unc  = train_model(paired_train_set, model_params)
display_results(model_unc, paired_train_set, model_params, 'Unconstrained     ', 
                'Train', show_header=True)
display_results(model_unc, paired_test_set, model_params,  'Unconstrained     ', 
                'Test')

# Constrained optimization with TFCO.
model_params['constrained'] = True
model_con  = train_model(paired_train_set, model_params)
display_results(model_con, paired_train_set, model_params, 'Constrained     ', 
                'Train', show_header=True)
display_results(model_con, paired_test_set, model_params, 'Constrained     ', 
                'Test')


 Loop 9: error = 0.057, max constraint violation = 0.109

Method			Error		Overall		Group 0/1	Group 1/0	Diff
Unconstrained     	Train		0.057		0.289		0.130		0.159
Unconstrained     	Test		0.079		0.333		0.135		0.198

 Loop 9: error = 0.074, max constraint violation = -0.041

Method			Error		Overall		Group 0/1	Group 1/0	Diff
Constrained     	Train		0.074		0.117		0.126		0.009
Constrained     	Test		0.105		0.186		0.147		0.039