Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Overview

In this notebook, we explore the problem of training classifier to reducing churn. That is, given that we've trained a model, how do we train another model so that the predictions don't differ much by the previous model. We show here that training for churn reduction may actually help improve accuracy as well.


In [1]:
import math
import random
import numpy as np
import pandas as pd
import warnings
from six.moves import xrange
import tensorflow.compat.v1 as tf
import tensorflow_constrained_optimization as tfco
import matplotlib.pyplot as plt

tf.disable_eager_execution()

warnings.filterwarnings('ignore')
%matplotlib inline

Reading and processing dataset.

We load the [UCI Adult dataset] and do some pre-processing. The dataset is based on census data and the goal is to predict whether someone's income is over 50k.

We preprocess the features as done in works such as [ZafarEtAl15] and [GohEtAl16]. We transform the categorical features into binary ones and transform the continuous feature into buckets based on each feature's 5 quantiles values in training.


In [2]:
CATEGORICAL_COLUMNS = [
    'workclass', 'education', 'marital_status', 'occupation', 'relationship',
    'race', 'gender', 'native_country'
]
CONTINUOUS_COLUMNS = [
    'age', 'capital_gain', 'capital_loss', 'hours_per_week', 'education_num'
]
COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket'
]
LABEL_COLUMN = 'label'
CHURN_COLUMN = 'churn_label'

def get_data():
    train_df_raw = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", names=COLUMNS, skipinitialspace=True)
    test_df_raw = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", names=COLUMNS, skipinitialspace=True, skiprows=1)

    train_df_raw[LABEL_COLUMN] = (train_df_raw['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
    test_df_raw[LABEL_COLUMN] = (test_df_raw['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
    # Preprocessing Features
    pd.options.mode.chained_assignment = None  # default='warn'

    # Functions for preprocessing categorical and continuous columns.
    def binarize_categorical_columns(input_train_df, input_test_df, categorical_columns=[]):

        def fix_columns(input_train_df, input_test_df):
            test_df_missing_cols = set(input_train_df.columns) - set(input_test_df.columns)
            for c in test_df_missing_cols:
                input_test_df[c] = 0
                train_df_missing_cols = set(input_test_df.columns) - set(input_train_df.columns)
            for c in train_df_missing_cols:
                input_train_df[c] = 0
                input_train_df = input_train_df[input_test_df.columns]
            return input_train_df, input_test_df

        # Binarize categorical columns.
        binarized_train_df = pd.get_dummies(input_train_df, columns=categorical_columns)
        binarized_test_df = pd.get_dummies(input_test_df, columns=categorical_columns)
        # Make sure the train and test dataframes have the same binarized columns.
        fixed_train_df, fixed_test_df = fix_columns(binarized_train_df, binarized_test_df)
        return fixed_train_df, fixed_test_df

    def bucketize_continuous_column(input_train_df,
                                  input_test_df,
                                  continuous_column_name,
                                  num_quantiles=None,
                                  bins=None):
        assert (num_quantiles is None or bins is None)
        if num_quantiles is not None:
            train_quantized, bins_quantized = pd.qcut(
              input_train_df[continuous_column_name],
              num_quantiles,
              retbins=True,
              labels=False)
            input_train_df[continuous_column_name] = pd.cut(
              input_train_df[continuous_column_name], bins_quantized, labels=False)
            input_test_df[continuous_column_name] = pd.cut(
              input_test_df[continuous_column_name], bins_quantized, labels=False)
        elif bins is not None:
            input_train_df[continuous_column_name] = pd.cut(
              input_train_df[continuous_column_name], bins, labels=False)
            input_test_df[continuous_column_name] = pd.cut(
              input_test_df[continuous_column_name], bins, labels=False)

    # Filter out all columns except the ones specified.
    train_df = train_df_raw[CATEGORICAL_COLUMNS + CONTINUOUS_COLUMNS + [LABEL_COLUMN]]
    test_df = test_df_raw[CATEGORICAL_COLUMNS + CONTINUOUS_COLUMNS + [LABEL_COLUMN]]
    
    # Bucketize continuous columns.
    bucketize_continuous_column(train_df, test_df, 'age', num_quantiles=4)
    bucketize_continuous_column(train_df, test_df, 'capital_gain', bins=[-1, 1, 4000, 10000, 100000])
    bucketize_continuous_column(train_df, test_df, 'capital_loss', bins=[-1, 1, 1800, 1950, 4500])
    bucketize_continuous_column(train_df, test_df, 'hours_per_week', bins=[0, 39, 41, 50, 100])
    bucketize_continuous_column(train_df, test_df, 'education_num', bins=[0, 8, 9, 11, 16])
    train_df, test_df = binarize_categorical_columns(train_df, test_df, categorical_columns=CATEGORICAL_COLUMNS + CONTINUOUS_COLUMNS)
    feature_names = list(train_df.keys())
    feature_names.remove(LABEL_COLUMN)
    num_features = len(feature_names)
    
    return train_df, test_df, feature_names

train_df, test_df, FEATURE_NAMES = get_data()

Model.

We will use a simple neural network model as the initial classifier. Then we train a linear model with churn constraints to ensure that the linear model's predictions don't differ by much from that of the neural network.

In the following code, we initialize the placeholders and model. In build_train_op, we set up the constrained optimization problem. We create a rate context for the entire dataset to get the error rate on the training data with respect to the labels. We then create a separate rate context to calculate the error rate on the training data with respect to the outputs of the initial model. We then construct a minimization problem using RateMinimizationProblem and use the LagrangianOptimizerV1 as the solver. build_train_op initializes a training operation which will later be used to actually train the model.


In [3]:
def _construct_model(input_tensor, hidden_units=None):
    hidden = input_tensor
    if hidden_units:
        hidden = tf.layers.dense(
            inputs=input_tensor,
            units=hidden_units,
            activation=tf.nn.relu)
    output = tf.layers.dense(
        inputs=hidden,
        units=1,
        activation=None)
    return output

class Model(object):
    def __init__(self,
                 hidden_units=None,
                 max_churn_rate=0.05):
        tf.random.set_random_seed(123)
        self.max_churn_rate = max_churn_rate
        num_features = len(FEATURE_NAMES)
        self.features_placeholder = tf.placeholder(
            tf.float32, shape=(None, num_features), name='features_placeholder')
        self.labels_placeholder = tf.placeholder(
            tf.float32, shape=(None, 1), name='labels_placeholder')
        self.churn_placeholder = tf.placeholder(
            tf.float32, shape=(None, 1), name='churn_placeholder')
        # We use a linear model.
        self.predictions_tensor = _construct_model(self.features_placeholder, hidden_units=hidden_units)


    def build_train_op(self,
                       learning_rate,
                       train_with_churn=False):
        ctx = tfco.rate_context(self.predictions_tensor, self.labels_placeholder)
        ctx_churn = tfco.rate_context(self.predictions_tensor, self.churn_placeholder)
        constraints = [tfco.error_rate(ctx_churn) <= self.max_churn_rate] if train_with_churn else []
        mp = tfco.RateMinimizationProblem(tfco.error_rate(ctx), constraints)
        opt = tfco.LagrangianOptimizerV1(tf.train.AdamOptimizer(learning_rate))
        self.train_op = opt.minimize(mp)
        return self.train_op
  
    def feed_dict_helper(self, dataframe):
        return {self.features_placeholder:
                  dataframe[FEATURE_NAMES],
                self.labels_placeholder:
                  dataframe[[LABEL_COLUMN]],
                self.churn_placeholder: dataframe[[CHURN_COLUMN]]}

Training.

Below is the function which performs the training of our constrained optimization problem. Each call to the function does one epoch through the dataset and then yields the training and testing predictions.


In [4]:
def training_generator(model,
                       train_df,
                       test_df,
                       minibatch_size,
                       num_iterations_per_loop=1,
                       num_loops=1):
    random.seed(31337)
    num_rows = train_df.shape[0]
    minibatch_size = min(minibatch_size, num_rows)
    permutation = list(range(train_df.shape[0]))
    random.shuffle(permutation)

    session = tf.Session()
    session.run((tf.global_variables_initializer(),
               tf.local_variables_initializer()))

    minibatch_start_index = 0
    for n in xrange(num_loops):
        for _ in xrange(num_iterations_per_loop):
            minibatch_indices = []
            while len(minibatch_indices) < minibatch_size:
                minibatch_end_index = (
                minibatch_start_index + minibatch_size - len(minibatch_indices))
                if minibatch_end_index >= num_rows:
                    minibatch_indices += range(minibatch_start_index, num_rows)
                    minibatch_start_index = 0
                else:
                    minibatch_indices += range(minibatch_start_index, minibatch_end_index)
                    minibatch_start_index = minibatch_end_index
                    
            session.run(
                  model.train_op,
                  feed_dict=model.feed_dict_helper(
                      train_df.iloc[[permutation[ii] for ii in minibatch_indices]]))

        train_predictions = session.run(
            model.predictions_tensor,
            feed_dict=model.feed_dict_helper(train_df))
        test_predictions = session.run(
            model.predictions_tensor,
            feed_dict=model.feed_dict_helper(test_df))

        yield (train_predictions, test_predictions)

Computing accuracy and constraint metrics.


In [5]:
def error_rate(predictions, labels):
    signed_labels = (
      (labels > 0).astype(np.float32) - (labels <= 0).astype(np.float32))
    numerator = (np.multiply(signed_labels, predictions) <= 0).sum()
    denominator = predictions.shape[0]
    return float(numerator) / float(denominator)

def _get_error_rate_and_constraints(df, max_churn_rate):
    """Computes the error and constraint violations."""
    error_rate_local = error_rate(df[['predictions']], df[[LABEL_COLUMN]])
    error_rate_churn = error_rate(df[['predictions']], df[[CHURN_COLUMN]])
    return error_rate_local, error_rate_churn - max_churn_rate

def training_helper(model,
                    train_df,
                    test_df,
                    minibatch_size,
                    num_iterations_per_loop=1,
                    num_loops=1):
    train_error_rate_vector = []
    train_constraints_matrix = []
    test_error_rate_vector = []
    test_constraints_matrix = []
    for train, test in training_generator(
      model, train_df, test_df, minibatch_size, num_iterations_per_loop,
      num_loops):
        train_df['predictions'] = train
        test_df['predictions'] = test

        train_error_rate, train_constraints = _get_error_rate_and_constraints(
          train_df, model.max_churn_rate)
        train_error_rate_vector.append(train_error_rate)
        train_constraints_matrix.append(train_constraints)

        test_error_rate, test_constraints = _get_error_rate_and_constraints(
            test_df, model.max_churn_rate)
        test_error_rate_vector.append(test_error_rate)
        test_constraints_matrix.append(test_constraints)

    return (train_error_rate_vector, train_constraints_matrix, test_error_rate_vector, test_constraints_matrix)

Train Neural Network without constraints

We train a neural network with 10 hidden units to obtain a baseline model in which we can train for churn for subsequent models.


In [6]:
model = Model(hidden_units=10)
model.build_train_op(0.01, train_with_churn=False)

# initialize the labels for churn reduction to the true labels as placeholder.
train_df[CHURN_COLUMN] = train_df[LABEL_COLUMN]
test_df[CHURN_COLUMN] = test_df[LABEL_COLUMN]

# training_helper returns the list of errors and violations over each epoch.
train_errors, train_violations, test_errors, test_violations = training_helper(
      model,
      train_df,
      test_df,
      100,
      num_iterations_per_loop=326,
      num_loops=100)

In [7]:
print("Train Error", train_errors[-1])
print("Test Error", test_errors[-1])


Train Error 0.13086207426061852
Test Error 0.14120754253424236

Use Network Predictions as examples to reduce churn against


In [8]:
train_df[CHURN_COLUMN] = train_df["predictions"]
test_df[CHURN_COLUMN] = test_df["predictions"]

Baseline without constraints.

We now declare the model, build the training op, and then perform the training. We use a linear classifier, and train using the ADAM optimizer with learning rate 0.01, with minibatch size of 100 over 100 epochs. We first train without churn constraints to show the baseline performance. We see that without training for churn, we obtain some churn violation.

We also see that unsurprisingly, the performance of the linear model is considerably worse than that of the neural network model in both training and testing.


In [9]:
model = Model(hidden_units=None, max_churn_rate=0.025)
model.build_train_op(0.01, train_with_churn=False)

# training_helper returns the list of errors and violations over each epoch.
train_errors, train_violations, test_errors, test_violations = training_helper(
      model,
      train_df,
      test_df,
      100,
      num_iterations_per_loop=326,
      num_loops=100)

In [10]:
print("Train Error", train_errors[-1])
print("Train Violation", train_violations[-1])
print()
print("Test Error", test_errors[-1])
print("Test Violation", test_violations[-1])


Train Error 0.14296243972850955
Train Violation 0.01940895549890974

Test Error 0.1428659173269455
Test Violation 0.01879337878508691

Training with churn constraints.

We now train our linear model with churn constraints so that the linear model's predictions don't differ from that of the neural network by too much. We set the threshold to 0.025 so that the goal is to train for accuracy while ensuring that we only deviate from the network outputs by 2.5%.

Interestingly, not only do we get very close to succeeding in enforcing this churn constraint, we see that the overall accuracy of the linear model improves when compared to training the linear model without the churn constraint.


In [11]:
model = Model(hidden_units=None, max_churn_rate=0.025)
model.build_train_op(0.01, train_with_churn=True)

# training_helper returns the list of errors and violations over each epoch.
train_errors, train_violations, test_errors, test_violations = training_helper(
      model,
      train_df,
      test_df,
      100,
      num_iterations_per_loop=326,
      num_loops=100)

In [12]:
print("Train Error", train_errors[-1])
print("Train Violation", train_violations[-1])
print()
print("Test Error", test_errors[-1])
print("Test Violation", test_violations[-1])


Train Error 0.13906206811830105
Train Violation 0.000767021897361872

Test Error 0.14102327866838646
Test Violation 0.0029466863214790244