Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
In this notebook, we explore the problem of training a classifier to reduce churn. That is, given a model that has already been trained, how do we train another model whose predictions don't differ much from those of the previous model? We show here that training for churn reduction may actually help improve accuracy as well.
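To make the notion concrete, churn can be measured as the fraction of examples on which two classifiers' predicted labels differ. The sketch below assumes hard 0/1 predictions and uses a hypothetical helper name, `churn_rate`. It is not part of the notebook's code.

```python
def churn_rate(old_predictions, new_predictions):
    """Fraction of examples whose predicted label changes between two models.

    Both arguments are equal-length sequences of hard class labels (e.g. 0/1).
    """
    if len(old_predictions) != len(new_predictions):
        raise ValueError("prediction sequences must have the same length")
    if len(old_predictions) == 0:
        raise ValueError("need at least one example")
    changed = sum(1 for old, new in zip(old_predictions, new_predictions) if old != new)
    return changed / len(old_predictions)


# Hypothetical predictions of an old and a new model on five examples.
old = [1, 0, 0, 1, 1]
new = [1, 0, 1, 1, 0]
print(churn_rate(old, new))  # 2 of 5 labels changed -> 0.4
```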
In [1]:
import math
import random
import numpy as np
import pandas as pd
import warnings
from six.moves import xrange
import tensorflow.compat.v1 as tf
import tensorflow_constrained_optimization as tfco
import matplotlib.pyplot as plt
tf.disable_eager_execution()
warnings.filterwarnings('ignore')
%matplotlib inline
We load the [UCI Adult dataset] and do some preprocessing. The dataset is based on census data, and the goal is to predict whether a person's income is over $50K.
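The label is derived with a substring test rather than an equality test. adult.data writes the income bracket as ">50K" or "<=50K", while adult.test adds a trailing period (">50K."), so `'>50K' in x` handles both files. The helper name `income_label` below is hypothetical. It simply mirrors the lambda used in `get_data`.

```python
def income_label(income_bracket):
    """Return 1 if the income bracket string denotes >50K, else 0 (mirrors get_data)."""
    return int('>50K' in income_bracket)


# Label strings as written in adult.data (no period) and adult.test (trailing period).
for raw in ['<=50K', '>50K', '<=50K.', '>50K.']:
    print(repr(raw), income_label(raw))
```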
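We preprocess the features as in works such as [ZafarEtAl15] and [GohEtAl16]. We transform the categorical features into binary (one-hot) features. We also bucketize the continuous features: age into quartiles computed on the training data, and the remaining continuous features into fixed, hand-chosen bins. The resulting bucket indices are then one-hot encoded as well.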
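The key points of this preprocessing are that bucket boundaries are computed on the training data only and then reused for the test data, and that both splits end up with the same one-hot columns. The sketch below illustrates this with NumPy on hypothetical toy ages. The helper names are ours, and the edge handling differs from the notebook's `pd.cut`: `pd.cut` turns values outside the bins into NaN, which become all-zero one-hot rows, whereas `np.digitize` here places them in the first or last bucket.

```python
import numpy as np


def fit_quantile_edges(train_values, num_quantiles=4):
    """Interior bucket edges at the training-set quantiles (quartiles by default)."""
    probs = np.linspace(0.0, 1.0, num_quantiles + 1)[1:-1]
    return np.quantile(np.asarray(train_values, dtype=float), probs)


def bucketize(values, edges):
    """Bucket index in [0, len(edges)] per value; intervals are (a, b] like pd.cut."""
    return np.digitize(np.asarray(values, dtype=float), edges, right=True)


def one_hot(bucket_ids, num_buckets):
    """One row per example, one column per bucket."""
    return np.eye(num_buckets, dtype=np.float32)[bucket_ids]


# Hypothetical ages; the edges are fit on the training split only.
train_age = [17, 22, 25, 28, 33, 37, 41, 45, 52, 60, 67, 80]
test_age = [16, 30, 90]

edges = fit_quantile_edges(train_age, num_quantiles=4)
num_buckets = len(edges) + 1
train_onehot = one_hot(bucketize(train_age, edges), num_buckets)
test_onehot = one_hot(bucketize(test_age, edges), num_buckets)
print(edges)
print(test_onehot)  # same 4 columns as train_onehot, even for out-of-range ages
```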
In [2]:
CATEGORICAL_COLUMNS = [
'workclass', 'education', 'marital_status', 'occupation', 'relationship',
'race', 'gender', 'native_country'
]
CONTINUOUS_COLUMNS = [
'age', 'capital_gain', 'capital_loss', 'hours_per_week', 'education_num'
]
COLUMNS = [
'age', 'workclass', 'fnlwgt', 'education', 'education_num',
'marital_status', 'occupation', 'relationship', 'race', 'gender',
'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
'income_bracket'
]
LABEL_COLUMN = 'label'
CHURN_COLUMN = 'churn_label'

def get_data():
    train_df_raw = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", names=COLUMNS, skipinitialspace=True)
    test_df_raw = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", names=COLUMNS, skipinitialspace=True, skiprows=1)

    train_df_raw[LABEL_COLUMN] = (train_df_raw['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
    test_df_raw[LABEL_COLUMN] = (test_df_raw['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)

    # Preprocessing Features
    pd.options.mode.chained_assignment = None  # default='warn'

    # Functions for preprocessing categorical and continuous columns.
    def binarize_categorical_columns(input_train_df, input_test_df, categorical_columns=[]):

        def fix_columns(input_train_df, input_test_df):
            test_df_missing_cols = set(input_train_df.columns) - set(input_test_df.columns)
            for c in test_df_missing_cols:
                input_test_df[c] = 0
            train_df_missing_cols = set(input_test_df.columns) - set(input_train_df.columns)
            for c in train_df_missing_cols:
                input_train_df[c] = 0
            input_train_df = input_train_df[input_test_df.columns]
            return input_train_df, input_test_df

        # Binarize categorical columns.
        binarized_train_df = pd.get_dummies(input_train_df, columns=categorical_columns)
        binarized_test_df = pd.get_dummies(input_test_df, columns=categorical_columns)
        # Make sure the train and test dataframes have the same binarized columns.
        fixed_train_df, fixed_test_df = fix_columns(binarized_train_df, binarized_test_df)
        return fixed_train_df, fixed_test_df

    def bucketize_continuous_column(input_train_df,
                                    input_test_df,
                                    continuous_column_name,
                                    num_quantiles=None,
                                    bins=None):
        assert (num_quantiles is None or bins is None)
        if num_quantiles is not None:
            train_quantized, bins_quantized = pd.qcut(
                input_train_df[continuous_column_name],
                num_quantiles,
                retbins=True,
                labels=False)
            input_train_df[continuous_column_name] = pd.cut(
                input_train_df[continuous_column_name], bins_quantized, labels=False)
            input_test_df[continuous_column_name] = pd.cut(
                input_test_df[continuous_column_name], bins_quantized, labels=False)
        elif bins is not None:
            input_train_df[continuous_column_name] = pd.cut(
                input_train_df[continuous_column_name], bins, labels=False)
            input_test_df[continuous_column_name] = pd.cut(
                input_test_df[continuous_column_name], bins, labels=False)

    # Filter out all columns except the ones specified.
    train_df = train_df_raw[CATEGORICAL_COLUMNS + CONTINUOUS_COLUMNS + [LABEL_COLUMN]]
    test_df = test_df_raw[CATEGORICAL_COLUMNS + CONTINUOUS_COLUMNS + [LABEL_COLUMN]]

    # Bucketize continuous columns.
    bucketize_continuous_column(train_df, test_df, 'age', num_quantiles=4)
    bucketize_continuous_column(train_df, test_df, 'capital_gain', bins=[-1, 1, 4000, 10000, 100000])
    bucketize_continuous_column(train_df, test_df, 'capital_loss', bins=[-1, 1, 1800, 1950, 4500])
    bucketize_continuous_column(train_df, test_df, 'hours_per_week', bins=[0, 39, 41, 50, 100])
    bucketize_continuous_column(train_df, test_df, 'education_num', bins=[0, 8, 9, 11, 16])
    train_df, test_df = binarize_categorical_columns(train_df, test_df, categorical_columns=CATEGORICAL_COLUMNS + CONTINUOUS_COLUMNS)
    feature_names = list(train_df.keys())
    feature_names.remove(LABEL_COLUMN)
    num_features = len(feature_names)

    return train_df, test_df, feature_names

train_df, test_df, FEATURE_NAMES = get_data()
We will use a simple neural network as the initial classifier. We then train a linear model with a churn constraint to ensure that its predictions don't differ much from those of the neural network.
In the following code, we set up the placeholders and the model. In build_train_op, we set up the constrained optimization problem. We create one rate context whose labels are the true labels, which gives the model's error rate on the training data. We then create a second rate context whose labels are the outputs of the initial model. Its error rate is the churn rate, i.e., the fraction of examples on which the sign of the new model's prediction disagrees with that of the initial model. We then construct a minimization problem using RateMinimizationProblem, constraining the churn rate to be at most max_churn_rate when train_with_churn is set, and use LagrangianOptimizerV1 as the solver. build_train_op creates and returns a training operation that is later run to train the model.
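Written out, the problem that build_train_op sets up when train_with_churn=True is roughly the following. Here f_θ is the model being trained, g is the initial model, y_i are the 0/1 labels, n is the number of examples and ε is max_churn_rate. The notation is ours, and the indicator-based rates are the idealized quantities. TFCO optimizes differentiable surrogates of them, so this sketches the intent rather than the library's internals.

```latex
\begin{aligned}
s(z) &= \begin{cases} +1, & z > 0 \\ -1, & z \le 0 \end{cases} \\
\mathrm{err}(\theta) &= \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\big[\, s(y_i)\, f_\theta(x_i) \le 0 \,\big] \\
\mathrm{churn}(\theta) &= \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\big[\, s\big(g(x_i)\big)\, f_\theta(x_i) \le 0 \,\big] \\
\text{problem:}\quad & \min_{\theta}\ \mathrm{err}(\theta) \quad \text{subject to} \quad \mathrm{churn}(\theta) \le \epsilon \\
\mathcal{L}(\theta, \lambda) &= \mathrm{err}(\theta) + \lambda\,\big(\mathrm{churn}(\theta) - \epsilon\big), \qquad \lambda \ge 0
\end{aligned}
```

A Lagrangian solver searches for a saddle point of 𝓛 by descending in θ and ascending in λ. As a result, λ grows while the churn constraint is violated and is pushed back toward 0 once it is satisfied. The quantity churn(θ) − ε is what the notebook's _get_error_rate_and_constraints reports as the constraint violation.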
In [3]:
def _construct_model(input_tensor, hidden_units=None):
    hidden = input_tensor
    if hidden_units:
        hidden = tf.layers.dense(
            inputs=input_tensor,
            units=hidden_units,
            activation=tf.nn.relu)
    output = tf.layers.dense(
        inputs=hidden,
        units=1,
        activation=None)
    return output

class Model(object):

    def __init__(self,
                 hidden_units=None,
                 max_churn_rate=0.05):
        tf.random.set_random_seed(123)
        self.max_churn_rate = max_churn_rate
        num_features = len(FEATURE_NAMES)
        self.features_placeholder = tf.placeholder(
            tf.float32, shape=(None, num_features), name='features_placeholder')
        self.labels_placeholder = tf.placeholder(
            tf.float32, shape=(None, 1), name='labels_placeholder')
        self.churn_placeholder = tf.placeholder(
            tf.float32, shape=(None, 1), name='churn_placeholder')
        # A linear model when hidden_units is None; otherwise a network with
        # one ReLU hidden layer.
        self.predictions_tensor = _construct_model(self.features_placeholder, hidden_units=hidden_units)

    def build_train_op(self,
                       learning_rate,
                       train_with_churn=False):
        # Error rate with respect to the true labels: the objective.
        ctx = tfco.rate_context(self.predictions_tensor, self.labels_placeholder)
        # Error rate with respect to the churn labels (the initial model's
        # outputs), i.e. the churn rate.
        ctx_churn = tfco.rate_context(self.predictions_tensor, self.churn_placeholder)
        constraints = [tfco.error_rate(ctx_churn) <= self.max_churn_rate] if train_with_churn else []
        mp = tfco.RateMinimizationProblem(tfco.error_rate(ctx), constraints)
        opt = tfco.LagrangianOptimizerV1(tf.train.AdamOptimizer(learning_rate))
        self.train_op = opt.minimize(mp)
        return self.train_op

    def feed_dict_helper(self, dataframe):
        return {self.features_placeholder: dataframe[FEATURE_NAMES],
                self.labels_placeholder: dataframe[[LABEL_COLUMN]],
                self.churn_placeholder: dataframe[[CHURN_COLUMN]]}
In [4]:
def training_generator(model,
                       train_df,
                       test_df,
                       minibatch_size,
                       num_iterations_per_loop=1,
                       num_loops=1):
    random.seed(31337)
    num_rows = train_df.shape[0]
    minibatch_size = min(minibatch_size, num_rows)
    permutation = list(range(train_df.shape[0]))
    random.shuffle(permutation)

    session = tf.Session()
    session.run((tf.global_variables_initializer(),
                 tf.local_variables_initializer()))

    minibatch_start_index = 0
    for n in xrange(num_loops):
        for _ in xrange(num_iterations_per_loop):
            minibatch_indices = []
            while len(minibatch_indices) < minibatch_size:
                minibatch_end_index = (
                    minibatch_start_index + minibatch_size - len(minibatch_indices))
                if minibatch_end_index >= num_rows:
                    minibatch_indices += range(minibatch_start_index, num_rows)
                    minibatch_start_index = 0
                else:
                    minibatch_indices += range(minibatch_start_index, minibatch_end_index)
                    minibatch_start_index = minibatch_end_index

            session.run(
                model.train_op,
                feed_dict=model.feed_dict_helper(
                    train_df.iloc[[permutation[ii] for ii in minibatch_indices]]))

        train_predictions = session.run(
            model.predictions_tensor,
            feed_dict=model.feed_dict_helper(train_df))
        test_predictions = session.run(
            model.predictions_tensor,
            feed_dict=model.feed_dict_helper(test_df))

        yield (train_predictions, test_predictions)
In [5]:
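def error_rate(predictions, labels):
    # Convert to NumPy first: applying a NumPy ufunc to two DataFrames with
    # different column names can align on the column labels (producing NaNs)
    # in newer pandas versions.
    predictions = np.asarray(predictions, dtype=np.float32)
    labels = np.asarray(labels, dtype=np.float32)
    # Map labels to {-1, +1}: anything > 0 counts as positive.
    signed_labels = (
        (labels > 0).astype(np.float32) - (labels <= 0).astype(np.float32))
    # An example is an error when the prediction's sign disagrees with the
    # signed label (a prediction of exactly 0 also counts as an error).
    numerator = (np.multiply(signed_labels, predictions) <= 0).sum()
    denominator = predictions.shape[0]
    return float(numerator) / float(denominator)

def _get_error_rate_and_constraints(df, max_churn_rate):
    """Computes the error and constraint violations."""
    error_rate_local = error_rate(df[['predictions']], df[[LABEL_COLUMN]])
    error_rate_churn = error_rate(df[['predictions']], df[[CHURN_COLUMN]])
    return error_rate_local, error_rate_churn - max_churn_rate
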
def training_helper(model,
                    train_df,
                    test_df,
                    minibatch_size,
                    num_iterations_per_loop=1,
                    num_loops=1):
    train_error_rate_vector = []
    train_constraints_matrix = []
    test_error_rate_vector = []
    test_constraints_matrix = []
    for train, test in training_generator(
            model, train_df, test_df, minibatch_size, num_iterations_per_loop,
            num_loops):
        train_df['predictions'] = train
        test_df['predictions'] = test

        train_error_rate, train_constraints = _get_error_rate_and_constraints(
            train_df, model.max_churn_rate)
        train_error_rate_vector.append(train_error_rate)
        train_constraints_matrix.append(train_constraints)

        test_error_rate, test_constraints = _get_error_rate_and_constraints(
            test_df, model.max_churn_rate)
        test_error_rate_vector.append(test_error_rate)
        test_constraints_matrix.append(test_constraints)

    return (train_error_rate_vector, train_constraints_matrix, test_error_rate_vector, test_constraints_matrix)
In [6]:
model = Model(hidden_units=10)
model.build_train_op(0.01, train_with_churn=False)
# The churn placeholder must still be fed, so fill it with the true labels for now.
train_df[CHURN_COLUMN] = train_df[LABEL_COLUMN]
test_df[CHURN_COLUMN] = test_df[LABEL_COLUMN]
# training_helper returns the list of errors and violations over each epoch.
train_errors, train_violations, test_errors, test_violations = training_helper(
model,
train_df,
test_df,
100,
num_iterations_per_loop=326,
num_loops=100)
In [7]:
print("Train Error", train_errors[-1])
print("Test Error", test_errors[-1])
In [8]:
# From now on, use the neural network's outputs as the reference labels for churn.
train_df[CHURN_COLUMN] = train_df["predictions"]
test_df[CHURN_COLUMN] = test_df["predictions"]
We now declare the model, build the training op, and then perform the training. We use a linear classifier and train it with the Adam optimizer at a learning rate of 0.01, using minibatches of size 100, for 100 epochs. We first train without the churn constraint to establish baseline performance. Without training for churn, the churn constraint ends up being violated to some extent.
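The training loop draws minibatches by walking through a fixed random permutation of the training rows and wrapping around at the end. The sketch below reproduces that wrap-around logic in plain Python under a hypothetical generator name, `cyclic_minibatches`. It also checks why num_iterations_per_loop=326 with minibatches of 100 amounts to about one epoch, assuming the 32,561 rows of adult.data are all kept after loading.

```python
import math


def cyclic_minibatches(permutation, minibatch_size):
    """Yield lists of indices, wrapping around the end of the permutation.

    A simplified version of the indexing inside training_generator.
    """
    num_rows = len(permutation)
    start = 0
    while True:
        batch = []
        while len(batch) < minibatch_size:
            end = start + minibatch_size - len(batch)
            if end >= num_rows:
                # Take the tail of the permutation and restart from the front.
                batch.extend(permutation[start:num_rows])
                start = 0
            else:
                batch.extend(permutation[start:end])
                start = end
        yield batch


# One "loop" in the notebook is about one pass over the training data.
num_train_rows = 32561  # assumed number of rows in adult.data
iterations_per_epoch = math.ceil(num_train_rows / 100)
print(iterations_per_epoch)  # 326
```

For example, with a permutation of five rows and minibatches of three, the first two batches are [0, 1, 2] and [3, 4, 0], so the second batch wraps around to the start of the permutation.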
Unsurprisingly, the linear model also performs considerably worse than the neural network on both the training and test data.
In [9]:
model = Model(hidden_units=None, max_churn_rate=0.025)
model.build_train_op(0.01, train_with_churn=False)
# training_helper returns the list of errors and violations over each epoch.
train_errors, train_violations, test_errors, test_violations = training_helper(
model,
train_df,
test_df,
100,
num_iterations_per_loop=326,
num_loops=100)
In [10]:
print("Train Error", train_errors[-1])
print("Train Violation", train_violations[-1])
print()
print("Test Error", test_errors[-1])
print("Test Violation", test_violations[-1])
We now train our linear model with the churn constraint so that its predictions don't differ too much from those of the neural network. We set the threshold to 0.025, so the goal is to train for accuracy while disagreeing with the network's predictions on at most 2.5% of examples.
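The "Train Violation" and "Test Violation" values printed by the notebook are the churn rate minus max_churn_rate, so a value at or below zero means the constraint holds. The small NumPy sketch below uses hypothetical scores and a hypothetical helper name, `churn_violation`. It follows the same sign convention as the notebook's error_rate: reference scores above zero count as positive labels.

```python
import numpy as np


def churn_violation(new_scores, reference_scores, max_churn_rate=0.025):
    """Churn rate minus the allowed maximum; a value <= 0 means the constraint holds.

    An example counts as churned when (signed reference label) * (new score) <= 0,
    where the signed reference label is +1 for reference scores > 0 and -1 otherwise.
    """
    new_scores = np.asarray(new_scores, dtype=np.float64)
    reference_scores = np.asarray(reference_scores, dtype=np.float64)
    signed_reference = np.where(reference_scores > 0, 1.0, -1.0)
    churn_rate = np.mean(signed_reference * new_scores <= 0)
    return float(churn_rate - max_churn_rate)


# 40 hypothetical reference (neural network) scores.
reference = np.tile([2.0, -1.0, 0.5, -3.0], 10)

one_flip = reference.copy()
one_flip[0] = -0.1            # 1 of 40 predictions flips: churn of 2.5%
three_flips = reference.copy()
three_flips[[0, 1, 2]] *= -1  # 3 of 40 predictions flip: churn of 7.5%

print(churn_violation(one_flip, reference))     # about 0: constraint just satisfied
print(churn_violation(three_flips, reference))  # about 0.05: constraint violated
```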
Interestingly, not only do we come very close to satisfying this churn constraint, but the overall accuracy of the linear model also improves compared to training it without the constraint.
In [11]:
model = Model(hidden_units=None, max_churn_rate=0.025)
model.build_train_op(0.01, train_with_churn=True)
# training_helper returns the list of errors and violations over each epoch.
train_errors, train_violations, test_errors, test_violations = training_helper(
model,
train_df,
test_df,
100,
num_iterations_per_loop=326,
num_loops=100)
In [12]:
print("Train Error", train_errors[-1])
print("Train Violation", train_violations[-1])
print()
print("Test Error", test_errors[-1])
print("Test Violation", test_violations[-1])