How to make targeted offers to customers?

Marketing Campaign Optimization

This notebook is part of the Prescriptive Analytics for Python

Describe the business problem

  • This example is based on a fictional banking company.
  • The marketing department wants to achieve more profitable results in future campaigns by matching the right offer of financial services to each customer.
  • The data science team needs to compute this best plan: it first predicted the current customer behavior with SPSS and stored the data in a CSV.
  • Now they need to apply optimization on this dataset to determine the next best action.
  • The Self-Learning Response Model (SLRM) node from SPSS Modeler enables you to build a model that you can continually update. Such updates are useful in building a model that assists with predicting which offers are most appropriate for customers and the probability of the offers being accepted. These sorts of models are most beneficial in customer relationship management, such as marketing applications or call centers.
  • Specifically, the example uses a Self-Learning Response Model to identify the characteristics of customers who are most likely to respond favorably based on previous offers and responses and to promote the best current offer based on the results.

A set of business constraints have to be respected:

  • We have a limited budget to run a marketing campaign based on "gifts", "newsletter", "seminar"...
    • Each has a cost and an impact (weight)
  • We want to determine which is the unique best way to contact the customers.
  • We need to identify which customers to contact.

Why are Prescriptive Analytics useful?

Solving a very simple Marketing Campaign

  • For each possible campaign, a cost C and an expected return R are provided
  • We have a limited budget to spend efficiently.
How do you spend your $100?
Revenue Cost
39 20
20 11
22 12
26 14
Revenue Cost
36 20
30 16
17 9
34 19

Trying a custom algorithm

  • What about the following rule (heuristic):
    • Sort campaigns according to decreasing return to cost ratio R / C to have the "best" ROI at first.
    • Choose campaigns in this order until the $100k budget is exhausted
Revenue Cost Ratio
39 20 1.95
17 9 1.89
30 16 1.88
26 14 1.86
22 12 1.83
20 11 1.82
36 20 1.80
34 19 1.19
  • You will end up selecting the bold columns, with a revenue of 154 for a cost of 82.
  • Your profit is \$72

Using prescriptive analytics

  • Using CPLEX algorithm, you will be able to express your business constraints as mathematical sentences, then CPLEX will compute the values of decision variables (should I select an item or not?) mapped to the data.
  • Asking for the maximum ROI, you will get a different optimal solution
  • You could write a custom algorithm to the combinations (handle holes)… and search for the best solution but these are usually not trivial, time-consuming to implement whereas solving your problem with CPLEX Optimizer, you benefit from already implemented search algorithm that have been proven very efficient.
Revenue Cost Ratio
39 20 1.95
17 9 1.89
30 16 1.88
26 14 1.86
22 12 1.83
20 11 1.82
36 20 1.80
34 19 1.79
  • You will end up selecting the bold columns, with a revenue of 185 for a cost of 100.

  • Your profit is \$85 (\$13 more than previously)

Conclusion

  • This kind of problem is highly combinatorial, that is it cannot be solved by hand or with a simple custom algorithm.
  • In general, a SPSS-like algorithm will give you a full list of actions with their advantages and drawbacks. But in real life, you cannot put them all into action, because of limited budget, time...

  • Prescriptive analytics is the next step on the path to insight-based actions. It creates value through synergy with predictive analytics, which analyzes data to predict future outcomes.

  • Prescriptive analytics takes that insight to the next level by suggesting the optimal way to handle that future situation.

  • One of the most powerful aspects of mathematical optimization is that you don't need to know how to solve a problem to get a solution, the engine will provide it to you

  • You “just” need to express the problem using a non-ambiguous mathematical language (i.e. you need to write a mathematical model)

Back to the business problem: let's solve it!

Prepare the data

Step 1 : Model the data

The predictions show which offers a customer is most likely to accept, and the confidence that they will accept, depending on each customer’s details.

For example:

  • (139987, "Pension", 0.13221, "Mortgage", 0.10675) indicates that customer Id=139987 will certainly not buy a Pension as the level is only 13.2%,
  • (140030, "Savings", 0.95678, "Pension", 0.84446) is more than likely to buy Savings and a Pension as the rates are 95.7% and 84.4%.

This data is taken from a SPSS example, except that the names of the customers were modified.

It was uploaded as CSV files on DSX.

Data is coming from a statistic tool. Here we use csv files that are supposedly produced by SPSS for example.


In [1]:
import pandas as pd
#channels = read_storage('channels.csv')
_names = pd.read_csv("https://raw.githubusercontent.com/vberaudi/utwt/master/bank_customers.csv", names =["customerid","name"])
offers = pd.read_csv("https://raw.githubusercontent.com/vberaudi/utwt/master/bank_behaviors.csv", names =["customerid","Product1","Confidence1","Product2","Confidence2"])

In [2]:
names = { t[0] : t[1] for t in _names.itertuples(index=False)}

Data for the marketing campaign.


In [3]:
from IPython.core.display import HTML
from IPython.display import display

In [4]:
products = ["Car loan", "Savings", "Mortgage", "Pension"]
productValue = [100, 200, 300, 400]
budgetShare = [0.6, 0.1, 0.2, 0.1]

availableBudget = 500
channels =  pd.DataFrame(data=[("gift", 20.0, 0.20), ("newsletter", 15.0, 0.05), ("seminar", 23.0, 0.30)], columns=["name", "cost", "factor"])

In [5]:
display(channels)
print("Budget is %d $" %availableBudget)


name cost factor
0 gift 20 0.20
1 newsletter 15 0.05
2 seminar 23 0.30
Budget is 500 $

In [6]:
offers.insert(0,'name',pd.Series(names[i[0]] for i in offers.itertuples(index=False)))

In [7]:
import sys
sys.path.append("/gpfs/fs01/user/s683-f3dde465f37390-d7318baf8c6d/notebook/work/lib/python2.7/site-packages")

The data

Highest expectations per customer


In [9]:
offers.drop('customerid',1).sort(columns = ['Confidence1', 'Confidence2'], ascending=False).head()


/usr/local/src/bluemix_jupyter_bundle.v36/notebook/lib/python2.7/site-packages/ipykernel/__main__.py:1: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  if __name__ == '__main__':
Out[9]:
name Product1 Confidence1 Product2 Confidence2
14 Will Henderson Savings 0.95678 Pension 0.86779
1 Michelle M. Lopez Savings 0.95678 Pension 0.84446
24 Earl B. Wood Savings 0.95678 Pension 0.83426
2 Terry L. Ridgley Savings 0.95678 Pension 0.80233
25 Gabrielly Sousa Martins Savings 0.95678 Pension 0.75925

Lowest expectations


In [10]:
offers.drop('customerid',1).sort(columns = ['Confidence1', 'Confidence2'], ascending=True).head()


/usr/local/src/bluemix_jupyter_bundle.v36/notebook/lib/python2.7/site-packages/ipykernel/__main__.py:1: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  if __name__ == '__main__':
Out[10]:
name Product1 Confidence1 Product2 Confidence2
0 Guadalupe J. Martinez Pension 0.13221 Mortgage 0.10675
3 Miranda B. Roush Pension 0.13221 Mortgage 0.10675
5 Roland Guérette Pension 0.13221 Mortgage 0.10675
6 Fabien Mailhot Pension 0.13221 Mortgage 0.10675
7 Christian Austerlitz Pension 0.13221 Mortgage 0.10675

Use IBM Decision Optimization CPLEX Modeling for Python

Let's create the optimization model to select the best ways to contact customers and stay within the limited budget.

Step 1: Set up the prescriptive engine

  • Subscribe to the Decision Optimization on Cloud solve service here.
  • Get the service URL and your personal API key and enter your credentials here:

First import docplex and set the credentials to solve the model using IBM ILOG CPLEX Optimizer on Cloud. docplex is already installed with its dependancies in XSD.


In [11]:
import sys
import docplex.mp

Step 2: Set up the prescriptive model

Create the model


In [13]:
from docplex.mp.model import Model

mdl = Model(name="marketing_campaign")

Define the decision variables

  • The integer decision variables channelVars, represent whether or not a customer will be made an offer for a particular product via a particular channel.
  • The integer decision variable totaloffers represents the total number of offers made.
  • The continuous variable budgetSpent represents the total cost of the offers made.

In [14]:
offersR = xrange(0, len(offers))
productsR = xrange(0, len(products))
channelsR = xrange(0, len(channels))

channelVars = mdl.binary_var_cube(offersR, productsR, channelsR)
totaloffers = mdl.integer_var()
budgetSpent = mdl.continuous_var()
budgetMax = mdl.integer_var(lb=availableBudget, ub=availableBudget, name="budgetMax")

In [15]:
print("we created %d decision variables for this problem" %(len(offersR)*len(productsR)*len(channelsR)+1+1))


we created 326 decision variables for this problem

Set up the constraints

  • Offer only one product per customer.
  • Compute the budget and set a maximum on it.
  • Compute the number of offers to be made.

In [16]:
# Only 1 product is offered to each customer     
mdl.add_constraints( mdl.sum(channelVars[o,p,c] for p in productsR for c in channelsR) <=1
                   for o in offersR)

mdl.add_constraint( totaloffers == mdl.sum(channelVars[o,p,c] 
                                           for o in offersR 
                                           for p in productsR 
                                           for c in channelsR) )

mdl.add_constraint( budgetSpent == mdl.sum(channelVars[o,p,c]*channels.get_value(index=c, col="cost") 
                                           for o in offersR 
                                           for p in productsR 
                                           for c in channelsR) )

# Balance the offers among products   
for p in productsR:
    mdl.add_constraint( mdl.sum(channelVars[o,p,c] for o in offersR for c in channelsR) 
                       <= budgetShare[p] * totaloffers )
            
# Do not exceed the budget
mdl.add_constraint( mdl.sum(channelVars[o,p,c]*channels.get_value(index=c, col="cost") 
                            for o in offersR 
                            for p in productsR 
                            for c in channelsR)  <= budgetMax )  

mdl.print_information()


Model: marketing_campaign
 - number of variables: 327
   - binary=324, integer=2, continuous=1
 - number of constraints: 34
   - linear=34
 - parameters: defaults

Express the objective

We want to maximize the expected revenue.


In [17]:
mdl.maximize(
    mdl.sum( channelVars[idx,p,idx2] * c.factor * productValue[p]* o.Confidence1  
            for p in productsR 
            for idx,o in offers[offers['Product1'] == products[p]].iterrows()  
            for idx2, c in channels.iterrows())
    +
    mdl.sum( channelVars[idx,p,idx2] * c.factor * productValue[p]* o.Confidence2 
            for p in productsR 
            for idx,o in offers[offers['Product2'] == products[p]].iterrows() 
            for idx2, c in channels.iterrows())
    )

Solve with the Decision Optimization solve service


In [18]:
s = mdl.solve()#url=url, key=key)
assert s, "No Solution !!!"

In [19]:
mdl.report()


* model solved with objective = 423.148

Step 3: Analyze the solution


In [20]:
def build_report(disp = True):
    report = [(channels.get_value(index=c, col="name"), products[p], names[offers.get_value(o, "customerid")]) 
              for c in channelsR 
              for p in productsR 
              for o in offersR  if channelVars[o,p,c].solution_value>=0.9]

    assert len(report) == totaloffers.solution_value

    if disp:
        print("Marketing plan has {0} offers costing {1}".format(totaloffers.solution_value, budgetSpent.solution_value))

    report_bd = pd.DataFrame(report, columns=['channel', 'product', 'customer'])
    #report_bd.head()
    return report_bd

In [21]:
report_bd = build_report()


Marketing plan has 20 offers costing 364.0

Then let's focus on newsletter. (in fact, seems efficient way to push customers)


In [22]:
display(report_bd[report_bd['channel'] == "newsletter"].drop('channel',1))


product customer
0 Car loan Fabien Mailhot
1 Car loan Christian Austerlitz
2 Car loan Lee Tsou
3 Car loan Sanaa' Hikmah Hakimi
4 Car loan George Blomqvist
5 Car loan Yuina Ohira
6 Car loan Vlad Alekseeva
7 Car loan Cassio Lombardo
8 Car loan Trinity Zelaya Miramontes
9 Car loan Eldar Muravyov
10 Car loan Jameel Abdul-Ghani Gerges
11 Car loan Zeeb Longoria Marrero

In [23]:
display(report_bd[report_bd['channel'] == "seminar"].drop('channel',1))


product customer
12 Savings Terry L. Ridgley
13 Savings Gabrielly Sousa Martins
14 Mortgage Miranda B. Roush
15 Mortgage Miroslav Škaroupka
16 Mortgage Matheus Azevedo Melo
17 Mortgage Franca Palermo
18 Pension Michelle M. Lopez
19 Pension Will Henderson

Playing naive What-if analysis

  • We will now make the allowed budget move and see what happens.

First, we add some kpis to get the total number of offers, the number of gitfs, seminar...


In [24]:
mdl.add_kpi(totaloffers, "nb_offers")
mdl.add_kpi(budgetSpent, "budgetSpent")
for c in channelsR:
    channel = channels.get_value(index=c, col="name")
    kpi = mdl.sum(channelVars[o,p,c] for p in productsR for o in offersR)
    mdl.add_kpi(kpi, channel)

In [25]:
def what_if(model = None, max_budget=500, disp=True):
    assert model
    var = model.get_var_by_name("budgetMax")
    var.lb =  max_budget
    var.ub = max_budget
    s = model.solve()#url=url, key=key)
    model.report()
    
    if disp:
        report_bd = build_report(disp = disp)
        display(report_bd[report_bd['channel'] == "seminar"].drop('channel',1))
    return (model.kpi_value_by_name("budgetSpent"), \
            model.kpi_value_by_name("nb_offers"), \
            model.kpi_value_by_name("gift"), \
            model.kpi_value_by_name("newsletter"), \
            model.kpi_value_by_name("seminar"))

In [26]:
x250 = what_if(model= mdl, max_budget=250, disp=False)


* model solved with objective = 245.191
*  KPI: nb_offers   = 10.000
*  KPI: budgetSpent = 182.000
*  KPI: gift        = 0.000
*  KPI: newsletter  = 6.000
*  KPI: seminar     = 4.000

In [27]:
x1000 = what_if(model= mdl, max_budget=1000, disp=False)


* model solved with objective = 423.148
*  KPI: nb_offers   = 20.000
*  KPI: budgetSpent = 433.000
*  KPI: gift        = 1.000
*  KPI: newsletter  = 3.000
*  KPI: seminar     = 16.000

In [28]:
display(pd.DataFrame([x250, x1000], columns=["budgetSpent", "nb_offers", "nb_gift", "nb_newsletter", "nb_seminar"]))


budgetSpent nb_offers nb_gift nb_newsletter nb_seminar
0 182 10 0 6 4
1 433 20 1 3 16

The following models are run in parralel on docplexcloud.

  • The idea is to launch dozens of models with increasing budget to detect the maximum expense we can imagine

In [30]:
r = range(20)

We clone the model, modif the budget bound and push it in a process executor to be solved in //


In [31]:
import sys
#import redis
import concurrent.futures

ret = []

def what_if2(max_budget=500):
    model = mdl.copy()
    var = model.get_var_by_name("budgetMax")
    var.lb =  max_budget
    var.ub = max_budget
    s = model.solve()#url=url, key=key)
  
    return (max_budget, \
            model.kpi_value_by_name("budgetSpent"), \
            model.kpi_value_by_name("nb_offers"), \
            model.kpi_value_by_name("gift"), \
            model.kpi_value_by_name("newsletter"), \
            model.kpi_value_by_name("seminar")
           )


with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:    
        jobs = [executor.submit(what_if2, max_budget=100+50*i) for i in r]
        for future in concurrent.futures.as_completed(jobs):
            ret.append(future.result())


index=["Budget", "budgetSpent", "nb_offers", "nb_gift", "nb_newsletter", "nb_seminar"]
what_pd = pd.DataFrame(data = ret, columns=index)

In [35]:
what_pd = what_pd.sort(columns = ['Budget'], ascending=True)


/usr/local/src/bluemix_jupyter_bundle.v36/notebook/lib/python2.7/site-packages/ipykernel/__main__.py:1: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  if __name__ == '__main__':

In [33]:
#display(what_pd.sort(columns = ['Budget'], ascending=True))
%matplotlib inline
import matplotlib
matplotlib.style.use('ggplot')

what_pd.plot(x='Budget', y='budgetSpent')


Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f6fd87db4d0>

Conclusion

For this dataset, the maximum budget to be spent is around 630$

References