This notebook is part of the Prescriptive Analytics for Python
A set of business constraints have to be respected:
|
|
|||||||||||||||||||||
Revenue | Cost | Ratio |
---|---|---|
39 | 20 | 1.95 |
17 | 9 | 1.89 |
30 | 16 | 1.88 |
26 | 14 | 1.86 |
22 | 12 | 1.83 |
20 | 11 | 1.82 |
36 | 20 | 1.80 |
34 | 19 | 1.19 |
Revenue | Cost | Ratio |
---|---|---|
39 | 20 | 1.95 |
17 | 9 | 1.89 |
30 | 16 | 1.88 |
26 | 14 | 1.86 |
22 | 12 | 1.83 |
20 | 11 | 1.82 |
36 | 20 | 1.80 |
34 | 19 | 1.79 |
You will end up selecting the bold columns, with a revenue of 185 for a cost of 100.
Your profit is \$85 (\$13 more than previously)
In general, a SPSS-like algorithm will give you a full list of actions with their advantages and drawbacks. But in real life, you cannot put them all into action, because of limited budget, time...
Prescriptive analytics is the next step on the path to insight-based actions. It creates value through synergy with predictive analytics, which analyzes data to predict future outcomes.
Prescriptive analytics takes that insight to the next level by suggesting the optimal way to handle that future situation.
One of the most powerful aspects of mathematical optimization is that you don't need to know how to solve a problem to get a solution, the engine will provide it to you
The predictions show which offers a customer is most likely to accept, and the confidence that they will accept, depending on each customer’s details.
For example:
This data is taken from a SPSS example, except that the names of the customers were modified.
It was uploaded as CSV files on DSX.
In [1]:
import pandas as pd
#channels = read_storage('channels.csv')
_names = pd.read_csv("https://raw.githubusercontent.com/vberaudi/utwt/master/bank_customers.csv", names =["customerid","name"])
offers = pd.read_csv("https://raw.githubusercontent.com/vberaudi/utwt/master/bank_behaviors.csv", names =["customerid","Product1","Confidence1","Product2","Confidence2"])
In [2]:
names = { t[0] : t[1] for t in _names.itertuples(index=False)}
Data for the marketing campaign.
In [3]:
from IPython.core.display import HTML
from IPython.display import display
In [4]:
products = ["Car loan", "Savings", "Mortgage", "Pension"]
productValue = [100, 200, 300, 400]
budgetShare = [0.6, 0.1, 0.2, 0.1]
availableBudget = 500
channels = pd.DataFrame(data=[("gift", 20.0, 0.20), ("newsletter", 15.0, 0.05), ("seminar", 23.0, 0.30)], columns=["name", "cost", "factor"])
In [5]:
display(channels)
print("Budget is %d $" %availableBudget)
In [6]:
offers.insert(0,'name',pd.Series(names[i[0]] for i in offers.itertuples(index=False)))
In [7]:
import sys
sys.path.append("/gpfs/fs01/user/s683-f3dde465f37390-d7318baf8c6d/notebook/work/lib/python2.7/site-packages")
In [9]:
offers.drop('customerid',1).sort(columns = ['Confidence1', 'Confidence2'], ascending=False).head()
Out[9]:
In [10]:
offers.drop('customerid',1).sort(columns = ['Confidence1', 'Confidence2'], ascending=True).head()
Out[10]:
First import docplex and set the credentials to solve the model using IBM ILOG CPLEX Optimizer on Cloud. docplex is already installed with its dependancies in XSD.
In [11]:
import sys
import docplex.mp
In [13]:
from docplex.mp.model import Model
mdl = Model(name="marketing_campaign")
channelVars
, represent whether or not a customer will be made an offer for a particular product via a particular channel.totaloffers
represents the total number of offers made.budgetSpent
represents the total cost of the offers made.
In [14]:
offersR = xrange(0, len(offers))
productsR = xrange(0, len(products))
channelsR = xrange(0, len(channels))
channelVars = mdl.binary_var_cube(offersR, productsR, channelsR)
totaloffers = mdl.integer_var()
budgetSpent = mdl.continuous_var()
budgetMax = mdl.integer_var(lb=availableBudget, ub=availableBudget, name="budgetMax")
In [15]:
print("we created %d decision variables for this problem" %(len(offersR)*len(productsR)*len(channelsR)+1+1))
In [16]:
# Only 1 product is offered to each customer
mdl.add_constraints( mdl.sum(channelVars[o,p,c] for p in productsR for c in channelsR) <=1
for o in offersR)
mdl.add_constraint( totaloffers == mdl.sum(channelVars[o,p,c]
for o in offersR
for p in productsR
for c in channelsR) )
mdl.add_constraint( budgetSpent == mdl.sum(channelVars[o,p,c]*channels.get_value(index=c, col="cost")
for o in offersR
for p in productsR
for c in channelsR) )
# Balance the offers among products
for p in productsR:
mdl.add_constraint( mdl.sum(channelVars[o,p,c] for o in offersR for c in channelsR)
<= budgetShare[p] * totaloffers )
# Do not exceed the budget
mdl.add_constraint( mdl.sum(channelVars[o,p,c]*channels.get_value(index=c, col="cost")
for o in offersR
for p in productsR
for c in channelsR) <= budgetMax )
mdl.print_information()
In [17]:
mdl.maximize(
mdl.sum( channelVars[idx,p,idx2] * c.factor * productValue[p]* o.Confidence1
for p in productsR
for idx,o in offers[offers['Product1'] == products[p]].iterrows()
for idx2, c in channels.iterrows())
+
mdl.sum( channelVars[idx,p,idx2] * c.factor * productValue[p]* o.Confidence2
for p in productsR
for idx,o in offers[offers['Product2'] == products[p]].iterrows()
for idx2, c in channels.iterrows())
)
In [18]:
s = mdl.solve()#url=url, key=key)
assert s, "No Solution !!!"
In [19]:
mdl.report()
In [20]:
def build_report(disp = True):
report = [(channels.get_value(index=c, col="name"), products[p], names[offers.get_value(o, "customerid")])
for c in channelsR
for p in productsR
for o in offersR if channelVars[o,p,c].solution_value>=0.9]
assert len(report) == totaloffers.solution_value
if disp:
print("Marketing plan has {0} offers costing {1}".format(totaloffers.solution_value, budgetSpent.solution_value))
report_bd = pd.DataFrame(report, columns=['channel', 'product', 'customer'])
#report_bd.head()
return report_bd
In [21]:
report_bd = build_report()
Then let's focus on newsletter. (in fact, seems efficient way to push customers)
In [22]:
display(report_bd[report_bd['channel'] == "newsletter"].drop('channel',1))
In [23]:
display(report_bd[report_bd['channel'] == "seminar"].drop('channel',1))
First, we add some kpis to get the total number of offers, the number of gitfs, seminar...
In [24]:
mdl.add_kpi(totaloffers, "nb_offers")
mdl.add_kpi(budgetSpent, "budgetSpent")
for c in channelsR:
channel = channels.get_value(index=c, col="name")
kpi = mdl.sum(channelVars[o,p,c] for p in productsR for o in offersR)
mdl.add_kpi(kpi, channel)
In [25]:
def what_if(model = None, max_budget=500, disp=True):
assert model
var = model.get_var_by_name("budgetMax")
var.lb = max_budget
var.ub = max_budget
s = model.solve()#url=url, key=key)
model.report()
if disp:
report_bd = build_report(disp = disp)
display(report_bd[report_bd['channel'] == "seminar"].drop('channel',1))
return (model.kpi_value_by_name("budgetSpent"), \
model.kpi_value_by_name("nb_offers"), \
model.kpi_value_by_name("gift"), \
model.kpi_value_by_name("newsletter"), \
model.kpi_value_by_name("seminar"))
In [26]:
x250 = what_if(model= mdl, max_budget=250, disp=False)
In [27]:
x1000 = what_if(model= mdl, max_budget=1000, disp=False)
In [28]:
display(pd.DataFrame([x250, x1000], columns=["budgetSpent", "nb_offers", "nb_gift", "nb_newsletter", "nb_seminar"]))
In [30]:
r = range(20)
We clone the model, modif the budget bound and push it in a process executor to be solved in //
In [31]:
import sys
#import redis
import concurrent.futures
ret = []
def what_if2(max_budget=500):
model = mdl.copy()
var = model.get_var_by_name("budgetMax")
var.lb = max_budget
var.ub = max_budget
s = model.solve()#url=url, key=key)
return (max_budget, \
model.kpi_value_by_name("budgetSpent"), \
model.kpi_value_by_name("nb_offers"), \
model.kpi_value_by_name("gift"), \
model.kpi_value_by_name("newsletter"), \
model.kpi_value_by_name("seminar")
)
with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
jobs = [executor.submit(what_if2, max_budget=100+50*i) for i in r]
for future in concurrent.futures.as_completed(jobs):
ret.append(future.result())
index=["Budget", "budgetSpent", "nb_offers", "nb_gift", "nb_newsletter", "nb_seminar"]
what_pd = pd.DataFrame(data = ret, columns=index)
In [35]:
what_pd = what_pd.sort(columns = ['Budget'], ascending=True)
In [33]:
#display(what_pd.sort(columns = ['Budget'], ascending=True))
%matplotlib inline
import matplotlib
matplotlib.style.use('ggplot')
what_pd.plot(x='Budget', y='budgetSpent')
Out[33]: