Marketing Campaign Simulation Modeling

We would like to model whether a bank client will buy an investment product.


In [4]:
import pandas as pd
try:
    from ggplot import *
except ImportError:
    # Install the package on first use, then retry the import.
    !pip install ggplot
    from ggplot import *
from ggplot.scales.scale_color_gradient import *
%matplotlib inline

In [5]:
data = pd.read_csv('sale_probability.csv')
ggplot(aes(x='Sales probability'), data=pd.DataFrame({ 'Sales probability' : data['probabilities'] })) \
    + geom_histogram(binwidth=0.02, color='darkcyan', fill="white") \
    + geom_vline(x=[0.30, 0.70], linetype='dashed', color="indigo")


Out[5]:
<ggplot: (8756617832133)>

We need to select a group of clients to be contacted, e.g., by phone, about the investment product. Our goal is to maximize sales (the number of clients who buy the product) while minimizing the cost of contact (e.g., the salaries of client representatives).

To achieve our goal we can simulate a marketing campaign using the probability of sale that we have for each client in the dataset. Simulation modeling allows us to select the parameters ${\rm min\_probability}$ and ${\rm max\_probability}$ that define the list of clients to be contacted: every client whose probability of sale lies between the two values.

To enhance our model we could try to estimate whether a phone call to a client increases or decreases the probability of sale (sales uplift), provided we had data about previous contacts with clients. As we do not have such data, we choose a simple model of "uplift": the probability of sale increases by 0.1 (10 percentage points) if a client gets a call from a client representative.
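
The simple uplift rule described above can be sketched as follows. This is a minimal illustration, not the notebook's own code: the function name `apply_uplift` and the sample probabilities are hypothetical, and clipping at 1.0 is an extra safeguard we assume here so that values stay valid probabilities.

```python
import numpy as np

def apply_uplift(probabilities, min_probability, max_probability, uplift=0.1):
    """Add a constant uplift to clients whose probability lies in the target range."""
    p = np.asarray(probabilities, dtype=float)
    # Inclusive bounds on both sides, mirroring pandas Series.between.
    targeted = (p >= min_probability) & (p <= max_probability)
    # Assumption: clip so that an uplifted value never exceeds 1.0.
    uplifted = np.where(targeted, np.clip(p + uplift, 0.0, 1.0), p)
    return uplifted, targeted

# Hypothetical probabilities for four clients.
probs = np.array([0.05, 0.35, 0.50, 0.95])
after, mask = apply_uplift(probs, 0.3, 0.7)
# Only the two clients inside [0.3, 0.7] are contacted and receive the uplift.
```

The boolean mask doubles as the list of clients to call, so its sum gives the number of contacts.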

The profit function for phone calls is: $$profit = N_{sales} \cdot avg(income_{sale}) - N_{contacts} \cdot avg(costs_{contact})$$ To enhance our simulation model we may use a more complex model for the cost of contact, e.g., a fixed cost plus a variable cost based on the duration of each phone call.
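
A fixed-plus-variable cost model of the kind mentioned above might look like the sketch below. The names `contact_cost` and `profit_with_durations`, the fixed cost of 1.0, and the rate of 0.25 per minute are all hypothetical values chosen for illustration; only the income of 10.0 per sale matches the figure used later in this notebook.

```python
def contact_cost(duration_minutes, fixed_cost=1.0, cost_per_minute=0.25):
    """Cost of one call: a fixed part plus a part proportional to its duration."""
    return fixed_cost + cost_per_minute * duration_minutes

def profit_with_durations(n_sales, call_durations, avg_income_sale=10.0,
                          fixed_cost=1.0, cost_per_minute=0.25):
    """Profit when every call has its own duration (in minutes)."""
    total_cost = sum(contact_cost(d, fixed_cost, cost_per_minute)
                     for d in call_durations)
    return n_sales * avg_income_sale - total_cost

# Three sales from three calls lasting 4, 2 and 6 minutes:
# costs are 2.0 + 1.5 + 2.5 = 6.0, so the profit is 30.0 - 6.0.
example_profit = profit_with_durations(3, [4.0, 2.0, 6.0])
```

With all durations equal, this reduces to the average-cost formula above, so the simpler model can be seen as a special case.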


In [6]:
import numpy.random as rnd
import numpy as np

In [23]:
result = []
def monte_carlo_coin(probability):
    r = rnd.uniform()
    return int(r < probability)

def profit(n_sales, n_contacts):
    avg_income_sale = 10.0
    avg_costs_contact = 2.0
    return n_sales*avg_income_sale - n_contacts*avg_costs_contact
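
The per-client coin flip in `monte_carlo_coin` can also be drawn for all clients at once. The sketch below is one possible vectorized variant, assuming the probabilities fit in a NumPy array; the function name `simulate_sales` and the fixed seed are our own choices, not part of the original notebook.

```python
import numpy as np

def simulate_sales(probabilities, rng):
    """Draw one Bernoulli outcome per client (1 = sale, 0 = no sale)."""
    p = np.asarray(probabilities, dtype=float)
    # One uniform draw in [0, 1) per client; a sale happens when the draw is below p.
    return (rng.uniform(size=p.shape) < p).astype(int)

# Fixed seed, used here only to make the example reproducible.
rng = np.random.RandomState(0)
# A client with probability 0.0 never buys, one with probability 1.0 always buys.
sales = simulate_sales([0.0, 1.0, 0.5], rng)
```

This avoids calling a Python function once per row, which may matter when the simulation is repeated for many parameter pairs.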

In [24]:
for min_probability in np.arange(0.0,0.9,0.1):
    for max_probability in np.arange(min_probability+0.1,1.0,0.1):
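        # uplift: contacted clients get +0.1 on their probability of sale
        target_group = data.probabilities.between(min_probability, max_probability)
        data_after_contact = data.copy()
        # .loc replaces the .ix indexer, which newer pandas versions no longer provide
        data_after_contact.loc[target_group, 'probabilities'] = data.loc[target_group, 'probabilities'] + 0.1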
        
        prof = 0
        for _ in range(10):
            # simulation
            data_after_contact['sales'] = data_after_contact['probabilities'].apply(monte_carlo_coin)

            # results
            sales = data_after_contact['sales'].sum(axis=0)
            calls = target_group.sum(axis=0)
            prof += profit(sales, calls)
        prof /= 10
        # note: sales holds the count from the last run only, while prof is the 10-run average
        result.append((min_probability, max_probability, sales, calls, prof))
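
Since each profit value is an average over only 10 simulated runs, it may help to report a standard error next to the mean. The sketch below is one way to do so under the same profit formula; the function name `simulate_profit` and its arguments are hypothetical, and it assumes the run-by-client array of draws fits in memory.

```python
import numpy as np

def simulate_profit(probabilities, n_contacts, n_runs, rng,
                    avg_income_sale=10.0, avg_costs_contact=2.0):
    """Return the mean profit over n_runs and its standard error (n_runs >= 2)."""
    p = np.asarray(probabilities, dtype=float)
    # Rows are simulation runs, columns are clients.
    draws = rng.uniform(size=(n_runs, p.size)) < p
    profits = draws.sum(axis=1) * avg_income_sale - n_contacts * avg_costs_contact
    std_err = profits.std(ddof=1) / np.sqrt(n_runs)
    return profits.mean(), std_err

rng = np.random.RandomState(0)  # fixed seed for reproducibility only
# Degenerate check: five certain buyers and two calls give 5*10 - 2*2 in every run.
mean_profit, std_err = simulate_profit(np.ones(5), n_contacts=2, n_runs=10, rng=rng)
```

Two parameter pairs whose mean profits differ by less than a couple of standard errors are hard to tell apart with this number of runs.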

In [60]:
best_results = sorted(result, key=lambda x: x[4])
best_results[-1]


Out[60]:
(0.60000000000000009, 0.70000000000000007, 18857, 49, 187380.0)

In [61]:
best_results[-3:]


Out[61]:
[(0.5, 0.59999999999999998, 18997, 138, 186914.0),
 (0.40000000000000002, 0.59999999999999998, 18780, 439, 186964.0),
 (0.60000000000000009, 0.70000000000000007, 18857, 49, 187380.0)]

As a result we get a target group of clients to contact. The three best results span the middle range of probability values, from 0.4 (the smallest ${\rm min\_probability}$) to 0.7 (the largest ${\rm max\_probability}$). One interpretation is that clients with low probability values would not buy the product even if they get a phone call, while clients with a high probability of sale have already decided to buy the product, so the phone call would not help. Note that the top three profits differ by less than 0.3% and each is averaged over only 10 runs, so the ranking may be sensitive to simulation noise.
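
As a sanity check on this interpretation, we can compute the expected gain from a single call under the constant additive uplift assumed in this notebook. The sketch below uses the values 0.1, 10.0 and 2.0 from the code above; the function names are hypothetical, and the calculation assumes the uplifted probability stays at or below 1.

```python
def expected_gain_per_call(uplift, avg_income_sale, avg_costs_contact):
    """Expected extra profit from calling one client under a constant additive uplift."""
    return uplift * avg_income_sale - avg_costs_contact

def break_even_uplift(avg_income_sale, avg_costs_contact):
    """Smallest uplift at which a call pays for itself on average."""
    return avg_costs_contact / avg_income_sale

# Values taken from the uplift rule and the profit() function in this notebook.
gain = expected_gain_per_call(0.1, 10.0, 2.0)
threshold = break_even_uplift(10.0, 2.0)
```

With these values the expected gain per call is negative and does not depend on the client's baseline probability, so differences between target ranges in the simulation may partly reflect the number of calls and random noise. An uplift that varies with the client, estimated from contact history, would be needed to test the interpretation more firmly.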


In [62]:
vis = pd.DataFrame(result, columns=['Minimal probability', 'Maximum probability', 'No of sales', 'No of calls', 'Profit'])
print(scale_color_gradient)
ggplot(vis, aes(x='Minimal probability', y='Maximum probability', color='Profit')) + \
    geom_point(size=50) + \
    scale_color_gradient(low='blue', high='red')


<class 'ggplot.scales.scale_color_gradient.scale_color_gradient'>
Out[62]:
<ggplot: (8756671033577)>