Combining Machine Learning and Optimization

With Gurobi and sklearn

Machine Learning topics

Touching the elephant here, but not there

Supervised Learning

  • Algorithm selection and hyper-parameter optimization
  • KFold assessment vs overfitting
  • Separating training from prediction

Unsupervised Learning

Time Series Data

Deep Learning

Optimization topics

  • Exploratory programming to application deployment
  • Coping with the combinatorial explosion
  • Validating optimization with simulation

All Under the Banner of Python!

The Soda Promotion Problem

We face the challenge of designing the upcoming promotion campaign for a soda company. The objective is to bolster sales while obeying various business constraints.

The First Challenge

We need to predict the impact of different price points on the expected sales of each type of soda.

To do this, we need to train a soda sales predictor from a historical data table.

Examine historical data


In [1]:
import pandas
df_hist = pandas.read_excel("soda_sales_historical_data.xlsx")
df_hist[:5]


Out[1]:
Product Sales Cost Per Unit Easter Included Super Bowl Included Christmas Included Other Holiday 4 Wk Avg Temp 4 Wk Avg Humidity Sales M-1 weeks Sales M-2 weeks Sales M-3 weeks Sales M-4 Weeks Sales M-5 weeks
0 11 Down 51.9 1.6625 No No Yes No 80.69 69.19 17.0 22.4 13.5 14.5 28.0
1 Alpine Stream 55.8 2.2725 No No Yes No 80.69 69.19 2.4 2.2 2.0 1.4 0.5
2 Bright 3385.6 1.3475 No No Yes No 80.69 69.19 301.8 188.8 101.4 81.6 213.8
3 Crisp Clear 63.5 1.6600 No No Yes No 80.69 69.19 73.8 69.4 72.8 75.4 57.4
4 Popsi Kola 181.1 1.8725 No No Yes No 80.69 69.19 23.1 22.6 22.1 19.9 23.2

In [2]:
df_hist.shape


Out[2]:
(596, 14)

Convert categorical columns to numeric


In [3]:
from pandas import DataFrame, get_dummies
categorical_columns = ['Product','Easter Included','Super Bowl Included', 
                       'Christmas Included', 'Other Holiday']
df_hist = get_dummies(df_hist, prefix={k:"dmy_%s"%k for k in categorical_columns},
                      columns = list(categorical_columns))
df_hist[:5]


Out[3]:
Sales Cost Per Unit 4 Wk Avg Temp 4 Wk Avg Humidity Sales M-1 weeks Sales M-2 weeks Sales M-3 weeks Sales M-4 Weeks Sales M-5 weeks dmy_Product_11 Down ... dmy_Product_Koala Kola dmy_Product_Mr. Popper dmy_Product_Popsi Kola dmy_Easter Included_No dmy_Easter Included_Yes dmy_Super Bowl Included_No dmy_Super Bowl Included_Yes dmy_Christmas Included_No dmy_Christmas Included_Yes dmy_Other Holiday_No
0 51.9 1.6625 80.69 69.19 17.0 22.4 13.5 14.5 28.0 1 ... 0 0 0 1 0 1 0 0 1 1
1 55.8 2.2725 80.69 69.19 2.4 2.2 2.0 1.4 0.5 0 ... 0 0 0 1 0 1 0 0 1 1
2 3385.6 1.3475 80.69 69.19 301.8 188.8 101.4 81.6 213.8 0 ... 0 0 0 1 0 1 0 0 1 1
3 63.5 1.6600 80.69 69.19 73.8 69.4 72.8 75.4 57.4 0 ... 0 0 0 1 0 1 0 0 1 1
4 181.1 1.8725 80.69 69.19 23.1 22.6 22.1 19.9 23.2 0 ... 0 0 1 1 0 1 0 0 1 1

5 rows × 25 columns

Picking the right predictor algorithm is of utmost importance

Hence we examine our choices here in great detail.


In [4]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import BaggingRegressor
from sklearn import model_selection
experiments = {"Algorithm":["Ordinary Least Squares", "Regression Tree", 
                            "Big Random Forest", "Random Forest", 
                            "Bagging"], 
               "Objects" : [lambda : LinearRegression(), 
                            lambda : DecisionTreeRegressor(), 
                            lambda : RandomForestRegressor(n_estimators=100), 
                            lambda : RandomForestRegressor(), 
                            lambda : BaggingRegressor()], 
               "Predictions":[[] for _ in range(5)]}
actuals = []

Resist the temptation to overfit!

Instead, split the samples into training and testing subsets.


In [5]:
from sklearn.model_selection import train_test_split
[_.shape for _ in  train_test_split(df_hist.drop("Sales", axis=1), 
                                    df_hist["Sales"], test_size=0.25)]


Out[5]:
[(447, 24), (149, 24), (447,), (149,)]

By repeatedly splitting, training, and testing, you can create a realistic simulation of prediction accuracy.


In [6]:
for _ in range(4):
    train_X, test_X, train_y, test_y = (
        train_test_split(df_hist.drop("Sales", axis=1), 
                         df_hist["Sales"], test_size=0.25))
    for i, obj_factory in enumerate(experiments["Objects"]):
        obj = obj_factory()
        obj.fit(y=train_y,X=train_X)
        experiments["Predictions"][i] += list(obj.predict(test_X))
    actuals += list(test_y)
actuals = pandas.Series(actuals)
experiments["Predictions"] = list(map(pandas.Series, experiments["Predictions"]))

In [7]:
len(actuals), [len(p) for p in experiments["Predictions"]]


Out[7]:
(596, [596, 596, 596, 596, 596])

Now let's visualize our results


In [8]:
%matplotlib inline 
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm
color=iter(cm.rainbow(np.linspace(0,1,len(experiments)+3)))
plt.figure(figsize=(12,7),dpi=300)
plt.plot(actuals,actuals,c=next(color),markersize=2,label='Data')
for _, row in DataFrame(experiments).iterrows():
    plt.plot(actuals, row["Predictions"],'o',c=next(color),
             markersize=2,label=row['Algorithm'])
plt.title('Scatter Plot Prediction v/s Data')
plt.grid(True)
plt.legend()
plt.show()


More visualizations


In [9]:
color=iter(cm.rainbow(np.linspace(0,1,len(experiments)+3)))
next(color)
plt.figure(figsize=(13,8),dpi=300)
for index, row in DataFrame(experiments).iterrows():
    relative_error = (row["Predictions"] - actuals) / (1 + abs(actuals))
    plt.plot(np.sort(relative_error),'o',c=next(color),
             markersize=2,label=row['Algorithm'])
plt.title('Relative Error Prediction v/s Data')
plt.ylabel('Relative Error')
plt.grid(True)
plt.legend()
plt.axis([0,len(actuals),-1,1])
plt.show()


Even more visualizations


In [10]:
def boxplot(algorithm):
    prediction = (experiments["Predictions"]
                  [experiments["Algorithm"].index(algorithm)])
    plt.title(algorithm)
    plt.boxplot( (prediction - actuals) / (1 + abs(actuals))  )
    plt.show()

In [11]:
boxplot("Bagging")



In [12]:
boxplot("Big Random Forest")


Visualizations only take us so far

Let's use model_selection.cross_val_score to automate the process of train/test-split-based assessment (K-fold cross-validation). With the default scoring, each fold reports the regressor's R², so higher is better.


In [13]:
experiments["Results"] = []
for o in experiments["Objects"]:
    experiments["Results"].append(
        model_selection.cross_val_score(o(), y=df_hist['Sales'], 
                                        X=df_hist.drop("Sales", axis=1),
                                        cv=5).mean())

In [14]:
DataFrame(experiments).drop(["Objects", "Predictions"], 
                            axis=1).set_index("Algorithm")


Out[14]:
Results
Algorithm
Ordinary Least Squares 0.536984
Regression Tree 0.573443
Big Random Forest 0.844174
Random Forest 0.831225
Bagging 0.808155
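
The agenda also promised hyper-parameter optimization, which the notebook itself does not demonstrate. As a minimal hedged sketch (the parameter grid below is purely an illustrative assumption, not a tuned recommendation), the winning random forest could be refined with model_selection.GridSearchCV:


In [ ]:
# Hedged sketch: grid-search a few RandomForestRegressor hyper-parameters.
# The grid values are illustrative assumptions, not tuned recommendations.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300],
              "max_depth": [None, 10],
              "min_samples_leaf": [1, 5]}
search = GridSearchCV(RandomForestRegressor(), param_grid, cv=5)
search.fit(X=df_hist.drop("Sales", axis=1), y=df_hist["Sales"])
search.best_params_, search.best_score_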

Now use the complete historical table to create a predictor object with the best algorithm


In [15]:
fitted = (experiments["Objects"]
          [experiments["Algorithm"].index("Big Random Forest")]().
          fit(y=df_hist["Sales"], X=df_hist.drop("Sales", axis=1)))

We will make predictions for the table of current data, which has no "Sales" column


In [16]:
df_superbowl_original = pandas.read_excel("super_bowl_promotion_data.xlsx")
df_superbowl = get_dummies(df_superbowl_original, 
                           prefix={k:"dmy_%s"%k for k in categorical_columns},
                           columns = list(categorical_columns))
assert "Sales" not in df_superbowl.columns 
assert {"Sales"}.union(df_superbowl.columns).issubset(set(df_hist.columns))
len(df_superbowl)


Out[16]:
36

Note that the current data table might cover fewer categorical levels than the historical data, so some dummy columns may be missing.


In [17]:
for fld in set(df_hist.columns).difference(df_superbowl.columns, {"Sales"}):
    assert fld.startswith("dmy_")
    df_superbowl[fld] = 0

Take care! sklearn has no concept of column names, so we must make sure that the df_superbowl columns are ordered consistently with the independent-variable columns of df_hist.


In [18]:
df_superbowl = df_superbowl[list(df_hist.drop("Sales", axis=1).columns)]

In [19]:
predicted = fitted.predict(df_superbowl)

Prediction in hand, we commence optimization!

LaTeX summary of family of equations

$$ \begin{array}{ll} \max & sales\\ s.t. & X_{b,p}\in\{0,1\}\quad\forall (b,p)\in Prod\\ & \sum\left(X_{b,p}:{(b,p)\in Prod} \right)=1\quad\forall b\in Soda\\ & \sum\left(X_{b,p}:{(b,p)\in Prod, p\neq p_o, T(b)=t} \right) \leq max_t\quad\forall t\\ & sales = \sum\left(f_{b,p} X_{b,p}:{(b,p)\in Prod}\right)\\ & revenue = \sum\left(f_{b,p} p X_{b,p}:{(b,p)\in Prod} \right)\\ & investment = \sum\left(\left(f_{b,p} - f_{b,p_o}\right)_+ p_o X_{b,p}:{(b,p)\in Prod} \right)\\ & investment \leq max_{investment} \end{array} $$

Here $Prod$ is the set of (product, price) pairs, $Soda$ the set of products, $f_{b,p}$ the forecast sales of product $b$ at price $p$, $p_o$ the normal (undiscounted) price, $T(b)$ the family of product $b$, $max_t$ the maximum number of promotions allowed in family $t$, and $(x)_+$ denotes $\max(x,0)$.

Putting the optimization input set together


In [20]:
forecast_sales = df_superbowl_original[["Product", "Cost Per Unit"]].copy()
forecast_sales["Sales"] = predicted
forecast_sales.set_index(['Product','Cost Per Unit'], inplace=True)

In [21]:
soda_family = {'11 Down': 'Clear', 'AB Root Beer': 'Dark', 
               'Alpine Stream': 'Clear', 'Bright': 'Clear', 
               'Crisp Clear': 'Clear', 'DC Kola': 'Dark',
               'Koala Kola': 'Dark', 'Mr. Popper': 'Dark', 
               'Popsi Kola': 'Dark'}
family  = set(soda_family[j] for j in soda_family)
soda    = set(j for j in soda_family)
max_prom = {f:2 for f in family}
max_investment = 750

In [22]:
product_prices = set(forecast_sales.index.values)
normal_price = {b:0 for b in soda}
for b,p in product_prices:
    normal_price[b] = max(normal_price[b],p)

Note that not all estimated discounts yield a boost in sales.


In [23]:
meaningful_discounts = 0
for b,p in product_prices:
    if forecast_sales.Sales[b,p] > forecast_sales.Sales[b,normal_price[b]]:
        meaningful_discounts += 1
meaningful_discounts, len(forecast_sales) - len(soda)


Out[23]:
(23, 27)
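
To see this concretely, we can list the forecast sales of a single product across its candidate price points (a small hedged illustration; "Popsi Kola" is used here only because it appears in the data, and any other product name would work):


In [ ]:
# Illustrative check: forecast sales for one product at each candidate price.
# "Popsi Kola" is just an example product name taken from the data above.
forecast_sales.loc["Popsi Kola"].sort_index()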

Building a MIP model

$$ \begin{array}{l} X_{b,p}\in\{0,1\}\quad\forall (b,p)\in Prod\\ 0 \leq sales\\ 0 \leq revenue\\ 0 \leq investment \leq max_{investment} \end{array} $$

In [24]:
import gurobipy as gu
model = gu.Model()
select_price = model.addVars(product_prices,vtype=gu.GRB.BINARY,name='X')
sales        = model.addVar(name='sales')
revenue      = model.addVar(name='revenue')
investment   = model.addVar(ub=max_investment, name='investment')
gusum = gu.quicksum

$$ sales = \sum\left(f_{b,p} X_{b,p}:{(b,p)\in Prod}\right)\\ revenue = \sum\left(f_{b,p} p X_{b,p}:{(b,p)\in Prod} \right)\\ investment = \sum\left(\left(f_{b,p} - f_{b,p_o}\right)_+ p_o X_{b,p}:{(b,p)\in Prod} \right) $$

In [25]:
model.addConstr(sales == select_price.prod(forecast_sales.Sales), name='sales')
model.addConstr(revenue == gusum(forecast_sales.Sales[b,p] * p * 
                                 select_price[b,p] for b,p in product_prices), 
                name='revenue')
model.addConstr(investment == 
                gusum(max(0,forecast_sales.Sales[b,p] - 
                            forecast_sales.Sales[b,normal_price[b]]) *
                        normal_price[b] * select_price[b,p] 
                        for b,p in product_prices),
                name='investment')
model.update()

$$ \sum\left(X_{b,p}:{(b,p)\in Prod} \right)=1\quad\forall b\in Soda\\ \sum\left(X_{b,p}:{(b,p)\in Prod, p\neq p_o, T(b)=t} \right) \leq max_t\quad\forall t $$

In [26]:
model.addConstrs((select_price.sum(b,'*') == 1 for b in soda), name='OnePrice')
model.addConstrs((gusum(select_price[b,p] for b,p in product_prices if 
                        soda_family[b] == f and p != normal_price[b] ) 
                  <= max_prom[f] for f in family),
                 name='MaxProm')
model.update()

Optimize and results


In [27]:
model.setObjective(sales, sense=gu.GRB.MAXIMIZE)
model.optimize()


Optimize a model with 14 rows, 39 columns and 161 nonzeros
Variable types: 3 continuous, 36 integer (36 binary)
Coefficient statistics:
  Matrix range     [4e-01, 5e+03]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 8e+02]
  RHS range        [1e+00, 2e+00]
Found heuristic solution: objective 5552.39
Presolve removed 4 rows and 19 columns
Presolve time: 0.00s
Presolved: 10 rows, 20 columns, 59 nonzeros
Variable types: 0 continuous, 20 integer (20 binary)

Root relaxation: objective 5.916272e+03, 14 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 5916.27182    0    2 5552.38700 5916.27182  6.55%     -    0s
H    0     0                    5832.5490000 5916.27182  1.44%     -    0s
H    0     0                    5856.0410000 5916.27182  1.03%     -    0s
     0     0 5872.39720    0    3 5856.04100 5872.39720  0.28%     -    0s
     0     0 5866.50365    0    5 5856.04100 5866.50365  0.18%     -    0s
     0     0 5865.80582    0    9 5856.04100 5865.80582  0.17%     -    0s
     0     0 5864.27694    0    6 5856.04100 5864.27694  0.14%     -    0s
     0     0 5863.51309    0    8 5856.04100 5863.51309  0.13%     -    0s
     0     0 5863.32942    0   10 5856.04100 5863.32942  0.12%     -    0s
     0     0 5857.50538    0    8 5856.04100 5857.50538  0.03%     -    0s
     0     0 5857.24185    0    2 5856.04100 5857.24185  0.02%     -    0s

Cutting planes:
  Cover: 2
  Clique: 5
  MIR: 1
  GUB cover: 3

Explored 0 nodes (53 simplex iterations) in 0.05 seconds
Thread count was 4 (of 4 available processors)

Solution count 3: 5856.04 5832.55 5552.39 
Pool objective bound 5856.04

Optimal solution found (tolerance 1.00e-04)
Best objective 5.856041000000e+03, best bound 5.856041000000e+03, gap 0.0000%

In [28]:
model.status == gu.GRB.OPTIMAL


Out[28]:
True

Only the paranoid survive

Carefully sanity check the solution.


In [29]:
sales.X, revenue.X, investment.X


Out[29]:
(5856.041, 11162.34181, 744.4546924999985)

In [30]:
price_selections = {"Product":[], "Price":[], "Is Discount":[], "Family":[]}
for b, p in product_prices:
    if abs(select_price[b,p].X -1) < 0.0001: # i.e. almost one
        price_selections["Product"].append(b)
        price_selections["Price"].append(p)
        price_selections["Is Discount"].append(p < normal_price[b])
        price_selections["Family"].append(soda_family[b])
(DataFrame(price_selections).set_index("Product")
 [["Price", "Is Discount", "Family"]].sort_values("Family"))


Out[30]:
Price Is Discount Family
Product
11 Down 1.5600 False Clear
Bright 1.2900 False Clear
Alpine Stream 2.2275 False Clear
Crisp Clear 1.4700 False Clear
Mr. Popper 2.8475 True Dark
DC Kola 1.8900 True Dark
AB Root Beer 3.8425 False Dark
Popsi Kola 1.7500 False Dark
Koala Kola 2.5650 False Dark
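
As an extra hedged sanity check (not part of the original notebook), we can recompute the three KPIs directly from the forecast table and the selected binaries and confirm they match the model's variables:


In [ ]:
# Recompute sales, revenue and investment from the data and the solution,
# then compare against the values reported by the Gurobi variables.
chk_sales = sum(forecast_sales.Sales[b,p] * select_price[b,p].X
                for b,p in product_prices)
chk_revenue = sum(forecast_sales.Sales[b,p] * p * select_price[b,p].X
                  for b,p in product_prices)
chk_investment = sum(max(0, forecast_sales.Sales[b,p] -
                            forecast_sales.Sales[b,normal_price[b]]) *
                     normal_price[b] * select_price[b,p].X
                     for b,p in product_prices)
assert abs(chk_sales - sales.X) < 1e-4
assert abs(chk_revenue - revenue.X) < 1e-4
assert abs(chk_investment - investment.X) < 1e-4
assert chk_investment <= max_investment + 1e-4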

Re-train the predictor repeatedly to create a range of predictions, and use them to simulate the behavior of our solution under varying conditions.


In [31]:
simulated_KPI = {'Sales':[],'Revenue':[],'Investment':[]}
Z = select_price
num_infeas = 0
for i in range(100):
    np.random.seed(i)
    fitted = RandomForestRegressor(n_estimators=100,
                                   n_jobs=4).fit(y=df_hist["Sales"],
                                                 X=df_hist.drop("Sales", axis=1))
    forecast = df_superbowl_original[['Product', 'Cost Per Unit']].copy()
    forecast["Sales"] = fitted.predict(df_superbowl)
    forecast = forecast.set_index(['Product','Cost Per Unit'])
    sales, revenue, investment = 0, 0, 0
    for b,p in product_prices:
        sales   += forecast.Sales[b,p] * Z[b,p].X
        revenue += forecast.Sales[b,p] * p * Z[b,p].X
        investment += (max(0,forecast.Sales[b,p] - 
                             forecast.Sales[b,normal_price[b]]) * 
                       normal_price[b] * Z[b,p].X)
    if investment > max_investment:
        num_infeas += 1
    simulated_KPI['Sales'].append(sales)
    simulated_KPI['Revenue'].append(revenue)
    simulated_KPI['Investment'].append(investment)
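
Before plotting, a quick numeric summary of the simulated KPIs (a small addition that is not in the original notebook):


In [ ]:
# Summarize the simulated KPI distributions (count, mean, quartiles, etc.).
DataFrame(simulated_KPI).describe()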

In [32]:
data = ['Sales','Revenue','Investment']
color=iter(cm.rainbow(np.linspace(0,1,3)))
for t in data:
    plt.figure(figsize=(7,4),dpi=300)
    plt.hist(simulated_KPI[t],50,density=True,color=next(color), alpha=0.75)
    plt.ylabel('Probability')
    plt.xlabel(t)
    plt.grid(True)
    plt.show()



In [33]:
num_infeas


Out[33]:
72

So in 72 of the 100 resimulated forecasts the chosen promotion plan would exceed the investment budget; handling this kind of uncertainty rigorously is the subject of the stochastic, robust, and risk optimization references below.

Thank you for joining us

Some extra references for Stochastic, Robust, and Risk optimization

After the webinar we received many requests for further material on this topic. The following is an (incomplete) list of sources on these topics, but it is a good starting point.

Books

  • Stochastic Programming, Peter Kall, Stein W. Wallace, 1994
  • Optimization Methods in Finance, Gerard Cornuejols and Reha Tütüncü, 2006
  • Lectures on Stochastic Programming: Modeling and Theory, Alexander Shapiro, Darinka Dentcheva, Andrzej Ruszczyński, 2009
  • Robust Optimization, Aharon Ben-Tal, Laurent El Ghaoui, Arkadi Nemirovski, 2009
  • Introduction to Stochastic Programming, John Birge, François Louveaux, 2011
  • Modeling with Stochastic Programming, Alan J. King, Stein W. Wallace, 2012

Some papers

  • Optimization of Conditional Value-at-Risk, R. Tyrrell Rockafellar, Stanislav Uryasev, 2000
  • Supplier-Retailer Flexible Commitments Contracts: A Robust Optimization Approach, Aharon Ben-Tal, Boaz Golany, Arkadi Nemirovski, Jean-Philippe Vial, 2003
  • Tractable Approximations to Robust Conic Optimization Problems, Dimitris Bertsimas, Melvyn Sim, 2006
  • Modeling and optimization of risk, Pavlo Krokhmal, Michael Zabarankin, Stan Uryasev, 2011

Other resources

The stochastic optimization society's website has several tutorials and further links.

Special Thanks

I would like to especially thank (in alphabetical order) Shabbir Ahmed (Georgia Tech), Tito Homem-de-Mello (Universidad Adolfo Ibañez), Jim Luedtke (University of Wisconsin-Madison), David Morton (Northwestern University), and Bernardo Pagnoncelli (Universidad Adolfo Ibañez) for suggesting books, links, and papers. Any omission or error is my fault.

Daniel Espinoza,
Senior Developer,
Gurobi Optimization Inc.