This tutorial includes everything you need to set up the decision optimization engines and build mathematical programming models.
When you finish this tutorial, you'll have a foundational knowledge of Prescriptive Analytics.
This notebook is part of Prescriptive Analytics for Python
It requires either an installation of CPLEX Optimizers or it can be run on IBM Watson Studio Cloud (Sign up for a free IBM Cloud account and you can start using Watson Studio Cloud right away).
Table of contents:
An oil company manufactures different types of gasoline and diesel. Each type of gasoline is produced by blending different types of crude oils that must be purchased. The company must decide how much crude oil to buy in order to maximize its profit while respecting processing capacities and quality levels as well as satisfying customer demand.
Blending problems are a typical industry application of Linear Programming (LP). LP represents real life problems mathematically using an objective function to represent the goal that is to be minimized or maximized, together with a set of linear constraints which define the conditions to be satisfied and the limitations of the real life problem. The function and constraints are expressed in terms of decision variables and the solution, obtained from optimization engines such as IBM® ILOG® CPLEX®, provides the best values for these variables so that the objective function is optimized.
The oil-blending problem consists of calculating different blends of gasoline according to specific quality criteria.
Three types of gasoline are manufactured: super, regular, and diesel.
Each type of gasoline is produced by blending three types of crude oil: crude1, crude2, and crude3.
The gasoline must satisfy some quality criteria with respect to their lead content and their octane ratings, thus constraining the possible blendings.
The company must also satisfy its customer demand, which is 3,000 barrels a day of super, 2,000 of regular, and 1,000 of diesel.
The company can purchase 5,000 barrels of each type of crude oil per day and can process at most 14,000 barrels a day.
In addition, the company has the option of advertising a gasoline, in which case the demand for this type of gasoline increases by ten barrels for every dollar spent.
Finally, it costs four dollars to transform a barrel of oil into a barrel of gasoline.
Prescriptive analytics technology recommends actions based on desired outcomes, taking into account specific scenarios, resources, and knowledge of past and current events. This insight can help your organization make better decisions and have greater control of business outcomes.
Prescriptive analytics is the next step on the path to insight-based actions. It creates value through synergy with predictive analytics, which analyzes data to predict future outcomes.
Prescriptive analytics takes that insight to the next level by suggesting the optimal way to handle that future situation. Organizations that can act fast in dynamic conditions and make superior decisions in uncertain environments gain a strong competitive advantage.
For example:
In [ ]:
import sys
try:
import docplex.mp
except:
raise Exception('Please install docplex. See https://pypi.org/project/docplex/')
If CPLEX is not installed, install CPLEX Community edition.
In [ ]:
try:
import cplex
except:
raise Exception('Please install CPLEX. See https://pypi.org/project/cplex/')
There are inventory costs for each type of final product and blending proportions. All of these have actual values in the model.
The maginal production cost and maximum production are assumed to be identical for all oil types.
Input data comes as NumPy arrays with two dimensions. NumPy is the fundamental package for scientific computing with Python.
The first dimension of the NumPy array is the number of gasoline types; and for each gasoline type, we have a NumPy array containing capacity, price, octane and lead level, in that order.
In [ ]:
import numpy as np
gas_names = ["super", "regular", "diesel"]
gas_data = np.array([[3000, 70, 10, 1], [2000, 60, 8, 2], [1000, 50, 6, 1]])
oil_names = ["crude1", "crude2", "crude3"]
oil_data = np.array([[5000, 45, 12, 0.5], [5000, 35, 6, 2], [5000, 25, 8, 3]])
nb_gas = len(gas_names)
nb_oils = len(oil_names)
range_gas = range(nb_gas)
range_oil = range(nb_oils)
print("Number of gasoline types = {0}".format(nb_gas))
print("Number of crude types = {0}".format(nb_oils))
# global data
production_cost = 4
production_max = 14000
# each $1 spent on advertising increases demand by 10.
advert_return = 10
Pandas is another Python library that we use to store data. pandas contains data structures and data analysis tools for the Python programming language.
In [ ]:
import pandas as pd
gaspd = pd.DataFrame([(gas_names[i],int(gas_data[i][0]),int(gas_data[i][1]),int(gas_data[i][2]),int(gas_data[i][3]))
for i in range_gas])
oilpd = pd.DataFrame([(oil_names[i],int(oil_data[i][0]),int(oil_data[i][1]),int(oil_data[i][2]),oil_data[i][3])
for i in range_oil])
gaspd.columns = ['name','demand','price','octane','lead']
oilpd.columns= ['name','capacity','price','octane','lead']
Use basic HTML and a stylesheet to format the data.
In [ ]:
CSS = """
body {
margin: 0;
font-family: Helvetica;
}
table.dataframe {
border-collapse: collapse;
border: none;
}
table.dataframe tr {
border: none;
}
table.dataframe td, table.dataframe th {
margin: 0;
border: 1px solid white;
padding-left: 0.25em;
padding-right: 0.25em;
}
table.dataframe th:not(:empty) {
background-color: #fec;
text-align: left;
font-weight: normal;
}
table.dataframe tr:nth-child(2) th:empty {
border-left: none;
border-right: 1px dashed #888;
}
table.dataframe td {
border: 2px solid #ccf;
background-color: #f4f4ff;
}
table.dataframe thead th:first-child {
display: none;
}
table.dataframe tbody th {
display: none;
}
"""
from IPython.core.display import HTML
HTML('<style>{}</style>'.format(CSS))
Let's display the data we just prepared.
In [ ]:
from IPython.display import display
print("Gas data:")
display(gaspd)
print("Oil data:")
display(oilpd)
In [ ]:
from docplex.mp.model import Model
mdl = Model(name="oil_blending")
For each combination of oil and gas, we have to decide the quantity of oil to use to produce a gasoline. A decision variable will be needed to represent that amount. A matrix of continuous variables, indexed by the set of oils and the set of gasolines needs to be created.
In [ ]:
blends = mdl.continuous_var_matrix(keys1=nb_oils, keys2=nb_gas, lb=0)
We also have to decide how much should be spent in advertising for each time of gasoline. To do so, we will create a list of continuous variables, indexed by the gasolines.
In [ ]:
adverts = mdl.continuous_var_list(nb_gas, lb=0)
The business constraints are the following:
In [ ]:
# gasoline demand is numpy array field #0
mdl.add_constraints(mdl.sum(blends[o, g] for o in range(nb_oils)) == gas_data[g][0] + advert_return * adverts[g]
for g in range(nb_gas))
mdl.print_information()
In [ ]:
mdl.add_constraints(mdl.sum(blends[o,g] for g in range_gas) <= oil_data[o][0]
for o in range_oil)
mdl.print_information()
In [ ]:
# minimum octane level
# octane is numpy array field #2
mdl.add_constraints(mdl.sum(blends[o,g]*(oil_data[o][2] - gas_data[g][2]) for o in range_oil) >= 0
for g in range_gas)
# maximum lead level
# lead level is numpy array field #3
mdl.add_constraints(mdl.sum(blends[o,g]*(oil_data[o][3] - gas_data[g][3]) for o in range_oil) <= 0
for g in range_gas)
mdl.print_information()
In [ ]:
# -- maximum global production
mdl.add_constraint(mdl.sum(blends) <= production_max)
mdl.print_information()
In [ ]:
# KPIs
total_advert_cost = mdl.sum(adverts)
mdl.add_kpi(total_advert_cost, "Total advertising cost")
total_oil_cost = mdl.sum(blends[o,g] * oil_data[o][1] for o in range_oil for g in range_gas)
mdl.add_kpi(total_oil_cost, "Total Oil cost")
total_production_cost = production_cost * mdl.sum(blends)
mdl.add_kpi(total_production_cost, "Total production cost")
total_revenue = mdl.sum(blends[o,g] * gas_data[g][1] for g in range(nb_gas) for o in range(nb_oils))
mdl.add_kpi(total_revenue, "Total revenue")
# finally the objective
mdl.maximize(total_revenue - total_oil_cost - total_production_cost - total_advert_cost)
If you're using a Community Edition of CPLEX runtimes, depending on the size of the problem, the solve stage may fail and will need a paying subscription or product installation.
We display the objective and KPI values after the solve by calling the method report() on the model.
In [ ]:
assert mdl.solve(), "Solve failed"
mdl.report()
In [ ]:
all_kpis = [(kp.name, kp.compute()) for kp in mdl.iter_kpis()]
kpis_bd = pd.DataFrame(all_kpis, columns=['kpi', 'value'])
In [ ]:
blend_values = [ [ blends[o,g].solution_value for g in range_gas] for o in range_oil]
total_gas_prods = [sum(blend_values[o][g] for o in range_oil) for g in range_gas]
prods = list(zip(gas_names, total_gas_prods))
prods_bd = pd.DataFrame(prods)
Let's display some KPIs in pie charts using the Python package matplotlib.
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
def display_pie(pie_values, pie_labels, colors=None,title=''):
plt.axis("equal")
plt.pie(pie_values, labels=pie_labels, colors=colors, autopct="%1.1f%%")
plt.title(title)
plt.show()
display_pie( [kpnv[1] for kpnv in all_kpis], [kpnv[0] for kpnv in all_kpis],title='KPIs: Revenue - Oil Cost - Production Cost')
In [ ]:
display_pie(total_gas_prods, gas_names, colors=["green", "goldenrod", "lightGreen"],title='Gasoline Total Production')
We see that the most produced gasoline type is by far regular.
Now, let's plot the breakdown of oil blend quantities per gasoline type. We are using a multiple bar chart diagram, displaying all blend values for each couple of oil and gasoline type.
In [ ]:
sblends = [(gas_names[n], oil_names[o], round(blends[o,n].solution_value)) for n in range_gas for o in range_oil]
blends_bd = pd.DataFrame(sblends)
In [ ]:
f, barplot = plt.subplots(1, figsize=(16,5))
bar_width = 0.1
offset = 0.12
rho = 0.7
# position of left-bar boundaries
bar_l = [o for o in range_oil]
mbar_w = 3*bar_width+2*max(0, offset-bar_width)
tick_pos = [b*rho + mbar_w/2.0 for b in bar_l]
colors = ['olive', 'lightgreen', 'cadetblue']
for i in range_oil:
barplot.bar([b*rho + (i*offset) for b in bar_l],
blend_values[i], width=bar_width, color=colors[i], label=oil_names[i])
plt.xticks(tick_pos, gas_names)
barplot.set_xlabel("gasolines")
barplot.set_ylabel("blend")
plt.legend(loc="upper right")
plt.title('Blend Repartition\n')
# Set a buffer around the edge
plt.xlim([0, max(tick_pos)+mbar_w +0.5])
plt.show()
Notice the missing bar for (crude2, diesel) which is expected since blend[crude2, diesel] is zero in the solution.
We can check the solution value of blends for crude2 and diesel, remembering that crude2 has offset 1 and diesel has offset 2. Note how the decision variable is automatically converted to a float here. This would raise an exception if called before submitting a solve, as no solution value would be present.
In [ ]:
print("* value of blend[crude2, diesel] is %g" % blends[1,2])
Copyright © 2017-2019 IBM. IPLA licensed Sample Materials.
In [ ]: