This tutorial includes everything you need to set up IBM Decision Optimization CPLEX Modeling for Python (DOcplex), build a Mathematical Programming model, and solve it with IBM ILOG CPLEX Optimizer, either locally or in the cloud.
When you finish this tutorial, you'll have a foundational knowledge of Prescriptive Analytics.
This notebook is part of Prescriptive Analytics for Python.
It requires either an installation of CPLEX Optimizers or it can be run on IBM Watson Studio Cloud (Sign up for a free IBM Cloud account and you can start using Watson Studio Cloud right away).
This mining operations optimization problem is an implementation of Problem 7 from "Model Building in Mathematical Programming" by H.P. Williams. The operational decisions that need to be made are which mines should be operated each year and how much each mine should produce.
Each year, the total revenue is equal to the total quantity extracted multiplied by the blend price. The time series of revenues is aggregated into one expected revenue by applying the discount rate; in other words, a revenue of \$1000 next year is counted as \$900 actualized, \$810 if the revenue is expected in two years, etc.
A mine that stays open must pay royalties (see the column royalties in the DataFrame). Again, royalties from different years are actualized using the discount rate.
The business objective is to maximize the net actualized profit, that is the difference between the total actualized revenue and total actualized royalties.
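As a small illustration of this actualization (assuming the 10% discount rate used later in this notebook; the revenue figures below are made up for the example), the present value of a stream of yearly revenues can be computed as:

```python
# Illustrative sketch of revenue actualization with a 10% discount rate.
discount_rate = 0.10
revenues = [1000, 1000, 1000]  # revenue expected in years 0, 1, 2

# Each year's revenue is weighted by (1 - discount_rate) ** year
actualized = sum(r * (1 - discount_rate) ** y for y, r in enumerate(revenues))
print(actualized)  # approximately 2710 (1000 + 900 + 810)
```

The same per-year weights appear later in the notebook as the `s_discounts` Series.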
Prescriptive analytics (decision optimization) technology recommends actions that are based on desired outcomes. It takes into account specific scenarios, resources, and knowledge of past and current events. With this insight, your organization can make better decisions and have greater control of business outcomes.
Prescriptive analytics is the next step on the path to insight-based actions. It creates value through synergy with predictive analytics, which analyzes data to predict future outcomes.
Prescriptive analytics takes that insight to the next level by suggesting the optimal way to handle that future situation. Organizations that can act fast in dynamic conditions and make superior decisions in uncertain environments gain a strong competitive advantage.
With prescriptive analytics, you can:
In [ ]:
REQUIRED_MINIMUM_PANDAS_VERSION = '0.17.1'
try:
    import pandas as pd
    # Compare version numbers component-wise rather than as plain strings
    assert tuple(int(v) for v in pd.__version__.split('.')[:2]) >= \
           tuple(int(v) for v in REQUIRED_MINIMUM_PANDAS_VERSION.split('.')[:2])
except (ImportError, AssertionError):
    raise Exception("Version %s or above of pandas is required to run this notebook"
                    % REQUIRED_MINIMUM_PANDAS_VERSION)
In [ ]:
try:
    import docplex.mp
except ImportError:
    raise Exception('Please install docplex. See https://pypi.org/project/docplex/')
If CPLEX is not installed, install CPLEX Community edition.
In [ ]:
try:
    import cplex
except ImportError:
    raise Exception('Please install CPLEX. See https://pypi.org/project/cplex/')
In [ ]:
# If needed, install the module pandas prior to executing this cell
import pandas as pd
from pandas import DataFrame, Series
In [ ]:
df_mines = DataFrame({"royalties": [ 5 , 4, 4, 5 ],
"ore_quality": [ 1.0, 0.7, 1.5, 0.5],
"max_extract": [ 2 , 2.5, 1.3, 3 ]})
nb_mines = len(df_mines)
df_mines.index.name='range_mines'
df_mines
In [ ]:
blend_qualities = Series([0.9, 0.8, 1.2, 0.6, 1.0])
nb_years = len(blend_qualities)
print("* Planning mining operations for: {} years".format(nb_years))
blend_qualities.describe()
In [ ]:
# global data
blend_price = 10
max_worked_mines = 3 # work no more than 3 mines each year
discount_rate = 0.10 # 10% interest rate each year
In [ ]:
from docplex.mp.environment import Environment
env = Environment()
env.print_information()
In [ ]:
from docplex.mp.model import Model
mm = Model("mining_pandas")
What are the decisions we need to make?
What quantity is extracted from each mine, each year? (a non-negative number)
We need to define some decision variables and add constraints to our model related to these decisions.
In [ ]:
# auxiliary data: ranges
range_mines = range(nb_mines)
range_years = range(nb_years)
# binary decisions: work the mine or not
work_vars = mm.binary_var_matrix(keys1=range_mines, keys2=range_years, name='work')
# open the mine or not
open_vars = mm.binary_var_matrix(range_mines, range_years, name='open')
# quantity to extract
ore_vars = mm.continuous_var_matrix(range_mines, range_years, name='ore')
mm.print_information()
In order to take advantage of the pandas operations to create the optimization model, decision variables are organized in a DataFrame which is automatically indexed by 'range_mines' and 'range_years' (that is, the same keys as the dictionary created by the binary_var_matrix() method).
In [ ]:
# Organize all decision variables in a DataFrame indexed by 'range_mines' and 'range_years'
df_decision_vars = DataFrame({'work': work_vars, 'open': open_vars, 'ore': ore_vars})
# Set index names
df_decision_vars.index.names=['range_mines', 'range_years']
# Display rows of 'df_decision_vars' DataFrame for first mine
df_decision_vars[:nb_years]
Now, let's iterate over rows of the DataFrame "df_decision_vars" and enforce the desired constraints.
The pandas method itertuples() returns a named tuple for each row of a DataFrame. This method is efficient and convenient for iterating over all rows.
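As a self-contained sketch (using a toy DataFrame, not the notebook's data), itertuples() yields one named tuple per row, with the index value first and one attribute per column:

```python
import pandas as pd

# Toy DataFrame to illustrate itertuples(); the column names mirror
# this notebook's decision-variable columns but the values are made up.
toy = pd.DataFrame({'work': [0, 1], 'open': [1, 1]})

# Each row is a named tuple: (Index, work, open)
rows = list(toy.itertuples())
first = rows[0]
print(first.Index, first.work, first.open)  # 0 0 1
```

This is why the constraint below can refer to `t.work` and `t.open` directly.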
In [ ]:
mm.add_constraints(t.work <= t.open for t in df_decision_vars.itertuples())
mm.print_information()
These constraints are a little more complex: we state that the sequence of open_vars[m,y] for a given mine m is non-increasing over the years. In other words, once some open_vars[m,y] is zero, all values for subsequent years are also zero.
Let's use the pandas groupby operation to collect all "open" decision variables for each mine in separate pandas Series.
Then, we iterate over the mines and invoke the aggregate() method, passing the postOpenCloseConstraint() function as the argument.
The pandas aggregate() method invokes postOpenCloseConstraint() for each mine, passing the associated Series of "open" decision variables as argument.
The postOpenCloseConstraint() function posts a set of constraints on the sequence of "open" decision variables to enforce that a mine cannot re-open.
In [ ]:
# Once closed, a mine stays closed
def postOpenCloseConstraint(open_vars):
    mm.add_constraints(open_next <= open_curr
                       for (open_next, open_curr) in zip(open_vars[1:], open_vars))
    # Optionally: return a string to display information regarding the aggregate operation in the Output cell
    return "posted {0} open/close constraints".format(len(open_vars) - 1)
# Constraints on sequences of decision variables are posted for each mine,
# using pandas' "groupby" operation.
df_decision_vars.open.groupby(level='range_mines').aggregate(postOpenCloseConstraint)
This time, we use the pandas groupby operation to collect all "work" decision variables for each year in separate pandas Series. Each Series contains the "work" decision variables for all mines.
Then, the maximum number of worked mines constraint is enforced by requiring that the sum of all the terms of each Series is less than or equal to the maximum number of worked mines.
The aggregate() method is used to post this constraint for each year.
In [ ]:
# Maximum number of worked mines each year
# Note that Model.sum() accepts a pandas Series of variables.
df_decision_vars.work.groupby(level='range_years').aggregate(
    lambda works: mm.add_constraint(mm.sum(works) <= max_worked_mines))
To illustrate the pandas join operation, let's build a DataFrame that joins the "df_decision_vars" DataFrame and the "df_mines.max_extract" Series such that each row contains the information to enforce the quantity extracted limit constraint.
The default behaviour of the pandas join operation is to align on the index of the left DataFrame and to append the columns of the right Series or DataFrame whose index values match.
Here is the result of this operation in our case:
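As a minimal, self-contained sketch of this default join behaviour (with toy data, not the notebook's mining data):

```python
import pandas as pd

# Toy left DataFrame and right Series sharing index values 0 and 1
left = pd.DataFrame({'a': [10, 20, 30]}, index=[0, 1, 2])
right = pd.Series([100, 200], index=[0, 1], name='b')

# join() aligns on the left index; rows with no match on the right get NaN
joined = left.join(right)
print(joined)
```

In this notebook the indices always match, so no NaN values appear in the joined DataFrame.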
In [ ]:
# Display rows of 'df_decision_vars' joined with 'df_mines.max_extract' Series for first two mines
df_decision_vars.join(df_mines.max_extract)[:(nb_years * 2)]
Now, the constraint to limit quantity extracted is easily created by iterating over all rows of the joined DataFrames:
In [ ]:
# quantity extracted is limited
mm.add_constraints(t.ore <= t.max_extract * t.work
                   for t in df_decision_vars.join(df_mines.max_extract).itertuples())
mm.print_information()
Again, we use the pandas groupby operation, this time to collect all "ore" decision variables for each year in separate pandas Series.
The "blend" variable for a given year is the sum of "ore" decision variables for the corresponding Series.
In [ ]:
# blend variables
blend_vars = mm.continuous_var_list(nb_years, name='blend')
# define blend variables as sum of extracted quantities
mm.add_constraints(mm.sum(ores.values) == blend_vars[year]
                   for year, ores in df_decision_vars.ore.groupby(level='range_years'))
mm.print_information()
In [ ]:
# Quality requirement on blended ore
mm.add_constraints(mm.sum(ores.values * df_mines.ore_quality) >= blend_qualities[year] * blend_vars[year]
                   for year, ores in df_decision_vars.ore.groupby(level='range_years'))
mm.print_information()
In [ ]:
actualization = 1.0 - discount_rate
assert actualization > 0
assert actualization <= 1
#
s_discounts = Series((actualization ** y for y in range_years), index=range_years, name='discounts')
s_discounts.index.name='range_years'
# e.g. [1, 0.9, 0.81, ... 0.9**y...]
print(s_discounts)
In [ ]:
expected_revenue = blend_price * mm.dot(blend_vars, s_discounts)
mm.add_kpi(expected_revenue, "Total Actualized Revenue");
This time, we use the pandas join operation twice to build a DataFrame that joins the "df_decision_vars" DataFrame with the "df_mines.royalties" and "s_discounts" Series such that each row contains the relevant information to calculate its contribution to the total actualized royalty cost.
The join with the "df_mines.royalties" Series is performed by looking at the common "range_mines" index, while the join with the "s_discounts" Series is performed by looking at the common "range_years" index.
In [ ]:
df_royalties_data = df_decision_vars.join(df_mines.royalties).join(s_discounts)
# add a new column to compute discounted royalties using pandas multiplication on columns
df_royalties_data['disc_royalties'] = df_royalties_data['royalties'] * df_royalties_data['discounts']
df_royalties_data[:nb_years]
The total royalty cost is now calculated by multiplying the "open" column by the "disc_royalties" column and summing over all rows.
Using pandas constructs, this can be written in a very compact way as follows:
In [ ]:
total_royalties = mm.dot(df_royalties_data.open, df_royalties_data.disc_royalties)
mm.add_kpi(total_royalties, "Total Actualized Royalties");
In [ ]:
mm.maximize(expected_revenue - total_royalties)
In [ ]:
mm.print_information()
# turn this flag on to see the solve log
print_cplex_log = False
# start the solve
s1 = mm.solve(log_output=print_cplex_log)
assert s1, "!!! Solve of the model fails"
mm.report()
To analyze the results, we again leverage pandas, by storing the solution value of the ore variables in a new DataFrame.
Note that we extract the solution values of the ore variables with the solution's get_values() method, which converts all the variables to their values in one operation. Of course, this requires that the model has been solved successfully.
For convenience, we want to organize the ore solution values in a pivot table with years as row index and mines as columns. The pandas unstack operation does this for us.
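As a brief sketch of unstack() on a toy MultiIndex Series (the values are hypothetical, but the index level names mirror this notebook's):

```python
import pandas as pd

# Toy Series indexed by (mine, year), mimicking the decision-variable index
idx = pd.MultiIndex.from_product([range(2), range(3)],
                                 names=['range_mines', 'range_years'])
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# unstack() moves the 'range_mines' level into columns:
# years become the row index, mines become the columns
pivot = s.unstack(level='range_mines')
print(pivot.shape)  # (3, 2)
```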
In [ ]:
mine_labels = [("mine%d" % (m+1)) for m in range_mines]
ylabels = [("y%d" % (y+1)) for y in range_years]
# Add a column to DataFrame containing 'ore' decision variables value
# Note that we extract the solution values of ore variables in one operation with get_values().
df_decision_vars['ore_values'] = s1.get_values(df_decision_vars.ore)
# Create a pivot table by (years, mines), using pandas' "unstack" method to transform the 'range_mines' row index
# into columns
df_res = df_decision_vars.ore_values.unstack(level='range_mines')
# Set user-friendly labels for column and row indices
df_res.columns = mine_labels
df_res.index = ylabels
df_res
In [ ]:
# import matplotlib library for visualization
import matplotlib.pyplot as plt
# matplotlib graphics are printed -inside- the notebook
%matplotlib inline
df_res.plot(kind="bar", figsize=(10,4.5))
plt.xlabel("year")
plt.ylabel("ore")
plt.title('ore values per year');
In [ ]:
# a list of (mine, year) tuples on which work is not possible.
forced_stops = [(1, 2), (0, 1), (1, 0), (3, 2), (2, 3), (3, 4)]
mm.add_constraints(work_vars[stop_m, stop_y] == 0
                   for stop_m, stop_y in forced_stops)
mm.print_information()
The previous solution does not satisfy these constraints; for example, the tuple (0, 1) means mine 1 must not be worked in year 2, but it was in fact worked in the solution above.
To help CPLEX find a feasible solution, we will build a heuristic feasible solution and pass it to CPLEX.
In [ ]:
# build a new, empty solution
full_mining = mm.new_solution()
# set all mines to be worked every year, except for the forced stops
for m in range_mines:
    for y in range_years:
        if (m, y) not in forced_stops:
            full_mining.add_var_value(work_vars[m, y], 1)
#full_mining.display()
Then we pass this solution to the model as a MIP start solution and re-solve, this time with CPLEX logging turned on.
In [ ]:
mm.add_mip_start(full_mining)
s2 = mm.solve(log_output=True) # turns on CPLEX logging
assert s2, "solve failed"
mm.report()
You can see in the CPLEX log above that our MIP start solution provided a good start for CPLEX, supplying an initial solution with objective 157.9355.
Now we can again visualize the results with pandas and matplotlib.
In [ ]:
# Add a column to DataFrame containing 'ore' decision variables value and create a pivot table by (years, mines)
df_decision_vars['ore_values2'] = s2.get_values(df_decision_vars.ore)
df_res2 = df_decision_vars.ore_values2.unstack(level='range_mines')
df_res2.columns = mine_labels
df_res2.index = ylabels
df_res2.plot(kind="bar", figsize=(10,4.5))
plt.xlabel("year")
plt.ylabel("ore")
plt.title('ore values per year - what-if scenario');
As expected, mine1 is not worked in year 2: there is no blue bar at y2.
Copyright © 2017-2019 IBM. IPLA licensed Sample Materials.