In [1]:

    
from pulp import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import HTML

%matplotlib inline



In [2]:

    
#TODO hard code the vitamin list for easier reproduction
import re

with open('vitamins.txt','r') as f:
    data = f.read()
    
matcher = re.compile(r'(.+)\s[0-9\.]+\s(mg|mcg|IU)')
ingredients = [ m[0] for m in matcher.findall(data)]

np.random.seed(42)
df = pd.DataFrame( list(zip(ingredients,np.random.rand(len(ingredients)))), columns = ['vitamin','cost'])
df.cost = df.cost.round(2)

df.to_csv('vitamin_costs.csv', index=False)

A new high performance and optimally cheap multivitamin. Saving the world one mixed integer program at a time

You have been tasked with developing a new superior multivitamin. You have been given free reign to select the ingredients and their relative amounts in the vitamin but you have been asked to keep the cost of the raw materials as low as possible.

Further complicating things, you must abide by some restrictions in the formula.

The formula should not have more than 20% of any one vitamin
The formula must have at least 10% Iron, Zinc, or Magnesium
The formula must have at least 20% Vitamin A, Vitamin C, or Vitamin D
Each vitamin that is used must account for at least 5% of the total
The formula may contain as few as 5 vitamins but no more than 10 vitamins
If the formula contains Magnesium it must also contain Calcium and Zinc
The formula must have one of the B vitamins either B6 or B12 but not both

The possible Ingredients and their per mg cost

You might see a simple greedy strategy to solve this, but the vitamin B constrains and the Magnesium/Zinc/Calcium constraints make things a bit more complicated. Instead of trial and error we can try and create this vitamin by writing a fairly simple mixed integer linear program. First we need to come up with an expression that captures our ultimate goal, in this case, to minimize the cost of raw materials.

We can calculate the total cost of the formula by adding up the individual cost of each vitamin in it.

$$\text{Total Cost} = \text{ sum over all the vitamins (cost of vitamin) * (percent of vitamin) } $$

First lets load the data we have and organize it so we can easily grab the cost of particular vitamin



In [3]:

    
df = pd.read_csv('vitamin_costs.csv')
vitamins = df.vitamin.values
vitamin_cost = df.set_index('vitamin').to_dict()['cost']
df.head()









    Out[3]:







  
    
      
      vitamin
      cost
    
  
  
    
      0
      Vitamin A
      0.37
    
    
      1
      Vitamin C
      0.95
    
    
      2
      Vitamin D
      0.73
    
    
      3
      Vitamin E
      0.60
    
    
      4
      Vitamin K
      0.16

Lets model the percent each vitamin is included as by a variable u. so if Vitamin C is included at 15% then u[Vitamin C] = .15. We can use pulp.LpVariable.dicts to return a dictionary of variables that are indexed by the vitamin names. This will make referring to variables later on very easy.



In [4]:

    
u = LpVariable.dicts('percent', vitamins, 0, 1, LpContinuous)

The next thing we need to do is create an instance of the pulp.LpProblem class. This creates a problem variable that will hold the cost and constraints and tells PuLP that we want to minimize our cost. We pass in a name and either LpMinimize or LpMaximize for either minimizing our cost or maximizing it.



In [5]:

    
prob = LpProblem('super-awesome-vitamin', LpMinimize)

Now we can define our cost. We will use pulp.lpSum instead of the regular python sum function for efficiency. In this problem it won't make a difference so feel free to experiment. This returns a pulp.LpAffineExpression which we will discuss in detail later on. To add our cost expression to the problem we literally just add it to the problem.



In [6]:

    
cost = lpSum([ u[v]*vitamin_cost[v] for v in vitamins])
prob += cost

We can now start adding our constraints. The first few are very straightforward. Like with the cost, each constraint just gets added to the problem variable. The key difference is that the constraint will be an inequality.

1: The formula should not have more than 20% of any one vitamin



In [7]:

    
#no more than 20% of any one vitamin
for v in vitamins:
    prob += u[v] <= .2

2: The formula must have at least 10% Iron, Zinc, or Magnesium



In [8]:

    
#The formula must have at least 10% Iron, Zinc, or Magnesium
prob += u['Iron'] + u['Zinc'] + u['Magnesium'] >= .1

3: The formula must have at least 20% Vitamin A, Vitamin C, or Vitamin D



In [9]:

    
#The formula must have at least 20% Vitamin A, Vitamin C, or Vitamin D
prob += u['Vitamin A'] + u['Vitamin C'] + u['Vitamin D'] >= .2

These were fairly straightforward, the code for the constraints and the description of the constraint are almost identical. However the next constraint is a bit stranger.

"4. Each vitamin that is used must account for at least 5% of the total"

We basically need the u variables to be at least 5% or 0%. To model this we need to introduce some new variables that will track in simple yes/no manner if the vitamin is included in the final formula. Once we have these variables we will link them with the u variables somehow and satisfy the rest of the constraints. For now lets just look at the code.



In [10]:

    
#binary variable that captures if this vitamin will be used in the formula
b = LpVariable.dicts('use',vitamins,0,1, LpBinary)

4: Each vitamin that is used must account for at least 5% of the total



In [11]:

    
#Each vitamin that is used must account for at least 5% of the total
for v in vitamins:
    #if we don't use this vitamin then the percent must be zero
    prob += u[v] <= b[v] 
    #likewise if we do use this vitamin, then the percent must not be zero
    prob += u[v] >= .05 -100*(1-b[v]) # > .05 or > .05 -100

That looks a bit confusing but its actually a very common modeling technique we will explore in detail later on. These new b variables make the rest of the constraints really easy to model.

5: The formula may contain as few as 5 vitamins but no more than 10 vitamins



In [12]:

    
#The formula may contain as few as 5 vitamins but no more than 10 vitamins
prob += lpSum([ b[v] for v in vitamins]) >= 5
prob += lpSum([ b[v] for v in vitamins]) <= 10

6: If the formula contains Magnesium it must also contain Calcium and Zinc



In [13]:

    
#If the formula contains Magnesium it must also contain Calcium and Zinc
prob += 2*b['Magnesium'] <= b['Calcium'] + b['Zinc']

7: The formula must have one of the B vitamins either B6 or B12 but not both



In [14]:

    
#The formula must have one of the B vitamins either B6 or B12 but not both
prob += b['Vitamin B12'] + b['Vitamin B6'] == 1

Finally, while it wasn't stated as a constraint, our u variables are supposed to be percents, so they must add up to 100%



In [15]:

    
#the percentages must add up to 100
prob += lpSum([ u[v] for v in vitamins]) == 1.0

We can now solve the problem and relax knowing we have made the best possible multivitamin (with our highly customized and formalized definition of "best")



In [16]:

    
LpStatus[prob.solve()]









    Out[16]:





'Optimal'



In [17]:

    
print('total cost: $%.2f'%prob.objective.value())









    



total cost: $0.15



In [18]:

    
for v in vitamins:
    if value(u[v]) >0:
        print( '%s %.0f%% at unit cost of: $%.2f' %(v, 100*value(u[v]), vitamin_cost[v]))









    



Vitamin A 20% at unit cost of: $0.37
Thiamin (B1) 10% at unit cost of: $0.16
Riboflavin (B2) 20% at unit cost of: $0.06
Vitamin B12 20% at unit cost of: $0.02
Iron 10% at unit cost of: $0.18
Lycopene 20% at unit cost of: $0.14

	vitamin	cost
0	Vitamin A	0.37
1	Vitamin C	0.95
2	Vitamin D	0.73
3	Vitamin E	0.60
4	Vitamin K	0.16