0-1 Knapsack Problem using Dynamic Programming

Formal Description

Given two n-tuples of positive numbers $[v_1, v_2, ..., v_n]$ and $[w_1, w_2, ..., w_n]$, find the $\sum_i (v_i)$ that maximizes $\sum_i (w_i)$ subject to the constraint $\sum_i (w_i) \leq W$.

This problem is called 0-1 Knapsack problem because the i-th item is either in the optimal subset or it's not.

If fractions of the items are allowed, then there's a simple Greedy solution:

  • Calculate $u_i = \frac {v_i} {w_i}$ for each $i$.
  • Sort the items by the decreasing order of $u_i$.
  • Take each item $u_i$ until W runs out. If we can't take the entire item, take the fraction $f$ such that addition of $f*w_i$ keeps the sum of weights in the optimal subset <= W.

The brute force approach requires generating all possible combinations ($2^n-1$) which is exponential in n.

Optimal Substructure Step

The optimal substructure property of the Dynamic Programming solution for the problem can be expressed thus:

Let's say at any given point, $S$ is a potential optimal subset excluding item $i$ and $V(S, w)$ is the maximum value for a weight contraint $w$. Then, the maximum value of the potential optimal subset including item $i$ is:

$$ V(S \cup \{item_i\}, w) = \begin{cases} V(S, w), \text{if} \space w_i \gt w \\ max(V(S, w), V(S \cup \{item_i\}, w - w_i)), \text{otherwise} \end{cases} $$

To implement this algorithm in code, we'll use a 2D array (pandas DataFrame) with one row per item and W+1 columns. Each cell in $row_i, column_j$ represents the maximum value using the first $i$ items subject to weight constraint $j$.


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Solve the problem for this toy example
weights = [5, 4, 6, 3]
values = [10, 50, 30, 50]
W = 10

# Handy data structure to refer to values and weights easily
items_df = pd.DataFrame(
    index=range(1, len(weights)+1), 
    data=list(zip(weights, values)),
    columns=['weights', 'values']
)

items_df


Out[2]:
weights values
1 5 10
2 4 50
3 6 30
4 3 50

In [3]:
# 2D array to store the memoized optimal subproblem solutions
optimal_weights = pd.DataFrame(index=range(len(weights)+1), data=0, columns=range(W+1))
optimal_weights


Out[3]:
0 1 2 3 4 5 6 7 8 9 10
0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0

Now we're ready to implement the DP step.


In [4]:
for i in optimal_weights.index[1:]:
    curr_item_val, cur_item_wt = items_df.loc[i, ['values', 'weights']]
    
    for j in range(1, W+1):
        if j < cur_item_wt:
            # if the target weight is < weight of the current weight, 
            # then current item cannot be included in the optimal subset,
            # so the optimal value remains unchanged.
            cur_optimal_value = optimal_weights.loc[i-1, j]
        else:
            # The target weight is the maximum of two cases
            # with and without the current item             
            cur_optimal_value = max(
                optimal_weights.loc[i-1, j], 
                optimal_weights.loc[i-1, j-cur_item_wt] + curr_item_val)           
        
        optimal_weights.loc[i, j] = cur_optimal_value

In [5]:
optimal_weights


Out[5]:
0 1 2 3 4 5 6 7 8 9 10
0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 10 10 10 10 10 10
2 0 0 0 0 50 50 50 50 50 60 60
3 0 0 0 0 50 50 50 50 50 60 80
4 0 0 0 50 50 50 50 100 100 100 100

The final result is the last element of optimal_weights.


In [6]:
optimal_weights.loc[len(items_df), W]


Out[6]:
100