0-1 Knapsack Problem using Dynamic Programming

Formal Description

Given two n-tuples of positive numbers $[v_1, v_2, ..., v_n]$ and $[w_1, w_2, ..., w_n]$, find the $\sum_i (v_i)$ that maximizes $\sum_i (w_i)$ subject to the constraint $\sum_i (w_i) \leq W$.

This problem is called 0-1 Knapsack problem because the i-th item is either in the optimal subset or it's not.

If fractions of the items are allowed, then there's a simple Greedy solution:

Calculate $u_i = \frac {v_i} {w_i}$ for each $i$.
Sort the items by the decreasing order of $u_i$.
Take each item $u_i$ until W runs out. If we can't take the entire item, take the fraction $f$ such that addition of $f*w_i$ keeps the sum of weights in the optimal subset <= W.

The brute force approach requires generating all possible combinations ($2^n-1$) which is exponential in n.

Optimal Substructure Step

The optimal substructure property of the Dynamic Programming solution for the problem can be expressed thus:

Let's say at any given point, $S$ is a potential optimal subset excluding item $i$ and $V(S, w)$ is the maximum value for a weight contraint $w$. Then, the maximum value of the potential optimal subset including item $i$ is:

$$ V(S \cup \{item_i\}, w) = \begin{cases} V(S, w), \text{if} \space w_i \gt w \\ max(V(S, w), V(S \cup \{item_i\}, w - w_i)), \text{otherwise} \end{cases} $$

To implement this algorithm in code, we'll use a 2D array (pandas DataFrame) with one row per item and W+1 columns. Each cell in $row_i, column_j$ represents the maximum value using the first $i$ items subject to weight constraint $j$.



In [1]:

    
import pandas as pd
import numpy as np



In [2]:

    
# Solve the problem for this toy example
weights = [5, 4, 6, 3]
values = [10, 50, 30, 50]
W = 10

# Handy data structure to refer to values and weights easily
items_df = pd.DataFrame(
    index=range(1, len(weights)+1), 
    data=list(zip(weights, values)),
    columns=['weights', 'values']
)

items_df



In [3]:

    
# 2D array to store the memoized optimal subproblem solutions
optimal_weights = pd.DataFrame(index=range(len(weights)+1), data=0, columns=range(W+1))
optimal_weights

Now we're ready to implement the DP step.



In [4]:

    
for i in optimal_weights.index[1:]:
    curr_item_val, cur_item_wt = items_df.loc[i, ['values', 'weights']]
    
    for j in range(1, W+1):
        if j < cur_item_wt:
            # if the target weight is < weight of the current weight, 
            # then current item cannot be included in the optimal subset,
            # so the optimal value remains unchanged.
            cur_optimal_value = optimal_weights.loc[i-1, j]
        else:
            # The target weight is the maximum of two cases
            # with and without the current item             
            cur_optimal_value = max(
                optimal_weights.loc[i-1, j], 
                optimal_weights.loc[i-1, j-cur_item_wt] + curr_item_val)           
        
        optimal_weights.loc[i, j] = cur_optimal_value



In [5]:

    
optimal_weights

The final result is the last element of optimal_weights.



In [6]:

    
optimal_weights.loc[len(items_df), W]









    Out[6]:





100

	0	1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	0

	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0
1	0	0	10	10	10	10	10	10
2	0	50	50	50	50	50	60	60
3	0	50	50	50	50	50	60	80
4	50	50	50	50	100	100	100	100

	0	1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	0

	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0
1	0	0	10	10	10	10	10	10
2	0	50	50	50	50	50	60	60
3	0	50	50	50	50	50	60	80
4	50	50	50	50	100	100	100	100

	0	1	2	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	0

	3	4	5	6	7	8	9	10
0	0	0	0	0	0	0	0	0
1	0	0	10	10	10	10	10	10
2	0	50	50	50	50	50	60	60
3	0	50	50	50	50	50	60	80
4	50	50	50	50	100	100	100	100