# Formal Description

Given two n-tuples of positive numbers $[v_1, v_2, ..., v_n]$ and $[w_1, w_2, ..., w_n]$, find the $\sum_i (v_i)$ that maximizes $\sum_i (w_i)$ subject to the constraint $\sum_i (w_i) \leq W$.

This problem is called 0-1 Knapsack problem because the i-th item is either in the optimal subset or it's not.

If fractions of the items are allowed, then there's a simple Greedy solution:

• Calculate $u_i = \frac {v_i} {w_i}$ for each $i$.
• Sort the items by the decreasing order of $u_i$.
• Take each item $u_i$ until W runs out. If we can't take the entire item, take the fraction $f$ such that addition of $f*w_i$ keeps the sum of weights in the optimal subset <= W.

The brute force approach requires generating all possible combinations ($2^n-1$) which is exponential in n.

# Optimal Substructure Step

The optimal substructure property of the Dynamic Programming solution for the problem can be expressed thus:

Let's say at any given point, $S$ is a potential optimal subset excluding item $i$ and $V(S, w)$ is the maximum value for a weight contraint $w$. Then, the maximum value of the potential optimal subset including item $i$ is:

$$V(S \cup \{item_i\}, w) = \begin{cases} V(S, w), \text{if} \space w_i \gt w \\ max(V(S, w), V(S \cup \{item_i\}, w - w_i)), \text{otherwise} \end{cases}$$

To implement this algorithm in code, we'll use a 2D array (pandas DataFrame) with one row per item and W+1 columns. Each cell in $row_i, column_j$ represents the maximum value using the first $i$ items subject to weight constraint $j$.



In [1]:

import pandas as pd
import numpy as np




In [2]:

# Solve the problem for this toy example
weights = [5, 4, 6, 3]
values = [10, 50, 30, 50]
W = 10

# Handy data structure to refer to values and weights easily
items_df = pd.DataFrame(
index=range(1, len(weights)+1),
data=list(zip(weights, values)),
columns=['weights', 'values']
)

items_df




Out[2]:

.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

weights
values

1
5
10

2
4
50

3
6
30

4
3
50




In [3]:

# 2D array to store the memoized optimal subproblem solutions
optimal_weights = pd.DataFrame(index=range(len(weights)+1), data=0, columns=range(W+1))
optimal_weights




Out[3]:

.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

0
1
2
3
4
5
6
7
8
9
10

0
0
0
0
0
0
0
0
0
0
0
0

1
0
0
0
0
0
0
0
0
0
0
0

2
0
0
0
0
0
0
0
0
0
0
0

3
0
0
0
0
0
0
0
0
0
0
0

4
0
0
0
0
0
0
0
0
0
0
0



Now we're ready to implement the DP step.



In [4]:

for i in optimal_weights.index[1:]:
curr_item_val, cur_item_wt = items_df.loc[i, ['values', 'weights']]

for j in range(1, W+1):
if j < cur_item_wt:
# if the target weight is < weight of the current weight,
# then current item cannot be included in the optimal subset,
# so the optimal value remains unchanged.
cur_optimal_value = optimal_weights.loc[i-1, j]
else:
# The target weight is the maximum of two cases
# with and without the current item
cur_optimal_value = max(
optimal_weights.loc[i-1, j],
optimal_weights.loc[i-1, j-cur_item_wt] + curr_item_val)

optimal_weights.loc[i, j] = cur_optimal_value




In [5]:

optimal_weights




Out[5]:

.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

0
1
2
3
4
5
6
7
8
9
10

0
0
0
0
0
0
0
0
0
0
0
0

1
0
0
0
0
0
10
10
10
10
10
10

2
0
0
0
0
50
50
50
50
50
60
60

3
0
0
0
0
50
50
50
50
50
60
80

4
0
0
0
50
50
50
50
100
100
100
100



The final result is the last element of optimal_weights.



In [6]:

optimal_weights.loc[len(items_df), W]




Out[6]:

100