In [1]:
%load_ext watermark
In [2]:
%watermark -a 'Sebastian Raschka' -u -d -v
For an introduction to greedy algorithm, see the related notebook, Introduction to Greedy Algorithms.
Set cover problems are problems where we want to find the minimum number of subsets such that their set union contains all items in a target set. This target set is typically called the "universe." To borrow an example from the excellent Wikipedia page on set cover problems, let's assume we have the universe
and are given the collection of sets
The task is to find the minimum number of sets in $C$ so that their union equals $U$.
Note that set cover problems are NP-complete, thus no computationally efficient solution exists. However, we can use greedy algorithms to approximate the solution; this solution may or may not be globally optimal.
The greedy strategy we are going to employ is very simple and works as follows:
In [3]:
collection = {'set_1': {1, 2, 3},
'set_2': {2, 4},
'set_3': {3, 4},
'set_4': {4, 5}}
sets_used = []
elements_not_covered = {1, 2, 3, 4, 5}
while elements_not_covered:
elements_covered = set()
for set_ in collection.keys():
if set_ in sets_used:
continue
current_set = collection[set_]
would_cover = elements_covered.union(current_set)
if len(would_cover) > len(elements_covered):
elements_covered = would_cover
sets_used.append(set_)
elements_not_covered -= elements_covered
if not elements_not_covered:
break
print(sets_used)
As a result, we can see that 3 sets are required to cover the universe U. In this case, the greedy algorithm has not found the globally optimal solution, which would be 'set_1'
and 'set_4'
. Note that this is just a trivial example, and greedy algorithms can be very useful approximators for solutions that are computationally infeasible.
For instance, an exhaustive solution to this problem that would guaranteed to find the global optimum (remember that set cover problems are NP-complete) would involve iterating over a power set, which has $2^n$ elements, where $n$ is the number of sets in the collection. For example, a collection of 30 sets would already require comparing the solutions of $2^{30}=1,073,741,824$ million possible combinations!
(Note that the greedy approach may have found the globally optimal solution in this simple example if it had iterated over the dictionary in a different order -- for example, if we had swapped the positions of {2, 4} and {4, 5})