In [1]:
%load_ext watermark

In [2]:
%watermark -a 'Sebastian Raschka' -u -d -v


Sebastian Raschka 
last updated: 2016-06-08 

CPython 3.5.1
IPython 4.2.0

More Greedy Algorithm Examples

For an introduction to greedy algorithm, see the related notebook, Introduction to Greedy Algorithms.

Set Cover Problems

Set cover problems are problems where we want to find the minimum number of subsets such that their set union contains all items in a target set. This target set is typically called the "universe." To borrow an example from the excellent Wikipedia page on set cover problems, let's assume we have the universe

  • $U=\{1, 2, 3, 4, 5\}$

and are given the collection of sets

  • $C=\{\{1, 2, 3\}, \{2, 4\}, \{3, 4\}, \{4, 5\}\}$

The task is to find the minimum number of sets in $C$ so that their union equals $U$.

Note that set cover problems are NP-complete, thus no computationally efficient solution exists. However, we can use greedy algorithms to approximate the solution; this solution may or may not be globally optimal.

The greedy strategy we are going to employ is very simple and works as follows:

  • While not all elements in U are covered:
    • For all uncovered sets in C:
    • Pick the set that covers most of the elements in U

In [3]:
collection = {'set_1': {1, 2, 3},
              'set_2': {2, 4}, 
              'set_3': {3, 4}, 
              'set_4': {4, 5}}
sets_used = []
elements_not_covered = {1, 2, 3, 4, 5}


while elements_not_covered:
    elements_covered = set()
    for set_ in collection.keys():
        
        if set_ in sets_used:
            continue
        
        current_set = collection[set_]
        would_cover = elements_covered.union(current_set)
        if len(would_cover) > len(elements_covered):
            elements_covered = would_cover
            sets_used.append(set_)
            elements_not_covered -= elements_covered
            
            if not elements_not_covered:
                break
    
print(sets_used)


['set_1', 'set_2', 'set_4']

As a result, we can see that 3 sets are required to cover the universe U. In this case, the greedy algorithm has not found the globally optimal solution, which would be 'set_1' and 'set_4'. Note that this is just a trivial example, and greedy algorithms can be very useful approximators for solutions that are computationally infeasible.

For instance, an exhaustive solution to this problem that would guaranteed to find the global optimum (remember that set cover problems are NP-complete) would involve iterating over a power set, which has $2^n$ elements, where $n$ is the number of sets in the collection. For example, a collection of 30 sets would already require comparing the solutions of $2^{30}=1,073,741,824$ million possible combinations!

(Note that the greedy approach may have found the globally optimal solution in this simple example if it had iterated over the dictionary in a different order -- for example, if we had swapped the positions of {2, 4} and {4, 5})