In [1]:
%load_ext watermark
In [2]:
%watermark -a 'Sebastian Raschka' -u -d -v
The subfamily of Divide-and-Conquer algorithms is one of the main paradigms of algorithmic problem solving next to Dynamic Programming and Greedy Algorithms. The main goal behind greedy algorithms is to implement an efficient procedure for often computationally more complex, often infeasible brute-force methods such as exhaustive search algorithms by splitting a task into subtasks that can be solved indpendently and in parallel; later, the solutions are combined to yield the final result.
Let's say we want to implement an algorithm that returns the index position of an item that we are looking for in an array. in an array. Here, we assume that the array is alreadt sorted. The simplest (and computationally most expensive) approach would be to check each element in the array iteratively, until we find the desired match or return -1:
In [3]:
def linear_search(lst, item):
for i in range(len(lst)):
if lst[i] == item:
return i
return -1
In [4]:
lst = [1, 5, 8, 12, 13]
for k in [8, 1, 23, 11]:
print(linear_search(lst=lst, item=k))
The runtime of linear search is obviously $O(n)$ since we are checking each element in the array -- remember that big-Oh is our upper bound. Now, a cleverer way of implementing a search algorithm would be binary search, which is a simple, yet nice example of a divide-and-conquer algorithm.
The idea behind divide-and-conquer algorithm is to break a problem down into non-overlapping subproblems of the original problem, which we can then solve recursively. Once, we processed these recursive subproblems, we combine the solutions into the end result.
Using a divide-and-conquer approach, we can implement an $O(\log n)$ search algorithm called binary search.
The idea behind binary search is quite simple:
midpoint - 1
midpoint + 1
Assuming that we are looking for the search key k=5, the individual steps of binary search can be illustrated as follows:
And below follows our Python implementation of this idea:
In [5]:
def binary_search(lst, item):
first = 0
last = len(lst) - 1
found = False
while first <= last and not found:
midpoint = (first + last) // 2
if lst[midpoint] == item:
found = True
else:
if item < lst[midpoint]:
last = midpoint - 1
else:
first = midpoint + 1
if found:
return midpoint
else:
return -1
In [6]:
for k in [8, 1, 23, 11]:
print(binary_search(lst=lst, item=k))
"Finding the Majority Element" is a problem where we want to find an element in an array positive integers with length n that occurs more than n/2 in that array. For example, if we have an array $a = [1, 2, 3, 3, 3]$, $3$ would be the majority element. In another array, b = [1, 2, 3, 3] there exists no majority element, since $2$ (where $2$ is the the count of element $3$) is not greater than $n / 2$.
Let's start with a simple implementation where we count how often each unique element occurs in the array. Then, we return the element that meets the criterion "$\text{occurences } > n / 2$", and if such an element does not exist, we return -1. Note that we return a tuple of three items: (element, number_occurences, count_dictionary), which we will use later ...
In [7]:
def majority_ele_lin(lst):
cnt = {}
for ele in lst:
if ele not in cnt:
cnt[ele] = 1
else:
cnt[ele] += 1
for ele, c in cnt.items():
if c > (len(lst) // 2):
return (ele, c, cnt)
return (-1, -1, cnt)
###################################################
lst0 = []
print(lst0, '->', majority_ele_lin(lst=lst0)[0])
lst1 = [1, 2, 3, 4, 4, 5]
print(lst1, '->', majority_ele_lin(lst=lst1)[0])
lst2 = [1, 2, 4, 4, 4, 5]
print(lst2, '->', majority_ele_lin(lst=lst2)[0])
lst3 = [4, 2, 4, 4, 4, 5]
print(lst3, '->', majority_ele_lin(lst=lst3)[0])
print(lst3[::-1], '->', majority_ele_lin(lst=lst3[::-1])[0])
lst4 = [2, 3, 9, 2, 2]
print(lst4, '->',majority_ele_lin(lst=lst4)[0])
print(lst4[::-1], '->', majority_ele_lin(lst=lst4[::-1])[0])
lst5 = [0, 0, 2, 2, 2]
print(lst5, '->',majority_ele_lin(lst=lst5)[0])
print(lst5[::-1], '->', majority_ele_lin(lst=lst5[::-1])[0])
Now, "finding the majority element" is a nice task for a Divide and Conquer algorithm. Here, we use the fact that if a list has a majority element it is also the majority element of one of its two sublists, if we split it into 2 halves.
More concretely, what we do is:
In [8]:
def majority_ele_dac(lst):
n = len(lst)
left = lst[:n // 2]
right = lst[n // 2:]
l_maj = majority_ele_lin(left)
r_maj = majority_ele_lin(right)
# case 3A
if l_maj[0] == -1 and r_maj[0] == -1:
return -1
# case 3B
elif l_maj[0] == -1 and r_maj[0] > -1:
cnt = r_maj[1]
if r_maj[0] in l_maj[2]:
cnt += l_maj[2][r_maj[0]]
if cnt > n // 2:
return r_maj[0]
# case 3C
elif r_maj[0] == -1 and l_maj[0] > -1:
cnt = l_maj[1]
if l_maj[0] in r_maj[2]:
cnt += r_maj[2][l_maj[0]]
if cnt > n // 2:
return l_maj[0]
# case 3D
else:
c1, c2 = l_maj[1], r_maj[1]
if l_maj[0] in r_maj[2]:
c1 = l_maj[1] + r_maj[2][l_maj[0]]
if r_maj[0] in l_maj[2]:
c2 = r_maj[1] + l_maj[2][r_maj[0]]
m = max(c1, c2)
if m > n // 2:
return m
return -1
###################################################
lst0 = []
print(lst0, '->', majority_ele_dac(lst=lst0))
lst1 = [1, 2, 3, 4, 4, 5]
print(lst1, '->', majority_ele_dac(lst=lst1))
lst2 = [1, 2, 4, 4, 4, 5]
print(lst2, '->', majority_ele_dac(lst=lst2))
lst3 = [4, 2, 4, 4, 4, 5]
print(lst3, '->', majority_ele_dac(lst=lst3))
print(lst3[::-1], '->', majority_ele_dac(lst=lst3[::-1]))
lst4 = [2, 3, 9, 2, 2]
print(lst4, '->',majority_ele_dac(lst=lst4))
print(lst4[::-1], '->', majority_ele_dac(lst=lst4[::-1]))
lst5 = [0, 0, 2, 2, 2]
print(lst5, '->',majority_ele_dac(lst=lst5))
print(lst5[::-1], '->', majority_ele_dac(lst=lst5[::-1]))
In algorithms such as binary search that we saw at the beginning of this notebook, we recursively break down our problem into smaller subproblems. Thus, we have a recurrence problem with time complexity
$T(n) = T(\frac{2}{n}) + O(1) \rightarrow T(n) = O(\log n).$
In this example, finding the majority element, we break our problem down into only 2 subproblems. Thus, the complexity of our algorithm is
$T(n) = 2T (\frac{2}{n} + O(n) \rightarrow T(n) = O(n \log n).$
Our Divide and Conquer approach above is actually a good candidate for multi-processing, since we can parallelize the majority element search in the two sub-lists. So, let's make a simple modification and use Python's multiprocessing
module for that. Here, we use the apply_async
method from the Pool
class, which doesn't return the results in order (in contrast to the apply
method). Thus, the left sublist and right sublist may be swapped in the variable assignment l_maj, r_maj = [p.get() for p in results]
. However, for our implementation, this doesn't make a difference.
In [9]:
import multiprocessing as mp
def majority_ele_dac_mp(lst):
n = len(lst)
left = lst[:n // 2]
right = lst[n // 2:]
results = (pool.apply_async(majority_ele_lin, args=(x,))
for x in (left, right))
l_maj, r_maj = [p.get() for p in results]
if l_maj[0] == -1 and r_maj[0] == -1:
return -1
elif l_maj[0] == -1 and r_maj[0] > -1:
cnt = r_maj[1]
if r_maj[0] in l_maj[2]:
cnt += l_maj[2][r_maj[0]]
if cnt > n // 2:
return r_maj[0]
elif r_maj[0] == -1 and l_maj[0] > -1:
cnt = l_maj[1]
if l_maj[0] in r_maj[2]:
cnt += r_maj[2][l_maj[0]]
if cnt > n // 2:
return l_maj[0]
else:
c1, c2 = l_maj[1], r_maj[1]
if l_maj[0] in r_maj[2]:
c1 = l_maj[1] + r_maj[2][l_maj[0]]
if r_maj[0] in l_maj[2]:
c2 = r_maj[1] + l_maj[2][r_maj[0]]
m = max(c1, c2)
if m > n // 2:
return m
return -1
###################################################
lst0 = []
print(lst0, '->', majority_ele_dac(lst=lst0))
lst1 = [1, 2, 3, 4, 4, 5]
print(lst1, '->', majority_ele_dac(lst=lst1))
lst2 = [1, 2, 4, 4, 4, 5]
print(lst2, '->', majority_ele_dac(lst=lst2))
lst3 = [4, 2, 4, 4, 4, 5]
print(lst3, '->', majority_ele_dac(lst=lst3))
print(lst3[::-1], '->', majority_ele_dac(lst=lst3[::-1]))
lst4 = [2, 3, 9, 2, 2]
print(lst4, '->',majority_ele_dac(lst=lst4))
print(lst4[::-1], '->', majority_ele_dac(lst=lst4[::-1]))
lst5 = [0, 0, 2, 2, 2]
print(lst5, '->',majority_ele_dac(lst=lst5))
print(lst5[::-1], '->', majority_ele_dac(lst=lst5[::-1]))
In [ ]: