Divide and Conquer

Pseudo-code

The "Divide and Conquer" maximum sub-array algorithm is described by the following pseudo-code:

DIVIDE_AND_CONQUER(A[1,...,N]){
    if N == 0 {
        return 0, A
    } else if N == 1 {
        return A[0], A
    }

    tmp_max = 0
    mid_max = 0
    mid_start = 0
    mid_end = 0

    left_max = 0
    right_max = 0

    midpoint = N / 2

    mid_start = midpoint
    mid_end = midpoint

    for i from A[N,...,midpoint] {
        tmp_max = tmp_max + A[i]
        if tmp_max > left_max {
            left_max = tmp_max
            mid_start = i
        }
    }

    tmp_max = 0

    for i from A[midpoint,...,N] {
        tmp_max = tmp_max + A[i]
        if tmp_max > right_max {
            right_max = tmp_max
            mid_end = i + 1
        }
    }

    mid_max = left_max + right_max

    left_max, left_subarray = DIVIDE_AND_CONQUER(A[0,...,midpoint])
    right_max, right_subarray = DIVIDE_AND_CONQUER(A[midpoint,...,N])

    if mid_max >= left_max and mid_max >= right_max {
        return mid_max, A[mid_start,...,mid_end]
    } else if left_max >= right_max and left_max > mid_max {
        return left_max, left_subarray
    } else if right_max > left_max and right_max > mid_max {
        return right_max, right_subarray
    }
}

Theoretical Run-time Analysis

In order to determine the run-time of the divide and conquer algorithm we will need to derive its recurrence. We will make a simplifying assumption that the original problem size is a power of two so that all of the subproblem sizes will remain integers. We will denote $T(n)$ as the run-time of "divide and conquer" on a subarray of $n$ elements. The base case is when $n = 0$ or $n = 1$ which will take constant time and as a result:

  • $T(1) = \theta(1)$

The recursive case occurs when $n \gt 1$. The variable initializing prior to the first for loop will also take constant time. The following for loops are used to find the maximum sub-array that crosses the array's midpoint and will add a runtime of $\theta(n)$. After the loops there are two recursive calls to the main function where we spend $T(\frac{n}{2})$ solving each of them so the total contribution to the running time is $2T(\frac{n}{2})$. The remaining if-else block would also run in constant time $(\theta(1))$. So the recurrence in total for $n \gt 1$ is:

  • $T(n) = 2T(\frac{n}{2}) + \theta(n) + \theta(1)$
  • $T(n) = 2T(\frac{n}{2}) + \theta(n)$

So the complete recurrence is:

  • $T(n) = \theta(1)$ (when $n = 0$ or $n = 1$)
  • $T(n) = 2T(\frac{n}{2}) + \theta(n)$ (when $n \gt 1$)

The recurrence can be solved using the master method as shown below:

  • $T(n) = 2T(\frac{n}{2}) + \theta(n)$
  • $a = 2, b = 2, f(n) = \theta(n)$
  • $n^{log_{b}(a)} = n^{log_{2}(2)} = n^{1}$
  • $\theta(n^{log_{b}(a)}) = \theta(n)$ (Case 2 applies)
  • $T(n) = \theta(n^{log_{2}(2)} * log_{2}(n)) = \theta(n * log_2(n))$ (By the master theorem)

So substituting in the runtime we get:

  • $T(n) = \theta(n * log_2(n)) + \theta(n)$

The $\theta(n)$ term drops off leaving us with:

  • $T(n) = \theta(n * log_2(n))$

The runtime above is for the divide and conquer algorithm for finding maximum sub arrays.

Experimental Analysis

For a series of array sizes $N$, $10$ random arrays were generated and run through the "Divide and Conquer" algorithm. The CPU clock time was recorded for each of the $10$ random array inputs, and an average run time was computed. Below is the plot of average run time versus $N$ for the "Divide and Conquer" algorithm.

A linear fit and logarithmic fit were applied to the average run time data, which resulted in the following fit equations as a function of $N$:

The function of the best logarithmic fit curve where $x$ is substituted for $N$:

$$y = 2.57865 * 10^{-7} * x * log(x) + 1.000$$

The function of the best linear fit curve where $x$ is substituted for $N$:

$$y = 6.15402 * 10^{-6} * x - 9.15974 * 10^{-1}$$

This plot has both a linear fit and a logarithmic fit to show how similiar they present for the values plotted. The logarithmic curve fits the data points almost exactly therefore it shows that the experimental values are strongly aligned with the theoretically derived runtime of $\theta(n * log_2(n))$.

Based on the average run time data curve fit, we would expect the "Divide and Conquer" algorithm to be able to process the following number of elements in the given amount of time:

Time Max Input Size
10 seconds $1687210$
30 seconds $5050390$
60 seconds $9848770$