2. Algorithm Analysis

2.2. What Is Algorithm Analysis?


In [3]:
import time

def sumOfN(n):
   start = time.time()

   theSum = 0
   for i in range(1,n+1):
      theSum = theSum + i

   end = time.time()

   return theSum,end-start

for i in range(5):
       print("Sum is %d required %10.7f seconds"%sumOfN(1000000))


Sum is 500000500000 required  0.1758540 seconds
Sum is 500000500000 required  0.1900620 seconds
Sum is 500000500000 required  0.1335070 seconds
Sum is 500000500000 required  0.1104770 seconds
Sum is 500000500000 required  0.1154571 seconds

2.3. Big-O Notation

The order of magnitude function describes the part of T(n) that increases the fastest as the value of n increases. Order of magnitude is often called Big-O notation (for “order”) and written as O(f(n)).

It provides a useful approximation to the actual number of steps in the computation. The function f(n) provides a simple representation of the dominant part of the original T(n).

  • The parameter n is often referred to as the “size of the problem,” and we can read this as “T(n) is the time it takes to solve a problem of size n”

As another example, suppose that for some algorithm, the exact number of steps is

T(n)=5nˆ2+27n+1005

When n is small, say 1 or 2, the constant 1005 seems to be the dominant part of the function. However, as n gets larger, the nˆ2 term becomes the most important.

In fact, when n is really large, the other two terms become insignificant in the role that they play in determining the final result

Again, to approximate T(n) as n gets large, we can ignore the other terms and focus on 5nˆ2.

In addition, the coefficient 5 becomes insignificant as n gets large.

We would say then that the function T(n) has an order of magnitude f(n)=n2, or simply that it is O(n2).

  • Although we do not see this in the summation example, sometimes the performance of an algorithm depends on the exact values of the data rather than simply the size of the problem.
  • For these kinds of algorithms we need to characterize their performance in terms of best case, worst case, or average case performance.
  • The worst case performance refers to a particular data set where the algorithm performs especially poorly. Whereas a different data set for the exact same algorithm might have extraordinarily good performance. However, in most cases the algorithm performs somewhere in between these two extremes (average case). It is important for a computer scientist to understand these distinctions so they are not misled by one particular case.
  • A number of very common order of magnitude functions will come up over and over as you study algorithms
f(n) Name
1 Constant
log⁡ n Logarithmic
n Linear
n log n Log Linear
nˆ2 Quadratic
nˆ3 Cubic
2ˆn Exponential

In [12]:
import matplotlib.pyplot as plt
import math

n = 11

x = list(range(1,n))

linear = [x for x in range(1, n)]
Quadratic = [x**2 for x in range(1, n)]
LogLinear = [x * math.log(x) for x in range(1, n)]
Cubic = [x**3 for x in range(1, n)]
Exponential = [2**x for x in range(1, n)]
logarithmic = [math.log(x) for x in range(1, n)]

plt.plot(x, linear,'k--', label='linear')
plt.plot(x, Quadratic, 'k:', label='Quadratic')
plt.plot(x, LogLinear, 'y', label='LogLinear')
plt.plot(x, Cubic, 'b', label='Cubic')
plt.plot(x, Exponential, 'r', label='Exponential')
plt.plot(x, logarithmic, 'g', label='logarithmic')
    
legend = plt.legend(shadow=True)

plt.grid(True)
plt.show()



In [13]:
a=5
b=6
c=10
n = 100
for i in range(n):
   for j in range(n):
      x = i * i
      y = j * j
      z = i * j
for k in range(n):
   w = a*k + 45
   v = b*b
d = 33

The number of assignment operations is the sum of four terms.

  • The first term is the constant 3, representing the three assignment statements at the start of the fragment.
  • The second term is 3nˆ2, since there are three statements that are performed nˆ2 times due to the nested iteration.
  • The third term is 2n, two statements iterated n times.
  • Finally, the fourth term is the constant 1, representing the final assignment statement.

This gives us T(n)=3+3n2+2n+1=3nˆ2+2n+4.

  • By looking at the exponents, we can easily see that the n2 term will be dominant and therefore this fragment of code is O(n2).
  • Note that all of the other terms as well as the coefficient on the dominant term can be ignored as n grows larger.

2.6. Lists


In [14]:
import timeit

def test1():
    l = []
    for i in range(1000):
        l = l + [i]

def test2():
    l = []
    for i in range(1000):
        l.append(i)

def test3():
    l = [i for i in range(1000)]

def test4():
    l = list(range(1000))

t1 = timeit.Timer("test1()", "from __main__ import test1")
print("concat ",t1.timeit(number=1000), "milliseconds")
t2 = timeit.Timer("test2()", "from __main__ import test2")
print("append ",t2.timeit(number=1000), "milliseconds")
t3 = timeit.Timer("test3()", "from __main__ import test3")
print("comprehension ",t3.timeit(number=1000), "milliseconds")
t4 = timeit.Timer("test4()", "from __main__ import test4")
print("list range ",t4.timeit(number=1000), "milliseconds")


concat  2.258772683002462 milliseconds
append  0.15120430599927204 milliseconds
comprehension  0.06961311700433725 milliseconds
list range  0.04029222999815829 milliseconds

Big-O efficiency of all the basic list operations.

Operation Big-O Efficiency
index [] O(1)
index assignment O(1)
append O(1)
pop() O(1)
pop(i) O(n)
insert(i,item) O(n)
del operator O(n)
iteration O(n)
contains (in) O(n)
get slice [x:y] O(k)
del slice O(n)
set slice O(n+k)
reverse O(n)
concatenate O(k)
sort O(n log n)
multiply O(nk)

In [15]:
popzero = timeit.Timer("x.pop(0)",
                       "from __main__ import x")
popend = timeit.Timer("x.pop()",
                      "from __main__ import x")

x = list(range(2000000))
print(popzero.timeit(number=1000))

x = list(range(2000000))
print(popend.timeit(number=1000))


2.465543024998624
0.0001711369986878708

2.7. Dictionaries

operation Big-O Efficiency
copy O(n)
get item O(1)
set item O(1)
delete item O(1)
contains (in) O(1)
iteration O(n)

Comparing Dictionary time with list time


In [16]:
import timeit
import random

for i in range(10000,1000001,20000):
    t = timeit.Timer("random.randrange(%d) in x" % i,
                     "from __main__ import random, x")
    x = list(range(i))
    lst_time = t.timeit(number=1000)
    x = {j:None for j in range(i)}
    d_time = t.timeit(number=1000)
    print("%d,%10.3f,%10.3f" % (i, lst_time, d_time))


10000,     0.189,     0.002
30000,     0.442,     0.002
50000,     0.626,     0.004
70000,     1.240,     0.002
90000,     1.313,     0.002
110000,     1.457,     0.002
130000,     1.795,     0.002
150000,     2.015,     0.002
170000,     2.436,     0.002
190000,     2.524,     0.002
210000,     2.706,     0.002
230000,     3.028,     0.002
250000,     3.242,     0.003
270000,     3.386,     0.002
290000,     3.668,     0.007
310000,     4.059,     0.003
330000,     4.082,     0.002
350000,     4.089,     0.003
370000,     4.957,     0.002
390000,     5.160,     0.002
410000,     5.023,     0.010
430000,     5.368,     0.003
450000,     5.930,     0.002
470000,     6.128,     0.002
490000,     6.105,     0.002
510000,     8.307,     0.003
530000,     7.444,     0.002
550000,     6.726,     0.002
570000,     8.511,     0.003
590000,     8.130,     0.003
610000,     7.316,     0.002
630000,     7.886,     0.002
650000,     7.947,     0.002
670000,     8.214,     0.003
690000,     8.577,     0.002
710000,    11.211,     0.003
730000,    14.261,     0.003
750000,    10.711,     0.003
770000,    12.565,     0.002
790000,    11.537,     0.002
810000,    12.884,     0.003
830000,    11.781,     0.002
850000,    10.902,     0.003
870000,    12.310,     0.002
890000,    12.671,     0.002
910000,    11.249,     0.002
930000,    11.611,     0.003
950000,    11.475,     0.003
970000,    13.233,     0.003
990000,    13.667,     0.003

In [ ]: