In [ ]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
Today, we will be concerned solely with time complexity.
Formally, we want to know $T(d)$, where $d$ is any given dataset and $T(d)$ gives the total run time.
In [ ]:
def mini(x):
    n = len(x)
    mini = x[0]
    for i in range(n):
        if x[i] < mini:
            mini = x[i]
    return mini
Go ahead and work it out with your neighbors.
In [ ]:
x = np.random.rand(1000)
In [ ]:
print(mini(x))
print(x.min())
Each time the function is called, we have
n = len(x)
mini = x[0]
that's two instructions (again, ignoring how much goes into the len call).
Then, the for loop body requires either one or two instructions. You always have the comparison x[i] < mini, but you may or may not have the assignment mini = x[i].
For one input $x$, we might count $N_{\mathrm{inst}}(x) = 9$, while another input $y$ of the same length gives $N_{\mathrm{inst}}(y) = 7$.
The answer to "how long does this take" is...it depends on the input.
Since we would like to know how long an algorithm will take before we run it, let's examine the worst case scenario.
This lets us move from $T(d)$ to $f(n)$, where $n \equiv \mathrm{size}(d)$ is the size of the dataset.
For our simple problem,
$$f(n) = 2 + 4n$$
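We can sanity-check the linear growth with mini_counted from above. It skips the loop bookkeeping, so its slope is smaller than the 4 in $f(n)$, but the worst case (a strictly decreasing array, which forces an assignment on every pass after the first) grows linearly in $n$ either way.
In [ ]:
for size in (10, 100, 1000):
    x_worst = np.arange(size, 0, -1)        # size, size-1, ..., 1: the worst case
    print(size, mini_counted(x_worst)[1])   # grows linearly with size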
In [ ]:
n = np.linspace(0, 1000, 10000)
# Two cubic cost models with different coefficients
f0 = 2; f1 = 1; f2 = 10; f3 = 2
g0 = 0; g1 = 10; g2 = 1; g3 = 1
f = f0 + f1*n + f2*n**2 + f3*n**3
g = g0 + g1*n + g2*n**2 + g3*n**3
In [ ]:
plt.figure()
plt.plot(n, f, label='f')
plt.plot(n, g, label='g')
plt.xlim(0,2)
plt.ylim(0,20)
plt.legend()
For large $n$, the lower-order terms and even the leading coefficients $f_3$ and $g_3$ stop mattering: both functions grow like $n^3$, so we drop them.
We call this
$$O(n^3)$$,
and we say our algorithm "is $n^3$", meaning its run time grows no worse than $n^3$ (up to a constant factor).
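A quick numerical check, reusing the n, f, and g arrays from the cell above: at the right edge of the range, each polynomial is dominated by its cubic term.
In [ ]:
print(f[-1] / n[-1]**3)   # ~ f3 = 2
print(g[-1] / n[-1]**3)   # ~ g3 = 1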
Of course this is exactly the same notation and meaning as when we do a series expansion in any other calculation,
$$ e^{x} = 1 + x + \frac{x^2}{2} + O(x^3), \quad x \to 0. $$
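SymPy prints truncated series in exactly this notation; a small sketch, assuming SymPy is available:
In [ ]:
import sympy as sp

x_sym = sp.symbols('x')
print(sp.exp(x_sym).series(x_sym, 0, 3))   # 1 + x + x**2/2 + O(x**3)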
Calculate the complexity of the following function with your neighbors.
In [ ]:
def f_x(particles):
    G = 1
    a_x = np.zeros(len(particles))          # zeros, not np.empty: we accumulate into it
    for i, p in enumerate(particles):
        for j, q in enumerate(particles):   # q, not p: don't shadow the outer loop variable
            if j != i:
                # simplified force term, depending only on q's position (fine for a timing demo)
                a_x[i] -= G*q.x/(q.x**2 + q.y**2 + q.z**2)**1.5
    return a_x
In [ ]:
class Particle:
    def __init__(self, r, v):
        self.r = r
        self.v = v

    @property
    def x(self):
        return self.r[0]

    @property
    def y(self):
        return self.r[1]

    @property
    def z(self):
        return self.r[2]
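A quick check that the properties expose the components of r:
In [ ]:
pt = Particle(np.random.rand(3), np.zeros(3))
print(pt.r, pt.x, pt.y, pt.z)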
In [ ]:
nn = np.array([10, 100, 300, 1000])
nnn = np.linspace(nn[0],nn.max(),10000)
p1 = [Particle(np.random.rand(3),(0,0,0)) for i in range(nn[0])]
p2 = [Particle(np.random.rand(3),(0,0,0)) for i in range(nn[1])]
p3 = [Particle(np.random.rand(3),(0,0,0)) for i in range(nn[2])]
p4 = [Particle(np.random.rand(3),(0,0,0)) for i in range(nn[3])]
In [ ]:
t1 = %timeit -o f_x(p1)
t2 = %timeit -o f_x(p2)
t3 = %timeit -o f_x(p3)
t4 = %timeit -o f_x(p4)
times = np.array([t1.average, t2.average, t3.average, t4.average])
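Rather than only eyeballing the log-log plots below, we can estimate the exponent directly: a least-squares line through $\log t$ versus $\log n$ gives the slope $p$ in $t \sim n^p$. A sketch:
In [ ]:
slope, intercept = np.polyfit(np.log(nn), np.log(times), 1)
print(slope)   # expect roughly 2 for the double loop in f_x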
In [ ]:
plt.figure()
plt.loglog(nn,times,'x', label='data')
plt.loglog(nnn,times[0]*(nnn/nnn[0])**2, label=r'$O(n^2)$')
plt.legend()
plt.xlabel('data size')
plt.ylabel('run time (s)')
In [ ]:
plt.figure()
plt.loglog(nn,times,'x', label='data')
plt.loglog(nnn, times[0]*(nnn/nnn[0])**3, label=r'$O(n^3)$')
plt.loglog(nnn, times[0]*(nnn/nnn[0])**2, label=r'$O(n^2)$')
plt.loglog(nnn, times[0]*(nnn/nnn[0]), label=r'$O(n)$')
plt.loglog(nnn, times[0]*(nnn/nnn[0])*np.log(nnn/nnn[0]), label=r'$O(n \log(n))$')
plt.legend()
plt.xlabel('data size')
plt.ylabel('run time (s)')
Let's talk about solving PDEs, for example the Navier-Stokes momentum equation:
$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} = -\frac{\nabla p}{\rho} + \nu \nabla^2 \mathbf{u}$$
Let's focus on two ways of calculating gradients.
Finite Difference $$\frac{\partial u}{\partial x} \simeq \frac{u_{i+1} - u_{i-1}}{2\,\Delta x}$$
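As a minimal sketch (our own grid and test function, assuming a periodic domain so np.roll can supply the neighbors): each point touches a fixed number of neighbors, so the derivative costs $O(n)$.
In [ ]:
n = 256
x = np.linspace(0, 2*np.pi, n, endpoint=False)
u = np.sin(x)
dx = x[1] - x[0]

# Central difference with periodic wrap-around: O(n) work in total.
dudx_fd = (np.roll(u, -1) - np.roll(u, 1)) / (2*dx)
print(np.abs(dudx_fd - np.cos(x)).max())   # small, O(dx^2) error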
Spectral $$\frac{\partial u}{\partial x} \simeq \sum_{j = 0}^{N} i k_j f_j \, e^{i k_j x},$$ where the Fourier coefficients $f_j$ come from a fast Fourier transform, which costs $O(n \log n)$.
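And the spectral version of the same derivative, reusing n, x, u, and dx from the cell above: the two FFTs dominate the cost at $O(n \log n)$, while the multiplication by $i k_j$ is only $O(n)$.
In [ ]:
# Integer wavenumbers for a periodic domain of length 2*pi.
k = 2*np.pi * np.fft.fftfreq(n, d=dx)

# Differentiate in Fourier space, then transform back.
dudx_sp = np.fft.ifft(1j * k * np.fft.fft(u)).real
print(np.abs(dudx_sp - np.cos(x)).max())   # accurate to near machine precision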
In [ ]:
plt.figure()
plt.loglog(nnn,times[0]*(nnn/nnn[0]), label=r'$O(n)$ finite difference')
plt.loglog(nnn, times[0]*(nnn/nnn[0])*np.log(nnn/nnn[0]), label=r'$O(n \log(n))$ spectral')
plt.legend()
plt.xlabel('data size')
plt.ylabel('run time (s)')