I've been meaning to check this out for a long time.
The setup is simple. We write a monstrous decorator which modifies a simple function that iterates n times, to a function that runs the former on a number of inputs and returns how long each run took.
I'm not in the business of 4% performance improvements here. I'm after quadratic behavior!
In [1]:
from time import time
import numpy as np
import matplotlib.pyplot as plt
from collections import deque
import random
%matplotlib inline
In [2]:
def benchmark(counts):
def times(f):
def ret():
timings = []
for c in counts:
start = time()
f(c)
timings.append(time() - start)
return timings
return ret
return times
Let's start with the obvious: repeated string concatenation. Everyone knows about that one, right?
In [3]:
counts = [int(x) for x in np.logspace(4, 7, num=40)]
@benchmark(counts)
def accumulate_strings(n):
s = ''
for i in range(n):
s += 'a'
@benchmark(counts)
def list_join(n):
l = []
for i in range(n):
l.append('a')
''.join(l)
In [4]:
plt.plot(counts, accumulate_strings())
plt.plot(counts, list_join())
plt.legend(['s+=', "''.join()"], loc='best')
Out[4]:
Looks pretty linear to me. Apparently string +=
runs in O(1)
. That was honestly a shocking discovery to me. I've been telling people to avoid string +=
for years!
Let's test another common truth - that you should use a deque for repeated append
and pop
operations.
In [5]:
counts = [int(x) for x in np.logspace(4, 6)]
@benchmark(counts)
def add_remove_deque(n):
q = deque()
for i in range(n):
q.append(i)
for i in range(n):
q.pop()
@benchmark(counts)
def add_remove_list(n):
l = []
for i in range(n):
l.append(i)
for i in range(n):
l.pop()
In [6]:
plt.plot(counts, add_remove_deque())
plt.plot(counts, add_remove_list())
plt.legend(['deque', 'list'], loc='best')
Out[6]:
Seems like they're both linear time. Hey, how much is that is overhead?
In [7]:
@benchmark(counts)
def just_pass(n):
l = []
for i in range(n):
pass
for i in range(n):
pass
plt.plot(counts, add_remove_deque())
plt.plot(counts, add_remove_list())
plt.plot(counts, just_pass())
plt.legend(['deque', 'list', 'pass'], loc='best')
Out[7]:
Not that much. What if we insert to the end and take from the beginning?
In [8]:
counts = [int(x) for x in np.logspace(3, 5)]
@benchmark(counts)
def append_pop0_deque(n):
q = deque()
for i in range(n):
q.append(i)
for i in range(n):
q.popleft()
@benchmark(counts)
def append_pop0_list(n):
l = []
for i in range(n):
l.append(i)
for i in range(n):
l.pop(0)
plt.plot(counts, append_pop0_deque())
plt.plot(counts, append_pop0_list())
plt.legend(['deque', 'list'], loc='best')
Out[8]:
Finally we see a quadratic function. Seems that lists are fine with adding and removing from one end, but they're really bad when you need to pop(0)
repeatedly.
Are strings the same?
In [9]:
counts = [int(x) for x in np.logspace(3, 5)]
@benchmark(counts)
def accumulate_right(n):
s = ''
for i in range(n):
s = s + 'a'
@benchmark(counts)
def accumulate_left(n):
s = ''
for i in range(n):
s = 'a' + s
plt.plot(counts, accumulate_right())
plt.plot(counts, accumulate_left())
plt.legend(["s = s + 'a'", "s = 'a' + s"], loc='upper left')
Out[9]:
Certainly seems so. That one is a little strange. If x
and y
are strings, x + y
runs in O(len(y))
.
Out of curiosity, let's check the bytes
type.
In [10]:
counts = [int(x) for x in np.logspace(3, 5)]
@benchmark(counts)
def accumulate_bytes(n):
s = b''
for i in range(n):
s += b'a'
@benchmark(counts)
def join_bytes(n):
l = []
for i in range(n):
l.append(b'a')
b''.join(l)
plt.plot(counts, accumulate_bytes())
plt.plot(counts, join_bytes())
plt.legend(['accumulate', 'join'], loc='upper left')
Out[10]:
+=
loop for strings. We may have to re-educate ourselves!''.join()
is still preferable when you need to accumulate bytes..pop(0)
or .insert(0)
in a loop, you really want a collections.deque
.-- Noam Mor, 2015