# Python idioms performance characteristics

I've been meaning to check this out for a long time.

The setup is simple. We write a monstrous decorator which modifies a simple function that iterates n times, to a function that runs the former on a number of inputs and returns how long each run took.

I'm not in the business of 4% performance improvements here. I'm after quadratic behavior!

``````

In :

from time import time
import numpy as np
import matplotlib.pyplot as plt
from collections import deque
import random
%matplotlib inline

``````
``````

In :

def benchmark(counts):
def times(f):
def ret():
timings = []
for c in counts:
start = time()
f(c)
timings.append(time() - start)
return timings
return ret
return times

``````

``````

In :

counts = [int(x) for x in np.logspace(4, 7, num=40)]

@benchmark(counts)
def accumulate_strings(n):
s = ''
for i in range(n):
s += 'a'

@benchmark(counts)
def list_join(n):
l = []
for i in range(n):
l.append('a')
''.join(l)

``````
``````

In :

plt.plot(counts, accumulate_strings())
plt.plot(counts, list_join())
plt.legend(['s+=', "''.join()"], loc='best')

``````
``````

Out:

<matplotlib.legend.Legend at 0x7fba66d57978>

``````

Looks pretty linear to me. Apparently string `+=` runs in `O(1)`. That was honestly a shocking discovery to me. I've been telling people to avoid string `+=` for years!

Let's test another common truth - that you should use a deque for repeated `append` and `pop` operations.

``````

In :

counts = [int(x) for x in np.logspace(4, 6)]

@benchmark(counts)
q = deque()
for i in range(n):
q.append(i)
for i in range(n):
q.pop()

@benchmark(counts)
l = []
for i in range(n):
l.append(i)
for i in range(n):
l.pop()

``````
``````

In :

plt.legend(['deque', 'list'], loc='best')

``````
``````

Out:

<matplotlib.legend.Legend at 0x7fba66c90b38>

``````

Seems like they're both linear time. Hey, how much is that is overhead?

``````

In :

@benchmark(counts)
def just_pass(n):
l = []
for i in range(n):
pass
for i in range(n):
pass

plt.plot(counts, just_pass())
plt.legend(['deque', 'list', 'pass'], loc='best')

``````
``````

Out:

<matplotlib.legend.Legend at 0x7fba66c0beb8>

``````

Not that much. What if we insert to the end and take from the beginning?

``````

In :

counts = [int(x) for x in np.logspace(3, 5)]

@benchmark(counts)
def append_pop0_deque(n):
q = deque()
for i in range(n):
q.append(i)
for i in range(n):
q.popleft()

@benchmark(counts)
def append_pop0_list(n):
l = []
for i in range(n):
l.append(i)
for i in range(n):
l.pop(0)

plt.plot(counts, append_pop0_deque())
plt.plot(counts, append_pop0_list())
plt.legend(['deque', 'list'], loc='best')

``````
``````

Out:

<matplotlib.legend.Legend at 0x7fba66b7fe10>

``````

Finally we see a quadratic function. Seems that lists are fine with adding and removing from one end, but they're really bad when you need to `pop(0)` repeatedly.

Are strings the same?

``````

In :

counts = [int(x) for x in np.logspace(3, 5)]

@benchmark(counts)
def accumulate_right(n):
s = ''
for i in range(n):
s = s + 'a'

@benchmark(counts)
def accumulate_left(n):
s = ''
for i in range(n):
s = 'a' + s

plt.plot(counts, accumulate_right())
plt.plot(counts, accumulate_left())
plt.legend(["s = s + 'a'", "s = 'a' + s"], loc='upper left')

``````
``````

Out:

<matplotlib.legend.Legend at 0x7fba66af26a0>

``````

Certainly seems so. That one is a little strange. If `x` and `y` are strings, `x + y` runs in `O(len(y))`.

Out of curiosity, let's check the `bytes` type.

``````

In :

counts = [int(x) for x in np.logspace(3, 5)]

@benchmark(counts)
def accumulate_bytes(n):
s = b''
for i in range(n):
s += b'a'

@benchmark(counts)
def join_bytes(n):
l = []
for i in range(n):
l.append(b'a')
b''.join(l)

plt.plot(counts, accumulate_bytes())
plt.plot(counts, join_bytes())
plt.legend(['accumulate', 'join'], loc='upper left')

``````
``````

Out:

``````
``````

In :

counts = [int(x) for x in np.logspace(3, 6)]

@benchmark(counts)
def accumulate_1ref(n):
s = ''
for i in range(n):
s += 'a'

@benchmark(counts)
def accumulate_2ref(n):
s = ''
blah = s
q = s
for i in range(n):
s += 'a'

plt.plot(counts, accumulate_1ref())
plt.plot(counts, accumulate_2ref())
plt.legend(['1ref', '2ref'], loc='best')

``````
``````

Out:

<matplotlib.legend.Legend at 0x7f179b457a58>

``````

### Let's recap:

• There's actually nothing wrong with a `+=` loop for strings. We may have to re-educate ourselves!
• `''.join()` is still preferable when you need to accumulate bytes.
• If you find yourself write `.pop(0)` or `.insert(0)` in a loop, you really want a `collections.deque`.
• Postulate reaffirmed: Common truths should be tested!

-- Noam Mor, 2015