In [1]:
import time
print('Last updated: %s' %time.strftime('%d/%m/%Y'))
The collections module in Python's standard library provides some convienient container datatypes that not only make your code more "Pythonic", but they also are optimized for performance for certain operations. Previously, we have seen how we can use the collections.Counter class (Day 3: 6 different ways for counting elements using a dictionary) to write short and efficient code to quantify items.
Today, we will take a look at another collections class: the deque container data type.
Most of you might already be familiar with deques due to their popularity in C++ and might want to skip this short introduction.
For all the others: deque (pronounced "deck") is short for "double-ended queue" and thus it is sometimes also written as "dequeue". This abstract data structure is quite similar to an array (or Python list), which is optimized for adding and removing elements from either the back or front.
The deque class comes with methods that are already familiar from the Python list type, such as
append(val)clear()count(val)
returns the number of elements that are equal to val extend(iterable)pop()reverse()Noneremove(val)val appendleft(val)extendleft(iterable)popleft()rotate(n)
In [1]:
import collections
a = collections.deque([1,2,3])
a.rotate(2)
print(a)
Note that deques don't have equivalents to the Python list methods, such as index() for looking up values, or the .sort() method (although the sorted() function can be used as a workaround). If we want to get a slice from a deque, equivalent to my_list[0:3], we would need to use itertools, for example.
import itertools
list(itertools.islice(my_deque, 0, 3))
In [3]:
import timeit
import collections
from copy import copy
labels = ['.append()', '.insert(0,x)\n.appendleft(x)',
'.pop()', '.pop(0)\n.popleft()', '.reverse()']
list_time, deque_time = [], []
n = 10000
l = [i for i in range(n)]
d = collections.deque([i for i in range(n)])
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.append(1)',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.append(1)',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.insert(0,1)',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.appendleft(1)',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.pop()',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.pop()',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.pop(0)',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.popleft()',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.reverse()',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.reverse()',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
In [4]:
import platform
import multiprocessing
def print_sysinfo():
print('\nPython version :', platform.python_version())
print('compiler :', platform.python_compiler())
print('\nsystem :', platform.system())
print('release :', platform.release())
print('machine :', platform.machine())
print('processor :', platform.processor())
print('CPU count :', multiprocessing.cpu_count())
print('interpreter:', platform.architecture()[0])
print('\n\n')
In [5]:
%matplotlib inline
In [13]:
import matplotlib.pyplot as plt
def plot():
plt.rcParams.update({'font.size': 12})
fig = plt.figure(figsize=(11,13))
# Setting the positions and width for the bars
pos = list(range(len(list_time)))
width = 0.2
# Plotting the bars
fig, ax = plt.subplots(figsize=(8,6))
plt.bar(pos, list_time, width,
alpha=0.5,
color='g',
label=labels[0])
plt.bar([p + width for p in pos], deque_time, width,
alpha=0.5,
color='b',
label=labels[1])
# Setting axis labels and ticks
ax.set_ylabel('time in seconds\n(for 1,000 operations on starting size 10,000)')
ax.set_title('Python list and collections.deque operations')
ax.set_xticks([p + 1.5 * width for p in pos])
ax.set_xticklabels(labels)
# Setting the x-axis and y-axis limits
plt.xlim(0-width, len(list_time))
plt.ylim([0, max(list_time + deque_time) * 1.5])
# Adding the legend and showing the plot
plt.legend(['list', 'collections.deque'], loc='upper left')
plt.grid()
plt.show()
In [14]:
plot()
print_sysinfo()
As we can see in the plot above, the performance of the shared methods between Python list and deque objects are about the same. However, as expected, the appendleft(x) and popleft(x) methods on deques are significantly faster than workaround-approaches on Python lists.
Thus, if our application requires a lot of adding and removing of items at both ends of the data structure, deques are definitely to be preferred over lists.