In [1]:
import time
print('Last updated: %s' %time.strftime('%d/%m/%Y'))
The collections
module in Python's standard library provides some convienient container datatypes that not only make your code more "Pythonic", but they also are optimized for performance for certain operations. Previously, we have seen how we can use the collections.Counter
class (Day 3: 6 different ways for counting elements using a dictionary) to write short and efficient code to quantify items.
Today, we will take a look at another collections
class: the deque
container data type.
Most of you might already be familiar with deque
s due to their popularity in C++ and might want to skip this short introduction.
For all the others: deque
(pronounced "deck") is short for "double-ended queue" and thus it is sometimes also written as "dequeue". This abstract data structure is quite similar to an array (or Python list), which is optimized for adding and removing elements from either the back or front.
The deque
class comes with methods that are already familiar from the Python list
type, such as
append(val)
clear()
count(val)
returns the number of elements that are equal to val
extend(iterable)
pop()
reverse()
None
remove(val)
val
appendleft(val)
extendleft(iterable)
popleft()
rotate(n)
In [1]:
import collections
a = collections.deque([1,2,3])
a.rotate(2)
print(a)
Note that deque
s don't have equivalents to the Python list
methods, such as index()
for looking up values, or the .sort()
method (although the sorted()
function can be used as a workaround). If we want to get a slice from a deque
, equivalent to my_list[0:3]
, we would need to use itertools
, for example.
import itertools
list(itertools.islice(my_deque, 0, 3))
In [3]:
import timeit
import collections
from copy import copy
labels = ['.append()', '.insert(0,x)\n.appendleft(x)',
'.pop()', '.pop(0)\n.popleft()', '.reverse()']
list_time, deque_time = [], []
n = 10000
l = [i for i in range(n)]
d = collections.deque([i for i in range(n)])
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.append(1)',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.append(1)',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.insert(0,1)',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.appendleft(1)',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.pop()',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.pop()',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.pop(0)',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.popleft()',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
l_cp, d_cp = copy(l), copy(d)
list_time.append(min(timeit.Timer('l_cp.reverse()',
'from __main__ import l_cp').repeat(repeat=3, number=1000)))
deque_time.append(min(timeit.Timer('d_cp.reverse()',
'from __main__ import d_cp').repeat(repeat=3, number=1000)))
In [4]:
import platform
import multiprocessing
def print_sysinfo():
print('\nPython version :', platform.python_version())
print('compiler :', platform.python_compiler())
print('\nsystem :', platform.system())
print('release :', platform.release())
print('machine :', platform.machine())
print('processor :', platform.processor())
print('CPU count :', multiprocessing.cpu_count())
print('interpreter:', platform.architecture()[0])
print('\n\n')
In [5]:
%matplotlib inline
In [13]:
import matplotlib.pyplot as plt
def plot():
plt.rcParams.update({'font.size': 12})
fig = plt.figure(figsize=(11,13))
# Setting the positions and width for the bars
pos = list(range(len(list_time)))
width = 0.2
# Plotting the bars
fig, ax = plt.subplots(figsize=(8,6))
plt.bar(pos, list_time, width,
alpha=0.5,
color='g',
label=labels[0])
plt.bar([p + width for p in pos], deque_time, width,
alpha=0.5,
color='b',
label=labels[1])
# Setting axis labels and ticks
ax.set_ylabel('time in seconds\n(for 1,000 operations on starting size 10,000)')
ax.set_title('Python list and collections.deque operations')
ax.set_xticks([p + 1.5 * width for p in pos])
ax.set_xticklabels(labels)
# Setting the x-axis and y-axis limits
plt.xlim(0-width, len(list_time))
plt.ylim([0, max(list_time + deque_time) * 1.5])
# Adding the legend and showing the plot
plt.legend(['list', 'collections.deque'], loc='upper left')
plt.grid()
plt.show()
In [14]:
plot()
print_sysinfo()
As we can see in the plot above, the performance of the shared methods between Python list
and deque
objects are about the same. However, as expected, the appendleft(x)
and popleft(x)
methods on deques are significantly faster than workaround-approaches on Python list
s.
Thus, if our application requires a lot of adding and removing of items at both ends of the data structure, deque
s are definitely to be preferred over list
s.