Before we delve into the topic of comprehensions, here is a bit of setup code.
In [ ]:
from numpy.random import randint
import matplotlib.pyplot as plt
%matplotlib inline
S = randint(low=0, high=11, size=15)  # 15 random integers between 0 and 10 (inclusive)
def f(x):
    """
    Dummy function - returns identity
    """
    return x
List comprehensions are a special syntax for compactly generating lists. Historically, they come from a programming style referred to as functional programming.
A list comprehension can always be expanded into procedural statements using loops. Although comprehensions are often slightly faster than the equivalent loops, this doesn't mean that you should always prefer comprehensions over procedural code. Too much syntactic sugar can be hazardous to your (program's) health, in the sense of making the code hard to read.
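The performance claim is easy to check on your own machine. The following is a minimal sketch using the standard-library `timeit` module. The helper names (`with_loop`, `with_comprehension`) and the input size are illustrative choices, not part of the original notebook, and the measured difference will vary with the Python version and hardware.

```python
import timeit

def with_loop(n):
    """Build the list of squares with an explicit loop."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]

# Both versions must agree before a timing comparison means anything.
assert with_loop(1000) == with_comprehension(1000)

# Time each version; the absolute numbers depend on the machine.
t_loop = timeit.timeit(lambda: with_loop(1000), number=200)
t_comp = timeit.timeit(lambda: with_comprehension(1000), number=200)
print("loop: {:.4f}s, comprehension: {:.4f}s".format(t_loop, t_comp))
```

Treat the result as a rough indication only; a small speed difference is rarely a good reason to pick the less readable form.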
The following sections contain examples of common patterns where list comprehensions are useful. The patterns described here are by no means exhaustive. Rather, they are meant to act as a solution template for common problems.
A typical use of a list comprehension in a single variable is to expand statements in mathematics known as (universal) quantifiers, or "for-all" statements.
In [ ]:
print("1. S == {}".format(S))
y1 = [f(x) for x in S]
print("2. All (x, f(x)) pairs: {}".format(list(zip(S, y1))))
plt.scatter(S, y1)
As you can see, the translation from the math to code is natural.
In a procedural or iterative style, an equivalent program might look like the following.
In [ ]:
y2 = []
for x in S:
    y2.append(f(x))
print("3. All (x, f(x)) pairs: {}".format(list(zip(S, y2))))
assert y1 == y2
In [ ]:
y1 = [0 if x <= 5 else f(x) for x in S]
print(*zip(S,y1))
plt.scatter(S, y1)
NOTE:
Syntactically, this is no different from the first pattern. The only change is that the expression is a conditional (ternary) expression in Python.
The procedural equivalent of this code is shown below.
In [ ]:
y2 = []
for x in S:
    if x <= 5:
        y2.append(0)
    else:
        y2.append(f(x))
print(*zip(S, y2))
assert y1 == y2
print("Passed!")
The two patterns shown in examples 1 and 2 can be generalised to the following pattern.
output_list = [expression(i) for i in some_iterable]
Suppose we wish to construct a list from a subset of the elements of $S$. That is, let $R \subseteq S$ and consider
\begin{align*} y = f(x) \ \forall x \in R, \mbox{ where } R \subseteq S. \end{align*}
As this notation indicates, we are interested in the function's value for only a subset of the input space, namely $R \subseteq S$. The subset can be seen as imposing a condition on the input space.
For the purpose of this example, we will use $R = \{x: x \leq 5, x \in S\}$.
In [ ]:
y1 = [f(x) for x in S if x <= 5]
s = [x for x in S if x <= 5]
print(*zip(s,y1))
# Note how the output range has been modified due to the change in input range
plt.scatter(s, y1)
The procedural equivalent of this code is shown below.
In [ ]:
y2 = []
for x in S:
    if x <= 5:
        y2.append(f(x))
assert y2 == y1
print(*zip(s, y2))  # s is the filtered input list built in the previous cell
print("Passed!")
This pattern is syntactically different from the previous pattern. It can be generalized as
output_list = [expression(i) for i in some_iterable if condition(i)]
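The difference between this pattern and the conditional-expression pattern is easiest to see in the length of the result. Here is a small sketch that uses a fixed list `data` (a hypothetical stand-in for `S`, chosen so that the output is reproducible).

```python
data = [3, 8, 5, 10, 0]  # hypothetical stand-in for S

# Conditional expression: every input produces exactly one output.
mapped = [0 if x <= 5 else x for x in data]

# Trailing `if`: inputs that fail the condition are dropped entirely.
filtered = [x for x in data if x <= 5]

print(mapped)    # [0, 8, 0, 10, 0]
print(filtered)  # [3, 5, 0]
assert len(mapped) == len(data)
assert len(filtered) <= len(data)
```

In short, a conditional expression changes what is stored for each element, while a trailing `if` changes which elements are stored at all.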
Comprehensions can also be extended to multiple variables. The rules discussed in the previous section also apply to multivariable comprehensions. The main thing to remember is that the `for` clauses are evaluated left to right, like nested loops, so the variable of the last (rightmost) `for` clause varies the fastest.
For example, imagine a matrix $C$ whose elements are given by
\begin{align*} c_{i,j} &= g(i,j) \\ i &\in 0\cdots2,\ j \in 0\cdots2 \end{align*}
We can create the (flattened) matrix in a single list comprehension using the following code.
In [ ]:
import numpy as np
def g(i, j):
    """
    Returns the ratio of the one-based indices, (i + 1) / (j + 1)
    """
    return (i + 1) / (j + 1)
C1 = [g(i,j) for i in range(0,3) for j in range(0,3)] # replace g with any function that you want
print(C1)
print(np.array(C1).reshape(3,3))
The procedural equivalent of this code is shown below.
In [ ]:
C2 = []
for i in range(3):
    for j in range(3):
        C2.append(g(i, j))
print(C2)
assert C1 == C2
print("Passed!")
In [ ]:
C1 = [g(i,j) if i !=j else 0 for i in range(0,3) for j in range(0,3)]
print(C1)
print(np.array(C1).reshape(3,3))
Technically, this is the same pattern as the previous example but uses the ternary operator (as shown in example 2). The procedural equivalent is shown below.
In [ ]:
C2 = []
for i in range(3):
    for j in range(3):
        if i != j:
            C2.append(g(i, j))
        else:
            C2.append(0)
print(C2)
assert C1 == C2
print("Passed!")
The two examples can be generalized to
output_list = [expr(i,j) for i in iterable1 for j in iterable2] # j varies fastest
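The ordering rule can be checked directly by collecting the index pairs themselves. The sketch below also shows a nested comprehension, which is a different construct from the multivariable pattern above: it keeps the row structure instead of flattening it. The names `pairs` and `rows` are illustrative, and the expression repeats the formula used in `g` so that the block is self-contained.

```python
# The for clauses read left to right exactly like nested loops,
# so the rightmost variable (j) changes on every step.
pairs = [(i, j) for i in range(2) for j in range(3)]
print(pairs)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# A nested comprehension (a comprehension inside the expression)
# builds one inner list per value of i, giving a list of rows.
rows = [[(i + 1) / (j + 1) for j in range(3)] for i in range(3)]
for row in rows:
    print(row)

assert rows[1][0] == 2.0  # i = 1, j = 0 -> (1 + 1) / (0 + 1)
```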
Restrictions on the input space, as shown in example 3, can also be extended to multivariable comprehensions. This is illustrated below for the sake of completeness, though the result cannot be displayed as a matrix.
For example, \begin{align*} c_{i,j} &= g(i,j) \\ i &\in 0\cdots2,\ j \in 0\cdots2,\ i \neq j \end{align*}
In [ ]:
C1 = [ (i, j, g(i,j)) for i in range(0,3) for j in range(0,3) if i !=j]
print(C1) # note that the input restriction on the diagonals removes the diagonals from the output list
The procedural equivalent is shown below.
In [ ]:
C2 = []
for i in range(3):
    for j in range(3):
        if i != j:
            C2.append((i, j, g(i, j)))
print(C2)
assert C1 == C2
print("Passed!")
The pattern can be generalized as
output_list = [expr(i,j) for i in iterable1 for j in iterable2 if condition(i,j)]
Comprehensions can be used with even more variables, but readability takes a serious hit with more than two.
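When three or more variables are unavoidable, one alternative (not covered in this notebook, offered here only as a suggestion) is `itertools.product` from the standard library, which yields the same tuples in the same order as nested `for` clauses. The example condition `i + j + k == 2` is an arbitrary choice for illustration.

```python
from itertools import product

# Three nested for clauses: correct, but getting hard to scan.
triples1 = [(i, j, k)
            for i in range(2)
            for j in range(2)
            for k in range(2)
            if i + j + k == 2]

# The same result with a single for clause over the Cartesian product.
triples2 = [(i, j, k)
            for i, j, k in product(range(2), repeat=3)
            if i + j + k == 2]

print(triples1)  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
assert triples1 == triples2
```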
See PEP 202 (https://www.python.org/dev/peps/pep-0202/) for more details about list comprehensions. I highly encourage reading PEP documents since you often get the rationale behind a feature in the language straight from the horse's mouth.
Some other built-in collections in Python have "comprehensive" analogues. One example is the dictionary comprehension, which is described in PEP 274 (https://www.python.org/dev/peps/pep-0274/).
You can use dictionary comprehensions in ways very similar to list comprehensions, except that the output of a dictionary comprehension is, well, a dictionary instead of a list.
Mathematically, dictionary comprehensions are suited to representing functions.
$$ \begin{align*} x \rightarrow f(x), x \in S \end{align*} $$
can be translated as
In [ ]:
dict_comp = {x: f(x) for x in S}
print(dict_comp)
The patterns discussed in the previous section also apply here. There are mainly two kinds of patterns in the single variable case.
dict_comp1 = {x: expr(x) for x in iterable}
dict_comp2 = {x: expr(x) for x in iterable if condition(x)}
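One detail worth noticing is that dictionary keys are unique, so repeated inputs collapse into a single entry. The sketch below uses a fixed list `data` (a hypothetical stand-in for `S`, with deliberate repeats) and squares as an arbitrary example expression. The printed key order assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
data = [3, 8, 3, 10, 0, 8]  # hypothetical stand-in for S, with repeats

# Pattern 1: one entry per distinct key; repeated keys collapse.
squares = {x: x ** 2 for x in data}
print(squares)  # {3: 9, 8: 64, 10: 100, 0: 0}

# Pattern 2: the trailing `if` filters keys before they are inserted.
small_squares = {x: x ** 2 for x in data if x <= 5}
print(small_squares)  # {3: 9, 0: 0}

# Set comprehensions use the same syntax without the `key:` part.
distinct = {x for x in data}
assert len(squares) == len(distinct) == 4
```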
In [ ]:
# Bad code
[print(i) for i in range(3)]
# you know you can do better than that
for i in range(3):
    print(i)
# that's better
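One reason the comprehension version is considered bad style is that a comprehension always builds a list, even when only the side effect is wanted. A short sketch makes the discarded list visible (the variable name `result` is illustrative):

```python
# The comprehension still allocates and fills a list, which is then thrown away.
result = [print(i) for i in range(3)]
print(result)  # [None, None, None], because print() returns None
assert result == [None, None, None]
```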
2. Do not sacrifice readability for "speed". For example, do not write code like the snippet shown below.
In [ ]:
x1 = [i if i <= 10 else i**2 if 10 < i <= 20 else i**4 if 20 < i <= 50 else 1.0 / i for i in range(100) if i not in (5, 7, 11, 13, 17, 19, 29, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)]
The procedural code is more readable than the comprehension.
In [ ]:
# procedural code is more readable in this case
x2 = []
for i in range(100):
    # optimus primes are beyond our reach, https://oeis.org/A217090
    if i not in (5, 7, 11, 13, 17, 19, 29, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97):
        if i <= 10:
            x2.append(i)
        elif 10 < i <= 20:
            x2.append(i**2)
        elif 20 < i <= 50:
            x2.append(i**4)
        else:
            x2.append(1.0 / i)
assert x2 == x1
With a little refactoring, this can be turned back into a comprehension without losing readability.
In [ ]:
def function(val):
    if val <= 10:
        return val
    elif 10 < val <= 20:
        return val**2
    elif 20 < val <= 50:
        return val**4
    else:
        return 1.0 / val

def is_optimus_prime(val):
    return val in (5, 7, 11, 13, 17, 19, 29, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)
x3 = [function(i) for i in range(100) if not is_optimus_prime(i)]
assert x1 == x2 == x3
print("Passed!")