Reading, and writing, comprehension(s)

Before we delve into the topic of comprehensions, here is a bit of setup code.



In [ ]:

    
from numpy.random import randint
import matplotlib.pyplot as plt
%matplotlib inline

S = randint(low=0, high=11, size=15) # 10 random integers b/w 0 and 10

def f(x):
    """
    Dummy function - returns identity
    """
    return x

List comprehensions

List comprehensions are a special syntax for compactly generating lists. Historically, they come from a programming style referred to as functional programming.

A list comprehensions can always be expanded into procedural statements using loops. Although comprehensions have a slight advantage in performance when compared to loops, this doesn't mean that you should always prefer comprehensions over procedural code. Too much syntactic sugar can be hazardous to your (program's) health, in the sense of making it hard to read.

Common patterns with list comprehensions in single variable

The following sections contain examples of common patterns where list comprehensions are useful. The patterns described here are by no means exhaustive. Rather, they are meant to act as a solution template for common problems.

A typical use of a list comprehension in a single variable is to expand statements in mathematics known as (universal) quantifiers, or "for-all" statements.

Example 1

The following mathematical statement,

$$ y = f(x) \ \ \forall x \in S, $$

is usually read as, "for all elements $x$ from a collection $S$, (let) $y$ equal $f(x)$." Statements of this form have natural translations into Python using list comprehensions. To wit:



In [ ]:

    
print("1. S == {}".format(S))
y1 = [f(x) for x in S]
print("2. All (x, f(x)) pairs: {}".format(list(zip(S, y1))))
plt.scatter(S, y1)

As you can see, the translation from the math to code is natural.

In a procedural or iterative style, an equivalent program might look like the following.



In [ ]:

    
y2 = []
for x in S:
    y2.append(f(x))
    
print("3. All (x, f(x)) pairs: {}".format(list(zip(S, y2))))

assert y1 == y2

Example 2

The syntax extends nicely in the presence of conditionals. For instance, consider the following:

\begin{align*} y &= \begin{cases} 0 & x \leq 5 \\ f(x) & \text{otherwise} \end{cases}\\ &\forall x \in S \end{align*}

The list comprehension-based code might look as follows:



In [ ]:

    
y1 = [0 if x <= 5 else f(x) for x in S]
print(*zip(S,y1))
plt.scatter(S, y1)

NOTE:

This is not different from the first pattern in syntactic terms. This is a trick based on the ternary expressions in Python.

The procedural equivalent of this code is shown below.



In [ ]:

    
y2 = []
for x in S:
    if x <= 5:
        y2.append(0)
    else:
        y2.append(f(x))
print(*zip(S, y2))
assert y1 == y2
print("Passed!")

The two patterns shown in examples 1 and 2 can be generalised to the following pattern.

output_list = [expression(i) for i in some_iterable]

Example 3

Suppose we wish to construct a list from a subset of the elements of $S$. That is, let $R \subseteq S$ and consider

\begin{align*} y = f(x) \ \forall x \in R, \mbox{ where } R \subseteq S. \end{align*}

As this notation indicates, we are interested in the function's value for only a subset of the input space, namely $R \subseteq S$. The subset can be seen as imposing a condition on the input space.

For the purpose of this example, we will use $R = \{x: x \leq 5, x \in S\}$.



In [ ]:

    
y1 = [f(x) for x in S if x <= 5]
s = [x for x in S if x <= 5]
print(*zip(s,y1))

# Note how the output range has been modified due to the change in input range
plt.scatter(s, y1)

The procedural equivalent of this code is shown below.



In [ ]:

    
y2 = []
for x in S:
    if x <= 5:
        y2.append(f(x))
assert y2 == y1
print(*zip(S, y2))
print("Passed!")

This pattern is syntactically different from the previous pattern. It can be generalized as

output_list = [expression(i) for i in some_iterable if condition(i)]

List comprehensions with two variables

Comprehensions can also be extended to multiple variables. The rules discussed in the previous section also apply to the multivariable comprehensions. The main thing you need to remember for multivariable comprehensions is that the outer variable in the comprehension varies the fastest.

Example 4

For example, imagine a matrix $C$ whose elements are given by

\begin{align*} c_{i,j} &= g(i,j) \\ i &\in 0\cdots2,\ j \in 0\cdots2 \end{align*}

We can create the (flattened) matrix in a single list comprehension using the following code.



In [ ]:

    
import numpy as np

def g(i, j):
    """
    Returns the result of division of indices
    """
    return (i + 1) / (j + 1)

C1 = [g(i,j) for i in range(0,3) for j in range(0,3)] # replace g with any function that you want

print(C1)
print(np.array(C1).reshape(3,3))

The procedural equivalent of this code is shown below.



In [ ]:

    
C2 = []
for i in range(3):
    for j in range(3):
        C2.append(g(i, j))
print(C2)
assert C1 == C2
print("Passed!")

Example 2 also has an equivalent in the two variable case.

Example 5

For example, \begin{align*} C &= \begin{cases} g(i,j) & i \neq j \\ 0 & i = j \end{cases} \\ i &\in 0\cdots2, j \in 0\cdots2 \end{align*}



In [ ]:

    
C1 = [g(i,j) if i !=j else 0 for i in range(0,3) for j in range(0,3)] 

print(C1)
print(np.array(C1).reshape(3,3))

Technically, this is the same pattern as the previous example but uses the ternary operator (as shown in example 2). The procedural equivalent is shown below.



In [ ]:

    
C2 = []
for i in range(3):
    for j in range(3):
        if i != j:
            C2.append(g(i,j))
        else:
            C2.append(0)
print(C2)
assert C1 == C2
print("Passed!")

The two examples can be generalized to

output_list = [expr(i,j) for i in iterable1 for j in iterable2] # j varies fastest

Restrictions on the input space as shown in example 3 can also extended to the multivariable comprehension. This is illustrated below for the sake of completeness, though the result cannot be displayed as a matrix.

Example 6

For example, \begin{align*} C &= g(i,j) \\ i &\in 0\cdots2,\ j \in 0\cdots2,\ i \neq j \end{align*}



In [ ]:

    
C1 = [ (i, j, g(i,j)) for i in range(0,3) for j in range(0,3) if i !=j] 
print(C1) # note that the input restriction on the diagonals removes the diagonals from the output list

The procedural equivalent is shown below.



In [ ]:

    
C2 = []
for i in range(3):
    for j in range(3):
        if i != j:
            C2.append((i, j, g(i,j)))
print(C2)
assert C1 == C2
print("Passed!")

The pattern can be generalized as

output_list = [expr(i,j) for i in iterable1 for j in iterable2 if condition(i,j)]

Comprehensions can be used with even more variables but readability takes a serious hit with more than two variables.

See PEP 202 (https://www.python.org/dev/peps/pep-0202/) for more details about list comprehensions. I highly encourage reading PEP documents since you often get the rationale behind a feature in the language straight from the horse's mouth.

Dictionary comprehensions

Some other built-in collections in Python have "comprehensive" analogues. One example is the dictionary comprehension, which is described in PEP 274 (https://www.python.org/dev/peps/pep-0274/).

You can use dictionary comprehensions in ways very similar to list comprehensions, except that the output of a dictionary comprehension is, well, a dictionary instead of a list.

Mathematically, dictionary comprehensions are suited to representing functions.

$$ \begin{align*} x \rightarrow f(x), x \in S \end{align*} $$

can be translated as



In [ ]:

    
dict_comp = {x: f(x) for x in S}
print(dict_comp)

The patterns discussed in the previous section also apply here. There are mainly two kinds of patterns in the single variable case.

dict_comp1 = {x: expr(x) for x in iterable}
dict_comp2 = {x: expr(x) for x in iterable if condition(x)}

What not to do with comprehensions

1. Do not use side effects within comprehensions



In [ ]:

    
# Bad code
[print(i) for i in range(3)]

# you know you can do better than that
for i in range(3):
    print(i)
# that's better

2. Do not sacrifice readability over "speed." For example, do not write code like the snippet shown below



In [ ]:

    
x1 = [i if i <= 10 else i**2 if 10 < i <= 20 else i**4 if 20 < i <= 50 else 1.0 / i for i in range(100) if i not in (5, 7, 11, 13, 17, 19, 29, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)]

The procedural code is more readable compared to the comprehension



In [ ]:

    
# procedural code is more readable in this case here
x2 = []
for i in range(100):
    # optimus primes are beyond our reach, https://oeis.org/A217090 
    if i not in (5, 7, 11, 13, 17, 19, 29, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97):
        if i <= 10:
            x2.append(i)
        elif 10 < i <= 20:
            x2.append(i**2)
        elif 20 < i <= 50:
            x2.append(i**4)
        else:
            x2.append(1.0 / i)
assert x2 == x1

This can be shortened to comprehension for readability with a little bit of refactoring.



In [ ]:

    
def function(val):
    if val <= 10:
        return val
    elif 10 < val <= 20:
        return val**2
    elif 20 < val <= 50:
        return val**4
    else:
        return 1.0 / val

def is_optimus_prime(val):
    return val in (5, 7, 11, 13, 17, 19, 29, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)

x3 = [function(i) for i in range(100) if not is_optimus_prime(i)]
assert x1 == x2 == x3
print("Passed!")

Suggested exercise: set comprehensions

Python sets (set) also have comprehension-based constructors. Try looking them up and writing some code to experiment with them, analogous to what you did above.



In [ ]: