IHE Python course, 2017

List, dict and set comprehensions, list generators and filtering using a key function

T.N.Olsthoorn, Feb2017

List comprehensions (listcomps), dict comprehensions (dictcomps) and set comprehensions (setcomps) are shortcuts for constructing lists, dicts and sets in a single line.


In [3]:
from pprint import pprint
import numpy as np

List comprehensions

Consider myList, a list of coordinate tuples, and a point with coordinates x0, y0. The idea is to compute the distance between the point (x0, y0) and each of the coordinate pairs in myList.


In [9]:
myList = [(3, 2), (40, 12), (-5, 4), (-6, -21), (-7, 23)]
x0 = 44
y0 = 13

We might compute the distance from the point (x0, y0) to each of the points given by the coordinate tuples in myList using a for-loop as follows:


In [15]:
r = []
for x, y in myList:
    r.append(np.sqrt((x - x0)**2 + (y - y0)**2))
print(r)


[42.449970553582247, 4.1231056256176606, 49.819674828324601, 60.464865831323898, 51.97114584074513]

Now the same thing, but with a list comprehension:


In [19]:
r = [np.sqrt((x - x0)**2 + (y - y0)**2) for x, y in myList]
print(type(r))
print(r)


<class 'list'>
[42.449970553582247, 4.1231056256176606, 49.819674828324601, 60.464865831323898, 51.97114584074513]

When parentheses ( ) are used instead of square brackets, the result is not a tuple; instead we create a generator object:


In [18]:
r = (np.sqrt((x - x0)**2 + (y - y0)**2) for x, y in myList)
print(type(r))


<class 'generator'>

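A generator object produces its values one at a time, upon request, so we can use it wherever we need the values that the list would contain. Note that cell 19 (the list comprehension) was run after cell 18, so at this point r refers to the list again, which is what is displayed here: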


In [20]:
r


Out[20]:
[42.449970553582247,
 4.1231056256176606,
 49.819674828324601,
 60.464865831323898,
 51.97114584074513]
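
To make the "upon request" behaviour concrete, here is a minimal sketch of consuming a generator. It reuses myList, x0 and y0 from above; the names gen, first, rest and leftover are only illustrative, and math.hypot is swapped in for the np.sqrt expression to keep the example free of numpy.

```python
import math

# Same data as in the cells above
myList = [(3, 2), (40, 12), (-5, 4), (-6, -21), (-7, 23)]
x0, y0 = 44, 13

# Generator expression: nothing is computed yet
gen = (math.hypot(x - x0, y - y0) for x, y in myList)

first = next(gen)      # compute only the first distance
rest = list(gen)       # consume the remaining four distances
leftover = list(gen)   # the generator is exhausted, so this is empty

print(first)
print(len(rest), leftover)
```

A generator can be traversed only once. If the values are needed again, either rebuild the generator or store the values in a list.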

Syntax of a list comprehension:

The syntax of a list comprehension is:

new = [expression for p in old_list if condition]

The if part is optional and is used to filter the values of p taken from the original list or tuple.
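
As a small illustration of the if part, the sketch below filters the coordinate tuples used earlier. The names left and ys are only illustrative.

```python
myList = [(3, 2), (40, 12), (-5, 4), (-6, -21), (-7, 23)]

# Keep only the points whose x coordinate is negative
left = [(x, y) for x, y in myList if x < 0]

# Expression and condition together: the y values of all points with positive y
ys = [y for x, y in myList if y > 0]

print(left)
print(ys)
```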

On the other hand, numerical stuff is mostly better done using numpy functionality such as numpy arrays.
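
For comparison, this is one possible way to compute the same distances with a numpy array instead of a comprehension. It is a sketch only; the array name xy is an assumption introduced here.

```python
import numpy as np

myList = [(3, 2), (40, 12), (-5, 4), (-6, -21), (-7, 23)]
x0, y0 = 44, 13

xy = np.array(myList)   # array of shape (5, 2): one row per point

# Column 0 holds the x values, column 1 the y values;
# the arithmetic is applied to all points at once
r = np.sqrt((xy[:, 0] - x0)**2 + (xy[:, 1] - y0)**2)

print(r)
```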

Let's generate a deck of playing cards and shuffle them.

A deck of cards looks like this:


In [76]:
from random import shuffle

cards1 = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
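# Note: cards2 lists both 'Ace' and '1', so it holds 14 ranks and the deck has 56 cards;
# the loop below prints the first 52 of them
cards2 = ['Ace', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'Jack', 'Queen', 'King']

# generate the deck
cards = [c1 + '_' + c2 for c1 in cards1 for c2 in cards2]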

shuffle(cards) # shuffle the cards

#pprint(cards) # show them

print("\nShuffled playing cards:\n")
for i in range(13):
    for j in range(4):
        print("{:20}".format(cards[4 * i + j]), end="")
    print()


Shuffled playing cards:

Diamonds_Queen      Spades_4            Clubs_6             Spades_Jack         
Diamonds_8          Clubs_Queen         Diamonds_9          Spades_6            
Clubs_5             Hearts_Ace          Spades_2            Clubs_10            
Spades_Queen        Diamonds_3          Clubs_7             Diamonds_2          
Spades_8            Clubs_4             Hearts_5            Diamonds_Jack       
Hearts_6            Spades_1            Diamonds_5          Hearts_1            
Spades_9            Spades_10           Hearts_Jack         Hearts_3            
Hearts_7            Hearts_Queen        Clubs_King          Hearts_8            
Diamonds_4          Spades_5            Diamonds_Ace        Clubs_9             
Diamonds_10         Clubs_1             Hearts_2            Diamonds_1          
Hearts_9            Diamonds_King       Hearts_4            Spades_King         
Spades_3            Hearts_10           Clubs_Ace           Spades_Ace          
Clubs_Jack          Hearts_King         Clubs_3             Diamonds_6          

List comprehensions are especially useful for inspection of objects, to see their public attributes:


In [23]:
[p for p in dir(r) if not p.startswith('_')]


Out[23]:
['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

We could also use a list comprehension for better introspection of the methods of an object. For this we can use the function getmembers from the inspect module. With a small list comprehension we can easily identify the type of the public attributes of the list:


In [70]:
from inspect import getmembers

[p for p in getmembers(myList) if not p[0].startswith('_')]


Out[70]:
[('append', <function list.append>),
 ('clear', <function list.clear>),
 ('copy', <function list.copy>),
 ('count', <function list.count>),
 ('extend', <function list.extend>),
 ('index', <function list.index>),
 ('insert', <function list.insert>),
 ('pop', <function list.pop>),
 ('remove', <function list.remove>),
 ('reverse', <function list.reverse>),
 ('sort', <function list.sort>)]
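
A variation on the same idea, given here as a sketch: collect the public attribute names together with the name of their type. The name kinds is an assumption introduced here, and the exact type names reported depend on the Python implementation.

```python
from inspect import getmembers

myList = [(3, 2), (40, 12)]

# Map each public attribute name to the name of its type
kinds = {name: type(value).__name__
         for name, value in getmembers(myList)
         if not name.startswith('_')}

for name, kind in kinds.items():
    print("{:10} {}".format(name, kind))
```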

Set comprehensions

Set comprehensions work the same as list comprehensions, but curly braces { } are used instead. Here we construct a set of the remainders of integer division by 5. We do this for the numbers 0 to 50. The result is a set with the unique values only.


In [81]:
myList = [p % 5 for p in range(51)]  # % computes the remainder of a division
mySet = {p % 5 for p in range(51)}

print(myList)
print()
print(mySet)


[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0]

{0, 1, 2, 3, 4}
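
Set comprehensions work on any items that can be stored in a set, not only numbers. The sketch below extracts the unique suits from a few card names written in the Suit_Rank style used for the deck above. The short list cards is made up for the example.

```python
# A few card names in the Suit_Rank style used earlier (made-up sample)
cards = ['Clubs_Ace', 'Hearts_10', 'Clubs_King', 'Spades_2', 'Hearts_Jack']

# The part before the underscore is the suit; the set keeps each suit once
suits = {card.split('_')[0] for card in cards}

print(sorted(suits))  # a set is unordered, so sort it for a stable display
```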

Dict comprehensions

Dict comprehensions are similar to list comprehensions, but a key: value pair must be supplied for each item.

For example, the list of tuples can be regarded as a list of x, y coordinates, and now we want to use the first value of each tuple as the key and the second value as its value.


In [1]:
from pprint import pprint

myList = [(3, 2), (40, 12), (-5, 4), (-6, -21), (-7, 23)]

myDict1 = {key : value for key, value in myList}
myDict2 = {value : key for key, value in myList}

print(myDict1)
print(myDict2)
print()
pprint(myDict1)  # sorts the keys
pprint(myDict2)  # sorts the keys


{40: 12, -7: 23, -6: -21, 3: 2, -5: 4}
{4: -5, 2: 3, -21: -6, 12: 40, 23: -7}

{-7: 23, -6: -21, -5: 4, 3: 2, 40: 12}
{-21: -6, 2: 3, 4: -5, 12: 40, 23: -7}

More advanced comprehensions will be shown shortly when we deal with world population data in an extended example.
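
In the meantime, here is a small sketch that combines a dict comprehension with zip and with an if condition. The labels in names are hypothetical and serve only as keys.

```python
names = ['A', 'B', 'C', 'D', 'E']   # hypothetical labels for the points
myList = [(3, 2), (40, 12), (-5, 4), (-6, -21), (-7, 23)]

# Pair each label with its coordinate tuple
points = {name: xy for name, xy in zip(names, myList)}

# Keep only the points that lie above the x-axis (y > 0)
upper = {name: (x, y) for name, (x, y) in points.items() if y > 0}

print(points)
print(upper)
```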

Using key for filtering sequences of objects (lambda function)

Suppose we want to find the minimum tuple in a list of tuples, where the minimum is determined by the second item in each tuple.

Let's first generate a tuple of tuples, each with three numbers.


In [1]:
from numpy import random

In [4]:
myTuples = tuple([tuple(random.randint(-5, 5, 3)) for i in range(20)])
pprint(myTuples)


((-3, -5, -3),
 (4, 3, -4),
 (-2, -2, -3),
 (2, -4, -2),
 (-3, -5, 4),
 (4, -4, 3),
 (-3, 2, -4),
 (4, 2, 3),
 (2, -4, -5),
 (2, 2, 2),
 (1, 4, 1),
 (0, -1, -2),
 (-5, -2, 3),
 (0, -1, 3),
 (0, -1, -1),
 (3, 4, -4),
 (-3, 1, 3),
 (2, 3, -5),
 (-5, -4, -1),
 (-5, 2, -2))

Then find the tuple for which the second field (the one with index 1) is lowest.


In [5]:
m = myTuples[0]   # initialize with the first tuple; any other would do as well
for tp in myTuples:
    if tp[1] < m[1]:  # compare the second field with that of the current minimum tuple
        m = tp   # if smaller, replace the current minimum tuple
    print(m) # show the current minimum tuple after each step
print("\nminimum in field 2 is: ",m)


(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)
(-3, -5, -3)

minimum in field 2 is:  (-3, -5, -3)

Now a more elegant way, using the keyword argument key:


In [6]:
def vcmp(tp):
    x, y, z = tp
    return y

min(myTuples, key=vcmp)


Out[6]:
(-3, -5, -3)

Why does this work?

In general, for a list of arbitrary objects, comparison may not be defined at all, or, as with the tuples here, the default comparison (element by element, starting with the first) is not the one we want. We can get around this by defining how two of the objects in question have to be compared to decide which of them is smallest. This comparison is done using some value that is computed for each object. In this case it is the second value of the tuple that we compare between objects to decide which of them is smallest. The function vcmp computes this value. This function is then passed to the min function as the argument key. What happens next is that min runs along the sequence of tuples and for each of them computes the value for comparison using the passed function. These values are compared to decide which object is smallest. When done, the smallest object is returned.
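
The description above can be sketched as a small function. This is only an illustration of the idea, not how the built-in min is actually implemented; the names min_by_key and second are assumptions introduced here, and a shortened myTuples is used.

```python
def min_by_key(items, key):
    """Return the item for which key(item) is smallest (the first one on ties)."""
    iterator = iter(items)
    best = next(iterator)          # start with the first item
    best_value = key(best)         # and its comparison value
    for item in iterator:
        value = key(item)          # compute the comparison value for this item
        if value < best_value:     # compare the values, not the items themselves
            best, best_value = item, value
    return best

def second(tp):
    return tp[1]

myTuples = ((-3, -5, -3), (4, 3, -4), (2, -4, -2), (-3, -5, 4))

print(min_by_key(myTuples, key=second))
print(min(myTuples, key=second))   # the built-in gives the same result
```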

And the most concise way, using a lambda function:


In [9]:
min(myTuples, key = lambda x: x[1])


Out[9]:
(-3, -5, -3)

How does it work?

A lambda function is a so-called anonymous function. It takes one or more arguments and its body consists of a single expression, whose value is returned.

So the lambda function

lambda x: x[1]

is equivalent to

def vcmp(x): return x[1]

That's why it works.

Lambda functions come in handy at many places where simple processing is needed on the fly and there is no standard function to do it.
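
The key argument is also accepted by max and sorted, so the same lambda technique carries over. A short sketch with a shortened myTuples; the names by_third and largest_sum are only illustrative.

```python
myTuples = ((-3, -5, -3), (4, 3, -4), (2, -4, -2), (0, -1, 3))

# Sort the tuples by their third field (index 2)
by_third = sorted(myTuples, key=lambda tp: tp[2])

# Find the tuple whose three numbers have the largest sum
largest_sum = max(myTuples, key=lambda tp: sum(tp))

print(by_third)
print(largest_sum)
```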

