Basic Programming Using Python: Making Decisions and Programming Defensively

Objectives

  • Explain what Boolean values are and correctly identify them in programs.
  • Name the three common Boolean operators and explain what they do.
  • Explain what short-circuit evaluation is and when it should be used.
  • Trace the behavior of programs containing assert statements.
  • Add useful assertions to programs.
  • Format output using strings.

Changing Colors

Let's create another grid and color a few cells:


In [11]:
from ipythonblocks import ImageGrid, colors # import both at the same time
row = ImageGrid(6, 1)
row[1, 0] = colors['Orchid']
row[5, 0] = colors['Orchid']
row.show()


Suppose we want to invert these colors, i.e., turn every black cell orchid, and every orchid cell black. We could do this directly, but if we want to do the operation frequently, on many different images, we ought to write a function, and that function ought to work equally well on this grid:


In [12]:
another_row = ImageGrid(8, 1)
another_row[0, 0] = colors['Orchid']
another_row[1, 0] = colors['Orchid']
another_row[3, 0] = colors['Orchid']
another_row.show()


What we really want is a way for the computer to make decisions based on the data it is processing. The tool that does that is the conditional statement, often called an "if statement" because of how it's written:


In [13]:
if 5 > 0:
    print '5 is greater than 0'
if 5 < 0:
    print '5 should not be less than 0'


5 is greater than 0

A conditional statement starts with the word if, followed by an expression that can be either true or false, and then a colon. If the expression is true, Python executes the indented block of code underneath the if; if it's false, Python skips that block.
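To make the "skip the block" rule concrete, here is a small sketch that records which lines run instead of printing them. The function check_sign is made up for illustration. Every line indented under the if belongs to its block, and the first unindented line after the block runs no matter what the test decided:

```python
def check_sign(value):
    # Hypothetical helper: collect messages so we can see which lines ran.
    messages = []
    if value > 0:
        # Both indented lines belong to the if block and run only when value > 0.
        messages.append('positive')
        messages.append('still inside the block')
    # Not indented under the if, so this line runs whether the test was true or false.
    messages.append('after the if')
    return messages
```

Calling check_sign(5) returns all three messages, while check_sign(-3) returns only ['after the if'].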

We often want to do one thing when a condition is true, and another thing when the condition is false, so Python allows us to attach an else to an if like this:


In [14]:
if 'abc' > 'xyz':
    print 'whoops: "abc" should be less than "xyz"'
else:
    print 'correct: "abc" is less than "xyz"'


correct: "abc" is less than "xyz"

We can use another keyword, elif, to insert additional tests after the if. Python checks each one in order, and executes the code block belonging to the first one that's true. If none of them are, it executes the else, or does nothing at all if an else hasn't been provided:


In [15]:
for number in range(-2, 3): # produces -2, -1, 0, 1, 2
    if number < 0:
        print number, 'is negative'
    elif number == 0:
        print number, 'is zero'
    else:
        print number, 'must be positive'


-2 is negative
-1 is negative
0 is zero
1 must be positive
2 must be positive
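Because Python runs only the block of the first test that's true, the order of the tests matters. In this hypothetical sketch, both tests are true for 10, but the first one wins, so the elif branch can never be reached:

```python
def classify(number):
    # Hypothetical example: any number greater than 5 is also greater than 0,
    # so the first test always wins and the elif below is dead code.
    if number > 0:
        label = 'greater than 0'
    elif number > 5:
        label = 'greater than 5'   # never reached
    else:
        label = 'zero or negative'
    return label
```

classify(10) returns 'greater than 0', not 'greater than 5'. Putting the more specific test (number > 5) first would fix this.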

We now have everything we need to invert the colors in a color grid:


In [16]:
def invert(grid):
    for x in range(grid.width):
        if grid[x, 0] == colors['Orchid']:
            grid[x, 0] = colors['Black']
        else: # must be black
            grid[x, 0] = colors['Orchid']

As discussed in the previous lessons, grid.width is the width of the grid, so range(grid.width) is the sequence of numbers 0, 1, 2, …, grid.width-1, i.e., the legal X indices for the grid. Inside that loop, we check grid[x, 0]'s color. If it's orchid, we turn it black; if it's not orchid, we assume that it's black and make it orchid. To test it, let's look at our original row:


In [17]:
row.show()


and then look at it again after inverting it:


In [18]:
invert(row)
row.show()


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-976189eb5286> in <module>()
----> 1 invert(row)
      2 row.show()

<ipython-input-16-48eb82a2f68c> in invert(grid)
      1 def invert(grid):
      2     for x in range(grid.width):
----> 3         if grid[x, 0] == colors['Orchid']:
      4             grid[x, 0] = colors['Black']
      5         else: # must be black

/Users/gwilson/anaconda/lib/python2.7/site-packages/ipythonblocks/ipythonblocks.pyc in __eq__(self, other)
    249     def __eq__(self, other):
    250         if not isinstance(other, Block):
--> 251             raise NotImplemented
    252         return self.rgb == other.rgb and self.size == other.size
    253 

TypeError: exceptions must be old-style classes or derived from BaseException, not NotImplementedType

This error message isn't particularly helpful, since it depends on concepts we haven't encountered yet. After a bit of poking around, though, it turns out that when we select a cell from a grid, we don't get the cell's RGB color value. Instead, we get a Pixel that contains both the cell's color and its XY coordinates:


In [19]:
pixel = row[0, 0]
help(pixel)


Help on Pixel in module ipythonblocks.ipythonblocks object:

class Pixel(Block)
 |  Method resolution order:
 |      Pixel
 |      Block
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __str__(self)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  x
 |      Horizontal coordinate of Pixel.
 |  
 |  y
 |      Vertical coordinate of Pixel.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from Block:
 |  
 |  __eq__(self, other)
 |  
 |  __init__(self, red, green, blue, size=20)
 |  
 |  __repr__(self)
 |  
 |  set_colors(self, red, green, blue)
 |      Updated block colors.
 |      
 |      Parameters
 |      ----------
 |      red, green, blue : int
 |          Integers on the range [0 - 255].
 |  
 |  show(self)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from Block:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  blue
 |  
 |  col
 |  
 |  green
 |  
 |  red
 |  
 |  rgb
 |  
 |  row
 |  
 |  size
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from Block:
 |  
 |  __hash__ = None
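To see why the comparison has to use the color rather than the whole cell, here is a minimal stand-in. The class FakePixel and the constant ORCHID are hypothetical; (218, 112, 214) is the standard HTML value for orchid, and we assume, as the fix below suggests, that colors['Orchid'] is an (r, g, b) tuple like this one. Unlike the real Pixel, whose comparison raised the error shown earlier, this stand-in simply compares unequal:

```python
ORCHID = (218, 112, 214)  # assumed to match colors['Orchid'] (standard HTML orchid)

class FakePixel(object):
    # Hypothetical stand-in for an ipythonblocks Pixel: a color plus coordinates,
    # with the color available as an (r, g, b) tuple called rgb.
    def __init__(self, rgb, x, y):
        self.rgb = rgb
        self.x = x
        self.y = y

cell = FakePixel(ORCHID, 0, 0)
whole_cell_matches = (cell == ORCHID)   # a pixel object is not a tuple: False
color_matches = (cell.rgb == ORCHID)    # tuple compared with tuple: True
```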

What we need to do is compare colors['Orchid'] with grid[x, 0].rgb, the cell's color as an RGB value. Let's rewrite our function and try it:


In [20]:
def invert(grid):
    for x in range(grid.width):
        if grid[x, 0].rgb == colors['Orchid']: # comparing to RGB
            grid[x, 0] = colors['Black']
        else:
            grid[x, 0] = colors['Orchid']

invert(row)
row.show()


That seems to have worked—let's try with the other row:


In [21]:
invert(another_row)
another_row.show()


That seems to have worked too—or did it? We can't check by displaying the original state of another_row because we've just changed it. What we really ought to do is change our function to create a new grid rather than modifying the one we pass in:


In [22]:
def invert(grid):
    result = ImageGrid(grid.width, 1)
    for x in range(grid.width):
        if grid[x, 0].rgb == colors['Orchid']:
            result[x, 0] = colors['Black']
        else:
            result[x, 0] = colors['Orchid']
    return result

Let's try it out:


In [23]:
test_case = ImageGrid(4, 1)
test_case[0, 0] = colors['Orchid']
test_case[3, 0] = colors['Orchid']
test_case.show()


and:


In [24]:
changed = invert(test_case)
changed.show()


and:


In [25]:
test_case.show()


That's better: we still have our original data to compare our new data to, and if we really want to overwrite the original, we can always do this:


In [26]:
test_case = invert(test_case)
test_case.show()



When to Mutate

Changing a value in place is called [mutating](glossary.html#mutation) it. It makes programs harder to understand, since readers have to follow a sequence of steps in order to figure out what the value of a variable is, but it is often done for the sake of efficiency. Creating a new four-pixel image grid takes almost no time at all, but copying a multi-gigabyte video in order to eliminate red-eye in a couple of frames would be very slow. We'll return to this topic in [the lesson on lists](python-4-files-lists.ipynb).
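Here is a sketch of the difference, using a Python list of color names as a hypothetical stand-in for a one-row grid (lists themselves are covered in the lesson on lists). The first function mutates the list it is given; the second leaves it alone and returns a new one:

```python
def invert_in_place(cells):
    # Mutates: overwrites each entry of the list it was given.
    for i in range(len(cells)):
        if cells[i] == 'orchid':
            cells[i] = 'black'
        else:
            cells[i] = 'orchid'

def inverted_copy(cells):
    # Does not mutate: builds and returns a brand-new list.
    result = []
    for color in cells:
        if color == 'orchid':
            result.append('black')
        else:
            result.append('orchid')
    return result

original = ['orchid', 'black', 'black', 'orchid']
flipped = inverted_copy(original)   # original is still unchanged here
invert_in_place(original)           # now original itself has changed
```

After the last line, flipped and original hold the same colors, but only the call to invert_in_place destroyed the data we started with.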


Boolean Values and Operators

Most people understand that 5+3 produces the value 8, but it can take a while to realize that 5>3 also produces a value. Let's do a few experiments:


In [27]:
print '5 is greater than 3:', 5 > 3
print '5 is less than 3:', 5 < 3


5 is greater than 3: True
5 is less than 3: False

The result of an expression like 5>3 is the Boolean True; the result of 5<3 is the Boolean False. Those are the only two values of the type bool: there are many thousands of different characters, and an enormous number of different integers and floating-point numbers, but True and False are all that bool gets. Like other values, Booleans can be assigned to variables:


In [28]:
answer = 5 > 3
print 'answer stored in variable:', answer


answer stored in variable: True

Booleans can also be used directly in conditional statements:


In [29]:
if answer:
    print 'answer is true'


answer is true

Note that we do not write if answer == True. answer itself is either True or False, and that's all if needs. As the table below shows, comparing a Boolean to True is redundant:

| Value of `answer` | `answer == True` |
|-------------------|------------------|
| `True`            | `True`           |
| `False`           | `False`          |

Booleans can be manipulated using three operators: and, or, and not. The third is the simplest: if x is True, not x is False and vice versa. and produces True only if both of its operands are True, while or produces True if either or both of its operands are True. (This is sometimes called inclusive or; the term exclusive or is used to mean "one or the other is true, but not both".) The Venn diagram below shows how these operators work when we are looking at creatures that can either fly or not, and are either real or not:

[Diagram: Venn diagram of creatures that can fly and creatures that are real, illustrating and, or, and not]
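The same four cases can be written out as a truth table in code. The creatures and the describe function are hypothetical; the last entry uses !=, which for two Booleans is one common way to get exclusive or:

```python
# Hypothetical data: (name, can_fly, is_real) for a few creatures.
creatures = [
    ('sparrow', True, True),
    ('dragon', True, False),
    ('dog', False, True),
    ('unicorn', False, False),
]

def describe(can_fly, is_real):
    # Apply each operator to one creature's two Booleans.
    return {
        'and': can_fly and is_real,
        'or': can_fly or is_real,             # inclusive or: either or both
        'not fly': not can_fly,
        'exclusive or': can_fly != is_real,  # one or the other, but not both
    }

table = {}
for name, can_fly, is_real in creatures:
    table[name] = describe(can_fly, is_real)
```

Only the sparrow is both real and able to fly, only the unicorn fails the or test, and the dragon and the dog are the two creatures for which exclusive or is true.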

Python evaluates and and or a bit differently from the way it evaluates arithmetic operators like + and *. When Python executes x+y, it always gets the values of both x and y (first x, then y) before performing the addition. When it evaluates x or y, on the other hand, it starts by checking whether x is true. If it is, it stops evaluation right there: since or is True if either operand is True, Python doesn't need to know the value of y in order to complete its calculation. If x is false, Python must get y in order to figure out the expression's final value.
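One way to watch this happen is to give each operand a side effect. In this sketch the helper record is hypothetical: it notes its name in a log and then returns the value it was given, so each log shows exactly which operands Python evaluated:

```python
def record(log, name, value):
    # Hypothetical helper: note that this operand was evaluated, then return its value.
    log.append(name)
    return value

or_log = []
or_result = record(or_log, 'x', True) or record(or_log, 'y', False)
# or_log is ['x']: x was already true, so y was never evaluated.

false_log = []
false_result = record(false_log, 'x', False) or record(false_log, 'y', True)
# false_log is ['x', 'y']: x was false, so Python had to look at y.

plus_log = []
plus_result = record(plus_log, 'x', 1) + record(plus_log, 'y', 2)
# plus_log is ['x', 'y']: + always evaluates both operands, left to right.
```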

Similarly, when Python evaluates x and y, it always starts by getting the value of x. If this is False, the result is bound to be False, so Python doesn't even try to get the value of y. This is called short-circuit evaluation, and is often used to do things like this:

total, number, threshold = 0.0, 0, 0.5  # sample values for illustration
if (number != 0) and (1.0 / number < threshold):
    total += 1.0 / number

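Without that first test, the if would raise a ZeroDivisionError whenever number was zero. Since Python always evaluates the check for zero first, and never attempts the division when that check is False, this is safe to execute. (We divide into 1.0 rather than 1 because in Python 2, dividing one integer by another throws away the fractional part.)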
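Here is the same guard inside a loop, as a sketch with a made-up function name. The list deliberately contains zeros; because and stops as soon as number != 0 is False, the division is never attempted for them:

```python
def sum_small_reciprocals(numbers, threshold):
    # Hypothetical helper: add up 1/number for every number whose
    # reciprocal is below threshold, skipping zeros safely.
    total = 0.0
    for number in numbers:
        # The zero check runs first; if it is False, 'and' stops and the
        # division on the right is never attempted.
        if (number != 0) and (1.0 / number < threshold):
            total += 1.0 / number
    return total
```

For example, sum_small_reciprocals([0, 4, 1, 0, 8], 0.5) returns 0.375: 0.25 from 4 plus 0.125 from 8, while the reciprocal of 1 is not below 0.5 and the zeros are skipped without an error.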

One other thing that's special about Booleans is that values of almost any other type can be used in their place. The numbers 0 and 0.0 are treated as equivalent to False, and so is the empty string ''; all other numbers and strings are equivalent to True. This means that we can rewrite:

if len(some_string) > 0:
    print 'some_string is not empty'

as:

if len(some_string):
    print 'some_string is not empty'

or even just as:

if some_string:
    print 'some_string is not empty'

The first version checks that the length of the string is greater than zero, i.e., that the string contains some characters. The second version checks that the length of the string is not zero; since the length can't be negative, this is the same as checking that it's positive. The final version just checks that some_string is not the empty string: it's the shortest, the most efficient to execute, and the one that most experienced Python programmers would write, but it also puts the greatest burden on the reader. Which one you use is up to you, but whatever you do, please be consistent: many studies have shown that people can learn to read almost anything quickly as long as there are patterns for their eyes and brain to follow.
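The built-in function bool shows what any value counts as when it is used as a condition. This sketch (the function three_ways and the sample values are illustrative) confirms that the three tests above agree, and that a non-empty string counts as true even when its contents look false:

```python
def three_ways(some_string):
    # The three tests from the text, each turned into an explicit True/False.
    return (len(some_string) > 0, bool(len(some_string)), bool(some_string))

samples = [0, 0.0, '', -1, 0.5, '0', ' ']
as_booleans = []
for value in samples:
    as_booleans.append(bool(value))
# Only 0, 0.0 and '' are false; '0' and ' ' are non-empty strings, so they are true.
```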

Defensive Programming

Initializing image grids by assigning colors to cells one at a time is getting pretty tedious, so let's invent something easier. As a general rule, this is how a lot of software gets written, and not just by scientists: if we find ourselves doing the same thing repeatedly, it's worth taking a few minutes to teach the computer how to do it for us.

We'll start by coloring the cells of a one-row grid red or green based on a string containing the letters 'R' and 'G':


In [30]:
data = 'RGRGRRGG'
row = ImageGrid(8, 1)
for x in range(8):
    if data[x] == 'R':
        row[x, 0] = colors['Red']
    else:
        row[x, 0] = colors['Green']
row.show()


That seems to have worked: the cells of the grid are red and green in the same locations that the characters 'R' and 'G' appear in the string. Let's try putting the code in a function:


In [31]:
def color_from_string(grid, data):
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

test_row = ImageGrid(8, 1)
color_from_string(test_row, 'RGRGRGG')
test_row.show()


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-31-000b48022a44> in <module>()
      7 
      8 test_row = ImageGrid(8, 1)
----> 9 color_from_string(test_row, 'RGRGRGG')
     10 test_row.show()

<ipython-input-31-000b48022a44> in color_from_string(grid, data)
      1 def color_from_string(grid, data):
      2     for x in range(grid.width):
----> 3         if data[x] == 'R':
      4             grid[x, 0] = colors['Red']
      5         else:

IndexError: string index out of range

Whoops: it looks like we're trying to get a character from the string that doesn't exist. Let's try printing out our loop variable as we go along:


In [32]:
def color_from_string(grid, data):
    for x in range(grid.width):
        print '*** x is', x
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

color_from_string(test_row, 'RGRGRRGG')


*** x is 0
*** x is 1
*** x is 2
*** x is 3
*** x is 4
*** x is 5
*** x is 6
*** x is 7

Why would printing out the loop variable stop the function from crashing? It didn't: besides adding a print statement to our function, we also passed in a different character string. In the first call, we used 'RGRGRGG', which has only 7 characters; in the second, we used 'RGRGRRGG', which has 8. This is what happens when we violate the DRY (Don't Repeat Yourself) Principle: we typed the test data by hand twice, and the two copies didn't match. Let's fix that by storing the string in a variable:


In [33]:
test_string = 'RGRGRRGG' # with the right number of characters
color_from_string(test_row, test_string)
test_row.show()


*** x is 0
*** x is 1
*** x is 2
*** x is 3
*** x is 4
*** x is 5
*** x is 6
*** x is 7

That's better—but since there's no guarantee we won't make the same mistake again, we really ought to modify the function to detect the problem before we start modifying cell colors. (We also ought to take out the print statement.)


In [34]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

The statement on the second line of the function is called an assertion. When Python encounters one, it checks that the assertion's condition is true. If it is, Python does nothing; if it's not, Python raises an AssertionError, which (unless something catches it) halts the program immediately and displays the error message we provided. Let's test it out:


In [35]:
should_work = ImageGrid(4, 1)
color_from_string(should_work, 'RGRG')
should_work.show()


and:


In [36]:
should_fail = ImageGrid(4, 1)
color_from_string(should_fail, 'RGRGRGRG') # string is too long


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-36-3ccc2222584d> in <module>()
      1 should_fail = ImageGrid(4, 1)
----> 2 color_from_string(should_fail, 'RGRGRGRG') # string is too long

<ipython-input-34-bfa5631afc2f> in color_from_string(grid, data)
      1 def color_from_string(grid, data):
----> 2     assert grid.width == len(data), 'Grid and string lengths do not match'
      3     for x in range(grid.width):
      4         if data[x] == 'R':
      5             grid[x, 0] = colors['Red']

AssertionError: Grid and string lengths do not match

Excellent: rather than trusting us to be perfect, our function is now checking that we've called it with sensible values, and halting immediately if we haven't. This is one embodiment of another general principle of programming called FEFO: fail early, fail often. The more code the computer executes between when something goes wrong and when the symptoms of that error show up, the more we'll have to wade through when debugging, so getting the computer to stop as soon as it can after a mistake can save us hours or days of hunting around. Assertions are also good documentation, since they give human readers hints about how the code ought to work.
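For readers who like to see the machinery: an assert statement behaves roughly like an if that raises an AssertionError when its condition is false. The sketch below (all function names are hypothetical) writes the same length check both ways. The try/except in message_from, which this lesson doesn't cover, is only there so we can look at the message without halting:

```python
def check_lengths(width, data):
    # The same check color_from_string makes, written with assert.
    assert width == len(data), 'Grid and string lengths do not match'
    return 'ok'

def check_lengths_by_hand(width, data):
    # Roughly what the assert above does: if the condition is false,
    # raise an AssertionError carrying the message.
    if not (width == len(data)):
        raise AssertionError('Grid and string lengths do not match')
    return 'ok'

def message_from(check, width, data):
    # Hypothetical helper: run a check and return 'ok' or the error message.
    try:
        return check(width, data)
    except AssertionError as error:
        return str(error)
```

Both message_from(check_lengths, 4, 'RG') and message_from(check_lengths_by_hand, 4, 'RG') return 'Grid and string lengths do not match'.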

We can check more than just the lengths of the row to be filled and the input string, and we should. Here's another mistaken call to our function:


In [37]:
filled_incorrectly = ImageGrid(12, 1)
color_from_string(filled_incorrectly, 'GGGGGGRRRPRR')
filled_incorrectly.show()


Why is one cell on the right green? It should be red, because our string is six G's and six—oh. Oops. There's a P mixed in with our R's, but since the two letters are so similar, it was hard to spot. Let's add an assertion to our function to check for that:


In [38]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for x in range(grid.width):
        assert data[x] in 'GR', 'Unknown character in data string'
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

color_from_string(filled_incorrectly, 'GGGGGGRRRPRR') # hopefully the same wrong string as before
filled_incorrectly.show()


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-38-8d1096cc830c> in <module>()
      8             grid[x, 0] = colors['Green']
      9 
---> 10 color_from_string(filled_incorrectly, 'GGGGGGRRRPRR') # hopefully the same wrong string as before
     11 filled_incorrectly.show()

<ipython-input-38-8d1096cc830c> in color_from_string(grid, data)
      2     assert grid.width == len(data), 'Grid and string lengths do not match'
      3     for x in range(grid.width):
----> 4         assert data[x] in 'GR', 'Unknown character in data string'
      5         if data[x] == 'R':
      6             grid[x, 0] = colors['Red']

AssertionError: Unknown character in data string

Excellent: as strange as it may sound, the function is failing as desired.


Other Ways To Do It

Another way to write this function would be to check the characters as part of the conditional statement:


In [39]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        elif data[x] == 'G':
            grid[x, 0] = colors['Green']
        else:
            assert False, 'Unknown character in data string'

The advantage of this is that the legal characters only appear once, rather than being duplicated in the `assert` and in the conditional. The disadvantage is that `assert False` sounds odd to many people, since it's guaranteed to fail every time. More importantly, it doesn't make sense on its own: we can only understand **why** we're always failing on that line by reading the `if` and `elif` that come before it.

Another way to write this, which many people prefer, is to do all our checking before we modify the grid:


In [40]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for char in data:
        assert char in 'RG', 'Unknown character in data string'
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

This version doesn't do anything until it's sure that whatever it does will succeed, so there's no risk of changing part of the grid but not the rest. On the other hand, it now makes two passes over the data instead of one. The extra loop costs nothing noticeable on small grids, but reading the data twice can measurably slow down programs that work with terabytes. Again, it's up to you to use whichever variation you prefer, but whichever one you choose, you should write all your checks the same way.
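The difference shows up when the data is bad. In this sketch a list of placeholder strings stands in for a one-row grid, and each letter is stored directly instead of a color; all names here are hypothetical. Checking as we go leaves the "grid" half-filled, while checking first leaves it untouched:

```python
def fill_checking_as_we_go(cells, data):
    assert len(cells) == len(data), 'Grid and string lengths do not match'
    for x in range(len(cells)):
        assert data[x] in 'RG', 'Unknown character in data string'
        cells[x] = data[x]   # a letter stands in for a color

def fill_checking_first(cells, data):
    assert len(cells) == len(data), 'Grid and string lengths do not match'
    for char in data:
        assert char in 'RG', 'Unknown character in data string'
    for x in range(len(cells)):
        cells[x] = data[x]

def try_fill(fill, data):
    # Hypothetical helper: start from a blank four-cell row, attempt the fill,
    # and report what state the row was left in.
    cells = ['-', '-', '-', '-']
    try:
        fill(cells, data)
    except AssertionError:
        pass  # we only care what the cells look like afterwards
    return cells
```

With the bad string 'RGPR', try_fill(fill_checking_as_we_go, 'RGPR') returns ['R', 'G', '-', '-'], while try_fill(fill_checking_first, 'RGPR') returns the row unchanged.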


String Formatting

To end this lesson, let's try to make our error messages a little more helpful. After all, few things are as frustrating as being told that something is wrong, but not being told what. We'll start by inserting a few strings into another string:


In [41]:
print '{0} and {1} shared the Nobel Prize in 1947'.format('Gerty Cori', 'Carl Cori')


Gerty Cori and Carl Cori shared the Nobel Prize in 1947

As you can probably infer, strings have a method called format that can be given any number of parameters. These parameters are interpolated wherever the markers {0}, {1}, and so on appear. It's OK to use a value several times:


In [42]:
print '{0} is the same as {0}'.format('this')


this is the same as this

and we can provide extra information to format numbers and many (many) other things:


In [43]:
print 'Width 4, two digits after the decimal point: {0:4.2f}'.format(3.14159)


Width 4, two digits after the decimal point: 3.14
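In a specification like 4.2f, the number before the point is the minimum width of the whole field (counting the decimal point itself), and the number after it is how many digits to show after the point. A few more sketches, with arbitrary values:

```python
examples = [
    '{0:4.2f}'.format(3.14159),   # width 4, 2 digits after the point
    '{0:8.2f}'.format(3.14159),   # width 8: numbers are padded on the left
    '{0:2.2f}'.format(3.14159),   # width is a minimum, so '3.14' is not cut short
    '{0:.3f}'.format(2.0),        # no width given, 3 digits after the point
    '{0:<6}|'.format('ab'),       # left-align a string in 6 characters
    '{0:>6}|'.format('ab'),       # right-align it instead
]
```

The results are '3.14', '    3.14', '3.14', '2.000', 'ab    |' and '    ab|'.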

What we can't do is change the original string: format always returns a new string and leaves the one it was called on untouched:


In [44]:
mass = 46.5
name = 'Lyell'
template = 'Mass {0:4e} and name "{1:^20}"'
result = template.format(mass, name)
print 'result after formatting:', result
print 'template after formatting:', template


result after formatting: Mass 4.650000e+01 and name "       Lyell        "
template after formatting: Mass {0:4e} and name "{1:^20}"

The reason is that strings are immutable, i.e., they cannot be changed after they have been created. Numbers are also immutable: we can't assign a new value to 2 or 3.14159 (although we can assign these values to variables, then change those variables). We will meet some mutable data types in [the lesson on lists](python-4-files-lists.ipynb); for now, we'll use what we have learned about string formatting to make our error messages more readable:


In [45]:
def color_from_string(grid, data):
    assert grid.width == len(data), \
           'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))
    for x in range(grid.width):
        assert data[x] in 'GR', \
               'Unknown character in data string: "{0}"'.format(data[x])
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

Notice that we have used \ at the end of lines to tell Python that the statement continues on the following line. This makes the program easier to read by making the assertion's condition easier to spot, and by reducing the number of very long lines in our code. Let's try our revised function:


In [46]:
test_case = ImageGrid(4, 1)
color_from_string(test_case, 'R') # wrong length


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-46-05398559e78a> in <module>()
      1 test_case = ImageGrid(4, 1)
----> 2 color_from_string(test_case, 'R') # wrong length

<ipython-input-45-884bbad757a8> in color_from_string(grid, data)
      1 def color_from_string(grid, data):
----> 2     assert grid.width == len(data),            'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))
      3     for x in range(grid.width):
      4         assert data[x] in 'GR',                'Unknown character in data string: "{0}"'.format(data[x])
      5         if data[x] == 'R':

AssertionError: Grid and string lengths do not match: 4 != 1

and:


In [47]:
color_from_string(test_case, 'PPPP')


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-47-c132ae306847> in <module>()
----> 1 color_from_string(test_case, 'PPPP')

<ipython-input-45-884bbad757a8> in color_from_string(grid, data)
      2     assert grid.width == len(data),            'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))
      3     for x in range(grid.width):
----> 4         assert data[x] in 'GR',                'Unknown character in data string: "{0}"'.format(data[x])
      5         if data[x] == 'R':
      6             grid[x, 0] = colors['Red']

AssertionError: Unknown character in data string: "P"

There's only one thing left to do: document our function. The name color_from_string may seem obvious to us right now, but when we revisit this code in six weeks, we might easily think it means "convert a string to a color". Let's add a docstring (documentation string) to the function:


In [48]:
def color_from_string(grid, data):
    "Color grid cells red and green according to 'R' and 'G' in data."
    assert grid.width == len(data), \
           'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))
    for x in range(grid.width):
        assert data[x] in 'GR', \
               'Unknown character in data string: "{0}"'.format(data[x])
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

We use a docstring rather than a comment because docstrings are what help displays:


In [49]:
help(color_from_string)


Help on function color_from_string in module __main__:

color_from_string(grid, data)
    Color grid cells red and green according to 'R' and 'G' in data.
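The docstring is kept with the function as its __doc__ attribute, which is where help gets the text it displays; comments are thrown away when Python reads the code. A small sketch with made-up function names:

```python
def documented(grid, data):
    "Color grid cells red and green according to 'R' and 'G' in data."
    # This comment is discarded when Python reads the code, so help never sees it.
    return None

def undocumented(grid, data):
    # Only a comment here, no docstring.
    return None
```

documented.__doc__ holds the docstring text, while undocumented.__doc__ is None, so help has nothing to show for it beyond the function's name and parameters.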

Key Points

  • Use if/elif/else to make choices in programs.
  • Use assert to embed self-checks in programs.
  • Use str.format to create nicely-formatted output.
  • Add docstrings to functions to provide interactive help.