assert statements.
Let's create another grid and color a few cells:
In [11]:
from ipythonblocks import ImageGrid, colors # import both at the same time
row = ImageGrid(6, 1)
row[1, 0] = colors['Orchid']
row[5, 0] = colors['Orchid']
row.show()
Suppose we want to invert these colors, i.e., turn every black cell orchid, and every orchid cell black. We could do this directly, but if we want to do the operation frequently, on many different images, we ought to write a function, and that function ought to work equally well on this grid:
In [12]:
another_row = ImageGrid(8, 1)
another_row[0, 0] = colors['Orchid']
another_row[1, 0] = colors['Orchid']
another_row[3, 0] = colors['Orchid']
another_row.show()
What we really want is a way for the computer to make decisions based on the data it is processing. The tool that does that is the conditional statement, often called an "if statement" because of how it's written:
In [13]:
if 5 > 0:
    print '5 is greater than 0'
if 5 < 0:
    print '5 should not be less than 0'
A conditional statement starts with the word `if`, followed by an expression that can be either true or false. If the expression is true, Python executes the block of code underneath the `if`; if it's false, Python skips that block:
FIXME: diagram
We often want to do one thing when a condition is true, and another thing when the condition is false, so Python allows us to attach an `else` to an `if` like this:
In [14]:
if 'abc' > 'xyz':
    print 'whoops: "abc" should be less than "xyz"'
else:
    print 'correct: "abc" is less than "xyz"'
We can use another keyword, `elif`, to insert additional tests after the `if`. Python checks each one in order, and executes the code block belonging to the first one that's true. If none of them are, it executes the `else`, or does nothing at all if an `else` hasn't been provided:
In [15]:
for number in range(-2, 3): # produces -2, -1, 0, 1, 2
    if number < 0:
        print number, 'is negative'
    elif number == 0:
        print number, 'is zero'
    else:
        print number, 'must be positive'
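Because Python runs only the block belonging to the first test that is true, the order of the tests matters. The sketch below is an illustration only: the function name `classify` and its temperature thresholds are made up, and `print` is written with parentheses so that the code runs under both Python 2 (which this lesson uses) and Python 3.

```python
def classify(temperature):
    """Return a label for a temperature (hypothetical thresholds, degrees Celsius)."""
    if temperature < 0:
        return 'freezing'
    elif temperature < 15:      # only reached when temperature >= 0
        return 'cold'
    elif temperature < 25:      # only reached when temperature >= 15
        return 'mild'
    else:                       # everything that failed the tests above
        return 'hot'

for reading in [-5, 10, 20, 30]:
    print(str(reading) + ' is ' + classify(reading))
```

Each `elif` can rely on the earlier tests having failed, which is why `temperature < 15` does not need to repeat the check for `temperature >= 0`.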
We now have everything we need to invert the colors in a color grid:
In [16]:
def invert(grid):
    for x in range(grid.width):
        if grid[x, 0] == colors['Orchid']:
            grid[x, 0] = colors['Black']
        else: # must be black
            grid[x, 0] = colors['Orchid']
As discussed in the previous lessons, `grid.width` is the width of the grid, so `range(grid.width)` is the sequence of numbers 0, 1, 2, …, `grid.width - 1`, i.e., the legal X indices for the grid. Inside that loop, we check the color of `grid[x, 0]`. If it's orchid, we turn it black; if it's not orchid, we assume that it's black and make it orchid.
To test it, let's look at our original row:
In [17]:
row.show()
and then look at it again after inverting it:
In [18]:
invert(row)
row.show()
This error message isn't particularly helpful, since it depends on concepts we haven't encountered yet. After a bit of poking around, though, it turns out that when we select a cell from a grid, we don't get the cell's RGB color value. Instead, we get a `Pixel` that contains both the cell's color and its XY coordinates:
In [19]:
pixel = row[0, 0]
help(pixel)
What we need to do is compare `colors['Orchid']` with `grid[x, 0].rgb`, the pixel's RGB value, rather than with the pixel itself. Let's rewrite our function and try it:
In [20]:
def invert(grid):
    for x in range(grid.width):
        if grid[x, 0].rgb == colors['Orchid']: # comparing to RGB
            grid[x, 0] = colors['Black']
        else:
            grid[x, 0] = colors['Orchid']

invert(row)
row.show()
That seems to have worked—let's try with the other row:
In [21]:
invert(another_row)
another_row.show()
That seems to have worked too. Or did it? We can't check by displaying the original state of `another_row` because we've just changed it. What we really ought to do is change our function to create a new grid rather than modifying the one we pass in:
In [22]:
def invert(grid):
    result = ImageGrid(grid.width, 1)
    for x in range(grid.width):
        if grid[x, 0].rgb == colors['Orchid']:
            result[x, 0] = colors['Black']
        else:
            result[x, 0] = colors['Orchid']
    return result
Let's try it out:
In [23]:
test_case = ImageGrid(4, 1)
test_case[0, 0] = colors['Orchid']
test_case[3, 0] = colors['Orchid']
test_case.show()
and:
In [24]:
changed = invert(test_case)
changed.show()
and:
In [25]:
test_case.show()
That's better: we still have our original data to compare our new data to, and if we really want to overwrite the original, we can always do this:
In [26]:
test_case = invert(test_case)
test_case.show()
Changing a value in place is called [mutating](glossary.html#mutation) it. It makes programs harder to understand, since readers have to follow a sequence of steps in order to figure out what the value of a variable is, but it is often done for the sake of efficiency. Creating a new four-pixel image grid takes almost no time at all, but copying a multi-gigabyte video in order to eliminate red-eye in a couple of frames would be very slow. We'll return to this topic in [the lesson on lists](python-4-files-lists.ipynb).
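The difference between mutating a value and building a new one can be sketched without `ipythonblocks` at all. In the example below a plain Python list of color names stands in for a one-row grid; the function names `invert_in_place` and `inverted_copy` are invented for this illustration, and lists themselves are only introduced properly in the later lesson.

```python
def invert_in_place(cells):
    """Swap 'Orchid' and 'Black' inside the list that was passed in."""
    for i in range(len(cells)):
        if cells[i] == 'Orchid':
            cells[i] = 'Black'
        else:
            cells[i] = 'Orchid'

def inverted_copy(cells):
    """Return a new list with the colors swapped; the input is left alone."""
    result = []
    for cell in cells:
        if cell == 'Orchid':
            result.append('Black')
        else:
            result.append('Orchid')
    return result

original = ['Orchid', 'Black', 'Black', 'Orchid']
changed = inverted_copy(original)
print(original)             # still the starting colors
print(changed)              # the inverted colors, in a separate list

invert_in_place(original)   # from here on the starting colors are gone
print(original)
```

After the call to `inverted_copy` both versions are available for comparison; after the call to `invert_in_place` only the inverted one is.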
Most people understand that 5+3 produces the value 8, but it can take a while to realize that 5>3 also produces a value. Let's do a few experiments:
In [27]:
print '5 is greater than 3:', 5 > 3
print '5 is less than 3:', 5 < 3
The result of an expression like 5>3 is the Boolean `True`; the result of 5<3 is the Boolean `False`. Those are the only two values of the type `bool`: there are many thousands of different characters, and millions of integers and floating-point numbers, but `True` and `False` are all that `bool` gets.
Like other values, Booleans can be assigned to variables:
In [28]:
answer = 5 > 3
print 'answer stored in variable:', answer
Booleans can also be used directly in conditional statements:
In [29]:
if answer:
    print 'answer is true'
Note that we do not write `if answer == True`. `answer` itself is either `True` or `False`, and that's all `if` needs. As the table below shows, comparing a Boolean to `True` is redundant:
Value | `== True` |
---|---|
`True` | `True` |
`False` | `False` |
Booleans can be manipulated using three operators: `and`, `or`, and `not`. The third is the simplest: if `x` is `True`, `not x` is `False` and vice versa. `and` produces `True` only if both of its operands are `True`, while `or` produces `True` if either or both of its operands are `True`. (This is sometimes called inclusive or; the term exclusive or is used to mean "one or the other is true, but not both".)
The Venn diagram below shows how these operators work when we are looking at creatures that can either fly or not, and are either real or not:
FIXME: diagram
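The behavior of the three operators can also be tabulated by trying every combination of operands. This is a small sketch rather than part of the original lesson: the helper `exclusive_or` is hypothetical (Python has no `xor` keyword for Booleans) and is built only from `and`, `or`, and `not`.

```python
def exclusive_or(x, y):
    """True when exactly one of x and y is True (hypothetical helper)."""
    return (x or y) and not (x and y)

rows = []
for x in [True, False]:
    for y in [True, False]:
        # columns: x, y, x and y, x or y, exclusive or
        rows.append((x, y, x and y, x or y, exclusive_or(x, y)))

for row in rows:
    print(row)
```

The only row where inclusive and exclusive or disagree is the one in which both operands are `True`.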
Python evaluates `and` and `or` a bit differently from the way it evaluates arithmetic operators like `+` and `*`. When Python executes `x+y`, it gets the values of both `x` and `y`, working from left to right, before performing the addition. When it evaluates `x or y`, on the other hand, it always starts by checking whether `x` is `True`. If it is, it stops evaluation right there: since `or` is `True` if either operand is `True`, Python doesn't need to know the value of `y` in order to complete its calculations. If `x` is `False`, on the other hand, Python must get `y` in order to figure out the expression's final value.
Similarly, when Python evaluates `x and y`, it always starts by getting the value of `x`. If this is `False`, the result is bound to be `False`, so Python doesn't even try to get the value of `y`. This is called short-circuit evaluation, and is often used to do things like this:
if (number != 0) and (1/number < threshold):
    total += 1/number
Without that first test, the `if` would blow up if `number` was zero. Since Python always executes the check for zero before checking the reciprocal of `number`, though, this is safe to execute.
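One way to watch short-circuit evaluation happen is to make each operand a function call that records the fact that it ran. The names `check`, `calls`, and `reciprocal_is_below` are invented for this sketch, and `1.0 / number` is used so that the division behaves the same way in Python 2 and Python 3.

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then hand back its value."""
    calls.append(label)
    return value

first = check('x', True) or check('y', True)      # 'y' is never evaluated
second = check('a', False) and check('b', True)   # 'b' is never evaluated
third = check('p', False) or check('q', True)     # 'q' has to be evaluated

print(calls)

def reciprocal_is_below(number, threshold):
    """The guard from the text: the division only runs when number is not zero."""
    return (number != 0) and (1.0 / number < threshold)

print(reciprocal_is_below(0, 1))      # no ZeroDivisionError
print(reciprocal_is_below(4, 0.5))
```

The labels `'y'` and `'b'` never appear in `calls`, because Python already knew the answer after looking at the left-hand operand.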
One other thing that's special about Booleans is that values of almost any other type can be used in their place. The numbers 0 and 0.0 are treated as equivalent to `False`, and so is the empty string `''`; all other numbers and strings are equivalent to `True`. This means that we can rewrite:
if len(some_string) > 0:
    ...do something...
as:
if len(some_string):
    ...do something...
or even just as:
if some_string:
    ...do something...
The first version checks that the length of the string is greater than zero, i.e., that the string contains some characters. The second version checks that the length of the string is not zero; since the length can't be negative, this is the same as checking that it's positive. The final version just checks that `some_string` is not the empty string: it's the shortest, the most efficient to execute, and the one that most experienced Python programmers would write, but it also puts the greatest burden on the reader. Which one you use is up to you, but whatever you do, please be consistent: many studies have shown that people can learn to read almost anything quickly as long as there are patterns for their eyes and brain to follow.
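The built-in function `bool` shows how Python treats a value when it is used as a condition. The sample values below are chosen for illustration; note that a string containing a space, or the character `'0'`, is not empty and therefore counts as true.

```python
values = [0, 0.0, '', 1, -3.5, 'a', ' ', '0']

truthiness = []
for value in values:
    truthiness.append(bool(value))
    print(repr(value) + ' is treated as ' + str(bool(value)))
```

Only the first three values, the two zeros and the empty string, are treated as `False`.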
Initializing image grids by assigning colors to cells one at a time is getting pretty tedious, so let's invent something easier. As a general rule, this is how a lot of software gets written, and not just by scientists: if we find ourselves doing the same thing repeatedly, it's worth taking a few minutes to teach the computer how to do it for us.
We'll start by coloring the cells of a one-row grid red or green based on a string containing the letters 'R' and 'G':
In [30]:
data = 'RGRGRRGG'
row = ImageGrid(8, 1)
for x in range(8):
    if data[x] == 'R':
        row[x, 0] = colors['Red']
    else:
        row[x, 0] = colors['Green']
row.show()
That seems to have worked: the cells of the grid are red and green in the same locations that the characters 'R' and 'G' appear in the string. Let's try putting the code in a function:
In [31]:
def color_from_string(grid, data):
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

test_row = ImageGrid(8, 1)
color_from_string(test_row, 'RGRGRGG')
test_row.show()
Whoops: it looks like we're trying to get a character from the string that doesn't exist. Let's try printing out our loop variable as we go along:
In [32]:
def color_from_string(grid, data):
    for x in range(grid.width):
        print '*** x is', x
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

color_from_string(test_row, 'RGRGRRGG')
Why would printing out the loop variable stop the function from crashing? The answer is that we didn't just add a `print` statement to our function: we also passed in a different character string. In the first call, we used `'RGRGRGG'`, which has only 7 characters; in the second, we used `'RGRGRRGG'`, which has 8. This is an example of what happens when we violate the DRY Principle, so let's fix it now:
In [33]:
test_string = 'RGRGRRGG' # with the right number of characters
color_from_string(test_row, test_string)
test_row.show()
That's better, but since there's no guarantee we won't make the same mistake again, we really ought to modify the function to detect the problem before we start modifying cell colors. (We also ought to take out the `print` statement.)
In [34]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']
The statement on line 2 is called an assertion. When Python encounters one, it checks that the assertion's condition is true. If it is, Python does nothing, but if it's not, Python halts the program immediately and prints the error message provided. Let's test it out:
In [35]:
should_work = ImageGrid(4, 1)
color_from_string(should_work, 'RGRG')
should_work.show()
and:
In [36]:
should_fail = ImageGrid(4, 1)
color_from_string(should_fail, 'RGRGRGRG') # string is too long
Excellent: rather than trusting us to be perfect, our function is now checking that we've called it with sensible values, and halting immediately if we haven't. This is one embodiment of another general principle of programming called FEFO: fail early, fail often. The more code the computer executes between when something goes wrong and when the symptoms of that error show up, the more we'll have to wade through when debugging, so getting the computer to stop as soon as it can after a mistake can save us hours or days of hunting around. Assertions are also good documentation, since they give human readers hints about how the code ought to work.
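The same fail-early pattern works in any function, not just ones that fill grids. The sketch below uses a made-up function, `average`, and wraps the failing call in `try`/`except` (a construct this lesson has not covered) purely so that the example can show the message and keep running. One caveat worth knowing: assertions are skipped entirely when Python is started with the `-O` option.

```python
def average(values):
    """Mean of a non-empty list of numbers (hypothetical example)."""
    assert len(values) > 0, 'Cannot average an empty list'
    return sum(values) / float(len(values))

print(average([1, 2, 3, 4]))

try:
    average([])                      # violates the assertion
except AssertionError as error:
    message = str(error)
    print('assertion failed: ' + message)
```

Without the assertion, the empty list would still cause a failure, but as a division-by-zero error that says nothing about what the caller did wrong.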
We can check more than just the lengths of the row to be filled and the input string, and we should. Here's another mistaken call to our function:
In [37]:
filled_incorrectly = ImageGrid(12, 1)
color_from_string(filled_incorrectly, 'GGGGGGRRRPRR')
filled_incorrectly.show()
Why is one cell on the right green? It should be red, because our string is six G's and six—oh. Oops. There's a P mixed in with our R's, but since the two letters are so similar, it was hard to spot. Let's add an assertion to our function to check for that:
In [38]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for x in range(grid.width):
        assert data[x] in 'GR', 'Unknown character in data string'
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']

color_from_string(filled_incorrectly, 'GGGGGGRRRPRR') # hopefully the same wrong string as before
filled_incorrectly.show()
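Excellent: as strange as it may sound, the function is failing as desired. We could also fold the check into the conditional itself, using `elif` for the second color and an `else` whose assertion always fails: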
In [39]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        elif data[x] == 'G':
            grid[x, 0] = colors['Green']
        else:
            assert False, 'Unknown character in data string'
The advantage of this is that the legal characters only appear once, rather than being duplicated in the `assert` and in the conditional. The disadvantage is that `assert False` sounds odd to many people, since it's guaranteed to fail every time. More importantly, it doesn't make sense on its own: we can only understand **why** we're always failing on that line by reading the `if` and `elif` that come before it.
Another way to write this, which many people prefer, is to do all our checking before we modify the grid:
In [40]:
def color_from_string(grid, data):
    assert grid.width == len(data), 'Grid and string lengths do not match'
    for char in data:
        assert char in 'RG', 'Unknown character in data string'
    for x in range(grid.width):
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']
This version doesn't do anything until it's sure that whatever it does will succeed, so there's no risk of changing part of the grid but not the rest. On the other hand, there are now two loops instead of one. While the slowdown due to the extra loop won't be noticeable on small grids, checking data before modifying it can have a noticeable impact on the speed of programs that are working with terabytes. Again, it's up to you to use whichever variation you prefer, but whichever one you choose, you should write all your checks the same way.
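The risk of a half-finished update is easiest to see side by side. In this sketch a plain list of color names stands in for a grid row, and the function names are invented for the comparison; `try`/`except` is used only so that both failing calls can run in one example.

```python
def paint(cells, x, char):
    """Set one cell from one character; cells is a plain list standing in for a row."""
    if char == 'R':
        cells[x] = 'Red'
    else:
        cells[x] = 'Green'

def fill_checking_as_we_go(cells, data):
    for x in range(len(cells)):
        assert data[x] in 'RG', 'Unknown character in data string'
        paint(cells, x, data[x])

def fill_after_checking(cells, data):
    for char in data:                 # first loop: validate everything
        assert char in 'RG', 'Unknown character in data string'
    for x in range(len(cells)):       # second loop: modify
        paint(cells, x, data[x])

first = ['Black'] * 4
try:
    fill_checking_as_we_go(first, 'RGPR')
except AssertionError:
    pass                              # the failure itself is expected here
print(first)                          # the first two cells were already changed

second = ['Black'] * 4
try:
    fill_after_checking(second, 'RGPR')
except AssertionError:
    pass
print(second)                         # nothing was changed
```

Both functions reject the bad string, but only the second leaves its input exactly as it found it.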
To end this lesson, let's try to make our error messages a little more helpful. After all, few things are as frustrating as being told that something is wrong, but not being told what. We'll start by inserting a few strings into another string:
In [41]:
print '{0} and {1} shared the Nobel Prize in 1947'.format('Gerty Cori', 'Carl Cori')
As you can probably infer, strings have a method called `format` that can be given any number of parameters. These parameters are interpolated wherever the markers `{0}`, `{1}`, and so on appear. It's OK to use a value several times:
In [42]:
print '{0} is the same as {0}'.format('this')
and we can provide extra information to format numbers and many (many) other things:
In [43]:
print 'Four digits, two after the decimal point: {0:4.2f}'.format(3.14159)
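A few more format specifications are sketched below. The part after the colon gives an optional alignment (`<`, `>`, or `^`), a minimum width, and for numbers a precision and a type such as `f` or `e`. The sample values and the trailing `|`, which just makes the padding visible, are chosen for illustration.

```python
lines = [
    '{0:8.3f}|'.format(3.14159),   # at least 8 columns, 3 digits after the point
    '{0:<8}|'.format('left'),      # left-aligned in 8 columns
    '{0:>8}|'.format('right'),     # right-aligned in 8 columns
    '{0:^8}|'.format('name'),      # centered in 8 columns
    '{0:e}'.format(46.5),          # scientific notation
]

for line in lines:
    print(line)
```

The width is a minimum, not a limit: a value that needs more columns than requested is printed in full rather than truncated.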
What we can't do is change the original string:
In [44]:
mass = 46.5
name = 'Lyell'
template = 'Mass {0:4e} and name "{1:^20}"'
result = template.format(mass, name)
print 'result after formatting:', result
print 'template after formatting:', template
The reason is that strings are immutable, i.e., they cannot be changed after they have been created. Numbers are also immutable: we can't assign a new value to 2 or 3.14159 (although we can assign these values to variables, then change those variables). We will meet some mutable data types in [the lesson on lists](python-4-files-lists.ipynb); for now, we'll use what we have learned about string formatting to make our error messages more readable:
In [45]:
def color_from_string(grid, data):
    assert grid.width == len(data), \
           'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))
    for x in range(grid.width):
        assert data[x] in 'GR', \
               'Unknown character in data string: "{0}"'.format(data[x])
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']
Notice that we have used `\` at the end of lines to tell Python that the statement continues on the following line. This makes the program easier to read by making the assertion's condition easier to spot, and by reducing the number of very long lines in our code.
Let's try our revised function:
In [46]:
test_case = ImageGrid(4, 1)
color_from_string(test_case, 'R') # wrong length
and:
In [47]:
color_from_string(test_case, 'PPPP')
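As an aside on the line continuations used in `color_from_string`: a statement can also be continued by wrapping the message in parentheses, since Python keeps reading until the parentheses close. The helper `check_lengths` below is hypothetical and isolates just the length check. Only the message should go inside the parentheses: writing `assert (condition, message)` builds a two-item tuple, which always counts as true, so the assertion could never fail.

```python
def check_lengths(width, data):
    """Hypothetical helper: the length check from color_from_string on its own."""
    assert width == len(data), (
        'Grid and string lengths do not match: {0} != {1}'.format(width, len(data)))
    return True

print(check_lengths(4, 'RGRG'))

try:
    check_lengths(4, 'R')            # wrong length
except AssertionError as error:
    message = str(error)
    print(message)
```

Whether to use a backslash or parentheses is a matter of style; as with the other choices in this lesson, consistency matters more than the choice itself.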
There's only one thing left to do: document our function. The name `color_from_string` may seem obvious to us right now, but when we revisit this code in six weeks, we might easily think it means "convert a string to a color". Let's add a docstring (documentation string) to the function:
In [48]:
def color_from_string(grid, data):
    "Color grid cells red and green according to 'R' and 'G' in data."
    assert grid.width == len(data), \
           'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))
    for x in range(grid.width):
        assert data[x] in 'GR', \
               'Unknown character in data string: "{0}"'.format(data[x])
        if data[x] == 'R':
            grid[x, 0] = colors['Red']
        else:
            grid[x, 0] = colors['Green']
We use a docstring rather than a comment because docstrings are what `help` displays:
In [49]:
help(color_from_string)
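The difference between a docstring and a comment can be checked directly: Python stores a docstring on the function as its `__doc__` attribute, while a comment is discarded. The two functions below, `double` and `triple`, are made up for this sketch.

```python
def double(value):
    "Return twice the given value."
    return 2 * value

def triple(value):
    # Return three times the given value.
    return 3 * value

print(double.__doc__)    # the docstring text
print(triple.__doc__)    # None: the comment was not kept
```

Because `triple` has only a comment, it has no stored description of its own for `help` to report.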
- `if`/`elif`/`else` to make choices in programs.
- `assert` to embed self-checks in programs.
- `string.format` to create nicely-formatted output.