We made several mistakes while writing the programs in our first few lessons. How can we be sure that there aren't still errors lurking in the code we have? And how can we guard against introducing new errors in code as we modify it?
The first step is to use defensive programming, i.e., to assume that mistakes will happen and to guard against them. One way to do this is to add assertions to our code so that it checks itself as it runs. We met assertions in an earlier lesson; as we said then, each one states something that must be true at a certain point in the program's execution, and optionally includes a customized error message explaining what's gone wrong if it fails:
assert width > 0, 'Grid width must be positive'
assert 0 <= ablation(x, y) <= 1.0, 'Ablation must be normalized'
Programs like the Firefox browser are littered with assertions: 10-20% of the code they contain are there to check that the other 80-90% are working correctly. Broadly speaking, assertions fall into three categories:
For example,
suppose we are representing rectangles using a list of four coordinates [x0, y0, x1, y1]
.
In order to do some calculations,
we need to normalize the rectangle so that it is at the origin
and 1.0 units long on its longest axis.
This function does that:
In [7]:
def normalize_rectangle(rect):
x0, y0, x1, y1 = rect
assert x0 < x1, 'Invalid X coordinates'
assert y0 < y1, 'Invalid Y coordinates'
dx = x1 - x0
dy = y1 - y0
if dx > dy:
scaled = float(dy) / dx
upper_x, upper_y = 1.0, scaled
else:
scaled = float(dx) / dy
upper_x, upper_y = scaled, 1.0
assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'
return [0, 0, upper_x, upper_y]
The first two assertions check that we've been given a legal rectangle, while the last two check the output we're about to return to our caller. Strictly speaking these post-conditions are redundant: if the inputs and calculations are correct, the last two assertions should always hold. But those are pretty big ifs, and having the program check itself can save us a lot of hunting around later.
A rule that many programmers follow is, "Bugs become assertions." Whenever we fix a bug in a program, we should add some assertions to the program at that point to catch the bug if it reappears. After all, if we made the mistake once, then we (or someone else) might well make it again. Few things are as frustrating as having someone delete several carefully-crafted lines of code that fixed a subtle problem because they didn't realize what problem those lines were there to fix.
Assertions help us catch errors in our code, but things can go wrong for other reasons. In particular, errors may have external causes, like missing or badly-formatted files. Most modern programming languages allow programmers to use exceptions to handle errors in a way that separates what's supposed to happen if everything goes right from what the program should do if something goes wrong. Doing this makes both cases easier to read and understand.
For example, here's a small piece of code that tries to read parameters and a grid from two separate files, and reports an error if either goes wrong:
try:
params = read_params(param_file)
grid = read_grid(grid_file)
except:
log.error('Failed to read input file(s)')
sys.exit(ERROR)
We join the normal case and the error-handling code using the keywords try
and except
.
These work together like if
and else
:
the statements under the try
are what should happen if everything works,
while the statements under except
are what the program should do if something goes wrong.
We have actually seen exceptions before without knowing it,
since by default,
when an exception occurs,
Python prints it out and halts our program.
For example,
trying to open a nonexistent file triggers a type of exception called an IOError
,
while an out-of-bounds index to a list triggers an IndexError
:
In [8]:
open('nonexistent-file.txt', 'r')
In [9]:
values = [0, 1, 2]
print values[999]
We can use try
and except
to deal with these errors ourselves
if we don't want the program simply to fall over.
Here,
for example,
we put our attempt to open a nonexistent file inside a try
,
and in the except
, we print a not-very-helpful error message:
In [10]:
try:
reader = open('nonexistent-file.txt', 'r')
except IOError:
print 'Whoops!'
When Python executes this code,
it runs the statement inside the try
.
If that works, it skips over the except
block without running it.
If an exception occurs inside the try
block,
though,
Python compares the type of the exception to the type specified by the except
.
If they match, it executes the code in the except
block.
IOError
is the particular kind of exception Python raises
when there is a problem related to input and output,
such as files not existing
or the program not having the permissions it needs to read them.
We can put as many lines of code in a try
block as we want,
just as we can put many statements under an if
.
We can also handle several different kinds of errors afterward.
For example,
here's some code to calculate the entropy at each point in a grid:
try:
params = read_params(param_file)
grid = read_grid(grid_file)
entropy = lee_entropy(params, grid)
write_entropy(entropy_file, entropy)
except IOError:
log_error_and_exit('IO error')
except ArithmeticError:
log_error_and_exit('Arithmetic error')
Python tries to run the four functions inside the try
as normal.
If an error occurs in any of them,
Python immediately jumps down
and tries to find an except
of the corresponding type:
if the exception is an IOError
,
Python jumps into the first error handler,
while if it's an ArithmeticError
,
Python jumps into the second handler instead.
It will only execute one of these,
just as it will only execute one branch
of a series of if
/elif
/else
statements.
This layout has made the code easier to read,
but we've lost something important:
the message printed out by the IOError
branch doesn't tell us
which file caused the problem.
We can do better if we capture and hang on to the object that Python creates
to record information about the error:
try:
params = read_params(param_file)
grid = read_grid(grid_file)
entropy = lee_entropy(params, grid)
write_entropy(entropy_file, entropy)
except IOError as err:
log_error_and_exit('Cannot read/write' + err.filename)
except ArithmeticError as err:
log_error_and_exit(err.message)
If something goes wrong in the try
,
Python creates an exception object,
fills it with information,
and assigns it to the variable err
.
(There's nothing special about this variable name—we can use anything we want.)
Exactly what information is recorded depends on what kind of error occurred;
Python's documentation describes the properties of each type of error in detail,
but we can always just print the exception object.
In the case of an I/O error,
we print out the name of the file that caused the problem.
And in the case of an arithmetic error,
printing out the message embedded in the exception object is what Python would have done anyway.
So much for how exceptions work:
how should they be used?
Some programmers use try
and except
to give their programs default behaviors.
For example,
if this code can't read the grid file that the user has asked for,
it creates a default grid instead:
try:
grid = read_grid(grid_file)
except IOError:
grid = default_grid()
Other programmers would explicitly test for the grid file,
and use if
and else
for control flow:
if file_exists(grid_file):
grid = read_grid(grid_file)
else:
grid = default_grid()
It's mostly a matter of taste,
but we prefer the second style.
As a rule,
exceptions should only be used to handle exceptional cases.
If the program knows how to fall back to a default grid,
that's not an unexpected event.
Using if
and else
instead of try
and except
sends different signals to anyone reading our code,
even if they do the same thing.
Novices often ask another question about exception handling style as well, but before we address it, there's something in our example that you might not have noticed. Exceptions can actually be thrown a long way: they don't have to be handled immediately. Take another look at this code:
try:
params = read_params(param_file)
grid = read_grid(grid_file)
entropy = lee_entropy(params, grid)
write_entropy(entropy_file, entropy)
except IOError as err:
log_error_and_exit('Cannot read/write' + err.filename)
except ArithmeticError as err:
log_error_and_exit(err.message)
The four lines in the try
block are all function calls.
They might catch and handle exceptions themselves,
but if an exception occurs in one of them that isn't handled internally,
Python looks in the calling code for a matching except
.
If it doesn't find one there,
it looks in that function's caller,
and so on.
If we get all the way back to the main program without finding an exception handler,
Python's default behavior is to print an error message like the ones we've been seeing all along.
This rule is the origin of the rule "Throw Low, Catch High." There are many places in our program where an error might occur. There are only a few, though, where errors can sensibly be handled. For example, a linear algebra library doesn't know whether it's being called directly from the Python interpreter, or whether it's being used as a component in a larger program. In the latter case, the library doesn't know if the program that's calling it is being run from the command line or from a GUI. The library therefore shouldn't try to handle or report errors itself, because it has no way of knowing what the right way to do this is. It should instead just raise an exception, and let its caller figure out how best to handle it.
Finally, we can raise exceptions ourselves if we want to. In fact, we should do this, since it's the standard way in Python to signal that something has gone wrong. Here, for example, is a function that reads a grid and checks its consistency:
def read_grid(grid_file):
'''Read grid, checking consistency.'''
data = read_raw_data(grid_file)
if not grid_consistent(data):
raise Exception('Inconsistent grid: ' + grid_file)
result = normalize_grid(data)
return result
The raise
statement creates a new exception with a meaningful error message.
Since read_grid
itself doesn't contain a try
/except
block,
this exception will always be thrown up and out of the function,
to be caught and handled by whoever is calling read_grid
.
We can define new types of exceptions if we want to.
And we should,
so that errors in our code can be distinguished from errors in other people's code.
However,
this involves classes and objects,
which is outside the scope of these lessons.
assert
to embed pre-conditions, post-conditions, and invariants in programs.raise
to signal an error, and try
/except
to handle errors.