This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599_2014/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2014_fall_ASTR599/).

When Things Go Wrong: Errors, Exceptions, and Debugging

Today we'll cover perhaps one of the most important aspects of using Python: dealing with errors and bugs in code.

Three Classes of Errors

Types of bugs/errors in code, from the easiest to the most difficult to diagnose:

  1. Syntax Errors: Errors where the code is not valid Python (generally easy to fix)
  2. Runtime Errors: Errors where syntactically valid code fails to execute (sometimes easy to fix)
  3. Semantic Errors: Errors in logic (often very difficult to fix)

Syntax Errors

A syntax error occurs when you write code that is not valid Python. For example:


In [1]:
X = [1, 2, 3)


  File "<ipython-input-1-60713a9f7956>", line 1
    X = [1, 2, 3)
                ^
SyntaxError: invalid syntax

In [2]:
y = 4x + 3


  File "<ipython-input-2-d2bc800650b1>", line 1
    y = 4x + 3
         ^
SyntaxError: invalid syntax

Note that if your code contains even a single syntax error, none of it will run:


In [3]:
a = 4
something ==== is wrong


  File "<ipython-input-3-940e499369c5>", line 2
    something ==== is wrong
                 ^
SyntaxError: invalid syntax

In [4]:
print(a)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-c5a4f3535135> in <module>()
----> 1 print(a)

NameError: name 'a' is not defined

Even though the syntax error appears below the (valid) variable definition, the valid code is not executed.

Runtime Errors

Runtime errors occur when the code is syntactically valid Python, but something goes wrong while the program is executing. For example:


In [5]:
print(Q)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-e796bdcf24ff> in <module>()
----> 1 print(Q)

NameError: name 'Q' is not defined

In [6]:
x = 1 + 'abc'


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-785fcfd7f367> in <module>()
----> 1 x = 1 + 'abc'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [7]:
X = 1 / 0


---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-7-746fef050f31> in <module>()
----> 1 X = 1 / 0

ZeroDivisionError: division by zero

In [8]:
import numpy as np
np.add(1, 2, 3, 4)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-17bf2f52943b> in <module>()
      1 import numpy as np
----> 2 np.add(1, 2, 3, 4)

ValueError: invalid number of arguments

In [9]:
x = [1, 2, 3]
print(x[100])


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-e2d018ea7fa3> in <module>()
      1 x = [1, 2, 3]
----> 2 print(x[100])

IndexError: list index out of range

Unlike syntax errors, runtime errors occur during code execution, which means that valid code occurring before the runtime error will execute:


In [10]:
spam = "my all-time favorite"
eggs = 1 / 0


---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-10-6efac0201e49> in <module>()
      1 spam = "my all-time favorite"
----> 2 eggs = 1 / 0

ZeroDivisionError: division by zero

In [11]:
print(spam)


my all-time favorite
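The difference comes from when Python does its work: it compiles an entire cell (or file) before executing any of it, and only then runs the compiled code one statement at a time. The sketch below replays the two examples above through the built-in exec() with fresh namespace dictionaries (syntax_ns and runtime_ns are names chosen here for illustration), so you can check which assignments actually took effect:

```python
# Case 1: a syntax error on line 2 means nothing runs, not even line 1.
syntax_ns = {}
try:
    exec("a = 4\nsomething ==== is wrong\n", syntax_ns)
except SyntaxError as err:
    syntax_error_line = err.lineno  # the line on which the parser gave up

# Case 2: a runtime error on line 2 stops execution there,
# but line 1 has already been executed.
runtime_ns = {}
try:
    exec('spam = "my all-time favorite"\neggs = 1 / 0\n', runtime_ns)
except ZeroDivisionError:
    pass

print('a' in syntax_ns)                            # False
print('spam' in runtime_ns, 'eggs' in runtime_ns)  # True False
```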

Semantic Errors

Semantic errors are perhaps the most insidious errors, and are by far the ones that will take most of your time. Semantic errors occur when the code is syntactically correct, but produces the wrong result.

By way of example, imagine you want to write a simple script to approximate the value of $\pi$ according to the following formula:

$$ \pi = \sqrt{12} \sum_{k = 0}^{\infty} \frac{(-3)^{-k}}{2k + 1} $$

You might write a function something like this, using numpy's vectorized syntax:


In [12]:
from math import sqrt

def approx_pi(nterms=100):
    k = np.arange(nterms)
    return sqrt(12) * np.sum(-3.0 ** -k / (2 * k + 1))

Looks OK, yes? Let's try it out:


In [13]:
approx_pi(100)


Out[13]:
-3.9508736907744493

Huh. That doesn't look like $\pi$. Maybe we need more terms?


In [14]:
approx_pi(1000)


Out[14]:
-3.9508736907744493

Nope... it looks like the algorithm simply gives the wrong result. This is a classic example of a semantic error.

Question: can you spot the problem?
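If you'd like to check your answer: the culprit is operator precedence. In Python, exponentiation binds more tightly than unary minus, so -3.0 ** -k is evaluated as -(3.0 ** -k), and every term of the sum comes out negative. A minimal sketch of the fix simply parenthesizes the base (approx_pi_fixed is a name chosen here, not part of the original):

```python
from math import sqrt, pi
import numpy as np

print(-3.0 ** 2)    # -9.0, because it is parsed as -(3.0 ** 2)
print((-3.0) ** 2)  # 9.0

def approx_pi_fixed(nterms=100):
    k = np.arange(nterms)
    # Parenthesize the base so the sign alternates, as the formula requires.
    return sqrt(12) * np.sum((-3.0) ** -k / (2 * k + 1))

print(approx_pi_fixed(100), pi)  # should agree to within floating-point rounding
```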

Runtime Errors and Exception Handling

Now we'll talk about how to handle runtime errors (we skip syntax errors because they're pretty self-explanatory).

Runtime errors can be handled through "exception catching" using try...except statements. Here's a basic example:


In [15]:
try:
    print("this block gets executed first")
except:
    print("this block gets executed if there's an error")


this block gets executed first

In [16]:
try:
    print("this block gets executed first")
    x = 1 / 0  # ZeroDivisionError
    print("we never get here")
except:
    print("this block gets executed if there's an error")


this block gets executed first
this block gets executed if there's an error

Notice that the first block executes up until the point of the Runtime error. Once the error is hit, the except block is executed.

One important note: the clause above catches any and all exceptions. It is generally not a good idea to catch everything; to see why, consider this function:


In [17]:
def safe_divide(a, b):
    try:
        return a / b
    except:
        print("oops, dividing by zero. Returning None.")
        return None
    
print(safe_divide(15, 3))
print(safe_divide(1, 0))


5.0
oops, dividing by zero. Returning None.
None

But there's a problem here: this is a catch-all exception, and will sometimes give us misleading information. For example:


In [18]:
safe_divide(15, 'three')


oops, dividing by zero. Returning None.

Our program tells us we're dividing by zero, but we aren't! This is one reason you should almost never use a catch-all try...except statement, but instead specify the errors you're trying to catch:


In [19]:
def better_safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        print("oops, dividing by zero. Returning None.")
        return None
    
better_safe_divide(15, 0)


oops, dividing by zero. Returning None.

In [20]:
better_safe_divide(15, 'three')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-1cb60731bdd8> in <module>()
----> 1 better_safe_divide(15, 'three')

<ipython-input-19-67529345ead1> in better_safe_divide(a, b)
      1 def better_safe_divide(a, b):
      2     try:
----> 3         return a / b
      4     except ZeroDivisionError:
      5         print("oops, dividing by zero. Returning None.")

TypeError: unsupported operand type(s) for /: 'int' and 'str'

This also allows you to specify different behaviors for different exceptions:


In [21]:
def even_better_safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        print("oops, dividing by zero. Returning None.")
        return None
    except TypeError:
        print("incompatible types.  Returning None")
        return None

In [22]:
even_better_safe_divide(15, 3)


Out[22]:
5.0

In [23]:
even_better_safe_divide(15, 0)


oops, dividing by zero. Returning None.

In [24]:
even_better_safe_divide(15, 'three')


incompatible types.  Returning None

Remember this lesson, and always specify your except statements! I once spent an entire day tracking down a bug in my code that amounted to this.
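When several exceptions call for the same response, an except clause also accepts a tuple of exception types. Here is a sketch (safe_divide_any is a hypothetical name) that still names exactly the exceptions it expects, and binds the exception object with "as err" so that the printed message reports what actually went wrong instead of guessing:

```python
def safe_divide_any(a, b):
    try:
        return a / b
    except (ZeroDivisionError, TypeError) as err:
        # err is the exception instance; its message describes the real failure
        print("cannot divide: {0}. Returning None.".format(err))
        return None

print(safe_divide_any(15, 3))
safe_divide_any(15, 0)
safe_divide_any(15, 'three')
```

Any exception not listed in the tuple still propagates normally, so unexpected problems are not silently hidden.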

Raising Your Own Exceptions

When you write your own code, it's good practice to use the raise keyword to create your own exceptions when the situation calls for it:


In [25]:
import os  # the "os" module has useful operating system stuff

def read_file(filename):
    if not os.path.exists(filename):
        raise ValueError("'{0}' does not exist".format(filename))
    f = open(filename)
    result = f.read()
    f.close()
    return result

We'll use IPython's %%file magic to quickly create a text file:


In [26]:
%%file tmp.txt
this is the contents of the file


Overwriting tmp.txt

In [27]:
read_file('tmp.txt')


Out[27]:
'this is the contents of the file'

In [28]:
read_file('file.which.does.not.exist')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-cd859ba872b6> in <module>()
----> 1 read_file('file.which.does.not.exist')

<ipython-input-25-4ee8d27c305e> in read_file(filename)
      3 def read_file(filename):
      4     if not os.path.exists(filename):
----> 5         raise ValueError("'{0}' does not exist".format(filename))
      6     f = open(filename)
      7     result = f.read()

ValueError: 'file.which.does.not.exist' does not exist

It is sometimes useful to define your own custom exceptions, which you can do easily via class inheritance:


In [29]:
class NonExistentFile(RuntimeError):
    # you can customize exception behavior by defining class methods.
    # we won't discuss that here.
    pass


def read_file(filename):
    if not os.path.exists(filename):
        raise NonExistentFile(filename)
    f = open(filename)
    result = f.read()
    f.close()
    return result

In [30]:
read_file('tmp.txt')


Out[30]:
'this is the contents of the file'

In [31]:
read_file('file.which.does.not.exist')


---------------------------------------------------------------------------
NonExistentFile                           Traceback (most recent call last)
<ipython-input-31-cd859ba872b6> in <module>()
----> 1 read_file('file.which.does.not.exist')

<ipython-input-29-30c28eb497e1> in read_file(filename)
      7 def read_file(filename):
      8     if not os.path.exists(filename):
----> 9         raise NonExistentFile(filename)
     10     f = open(filename)
     11     result = f.read()

NonExistentFile: file.which.does.not.exist

Get used to raising appropriate, meaningful exceptions in your code! It makes reading and debugging your code much, much easier.
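Because NonExistentFile inherits from RuntimeError, callers can catch it either by its own name or through its base class. The sketch below extends the class from above with a custom __init__ (an illustrative choice, not something the original requires) that stores the filename as an attribute and builds a readable message; it also uses a with block so the file is closed automatically:

```python
import os

class NonExistentFile(RuntimeError):
    """Raised when read_file() is asked for a file that does not exist."""
    def __init__(self, filename):
        self.filename = filename  # keep the offending name for the caller
        super().__init__("'{0}' does not exist".format(filename))

def read_file(filename):
    if not os.path.exists(filename):
        raise NonExistentFile(filename)
    with open(filename) as f:  # the with block closes the file for us
        return f.read()

try:
    read_file('file.which.does.not.exist')
except RuntimeError as err:  # the base class also catches NonExistentFile
    caught = err             # 'err' itself is cleared when the except block ends

print(type(caught).__name__, caught.filename)
print(caught)
```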

More Advanced Exception Handling

You can also add else and finally clauses to your try statements. You probably won't need these often, but in case you encounter them some time, it's good to know what they do.

The behavior looks like this:


In [32]:
try:
    print("doing something")
except:
    print("this only happens if it fails")
else:
    print("this only happens if it succeeds")


doing something
this only happens if it succeeds

In [33]:
try:
    print("doing something")
    raise ValueError()
except:
    print("this only happens if it fails")
else:
    print("this only happens if it succeeds")


doing something
this only happens if it fails

Why would you ever want to do this? Mainly, it keeps exceptions raised by the code in the else block from being caught by the except clauses. Accidentally catching an exception you don't mean to catch can lead to confusing results.
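Here is a small sketch of that point (sqrt_of_int is a hypothetical helper). Only the int() conversion is guarded; math.sqrt() also raises ValueError for negative input, but because it sits in the else block, that second error is not mistaken for "the text was not an integer":

```python
import math

def sqrt_of_int(text):
    try:
        n = int(text)        # only the conversion is guarded
    except ValueError:
        return None          # the text was not an integer
    else:
        # math.sqrt raises ValueError for negative n; since this line is in
        # the else block, that error is NOT swallowed by the except clause above
        return math.sqrt(n)

print(sqrt_of_int("16"))   # 4.0
print(sqrt_of_int("abc"))  # None
# sqrt_of_int("-4") raises ValueError from math.sqrt
```

Had math.sqrt(n) been placed inside the try block, sqrt_of_int("-4") would quietly return None, hiding the real problem.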

The last statement you might use is the finally statement, which looks like this:


In [34]:
try:
    print("do something")
except:
    print("this only happens if it fails")
else:
    print("this only happens if it succeeds")
finally:
    print("this happens no matter what.")


do something
this only happens if it succeeds
this happens no matter what.

In [35]:
try:
    print("do something")
    raise ValueError()
except:
    print("this only happens if it fails")
else:
    print("this only happens if it succeeds")
finally:
    print("this happens no matter what.")


do something
this only happens if it fails
this happens no matter what.

finally is generally used for some sort of cleanup (closing a file, etc.). It might seem a bit redundant, though: why not write the following?


In [36]:
try:
    print("do something")
except:
    print("this only happens if it fails")
else:
    print("this only happens if it succeeds")
print("this happens no matter what.")


do something
this only happens if it succeeds
this happens no matter what.

The difference shows up when the try statement sits inside a function that returns from one of its blocks:


In [37]:
def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print("division by zero!")
        return None
    else:
        print("result is", result)
        return result
    finally:
        print("some sort of cleanup")

In [38]:
divide(15, 3)


result is 5.0
some sort of cleanup
Out[38]:
5.0

In [39]:
divide(15, 0)


division by zero!
some sort of cleanup

Note that the finally clause is executed no matter what, even when the try, except, or else block has already reached a return statement! This makes it useful for cleanup tasks, such as closing an open file, restoring a state, or something along those lines.
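finally also runs when an exception is not handled at all, and this is where it differs most from code placed after the try statement. The sketch below reworks divide() as a hypothetical divide_logged() that appends to a list instead of printing, so the order of events can be inspected:

```python
def divide_logged(x, y, log):
    try:
        result = x / y
    except ZeroDivisionError:
        log.append("division by zero!")
        return None
    else:
        log.append("result is {0}".format(result))
        return result
    finally:
        # runs after the return value is computed, and even if an exception escapes
        log.append("some sort of cleanup")

log = []
divide_logged(15, 3, log)
print(log)  # ['result is 5.0', 'some sort of cleanup']

log = []
try:
    divide_logged(15, 'three', log)  # the TypeError is not handled inside the function
except TypeError:
    pass
print(log)  # ['some sort of cleanup']
```

For files specifically, the with statement (with open(filename) as f:) performs this kind of cleanup automatically.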

Handling Semantic Errors: Debugging

Here is the most difficult piece of this lecture: handling semantic errors. This is the situation where your program runs, but doesn't produce the correct result. These errors are commonly known as bugs, and the process of correcting the bugs is debugging.

There are three main methods commonly used for debugging Python code. In order of increasing sophistication, they are:

  1. Inserting print statements
  2. Injecting an IPython interpreter
  3. Using a line-by-line debugger like pdb

The easiest method: print statements

Say we're trying to compute the entropy of a set of probabilities. The form of the equation is

$$ H = -\sum_i p_i \log(p_i) $$

We can write the function like this:


In [40]:
def entropy(p):
    p = np.asarray(p)  # convert p to array if necessary
    items = p * np.log(p)
    return -np.sum(items)

Say these are our probabilities:


In [41]:
p = np.arange(5.)
p /= p.sum()

In [42]:
entropy(p)


-c:3: RuntimeWarning: divide by zero encountered in log
-c:3: RuntimeWarning: invalid value encountered in multiply
Out[42]:
nan

We get nan, which stands for "Not a Number". What's going on here?

Often the first thing to try is to simply print things and see what's going on. Within the function, you can add some print statements in key places:


In [43]:
def entropy(p):
    p = np.asarray(p)  # convert p to array if necessary
    print(p)
    items = p * np.log(p)
    print(items)
    return -np.sum(items)

entropy(p)


[ 0.   0.1  0.2  0.3  0.4]
[        nan -0.23025851 -0.32188758 -0.36119184 -0.36651629]
-c:4: RuntimeWarning: divide by zero encountered in log
-c:4: RuntimeWarning: invalid value encountered in multiply
Out[43]:
nan

By printing some of the intermediate items, we see the problem: 0 * np.log(0) results in a NaN, because np.log(0) evaluates to -inf and, in floating-point arithmetic, 0 * -inf is NaN. Though mathematically it's true that $\lim_{x\to 0} [x\log(x)] = 0$, the numerical computation never takes that limit, so we don't obtain this result.

Often, inserting a few print statements can be enough to figure out what's going on.
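Once the print statements have located the problem, one possible fix is to apply the convention $0 \log 0 = 0$ explicitly by dropping the zero probabilities before taking the logarithm. This is a sketch of one approach, not the only one:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)  # convert p to a float array if necessary
    # Drop the p_i = 0 terms, using the convention 0 * log(0) = 0
    # (the limit of x log x as x -> 0), so np.log never sees a zero.
    nonzero = p[p > 0]
    return -np.sum(nonzero * np.log(nonzero))

p = np.arange(5.)
p /= p.sum()
print(entropy(p))  # finite now, approximately 1.28
```

If SciPy is available, scipy.special.xlogy(p, p), which returns 0 wherever its first argument is 0, is another way to express the same convention.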

Embedding an IPython instance

You can go a step further by actually embedding an IPython instance in your code. This doesn't work from within the notebook, so we'll create a file and run it from the command line.


In [44]:
%%file test_script.py
import numpy as np

def entropy(p):
    p = np.asarray(p)  # convert p to array if necessary
    items = p * np.log(p)
    import IPython; IPython.embed()
    return -np.sum(items)

p = np.arange(5.)
p /= p.sum()
entropy(p)


Overwriting test_script.py

Now open a terminal and run this. You'll see that an IPython interpreter opens, and from there you can print p, print items, and do any manipulation you feel like doing. This can also be a nice way to debug a script.

Using a Debugger

Python comes with a built-in debugger called pdb. It allows you to step line-by-line through a computation and examine what's happening at each step. Note that this should probably be your last resort in tracking down a bug. I've probably used it a dozen times or so in five years of coding. But it can be a useful tool to have in your toolbelt.

You can use the debugger by inserting the line

import pdb; pdb.set_trace()

within your script. Let's try this out:


In [45]:
def entropy(p):
    import pdb; pdb.set_trace()
    p = np.asarray(p)  # convert p to array if necessary
    items = p * np.log(p)
    return -np.sum(items)

entropy(p)


> <ipython-input-45-6fd2b690ec56>(3)entropy()
-> p = np.asarray(p)  # convert p to array if necessary
(Pdb) h

Documented commands (type help <topic>):
========================================
EOF    cl         disable  interact  next     return  u          where
a      clear      display  j         p        retval  unalias  
alias  commands   down     jump      pp       run     undisplay
args   condition  enable   l         print    rv      unt      
b      cont       exit     list      q        s       until    
break  continue   h        ll        quit     source  up       
bt     d          help     longlist  r        step    w        
c      debug      ignore   n         restart  tbreak  whatis   

Miscellaneous help topics:
==========================
pdb  exec

(Pdb) n
> <ipython-input-45-6fd2b690ec56>(4)entropy()
-> items = p * np.log(p)
(Pdb) print p
array([ 0. ,  0.1,  0.2,  0.3,  0.4])
(Pdb) print np.log(p)
-c:1: RuntimeWarning: divide by zero encountered in log
array([       -inf, -2.30258509, -1.60943791, -1.2039728 , -0.91629073])
(Pdb) n
> <ipython-input-45-6fd2b690ec56>(5)entropy()
-> return -np.sum(items)
(Pdb) print items
array([        nan, -0.23025851, -0.32188758, -0.36119184, -0.36651629])
(Pdb) q
---------------------------------------------------------------------------
BdbQuit                                   Traceback (most recent call last)
<ipython-input-45-6fd2b690ec56> in <module>()
      5     return -np.sum(items)
      6 
----> 7 entropy(p)

<ipython-input-45-6fd2b690ec56> in entropy(p)
      3     p = np.asarray(p)  # convert p to array if necessary
      4     items = p * np.log(p)
----> 5     return -np.sum(items)
      6 
      7 entropy(p)

<ipython-input-45-6fd2b690ec56> in entropy(p)
      3     p = np.asarray(p)  # convert p to array if necessary
      4     items = p * np.log(p)
----> 5     return -np.sum(items)
      6 
      7 entropy(p)

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/bdb.py in trace_dispatch(self, frame, event, arg)
     45             return # None
     46         if event == 'line':
---> 47             return self.dispatch_line(frame)
     48         if event == 'call':
     49             return self.dispatch_call(frame, arg)

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/bdb.py in dispatch_line(self, frame)
     64         if self.stop_here(frame) or self.break_here(frame):
     65             self.user_line(frame)
---> 66             if self.quitting: raise BdbQuit
     67         return self.trace_dispatch
     68 

BdbQuit: 

This can be a more convenient way to debug programs and step through the actual execution.

When you run this, you'll see the pdb prompt where you can enter one of several commands. If you type h for "help", it will list the possible commands:

(Pdb) h
Documented commands (type help <topic>):
========================================
EOF    bt         cont      enable  jump  pp       run      unt   
a      c          continue  exit    l     q        s        until 
alias  cl         d         h       list  quit     step     up    
args   clear      debug     help    n     r        tbreak   w     
b      commands   disable   ignore  next  restart  u        whatis
break  condition  down      j       p     return   unalias  where 

Miscellaneous help topics:
==========================
exec  pdb

Undocumented commands:
======================
retval  rv

Type h followed by a command to see the documentation of that command:

(Pdb) h n
n(ext)
Continue execution until the next line in the current function
is reached or it returns.

The most useful are probably the following:

  • q(uit): quit the debugger and the program.
  • c(ontinue): leave the debugger prompt and continue running the program (until the next breakpoint, if any).
  • n(ext): execute the current line and stop at the next line of the current function.
  • l(ist): show the source code around the current location in the file.
  • <enter>: repeat the previous command.
  • p(rint): print the value of an expression, such as a variable.
  • s(tep): step into a function called on the current line.
  • r(eturn): continue until the current function returns.
  • Arbitrary Python code: writing Python code at the (Pdb) prompt will execute it at that point in the program.

We'll see more of this in the next section.

IPython Debugging

IPython also has some magic commands that allow you to debug scripts within the notebook as soon as you see a failure. For example, imagine we have the following file:


In [46]:
%%file numbers.dat
123 456 789


Overwriting numbers.dat

And we want to execute the following function:


In [47]:
def add_lines(filename):
    f = open(filename)
    lines = f.read().split()
    f.close()
    result = 0
    for line in lines:
        result += line
    return result

filename = 'numbers.dat'
total = add_lines(filename)
print(total)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-7b1d1fa71c40> in <module>()
      9 
     10 filename = 'numbers.dat'
---> 11 total = add_lines(filename)
     12 print(total)

<ipython-input-47-7b1d1fa71c40> in add_lines(filename)
      5     result = 0
      6     for line in lines:
----> 7         result += line
      8     return result
      9 

TypeError: unsupported operand type(s) for +=: 'int' and 'str'

We get a type error. We can immediately open the debugger using IPython's %debug magic function. Remember to type q to quit!


In [48]:
%debug


> <ipython-input-47-7b1d1fa71c40>(7)add_lines()
      6     for line in lines:
----> 7         result += line
      8     return result

ipdb> print line
'123'
ipdb> print result
0
ipdb> print result + line
*** TypeError: unsupported operand type(s) for +: 'int' and 'str'
ipdb> print result + int(line)
123
ipdb> q

We see that we need to convert the line to an integer before adding!
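A sketch of the corrected function, converting each token with int() before adding. It also uses a with block so the file is closed even if an error occurs partway through:

```python
def add_lines(filename):
    with open(filename) as f:      # closes the file even if an error occurs
        tokens = f.read().split()  # e.g. ['123', '456', '789']
    result = 0
    for token in tokens:
        result += int(token)       # convert each string before adding
    return result
```

With the numbers.dat file created above, add_lines('numbers.dat') should return 123 + 456 + 789 = 1368.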

Advanced debugging

When you write more advanced code (especially if you dig into C or Fortran extensions of Python), you might run into more serious errors like segmentation faults, core dumps, and memory leaks. For these, you'll need more advanced tools like gdb and valgrind. If you ever get to the point of needing these, there is a lot of specific information available on the web. Here I'll just leave off by letting you know that they exist.

Homework

Below is a script taken from the scipy lectures.

It is meant to compare the performance of several root-finding algorithms within the scipy.optimize package, but it breaks. Use one or more of the above tools to figure out what's going on and to fix the script.

When you turn in this homework (via GitHub pull request, of course), please write a one- to two-paragraph summary of the process you used to debug this, including any dead ends (it may help to take notes as you go).

"""
A script to compare different root-finding algorithms.

This version of the script is buggy and does not execute. It is your task
to find and fix these bugs.

The output of the script should look like:

    Benching 1D root-finder optimizers from scipy.optimize:
                brenth:   604678 total function calls
                brentq:   594454 total function calls
                ridder:   778394 total function calls
                bisect:  2148380 total function calls
"""
from itertools import product

import numpy as np
from scipy import optimize

FUNCTIONS = (np.tan,  # Dilating map
             np.tanh, # Contracting map
             lambda x: x**3 + 1e-4*x, # Almost null gradient at the root
             lambda x: x+np.sin(2*x), # Non-monotonic function
             lambda x: 1.1*x+np.sin(4*x), # Function with several local maxima
            )

OPTIMIZERS = (optimize.brenth, optimize.brentq,
              optimize.ridder, optimize.bisect)


def apply_optimizer(optimizer, func, a, b):
    """ Return the number of function calls given a root-finding optimizer,
        a function and upper and lower bounds.
    """
    return optimizer(func, a, b, full_output=True)[1].function_calls,


def bench_optimizer(optimizer, param_grid):
    """ Find roots for all the functions, and upper and lower bounds
        given and return the total number of function calls.
    """
    return sum(apply_optimizer(optimizer, func, a, b)
               for func, a, b in param_grid)


def compare_optimizers(optimizers):
    """ Compare all the optimizers given on a grid of a few different
        functions, all admitting a single root at zero, and upper and
        lower bounds.
    """
    random_a = -1.3 + np.random.random(size=100)
    random_b =   .3 + np.random.random(size=100)
    param_grid = product(FUNCTIONS, random_a, random_b)
    print("Benching 1D root-finder optimizers from scipy.optimize:")
    for optimizer in OPTIMIZERS:
        ncalls = bench_optimizer(optimizer, param_grid)
        print('{name}: {ncalls} total function calls'.format(
                  name=optimizer.__name__, ncalls=ncalls))


if __name__ == '__main__':
    compare_optimizers(OPTIMIZERS)