In [1]:
import numpy as np

What is a bug?

A bug is code that produces an error or a wrong result.


In [9]:
# Syntax error
x = 1; y = 2
b = x == y  # Boolean variable that is True when x and y have the same value
b = 1 == 2  # Valid as written; mistyping it as b = 1 = 2 would be a syntax error

In [5]:
b


Out[5]:
False

In [11]:
# Exception - invalid operation
a = 0
5/a  # Division by zero


-------------------------------------------------------------------------
ZeroDivisionError                       Traceback (most recent call last)
<ipython-input-11-b44ec0e90397> in <module>()
      1 # Exception - invalid operation
      2 a = 0
----> 3 5/a  # Division by zero

ZeroDivisionError: integer division or modulo by zero

In [15]:
# Exception - invalid operation
input = '40'
float(input)/11  # float() is needed: '40'/11 would raise a TypeError (incompatible types)


Out[15]:
3.6363636363636362
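
The cell above runs cleanly because float() converts the string before the division. A minimal sketch of the failure the comment alludes to, dividing the string directly, might look like this; the names `raw` and `message` are illustrative and not from the original notebook:

```python
raw = '40'  # a string, as if read from user input

try:
    raw / 11  # str / int is not a defined operation
    message = 'no error'
except TypeError as err:
    # Python refuses to divide a string by an integer.
    message = 'TypeError: %s' % err

print(message)
```

The exact wording of the error text differs between Python versions, so only the exception type should be relied on.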

In [17]:
# Incorrect logic
import math
x = 55
math.sin(x)**2 + math.cos(x)**2 == 1  # Easy to mistype as math.cos(x**2), which runs without an exception but is wrong


Out[17]:
True

Question: If incorrect code is never executed, is it a bug?

This is the software equivalent of "If a tree falls and no one hears it, does it make a sound?"

How Do We Find And Resolve Bugs?

Debugging has the following steps:

  1. Detection of an exception or invalid results. We deal with this in depth in testing.
  2. Isolation of where the program causes the error. This is often the most difficult step.
  3. Resolution of how to change the code to eliminate the error. Usually this is straightforward, but sometimes it requires major revisions to the code.
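
The three steps can be sketched on a toy case. The functions below (`mean_buggy`, `mean_fixed`, `passes_check`) are hypothetical and only meant to illustrate the workflow, not a prescribed method:

```python
def mean_buggy(values):
    # Hypothetical buggy function: off-by-one in the denominator.
    return sum(values) / (len(values) + 1.0)

def mean_fixed(values):
    # Resolution: divide by the actual number of values.
    return sum(values) / float(len(values))

def passes_check(func):
    # Detection: compare against a case whose answer is known by hand.
    return abs(func([2, 4, 6]) - 4.0) < 1e-12

detected_bug = not passes_check(mean_buggy)  # the check fails, so a bug exists
resolved = passes_check(mean_fixed)          # the same check passes after the fix
print(detected_bug, resolved)
```

Isolation is trivial here because the function is a single expression; in larger programs it is the step the techniques in the next section address.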

Isolation Of Bugs

There are three main methods commonly used for bug isolation:

  1. The "thought" method. Think about how your code is structured, and which part of your code would most likely lead to the exception or invalid result.
  2. Inserting print statements (or other logging techniques).
  3. Using a line-by-line debugger like pdb.

Typically, all three are used in combination, often repeatedly.
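
As an illustration of the "other logging techniques" mentioned in item 2, here is a minimal sketch using the standard library's logging module. The function `normalize` and the logger name `debug_demo` are hypothetical, chosen only for this example:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format='%(levelname)s: %(message)s')
log = logging.getLogger('debug_demo')  # hypothetical logger name

def normalize(values):
    """Scale values so they sum to 1 (hypothetical example function)."""
    total = float(sum(values))
    log.debug('values=%s total=%s', values, total)  # inspect the inputs
    result = [v / total for v in values]
    log.debug('result=%s', result)  # inspect the output
    return result

weights = normalize([1, 2, 5])
```

Compared with bare print statements, the debug messages can stay in the code and be silenced later by raising the level to logging.WARNING.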

Say we're trying to compute the entropy of a set of probabilities. The form of the equation is $$ H = -\sum_i p_i \log(p_i) $$ We can write the function like this:


In [20]:
def entropy(ps):
    items = ps * np.log(ps)
    return -np.sum(items)

In [22]:
ps = [0.1, 0.3, 0.5, 0.7, 0.9]
entropy(ps)


Out[22]:
1.2825208657263143

What's the bug here, and how do we resolve it?


In [ ]:
def entropy(ps):
    print(ps)
    items = ps * np.log(ps)
    return -np.sum(items)
ps = "0.1, 0.3, 0.5, 0.7, 0.9"
entropy(ps)

We should have documented the inputs to the function!
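
One possible way to document the inputs, sketched under the assumption that the function should accept any sequence of numbers and reject strings outright. The name `entropy_checked` and the explicit string check are additions for illustration, not part of the original notebook:

```python
import numpy as np

def entropy_checked(ps):
    """Return the entropy -sum(p * log(p)) of a set of probabilities.

    ps : list, tuple, or array of numbers
        The probabilities. A string such as "0.1, 0.3" is rejected.
    """
    if isinstance(ps, str):
        # Assumption: fail early with a clear message instead of a NumPy error.
        raise TypeError("ps must be a sequence of numbers, not a string")
    ps = np.asarray(ps, dtype=float)
    items = ps * np.log(ps)
    return -np.sum(items)

print(entropy_checked([0.5, 0.5]))
```

The docstring tells the caller what is expected, and the check turns a confusing failure deep inside NumPy into an error that names the actual problem.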


In [ ]:
def entropy(ps):
    print(ps)
    items = ps * np.log(ps)
    return -np.sum(items)
ps = [0.1, 0.3, 0.5, 0.7, 0.9]
entropy(ps)

Now it works fine for the first set of inputs. Let's try other inputs.


In [23]:
# Create a vector of probabilities.
ps = np.arange(5.)
ps /= ps.sum()
ps


Out[23]:
array([ 0. ,  0.1,  0.2,  0.3,  0.4])

In [24]:
entropy(ps)


/home/ubuntu/miniconda2/lib/python2.7/site-packages/ipykernel/__main__.py:2: RuntimeWarning: divide by zero encountered in log
  from ipykernel import kernelapp as app
/home/ubuntu/miniconda2/lib/python2.7/site-packages/ipykernel/__main__.py:2: RuntimeWarning: invalid value encountered in multiply
  from ipykernel import kernelapp as app
Out[24]:
nan

We get nan, which stands for "Not a Number". What's going on here?

Often the first thing to try is simply to print things and see what's going on. You can add print statements at key places in the function:


In [30]:
def entropy1(ps):
    print("ps=%s" % str(ps))
    items = ps * np.log(ps)
    if np.isnan(items[0]):
        print(items)
    return -np.sum(items)

In [32]:
entropy1([.1, .2])


ps=[0.1, 0.2]
Out[32]:
0.55214609178622465

In [ ]:
np.isnan(np.nan)

By printing some of the intermediate items, we see the problem: 0 * np.log(0) results in NaN. Although mathematically $\lim_{x \to 0} [x \log(x)] = 0$, the numerical computation evaluates np.log(0) to -inf, and 0 * -inf is NaN, so we don't obtain this result.
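
One possible resolution, sketched under the assumption that zero-probability entries should contribute exactly zero to the sum (matching the limit above), is to drop them before taking the logarithm. The name `entropy_safe` is hypothetical:

```python
import numpy as np

def entropy_safe(ps):
    ps = np.asarray(ps, dtype=float)
    # Keep only strictly positive entries; zeros contribute 0 by convention.
    # Assumption: negative entries are invalid input and are dropped here too.
    nonzero = ps[ps > 0]
    return -np.sum(nonzero * np.log(nonzero))

ps = np.arange(5.)
ps /= ps.sum()
h = entropy_safe(ps)
print(h)
```

Because np.log is never called on a zero, no RuntimeWarning is emitted and the result is a finite number rather than nan.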

Often, inserting a few print statements can be enough to figure out what's going on.


In [ ]:
def entropy2(ps):
    ps = np.asarray(ps)  # convert ps to an array if necessary
    print(ps)
    items = []
    for val in ps:
        item = val * np.log(val)
        if np.isnan(item):
            print("%f makes a nan" % val)
        items.append(item)
    #items = ps * np.log(ps)
    return -np.sum(items)

In [ ]:
entropy2(ps)
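
Print statements are not the only way to locate the offending operation. As a sketch, NumPy's np.errstate context manager can promote the floating-point warnings seen earlier to exceptions, so the traceback points at the failing line. The name `entropy_strict` is hypothetical:

```python
import numpy as np

def entropy_strict(ps):
    ps = np.asarray(ps, dtype=float)
    # Promote floating-point warnings to exceptions inside this block only.
    with np.errstate(divide='raise', invalid='raise'):
        items = ps * np.log(ps)
    return -np.sum(items)

try:
    entropy_strict([0.0, 0.5, 0.5])
    outcome = 'no error'
except FloatingPointError as err:
    # np.log(0.0) triggers the divide-by-zero condition.
    outcome = 'FloatingPointError: %s' % err

print(outcome)
```

This is mainly useful while debugging; whether production code should raise or tolerate such values is a separate design decision.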

Using a Debugger

Python comes with a built-in debugger called pdb. It allows you to step line-by-line through a computation and examine what's happening at each step. Note that this should probably be your last resort in tracking down a bug. I've probably used it a dozen times or so in five years of coding. But it can be a useful tool to have in your toolbelt.

You can use the debugger by inserting the line

import pdb; pdb.set_trace()

within your script. To leave the debugger, type "exit()". To see the commands you can use, type "help".

Let's try this out:


In [34]:
def entropy(ps):
    items = ps * np.log(ps)
    if np.isnan(items[0]):
        import pdb; pdb.set_trace()
    return -np.sum(items)

This can be a more convenient way to debug programs and step through the actual execution.


In [35]:
ps = [0, .1, .1, .3]
entropy(ps)


/home/ubuntu/miniconda2/lib/python2.7/site-packages/ipykernel/__main__.py:2: RuntimeWarning: divide by zero encountered in log
  from ipykernel import kernelapp as app
/home/ubuntu/miniconda2/lib/python2.7/site-packages/ipykernel/__main__.py:2: RuntimeWarning: invalid value encountered in multiply
  from ipykernel import kernelapp as app
> <ipython-input-34-9580c428fbbd>(5)entropy()
-> return -np.sum(items)
(Pdb) ps
[0, 0.1, 0.1, 0.3]
(Pdb) items
array([        nan, -0.23025851, -0.23025851, -0.36119184])
(Pdb) np.log(0.0)
/home/ubuntu/miniconda2/lib/python2.7/site-packages/ipykernel/__main__.py:1: RuntimeWarning: divide by zero encountered in log
  if __name__ == '__main__':
-inf
(Pdb) exit()
-------------------------------------------------------------------------
BdbQuit                                 Traceback (most recent call last)
<ipython-input-35-b737e7857995> in <module>()
      1 ps = [0, .1, .1, .3]
----> 2 entropy(ps)

<ipython-input-34-9580c428fbbd> in entropy(ps)
      3     if np.isnan(items[0]):
      4       import pdb; pdb.set_trace()
----> 5     return -np.sum(items)

<ipython-input-34-9580c428fbbd> in entropy(ps)
      3     if np.isnan(items[0]):
      4       import pdb; pdb.set_trace()
----> 5     return -np.sum(items)

/home/ubuntu/miniconda2/lib/python2.7/bdb.pyc in trace_dispatch(self, frame, event, arg)
     47             return # None
     48         if event == 'line':
---> 49             return self.dispatch_line(frame)
     50         if event == 'call':
     51             return self.dispatch_call(frame, arg)

/home/ubuntu/miniconda2/lib/python2.7/bdb.pyc in dispatch_line(self, frame)
     66         if self.stop_here(frame) or self.break_here(frame):
     67             self.user_line(frame)
---> 68             if self.quitting: raise BdbQuit
     69         return self.trace_dispatch
     70 

BdbQuit: