In [1]:
import numpy as np
Bugs are pieces of code that cause errors or produce wrong results.
In [9]:
# Syntax error
x = 1; y = 2
b = x == y # Boolean variable that is true when x and y have the same value
b = 1 = 2 # Syntax error: cannot assign to a literal
In [5]:
b
Out[5]:
In [11]:
# Exception - invalid operation
a = 0
5/a # Division by zero
In [15]:
# Exception - invalid operation
value = '40'
float(value)/11 # Without float(), '40'/11 raises a TypeError (incompatible types for the operation)
Out[15]:
In [17]:
# Incorrect logic
import math
x = 55
math.sin(x)**2 + math.cos(x) == 1 # Bug: should be math.cos(x)**2
Out[17]:
Question: If incorrect code is never executed, is it a bug?
This is the software equivalent to "If a tree falls and no one hears it, does it make a sound?"
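One way to make the question concrete is a function whose faulty branch is rarely reached. The sketch below uses a hypothetical `safe_ratio` helper (not part of the original notebook): the `b == 0` branch refers to a name that is never defined, yet Python reports nothing until that branch actually runs.

```python
def safe_ratio(a, b):
    """Hypothetical helper with a latent bug in a rarely used branch."""
    if b == 0:
        # Latent bug: `fallback` is never defined, but Python only
        # notices when this line is executed.
        return fallback
    return a / b


print(safe_ratio(4, 2))  # the faulty branch is not executed

try:
    safe_ratio(4, 0)  # now the faulty branch runs
except NameError as err:
    print("latent bug surfaced:", err)
```

Whether or not you call the unexecuted branch a bug, it stays invisible until some input exercises it, which is one reason to test unusual inputs.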
Debugging has the following steps:
There are three main methods commonly used for bug isolation, among them print statements (or other logging techniques) and the debugger pdb. Typically, all three are used in combination, often repeatedly.
In [20]:
def entropy(ps):
    items = ps * np.log(ps)
    return -np.sum(items)
In [22]:
ps = [0.1, 0.3, 0.5, 0.7, 0.9]
entropy(ps)
Out[22]:
What's the bug here, and how do we resolve it?
In [ ]:
def entropy(ps):
    print(ps)
    items = ps * np.log(ps)
    return -np.sum(items)
ps = "0.1, 0.3, 0.5, 0.7, 0.9" # a string, not a list of numbers
entropy(ps)
We should have documented the inputs to the function!
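As a minimal sketch of what that documentation could look like, the version below adds a docstring and an early type check. The name `entropy_documented` and the choice to raise `TypeError` for strings are illustrative assumptions, not part of the original function.

```python
import numpy as np


def entropy_documented(ps):
    """Return the entropy -sum(p * log(p)) of a set of probabilities.

    Parameters
    ----------
    ps : sequence of float
        Probabilities, e.g. [0.1, 0.3, 0.6]. Strings are rejected.
    """
    if isinstance(ps, str):
        # Assumed policy: fail early with a clear message.
        raise TypeError("ps must be a sequence of numbers, not a string")
    ps = np.asarray(ps, dtype=float)  # also fails on non-numeric entries
    items = ps * np.log(ps)
    return -np.sum(items)


print(entropy_documented([0.5, 0.5]))
```

With the check in place, calling `entropy_documented("0.1, 0.3")` fails immediately with a message that names the problem, instead of an error from deep inside NumPy.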
In [ ]:
def entropy(ps):
    print(ps)
    items = ps * np.log(ps)
    return -np.sum(items)
ps = [0.1, 0.3, 0.5, 0.7, 0.9]
entropy(ps)
Now it works fine for the first set of inputs. Let's try other inputs.
In [23]:
# Create a vector of probabilities.
ps = np.arange(5.)
ps /= ps.sum()
ps
Out[23]:
In [24]:
entropy(ps)
Out[24]:
We get nan, which stands for "Not a Number". What's going on here?
Often the first thing to try is to simply print things and see what's going on. Within the file, you can add some print statements in key places:
In [30]:
def entropy1(ps):
    print("ps=%s" % str(ps))
    items = ps * np.log(ps)
    if np.isnan(items[0]):
        print(items)
    return -np.sum(items)
In [32]:
entropy1([.1, .2])
Out[32]:
In [ ]:
np.isnan(np.nan)
By printing some of the intermediate items, we see the problem: 0 * np.log(0) results in a NaN. Though mathematically it's true that $\lim_{x \to 0^+} [x \log(x)] = 0$, the fact that we're performing the computation numerically means that we don't obtain this result: np.log(0) evaluates to -inf, and 0 * -inf is NaN in floating-point arithmetic.
Often, inserting a few print statements can be enough to figure out what's going on.
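The same idea carries over to the standard `logging` module, which lets you leave the diagnostics in place and switch them off by level. This is a sketch under assumptions: the logger name `entropy_demo` and the function name `entropy_logged` are made up for illustration.

```python
import logging

import numpy as np

logger = logging.getLogger("entropy_demo")  # hypothetical logger name


def entropy_logged(ps):
    """Entropy with log messages instead of print statements."""
    ps = np.asarray(ps, dtype=float)
    logger.debug("ps=%s", ps)
    # Silence NumPy's runtime warnings for log(0) and 0 * -inf;
    # the logger reports the problem instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        items = ps * np.log(ps)
    bad = np.isnan(items)
    if bad.any():
        logger.warning("NaN terms for ps values %s", ps[bad])
    return -np.sum(items)


# Use logging.WARNING here to hide the debug messages.
logging.basicConfig(level=logging.DEBUG)
entropy_logged([0.0, 0.5, 0.5])
```

Unlike print statements, these calls do not need to be deleted once the bug is found; raising the level hides them.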
In [ ]:
def entropy2(ps):
    ps = np.asarray(ps) # convert ps to an array if necessary
    print(ps)
    items = []
    for val in ps:
        item = val * np.log(val)
        if np.isnan(item):
            print("%f makes a nan" % val)
        items.append(item)
    #items = ps * np.log(ps)
    return -np.sum(items)
In [ ]:
entropy2(ps)
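The print output points at the zero entry as the source of the NaN. One possible resolution, assuming we adopt the usual convention that 0 * log(0) counts as 0, is to skip zero entries before taking the logarithm. The name `entropy_fixed` is an illustrative choice, not the notebook's own fix.

```python
import numpy as np


def entropy_fixed(ps):
    """Entropy that treats 0 * log(0) as 0 (an assumed convention)."""
    ps = np.asarray(ps, dtype=float)
    # Keep only strictly positive entries so np.log never sees 0.
    # This assumes ps holds non-negative probabilities.
    positive = ps > 0
    return -np.sum(ps[positive] * np.log(ps[positive]))


ps = np.arange(5.)
ps /= ps.sum()
print(entropy_fixed(ps))  # a finite value rather than nan
```

Masking is only one option; the key point is that the zero case is handled explicitly rather than left to floating-point arithmetic.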
Python comes with a built-in debugger called pdb. It allows you to step line by line through a computation and examine what's happening at each step. Note that this should probably be your last resort in tracking down a bug. I've probably used it a dozen times or so in five years of coding. But it can be a useful tool to have in your toolbelt.
You can use the debugger by inserting the line
import pdb; pdb.set_trace()
within your script. To leave the debugger, type "exit()". To see the commands you can use, type "help".
Let's try this out:
In [34]:
def entropy(ps):
    items = ps * np.log(ps)
    if np.isnan(items[0]):
        import pdb; pdb.set_trace()
    return -np.sum(items)
This can be a more convenient way to debug programs and step through the actual execution.
In [35]:
ps = [0, .1, .1, .3]
entropy(ps)
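On Python 3.7 and later, the built-in `breakpoint()` does the same job as `import pdb; pdb.set_trace()` by default. The sketch below guards it behind a `debug` flag so the function can be called normally without stopping; the name `entropy_debug` and the flag are assumptions made for illustration.

```python
import numpy as np


def entropy_debug(ps, debug=False):
    """Entropy that can drop into the debugger when a NaN term appears."""
    ps = np.asarray(ps, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        items = ps * np.log(ps)
    if debug and np.isnan(items).any():
        # Python 3.7+: opens pdb at this line by default.
        breakpoint()
    return -np.sum(items)


# Non-interactive call: returns nan without stopping.
print(entropy_debug([0, .1, .1, .3]))
```

Calling `entropy_debug([0, .1, .1, .3], debug=True)` opens the pdb prompt. There, `p items` prints a variable, `n` runs the next line, `c` continues execution, and `q` quits the debugger.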