Today we'll cover how to deal with errors in your Python code, an important aspect of writing software.
According to Wikipedia (accessed 16 Oct 2018), a software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or behave in unintended ways.
Engineers used the term well before electronic computers and software existed; Thomas Edison is sometimes credited with the first recorded use of "bug" in this sense. [Wikipedia]
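Let's discuss three major types of bugs in your code, from easiest to most difficult to diagnose:
- Syntax errors: the code is not valid Python, so it cannot run at all.
- Runtime errors: the code is valid Python but fails while executing, usually by raising an exception.
- Semantic errors: the code runs without complaint but contains a logic error, so it produces a wrong result.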
In [1]:
import numpy as np
In [38]:
print "This should only work in Python 2.x, not 3.x used in this class."  # SyntaxError in Python 3, where print is a function
In [17]:
x = 1; y = 2
b = x == y # Boolean variable that is true when x & y have the same value
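b = 1 = 2  # SyntaxError: cannot assign to a literal, so Python rejects the whole cell before running any of it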
In [18]:
b
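Because syntax errors are found when Python compiles the source, before any line executes, you can check a snippet for them without running it. The sketch below uses the built-in compile(); the helper name find_syntax_error is only an illustration, and the exact error messages differ between Python versions.

```python
def find_syntax_error(source):
    """Return a short description of the first SyntaxError in `source`,
    or None if it compiles. Nothing in `source` is executed."""
    try:
        compile(source, "<snippet>", "exec")  # parse and compile only
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


snippets = [
    'print "hello"',             # Python 2 print statement
    "b = 1 = 2",                 # assignment to a literal
    "x = 1; y = 2; b = x == y",  # valid Python
]
for source in snippets:
    print(repr(source), "->", find_syntax_error(source))
```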
In [11]:
# invalid operation: raises ZeroDivisionError at runtime
a = 0
5/a  # Division by zero
In [10]:
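# invalid operation: raises TypeError at runtime
text = '40'  # avoid naming this `input`, which would shadow the built-in function
text/11  # Incompatible types: a str cannot be divided by an int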
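Runtime errors, by contrast, only appear when the offending line actually executes, and they arrive as exceptions whose type tells you what went wrong. A minimal sketch (error_name is a hypothetical helper written just for this illustration) that runs a callable and reports which exception, if any, it raised:

```python
def error_name(func):
    """Call func() and return the name of the exception it raises, or None."""
    try:
        func()
    except Exception as err:  # catch broadly only for this demonstration
        return type(err).__name__
    return None


print(error_name(lambda: 5 / 0))           # ZeroDivisionError
print(error_name(lambda: "40" / 11))       # TypeError
print(error_name(lambda: int("40") / 11))  # None: converting first works
```

In real code, catch the specific exception you expect (for example except ZeroDivisionError:) rather than Exception, so unexpected bugs still surface.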
In [31]:
import math

def check_pythagorean_identity(theta):
    '''Checks that Pythagorean identity holds for one input, theta'''
    # Deliberate semantic bug: math.cos(theta)*2 should be math.cos(theta)**2
    return math.sin(theta)**2 + math.cos(theta)*2 == 1
In [32]:
check_pythagorean_identity(12)
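The function above runs without any error, yet it returns False for theta = 12: that is a semantic error. It has two problems. math.cos(theta)*2 doubles the cosine instead of squaring it, and comparing floating-point results with == is fragile, because rounding can leave the sum a few units in the last place away from 1. A corrected sketch, using math.isclose (Python 3.5+) for the comparison:

```python
import math


def check_pythagorean_identity(theta):
    """Check that sin(theta)**2 + cos(theta)**2 equals 1 for one input, theta."""
    total = math.sin(theta) ** 2 + math.cos(theta) ** 2  # square both terms
    # Compare with a tolerance instead of ==, since floating-point rounding
    # can make the sum differ from 1 in the last few bits.
    return math.isclose(total, 1.0)


print(check_pythagorean_identity(12))  # True
```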
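Debugging has the following steps:
- Detection of an exception or an invalid result.
- Isolation of the part of the program that causes the error.
- Correction of the code to eliminate the error.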
Bugs are too often detected by chance: while running your Python code, you encounter unexpected behavior, exceptions, or syntax errors. We'll focus on this kind of detection in today's lecture, but in the future you should not leave it up to chance.
Software testing practices allow bugs to be detected deliberately and systematically. We'll discuss this further in the lecture on testing.
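As a small preview, here is a sketch of a test that would have caught the semantic error in check_pythagorean_identity on purpose rather than by chance. It checks the identity at a few angles; the function names are illustrative, and test runners such as pytest collect functions whose names start with test_.

```python
import math


def buggy_check(theta):
    # the original version, with the *2 typo
    return math.sin(theta)**2 + math.cos(theta)*2 == 1


def fixed_check(theta):
    return math.isclose(math.sin(theta) ** 2 + math.cos(theta) ** 2, 1.0)


def identity_holds_everywhere(check):
    """Return True if `check` accepts every test angle."""
    angles = [0.0, math.pi / 6, math.pi / 4, 1.0, 12.0]
    return all(check(theta) for theta in angles)


def test_check_pythagorean_identity():
    assert identity_holds_everywhere(fixed_check)


test_check_pythagorean_identity()              # passes silently
print(identity_holds_everywhere(buggy_check))  # False: the test would fail on it
```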
There are three main methods commonly used for bug isolation:
- print statements (or other logging techniques)
- pdb, Python's built-in line-by-line debugger

Typically, all three are used in combination, often repeatedly.
In [82]:
def entropy(p):
    items = p * np.log(p)
    return -np.add(items)
If we can't easily see the bug here, let's add print statements to see the variables change over time.
Note that this needs a slight refactor: we store the return value in a variable, result, so that we can print it before returning it.
def entropy(p):
    print(p)
    items = p * np.log(p)
    print(items)
    result = -np.sum(items)
    print(result)
    return result
With several unlabeled print statements, it quickly becomes hard to tell which output line belongs to which variable. Adding a label to each print makes the output much easier to read:
def entropy(p):
    print("p=%s" % p)
    items = p * np.log(p)
    print("items=%s" % items)
    result = -np.sum(items)
    print("result=%s" % result)
    return result
In [73]:
np.add?
Note that the print statements significantly reduce legibility of the code. We would like to remove them when we're done debugging.
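One way to avoid deleting and re-adding print statements is the standard library's logging module: the debug messages stay in the code and are switched on or off by changing the logging level. A minimal sketch, assuming a module-level logger; the level and message format chosen here are just examples:

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def entropy(p):
    # %s arguments are only formatted if the message is actually emitted
    logger.debug("p=%s", p)
    items = p * np.log(p)
    logger.debug("items=%s", items)
    result = -np.sum(items)
    logger.debug("result=%s", result)
    return result


# Show debug messages while hunting the bug
# (force=True, Python 3.8+, replaces any handlers already configured)...
logging.basicConfig(level=logging.DEBUG, format="%(funcName)s: %(message)s", force=True)
entropy(np.array([0.25, 0.75]))

# ...then silence them without touching the function body.
logger.setLevel(logging.WARNING)
entropy(np.array([0.25, 0.75]))
```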
In [97]:
def entropy(p):
    items = p * np.log(p)
    return -np.sum(items)
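The fix is to replace np.add with np.sum. np.add is a binary ufunc that adds two arrays element by element, so calling it with a single array raises an exception; np.sum (or, equivalently, np.add.reduce) adds up the elements of one array. A quick comparison:

```python
import numpy as np

items = np.array([1.0, 2.0, 3.0])

print(np.add(items, items))  # element-wise sum of two arrays: [2. 4. 6.]
print(np.sum(items))         # sum of one array's elements: 6.0
print(np.add.reduce(items))  # the same reduction via the ufunc: 6.0
print(np.add.nin)            # np.add expects 2 inputs: 2

try:
    np.add(items)            # only one input
except (TypeError, ValueError) as err:  # the exception type varies by NumPy version
    print("np.add(items) failed:", err)
```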
In [80]:
p = [0.1, 0.3, 0.5, 0.7, 0.9]
entropy(p)
Now it works fine for the first set of inputs. Let's try other inputs.
We should have documented the inputs to the function!
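A docstring stating what p is expected to be would have helped. A sketch in the NumPy docstring style; the stated requirements (non-negative entries that sum to 1, natural logarithm) are our assumption about how the function is meant to be used, not something established earlier:

```python
import numpy as np


def entropy(p):
    """Compute the Shannon entropy of a discrete probability distribution.

    Parameters
    ----------
    p : array_like of float
        Probabilities of each outcome. Entries are assumed to be
        non-negative and to sum to 1.

    Returns
    -------
    float
        The entropy -sum(p * log(p)), using the natural logarithm (nats).
    """
    items = p * np.log(p)
    return -np.sum(items)


print(entropy.__doc__)  # the text that help(entropy) and entropy? display
```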
In [101]:
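# Create a vector of probabilities: 5.0, 4.5, ..., 0.5, 0.0, normalized to sum to 1.
# (stop is exclusive, so stop=-0.5 makes 0.0 the last value and avoids a negative entry)
p = np.arange(start=5., stop=-0.5, step=-0.5)
p /= np.sum(p)
p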
In [105]:
entropy(p)
We get nan, which stands for "Not a Number". What's going on here?
Let's add our print statements again. Since the computation only fails for some entries later in the array, we may choose to print only when we find a nan.
In [144]:
def entropy1(p):
    print("p=%s" % str(p))
    items = p * np.log(p)
    if np.any(np.isnan(items)):  # print only when at least one item is nan
        print(items)
    return -np.sum(items)
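A pitfall to watch for when writing this kind of check: a Python list is truthy whenever it is non-empty, regardless of its contents, so if [np.isnan(el) for el in items]: is true for any non-empty array. np.any asks the intended question, "is any entry NaN?":

```python
import numpy as np

items = np.array([-0.23, -0.32])  # no NaN here

flags = [np.isnan(el) for el in items]
print(flags)                    # two False values
print(bool(flags))              # True: the list is non-empty
print(np.any(np.isnan(items)))  # False: no entry is NaN

items_with_nan = np.array([-0.23, np.nan])
print(np.any(np.isnan(items_with_nan)))  # True
```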
In [142]:
entropy1([.1, .2])
In [145]:
entropy1(p)
By printing some of the intermediate items, we see the problem: 0 * np.log(0) results in a NaN. Mathematically it's true that $\lim_{x \to 0^{+}} x \log(x) = 0$, but we're computing numerically: np.log(0) evaluates to -inf, and in floating-point arithmetic 0 * (-inf) is NaN, so we don't obtain this result.
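One common way to get the mathematically expected result is to adopt the convention 0 log 0 = 0 explicitly and only take the logarithm of strictly positive entries. A sketch using a boolean mask; the name entropy_safe is hypothetical, and rejecting negative entries is an extra assumption about valid input. If SciPy is available, scipy.special.xlogy(p, p), which returns 0 where its first argument is 0, is an alternative.

```python
import numpy as np


def entropy_safe(p):
    """Entropy -sum(p * log(p)), treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    if np.any(p < 0):
        raise ValueError("probabilities must be non-negative")
    items = np.zeros_like(p)  # zero entries contribute 0 by convention
    positive = p > 0          # log is only evaluated where p > 0
    items[positive] = p[positive] * np.log(p[positive])
    return -np.sum(items)


print(entropy_safe([0.5, 0.5, 0.0]))  # about 0.6931, i.e. log(2), instead of nan
```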
Often, inserting a few print statements can be enough to figure out what's going on.
In [140]:
def entropy2(p):
    p = np.asarray(p)  # convert p to an array if necessary
    print(p)
    items = []
    for val in p:
        item = val * np.log(val)
        if np.isnan(item):
            print("%f makes a nan" % val)
        items.append(item)
    # vectorized equivalent of the loop: items = p * np.log(p)
    return -np.sum(items)
In [114]:
entropy2(p)
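The loop in entropy2 works, but the same information is available without a loop: compute all the items at once, then use np.isnan and np.flatnonzero to find which inputs produced a NaN. np.errstate suppresses the RuntimeWarnings NumPy emits for log(0), log of a negative number, and 0 * inf while we trigger them on purpose. A sketch (find_nan_inputs is an illustrative name):

```python
import numpy as np


def find_nan_inputs(p):
    """Return the entries of p for which p * log(p) is NaN."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):  # silence expected warnings
        items = p * np.log(p)
    nan_positions = np.flatnonzero(np.isnan(items))  # indices of NaN items
    return p[nan_positions]


print(find_nan_inputs([0.5, 0.0, 0.5]))   # [0.]
print(find_nan_inputs([0.1, -0.2, 0.3]))  # [-0.2]
```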
pdb
Python comes with a built-in debugger called pdb. It allows you to step line-by-line through a computation and examine what's happening at each step. Note that this should probably be your last resort in tracking down a bug. I've probably used it a dozen times or so in five years of coding. But it can be a useful tool to have in your toolbelt.
You can use the debugger by inserting the line
import pdb; pdb.set_trace()
within your script. To leave the debugger, type "q" (short for "quit"; "exit()" also works). To see the commands you can use, type "help".
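Since Python 3.7 there is also a built-in breakpoint() function, which by default does the same thing as import pdb; pdb.set_trace(). A useful side effect: setting the environment variable PYTHONBREAKPOINT to 0 turns every breakpoint() call into a no-op, so a forgotten breakpoint doesn't stop a run. A sketch; setting the variable from inside Python, as here, is only for demonstration, and normally you would set it in the shell.

```python
import os

import numpy as np

# Disable all breakpoint() calls (normally: export PYTHONBREAKPOINT=0 in the shell).
os.environ["PYTHONBREAKPOINT"] = "0"


def entropy(p):
    breakpoint()  # with PYTHONBREAKPOINT unset, this would drop into pdb here
    items = p * np.log(p)
    return -np.sum(items)


print(entropy(np.array([0.5, 0.5])))  # runs straight through without stopping
```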
Let's try this out:
In [148]:
def entropy(p):
    import pdb; pdb.set_trace()
    items = p * np.log(p)
    return -np.sum(items)
This can be a more convenient way to debug programs and step through the actual execution.
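To see what a pdb session looks like without typing at the prompt, you can feed the debugger its commands from a string. This sketch is for illustration only: it passes scripted commands to pdb.Pdb through its stdin and stdout arguments and uses runcall() to start debugging at the first line of entropy. The commands are n (execute the current line), p items (print a variable) and c (continue running).

```python
import io
import pdb

import numpy as np


def entropy(p):
    items = p * np.log(p)
    return -np.sum(items)


# The commands we would otherwise type at the (Pdb) prompt, one per line:
#   n        execute the current line and stop at the next one
#   p items  print the value of the variable `items`
#   c        continue running until the function returns
commands = io.StringIO("n\np items\nc\n")
transcript = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)
result = debugger.runcall(entropy, np.array([0.5, 0.5]))

print(transcript.getvalue())  # what the interactive session would have shown
print(result)
```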
In [ ]:
p = [.1, -.2, .3]  # contains a negative "probability"; np.log of a negative number is nan
entropy(p)
In [81]:
p = "[0.1, 0.3, 0.5, 0.7, 0.9]"  # a string, not a list of numbers
entropy(p)
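Rather than discovering a bad input inside the debugger, a function can check its input up front and fail with a clear message. A sketch, assuming the intended contract is "a sequence of non-negative numbers"; the helper name validate_probabilities and its error messages are our own choices:

```python
import numpy as np


def validate_probabilities(p):
    """Convert p to a float array, raising a clear error for bad input."""
    if isinstance(p, str):
        raise TypeError(f"p must be a sequence of numbers, not a string: {p!r}")
    p = np.asarray(p, dtype=float)  # raises ValueError for non-numeric entries
    if np.any(p < 0):
        raise ValueError(f"probabilities must be non-negative, got {p}")
    return p


def entropy(p):
    p = validate_probabilities(p)
    items = p * np.log(p)
    return -np.sum(items)


for bad_input in ([.1, -.2, .3], "[0.1, 0.3, 0.5, 0.7, 0.9]"):
    try:
        entropy(bad_input)
    except (TypeError, ValueError) as err:
        print(type(err).__name__, "->", err)
```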