How much time do you spend writing software? How much time do you spend debugging that software? It turns out that it is very easy to spend lots of time fixing bugs and less time than you would like writing new software to do new science. This is a problem that is fairly well understood by the software engineering community, but many scientists don't take advantage of this knowledge. This afternoon we will take a brief look at some of the tools and technique to make your debugging less painful.
We'll also think a bit about how you may know if your programmes are correct. This is a much harder but important problem. Even minor errors in research code can lead to the retraction of papers, as happened to Geoffrey Chang in 2006 (see http://dx.doi.org/10.1126/science.314.5807.1856). Chang did nothing malicious and committed no fraud, but because of a minor software error had two retract five papers just before Christmas.
NB: This notebook is designed for teaching about exceptions and error testing. It includes deliberate errors. There are probably accidental errors too.
In [1]:
def cell_volume(X, Y, Z):
# Return the volume of a unit cell
# described by lattice vectors X, Y and Z
# The volume is given by the determinant of
# the matrix formed by sticking the three
# vectors together. i.e.
#
# | X[0] Y[0] Z[0] |
# V = | X[1] Y[1] Z[1] |
# | X[2] Y[2] Z[2] |
#
# V = X[0].Y[1].Z[2] + Y[0].Z[1].X[2]
# + X[2].Y[0].Z[1] - Z[0].Y[1].X[2]
# - Y[0].X[1].Z[2] - X[0].Z[1].Y[2]
volume = (X[0]*Y[1]*Z[2] + Y[0]*Z[1]*X[2] + X[2]*Y[0]*Z[1]
- Z[0]*Y[1]*X[2] - Y[0]*X[1]*Z[2] - X[0]*Z[1]*Y[2])
return volume
In [2]:
cell_volume([4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0])
Out[2]:
In [3]:
def mean_cell_volume(cell_list):
# Return the avarage volume of a list
# of unit cells. Each element of cell_list
# should be a list of three lattice vectors,
# each with three components. The volume of
# each cell is calculated and summed before
# being devided by the number of cells to give
# the mean volume.
num_cells = 0
sum_volume = 0.0
for cell in cell_list:
X = cell[0]
Y = cell[1]
Z = cell[2]
sum_volume = sum_volume + cell_volume(X, Y, Z)
num_cells = num_cells + 1
mean_volume = sum_volume/num_cells
return mean_volume
In [4]:
mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
[[10.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]],
[[6.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 4.0]]])
Out[4]:
In [5]:
mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0]],
[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]]])
In [6]:
mean_cell_volume([])
That something went wrong, where it went wrong, what went wrong, and what the programme was doing at the time. This is an exception.
ZeroDivisionError
)float division by zero
)mean_cell_volume
at line ---> 19
inside cell ipython-input-...
)We can create these ourselves when we run code:
In [7]:
raise Exception("something went wrong")
In [8]:
mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, -10.0, 0.0], [0.0, 0.0, 6.0]],
[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
[[-4.0, 0.0, 0.0], [0.0, -10.0, 0.0], [0.0, 0.0, 6.0]],
[[-4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]]])
Out[8]:
The reason is that there is a bug in cell_volume
.
In [9]:
cell_volume([4.0, 0.0, 0.0], [0.0, -10.0, 0.0], [0.0, 0.0, 6.0])
Out[9]:
The volume should always be positive. We can check for this. This kind of check is known as an assertion.
In [10]:
volume = cell_volume([4.0, 0.0, 0.0], [0.0, -10.0, 0.0], [0.0, 0.0, 6.0])
if (volume < 0.0):
raise AssertionError("The volume must be positive")
print volume
We can write these more easily with the assert statment. It is good practice to put these in your code when you write it (and you know what it does, and what assumptions you have made). These act as a form of documentation as well as a form of protection.
In [11]:
volume = cell_volume([4.0, 0.0, 0.0], [0.0, -10.0, 0.0], [0.0, 0.0, 6.0])
assert volume >= 0.0, "The volume must be positive"
print volume
We can think about three types of assert statment:
precondition - something that must be true at the start of a function in order for it to work correctly.
invariant - something that is always true at a particular point inside a piece of code.
postcondition - something that the function guarantees is true when it finishes.
Lets think of some and add these to the functions above. My collection is inserted below.
In [12]:
def cell_volume(X, Y, Z):
# Return the volume of a unit cell
# described by lattice vectors X, Y and Z
# The volume is given by the determinant of
# the matrix formed by sticking the three
# vectors together. i.e.
#
# | X[0] Y[0] Z[0] |
# V = | X[1] Y[1] Z[1] |
# | X[2] Y[2] Z[2] |
#
# V = X[0].Y[1].Z[2] + Y[0].Z[1].X[2]
# + X[2].Y[0].Z[1] - Z[0].Y[1].X[2]
# - Y[0].X[1].Z[2] - X[0].Z[1].Y[2]
assert len(X) == 3, "X must be a three-vector"
assert len(Y) == 3, "Y must be a three-vector"
assert len(Z) == 3, "Z must be a three-vector"
volume = (X[0]*Y[1]*Z[2] + Y[0]*Z[1]*X[2] + X[2]*Y[0]*Z[1]
- Z[0]*Y[1]*X[2] - Y[0]*X[1]*Z[2] - X[0]*Z[1]*Y[2])
assert volume >= 0.0, "The calculated volume must be positive"
return volume
In [ ]:
def mean_cell_volume(cell_list):
# Return the avarage volume of a list
# of unit cells. Each element of cell_list
# should be a list of three lattice vectors,
# each with three components. The volume of
# each cell is calculated and summed before
# being devided by the number of cells to give
# the mean volume.
num_cells = 0
sum_volume = 0.0
for cell in cell_list:
X = cell[0]
Y = cell[1]
Z = cell[2]
sum_volume = sum_volume + cell_volume(X, Y, Z)
num_cells = num_cells + 1
assert num_cells >= 1, "One or more cells must be provided"
mean_volume = sum_volume/num_cells
return mean_volume
In [13]:
mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0]],
[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]]])
In [14]:
mean_cell_volume([])
In [15]:
cell_volume([4.0, 0.0, 0.0], [0.0, -10.0, 0.0], [0.0, 0.0, 6.0])
If you are interested - the bug is in cell_volume. This should return the absolute value of the determinant. The handedness of the three vectors should not matter.
In [ ]: