Defensive programming (1)

How much time do you spend writing software? How much time do you spend debugging that software? It turns out to be very easy to spend a lot of time fixing bugs, and less time than you would like writing new software to do new science. This problem is fairly well understood by the software engineering community, but many scientists don't take advantage of that knowledge. This afternoon we will take a brief look at some of the tools and techniques that can make debugging less painful.

We'll also think a bit about how you can know whether your programmes are correct. This is a much harder, but important, problem. Even minor errors in research code can lead to the retraction of papers, as happened to Geoffrey Chang in 2006 (see http://dx.doi.org/10.1126/science.314.5807.1856). Chang did nothing malicious and committed no fraud, but because of a minor software error he had to retract five papers just before Christmas.

NB: This notebook is designed for teaching about exceptions and error testing. It includes deliberate errors. There are probably accidental errors too.

Mean cell volume

First, we will look at how one programme can produce the wrong answer, and how we can avoid this happening when we use it.


In [2]:
def cell_volume(X, Y, Z):
    # Return the volume of a unit cell 
    # described by lattice vectors X, Y and Z
    # The volume is given by the determinant of
    # the matrix formed by sticking the three 
    # vectors together. i.e.
    #
    #     | X[0] Y[0] Z[0] |
    # V = | X[1] Y[1] Z[1] |
    #     | X[2] Y[2] Z[2] |
    #
    # V = X[0].Y[1].Z[2] + Y[0].Z[1].X[2] 
    #     + X[2].Y[0].Z[1] - Z[0].Y[1].X[2]
    #     - Y[0].X[1].Z[2] - X[0].Z[1].Y[2]
    
    volume = (X[0]*Y[1]*Z[2] + Y[0]*Z[1]*X[2] + X[2]*Y[0]*Z[1]  
           - Z[0]*Y[1]*X[2] - Y[0]*X[1]*Z[2] - X[0]*Z[1]*Y[2])
        
    return volume

In [3]:
cell_volume([4.0,0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0])


Out[3]:
240.0
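Because the test cell above is orthorhombic, every product involving an off-diagonal component is zero, so this check cannot tell a correct determinant from a wrong one. A minimal sketch of a stronger check, offered as an illustration rather than as part of the original exercise, compares the formula used in `cell_volume` with an independent calculation of the same determinant, the scalar triple product X · (Y × Z), on a skewed cell. The names `formula_as_written` and `triple_product` and the skewed test cell are hypothetical. In the formula as written, the third term `X[2]*Y[0]*Z[1]` is the same product as the second term `Y[0]*Z[1]*X[2]`; the cofactor expansion calls for `Z[0]*X[1]*Y[2]` in that position, and a skewed cell exposes the difference.

```python
def formula_as_written(X, Y, Z):
    # Copy of the expression used in cell_volume above, unchanged
    return (X[0]*Y[1]*Z[2] + Y[0]*Z[1]*X[2] + X[2]*Y[0]*Z[1]
            - Z[0]*Y[1]*X[2] - Y[0]*X[1]*Z[2] - X[0]*Z[1]*Y[2])


def triple_product(X, Y, Z):
    # Independent route to the same determinant: X . (Y x Z)
    cross = [Y[1]*Z[2] - Y[2]*Z[1],
             Y[2]*Z[0] - Y[0]*Z[2],
             Y[0]*Z[1] - Y[1]*Z[0]]
    return X[0]*cross[0] + X[1]*cross[1] + X[2]*cross[2]


# Orthorhombic cell: off-diagonal products vanish, so both methods agree
ortho = ([4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0])
print(formula_as_written(*ortho), triple_product(*ortho))    # 240.0 240.0

# Skewed cell: non-zero off-diagonal components expose the difference
skewed = ([4.0, 1.0, 0.0], [0.0, 10.0, 2.0], [3.0, 0.0, 6.0])
print(formula_as_written(*skewed), triple_product(*skewed))  # 240.0 246.0
```

Testing against an independent implementation, on inputs where every term matters, is one way to find errors that a single "nice" test case hides.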

In [4]:
def mean_cell_volume(cell_list):
    # Return the average volume of a list 
    # of unit cells. Each element of cell_list
    # should be a list of three lattice vectors, 
    # each with three components. The volume of
    # each cell is calculated and summed before 
    # being divided by the number of cells to give
    # the mean volume.
    
    num_cells = 0
    sum_volume = 0.0
    for cell in cell_list:
        X = cell[0]
        Y = cell[1]
        Z = cell[2]
        sum_volume = sum_volume + cell_volume(X, Y, Z)
        num_cells = num_cells + 1
    
    mean_volume = sum_volume/num_cells
    
    return mean_volume
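`mean_cell_volume` divides by `num_cells`, so an empty `cell_list` reaches `sum_volume/num_cells` with `num_cells` still 0 and raises `ZeroDivisionError`. A minimal sketch of an explicit guard follows. The name `mean_cell_volume_checked` is hypothetical, and so is the `volume` argument: it is an assumption made here so the block runs on its own (in the notebook it would be `cell_volume`), and `box_volume` is a stand-in valid only for orthorhombic cells. The guard raises `ValueError` instead of using `assert`, because Python skips `assert` statements when run with the `-O` option, whereas an explicit `raise` always runs.

```python
def mean_cell_volume_checked(cell_list, volume):
    # Return the mean of volume(X, Y, Z) over the cells in cell_list.
    # Refuse an empty list up front instead of failing with ZeroDivisionError.
    if len(cell_list) == 0:
        raise ValueError('cell_list is empty: mean volume is undefined')
    sum_volume = 0.0
    for X, Y, Z in cell_list:
        sum_volume = sum_volume + volume(X, Y, Z)
    return sum_volume / len(cell_list)


def box_volume(X, Y, Z):
    # Stand-in volume function, correct for orthorhombic cells only
    return X[0] * Y[1] * Z[2]


cells = [[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
         [[10.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]]
print(mean_cell_volume_checked(cells, box_volume))   # 240.0

try:
    mean_cell_volume_checked([], box_volume)
except ValueError as err:
    empty_error = err
    print('Caught:', err)
```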

In [8]:
mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
                 [[10.0, 0.0,0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]]) 
# Answer should be 240.0: both cells have the same volume as the one above


Out[8]:
240.0

In [9]:
mean_cell_volume([[[4.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
                 [[10.0, 0.0,0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]]) 

# Try removing one of the values: this gives an error.

# The traceback shows the function stack: first the call made in
# this cell, then where the error happened inside each called function.


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-64d7747e3bb0> in <module>()
      1 mean_cell_volume([[[4.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
----> 2                  [[10.0, 0.0,0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]]) 
      3 
      4 # Try removing one of the values.

<ipython-input-4-363b41d915d2> in mean_cell_volume(cell_list)
     14         Y = cell[1]
     15         Z = cell[2]
---> 16         sum_volume = sum_volume + cell_volume(X, Y, Z)
     17         num_cells = num_cells + 1
     18 

<ipython-input-2-68194218b676> in cell_volume(X, Y, Z)
     15 
     16     volume = (X[0]*Y[1]*Z[2] + Y[0]*Z[1]*X[2] + X[2]*Y[0]*Z[1]  
---> 17            - Z[0]*Y[1]*X[2] - Y[0]*X[1]*Z[2] - X[0]*Z[1]*Y[2])
     18 
     19     return volume

IndexError: list index out of range
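The traceback lists the frames from the outermost call (the notebook cell, shown as `<module>`) down to the line that actually raised the exception. That is why the `IndexError` is reported inside `cell_volume`, even though the bad data was passed to `mean_cell_volume`. As a small sketch, Python's standard `traceback` module can list the same frames programmatically; `outer` and `inner` are hypothetical stand-ins for `mean_cell_volume` and `cell_volume`.

```python
import traceback


def inner(vector):
    # Fails with IndexError when vector has fewer than three components
    return vector[2]


def outer(vectors):
    total = 0.0
    for v in vectors:
        total = total + inner(v)
    return total


try:
    outer([[4.0, 0.0, 0.0], [4.0, 0.0]])   # the second vector is too short
except IndexError as err:
    # extract_tb returns one FrameSummary per frame, outermost first
    frames = traceback.extract_tb(err.__traceback__)
    frame_names = [frame.name for frame in frames]
    for frame in frames:
        print(frame.name, frame.lineno, frame.line)
```

The last entry is always the frame that raised the exception, so reading a traceback from the bottom up usually finds the failing line fastest.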

In [10]:
mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
                 [[10.0, 0.0,0.0], [0.0, -4.0, 0.0], [0.0, 0.0, 6.0]]]) 
# Gives 0.0, but the answer should be 240.0


Out[10]:
0.0

In [12]:
cell_volume([10.0, 0.0,0.0], [0.0, -4.0, 0.0], [0.0, 0.0, 6.0])
# Volume returned is negative


Out[12]:
-240.0
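The sign comes from the order and direction of the lattice vectors: the determinant is positive when X, Y and Z form a right-handed set and negative when they form a left-handed one, as here, where the second vector points along -y. The two volumes, 240.0 and -240.0, therefore cancel in the mean. One possible fix, sketched below under the assumption that left-handed cells are acceptable input, is to take the absolute value; rejecting such cells instead is an equally valid design choice. The names `signed_volume` and `unsigned_volume` are hypothetical.

```python
def signed_volume(X, Y, Z):
    # Determinant of the matrix with columns X, Y, Z (scalar triple product);
    # negative for a left-handed set of vectors
    return (X[0] * (Y[1]*Z[2] - Y[2]*Z[1])
            + X[1] * (Y[2]*Z[0] - Y[0]*Z[2])
            + X[2] * (Y[0]*Z[1] - Y[1]*Z[0]))


def unsigned_volume(X, Y, Z):
    # Physical volume: the magnitude of the determinant, whatever the handedness
    return abs(signed_volume(X, Y, Z))


left_handed = ([10.0, 0.0, 0.0], [0.0, -4.0, 0.0], [0.0, 0.0, 6.0])
print(signed_volume(*left_handed))    # -240.0
print(unsigned_volume(*left_handed))  # 240.0
```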

In [17]:
# Assertions: statements that check a condition. If the condition holds,
# execution carries on; otherwise an AssertionError is raised.
#      Pre-conditions:  check that the input arguments are sensible, i.e. that the calculation is possible
#      Post-conditions: check that the results of the calculation are sensible
#      Invariants:      check that things are still correct part-way through a function (less useful in short functions)
assert 1.0 > 0.0, 'Something went wrong'
assert 1.0 < 0.0


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-97948230e2db> in <module>()
      1 # Assertions - bits of code to say- if result is what we expect, let it through. Otherwise raise an error
      2 assert 1.0 > 0.0, 'Something went wrong'
----> 3 assert 1.0 < 0.0

AssertionError: 

In [18]:
assert 1.0 < 0.0, 'Something went wrong'


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-18-427ce1022919> in <module>()
----> 1 assert 1.0 < 0.0, 'Something went wrong'

AssertionError: Something went wrong
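The comments in the cell before last name three kinds of check. A minimal sketch showing all three in one function follows; `mean_volume` is a hypothetical helper that averages a list of volumes that have already been computed. The message after the comma is what appears after `AssertionError:` when a check fails, so it is worth making it specific.

```python
def mean_volume(volumes):
    # Pre-condition: there must be at least one volume to average
    assert len(volumes) > 0, 'No volumes supplied'
    total = 0.0
    count = 0
    for v in volumes:
        # Pre-condition on each input: a volume cannot be negative
        assert v >= 0.0, 'Negative volume: %s' % v
        total = total + v
        count = count + 1
        # Invariant: a sum of non-negative volumes stays non-negative
        assert total >= 0.0, 'Running total is negative'
    mean = total / count
    # Post-condition: the mean of non-negative numbers is non-negative
    assert mean >= 0.0, 'Mean volume is negative'
    return mean


print(mean_volume([240.0, 240.0]))   # 240.0

try:
    mean_volume([240.0, -240.0])
except AssertionError as err:
    negative_error = err
    print('AssertionError:', err)
```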

In [19]:
len([1,2,3])


Out[19]:
3

In [20]:
len([1,2,3]) == 3 and len([1,2,3]) == 3


Out[20]:
True

In [21]:
assert len([1,2,3]) == 3 and len([1,2,3]) == 3
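An expression such as `len(X) & len(Y) & len(Z) == 3` reads like "all three lengths equal 3", but `&` is Python's bitwise AND, and it binds more tightly than `==`. The expression is therefore evaluated as `(len(X) & len(Y) & len(Z)) == 3`, which can be true for lengths other than 3. A short sketch with hypothetical lengths shows the difference from the logical `and`:

```python
lengths = (7, 3, 3)   # hypothetical: the first vector has 7 components

# Bitwise AND: 7 & 3 & 3 equals 3, because 0b111 & 0b011 == 0b011
bitwise_check = lengths[0] & lengths[1] & lengths[2] == 3

# Logical check: every length must individually equal 3
logical_check = lengths[0] == 3 and lengths[1] == 3 and lengths[2] == 3

# Equivalent and shorter for any number of vectors
all_check = all(n == 3 for n in lengths)

print(bitwise_check, logical_check, all_check)   # True False False
```

The bitwise version happens to catch the two-component vector used in this notebook (2 & 3 & 3 is 2), which is why the bug is easy to miss.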

In [25]:
def cell_volume(X, Y, Z):
    # Return the volume of a unit cell 
    # described by lattice vectors X, Y and Z
    # The volume is given by the determinant of
    # the matrix formed by sticking the three 
    # vectors together. i.e.
    #
    #     | X[0] Y[0] Z[0] |
    # V = | X[1] Y[1] Z[1] |
    #     | X[2] Y[2] Z[2] |
    #
    # V = X[0].Y[1].Z[2] + Y[0].Z[1].X[2] 
    #     + X[2].Y[0].Z[1] - Z[0].Y[1].X[2]
    #     - Y[0].X[1].Z[2] - X[0].Z[1].Y[2]
    
    assert len(X) == 3 and len(Y) == 3 and len(Z) == 3, 'Vectors not of length 3'
    
    volume = (X[0]*Y[1]*Z[2] + Y[0]*Z[1]*X[2] + X[2]*Y[0]*Z[1]  
           - Z[0]*Y[1]*X[2] - Y[0]*X[1]*Z[2] - X[0]*Z[1]*Y[2])
    
    assert volume >= 0.0, 'Volume is negative'
    return volume

In [26]:
mean_cell_volume([[[4.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
                 [[10.0, 0.0,0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]])


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-26-d2c64ff6d386> in <module>()
      1 mean_cell_volume([[[4.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
----> 2                  [[10.0, 0.0,0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]]) 

<ipython-input-4-363b41d915d2> in mean_cell_volume(cell_list)
     14         Y = cell[1]
     15         Z = cell[2]
---> 16         sum_volume = sum_volume + cell_volume(X, Y, Z)
     17         num_cells = num_cells + 1
     18 

<ipython-input-25-b84f1e236772> in cell_volume(X, Y, Z)
     14     #     - Y[0].X[1].Z[2] - X[0].Z[1].Y[2]
     15 
---> 16     assert len(X) & len(Y) & len(Z) ==3, 'Vectors not of length 3'
     17 
     18     volume = (X[0]*Y[1]*Z[2] + Y[0]*Z[1]*X[2] + X[2]*Y[0]*Z[1]  

AssertionError: Vectors not of length 3

In [28]:
mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
                 [[10.0, 0.0,0.0], [0.0, -4.0, 0.0], [0.0, 0.0, 6.0]]])


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-28-b6c49516e691> in <module>()
      1 mean_cell_volume([[[4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0]],
----> 2                  [[10.0, 0.0,0.0], [0.0, -4.0, 0.0], [0.0, 0.0, 6.0]]]) 

<ipython-input-4-363b41d915d2> in mean_cell_volume(cell_list)
     14         Y = cell[1]
     15         Z = cell[2]
---> 16         sum_volume = sum_volume + cell_volume(X, Y, Z)
     17         num_cells = num_cells + 1
     18 

<ipython-input-25-b84f1e236772> in cell_volume(X, Y, Z)
     19            - Z[0]*Y[1]*X[2] - Y[0]*X[1]*Z[2] - X[0]*Z[1]*Y[2])
     20 
---> 21     assert volume >= 0.0, 'Volume is negative'
     22     return volume

AssertionError: Volume is negative
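Both failures above were checked by eye. A minimal sketch of turning them into automated checks: a hypothetical helper `raises_assertion` calls a function and reports whether it raised `AssertionError`. To keep the block self-contained it includes `checked_volume`, a hypothetical copy of the checked `cell_volume` that writes the length check with `and` and computes the volume as a scalar triple product; both are assumptions of this sketch, not the notebook's code.

```python
def checked_volume(X, Y, Z):
    # Pre-condition: three lattice vectors, each with three components
    assert len(X) == 3 and len(Y) == 3 and len(Z) == 3, \
        'Vectors not of length 3'
    volume = (X[0] * (Y[1]*Z[2] - Y[2]*Z[1])
              + X[1] * (Y[2]*Z[0] - Y[0]*Z[2])
              + X[2] * (Y[0]*Z[1] - Y[1]*Z[0]))
    # Post-condition: the volume must not be negative
    assert volume >= 0.0, 'Volume is negative'
    return volume


def raises_assertion(func, *args):
    # Return True if func(*args) raises AssertionError, otherwise False
    try:
        func(*args)
    except AssertionError:
        return True
    return False


good = raises_assertion(checked_volume,
                        [4.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0])
short = raises_assertion(checked_volume,
                         [4.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 6.0])
negative = raises_assertion(checked_volume,
                            [10.0, 0.0, 0.0], [0.0, -4.0, 0.0], [0.0, 0.0, 6.0])
print(good, short, negative)   # False True True
```

Testing frameworks such as pytest offer `pytest.raises` for the same purpose, which avoids writing a helper like this by hand.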
