Debugging Python Programs

Material download: github.com/krother/talks

Dr. Kristian Rother

www.academis.eu

Goal: are monkeys and bananas genetically similar?

Goal: are monkeys and bananas genetically similar?

Input: Protein sequences (strings)

>tr|A0A075B6H5|A0A075B6H5_HUMAN T cell receptor..
METVVTTLPREGGVGPSRKMLLLLLLLGPGSGLSAV
...

Output

Average character count in chimp, banana and human (as reference).

What is a protein?

Proteins are tiny molecular machines that do all kinds of things in living cells. For example, antibodies, digestive enzymes and spider silk are all made of protein.

Proteins are chains made of 20 chemical building blocks, which is why we can easily represent and analyze them as strings.

Protein S100A8 PDB 1mr8 by Emw - Own work, CC BY-SA 3.0

Task: execute code_buggy/parse_uniprot.py

Fix all bugs

Technique #1: Read the Error Message

parse_uniprot.py 
  File "parse_uniprot.py", line 65
    for aa in AMINO_ACIDS
                        ^
SyntaxError: invalid syntax

Rule of thumb: read Python error messages from bottom to top.

SyntaxError: Something obvious is wrong

Exceptions at Runtime

Python is not the easiest language to debug


In [2]:
data = [1, 2, 3, 4


  File "<ipython-input-2-516badf311e1>", line 1
    data = [1, 2, 3, 4
                      ^
SyntaxError: unexpected EOF while parsing

In [3]:
data = 1, 2, 3, 4]


  File "<ipython-input-3-f2ba22b1c99a>", line 1
    data = 1, 2, 3, 4]
                     ^
SyntaxError: invalid syntax

Nasty fact: Errors propagate

Technique #3: Minimize the input

our input file is TOO BIG!

Create a smaller file with just 3 entries, e.g.:

> tr | ABC12345 | python fake protein (Os=Homo sapiens) tail
PYTHQNMMXVII

Technique #4: Scientific Method

Technique #5: Add print Statements

print() is a bit like shooting holes into a wall to see what is inside

Introspection functions

more elegant alternatives to print():

function purpose
dir(x) examine local namespace
locals(x) examine local namespace
globals(x) examine global namespace
help(x) access help interactively
type(x) examine object type
isinstance(x, cl) examine object type
issubclass(cl1, cl2) examine class hierarchy

Technique #6: Assertions

We creating a consistency check for failing early. Add the following assertion:

assert sum(aa_counts) == len(seq)

in parse_uniprot.py at the end of the parse() function.

Technique #7: Explain the problem to someone

Rubber Duck by Florentijn Hofman in Hong Kong, CC-BY-SA 3.0

Also see: https://en.wikipedia.org/wiki/Rubber_duck_debugging

Technique #8: Interactive Debugger

An interactive debugger allows us to watch our program at work in slow motion


In [ ]:
import pdb
pdb.set_trace()


--Call--
> /home/krother/anaconda3/lib/python3.6/site-packages/IPython/core/displayhook.py(236)__call__()
-> def __call__(self, result=None):

Stepwise Execution in ipdb

command description
l, ll list lines
n execute next line
s step into function
c continue execution
q abort
? see other commands

Breakpoints in ipdb

command description
b list breakpoints
b <file:line> add breakpoint
b <function> add breakpoint
b <file:line>, <condition> add breakpoint with condition
cl <number> remove breakpoint

Technique #9: Code Review

Conduct a code review of pipeline.py with your neighbour.

  • Which part of the code is clear to you? Which is not?
  • Do you find any bugs?
  • What would you improve in the code?

Code Review

The control mechanism of the lock of a vault for nuclear waste has been designed for safe operation. It makes sure that it is only possible to access the vault, if the radiation shields are in place or the radiation level in the vault is below a threshold (DANGER_LEVEL). That means:

  • If the remote-controlled radiation shields are in place, the door may be opened by an authorized operator.
  • If the radiation level in the room is below the threshold, the door may be opened by an authorized operator.
  • An authorized operator may open the door by entering a code.

The code below controls the door lock. Note that the safe state is that no entry is possible. Develop an argument for safety that shows that the code is potentially unsafe.

(adopted from I.Sommerville, Software Engineering, 9th edition)

Trivia: Code reviews are seen as superior to automated testing when engineering safety-critical software.

Code Review

Review the following (fictional!) code for a nuclear vault door:

entry_code = lock.get_entry_code()
if entry_code == lock.authorised_code:
    shield_status = shield.get_status()
    radiation_level = rad_sensor.get()
    if radiation_level < DANGER_LEVEL:
        state = SAFE
    else:
        state = UNSAFE
    if shield_status == shield.in_place():
        state = SAFE
    if state == SAFE:
        door.locked = False
        door.unlock()
    else:
        door.lock()
        door.locked = True

(code adopted from I.Sommerville, Software Engineering, 9th edition)

Technique #10: Log information

Log verbosity levels

Summary: What we know about debugging

  • error messages in Python are not always helpful
  • syntax errors are when Python does not do anything
  • some errors cause a program to stop with an Exception
  • read error messages from bottom to top
  • semantic errors: the program does not do the right thing
  • errors are distinct from the underlying defects
  • defects propagate through the program

10 Debugging Techniques

  1. read the error message first
  2. use minimal input for diagnosing bugs
  3. reproduce the bug
  4. use the scientific method instead of guessing blindly
  5. add print statements to get diagnostic information
  6. assertions check for consistency
  7. explaining the problem to someone often helps
  8. use an interactive debugger
  9. code reviews are a superior method for creating reliable software
  10. logging creates more diagnostic information

More on Debugging

Pro Python Best Practices (Apress, 2017)

More on Python: www.academis.eu

Contact: krother@academis.eu


In [ ]: