Development II: Software Engineering and Testing

John (EBo) David -- heavily borrowed from Jarrod Millman (2014)

Before we begin...

This example uses "tappy" which has not been integrated into Anaconda yet. It can be installed with the following command:

pip install tap.py or %install_ext ...

Definitions:

  • Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation and maintenance of software (IEEE'90)
  • Debugging is what you do when you know a program is broken
  • Testing is a determined, systematic attempt to break a program

Main Motivation:

  • Usability - Users can learn it fast and get their job done easily
  • Efficiency - It doesn’t waste resources such as CPU time and memory
  • Reliability - It does what it is required to do without failing
  • Maintainability - It can be easily changed
  • Reusability - Its parts can be used in other projects, so reprogramming is not needed

Side Note: Pick a license... Any License...

coding horror

Why Bother? I Find Your Lack of Tests Disturbing

  • Over 50% of a project's life-cycle is spent debugging and testing
  • Bug fixes take 33 to 40 times as long to write as new code
  • An additional 15% to 30% investment in testing yields a 40% to 90% reduction in defect density
  • Test suite itself is an asset

How do we do it?

Reproducibility is the key

Programming By Contract

  • Formal methods

Pre- and post-condition tests

  • what must be true before a method is invoked
  • what must be true after a method is invoked
  • use assertions
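As a sketch of this idea (real_sqrt is a made-up example, not part of the notebook), pre- and post-conditions can be written as plain assert statements:

```python
import math

def real_sqrt(x):
    # pre-condition: what must be true before the method is invoked
    assert x >= 0, "pre-condition violated: x must be non-negative"
    result = math.sqrt(x)
    # post-condition: what must be true after the method is invoked
    assert abs(result * result - x) < 1e-9 * max(1.0, x), \
        "post-condition violated: result**2 should reproduce x"
    return result
```

The post-condition uses a tolerance rather than exact equality, for reasons we return to in "Assertions revisited" below.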

Program defensively (Murphy was an optimist)

  • out-of-range index
  • division by zero
  • error returns
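A small sketch of these defensive habits (both helper names are invented for illustration):

```python
def safe_get(seq, i, default=None):
    """Guard against an out-of-range index instead of letting IndexError escape."""
    if 0 <= i < len(seq):
        return seq[i]
    return default

def safe_mean(values):
    """Guard against division by zero; signal bad input with an error return (None)."""
    if not values:
        return None
    return sum(values) / len(values)
```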

Be systematic

  • incremental
  • simple things first
  • know what to expect
  • compare independent implementations

Automate it

  • regression tests ensure that changes don't break existing functionality
  • verify conservation
  • unit tests (white box testing)
  • measure test coverage

Testing in Python

Landscape

  • errors, exceptions, and debugging
  • assert, doctest, and unit tests
  • logging, unittest, and nose
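As a quick taste of doctest (the double function here is a made-up example): doctest extracts the interactive examples embedded in a docstring and checks that they still produce the shown output.

```python
import doctest

def double(x):
    """Return twice its argument.

    >>> double(2)
    4
    >>> double('ab')
    'abab'
    """
    return 2 * x

# Find and run the >>> examples in double's docstring.
results = doctest.DocTestRunner().run(
    doctest.DocTestFinder().find(double)[0])
print(results.failed)   # 0: both examples match
```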

Errors & Exceptions

Syntax Errors

  • Caught by Python parser, prior to execution
  • an arrow marks the point in the line where the parser detected the error

In [ ]:
while True print('Hello world')

Exceptions

  • Caught during run-time

In [ ]:
1/0

In [ ]:
factorial

In [ ]:
'1'+1

Exception handling


In [ ]:
try:
    f = open('test.txt')
except IOError:
    print('No such file')

Raising exceptions


In [ ]:
def newfunction():
    raise NotImplementedError

newfunction()

Fixing bugs


In [ ]:
def bar(y):
    return foo(1 - y)

def foo(x):
    print("test and", end=" ")
    if x == 0.0:
        print("branch...")
        return float('Inf')
    else:
        return 1 / x

bar(1)
bar(1.0)

In [ ]:
def foo(x):
    try:
        print("test and", end=" ")
        return 1 / x
    except ZeroDivisionError:
        print("trap exception...")
        return float('Inf')

bar(1)
bar(1.0)

Test as you code

Type checking


In [ ]:
#i = input("Please enter an integer: ")
i = "123"
if not isinstance(i, int):
    print("Casting", i, "to integer.")
    i = int(i)
else:
    print("already an int:", i)

Assert invariants

Give immediate feedback -- most implementations halt the system. These are unrecoverable errors.

Many languages allow them to be turned on/off at either compile or run time.
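In CPython, for example, the -O interpreter flag strips assert statements at compile time; a quick way to see which mode you are in:

```python
import sys

# sys.flags.optimize reports whether -O (which removes asserts) is on.
print(sys.flags.optimize)   # 0 in a normal (non -O) session

try:
    assert 1 == 2, "this would normally halt the program"
except AssertionError as err:
    print("caught:", err)
```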


In [ ]:
i = 8
if i % 3 == 0:
    print(1)
elif i % 3 == 1:
    print(2)
else:
    assert i % 3 == 2
    print(3)

Example

Let's make a factorial function.


In [ ]:
%%file myfactorial.py

def factorial2(n):
    """ Details to come ...
    """

    raise NotImplementedError

def test():
    from math import factorial
    for x in range(10):
        print(".", end=" ")
        assert factorial2(x) == factorial(x), \
               "My factorial function is incorrect for n = %i" % x

Let's test it ...


In [ ]:
import myfactorial
myfactorial.test()

Looks like we will have to implement our function if we want to make any progress...


In [ ]:
%%file myfactorial.py

def factorial2(n):
    """ Details to come ...
    """

    if n == 0:
        return 1
    else:
        return n*factorial2(n-1)

def test():
    from math import factorial
    for x in range(10):
        assert factorial2(x) == factorial(x), \
               "My factorial function is incorrect for n = %i" % x

Let's test it again...


In [ ]:
from importlib import reload   # reload is not a builtin in Python 3
reload(myfactorial)
myfactorial.test()

Hmmm.... What's going on??

What about preconditions

What happens if we call factorial2 with a negative integer? Or something that's not an integer?


In [ ]:
%%file myfactorial.py
def factorial2(n):
    """ Find n!. Raise an AssertionError if n is negative or non-integral.
    """

    assert isinstance(n, int) and n >= 0, "Unrecognized input"

    if n == 0:
        return 1
    else:
        return n*factorial2(n-1)

def test():
    from math import factorial
    for x in range(10):
        assert factorial2(x) == factorial(x), \
               "My factorial function is incorrect for n = %i" % x

In [ ]:
from myfactorial import factorial2
[factorial2(n) for n in range(5)]

Real world testing and continuous integration

unittest and nose

Test fixtures

  • create self-contained tests
  • setup: open file, connect to a DB, create data structures
  • teardown: tidy up afterward
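A minimal unittest fixture sketch (the class name and file contents are invented for illustration): setUp runs before, and tearDown after, every test method, so each test gets a fresh, self-contained environment.

```python
import os
import tempfile
import unittest

class TestWithFixture(unittest.TestCase):
    def setUp(self):
        # setup: create a scratch data file every test can rely on
        handle, self.path = tempfile.mkstemp()
        os.close(handle)
        with open(self.path, 'w') as f:
            f.write('hello')

    def tearDown(self):
        # teardown: tidy up afterward, whether the test passed or failed
        os.remove(self.path)

    def test_read(self):
        with open(self.path) as f:
            self.assertEqual(f.read(), 'hello')
```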

Test runner

  • nosetests
  • test discovery: any callable whose name begins with "test" in a module whose name begins with "test"
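For example, nosetests would collect and run both functions below, because the (hypothetical) file name and the function names begin with "test":

```python
# test_shapes.py -- a hypothetical module that nose would discover
# automatically; no registration or boilerplate is needed.

def area(w, h):
    return w * h

def test_area_unit_square():
    assert area(1, 1) == 1

def test_area_rectangle():
    assert area(2, 3) == 6
```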

Testing scientific computing libraries


In [ ]:
import scipy.integrate
scipy.integrate.test()

Assertions revisited

Mathematically

$x = (\sqrt{x})^2$.

So what is happening here:


In [ ]:
import math
assert 2 == math.sqrt(2)**2

In [ ]:
math.sqrt(2)**2

NumPy Testing


In [ ]:
import numpy as np
np.testing.assert_almost_equal(2, math.sqrt(2)**2)

In [ ]:
x=1.000001
y=1.000002
np.testing.assert_almost_equal(x,y, decimal=5)

What if we consider x and y almost equal? Can we modify our assertion?


In [ ]:
np.testing.assert_almost_equal?

Testing Philosophies

There are many philosophies and styles of testing: xUnit-style declarative testing; procedural and load testing; record-and-replay; test-driven development (TDD); behavior-driven development (BDD); and the process-level practices of eXtreme Programming (XP) and Agile, among others.

eXtreme Programming (XP):

Test to Live. Live to Test!

Fine-scale feedback

  • Pair programming
  • Planning game
  • Test-driven development
  • Whole team owns the code

Continuous process

  • Continuous integration
  • Refactoring or design improvement
  • Small releases

Shared understanding

  • Coding standards
  • Collective code ownership
  • Simple design
  • System metaphor

Coding

  • The customer is always available
  • Code the unit test first
  • Only one pair integrates code at a time
  • Leave optimization until last

Testing

  • All code must have unit tests
  • All code must pass all unit tests before it can be released.
  • When a bug is found tests are created before the bug is addressed (a bug is not an error in logic, it is a test that was not written)
  • Acceptance tests are run often and the results are published

Problem: Using them together

  • Everything is tightly coupled
  • Writing new tests environments is a pain
  • Everyone has to figure out how to produce and consume test results

A solution: The Test Anything Protocol (TAP)

  • Aims to help by separating the code that produces test results from the code that consumes them

A simple TAP example:

1..4
ok 1 We can foo
ok 2 We can bar
not ok 3 Oh No... Br. Bill
ok 4

TAP: Additional Functionality

  • TAP versions
  • Diagnostic output
  • TODO/SKIP tests
  • Nested TAP (very new)
  • Structured diagnostics (as YAML messages)
  • Draft IETF Standard (WIP)

In [ ]:
!nosetests --with-tap myTAP.py

In [ ]:
!cat FunctionTestCase.tap

In [ ]:
!tappy FunctionTestCase.tap

Why Bother?

TAP is language agnostic. How would you integrate testing across a project with Fortran, C, and Python as well as Bash scripts?

Test Summary Report


t/iterators.t (Tests: 92 Failed: 8)

Failed tests: 7-13, 15

t/nofork-mux.t (Tests: 6 Failed: 0)

t/regression.t (Tests: 4794 Failed: 103)

Failed tests: 2, 5, 31, 34, 58, 61, 85, 88, 114, 118, 145-146, 171-172, 200-201, 226-227, 252, 255, 278-279, 308, 312, 338, 342, 368-369, 395-396, 422, 425, 452, 454-455, 481, 484, 509-510, 538-539, 563, 567, 593, 597, 623, 627, 653, 657, 683-684, 686, 690, 716, 720, 746, 749, 775-776, 803-804, 831-832, 835-837, 866, 870, 896-897, 923-924, 926-927, 929, 955, 958, 984, 987, 1013-1014, 1040, 1043, 1069, 1073, 1099, 1102, 1126-1127, 1129, 1133, 1159, 1163, 1189-1190, 1192, 1196, 1222-1223, 1226-1227, 1253, 1257

Plans=47 Tests=9370

Result: FAIL


In [8]:
%%file bats_tap_ex

@test "no arguments prints usage instructions" {
  run bats
  [ $status -eq 1 ]
  [ $(expr "${lines[1]}" : "Usage:") -ne 0 ]
}

@test "-v and --version print version number" {
  run bats -v
  [ $status -eq 0 ]
  [ $(expr "$output" : "Bats [0-9][0-9.]*") -ne 0 ]
}


Writing bats_tap_ex

In [9]:
!bats bats_tap_ex --tap


1..2
ok 1 no arguments prints usage instructions
ok 2 -v and --version print version number

What do you do if you do not have the tools?

Write your own tap producers...

Breakout Session:

Assignment #1: Write a simple test and run it through nose

Assignment #2: Pick a language (any language) and write a function or subroutine "ok" which takes the following parameters:

  • val -- the local truth value (ie success/fail)
  • message -- the test descriptor
  • test_num -- the test number

and auto-increments test_num.

Now write a couple of tests (1==1, 1==0,...)

What is missing?

Add a "plan" -- i.e. "1.." followed by the number of tests you ran.

Coming Soon!