Development II: Software Engineering and Testing

John (EBo) David -- heavily borrowed from Jarrod Millman (2014)

Before we begin...

This example uses "tappy" which has not been integrated into Anaconda yet. It can be installed with the following command:

pip install tap.py or %install_ext ...

Definitions:

  • Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation and maintenance of software (IEEE'90)
  • Debugging is what you do when you know a program is broken
  • Testing is a determined, systematic attempt to break a program

Main Motivation:

  • Usability - Users can learn it fast and get their job done easily
  • Efficiency - It doesn’t waste resources such as CPU time and memory
  • Reliability - It does what it is required to do without failing
  • Maintainability - It can be easily changed
  • Reusability - Its parts can be used in other projects, so reprogramming is not needed

Side Note: Pick a license... Any License...

coding horror

Why Bother? I Find Your Lack of Tests Disturbing

  • Over 50% of a project's life-cycle is spent debugging and testing
  • Bug fixes take 33 to 40 times as long to write as new code
  • An additional 15% to 30% investment in testing yields a 40% to 90% reduction in defect density
  • Test suite itself is an asset

How do we do it?

Reproducibility is the key

Programming By Contract

  • Formal methods

Pre- and post-condition tests

  • what must be true before a method is invoked
  • what must be true after a method is invoked
  • use assertions
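As a sketch of this idea (real_sqrt is a made-up example, not part of the notebook), pre- and post-conditions can be written as plain assert statements:

```python
import math

def real_sqrt(x):
    # pre-condition: what must be true before the method is invoked
    assert x >= 0, "pre-condition violated: x must be non-negative"
    result = math.sqrt(x)
    # post-condition: what must be true after the method is invoked
    assert abs(result * result - x) < 1e-9 * max(1.0, x), \
        "post-condition violated: result**2 should reproduce x"
    return result
```

The post-condition uses a tolerance rather than exact equality, for reasons we return to in "Assertions revisited" below.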

Program defensively (Murphy was an optimist)

  • out-of-range index
  • division by zero
  • error returns
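A small sketch of these defensive habits (both helper names are invented for illustration):

```python
def safe_get(seq, i, default=None):
    """Guard against an out-of-range index instead of letting IndexError escape."""
    if 0 <= i < len(seq):
        return seq[i]
    return default

def safe_mean(values):
    """Guard against division by zero; signal bad input with an error return (None)."""
    if not values:
        return None
    return sum(values) / len(values)
```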

Be systematic

  • incremental
  • simple things first
  • know what to expect
  • compare independent implementations

Automate it

  • regression tests ensure that changes don't break existing functionality
  • verify conservation
  • unit tests (white box testing)
  • measure test coverage

Testing in Python

Landscape

  • errors, exceptions, and debugging
  • assert, doctest, and unit tests
  • logging, unittest, and nose
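As a quick taste of doctest (the double function here is a made-up example): doctest extracts the interactive examples embedded in a docstring and checks that they still produce the shown output.

```python
import doctest

def double(x):
    """Return twice its argument.

    >>> double(2)
    4
    >>> double('ab')
    'abab'
    """
    return 2 * x

# Find and run the >>> examples in double's docstring.
results = doctest.DocTestRunner().run(
    doctest.DocTestFinder().find(double)[0])
print(results.failed)   # 0: both examples match
```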

Errors & Exceptions

Syntax Errors

  • Caught by Python parser, prior to execution
  • an arrow marks the point in the line where the parser detected the error

In [ ]:
while True print('Hello world')

Exceptions

  • Caught during run-time

In [ ]:
1/0

In [ ]:
factorial

In [ ]:
'1'+1

Exception handling


In [ ]:
try:
    f = open('test.txt')
except IOError:
    print('No such file')

Raising exceptions


In [ ]:
def newfunction():
    raise NotImplementedError

newfunction()

Fixing bugs


In [ ]:
def bar(y):
    return foo(1 - y)

def foo(x):
    print("test and", end=" ")
    if x == 0.0:
        print("branch...")
        return float('Inf')
    else:
        return 1 / x

bar(1)
bar(1.0)

In [ ]:
def foo(x):
    try:
        print("test and", end=" ")
        return 1 / x
    except ZeroDivisionError:
        print("trap exception...")
        return float('Inf')

bar(1)
bar(1.0)

Test as you code

Type checking


In [ ]:
#i = input("Please enter an integer: ")
i = "123"
if not isinstance(i, int):
    print("Casting", i, "to integer.")
    i = int(i)
else:
    print("already an int:", i)

Assert invariants

Give immediate feedback -- most implementations halt the system. These are unrecoverable errors.

Many languages allow them to be turned on/off at either compile or run time.
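In CPython, for example, the -O interpreter flag strips assert statements at compile time; a quick way to see which mode you are in:

```python
import sys

# sys.flags.optimize reports whether -O (which removes asserts) is on.
print(sys.flags.optimize)   # 0 in a normal (non -O) session

try:
    assert 1 == 2, "this would normally halt the program"
except AssertionError as err:
    print("caught:", err)
```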


In [ ]:
i = 8
if i % 3 == 0:
    print(1)
elif i % 3 == 1:
    print(2)
else:
    assert i % 3 == 2
    print(3)

Example

Let's make a factorial function.


In [ ]:
%%file myfactorial.py

def factorial2(n):
    """ Details to come ...
    """

    raise NotImplementedError

def test():
    from math import factorial
    for x in range(10):
        print(".", end=" ")
        assert factorial2(x) == factorial(x), \
               "My factorial function is incorrect for n = %i" % x

Let's test it ...


In [ ]:
import myfactorial
myfactorial.test()

Looks like we will have to implement our function if we want to make any progress...


In [ ]:
%%file myfactorial.py

def factorial2(n):
    """ Details to come ...
    """

    if n == 0:
        return 1
    else:
        return n*factorial2(n-1)

def test():
    from math import factorial
    for x in range(10):
        assert factorial2(x) == factorial(x), \
               "My factorial function is incorrect for n = %i" % x

Let's test it again...


In [ ]:
from importlib import reload   # reload is not a builtin in Python 3
reload(myfactorial)
myfactorial.test()

Hmmm.... What's going on??

What about preconditions

What happens if we call factorial2 with a negative integer? Or something that's not an integer?


In [ ]:
%%file myfactorial.py
def factorial2(n):
    """ Find n!. Raise an AssertionError if n is negative or non-integral.
    """

    assert isinstance(n, int) and n >= 0, "Unrecognized input"

    if n == 0:
        return 1
    else:
        return n*factorial2(n-1)

def test():
    from math import factorial
    for x in range(10):
        assert factorial2(x) == factorial(x), \
               "My factorial function is incorrect for n = %i" % x

In [ ]:
from myfactorial import factorial2
[factorial2(n) for n in range(5)]

Real world testing and continuous integration

unittest and nose

Test fixtures

  • create self-contained tests
  • setup: open file, connect to a DB, create data structures
  • teardown: tidy up afterward
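A minimal unittest fixture sketch (the class name and file contents are invented for illustration): setUp runs before, and tearDown after, every test method, so each test gets a fresh, self-contained environment.

```python
import os
import tempfile
import unittest

class TestWithFixture(unittest.TestCase):
    def setUp(self):
        # setup: create a scratch data file every test can rely on
        handle, self.path = tempfile.mkstemp()
        os.close(handle)
        with open(self.path, 'w') as f:
            f.write('hello')

    def tearDown(self):
        # teardown: tidy up afterward, whether the test passed or failed
        os.remove(self.path)

    def test_read(self):
        with open(self.path) as f:
            self.assertEqual(f.read(), 'hello')
```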

Test runner

  • nosetests
  • test discovery: any callable whose name begins with "test" in a module whose name begins with "test"
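For example, nosetests would collect and run both functions below, because the (hypothetical) file name and the function names begin with "test":

```python
# test_shapes.py -- a hypothetical module that nose would discover
# automatically; no registration or boilerplate is needed.

def area(w, h):
    return w * h

def test_area_unit_square():
    assert area(1, 1) == 1

def test_area_rectangle():
    assert area(2, 3) == 6
```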

Testing scientific computing libraries


In [ ]:
import scipy.integrate
scipy.integrate.test()

Assertions revisited

Mathematically

$x = (\sqrt{x})^2$.

So what is happening here:


In [ ]:
import math
assert 2 == math.sqrt(2)**2

In [ ]:
math.sqrt(2)**2

NumPy Testing


In [ ]:
import numpy as np
np.testing.assert_almost_equal(2, math.sqrt(2)**2)

In [ ]:
x=1.000001
y=1.000002
np.testing.assert_almost_equal(x,y, decimal=5)

What if we consider x and y almost equal? Can we modify our assertion?


In [ ]:
np.testing.assert_almost_equal?

Testing Philosophies

There are many philosophies and styles of testing: xUnit-style declarative testing; procedural and load testing; record-and-replay; test-driven development (TDD); behavior-driven development (BDD); and the process-level practices of eXtreme Programming (XP) and Agile, among others.

eXtreme Programming (XP):

Test to Live. Live to Test!

Fine-scale feedback

  • Pair programming
  • Planning game
  • Test-driven development
  • Whole team owns the code

Continuous process

  • Continuous integration
  • Refactoring or design improvement
  • Small releases

Shared understanding

  • Coding standards
  • Collective code ownership
  • Simple design
  • System metaphor

Coding

  • The customer is always available
  • Code the unit test first
  • Only one pair integrates code at a time
  • Leave optimization until last

Testing

  • All code must have unit tests
  • All code must pass all unit tests before it can be released.
  • When a bug is found tests are created before the bug is addressed (a bug is not an error in logic, it is a test that was not written)
  • Acceptance tests are run often and the results are published

Problem: Using them together

  • Everything is tightly coupled
  • Writing new tests environments is a pain
  • Everyone has to figure out how to produce and consume test results

A solution: The Test Anything Protocol (TAP)

  • Aims to help by separating the code that produces test results from the code that consumes them

A simple TAP example:

1..4
ok 1 We can foo
ok 2 We can bar
not ok 3 Oh No... Br. Bill
ok 4

TAP: Additional Functionality

  • TAP versions
  • Diagnostic output
  • TODO/SKIP tests
  • Nested TAP (very new)
  • Structured diagnostics (as YAML messages)
  • Draft IETF Standard (WIP)

In [ ]:
!nosetests --with-tap myTAP.py

In [ ]:
!cat FunctionTestCase.tap

In [ ]:
!tappy FunctionTestCase.tap

Why Bother?

TAP is language agnostic. How would you integrate testing across a project with Fortran, C, and Python as well as Bash scripts?

Test Summary Report


t/iterators.t (Tests: 92 Failed: 8)

Failed tests: 7-13, 15

t/nofork-mux.t (Tests: 6 Failed: 0)

t/regression.t (Tests: 4794 Failed: 103)

Failed tests: 2, 5, 31, 34, 58, 61, 85, 88, 114, 118, 145-146, 171-172, 200-201, 226-227, 252, 255, 278-279, 308, 312, 338, 342, 368-369, 395-396, 422, 425, 452, 454-455, 481, 484, 509-510, 538-539, 563, 567, 593, 597, 623, 627, 653, 657, 683-684, 686, 690, 716, 720, 746, 749, 775-776, 803-804, 831-832, 835-837, 866, 870, 896-897, 923-924, 926-927, 929, 955, 958, 984, 987, 1013-1014, 1040, 1043, 1069, 1073, 1099, 1102, 1126-1127, 1129, 1133, 1159, 1163, 1189-1190, 1192, 1196, 1222-1223, 1226-1227, 1253, 1257

Plans=47 Tests=9370

Result: FAIL


In [8]:
%%file bats_tap_ex

@test "no arguments prints usage instructions" {
  run bats
  [ $status -eq 1 ]
  [ $(expr "${lines[1]}" : "Usage:") -ne 0 ]
}

@test "-v and --version print version number" {
  run bats -v
  [ $status -eq 0 ]
  [ $(expr "$output" : "Bats [0-9][0-9.]*") -ne 0 ]
}


Writing bats_tap_ex

In [9]:
!bats bats_tap_ex --tap


1..2
ok 1 no arguments prints usage instructions
ok 2 -v and --version print version number

What do you do if you do not have the tools?

Write your own tap producers...

Breakout Session:

Assignment #1: Write a simple test and run it through nose

Assignment #2: Pick a language (any language) and write a function or subroutine "ok" which takes the following parameters:

  • val -- the local truth value (ie success/fail)
  • message -- the test descriptor
  • test_num -- the test number

and auto-increments test_num.

Now write a couple of tests (1==1, 1==0,...)

What is missing?

Add a "plan" -- i.e. "1.." followed by the number of tests you ran.

Coming Soon!