Testing

How do we know if our code is working correctly? It is not enough that the code runs and returns some value: as seen above, there are times when it makes sense to stop the code even when it is correct, because it is being used incorrectly. We need to test the code to check that it works.

Unit testing is the idea of writing many small tests that check if simple cases are behaving correctly. Rather than trying to prove that the code is correct in all cases (which could be very hard), we check that it is correct in a number of tightly controlled cases (which should be more straightforward). If we later find a problem with the code, we add a test to cover that case.


Teaching note

Testing isn't always the easiest thing to motivate in a mathematical context, as even the best discussions focus on CS applications or on complex codes. One approach is to talk about generality. A function may be designed to work for integers. Later we may decide that the same concept makes sense for real, or complex, numbers. So we want to extend our function to work for the more complex case. Testing allows us to easily check that, when extending our function, we didn't break it in the earlier cases where it worked.
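The generality argument can be made concrete with a small sketch. The function names below (sign_original, sign_extended, check_integer_cases) are hypothetical and chosen only for illustration: a sign function written for integers is extended to real and complex numbers, and the tests written for the integer version are re-run unchanged against the extended one.

```python
# Hypothetical example: all names here are illustrative assumptions.

def sign_original(n):
    """Original version: sign of an integer, returned as -1, 0 or 1."""
    if n > 0:
        return 1
    if n < 0:
        return -1
    return 0


def sign_extended(z):
    """Extended version: z / |z| for real or complex z, and 0 at zero."""
    if z == 0:
        return 0
    return z / abs(z)


def check_integer_cases(sign):
    """The tests written for the original, integer-only version."""
    assert sign(5) == 1
    assert sign(-3) == -1
    assert sign(0) == 0


# The old tests pass for both versions, so the extension broke nothing ...
check_integer_cases(sign_original)
check_integer_cases(sign_extended)

# ... and a new test covers the extended (complex) case, up to rounding.
assert abs(sign_extended(3 + 4j) - (0.6 + 0.8j)) < 1e-12
```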

We intend to use unit tests to automatically mark student submissions of weekly work. An outline understanding of how this works will be needed to interpret the errors that students see and ask about.

At least to start, we are not intending to use these tests on coursework submissions.


We will write a simple function that divides two numbers:


In [1]:
def divide(x, y):
    """
    Divide two numbers
    
    Parameters
    ----------
    
    x : float
        Numerator
    y : float
        Denominator
    
    Returns
    -------
    
    x / y : float
    """
    return x / y

For now we can play with this in the console.

We want to check that it does the "right thing". How much do we need to check?

Check integers:


In [2]:
print(divide(4,2))


2.0

Check obvious fractions:


In [3]:
print(divide(1,3))


0.3333333333333333

Does $a^7 / a = a^6$?


In [4]:
a = 1.234
print(divide(a**7, a), a**6)


3.5309450437774568 3.5309450437774568

What happens if you divide by zero? (What should happen?)


In [5]:
print(divide(1, 0))


---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-5-da0caa7f6bf5> in <module>()
----> 1 print(divide(1, 0))

<ipython-input-1-255c1de61e50> in divide(x, y)
     16     x / y : float
     17     """
---> 18     return x / y

ZeroDivisionError: division by zero

What happens if you divide by a really large number?


In [6]:
print(divide(1, 1e1000))


0.0
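The result is exactly zero, rather than merely tiny, because 1e1000 is too large to be stored as a float: Python reads the literal as infinity. The short sketch below, using only the standard library, compares this with dividing by the largest finite float.

```python
import math
import sys

# The literal 1e1000 exceeds the largest representable float,
# so Python stores it as infinity.
print(math.isinf(1e1000))        # True

# Dividing by the largest finite float gives a tiny but nonzero result.
largest = sys.float_info.max     # roughly 1.8e308
print(1 / largest > 0)           # True

# Dividing by infinity gives exactly zero.
print(1 / 1e1000 == 0.0)         # True
```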

Each of these tests has its uses and may show different potential problems. What counts as "correct" depends on how you want your code to handle certain situations.

If our function didn't do what we wanted on one of these tests then we'd have to alter it and test again. Repeating the checks by hand is error prone, so it's better to write the tests as functions. We want these functions to complain loudly if something is wrong, but be quiet if all is well. To do this we can use the assert statement:


In [7]:
def test_integer_division():
    assert(divide(4, 2) == 2)

In [8]:
test_integer_division()

We see that nothing happened, as we wanted. However, we can't do that exact test in the case of general real numbers:


In [9]:
def test_real_division1():
    assert(divide(1, 3) == 0.33333333333333)

In [10]:
test_real_division1()


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-eb75d319d2de> in <module>()
----> 1 test_real_division1()

<ipython-input-9-a9b2929027ed> in test_real_division1()
      1 def test_real_division1():
----> 2     assert(divide(1, 3) == 0.33333333333333)

AssertionError: 

We see that we want a way of testing that

  1. works with real numbers, i.e., checks that the result is "close";
  2. gives us a better error message so we know what is wrong!
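Both requirements can be met by hand with a plain assert, as in the minimal sketch below. The helper assert_close and its parameters are hypothetical, written only to show the idea of a relative tolerance combined with an informative message.

```python
def assert_close(actual, expected, rel_tol=1e-7, message=""):
    """Hypothetical helper: fail with a readable message unless
    actual agrees with expected to within a relative tolerance."""
    error = abs(actual - expected)
    allowed = rel_tol * abs(expected)
    assert error <= allowed, (
        f"{message} (got {actual!r}, expected {expected!r}, "
        f"error {error:.3e} exceeds {allowed:.3e})"
    )


# Passes silently: the two values agree to well within the tolerance.
assert_close(1 / 3, 0.33333333333333, message="Dividing 1 by 3")

# Fails loudly, and the message says what went wrong.
try:
    assert_close(1 // 3, 0.33333333333333, message="Dividing 1 by 3")
except AssertionError as err:
    print(err)
```

A hand-written helper like this only illustrates the idea; in practice it is better to rely on well-tested library functions.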

We will use functions from numpy for this, introduced purely by example:


In [11]:
from numpy.testing import assert_equal, assert_allclose

def test_integer_division():
    assert_equal(divide(4, 2), 2, 
                 err_msg="Dividing 4 by 2: answer should be 2")
    
def test_real_division1():
    assert_allclose(divide(1, 3), 0.33333333333333, 
                 err_msg="Dividing 1 by 3: answer should be 1/3")

def test_power_division():
    a = 1.234
    assert_allclose(divide(a**7, a), a**6, 
                 err_msg="Dividing a^7 by a: answer should be a^6")

def test_large_division():
    assert_equal(divide(1, 1e1000), 0,
                 err_msg="Dividing by a large enough number returns zero")

In [12]:
test_integer_division()
test_real_division1()
test_power_division()
test_large_division()
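How close is "close" is controlled by the rtol argument of assert_allclose, whose default is 1e-7. The sketch below uses a deliberately crude expected value (0.333, chosen here only for illustration) to show a comparison that fails with the default tolerance and passes with a looser one.

```python
from numpy.testing import assert_allclose

# Default relative tolerance (rtol=1e-7): 0.333 is not close enough to 1/3.
try:
    assert_allclose(1 / 3, 0.333, err_msg="Dividing 1 by 3")
    passed_default = True
except AssertionError as err:
    passed_default = False
    print(err)   # reports the tolerance, our message and both values

# A looser tolerance accepts the same comparison and stays silent.
assert_allclose(1 / 3, 0.333, rtol=1e-2, err_msg="Dividing 1 by 3")
```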

These numpy functions do not let us check that division by zero is handled correctly. For that we'll need pytest:


In [13]:
import pytest

def test_zero_division():
    # Dividing by zero should give a ZeroDivisionError
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)

In [14]:
test_zero_division()
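Again nothing happens, because the expected error was raised. To make clear what is being checked, here is a rough plain-Python equivalent. The helper check_raises is hypothetical and is not how pytest is implemented; it only mirrors the behaviour: the check passes if the expected exception occurs and fails if the call returns normally.

```python
def divide(x, y):
    return x / y


def check_raises(expected_exception, func, *args):
    """Hypothetical helper: pass only if func(*args) raises expected_exception."""
    try:
        func(*args)
    except expected_exception:
        return  # the expected error occurred, so the check passes
    raise AssertionError(
        f"{func.__name__}{args} did not raise {expected_exception.__name__}"
    )


check_raises(ZeroDivisionError, divide, 1, 0)      # passes silently

try:
    check_raises(ZeroDivisionError, divide, 1, 2)  # no error is raised ...
except AssertionError as err:
    print(err)                                     # ... so the check fails
```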

Note the annoying difference in syntax between the two ways of testing: the numpy functions are called directly, whereas pytest.raises is used in a with block.

pytest

We now have a set of tests (a test suite, as it is sometimes called) encoded in functions, with meaningful names, which give useful error messages if a test fails. Every time the code is changed, we want to re-run all the tests to ensure that our change has not broken the code. Calling each test by hand is tedious. A better way would be to run a single command that runs all tests. pytest is that command.

The easiest way to use it is to put all tests in the same file as the function being tested. So, create a file test_divide.py containing all the functions above:

from numpy.testing import assert_equal, assert_allclose
import pytest

def divide(x, y):
    """
    Divide two numbers

    Parameters
    ----------

    x : float
        Numerator
    y : float
        Denominator

    Returns
    -------

    x / y : float
    """
    return x / y

def test_integer_division():
    assert_equal(divide(4, 2), 2, 
                 err_msg="Dividing 4 by 2: answer should be 2")

def test_real_division1():
    assert_allclose(divide(1, 3), 0.33333333333333, 
                 err_msg="Dividing 1 by 3: answer should be 1/3")

def test_power_division():
    a = 1.234
    assert_allclose(divide(a**7, a), a**6, 
                 err_msg="Dividing a^7 by a: answer should be a^6")

def test_large_division():
    assert_equal(divide(1, 1e1000), 0,
                 err_msg="Dividing by a large enough number returns zero")

def test_zero_division():
    # Dividing by zero should give a ZeroDivisionError
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)

# Run the tests only when this file is executed directly,
# not when pytest imports it to collect the tests:

if __name__ == "__main__":
    pytest.main()

Then execute the file by running it in the Spyder editor. This will define all the functions and the last line will run the tests. You should see output like

===================================== test session starts ======================================
platform darwin -- Python 3.6.1, pytest-3.1.1, py-1.4.33, pluggy-0.4.0
rootdir: /Users/ih3/Documents/github/orcomp-training, inifile:
collected 5 items 

test_divide.py .....

=================================== 5 passed in 0.05 seconds ===================================

Each dot corresponds to a passing test.

Try modifying the divide function so that it will fail in some cases but not others. For example, make it so that the function returns the integer division, rather than the real division:


In [17]:
def divide(x, y):
    """
    Divide two numbers
    
    Parameters
    ----------
    
    x : float
        Numerator
    y : float
        Denominator
    
    Returns
    -------
    
    x / y : float
    """
    return x // y

This should make some of the tests fail. Rather than re-running them all individually, we can re-execute the file in Spyder, which uses pytest to rerun all the tests. You should see something like

===================================== test session starts ======================================
platform darwin -- Python 3.6.1, pytest-3.1.1, py-1.4.33, pluggy-0.4.0
rootdir: /Users/ih3/Documents/github/orcomp-training, inifile:
collected 5 items

test_divide.py .FF..

=========================================== FAILURES ===========================================
_____________________________________ test_real_division1 ______________________________________

    def test_real_division1():
        assert_allclose(divide(1, 3), 0.33333333333333,
>                    err_msg="Dividing 1 by 3: answer should be 1/3")
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       Dividing 1 by 3: answer should be 1/3
E       (mismatch 100.0%)
E        x: array(0)
E        y: array(0.33333333333333)

test_divide.py:29: AssertionError
_____________________________________ test_power_division ______________________________________

    def test_power_division():
        a = 1.234
        assert_allclose(divide(a**7, a), a**6,
>                    err_msg="Dividing a^7 by a: answer should be a^6")
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       Dividing a^7 by a: answer should be a^6
E       (mismatch 100.0%)
E        x: array(3.0)
E        y: array(3.5309450437774568)

test_divide.py:34: AssertionError
============================== 2 failed, 3 passed in 0.60 seconds ==============================

It tells us explicitly

  • how many tests passed or failed;
  • which tests failed;
  • what result was expected and what was actually calculated.

Make sure you can identify the key parts of the output.
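The pattern of passes and failures can also be checked by hand. The sketch below evaluates, with floor division, the same expressions the tests use, which shows why only the two tests involving non-integer results fail.

```python
import math

a = 1.234

# test_integer_division: 4 // 2 is still 2, so the test passes.
print(4 // 2)                    # 2

# test_real_division1: 1 // 3 is 0, not 1/3, so the test fails.
print(1 // 3)                    # 0

# test_power_division: floor division drops the fractional part of a**6,
# giving 3.0 instead of roughly 3.53, so the test fails.
print(a**7 // a, a**6)

# test_large_division: 1 // infinity is 0.0, which still equals 0.
print(1 // math.inf)             # 0.0
```

Division by zero still raises a ZeroDivisionError with floor division, so the zero division test passes as well.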

Exercise

Modify your divide function so that all tests except the zero division check pass (for example, make the function do something wrong when the divisor is larger than $10$). Check that you can explain the results.

Exercise

Download the testing script test_quadratic.py from https://github.com/IanHawke/orcomp-training/raw/master/test_quadratic.py. The test expects a file quadratic.py containing a function quadratic that returns all real roots of a given polynomial of maximum degree $2$.

The function should take as input $a_2, a_1, a_0$ where $a_i \in \mathbb{R}$, representing the equation $a_2 x^2 + a_1 x + a_0 = 0$. It should return a list of all real roots. If there are no real roots it should return an empty list. If there is one real root it should return a list with that one root. If there are repeated real roots it should return a list with two entries, both the same.

Build your function up so it is correct for one simple case at a time. Each time, run the test script and see what other cases you may need. Once all tests pass, think about what other tests you might add.