How do we know if our code is working correctly? Correctness is not the same as running and returning some value: as seen above, there are times when it makes sense to stop the code even though it is correct, because it is being used incorrectly. We need to test the code to check that it works.
Unit testing is the idea of writing many small tests that check if simple cases are behaving correctly. Rather than trying to prove that the code is correct in all cases (which could be very hard), we check that it is correct in a number of tightly controlled cases (which should be more straightforward). If we later find a problem with the code, we add a test to cover that case.
Testing isn't always the easiest thing to motivate in a mathematical context, as even the best discussions focus on CS applications or on complex codes. One approach is to talk about generality. A function may be designed to work for integers. Later we may decide that the same concept makes sense for real, or complex, numbers. So we want to extend our function to work for the more complex case. Testing allows us to easily check that, when extending our function, we didn't break it in the earlier cases where it worked.
We are intending to use unit tests to automatically mark student submissions of weekly work. You will need to know in outline how this works in order to interpret the errors that students see and ask questions about.
At least to start, we are not intending to use these tests on coursework submissions.
We will write a simple function that divides two numbers:
In [1]:
def divide(x, y):
    """
    Divide two numbers

    Parameters
    ----------
    x : float
        Numerator
    y : float
        Denominator

    Returns
    -------
    x / y : float
    """
    return x / y
For now we can play with this in the console.
We want to check that it does the "right thing". How much do we need to check?
Check integers:
In [2]:
print(divide(4,2))
Check obvious fractions:
In [3]:
print(divide(1,3))
Does $a^7 / a = a^6$?
In [4]:
a = 1.234
print(divide(a**7, a), a**6)
What happens if you divide by zero? (What should happen?)
In [5]:
print(divide(1, 0))
What happens if you divide by a really large number?
In [6]:
print(divide(1, 1e1000))
Each of these tests has its uses and may show different potential problems. What counts as "correct" depends on how you want your code to handle certain situations.
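To see why the last two checks behave as they do, it can help to look at the floating-point values directly. The following is a minimal sketch, assuming standard 64-bit floats (as used by CPython on common platforms); the variable names are chosen only for illustration.

```python
import math

huge = 1e1000                 # too large for a 64-bit float
print(math.isinf(huge))       # True: the literal is stored as infinity
print(1 / huge)               # 0.0: a finite number divided by infinity

a = 1.234
difference = a**7 / a - a**6  # exactly zero in real arithmetic
print(difference)             # may be a tiny nonzero rounding error
print(math.isclose(a**7 / a, a**6))  # equal to within a small tolerance
```

So "dividing by a really large number" here is really dividing by infinity, and the identity $a^7 / a = a^6$ should only be expected to hold approximately.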
If our function didn't do what we wanted on one of these tests then we'd have to alter it and test again. Doing this by hand is error prone, so it's better to write the tests as functions. We want these functions to complain loudly if something is wrong, but be quiet if all is well. To do this we can use the assert statement:
In [7]:
def test_integer_division():
    assert divide(4, 2) == 2
In [8]:
test_integer_division()
We see that nothing happened, as we wanted. However, we can't do that exact test in the case of general real numbers:
In [9]:
def test_real_division1():
    assert divide(1, 3) == 0.33333333333333
In [10]:
test_real_division1()
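We see that we get an AssertionError: in floating-point arithmetic the result of 1/3 is not exactly equal to 0.33333333333333, so real-valued results should be compared to within a tolerance rather than exactly. We will use functions from numpy for this, introduced purely by example: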
In [11]:
from numpy.testing import assert_equal, assert_allclose

def test_integer_division():
    assert_equal(divide(4, 2), 2,
                 err_msg="Dividing 4 by 2: answer should be 2")

def test_real_division1():
    assert_allclose(divide(1, 3), 0.33333333333333,
                    err_msg="Dividing 1 by 3: answer should be 1/3")

def test_power_division():
    a = 1.234
    assert_allclose(divide(a**7, a), a**6,
                    err_msg="Dividing a^7 by a: answer should be a^6")

def test_large_division():
    assert_equal(divide(1, 1e1000), 0,
                 err_msg="Dividing by a large enough number returns zero")
In [12]:
test_integer_division()
test_real_division1()
test_power_division()
test_large_division()
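The reason test_real_division1 now passes is that assert_allclose compares to within a relative tolerance (rtol, which defaults to 1e-07) instead of demanding exact equality. A short sketch of the effect of that tolerance, using only the 1/3 example from above:

```python
from numpy.testing import assert_allclose

third = 1 / 3
reference = 0.33333333333333  # 14-digit reference value used in the tests

print(third == reference)     # False: the floats differ beyond 14 digits

# Passes quietly: the relative error is far below the default rtol=1e-07
assert_allclose(third, reference)

# With a much tighter tolerance the same comparison fails
too_strict_failed = False
try:
    assert_allclose(third, reference, rtol=1e-15)
except AssertionError:
    too_strict_failed = True
    print("rtol=1e-15 is too strict for a 14-digit reference value")
```

The tolerance should match how accurately the expected value is known: a reference value typed to 14 digits cannot be matched to 15.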
This did not allow us to check that the division by zero was handled correctly. For that we'll need pytest:
In [13]:
import pytest

def test_zero_division():
    # Dividing by zero should give a ZeroDivisionError.
    # (The message= argument of pytest.raises was removed in pytest 5.0.)
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)
In [14]:
test_zero_division()
Note the difference in syntax between the two ways of testing, which is annoying.
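As a rough illustration of what the pytest.raises block checks, the same test can be sketched with a plain try/except. This is a stand-in written for this note, not how pytest is implemented; check_zero_division and broken_divide are hypothetical names.

```python
def divide(x, y):
    """Same divide as above, repeated so this block runs on its own."""
    return x / y


def broken_divide(x, y):
    """Hypothetical faulty version: hides division by zero instead of raising."""
    return float("inf") if y == 0 else x / y


def check_zero_division(func):
    """Pass silently if func(1, 0) raises ZeroDivisionError, complain otherwise."""
    try:
        func(1, 0)
    except ZeroDivisionError:
        return  # the expected exception was raised, so the check passes
    raise AssertionError("Dividing by zero should give a ZeroDivisionError")


check_zero_division(divide)  # quiet: the exception is raised

try:
    check_zero_division(broken_divide)
except AssertionError as err:
    print("Test failed:", err)
```

The key point is that the test passes only when the exception occurs; code that silently returns a value for a zero divisor makes the test fail.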
py.test
We now have a set of tests (a test suite, as it is sometimes called) encoded in functions with meaningful names, which give useful error messages if a test fails. Every time the code is changed, we want to re-run all the tests to ensure that our change has not broken the code. Doing this by hand is tedious. A better way is to run a single command that runs all tests. pytest is that command.
The easiest way to use it is to put all tests in the same file as the function being tested. So, create a file test_divide.py containing all the functions above:
from numpy.testing import assert_equal, assert_allclose
import pytest

def divide(x, y):
    """
    Divide two numbers

    Parameters
    ----------
    x : float
        Numerator
    y : float
        Denominator

    Returns
    -------
    x / y : float
    """
    return x / y

def test_integer_division():
    assert_equal(divide(4, 2), 2,
                 err_msg="Dividing 4 by 2: answer should be 2")

def test_real_division1():
    assert_allclose(divide(1, 3), 0.33333333333333,
                    err_msg="Dividing 1 by 3: answer should be 1/3")

def test_power_division():
    a = 1.234
    assert_allclose(divide(a**7, a), a**6,
                    err_msg="Dividing a^7 by a: answer should be a^6")

def test_large_division():
    assert_equal(divide(1, 1e1000), 0,
                 err_msg="Dividing by a large enough number returns zero")

def test_zero_division():
    # Dividing by zero should give a ZeroDivisionError.
    # (The message= argument of pytest.raises was removed in pytest 5.0.)
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)

# The following command will run the tests when the file is executed.
# The guard stops it from running again when pytest imports this file.
if __name__ == "__main__":
    pytest.main([__file__])
Then execute the file by running it in the Spyder editor. This will define all the functions and the last line will run the tests. You should see output like
===================================== test session starts ======================================
platform darwin -- Python 3.6.1, pytest-3.1.1, py-1.4.33, pluggy-0.4.0
rootdir: /Users/ih3/Documents/github/orcomp-training, inifile:
collected 5 items
test_divide.py .....
=================================== 5 passed in 0.05 seconds ===================================
Each dot corresponds to a passing test.
Try modifying the divide function so that it will fail in some cases but not others. For example, make it so that the function returns the integer division, rather than the real division:
In [17]:
def divide(x, y):
    """
    Divide two numbers

    Parameters
    ----------
    x : float
        Numerator
    y : float
        Denominator

    Returns
    -------
    x / y : float
    """
    return x // y
This should make some of the tests fail. Rather than re-running them all individually, we can re-execute the file in Spyder, which uses pytest to rerun all the tests. You should see something like
===================================== test session starts ======================================
platform darwin -- Python 3.6.1, pytest-3.1.1, py-1.4.33, pluggy-0.4.0
rootdir: /Users/ih3/Documents/github/orcomp-training, inifile:
collected 5 items
test_divide.py .FF..
=========================================== FAILURES ===========================================
_____________________________________ test_real_division1 ______________________________________
def test_real_division1():
assert_allclose(divide(1, 3), 0.33333333333333,
> err_msg="Dividing 1 by 3: answer should be 1/3")
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E Dividing 1 by 3: answer should be 1/3
E (mismatch 100.0%)
E x: array(0)
E y: array(0.33333333333333)
test_divide.py:29: AssertionError
_____________________________________ test_power_division ______________________________________
def test_power_division():
a = 1.234
assert_allclose(divide(a**7, a), a**6,
> err_msg="Dividing a^7 by a: answer should be a^6")
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E Dividing a^7 by a: answer should be a^6
E (mismatch 100.0%)
E x: array(3.0)
E y: array(3.5309450437774568)
test_divide.py:34: AssertionError
============================== 2 failed, 3 passed in 0.60 seconds ==============================
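It tells us explicitly which tests failed and, for each failure, shows the failing assertion, our error message, and the computed (x) and expected (y) values. Make sure you can identify the key parts of the output.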
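The report for a single failure can also be reproduced outside pytest, which may help when explaining it to a student. The sketch below repeats the integer-division version of divide and captures the message that assert_allclose raises. The exact layout and labels of the message depend on the installed numpy version, so only the presence of our own err_msg text should be relied upon.

```python
from numpy.testing import assert_allclose


def divide(x, y):
    # Deliberately wrong: integer division, as in the modified function above
    return x // y


report = ""
try:
    assert_allclose(divide(1, 3), 0.33333333333333,
                    err_msg="Dividing 1 by 3: answer should be 1/3")
except AssertionError as err:
    report = str(err)  # the text pytest shows on the lines starting with E

print(report)
```

The first argument of assert_allclose is the computed value and the second is the expected value, which is the order in which they appear in the report.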
Modify your divide function so that all tests except the zero division check pass (for example, make the function do something wrong when the divisor is larger than $10$). Check that you can explain the results.
Download the testing script test_quadratic.py from https://github.com/IanHawke/orcomp-training/raw/master/test_quadratic.py. The test expects a file quadratic.py containing a function quadratic that returns all real roots of a given polynomial of maximum degree $2$.
The function should take as input $a_2, a_1, a_0$ where $a_i \in \mathbb{R}$, representing the equation $a_2 x^2 + a_1 x + a_0 = 0$. It should return a list of all real roots. If there are no real roots it should return an empty list. If there is one real root it should return a list with that one root. If there are repeated real roots it should return a list with two entries, both the same.
Build your function up so it is correct for one simple case at a time. Each time, run the test script and see what other cases you may need. Once all tests pass, think what other tests you might add.
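As an illustration of working one case at a time, a possible first iteration is sketched below. It is deliberately incomplete: it only handles a genuine quadratic with two distinct real roots, and it has not been checked against test_quadratic.py. The test function test_two_distinct_roots is a hypothetical example written for this note, not taken from the downloaded script.

```python
import math


def quadratic(a2, a1, a0):
    """
    First iteration only: two distinct real roots of a2 x^2 + a1 x + a0 = 0.

    Not yet handled: a2 == 0 (raises ZeroDivisionError), no real roots
    (math.sqrt raises ValueError), and the repeated-root convention.
    """
    discriminant = a1**2 - 4 * a2 * a0
    root_disc = math.sqrt(discriminant)
    return [(-a1 - root_disc) / (2 * a2), (-a1 + root_disc) / (2 * a2)]


def test_two_distinct_roots():
    # x^2 - 1 = 0 has roots -1 and +1
    assert sorted(quadratic(1, 0, -1)) == [-1, 1]
    # x^2 - 3x + 2 = 0 has roots 1 and 2
    assert sorted(quadratic(1, -3, 2)) == [1, 2]


test_two_distinct_roots()
```

Running the downloaded test script against a version like this should show which of the remaining cases (no real roots, a linear equation, repeated roots) to tackle next.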