Unit testing

Foreword

There are several Python libraries dedicated to unit testing, for instance `unittest`, which is part of the standard library, and the third-party `pytest`.

This course focuses on the use of `pytest`.

1. General testing principles

Why should you write tests?

In general, tests are an assessment of both the quality and the efficiency of your code.

Tests actually define the requirements of the code at various levels, from the basic method definition to the full software validation.

What kind of tests should you write?

For each of these levels, there is a corresponding type of test:

  • unit tests
  • non-regression tests
  • pre-integration tests
  • integration tests
  • validation tests

In this terminology, unit tests are the basic building blocks and should be run before every commit of the code. They are the focus of this course.

2. Unit tests

Definition

A unit test case should

  • test individual software components ("units") like classes and methods,
  • supply mocks or fake versions of dependencies (e.g. database, server, etc.) so that the test does not rely on any external object,
  • enable failures to be pinpointed easily.
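As an illustration of the second point, here is a minimal sketch using `unittest.mock` from the standard library. The function `is_freezing` and the client's `get_temperature` method are hypothetical, invented for this example: the test replaces a real network client with a `Mock`, so it runs without any server and checks how the dependency was called.

```python
from unittest.mock import Mock


def is_freezing(client, city):
    """Hypothetical unit under test: asks a client for a temperature in Celsius."""
    temperature = client.get_temperature(city)
    return temperature <= 0


def test_is_freezing_uses_fake_client():
    # The Mock stands in for a real network client (no server needed)
    fake_client = Mock()
    fake_client.get_temperature.return_value = -5

    assert is_freezing(fake_client, "Oslo")
    # The mock records its calls, which helps pinpoint failures
    fake_client.get_temperature.assert_called_once_with("Oslo")
```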

Design (theory)

Based on:

“The Art of Software Testing”, Glenford J. Myers, Corey Sandler, Tom Badgett, Wiley & Sons, 2011

Normal conditions

  • Rightness: validate the results against the requirements.

    Ex: a method supposed to select the largest number of a list should be checked by comparing the result with the identified maximum from a known list.

  • Inverse relationships

    Ex1: a method calculating the square root of a number can be checked by squaring the result. Ex2: a method inserting a value in a file can be checked by searching this value after insertion.

    This check should rely on tools independent of the method under test (e.g. another library).

  • Cross-check:

    Ex: an analytical result can be compared to a numerical calculation, for values or conditions where both are available.

  • Code logic:

    Ex:

    if (x or y) and z:
        decision_1()
    else:
        decision_2()
    

    where x, y and z are called "conditions".

    Some combinations of conditions may be unexpected and lead to the wrong decision. Some decisions may even never be reached, because the possible combinations of conditions always lead to the same logical value.
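The normal-condition checks above can be sketched as pytest test functions. The functions `largest`, `newton_sqrt`, `trapezoid` and `decide` are hypothetical units written only for this illustration. Note that `newton_sqrt` is not validated against `math.sqrt`: the test relies on the inverse relationship (squaring) instead, and the integral is cross-checked against its analytical value. The last test enumerates every combination of the conditions x, y and z from the code-logic example.

```python
import itertools
import math


def largest(values):
    """Hypothetical unit under test: return the largest number of a list."""
    result = values[0]
    for value in values[1:]:
        if value > result:
            result = value
    return result


def newton_sqrt(a, n_iter=50):
    """Hypothetical unit under test: square root by Newton's method (a > 0)."""
    x = a if a > 1 else 1.0
    for _ in range(n_iter):
        x = 0.5 * (x + a / x)
    return x


def trapezoid(f, a, b, n=1000):
    """Hypothetical unit under test: numerical integral by the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def decide(x, y, z):
    """The code-logic example above, returning which decision is taken."""
    if (x or y) and z:
        return "decision_1"
    return "decision_2"


def test_largest_rightness():
    # Rightness: compare with the maximum identified by hand on a known list
    assert largest([3, -1, 7, 2]) == 7


def test_sqrt_inverse_relationship():
    # Inverse relationship: squaring the result gives back the input
    for a in [0.25, 2.0, 10.0, 1e6]:
        assert math.isclose(newton_sqrt(a) ** 2, a, rel_tol=1e-12)


def test_cross_check_integral():
    # Cross-check: integral of x**2 on [0, 1] against the analytical value 1/3
    assert math.isclose(trapezoid(lambda x: x ** 2, 0.0, 1.0), 1 / 3, rel_tol=1e-5)


def test_decide_all_condition_combinations():
    # Code logic: evaluate all 2**3 combinations of the three conditions
    outcomes = {
        combo: decide(*combo)
        for combo in itertools.product([False, True], repeat=3)
    }
    assert outcomes[(True, False, True)] == "decision_1"
    assert outcomes[(False, False, True)] == "decision_2"
    # Both decisions are reachable, so neither branch is dead code
    assert set(outcomes.values()) == {"decision_1", "decision_2"}
```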

Abnormal conditions and edges

  • Exceptions:

    If a method throws exceptions under certain conditions, this should be checked with a test (see the section "Testing errors and exceptions" below):

    import pytest

    with pytest.raises(ValueError):  # replace ValueError with the expected exception
        int("not a number")          # replace with the call that should raise it

  • Boundary conditions:

    • Does the value exist?
    • Does the value conform to an expected format?
    • Is the value given in a reasonable range (min, max)?
    • Are there enough values (cardinality)?
    • Does the code reference anything external?
    • What are the edges of the partitioned values?
    • Is everything happening in the right order?
  • Error conditions:

    Force error conditions, such as:

    • running out of memory
    • running out of disk space
    • issues with wall-clock time
    • network availability and errors
    • system load.
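A minimal sketch of boundary and error-condition tests, assuming a hypothetical function `mean_of_file` that reads one number per line from a text file. A one-line file and an empty file exercise the cardinality boundary, and `unittest.mock.patch` forces `open` to raise an `OSError`, simulating a disk failure without needing one. The temporary files are created with `tempfile` so the functions can be called directly; inside a real pytest test, the built-in `tmp_path` fixture is a convenient alternative.

```python
import os
import tempfile
from unittest import mock

import pytest


def mean_of_file(path):
    """Hypothetical unit under test: mean of the numbers stored one per line."""
    with open(path) as handle:
        values = [float(line) for line in handle if line.strip()]
    if not values:
        raise ValueError("no values in file")
    return sum(values) / len(values)


def write_tmp(text):
    """Write text to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


def test_single_value():
    # Cardinality boundary: exactly one value
    path = write_tmp("4.0\n")
    try:
        assert mean_of_file(path) == 4.0
    finally:
        os.remove(path)


def test_empty_file():
    # Cardinality boundary: no value at all
    path = write_tmp("")
    try:
        with pytest.raises(ValueError):
            mean_of_file(path)
    finally:
        os.remove(path)


def test_forced_io_error():
    # Error condition: make open() fail, as if the disk were unavailable
    with pytest.raises(OSError, match="disk failure"):
        with mock.patch("builtins.open", side_effect=OSError("disk failure")):
            mean_of_file("any_path.txt")
```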

Test-Driven Development

Test-driven development (TDD) is a software development process, associated with Agile practices, in which the software requirements are transcribed into tests before the code that passes these tests is written.

It is an iterative process: start with a basic test case, then alternately extend the test cases and the code until the requirements are met.
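A hedged illustration of one TDD iteration, using a hypothetical function `celsius_to_fahrenheit`. The tests are written first, from the requirement "convert Celsius to Fahrenheit and reject temperatures below absolute zero"; the implementation is the simplest code written afterwards to make them pass. A new requirement would start the next iteration as a new, initially failing, test.

```python
import pytest


# Step 1: tests written first, straight from the requirements
def test_freezing_and_boiling_points():
    assert celsius_to_fahrenheit(0) == 32
    assert celsius_to_fahrenheit(100) == 212


def test_below_absolute_zero_is_rejected():
    with pytest.raises(ValueError):
        celsius_to_fahrenheit(-300)


# Step 2: the simplest implementation that makes the tests pass
def celsius_to_fahrenheit(celsius):
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius * 9 / 5 + 32
```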

3. Writing a test with pytest

A unit test written for pytest is a Python function whose name starts with "test", or a method whose name starts with "test" inside a class whose name starts with "Test", and that asserts a hypothesis considered true.
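A minimal sketch of both forms as pytest collects them with its default settings (a test class must not define an `__init__` method). The function `square` is a hypothetical unit used only for illustration.

```python
def square(x):
    """Hypothetical unit under test."""
    return x * x


# A test function: collected because its name starts with "test"
def test_square_of_positive():
    assert square(3) == 9


# A test class: collected because its name starts with "Test";
# each method whose name starts with "test" is run as a separate test
class TestSquare:
    def test_negative(self):
        assert square(-2) == 4

    def test_zero(self):
        assert square(0) == 0
```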

A first test

Let's write a basic file containing a function, identity, and the corresponding test:


In [ ]:
%%file my_first_test.py

def identity(a):
    return a

def test_a():
    assert identity(1) == 1

The file has been saved in the current directory.


In [ ]:
!ls *test.py

Info: Shell commands can be run from the notebook by prefixing them with an exclamation mark "!". This functionality will be used throughout the notebook.

Launching pytest is as easy as moving to the right directory and running, from the command line,

py.test

It recursively searches the current directory for files named test_*.py or *_test.py, collects the functions whose names start with "test", and runs them. A single file can also be passed as an argument:


In [ ]:
!py.test my_first_test.py

For a more concise output, use the quiet option -q:


In [ ]:
!py.test -q my_first_test.py

For more information on which tests have been run, use the verbose option -v:


In [ ]:
!py.test -v my_first_test.py

Additional tests

Let's now write several tests, deliberately introduce an error in test_b, and re-run pytest.

Info: In a notebook cell, the %%file path/to/filename.py magic command will write the remainder of the cell into the file.

In [ ]:
%%file my_second_test.py

def identity(a):
    return a

def test_a():
    assert identity(1) == 1
    
def test_b():
    assert identity(2) == 1

def test_c():
    assert identity(3) == 1 + 1 + 1

In [ ]:
!py.test -v my_second_test.py

We see that pytest has collected and run 3 items, the three tests of my_second_test.py.

As expected, one test has failed.

For the failing test, pytest shows the traceback leading to the failure and even the value returned by identity, which can be useful for quick debugging.
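When several tests only differ by their input and expected values, as test_a, test_b and test_c do, they can be grouped with the `pytest.mark.parametrize` decorator; pytest then reports each parameter set as a separate test item. A sketch, with the expected value of the second case corrected:

```python
import pytest


def identity(a):
    return a


# Each (value, expected) pair becomes an independent test item
@pytest.mark.parametrize("value, expected", [
    (1, 1),
    (2, 2),
    (3, 1 + 1 + 1),
])
def test_identity(value, expected):
    assert identity(value) == expected
```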

Testing errors and exceptions

The philosophy of Python is to try something first and then decide what to do in case of an error. This is the reason behind Python exceptions: they describe the issue that was detected and help the user either debug it, or catch it and deal with the issue in another way.

When testing code, it is thus important to check that these exceptions are raised when they should be. However, since an exception that is raised but not caught makes the test fail with an error, a plain "assert" statement cannot be used; the pytest.raises context manager is used instead.


In [ ]:
%%file my_third_test.py

import pytest

def positive_identity(n):
    if n < 0:
        raise ValueError("Negative value detected")
    return n
        
def test_positive_identity():
    assert positive_identity(1) == 1
    
def test_exception_positive_identity():
    with pytest.raises(ValueError):
        positive_identity(-1)

In [ ]:
!py.test -v my_third_test.py
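`pytest.raises` can also check the error message through its `match` argument (a regular expression searched in the string representation of the exception), and it gives access to the raised exception through the object bound by `as`. A sketch reusing positive_identity:

```python
import pytest


def positive_identity(n):
    if n < 0:
        raise ValueError("Negative value detected")
    return n


def test_exception_message():
    # match is a regular expression searched in str(exception)
    with pytest.raises(ValueError, match="Negative"):
        positive_identity(-1)


def test_exception_details():
    # excinfo.value is the exception instance that was raised
    with pytest.raises(ValueError) as excinfo:
        positive_identity(-5)
    assert str(excinfo.value) == "Negative value detected"
```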

numpy and pandas testing helpers

Both numpy and pandas provide helper functions to ease the testing and comparison of their core objects, namely numpy arrays and pandas DataFrames and Series.

These methods can be found in a submodule called testing.


In [ ]:
import numpy.testing
import pandas.testing

In [ ]:
# List of NumPy assert methods
[func for func in dir(numpy.testing) if func.startswith('assert')]

In [ ]:
# List of Pandas assert methods
[func for func in dir(pandas.testing) if func.startswith('assert')]

The numpy assert methods will be used to write the tests in the next section.
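As a preview, here is a sketch of a few commonly used helpers: `numpy.testing.assert_array_equal` for exact comparisons, `numpy.testing.assert_allclose` for floating-point comparisons with a tolerance, and `pandas.testing.assert_frame_equal` for DataFrames. Unlike a bare `assert a == b` on arrays, which raises an error about the truth value of an array being ambiguous, these helpers raise an `AssertionError` that reports which elements differ.

```python
import numpy as np
import pandas as pd
from numpy.testing import assert_allclose, assert_array_equal
from pandas.testing import assert_frame_equal


def test_numpy_helpers():
    a = np.array([0.1, 0.2, 0.3])
    # Exact, element-wise comparison
    assert_array_equal(a, np.array([0.1, 0.2, 0.3]))
    # Floating-point comparison: 0.1 + 0.2 is not exactly 0.3
    assert_allclose(a[0] + a[1], a[2], rtol=1e-12)


def test_pandas_helper():
    expected = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
    result = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0 + 1e-12]})
    # check_exact=False compares floats with a relative tolerance (rtol)
    assert_frame_equal(result, expected, check_exact=False)
```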

Clean the current directory before the next section.


In [ ]:
!rm -rf my*test.py

4. Writing tests for the spectra_analysis module

This part explains how to organize the tests of a module. It uses the spectra_analysis module created and upgraded throughout this course.

We will write tests for the two submodules:

  • spectra_analysis/preprocessing.py
  • spectra_analysis/regression.py

Directory setup

We'll start by copying a clean version of the spectra_analysis module in the current directory.


In [ ]:
# Cleaning current directory
!rm -rf spectra_analysis

In [ ]:
# Copy clean version from the previous course
!cp -vr solutions/04_modules/spectra_analysis .

Then we'll create the structure for the tests:


In [ ]:
# Test directory
!mkdir spectra_analysis/tests

# __init__.py file so the directory is recognized as a submodule
!touch spectra_analysis/tests/__init__.py

# create a local copy of test data
!mkdir spectra_analysis/tests/data
!cp solutions/data/data.csv spectra_analysis/tests/data/.

preprocessing.py

We start with the tests for the read_spectra method of the preprocessing submodule.


In [ ]:
from spectra_analysis.preprocessing import read_spectra

For a closer look at the content of the method, use the "??" syntax:


In [ ]:
#read_spectra??

We will create a file called test_preprocessing.py with two unit tests: one for testing the normal conditions, the other one for handling the exceptions.


In [ ]:
%%file spectra_analysis/tests/test_preprocessing.py

from spectra_analysis.preprocessing import read_spectra


def test_read_spectra():
    # Write here the tests for the normal conditions
    pass


def test_read_spectra_exceptions():
    # Write here the tests to handle errors
    pass

EXERCISE:
Fill the blanks in the two test units.

Advice:
  • Use the example data.csv file
  • List all the outputs of the method and check them separately.
  • List the possible errors raised by the code and make the code fail so that you can check these errors are handled.
  • Find the most appropriate assert method from numpy or pandas.

Verify that the tests pass by running pytest.


In [ ]:
!py.test -v spectra_analysis/tests/

regression.py

We now focus on the second file regression.py.


In [ ]:
from spectra_analysis.regression import fit_params, transform

This time we will create a file called test_regression.py with two unit tests corresponding to the methods fit_params and transform.


In [ ]:
%%file spectra_analysis/tests/test_regression.py

from spectra_analysis.regression import fit_params
from spectra_analysis.regression import transform


def test_fit_params():
    pass


def test_transform():
    pass

EXERCISE:
Same as above, implement tests for the two methods.

Advice:
  • Start by defining a common array for both tests
    
      X = np.array([[0, 0, 0],
                    [0, 0, 0],
                    [1, 1, 1],
                    [1, 1, 1],
                    [1, 1, 1],
                    [2, 2, 2]])
      
  • Compute the expected output of the methods applied on this array

Again, verify that the tests pass.


In [ ]:
!py.test -v spectra_analysis/tests/

5. Test coverage

A test coverage report gives the percentage of lines of a module that have been executed while running the tests.

The higher the coverage, the greater the number of code lines that have been executed at least once during a test.

Installation

To use coverage with pytest, one must first install a coverage plugin for pytest, such as pytest-cov:

conda install pytest-cov

or

pip install pytest-cov


Usage

Coverage can then be measured while running the tests, by adding the argument --cov=<path to the module>, which sets the module to measure.


In [ ]:
!py.test -v --cov=spectra_analysis spectra_analysis/tests/

The coverage table lists, for each file in the module tree:

  • Stmts: the number of executable statements in the file,
  • Miss: the number of statements not executed by the tests,
  • Cover: the resulting coverage percentage.

EXERCISE:
Maximise the coverage of the two test files above.

Advice:
  • You should obtain a coverage above 80%.
