In [1]:
import numpy as np

Unit Tests

Overview and Principles

Testing is the process by which you exercise your code to determine if it performs as expected. The code you are testing is referred to as the code under test.

There are two parts to writing tests:

  1. invoking the code under test so that it is exercised in a particular way;
  2. evaluating the results of executing the code under test to determine whether it behaved as expected.

The tests that are performed are referred to as test cases. The fraction of the code under test that is executed when the test cases run is referred to as test coverage.

For dynamic languages such as Python, it's extremely important to have high test coverage; in fact, you should aim for 100% coverage. This is because the Python interpreter does little checking when it reads the source code. For example, the code under test might contain a line that calls an undefined function. This would not be detected until that line of code is executed.
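A small illustration of this point (the names `report` and `undefined_helper` are made up for this example): Python accepts the function definition, and the missing name is reported only when `report` is actually called. Only a test that executes that line exposes the mistake.

```python
def report(values):
    # undefined_helper is deliberately never defined. Python accepts this
    # definition because names are looked up only when the line runs.
    return undefined_helper(values)

# The mistake surfaces only when the offending line is executed.
try:
    report([1, 2, 3])
    outcome = "no error"
except NameError as err:
    outcome = "NameError: " + str(err)

print(outcome)
```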

Test cases can be of several types. Some common classifications of test cases are listed below.

  • Smoke test. This is an invocation of the code under test to see if there is an unexpected exception. It's useful as a starting point, but this doesn't tell you anything about the correctness of the results of a computation.
  • One-shot test. In this case, you call the code under test with arguments for which you know the expected result.
  • Edge test. The code under test is invoked with arguments that should cause an exception, and you check whether the expected exception occurs.
  • Pattern test. Based on your knowledge of the calculation (not the implementation) performed by the code under test, you construct a suite of test cases for which the results are known, or for which there are known patterns in the results that can be used to evaluate them.

Another principle of testing is to limit what is done in a single test case. Generally, a test case should focus on one use of one function. Sometimes this is a challenge, since the function being tested may call other functions that you are also testing. This means that bugs in the called functions may cause failures in the tests of the calling functions. Often, you sort this out by knowing the structure of the code and focusing first on failures in lower-level tests. In other situations, you may use a more advanced technique called mocking. A discussion of mocking is beyond the scope of this course.

A best practice is to develop your tests while you are developing your code. Indeed, one school of thought in software engineering, called test-driven development, advocates that you write the tests before you implement the code under test so that the test cases become a kind of specification for what the code under test should do.

Examples of Test Cases

This section presents examples of test cases. The code under test is the calculation of entropy.

Entropy of a set of probabilities

$$ H = -\sum_i p_i \log(p_i) $$

where $\sum_i p_i = 1$.


In [13]:
import numpy as np
# Code Under Test
def entropy(ps):
    items = ps * np.log(ps)
    if any(np.isnan(items)):
        raise ValueError("Cannot compute log of ps!")
    return -np.sum(items)

In [12]:
np.isnan([.1, .9])


Out[12]:
array([False, False])

In [3]:
# Smoke test
entropy([0.5, 0.5])


Out[3]:
0.6931471805599453

Suppose that all of the probability of a distribution is at one point. An example of this is a coin with two heads. Whenever you flip it, you always get heads. That is, the probability of a head is 1.

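What is the entropy of such a distribution? From the formula above, with $p_1 = 1$ the entropy is $-1 \cdot \log(1) = 0$. This means that we have a test case where we know the result! One complication: passing exactly [0, 1] does not work, because NumPy evaluates $0 \cdot \log(0)$ as NaN and entropy raises a ValueError. The test below therefore uses a very small probability instead, so the expected result is close to, but not exactly, 0.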


In [6]:
# One-shot test. Need to know the correct answer.
SMALL_VALUE = 1e-5
entropy([SMALL_VALUE, 1-SMALL_VALUE])


Out[6]:
0.00012512920464949012

Question: What is an example of another one-shot test? (Hint: You need to know the expected result.)

One edge test of interest is to provide an input that is not a valid distribution, such as one containing a negative probability.


In [14]:
# Edge test. This is something that should cause an exception.
entropy([-.1, .9])


/home/ubuntu/miniconda3/lib/python3.6/site-packages/ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in log
  after removing the cwd from sys.path.
-------------------------------------------------------------------------
ValueError                              Traceback (most recent call last)
<ipython-input-14-6183bfbf9b67> in <module>()
      1 # Edge test. This is something that should cause an exception.
----> 2 entropy([-.1, .9])

<ipython-input-13-329ed73fe3ab> in entropy(ps)
      4     items = ps * np.log(ps)
      5     if any(np.isnan(items)):
----> 6         raise ValueError("Cannot compute log of ps!")
      7     return -np.sum(items)

ValueError: Cannot compute log of ps!
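As written, entropy raises here only because np.log of a negative number is NaN; it never checks that the probabilities sum to 1, so an input such as [0.5, 0.6] (which sums to 1.1) is accepted silently. Below is a sketch of a stricter variant, assuming you want both conditions enforced. The name `entropy_checked`, the use of `np.isclose` for the sum, and the convention $0 \cdot \log(0) = 0$ are choices made for this example, not part of the original code.

```python
import numpy as np

def entropy_checked(ps):
    """Entropy with explicit input validation (hypothetical variant of entropy)."""
    ps = np.asarray(ps, dtype=float)
    if np.any(ps < 0):
        raise ValueError("Probabilities must be non-negative.")
    if not np.isclose(np.sum(ps), 1.0):
        raise ValueError("Probabilities must sum to 1.")
    # Keep only positive entries: this applies the convention 0 * log(0) = 0
    # instead of the NaN that NumPy produces for 0 * -inf.
    nonzero = ps[ps > 0]
    return -np.sum(nonzero * np.log(nonzero))

# Edge tests: both kinds of invalid input should raise ValueError.
for bad in ([-0.1, 0.9], [0.5, 0.6]):
    try:
        entropy_checked(bad)
        print(bad, "-> no exception (unexpected)")
    except ValueError as err:
        print(bad, "->", err)
```

With this convention, entropy_checked([0, 1]) returns 0, so the two-headed coin can be tested directly.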

Now let's consider a pattern test. Examining the structure of the calculation of $H$, we consider a situation in which there are $n$ equal probabilities, that is, $p_i = \frac{1}{n}$. Then

$$ H = -\sum_{i=1}^{n} p_i \log(p_i) = -\sum_{i=1}^{n} \frac{1}{n} \log\left(\frac{1}{n}\right) = n \left(-\frac{1}{n} \log\left(\frac{1}{n}\right)\right) = -\log\left(\frac{1}{n}\right) = \log(n) $$

For example, entropy([0.5, 0.5]) should be $-\log(0.5) = \log(2) \approx 0.693$.


In [16]:
# Pattern test
print(entropy([0.5, 0.5]), entropy([1/3, 1/3, 1/3]), entropy(np.repeat(1/20, 20)))


0.6931471805599453 1.0986122886681096 2.995732273553991
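These values can be compared with $\log(n)$ by eye (for example, $\log(20) \approx 2.9957$), but the comparison can also be done in code. A minimal sketch, using a copy of the entropy function above so the cell runs on its own; `np.isclose` is used because floating-point results rarely match the expected value exactly.

```python
import numpy as np

# Copy of the entropy function defined above, so this cell is self-contained.
def entropy(ps):
    items = ps * np.log(ps)
    if any(np.isnan(items)):
        raise ValueError("Cannot compute log of ps!")
    return -np.sum(items)

# Pattern: n equal probabilities of 1/n should give an entropy of log(n).
for n in [2, 3, 20, 200]:
    result = entropy(np.repeat(1 / n, n))
    expected = np.log(n)
    print(n, result, expected, np.isclose(result, expected))
```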

You can see that there are many, many cases to test. So far, we've written special code for each test case and checked the results by reading the printed output. We can do better.

Unittest Infrastructure

There are several reasons to use a test infrastructure:

  • If you have many test cases (which you should!), the test infrastructure will save you from writing a lot of code.
  • The infrastructure provides a uniform way to report test results, and to handle test failures.
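  • A test infrastructure, combined with a coverage tool (for Python, the coverage package is a common choice), can tell you which lines your tests never execute, so you know what tests to add.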

We'll be using the unittest framework, which is part of the Python standard library. Using this infrastructure requires the following:

  1. import the unittest module
  2. define a class that inherits from unittest.TestCase
  3. write methods that run the code to be tested and check the outcomes.

The last item has two subparts. First, we must identify which methods in the class inheriting from unittest.TestCase are tests. You indicate that a method is to be run as a test by having the method name begin with "test".

Second, the test methods must report to the infrastructure the results of evaluating the output from the code under test. This is done with the assert methods provided by unittest.TestCase. For example, self.assertEqual takes two arguments. If these are objects for which == returns True, the test passes; otherwise, the test fails.


In [18]:
import unittest

# Define a class in which the tests will run
class UnitTests(unittest.TestCase):

    # Each method whose name begins with "test" is run as a test
    def test_success(self):
        self.assertEqual(1, 1)
        
    def test_success1(self):
        self.assertTrue(1 == 1)

    def test_failure(self):
        self.assertEqual(1, 1)
 
suite = unittest.TestLoader().loadTestsFromTestCase(UnitTests)
_ = unittest.TextTestRunner().run(suite)


...
----------------------------------------------------------------------
Ran 3 tests in 0.011s

OK

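In [8]:
# Function that handles test loading and running (uses unittest imported above)
def run_tests(test_case_class):
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    return unittest.TextTestRunner().run(suite)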

Tests for homework or your own work should be in test files. In this lesson, we write test code in a Jupyter notebook for pedagogical reasons. This is NOT something you should do in practice, except as an intermediate, exploratory step.

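Note that test_failure, as written, asserts that 1 equals 1, so all three tests pass, as the output shows. Changing its assertion to self.assertEqual(1, 2) makes it fail, and unittest then reports the failure.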

Exercise

  • Rewrite the above one-shot test for entropy using the unittest infrastructure.

In [9]:
# Implementing a pattern test. Use functions in the test.
import unittest

# Define a class in which the tests will run
class TestEntropy(unittest.TestCase):
        
    def test_equal_probability(self):
        def test(count):
            """
            Invokes the entropy function for a number of values equal to count
            that have the same probability.
            :param int count: number of equally likely values
            """
            # Exercise: replace this line with your implementation.
            raise RuntimeError("Not implemented.")
        #
        test(2)
        test(20)
        test(200)

suite = unittest.TestLoader().loadTestsFromTestCase(TestEntropy)
_ = unittest.TextTestRunner().run(suite)

Testing For Exceptions

Edge test cases often involve handling exceptions. One approach is to catch the exception directly with try/except.


In [21]:
import unittest

# Define a class in which the tests will run
class TestEntropy(unittest.TestCase):
        
    def test_invalid_probability(self):
        try:
            entropy([0.1, -0.5])
        except ValueError:
            pass  # expected exception, so the test passes
        else:
            self.fail("entropy did not raise ValueError")
        
suite = unittest.TestLoader().loadTestsFromTestCase(TestEntropy)
_ = unittest.TextTestRunner().run(suite)


/home/ubuntu/miniconda3/lib/python3.6/site-packages/ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in log
  after removing the cwd from sys.path.
.
----------------------------------------------------------------------
Ran 1 test in 0.008s

OK

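unittest provides direct support for testing exceptions: the assertRaises context manager passes only if the code inside the with block raises the named exception. In the cell below, the line a = 1 / 0.0 raises ZeroDivisionError before entropy is even called. Because that is not a ValueError, the exception propagates and unittest reports the test as an ERROR rather than a pass.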


In [24]:
import unittest

# Define a class in which the tests will run
class TestEntropy(unittest.TestCase):
        
    def test_invalid_probability(self):
        with self.assertRaises(ValueError):
            a = 1 / 0.0
            entropy([0.1, -0.5])
        
suite = unittest.TestLoader().loadTestsFromTestCase(TestEntropy)
_ = unittest.TextTestRunner().run(suite)


E
======================================================================
ERROR: test_invalid_probability (__main__.TestEntropy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-24-9345e9676bf1>", line 8, in test_invalid_probability
    a = 1 / 0.0
ZeroDivisionError: float division by zero

----------------------------------------------------------------------
Ran 1 test in 0.007s

FAILED (errors=1)
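With the stray division removed, the only exception raised inside the with block is the ValueError from entropy, and the test passes. A sketch, using a copy of the entropy function above so the cell runs on its own; the second method shows the equivalent call form assertRaises(exception, callable, *args). NumPy may still print the same RuntimeWarning seen in the earlier output.

```python
import io
import unittest

import numpy as np

# Copy of the entropy function defined above, so this cell is self-contained.
def entropy(ps):
    items = ps * np.log(ps)
    if any(np.isnan(items)):
        raise ValueError("Cannot compute log of ps!")
    return -np.sum(items)

class TestEntropyRaises(unittest.TestCase):

    def test_invalid_probability(self):
        # Passes only if the block raises ValueError.
        with self.assertRaises(ValueError):
            entropy([0.1, -0.5])

    def test_invalid_probability_call_form(self):
        # Same check, passing the callable and its argument separately.
        self.assertRaises(ValueError, entropy, [0.1, -0.5])

suite = unittest.TestLoader().loadTestsFromTestCase(TestEntropyRaises)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print("all passed:", result.wasSuccessful())
```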

Test Files

Although I presented the elements of unittest in a notebook, your tests should be in a file. If the name of the module with the code under test is foo.py, then the name of the test file should be test_foo.py.

The structure of the test file will be very similar to the cells above. You will import unittest. You must also import the module with the code under test. Take a look at test_prime.py in this directory to see an example.

Discussion

Question: What tests would you write for a plotting function?

Test Driven Development

Start by writing the tests. Then write the code.

We illustrate this by considering a function geomean that takes a list of numbers as input and returns their geometric mean.


In [12]:
import unittest

# Define a class in which the tests will run
class TestGeomean(unittest.TestCase):

    def test_oneshot(self):
        self.assertEqual(geomean([1,1]), 1)

    def test_oneshot2(self):
        # assertAlmostEqual tolerates rounding error in a floating-point result
        self.assertAlmostEqual(geomean([3, 3, 3]), 3)
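In test-driven development, running these tests first fails (with a NameError, because geomean does not exist yet); you then write just enough code to make them pass. A minimal sketch of that second step, computing the geometric mean $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$ as $\exp\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$, which avoids overflowing the product for long lists. It assumes positive inputs; raising ValueError otherwise, and the two extra tests, are choices made for this example. assertAlmostEqual is used because the logarithm route can introduce rounding error.

```python
import math
import unittest

def geomean(xs):
    """Geometric mean of a non-empty sequence of positive numbers."""
    if len(xs) == 0:
        raise ValueError("geomean needs at least one value.")
    if any(x <= 0 for x in xs):
        raise ValueError("geomean needs positive values.")
    # The exponential of the mean logarithm equals the n-th root of the product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

class TestGeomean(unittest.TestCase):

    def test_oneshot(self):
        self.assertAlmostEqual(geomean([1, 1]), 1)

    def test_oneshot2(self):
        self.assertAlmostEqual(geomean([3, 3, 3]), 3)

    def test_two_values(self):
        # sqrt(2 * 8) = 4
        self.assertAlmostEqual(geomean([2, 8]), 4)

    def test_nonpositive_raises(self):
        with self.assertRaises(ValueError):
            geomean([1, -1])

suite = unittest.TestLoader().loadTestsFromTestCase(TestGeomean)
result = unittest.TextTestRunner().run(suite)
```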

Exercise: Testing Basics

Other infrastructures

  • pytest
  • nose
  • Both pytest and nose can also run plain functions whose names begin with "test", so you don't need to define a TestCase class.
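In the pytest style, a test is a plain function whose name begins with test and that uses Python's built-in assert; pytest collects such functions from files named test_*.py (or *_test.py). A minimal sketch, using a copy of the entropy function above so the cell stands alone; to keep it runnable without pytest installed, the test functions are simply called by hand at the end.

```python
import numpy as np

# Copy of the entropy function defined earlier, so this cell is self-contained.
def entropy(ps):
    items = ps * np.log(ps)
    if any(np.isnan(items)):
        raise ValueError("Cannot compute log of ps!")
    return -np.sum(items)

# pytest-style tests: plain functions named test_*, using the built-in assert.
def test_smoke():
    entropy([0.5, 0.5])

def test_equal_probability():
    for n in [2, 20, 200]:
        assert np.isclose(entropy(np.repeat(1 / n, n)), np.log(n))

# pytest would discover and run these automatically; here they are called directly.
test_smoke()
test_equal_probability()
print("both tests passed")
```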

Exercise: Testing With Data And Plots

The directory contains 3 CSV files from a happiness survey covering 2015 to 2017. Write Python code that:

  • Read the files
  • Compute a linear trend for a numerical value
  • Plot the results
  • Test the resulting code
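The CSV file names and columns are not given here, so a sketch can only show the testable core of the exercise. It assumes a hypothetical helper `linear_trend(years, values)` that returns the slope and intercept of a least-squares line (computed here with np.polyfit). A pattern test feeds it synthetic data lying exactly on a known line and checks that the line is recovered; reading the files and plotting are left out.

```python
import unittest

import numpy as np

def linear_trend(years, values):
    """Slope and intercept of the least-squares line (hypothetical helper)."""
    slope, intercept = np.polyfit(years, values, deg=1)
    return slope, intercept

class TestLinearTrend(unittest.TestCase):

    def test_recovers_known_line(self):
        # Pattern test: data lying exactly on a line should give back that line.
        years = np.array([2015, 2016, 2017])
        values = 0.5 * years - 1000.0
        slope, intercept = linear_trend(years, values)
        self.assertAlmostEqual(slope, 0.5)
        np.testing.assert_allclose(slope * years + intercept, values)

    def test_constant_series_has_zero_slope(self):
        years = np.array([2015, 2016, 2017])
        slope, _ = linear_trend(years, np.array([7.0, 7.0, 7.0]))
        self.assertAlmostEqual(slope, 0.0)

suite = unittest.TestLoader().loadTestsFromTestCase(TestLinearTrend)
result = unittest.TextTestRunner().run(suite)
```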

References

  • PyData 2015: https://www.youtube.com/watch?v=GEqM9uJi64Q
  • PyCon 2017: https://www.youtube.com/watch?v=yACtdj1_IxE

The first talk mentions some packages:

  • engarde: https://github.com/TomAugspurger/engarde
  • Hypothesis: https://hypothesis.readthedocs.io/en/latest/
  • Feature Forge: https://github.com/machinalis/featureforge

Detlef Nauck talk: http://ukkdd.org.uk/2017/info/talks/nauck.pdf He also had a list of R tools, but I could not find the slides from the talk I saw.

Test Driven Data Analysis: https://www.youtube.com/watch?v=TGwZnZYg0jw

Profiling for Pandas: https://github.com/pandas-profiling/pandas-profiling