Data Science is Software

Developer #lifehacks for the Jupyter Data Scientist

Lab 4: Testing

In this lab, you will create a test suite with pytest to check that the functions you wrote in the previous section work as expected. You will also use engarde to declare and enforce sanity checks on data processing steps.

Setup

Read the getting started guide for pytest.

Your task: Create a file in src/ which can be automatically discovered by pytest (hint: the very first section of that guide explains the rules for test discovery). Open a terminal and navigate to your project folder.
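One way to do both steps (the file name below is only a suggestion; any file whose name matches pytest's test_*.py discovery pattern will be collected):

cd path/to/your/project
touch src/test_preprocess_solution.py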


Now run the following command:

py.test

You should see output like this:

============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /path/to/your/project/folder, inifile: 
collected 0 items 

========================= no tests ran in 0.00 seconds =========================


Now go into your test file and add the following:

def test_our_test_suite():
    assert True == False

Run the command py.test on the command line again. You should see something like the following:

============================================== test session starts =======================================
platform linux -- Python 3.5.2, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /path/to/your/project/folder, inifile: 
collected 1 items 

[your_file].py F

==================================================== FAILURES ============================================
______________________________________________ test_our_test_suite _______________________________________

    def test_our_test_suite():
>       assert True == False
E       assert True == False

[your_file].py:2: AssertionError
============================================ 1 failed in 0.01 seconds ====================================

Great, a failing test! Time to replace that and start filling in this file with real tests.

Question 1: creating a test suite with a data fixture

The term "test fixtures" refers to known objects or mock data used to put other pieces of the system to the the test. We want these to have the same, known state every time.

For those familiar with unittest, this might be data that you read in as part of the setUp method. pytest does things a bit differently; you define functions that return expected fixtures, and use a special decorator so that your tests automatically get passed the fixture data when you add the fixture function name as an argument.

We need to set up a way to get some data in here for testing. There are two basic choices — reading in the actual data or a known subset of it, or making up some smaller, fake data. You can choose whatever you think works best for your project.

Remove the failing test from above and copy the following into your testing file:


In [ ]:
import os
import pytest
import pandas as pd


@pytest.fixture()
def df():
    """
    read in the raw data file and return the dataframe.
    """
    path, _ = os.path.split(os.path.abspath(__file__))
    project_path = os.path.join(path, os.pardir)
    pass


def test_df_fixture(df):
    assert df.shape == (59400, 40)

    useful_columns = ['amount_tsh', 'gps_height', 'longitude', 'latitude', 'region',
                      'population', 'construction_year', 'extraction_type_class',
                      'management_group', 'quality_group', 'source_type',
                      'waterpoint_type', 'status_group']
    
    for column in useful_columns:
        assert column in df.columns

Your task: implement the rest of the df() fixture so that the test_df_fixture test passes when you run py.test.
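For reference, here is a minimal sketch of one way to finish the fixture. The file name data/water-pumps.csv and the index_col argument are assumptions; point them at your project's actual raw data file so that the shape comes out to (59400, 40).

@pytest.fixture()
def df():
    """
    read in the raw data file and return the dataframe.
    """
    path, _ = os.path.split(os.path.abspath(__file__))
    project_path = os.path.join(path, os.pardir)

    # NOTE: 'data/water-pumps.csv' is a placeholder; substitute the raw data
    # file your project actually uses.
    data_path = os.path.join(project_path, 'data', 'water-pumps.csv')

    # index_col=0 assumes the first column is an id that should not count
    # toward the expected 40 columns; drop it if your file differs.
    return pd.read_csv(data_path, index_col=0)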

Question 2: rigorously TDD the data processing pipeline's "happy path"

The "happy path" is used in testing to refer to the "default scenario featuring no exceptional or error conditions, and comprises the sequence of activities executed if everything goes as expected."

Add the following functions to your test file:


In [ ]:
def test_clean_raw_data(df):
    """ test the `clean_raw_data` function """
    pass


def test_replace_value_with_grouped_mean(df):
    """ test the `replace_value_with_grouped_mean` function """
    pass

Your task:

  • Import the functions these tests are designed to examine and then fill these tests out with asserts. Your tests should confirm that the functions behave as expected with normal inputs and did what they were supposed to do (a sketch of one possible starting point appears after the import lines below).
  • When your tests fail, make sure the test is doing what you think it should and then -- once confirmed -- go fix your code.
  • By all means, check what happens when you pass in a value that might be slightly unexpected (like negative numbers or empty lists where appropriate), but not a value that is totally crazy. We will get to testing "edge case" scenarios in question 5.

You should be able to import your implemented methods in the test using the lines:

from src.features.preprocess_solution import clean_raw_data
from src.features.preprocess_solution import replace_value_with_grouped_mean
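
As a starting point, test_clean_raw_data might look something like the sketch below. The specific assertions are assumptions about what your cleaning function promises (keeping every row and returning only the useful columns); replace them with checks that match your implementation.

def test_clean_raw_data(df):
    """ test the `clean_raw_data` function """
    cleaned = clean_raw_data(df)

    # assumption: cleaning fixes or drops columns but keeps every row
    assert cleaned.shape[0] == df.shape[0]

    # assumption: only the useful columns survive the cleaning step
    expected_columns = {'amount_tsh', 'gps_height', 'longitude', 'latitude',
                        'region', 'population', 'construction_year',
                        'extraction_type_class', 'management_group',
                        'quality_group', 'source_type', 'waterpoint_type',
                        'status_group'}
    assert set(cleaned.columns) == expected_columns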


**Note:** check out the docs for [numpy.isclose](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html) and [numpy.allclose](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.allclose.html#numpy.allclose). They are useful when making assertions about data, especially where small probabilistic changes or machine precision may result in numbers that aren't **exactly** equal. Consider using them instead of == for any numbers involved in computations where randomness may influence the results (e.g. making predictions).
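
For example, floating point arithmetic alone is enough to make exact comparisons fail:

import numpy as np

# exact equality fails because of floating point representation ...
assert (0.1 + 0.2) != 0.3

# ... but a tolerance-aware comparison behaves the way you expect
assert np.isclose(0.1 + 0.2, 0.3)

# allclose does the same check element-wise over whole arrays
assert np.allclose([0.1 + 0.2, 1.0 / 3.0], [0.3, 0.333333333333])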

Question 3: sanity checks using engarde

Remember learning about "defensive driving" in driver's ed class? The package engarde lets us code defensively by declaring expectations about the return values of functions and failing loudly when those expectations are violated.
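
For instance, a decorator from engarde.decorators sits on top of a function that returns a DataFrame and checks the result every time the function runs. A minimal sketch (which checks to apply, and to which functions, is up to you; the two used here are just illustrations):

import engarde.decorators as ed


@ed.none_missing()    # fail loudly if any NaNs appear in the returned frame
@ed.unique_index()    # fail loudly if the returned frame has duplicate index values
def clean_raw_data(df):
    # ... your existing cleaning logic ...
    return df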

Your task:

  • Read the example in the engarde docs. Then browse through the full list of checks to see what declarations are available.

  • Go into your data processing code and decorate clean_raw_data, replace_value_with_grouped_mean, and create_categorical with engarde decorators as you feel appropriate.

  • Run your tests again. If any fail with the engarde decorators, fix the code or the tests as appropriate.

  • Now see if you can break a few of the engarde assertions by passing in incorrect inputs, and make sure they fail as expected. Remember that you can use pytest.raises as a context manager like this:

    def test_clean_raw_data_cannot_return_missing(df):
        # do something weird here with the df
        with pytest.raises(AssertionError):
            clean_raw_data(df)
    

Question 4: Did you run all of your code?

Your task: Run code coverage on your tests to see which parts of your functions were exercised and whether you missed anything.
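
One way to do this is with the pytest-cov plugin (an assumption; any coverage tool you prefer will do):

pip install pytest-cov
py.test --cov=src --cov-report=term-missing

The term-missing report lists the line numbers in each file that your tests never executed.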

Question 5: fully embracing paranoia by testing weird edge cases

Now that we have basically convinced ourselves that the processing functions work in general, let's try to come up with some uncommon but not pathological situations where things might break.

This is your chance to get creative and paranoid at the same time. For example, what if the csv file has a bad row? What if there is a weird unicode character like an emoji in one of the fields? What if there is a zero in a column that might be a denominator?

At the same time, bear in mind that all testing is a tradeoff. You can't prove the function always behaves correctly, but you can convince yourself of its behavior when certain things are wrong. The ultimate goal is to make sure that your analysis work does not get contaminated by avoidable mistakes.

Your task: write some new test functions in your test file that push the bounds of expected behavior in your data processing functions. See if you can hit the sweet spot of things that might happen but would be really hard to catch.
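
For example, one edge case for replace_value_with_grouped_mean is a group in which every row carries the placeholder value, so there is no real data to average. The sketch below assumes a signature like replace_value_with_grouped_mean(df, value, column, to_groupby); adjust the call and the final assertion to match what your implementation is supposed to do.

def test_replace_value_with_grouped_mean_when_whole_group_is_missing(df):
    """ edge case: one group contains nothing but the placeholder value """
    df_bad = df.copy()

    # zero out every gps_height value in one region
    some_region = df_bad['region'].iloc[0]
    df_bad.loc[df_bad['region'] == some_region, 'gps_height'] = 0

    # NOTE: the keyword arguments here are a guess at your function's
    # signature; change them to match your implementation.
    result = replace_value_with_grouped_mean(df_bad, value=0,
                                             column='gps_height',
                                             to_groupby='region')

    # decide what the "right" behavior is and assert it; here we assume the
    # function should never leave missing values behind.
    assert result['gps_height'].notnull().all()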

STRETCH: add an end-to-end test for training a model and testing it

Add a test, using any necessary fixtures, that does preprocessing, feature building, and model training, and then tests assumptions about the trained model.

How could you structure these tests to raise a red flag if performance decreases? Think about small changes you might make in preprocessing or feature building code -- what would you want to know about the bottom line impact on performance?
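
One possible shape for such a test, with build_features and train_model standing in as placeholder names for whatever your project actually exposes, and an accuracy floor acting as the red flag for performance regressions:

def test_end_to_end_pipeline(df):
    """ run the full pipeline on the fixture data and sanity check the model """
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # NOTE: build_features and train_model are placeholder names; substitute
    # the preprocessing, feature building, and training functions you wrote.
    cleaned = clean_raw_data(df)
    features, labels = build_features(cleaned)

    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42)

    model = train_model(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))

    # set the floor just below the performance you currently see, so that a
    # regression in preprocessing or feature code trips this test
    assert accuracy > 0.5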

