Read the getting started guide for pytest.
Your task: Create a test file in src/ that can be automatically discovered by pytest (hint: the very first section of that guide explains the rules for test discovery). Open a terminal and navigate to your project folder:
Now run the following command:
py.test
You should see output like this:
============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /path/to/your/project/folder, inifile:
collected 0 items
========================= no tests ran in 0.00 seconds =========================
Now go into your test file and add the following:
def test_our_test_suite():
    assert True == False
Run the command py.test on the command line again. You should see something like the following:
============================================== test session starts =======================================
platform linux -- Python 3.5.2, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /path/to/your/project/folder, inifile:
collected 1 items
[your_file].py F
==================================================== FAILURES ============================================
______________________________________________ test_our_test_suite _______________________________________
def test_our_test_suite():
> assert True == False
E assert True == False
[your_file].py:2: AssertionError
============================================ 1 failed in 0.01 seconds ====================================
Great, a failing test! Time to replace that and start filling in this file with real tests.
The term "test fixtures" refers to known objects or mock data used to put other pieces of the system to the the test. We want these to have the same, known state every time.
For those familiar with unittest, this might be data that you read in as part of the setUp method. pytest does things a bit differently: you define functions that return the fixture data, mark them with a special decorator, and your tests automatically receive that data when you add the fixture function's name as an argument.
We need to set up a way to get some data in here for testing. There are two basic choices — reading in the actual data or a known subset of it, or making up some smaller, fake data. You can choose whatever you think works best for your project.
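If you go the fake-data route, the fixture can simply build a tiny DataFrame by hand. This is only an illustrative sketch (the columns are a small subset of the real ones); the exercise below uses the real data instead:
import pandas as pd
import pytest

@pytest.fixture()
def fake_df():
    # A tiny, hand-built frame with a known shape and known values.
    return pd.DataFrame({
        'amount_tsh': [0.0, 500.0, 25.0],
        'population': [120, 0, 350],
        'status_group': ['functional', 'non functional', 'functional'],
    })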
Remove the failing test from above and copy the following into your testing file:
In [ ]:
import os

import pytest
import pandas as pd


@pytest.fixture()
def df():
    """
    read in the raw data file and return the dataframe.
    """
    path, _ = os.path.split(os.path.abspath(__file__))
    project_path = os.path.join(path, os.pardir)
    pass


def test_df_fixture(df):
    assert df.shape == (59400, 40)

    useful_columns = ['amount_tsh', 'gps_height', 'longitude', 'latitude', 'region',
                      'population', 'construction_year', 'extraction_type_class',
                      'management_group', 'quality_group', 'source_type',
                      'waterpoint_type', 'status_group']

    for column in useful_columns:
        assert column in df.columns
Your task: implement the rest of the df() fixture so that the test_df_fixture test passes when you run py.test.
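If you get stuck, one possible implementation looks like the sketch below. The data/raw/pump_data.csv path is only a guess at your project layout; point it at wherever your raw data file actually lives:
@pytest.fixture()
def df():
    """
    read in the raw data file and return the dataframe.
    """
    path, _ = os.path.split(os.path.abspath(__file__))
    project_path = os.path.join(path, os.pardir)
    # NOTE: adjust this to the real location and name of your raw data file.
    data_path = os.path.join(project_path, 'data', 'raw', 'pump_data.csv')
    return pd.read_csv(data_path)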
The "happy path" is used in testing to refer to the "default scenario featuring no exceptional or error conditions, and comprises the sequence of activities executed if everything goes as expected."
Add the following functions to your test file:
In [ ]:
def test_clean_raw_data(df):
    """ test the `clean_raw_data` function """
    pass


def test_replace_value_with_grouped_mean(df):
    """ test the `replace_value_with_grouped_mean` function """
    pass
Your task: fill in the two happy-path tests above so that they exercise your processing functions. You should be able to import your implemented functions into the test file using the lines:
from src.features.preprocess_solution import clean_raw_data
from src.features.preprocess_solution import replace_value_with_grouped_mean
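As a rough sketch of what a happy-path test might look like (the exact assertions depend on what your clean_raw_data promises to do; the checks below are illustrative assumptions, not requirements):
def test_clean_raw_data(df):
    """ test the `clean_raw_data` function """
    cleaned = clean_raw_data(df)

    # Assumed behavior: cleaning may drop bad rows but never adds any,
    # and the result should contain no missing values. Adjust these
    # assertions to match your function's actual contract.
    assert cleaned.shape[0] <= df.shape[0]
    assert not cleaned.isnull().any().any()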
Remember learning about "defensive driving" in driver's ed class? The package engarde lets us code defensively by declaring expectations about the return values of functions and failing loudly when those expectations are violated.
Your task:
Read the example in the engarde docs. Then browse through the full list of checks to see what declarations are available.
Go into your data processing code and decorate clean_raw_data, replace_value_with_grouped_mean, and create_categorical with engarde decorators as you feel appropriate.
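For instance, a decorated cleaning function in your processing module might look like the sketch below. Which checks make sense depends on what your functions actually guarantee; none_missing and unique_index are shown only as plausible picks:
import engarde.decorators as ed

@ed.none_missing()      # fail loudly if the cleaned frame still contains NaNs
@ed.unique_index()      # ...or if its index has duplicate labels
def clean_raw_data(df):
    # your existing cleaning logic stays exactly the same
    ...
    return df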
Run your tests again. If any fail with the engarde decorators, fix the code or the tests as appropriate.
Now see if you can break a few of the engarde assertions by passing in incorrect inputs, and make sure they fail as expected. Remember that you can use pytest.raises as a context manager, like this:
def test_clean_raw_data_cannot_return_missing(df):
    # do something weird here with the df
    with pytest.raises(AssertionError):
        clean_raw_data(df)
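One concrete way to "do something weird" is to corrupt the fixture data before the call, for example by injecting missing values into a column. This assumes you decorated clean_raw_data with a check like none_missing, and that your cleaning logic does not silently repair this particular corruption:
def test_clean_raw_data_cannot_return_missing(df):
    # Corrupt the fixture so the engarde check has something to catch.
    df.loc[df.index[:10], 'amount_tsh'] = float('nan')
    with pytest.raises(AssertionError):
        clean_raw_data(df)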
Now that we have basically convinced ourselves that the processing functions work in general, let's try to come up with some uncommon but not pathological situations where things might break.
This is your chance to get creative and paranoid at the same time. For example, what if the csv file has a bad row? What if there is a weird unicode character like an emoji in one of the fields? What if there is a zero in a column that might be a denominator?
At the same time, bear in mind that all testing is a tradeoff. You can't prove the function always behaves correctly, but you can convince yourself of its behavior when certain things are wrong. The ultimate goal is to make sure that your analysis work does not get contaminated by avoidable mistakes.
Your task: write some new test functions in your test file that push the bounds of expected behavior in your data processing functions. See if you can hit the sweet spot of things that might happen but would be really hard to catch.
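For example, a test for the zero-denominator case might look like the sketch below. The call signature of replace_value_with_grouped_mean is a guess; match it to your actual implementation:
def test_replace_value_with_grouped_mean_all_zero_group(df):
    """ edge case: one group consists entirely of the placeholder value """
    # Force every gps_height in one region to the placeholder 0, so the group
    # mean of the "real" values is undefined.
    region = df['region'].iloc[0]
    df.loc[df['region'] == region, 'gps_height'] = 0

    # NOTE: this call signature is hypothetical; adjust to your function.
    result = replace_value_with_grouped_mean(df, value=0, column='gps_height',
                                             to_groupby='region')

    # Whatever your function decides to do here, it should not hand back
    # missing values for that region.
    assert result['gps_height'].notnull().all()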
Add a test for the full pipeline, using any necessary fixtures, that does preprocessing, feature building, and model training, and then tests assumptions about the model.
How could you structure these tests to raise a red flag if performance decreases? Think about small changes you might make in preprocessing or feature building code -- what would you want to know about the bottom line impact on performance?
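A rough sketch of such a pipeline test is below. The build_features and train_model names and the 0.70 floor are placeholders for whatever your project actually uses; the point is that the final assertion is the red flag:
def test_pipeline_performance_floor(df):
    """ end-to-end smoke test with a performance floor """
    # These stages (and their names) are assumptions about your project.
    cleaned = clean_raw_data(df)
    features, target = build_features(cleaned)
    model, score = train_model(features, target)

    # If a change to preprocessing or feature building quietly degrades the
    # model, this assertion fails and raises the red flag. Set the floor to
    # the performance you currently consider acceptable.
    assert score >= 0.70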
In [ ]: