In [2]:
import harness
In [1]:
from harness import Harness
from pandas import Categorical
from sklearn import datasets, discriminant_analysis
iris = datasets.load_iris()
# Harness is just a dataframe
df = Harness(
    data=iris['data'], index=Categorical(iris['target']),
    estimator=discriminant_analysis.LinearDiscriminantAnalysis(),
    feature_level=-1,  # the feature level indicates an index
                       # in the dataframe. -1 is the last index.
)
# Fit the model with 50 random rows.
df.sample(50).fit()
# Transform the dataframe
transformed = df.transform()
transformed.set_index(
    df.index
    .rename_categories(iris['target_names'])
    .rename('species'), append=True, inplace=True,
)
# Plot the dataframe using Bokeh charts.
with transformed.reset_index().DataSource(x=0, y=1) as source:
    source.Scatter(color='species')
    source.show()
More examples can be found in the tests directory. Tap the Ⓣ key while in the GitHub interface to search the files quickly.
harness began as a response to the need for scikit-learn models that sit closer to a pandas.DataFrame. Because a DataFrame is Tidy Data, its rows and columns can track samples and features across many estimations. With that bookkeeping in place, it becomes easier to design a testing harness for data science.
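The bookkeeping described above can be sketched without harness at all. The following is a minimal illustration using only pandas and scikit-learn, not the harness API: the names `frame`, `model`, and `transformed` are chosen for this example, and every third row is used for fitting (instead of a random sample) so the result is reproducible.

```python
import pandas as pd
from sklearn import datasets, discriminant_analysis

iris = datasets.load_iris()

# Rows are samples labelled by species; columns are the named features.
frame = pd.DataFrame(
    iris['data'],
    index=pd.Index(iris['target_names'][iris['target']], name='species'),
    columns=iris['feature_names'],
)

# Fit on a subset of rows. The index supplies the class labels.
train = frame.iloc[::3]
model = discriminant_analysis.LinearDiscriminantAnalysis()
model.fit(train.to_numpy(), train.index.to_numpy())

# Transform every row and carry the original index along,
# so each projected sample is still labelled with its species.
transformed = pd.DataFrame(
    model.transform(frame.to_numpy()),
    index=frame.index,
)
```

Here the index has to be threaded through by hand at each step; keeping the estimator and the frame together is meant to remove that repetition.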
The DataFrame has a powerful declarative syntax; consider the groupby and rolling APIs. There is a modern tendency toward declarative and functional syntaxes in scientific computing and visualization, as seen in altair, dask, and scikit-learn.
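As a small illustration of that declarative style, the sketch below uses the two pandas APIs named above. The `readings` table and its column names are made up for the example.

```python
import pandas as pd

# A hypothetical table of readings from two sensors.
readings = pd.DataFrame({
    'sensor': ['a', 'a', 'a', 'b', 'b', 'b'],
    'value': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# groupby: declare the grouping key and the aggregation;
# pandas handles the split, apply, and combine steps.
group_means = readings.groupby('sensor')['value'].mean()

# rolling: declare the window size and the aggregation.
# The first entry is NaN because its window is incomplete.
rolling_means = readings['value'].rolling(window=2).mean()

print(group_means)
print(rolling_means)
```

In both cases the code states what to compute (a mean per group, a mean per window) and leaves the iteration to the library.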
tidy-harness aims to provide a chainable interface between pandas.DataFrame objects and other popular scientific computing libraries in the Python ecosystem. The initial harness extensions:
- scikit-learn: an estimator attached to the dataframe.
- jinja2: an environment to render narratives about the dataframes.
- bokeh: plotting methods with a contextmanager for interactive visualization development.

The development scripts can be run through this notebook.
Jupyter notebooks are used for all Python development in this project. The key features are:
- watchdog: a file system watcher that converts notebooks to Python scripts with nbconvert. Tests are not converted.
- nbconvert with the --execute flag: runs notebooks and fills in their output. The current goal is for the notebooks to be viewable in a GitHub repo.
- pytest-ipynb: runs tests directly on the notebooks.
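To make the notebook-to-script step concrete, here is a rough stand-in written with only the standard library. It is not how nbconvert works internally and does far less than nbconvert does; the function name `notebook_to_script` is hypothetical, and the sketch assumes the version 4 notebook JSON layout, where each cell has a `cell_type` and a `source` that is either a string or a list of lines.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook into one script.

    Hypothetical helper for illustration only; markdown cells
    and outputs are simply dropped.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue
        source = cell.get('source', '')
        # The notebook format allows a list of lines or a single string.
        if isinstance(source, list):
            source = ''.join(source)
        if source.strip():
            chunks.append(source.rstrip())
    return '\n\n'.join(chunks) + '\n'


# A tiny in-memory notebook with one markdown cell and two code cells.
example = json.dumps({
    'nbformat': 4,
    'nbformat_minor': 0,
    'metadata': {},
    'cells': [
        {'cell_type': 'markdown', 'metadata': {}, 'source': ['# Title']},
        {'cell_type': 'code', 'metadata': {}, 'outputs': [],
         'execution_count': None,
         'source': ['import math\n', 'radius = 2']},
        {'cell_type': 'code', 'metadata': {}, 'outputs': [],
         'execution_count': None,
         'source': 'area = math.pi * radius ** 2'},
    ],
})

script = notebook_to_script(example)
print(script)
```

In the project itself this conversion is delegated to nbconvert and triggered by the watchdog file watcher, so a helper like this is only useful for seeing what the step amounts to.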
In [16]:
%%script bash --bg
python setup.py develop
watchmedo tricks tricks.yaml
In [2]:
# Execute this cell to stop watching the files
%killbgscripts
In [4]:
%%script bash
jupyter nbconvert harness/tests/*.ipynb --execute --to notebook --inplace
py.test