Parameter Exploration for an In-House Library

Presenter: Ramón Heberto Martínez

Course Responsible: Olav Vahtras

Python Course SESE - 27 November 2015

The Problem

Nexa library.

  • In-house library for spatio-temporal feature extraction from time signals.
  • Calculation and memory intensive.
  • Used for pre-processing data.

The Problem

The Nexa Library has a Complex Processing Pipeline

  • Construction of a matrix with all the data and delayed versions (SLM)
  • Construction of a correlation matrix of the data matrix above (STDM)
  • Embedding of each of the signals above and its delays in a Euclidean space.
  • Clustering of each of those signals.
  • For each of the clusters above, clustering of the data within it.
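
The first two steps of the pipeline can be sketched with NumPy. This is only a minimal illustration, not the Nexa implementation: the helper names `build_slm` and `build_stdm`, the row ordering and the use of random test signals are all assumptions made for the example.

```python
import numpy as np


def build_slm(signals, n_lags):
    """Stack every signal with its delayed copies (hypothetical helper).

    signals: array of shape (n_signals, n_samples)
    returns: array of shape (n_signals * n_lags, n_samples - n_lags + 1)
    """
    signals = np.asarray(signals, dtype=float)
    n_signals, n_samples = signals.shape
    length = n_samples - n_lags + 1
    rows = []
    for signal in signals:
        for lag in range(n_lags):
            # Row for delay `lag`: the same signal, shifted back by `lag` samples.
            start = n_lags - 1 - lag
            rows.append(signal[start:start + length])
    return np.array(rows)


def build_stdm(slm):
    """Correlation between every pair of rows of the SLM (hypothetical helper)."""
    return np.corrcoef(slm)


# Random placeholder data, only to exercise the two functions.
rng = np.random.default_rng(0)
signals = rng.normal(size=(2, 100))

slm = build_slm(signals, n_lags=3)   # 2 signals x 3 delays = 6 rows
stdm = build_stdm(slm)               # 6 x 6 correlation matrix
```

With 2 signals and 3 delays the SLM already has 6 rows and the STDM 36 entries, which hints at why the full pipeline becomes calculation and memory intensive.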

Complexity madness!

The Problem

Illustration with a simple use case

  • The signals.
  • Distance between them.
  • Width of the Gaussian bumps.
  • Base levels.
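
A use case of this kind could be generated as in the sketch below. The reading of the bullets (two signals, each a Gaussian bump on top of a base level, with centers separated by a given distance) is an assumption, and so are the function name `gaussian_bump` and all numerical values.

```python
import numpy as np


def gaussian_bump(t, center, width, base=0.0, amplitude=1.0):
    """Gaussian bump of the given width sitting on a constant base level."""
    return base + amplitude * np.exp(-0.5 * ((t - center) / width) ** 2)


t = np.linspace(0.0, 10.0, 1001)

# The three quantities varied in the use case (illustrative values).
distance = 2.0   # separation between the two bump centers
width = 0.5      # standard deviation of each bump
base = 0.1       # constant offset of each signal

signal_a = gaussian_bump(t, center=5.0 - distance / 2, width=width, base=base)
signal_b = gaussian_bump(t, center=5.0 + distance / 2, width=width, base=base)
```

Varying `distance`, `width` and `base` one at a time gives a small family of inputs whose effect on the results can then be compared.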

Parameters

  • SLM: The Matrix with the data and its lagged versions.
  • STDM: The correlation matrix that quantifies the distance between every one of the signals and the delayed versions of them.
  • Clustering Map: Shows which cluster each of the signals is assigned to.

Python Tools Involved

GitHub

  • Previously we used GitHub in a monolithic way, with no branches.
  • Now we use branches to separate feature development from the main body of work. This allows us to experiment freely, without the constant fear of breaking something lurking in the background.

Python Tools Involved

Matplotlib

  • Using matplotlib in an object-oriented way we can separate the figure from the axes.
  • By passing an axes object to a function we can separate the creation of a plot, in all its glory, from its position in the figure.
  • This way we avoid duplicating code for every possible position of our data plots.
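
A minimal sketch of this pattern follows. The function name `plot_matrix`, the random data and the two-panel layout are assumptions chosen for illustration; only the idea of handing the axes to the plotting function comes from the points above.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_matrix(ax, matrix, title):
    """Draw `matrix` on the axes it receives; it knows nothing about the figure."""
    image = ax.imshow(matrix, aspect="auto", interpolation="nearest")
    ax.set_title(title)
    return image


# Random placeholder data, only to have something to draw.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 50))

# The layout is decided here, in one place, and not inside the plot function.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
plot_matrix(ax_left, data, "Data matrix")
plot_matrix(ax_right, np.corrcoef(data), "Correlation matrix")
```

Moving a panel, or reusing it in a larger grid, then only means passing a different axes object; `plot_matrix` itself stays untouched.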

Python Tools Involved

IPython Widgets

  • For calculation- and memory-intensive computations it is not feasible to run a routine many times and extract the results afterwards. Many things can go wrong.
  • Decouple results generation from results exploration.
  • GUIs usually have a steep learning curve and require some time investment.
  • Luckily, the IPython widgets combined with the IPython notebook cover most scientific use cases and are readily available without too much configuration, boilerplate or dependency requirements.
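
The decoupling can be sketched as follows. The example assumes the `ipywidgets` package, where the notebook widgets live in current releases, and it replaces the expensive runs with a cheap placeholder computation; the names `results`, `show_result` and `explore` are hypothetical.

```python
import numpy as np

# Results generation: done once, ahead of time. A cheap Gaussian stands in
# here for the output of the expensive runs (placeholder data).
widths = [0.25, 0.5, 1.0, 2.0]
t = np.linspace(-5.0, 5.0, 201)
results = {w: np.exp(-0.5 * (t / w) ** 2) for w in widths}


def show_result(width):
    """Results exploration: look up a stored result, compute nothing new."""
    signal = results[width]
    print("width = {}: peak = {:.2f}, mean = {:.3f}".format(
        width, signal.max(), signal.mean()))


def explore():
    """Attach a dropdown menu to `show_result`; meant to run in a notebook."""
    from ipywidgets import interact
    return interact(show_result, width=widths)
```

Calling `explore()` in a notebook cell shows a dropdown with the stored parameter values, and each selection only triggers a lookup and a redraw.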

Discussion

  • We have built a routine to systematically study how variations in the input affect our results.
  • We did this by leveraging the facilities provided by scientific Python.
  • We have also created an intermediate step where we store the results in HDF format. We achieve this by using h5py, which allows us to merge the storage with our framework seamlessly.
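
The intermediate storage step might look like the sketch below. The file name, the group naming scheme (`n_lags_<value>`), the dataset name `stdm` and the random arrays are all assumptions for illustration and do not describe the actual layout used with Nexa.

```python
import os
import tempfile

import h5py
import numpy as np

rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "results.hdf5")

# Generation step: one group per parameter value, with the parameter itself
# recorded as an attribute next to the data it produced.
with h5py.File(path, "w") as f:
    for n_lags in (2, 4, 8):
        group = f.create_group("n_lags_%d" % n_lags)
        group.attrs["n_lags"] = n_lags
        # Random placeholder standing in for a computed matrix.
        group.create_dataset("stdm", data=rng.normal(size=(n_lags, n_lags)))

# Exploration step: open the file later and read back only what is needed.
with h5py.File(path, "r") as f:
    stored = sorted(f.keys())
    stdm_4 = f["n_lags_4/stdm"][...]
    lags_4 = int(f["n_lags_4"].attrs["n_lags"])
```

Because each dataset is read on demand, the exploration step does not need to hold every result in memory at once.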
