Dionysus and Topological Data in Python

A basic, fast-paced, and hopefully understandable introduction to the ideas and applications of topological data analysis (TDA), using Dionysus in Python.


The main points of topological data analysis, extremely informally:

  • Topology studies classes of surfaces that can be continuously deformed into each other.
  • A surface is treated as infinitely stretchy and compressible, but ripping it is not allowed.
  • Topological data analysis is a discretization of ideas from topology.
  • It provides access to quantities that are invariant under deformation (and more).
  • Data has shape, and shape has meaning.
  • High-dimensional (>3) spaces are difficult to visualize and understand directly.

Famously, "the coffee cup is topologically equivalent to a donut".

Rescaling the features of a data set can be thought of as one such deformation.

In [38]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Local companion package
from topology.data import coffee_mug
from topology.data import pail
from topology.plotting import plot_mug_3D
from topology.plotting import plot_pail_3D
from topology.plotting import plot_circle_2D

%matplotlib notebook

In [1]:
from IPython.display import Image
from IPython.display import display

Dionysus is a package for analyzing the topology (think holes, circles, handles, and their higher-dimensional analogues) of data.

  • PRO: It's one of the few TDA options out there.
  • PRO: Accessible through Python bindings.
  • PRO: Provides access to quite a few features.
  • CON: Not very well documented.
  • CON: Not completely accessible through Python.
  • CON: Code is difficult to read and navigate (especially the C++ code, for non-specialists).

It uses "persistent homology":

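  • Connect points that lie within a given length scale of each other, and repeat for every scale in a range of scales.
  • Persistence: a feature that survives across many length scales is considered more important.
  • The structure is built up in stages: add points, then lines, then eventually circles, ...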
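
To make the idea of persistence concrete, here is a minimal sketch of the simplest case: tracking connected components as the length scale grows. It uses only NumPy and a union-find structure, not Dionysus, and the function name `h0_persistence` and the sample `cloud` are made up for illustration. Every point is "born" at scale 0, and a component "dies" at the length of the edge that merges it into another one.

```python
import numpy as np


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (dimension 0).

    Edges are added in order of increasing length. Whenever an edge joins
    two separate components, one of them dies at that length. The last
    surviving component never dies (death = inf).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # All pairwise distances: every candidate edge and its length scale.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    edges = sorted((dists[i, j], i, j)
                   for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))  # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j          # merge the two components
            pairs.append((0.0, float(length)))
    pairs.append((0.0, float("inf")))        # the component that survives
    return pairs


# Two tight clusters that are far apart (illustrative data only).
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
for birth, death in h0_persistence(cloud):
    print("born %.2f, dies %.2f" % (birth, death))
```

Most components die almost immediately, at the within-cluster scale. One component survives until the two clusters finally merge at a much larger scale, and that long-lived feature is the one persistence flags as important.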

By Example

In [39]:

<matplotlib.axes._subplots.AxesSubplot at 0x125101850>

In [35]:

<matplotlib.axes._subplots.Axes3DSubplot at 0x1248f5bd0>

Dimensionality reduction

Many dimensionality reduction methods will obscure the topological nature of your data. Here is an example where PCA does a good job of keeping some of the character of a coffee mug, but does poorly with a bucket.
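
As a smaller illustration of the same effect (this is not the mug or bucket demo, just a hypothetical ellipse), the sketch below projects a closed loop onto its first principal component. PCA is computed directly with NumPy's SVD. Two points on opposite sides of the loop end up at essentially the same coordinate, so the hole in the middle is lost.

```python
import numpy as np

# A closed loop: an ellipse with semi-axes 3 and 1 (illustrative data).
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
loop = np.column_stack([3.0 * np.cos(theta), np.sin(theta)])

# PCA via SVD of the centered data; rows of `components` are the
# principal directions, ordered by decreasing variance.
centered = loop - loop.mean(axis=0)
_, _, components = np.linalg.svd(centered, full_matrices=False)
projected = centered.dot(components[0])  # keep only the first component

top = np.argmax(loop[:, 1])     # point at the top of the ellipse
bottom = np.argmin(loop[:, 1])  # point at the bottom of the ellipse

gap_before = np.linalg.norm(loop[top] - loop[bottom])
gap_after = abs(projected[top] - projected[bottom])
print("distance before PCA: %.3f" % gap_before)
print("distance after PCA:  %.3f" % gap_after)
```

The projection keeps the direction of greatest variance, which is exactly the direction that says nothing about the loop being open in the middle.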

In [8]:
%run pca_demo.py

In [1]:
%run d_explore.py

/Users/dylan/Documents/programming/me/python/tda/Dionysus/build/bindings/python/dionysus/__init__.py:1: RuntimeWarning: to-Python converter for boost::shared_ptr<PersistenceDiagram<boost::python::api::object> > already registered; second conversion method ignored.
  from    _dionysus   import *
Distances Pairwise: time 3.19480895996e-05s
Distances Explicit: time 0.00259494781494s
Rips generate: time 0.0278868675232s
Filtration sort: time 0.148832082748s
Static persistence pairing simplices: time 0.00386214256287s
Dynamic persistence pairing simplices: time 0.0202660560608s
Simplex mapping: time 5.00679016113e-06s
[<31>, <0>]
[<32>, <0>]
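
The timing lines above name the stages of the pipeline: distances, Rips generation, filtration sorting, and persistence pairing. The following is a rough NumPy sketch of what the "Rips generate" and "Filtration sort" stages compute conceptually. It is not the code in `d_explore.py` and does not use the Dionysus API; the function `rips_filtration` and the `square` data are assumptions made for illustration, and only simplices up to triangles are built.

```python
from itertools import combinations

import numpy as np


def rips_filtration(points, max_scale):
    """List (scale, simplex) pairs for vertices, edges and triangles.

    A simplex enters the filtration at the length of its longest edge;
    vertices enter at scale 0. Simplices above max_scale are dropped.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    filtration = []
    for dim in (0, 1, 2):
        for simplex in combinations(range(n), dim + 1):
            edge_lengths = [dists[i, j] for i, j in combinations(simplex, 2)]
            scale = max(edge_lengths or [0.0])
            if scale <= max_scale:
                filtration.append((float(scale), simplex))

    # Sort by appearance scale, then by dimension, so that the faces of a
    # simplex always come before the simplex itself.
    filtration.sort(key=lambda item: (item[0], len(item[1])))
    return filtration


# Four corners of a unit square (illustrative data only).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for scale, simplex in rips_filtration(square, max_scale=1.2):
    print("%.3f %s" % (scale, simplex))
```

With `max_scale=1.2` the diagonals of the square are too long to be included, so the four sides form an unfilled loop. Raising `max_scale` past the diagonal length adds the diagonals and the triangles that fill the loop in.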
