Dionysus and Topological Data in Python

A basic, fast-paced, and hopefully understandable introduction to ideas and applications of topological data analysis (TDA) using Dionysus in Python.

Introduction

The main points of topological data analysis, extremely informally:

Topology studies classes of shapes that can be continuously deformed into each other.
A surface is infinitely stretchy and compressible, but ripping it is not allowed.
Topological data analysis is a discretization of ideas from topology.
It provides access to invariants under deformation (and more).
Data has shape, and shape has meaning.
High-dimensional (>3) spaces are difficult to understand.

Famously, "the coffee cup is topologically equivalent to a donut".

Think of rescaling features as a deformation.



In [38]:

    
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection on older matplotlib

# Local companion package
from topology.data import coffee_mug
from topology.data import pail
from topology.plotting import plot_mug_3D
from topology.plotting import plot_pail_3D
from topology.plotting import plot_circle_2D

%matplotlib notebook



In [1]:

    
from IPython.display import Image
from IPython.display import display
# Source: https://en.wikipedia.org/wiki/File:Mug_and_Torus_morph.gif
display(Image(url="images/Mug_and_Torus_morph.gif"))

Dionysus is a package for analyzing the topology (think holes, circles, handles, and the higher dimensional analogues) of data.

PRO: It's one of the few TDA options out there.
PRO: Accessible through Python bindings.
PRO: Provides access to quite a few features.
CON: Not very well documented.
CON: Not completely accessible through Python.
CON: Code is difficult to read and navigate (especially the C++ code, for non-specialists).

It uses "persistent homology":

Connect points at each length scale, over a range of scales.
Persistence: a feature that survives over many length scales is more important.
Add points, then edges, then triangles, and so on; loops and higher-dimensional holes appear and disappear along the way.
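To make the idea of persistence concrete, here is a minimal pure-Python sketch of the simplest case: tracking connected components (0-dimensional features) as the length scale grows. It does not use Dionysus, and the function name `h0_persistence` and the sample `cloud` are made up for this illustration. Every point is born at scale 0, and a component "dies" when an edge first merges it into another one.

```python
import itertools
import math


def distance(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def h0_persistence(points):
    """Return (birth, death) pairs for the connected components of a point cloud.

    Edges are added in order of increasing length. Whenever an edge joins two
    separate components, one of them dies at that length. The last surviving
    component never dies, so its death is recorded as infinity.
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the component representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (distance(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i       # merge the two components
            pairs.append((0.0, length))   # one component dies at this scale
    pairs.append((0.0, math.inf))         # the component that survives forever
    return pairs


# Hypothetical data: two tight clusters that are far apart.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
for birth, death in h0_persistence(cloud):
    print(birth, death)
```

For this cloud, three components die at a scale of about 0.1 (noise within each cluster), while one survives until roughly 7, where the two clusters finally merge. That long-lived feature is the "important" one: the data has two clusters.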

By Example



In [39]:

    
plot_circle_2D()

    Out[39]:





<matplotlib.axes._subplots.AxesSubplot at 0x125101850>






In [35]:

    
plot_mug_3D()
plot_pail_3D()

    Out[35]:





<matplotlib.axes._subplots.Axes3DSubplot at 0x1248f5bd0>

Dimensionality reduction

Many dimensionality reduction methods can obscure the topological structure of your data. Here is an example where PCA preserves some of the character of a coffee mug but does poorly with a bucket.
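As a rough illustration of why a bucket is hard for PCA, the sketch below samples the wall of an open cylinder and projects it onto its first two principal components. This is not the `pca_demo.py` script run in the next cell; the data, variable names, and the use of NumPy's SVD are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical "bucket wall": a cylinder of radius 1 and height 6.
theta = rng.uniform(0.0, 2.0 * np.pi, n)
height = rng.uniform(-3.0, 3.0, n)
wall = np.column_stack([np.cos(theta), np.sin(theta), height])

# PCA via SVD of the centered data: rows of vt are the principal directions,
# ordered from largest to smallest variance.
centered = wall - wall.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T

explained = singular_values ** 2 / np.sum(singular_values ** 2)
print("explained variance ratio:", np.round(explained, 3))
print("first principal direction:", np.round(vt[0], 3))

# In 3D every point sits at distance 1 from the cylinder axis (a hole runs
# through the middle). After projection the second coordinate takes values
# all the way through 0, so the hole is filled in.
print("min |second coordinate|:", np.min(np.abs(projected[:, 1])))
```

The first principal direction lines up with the cylinder's axis because that is where most of the variance lies, and the projection flattens the circular cross-section into a line segment. The 2D picture is a filled rectangle, so the loop around the bucket is no longer visible.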



In [8]:

    
%run pca_demo.py



In [1]:

    
%run d_explore.py

Distances Pairwise: time 3.19480895996e-05s
Distances Explicit: time 0.00259494781494s
Rips generate: time 0.0278868675232s
Filtration sort: time 0.148832082748s
Static persistence pairing simplices: time 0.00386214256287s
Dynamic persistence pairing simplices: time 0.0202660560608s
Simplex mapping: time 5.00679016113e-06s
[<31>, <0>]
[<32>, <0>]
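
The stage names in the timing output above (distances, Rips generation, filtration sort) follow the usual persistent homology pipeline. The sketch below shows what the Rips and sorting stages compute, in plain Python. It is not Dionysus's implementation or API; `rips_filtration` and the `square` data are hypothetical names chosen for this illustration.

```python
import itertools
import math


def rips_filtration(points, max_scale, max_dim=2):
    """Return sorted (scale, simplex) pairs of a Vietoris-Rips filtration.

    A simplex is a tuple of point indices. It enters the filtration at the
    length of its longest edge (vertices enter at scale 0). Simplices whose
    longest edge exceeds max_scale are left out.
    """
    def distance(i, j):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))

    filtration = []
    for size in range(1, max_dim + 2):  # size 1 = vertices, 2 = edges, 3 = triangles
        for simplex in itertools.combinations(range(len(points)), size):
            scale = max(
                (distance(i, j) for i, j in itertools.combinations(simplex, 2)),
                default=0.0,
            )
            if scale <= max_scale:
                filtration.append((scale, simplex))

    # Sort by scale, then by dimension, so every face comes before the
    # simplices it belongs to.
    filtration.sort(key=lambda item: (item[0], len(item[1])))
    return filtration


# Hypothetical data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for scale, simplex in rips_filtration(square, max_scale=1.0):
    print(round(scale, 3), simplex)
```

At `max_scale=1.0` the square contributes four vertices and its four sides, which together form a loop. Raising `max_scale` to 1.5 also admits the two diagonals and the four triangles, which fill the loop in. A persistence computation pairs the simplex that creates such a loop with the one that destroys it.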


