Persistence Explorer Tutorial

This Jupyter notebook shows you how to get up and running with the tda-d3-explorer tool. If you haven't installed the tool yet, please follow the instructions in the README of the tda-d3-explorer repository.


In [8]:
import os  # used below to list files; imported explicitly rather than relying on the star import
from PersistenceExplorer import *

Computing persistence of images

First we need to compute the persistence diagrams for a set of grayscale bitmap images. We have provided a set of 20 test images in the folder ../data/bmp, as we see:


In [ ]:
[ filename for filename in os.listdir('../data/bmp') if filename.endswith('.bmp') ]

Now that we know we have a set of images, we want to compute the corresponding persistence diagrams. The following command will compute both the sublevel and superlevel persistence of each image and save the results as .csv files. Each image processed will result in a .csv file with the same basename as the original input image, e.g. 00001.bmp results in files pd_sub/00001.csv and pd_sup/00001.csv.

NOTE: The following command will spawn eight processes, each utilizing approximately 100MB of memory.


In [ ]:
ProcessImageFolderWithPHAT('../data/bmp/')

We verify the sublevel results are indeed stored in a newly created subfolder pd_sub:


In [ ]:
[ filename for filename in os.listdir('../data/bmp/pd_sub') if filename.endswith('.csv') ]

Similarly the superlevel persistence results are stored in the subdirectory pd_sup. We can visually inspect the results of such a file:


In [ ]:
with open('../data/bmp/pd_sub/00001.csv', 'r') as f:
    csv_data = f.read()
print(csv_data)
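Checking the folder listings by eye works for 20 images but gets tedious for larger sets. The following is a minimal sketch of an automated check, assuming only the naming convention described above (00001.bmp yields pd_sub/00001.csv and pd_sup/00001.csv). The helper name missing_persistence_outputs is hypothetical and is not part of the PersistenceExplorer package.

```python
import os

def missing_persistence_outputs(image_dir):
    """Return expected .csv paths (relative to image_dir) that do not exist.

    Hypothetical helper: it assumes each NAME.bmp should have produced
    pd_sub/NAME.csv and pd_sup/NAME.csv inside image_dir.
    """
    missing = []
    for filename in sorted(os.listdir(image_dir)):
        basename, extension = os.path.splitext(filename)
        if extension != '.bmp':
            continue  # skip subfolders and non-image files
        for subfolder in ('pd_sub', 'pd_sup'):
            relative_path = os.path.join(subfolder, basename + '.csv')
            if not os.path.isfile(os.path.join(image_dir, relative_path)):
                missing.append(relative_path)
    return missing
```

If processing completed for every image, missing_persistence_outputs('../data/bmp') should return an empty list.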

Finer control of persistence calculations

If finer control is required over which filtration to use for the persistence computation and where to place the results, the user can directly call the following function (already available in the PersistenceExplorer package imported above):

import subprocess

def ProcessImageListWithPHAT(list_of_image_filenames, list_of_output_filenames, filtration_type, cores=8):
    """
    Iterate through images, compute persistence results, and store results.
      list_of_image_filenames: a list of image files
      list_of_output_filenames: a list of files in which to save the corresponding persistence results
      filtration_type: either "sub" or "super", selecting the sublevel or superlevel set filtration
      cores: the number of cores to use for the parallel computation of persistence diagrams
    """
    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i:i + n]
    cohorts_of_image_filenames = chunks(list_of_image_filenames, cores)
    cohorts_of_output_filenames = chunks(list_of_output_filenames, cores)
    # Run commands in parallel, one cohort of at most `cores` processes at a time
    for images, outputs in zip(cohorts_of_image_filenames, cohorts_of_output_filenames):
        # Launch one ImagePersistence process per (input, output) pair
        processes = [subprocess.Popen(["ImagePersistence", infile, outfile, filtration_type])
                     for infile, outfile in zip(images, outputs)]
        # Block until every process in the cohort has finished
        exitcodes = [p.wait() for p in processes]
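Because the function takes two parallel lists, the caller has to build matching input and output filenames. One possible way to do that is sketched below. The helper pair_images_with_outputs and the output folder name used in the comments are hypothetical, chosen only for illustration.

```python
import os

def pair_images_with_outputs(image_dir, output_dir):
    """Pair each .bmp in image_dir with a .csv of the same basename in output_dir.

    Hypothetical helper: returns two lists of equal length, in the same order,
    suitable as the first two arguments of ProcessImageListWithPHAT.
    """
    image_names = sorted(name for name in os.listdir(image_dir) if name.endswith('.bmp'))
    image_paths = [os.path.join(image_dir, name) for name in image_names]
    output_paths = [os.path.join(output_dir, os.path.splitext(name)[0] + '.csv')
                    for name in image_names]
    return image_paths, output_paths

# Example call (requires the PersistenceExplorer package and the ImagePersistence
# executable). The folder '../data/custom_pd_super' is a made-up name, and creating
# it up front is a precaution, since the source does not say whether the tool
# creates missing folders:
#
#   images, outputs = pair_images_with_outputs('../data/bmp', '../data/custom_pd_super')
#   os.makedirs('../data/custom_pd_super', exist_ok=True)
#   ProcessImageListWithPHAT(images, outputs, 'super', cores=4)
```

Sorting the filenames keeps the two lists aligned and makes the processing order reproducible. Lowering cores reduces how many ImagePersistence processes run at once.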

PersistenceExplorer

Reading the results of the persistence calculation as a long table of numbers is not in itself very illuminating. Instead, we can use a tool which allows us to visualize and interact with the results of the persistence calculation as a persistence diagram. To use the tool to explore our data, we first:

  • load a sequence of images and their associated persistence results
  • choose a range of frames of interest
  • provide the actual size of the image that the persistence diagram was generated from
  • choose a maximum height/width for the displayed image (in pixels)
  • choose a maximum size for the persistence diagram (in pixels)

The following commands will initialize the variables needed to provide the parameters for the tool. See the note provided at the end of this tutorial to understand the conventions used to populate the paths to the image and persistence diagram files.


In [9]:
imagefiles = [ '/files/data/bmp/%05d.bmp' % i for i in range(1,21)]
pdfiles = [ '/files/data/bmp/pd_sub/%05d.csv' % i for i in range(1,21)]
frames = range(0, len(imagefiles))
imagesize = [421, 421]
max_image_display_size = 400
persistence_diagram_display_size = 400
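The image size above is typed in by hand. For other datasets it can be read from the file instead. The sketch below assumes the images use the common BMP layout in which width and height are stored as little-endian 32-bit integers at byte offsets 18 and 22 (the BITMAPINFOHEADER format). The helper name read_bmp_size is hypothetical, and treating imagesize as [width, height] is also an assumption, since the sample images are square.

```python
import struct

def read_bmp_size(path):
    """Return [width, height] in pixels, read from a BMP file header.

    Hypothetical helper: assumes a BITMAPINFOHEADER-style header.
    """
    with open(path, 'rb') as f:
        header = f.read(26)
    if len(header) < 26 or header[:2] != b'BM':
        raise ValueError('not a BMP file: %s' % path)
    # Width and height are little-endian signed 32-bit integers at offsets 18 and 22.
    width, height = struct.unpack('<ii', header[18:26])
    # A negative height marks a top-down bitmap; the pixel count is its absolute value.
    return [width, abs(height)]
```

This runs server-side, so it takes an ordinary filesystem path such as '../data/bmp/00001.bmp' rather than a /files path.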

After these general parameters have been set, we can call the persistence explorer to interact with our data. We still need to choose a dimension of interest (i.e. $H_0$ or $H_1$ persistence features) and then pass this choice, together with the above parameters, to the tool.

The tool currently provides the following features:

  • Selecting a region in the persistence diagram with a lasso tool will display annotated features overlaid on the image, and run an animation which shows how they evolve with time (if you have provided more than one frame). The annotated features come from the critical cell pairings of discrete Morse theory. Open circles indicate birth critical cells (e.g. local minima for sublevel set dimension 0 persistence), and closed circles indicate death critical cells (e.g. saddle points for sublevel set dimension 0 persistence). Line segments are drawn between the critical cells to show the pairings that correspond to the encircled persistence points.

  • Selecting a rectangle in the image will highlight the associated persistence points in the persistence diagram. Any persistence point that has at least one critical cell from its underlying pairing (either birth or death) inside the selected rectangle will be highlighted on the persistence diagram.
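The rectangle-selection rule in the second feature can be written as a short predicate. This is only an illustrative Python sketch of the rule as described, not the tool's actual JavaScript implementation. The function name and the representation of a pairing as a (birth_cell, death_cell) tuple of (x, y) pixel coordinates are assumptions.

```python
def points_touching_rectangle(pairings, x_min, y_min, x_max, y_max):
    """Return indices of pairings with a birth or death cell inside the rectangle.

    Hypothetical data layout: each pairing is ((birth_x, birth_y), (death_x, death_y)).
    """
    def inside(cell):
        x, y = cell
        return x_min <= x <= x_max and y_min <= y <= y_max

    return [index for index, (birth_cell, death_cell) in enumerate(pairings)
            if inside(birth_cell) or inside(death_cell)]

# Three made-up pairings; the rectangle covers the top-left 20x20 pixels.
example_pairings = [((10, 10), (50, 50)),      # birth cell inside
                    ((200, 200), (210, 220)),  # neither cell inside
                    ((300, 5), (12, 14))]      # death cell inside
print(points_touching_rectangle(example_pairings, 0, 0, 20, 20))  # prints [0, 2]
```

A pairing is selected as soon as one of its two cells falls in the box, which is why a highlighted persistence point can have its other critical cell far from the region you drew.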

Let's try it out! After running the following cell, you'll see the first sample image on the left and a time series of persistence diagrams overlaid as a set on the right. Gray persistence points correspond to the first frame in the dataset, and blue persistence points correspond to the last frame in the dataset. Go ahead and run the following cell now and then try out the suggested features following the output.


In [10]:
dimension_of_interest = 0
PersistenceExplorer(imagefiles, pdfiles, frames, dimension_of_interest, 
                    imagesize, max_image_display_size, persistence_diagram_display_size)


Out[10]:

Visualizing persistence generators

To explore the above sample data, try encircling some persistence points on the persistence diagram by drawing a circle around them. When you release the mouse/trackpad, the circle will complete itself (you do not need to draw the entire circle yourself) and then an animation will play. The frame number indicator will advance through the frames in the sample dataset, and the critical cell pairings underlying each encircled persistence point will be overlaid on the corresponding image. It is best to wait until the animation has finished before circling additional points on the persistence diagram.

Reverse search feature

Next, try out the reverse-search feature by clicking and dragging a box around a feature in the image itself. Upon releasing the mouse/trackpad, any persistence points that have a critical cell in that box will be highlighted in the persistence diagram, with yellow corresponding to the first frame in the dataset and red corresponding to the last frame. Now try encircling a collection of highlighted persistence points and see how they correspond to the region you selected.

Try this again using the dimension 1 sublevel set persistence diagrams.


In [ ]:
dimension_of_interest = 1
PersistenceExplorer(imagefiles, pdfiles, frames, dimension_of_interest,
                    imagesize, max_image_display_size, persistence_diagram_display_size)

File path gotcha:

Note that above, the file paths used for ProcessImageFolderWithPHAT and for PersistenceExplorer are not the same.

ProcessImageFolderWithPHAT executes server-side code, which treats / as the root of the filesystem as recognized by the OS.

PersistenceExplorer, on the other hand, is JavaScript that executes client-side. The client can only see the part of the filesystem that Jupyter serves over HTTP. The examples above assume Jupyter Notebook was launched from the tda-d3-explorer directory. Jupyter Notebook treats this launch directory as its root, which it names /files rather than /. Thus we prefix all our paths with /files, and we can only access files located beneath the directory from which Jupyter Notebook was launched. For example, /path/to/tda-d3-explorer/data/bmp/00001.bmp becomes /files/data/bmp/00001.bmp because Jupyter Notebook was launched in /path/to/tda-d3-explorer.
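The path translation described above can be done mechanically. The sketch below assumes POSIX-style absolute paths and the /files prefix described in this note. The helper name to_client_path is hypothetical and is not part of the PersistenceExplorer package.

```python
import posixpath

def to_client_path(server_path, notebook_root):
    """Convert an absolute server-side path into the /files path seen by the client.

    Hypothetical helper: notebook_root is the directory from which
    Jupyter Notebook was launched.
    """
    relative = posixpath.relpath(server_path, notebook_root)
    # Files outside the launch directory are not served to the client.
    if relative == '..' or relative.startswith('../'):
        raise ValueError('%s is not under %s' % (server_path, notebook_root))
    return posixpath.join('/files', relative)

print(to_client_path('/path/to/tda-d3-explorer/data/bmp/00001.bmp',
                     '/path/to/tda-d3-explorer'))  # prints /files/data/bmp/00001.bmp
```

Raising an error for paths outside the launch directory surfaces the problem on the Python side, rather than as an image that silently fails to load in the browser.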

The best way to resolve this confusion is not yet clear.