In [8]:
import os
from PersistenceExplorer import *
In [ ]:
[ filename for filename in os.listdir('../data/bmp') if filename.endswith('.bmp') ]
Now that we know we have a set of images, we want to compute the corresponding persistence diagrams. The following command will compute both the sublevel and superlevel persistence of each image and save the results as .csv files. Each processed image results in two .csv files with the same basename as the original input image; e.g. 00001.bmp results in the files pd_sub/00001.csv and pd_sup/00001.csv.
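The naming convention can be expressed as a small path computation. The sketch below is a hypothetical helper, persistence_output_paths, which is not part of the PersistenceExplorer package. It only reproduces the mapping described above, from an image file to its two CSV files. The printed paths assume a POSIX system, where os.path.join uses forward slashes.

```python
import os

def persistence_output_paths(image_path):
    """Hypothetical helper: map an image path to its sublevel/superlevel CSV paths.

    Mirrors the convention described above:
    <dir>/00001.bmp -> <dir>/pd_sub/00001.csv and <dir>/pd_sup/00001.csv
    """
    folder, filename = os.path.split(image_path)
    basename, _ = os.path.splitext(filename)  # drop the .bmp extension
    return (os.path.join(folder, 'pd_sub', basename + '.csv'),
            os.path.join(folder, 'pd_sup', basename + '.csv'))

sub_csv, sup_csv = persistence_output_paths('../data/bmp/00001.bmp')
print(sub_csv)  # ../data/bmp/pd_sub/00001.csv
print(sup_csv)  # ../data/bmp/pd_sup/00001.csv
```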
NOTE: The following command will spawn eight processes, each utilizing approximately 100MB of memory.
In [ ]:
ProcessImageFolderWithPHAT('../data/bmp/')
We verify that the sublevel results are stored in a newly created subfolder, pd_sub:
In [ ]:
[ filename for filename in os.listdir('../data/bmp/pd_sub') if filename.endswith('.csv') ]
Similarly, the superlevel persistence results are stored in the subdirectory pd_sup. We can inspect one of these files by printing its contents:
In [ ]:
with open('../data/bmp/pd_sub/00001.csv', 'r') as f:
    csv_data = f.read()
print(csv_data)
If finer control is required over which filtration to use for the persistence computation and where to place the results, the user can call the following function directly (it is already available in the PersistenceExplorer package imported above):
import subprocess

def ProcessImageListWithPHAT( list_of_image_filenames, list_of_output_filenames, filtration_type, cores=8 ):
    """
    Iterate through images, compute persistence results, and store results.
    list_of_image_filenames: a list of image files
    list_of_output_filenames: a list of files to save corresponding persistence results in
    filtration_type: either "sub" or "super" to indicate to obtain persistence results for either
      sublevel or superlevel set filtrations
    cores: the number of cores to use for the parallel computation of persistence diagrams.
    """
    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i:i + n]
    cohorts_of_image_filenames = chunks(list_of_image_filenames, cores)
    cohorts_of_output_filenames = chunks(list_of_output_filenames, cores)
    # Run commands in parallel, at most `cores` processes per batch
    for images, outputs in zip(cohorts_of_image_filenames, cohorts_of_output_filenames):
        processes = [subprocess.Popen(["ImagePersistence", infile, outfile, filtration_type]) for infile, outfile in zip(images, outputs) ]
        # Block until every process in this batch has finished
        exitcodes = [p.wait() for p in processes]
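The loop above runs the images in fixed batches of cores processes and waits for a whole batch to finish before starting the next one, so a single slow image can leave the other cores idle. A minimal alternative sketch, not the package's code, swaps in a worker pool from the standard library (concurrent.futures.ThreadPoolExecutor). With a pool, a new process starts as soon as any running one finishes, and at most cores processes run at a time. The helper name run_commands_with_pool is hypothetical. Stand-in Python commands replace ImagePersistence so that the example runs without it.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_commands_with_pool(commands, cores=8):
    """Hypothetical helper: run each argv list with at most `cores` running at once.

    Returns the exit codes in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # subprocess.run blocks its worker thread until the child process exits,
        # so each thread supervises exactly one external process at a time.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))

# Stand-in commands so the sketch runs anywhere; for persistence one would build
# ["ImagePersistence", infile, outfile, filtration_type] for each image instead.
demo = [[sys.executable, "-c", "import sys; sys.exit(%d)" % (i % 2)] for i in range(5)]
print(run_commands_with_pool(demo, cores=2))  # [0, 1, 0, 1, 0]
```

Threads are sufficient here because each worker spends its time waiting on an external process rather than running Python code.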
Reading the results of the persistence calculation as a long table of numbers is not very illuminating in itself. Instead, we can use a tool that lets us visualize and interact with the results of the persistence calculation as a persistence diagram. To explore our data with the tool, we first run the following commands, which initialize the variables that provide the tool's parameters. See the note at the end of this tutorial for the conventions used to populate the paths to the image and persistence diagram files.
In [9]:
imagefiles = [ '/files/data/bmp/%05d.bmp' % i for i in range(1,21)]
pdfiles = [ '/files/data/bmp/pd_sub/%05d.csv' % i for i in range(1,21)]
frames = range(0, len(imagefiles))
imagesize = [421, 421]
max_image_display_size = 400
persistence_diagram_display_size = 400
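The lists above hard-code twenty frames. If the number of images changes, the lists could instead be derived from the folder listing. The helper frame_files below is hypothetical and assumes the zero-padded naming used in this dataset, so that sorting basenames as strings also sorts them numerically. It takes the filenames as an argument, which keeps the server-side listing separate from the client-side /files paths it returns.

```python
import os

def frame_files(filenames, served_folder, subfolder='pd_sub'):
    """Hypothetical helper: build matching image and persistence-diagram paths.

    filenames:     names in the image folder, e.g. os.listdir('../data/bmp')
    served_folder: the same folder as Jupyter serves it, e.g. '/files/data/bmp'
    """
    # Zero-padded names (00001, 00002, ...) sort correctly as strings.
    basenames = sorted(os.path.splitext(name)[0]
                       for name in filenames if name.endswith('.bmp'))
    imagefiles = ['%s/%s.bmp' % (served_folder, b) for b in basenames]
    pdfiles = ['%s/%s/%s.csv' % (served_folder, subfolder, b) for b in basenames]
    return imagefiles, pdfiles

# Usage with this tutorial's layout:
# imagefiles, pdfiles = frame_files(os.listdir('../data/bmp'), '/files/data/bmp')
# frames = range(0, len(imagefiles))
```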
After these general environment parameters have been set up, we can call the persistence explorer to interact with our data. We still need to choose a dimension of interest (i.e. $H_0$ or $H_1$ persistence features) and pass this choice, together with the above parameters, to the tool.
The tool currently provides the following features:
Selecting a region in the persistence diagram with a lasso tool will display annotated features overlaid on the image, and run an animation which shows how they evolve with time (if you have provided more than one frame). The annotated features come from the critical cell pairings of discrete Morse theory. Open circles indicate birth critical cells (e.g. local minima for sublevel set dimension 0 persistence), and closed circles indicate death critical cells (e.g. saddle points for sublevel set dimension 0 persistence). Line segments are drawn between the critical cells to show the pairings that correspond to the encircled persistence points.
Selecting a rectangle in the image will highlight the associated persistence points in the persistence diagram. Any persistence point that has at least one critical cell from its underlying pairing (either birth or death) will be highlighted on the persistence diagram.
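To make the idea of a critical cell pairing concrete, here is a toy one-dimensional analogue of sublevel set dimension 0 persistence. It is an illustration only, not the PHAT-based computation used by ProcessImageFolderWithPHAT. Cells enter the sublevel set in order of increasing value, and a new component is born at each local minimum. When two components meet, the one with the higher minimum dies (the elder rule). In one dimension the merging cell is a local maximum, which plays the role the saddle plays in a 2D image. The function name sublevel_persistence_0d is hypothetical.

```python
def sublevel_persistence_0d(values):
    """Toy 0-dimensional sublevel set persistence of a 1D signal (union-find).

    Returns (birth_index, death_index) pairs of critical cells; the death index
    is None for the component born at the global minimum, which never dies.
    """
    if not values:
        return []
    n = len(values)
    key = lambda k: (values[k], k)  # break ties by position
    parent = [None] * n             # None: cell not yet in the sublevel set
    birth = {}                      # root index -> index of its birth minimum

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=key):
        parent[i] = i
        birth[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the higher minimum dies at i.
                young, old = (ri, rj) if key(birth[ri]) > key(birth[rj]) else (rj, ri)
                if birth[young] != i:  # i itself was not a new minimum
                    pairs.append((birth[young], i))
                parent[young] = old
    root = find(min(range(n), key=key))
    pairs.append((birth[root], None))
    return pairs

print(sublevel_persistence_0d([1, 3, 0, 2, 4]))  # [(0, 1), (2, None)]
```

For [1, 3, 0, 2, 4], the minimum at index 0 (value 1) is paired with the maximum at index 1 (value 3), giving the persistence point (1, 3), while the global minimum at index 2 never dies.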
Let's try it out! After running the following cell, you'll see the first sample image on the left and, on the right, the time series of persistence diagrams overlaid as a set. Gray persistence points correspond to the first frame in the dataset, and blue persistence points correspond to the last frame. Run the following cell now, and then try out the features suggested below its output.
In [10]:
dimension_of_interest = 0
PersistenceExplorer(imagefiles, pdfiles, frames, dimension_of_interest,
                    imagesize, max_image_display_size, persistence_diagram_display_size)
Out[10]:
Visualizing persistence generators
To explore the above sample data, try encircling some persistence points on the persistence diagram by drawing a circle around them. When you release the mouse/trackpad, the circle will complete itself (it is not important to draw the entire circle yourself) and then an animation will play. The frame number indicator will advance through the frames in the sample dataset, and the critical cell pairings underlying each encircled persistence point will be overlaid on top of the corresponding image. It is best to wait until the animation has finished before circling additional points on the persistence diagram.
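Deciding which persistence points lie inside a closed lasso is a point-in-polygon problem. We do not know how PersistenceExplorer implements its lasso. The sketch below uses the standard even-odd (ray casting) test to show the kind of check involved. The function names are hypothetical and the (birth, death) coordinates are made up.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside the closed polygon?

    polygon is a list of (x, y) vertices; the last vertex is implicitly joined
    to the first, much as the lasso closes itself when the mouse is released.
    """
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies on the ray towards +x
                inside = not inside
    return inside

def lasso_select(points, polygon):
    """Hypothetical helper: keep the (birth, death) points inside the lasso."""
    return [p for p in points if point_in_polygon(p[0], p[1], polygon)]

diagram = [(0.1, 0.9), (0.2, 0.3), (0.5, 0.6)]
lasso = [(0.0, 0.7), (0.4, 0.7), (0.4, 1.0), (0.0, 1.0)]
print(lasso_select(diagram, lasso))  # [(0.1, 0.9)]
```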
Reverse search feature
Next, try out the reverse-search feature by clicking and dragging a box around a feature in the image itself. Upon releasing the mouse/trackpad, any persistence points that have a critical cell in that box will be highlighted in the persistence diagram, with yellow corresponding to the first frame in the dataset and red corresponding to the last frame. Now try encircling a collection of highlighted persistence points and see how they correspond to the region you selected.
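The highlighting rule stated above selects a persistence point if its birth or death critical cell lies in the box, and it can be sketched as a simple filter. The PersistencePair record and its pixel-coordinate fields are hypothetical stand-ins for however the tool stores critical cells, and the sample pairs are invented for illustration. The box bounds are sorted so that the rectangle may be dragged in any direction.

```python
from typing import NamedTuple, Optional, Tuple

class PersistencePair(NamedTuple):
    """Hypothetical record: a persistence point plus its critical cell locations."""
    birth: float
    death: float
    birth_cell: Tuple[int, int]            # (x, y) pixel of the birth cell
    death_cell: Optional[Tuple[int, int]]  # None if the feature never dies

def in_box(cell, x0, y0, x1, y1):
    return cell is not None and x0 <= cell[0] <= x1 and y0 <= cell[1] <= y1

def reverse_search(pairs, x0, y0, x1, y1):
    """Pairs with at least one critical cell (birth or death) inside the box."""
    lo_x, hi_x = sorted((x0, x1))  # allow dragging right-to-left
    lo_y, hi_y = sorted((y0, y1))  # or bottom-to-top
    return [p for p in pairs
            if in_box(p.birth_cell, lo_x, lo_y, hi_x, hi_y)
            or in_box(p.death_cell, lo_x, lo_y, hi_x, hi_y)]

pairs = [PersistencePair(0.1, 0.5, (10, 12), (40, 40)),
         PersistencePair(0.2, 0.3, (200, 210), (205, 215)),
         PersistencePair(0.0, float('inf'), (300, 5), None)]
print([p.birth for p in reverse_search(pairs, 0, 0, 50, 50)])  # [0.1]
```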
Try this again using the dimension 1 sublevel set persistence diagrams.
In [ ]:
dimension_of_interest = 1
PersistenceExplorer(imagefiles, pdfiles, frames, dimension_of_interest,
                    imagesize, max_image_display_size, persistence_diagram_display_size)
Note that above, the file paths used for ProcessImageFolderWithPHAT and for PersistenceExplorer are not the same.
ProcessImageFolderWithPHAT executes server-side code, which treats / as the root of the filesystem as recognized by the OS. PersistenceExplorer, on the other hand, is JavaScript that executes client-side, and the client can only see the part of the filesystem that Jupyter serves over HTTP. This tutorial assumes that jupyter notebook has been launched from the tda-d3-explorer directory. Jupyter Notebook treats this as the root directory, which it names /files rather than just /. Thus we prefix all our paths with /files, and we can only access files underneath the directory from which Jupyter Notebook was launched. For example, /path/to/tda-d3-explorer/data/bmp/00001.bmp becomes /files/data/bmp/00001.bmp, since Jupyter Notebook was launched in /path/to/tda-d3-explorer.
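The path translation described in this note can be sketched as a helper. to_jupyter_url is hypothetical and not part of PersistenceExplorer. It assumes only what is stated above: Jupyter serves the directory it was launched from under /files, and nothing outside that directory is reachable.

```python
import os
import posixpath

def to_jupyter_url(local_path, notebook_root):
    """Hypothetical helper: translate a server-side path into the /files/... path
    under which Jupyter Notebook serves it to the browser.

    notebook_root is the directory jupyter notebook was launched from; files
    outside it are not served, so they raise ValueError.
    """
    local_path = os.path.abspath(local_path)
    notebook_root = os.path.abspath(notebook_root)
    if os.path.commonpath([local_path, notebook_root]) != notebook_root:
        raise ValueError('%s is not underneath %s' % (local_path, notebook_root))
    relative = os.path.relpath(local_path, notebook_root)
    # URLs always use forward slashes, whatever separator the server OS uses.
    return posixpath.join('/files', *relative.split(os.sep))

print(to_jupyter_url('/path/to/tda-d3-explorer/data/bmp/00001.bmp',
                     '/path/to/tda-d3-explorer'))
# /files/data/bmp/00001.bmp
```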
It is not yet clear how best to resolve this confusion.