Overview:
This notebook describes how to merge annotations generated by tiled analysis of a whole-slide image. Since tiled analysis is carried out on small tiles, the annotations produced by image segmentation algorithms will be disjoint at the tile boundaries, prohibiting analysis of large structures that span multiple tiles.
The example presented below addresses the case where the annotations are stored in an array format that preserves the spatial organization of tiles. This scenario arises when iterating through the columns and rows of a tiled representation of a whole-slide image. Analysis of this organized array format is faster and preferred, since the interfaces where annotations need to be merged are known. In cases where the annotations to be merged do not come from tiled analysis, or where the tile results are not organized, an alternative method based on R-trees provides a slightly slower solution.
This extends some of the work described in Amgad et al., 2019:
Mohamed Amgad, Habiba Elfandy, Hagar Hussein, ..., Jonathan Beezley, Deepak R Chittajallu, David Manthey, David A Gutman, Lee A D Cooper, Structured crowdsourcing enables convolutional segmentation of histology images, Bioinformatics, 2019, btz083
This is a sample result:
Implementation summary
In the tiled array approach the tiles must be rectangular and unrotated. The algorithm used merges polygons in coordinate space so that almost-arbitrarily large structures can be handled without encountering memory issues. The algorithm works as follows:
Extract contours from the given masks using functionality from masks_to_annotations_handler.py, making sure to account for the contour offset so that all coordinates are relative to the whole-slide image frame.
Identify contours that touch tile interfaces.
Identify shared edges between tiles.
For each shared edge, find contours that neighbor each other (using bounding box location) and verify, using shapely, that they should be paired (see the sketch after this list).
Using 4-connectivity, link all pairs of contours that are to be merged.
Use morphologic processing to dilate and fill gaps in the linked pairs and then erode to generate the final merged contour.
These initial steps ensure that the number of comparisons made is much smaller than n^2. This is important because algorithm complexity plays a key role: whole-slide images may contain tens of thousands of annotated structures.
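As a rough illustration of the shapely-based pairing check mentioned above, the sketch below tests whether two contours that abut a shared tile edge belong to the same structure by buffering their polygons slightly and checking for intersection. The function name, buffer size, and toy coordinates are illustrative only and are not the library's internal implementation.
# Illustrative sketch only -- NOT Polygon_merger's internal code.
# Two contours that abut the same tile edge are only merged if their
# polygons actually touch once buffered slightly to close the one-pixel
# gap left at the tile boundary.
from shapely.geometry import Polygon

def should_pair(coords_a, coords_b, buffer_px=1):
    """Return True if two contours (lists of (x, y) vertices in the
    whole-slide frame) appear to form one structure across a tile seam."""
    poly_a = Polygon(coords_a).buffer(buffer_px)
    poly_b = Polygon(coords_b).buffer(buffer_px)
    return poly_a.intersects(poly_b)

# Two squares separated only by a one-pixel tile seam are paired ...
left_contour = [(0, 0), (255, 0), (255, 255), (0, 255)]
right_contour = [(256, 0), (511, 0), (511, 255), (256, 255)]
print(should_pair(left_contour, right_contour))  # True
# ... while a distant contour is not.
far_contour = [(600, 600), (700, 600), (700, 700), (600, 700)]
print(should_pair(left_contour, far_contour))    # False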
Where to look?
histomicstk/
|_annotations_and_masks/
|_polygon_merger.py
|_tests/
|_ test_polygon_merger.py
|_ test_annotations_to_masks_handler.py
In [1]:
import os

import girder_client
from pandas import read_csv

from histomicstk.annotations_and_masks.polygon_merger import Polygon_merger
from histomicstk.annotations_and_masks.masks_to_annotations_handler import (
    get_annotation_documents_from_contours)

CWD = os.getcwd()
In [2]:
APIURL = 'http://candygram.neurology.emory.edu:8080/api/v1/'
SAMPLE_SLIDE_ID = '5d586d76bd4404c6b1f286ae'
gc = girder_client.GirderClient(apiUrl=APIURL)
gc.authenticate(interactive=True)
# gc.authenticate(apiKey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')
# read GTCodes dataframe
PTESTS_PATH = os.path.join(CWD, '..', '..', 'tests')
GTCODE_PATH = os.path.join(PTESTS_PATH, 'test_files', 'sample_GTcodes.csv')
GTCodes_df = read_csv(GTCODE_PATH)
GTCodes_df.index = GTCodes_df.loc[:, 'group']
# This is where masks for adjacent rois are saved
MASK_LOADPATH = os.path.join(
    PTESTS_PATH, 'test_files', 'annotations_and_masks', 'polygon_merger_roi_masks')
maskpaths = [
    os.path.join(MASK_LOADPATH, j) for j in os.listdir(MASK_LOADPATH)
    if j.endswith('.png')]
In [3]:
print(Polygon_merger.__doc__)
In [4]:
print(Polygon_merger.__init__.__doc__)
In [5]:
print(Polygon_merger.run.__doc__)
This contains the ground truth codes and information dataframe. This is a dataframe that is indexed by the annotation group name and has the following columns:

- group: group name of annotation (string), e.g. "mostly_tumor"
- GT_code: int, desired ground truth code (in the mask). Pixels of this value belong to the corresponding group (class)
- color: str, rgb format, e.g. rgb(255,0,0)

NOTE: Zero pixels have special meaning and do NOT encode a specific ground truth class. Instead, they simply mean 'Outside ROI' and should be IGNORED during model training or evaluation.
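To make this structure concrete, here is a minimal hand-built example of such a dataframe. It is for illustration only: the group names and codes below are arbitrary, and the bundled sample_GTcodes.csv read above may contain additional columns beyond the three described here.
# Minimal illustrative GTcodes dataframe (not the bundled CSV)
from pandas import DataFrame

GTCodes_example = DataFrame({
    'group': ['mostly_tumor', 'mostly_stroma'],
    'GT_code': [1, 2],                          # pixel values in the mask
    'color': ['rgb(255,0,0)', 'rgb(0,255,0)'],  # display color per group
})
# index by group name, as done for the CSV version above
GTCodes_example.index = GTCodes_example.loc[:, 'group']
print(GTCodes_example)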
In [6]:
GTCodes_df.head()
Out[6]:
In [7]:
[os.path.split(j)[1] for j in maskpaths[:5]]
Out[7]:
Note that the patterns _left-123_ and _top-123_ are assumed to encode the x and y offsets of the mask at base magnification. If you prefer some other convention, you will need to manually provide the roi_offsets parameter to the method Polygon_merger.set_roi_bboxes.
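For illustration, the snippet below shows one way such offsets could be parsed out of a mask file name; the helper parse_offsets and the file name used here are made up for this example and are not part of HistomicsTK.
# Hypothetical helper -- not part of HistomicsTK.
# The integers following "_left-" and "_top-" in the mask file name give
# the x and y offsets of that mask at base magnification.
import re

def parse_offsets(maskname):
    """Extract (left, top) offsets from a mask file name."""
    left = int(re.search(r'_left-(\d+)_', maskname).group(1))
    top = int(re.search(r'_top-(\d+)_', maskname).group(1))
    return left, top

print(parse_offsets('slide-xyz_left-15232_top-25056_mask.png'))
# (15232, 25056)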
In [8]:
print(Polygon_merger.set_roi_bboxes.__doc__)
In [9]:
pm = Polygon_merger(
    maskpaths=maskpaths, GTCodes_df=GTCodes_df,
    discard_nonenclosed_background=True, verbose=1,
    monitorPrefix='test')
contours_df = pm.run()
In [10]:
contours_df.head()
Out[10]:
In [11]:
# delete existing annotations in target slide (if any)
existing_annotations = gc.get('/annotation/item/' + SAMPLE_SLIDE_ID)
for ann in existing_annotations:
    gc.delete('/annotation/%s' % ann['_id'])

# get list of annotation documents
annotation_docs = get_annotation_documents_from_contours(
    contours_df.copy(), separate_docs_by_group=True,
    docnamePrefix='test',
    verbose=False, monitorPrefix=SAMPLE_SLIDE_ID + ": annotation docs")

# post annotations to slide -- make sure it posts without errors
for annotation_doc in annotation_docs:
    resp = gc.post(
        "/annotation?itemId=" + SAMPLE_SLIDE_ID, json=annotation_doc)
Now you can go to HistomicsUI and confirm that the posted annotations make sense and correspond to tissue boundaries and expected labels.