Overview:
Most segmentation algorithms produce outputs in an image format. Visualizing these outputs in HistomicsUI requires converting mask images into an annotation document containing (x,y) coordinates in the whole-slide image coordinate frame. This notebook demonstrates the conversion in two steps:
1. Converting a mask image into contours (coordinates in the mask frame).
2. Placing the contour data into a format that follows the annotation document schema and can be pushed to DSA for visualization in HistomicsUI.
This notebook is based on work described in Amgad et al., 2019:
Mohamed Amgad, Habiba Elfandy, Hagar Hussein, ..., Jonathan Beezley, Deepak R Chittajallu, David Manthey, David A Gutman, Lee A D Cooper, Structured crowdsourcing enables convolutional segmentation of histology images, Bioinformatics, 2019, btz083.
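At a glance, the conversion boils down to two function calls. Here is a condensed sketch, assuming MASK (the labeled mask image) and GTCodes_df (the ground truth codes dataframe) are already loaded, and using placeholder offsets in annprops; both functions are imported and demonstrated step by step below:

contours_df = get_contours_from_mask(MASK=MASK, GTCodes_df=GTCodes_df)
annotation_docs = get_annotation_documents_from_contours(
    contours_df, annprops={
        'X_OFFSET': 0, 'Y_OFFSET': 0, 'opacity': 0.2, 'lineWidth': 4.0})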
Where to look?
|_ histomicstk/
   |_ annotations_and_masks/
   |  |_ masks_to_annotations_handler.py
   |_ tests/
      |_ test_masks_to_annotations_handler.py
In [1]:
import os
CWD = os.getcwd()
import girder_client
from pandas import read_csv
from imageio import imread
from histomicstk.annotations_and_masks.masks_to_annotations_handler import (
    get_contours_from_mask,
    get_single_annotation_document_from_contours,
    get_annotation_documents_from_contours)
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = 7, 7
In [2]:
# APIURL = 'http://demo.kitware.com/histomicstk/api/v1/'
# SAMPLE_SLIDE_ID = '5bbdee92e629140048d01b5d'
APIURL = 'http://candygram.neurology.emory.edu:8080/api/v1/'
SAMPLE_SLIDE_ID = '5d586d76bd4404c6b1f286ae'
# Connect to girder client
gc = girder_client.GirderClient(apiUrl=APIURL)
gc.authenticate(interactive=True)
# gc.authenticate(apiKey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')
Out[2]:
The sample_GTcodes.csv file read below contains the ground truth codes and information dataframe. This dataframe is indexed by the annotation group name and has the following columns:
group: group name of annotation (string), e.g. "mostly_tumor"
GT_code: int, desired ground truth code (in the mask). Pixels of this value belong to the corresponding group (class).
color: str, rgb format, e.g. rgb(255,0,0).
NOTE: Zero pixels have special meaning and do not encode a specific ground truth class. Instead, they simply mean 'Outside ROI' and should be ignored during model training or evaluation.
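For orientation, here is a minimal hand-built sketch of such a dataframe; the group names, codes, and colors below are illustrative placeholders, while the real values are read from sample_GTcodes.csv in the next cell:

from pandas import DataFrame

# illustrative GTCodes dataframe -- real values come from sample_GTcodes.csv
GTCodes_demo = DataFrame({
    'group': ['mostly_tumor', 'mostly_stroma'],
    'GT_code': [1, 2],
    'color': ['rgb(255,0,0)', 'rgb(255,125,0)'],
})
GTCodes_demo.index = GTCodes_demo.loc[:, 'group']  # index by group name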
In [3]:
# read GTCodes dataframe
GTCODE_PATH = os.path.join(
    CWD, '..', '..', 'tests', 'test_files', 'sample_GTcodes.csv')
GTCodes_df = read_csv(GTCODE_PATH)
GTCodes_df.index = GTCodes_df.loc[:, 'group']
In [4]:
GTCodes_df.head()
Out[4]:
In [5]:
# read mask
X_OFFSET = 59206
Y_OFFSET = 33505
MASKNAME = "TCGA-A2-A0YE-01Z-00-DX1.8A2E3094-5755-42BC-969D-7F0A2ECA0F39" + \
"_left-%d_top-%d_mag-BASE.png" % (X_OFFSET, Y_OFFSET)
MASKPATH = os.path.join(CWD, '..', '..', 'tests', 'test_files', 'annotations_and_masks', MASKNAME)
MASK = imread(MASKPATH)
In [6]:
plt.figure(figsize=(7,7))
plt.imshow(MASK)
plt.title(MASKNAME[:23])
plt.show()
The function get_contours_from_mask() generates contours from a mask image. It has many parameters, but most have defaults suited to the most common use cases. The only required arguments are MASK and GTCodes_df; depending on your needs, you may also want to set get_roi_contour, roi_group, discard_nonenclosed_background, and background_group, which control behavior regarding the region of interest (ROI) boundary and the background pixel class (e.g. stroma).
In [7]:
print(get_contours_from_mask.__doc__)
In [8]:
# Let's extract all contours from a mask, including ROI boundary. We will
# be discarding any stromal contours that are not fully enclosed within a
# non-stromal contour since we already know that stroma is the background
# group. This is so things look uncluttered when posted to DSA.
groups_to_get = None
contours_df = get_contours_from_mask(
    MASK=MASK, GTCodes_df=GTCodes_df, groups_to_get=groups_to_get,
    get_roi_contour=True, roi_group='roi',
    discard_nonenclosed_background=True,
    background_group='mostly_stroma',
    MIN_SIZE=30, MAX_SIZE=None, verbose=True,
    monitorPrefix=MASKNAME[:12] + ": getting contours")
In [9]:
contours_df.head()
Out[9]:
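Each row of contours_df describes one polygon. As a hedged sketch (assuming, per the handler's convention, that the coords_x and coords_y columns store comma-separated coordinate strings), you could plot a single contour in the mask frame like this:

# parse the first contour's coordinate strings back into numeric lists
row = contours_df.iloc[0, :]
xs = [float(x) for x in row['coords_x'].split(',')]
ys = [float(y) for y in row['coords_y'].split(',')]

plt.plot(xs, ys, color='k')
plt.gca().invert_yaxis()  # image coordinates: y increases downward
plt.title(row['group'])
plt.show()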
In [10]:
print(get_annotation_documents_from_contours.__doc__)
As mentioned in the docs, this function wraps get_single_annotation_document_from_contours():
In [11]:
print(get_single_annotation_document_from_contours.__doc__)
Let's get a list of annotation documents (each is a dictionary). For the purposes of this tutorial, we separate the documents by group (i.e. each document is composed of polygons from the same style/group). You could instead allow heterogeneous groups in the same annotation document by setting separate_docs_by_group to False. For illustration, we place only 10 polygons in each document; realistically, each document would contain several hundred, depending on their complexity. Placing too many polygons in a single document can lead to performance issues when it is rendered in HistomicsUI.
In [12]:
# get list of annotation documents
annprops = {
    'X_OFFSET': X_OFFSET,
    'Y_OFFSET': Y_OFFSET,
    'opacity': 0.2,
    'lineWidth': 4.0,
}
annotation_docs = get_annotation_documents_from_contours(
    contours_df.copy(), separate_docs_by_group=True, annots_per_doc=10,
    docnamePrefix='demo', annprops=annprops,
    verbose=True, monitorPrefix=MASKNAME[:12] + ": annotation docs")
In [13]:
# inspect the first document, truncated to its first two elements (polygons)
# and the first five points of each so the output stays short
ann_doc = annotation_docs[0].copy()
ann_doc['elements'] = ann_doc['elements'][:2]
for i in range(2):
    ann_doc['elements'][i]['points'] = ann_doc['elements'][i]['points'][:5]
In [14]:
ann_doc
Out[14]:
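The truncated document has roughly the following shape; all field values below are illustrative placeholders rather than actual output:

# rough shape of a truncated annotation document (illustrative values only)
{
    'name': 'demo_mostly_tumor-0',
    'description': '',
    'elements': [{
        'type': 'polyline',
        'closed': True,
        'group': 'mostly_tumor',
        'label': {'value': 'mostly_tumor'},
        'lineColor': 'rgb(255,0,0)',
        'lineWidth': 4.0,
        'points': [[60206.0, 34505.0, 0],  # (x, y, z) in the slide frame
                   [60208.0, 34507.0, 0]],
    }],
}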
In [15]:
# deleting existing annotations in target slide (if any)
existing_annotations = gc.get('/annotation/item/' + SAMPLE_SLIDE_ID)
for ann in existing_annotations:
    gc.delete('/annotation/%s' % ann['_id'])
# post the annotation documents you created
for annotation_doc in annotation_docs:
    resp = gc.post(
        "/annotation?itemId=" + SAMPLE_SLIDE_ID, json=annotation_doc)
Now you can go to HistomicsUI and confirm that the posted annotations make sense and correspond to tissue boundaries and expected labels.
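If you also want a programmatic sanity check, a minimal sketch is to re-fetch the slide's annotations through the same endpoint used above and compare counts:

# re-fetch the slide's annotations and verify everything was posted
posted = gc.get('/annotation/item/' + SAMPLE_SLIDE_ID)
assert len(posted) == len(annotation_docs)
print('%d annotation documents are now attached to the slide.' % len(posted))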