This example describes how to run image analysis tasks as a workflow on collections of slides hosted on a DSA (Digital Slide Archive) server. The API is used to retrieve pixel data from the server, the analysis is performed locally, and the results are pushed back to the server as annotations for visualization. Note that this is not the same approach as using girder tasks to run jobs remotely as a user through HistomicsUI. This is a utility intended for developers of image analysis and machine learning algorithms.
In this example, we run a cellularity detection workflow on slides in a source girder directory (SAMPLE_SOURCE_FOLDER_ID below) and post the results to a results girder directory (SAMPLE_DESTINATION_FOLDER_ID below).
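The pull, analyze, push pattern described above can be sketched roughly as follows. This is a minimal illustration, not the HistomicsTK implementation: `fetch_region`, `analyze`, `push_annotation`, and `process_slide` are hypothetical helper names, the analysis step is a placeholder, and the `item/{id}/tiles/region` and `annotation` endpoints (with the parameters shown) are assumed to be provided by the server's large_image and annotation plugins. `gc` is assumed to be any object with `get` and `post` methods shaped like those of `girder_client.GirderClient`.

```python
def fetch_region(gc, slide_id, left, top, width, height):
    """Ask the server for an encoded image of one rectangular region."""
    # Assumed endpoint and parameter names; check them against your server.
    return gc.get(
        'item/%s/tiles/region' % slide_id,
        parameters={
            'left': left, 'top': top,
            'regionWidth': width, 'regionHeight': height,
            'units': 'base_pixels', 'encoding': 'PNG',
        },
        jsonResp=False)


def analyze(image_bytes, left, top, width, height):
    """Placeholder analysis: mark the whole region with one rectangle."""
    return [{
        'type': 'rectangle',
        'center': [left + width / 2, top + height / 2, 0],
        'width': width, 'height': height, 'rotation': 0,
    }]


def push_annotation(gc, slide_id, name, elements):
    """Post the analysis output back to the slide as an annotation."""
    return gc.post(
        'annotation', parameters={'itemId': slide_id},
        json={'name': name, 'elements': elements})


def process_slide(gc, slide_id, region=(0, 0, 256, 256)):
    """Pull pixels, analyze them locally, push the result back."""
    left, top, width, height = region
    resp = fetch_region(gc, slide_id, left, top, width, height)
    elements = analyze(resp.content, left, top, width, height)
    return push_annotation(gc, slide_id, 'sketch-result', elements)
```

With an authenticated client, a call would look like `process_slide(gc, slide_id)`. The workflow runner shown later in this example wraps this kind of per-slide function and applies it to every slide in a folder.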
Where to look?
|_ histomicstk/
   |_ workflows/
      |_ workflow_runner.py
      |_ specific_workflows.py
      |_ tests/
         |_ test_workflow_runner.py
In [1]:
import shutil
import tempfile
import girder_client
# import numpy as np
from pandas import read_csv
# from histomicstk.saliency.cellularity_detection import (
#     Cellularity_detector_superpixels)
from histomicstk.saliency.cellularity_detection_thresholding import (
    Cellularity_detector_thresholding)
from histomicstk.workflows.workflow_runner import (
    Workflow_runner, Slide_iterator)
from histomicstk.workflows.specific_workflows import (
    cellularity_detection_workflow)
In [2]:
APIURL = 'http://candygram.neurology.emory.edu:8080/api/v1/'
SAMPLE_SOURCE_FOLDER_ID = "5d5c28c6bd4404c6b1f3d598"
SAMPLE_DESTINATION_FOLDER_ID = "5d9246f6bd4404c6b1faaa89"
# girder client
gc = girder_client.GirderClient(apiUrl=APIURL)
# gc.authenticate(interactive=True)
gc.authenticate(apiKey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')
# This is where the run logs will be saved
logging_savepath = tempfile.mkdtemp()
# params for cellularity thresholding
cdt_params = {
    'gc': gc,
    'slide_id': '',  # this will be handled by the slide iterator
    'GTcodes': read_csv('../../histomicstk/saliency/tests/saliency_GTcodes.csv'),
    'MAG': 3.0,
    'visualize': True,
    'verbose': 2,
    'logging_savepath': logging_savepath,
}
In [3]:
print(Workflow_runner.__init__.__doc__)
Note how this requires:
- slide_iterator - a Slide_iterator instance, which yields information about the slides you want to run the workflow on.
- workflow - a method that you define, which runs on a single slide.
- workflow_kwargs - parameters for your defined method.
In this example, we use cellularity_detection_workflow() as the workflow to run; it is defined in the histomicstk.workflows.specific_workflows module.
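To make the division of labour concrete, here is a simplified, self-contained stand-in for the pattern. It is not the HistomicsTK implementation: `toy_slide_iterator`, `toy_workflow`, and `run_workflow` are hypothetical names, and the assumption that the runner passes `slide_id` and `monitorPrefix` to the workflow alongside `workflow_kwargs` should be checked against the docstrings printed in this notebook.

```python
def toy_slide_iterator(slides, keep_slides=None):
    """Yield info dicts for slides, optionally restricted by name."""
    for info in slides:
        if keep_slides is None or info['name'] in keep_slides:
            yield info


def toy_workflow(slide_id, monitorPrefix='', threshold=0.5, results=None):
    """Runs on a single slide; here it only records what it was given."""
    results.append((monitorPrefix, slide_id, threshold))


def run_workflow(slide_iterator, workflow, workflow_kwargs, monitorPrefix=''):
    """Call the workflow once per slide, forwarding the shared kwargs."""
    for i, info in enumerate(slide_iterator, start=1):
        prefix = '%s: slide %d (%s)' % (monitorPrefix, i, info['name'])
        workflow(slide_id=info['_id'], monitorPrefix=prefix, **workflow_kwargs)


# Hypothetical slide records standing in for items in a girder folder.
slides = [
    {'_id': 'id-1', 'name': 'a.svs'},
    {'_id': 'id-2', 'name': 'b.svs'},
    {'_id': 'id-3', 'name': 'c.svs'},
]
results = []
run_workflow(
    slide_iterator=toy_slide_iterator(slides, keep_slides=['a.svs', 'c.svs']),
    workflow=toy_workflow,
    workflow_kwargs={'threshold': 0.7, 'results': results},
    monitorPrefix='test')
```

The per-slide arguments come from the iterator, while everything shared across slides travels in `workflow_kwargs`, so the same workflow function can be reused on any folder.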
In [4]:
print(Slide_iterator.__init__.__doc__)
In [5]:
print(cellularity_detection_workflow.__doc__)
In [6]:
# Init specific workflow (Cellularity_detector_thresholding)
cdt = Cellularity_detector_thresholding(**cdt_params)
# Init workflow runner
workflow_runner = Workflow_runner(
    slide_iterator=Slide_iterator(
        gc, source_folder_id=SAMPLE_SOURCE_FOLDER_ID,
        # keep_slides=None),  # run all slides in girder directory
        keep_slides=[  # run specific slides only
            'TCGA-A1-A0SK-01Z-00-DX1_POST.svs',
            'TCGA-A2-A04Q-01Z-00-DX1_POST.svs',
        ]),
    workflow=cellularity_detection_workflow,
    workflow_kwargs={
        'gc': gc,
        'cdo': cdt,
        'destination_folder_id': SAMPLE_DESTINATION_FOLDER_ID,
        'keep_existing_annotations': False, },
    logging_savepath=cdt.logging_savepath,
    monitorPrefix='test')
In [7]:
workflow_runner.run()
You can now open the Digital Slide Archive and check the posted results in the results girder directory.
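The run logs were written to a directory created with tempfile.mkdtemp(), which is not deleted automatically. Below is a minimal sketch of cleaning up, assuming the logs are no longer needed; `demo_dir` is a hypothetical stand-in for `logging_savepath` so that the snippet runs on its own.

```python
import os
import shutil
import tempfile

# Stand-in for logging_savepath; mkdtemp() leaves removal to the caller.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'run.log'), 'w') as f:
    f.write('example log line\n')

# Remove the directory and everything inside it.
shutil.rmtree(demo_dir)
```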