SlideSeg

Author: Brendan Crabb brendancrabb8388@pointloma.edu
Created August 1, 2017


Welcome to SlideSeg, a Python module for segmenting whole slide images into image chips for deep learning. An image mask for each chip is generated from the slide's associated markup and annotation files.

User Guide

1. Dependencies

SlideSeg runs on Python 2.7 and depends on the following Python libraries:

  • openslide 1.1.1
  • tqdm 4.15.0
  • cv2 3.2.0
  • numpy
  • pexif 0.15

SlideSeg and the necessary Python libraries can be installed using:

pip install slideseg

If pip isn't installed, you may have to enter the following before installing slideseg (OS X):

sudo easy_install pip

If you are using the preconfigured SlideSeg Anaconda environment, these dependencies are already installed. SlideSeg also depends on several C libraries; see section 2.2 (Windows) and section 2.3 (Mac OS X) for installation instructions.

2. Anaconda Environment

Make sure Anaconda is installed. The SlideSeg environment provides an IPython kernel with all of the necessary packages already installed; however, switching Jupyter kernels between conda environments requires conda support for Jupyter notebooks. This support is available through conda itself and can be enabled with the following command:

conda install nb_conda

2.1 Creating environment from .yml file

Copy the environment_slideseg.yml file to the Anaconda scripts directory, .../anaconda/scripts/. From that directory, issue the following command to create the Anaconda environment from the file:

conda env create -f environment_slideseg.yml

Creating the environment might take a few minutes. Once finished, issue the following command to activate the environment:

  • Windows: activate SlideSeg
  • macOS and Linux: source activate SlideSeg

If the environment was activated successfully, you should see (SlideSeg) at the beginning of the command prompt. This also makes the SlideSeg kernel your default kernel when running Jupyter.

2.2 Installing C Libraries (Windows)

OpenSlide and OpenCV are native C/C++ libraries, so they have to be installed separately from the conda environment, which contains the Python dependencies.

The Windows Binaries for OpenSlide can be found at 'openslide.org/download/'. Download the appropriate binaries for your system (either 32-bit or 64-bit) and unzip the file.

Copy the .dll files in ../bin/ to .../Anaconda/envs/SlideSeg/Library/bin/.

Copy the .h files to .../Anaconda/envs/SlideSeg/include/.

Finally, copy the .lib file to .../Anaconda/envs/SlideSeg/libs/.

OpenSlide has now been installed.

Use the following tutorial to download OpenCV, either from prebuilt binaries or from source:

http://docs.opencv.org/3.2.0/d5/de5/tutorial_py_setup_in_windows.html

2.3 Installing C Libraries (Mac OS X)

OpenSlide and OpenCV are native C/C++ libraries, so they have to be installed separately from the conda environment, which contains the Python dependencies.

If you are using Homebrew, enter the following in the terminal:

brew install openslide

brew install opencv

OpenSlide and OpenCV should now be installed system-wide through Homebrew (not inside the Anaconda environment itself).
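Once the C libraries are in place on either platform, a quick import test from Python shows whether everything can be loaded. The snippet below is a minimal sketch and not part of SlideSeg: it tries to import each module listed in section 1 and reports which ones fail. Any exception during import is treated as a failure because, depending on the openslide-python version, a missing OpenSlide C library can surface as an OSError rather than an ImportError.

```python
import importlib

# Import names of the dependencies from section 1 (cv2 is the OpenCV binding).
REQUIRED_MODULES = ["openslide", "tqdm", "cv2", "numpy", "pexif"]


def check_modules(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # ImportError, or e.g. OSError when a native library cannot be loaded.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules(REQUIRED_MODULES).items()):
        print("{0}: {1}".format(name, "OK" if ok else "MISSING"))
```

Run it inside the SlideSeg environment (or kernel); a MISSING entry points to the package or C library that still needs installing.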

2.4 Launching Jupyter Notebook

The Jupyter Notebook App can be launched by clicking the Jupyter Notebook icon that Anaconda adds to the Start menu (Windows), or by typing the following in a terminal (cmd on Windows):

jupyter notebook

This will launch a new browser window showing the Notebook Dashboard. When started, the Jupyter Notebook app can only access files within its start-up folder. If you stored the SlideSeg notebook documents in a subfolder of your user folder, no configuration is necessary. Otherwise, you need to change your Jupyter Notebook App start-up folder.

2.5 Change Jupyter Notebook startup folder (Windows)
  • Copy the Jupyter Notebook launcher from the menu to the desktop.
  • Right-click the new launcher, select Properties, and in the Target field change %USERPROFILE% to the full path of the folder that will contain all the notebooks.
  • Double-click the Jupyter Notebook desktop launcher (its icon shows [IPy]) to start the Jupyter Notebook App, which will open in a new browser window (or tab). A secondary terminal window, used only for error logging and for shutting down, will also open. If only the terminal starts, try opening http://localhost:8888/ in your browser.

2.6 Change Jupyter Notebook startup folder (OS X)

To launch Jupyter Notebook App:

  • Open Spotlight and type Terminal to open a terminal window.
  • Enter the startup folder by typing cd /some_folder_name.
  • Type jupyter notebook to launch the Jupyter Notebook App (it will appear in a new browser window or tab).

2.7 Jupyter Kernel Selection

After launching the Jupyter Notebook App, navigate to the SlideSeg notebook and click its name to open it in a new browser tab. The upper right corner should show Python [conda env:SlideSeg]. If it does not, select Kernel > Change kernel > Python [conda env:SlideSeg].
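You can also confirm the active environment from a notebook cell. This is a small sketch that assumes the environment keeps the name SlideSeg from the .yml file; for a conda environment, sys.prefix ends in .../envs/<name>, so its last path component is the environment name. The helper conda_env_name is hypothetical, not part of slideseg.

```python
import os
import sys


def conda_env_name(prefix=sys.prefix):
    """Return the last component of a Python prefix path.

    For a conda environment the prefix is .../envs/<name>, so this is the
    environment name; for the base installation it is the install folder.
    """
    return os.path.basename(os.path.normpath(prefix))


print("Kernel prefix: {0}".format(sys.prefix))
if conda_env_name() != "SlideSeg":
    print("Warning: this kernel does not appear to be the SlideSeg environment.")
```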

3. Setup

Copy all of the slide images into the images folder in the main project directory, and copy the markup and annotation files (in .xml format) into the xml folder. Each annotation file must have the same base name as the slide it belongs to; for example, images/slide1.svs is paired with xml/slide1.xml.
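A missing or misnamed annotation file is easiest to catch before a long run. The check below is a hypothetical helper, not part of slideseg; it assumes the default folder names from the Parameters cell in section 3.2 (images/ and xml/) and that pairing is by base name plus .xml, as described above.

```python
import os


def find_unpaired_slides(slide_dir, xml_dir):
    """Return slide filenames in slide_dir with no same-named .xml file in xml_dir.

    Hidden files (such as .DS_Store) and subfolders are skipped.
    """
    unpaired = []
    for name in sorted(os.listdir(slide_dir)):
        if name.startswith(".") or not os.path.isfile(os.path.join(slide_dir, name)):
            continue
        stem = os.path.splitext(name)[0]  # "slide1.svs" -> "slide1"
        if not os.path.isfile(os.path.join(xml_dir, stem + ".xml")):
            unpaired.append(name)
    return unpaired


# Folder names assumed from the default Parameters in section 3.2.
if os.path.isdir("images/") and os.path.isdir("xml/"):
    for name in find_unpaired_slides("images/", "xml/"):
        print("No annotation file found for {0}".format(name))
```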

3.1 Supported Formats

SlideSeg can read virtual slides in the following formats:

  • Aperio (.svs, .tif)
  • Hamamatsu (.ndpi, .vms, .vmu)
  • Leica (.scn)
  • MIRAX (.mrxs)
  • Philips (.tiff)
  • Sakura (.svslide)
  • Trestle (.tif)
  • Ventana (.bif, .tif)
  • Generic tile TIFF (.tif)
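Because a folder listing also returns hidden files such as .DS_Store and, for MIRAX slides, companion subfolders, it can help to filter by extension before opening anything. A minimal sketch, assuming the extension list above covers your data; the helper names are hypothetical. Note that passing this filter does not guarantee OpenSlide can open the file, since the vendor formats are identified from file contents as well.

```python
import os

# Extensions taken from the list of supported slide formats above.
SLIDE_EXTENSIONS = (".svs", ".tif", ".tiff", ".ndpi", ".vms", ".vmu",
                    ".scn", ".mrxs", ".svslide", ".bif")


def is_supported_slide(filename):
    """Return True if filename has one of the slide extensions (case-insensitive)."""
    return os.path.splitext(filename)[1].lower() in SLIDE_EXTENSIONS


def list_slides(folder):
    """Return the sorted names of regular files in folder that look like slides."""
    return sorted(name for name in os.listdir(folder)
                  if os.path.isfile(os.path.join(folder, name)) and is_supported_slide(name))
```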

SlideSeg can read annotations in the following formats:

  • XML (.xml)

3.2 Parameters

SlideSeg depends on the following parameters:

slide_path: Path to the folder of slide images

xml_path: Path to the folder of xml files

output_dir: Path to the output folder where image_chips, image_masks, and text_files will be saved

format: Output format of the image_chips and image_masks (png or jpg only)

quality: Output quality, i.e. the JPEG compression quality when the output format is 'jpg' (100 recommended; JPEG compression artifacts will distort image segmentation)

size: Size of image_chips and image_masks in pixels

overlap: Pixel overlap between image chips

key: The text file containing annotation keys and color codes

save_all: True saves every image_chip, False only saves chips containing an annotated pixel

save_ratio: Ratio of image_chips containing annotations to image_chips not containing annotations (use 'inf' if only annotated chips are desired; only applies when save_all is False)

These parameters can be specified in the cell below.


In [ ]:
Parameters = {
    'slide_path': 'images/',
    'xml_path': 'xml/',
    'output_dir': 'output/',
    'format': 'jpg',
    'quality': 100,
    'size': 128,
    'overlap': 1,
    'key': 'Annotation_Key.txt',
    'save_all': False,
    'save_ratio': 'inf'
}
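A typo in these values usually only shows up partway through a run. The function below is a hypothetical pre-flight check, not something slideseg performs; it encodes the constraints described in the list above (png/jpg only, integer quality, size and overlap, boolean save_all, numeric or 'inf' save_ratio). The 0 to 100 quality range is an assumption based on common JPEG encoders.

```python
def validate_parameters(params):
    """Return a list of problems found in a SlideSeg parameter dictionary.

    An empty list means every check passed.
    """
    problems = []

    if str(params.get("format", "")).lower() not in ("png", "jpg"):
        problems.append("format must be 'png' or 'jpg', got {0!r}".format(params.get("format")))

    def as_int(name):
        # Record a problem and return None if the value is missing or not an integer.
        try:
            return int(params[name])
        except (KeyError, TypeError, ValueError):
            problems.append("{0} must be an integer".format(name))
            return None

    quality = as_int("quality")
    if quality is not None and not 0 <= quality <= 100:
        problems.append("quality must be between 0 and 100")

    size = as_int("size")
    if size is not None and size <= 0:
        problems.append("size must be positive")

    overlap = as_int("overlap")
    if overlap is not None and overlap < 0:
        problems.append("overlap must not be negative")

    if not isinstance(params.get("save_all"), bool):
        problems.append("save_all must be True or False")

    try:
        if float(params["save_ratio"]) <= 0:  # float('inf') is accepted
            problems.append("save_ratio must be positive (or 'inf')")
    except (KeyError, TypeError, ValueError):
        problems.append("save_ratio must be a number or 'inf'")

    return problems
```

With the default values in the cell above, validate_parameters(Parameters) returns an empty list.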
3.3 Annotation Key

The main directory should already contain an Annotation_Key.txt file. If no Annotation_Key file is present, one will be generated automatically from the annotation files in the xml folder.

The Annotation_Key file lists every annotation key with its associated color code. In all image masks, pixels belonging to an annotation with that key take the specified pixel value. If an unknown key is encountered, it is assigned a pixel value and added to the Annotation_Key automatically.
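Conceptually, this bookkeeping is a dictionary from annotation key to pixel value. The sketch below illustrates that idea in memory only; it is not slideseg's implementation, it does not read or write Annotation_Key.txt (whose exact layout is not described here), and it assumes pixel value 0 is reserved for unannotated background. The function name is hypothetical.

```python
def get_or_add_key(key_map, annotation):
    """Return the pixel value for annotation, assigning a new one if unseen.

    key_map maps annotation names to integer pixel values. A new annotation
    receives the smallest positive value not already in use (0 is assumed to
    be background).
    """
    if annotation not in key_map:
        used = set(key_map.values())
        value = 1
        while value in used:
            value += 1
        key_map[annotation] = value
    return key_map[annotation]
```

For example, with {'tumor': 1, 'stroma': 2}, an unseen key 'necrosis' would be assigned 3 and stored in the dictionary.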

The following functions are defined within the slideseg module and used to generate, edit, and read the annotation key:
def loadkeys(annotation_key):
    """
    Opens annotation_key file and loads keys and color codes
    :param annotation_key: the filename of the annotation key
    :return: color codes
    """

def addkeys(annotation_key, key):
    """
    Adds new key and color_code to annotation key
    :param annotation_key: the filename of the annotation key
    :param key: The annotation to be added
    :return: updated annotation key file
    """

def writeannotations(annotation_key, annotations):
    """
    Writes annotation keys and color codes to annotation key text file
    :param annotation_key: filename of annotation key
    :param annotations: Dictionary of annotation keys and color codes
    :return: .txt file with annotation keys
    """

def generatekey(annotation_key, path):
    """
    Generates annotation_key from folder of xml files
    :param annotation_key: the name of the annotation key file
    :param path: Directory containing xml files
    :return: annotation_key file
    """

Use the cell below to import slideseg, as well as some other useful modules.


In [ ]:
import slideseg
import sys
import os

Run the cell below to display the annotation key. The first part generates a new annotation key from the xml folder if no key file exists yet; the second part prints the key in the notebook.


In [ ]:
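key_file = Parameters['key']

# Generate the key from the xml folder if it does not exist yet
if not os.path.isfile(key_file):
    slideseg.generatekey(key_file, Parameters['xml_path'])

# Print the key; the with-block closes the file afterwards
with open(key_file, 'r') as f:
    for line in f:
        sys.stdout.write(line)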

5. Output

5.1 Image_chips

Every generated image chip is saved in the output/image_chips folder, using the naming convention <slide filename>_<level>_<row>_<column>.<format>. If the chip contains an annotated area and tagging is enabled, its annotation keys are attached as tags (in the Subject metadata field); chips without annotations receive the tag 'NONE'. To view these tags in Windows File Explorer, switch to Details view and add the 'Subject' column; the files can then be sorted by tag. These tags are only available when the output format is jpg, because they are stored as EXIF metadata.
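Because the name fields are joined with underscores, a chip name can be split back into its parts, which is handy for grouping chips by slide or level. A minimal sketch with hypothetical helpers, assuming exactly the field order above and integer level, row and column values. Splitting from the right keeps underscores inside the slide name intact, and it works whether or not the slide part keeps its original extension (which depends on slideseg).

```python
import os


def make_chip_name(slide_name, level, row, col, suffix):
    """Build '<slide>_<level>_<row>_<col><suffix>'; suffix includes the dot, e.g. '.jpg'."""
    return "{0}_{1}_{2}_{3}{4}".format(slide_name, level, row, col, suffix)


def parse_chip_name(chip_name):
    """Split a chip filename into (slide_name, level, row, col, extension)."""
    stem, ext = os.path.splitext(chip_name)
    slide_name, level, row, col = stem.rsplit("_", 3)  # split from the right
    return slide_name, int(level), int(row), int(col), ext
```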

The following functions are defined in the slideseg module and are used to save the image chips and image masks and to attach EXIF metadata to the images:

def ensuredirectory(dest):
    """
    Ensures the existence of a directory
    :param dest: Directory to ensure.
    :return: new directory if it did not previously exist.
    """

def attachtags(path, keys):
    """
    Attaches image tags to metadata of chips and masks
    :param path: file to attach tags to.
    :param keys: keys to attach as tags
    :return: JPG with metadata tags
    """

def savechip(chip, path, quality, keys):
    """
    Saves the image chip
    :param chip: the slide image chip to save
    :param path: the full path to the chip
    :param quality: the output quality
    :param keys: keys associated with the chip
    :return:
    """

def savemask(mask, path, keys):
    """
    Saves the image masks
    :param mask: the image mask to save
    :param path: the complete path for the mask
    :param keys: keys associated with the chip
    :return:
    """

def checksave(save_all, pix_list, save_ratio, save_count_annotated, save_count_blank):
    """
    Checks whether or not an image chip should be saved
    :param save_all: (bool) saves all chips if true
    :param pix_list: list of pixel values in image mask
    :param save_ratio: ratio of annotated chips to unannotated chips
    :param save_count_annotated: total annotated chips saved
    :param save_count_blank: total blank chips saved
    :return: bool
    """

def formatcheck(format):
    """
    Checks that the format parameter was defined correctly
    :param format: the output format parameter
    :return: format
    :return: suffix
    """
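checksave balances how many annotated and blank chips are kept. One plausible reading of its parameters is sketched below; it is an illustration, not the module's actual code, and it assumes that mask value 0 means background and that save_ratio is annotated:blank, as described in section 3.2. Under this scheme, blank chips that come before any annotated chips in traversal order are skipped.

```python
import math


def should_save(save_all, pix_list, save_ratio, save_count_annotated, save_count_blank):
    """Decide whether to keep a chip, mirroring checksave's parameters."""
    if save_all:
        return True
    if any(value != 0 for value in pix_list):
        return True  # annotated chips are always kept
    ratio = float(save_ratio)  # accepts 'inf'
    if math.isinf(ratio):
        return False  # only annotated chips were requested
    # Keep a blank chip only while blanks lag behind annotated / ratio.
    return save_count_blank < save_count_annotated / ratio
```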

The main functionality of SlideSeg is performed by the following functions, which take the inputs specified in Parameters and use them to generate the image chips and image masks.

def openwholeslide(path):
    """
    Opens a whole slide image
    :param path: Slide image path.
    :return: slide image, levels, and dimensions
    """

def curatemask(mask, scale_width, scale_height, chip_size):
    """
    Resize and pad annotation mask if necessary
    :param mask: an image mask
    :param scale_width: scaling for higher magnification levels
    :param scale_height: scaling for higher magnification levels
    :param chip_size: the size of the image chips
    :return: curated annotation mask
    """

def getchips(levels, dims, chip_size, overlap, mask, annotations, filename, suffix, save_all, save_ratio):
    """
    Finds chip locations that should be loaded and saved
    :param levels: levels in whole slide image
    :param dims: dimension of whole slide image
    :param chip_size: the size of the image chips
    :param overlap: overlap between image chips (stride)
    :param mask: annotation mask for slide image
    :param annotations: dictionary of annotations in image
    :param filename: slide image filename
    :param suffix: output format for saving.
    :param save_all: whether or not to save every image chip (bool)
    :param save_ratio: ratio of annotated to unannotated chips (float)
    :return: chip_dict. Dictionary of chip names, level, col, row, and scale
    :return: image_dict. Dictionary of annotations and chips with those annotations
    """

def run(parameters, filename):
    """
    Runs SlideSeg: Generates image chips from a whole slide image.
    :param parameters: specified in Parameters.txt file
    :param filename: filename of whole slide image
    :return: image chips and masks.
    """
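getchips and run work with a grid of chip positions at each pyramid level and scale them back to level 0 before reading, because OpenSlide's read_region expects its location in level-0 coordinates. The sketch below shows that arithmetic under one stated assumption: overlap is treated as the number of pixels shared by neighbouring chips, as described in section 3.2 (the getchips docstring calls it a stride, so check which meaning your version uses). The helper names are hypothetical; edge chips that run past the image are kept, on the assumption that they are padded later, as curatemask does for masks.

```python
def chip_origins(width, height, chip_size, overlap):
    """Top-left (col, row) positions of chips tiling a width x height level.

    Chips start every chip_size - overlap pixels; positions whose chip would
    extend past the image edge are still returned.
    """
    stride = chip_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than chip_size")
    return [(col, row)
            for row in range(0, height, stride)
            for col in range(0, width, stride)]


def level0_location(col, row, scale_width, scale_height):
    """Map a (col, row) position at some level to level-0 pixel coordinates."""
    return int(col * scale_width), int(row * scale_height)
```

For a level downsampled by 4 in each direction, the chip at (127, 254) would be read from level-0 location (508, 1016).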

5.2 Image_masks

An image mask for each image chip is saved in the output/image_masks folder. The mask has the same name as the image chip it is associated with. Furthermore, these masks will have the same tags, allowing you to sort by annotation type.

The following function handles the generation of an annotation mask from xml files:

def makemask(annotation_key, size, xml_path):
    """
    Reads xml file and makes annotation mask for entire slide image
    :param annotation_key: name of the annotation key file
    :param size: size of the whole slide image
    :param xml_path: path to the xml file
    :return: annotation mask
    :return: dictionary of annotation keys and color codes
    """
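The XML layout that makemask reads is not spelled out here. Purely as an illustration, the sketch below extracts labelled polygons from XML structured like Aperio ImageScope annotation files (Region elements containing Vertex elements with X and Y attributes). The element and attribute names, and the use of the Region Text attribute as the annotation key, are assumptions; turning the polygons into a mask would then mean filling each one with its key's pixel value, for example with OpenCV's cv2.fillPoly.

```python
import xml.etree.ElementTree as ET


def read_regions(xml_text):
    """Return a list of (label, [(x, y), ...]) polygons from annotation XML.

    Assumes Region elements with a Text attribute and Vertex descendants
    carrying X and Y attributes (Aperio ImageScope-style layout).
    """
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter("Region"):
        label = region.get("Text", "")
        points = [(float(v.get("X")), float(v.get("Y"))) for v in region.iter("Vertex")]
        if points:
            regions.append((label, points))
    return regions
```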

5.3 Text Files

A text file with details about annotations and image chips will also be saved to output/textfiles. For each slide image, this text file will contain a list of all annotation keys present in the image. For each annotation key, a list of every image chip/mask containing that specific key is also recorded in this file.

The following functions generate these .txt files:

def writekeys(filename, annotations):
    """
    Writes each annotation key to the output text file
    :param filename: filename of image chip
    :param annotations: dictionary of annotation keys
    :return: updated text file
    """

def writeimagelist(filename, image_dictionary):
    """
    Writes list of images containing each annotation key
    :param filename: the name of the slide image
    :param image_dictionary: dictionary of images with each key
    :return: text
    """
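The per-key image list is essentially an inversion of "chip to keys" into "key to chips". The sketch below shows that inversion plus one possible plain-text rendering; both helpers are hypothetical, and the text layout is invented for illustration, so the files written by writekeys and writeimagelist may look different.

```python
def group_chips_by_key(chip_keys):
    """Invert {chip_name: [keys]} into {key: sorted list of chip names}."""
    image_dict = {}
    for chip_name, keys in chip_keys.items():
        for key in keys:
            image_dict.setdefault(key, []).append(chip_name)
    for names in image_dict.values():
        names.sort()
    return image_dict


def format_summary(slide_name, image_dict):
    """Render the keys present in a slide, then the chips containing each key."""
    lines = ["Slide: {0}".format(slide_name), "Annotation keys:"]
    lines.extend("  {0}".format(key) for key in sorted(image_dict))
    for key in sorted(image_dict):
        lines.append("Chips containing {0}:".format(key))
        lines.extend("  {0}".format(name) for name in image_dict[key])
    return "\n".join(lines) + "\n"
```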

6. Run

To execute SlideSeg, run the notebook cells below. Alternatively, you can run the Python script 'main.py', which reads its parameters from the Parameters.txt file instead of the Parameters cell above. Either way, make sure the parameters are defined before you start.

To get started, run the cell below to make sure all of the necessary modules are imported.


In [ ]:
import slideseg
import tqdm
import os

The following cell defines a function run(parameters, filename) that generates image chips and masks from the slide image and xml file specified by filename. This function uses the slideseg module to open the slide image, generate an annotation mask, find regions of interest, and save chip data. This function is also defined within the module as slideseg.run(parameters, filename).


In [ ]:
def run(parameters, filename):
    """
    Runs SlideSeg: Generates image chips from a whole slide image.
    :param parameters: specified in Parameters.txt file
    :param filename: filename of whole slide image
    :return: image chips and masks.
    """

    # Define variables
    _slide_path = parameters["slide_path"]
    _xml_path = parameters["xml_path"]
    _output_dir = parameters["output_dir"]
    _format = parameters["format"]
    _quality = int(parameters["quality"])
    _chip_size = int(parameters["size"])
    _overlap = int(parameters["overlap"])
    _key = parameters["key"]
    _save_all = parameters["save_all"]
    _save_ratio = float(parameters["save_ratio"])  # 'inf' becomes float('inf')

    # Open slide (os.path.join works whether or not the folder path ends in '/')
    _osr, _levels, _dims = slideseg.openwholeslide(os.path.join(_slide_path, filename))
    _size = (int(_dims[0][0]), int(_dims[0][1]))

    # Annotation mask: the xml file shares the slide's base name.
    # (str.rstrip(".svs") strips characters, not a suffix, so it can truncate names.)
    xml_file = os.path.splitext(filename)[0] + ".xml"
    xml_full_path = os.path.join(_xml_path, xml_file)

    print('loading annotation data from {0}'.format(xml_full_path))
    _mask, _annotations = slideseg.makemask(_key, _size, xml_full_path)

    # Define output directories (folder names as described in section 5)
    output_directory_chip = os.path.join(_output_dir, 'image_chips')
    output_directory_mask = os.path.join(_output_dir, 'image_masks')

    # Output formatting check
    _format, _suffix = slideseg.formatcheck(_format)

    # Find chip data/locations to be saved
    chip_dictionary, image_dict = slideseg.getchips(_levels, _dims, _chip_size, _overlap,
                                                    _mask, _annotations, filename, _suffix,
                                                    _save_all, _save_ratio)

    # Save chips and masks
    print('Saving chips... {0} total chips'.format(len(chip_dictionary)))

    # chip_name is used here so the slide's `filename` is not overwritten
    for chip_name, value in tqdm.tqdm(chip_dictionary.items()):
        keys = value[0]
        level = value[1]
        col = value[2]
        row = value[3]
        scale_factor_width = value[4]
        scale_factor_height = value[5]

        # Load the chip region; read_region takes its location in level-0 coordinates
        img = _osr.read_region((int(col * scale_factor_width), int(row * scale_factor_height)),
                               level, (_chip_size, _chip_size)).convert('RGB')

        # Cut the matching region out of the full-resolution mask, then resize/pad it
        img_mask = _mask[int(row * scale_factor_height):int((row + _chip_size) * scale_factor_height),
                         int(col * scale_factor_width):int((col + _chip_size) * scale_factor_width)]

        img_mask = slideseg.curatemask(img_mask, scale_factor_width, scale_factor_height, _chip_size)

        # Save the image chip and image mask under the same name
        slideseg.savechip(img, os.path.join(output_directory_chip, chip_name), _quality, keys)
        slideseg.savemask(img_mask, os.path.join(output_directory_mask, chip_name), keys)

    # Make text output of Annotation Data
    print('Updating txt file details...')

    slideseg.writekeys(xml_file, _annotations)
    slideseg.writeimagelist(xml_file, image_dict)

    print('txt file details updated')

Now that the necessary modules are imported and run() is defined, execute SlideSeg by running the cell below. It passes Parameters to run() once for each slide in slide_path; if slide_path points to a single slide file instead of a folder, only that slide is processed.


In [ ]:
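print('running __main__ with parameters: {0}'.format(Parameters))

# Slide formats from section 3.1; other files (e.g. .DS_Store) are skipped
SLIDE_EXTENSIONS = ('.svs', '.tif', '.tiff', '.ndpi', '.vms', '.vmu',
                    '.scn', '.mrxs', '.svslide', '.bif')

if not os.path.isdir(Parameters["slide_path"]):
    # slide_path points at a single slide: split it into folder and filename
    path, filename = os.path.split(Parameters["slide_path"])
    Parameters["slide_path"] = path
    # xml_path may likewise point at a single file; keep only its folder
    if not os.path.isdir(Parameters["xml_path"]):
        Parameters["xml_path"] = os.path.dirname(Parameters["xml_path"])

    print('loading {0}'.format(filename))
    run(Parameters, filename)

else:
    for filename in sorted(os.listdir(Parameters["slide_path"])):
        if not filename.lower().endswith(SLIDE_EXTENSIONS):
            continue
        print('loading {0}'.format(filename))
        run(Parameters, filename)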
