This notebook demonstrates how to use large-image to extract pixel data and image metadata from multiresolution whole-slide image files. It is designed to work with a variety of tile sources, including OpenSlide, and provides functions for extracting regions, for translating regions across magnifications, and for iterating through a tiled representation of a whole-slide image.
In [1]:
import large_image
In [2]:
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
#Some nice default configuration for plots
plt.rcParams['figure.figsize'] = 10, 10
plt.rcParams['image.cmap'] = 'gray'
The following code downloads a sample slide from data.kitware.com if it is not already present in the working directory:
In [3]:
wsi_url = 'https://data.kitware.com/api/v1/file/5899dd6d8d777f07219fcb23/download'
wsi_path = 'TCGA-02-0010-01Z-00-DX4.07de2e55-a8fe-40ee-9e98-bcb78050b9f7.svs'
if not os.path.isfile(wsi_path):
    !curl -OJ "$wsi_url"
In [4]:
ts = large_image.getTileSource(wsi_path)
The getTileSource() function can read a variety of image file formats. It detects the format automatically and abstracts the differences between them. Some formats require installing optional dependencies. For instance, large_image needs to have been installed with extras, such as pip install .[openslide], and your system needs to have the necessary core packages installed. The available sources are:
Other tile sources can be added by extending the FileTileSource base class, provided that they implement the getTile method.
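The getTileSource() function returns an object of the TileSource class that contains the following utility functions for reading different kinds of information from whole-slide images:
- getMetadata()
- getNativeMagnification()
- getMagnificationForLevel()
- getLevelForMagnification()
- tileIterator()
- getSingleTile()
- getRegion()
- convertRegionScale()
- getRegionAtAnotherScale()
The purpose and typical usage of each of these utility functions are presented in detail below.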
The getMetadata() function of the TileSource class returns a Python dict containing basic metadata of a slide:
In [5]:
ts.getMetadata()
Out[5]:
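As a rough illustration of how this metadata can be used, the sketch below estimates the physical extent of a slide from its pixel dimensions and pixel size. It assumes the dict carries sizeX, sizeY, mm_x, and mm_y keys; the sample values are made up for illustration, and slide_extent_mm is a hypothetical helper, not part of large_image.

```python
def slide_extent_mm(metadata):
    """Return (width_mm, height_mm), or None if the pixel size is unknown."""
    mm_x = metadata.get('mm_x')
    mm_y = metadata.get('mm_y')
    # Some files carry no resolution information, so guard against missing values
    if not mm_x or not mm_y:
        return None
    return metadata['sizeX'] * mm_x, metadata['sizeY'] * mm_y


# Made-up metadata values, standing in for the output of ts.getMetadata()
sample_metadata = {'sizeX': 40000, 'sizeY': 20000, 'mm_x': 0.00025, 'mm_y': 0.00025}
extent = slide_extent_mm(sample_metadata)
print('Slide extent = {:.1f} mm x {:.1f} mm'.format(*extent))
```

With a real slide, passing ts.getMetadata() in place of sample_metadata should work as long as those keys are present.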
The getNativeMagnification() function of the TileSource class returns a Python dict containing the magnification and physical size of a pixel in millimeters at the base or highest resolution level at which the slide was scanned.
In [6]:
ts.getNativeMagnification()
Out[6]:
The getMagnificationForLevel() function of the TileSource class returns a Python dict containing the magnification and physical size of a pixel for a specified level in the image pyramid.
In [7]:
# Get the magnification associated with Level 0
ts.getMagnificationForLevel(level=0)
Out[7]:
In [8]:
# Get the magnification associated with all levels of the image pyramid
for i in range(ts.levels):
    print('Level-{} : {}'.format(
        i, ts.getMagnificationForLevel(level=i)))
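The relationship between pyramid levels can be sketched with a little arithmetic. The example below assumes that the highest-numbered level is the native resolution and that each lower level halves the resolution; both are assumptions about the pyramid layout, so compare against the loop output above for your slide. The function magnification_for_level and the sample numbers are hypothetical.

```python
def magnification_for_level(native_magnification, native_mm, levels, level):
    """Estimate magnification and pixel size at a pyramid level.

    Assumes level (levels - 1) is native resolution and each step down
    halves the resolution (doubling the physical size of a pixel).
    """
    scale = 2 ** (levels - 1 - level)
    return {
        'magnification': native_magnification / scale,
        'mm_x': native_mm * scale,
        'mm_y': native_mm * scale,
    }


# Hypothetical slide: scanned at 40x with 0.00025 mm pixels and 9 levels
for level in range(9):
    print('Level-{} : {}'.format(
        level, magnification_for_level(40, 0.00025, 9, level)))
```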
The getLevelForMagnification() function of the TileSource class returns the level of the image pyramid associated with a specific magnification or pixel size in millimeters.
In [9]:
# get level whose magnification is closest to 10x
print('Level with magnification closest to 10x = {}'.format(
    ts.getLevelForMagnification(10)))
In [10]:
# get level whose pixel width is closest to 0.0005 mm
print('Level with pixel width closest to 0.0005mm = {}'.format(
    ts.getLevelForMagnification(mm_x=0.0005)))
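One way to think about this lookup is as a nearest-level search in powers of two. The sketch below is not the library's implementation; it assumes a pyramid in which the top level is native resolution and each level below halves it, and the library may round or break ties differently. The name closest_level and the sample values are hypothetical.

```python
import math


def closest_level(native_magnification, levels, target_magnification):
    """Pick the pyramid level whose magnification is nearest the target.

    Nearness is measured in powers of two, and the result is clamped to
    the range of levels that actually exist.
    """
    steps_down = math.log2(native_magnification / target_magnification)
    level = (levels - 1) - round(steps_down)
    return max(0, min(levels - 1, level))


# Hypothetical slide scanned at 40x with 9 pyramid levels
level_for_10x = closest_level(40, 9, 10)
print('Level with magnification closest to 10x = {}'.format(level_for_10x))
```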
The tileIterator() function provides an iterator for sequentially iterating through the entire slide, or a region of interest (ROI) within the slide, at any desired resolution in a tile-wise fashion.
Among others, below are the main optional parameters of tileIterator that cover most of the tile-wise iteration use-cases for image analysis:
- region - allows you to specify an ROI within the slide.
- scale - allows you to specify the desired magnification/resolution.
- tile_size - allows you to specify the size of the tile.
- tile_overlap - allows you to specify the amount of overlap between adjacent tiles.
- format - allows you to specify the format of the tile image (numpy array or PIL image).
At each iteration the tileIterator outputs a dictionary that includes:
- tile - cropped tile image that is lazy loaded or computed only when this element of the dictionary is explicitly accessed.
- format - format of the tile.
- x, y - (left, top) coordinates in current magnification pixels.
- width, height - size of current tile in current magnification pixels.
- level - level of the current tile.
- magnification - magnification of the current tile.
- mm_x, mm_y - size of the current tile pixel in millimeters.
- gx, gy - (left, top) coordinate in base/maximum resolution pixels.
- gwidth, gheight - size of the current tile in base/maximum resolution pixels.
The lazy loading of the tile image allows us to quickly iterate through the tiles and selectively process tiles of interest based on the tile metadata.
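To make the tile_size and tile_overlap parameters more concrete, the sketch below lays out overlapping tiles along one axis. It assumes that adjacent tiles start (tile_size - overlap) pixels apart and that the last tile is clipped at the region boundary; the library's handling of edge tiles may differ, and tile_spans is a hypothetical helper rather than a large_image function.

```python
def tile_spans(extent, tile_size, overlap):
    """Return (start, length) pairs covering [0, extent) along one axis."""
    if overlap >= tile_size:
        raise ValueError('overlap must be smaller than tile_size')
    if extent <= 0:
        return []
    stride = tile_size - overlap
    spans = []
    start = 0
    while True:
        # Clip the final tile so it does not run past the region
        length = min(tile_size, extent - start)
        spans.append((start, length))
        if start + tile_size >= extent:
            break
        start += stride
    return spans


spans = tile_spans(extent=2500, tile_size=1000, overlap=50)
print(spans)
```

Applying the same function to the horizontal and vertical extents of a region gives the full grid; the clipped tiles at the right and bottom edges are the reason tile width and height are reported per tile.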
The code below shows how to iterate through an ROI within a slide with a specific tile size and at a specific resolution, computing the area-weighted mean color of the ROI along the way:
In [11]:
num_tiles = 0
tile_means = []
tile_areas = []
for tile_info in ts.tileIterator(
    region=dict(left=5000, top=5000, width=20000, height=20000, units='base_pixels'),
    scale=dict(magnification=20),
    tile_size=dict(width=1000, height=1000),
    tile_overlap=dict(x=50, y=50),
    format=large_image.tilesource.TILE_FORMAT_PIL
):
    # Show the full metadata dictionary for one example tile
    if num_tiles == 100:
        print('Tile-{} = '.format(num_tiles))
        display(tile_info)
    # Accessing 'tile' triggers the lazy load of the pixel data
    im_tile = np.array(tile_info['tile'])
    # Mean over rows and columns of the first three (RGB) channels
    tile_mean_rgb = np.mean(im_tile[:, :, :3], axis=(0, 1))
    tile_means.append(tile_mean_rgb)
    tile_areas.append(tile_info['width'] * tile_info['height'])
    num_tiles += 1
slide_mean_rgb = np.average(tile_means, axis=0, weights=tile_areas)
print('Number of tiles = {}'.format(num_tiles))
print('Slide mean color = {}'.format(slide_mean_rgb))
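The per-tile means are weighted by tile area because tiles at the edge of the region can be smaller than the requested tile size, so a plain average would give their pixels too much influence. The dependency-free sketch below shows the same weighted average that np.average computes when given weights; the function name and the sample numbers are illustrative only.

```python
def area_weighted_mean(tile_means, tile_areas):
    """Combine per-tile channel means, weighting each tile by its pixel area."""
    total_area = sum(tile_areas)
    channels = len(tile_means[0])
    return [
        sum(mean[c] * area for mean, area in zip(tile_means, tile_areas)) / total_area
        for c in range(channels)
    ]


# Two made-up tiles: a large one and a clipped edge tile a third of its size
example_means = [[10, 20, 30], [20, 40, 60]]
example_areas = [3, 1]
weighted = area_weighted_mean(example_means, example_areas)
print(weighted)
```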
The getSingleTile() function can be used to directly get the tile at a specific position of the tile iterator.
In addition to the aforementioned parameters of the tileIterator, it takes a tile_position parameter that can be used to specify the linear position of the tile of interest.
In [12]:
pos = 1000
tile_info = ts.getSingleTile(
    tile_size=dict(width=1000, height=1000),
    scale=dict(magnification=20),
    tile_position=pos
)
plt.imshow(tile_info['tile'])
Out[12]:
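A linear tile position can be related to a grid location with integer division. The sketch below assumes tiles are numbered left to right and then top to bottom, with no overlap between tiles; this ordering is an assumption, so check it against the x and y values in the returned dictionary. The image width used here is made up, and position_to_grid is a hypothetical helper.

```python
def position_to_grid(tile_position, image_width, tile_width):
    """Map a linear tile position to (row, column), assuming row-major order."""
    # Ceiling division: a partial tile at the right edge still counts
    tiles_across = -(-image_width // tile_width)
    return divmod(tile_position, tiles_across)


# Hypothetical image that is 48000 pixels wide at the requested magnification
row, column = position_to_grid(1000, image_width=48000, tile_width=1000)
print('Tile 1000 is in row {}, column {}'.format(row, column))
```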
The getRegion() function can be used to get a rectangular region of interest (ROI) within the slide at any scale/magnification via the following two parameters:
- region - a dictionary containing the (left, top, width, height, units) of the ROI
- scale - a dictionary containing the magnification or the physical size of a pixel (mm_x, mm_y)
The following code shows how to read an ROI:
In [13]:
im_roi, _ = ts.getRegion(
    region=dict(left=10000, top=10000, width=1000, height=1000, units='base_pixels'),
    format=large_image.tilesource.TILE_FORMAT_NUMPY
)
plt.imshow(im_roi)
Out[13]:
The following code reads the entire slide at a low magnification:
In [14]:
im_low_res, _ = ts.getRegion(
    scale=dict(magnification=1.25),
    format=large_image.tilesource.TILE_FORMAT_NUMPY
)
plt.imshow(im_low_res)
Out[14]:
The convertRegionScale() function can be used to convert a region from one scale/magnification to another, as illustrated in the following example:
In [15]:
tr = ts.convertRegionScale(
    sourceRegion=dict(left=5000, top=5000, width=1000, height=1000,
                      units='mag_pixels'),
    sourceScale=dict(magnification=20),
    targetScale=dict(magnification=10),
    targetUnits='mag_pixels'
)
display(tr)
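The arithmetic behind this kind of conversion can be sketched as a single multiplication by the ratio of the two magnifications. The helper below is a simplified stand-in written for illustration, not the library's implementation; the dictionary returned by convertRegionScale may contain additional keys or apply rounding.

```python
def convert_region(region, source_magnification, target_magnification):
    """Rescale a region's coordinates from one magnification to another."""
    factor = target_magnification / source_magnification
    return {key: region[key] * factor
            for key in ('left', 'top', 'width', 'height')}


converted = convert_region(
    dict(left=5000, top=5000, width=1000, height=1000),
    source_magnification=20,
    target_magnification=10,
)
print(converted)
```

Halving the magnification halves every coordinate, so a 1000 x 1000 pixel region at 20x covers 500 x 500 pixels at 10x.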
The getRegionAtAnotherScale() function can be used to get an image of a region at another scale.
In [16]:
# get a large region defined at base resolution at a much lower scale
im_roi, _ = ts.getRegionAtAnotherScale(
    sourceRegion=dict(left=5000, top=5000, width=10000, height=10000,
                      units='base_pixels'),
    targetScale=dict(magnification=1.25),
    format=large_image.tilesource.TILE_FORMAT_NUMPY)
print(im_roi.shape)