Welcome to Lightning

Lightning is intended to be an open-source library for integrating and analyzing spatial data obtained from a variety of imaging modalities, and is designed to handle small, medium, and very large data sets. Like its sibling, Thunder, it exposes array operations through either local or distributed implementations with a common interface, making it easy to switch between them. Its distributed implementation currently targets Spark, a powerful cluster computing framework.

To see the internals of Lightning, visit the project page.

Tutorials

These notebooks are interactive tutorials that show how Lightning works:

Basic usage


Introduction

The overarching theme behind Lightning is 'fusion' at multiple scales: fusion of imaging data, fusion of the latest algorithms and tools, and fusion of applications. The original motivation behind Lightning was to develop and test spatial intratumor heterogeneity metrics that correlate with particular cancer subtypes, using deep neural networks. Our particular goal was to establish an image analysis pipeline that would provide end-to-end, automated classification of multiplexed immunofluorescence images obtained from tissue microarray sections. However, since we are interested in more than tissue microarray data and classification tasks, the project has quickly expanded into the development of a pipeline for integrating and analyzing data along the imaging continuum. We hope to include:

  • Light microscopy
  • Fluorescent and immunofluorescent (IF) imaging
  • Immunohistochemistry (IHC)
  • Histopathology (H&E)
  • Mass spectrometry imaging (IMS)
  • In-situ hybridization imaging (ISH)

To facilitate the formation and analysis of these multimodal images, we will use a ‘fusion’ of the latest, state-of-the-art algorithms and tools. These include:

  • EM patch-based CNNs (summer 2016)
  • Spatial statistics (summer, fall 2016)
  • Nonlinear image registration - a prerequisite for image fusion (fall 2016, winter 2017)
  • Image Fusion using multivariate regression
  • Interactive visualization - for clinical use

Lightning is not meant to be used strictly by researchers studying the structure and progression of diseased tissue. The hope is for Lightning to also serve as a powerful visualization tool and diagnostic advisor for the clinician.

Rethinking the Image in Lightning

When we think about the structure of a “traditional” grayscale image, we typically imagine a 2D array representing the spatial dimensions height and width, with each element representing the intensity at that spatial location. For color images, we add a third dimension to accommodate the three color channels: red, green, and blue.

In contrast, most imaging modalities natively deliver their measurements as an $n$th order tensor (multidimensional array of values that can be accessed via $n$ indices). More specifically, an imaging modality that records along $n_s$ spatial dimensions and acquires an $n_m$ dimensional array at each measurement location delivers an $n = n_s + n_m$ order tensor. To ground this discussion, typical examples include:

  • $n_s = 0$: a single measurement
  • $n_s = 1$: a list of measurements
  • $n_s = 2$: 2D imaging (e.g. light microscopy)
  • $n_s = 3$: 3D imaging (e.g. MRI)

Similarly, the number of measurement modes can also vary:

  • $n_m = 0$: a scalar value is measured (e.g. a grayscale image)
  • $n_m = 1$: a vector of values is measured (e.g. an RGB image, where len(vector) = 3)
  • $n_m = 2$: an array of values is measured at each location
  • and so on.

In most cases, we will be dealing with 3rd-order tensors (i.e. two spatial dimensions and a one-dimensional measurement vector).
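As a quick concreteness check, here is a minimal numpy sketch of the $n = n_s + n_m$ bookkeeping (numpy is used purely for illustration; this says nothing about Lightning's internals):

import numpy as np

# An RGB image: n_s = 2 spatial dimensions (height, width) plus an
# n_m = 1 dimensional measurement (3 color channels), so the data
# arrives as an n = n_s + n_m = 3rd-order tensor.
rgb = np.zeros((512, 512, 3), dtype=np.uint8)
print(rgb.ndim)  # 3

# A 3D MRI volume with a scalar measurement at each voxel:
# n_s = 3, n_m = 0, so n = 3 as well.
mri = np.zeros((128, 128, 64), dtype=np.float32)
print(mri.ndim)  # 3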

Internal representation of an image (Lightning)

In order to accommodate data acquired from a variety of imaging modalities, we will apply a “flattening” procedure to convert data into a tabular format (a 2D array). Each row in this array will represent a particular spatial location, and each column will represent a particular feature or measurement made across all spatial locations. (I believe this will also lend itself nicely to Spark computations.)

Figure 1. Example of the flattening procedure. An RGB image with two spatial dimensions and one measurement dimension with three channels (red, green, and blue) is converted to a 2D array. The rows represent individual pixels and the columns correspond to the red, green, and blue measurements at that pixel.
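A minimal numpy sketch of this flattening for the RGB case in Figure 1 (array names are illustrative):

import numpy as np

# A small RGB image: 2 spatial dimensions plus a 3-channel measurement.
rgb = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)

# Flatten to tabular form: one row per pixel (spatial location),
# one column per measurement channel (red, green, blue).
table = rgb.reshape(-1, rgb.shape[-1])    # shape (512 * 512, 3)

# The original image is recovered by reshaping back.
restored = table.reshape(rgb.shape)
assert (restored == rgb).all()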

Advantage: in this way, an image can be viewed more as a detailed description of the “state” of a tissue, in a way that plays to the strengths of each imaging modality. For example, we can enhance the spatial specificity of mass spectrometry imaging, which inherently has high chemical specificity, by fusing it with H&E images. We can also start to incorporate genetic information, such as maps of gene expression, into our description of the state of the tissue.

Handling the Image Registration Problem (Lightning-Register)

We are interested in "image fusion" approaches to combining data across modalities, including integration with genomics data in the long run. However, image registration is a prerequisite for any modern approach to image fusion [1]. Thus, we will eventually need a solution to the image registration problem. To the best of my knowledge, the only cases where you could fuse without registering are instances where the same section of tissue has been imaged multiple times for different features (these are aligned by definition). If the features of interest require a different labeling/staining technique, it would be possible to differentially label neighboring sections - but then registration is needed again. Even this approach seems inefficient and complicated to do at a large scale. Image registration would also allow for inter-subject comparisons.
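As a placeholder for the eventual registration module, here is a minimal sketch of multimodal rigid registration using SimpleITK's mutual-information metric. SimpleITK, the file names, and the rigid (Euler) transform are all assumptions for illustration; the nonlinear case mentioned above would swap in a deformable transform.

import SimpleITK as sitk

# Read the two modalities as float images (file names are hypothetical).
fixed = sitk.ReadImage("he_section.png", sitk.sitkFloat32)
moving = sitk.ReadImage("if_section.png", sitk.sitkFloat32)

# Mutual information handles intensity relationships across modalities
# better than a simple squared-difference metric.
reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
reg.SetInitialTransform(
    sitk.CenteredTransformInitializer(fixed, moving, sitk.Euler2DTransform()))
reg.SetInterpolator(sitk.sitkLinear)

transform = reg.Execute(fixed, moving)

# Resample the moving image into the fixed image's coordinate frame.
aligned = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)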

Handling the Multi-resolution Problem (Lightning-Fuse)

Now that the images are aligned, the spatial domain is common to all data sources. Even so, it may be difficult to establish a one-to-one mapping between measurements of different technical origin due to differing spatial resolution scales. The naive approach would be to downsample images to match the modality with the lowest spatial resolution. A better approach would be to apply a one-to-many mapping, whereby many observations in a higher-resolution modality are mapped to a single observation in a lower-resolution modality (for more information, see Van de Plas et al., 2015 [1]).
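A minimal numpy sketch of this one-to-many mapping, assuming the resolutions differ by an integer factor (the function name and the blockwise-mean summary are illustrative choices, not Lightning's API):

import numpy as np

def group_high_to_low(high_res, factor):
    # Map each factor x factor block of high-resolution observations to
    # the single low-resolution observation it overlaps. Averaging is one
    # simple way to summarize each group; Van de Plas et al. instead link
    # the groups through regression.
    h, w = high_res.shape
    assert h % factor == 0 and w % factor == 0
    blocks = high_res.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Example: a 1024x1024 microscopy channel mapped onto a 128x128
# mass spectrometry imaging grid (an 8x resolution gap).
microscopy = np.random.rand(1024, 1024)
per_ims_pixel = group_high_to_low(microscopy, 8)
print(per_ims_pixel.shape)  # (128, 128)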

Image Analysis

The intent is to include the following:

Spatial Statistics as Description of ITH

(Summer 2016)

Automated, end-to-end Classification of Cancer Subtypes

(Summer 2016)

Computational Anatomy

(Fall 2016, Winter 2017)

Proposed Pipeline

Provider

The Provider is meant to bridge the gap between the db and individual image controllers. With the help of the dbHelper and knowledge of the database schema, the provider will make queries to the db and pass all the relevant data to an image controller. Users/frontend will interact mostly with the provider to load datasets.
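A minimal sketch of the Provider's role, assuming a db helper exposing an execute(sql, params) method and a factory that builds an ImageController from a row; the names, method signatures, and table schema here are illustrative, not Lightning's actual API:

class Provider:
    """Bridges the db and individual image controllers."""

    def __init__(self, db_helper, controller_factory):
        self.db = db_helper                      # wraps schema knowledge
        self.make_controller = controller_factory

    def load_dataset(self, dataset_id):
        # Query the db for every image in the dataset and hand the
        # relevant rows to image controllers.
        rows = self.db.execute(
            "SELECT * FROM images WHERE dataset_id = ?", (dataset_id,))
        return [self.make_controller(row) for row in rows]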

ImageController

The ImageController is meant to handle everything related to an individual image, from calling the alignment and fusing modules to orchestrating the patching process.

The Controller will maintain references to:
  • the image array
  • the patches array
  • the features array
  • indexing labels about images, patches and features (e.g. channel/subchannel lists, features of interest etc).
  • class information (e.g. does it belong to cohort 1?)
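A sketch of the state the ImageController would maintain, following the list above (the attribute names and the record dict are assumptions for illustration):

class ImageController:
    """Handles everything related to an individual image."""

    def __init__(self, record):
        self.image = record.get("image")      # the image array
        self.patches = None                   # the patches array
        self.features = None                  # the features array
        self.labels = {                       # indexing labels
            "channels": record.get("channels", []),
            "features_of_interest": [],
        }
        self.cohort = record.get("cohort")    # class info (e.g. cohort 1)

    def align(self, reference):
        # Would call the registration module (see Lightning-Register).
        raise NotImplementedError

    def fuse(self, other):
        # Would call the fusion module (see Lightning-Fuse).
        raise NotImplementedError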

References

  1. Van de Plas, Raf, et al. "Image fusion of mass spectrometry and microscopy: a multimodality paradigm for molecular tissue mapping." Nature Methods 12.4 (2015): 366-372.