In [8]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# import seaborn as sns
# sns.set_style('darkgrid')
# sns.set_context('notebook')

from lightning.image.controller import ImageController
from lightning_flash import image as flash
from test.resources import demo_resources as demo

Let's see the ImageController in action:

Since we don't yet have a database up and running, we pass the ImageController the information it needs manually. Once the database is set up, the Provider will query it and supply each individual ImageController with the information it needs.

Loading images and creating an ImageController


In [7]:
imcontroller = ImageController(demo.image_info)
demo.image_info.items()


Out[7]:
[('IF',
  {'channel_list': ['ER', 'PR', 'HER2'],
   'file_name': {'feature': 'feature_name', 'images': 'image_names'},
   'image_shape': (2048, 2048),
   'path': {'features': ['test/resources/data/013_Quant.csv'],
    'images': ['test/resources/images/ER-allTissue/ER_AFRemoved_013.tif',
     'test/resources/images/PR-allTissue/PR_AFRemoved_013.tif',
     'test/resources/images/HER2-allTissue/Her2_AFRemoved_013.tif']}})]

Normally the image controller would be passed images directly, but for now we have to load them from disk by calling generate_image_obj:


In [4]:
imcontroller.generate_image_obj()


Out[4]:
Image
mode: local
dtype: uint16
shape: (3, 2048, 2048)

From above, it is possible to inspect basic properties such as whether or not the ImageController's image is running in local or Spark mode, its type and shape. (3, 2048, 2048) indicates that our image is 2048 x 2048 with 3 channels. Let's look at some of the other information housed by the ImageController:
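The channel-first (channels, rows, cols) layout can be indexed as in this small sketch (plain NumPy, independent of the ImageController):

```python
import numpy as np

# Channel-first layout: axis 0 indexes channels, axes 1-2 are pixel rows/cols
img = np.zeros((3, 2048, 2048), dtype=np.uint16)

er = img[0]              # first channel as a single 2048 x 2048 plane
print(er.shape)          # (2048, 2048)
print(img[:, 100, 200])  # the 3 channel values at pixel (100, 200)
```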


In [5]:
print('Numerical Labels that index image: ')
print(imcontroller.image_obj.labels)
print('Channels: ')
print(imcontroller.channels)
print('Image types: ' + str(imcontroller.feat_data.keys()))
print('Features: ')
print(imcontroller.feat_data['IF'].keys())
print('ER features: ')
print(imcontroller.feat_data['IF']['intensity']['ER'].keys())

# Boolean mask over the (x, y) cell coordinates
xy = imcontroller.feat_data['IF']['xy']
mask = (xy[:, 0] > 100) & (xy[:, 1] < 400)
print(np.sum(mask))

def generate_patches():
    """Return a dummy (num_patches, 3, 256, 256) patch stack and labels."""
    num_patches = 5
    patches = np.zeros((num_patches, 3, 256, 256))
    for i in range(num_patches):
        patches[i] = np.ones((3, 256, 256))
    return patches, np.ones(num_patches)

patches = []
labels = []
for i in range(5):
    pats, labs = generate_patches()
    patches.append(pats)
    labels.append(labs)

print(np.vstack(patches).shape)
print(np.hstack(labels).shape)



Numerical Labels that index image: 
[0 1 2]
Channels: 
[('IF', 'ER'), ('IF', 'PR'), ('IF', 'HER2')]
Image types: ['IF']
Features: 
['intensity', 'xy']
ER features: 
['Mem', 'Nuc', 'Cyt']
97
(25, 3, 256, 256)
(25,)

Question: I am trying to transition from storing data in nested dictionaries to ndarrays - you will notice that the images have moved to this format but the features have not. Thoughts? The image arrays can be stored in an RDD, but the nested feature dictionaries cannot - I'd like to fix this.
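One possible way to move the features out of nested dictionaries: flatten them into a single (n_cells, n_features) ndarray plus a column index, which would fit in an RDD alongside the images. This is a sketch only - the dict below is made-up dummy data mirroring the shape of feat_data['IF'], not the real features:

```python
import numpy as np

# Hypothetical nested feature dict, mirroring feat_data['IF'] above
rng = np.random.default_rng(0)
feat = {
    'ER': {'Mem': rng.random(100), 'Nuc': rng.random(100)},
    'PR': {'Mem': rng.random(100), 'Nuc': rng.random(100)},
}

# Flatten to one (n_cells, n_features) array plus a column index that
# records which (channel, compartment) each column came from
columns = [(ch, comp) for ch in sorted(feat) for comp in sorted(feat[ch])]
X = np.column_stack([feat[ch][comp] for ch, comp in columns])

print(X.shape)     # (100, 4)
print(columns[0])  # ('ER', 'Mem')
```

The column index keeps the lookup-by-name convenience of the dict while the values live in one contiguous array.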

From above we have one imaging modality, IF, with three subchannels: ER, PR and HER2. Let's look at the ER subchannel:


In [6]:
ER = imcontroller.get_image_channel(('IF','ER'))
flash.show_channel(ER) # Note: this is the unadjusted image
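Since the image above is unadjusted, a raw uint16 channel can look nearly black when displayed. A generic way to handle this (not how flash.show_channel works internally - this is a plain matplotlib sketch) is percentile-based contrast stretching:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_stretched(channel, lo=1, hi=99):
    """Display a uint16 channel, clipping the display range to the
    [lo, hi] intensity percentiles so dim structure is visible."""
    vmin, vmax = np.percentile(channel, [lo, hi])
    plt.imshow(channel, cmap='gray', vmin=vmin, vmax=vmax)
    plt.axis('off')

# e.g. show_stretched(ER)
```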


Showing a pseudo-colored image:


In [7]:
print(imcontroller.image_obj)
flash.show_first_three_channels(imcontroller.image_obj)


Image
mode: local
dtype: uint16
shape: (3, 2048, 2048)

Patches:


In [8]:
patch_shape = (3, 256, 256)
imcontroller.patchify(patch_shape=patch_shape, overlap=64)
print(imcontroller.patches.shape)

# Flatten the 10 x 10 patch grid into a flat stack of patches
patches = imcontroller.patches[0].reshape((-1,) + patch_shape)
print(patches.shape)
print(np.stack(patches, axis=0).shape)

class_num = 2
labels = np.ones(60) * class_num
labels2 = np.ones(60) * class_num
print(np.hstack([labels, labels2]))

plt.imshow(imcontroller.patches[0, 1, 0, 0, :, :])
plt.imshow(patches[8, 0, :, :])

# Sanity check: flattened patch k should sit at grid position
# (k // grid_cols, k % grid_cols)
grid_rows, grid_cols = imcontroller.patches.shape[1:3]
for k in range(64):
    for i in range(8):
        for j in range(8):
            matching_pixels = np.sum(patches[k, 0, :, :] == imcontroller.patches[0, i, j, 0, :, :])
            if matching_pixels > 60000:
                print((i, j, k))
                print('(%d, %d, %d)' % (k // grid_cols, k % grid_cols, k))
                print('found match')
print('done')


(1, 10, 10, 3, 256, 256)
(100, 3, 256, 256)
(100, 3, 256, 256)
[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.
  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]

(0, 0, 0)
(0, 0, 0)
found match
(0, 1, 1)
(0, 1, 1)
found match
(0, 2, 2)
(0, 2, 2)
found match
(0, 3, 3)
(0, 3, 3)
found match
(0, 4, 4)
(0, 4, 4)
found match
(0, 5, 5)
(0, 5, 5)
found match
(0, 6, 6)
(0, 6, 6)
found match
(0, 7, 7)
(0, 7, 7)
found match
(1, 0, 10)
(1, 0, 10)
found match
(1, 1, 11)
(1, 1, 11)
found match
(1, 2, 12)
(1, 2, 12)
found match
(1, 3, 13)
(1, 3, 13)
found match
(1, 4, 14)
(1, 4, 14)
found match
(1, 5, 15)
(1, 5, 15)
found match
(1, 6, 16)
(1, 6, 16)
found match
(1, 7, 17)
(1, 7, 17)
found match
(2, 0, 20)
(2, 0, 20)
found match
(2, 1, 21)
(2, 1, 21)
found match
(2, 2, 22)
(2, 2, 22)
found match
(2, 3, 23)
(2, 3, 23)
found match
(2, 4, 24)
(2, 4, 24)
found match
(2, 5, 25)
(2, 5, 25)
found match
(2, 6, 26)
(2, 6, 26)
found match
(2, 7, 27)
(2, 7, 27)
found match
(3, 0, 30)
(3, 0, 30)
found match
(3, 1, 31)
(3, 1, 31)
found match
(3, 2, 32)
(3, 2, 32)
found match
(3, 3, 33)
(3, 3, 33)
found match
(3, 4, 34)
(3, 4, 34)
found match
(3, 5, 35)
(3, 5, 35)
found match
(3, 6, 36)
(3, 6, 36)
found match
(3, 7, 37)
(3, 7, 37)
found match
(4, 0, 40)
(4, 0, 40)
found match
(4, 1, 41)
(4, 1, 41)
found match
(4, 2, 42)
(4, 2, 42)
found match
(4, 3, 43)
(4, 3, 43)
found match
(4, 4, 44)
(4, 4, 44)
found match
(4, 5, 45)
(4, 5, 45)
found match
(4, 6, 46)
(4, 6, 46)
found match
(4, 7, 47)
(4, 7, 47)
found match
(5, 0, 50)
(5, 0, 50)
found match
(5, 1, 51)
(5, 1, 51)
found match
(5, 2, 52)
(5, 2, 52)
found match
(5, 3, 53)
(5, 3, 53)
found match
(5, 4, 54)
(5, 4, 54)
found match
(5, 5, 55)
(5, 5, 55)
found match
(5, 6, 56)
(5, 6, 56)
found match
(5, 7, 57)
(5, 7, 57)
found match
(6, 0, 60)
(6, 0, 60)
found match
(6, 1, 61)
(6, 1, 61)
found match
(6, 2, 62)
(6, 2, 62)
found match
(6, 3, 63)
(6, 3, 63)
found match
done
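The patch-grid shape above can be reproduced from the patch size and overlap, assuming patchify uses a plain sliding window and drops partial border patches (an assumption, but one that matches the observed (1, 10, 10, 3, 256, 256) shape):

```python
def grid_shape(image_size, patch_size, overlap):
    """Patches per dimension for a sliding window with the given overlap,
    dropping any partial patch at the border (assumed behavior)."""
    stride = patch_size - overlap
    return (image_size - patch_size) // stride + 1

# 256-pixel patches with 64-pixel overlap give a stride of 192, so
# (2048 - 256) // 192 + 1 = 10 patches per dimension
print(grid_shape(2048, 256, 64))  # 10
```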

In [343]:
# flash.show_subset_patches(imcontroller.patches, (8,8), 0)

In [20]:
imcontroller.patchify(patch_shape=(512, 512, 3), overlap=200)
flash.show_subset_patches(imcontroller.patches, (4,4), 0)


ImageController and Spark:

The ImageController will be able to manage images stored in an RDD on Spark:

from pyspark import SparkContext
sc = SparkContext()

And then I will add something similar to:

imcontroller.set_mode('Spark', sc) # change mode and pass SparkContext
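Once Spark mode exists, per-patch work could be expressed as a map over an RDD of patch arrays. This is a hedged sketch, not the planned API: `mean_intensity` is a made-up example function, and the fallback branch just mirrors the computation locally when Spark is unavailable:

```python
import numpy as np

def mean_intensity(patch):
    """Per-patch summary that could run on each Spark worker."""
    return float(patch.mean())

patches = np.ones((10, 3, 256, 256))

try:
    from pyspark import SparkContext
    sc = SparkContext()
    # Distribute the flattened patch stack and map over it
    means = sc.parallelize(list(patches)).map(mean_intensity).collect()
    sc.stop()
except Exception:
    # Local fallback: same computation without Spark
    means = [mean_intensity(p) for p in patches]

print(len(means))  # 10
```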

Questions:

  • Indexing these image and feature arrays?
  • Database schema?
  • Organization: where should imtype-specific methods be housed? For example, we may want something like `patch.is_informative()`, but the implementation of `is_informative` depends on the type of image from which the patch was derived.
  • In terms of distributed computing, PySpark seems like a very good way to go:
      • A lot of computer vision tools are distributed as MATLAB toolboxes. Will these play nicely with the PySpark framework? Will they have to be rewritten? What about the MATLAB Compiler Runtime (MCR)?
      • MATLAB just introduced its own distributed/cloud computing options - how do those compare?
      • What about doing registration on the GPU instead of a cluster?
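On the organization question, one option is to house imtype-specific methods in subclasses and dispatch through a common interface. The class names, the mean-intensity rule, and the threshold below are all hypothetical, just to illustrate the shape of the design:

```python
import numpy as np

class Patch(object):
    """Base patch: holds data, defines the interface."""
    def __init__(self, data):
        self.data = data

    def is_informative(self):
        raise NotImplementedError('imtype-specific subclasses implement this')

class IFPatch(Patch):
    """IF-specific patch: made-up rule calling a patch informative
    when its mean intensity exceeds a threshold."""
    def is_informative(self, threshold=100):
        return bool(self.data.mean() > threshold)

patch = IFPatch(np.full((3, 256, 256), 500, dtype=np.uint16))
print(patch.is_informative())  # True
```

Callers then work with the `Patch` interface and never branch on image type themselves.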

In [ ]: