An Images
object is a distributed collection of either 2D images or 3D volumes. It supports several simple manipulations of image content, exporting image data, and conversion to other data formats.
In [1]:
%matplotlib inline
In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
sns.set_context('notebook')
from thunder import Colorize
image = Colorize.image
In [3]:
data = tsc.loadExample('fish-images')
We can inspect the object to see basic properties, like the number of records (which here are images):
In [4]:
data
Out[4]:
Note that the dimensions and dtype are missing. That's because we haven't computed them yet.
In [5]:
data.dims.max
Out[5]:
In [6]:
data.dims.min
Out[6]:
In [7]:
data.dims.count
Out[7]:
If you now inpsect the object again, you'll see those fields populated. This is an example of a lazy operation; because computing the dimensions requires loading data, we don't compute it automatically, but once we have computed it, we keep it around in case we need it again.
In [8]:
data
Out[8]:
We can also look at the first record, which is a key-value pair. The first entry, the key, is an integer identifier (this is the first image, so it is 0).
In [9]:
data.keys().first()
Out[9]:
In Images data, a single record usually corresponds to a single input file. The ordering of keys comes from sorting the names of the input files alphanumerically. In a movie, where each image file represents a single time point, ordering file names so that alphabetically later names correspond to later time points (for instance, by appending a postfix '_tp000', '_tp001', etc to a common file name) will cause the resulting Images data to be sorted in temporal order. This will simplify ease conversion into temporally-oriented Series
data and other analyses.
The value of an Images record is the image volume, as an n-dimensional numpy array:
In [10]:
img = data.values().first()
img.shape
Out[10]:
We can look at it as an image. It's three-dimensional, with two planes to look at:
In [11]:
image(img[:,:,0])
Here's the second plane:
In [12]:
image(img[:,:,1])
As an example of 2D data, we can load another one of the examples
In [13]:
data = tsc.loadExample('mouse-images')
data.dims.count
Out[13]:
Look at the first image
In [14]:
img = data.values().first()
image(img)
Bracket notation makes it easy to grab images with particular indexes, by specifying the key in the bracket
In [15]:
image(data[10])
A common manipulation on volumetric data is computing a maximum projection across one dimension. Here, we compute the projections, and then show the first one.
In [16]:
data = tsc.loadExample('fish-images')
projs = data.maxProjection(axis=2)
img = projs.first()[1]
image(img)
We can also subselect a set of planes, specifying the top and bottom of the desired range:
In [17]:
sub = data.planes(0,0).values().first()
image(img)
And we can subsample in space:
In [18]:
img = data.subsample([5,5,1]).values().first()
image(img[:,:,0])
Finally, we can perform operations that aggregate across images. For example, computing the standard deviation:
In [19]:
img = data.stdev()
image(img[:,:,0])
The result of distributed image operations can be saved by exporting each image to a png
file. You can pass the same arguments as to the matplotlib
function imshow
(NOTE: This support is currently limited to saving 2D image data.)
data.maxProjection().saveAsPngs('directory', vmin=0, vmax=200, cmap='gray')
We commonly encounter images or volumes that correspond to different time points, e.g. from a movie. It is useful to convert these data into a Series
object, the object used in Thunder for distributed one-dimensional array data. After this conversion, the Series
will be a distributed collection of records, where each record has a pixel identifier as a key, and a time series as a value.
There are two ways to convert image or volume data into Series
data. The first is to load Images
data and then convert it into Series
data. This can be done on an Images
object using the toSeries()
method, and it can also be done in one step using the ThunderContext
method loadImagesAsSeries
. Here, we load the example Images
data and convert it to a Series
.
In [20]:
data = tsc.loadExample('fish-images')
ts = data.toSeries()
ts.cache();
Let's check some properties of the resulting Series
to make sure the conversion we did makes sense. We have twenty images, so there should be twenty time points, and the dimensions of the keys should match the dimensions of the original images.
In [21]:
ts.index
Out[21]:
In [22]:
ts.dims.max
Out[22]:
In [23]:
ts.dims.min
Out[23]:
In [24]:
ts.dims.count
Out[24]:
We can also quickly look at some example time series:
In [25]:
samples = ts.toTimeSeries().normalize().subset(nsamples=50, thresh=0.05)
plt.plot(samples.T);
For a large data set that will be analyzed repeatedly as a Series
, it will ultimately be faster and more convienient to save Images
data to a collection of flat binary files on a distributed file system, which can in turn be read back in directly as a Series
, rather than repeatedly converting the images to a Series
object. This can be performed directly on an Images
object, as in:
data = tsc.loadImages('my-image-directory', inputFormat='tif', startIdx=0, stopIdx=20)
data.saveAsBinarySeries('my-binary-directory', overwrite=True)
ts = tsc.loadSeries('my-binary-directory')