Data Handling Utilities

tiff file directory to tiff stack conversion

A utility script that can be executed from the command line to convert the .tif files in a directory into a single .tif stack:


In [1]:
%%bash
build_tiff_stack.py --help


usage: build_tiff_stack.py [-h] [--filename FILENAME] [--dir DIR] [--pattern PATTERN] [--verbose] [--nocache]

A script to build a .tiff image stack from .tiff image files in a directory.

optional arguments:
  -h, --help           show this help message and exit
  --filename FILENAME  The filename of the output stack. (default: stack.tif)
  --dir DIR            The path to the directory to load tiff files from. (default: )
  --pattern PATTERN    The regexp pattern for filename selection. (default: \w*.tif$)
  --verbose
  --nocache            Do not generate faster loading cache files. (default: False)

The --pattern argument takes a regular expression; a stack is built only from the files in --dir whose names match it. For example, --pattern ChanA builds a stack from all files that contain ChanA in their name.
To select only files whose name begins with ChanA, write --pattern ^ChanA.
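The filtering the script performs can be illustrated with Python's re module (the filenames below are hypothetical examples following the channel naming scheme used later in this document):

```python
import re

# Hypothetical directory listing
filenames = [
    'ChanA_0001_0001_0001.tif',
    'ChanB_0001_0001_0001.tif',
    'notes.txt',
]

# The default pattern from the script's help output: matches any .tif file
default = [f for f in filenames if re.search(r'\w*.tif$', f)]

# Restrict to one channel, as with --pattern ChanA
chan_a = [f for f in filenames if re.search(r'ChanA', f)]

print(default)  # both .tif files
print(chan_a)   # only the ChanA file
```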

Speed

It took approximately 3 minutes to build a stack from 3000 files.

tiff stack extraction from raw files

A utility script that can be used to extract tiff stacks from raw data files acquired by a ThorLabs microscope.


In [1]:
%%bash
extract_channels_from_raw.py --help


usage: extract_channels_from_raw.py [-h] [--imwidth IMWIDTH] [--imheight IMHEIGHT] [--verbose] [--nocache] rawfile

A script to extract different channels from raw data files and save them as tiff stacks.

positional arguments:
  rawfile              The raw data file from which the channels are to be extracted.

optional arguments:
  -h, --help           show this help message and exit
  --imwidth IMWIDTH    The width of the images in the stack. (default: 512)
  --imheight IMHEIGHT  The height of the images in the stack. (default: 512)
  --verbose
  --nocache            Do not generate faster loading cache files. (default: False)

tiff stacks as numpy arrays

Furthermore, I wrote functions that make it easy to load these tiff stacks into Python (e.g. into an IPython notebook) as numpy arrays.
But because loading these large tiffs takes about as long as building a stack in the first place, I added a caching layer that saves faster-loading HDF5 binaries of the arrays.

This also explains the --nocache option of the build_tiff_stack.py script. By default, the script right away saves a fast-loading cache file, named simply filename.hdf5. But be aware that this doubles the volume of your data.

With this caching in place, tiff stacks now load like a charm:


In [2]:
datafiles = [
    '/home/michael/datac/data1/ChanA_0001_0001_0001.tif',
    '/home/michael/datac/data1/ChanB_0001_0001_0001.tif',
    '/home/michael/datac/data2/ChanA_0001_0001_0001.tif',
    '/home/michael/datac/data2/ChanB_0001_0001_0001.tif',
    '/home/michael/datac/data3/ChanA_0001_0001_0001.tif',
    '/home/michael/datac/data3/ChanB_0001_0001_0001.tif',
    ]

In [4]:
import neuralyzer

In [5]:
%%timeit
stackdata = neuralyzer.get_data(datafiles[1])


[ 2015-04-07 17:10:03 ] [ log ] [ DEBUG ] : stdoutloglevel: DEBUG
[ 2015-04-07 17:10:03 ] [ log ] [ INFO ] : NEURALYZER LOGGER STARTED.
[ 2015-04-07 17:10:03 ] [ data_handler ] [ DEBUG ] : root_path set to /home/michael/lib/neuralyzer/notebooks/doc
[ 2015-04-07 17:10:04 ] [ data_handler ] [ DEBUG ] : loaded data from cache file: /home/michael/datac/data1/ChanB_0001_0001_0001.tif.hdf5
[ 2015-04-07 17:10:04 ] [ data_handler ] [ DEBUG ] : root_path set to /home/michael/lib/neuralyzer/notebooks/doc
[ 2015-04-07 17:10:05 ] [ data_handler ] [ DEBUG ] : loaded data from cache file: /home/michael/datac/data1/ChanB_0001_0001_0001.tif.hdf5
[ 2015-04-07 17:10:05 ] [ data_handler ] [ DEBUG ] : root_path set to /home/michael/lib/neuralyzer/notebooks/doc
[ 2015-04-07 17:10:06 ] [ data_handler ] [ DEBUG ] : loaded data from cache file: /home/michael/datac/data1/ChanB_0001_0001_0001.tif.hdf5
[ 2015-04-07 17:10:06 ] [ data_handler ] [ DEBUG ] : root_path set to /home/michael/lib/neuralyzer/notebooks/doc
[ 2015-04-07 17:10:07 ] [ data_handler ] [ DEBUG ] : loaded data from cache file: /home/michael/datac/data1/ChanB_0001_0001_0001.tif.hdf5
1 loops, best of 3: 819 ms per loop

On kumo it takes on average ~0.8 s to load a 1.5 GB stack, whereas on my computer it now takes on average 2.13 s.

As we just saw, the utilities also come with a logger:


In [5]:
stackdata = neuralyzer.get_data(datafiles[0])


[ 2015-04-07 16:59:47 ] [ data_handler ] [ DEBUG ] : root_path set to /home/michael/lib/neuralyzer/notebooks/doc
[ 2015-04-07 16:59:48 ] [ data_handler ] [ INFO ] : loaded data from cache file: /home/michael/datac/data1/ChanA_0001_0001_0001.tif.hdf5

In [6]:
whos


Variable     Type       Data/Info
---------------------------------
datafiles    list       n=6
neuralyzer   module     <module 'neuralyzer' from<...>neuralyzer/__init__.pyc'>
stackdata    ndarray    3000x512x512: 786432000 elems, type `uint16`, 1572864000 bytes (1500 Mb)