Here we run some experiments with image resizing.
We use the Pillow module (a fork of PIL), which is included with Anaconda 2.2.
For more information, see http://pillow.readthedocs.org/

Before executing the next cell, start a cluster from the Clusters tab on the notebook home page, with an appropriate number of engines (equal to the number of CPUs on your machine). You might need to restart the kernel afterwards.


In [1]:
from IPython.parallel import Client
rc = Client()
print("Using cluster with %d engines." % len(rc.ids))


Using cluster with 2 engines.

Let us set up the imports and parameters in one place, for both the cluster engines and the local kernel:


In [2]:
%%px --local
import os
import os.path
from PIL import Image

src_dir = "/kaggle/retina/sample" # source directory of images to resize 
trg_dir = "/kaggle/retina/resized" # target directory of the resized images 
prefix = "resized_" # string to prepend to the resized file name
hsize = 256 # horizontal size of the resized image
vsize = 256 # vertical size of the resized image  
all_files = [f for f in os.listdir(src_dir) if f.endswith(".jpeg")] # names of all JPEG files in the source directory

Load an image:


In [3]:
filename = all_files[0]
filepath = os.path.join(src_dir, filename)
%timeit -n1 -r1 Image.open(filepath)
im = Image.open(filepath)


1 loops, best of 1: 8.6 ms per loop

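Resize the image with the default downsampling filter (nearest neighbor in the Pillow version used here, which is why the second line can pass Image.NEAREST explicitly):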


In [4]:
%timeit im.resize((hsize, vsize))
resized_im = im.resize((hsize, vsize), Image.NEAREST)


The slowest run took 477.69 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 448 µs per loop
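One plausible reading of the caching warning above is Pillow's lazy loading: Image.open only parses the file header, and the pixel data is decoded on the first operation that needs it (here, the first resize). The sketch below illustrates this behavior under stated assumptions: it runs on Python 3 with a recent Pillow, uses a synthetic noise image instead of a retina photo, and the names `source`, `truncated`, and `lazy_im` are made up for the illustration. It cuts a JPEG in half, so the header survives but the pixel data does not.

```python
import io
import random

from PIL import Image

# Build a small noisy test image in memory (synthetic stand-in for a real photo).
rng = random.Random(0)
width, height = 256, 256
source = Image.new("L", (width, height))
source.putdata([rng.randrange(256) for _ in range(width * height)])

buf = io.BytesIO()
source.save(buf, "JPEG", quality=95)
jpeg_bytes = buf.getvalue()

# Keep only the first half of the file: the header is intact, the pixel data is cut off.
truncated = io.BytesIO(jpeg_bytes[: len(jpeg_bytes) // 2])

# Image.open only reads the header, so it succeeds and already knows the size.
lazy_im = Image.open(truncated)
header_size = lazy_im.size

# Decoding happens in load(), or in the first operation that needs pixels.
try:
    lazy_im.load()
    decode_failed = False
except OSError:
    decode_failed = True

print(header_size, decode_failed)
```

If this reading is right, the 8.6 ms measured for Image.open does not include decoding, and the first timed resize pays that cost once, after which the decoded pixels stay in memory.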

The LANCZOS anti-aliasing filter is recommended for downsampling by the PIL tutorial, but it is much slower:


In [5]:
%timeit im.resize((hsize, vsize), Image.LANCZOS)


1 loops, best of 3: 195 ms per loop
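To make the quality difference between the two filters concrete, here is a minimal sketch on a synthetic image (an assumption; the retina images are not used). It downsamples a checkerboard with 1-pixel squares by a factor of two. The variable names are hypothetical, and the code assumes Python 3 with a Pillow version that provides Image.LANCZOS.

```python
from PIL import Image

# Synthetic 64x64 checkerboard with 1-pixel squares: a worst case for aliasing.
side = 64
board = Image.new("L", (side, side))
board.putdata([255 * ((x + y) % 2) for y in range(side) for x in range(side)])

half = (side // 2, side // 2)
nearest = board.resize(half, Image.NEAREST)
lanczos = board.resize(half, Image.LANCZOS)

# NEAREST copies single source pixels, so only pure black or white can appear.
nearest_range = nearest.getextrema()
# LANCZOS takes a weighted average of neighbors, so black and white blend to gray.
lanczos_range = lanczos.getextrema()
center_value = lanczos.getpixel((half[0] // 2, half[1] // 2))

print(nearest_range, lanczos_range, center_value)
```

NEAREST discards the pixels it does not sample, which is cheap but can misrepresent fine detail. LANCZOS reads a neighborhood of source pixels for every output pixel, which is where the extra time goes.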

Save the resized image.
A quality value above 95 is not recommended, because the file size grows sharply for minimal visual benefit; we use 100 anyway, since file size is not a concern in this experiment.
More information on file formats can be found here: http://pillow.readthedocs.org/handbook/image-file-formats.html


In [6]:
if not os.path.exists(trg_dir):
    os.makedirs(trg_dir)
    
resized_filepath = os.path.join(trg_dir, prefix + filename)
%timeit -n1 -r1 resized_im.save(resized_filepath, "JPEG", quality = 100)


1 loops, best of 1: 11.8 ms per loop
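The effect of the quality parameter on file size can be checked without touching the disk. The following is a rough sketch, assuming Python 3 and a synthetic noise image; `jpeg_size` is a hypothetical helper, not part of Pillow. Noise compresses far worse than a real photograph, so the absolute numbers are only illustrative.

```python
import io
import random

from PIL import Image

# Synthetic noisy RGB image; real photos compress much better than this.
rng = random.Random(0)
width, height = 256, 256
noisy = Image.new("RGB", (width, height))
noisy.putdata([
    (rng.randrange(256), rng.randrange(256), rng.randrange(256))
    for _ in range(width * height)
])


def jpeg_size(image, quality):
    """Return the number of bytes the image occupies as a JPEG at this quality."""
    buf = io.BytesIO()
    image.save(buf, "JPEG", quality=quality)
    return len(buf.getvalue())


sizes = {q: jpeg_size(noisy, q) for q in (75, 95, 100)}
for q in sorted(sizes):
    print("quality=%3d -> %d bytes" % (q, sizes[q]))
```

Running the same helper on one of the resized retina images would show how much space quality = 100 actually costs for this dataset.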

Let us define functions that do the above in one go for all files:


In [7]:
%%px --local
def resize_method(filename, method):
    filepath = os.path.join(src_dir, filename)
    im = Image.open(filepath)
    resized_im = im.resize((hsize, vsize), method)
    resized_filepath = os.path.join(trg_dir, prefix + filename)
    resized_im.save(resized_filepath, "JPEG", quality = 100)

def resize_NEAREST(filename):
    resize_method(filename, Image.NEAREST)

def resize_LANCZOS(filename):
    resize_method(filename, Image.LANCZOS)

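For quick-and-dirty experiments we can use the default downsampling. Here we create downsized copies of all files in the sample directory with each of the two downsampling methods. Note that both methods write to the same target file names, so the second run overwrites the output of the first: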


In [8]:
%timeit -n1 -r1 map(resize_NEAREST, all_files)
%timeit -n1 -r1 map(resize_LANCZOS, all_files)


1 loops, best of 1: 1.64 s per loop
1 loops, best of 1: 3.1 s per loop

Since the processing is dominated by the CPU-bound resizing, we can benefit from parallelization:


In [9]:
v = rc[:]
%timeit -n1 -r1 v.map_sync(resize_NEAREST, all_files)
%timeit -n1 -r1 v.map_sync(resize_LANCZOS, all_files)


1 loops, best of 1: 1.07 s per loop
1 loops, best of 1: 1.77 s per loop
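The timings above can be turned into speedup and parallel efficiency figures. This is a small worked calculation using only the numbers printed in this notebook; the dictionary names are chosen for the illustration, and each timing comes from a single run, so the results are rough.

```python
# Wall-clock timings in seconds, copied from the outputs of cells In [8] and In [9].
serial = {"NEAREST": 1.64, "LANCZOS": 3.1}
parallel = {"NEAREST": 1.07, "LANCZOS": 1.77}
n_engines = 2  # reported by the cluster client in cell In [1]

speedup = {}
efficiency = {}
for method in ("NEAREST", "LANCZOS"):
    # Speedup is serial time over parallel time; efficiency is speedup per engine.
    speedup[method] = serial[method] / parallel[method]
    efficiency[method] = speedup[method] / n_engines
    print("%s: speedup %.2fx, efficiency %.0f%%"
          % (method, speedup[method], 100 * efficiency[method]))
```

With two engines, the speedup is about 1.53x for NEAREST and about 1.75x for LANCZOS. The heavier filter comes closer to the ideal 2x, which is consistent with its larger share of per-file compute relative to fixed overhead.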