Here we do some experiments with image resizing.
We will use the Pillow module (a fork of PIL), which is included with Anaconda 2.2.
For more info: http://pillow.readthedocs.org/
Before executing the next cell, start a cluster from the Clusters tab on the homepage, with one engine per CPU on your machine (you might need to restart the kernel).
In [1]:
from IPython.parallel import Client
rc = Client()
print("Using cluster with %d engines." % len(rc.ids))
Let us set up imports and parameters in one place for the cluster and the local engine:
In [2]:
%%px --local
import os
import os.path
from PIL import Image
src_dir = "/kaggle/retina/sample" # source directory of images to resize
trg_dir = "/kaggle/retina/resized" # target directory of the resized images
prefix = "resized_" # string to prepend to the resized file name
hsize = 256 # horizontal size of the resized image
vsize = 256 # vertical size of the resized image
all_files = [f for f in os.listdir(src_dir) if f.endswith(".jpeg")] # a list, so it can be indexed
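The suffix test above skips files named .jpg or .JPEG and does not check that an entry is a regular file. A minimal sketch of a more tolerant listing follows; the helper name list_images and the set of accepted extensions are assumptions made for illustration and are not part of the original notebook.

```python
import os

# Hypothetical set of accepted extensions (an assumption; adjust as needed).
IMAGE_EXTENSIONS = (".jpeg", ".jpg")

def list_images(directory, extensions=IMAGE_EXTENSIONS):
    """Return the sorted names of regular files in `directory` whose
    extension (compared case-insensitively) is in `extensions`."""
    names = []
    for name in os.listdir(directory):
        ext = os.path.splitext(name)[1].lower()
        if ext in extensions and os.path.isfile(os.path.join(directory, name)):
            names.append(name)
    return sorted(names)
```

With this helper, the assignment in the cell above could be written as all_files = list_images(src_dir).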
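Load an image. Note that Image.open is lazy: it reads only the file header and decodes the pixel data on first use, so the timing below does not include decoding: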
In [3]:
filename = all_files[0]
filepath = os.path.join(src_dir, filename)
%timeit -n1 -r1 Image.open(filepath)
im = Image.open(filepath)
Resize the image with the default downsampling filter (Image.NEAREST in this version of Pillow):
In [4]:
%timeit im.resize((hsize, vsize))
resized_im = im.resize((hsize, vsize), Image.NEAREST)
The LANCZOS anti-aliasing filter is recommended for downsampling by the PIL tutorial, but it is much slower:
In [5]:
%timeit im.resize((hsize, vsize), Image.LANCZOS)
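To see why an anti-aliasing filter costs more time, it may help to compare what the two approaches do to a tiny grayscale grid. The sketch below is pure Python and does not reproduce Pillow's implementation: downsample_nearest and downsample_box are hypothetical helpers, and a simple box average stands in for the more elaborate LANCZOS kernel. Nearest-neighbour sampling reads one input pixel per output pixel, while an averaging filter reads every input pixel in the neighbourhood.

```python
def downsample_nearest(pixels, factor):
    """Keep one sample per factor x factor block (no averaging)."""
    return [row[::factor] for row in pixels[::factor]]

def downsample_box(pixels, factor):
    """Average each factor x factor block (a crude anti-aliasing filter)."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for r in range(0, height, factor):
        out_row = []
        for c in range(0, width, factor):
            block = [pixels[i][j]
                     for i in range(r, min(r + factor, height))
                     for j in range(c, min(c + factor, width))]
            out_row.append(sum(block) / float(len(block)))
        out.append(out_row)
    return out

# A 4x4 checkerboard of 0 and 255 (toy data, not taken from the image set).
board = [[255 * ((r + c) % 2) for c in range(4)] for r in range(4)]

print(downsample_nearest(board, 2))  # [[0, 0], [0, 0]]
print(downsample_box(board, 2))      # [[127.5, 127.5], [127.5, 127.5]]
```

The nearest-neighbour result loses the pattern entirely (an aliasing artifact), whereas the averaged result keeps the mean brightness at the price of touching four times as many input pixels.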
Save the resized image.
A quality value above 95 is not recommended because it produces excessively large files with minimal benefit, but we do not care about that here.
More info on file formats can be found here: http://pillow.readthedocs.org/handbook/image-file-formats.html
In [6]:
if not os.path.exists(trg_dir):
    os.makedirs(trg_dir)
resized_filepath = os.path.join(trg_dir, prefix + filename)
%timeit -n1 -r1 resized_im.save(resized_filepath, "JPEG", quality = 100)
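The check-then-create pattern above can fail if another process creates the directory between the two calls, which becomes possible once several engines run the same code. A minimal sketch of a more tolerant variant follows, written to run on both Python 2 and 3; ensure_dir is a hypothetical helper name, not something the notebook defines.

```python
import errno
import os

def ensure_dir(path):
    """Create `path` (and any missing parents); do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Ignore "already exists" only when the existing path is a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Calling ensure_dir(trg_dir) would then replace the two-line check in the cell above.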
Let us define functions that do the above in one go for all files:
In [7]:
%%px --local
def resize_method(filename, method):
    filepath = os.path.join(src_dir, filename)
    im = Image.open(filepath)
    resized_im = im.resize((hsize, vsize), method)
    resized_filepath = os.path.join(trg_dir, prefix + filename)
    resized_im.save(resized_filepath, "JPEG", quality = 100)
def resize_NEAREST(filename):
    resize_method(filename, Image.NEAREST)
def resize_LANCZOS(filename):
    resize_method(filename, Image.LANCZOS)
For quick and dirty experiments we can use the default downsampling. Here we create downsized copies of all files in the sample directory with two downsampling methods:
In [8]:
%timeit -n1 -r1 list(map(resize_NEAREST, all_files))
%timeit -n1 -r1 list(map(resize_LANCZOS, all_files))
Since the processing is dominated by CPU-bound resizing, we can benefit from parallelization:
In [9]:
v = rc[:]
%timeit -n1 -r1 v.map_sync(resize_NEAREST, all_files)
%timeit -n1 -r1 v.map_sync(resize_LANCZOS, all_files)
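IPython.parallel is not the only way to spread this work across cores. As a rough sketch, the standard library's multiprocessing.Pool offers a similar map; this is an alternative technique, not what the notebook uses. Here fake_resize is a hypothetical stand-in for resize_NEAREST so that the example needs no images, and resize_all is likewise an illustrative name.

```python
from multiprocessing import Pool

def fake_resize(filename):
    # Stand-in for resize_NEAREST (an assumption for illustration):
    # it only builds the output name instead of touching any image.
    return "resized_" + filename

def resize_all(filenames, processes=2):
    """Apply fake_resize to every file name using a pool of worker processes."""
    pool = Pool(processes)
    try:
        # Like map_sync, Pool.map blocks until all results are available.
        return pool.map(fake_resize, filenames)
    finally:
        pool.close()
        pool.join()
```

Calling resize_all(all_files) returns the results in input order. In a script, that call should sit under an if __name__ == "__main__": guard, which multiprocessing requires on platforms that spawn rather than fork worker processes.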