This example introduces some of the image processing capabilities available with NumPy and the SciPy
ndimage package. More extensive documentation and tutorials can be found through the SciPy Lectures series.
Here is a list of beautiful star field images taken by the Hubble Space Telescope:
We can use the SciPy
ndimage library to read image data into NumPy arrays. If we want to fetch a file off the web, we also need some help from the
In [ ]:import scipy.ndimage as ndi import requests from StringIO import StringIO
In [ ]:#Pick an image from the list above and fetch it with requests.get #The default picture here is of M45 - the Pleiades Star Cluster. response = requests.get("http://imgsrc.hubblesite.org/hu/db/images/hs-2004-20-a-large_web.jpg") pic = ndi.imread(StringIO(response.content))
We can plot the image using
In [ ]:%pylab inline import matplotlib.pyplot as plt
In [ ]:plt.imshow(pic);
We can examine the image properties:
In [ ]:print pic.shape
Pixel coordinates are (column,row).
Colors are represented by RGB triples. Black is (0,0,0), White is (255, 255, 255) or (0xFF, 0xFF, 0xFF) in hexadecimal. Think of it as a color cube with the three axes representing the different possible colors. The furthest away from the origin (black) is white.
In [ ]:#Color array [R,G,B] of very first pixel print pic[0,0]
We could write code to find the brightest pixel in the image, where "brightest" means highest value of R+G+B. For the 256 color scale, the greatest possible value is 3 * 255, or 765. One way to do this would be to write a set of nested loops over the pixel dimensions, calculating the sum R+G+B for each pixel, but that would be rather tedious and slow.
We could process the information faster if we take advantage of the speedy NumPy slicing, aggregates, and ufuncs. Remember that any time we can eliminate interpreted Python loops we save a lot of processing time.
In [ ]:#find value of max pixel with aggregates print pic.sum(axis=2).max() #numbering from 0, axis 2 is the color depth
Now that we know how to read in the image as a NumPy array, let's count the stars above some threshold brightness. Start by converting the image to B/W, so that which pixels belong to stars and which don't is unambiguous. We'll use black for stars and white for background, since it's easier to see black-on-white than the reverse.
In [ ]:def monochrome(pic_array, threshold): """replace the RGB values in the loaded image with either black or white depending on whether its total luminance is above or below some threshold passed in by the user""" mask = (pic_array.sum(axis=2) >= threshold) #could also be done in one step pic_array[mask] = 0 #BLACK - broadcasting at work here pic_array[~mask] = 255 #WHITE - broadcasting at work here return
In [ ]:#Get another copy to convert to B/W bwpic = ndi.imread(StringIO(response.content)) #This threshold is a scalar, not an RGB triple #We're looking for pixels whose total color value is 600 or greater monochrome(bwpic,200+200+200) plt.imshow(bwpic);
The way to count the features (stars) in the image is to identify "blobs" of connected or adjacent black pixels.
scipy.ndimage.label function will use a structuring element (cross-shaped by default) to search for features. As an example, consider the simple array:
In [ ]:a = np.array([[0,0,1,1,0,0], [0,0,0,1,0,0], [1,1,0,0,1,0], [0,0,0,1,0,0]])
There are four unique features here, if we only count those that have neighbors along a cross-shaped structuring element.
In [ ]:labeled_array, num_features = ndi.label(a)
In [ ]:print(num_features)
In [ ]:print(labeled_array)
If we wish to consider elements connected on the diagonal, as well as the cross structure, we define a new structuring element:
In [ ]:s = [[1,1,1], [1,1,1], [1,1,1]] #Note, that scipy.ndimage.generate_binary_structure(2,2) would also do the same thing. print s
Label the image using the new structuring element:
In [ ]:labeled_array, num_features = ndi.label(a, structure=s)
In [ ]:print(num_features)
Note that features 1, 3, and 4 from above are now considered a single feature
In [ ]:print(labeled_array)
ndi.label to count up the stars in our B/W starfield image.
In [ ]:labeled_array, num_stars = ndi.label(~bwpic) #Count and label the complement
In [ ]:print num_stars
In [ ]:plt.imshow(labeled_array);
Label returns an array the same shape as the input where each "unique feature has a unique value", so if you want the indices of the features you use a list comprehension to extract the exact feature indices. Something like:
label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)]
or use the
ndi.find_objects method to obtain a tuple of feature locations as slices to obtain the general location of the star but not necessarily the correct shape.
In [ ]:locations = ndi.find_objects(labeled_array) print locations
In [ ]:label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)] print label_indices
Let's change the color of the largest star in the field to red. To find the largest star, look at the lengths of the arrays stored in
In [ ]:star_sizes = [(label_indices[i-1]).size for i in xrange(1, num_stars+1)] print len(star_sizes)
In [ ]:biggest_star = np.where(star_sizes == np.max(star_sizes)) print biggest_star
In [ ]:print star_sizes[biggest_star]
In [ ]:bwpic[label_indices[biggest_star],label_indices[biggest_star],:] = (255,0,0) plt.imshow(bwpic);
For fun and practice, you can use these tricks to analyze the whole set of pictures listed at the top of this section. Suggested activities include:
scipy.ndimage.center_of_massfunction to identify the "location" of each star in pixel coordinates.
Use NumPy's I/O functions to write out a text file containing
NumPy offers convenient and fast numerical data manipulation with a large body of support libraries. Use it with large datasets to very quickly slice through the information and extract meaningful results.
The advantage of using NumPy is all about moving loops into compiled code.