This example introduces some of the image processing capabilities available with NumPy and the SciPy ndimage
package. More extensive documentation and tutorials can be found through the SciPy Lectures series.
Here is a list of beautiful star field images taken by the Hubble Space Telescope:
We can use the SciPy ndimage
library to read image data into NumPy arrays. If we want to fetch a file off the web, we also need some help from the requests
and StringIO
libraries:
In [ ]:
import scipy.ndimage as ndi
import requests
from StringIO import StringIO
In [ ]:
#Pick an image from the list above and fetch it with requests.get
#The default picture here is of M45 - the Pleiades Star Cluster.
response = requests.get("http://imgsrc.hubblesite.org/hu/db/images/hs-2004-20-a-large_web.jpg")
pic = ndi.imread(StringIO(response.content))
We can plot the image using matplotlib
:
In [ ]:
%pylab inline
import matplotlib.pyplot as plt
In [ ]:
plt.imshow(pic);
We can examine the image properties:
In [ ]:
print pic.shape
Pixel coordinates are (column,row).
Colors are represented by RGB triples. Black is (0,0,0), White is (255, 255, 255) or (0xFF, 0xFF, 0xFF) in hexadecimal. Think of it as a color cube with the three axes representing the different possible colors. The furthest away from the origin (black) is white.
In [ ]:
#Color array [R,G,B] of very first pixel
print pic[0,0]
We could write code to find the brightest pixel in the image, where "brightest" means highest value of R+G+B. For the 256 color scale, the greatest possible value is 3 * 255, or 765. One way to do this would be to write a set of nested loops over the pixel dimensions, calculating the sum R+G+B for each pixel, but that would be rather tedious and slow.
We could process the information faster if we take advantage of the speedy NumPy slicing, aggregates, and ufuncs. Remember that any time we can eliminate interpreted Python loops we save a lot of processing time.
In [ ]:
#find value of max pixel with aggregates
print pic.sum(axis=2).max() #numbering from 0, axis 2 is the color depth
Now that we know how to read in the image as a NumPy array, let's count the stars above some threshold brightness. Start by converting the image to B/W, so that which pixels belong to stars and which don't is unambiguous. We'll use black for stars and white for background, since it's easier to see black-on-white than the reverse.
In [ ]:
def monochrome(pic_array, threshold):
"""replace the RGB values in the loaded image with either
black or white depending on whether its total
luminance is above or below some threshold
passed in by the user"""
mask = (pic_array.sum(axis=2) >= threshold) #could also be done in one step
pic_array[mask] = 0 #BLACK - broadcasting at work here
pic_array[~mask] = 255 #WHITE - broadcasting at work here
return
In [ ]:
#Get another copy to convert to B/W
bwpic = ndi.imread(StringIO(response.content))
#This threshold is a scalar, not an RGB triple
#We're looking for pixels whose total color value is 600 or greater
monochrome(bwpic,200+200+200)
plt.imshow(bwpic);
The way to count the features (stars) in the image is to identify "blobs" of connected or adjacent black pixels.
A traditional implementation of this algorithm using plain Python loops is presented in the Multimedia Programming lesson from Software Carpentry. This was covered in the notebook Counting Stars.
Let's see how to implement such an algorithm much more efficiently using numpy
and scipy.ndimage
.
The scipy.ndimage.label
function will use a structuring element (cross-shaped by default) to search for features. As an example, consider the simple array:
In [ ]:
a = np.array([[0,0,1,1,0,0],
[0,0,0,1,0,0],
[1,1,0,0,1,0],
[0,0,0,1,0,0]])
There are four unique features here, if we only count those that have neighbors along a cross-shaped structuring element.
In [ ]:
labeled_array, num_features = ndi.label(a)
In [ ]:
print(num_features)
In [ ]:
print(labeled_array)
If we wish to consider elements connected on the diagonal, as well as the cross structure, we define a new structuring element:
In [ ]:
s = [[1,1,1],
[1,1,1],
[1,1,1]]
#Note, that scipy.ndimage.generate_binary_structure(2,2) would also do the same thing.
print s
Label the image using the new structuring element:
In [ ]:
labeled_array, num_features = ndi.label(a, structure=s)
In [ ]:
print(num_features)
Note that features 1, 3, and 4 from above are now considered a single feature
In [ ]:
print(labeled_array)
Let's use ndi.label
to count up the stars in our B/W starfield image.
In [ ]:
labeled_array, num_stars = ndi.label(~bwpic) #Count and label the complement
In [ ]:
print num_stars
In [ ]:
plt.imshow(labeled_array);
Label returns an array the same shape as the input where each "unique feature has a unique value", so if you want the indices of the features you use a list comprehension to extract the exact feature indices. Something like:
label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)]
or use the ndi.find_objects
method to obtain a tuple of feature locations as slices to obtain the general location of the star but not necessarily the correct shape.
In [ ]:
locations = ndi.find_objects(labeled_array)
print locations[9]
In [ ]:
label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)]
print label_indices[9]
Let's change the color of the largest star in the field to red. To find the largest star, look at the lengths of the arrays stored in label_indices
.
In [ ]:
star_sizes = [(label_indices[i-1][0]).size for i in xrange(1, num_stars+1)]
print len(star_sizes)
In [ ]:
biggest_star = np.where(star_sizes == np.max(star_sizes))[0]
print biggest_star
In [ ]:
print star_sizes[biggest_star]
In [ ]:
bwpic[label_indices[biggest_star][0],label_indices[biggest_star][1],:] = (255,0,0)
plt.imshow(bwpic);
For fun and practice, you can use these tricks to analyze the whole set of pictures listed at the top of this section. Suggested activities include:
scipy.ndimage.label
functionscipy.ndimage.center_of_mass
function to identify the "location" of each star in pixel coordinates.Use NumPy's I/O functions to write out a text file containing
NumPy offers convenient and fast numerical data manipulation with a large body of support libraries. Use it with large datasets to very quickly slice through the information and extract meaningful results.
The advantage of using NumPy is all about moving loops into compiled code.
For more details and documentation, consult the SciPy lecture series or the NumPy/SciPy documentation.