Counting Stars with NumPy

This example introduces some of the image processing capabilities available with NumPy and the SciPy ndimage package. More extensive documentation and tutorials can be found through the SciPy Lectures series.

Image I/O

We can use the SciPy ndimage library to read image data into NumPy arrays. If we want to fetch a file off the web, we also need some help from the requests and StringIO libraries:


In [ ]:
import scipy.ndimage as ndi
import requests
from StringIO import StringIO

In [ ]:
#Pick an image from the list above and fetch it with requests.get 
#The default picture here is of M45 - the Pleiades Star Cluster.
response = requests.get("http://imgsrc.hubblesite.org/hu/db/images/hs-2004-20-a-large_web.jpg")
pic = ndi.imread(StringIO(response.content))

Image Visualization

We can plot the image using matplotlib:


In [ ]:
%pylab inline
import matplotlib.pyplot as plt

In [ ]:
plt.imshow(pic);

Image Inspection

We can examine the image properties:


In [ ]:
print pic.shape

Pixel coordinates are (column,row).

Colors are represented by RGB triples. Black is (0,0,0), White is (255, 255, 255) or (0xFF, 0xFF, 0xFF) in hexadecimal. Think of it as a color cube with the three axes representing the different possible colors. The furthest away from the origin (black) is white.


In [ ]:
#Color array [R,G,B] of very first pixel
print pic[0,0]

We could write code to find the brightest pixel in the image, where "brightest" means highest value of R+G+B. For the 256 color scale, the greatest possible value is 3 * 255, or 765. One way to do this would be to write a set of nested loops over the pixel dimensions, calculating the sum R+G+B for each pixel, but that would be rather tedious and slow.

We could process the information faster if we take advantage of the speedy NumPy slicing, aggregates, and ufuncs. Remember that any time we can eliminate interpreted Python loops we save a lot of processing time.


In [ ]:
#find value of max pixel with aggregates
print pic.sum(axis=2).max() #numbering from 0, axis 2 is the color depth

Image Feature Extraction

Now that we know how to read in the image as a NumPy array, let's count the stars above some threshold brightness. Start by converting the image to B/W, so that which pixels belong to stars and which don't is unambiguous. We'll use black for stars and white for background, since it's easier to see black-on-white than the reverse.


In [ ]:
def monochrome(pic_array, threshold):
    """replace the RGB values in the loaded image with either 
    black or white depending on whether its total 
    luminance is above or below some threshold 
    passed in by the user"""
    mask = (pic_array.sum(axis=2) >= threshold) #could also be done in one step
    pic_array[mask] = 0 #BLACK - broadcasting at work here
    pic_array[~mask] = 255 #WHITE - broadcasting at work here
    return

In [ ]:
#Get another copy to convert to B/W
bwpic = ndi.imread(StringIO(response.content))

#This threshold is a scalar, not an RGB triple
#We're looking for pixels whose total color value is 600 or greater
monochrome(bwpic,200+200+200)
plt.imshow(bwpic);

The way to count the features (stars) in the image is to identify "blobs" of connected or adjacent black pixels.

A traditional implementation of this algorithm using plain Python loops is presented in the Multimedia Programming lesson from Software Carpentry. This was covered in the notebook Counting Stars.

Let's see how to implement such an algorithm much more efficiently using numpy and scipy.ndimage.

The scipy.ndimage.label function will use a structuring element (cross-shaped by default) to search for features. As an example, consider the simple array:


In [ ]:
a = np.array([[0,0,1,1,0,0],
              [0,0,0,1,0,0],
              [1,1,0,0,1,0],
              [0,0,0,1,0,0]])

There are four unique features here, if we only count those that have neighbors along a cross-shaped structuring element.


In [ ]:
labeled_array, num_features = ndi.label(a)

In [ ]:
print(num_features)

In [ ]:
print(labeled_array)

If we wish to consider elements connected on the diagonal, as well as the cross structure, we define a new structuring element:


In [ ]:
s = [[1,1,1],
     [1,1,1],
     [1,1,1]]

#Note, that scipy.ndimage.generate_binary_structure(2,2) would also do the same thing.
print s

Label the image using the new structuring element:


In [ ]:
labeled_array, num_features = ndi.label(a, structure=s)

In [ ]:
print(num_features)

Note that features 1, 3, and 4 from above are now considered a single feature


In [ ]:
print(labeled_array)

Let's use ndi.label to count up the stars in our B/W starfield image.


In [ ]:
labeled_array, num_stars = ndi.label(~bwpic) #Count and label the complement

In [ ]:
print num_stars

In [ ]:
plt.imshow(labeled_array);

Label returns an array the same shape as the input where each "unique feature has a unique value", so if you want the indices of the features you use a list comprehension to extract the exact feature indices. Something like:

label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)]

or use the ndi.find_objects method to obtain a tuple of feature locations as slices to obtain the general location of the star but not necessarily the correct shape.


In [ ]:
locations = ndi.find_objects(labeled_array)
print locations[9]

In [ ]:
label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)]
print label_indices[9]

Let's change the color of the largest star in the field to red. To find the largest star, look at the lengths of the arrays stored in label_indices.


In [ ]:
star_sizes = [(label_indices[i-1][0]).size for i in xrange(1, num_stars+1)]
print len(star_sizes)

In [ ]:
biggest_star = np.where(star_sizes == np.max(star_sizes))[0]
print biggest_star

In [ ]:
print star_sizes[biggest_star]

In [ ]:
bwpic[label_indices[biggest_star][0],label_indices[biggest_star][1],:] = (255,0,0)
plt.imshow(bwpic);

Exercises

For fun and practice, you can use these tricks to analyze the whole set of pictures listed at the top of this section. Suggested activities include:

  • Find the brightest pixel within each star identified using the scipy.ndimage.label function
  • Draw the image with different colors for identified stars of different brightnesses
  • Use the scipy.ndimage.center_of_mass function to identify the "location" of each star in pixel coordinates.
  • Use NumPy's I/O functions to write out a text file containing

    • a header which includes
      • the link to the image
      • the threshold value used in the conversion to monochrome
      • the structuring element used to count its stars
      • how many stars were found
      • a description of the data in the subsequent columns
    • a list of the star data organized in three columns and formatted for easy reading as follows:
      • location (x,y in pixel units)
      • maximum brightness value for the star

Summary:

NumPy offers convenient and fast numerical data manipulation with a large body of support libraries. Use it with large datasets to very quickly slice through the information and extract meaningful results.

The advantage of using NumPy is all about moving loops into compiled code.

For more details and documentation, consult the SciPy lecture series or the NumPy/SciPy documentation.