# Counting Stars with NumPy

This example introduces some of the image processing capabilities available with NumPy and the SciPy `ndimage` package. More extensive documentation and tutorials can be found through the SciPy Lectures series.

### Image I/O

We can use the SciPy `ndimage` library to read image data into NumPy arrays. If we want to fetch a file off the web, we also need some help from the `requests` and `StringIO` libraries:

``````

In [ ]:

import scipy.ndimage as ndi
import requests
from StringIO import StringIO

``````
``````

In [ ]:

#Pick an image from the list above and fetch it with requests.get
#The default picture here is of M45 - the Pleiades Star Cluster.
response = requests.get("http://imgsrc.hubblesite.org/hu/db/images/hs-2004-20-a-large_web.jpg")

``````

### Image Visualization

We can plot the image using `matplotlib`:

``````

In [ ]:

%pylab inline
import matplotlib.pyplot as plt

``````
``````

In [ ]:

plt.imshow(pic);

``````

### Image Inspection

We can examine the image properties:

``````

In [ ]:

print pic.shape

``````

Pixel coordinates are (column,row).

Colors are represented by RGB triples. Black is (0,0,0), White is (255, 255, 255) or (0xFF, 0xFF, 0xFF) in hexadecimal. Think of it as a color cube with the three axes representing the different possible colors. The furthest away from the origin (black) is white.

``````

In [ ]:

#Color array [R,G,B] of very first pixel
print pic[0,0]

``````

We could write code to find the brightest pixel in the image, where "brightest" means highest value of R+G+B. For the 256 color scale, the greatest possible value is 3 * 255, or 765. One way to do this would be to write a set of nested loops over the pixel dimensions, calculating the sum R+G+B for each pixel, but that would be rather tedious and slow.

We could process the information faster if we take advantage of the speedy NumPy slicing, aggregates, and ufuncs. Remember that any time we can eliminate interpreted Python loops we save a lot of processing time.

``````

In [ ]:

#find value of max pixel with aggregates
print pic.sum(axis=2).max() #numbering from 0, axis 2 is the color depth

``````

### Image Feature Extraction

Now that we know how to read in the image as a NumPy array, let's count the stars above some threshold brightness. Start by converting the image to B/W, so that which pixels belong to stars and which don't is unambiguous. We'll use black for stars and white for background, since it's easier to see black-on-white than the reverse.

``````

In [ ]:

def monochrome(pic_array, threshold):
"""replace the RGB values in the loaded image with either
black or white depending on whether its total
luminance is above or below some threshold
passed in by the user"""
mask = (pic_array.sum(axis=2) >= threshold) #could also be done in one step
return

``````
``````

In [ ]:

#Get another copy to convert to B/W

#This threshold is a scalar, not an RGB triple
#We're looking for pixels whose total color value is 600 or greater
monochrome(bwpic,200+200+200)
plt.imshow(bwpic);

``````

The way to count the features (stars) in the image is to identify "blobs" of connected or adjacent black pixels.

A traditional implementation of this algorithm using plain Python loops is presented in the Multimedia Programming lesson from Software Carpentry. This was covered in the notebook Counting Stars.

Let's see how to implement such an algorithm much more efficiently using `numpy` and `scipy.ndimage`.

The `scipy.ndimage.label` function will use a structuring element (cross-shaped by default) to search for features. As an example, consider the simple array:

``````

In [ ]:

a = np.array([[0,0,1,1,0,0],
[0,0,0,1,0,0],
[1,1,0,0,1,0],
[0,0,0,1,0,0]])

``````

There are four unique features here, if we only count those that have neighbors along a cross-shaped structuring element.

``````

In [ ]:

labeled_array, num_features = ndi.label(a)

``````
``````

In [ ]:

print(num_features)

``````
``````

In [ ]:

print(labeled_array)

``````

If we wish to consider elements connected on the diagonal, as well as the cross structure, we define a new structuring element:

``````

In [ ]:

s = [[1,1,1],
[1,1,1],
[1,1,1]]

#Note, that scipy.ndimage.generate_binary_structure(2,2) would also do the same thing.
print s

``````

Label the image using the new structuring element:

``````

In [ ]:

labeled_array, num_features = ndi.label(a, structure=s)

``````
``````

In [ ]:

print(num_features)

``````

Note that features 1, 3, and 4 from above are now considered a single feature

``````

In [ ]:

print(labeled_array)

``````

Let's use `ndi.label` to count up the stars in our B/W starfield image.

``````

In [ ]:

labeled_array, num_stars = ndi.label(~bwpic) #Count and label the complement

``````
``````

In [ ]:

print num_stars

``````
``````

In [ ]:

plt.imshow(labeled_array);

``````

Label returns an array the same shape as the input where each "unique feature has a unique value", so if you want the indices of the features you use a list comprehension to extract the exact feature indices. Something like:

`label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)]`

or use the `ndi.find_objects` method to obtain a tuple of feature locations as slices to obtain the general location of the star but not necessarily the correct shape.

``````

In [ ]:

locations = ndi.find_objects(labeled_array)
print locations

``````
``````

In [ ]:

label_indices = [(labeled_array[:,:,0] == i).nonzero() for i in xrange(1, num_stars+1)]
print label_indices

``````

Let's change the color of the largest star in the field to red. To find the largest star, look at the lengths of the arrays stored in `label_indices`.

``````

In [ ]:

star_sizes = [(label_indices[i-1]).size for i in xrange(1, num_stars+1)]
print len(star_sizes)

``````
``````

In [ ]:

biggest_star = np.where(star_sizes == np.max(star_sizes))
print biggest_star

``````
``````

In [ ]:

print star_sizes[biggest_star]

``````
``````

In [ ]:

bwpic[label_indices[biggest_star],label_indices[biggest_star],:] = (255,0,0)
plt.imshow(bwpic);

``````

## Exercises

For fun and practice, you can use these tricks to analyze the whole set of pictures listed at the top of this section. Suggested activities include:

• Find the brightest pixel within each star identified using the `scipy.ndimage.label` function
• Draw the image with different colors for identified stars of different brightnesses
• Use the `scipy.ndimage.center_of_mass` function to identify the "location" of each star in pixel coordinates.
• Use NumPy's I/O functions to write out a text file containing

• the link to the image
• the threshold value used in the conversion to monochrome
• the structuring element used to count its stars
• how many stars were found
• a description of the data in the subsequent columns
• a list of the star data organized in three columns and formatted for easy reading as follows:
• location (x,y in pixel units)
• maximum brightness value for the star

# Summary:

NumPy offers convenient and fast numerical data manipulation with a large body of support libraries. Use it with large datasets to very quickly slice through the information and extract meaningful results.

The advantage of using NumPy is all about moving loops into compiled code.

For more details and documentation, consult the SciPy lecture series or the NumPy/SciPy documentation.