Summarizing Images

Images are high dimensional objects: our XMM image contains 648*648 = datapoints (the pixel values).

Visualizing the data is an extremely important first step: the next is summarizing, which can be thought of as dimensionality reduction.

Let's dust off some standard statistics and put them to good use in summarizing this X-ray image.


In [ ]:
import astropy.io.fits as pyfits
import numpy as np
import astropy.visualization as viz
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 10.0)

In [ ]:
targdir = 'a1835_xmm/'
imagefile = targdir+'P0098010101M2U009IMAGE_3000.FTZ'
expmapfile = targdir+'P0098010101M2U009EXPMAP3000.FTZ'
bkgmapfile = targdir+'P0098010101M2X000BKGMAP3000.FTZ'

!du -sch $targdir/*

How Many Photons Came From the Cluster?

Let's estimate the total counts due to the cluster.

That means we need to somehow ignore

  • all the other objects in the field

  • the diffuse X-ray "background"

Let's start by masking various regions of the image to separate cluster from background.


In [ ]:
imfits = pyfits.open(imagefile)
im = imfits[0].data
plt.imshow(viz.scale_image(im, scale='log', max_cut=40), cmap='gray', origin='lower');

Estimating the background

Now let's look at the outer parts of the image, far from the cluster, and estimate the background level there.


In [ ]:
maskedimage = im.copy()

# First make some coordinate arrays, including polar r from the cluster center:
(ny,nx) = maskedimage.shape
centroid = np.where(maskedimage == np.max(maskedimage))
x = np.linspace(0, nx-1, nx)
y = np.linspace(0, ny-1, ny)
dx, dy = np.meshgrid(x,y)
dx = dx - centroid[1]
dy = dy - centroid[0]
r = np.sqrt(dx*dx + dy*dy)

In [ ]:
# Now select an outer annulus, for the background and an inner circle, for the cluster: 
background = maskedimage.copy()
background[r < 100] = -3
background[r > 150] = -3
signal = maskedimage.copy()
signal[r > 100] = 0.0

In [ ]:
plt.imshow(viz.scale_image(background, scale='log', max_cut=40), cmap='gray', origin='lower')

Let's look at the mean and median of the pixels in this image that have non-negative values.


In [ ]:
meanbackground = np.mean(background[background > -1])
medianbackground = np.median(background[background > -1])

print "Mean background counts per pixel = ",meanbackground
print "Median background counts per pixel = ",medianbackground

Exercise:

Why do you think there is a difference? Talk to your neighbor for a minute, and be ready to suggest an answer.

To understand the difference in these two estimates, lets look at a pixel histogram for this annulus.


In [ ]:
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(background[background > -1], bins=np.linspace(-3.5,29.5,34))
# plt.yscale('log', nonposy='clip')
plt.xlabel('Background annulus pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 40000])
plt.grid(True)
plt.show()

In [ ]:
stdevbackground = np.std(background[background > -1])
print "Standard deviation: ",stdevbackground

Exercise:

"The background level in this image is approximately $0.09 \pm 0.66$ counts"

What's wrong with this statement?

Talk to your neighbor for a few minutes, and see if you can come up with a better version.

Estimating the Cluster Counts

Now let's summarize the circular region centered on the cluster.


In [ ]:
plt.imshow(viz.scale_image(signal, scale='log', max_cut=40), cmap='gray', origin='lower')

In [ ]:
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(signal[signal > -1], bins=np.linspace(-3.5,29.5,34), color='red')
plt.yscale('log', nonposy='clip')
plt.xlabel('Signal region pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 500000])
plt.grid(True)
plt.show()

Now we can make our estimates:


In [ ]:
# Total counts in signal region:
Ntotal = np.sum(signal[signal > -1])

# Background counts: mean in annulus, multiplied by number of pixels in signal region:
N = signal.copy()*0.0
N[signal > -1] = 1.0
Nbackground = np.sum(N)*meanbackground  # Is this a good choice?

# Difference is the cluster counts:
Ncluster = Ntotal - Nbackground

print "Counts in signal region: ",Ntotal
print "Approximate counts due to background: ",Nbackground
print "Approximate counts due to cluster: ",Ncluster

Lessons

Summary statistics are how we:

  • reduce the dimensionality of datasets to a level where we can make sense of what is going on

  • make rough estimates of important quantities we are interested in

But:

  • their uncertainty is non-trivial to estimate, and at least requires more information

In [ ]: