Summarizing Images

Images are high-dimensional objects: our XMM image contains 648 × 648 = 419,904 data points (the pixel values).

Visualizing the data is an extremely important first step; the next is summarizing it, which can be thought of as a form of dimensionality reduction.

Let's dust off some standard statistics and put them to good use in summarizing this X-ray image.


In [1]:
from __future__ import print_function
import astropy.io.fits as pyfits
import numpy as np
import astropy.visualization as viz
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 10.0)

In [2]:
targdir = 'a1835_xmm/'
imagefile = targdir+'P0098010101M2U009IMAGE_3000.FTZ'
expmapfile = targdir+'P0098010101M2U009EXPMAP3000.FTZ'
bkgmapfile = targdir+'P0098010101M2X000BKGMAP3000.FTZ'

!du -sch $targdir/*


4.0K	a1835_xmm//M2ptsrc.txt
464K	a1835_xmm//P0098010101M2U009EXPMAP3000.FTZ
48K	a1835_xmm//P0098010101M2U009IMAGE_3000.FTZ
472K	a1835_xmm//P0098010101M2X000BKGMAP3000.FTZ
988K	total
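
As an optional check (not part of the original analysis), we can also peek inside these FITS files: the info() method lists the HDUs, and the primary header records the observation details. The exact header keywords depend on how the files were produced.

In [ ]:
# Optional: inspect the structure and headers of the FITS files.
imfits_peek = pyfits.open(imagefile)
imfits_peek.info()
print(list(imfits_peek[0].header.keys())[:10])   # first few header keywords
imfits_peek.close()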

How Many Photons Came From the Cluster?

Let's estimate the total counts due to the cluster.

That means we need to somehow ignore

  • all the other objects in the field

  • the diffuse X-ray "background"

Let's start by masking various regions of the image to separate cluster from background.


In [3]:
imfits = pyfits.open(imagefile)
expfits = pyfits.open(expmapfile)
im = imfits[0].data
exp = expfits[0].data
plt.imshow(viz.scale_image(im, scale='log', max_cut=40), cmap='gray', origin='lower');
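
Note: viz.scale_image comes from an older version of astropy and has since been removed. If your astropy doesn't have it, something along these lines (a sketch, not the original code) should give a similar logarithmically scaled display:

In [ ]:
# Alternative display for newer astropy versions, where scale_image is unavailable.
# ImageNormalize with a LogStretch approximates scale='log' with max_cut=40.
from astropy.visualization import ImageNormalize, LogStretch
norm = ImageNormalize(vmin=0.0, vmax=40.0, stretch=LogStretch())
plt.imshow(im, cmap='gray', origin='lower', norm=norm);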


Estimating the background

Now let's look at the outer parts of the image, far from the cluster, and estimate the background level there.


In [4]:
# First make some coordinate arrays, including the polar radius r from the cluster center.
# Note that np.unravel_index returns (row, column) = (y, x) indices, so the brightest pixel
# (which we take as the cluster center) has y = centroid[0] and x = centroid[1].
ny, nx = im.shape
centroid = np.unravel_index(im.argmax(), im.shape)
x = np.linspace(0, nx-1, nx)
y = np.linspace(0, ny-1, ny)
dx, dy = np.meshgrid(x, y)
dx = dx - centroid[1]
dy = dy - centroid[0]
r = np.sqrt(dx*dx + dy*dy)

In [5]:
# Now select an outer annulus, for the background,
# and an inner circle, for the cluster: 
background = (r >= 100.0) & (r <= 150.0)
signal = (r < 100.0)
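
For later reference, it's useful to know how many pixels fall in each region (the exact numbers depend on where the brightest pixel, which we're using as the center, landed):

In [ ]:
# Count the pixels in the background annulus and in the signal circle.
print("Pixels in background annulus: ", np.count_nonzero(background))
print("Pixels in signal region: ", np.count_nonzero(signal))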

First, let's visualize the background region by masking out everything else.


In [6]:
maskedimage = im.copy()
maskedimage[np.logical_not(background)] = -1

plt.imshow(viz.scale_image(maskedimage, scale='log', max_cut=40), cmap='gray', origin='lower')


Out[6]:
<matplotlib.image.AxesImage at 0x7f8466c348d0>

Now let's estimate the background level from the pixels in the annulus: an exposure-weighted mean (total counts divided by total exposure, i.e. a count rate per pixel), and the median of the pixel values.


In [8]:
# Exposure-weighted mean: total counts divided by total exposure, i.e. a count rate per pixel.
meanbackground = np.sum(im[background])/np.sum(exp[background])
medianbackground = np.median(im[background])

print("Mean background count rate per pixel = ", meanbackground)
print("Median background counts per pixel = ", medianbackground)


Mean background count rate per pixel =  5.56403664825e-06
Median background counts per pixel =  0.0
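
The mean computed above is an exposure-weighted count rate per pixel, while the median is in raw counts. To put the two on the same footing, we can convert the rate into typical counts per pixel by multiplying by the mean exposure in the annulus (a quick sketch, assuming the exposure map values are in seconds):

In [ ]:
# Convert the mean background count rate into mean counts per pixel,
# assuming the exposure map values are in seconds.
meanexposure = np.mean(exp[background])
print("Mean exposure in the annulus (s): ", meanexposure)
print("Mean background counts per pixel: ", meanbackground*meanexposure)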

Q: Why do you think there's a difference?

Talk to your neighbor for a minute, and be ready to suggest an answer.

To understand the difference between these two estimates, let's look at a pixel histogram for this annulus.


In [56]:
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(im[background], bins=np.linspace(-3.5,29.5,34))
# plt.yscale('log', nonposy='clip')
plt.xlabel('Background annulus pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 40000])
plt.grid(True)
plt.show()
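
To quantify what the histogram is showing: the median is zero because at least half of the annulus pixels contain no counts at all. Here's a quick check, using the arrays defined above:

In [ ]:
# Fraction of background-annulus pixels that contain zero counts.
zerofraction = np.mean(im[background] == 0)
print("Fraction of annulus pixels with zero counts: ", zerofraction)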



In [57]:
stdevbackground = np.std(im[background])
print("Standard deviation: ",stdevbackground)


Standard deviation:  0.659138624851

Exercise:

"The background level in this image is approximately $0.09 \pm 0.66$ counts"

What's wrong with this statement?

Talk to your neighbor for a few minutes, and see if you can come up with a better version.

Estimating the Cluster Counts

Now let's summarize the circular region centered on the cluster, by making another masked image.


In [58]:
maskedimage = im.copy()
maskedimage[np.logical_not(signal)] = 0
plt.imshow(viz.scale_image(maskedimage, scale='log', max_cut=40), cmap='gray', origin='lower')


Out[58]:
<matplotlib.image.AxesImage at 0x10d19c450>

In [59]:
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(im[signal], bins=np.linspace(-3.5,29.5,34), color='red')
plt.yscale('log', nonposy='clip')
plt.xlabel('Signal region pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 500000])
plt.grid(True)
plt.show()


Now we can make our estimates:


In [61]:
# Total counts in signal region:
Ntotal = np.sum(im[signal])

# Background counts: the mean background count rate per pixel in the annulus,
# multiplied by the total exposure (summed exposure map) over the signal region:
Nbackground = np.sum(exp[signal])*meanbackground  # Is this a good choice?

# Difference is the cluster counts:
Ncluster = Ntotal - Nbackground

print("Counts in signal region: ",Ntotal)
print("Approximate counts due to background: ",Nbackground)
print("Approximate counts due to cluster: ",Ncluster)


Counts in signal region:  28195
Approximate counts due to background:  2866.84245494
Approximate counts due to cluster:  25328.1575451
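
How uncertain are these estimates? As a very rough first pass (a sketch that assumes the detected counts are Poisson-distributed, and treats the background level as exactly known, which it is not), we can attach a counting error to the cluster counts:

In [ ]:
# Very rough counting uncertainty on the cluster counts: Poisson statistics for
# the total counts in the signal region, ignoring the uncertainty in Nbackground.
Ncluster_err = np.sqrt(Ntotal)
print("Approximate counts due to cluster: ", Ncluster, "+/-", Ncluster_err)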

Lessons

Summary statistics are how we:

  • reduce the dimensionality of datasets to a level where we can make sense of what is going on

  • make rough estimates of important quantities we are interested in

But:

  • their uncertainty is non-trivial to estimate, and at the very least requires more information than the summary statistics themselves provide (one simple approach is sketched below)
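
For example, one simple way to attach an uncertainty to the background estimate, using only the data we already have, is to bootstrap the annulus pixels (a sketch, not part of the original analysis):

In [ ]:
# Bootstrap the uncertainty on the mean background count rate: resample the
# annulus pixels (and their exposures) with replacement and look at the scatter
# of the recomputed means.
npix = np.count_nonzero(background)
imb, expb = im[background], exp[background]
samples = []
for k in range(1000):
    idx = np.random.randint(0, npix, npix)
    samples.append(np.sum(imb[idx])/np.sum(expb[idx]))
print("Bootstrap error on the mean background count rate: ", np.std(samples))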

In [ ]: