Summarizing Images

Images are high dimensional objects: our XMM image contains 648*648 = datapoints (the pixel values).

Visualizing the data is an extremely important first step: the next is summarizing, which can be thought of as dimensionality reduction.

Let's dust off some standard statistics and put them to good use in summarizing this X-ray image.



In [ ]:

    
import astropy.io.fits as pyfits
import numpy as np
import astropy.visualization as viz
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 10.0)



In [ ]:

    
targdir = 'a1835_xmm/'
imagefile = targdir+'P0098010101M2U009IMAGE_3000.FTZ'
expmapfile = targdir+'P0098010101M2U009EXPMAP3000.FTZ'
bkgmapfile = targdir+'P0098010101M2X000BKGMAP3000.FTZ'

!du -sch $targdir/*

How Many Photons Came From the Cluster?

Let's estimate the total counts due to the cluster.

That means we need to somehow ignore

all the other objects in the field
the diffuse X-ray "background"

Let's start by masking various regions of the image to separate cluster from background.



In [ ]:

    
imfits = pyfits.open(imagefile)
im = imfits[0].data
plt.imshow(viz.scale_image(im, scale='log', max_cut=40), cmap='gray', origin='lower');

Estimating the background

Now let's look at the outer parts of the image, far from the cluster, and estimate the background level there.



In [ ]:

    
maskedimage = im.copy()

# First make some coordinate arrays, including polar r from the cluster center:
(ny,nx) = maskedimage.shape
centroid = np.where(maskedimage == np.max(maskedimage))
x = np.linspace(0, nx-1, nx)
y = np.linspace(0, ny-1, ny)
dx, dy = np.meshgrid(x,y)
dx = dx - centroid[1]
dy = dy - centroid[0]
r = np.sqrt(dx*dx + dy*dy)



In [ ]:

    
# Now select an outer annulus, for the background and an inner circle, for the cluster: 
background = maskedimage.copy()
background[r < 100] = -3
background[r > 150] = -3
signal = maskedimage.copy()
signal[r > 100] = 0.0



In [ ]:

    
plt.imshow(viz.scale_image(background, scale='log', max_cut=40), cmap='gray', origin='lower')

Let's look at the mean and median of the pixels in this image that have non-negative values.



In [ ]:

    
meanbackground = np.mean(background[background > -1])
medianbackground = np.median(background[background > -1])

print "Mean background counts per pixel = ",meanbackground
print "Median background counts per pixel = ",medianbackground

Exercise:

Why do you think there is a difference? Talk to your neighbor for a minute, and be ready to suggest an answer.

To understand the difference in these two estimates, lets look at a pixel histogram for this annulus.



In [ ]:

    
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(background[background > -1], bins=np.linspace(-3.5,29.5,34))
# plt.yscale('log', nonposy='clip')
plt.xlabel('Background annulus pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 40000])
plt.grid(True)
plt.show()



In [ ]:

    
stdevbackground = np.std(background[background > -1])
print "Standard deviation: ",stdevbackground

Exercise:

"The background level in this image is approximately $0.09 \pm 0.66$ counts"

What's wrong with this statement?

Talk to your neighbor for a few minutes, and see if you can come up with a better version.

Estimating the Cluster Counts

Now let's summarize the circular region centered on the cluster.



In [ ]:

    
plt.imshow(viz.scale_image(signal, scale='log', max_cut=40), cmap='gray', origin='lower')



In [ ]:

    
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(signal[signal > -1], bins=np.linspace(-3.5,29.5,34), color='red')
plt.yscale('log', nonposy='clip')
plt.xlabel('Signal region pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 500000])
plt.grid(True)
plt.show()

Now we can make our estimates:



In [ ]:

    
# Total counts in signal region:
Ntotal = np.sum(signal[signal > -1])

# Background counts: mean in annulus, multiplied by number of pixels in signal region:
N = signal.copy()*0.0
N[signal > -1] = 1.0
Nbackground = np.sum(N)*meanbackground  # Is this a good choice?

# Difference is the cluster counts:
Ncluster = Ntotal - Nbackground

print "Counts in signal region: ",Ntotal
print "Approximate counts due to background: ",Nbackground
print "Approximate counts due to cluster: ",Ncluster

Lessons

Summary statistics are how we:

reduce the dimensionality of datasets to a level where we can make sense of what is going on
make rough estimates of important quantities we are interested in

But:

their uncertainty is non-trivial to estimate, and at least requires more information



In [ ]: