Summarizing Images

Images are high dimensional objects: our XMM image contains 648*648 = datapoints (the pixel values).

Visualizing the data is an extremely important first step: the next is summarizing, which can be thought of as dimensionality reduction.

Let's dust off some standard statistics and put them to good use in summarizing this X-ray image.



In [1]:

    
import astropy.io.fits as pyfits
import numpy as np
import astropy.visualization as viz
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 10.0)



In [2]:

    
targdir = 'a1835_xmm/'
imagefile = targdir+'P0098010101M2U009IMAGE_3000.FTZ'
expmapfile = targdir+'P0098010101M2U009EXPMAP3000.FTZ'
bkgmapfile = targdir+'P0098010101M2X000BKGMAP3000.FTZ'

!du -sch $targdir/*









    



4.0K	a1835_xmm//M2ptsrc.txt
0	a1835_xmm//P0098010101M2U009EXPMAP3000.FTZ
0	a1835_xmm//P0098010101M2U009IMAGE_3000.FTZ
0	a1835_xmm//P0098010101M2X000BKGMAP3000.FTZ
4.0K	total

How Many Photons Came From the Cluster?

Let's estimate the total counts due to the cluster.

That means we need to somehow ignore

all the other objects in the field
the diffuse X-ray "background"

Let's start by masking various regions of the image to separate cluster from background.



In [3]:

    
imfits = pyfits.open(imagefile)
im = imfits[0].data
plt.imshow(viz.scale_image(im, scale='log', max_cut=40), cmap='gray', origin='lower');









    



---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-3-e31b2265afa5> in <module>()
----> 1 imfits = pyfits.open(imagefile)
      2 im = imfits[0].data
      3 plt.imshow(viz.scale_image(im, scale='log', max_cut=40), cmap='gray', origin='lower');

/usr/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.py in fitsopen(name, mode, memmap, save_backup, cache, **kwargs)
    127         raise ValueError('Empty filename: %s' % repr(name))
    128 
--> 129     return HDUList.fromfile(name, mode, memmap, save_backup, cache, **kwargs)
    130 
    131 

/usr/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.py in fromfile(cls, fileobj, mode, memmap, save_backup, cache, **kwargs)
    269 
    270         return cls._readfrom(fileobj=fileobj, mode=mode, memmap=memmap,
--> 271                              save_backup=save_backup, cache=cache, **kwargs)
    272 
    273     @classmethod

/usr/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.py in _readfrom(cls, fileobj, data, mode, memmap, save_backup, cache, **kwargs)
    859             # raise and exception
    860             if mode in ('readonly', 'denywrite') and len(hdulist) == 0:
--> 861                 raise IOError('Empty or corrupt FITS file')
    862 
    863             # initialize/reset attributes to be used in "update/append" mode

IOError: Empty or corrupt FITS file

Estimating the background

Now let's look at the outer parts of the image, far from the cluster, and estimate the background level there.



In [4]:

    
maskedimage = im.copy()

# First make some coordinate arrays, including polar r from the cluster center:
(ny,nx) = maskedimage.shape
centroid = np.where(maskedimage == np.max(maskedimage))
x = np.linspace(0, nx-1, nx)
y = np.linspace(0, ny-1, ny)
dx, dy = np.meshgrid(x,y)
dx = dx - centroid[1]
dy = dy - centroid[0]
r = np.sqrt(dx*dx + dy*dy)









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-4cd254f0c530> in <module>()
----> 1 maskedimage = im.copy()
      2 
      3 # First make some coordinate arrays, including polar r from the cluster center:
      4 (ny,nx) = maskedimage.shape
      5 centroid = np.where(maskedimage == np.max(maskedimage))

NameError: name 'im' is not defined



In [51]:

    
# Now select an outer annulus, for the background and an inner circle, for the cluster: 
background = maskedimage.copy()
background[r < 100] = -3
background[r > 150] = -3
signal = maskedimage.copy()
signal[r > 100] = 0.0



In [50]:

    
plt.imshow(viz.scale_image(background, scale='log', max_cut=40), cmap='gray', origin='lower')









    Out[50]:





<matplotlib.image.AxesImage at 0x111943650>

Let's look at the mean and median of the pixels in this image that have non-negative values.



In [93]:

    
meanbackground = np.mean(background[background > -1])
medianbackground = np.median(background[background > -1])

print "Mean background counts per pixel = ",meanbackground
print "Median background counts per pixel = ",medianbackground









    



Mean background counts per pixel =  0.0913094389573
Median background counts per pixel =  0.0

Exercise:

Why do you think there is a difference? Talk to your neighbor for a minute, and be ready to suggest an answer.

To understand the difference in these two estimates, lets look at a pixel histogram for this annulus.



In [87]:

    
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(background[background > -1], bins=np.linspace(-3.5,29.5,34))
# plt.yscale('log', nonposy='clip')
plt.xlabel('Background annulus pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 40000])
plt.grid(True)
plt.show()



In [94]:

    
stdevbackground = np.std(background[background > -1])
print "Standard deviation: ",stdevbackground









    



Standard deviation:  0.659138624851

Exercise:

"The background level in this image is approximately $0.09 \pm 0.66$ counts"

What's wrong with this statement?

Talk to your neighbor for a few minutes, and see if you can come up with a better version.

Estimating the Cluster Counts

Now let's summarize the circular region centered on the cluster.



In [45]:

    
plt.imshow(viz.scale_image(signal, scale='log', max_cut=40), cmap='gray', origin='lower')









    Out[45]:





<matplotlib.image.AxesImage at 0x10d66e510>



In [80]:

    
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(signal[signal > -1], bins=np.linspace(-3.5,29.5,34), color='red')
plt.yscale('log', nonposy='clip')
plt.xlabel('Signal region pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 500000])
plt.grid(True)
plt.show()

Now we can make our estimates:



In [95]:

    
# Total counts in signal region:
Ntotal = np.sum(signal[signal > -1])

# Background counts: mean in annulus, multiplied by number of pixels in signal region:
N = signal.copy()*0.0
N[signal > -1] = 1.0
Nbackground = np.sum(N)*meanbackground

# Difference is the cluster counts:
Ncluster = Ntotal - Nbackground

print "Counts in signal region: ",Ntotal
print "Approximate counts due to background: ",Nbackground
print "Approximate counts due to cluster: ",Ncluster









    



Counts in signal region:  28196
Approximate counts due to background:  38341.1986559
Approximate counts due to cluster:  -10145.1986559

Lessons

Summary statistics are how we:

reduce the dimensionality of datasets to a level where we can make sense of what is going on
make rough estimates of important quantities we are interested in

But:

their uncertainty is non-trivial to estimate, and at least requires more information



In [ ]: