Summarizing Images

Images are high dimensional objects: our XMM image contains 648*648 = datapoints (the pixel values).

Visualizing the data is an extremely important first step: the next is summarizing, which can be thought of as dimensionality reduction.

Let's dust off some standard statistics and put them to good use in summarizing this X-ray image.


In [1]:
import astropy.io.fits as pyfits
import numpy as np
import astropy.visualization as viz
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 10.0)

In [2]:
targdir = 'a1835_xmm/'
imagefile = targdir+'P0098010101M2U009IMAGE_3000.FTZ'
expmapfile = targdir+'P0098010101M2U009EXPMAP3000.FTZ'
bkgmapfile = targdir+'P0098010101M2X000BKGMAP3000.FTZ'

!du -sch $targdir/*


4.0K	a1835_xmm//M2ptsrc.txt
0	a1835_xmm//P0098010101M2U009EXPMAP3000.FTZ
0	a1835_xmm//P0098010101M2U009IMAGE_3000.FTZ
0	a1835_xmm//P0098010101M2X000BKGMAP3000.FTZ
4.0K	total

How Many Photons Came From the Cluster?

Let's estimate the total counts due to the cluster.

That means we need to somehow ignore

  • all the other objects in the field

  • the diffuse X-ray "background"

Let's start by masking various regions of the image to separate cluster from background.


In [3]:
imfits = pyfits.open(imagefile)
im = imfits[0].data
plt.imshow(viz.scale_image(im, scale='log', max_cut=40), cmap='gray', origin='lower');


---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-3-e31b2265afa5> in <module>()
----> 1 imfits = pyfits.open(imagefile)
      2 im = imfits[0].data
      3 plt.imshow(viz.scale_image(im, scale='log', max_cut=40), cmap='gray', origin='lower');

/usr/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.py in fitsopen(name, mode, memmap, save_backup, cache, **kwargs)
    127         raise ValueError('Empty filename: %s' % repr(name))
    128 
--> 129     return HDUList.fromfile(name, mode, memmap, save_backup, cache, **kwargs)
    130 
    131 

/usr/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.py in fromfile(cls, fileobj, mode, memmap, save_backup, cache, **kwargs)
    269 
    270         return cls._readfrom(fileobj=fileobj, mode=mode, memmap=memmap,
--> 271                              save_backup=save_backup, cache=cache, **kwargs)
    272 
    273     @classmethod

/usr/lib/python2.7/site-packages/astropy/io/fits/hdu/hdulist.py in _readfrom(cls, fileobj, data, mode, memmap, save_backup, cache, **kwargs)
    859             # raise and exception
    860             if mode in ('readonly', 'denywrite') and len(hdulist) == 0:
--> 861                 raise IOError('Empty or corrupt FITS file')
    862 
    863             # initialize/reset attributes to be used in "update/append" mode

IOError: Empty or corrupt FITS file

Estimating the background

Now let's look at the outer parts of the image, far from the cluster, and estimate the background level there.


In [4]:
maskedimage = im.copy()

# First make some coordinate arrays, including polar r from the cluster center:
(ny,nx) = maskedimage.shape
centroid = np.where(maskedimage == np.max(maskedimage))
x = np.linspace(0, nx-1, nx)
y = np.linspace(0, ny-1, ny)
dx, dy = np.meshgrid(x,y)
dx = dx - centroid[1]
dy = dy - centroid[0]
r = np.sqrt(dx*dx + dy*dy)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-4cd254f0c530> in <module>()
----> 1 maskedimage = im.copy()
      2 
      3 # First make some coordinate arrays, including polar r from the cluster center:
      4 (ny,nx) = maskedimage.shape
      5 centroid = np.where(maskedimage == np.max(maskedimage))

NameError: name 'im' is not defined

In [51]:
# Now select an outer annulus, for the background and an inner circle, for the cluster: 
background = maskedimage.copy()
background[r < 100] = -3
background[r > 150] = -3
signal = maskedimage.copy()
signal[r > 100] = 0.0

In [50]:
plt.imshow(viz.scale_image(background, scale='log', max_cut=40), cmap='gray', origin='lower')


Out[50]:
<matplotlib.image.AxesImage at 0x111943650>

Let's look at the mean and median of the pixels in this image that have non-negative values.


In [93]:
meanbackground = np.mean(background[background > -1])
medianbackground = np.median(background[background > -1])

print "Mean background counts per pixel = ",meanbackground
print "Median background counts per pixel = ",medianbackground


Mean background counts per pixel =  0.0913094389573
Median background counts per pixel =  0.0

Exercise:

Why do you think there is a difference? Talk to your neighbor for a minute, and be ready to suggest an answer.

To understand the difference in these two estimates, lets look at a pixel histogram for this annulus.


In [87]:
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(background[background > -1], bins=np.linspace(-3.5,29.5,34))
# plt.yscale('log', nonposy='clip')
plt.xlabel('Background annulus pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 40000])
plt.grid(True)
plt.show()



In [94]:
stdevbackground = np.std(background[background > -1])
print "Standard deviation: ",stdevbackground


Standard deviation:  0.659138624851

Exercise:

"The background level in this image is approximately $0.09 \pm 0.66$ counts"

What's wrong with this statement?

Talk to your neighbor for a few minutes, and see if you can come up with a better version.

Estimating the Cluster Counts

Now let's summarize the circular region centered on the cluster.


In [45]:
plt.imshow(viz.scale_image(signal, scale='log', max_cut=40), cmap='gray', origin='lower')


Out[45]:
<matplotlib.image.AxesImage at 0x10d66e510>

In [80]:
plt.figure(figsize=(10,7))
n, bins, patches = plt.hist(signal[signal > -1], bins=np.linspace(-3.5,29.5,34), color='red')
plt.yscale('log', nonposy='clip')
plt.xlabel('Signal region pixel value (counts)')
plt.ylabel('Frequency')
plt.axis([-3.0, 30.0, 0, 500000])
plt.grid(True)
plt.show()


Now we can make our estimates:


In [95]:
# Total counts in signal region:
Ntotal = np.sum(signal[signal > -1])

# Background counts: mean in annulus, multiplied by number of pixels in signal region:
N = signal.copy()*0.0
N[signal > -1] = 1.0
Nbackground = np.sum(N)*meanbackground

# Difference is the cluster counts:
Ncluster = Ntotal - Nbackground

print "Counts in signal region: ",Ntotal
print "Approximate counts due to background: ",Nbackground
print "Approximate counts due to cluster: ",Ncluster


Counts in signal region:  28196
Approximate counts due to background:  38341.1986559
Approximate counts due to cluster:  -10145.1986559

Lessons

Summary statistics are how we:

  • reduce the dimensionality of datasets to a level where we can make sense of what is going on

  • make rough estimates of important quantities we are interested in

But:

  • their uncertainty is non-trivial to estimate, and at least requires more information

In [ ]: