All photographic images represent a measurement of how much light hits the receiver. For instance, the Hubble image below is obtained by measuring the brightnesses of distant stars:
With traditional optical cameras, this measurement results in an image which is continuous, as it is projected directly onto paper. In order to store images digitally, they need to be divided into discrete chunks, pixels, each of which contains the value of the measurement in that small portion of the image. In this representation, an image is simply a grid of numbers, which allows it to be easily stored as an array with a shape equal to the resolution of the image.
The scikit-image
(abbreviated to skimage
in code) module contains some sample images in the data
submodule that we can use to demonstrate this principle.
In [ ]:
# Get some import statements out of the way.
from __future__ import division, print_function
%matplotlib inline
import matplotlib.pyplot as plt
from skimage import data
In [ ]:
# Load the data.moon() image and print it
moon = data.moon()
print(moon)
Once read in as an array, the image can be processed in the same ways as any other array. For instance, we can easily find the highest, lowest and mean values of the image, the type of the variables stored in the array, and the resolution of the image:
In [ ]:
# Output the image minimum, mean and maximum.
print('Image min:', moon.min(), '; Image mean:', moon.mean(), '; Image max: ', moon.max())
# Output the array dtype.
print('Data type:', moon.dtype)
# Output image size.
print('Image size:', moon.shape)
This tells us that the image has a resolution of 512 x 512 pixels, and is stored as integers between 0 and 255. This is a common way of normalising images, but they can just as easily be stored as floats between 0 and 1. More commonly with astronomical data though, an image will consist of photon counts (i.e. integers), so the minimum will be 0 and any upper limit will likely be defined by the capabilities of the instrument.
While storing an image as a grid of numbers is very useful for analysis, we still need to be able to visually inspect the image. This can be achieved with plt.imshow()
, which allocates a colour to every element in the array according to its value.
In [ ]:
# Display image array with imshow()
plt.imshow(moon)
When plotting an image in this way, you will often need to know what actual values correspond to the colours. To find this out, we can draw a colour bar alongside the image which indicates the mapping of values to colours:
In [ ]:
plt.imshow(moon)
plt.colorbar()
You may notice that the default mapping of values to colours doesn't show the features of this image very well. Fortunately, matplotlib provides a large variety of colour maps which are suitable for various different purposes (more on this later). plt.imshow()
has a cmap
keyword argument which can be passed a string defining the desired colour map.
In [ ]:
# Display the image with a better colour map.
plt.imshow(moon, cmap='gray')
plt.colorbar()
The full list of available colour maps (for matplotlib 1.5) can be found here.
In [ ]:
# 1
# Load an image from skimage.data
my_image = data.coins()
# Print image shape and size
print(my_image.shape, my_image.size)
# Print data type and min&max of array
print(my_image.dtype, my_image.min(), my_image.max())
In [ ]:
# 2
# Display my image
plt.imshow(my_image, cmap='cubehelix')
plt.colorbar()
The default behaviour of imshow()
in terms of colour mapping is that the colours cover the full range of the data so that the lower end (blue, in the plots above) represents the smallest value in the array, and the higher end (red) represents the greatest value.
This is fine if rest of the values are fairly evenly spaced between these extremes. However, if we have a very low minimum or very high maximum compared to the rest of the image, this default scaling is unhelpful. To deal with this problem, imshow()
allows you to set the minimum and maximum values used for the scaling with the vmin
and vmax
keywords.
In [ ]:
plt.imshow(moon, cmap='gray', vmin=75, vmax=150)
plt.colorbar()
As you can see, this allows us to increase the contrast of the image at the cost of discounting extreme values, or we can include a broader range of values but see less detail. Similar effects can also be achieved with the norm
keyword, which allows you to set how imshow()
scales values in order to map them to colours (linear or logarithmic scaling, for example).
You will notice in the above plots that the axes are labelled with the pixel coordinates of the image. You will also notice that the origin of the axes is in the top left corner rather than the bottom left. This is a convention in image-drawing, but can be changed if necessary by setting the origin
keyword to 'lower'
when calling imshow()
:
In [ ]:
plt.imshow(moon, cmap='gray', origin='lower')
imshow()
also allows you to change the upper and lower values of each axis, and the appropriate tick labels will be drawn. This feature can be used to apply physical spatial scales to the image (if you know them) rather than going purely on pixel positions, which may be less useful. This is done with the extent
keyword, which takes a list of values corresponding to lower and upper x values and the lower and upper y values (in that order).
In [ ]:
plt.imshow(moon, cmap='gray', origin='lower', extent=[-1, 1, 0, 2])
In [ ]:
# 1
# Display the coins image with adjusted value range
plt.imshow(my_image, cmap='cubehelix', vmin=60, vmax=180)
plt.colorbar()
In [ ]:
# 2
pixelsize = 0.1
plt.imshow(my_image, cmap='cubehelix', vmin=60, vmax=180, extent=[0, my_image.shape[1]*pixelsize, 0, my_image.shape[0]*pixelsize])
plt.xlabel('x (cm)')
plt.ylabel('y (cm)')
plt.colorbar()
The image used in the examples above uses an image which is already supplied as an array by scikit-image. But what if we have been given an image file and we want to read it into Python?
There are many ways to do this, depending on the type of file. Typically in astronomy, images are stored in FITS format, which will be covered in detail later on. For now, we will return to the example of the Hubble image from earlier, which is stored in this repo in fig/galaxy.jpg. To load image data from a JPEG, we need the plt.imread()
function. This takes a filename which points at an image file and loads it into Python as a NumPy array.
In [ ]:
# Load the Hubble image from fig/galaxy.jpg
galaxy_image = plt.imread('fig/galaxy.jpg')
# PLot the image with imshow()
plt.imshow(galaxy_image)
You may notice that instead of using a colour map, this image has been plotted in full colour so it looks the same as the original image above. We can see why if we inspect the shape of the image array:
In [ ]:
galaxy_image.shape
Rather than just being a 2D array with a shape equivalent to the image resolution, the array has an extra dimension of length 3. This is because the image has been split into red, blue and green components, each of which are stored in a slice of the array. When given an n x m x 3 array like this, imshow()
interprets it as an RGB image and combines the layers into a single image.
However, if we wish to see the individual components they can be accessed and displayed by taking a slice of the array corresponding to the layer we wish to use.
In [ ]:
plt.imshow(galaxy_image[..., 0], cmap='Reds') # Plot the red layer of the image
plt.show()
plt.imshow(galaxy_image[..., 1], cmap='Greens') # Plot the green layer of the image
plt.show()
plt.imshow(galaxy_image[..., 2], cmap='Blues') # Plot the blue layer of the image
plt.show()
As we've already seen, multiple axes can be added to a single figure using plt.add_subplot()
. There is also a function that allows you to define several axes and their arrangement at the same time as the figure, plt.subplots()
.
This function returns a tuple of two objects - the figure and an array of axes objects with the specified shape. Referencing the axes array allows things to be plotted on the individual subplots.
In [ ]:
# Make a grid of 1 x 3 plots and show the Hubble image on the right.
fig, ax = plt.subplots(1, 3)
ax[2].imshow(galaxy_image)
plt.show()
In [ ]:
# 1
my_image = data.coffee()
# Create 2x2 grid of subplots
fig, axes = plt.subplots(2, 2)
# Plot image and image components with appropriate colour maps.
axes[0, 0].imshow(my_image)
axes[0, 1].imshow(my_image[..., 0], cmap='Reds')
axes[1, 0].imshow(my_image[..., 1], cmap='Greens')
axes[1, 1].imshow(my_image[..., 2], cmap='Blues')
We saw above that an RGB image array can be sliced to access one colour component. But the array can also be sliced in one or both of the image dimensions to crop the image. For instance, the smaller galaxy at the bottom of the image above occupies the space between about 200 and 400 pixels in the x direction, and stretches from about 240 pixels to the edge of the image in the y direction. This information allows us to slice the array appropriately:
In [ ]:
# Crop the image in x and y directions but keep all three colour components.
cropped_galaxy = galaxy_image[240:, 200:400, :]
plt.imshow(cropped_galaxy)
Similarly, if we need to reduce the image resolution for whatever reason, this can be done using array slicing operations.
In [ ]:
# Crop athe image and use only every other pixel in each direction to reduce the resolution.
lowres_galaxy = galaxy_image[240::2, 200:400:2, :]
plt.imshow(lowres_galaxy)
IMPORTANT NOTE: you should probably never do the above with actual astronomical data, because you're throwing away three quarters of your measurement. There are better ways to reduce image resolution which preserve much more of the data's integrity, and we will talk about these later. But it's useful to remember you can reduce an image's size like this, as long as you don't need that image for any actual science.
In order to display a smooth image, imshow()
automatically interpolates to find what values should be displayed between the given data points. The default interpolation scheme is 'linear'
, which interpolates linearly between points, as you might expect. The interpolation can be changed with yet another keyword in imshow()
. Here are a few examples:
In [ ]:
# Image with default interpolation
fig, ax = plt.subplots(2, 2, figsize=(16, 16))
smallim = galaxy_image[:100, 250:350, :]
ax[0, 0].imshow(smallim) # Default (linear) interpolation
ax[0, 1].imshow(smallim, interpolation='bicubic') # Bicubic interpolation
ax[1, 0].imshow(smallim, interpolation='nearest') # Nearest-neighbour interpolation
ax[1, 1].imshow(smallim, interpolation='none') # No interpolation
This can be a useful way to change how the image appears. For instance, if the exact values of the data are extremely important, little or no interpolation may be appropriate so the original values are easier to discern, whereas a high level of interpolation can be used if the smoothness of the image is more important than the actual numbers.
Note that that 'none'
in the imshow()
call above is NOT the same as None
. None
tells imshow()
you are not passing it a variable for the interpolation
keyword, so it uses the default, whereas 'none'
explicitly tells it not to interpolate.
We have already seen animation of data points on basic plots in a previous lesson. Animating an image is no different in principle. To demonstrate this, we'll set up an animation that shows the Hubble image and then cycles through each of the RGB components. This task requires all the same parts as an animation of a line or scatter plot:
matplotlib.animation
, a figure and an axes. Then we'll plot the initial image we want to display and return the plot object to a variable we can use for the animation.
In [ ]:
import matplotlib.animation as ani
# We'll need this for displaying animations
%matplotlib nbagg
fig, ax = plt.subplots()
display = plt.imshow(galaxy_image)
titles = ['Red component', 'Green component', 'Blue component', 'Combined image']
cmaps = ['Reds_r', 'Greens_r', 'Blues_r']
def animate(i):
try:
display.set_data(galaxy_image[..., i])
display.set_cmap(cmaps[i])
except IndexError:
display.set_data(galaxy_image)
ax.set_title(titles[i])
return display
myanim = ani.FuncAnimation(fig, animate, range(4), interval=1000)
plt.show()
In [ ]:
# 1
fig, ax = plt.subplots()
y0 = 0
y_ext = 120
display = plt.imshow(galaxy_image[y0:y0+y_ext, 200:400])
def pan(i):
y1 = y0 + i
display.set_data(galaxy_image[y1:y1+y_ext, 200:400])
return display
panimation = ani.FuncAnimation(fig, pan, range(galaxy_image.shape[0]-y_ext), interval=10)
plt.show()
A type of image file that you are quite likely to come across in astronomy is FITS (Flexible Image Transport System) files. This file type is used for storing various types of astronomical image data, including solar images. The advantage of FITS files is that as well as storing the numerical data which makes up the image, they also store a header associated with these data. The header usually contains information such as the spatial extent of the image, the resolution, the time at which the observation was taken, and various other properties of the data which may be useful when using the image for research. These pairs of data arrays and associated headers are stored in a HDU (Header-Data Unit). Several HDUs can be stored in a FITS file, so they are kept in a container called HDUList.
In [ ]:
from astropy.io import fits
import sunpy.data
#sunpy.data.download_sample_data()
aia_file = fits.open('/home/drew/sunpy/data/sample_data/aia.lev1.193A_2013-09-21T16_00_06.84Z.image_lev1.fits')
aia_file.verify('fix')
print(type(aia_file))
print(aia_file)
print(type(aia_file[1].data), type(aia_file[1].header))
print(aia_file[1].data)
print(aia_file[1].header['NAXIS1'])
print(aia_file[1].header['NAXIS2'])
print(aia_file[1].header['DATE-OBS'])
for tag in aia_file[1].header.keys():
print(tag, aia_file[1].header[tag])