Lab 6: Working with FITS Image Data and Manipulating Arrays

Names:

The main goal of this lab is to create "master" calibration frames from the raw calibration frames we took at Smith. Your homework will then involve applying the calibrations you created in this lab to the cluster and standard star images we took, so that you can work with calibrated science frames.

To work with the calibration frames, we will learn new methods for organizing files, working with arrays of image data, using the FITS data we took from our first observing night at Smith. Along the way, we will learn to use a few Unix tasks from within Python, and we will use and write a number of functions that we will later use outside of the Jupyter Notebook environment.

Part 1: Sorting Data

When you pulled the class Git repository for this lab, you will have also downloaded the zipped folder with all of the FITS files we took at Smith.

First, unzip this file, which will decompress all of files into a single folder, 2017oct04.

To start working with different groups of images, it will be help to first organize all of those files into subfolders. We will import our usual Python modules and a few others needed for this lab:


In [ ]:
# The standard fare:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

# Recall our use of this module to work with FITS files in Lab 4:
from astropy.io import fits 

# This lets us use various Unix (or Unix-like) commands within Python:
import os 

# We will see what this does shortly.
import glob

1.1 Using Glob

glob is an extremely useful function in Python. First, move into your data directory using cd, then ls to see the file list:


In [ ]:
cd 2017oct04/

In [ ]:
ls

Now, execute the following two cells.


In [ ]:
a_few_files = glob.glob('science*.fit')

In [ ]:
a_few_files

What does glob do? What kind of Python object is a_few_files?
Answer:



Now use glob to create a new variable, all_fits, that contains the names of all of the FITS files in your directory. We will use this variable a number of times.


In [ ]:
# Finish this cell

all_fits =

1.2 Iterating to View Header Info

We now would like to take all of our FITS files and sort them into subfolders based on the type of image, e.g., calibration, science, standard star, etc. Helpfully, the file names include some clues about what sort of files we have. However, if you remember from our observing run and log sheets, not all of the file names are always correct. This can happen quite easily at the telescope, because camera software programs typically have various automated ways of naming files (often different from observatory to observatory) and sometimes require the user to remember to change settings while observing.

(To avoid this, some observatories name their files in a uniform way, like "Image_00001.fits" or with a time/date stamp, like "NACO_2017-10-05T00:04:18.fits", which provide unique identifiers.)

Therefore, to check our data and sort it accurately, we'll look at the header information first to get a sense of the type of files we're dealing with. There are a couple different ways we can look at FITS headers; in Lab 4, we used fits.open, which is very versatile. Here, we'll use two new functions to quickly view the header and data:

fits.getheader('filename')

and

fits.getdata('filename')


In [ ]:
# Complete this cell to save the header of the zeroth (that is, the first) FITS file in the all_fits list:

test_header =

In [ ]:
# Now, view the header info here:

We can also use our new variable to view specific header keyword values, such as the image type, or object:
("IMAGETYP" and "OBJECT", since FITS files can only have 8 character keywords)


In [ ]:
test_header['IMAGETYP']

In [ ]:
test_header['OBJECT']

We are going to use for loops to iterate over each FITS file in our directory, so we can quickly see what kinds of FITS files we have based on keyword.

If iteration is newer to you (or you'd like a brief refresher), check out the other notebook in the main directory ('Unix_Programming_Refresher') for some quick reference exercises.


In [ ]:
# Here is an example for loop that provides all of the image types. 
# (1) Run this function to view all of the 'IMAGETYP' values, then
# (2) Edit and re-run the function to view all of the 'OBJECT' values, then 
# (3) Edit and re-run once more to view a header keyword value of your choice.

for filename in all_fits:
    header = fits.getheader(filename)
    filetype = header['IMAGETYP']
    print(filetype)

1.3 The Sorting Function

Because data taken in different filters and with different exposure times require matching calibration files, our goal is to create the following directory structure:

We want a generic function that:

(1) takes as input:

  • the name of a single FITS file,
  • the desired folder name,
  • the desired type of FITS file,
  • the header keyword to match the FITS file type

(2) reads in the file's header information,

(3) reads in the file's type based on the keyword

(4) checks if the desired folder name exists, and makes a new directory if it doesn't

(5) checks if the file's type matches the desired type of fitsfile

(6) moves the file into the new or existing folder, if the file matches the right file type.


We've provided a function that does all of these steps, but lacks a docstring. You'll be using this function multiple times, so discuss within your group exactly how the function works, then update the docstring and add comments within the function (using #).


In [ ]:
def filesorter(filename, foldername, fitskeyword_to_check, keyword):
    '''
    Edit this docstring to describe the function purpose and use.
    '''
    if os.path.exists(filename):
        pass
    else:
        print(filename + " does not exist or has already been moved.")
        return
    
    header = fits.getheader(filename)
    fits_type = header[keyword]
    
    if os.path.exists(foldername):
        pass
    else:
        print("Making new directory: " + foldername)
        os.mkdir(foldername)
        
    if fits_type == fitskeyword_to_check:
        destination = foldername + '/'
        print("Moving " + filename + " to: ./" + destination + filename)
        os.rename(filename, destination + filename)  
    return

Now test out the filesorter function for a single file, cal-001_Rflat.fit. We know this should be a calibration file from the name, but the 'calibration' folder doesn't exist yet, and we can test whether or not it is actually a 'cal' file by checking the 'OBJECT' keyword.

Complete and execute the cell below:


In [ ]:
filesorter()

1.4 Sort all the calibration data

Below, write your own for loop that goes through each file in all_fits, and applies filesorter to each file based on your choice of folder name ('calibration'), keyword value, and keyword.


In [ ]:
# Your for loop to sort calibration data here:

In [ ]:
# Move into the calibration folder you created:

In [ ]:


In [ ]:
# Re-glob the fits files in the calibration director to a new variable

cal_files =

In the following three cells, use the same for loop structure to make subdirectories for biasframes, darks, and flats. We won't be using 'OBJECT' as the keyword, since that's a more generic category -- decide which keyword is most appropriate here.


In [ ]:
# Loop to sort flats:

In [ ]:
# Loop to sort bias frames:

In [ ]:
# Loop to sort dark frames:

In [ ]:
# Below, list the contents of each subfolder to make sure things moved correctly:

In [ ]:
ls flats

In [ ]:
ls biasframes

In [ ]:
ls darks

Part 2: Working with Array Data!

Now that our calibration data are all sorted into folders, we'll start to work with the raw frames to make the master calibration files.

2.1 Bias Frames

It's simplest to create a master bias frame first, so we'll start with the bias frames. Change directories into your newly-created biasframes folder and ls to make sure everything has transferred correctly:


In [ ]:
cd biasframes

In [ ]:
ls

Determining bias frame properties

The first thing we'll need to do, as before, is to use glob to create a new list of only bias frames to work with:


In [ ]:
# Complete to create a new bias frame list:

biasfiles =

As discussed in your reading and in lecture, what we ultimately want is a median combination of all of the individual bias frames. So we will create a stack of bias images, and then we will take the median values at each pixel location in the stack to create the final combined frame.


In [ ]:
# Complete: Define a new variable, n, and determine how many bias files there are (length of biasfiles):
n = 

# And use fits.getdata to read in the data for only the first bias frame (the zeroth element of the bias list):
first_bias_data =

Of course, our FITS images are 2-D, and we will be working with arrays of various dimensions -- not all FITS images are the same size (e.g. 1024 x 1024). In numpy, we can define arrays of various sizes as follows. Execute the cell, and Jupyter will print the formatted version:


In [ ]:
np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])

To determine the shape of the array, we can simply use the .shape command:


In [ ]:
test_array = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
test_array.shape

In the cell below, use this method to determine the dimensions of the first bias image and define them as new variables:


In [ ]:
# Complete to get the dimensions of first_bias_data, then check the values

imsize_y, imsize_x = 
imsize_y, imsize_x

How many bias files are there?
Answer:

What are the dimensions of the first bias frame?
Answer:

Creating a blank 3-D array and adding images to it

In order to take the median value of the stacked bias frames, we'll need to insert them into a larger array first. We can do this by creating a "blank" 3-D array filled with zero values with dimensions of (y dimension, x dimension, number of images):


In [ ]:
# Create blank stack of arrays to hold each of the frames
# **IMPORTANT** Note that Y is listed first! This is a pecularity in how python reads arrays:

biasarray = np.zeros((imsize_y, imsize_x , n), dtype = np.float32) 

# Now check the shape of our new "blank" array:
biasarray.shape

In [ ]:
# In this cell, check what the values in the biasarray look like now:

We can make an image stack of bias frames by adding the data from each of the FITS files into the "blank" stack, one by one:


In [ ]:
# Insert each bias frame into a three dimensional stack, one by one:

for ii in range(0, n):
    im = fits.getdata(biasfiles[ii])
    biasarray[:,:,ii] = im

In [ ]:
# How do the biasarray values look now?
biasarray

Taking the median and saving the master bias:

Now the final steps to create a master bias frame are to:

(1) take the median of the 3-D array, along the appropriate axis, which will collapse the image to a regular two dimensional array,

and

(2) save this new 2-D array -- the master bias -- as a brand-new fits file with a new name, giving it the same header as the the first bias image for simplicity.


In [ ]:
# Take the median of the 3-D stack: 
med_bias = np.median(biasarray, axis = 2) # axis=2 means stack along the *third* axis, since python is zero-indexed

In [ ]:
# And get the header for the first image in the list:
biasheader = fits.getheader(biasfiles[0])

# Define a name for the new output FITS file:
master_bias = 'Master_Bias.fit'

# Save your new FITS file!
fits.writeto(master_bias, med_bias, biasheader, clobber=True)

What inputs does the fits.writeto function require to save a new FITS files?
Answer:

The final step is to check the final master bias frame to see if it appears normal. In DS9, open up a single raw calibration bias frame as well as the the new Master_Bias.fit that you just created.

How do the two compare? Do the pixel values seem reasonable? Do the dimensions of the image make sense?
Answer:

Median Combination Function

For ease of use, let's write all of those proceeding steps into a single function that we can re-use later. Edit the docstring below and add any comments on how to use the function that future you will find helpful.


In [ ]:
def mediancombine(filelist):
    '''
    Edit this docstring accordingly!
    '''
    n = len(filelist)
    first_frame_data = fits.getdata(filelist[0])
    imsize_y, imsize_x = first_frame_data.shape
    fits_stack = np.zeros((imsize_y, imsize_x , n), dtype = np.float32) 
    for ii in range(0, n):
        im = fits.getdata(filelist[ii])
        fits_stack[:,:,ii] = im
    med_frame = np.median(fits_stack, axis=2)
    return med_frame

In [ ]:
# Now our step to create a median of all the bias frames is much simpler! 
median_bias = mediancombine(biasfiles)

# Below, how would you save the new median_bias frame as a FITS file? 
# Complete the function below and save the duplicate master bias as "Backup_MasterBias.fit")

fits.writeto()

One last note on the master bias -- we will need to determine the path to the Master_Bias.fit file, because we will use functions that call it from different folders. We can do this using os.getcwd (get current working directory) and adding a string with the filename, as follows:


In [ ]:
master_bias_path = os.getcwd() + '/Master_Bias.fit'
master_bias_path

2.2 Dark Frames

Bias subtracting the darks:

We want to median combine our darks, but contribution from bias is present in every image taken, so our first step after creating a master bias frame is always to subtract it from every other image. Array subtraction is more straightforward -- as long as two arrays have the same dimension, they can be subtracted from one another on a pixel-by-pixel basis.

Below, write a generalized function below that subtracts Master_Bias.fit from a single frame. Your function should:

1) take a FITS file name and path to the master bias as inputs,
2) load in the data for the file to be calibrated,
3) loads in the header information for that file,
4) loads in the data for Master_Bias.fit as a variable (remember, it's located in a different folder than where you are now! Hence the second input),
5) subtract the bias from the input FITS file, and
6) save (writeto) the new bias-subtracted FITS file with a modified name (e.g., cal-001_dark60.fit would become b_cal-001_dark60.fit, for bias-subtraction).

Once you've written and tested your function, you will apply it to all of the dark frames.

The first step is to move into the darks directory from your current location and glob the dark files.


In [ ]:
cd ../darks

In [ ]:
darkfiles = glob.glob('*fit')

In [ ]:
# Write your bias subtraction function here:

def bias_subtract(filename, path_to_bias):
    '''
    Add your docstring here.
    '''
    # Your code goes here.
    fits.writeto('b_' + filename, ) # finish this code too to save the FITS
    return

In [ ]:
# Test out your function on an individual frame (remember, we defined "master_bias_path" just before Section 2.2:

bias_subtract('cal-001_dark60.fit', master_bias_path)

# Did it work? You can test whether the bias subtraction worked by viewing the before/after frames in DS9.

In [ ]:
# Now write and execute a for loop that subtracts the bias from each of the dark frames.

You should now have twice the number of dark frames in your directory, half of which have the prefix 'b_'. These are the frames we want to median combine into the master dark!


In [ ]:
ls

In the cell below, use the mediancombine function from earlier to combine all of the bias-subtracted dark frames into a single master dark. We will call it Master_Dark_60s.fit , since dark frames need to match exposure times.

Be careful using glob to select only the darks that have been bias-subtracted!


In [ ]:
# Your lines of code here:

In [ ]:
ls

How does the master dark compare to a single raw dark frame? Take a look in DS9 and compare:
Answer:

2.3 Flat Fields

The final master calibration we want to create is a master flat field (flat). As you may have noticed during our observing night, the features that appear in the flats are highly specific to the filters in which they are taken -- so we will end up with two master flat fields. Therefore, our first steps will be to cd into the flats folder, glob files, and run our filesorter function to make the two flat subfolders in our diagram, 'Vband' and 'Rband'.


In [ ]:
cd ../flats/

In [ ]:
# Sort by filter into new subfolders below, using the filesorter function and updating its inputs as needed:

In [ ]:
ls

We'll work just with Vband for now, so go into that directory, and you'll work with the Rband reduction in the homework. Like always, our first step is to subtract the master bias. Do this below for all of the files, using your bias_subtract function.


In [ ]:
cd Vband

In [ ]:
# Bias-subtract the flat fields:

In [ ]:
ls

Dark subtract the flat fields:

Now, we will want to subtract the dark contribution from the flat fields. This can be accomplished by creating a new function below, dark_subtract, that looks very much like your bias_subtract function.

  • Most importantly, make sure that the dark you subtract matches the exposure time of the flat fields!

Check the flat exposure times in the cell below. What is/are the value(s)?
Answer:


In [ ]:
# Check the flat field exposure times here for the files in your directory:

Typically you would have to scale the 60s master dark to different exposure times, but to save you a bit of effort, we've scaled them for you ahead of time. Both 1s and 10s master dark frames can be found in the "ExtraFiles" folder in the top level directory. Copy these files into your 'darks' folder.

When you save your dark-subtracted FITS file, be sure to add another prefix to the file name, and it's important to only dark subtract the bias-subtracted images. For example, this would change 'b_cal-001_Vflat.fit' to 'db_cal-001_Vflat.fit'.


In [ ]:
# Copy other master darks to directory in the following cell:

In [ ]:


In [ ]:
# Write your dark subtraction function here:

def dark_subtract(filename, path_to_dark):
    '''
    Add your docstring here.
    '''
    
    # Your code goes here.
    
    return

In [ ]:
# Now dark subtract the bias-subtracted flat fields:



# Did that work?

In [ ]:
ls

2.4 Making a Master Flat Field

The final step in creating master calibrations is to make a master flat field. Before we median combine into a single image, we want to divide each individual flat by its median pixel value, such that all the pixel values in each frame have values of approximately ~1.0 -- that is, we want to normalize them. Only then do we median combine the stack of normalized flat fields to create a master flat.

We can do this in a single function by editing the mediancombine function from earlier, and simply adding a single new line of code.

In line 11 below, add this extra line of code that normalizes im before adding it to the 3D array of fits_stack:


In [ ]:
def norm_combine_flats(filelist):
    '''
    Edit this docstring accordingly!
    '''
    n = len(filelist)
    first_frame_data = fits.getdata(filelist[0])
    imsize_y, imsize_x = first_frame_data.shape
    fits_stack = np.zeros((imsize_y, imsize_x , n), dtype = np.float32) 
    for ii in range(0, n):
        im = fits.getdata(filelist[ii])
        norm_im =  # finish new line here to normalize flats
        fits_stack[:,:,ii] = norm_im
    med_frame = np.median(fits_stack, axis=2)
    return med_frame

In the following cells, run norm_combine_flats on your list of dark-subtracted, bias-subtracted frames (only 3 images!), and then check the values of the output to ensure they're close to 1.0.


In [ ]:
# Make your list of files first, as usual:

In [ ]:
# Apply norm_combine_flats to that list:

In [ ]:
# Look at the output of the variable you defined in the previous cell to check the values:

In [ ]:
# As a final step, finish the code below to save the master flat as a new fits file (Master_Flat_Vband.fit),
# with the header taken from the first frame in the flat list.

flat_header =

fits.writeto('Master_Flat_Vband.fit', )

Examine your master flat file in DS9 and compare it to one of the raw flat frames. What do you notice about the two frames, qualitatively and quantitatively?
Answer:


In [ ]: