In [1]:
import numpy as np
import scipy.fftpack as fftpack
from astropy.table import Table
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
%matplotlib inline
font_prop = font_manager.FontProperties(size=16)

Problem 1: damped harmonic oscillator example

Generating a light curve


In [2]:
a = Table()
a.meta['dt'] = 0.0001  # time step, in seconds
a.meta['duration'] = 200 # length of time, in seconds
a.meta['omega'] = 2*np.pi  # angular frequency, in radians per second
a.meta['phi'] = 0.0  # offset angle, in radians

1a. Compute the time steps and a cosine harmonic with the above-defined properties.

For plotting ease below, save them in the table as time and har.


In [3]:
a['time'] = np.arange(0, a.meta['duration'], a.meta['dt']) # seconds
a['har'] = np.cos(a.meta['omega'] * a['time'] + a.meta['phi'])

1b. Compute four exponentially damped versions of the harmonic oscillation.

$$D(t_i) = e^{-\zeta t_i}H(t_i)$$

Pick your own four $\zeta$s! I recommend values between 0.01 and 1.

Save them in the table as damp1, damp2, etc.


In [4]:
a.meta['zeta1'] = 0.01
a['damp1'] = np.exp(-a['time'] * a.meta['zeta1']) * a['har']
a.meta['zeta2'] = 0.2
a['damp2'] = np.exp(-a['time'] * a.meta['zeta2']) * a['har']
a.meta['zeta3'] = 0.5
a['damp3'] = np.exp(-a['time'] * a.meta['zeta3']) * a['har']
a.meta['zeta4'] = 1.0 
a['damp4'] = np.exp(-a['time'] * a.meta['zeta4']) * a['har']

1c. Plot them all on top of each other.


In [5]:
fig, ax = plt.subplots(1, 1, figsize=(8, 4), dpi=300)
ax.plot(a['time'], a['har'], lw=2, linestyle='-', color='black')
ax.plot(a['time'], a['damp1'], lw=2, linestyle='-', color='orange')
ax.plot(a['time'], a['damp2'], lw=2, linestyle='-.', color='blue')
ax.plot(a['time'], a['damp3'], lw=2, linestyle='--', color='magenta')
ax.plot(a['time'], a['damp4'], lw=2, linestyle='-', color='green')
ax.set_xlim(0,8)
ax.set_xlabel("Time (seconds)", fontproperties=font_prop)
ax.set_ylabel("Amplitude", fontproperties=font_prop)
ax.tick_params(axis='both', which='major', labelsize=16, 
               top=True, right=True, bottom=True, left=True)
plt.show()


1d. Take the power spectrum of the harmonic and 4 damped harmonic time series.

The power $P$ at each frequency $\nu_i$, for the Fourier transform $F$, is $$P(\nu_i)=|F(\nu_i)|^2$$ Again, for plotting ease, save as pow_har, pow_damp1, etc.


In [6]:
a['pow_har'] = np.abs(fftpack.fft(a['har'])) ** 2
a['pow_damp1'] = np.abs(fftpack.fft(a['damp1'])) ** 2
a['pow_damp2'] = np.abs(fftpack.fft(a['damp2'])) ** 2
a['pow_damp3'] = np.abs(fftpack.fft(a['damp3'])) ** 2
a['pow_damp4'] = np.abs(fftpack.fft(a['damp4'])) ** 2

1e. Plot them!

Notice the trend between the width of the peak in the power spectrum, and the strength of the damping factor. For bonus points, put in a key/legend with the corresponding $\zeta$ value for each curve.


In [7]:
a['freq'] = fftpack.fftfreq(len(a), d=a.meta['dt'])
a.meta['nyq_ind'] = int(len(a)/2.)  # the index of the last positive Fourier frequency

fig, ax = plt.subplots(1, 1, figsize=(8, 5), dpi=300, tight_layout=True)
ax.plot(a['freq'][0:a.meta['nyq_ind']], a['pow_har'][0:a.meta['nyq_ind']]/3e9, 
        lw=2, drawstyle='steps-mid', color='black', label='undamped')
ax.plot(a['freq'][0:a.meta['nyq_ind']], a['pow_damp1'][0:a.meta['nyq_ind']]/1e9, 
        lw=2, drawstyle='steps-mid', linestyle='-', color='orange', 
        label=r'$\zeta$ = %.2f' % a.meta['zeta1'])
ax.plot(a['freq'][0:a.meta['nyq_ind']], a['pow_damp2'][0:a.meta['nyq_ind']]/1e9, 
        lw=2, drawstyle='steps-mid', linestyle='-.', color='blue', 
        label=r'$\zeta$ = %.2f' % a.meta['zeta2'])
ax.plot(a['freq'][0:a.meta['nyq_ind']], a['pow_damp3'][0:a.meta['nyq_ind']]/1e9, 
        lw=2, drawstyle='steps-mid', linestyle='--', color='magenta', 
        label=r'$\zeta$ = %.2f' % a.meta['zeta3'])
ax.plot(a['freq'][0:a.meta['nyq_ind']], a['pow_damp4'][0:a.meta['nyq_ind']]/1e9, 
        lw=2, drawstyle='steps-mid', color='green', 
        label=r'$\zeta$ = %.2f' % a.meta['zeta4'])
ax.set_xlim(0.5, 1.5)
ax.set_ylim(1e-3, 5e2)
ax.set_yscale('log')
ax.set_xlabel("Frequency (Hz)", fontproperties=font_prop)
ax.set_ylabel("Amplitude (arbitrary)", fontproperties=font_prop)
ax.tick_params(axis='both', which='major', labelsize=16, 
               top=True, right=True, bottom=True, left=True)
ax.legend(prop=font_prop)
plt.show()


Problem 2: Analyzing NICER data of the black hole X-ray binary MAXI J1535-571

Import it with astropy tables from the file "J1535_evt.fits".


In [8]:
j1535 = Table.read("./J1535_evt.fits", format='fits')

The data have come to us as an 'event list': a list of the times at which photons were detected (in seconds, in spacecraft clock time) and the energy of each photon (as a channel number; channel/100 = photon energy in keV).

2a. Turn this rag-tag list of photons into an evenly-spaced light curve

2a.i. First, clean it up a little by only keeping photons with energies greater than 1 keV and less than 12 keV, using a mask.


In [9]:
energy_mask = (j1535['ENERGY'] >= 100) & (j1535['ENERGY'] <= 1200)  # channels; 1 keV = channel 100
j1535 = j1535[energy_mask]

2a.ii. Then, make an evenly-spaced light curve array with np.histogram. Pick a light curve time resolution of dt = 1/8 seconds to start with. To put your light curve in units of count rate, divide the histogram (counts per bin) by dt (seconds per bin); to avoid typecasting errors, do this by multiplying by int(1/dt).


In [10]:
dt = 1./8. # seconds
tbins = np.arange(j1535['TIME'][0], j1535['TIME'][-1]+dt, step=dt)
lc, tbin_edges = np.histogram(j1535['TIME'], bins=tbins)
lc *= int(1/dt)

(yes, it takes a second or two; you're using half a million time bins in your light curve!)

2b. Let's try taking the power spectrum of it.


In [11]:
psd = np.abs(fftpack.fft(lc)) ** 2

Plot it!


In [12]:
freq = fftpack.fftfreq(len(psd), d=dt)
nyq_ind = int(len(psd)/2.)  # the index of the last positive Fourier frequency

fig, ax = plt.subplots(1, 1, figsize=(8, 5), dpi=300, tight_layout=True)
ax.plot(freq[0:nyq_ind], psd[0:nyq_ind].real, lw=1, drawstyle='steps-mid', color='black')
ax.set_yscale('log')
ax.set_xlabel("Frequency (Hz)", fontproperties=font_prop)
ax.set_ylabel("Power/Hz", fontproperties=font_prop)
ax.tick_params(axis='both', which='major', labelsize=16, 
               top=True, right=True, bottom=True, left=True)
plt.show()


It's ugly! But more importantly, you can't get useful science out of it.

What's going on?

  1. There are gaps in the light curve due to the orbit of the spacecraft (and occasionally stuff gets in the way). This has the effect of inserting top-hat windows into our function, which give the lumpy bumps at ~0.25 Hz (see the sketch after this list). So, we need to break the light curve up into shorter segments that won't have weird drop-outs.
  2. There is a giant DC component at $\nu=0$. This is not astrophysical in origin, but from the mean of the light curve.
  3. Power spectra are often plotted on log-log scales, but the power gets really noisy and 'scatter-y' at higher frequencies.
  4. The astute observer will notice that we can only go up to a Nyquist frequency of 4 Hz. There are interesting astrophysical signals above 4 Hz, but if we used a smaller dt while keeping the very long segment length, we'd have >1 million time bins, which can be asking a lot of a laptop processor.
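To see point 1 in isolation, here is a minimal sketch with a hypothetical toy signal (not the NICER data): a pure 1 Hz sinusoid multiplied by a periodic top-hat window. The window scatters power into side lobes around the 1 Hz peak and bumps at harmonics of the window period.


In [ ]:
toy_t = np.arange(0, 512, 1./8.)  # seconds
toy_har = np.sin(2 * np.pi * 1.0 * toy_t)  # a pure 1 Hz sinusoid
toy_window = ((toy_t % 64) < 40).astype(float)  # drops out for 24 s of every 64 s
toy_psd = np.abs(fftpack.fft(toy_har * toy_window)) ** 2
toy_freq = fftpack.fftfreq(len(toy_psd), d=1./8.)

fig, ax = plt.subplots(1, 1, figsize=(8, 4))
## Skip the zero-frequency bin; the window has a non-zero mean
ax.plot(toy_freq[1:len(toy_freq)//2], toy_psd[1:len(toy_psd)//2], lw=1, color='black')
ax.set_yscale('log')
ax.set_xlim(0, 2)
ax.set_xlabel("Frequency (Hz)", fontproperties=font_prop)
ax.set_ylabel("Power", fontproperties=font_prop)
plt.show()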

2c. Segments!

2c.i. Turn your light curve code from 2a.ii. into a function:


In [13]:
def make_lc(events, start, end, dt):
    # these are EDGES, hence end+dt, so that 'end' gets used
    tbins = np.arange(start, end+dt, step=dt)  
    lc, tbin_edges = np.histogram(events, bins=tbins)
    lc *= int(1/dt)  # gives lc units of counts per second
    return lc

2c.ii. Sometimes the detector is on and recording photons, but it's pointed too close to the Earth, or a structure on the spacecraft is occulting part of the view, or the instrument is moving through a zone of high particle background, or other things. The times when these things happen are recorded, and in data reduction you make a list of Good Time Intervals, or GTIs, which tell you when you can use the science data. I made a list of GTIs for this data file that are longer than 4 seconds, which you can read in from "J1535_gti.fits".


In [14]:
gti_tab = Table.read("./J1535_gti.fits", format='fits')

2c.iii. Not only do we want to use just the data in the GTIs, but we also want to split the light curve up into multiple equal-length segments, take the power spectrum of each segment, and average them together. By using shorter time segments, we can use a finer dt on the light curves without having so many bins that the computation grinds our computer to a halt. There is the added bonus that the noise amplitudes tend to cancel each other out while the signal amplitudes add, so we get better signal-to-noise!

As you learned in Jess's lecture yesterday, setting the length of the segment determines the lowest frequency you can probe, but for stellar-mass compact objects where we're usually interested in variability above ~0.1 Hz, this is an acceptable trade-off.
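As a quick sanity check of that trade-off (a minimal sketch; the seg_length and dt values here match the choices in the next cell), the segment length sets the lowest frequency and dt sets the Nyquist frequency:


In [ ]:
seg_length = 32.  # seconds
dt = 1./128.  # seconds
f_min = 1. / seg_length   # lowest non-zero Fourier frequency, in Hz
f_nyq = 1. / (2. * dt)    # Nyquist frequency, in Hz
print("Frequency range: %.5f Hz to %.1f Hz" % (f_min, f_nyq))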


In [15]:
time = np.asarray(j1535['TIME'])  ## Doing this so that we can re-run
seg_length = 32. # seconds
dt = 1./128.# seconds
n_bins = int(seg_length/dt) # Number of time bins in a segment of light curve
psd_avg = np.zeros(n_bins)  # initiating, to keep running sum (then average at end)
n_seg = 0
for (start_gti, stop_gti) in zip(gti_tab['START'], gti_tab['STOP']):
    start_time = start_gti
    end_time = start_time + seg_length
    while end_time <= stop_gti:
        ## Make a mask of events in this segment
        seg_mask = time < end_time
        time_seg = time[seg_mask]
        ## Keep the stuff not in this segment for next time
        time = time[~seg_mask]
        ## Make the light curve
        lc_seg = make_lc(time_seg, start_time, end_time, dt)
        ## Turn that into a power spectrum
        psd_seg = np.abs(fftpack.fft(lc_seg)) ** 2
        ## Keep a running sum (to average at end)
        psd_avg += psd_seg
        ## Print out progress
        if n_seg % 5 == 0:
            print(n_seg)
        ## Incrementing for next loop
        n_seg += 1
        start_time += seg_length
        end_time += seg_length
psd_avg /= n_seg


0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75

Plot it!


In [16]:
freq = fftpack.fftfreq(len(psd_avg), d=dt)
nyq_ind = int(len(psd_avg)/2.)  # the index of the last positive Fourier frequency

fig, ax = plt.subplots(1, 1, figsize=(8, 5), dpi=300, tight_layout=True)
ax.plot(freq[0:nyq_ind], psd_avg[0:nyq_ind].real, lw=1, drawstyle='steps-mid', color='black')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel("Frequency (Hz)", fontproperties=font_prop)
ax.set_ylabel("Power/Hz", fontproperties=font_prop)
ax.tick_params(axis='both', which='major', labelsize=16, 
               top=True, right=True, bottom=True, left=True)
plt.show()


2d. Mean-subtracted

So, you can see something just to the left of 10 Hz much more clearly, but there's this giant whopping signal at the lowest frequency bin. This is what I've heard called the 'DC component', which arises from the mean count rate of the light curve. To get rid of it, subtract the mean from each light curve segment before taking the Fourier transform. Otherwise, keep the same code as above for 2c.iii. You may want to put some of this in a function for future use in this notebook.


In [17]:
time = np.asarray(j1535['TIME'])  ## Doing this so that we can re-run
seg_length = 32. # seconds
dt = 1./128.# seconds
n_bins = int(seg_length/dt) # Number of time bins in a segment of light curve
psd_avg = np.zeros(n_bins)  # initiating, to keep running sum (then average at end)
mean_avg = 0  # initiating, to keep running sum (then average at end)
n_seg = 0
for (start_gti, stop_gti) in zip(gti_tab['START'], gti_tab['STOP']):
    start_time = start_gti
    end_time = start_time + seg_length
    while end_time <= stop_gti:
        seg_mask = time < end_time
        time_seg = time[seg_mask]
        time = time[~seg_mask]
        lc_seg = make_lc(time_seg, start_time, end_time, dt)
        seg_mean = np.mean(lc_seg)
        mean_avg += seg_mean
        psd_seg = np.abs(fftpack.fft(lc_seg-seg_mean)) ** 2
        psd_avg += psd_seg
        if n_seg % 5 == 0:
            print(n_seg)
        n_seg += 1
        start_time += seg_length
        end_time += seg_length
psd_avg = psd_avg.real / n_seg
mean_avg /= n_seg


0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75

2e. Error on average power

The average power at a particular frequency has a chi-squared distribution with two degrees of freedom about the true underlying power spectrum. So, the error on the average power is the power divided by the square root of the number of segments averaged. A big reason why we love power spectra (/periodograms) is that the error estimate is so straightforward!

One way to intuitively check whether your errors are way over- or under-estimated is to see whether the size of the error bars looks commensurate with the bin-to-bin scatter of the power at neighbouring frequencies.


In [18]:
err_psd = psd_avg.real / np.sqrt(n_seg)

Plotting, this time with ax.errorbar instead of ax.plot.


In [19]:
freq = fftpack.fftfreq(len(psd_avg), d=dt)
nyq_ind = int(len(psd_avg)/2.) 

fig, ax = plt.subplots(1, 1, figsize=(8, 5), dpi=300, tight_layout=True)
ax.errorbar(freq[0:nyq_ind], psd_avg[0:nyq_ind], yerr=err_psd[0:nyq_ind], lw=1, 
            drawstyle='steps-mid', color='black')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel("Frequency (Hz)", fontproperties=font_prop)
ax.set_ylabel("Power/Hz", fontproperties=font_prop)
ax.tick_params(axis='both', which='major', labelsize=16, 
               top=True, right=True, bottom=True, left=True)
plt.show()


The thing at ~8 Hz is a low-frequency QPO, and the hump at-and-below 1 Hz is broadband noise (which we'll discuss in detail this afternoon)!! Now that you've got the basic analysis steps complete, we'll focus on plotting the data in a meaningful way so you can easily extract information about the QPO and noise.

2f. Re-binning

We often plot power spectra on log-log scaled axes (so, log on both X and Y), and you'll notice that there's a big noisy part above 10 Hz. It is common practice to bin up the power spectrum geometrically, which makes the bins approximately equally spaced when log-plotted.

For this example, I'll use a re-binning factor of 1.03. If new bin 1 has the width of one old bin, new bin 2 will be 1.03 old bins wide, new bin 3 will be 1.03 times wider than that (1.03 times the width of new bin 2), and so on. For the first several bins, the width will round to one old bin (since a new bin must contain an integer number of old bins), but eventually a new bin will span two old bins, then more and more as you move higher in frequency. If the idea isn't quite sticking, try drawing out a representation of the old bins and how the new bins get progressively larger by the re-binning factor.

For a given new bin x that spans indices a to b in the old bin array: $$\nu_{x} = \frac{1}{b-a}\sum_{i=a}^{b}\nu_{i}$$ $$P_{x} = \frac{1}{b-a}\sum_{i=a}^{b}P_{i}$$ $$\delta P_{x} = \frac{1}{b-a}\sqrt{\sum_{i=a}^{b}(\delta P_{i})^{2}}$$
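If it helps, here is a minimal sketch (separate from the rebin function below) that just prints how the geometric bin widths grow for a re-binning factor of 1.03; the width rounds to one old bin for the first ~14 new bins, then starts increasing:


In [ ]:
real_width = 1.0  # unrounded width of the current new bin, in old bins
edge = 0          # index of the first old bin in the current new bin
for x in range(20):
    width = int(round(real_width))  # integer number of old bins in new bin x
    print("new bin %2d: old bins %4d to %4d (width %d)" % (x, edge, edge + width - 1, width))
    edge += width
    real_width *= 1.03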


In [20]:
def rebin(freq, power, err_power, rebin_factor=1.05):
    """
    Re-bin the power spectrum in frequency space by some re-binning factor
    (rebin_factor > 1). This is sometimes called 'geometric re-binning' or 
    'logarithmic re-binning', as opposed to linear re-binning 
    (e.g., grouping by 2)

    Parameters
    ----------
    freq : np.array of floats
        1-D array of the Fourier frequencies.

    power : np.array of floats
        1-D array of the power at each Fourier frequency, with any/arbitrary
        normalization.

    err_power : np.array of floats
        1-D array of the error on the power at each Fourier frequency, with the
        same normalization as the power.

    rebin_factor : float
        The factor by which the data are geometrically re-binned.

    Returns
    -------
    rb_freq : np.array of floats
        1-D array of the re-binned Fourier frequencies.

    rb_power : np.array of floats
        1-D array of the power at the re-binned Fourier frequencies, with the
        same normalization as the input power array.

    rb_err : np.array of floats
        1-D array of the error on the power at the re-binned Fourier
        frequencies, with the same normalization as the input error on power.
    """
    assert rebin_factor >= 1.0

    rb_power = np.asarray([])  # Array of re-binned power
    rb_freq = np.asarray([])   # Array of re-binned frequencies
    rb_err = np.asarray([])    # Array of error in re-binned power
    real_index = 1.0           # The unrounded next index in power
    int_index = 1              # The int of real_index, added to current_m every iteration
    current_m = 1              # Current index in power
    prev_m = 0                 # Previous index m

    ## Loop through the length of the array power, new bin by new bin, to
    ## compute the average power and frequency of that new geometric bin.
    while current_m < len(power):

        ## Determine the range of indices this specific geometric bin covers
        bin_range = np.absolute(current_m - prev_m)
        ## Want mean power of data points contained within one geometric bin
        bin_power = np.mean(power[prev_m:current_m])
        ## Compute error in bin
        err_bin_power2 = np.sqrt(np.sum(err_power[prev_m:current_m] ** 2)) / \
            float(bin_range)
        ## Compute the mean frequency of a geometric bin
        bin_freq = np.mean(freq[prev_m:current_m])
        ## Append values to arrays
        rb_power = np.append(rb_power, bin_power)
        rb_freq = np.append(rb_freq, bin_freq)
        rb_err = np.append(rb_err, err_bin_power2)

        ## Increment for the next iteration of the loop
        ## Since the slice goes from prev_m to current_m-1 (that's how
        ## array slicing works), it's fine to set prev_m = current_m here
        ## for the next round: no bins are double-counted or skipped.
        prev_m = current_m
        real_index *= rebin_factor
        int_index = int(round(real_index))
        current_m += int_index

    return rb_freq, rb_power, rb_err

Apply this to the data (using JUST the frequency, power, and error at positive Fourier frequencies). Start with a rebin factor of 1.03.


In [21]:
rb_freq, rb_psd, rb_err = rebin(freq[0:nyq_ind], psd_avg[0:nyq_ind], err_psd[0:nyq_ind], 1.03)

In [22]:
fig, ax2 = plt.subplots(1,1, figsize=(9,6))
ax2.errorbar(rb_freq, rb_psd, yerr=rb_err, lw=1, 
            drawstyle='steps-mid', color='black')
ax2.set_xscale('log')
ax2.set_yscale('log')
ax2.set_xlabel(r'Frequency (Hz)', fontproperties=font_prop)
ax2.set_ylabel(r'Power/Hz', fontproperties=font_prop)
ax2.tick_params(axis='x', labelsize=16, bottom=True, top=True, 
                labelbottom=True, labeltop=False)
ax2.tick_params(axis='y', labelsize=16, left=True, right=True, 
                labelleft=True, labelright=False)
plt.show()


Play around with a few different values of rebin_factor to see how it changes the plotted power spectrum. 1 should give back exactly what you put in, and 1.1 tends to bin things up quite a lot.
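For example, a minimal sketch of that exploration, overplotting a few re-binning factors (the colors here are arbitrary):


In [ ]:
fig, ax = plt.subplots(1, 1, figsize=(9, 6))
for factor, color in zip([1.0, 1.03, 1.1], ['black', 'blue', 'orange']):
    rb_f, rb_p, rb_e = rebin(freq[0:nyq_ind], psd_avg[0:nyq_ind], 
                             err_psd[0:nyq_ind], factor)
    ax.errorbar(rb_f, rb_p, yerr=rb_e, lw=1, drawstyle='steps-mid', color=color,
                label="rebin_factor = %.2f" % factor)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel("Frequency (Hz)", fontproperties=font_prop)
ax.set_ylabel("Power/Hz", fontproperties=font_prop)
ax.legend(prop=font_prop)
plt.show()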

Congratulations! You can make great-looking power spectra! Now, go back to part 2d. and try 4 or 5 different combinations of dt and seg_length. What happens when you pick too big of a dt to see the QPO frequency? What if your seg_length is really short?

One of the most important things to notice is that for a real astrophysical signal, the QPO (and low-frequency noise) are present for a variety of different dt and seg_length parameters.

2g. Normalization

The final thing standing between us and a publication-ready power spectrum plot is the normalization of the power along the y-axis. The normalization that's commonly used is the fractional rms-squared normalization, sometimes just called the rms normalization. For a power spectrum created from a light curve in counts/second units, the equation is: $$P_{\mathrm{frac\,rms^2}} = P \times \frac{2 \, dt}{N \times \mathrm{mean}^2}$$ $P$ is the power we already have, $dt$ is the time step of the light curve, $N$ is n_bins, the number of bins in one segment, and mean is the mean count rate (in counts/s) of the light curve (so, you will need to go back to 2d. and re-run, keeping a running sum-then-average of the mean count rate).

Don't forget to apply the normalization to the error, and re-bin after!


In [23]:
psd_norm = psd_avg * 2. * dt / (n_bins * mean_avg ** 2)
err_norm = err_psd * 2. * dt / (n_bins * mean_avg ** 2)
rb_freq, rb_psd_norm, rb_err_norm = rebin(freq[0:nyq_ind], psd_norm[0:nyq_ind], err_norm[0:nyq_ind], 1.03)

In [24]:
fig, ax = plt.subplots(1,1, figsize=(9,6))
ax.errorbar(rb_freq, rb_psd_norm, yerr=rb_err_norm, lw=1, 
            drawstyle='steps-mid', color='black')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(r'Frequency (Hz)', fontproperties=font_prop)
ax.set_ylabel(r'Power [(rms/mean$^{2}$)/Hz]', fontproperties=font_prop)
ax.tick_params(axis='x', labelsize=16, bottom=True, top=True, 
                labelbottom=True, labeltop=False)
ax.tick_params(axis='y', labelsize=16, left=True, right=True, 
                labelleft=True, labelright=False)
plt.show()


2h. Poisson noise level

Notice that the Poisson noise is flat (a power law with slope 0) at high frequencies. With this normalization, we can predict the power of the Poisson noise level from the mean count rate (in counts/s) of the light curve! $$P_{\mathrm{noise}} = 2/\mathrm{mean}$$

Compute this noise level, and plot it with the power spectrum.


In [25]:
poissnoise = 2./mean_avg

fig, ax = plt.subplots(1,1, figsize=(9,6))
ax.errorbar(rb_freq, rb_psd_norm, yerr=rb_err_norm, lw=1, 
            drawstyle='steps-mid', color='black')
ax.hlines(poissnoise, rb_freq[0], rb_freq[-1], color='red')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(r'Frequency (Hz)', fontproperties=font_prop)
ax.set_ylabel(r'Power [(rms/mean$^{2}$)/Hz]', fontproperties=font_prop)
ax.tick_params(axis='x', labelsize=16, bottom=True, top=True, 
                labelbottom=True, labeltop=False)
ax.tick_params(axis='y', labelsize=16, left=True, right=True, 
                labelleft=True, labelright=False)
plt.show()


Your horizontal Poisson noise line should be really close to the power at and above ~10 Hz.

2i. For plotting purposes, we sometimes subtract the Poisson noise level from the power before plotting.

Once we've done this and removed the noise, we can also plot the data in units of power, instead of power/Hz, by multiplying the power by the frequency. Recall that, following the propagation of errors, you will need to multiply the error by the frequency as well, but you should not subtract the Poisson noise level from the error.


In [26]:
nuPnu = (rb_psd_norm - poissnoise) * rb_freq
fig, ax = plt.subplots(1,1, figsize=(9,6))
ax.errorbar(rb_freq, nuPnu, yerr=rb_err_norm*rb_freq, lw=1, 
            drawstyle='steps-mid', color='black')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(r'Frequency (Hz)', fontproperties=font_prop)
ax.set_ylabel(r'Power (rms/mean$^{2}$)', fontproperties=font_prop)
ax.tick_params(axis='x', labelsize=16, bottom=True, top=True, 
                labelbottom=True, labeltop=False)
ax.tick_params(axis='y', labelsize=16, left=True, right=True, 
                labelleft=True, labelright=False)
plt.show()


Beautiful! This lets us see the components clearly above the noise and see their relative contributions to the power spectrum (and thus to the light curve).

Recap of what you learned in problem 2:

You are now able to take a light curve, break it into appropriate segments using the given Good Time Intervals, compute the average power spectrum (without weird windowing artefacts or a strong DC component), and plot it in such a way that you can see the signals clearly.

Problem 3: It's pulsar time

We are going to take these skills and now work on two different observations of the same source, the ultra-luminous X-ray pulsar Swift J0243.6+6124. The goal is for you to see how different harmonics in the pulse shape manifest in the power spectrum.

3a. Load the data and GTI

Using the files J0243-122_evt.fits and J0243-134_evt.fits, and the corresponding J0243-122_gti.fits and J0243-134_gti.fits.


In [27]:
j0243_1 = Table.read("./J0243-122_evt.fits", format='fits')
gti_1 = Table.read("./J0243-122_gti.fits", format='fits')
j0243_2 = Table.read("./J0243-134_evt.fits", format='fits')
gti_2 = Table.read("./J0243-134_gti.fits", format='fits')

3b. Apply a mask to remove energies below 0.5 keV and above 12 keV.


In [28]:
energy_mask_1 = (j0243_1['ENERGY'] >= 50) & (j0243_1['ENERGY'] <= 1200)
j0243_1 = j0243_1[energy_mask_1]
energy_mask_2 = (j0243_2['ENERGY'] >= 50) & (j0243_2['ENERGY'] <= 1200)
j0243_2 = j0243_2[energy_mask_2]

3c. Make the average power spectrum for each data file.

Go through in the same way as 2d (using the make_lc function you already wrote). Re-bin and normalize (using the rebin function you already wrote). The spin period is 10 seconds, so I don't recommend using a segment length shorter than that (try 64 seconds). Since the period is quite long (for a pulsar), you can use a longer dt, like 1/8 seconds. Use the same segment length and dt for both data sets.


In [29]:
seg_length = 64. # seconds
dt = 1./8.# seconds
n_bins = int(seg_length/dt) # Number of time bins in a segment of light curve

In [30]:
def make_psd(time_array, gti_tab, dt, seg_length):
    """Average the power spectra of seg_length-long segments of events within the GTIs."""
    time = np.asarray(time_array)  ## Doing this so that we can re-run
    n_bins = int(seg_length / dt)  # number of time bins in one segment
    psd_avg = np.zeros(n_bins)  # initiating, to keep running sum (then average at end)
    mean_avg = 0  # initiating, to keep running sum (then average at end)
    n_seg = 0
    for (start_gti, stop_gti) in zip(gti_tab['START'], gti_tab['STOP']):
        start_time = start_gti
        end_time = start_time + seg_length
        while end_time <= stop_gti:
            seg_mask = time < end_time
            time_seg = time[seg_mask]
            time = time[~seg_mask]
            lc_seg = make_lc(time_seg, start_time, end_time, dt)
            seg_mean = np.mean(lc_seg)
            mean_avg += seg_mean
            psd_seg = np.abs(fftpack.fft(lc_seg-seg_mean)) ** 2
            psd_avg += psd_seg
            if n_seg % 10 == 0:
                print(n_seg)
            n_seg += 1
            start_time += seg_length
            end_time += seg_length
    psd_avg = psd_avg.real / n_seg
    mean_avg /= n_seg
    err_psd = psd_avg/np.sqrt(n_seg)
    psd_avg *= 2. * dt / n_bins / mean_avg**2
    err_psd *= 2. * dt / n_bins / mean_avg**2
    return psd_avg, err_psd, mean_avg, n_seg

In [31]:
psd1, err1, mean1, nseg1 = make_psd(np.asarray(j0243_1['TIME']), gti_1, dt, seg_length)


0
10
20
30
40
50

In [32]:
psd2, err2, mean2, nseg2 = make_psd(np.asarray(j0243_2['TIME']), gti_2, dt, seg_length)


0
10
20
30
40
50
60
70

In [33]:
freq = fftpack.fftfreq(n_bins, d=dt)
nyq_ind = int(n_bins/2.) 
fig, ax = plt.subplots(1,1, figsize=(9,6))
ax.errorbar(freq[0:nyq_ind], psd1[0:nyq_ind], yerr=err1[0:nyq_ind], lw=1, 
            drawstyle='steps-mid', color='purple')
ax.errorbar(freq[0:nyq_ind], psd2[0:nyq_ind], yerr=err2[0:nyq_ind], lw=1, 
            drawstyle='steps-mid', color='green')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(r'Frequency (Hz)', fontproperties=font_prop)
ax.set_ylabel(r'Power [(rms/mean$^{2}$)/Hz]', fontproperties=font_prop)
ax.tick_params(axis='x', labelsize=16, bottom=True, top=True, 
                labelbottom=True, labeltop=False)
ax.tick_params(axis='y', labelsize=16, left=True, right=True, 
                labelleft=True, labelright=False)
plt.show()


3d. Make a phase-folded light curve

3d.i. Determine the spin period from the frequency of the lowest (fundamental) tone in the power spectrum. Remember that period=1/f. Hint: np.argmax is a great function for quick, brute-force things.


In [34]:
spin_f = freq[np.argmax(psd2[0:10])]  # brute-force: search the 10 lowest frequency bins
period = 1./spin_f

3d.ii. Use the modulo operator on the event times (starting them at time zero) to determine the relative phase of each photon event, then divide by the period to get a relative phase from 0 to 1.


In [35]:
rel_time1 = np.asarray(j0243_1['TIME']) - j0243_1['TIME'][0]
rel_phase1 = (rel_time1 % period) / period
rel_time2 = np.asarray(j0243_2['TIME']) - j0243_2['TIME'][0]
rel_phase2 = (rel_time2 % period) / period

3d.iii. Make an array of 20 phase bins and put the relative phases in their phase bins with np.histogram.


In [36]:
phase1, bins1 = np.histogram(rel_phase1, bins=20, range=(0, 1))
phase2, bins2 = np.histogram(rel_phase2, bins=20, range=(0, 1))

3d.iv. Plot the light curve next to its accompanying power spectrum!


In [37]:
fig, ax = plt.subplots(1,1, figsize=(9,6))
ax.plot(bins1[0:-1], phase1, lw=1, color='purple')
ax.set_xlabel(r'Relative phase', fontproperties=font_prop)
ax.set_ylabel(r'Counts per phase bin', fontproperties=font_prop)
ax.set_xlim(0, 1)
ax.tick_params(axis='x', labelsize=16, bottom=True, top=True, 
                labelbottom=True, labeltop=False)
ax.tick_params(axis='y', labelsize=16, left=True, right=True, 
                labelleft=True, labelright=False)
plt.show()



In [38]:
fig, ax = plt.subplots(1,1, figsize=(9,6))
ax.plot(bins2[0:-1], phase2, lw=1, color='green')
ax.set_xlabel(r'Relative phase', fontproperties=font_prop)
ax.set_ylabel(r'Counts per phase bin', fontproperties=font_prop)
ax.set_xlim(0, 1)
ax.tick_params(axis='x', labelsize=16, bottom=True, top=True, 
                labelbottom=True, labeltop=False)
ax.tick_params(axis='y', labelsize=16, left=True, right=True, 
                labelleft=True, labelright=False)
plt.show()


Though these are very quickly made phase-folded light curves, you can see how the light curve with stronger harmonic content shows more power at the harmonic frequency in the power spectrum, and the light curve that's more asymmetric in its rise and fall times (number 1) shows power at higher harmonics!

If you want to see what a real phase-folded pulse profile looks like for these data, check out the beautiful plots in Wilson-Hodge et al. 2018: https://ui.adsabs.harvard.edu/abs/2018ApJ...863....9W/abstract Data set 1 has an observation ID that ends in 122 and corresponds to MJD 58089.626, and data set 2 has an observation ID that ends in 134 and corresponds to MJD 58127.622.

Bonus challenges:

1. Dynamical power spectrum (/spectrogram):

Instead of averaging the power spectra over segments, save each one in an array (if one power spectrum has length n_bins, the array will end up with size (n_bins, n_seg)). Apply the normalization and re-binning to each segment, then make a plot with frequency along the y-axis, segment number (which corresponds to elapsed time) along the x-axis, and power as the colormap (a minimal sketch follows). This approach is useful if you think the QPO turns on and off rapidly (high-frequency QPOs do this) or is changing its frequency on short timescales. If the frequency is changing, this can artificially broaden the Lorentzian-shaped peak we see in the average power spectrum. Or, sometimes it's intrinsically broad. A look at the dynamical power spectrum will tell you! This will be most interesting on the black hole J1535 data, but could be done for both objects.
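A minimal sketch of the plotting step, assuming you have collected the normalized, re-binned powers in a 2-D array called dyn_psd with shape (n_freq_bins, n_seg), and the matching re-binned frequencies in rb_freq (both hypothetical names):


In [ ]:
## dyn_psd: hypothetical array, one column of re-binned power per segment
seg_number = np.arange(dyn_psd.shape[1])
fig, ax = plt.subplots(1, 1, figsize=(9, 6))
mesh = ax.pcolormesh(seg_number, rb_freq, dyn_psd, shading='auto', cmap='viridis')
ax.set_yscale('log')
ax.set_xlabel("Segment number (elapsed time)", fontproperties=font_prop)
ax.set_ylabel("Frequency (Hz)", fontproperties=font_prop)
fig.colorbar(mesh, ax=ax, label=r"Power [(rms/mean$^{2}$)/Hz]")
plt.show()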

2. Energy bands:

Make and plot power spectra of the same object using light curves of different energy bands. For example, try 1-2 keV, 2-4 keV, and 4-12 keV. Try to only loop through the event list once as you do the analysis for all three bands. What do you notice about the energy dependence of the signal?
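One possible starting point (a sketch only, reusing the make_psd function from problem 3; the band edges are in NICER channel units, channel/100 = keV):


In [ ]:
## NOTE: for simplicity this masks the full event list once per band;
## restructuring it into a single pass over the events is part of the challenge.
bands = {'1-2 keV': (100, 200), '2-4 keV': (200, 400), '4-12 keV': (400, 1200)}
band_results = {}
for name, (lo, hi) in bands.items():
    band_mask = (j1535['ENERGY'] >= lo) & (j1535['ENERGY'] < hi)
    band_results[name] = make_psd(np.asarray(j1535['TIME'][band_mask]), gti_tab, 
                                  1./128., 32.)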

3. Optimization:

Optimize the memory and/or time usage of the algorithm we wrote that reads through the light curve and makes an average power spectrum. You can use skills you learned at this and previous DSFP sessions!

4. Modeling:

Using astropy.modeling (or your own preferred modeling package), fit the power spectrum of the black hole J1535 with a Lorentzian for the QPO, a few Lorentzians for the low-frequency broadband noise, and a power law for the Poisson noise level. In papers we often report the centroid frequency and the full-width at half maximum (FWHM) of the QPO Lorentzian model. How would you rule out the presence of a QPO at, e.g., 12 Hz?
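A minimal sketch of the setup with astropy.modeling (assuming the rb_freq, rb_psd_norm, rb_err_norm, and poissnoise variables from 2f-2h; the initial guesses are arbitrary, only one noise Lorentzian is included, and a constant stands in for the flat Poisson-noise power law):


In [ ]:
from astropy.modeling import models, fitting
## QPO Lorentzian + one broadband-noise Lorentzian + constant Poisson noise level;
## add more Lorentzians for the low-frequency noise as needed.
model_init = (models.Lorentz1D(amplitude=0.01, x_0=8., fwhm=1.) +
              models.Lorentz1D(amplitude=0.01, x_0=0.5, fwhm=1.) +
              models.Const1D(amplitude=poissnoise))
fitter = fitting.LevMarLSQFitter()
best_fit = fitter(model_init, rb_freq, rb_psd_norm, weights=1./rb_err_norm)
print(best_fit)  # the QPO centroid is x_0 and the FWHM is fwhm of the first Lorentz1D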


In [ ]: