Import standard modules:
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import HTML
HTML('../style/course.css') #apply general CSS
In [ ]:
HTML('../style/code_toggle.html')
We ended the previous section by discussing the ADC, the analogue component which converts a continuous, analogue signal into a discrete, digital representation. Once the voltage signal has been converted to a digital signal, we can simply record it to computer memory and perform whatever computational operations we desire in software. In the case of interferometric observations, this means computing the correlation between all antenna pairs to produce baseline correlations. Most interferometric arrays produce significant amounts of data, and correlation is a deterministic algorithm. So, instead of recording the digital signals and then performing the correlations offline, it is common to build correlators using digital signal processing (DSP) modules on custom hardware such as field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and graphics processing units (GPUs). Using custom hardware for correlation provides a specialized and power-efficient system, at the cost of limited flexibility. For arrays with a small number of elements, or for maximum flexibility, software-based correlators can be used. The choice of digital architecture depends on the engineering costs and timescale for an array or facility, but at the core of any correlator system are the same basic DSP operations.
We should note here that other types of digital instruments are used in radio astronomy: spectrometers, which compute the auto-correlation of a single element; beamformers, which add together elements into a single telescope; and VLBI recorders, which record the raw voltage signal to be correlated later at a central location. Since this section focuses on aperture synthesis with an interferometric array, we will focus on correlators.
A correlator is used to compute the correlations of all antenna pairs in an interferometric array. In $\S$ 4 ➞ we showed that the visibility function is equivalent to the mutual coherence of two spatially separated receivers, using the van Cittert-Zernike theorem:
$$ \Gamma_{pq}(u,v,\tau=0) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I_\nu(l,m) \, e^{-2\pi\imath(ul+vm)} \, dl \, dm = V_{pq}(u,v,0) $$

where the mutual coherence function $\Gamma_{pq}$ is
$$ \Gamma_{pq}(u,v,\tau)= \lim_{T \rightarrow \infty} \frac{1}{2T} \int_{-T}^{T} E_p(t)E_{q}^{*}(t-\tau) \, dt $$

This function is equivalent to the correlation between two signals ($\S$ 2.6.1 ➞). Using the Convolution Theorem ($\S$ 2.7.4, $\S$ 2.7.8 ➞) the visibility function is then computed as
$$ V_{pq}(u,v,0) = \left( \mathscr{F} \left\{ {E_p} \right\} \right)^* \cdot \mathscr{F} \left\{ E_q \right\} $$

Thus, in order to compute a baseline's visibility, the antenna signals are Fourier transformed, the complex conjugate is taken of one of the antenna signals, and it is multiplied with the other antenna signal. A note: the choice of which signal to conjugate does not make a difference, as long as it is consistently recorded, since $V_{pq} = V_{qp}^*$ or, equivalently, $V_{pq}(u,v) = V_{qp}(-u, -v)$. This operation is known as an 'FX' correlation, as the signals are first Fourier transformed (F) and then complex multiplied (X). After multiplication, the resulting signal is averaged over some integration time, since the correlation is a statistical operation. This accumulation improves the signal-to-noise ratio and reduces the instrument's overall data rate. The terms integration time and accumulation length are often used interchangeably to describe this time scale. There are limits to the maximum integration time, which are set in practice by an image-domain effect known as smearing or decorrelation. This effect is discussed in ($\S$ 2.6.1 ➞). Very short integration times (< 1 second) are used in specific science cases, such as fast transient events or pulsar imaging, where the time scale of the physical process is very short. The general architecture of the FX correlator is shown in Figure 7.4.1.
Figure 7.4.1: FX correlator architecture. The digitized signal from each antenna element is Fourier transformed with a polyphase filterbank (PFB) consisting of a finite impulse response (FIR) filter and a fast Fourier transform (FFT). Each pair of signals is then complex multiplied together in the correlation triangle using a complex multiply and accumulate (CMAC) to produce a correlation matrix.
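As a rough illustration of the FX operation, the sketch below correlates two toy voltage streams which share a common 'sky' component plus independent receiver noise: each window is Fourier transformed (F), then conjugate-multiplied and accumulated (X). The window size, noise level, and number of accumulated windows are arbitrary choices for illustration only.
In [ ]:
# A toy FX correlation for a single baseline (illustrative parameters only)
nChan = 256        # time samples per FFT window
nWin = 1000        # windows accumulated into one integration

np.random.seed(42)
sky = np.random.randn(nChan * nWin)             # common 'sky' signal seen by both antennas
ep = sky + 0.5 * np.random.randn(sky.size)      # antenna p voltage: sky + receiver noise
eq = sky + 0.5 * np.random.randn(sky.size)      # antenna q voltage: sky + independent noise

# F stage: FFT each window of real-valued samples
Ep = np.fft.rfft(ep.reshape(nWin, nChan), axis=1)
Eq = np.fft.rfft(eq.reshape(nWin, nChan), axis=1)

# X stage: conjugate multiply and accumulate (average) over the integration
Vpq = np.mean(np.conj(Ep) * Eq, axis=0)
print(Vpq.shape)                                # one complex visibility per frequency channel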
There are two types of correlations a correlator can compute:

* auto-correlations: the correlation of an antenna's signal with itself, and
* cross-correlations: the correlation between the signals of two different antennas.
With an interferometric array, we are interested in sampling the $uv$-plane to perform aperture synthesis. This means we are interested in the cross-correlations. The auto-correlations all measure the origin of the $uv$-plane. This is a special point in the plane, as it measures the average flux of the sky. From a signal processing perspective, this is the 'DC bias' or 'offset' point, and is the zero-frequency (first) coefficient of the Fourier transform.
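As a quick check of this last statement, the zero-frequency coefficient of a DFT is just the sum of the input samples, i.e. it measures the average (DC) level of the signal; the offset value below is arbitrary.
In [ ]:
x = np.random.randn(1024) + 3.     # a signal with a non-zero mean (an arbitrary DC offset)
X = np.fft.fft(x)
print(X[0].real, x.sum())          # the zero-frequency coefficient equals the sum of the samples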
We are interested in measuring the visibility function at the origin, as it provides an absolute flux scale for the synthesized image. This turns out to be fairly difficult in practice, however. When we compute a cross-correlation between an antenna pair, much of the system noise conveniently disappears, because each antenna generates noise which is incoherent with the noise of the other antenna. When computing an auto-correlation, however, the noise is correlated with itself and remains in the result. In order to measure the visibility function at the origin, special antennas need to be built to mitigate the system noise with additional electronics, or interferometric observations must be combined with single-dish observations.
For an $N_{\textrm{antennas}}$ element interferometric array the number of correlations is:
$$ N_{\textrm{corr}} = N_{\textrm{auto corr}} + N_{\textrm{cross corr}} = N_{\textrm{antennas}} + \frac{N_{\textrm{antennas}} (N_{\textrm{antennas}}-1)}{2} \sim N_{\textrm{antennas}}^2 $$

As $N_{\textrm{antennas}}$ increases, the number of auto-correlations increases linearly while the number of cross-correlations grows quadratically. For $N_{\textrm{antennas}}$ of the order of 10 or more, the cross-correlations quickly dominate the total number of computations. The auto-correlations are usually computed to diagnose individual telescopes, or used in special science cases such as power spectrum measurements.
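A small sketch makes the scaling concrete, using a few illustrative array sizes (e.g. a 7-element array such as KAT-7, a 64-element array such as MeerKAT, and a much larger hypothetical array):
In [ ]:
# number of auto- and cross-correlations for a few illustrative array sizes
for nAnt in [7, 64, 1024]:
    nAuto = nAnt
    nCross = nAnt * (nAnt - 1) // 2
    print('N_antennas = %4d : %4d auto-corr, %7d cross-corr' % (nAnt, nAuto, nCross))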
As a historical note, the first correlator designs were called 'lag correlators'. These correlators pre-date the era of fast Fourier transform implementations on computing hardware, and were built in an era when an array consisted of only a few antennas and the observational bandwidth was fractionally small. The correlation was done by adding a series of time delays (lags) to one antenna, and multiplying the signal from each lag with the signal from a reference antenna. Each correlated lag signal was then integrated for a length of time and subsequently recorded. Later, the recorded correlations at different lags were Fourier transformed to produce the visibility. With the development of FX correlators the lag correlator was renamed the XF correlator, as the multiplication (X) was performed before the Fourier transform (F). Their general architecture is shown in Figure 7.4.2. This type of correlator is rarely used today, except in special cases. For modern interferometric arrays, when correlation is mentioned it is almost always done via FX correlation. For the remainder of this section we will only discuss FX correlators; readers interested in further details on lag correlators are referred to Chapter 4 of Synthesis Imaging in Radio Astronomy II ⤴.
Figure 7.4.2: XF correlator architecture. M time delays (lags) are applied to the signal from each antenna element. The lagged signals for each pair of antenna elements are then multiplied together with an M-tap lag multiply and accumulator (MAC). These signals are recorded to memory, and the lag-multiplied signals are later Fourier transformed to produce the elements of the correlation matrix.
Though the FX and XF architectures are different, they produce equivalent results (Figure 7.4.3), as they are both applications of the Convolution Theorem.
Figure 7.4.3: FX-XF correlator architecture equivalence. Both designs can be used to produce equivalent visibilities, as both are an application of the Convolution Theorem. FX correlators are generally the most computationally efficient, however. Adapted from Synthesis Imaging in Radio Astronomy II, Chapter 4 ⤴.
The exact implementation of an FX correlator depends on instrument requirements and hardware architecture, but there is a typical set of components included in the design. An overview of a digital correlator design is shown in the figure below. The signal from each antenna is digitized with an ADC and Fourier transformed in an 'F' engine. This produces an unaccumulated spectrum made up of frequency channels which cover the digital bandwidth. In the 'F' engine, additional operations such as gain correction, fringe tracking, and quantization can be implemented. To perform the cross multiplication and accumulation for each antenna pair, a subset of these frequency channels is sent to each 'X' engine. This reordering of the data, from being organized by antenna to being organized by frequency channel, is done with a cornerturn operation. The resulting visibilities are recorded to file, often in a 'Measurement Set' which contains further metadata on the observation and synthesized telescope.
Figure 7.4.4: Block diagram of a distributed FX correlator design. The orthogonal polarizations from N antennas are digitized and Fourier transformed in the 'F' engines. A cornerturn operation is applied to segment the data by frequency channel, to be cross multiplied and accumulated in the 'X' engines. The resulting visibilities are then written to file.
The F-engine, a block diagram of which is shown below, is primarily composed of DSP logic to Fourier transform the time-domain signal into a frequency-domain spectrum with a fast Fourier transform algorithm. The bandpass of the analogue system and the digitisation process select a range of frequencies to be correlated between antennas. The full band from each antenna can be correlated against another antenna, but this would only give information about the entire band. If there is strong narrowband RFI, then any such correlation will be dominated by it, and the sky signal will be lost. If we are only interested in a weak narrow signal, such as a spectral line, that line structure will be hidden in the full-band correlation. In order to gain access to subsets of the frequency range, known as subbands or channels, a Fourier transform is employed. In this context, a Fourier transform converts a wideband time-domain signal into a set of frequency-domain signals (for a given 'window' of samples). As we are working on discrete samples, sampled regularly in time, a Discrete Fourier Transform (DFT) can be used.
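To make the DFT concrete, here is a minimal sketch: the naive $\mathcal{O}(N^2)$ DFT written as a matrix multiplication agrees with numpy's FFT. The transform size below is an arbitrary small choice.
In [ ]:
# naive O(N^2) DFT as a matrix multiply, checked against numpy's FFT
N = 64
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)      # DFT matrix
x = np.random.randn(N)
print(np.allclose(np.dot(W, x), np.fft.fft(x)))   # True: both compute the same transform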
The naive computational cost of the DFT is $\mathcal{O}(N^2)$, which becomes computationally expensive even for relatively small $N$, where $N$ is the number of samples (or channels). The solution to this computational problem is the Fast Fourier Transform (FFT), a class of algorithms which use the regularity of the sampling to achieve a run time of $\mathcal{O}(N \log N)$. The Cooley-Tukey radix-2 algorithm is the classic example of an FFT algorithm. Important note: you will often see FFTs of size $2^d$, e.g. 256, 512, 1024, etc., being used. This is because a size of $2^d$ allows the use of the most efficient FFT implementations. Other FFT sizes, even sizes smaller than a given $2^d$, will generally take longer to compute.
This leads to the following important point: any time you are doing an FFT, try to pick a size which is of the form $2^d$.
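A quick, machine-dependent illustration of this point: timing an FFT whose size is a power of two against one of a nearby (prime) size. The sizes and repeat count below are arbitrary.
In [ ]:
import time

# time an FFT of a power-of-two size against one of a nearby prime size (results are machine-dependent)
for size in [2**16, 2**16 + 1]:        # 65536 (power of two) vs 65537 (prime)
    x = np.random.randn(size)
    t0 = time.time()
    for i in range(100):
        np.fft.fft(x)
    print('N = %6d : %.4f seconds' % (size, time.time() - t0))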
Let us close this parenthesis and return to our discretely-sampled time-domain signal. Our choice of how many samples to include in the window to which we apply the FFT affects the subband resolution. For a window of $N$ real-valued time-domain samples, $N/2$ complex frequency subbands are created. The apparent factor-of-2 'loss' of information in the FFT is because the output is complex while the input is real, so there are effectively two values per frequency subband; there is no net information loss. The more samples are included in the window, the higher the frequency resolution of the subband outputs. In the extreme case of an infinite window, the DFT approaches the continuous Fourier transform. At the other extreme, a window of 1 sample produces the same output as the input. It is often useful to think of the FFT as generating a series of narrow-band time-domain signals: in DSP this is called a filterbank.
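A minimal filterbank sketch along these lines: a long real-valued signal is cut into windows of $N$ samples and each window is Fourier transformed, yielding a set of narrow-band time series. The sampling rate, tone frequency, and window length below are assumed values for illustration.
In [ ]:
# treat the FFT as a filterbank: channelize a long real-valued signal (assumed parameters)
fs = 500e6                                   # assumed sampling rate in samples per second
t = np.arange(2**18) / fs
x = np.sin(2. * np.pi * 70e6 * t) + 0.1 * np.random.randn(t.size)   # 70 MHz tone plus noise

nChan = 512                                  # samples per window
nWin = x.size // nChan
subbands = np.fft.rfft(x.reshape(nWin, nChan), axis=1)
print(subbands.shape)                        # (windows in time, frequency subbands)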
I have been using the term window for a block of samples to which an FFT is applied. If you take a continuous signal and select out a block of samples, then you have altered the signal: you have effectively applied a top-hat (rectangle) function to it. This might not seem like an issue since none of the samples have been modified, but the continuity of the signal has been altered. When an FFT is applied to the window then, by the convolution theorem, we are convolving the Fourier transform of the data with the Fourier transform of a top-hat function. The Fourier transform of the top-hat function is a sinc function, defined as $\textrm{sinc}(x) = \frac{\sin x}{x}$. The top-hat function is known as a window function, and it turns out to be a generally poor one. The primary issue with the top-hat function is that it has high 'sidelobes' with a 'slow' roll-off. Many other window functions have been developed: Hamming, Hann, Blackman, Gaussian, etc. These window functions trade off subband width, sidelobe levels, sidelobe roll-off, and sharpness. A simple example of why you would want to apply a window function: if there is strong RFI in one subband, and the window function has high sidelobes (like a top-hat function), then power will leak from the RFI subband into clean neighbouring subbands, corrupting them.
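The leakage behaviour is easy to see numerically. Below, a tone that does not fall exactly on a channel centre is channelized with the implicit top-hat window and with a Hann window; the parameters are assumed, and any off-centre tone frequency shows the same effect.
In [ ]:
# spectral leakage of a top-hat window vs a Hann window for an off-centre tone
nChan = 512
n = np.arange(nChan)
tone = np.sin(2. * np.pi * 0.1234 * n)                    # tone between channel centres

rectSpec = np.abs(np.fft.rfft(tone))                      # implicit top-hat window
hannSpec = np.abs(np.fft.rfft(tone * np.hanning(nChan)))  # Hann window suppresses sidelobes

plt.semilogy(rectSpec / rectSpec.max(), label='top-hat')
plt.semilogy(hannSpec / hannSpec.max(), label='Hann')
plt.xlabel('Channel')
plt.ylabel('Normalized amplitude')
plt.legend()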
To apply a window function to the signals, a finite impulse response (FIR) filter is applied before the Fourier transform. The FIR filter - together with the FFT - is known as a poly-phase filterbank (PFB), and all modern digital back-ends use this instead of the simple FFT.
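A minimal sketch of a PFB front end, assuming 4 taps, 512 channels, and a Hamming-windowed sinc prototype filter (real designs vary in tap count and window choice): a block of $4 \times 512$ samples is weighted by the FIR coefficients, the taps are summed, and an FFT of the result gives one output spectrum.
In [ ]:
# one output spectrum from a simple polyphase filterbank (assumed: 4 taps, 512 channels)
nTaps, nChan = 4, 512

# prototype FIR filter: Hamming-windowed sinc, reshaped into one row of coefficients per tap
coeffs = np.sinc(np.arange(nTaps * nChan) / float(nChan) - nTaps / 2.) * np.hamming(nTaps * nChan)
coeffs = coeffs.reshape(nTaps, nChan)

x = np.random.randn(nTaps * nChan)                          # one block of time-domain samples
firOut = np.sum(x.reshape(nTaps, nChan) * coeffs, axis=0)   # FIR: weight and sum the taps
spectrum = np.fft.rfft(firOut)                              # FFT completes the PFB
print(spectrum.shape)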
Figure 7.4.5: Block diagram of an F-engine. At the centre of the DSP is a Fourier transform to convert the time-domain voltage signals to a frequency-domain spectrum. This transform is done with a fast Fourier transform algorithm, and is often accompanied by a finite impulse response filter in what is called a polyphase filterbank. Fringe tracking can be applied within the F-engine using various signal delay methods. Gain correction and quantization are applied before the signal is transmitted to cornerturn memory.
During an observation, an array will typically track a position in right ascension and declination $(\textrm{RA}, \delta)$ on the sky (there are special cases, such as transiting arrays). This $(\textrm{RA}, \delta)$ position is the phase centre of the observation. Tracking is done with fringe-tracking or fringe-stopping logic. Using the positions of the array elements, the time of observation, and the phase centre, the fringe-stopping logic continuously updates delay corrections. Two types of delays are used: coarse delay, which delays the incoming signals by an integer number of time samples, and fine delay, which delays a signal by a fraction of a sample. The coarse delay is applied before the Fourier transform and can be thought of as physically inserting cables of different lengths to delay the incoming signals. Fine delay is applied after the Fourier transform as a phase correction to each frequency channel of the spectrum. This phase correction is slightly different for each frequency channel.
The need for these two different types of delays comes down to the fact that the antenna signals have been digitized at some sampling frequency $f_{\textrm{samp}}$, which results in a maximum digital frequency (i.e. a Nyquist frequency) of $f_{\textrm{nyquist}} = \frac{f_{\textrm{samp}}}{2}$. The sampling frequency can also be expressed as a sampling time resolution $\Delta t_{\textrm{samp}} = \frac{1}{f_{\textrm{samp}}}$: how often the incoming sky signal is sampled. Therefore, coarse delay can only delay a signal at the resolution of the sampling rate. For example, say we want to Nyquist sample 250 MHz of bandwidth. We use an ADC which samples at 500 Msps (megasamples per second). In this system, the sampling time resolution is $\Delta t_{\textrm{samp}} = 2$ nanoseconds. This time resolution can also be seen as a physical distance resolution, since the speed of light is constant: $\Delta d_{\textrm{samp}} = \frac{c}{f_{\textrm{samp}}}$. Light travels approximately $3 \times 10^8$ metres per second, or about 30 cm per nanosecond. In our example system, $\Delta d_{\textrm{samp}} = 60$ cm. This means that, even if we know the location of the antennas (and hence their delay corrections) to better than 60 cm resolution - which is very likely, as antenna positions are usually known to millimetre accuracy - the delay cannot be fully corrected with coarse delay alone, which leads to signal decorrelation between antennas.
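Working through the example system's numbers:
In [ ]:
fSamp = 500e6                      # example sampling rate (samples per second)
c = 3e8                            # speed of light (m/s)
dtSamp = 1. / fSamp                # coarse delay step: one sample period
print('coarse delay resolution: %.1f ns' % (dtSamp * 1e9))
print('equivalent light-travel distance: %.1f cm' % (c * dtSamp * 1e2))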
Ideally, the delays should be corrected up to the resolution of the known telescope positions. To do this, fine delay is used. Once coarse delay has been applied to correct delays up to the $\Delta t_{\textrm{samp}}$ resolution, a phase correction $\phi(\nu)$ is added to each frequency channel. The phase correction allows each frequency channel to be delayed by a fraction of the wavelength of that channel.
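A sketch of how such a fine delay could be applied as a per-channel phase correction, using the example system's sampling rate and an assumed residual delay of a fraction of a sample (the sign of the phase depends on the correlator's conjugation convention):
In [ ]:
# fine (sub-sample) delay applied as a per-channel phase correction in the F-engine
fSamp = 500e6                                       # example sampling rate
nChan = 512
freqs = np.fft.rfftfreq(nChan, d=1. / fSamp)        # channel centre frequencies (Hz)

tau = 0.3e-9                                        # assumed residual delay: 0.3 ns (< one sample)
phaseCorr = np.exp(-2j * np.pi * freqs * tau)       # sign depends on the conjugation convention

x = np.random.randn(nChan)                          # one window of time-domain samples
spectrumCorrected = np.fft.rfft(x) * phaseCorr      # fine delay applied to the spectrum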
To determine the delays for each telescope in the array, a reference position is chosen. This is typically one of the antennas or the centre of the array. Delays can be both positive and negative in time, to account for the array geometry relative to the reference position and sky phase centre.
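As an illustration, the geometric delay of each antenna relative to a reference position can be computed as $\tau = \mathbf{b} \cdot \hat{\mathbf{s}} / c$, where $\mathbf{b}$ is the vector from the reference to the antenna and $\hat{\mathbf{s}}$ is the unit vector towards the phase centre; the positions and direction below are made up for illustration.
In [ ]:
c = 3e8                                                     # speed of light (m/s)
sHat = np.array([0.3, 0.1, np.sqrt(1. - 0.3**2 - 0.1**2)])  # assumed unit vector towards the phase centre
antPos = np.array([[   0.,   0., 0.],                       # reference antenna (made-up positions, metres)
                   [ 150., -40., 0.],
                   [-300.,  90., 2.]])
delays = np.dot(antPos, sHat) / c                           # geometric delay of each antenna (seconds)
print(delays * 1e9, 'ns')                                   # positive and negative, relative to the reference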
There is an important distinction to make here between fringe-stopping to track a position on the sky and physically pointing the dishes. We will discuss primary beams and polarization, which are related to physically pointing dishes, in the next sections. Fringe-stopping is the process of applying a phase and time delay correction to remove phase variation (fringes) at the phase centre (i.e. counteracting the K-Jones term). Physically pointing the telescopes is done to improve sensitivity. Fringe-stopping is required to do aperture synthesis by setting a fixed phase centre for the observation. If fringe-stopping is not applied, then the phase centre of the observation effectively drifts slowly in right ascension throughout the observation.
As an example, we plot the full dynamic-range spectrum from the analogue section below; in the F-engine a gain correction would be applied to flatten this band before quantization, so that the available quantization levels are used efficiently.
In [ ]:
# load the observed spectrum from the analogue section
npzfile = np.load('data/analogue_spectrum.npz')
obsSpectrum = npzfile['arr_0']
ifFreqs = npzfile['arr_1']
# waterfall plot of the observed spectrum (time vs frequency), with 1-D slices
fig = plt.figure(figsize=(8,8))

# main panel: dynamic spectrum
ax1 = plt.subplot2grid((5,5), (0,0), colspan=4, rowspan=4)
plt.imshow(np.abs(obsSpectrum), aspect='auto')
ax1.get_xaxis().set_visible(False)
ax1.get_yaxis().set_visible(False)

# bottom panel: spectra at three different times
ax2 = plt.subplot2grid((5,5), (4,0), colspan=4, rowspan=1)
plt.plot(ifFreqs/1e9, obsSpectrum[50, :])
plt.plot(ifFreqs/1e9, obsSpectrum[250, :])
plt.plot(ifFreqs/1e9, obsSpectrum[450, :])
plt.xlim(ifFreqs[0]/1e9, ifFreqs[-1]/1e9)
plt.ylabel('Flux')
plt.xlabel('Frequency (GHz)')

# right panel: time series in three different frequency channels
ax3 = plt.subplot2grid((5,5), (0,4), colspan=1, rowspan=4)
plt.plot(obsSpectrum[:, 130], np.arange(obsSpectrum.shape[0]))
plt.plot(obsSpectrum[:, 260], np.arange(obsSpectrum.shape[0]))
plt.plot(obsSpectrum[:, 390], np.arange(obsSpectrum.shape[0]))
ax3.get_xaxis().set_visible(False)
ax3.invert_yaxis()
ax3.yaxis.tick_right()
plt.ylabel('Time (s)')
ax3.yaxis.set_label_position('right')

plt.suptitle('Observed Spectrum')
Figure 7.4.7: Correlation matrix for a 7-telescope array, such as KAT-7. The seven auto-correlations are shown in green, while the 21 cross-correlations are shown in blue. Each cell of the matrix is a complex multiplication of the antenna pairs, and accumulation/averaging of the signal to produce the visibility.
In [ ]: