Import standard modules:
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import HTML
HTML('../style/course.css') #apply general CSS
Out[1]:
Import section specific modules:
In [2]:
from IPython.display import HTML
from ipywidgets import interact
HTML('../style/code_toggle.html')
Out[2]:
Our goal in radio interferometry is to produce a map of the sky by sampling a finite number of spatial frequency modes present in its spectrum (this will become clearer as the course unfolds). By sampling only a discrete number of frequency modes we are effectively turning a continuous problem into a discrete one. Recall that at the end of the last section we saw some peculiarities in the output of the discrete Fourier transform (DFT). In particular we note the following features:
These features will, in some form or another, also be present in radio interferometry. The aim of this section is to develop an intuitive understanding of why the above features are present in the discrete spectrum. This requires discussing aliasing and the Nyquist-Shannon sampling theorem (or just sampling theorem for short) which will allow us to answer questions such as:
Note that, in interferometry, we are sampling in the spatial frequency domain. Unfortunately, our intuition does not extend as easily to this domain. We are more inclined to think of the world in terms of a sequence of events i.e. in the time domain. A similar statement is true in interferometry. Our intuition extends more easily to the image domain than to its spectral dual. In this section, for simplicity, we will use time series to get a feeling for what is going on in the dual Fourier space viz. the frequency domain.
Intuitively we can think of the act of sampling simply as selecting a number of points of a continuous function. For simplicity let's start by considering a real valued function
$$ f:\mathbb{R} \rightarrow \mathbb{R}. $$The mathematical equivalent of selecting an infinite number of equally spaced samples can be expressed as
$$ f(t_n) = \sum_{n = -\infty}^{\infty} f(t) \delta(t - n\Delta t) = f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t}), $$where $\Delta t$ is the sampling interval, the index $n$ labels the samples and $III(\cdot)$ is the Dirac comb (Shah function). Obviously it is not possible to sample a function at an infinite number of points in practice. Suppose we sample a function at $N$ points in a finite domain, $t_0 \leq t \leq t_f$ say. This can be achieved as follows
$$ f(t_n) = \sum_{n = 0}^{N-1} f(t) \delta(t - n\Delta t - t_0), \quad \mbox{where} \quad \Delta t = \frac{t_f - t_0}{N-1}, \quad \mbox{and} \quad t_n = t_0 + n\Delta t. $$So far we have done nothing fancy, we have just expressed the act of sampling in terms of a continuous mathematical function. We can go a step further and write it purely in terms of the special functions introduced in ($\S$ 2.2 ) as
$$f(t_n)= f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\sqcap_{t_0,t_f}(t), $$where $\sqcap_{a,b}(\cdot)$ is the boxcar function. The sampling function can therefore be expressed as
$$ s_{t_0,\Delta t, N} = \frac{1}{\Delta t}III(\frac{t}{\Delta t})\sqcap_{t_0,t_f}(t), $$where it should be understood that the function $s_{t_0,\Delta t, N} $ implies a sampling of $N$ points in the interval $t_0 \leq t \leq t_f$ where $t_f = (N-1)\Delta t$.
At this stage you might be rolling your eyes at this seemingly cumbersome notation but it does serve a purpose. Consider what happens when we take the Fourier transform of the sampled function $f(t_n)$. First, recall the definition of the 1-D Fourier transform
$$ \mathscr{F}\{f(t)\}(s) = \int_{-\infty}^{\infty} f(t)e^{-2\pi\imath t s} dt. $$Substituting in the sampled function we get
$$ \mathscr{F}\{f(t_n)\}(s) = \int_{-\infty}^{\infty} f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\sqcap_{t_0,t_f}(t)e^{-2\pi\imath t s} dt. $$Next the Dirac comb changes the continuous integral into a discrete sum i.e.
$$ \mathscr{F}\{f(t_n)\}(s) = \sum_{n = -\infty}^{\infty} f(t_n) \sqcap_{t_0,t_f}(t)e^{-2\pi\imath t_n s}. $$The boxcar function sets at the terms for which $n \notin [0,\cdots,N-1]$ to zero. Thus we have
$$ \mathscr{F}\{f(t_n)\}(s) = \sum_{n = 0}^{N-1} f(t_n) e^{-2\pi\imath t_n s}. $$At this stage you can probably spot where we are going with this. Defining
$$ y_n = e^{-2\pi\imath t_0}f(t_n) \quad \mbox{where} \quad t_n = t_0 + n\Delta t, $$we see that
$$ \mathscr{F}\{f(t_n)\}(s) = \sum_{n = 0}^{N-1} f(t_n) e^{-2\pi\imath t_n s} = \sum_{n = 0}^{N-1} y_n e^{-2\pi\imath n\Delta t s}. $$Finally, defining
$$ s_k = \frac{k}{\Delta t N}, $$gives
$$ \mathscr{F}\{y_n\}(s)_k = Y_k = \sum_{n = 0}^{N-1} y_n e^{-2\pi\imath \frac{n k}{N}}. $$Note that, if $t_0 = 0$, we recover the Discrete Fourier Transform (DFT) as the Fourier transform of a sampled function $f(t_n)$. The factor $e^{-2\pi\imath t_0}$ is just a phase shift, it does not alter the amplitude of the components $Y_k$ at all (for a fixed time interval).
Writing the DFT as the Fourier transform of a sampled signal allows us to understanding some the peculiarities about the DFT we noted in the introduction. To see this, recall that multiplication in the time domain is the same as convolution in the frequency domain i.e.
$$ z(t) = f(t)g(t) \quad \Rightarrow \quad \mathscr{F}\{z\}(s) = \mathscr{F}\{f\}(s) \circ \mathscr{F}\{g\}(s). $$Applying this result to our sampled function we see that
$$ \mathscr{F}\{f(t_n)\}(s) = \mathscr{F}\{f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\} \circ \mathscr{F}\{\sqcap_{t_0,t_f}(t)\}. $$If we also use the fact that the Fourier transform of the boxcar function is given by (might be a good idea for you to verify this)
$$ \mathscr{F}\{\sqcap_{t_0,t_f}(t)\} = (N-1)\Delta t ~ sinc ((N-1)\Delta t s), $$then we have established that
$$ \mathscr{F}\{f(t_n)\}(s) = \mathscr{F}\{f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\} \circ (N-1)\Delta t ~ sinc ((N-1)\Delta t s). $$We see here that the output of the DFT is the convolution of a sinc function with the Fourier transform of an infinitely sampled function. This explains why there are non-zero components of $Y_k$ even at frequencies which do not correspond to any of the frequencies in the input signal. It is caused by the fact that, in practice, we can only compute the DFT of a limited number of samples. Before we discuss the concept of aliasing, use the interactive demo below to convince yourself (by adjusting the $N$ slider from the far left to the far right) that the width of the sinc function decreases with increased number of samples (for a fixed time interval). Also check that, for a fixed total time interval and as long as we have a sufficient number of sampls, the starting value $t_0$ only changes the phase of $Y_k$ but not its amplitude. Note we are only plotting the components of $Y_k$ corresponding to frequencies below $f_k \leq 5$Hz.
In [3]:
def inter_DFT(N,t0,tf,f1=1,f2=2,f3=3,plot_interval=5.0,plot_phase=True,show_Nyquist=False):
"""
Interactive DFT visualizer
"""
#set time domain
t = np.linspace(t0,tf,N)
#Get the signal
y = np.sin(2.0*np.pi*f1*t) + np.sin(2.0*np.pi*f2*t) + np.sin(2.0*np.pi*f3*t)
#Take the DFT (here we use FFT for speed)
Y = np.fft.fft(y)
#Get sampling interval
delt = (tf - t0)/(N-1)
#Sampling rate
fs = 1.0/delt
#Covert k to frequency
k = np.arange(N)
fk = k*fs/N
#Plot amplitude and phase
plt.figure(figsize=(15, 6))
plt.subplot(121)
absY = abs(Y)
Ymax = np.unique(absY.max())[0]
plt.stem(fk,absY)
plt.xlabel('$f_k$',fontsize=18)
plt.ylabel(r'$|Y_k|$',fontsize=18)
plt.xlim(0,plot_interval)
#Compute Nyquist freq
f_N = (N-1)/(2*(tf - t0))
if show_Nyquist and f_N < plot_interval:
arrow = plt.arrow(f_N, 0, 0, Ymax, head_width=0.5, head_length=3.0, fc='r', ec='k')
plt.subplot(122)
if plot_phase:
plt.stem(fk,np.angle(Y))
plt.xlabel('$f_k$',fontsize=18)
plt.ylabel(r'phase$(Y_k)$',fontsize=18)
plt.xlim(0,plot_interval)
plt.show()
else:
#Here we plot the theoretical spectrum
ft = np.array([f1,f2,f3])
Ymax = absY.max()
Yt = np.array([Ymax,Ymax,Ymax])
plt.stem(ft,Yt)
plt.xlabel('$f_k$',fontsize=18)
plt.ylabel('$|Y_k|$',fontsize=18)
plt.xlim(0,plot_interval)
plt.arrow(f_N, 0, 0, Ymax, head_width=0.5, head_length=3.0, fc='r', ec='k')
In [4]:
interact(lambda N,t0:inter_DFT(N=N,t0=t0,tf=t0+10),
N=(50,512,1),t0=(0,5,0.01)) and None
Can you explain what happens to the $|Y_k|$ as $N \rightarrow 0$? Why do the amplitudes change?
Hopefully, you will have noticed that the information contained in $Y_k$ is very sensitive to the sampling interval $\Delta t$. The Nyquist-Shannon sampling theorem explains why it is not possible to recover the frequency information contained in a signal when the sampling becomes too course i.e. $\Delta t$ is too large or $N$ is too small. One way to state the theorem is that, when sampling a signal at an interval $\Delta t$ (equivalently a frequency of $f_s = \frac{1}{\Delta t}$), it is not possible to recover any frequency information above a frequency of $f_N = \frac{1}{2\Delta t} = \frac{f_s}{2}$. We call $f_N$ the Nyquist frequency. In the above example we kept the sampling interval fixed. Thus, by decreasing $N$, we increase the width of the sampling interval $\Delta t$ and hence decrease $f_N$. This explains the seemingly erratic behaviour of $|Y_k|$ as $N \rightarrow 0$. Note that, at $f_1 = 1$Hz, $f_2 = 2$Hz and $f_3 = 3$Hz, the input frequencies remain the same. Decreasing $N$ below a certain limit therefore results in a value of $f_N$ less than the maximum frequency present in the signal. Proving the sampling theorem is trivial once the Poisson summation formula has been established. Recall that the periodic summation of the spectrum $Y(f)$ of the (Schwartz) function $y(t)$ is related to the Fourier series components of $y_n = \Delta t y(n \Delta t)$ according to
Since we can obtain the $y_n$ simply by sampling the function $y(t)$ (and multiplying by $\Delta t$), and we can in principle compute the Fourier series of $y_n$ up to arbitrary order, it is possible to find $Y_{f_s}(f)$ up to arbitrary precision. If there are no frequencies greater than $\frac{f_s}{2} = f_N$ present in the signal, the spectrum $Y(f)$ can be found from $Y_{f_s}(f)$ simply by constructing the filter
where $\sqcap(\cdot)$ is the rectangle (or boxcar) function. Finally, since the spectrum of a function completely determines the function, we must conclude that the samples $y(n\Delta t)$ are sufficient to reconstruct $y(t)$ completely. To see how this can be done, we simply note the following
where we have used the formula for the Fourier transform of the rectangle function. Taking the inverse Fourier transform of both sides results in the Whittaker-Shannon interpolation formula
Note that, in order to use this formula, we still need an infinite number of samples of $y(t)$, something which is impossible in practice.
We have already touched upon the concept of aliasing when we discussed periodic summation. We noted that the Discrete Time Fourier Transform (DTFT) results in a periodic function of the frequency variable $Y_{f_s}(f)$ which is in some way related to the periodic summation of the spectrum $Y(f)$ of the signal $y(t)$. The copies of $Y(f)$ in $Y_{f_s}(f)$ at $k \neq 0$ were called aliases. Understanding aliasing in the context of the DFT is slightly more subtle and is probably best illustrated with an example.
Going back to our sum of sine functions model above, let's consider what happens if we now allow one of the frequency components to vary i.e.
where $f_i$ is the varaible frequency. If we sample this function in the interval $0< t < 10$s with 256 samples (i.e. $\Delta t = \frac{t_f - t_0}{N - 1} = \frac{10}{255} \approx 0.0392s$) we obtain a Nyquist frequency of $f_N = \frac{1}{2\Delta t} = 12.75$. In the following demonstration the Nyquist frequency is indicated with the red arrow. The figure on the left shows the resulting $|Y_k|$ from the DFT. The figure on the right shows the input frequencies $f_1$, $f_2$ and $f_i$. Use the slider to investigate the output of the DFT as $f_i$ approaches $f_N$. Can you explain the output when $f_i > f_N$?
In [5]:
interact(lambda fi:inter_DFT(N=256,t0=0.0,tf=10,f3=fi,plot_interval=26,plot_phase=False,show_Nyquist=True),
fi=(3,25,0.25)) and None
Notice that the spectrum is reflected about $f_N$. As we noted before, this happens because of periodicity and because the DFT of a real valued signal is Hermitian. Since these additional frequency peaks do not contain any new information, they do not present a problem when trying to recover the signal $y(t)$ from its spectrum. Any non-zero frequency components above $f_N$ can safely be discarded. A more troublesome feature of the spectrum is that input frequencies $f_i$ above $f_N$ are folded or aliased back into the spectrum i.e. a frequency at $f_i = f_N + \epsilon$ appears at $f_k = f_N - \epsilon$. This effect is referred to as aliasing. It can, in principle, be eliminated by applying anti-aliasing filters to the signal. Unfortunately the details behind implementing such filters is beyond the scope of the current discussion. Here we will simply give a brief illustration of what they do.
Suppose we have an input signal whose spectrum is an unnormalised zero mean Gaussian of the form $$ |Y(f)| = e^{\frac{-f^2}{2\sigma^2}}, $$ where $\sigma$ is the standard deviation and $f$ is the frequency in Hertz. If we sampled the original signal with a sampling frequency of $f_s = 10$Hz (i.e. $\Delta t = 0.1$s) then, regardless of the width $\sigma$ of $|Y(f)|$, the DTFT repeats itself over frequency intervals $[(k-\frac{1}{2})f_s,(k+\frac{1}{2})f_s]$ where $k \in \mathbb{Z}$ is any integer. Whether these aliases overlap depends on the width $\sigma$ of $|Y(f)|$. For example, with $\sigma = 1$ we get the following output
In [6]:
def gaussian(x, mu, sig):
return np.exp(-(x - mu)**2.0 / (2 * sig**2.0))
#Set sampling frequency
fs = 10
#Set frequency domain of DFT
f = np.linspace(-fs,fs)
Y = gaussian(f,0,1.0)
plt.figure('rep_spec')
plt.plot(f-0,Y,'b')
plt.plot(f-10.,Y, 'k')
plt.plot(f+10,Y, 'k')
plt.plot(f-20.,Y, 'k')
plt.plot(f+20,Y, 'k')
#plt.ylim(-0.5,fft_gau.max())
plt.ylabel('$|Y_{f_s}(f)|$',fontsize=18)
plt.xlabel('$f / [Hz]$',fontsize=18)
plt.show()
Here we have plotted $|Y(f)|$ (i.e. the $k=0$ component of $|Y_{f_s}(f)|$) in blue. This is the quantity we are actually after. With the stated values of $\sigma$ and $f_s$ we would have no problem recovering the input signal $y(t)$ from $Y(f)$ because there is almost no overlap between the aliases. If, on the other hand, the width is $\sigma = 2.5$, the situation changes because the aliases start overlapping as shown below.
In [7]:
f = np.linspace(-10,10)
Y = gaussian(f,0,2.5)
plt.figure('rep_spec')
plt.plot(f-0,Y,'b')
plt.plot(f-10.,Y, 'k')
plt.plot(f+10,Y, 'k')
plt.plot(f-20.,Y, 'k')
plt.plot(f+20,Y, 'k')
#plt.ylim(-0.5,fft_gau.max())
plt.ylabel('$|Y_{f_s}(f)|$',fontsize=18)
plt.xlabel('$f / [Hz]$',fontsize=18)
plt.show()
With the specified sampling rate there will always be some loss of information about the signal. Basically we could never even hope to recover $Y(f)$ at frequencies larger than the Nyquist frequency $f_N = \frac{f_s}{2}$. Furthermore, when computing the DFT, input frequencies at $f_i = f_N + \epsilon$, for $\epsilon > 0$, will also be aliased back into the spectrum at $f_k = f_N - \epsilon$. Attempting to reconstruct the input signal $y(t)$ from the resulting DFT components $Y_k$ would not return the correct result. An anti-aliasing filter is a form of low-pass filter which prevents the components of $Y_k$ corresponding to input frequencies $f_i > f_N$ from contaminating the reconstructed signal $y(t)$. Its basic operation is illustrated in the following figure.
In [8]:
f = np.linspace(-5,5,20)
f2 = np.hstack((f[0],f, f[-1]))
Y = gaussian(f,0,2.5)
Y2 = np.hstack((0.0,Y,0.0))
plt.figure('rep_spec')
plt.plot(f2-0,Y2,'b')
plt.plot(f2-10.,Y2, 'k')
plt.plot(f2+10,Y2, 'k')
plt.plot(f2-20.,Y2, 'k')
plt.plot(f2+20,Y2, 'k')
#plt.ylim(-0.5,fft_gau.max())
plt.ylabel('$|Y_{f_s}(f)|$',fontsize=18)
plt.xlabel('$f / [Hz]$',fontsize=18)
plt.show()
An ideal anti-aliasing filter filters out all frequencies above the Nyquist frequency $f_N$. This ensures that the signal $y(t)$ computed from the (possibly truncated) spectrum of $Y(f)$ will not contain any incorrect frequency components. Note that information about frequencies above $f_N$ will still have been lost. For simlicity the above discussion only considered the magnitude $|Y(f)|$ but similar ideas apply to the phase as well. Whereas an ideal anti-aliasing filter blocks out all frequencies above $f_N$ (i.e. has a gain of one when $f \leq f_N$ and zero otherwise), its frequency response should be linear so that it does not result in any phase distortion.