Import standard modules:



In [ ]:

    
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import HTML 
HTML('../style/course.css') #apply general CSS

Import section specific modules:



In [ ]:

    
from IPython.display import HTML, display
from ipywidgets import interact
HTML('../style/code_toggle.html')

2.9 Sampling Theory

Our goal in radio interferometry is to produce a map of the sky by sampling a finite number of spatial frequency modes present in its spectrum (this will become clearer as the course unfolds). By sampling only a discrete number of frequency modes we are effectively turning a continuous problem into a discrete one. Recall that at the end of the last section we saw some peculiarities in the output of the discrete Fourier transform (DFT). In particular we note the following features:

That there were non-zero components $Y_k$ at frequencies which were not present in the input signal.
That the amplitudes of the $Y_k$ corresponding to the frequencies present in the input signal were not all equal dispite the fact that they are the same in the input signal.
That is was not possible to find the frequencies present in the input signal when $N$ (the number of samples) was below a certain number.

These features will, in some form or another, also be present in radio interferometry. The aim of this section is to develop an intuitive understanding of why the above features are present in the discrete spectrum. This requires discussing aliasing and the Nyquist-Shannon sampling theorem (or just sampling theorem for short) which will allow us to answer questions such as:

Given the (effective) diameter of a telescope, what is the maximum pixel size we can choose for the reconstructed image?
Which of the features in the reconstructed image are real features corresponding to the sky and which are artificial features introduced by sampling only a limited number of frequency modes?

Note that, in interferometry, we are sampling in the spatial frequency domain. Unfortunately, our intuition does not extend as easily to this domain. We are more inclined to think of the world in terms of a sequence of events i.e. in the time domain. A similar statement is true in interferometry. Our intuition extends more easily to the image domain than to its spectral dual. In this section, for simplicity, we will use time series to get a feeling for what is going on in the dual Fourier space viz. the frequency domain.

2.9.1 Sampling a Continuous Function

Intuitively we can think of the act of sampling simply as selecting a number of points of a continuous function. For simplicity let's start by considering a real valued function

$$ f:\mathbb{R} \rightarrow \mathbb{R}. $$

The mathematical equivalent of selecting an infinite number of equally spaced samples can be expressed as

$$ f(t_n) = \sum_{n = -\infty}^{\infty} f(t) \delta(t - n\Delta t) = f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t}), $$

where $\Delta t$ is the sampling interval, the index $n$ labels the samples and $III(\cdot)$ is the Dirac comb (Shah function). Obviously it is not possible to sample a function at an infinite number of points in practice. Suppose we sample a function at $N$ points in a finite domain, $t_0 \leq t \leq t_f$ say. This can be achieved as follows

$$ f(t_n) = \sum_{n = 0}^{N-1} f(t) \delta(t - n\Delta t - t_0), \quad \mbox{where} \quad \Delta t = \frac{t_f - t_0}{N-1}, \quad \mbox{and} \quad t_n = t_0 + n\Delta t. $$

So far we have done nothing fancy, we have just expressed the act of sampling in terms of a continuous mathematical function. We can go a step further and write it purely in terms of the special functions introduced in ($\S$ 2.2) as

$$f(t_n)= f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\sqcap_{t_0,t_f}(t), $$

where $\sqcap_{a,b}(\cdot)$ is the boxcar function. The sampling function can therefore be expressed as

$$ s_{t_0,\Delta t, N} = \frac{1}{\Delta t}III(\frac{t}{\Delta t})\sqcap_{t_0,t_f}(t), $$

where it should be understood that the function $s_{t_0,\Delta t, N} $ implies a sampling of $N$ points in the interval $t_0 \leq t \leq t_f$ where $t_f = (N-1)\Delta t$.

At this stage you might be rolling your eyes at this seemingly cumbersome notation but it does serve a purpose. Consider what happens when we take the Fourier transform of the sampled function $f(t_n)$. First, recall the definition of the 1-D Fourier transform

$$ \mathscr{F}\{f(t)\}(s) = \int_{-\infty}^{\infty} f(t)e^{-2\pi\imath t s} dt. $$

Substituting in the sampled function we get

$$ \mathscr{F}\{f(t_n)\}(s) = \int_{-\infty}^{\infty} f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\sqcap_{t_0,t_f}(t)e^{-2\pi\imath t s} dt. $$

Next the Dirac comb changes the continuous integral into a discrete sum i.e.

$$ \mathscr{F}\{f(t_n)\}(s) = \sum_{n = -\infty}^{\infty} f(t_n) \sqcap_{t_0,t_f}(t)e^{-2\pi\imath t_n s}. $$

The boxcar function sets at the terms for which $n \notin [0,\cdots,N-1]$ to zero. Thus we have

$$ \mathscr{F}\{f(t_n)\}(s) = \sum_{n = 0}^{N-1} f(t_n) e^{-2\pi\imath t_n s}. $$

At this stage you can probably spot where we are going with this. Defining

$$ y_n = e^{-2\pi\imath t_0}f(t_n) \quad \mbox{where} \quad t_n = t_0 + n\Delta t, $$

we see that

$$ \mathscr{F}\{f(t_n)\}(s) = \sum_{n = 0}^{N-1} f(t_n) e^{-2\pi\imath t_n s} = \sum_{n = 0}^{N-1} y_n e^{-2\pi\imath n\Delta t s}. $$

Finally, defining

$$ s_k = \frac{k}{\Delta t N}, $$

gives

$$ \mathscr{F}\{y_n\}(s)_k = Y_k = \sum_{n = 0}^{N-1} y_n e^{-2\pi\imath \frac{n k}{N}}. $$

Note that, if $t_0 = 0$, we recover the Discrete Fourier Transform (DFT) as the Fourier transform of a sampled function $f(t_n)$. The factor $e^{-2\pi\imath t_0}$ is just a phase shift, it does not alter the amplitude of the components $Y_k$ at all (for a fixed time interval).

Writing the DFT as the Fourier transform of a sampled signal allows us to understand some of the peculiarities we noted about the DFT in the introduction. To see this, recall that multiplication in the time domain is the same as convolution in the frequency domain i.e.

$$ z(t) = f(t)g(t) \quad \Rightarrow \quad \mathscr{F}\{z\}(s) = \mathscr{F}\{f\}(s) \circ \mathscr{F}\{g\}(s). $$

Applying this result to our sampled function we see that

$$ \mathscr{F}\{f(t_n)\}(s) = \mathscr{F}\{f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\} \circ \mathscr{F}\{\sqcap_{t_0,t_f}(t)\}. $$

If we also use the fact that the Fourier transform of the boxcar function is given by (might be a good idea for you to verify this)

$$ \mathscr{F}\{\sqcap_{t_0,t_f}(t)\} = (N-1)\Delta t ~ \text{sinc} ((N-1)\Delta t s), $$

then we have established that

$$ \mathscr{F}\{f(t_n)\}(s) = \mathscr{F}\{f(t)\frac{1}{\Delta t}III(\frac{t}{\Delta t})\} \circ (N-1)\Delta t ~ \text{sinc} ((N-1)\Delta t s). $$

We see here that the output of the DFT is the convolution of a sinc function with the Fourier transform of an infinitely sampled function. This explains why there are non-zero components of $Y_k$ even at frequencies which do not correspond to any of the frequencies in the input signal. It is caused by the fact that, in practice, we can only compute the DFT of a limited number of samples. Before we discuss the concept of aliasing, use the interactive demo below to convince yourself (by adjusting the $N$ slider from the far left to the far right) that the width of the sinc function decreases with increased number of samples (for a fixed time interval). Also check that, for a fixed total time interval and as long as we have a sufficient number of samples, the starting value $t_0$ only changes the phase of $Y_k$ but not its amplitude. Note we are only plotting the components of $Y_k$ corresponding to frequencies below $f_k \leq 5$Hz.



In [ ]:

    
def inter_DFT(N,t0,tf,f1=1,f2=2,f3=3,plot_interval=5.0,plot_phase=True,show_Nyquist=False):
    """
    Interactive DFT visualizer
    """
    #set time domain
    t = np.linspace(t0,tf,N)
    #Get the signal
    y = np.sin(2.0*np.pi*f1*t) + np.sin(2.0*np.pi*f2*t) + np.sin(2.0*np.pi*f3*t)
    #Take the DFT (here we use FFT for speed)
    Y = np.fft.fft(y)
    #Get sampling interval
    delt = (tf - t0)/(N-1)
    #Sampling rate
    fs = 1.0/delt
    #Covert k to frequency
    k = np.arange(N)
    fk = k*fs/N    
    #Plot amplitude and phase
    plt.figure(figsize=(15, 6))
    plt.subplot(121)
    absY = abs(Y)
    Ymax = np.unique(absY.max())[0]
    plt.stem(fk,absY)
    plt.xlabel('$f_k$',fontsize=18)
    plt.ylabel(r'$|Y_k|$',fontsize=18)
    plt.xlim(0,plot_interval)
    #Compute Nyquist freq
    f_N = (N-1)/(2*(tf - t0))
    if show_Nyquist and f_N < plot_interval:
        arrow = plt.arrow(f_N, 0, 0, Ymax, head_width=0.5, head_length=3.0, fc='r', ec='k')
    plt.subplot(122)
    if plot_phase:
        plt.stem(fk,np.angle(Y))
        plt.xlabel('$f_k$',fontsize=18)
        plt.ylabel(r'phase$(Y_k)$',fontsize=18)
        plt.xlim(0,plot_interval)
        plt.show()
    else:
        #Here we plot the theoretical spectrum
        ft = np.array([f1,f2,f3])
        Ymax = absY.max()
        Yt = np.array([Ymax,Ymax,Ymax])
        plt.stem(ft,Yt)
        plt.xlabel('$f_k$',fontsize=18)
        plt.ylabel('$|Y_k|$',fontsize=18)
        plt.xlim(0,plot_interval)
        plt.arrow(f_N, 0, 0, Ymax, head_width=0.5, head_length=3.0, fc='r', ec='k')



In [ ]:

    
i = interact(lambda N,t0:inter_DFT(N=N,t0=t0,tf=t0+10),
                    N=(50,512,1),t0=(0,5,0.01)) and None

Figure 2.9.1: This demo illustrates the effect that finite sampling has on the DFT. As the number $N$ is increased you should see that the peaks in $|Y_k|$ get sharper (because the convolving sinc function gets narrower). Moreover, the value of $t_0$ only affects the phase of $Y_k$ and not its amplitude.

Can you explain what happens to the $|Y_k|$ as $N \rightarrow 0$? Why do the amplitudes change?

2.9.2 Nyquist-Shannon Sampling Theorem

Hopefully, you will have noticed that the information contained in $Y_k$ is very sensitive to the sampling interval $\Delta t$. The Nyquist-Shannon sampling theorem explains why it is not possible to recover the frequency information contained in a signal when the sampling becomes too course i.e. $\Delta t$ is too large or $N$ is too small. One way to state the theorem is that, when sampling a signal at an interval $\Delta t$ (equivalently a frequency of $f_s = \frac{1}{\Delta t}$), it is not possible to recover any frequency information above a frequency of $f_N = \frac{1}{2\Delta t} = \frac{f_s}{2}$. We call $f_N$ the Nyquist frequency. In the above example we kept the sampling interval fixed. Thus, by decreasing $N$, we increase the width of the sampling interval $\Delta t$ and hence decrease $f_N$. This explains the seemingly erratic behaviour of $|Y_k|$ as $N \rightarrow 0$. Note that, at $f_1 = 1$Hz, $f_2 = 2$Hz and $f_3 = 3$Hz, the input frequencies remain the same. Decreasing $N$ below a certain limit therefore results in a value of $f_N$ less than the maximum frequency present in the signal. Proving the sampling theorem is trivial once the Poisson summation formula has been established. Recall that, by the Poisson summation formula, the periodic summation of the spectrum $Y(f)$ of the (Schwartz) function $y(t)$ can be obtained as a Fourier series with coefficients $y_n = \Delta t \ y(n \Delta t)$ i.e.

$$ Y_{f_s}(f) = \sum_{k = -\infty}^{\infty} Y(f - kf_s) = \sum_{n = -\infty}^{\infty} y_n e^{-2 \pi \imath f \Delta t n} = \sum_{n = -\infty}^{\infty} \Delta t ~ y(\Delta t n) e^{-2\pi\imath f \Delta t n}. $$

Since our primary aim is to develop an intuition for the effects of finite sampling, we will leave the maths to the mathematicians and take this fact as given. It tells us that, if we sample this signal at equally space intervals of $t_n = n\Delta t$, we can construct the periodic summation of its spectrum with period $f_s = \frac{1}{\Delta t}$ just by summing the series on the RHS of the above equation. Now suppose we have a signal $y(t)$ which contains no frequencies above a certain upper limit called the bandwidth which we denote $B$ (in other words we have a bandlimited signal). The interactive figure below allows you to experiment with what the periodic summation of the signal's spectrum would look like as a function of the bandwidth.



In [ ]:

    
def bandlimit_func(f,B,fc):
    Y = np.zeros(f.size)
    I = np.argwhere(np.abs(f) <= B).squeeze()
    fint = f[I[:]]
    Il = np.argwhere(f <= fc - B).squeeze()[-1]
    Iu = np.argwhere(f <= fc + B).squeeze()[-1]
    try:
        Y[Il:Iu+1] = 1 - fint**2/B**2
    except:
        Y[Il:Iu] = 1 - fint**2/B**2
    return Y

def inter_PerSum(B):
    fs = 10
    print('f_s = ', fs)
    N = 100
    m = 4 # half the number of copies
    #f = np.linspace(-fs,fs,N)
    f = np.linspace(-m*fs,m*fs,2*m*N)
    Ys = np.zeros(2*m*N)
    plt.figure('x',figsize=(14,7))
    for i in range(-(m-1),m):
        Ys += bandlimit_func(f,B,i*fs)
        #plt.xticks([i*fs],[r'$%s f_s0$'%i])
    plt.plot(f[-2*(m-1)*N:2*(m-1)*N+1],Ys[-2*(m-1)*N:2*(m-1)*N+1])
    plt.ylabel('$|Y_{f_s}(f)|$',fontsize=18)
    plt.xlabel('$f / [Hz]$',fontsize=18)
    plt.xticks([-fs, -fs/2, 0, fs/2, fs],[r'$-f_s$', r'$-f_N$', r'$0$', r'$f_N$', r'$f_s$'],fontsize=16)
    plt.xlim(-2*fs,2*fs)
    plt.show()
    
interact(lambda B:inter_PerSum(B=B),
                B=(0,8,1)) and None

Figure 2.9.2: Demonstrates the periodic summation with period $f_s$ of a spectrum $|Y(f)|$ as a function of bandwidth $B$. The Nyquist frequency is indicated as $f_N$ in the figure.

Note that, when $B > \frac{f_s}{2} = f_N$, the individual copies of the spectrum start overlapping and it is becomes impossible to isolate the spectrum $Y(f)$ from $Y_{f_s}(f)$. On the onther hand, when $B < f_N$, we can recover $Y(f)$ by only retaining the part of the spectrum confined to the region $-f_N < f < f_N$. At this point the Nyquist-Shannon sampling theorem is proved since the spectrum $Y(f)$ uniquely determines the signal $y(t)$. All that remains to do is specify how we could go about doing this. Note that, if there are no frequencies greater than $\frac{f_s}{2} = f_N$ present in the signal, the spectrum $Y(f)$ can be found from $Y_{f_s}(f)$ simply by constructing the filter

$$ Y(f) = \sqcap(\frac{f}{f_s}) Y_{f_s}(f), $$

where $\sqcap(\cdot)$ is the rectangle (or boxcar) function. Now we can apply the Poisson summation formula to find

\begin{eqnarray} Y(f) &=& \sqcap(\frac{f}{f_s}) Y_{f_s}(f), \\ &=& \sqcap(\frac{f}{f_s}) \sum_{n = -\infty}^{\infty} \Delta t ~ y(\Delta t n) e^{-2\pi\imath f \Delta t n},\\ &=& \sum_{n = -\infty}^{\infty} y(\Delta t n) \Delta t ~ \sqcap(\Delta t ~ f) e^{-2\pi\imath f \Delta t n},\\ &=& \sum_{n = -\infty}^{\infty} y(\Delta t n) \mathscr{F}\{\text{sinc}\left(\frac{t - n\Delta t}{\Delta t}\right)\}, \end{eqnarray}

where we have used the formula for the Fourier transform of the rectangle function. Taking the inverse Fourier transform of both sides results in the Whittaker-Shannon interpolation formula

$$ y(t) = \sum_{n =-\infty}^{\infty} y(n\Delta t) ~ \text{sinc}\left(\frac{t - n\Delta t}{\Delta t}\right) = \sum_{n =-\infty}^{\infty} y_n ~ \text{sinc}\left(\frac{t - t_n}{\Delta t}\right) . $$

Note that, in order to use this formula, we still need an infinite number of samples of $y(t)$, something which is impossible in practice. The operation of transforming the RHS into the LHS is effectively a digital to analogue conversion since we are going from a finitely sampled signal $y_n$ to a continuous approximation $y(t)$. This could be implemented approximately using, for example, the zero-order hold model (see here ⤴ for example). The approximation error reduces if the signal is sufficiently oversampled (i.e. sampled at intervals $\Delta t < \frac{1}{2f_N}$). Undersampling, on the other hand, results in aliasing.

2.9.3 Aliasing

We have already touched upon the concept of aliasing. Recall that, during our discussion on periodic summation, we noted that the Discrete Time Fourier Transform (DTFT) results in a periodic summation $Y_{f_s}(f)$ of the spectrum $Y(f)$ of the signal $y(t)$. The copies of $Y(f)$ at $k \neq 0$ were called aliases. From the discussion above, we also know that, when the signal is undersampled, these aliases will start to overlap resulting in a distortion of the spectrum $Y(f)$. This is what is usually refered to as aliasing. If we try to reconstruct the signal $y(t)$ from the distorted spectrum we will end up with artefacts in the signal $y(t)$. Here we will demonstrate the consequences of aliasing with a simple example and explain the implications this has for practical applications of the DFT.

Going back to our sum of sine functions model above, let's consider what happens if we now allow one of the frequency components to vary i.e.

$$ y = \sin(2\pi f_1 t) + \sin(2\pi f_2 t) + \sin(2\pi f_i t), $$

where $f_i$ is the varaible frequency. If we sample this function in the interval $0< t < 10$s with 256 samples (i.e. $\Delta t = \frac{t_f - t_0}{N - 1} = \frac{10}{255} \approx 0.0392s$) we obtain a Nyquist frequency of $f_N = \frac{1}{2\Delta t} = 12.75$ Hz. In the following demonstration the Nyquist frequency is indicated with the red arrow. The figure on the left shows the resulting $|Y_k|$ from the DFT. The figure on the right shows the input frequencies $f_1$, $f_2$ and $f_i$. Use the slider to investigate the output of the DFT as $f_i$ approaches $f_N$. Can you explain the output when $f_i > f_N$?



In [ ]:

    
interact(lambda fi:inter_DFT(N=256,t0=0.0,tf=10,f3=fi,plot_interval=26,plot_phase=False,show_Nyquist=True),
                fi=(3,25,0.25)) and None

Figure 2.9.3: This demo illustrates the effects of aliasing. Note how frequencies $f_i > f_N$ are aliased back into the spectrum at $f < f_N$.

Notice that the spectrum is reflected about $f_N$. As we noted before, this happens because of periodicity and because the DFT of a real valued signal is Hermitian. Since these additional frequency peaks do not contain any new information, they do not present a problem when trying to recover the signal $y(t)$ from its spectrum. Any non-zero frequency components above $f_N$ can safely be discarded. A more troublesome feature of the spectrum is that input frequencies $f_i$ above $f_N$ are folded or aliased back into the spectrum i.e. a frequency at $f_i = f_N + \epsilon$ appears at $f_k = f_N - \epsilon$. This effect is referred to as aliasing. It can, in principle, be eliminated by applying anti-aliasing filters to the signal. Unfortunately the details behind practical implementations of anti-aliasing filters is beyond the scope of the current discussion. Here we will simply give a brief illustration of what they do.

Suppose we have an input signal whose spectrum is an unnormalised zero mean Gaussian of the form $$ |Y(f)| = e^{\frac{-f^2}{2\sigma^2}}, $$ where $\sigma$ is the standard deviation and $f$ is the frequency in Hertz. If we sampled the original signal with a sampling frequency of $f_s = 10$Hz (i.e. $\Delta t = 0.1$s) then, regardless of the width $\sigma$ of $|Y(f)|$, the DTFT repeats itself over frequency intervals $[(k-\frac{1}{2})f_s,(k+\frac{1}{2})f_s]$ where $k \in \mathbb{Z}$ is any integer. Whether these aliases overlap depends on the width $\sigma$ of $|Y(f)|$. For example, with $\sigma = 1$ we get the following output



In [ ]:

    
def gaussian(x, mu, sig):
    return np.exp(-(x - mu)**2.0 / (2 * sig**2.0))

#Set sampling frequency 
fs = 10
#Set frequency domain of DFT
f = np.linspace(-fs,fs)
Y = gaussian(f,0,1.0)

plt.figure('rep_spec',figsize=(14,7))
plt.plot(f-0,Y,'b')
plt.plot(f-10.,Y, 'k')
plt.plot(f+10,Y, 'k')
plt.plot(f-20.,Y, 'k')
plt.plot(f+20,Y, 'k')
#plt.ylim(-0.5,fft_gau.max())
plt.ylabel('$|Y_{f_s}(f)|$',fontsize=18)
plt.xlabel('$f / [Hz]$',fontsize=18)
plt.show()

Figure 2.9.4: The spectrum of a sufficiently oversampled signal.

Here we have plotted $|Y(f)|$ (i.e. the $k=0$ component of $|Y_{f_s}(f)|$) in blue. This is the quantity we are actually after. With the stated values of $\sigma$ and $f_s$ we would have no problem recovering the input signal $y(t)$ from $Y(f)$ because there is almost no overlap between the aliases. If, on the other hand, the width is $\sigma = 2.5$, the situation changes because the aliases start overlapping as shown below.



In [ ]:

    
f = np.linspace(-10,10)
Y = gaussian(f,0,2.5)

plt.figure('rep_spec',figsize=(14,7))
plt.plot(f-0,Y,'b')
plt.plot(f-10.,Y, 'k')
plt.plot(f+10,Y, 'k')
plt.plot(f-20.,Y, 'k')
plt.plot(f+20,Y, 'k')
#plt.ylim(-0.5,fft_gau.max())
plt.ylabel('$|Y_{f_s}(f)|$',fontsize=18)
plt.xlabel('$f / [Hz]$',fontsize=18)
plt.show()

Figure 2.9.5: The spectrum of an undersampled signal.

With the specified sampling rate there will always be some loss of information about the signal. Actually the situation is even worse than that since, when computing the DFT, input frequencies at $f_i = f_N + \epsilon$ (for $\epsilon > 0$) will also be aliased back into the spectrum at $f_k = f_N - \epsilon$. Attempting to reconstruct the input signal $y(t)$ from the resulting DFT components $Y_k$ would not return the correct result. An anti-aliasing filter is a form of low-pass filter which prevents the components of $Y_k$ corresponding to input frequencies $f_i > f_N$ from contaminating the reconstructed signal $y(t)$. Its basic operation is illustrated in the following figure.



In [ ]:

    
f = np.linspace(-5,5,20)
f2 = np.hstack((f[0],f, f[-1]))
Y = gaussian(f,0,2.5)
Y2 = np.hstack((0.0,Y,0.0))

plt.figure('rep_spec',figsize=(14,7))
plt.plot(f2-0,Y2,'b')
plt.plot(f2-10.,Y2, 'k')
plt.plot(f2+10,Y2, 'k')
plt.plot(f2-20.,Y2, 'k')
plt.plot(f2+20,Y2, 'k')

#plt.ylim(-0.5,fft_gau.max())
plt.ylabel('$|Y_{f_s}(f)|$',fontsize=18)
plt.xlabel('$f / [Hz]$',fontsize=18)
plt.show()

Figure 2.9.6: This figure illustrates the function of an anti-aliasing filter.

An ideal anti-aliasing filter filters out all frequencies above the Nyquist frequency $f_N$. This ensures that the signal $y(t)$ computed from the (possibly truncated) spectrum of $Y(f)$ will not contain any incorrect frequency components. Note that information about frequencies above $f_N$ will still have been lost. For simlicity the above discussion only considered the magnitude $|Y(f)|$ but similar ideas apply to the phase as well. Whereas an ideal anti-aliasing filter blocks out all frequencies above $f_N$ (i.e. has a gain of one when $f \leq f_N$ and zero otherwise), its frequency response should be linear so that it does not result in any phase distortion.

Next: 2.10 Linear Algrebra