This notebook will demonstrate some basic aspects related with digital signal processing using FFT and psd. It is mostly inspired by two LabVIEW white papers, this one and this one. We will also take the oportunity to test different power spectrum estimation implementations from two common Python packages, matplotlib.mlab and scipy.signal, following this StackOverflow question.
The computational environment set up for this Python notebook includes numpy and scipy for the numerical simulations, matplotlib and pandas for the plots:
In [1]:
import sys
import numpy as np
import scipy as sp
import matplotlib as mpl
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
print(sys.version)
for package in (np, sp, mpl, pd):
print('{:.<15} {}'.format(package.__name__, package.__version__))
Furthermore, we will need the following special functions:
In [2]:
from numpy.fft import fft, fftfreq, rfft, rfftfreq, fftshift
from scipy.signal import periodogram, welch
from matplotlib.mlab import rms_flat, psd, detrend_none, window_hanning
The power spectral density of a digital signal can be estimated in several different ways, namely through:
We will illustrate them below. However, before that we will have to set up a sample signal.
For the purpose of illustration, in this notebook we will use a sample signal (in volt) composed of a small amplitude sine wave with an additive large amplitude random noise:
In [3]:
Ns = 4096 # number of samples
np.random.seed(1234) # random seed (for repeatability)
rn = np.random.random(Ns)-0.5 # zero mean random noise
Fs = 100 # sampling frequency
dt = 1./Fs # time discretisation
tt = np.arange(Ns)*dt # time sampling
A = 0.067 # sine wave amplitude
f = 10.24 # sine wave frequency
sw = A*np.sin(2*np.pi*f*tt) # sine wave
ss = sw+rn # sample signal
signals = (rn, sw, ss)
labels = ('Random noise', 'Sine wave', 'Sample signal')
v = [(np.max(v), np.min(v), np.mean(v), rms_flat(v)) for v in signals]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Min', 'Mean', 'RMS'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.plot(tt, signals[v], label=labels[v])
ax.set_title('Time history')
ax.set_xlabel('Time [s]')
ax.set_ylabel('Amplitude [V]')
ax.legend()
ax.grid()
The sample signal time history plot shows that the sine wave is completely hidden by the additive random noise because of the difference in the amplitudes of both signals.
The theoretical sine wave RMS value is equal to its amplitude ($A$) divided by the square root of 2:
$$RMS(sine wave) = \frac{A}{\sqrt 2} = A_{RMS}$$
In [4]:
print('{:.6f}, {:.6f}'.format(df['RMS']['Sine wave'], A/np.sqrt(2)))
For additive orthogonal signals, the RMS value of the total is equal to the square root of sum of squares (SRSS) of the parts. Let us check that with the random noise and the sine wave against the sample signal:
In [5]:
SRSS = np.sqrt(df['RMS']['Random noise']**2 + df['RMS']['Sine wave']**2)
print('{:.6f}, {:.6f}'.format(SRSS, df['RMS']['Sample signal']))
We are now ready to start processing these signals.
We will start processing these signals by taking their Fourier transform into the frequency domain. For that we will use the FFT algorithm, implemented in NumPy as the fft function, and normalise the result by the number of samples (Ns):
In [6]:
RN2 = fft(rn)/Ns
SW2 = fft(sw)/Ns
SS2 = fft(ss)/Ns
FT2 = (RN2, SW2, SS2)
freqs = fftfreq(Ns, d=1./Fs)
v = [(np.absolute(v[ix]), freqs[ix]) for v in FT2 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(fftshift(freqs), fftshift(np.absolute(FT2[v])), label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('Two-sided amplitude spectrum')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [$V_{RMS}$]')
ax.legend()
ax.grid()
Several aspects are worth mentioning about these plots:
The theoretical value of a sine wave two-sided amplitude spectrum is equal to the sine wave amplitude ($A$) divided by the double of the square root of 2:
$$\left| FT_{2sided} \right| = \frac{A}{2 \cdot \sqrt 2} = \frac{A_{RMS}}{2}$$
In [7]:
print('{:.6f}, {:.6f}'.format(df['Max']['Sine wave'], A/(2*np.sqrt(2))))
The difference between the actual and theoretical values will become smaller as the duration of the signals increases towards infinity. This aspect, which also influences the frequency discretisation mentioned above, affects most of the numerical comparisons shown in this notebook.
We will now take advantage of this symmetry property of the Fourier transform with real signals to compute only the non-negative frequency terms. There are two options to achieve that:
For the first one we will use the rfft function implemented in NumPy which gives the spectral components from DC up to the Nyquist frequency:
In [8]:
TRN = rfft(rn)/Ns
TSW = rfft(sw)/Ns
TSS = rfft(ss)/Ns
TFT = (TRN, TSW, TSS)
tfreqs = rfftfreq(Ns, d=1./Fs)
v = [(np.absolute(v[ix]), tfreqs[ix], np.absolute(v[0])) for v in TFT for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(tfreqs, np.absolute(TFT[v]), label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('Truncated amplitude spectrum')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [$V_{RMS}$]')
ax.legend()
ax.grid()
As can be seen, these truncated amplitude spectra are exactly the same as the previous two-sided amplitude spectra, but were computed only for the non-negative frequency terms.
The one-sided spectra, on the other hand, are computed by taking the complex conjugate of the second half of the two-sided spectrum (the negative frequencies), reversing and adding it to the first half (in the corresponding positive frequencies). Better yet, multiply the truncated spectrum ordinates by two, with the only exceptions of the DC and Nyquist (if it exists) components:
In [9]:
scale = 2.*np.ones_like(tfreqs) # scale rfft components by a factor of 2
scale[0] = 1. # the DC component is not scaled
if scale.size%2 == True: # if there is a Nyquist component...
scale[-1] = 1. # ...then it is not scaled
FT1 = [v*scale for v in TFT]
v = [(np.absolute(v[ix]), tfreqs[ix], np.absolute(v[0])) for v in FT1 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(tfreqs, np.absolute(FT1[v]), label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('One-sided amplitude spectrum')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [$V_{RMS}$]')
ax.legend()
ax.grid()
The theoretical value of a sine wave one-sided amplitude spectrum is equal to the sine wave RMS, reason why the units of the amplitude spectrum are often referred to as quantity squared rms, where quantity is the unit of the time-domain signal:
$$\left| FT_{1sided} \right| = 2 \cdot \left| FT_{2sided} \right| = 2 \cdot \frac{A}{2 \cdot \sqrt 2} = \frac{A}{\sqrt 2} = A_{RMS}$$
In [10]:
print('{:.6f}, {:.6f}'.format(df['Max']['Sine wave'], A/np.sqrt(2)))
The peak corresponding to the sine wave frequency is still barely distinguishable above the major peaks in the sample signal one-sided amplitude spectrum. Let us query for the top 5 values:
In [11]:
df = pd.DataFrame(data=np.column_stack((np.absolute(FT1[2]), tfreqs)), columns=('RSS', 'Freq'))
print(df.nlargest(5, columns='RSS').to_string(index=False))
So we see that the sine wave frequency really does not stand out amongst the amplitude spectrum peaks. In order to improve this result we will need other processing tools.
We will compute now the power spectra of the three signals using the normalised Fourier transforms. First of all, we will multiply the two-sided fft by their complex conjugates in order to obtain the two-sided power spectra:
In [12]:
PS2 = [np.real(v*np.conj(v)) for v in FT2]
v = [(v[ix], freqs[ix], np.absolute(v[0])) for v in PS2 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(fftshift(freqs), fftshift(PS2[v]), label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('Two-sided power spectrum (fft)')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [${V_{RMS}}^2$]')
ax.legend()
ax.grid()
The theoretical value of a sine wave two-sided power spectrum is equal to the square of the sine wave two-sided amplitude spectrum, that is, the square of the sine wave amplitude ($A$) divided by 8:
$$\left| S_{2sided} \right| = {\left( \frac{A}{2 \cdot \sqrt 2} \right)}^2 = \frac{A^2}{8} = \frac{{A_{RMS}}^2}{4}$$
In [13]:
print('{:.6f}, {:.6f}'.format(df['Max']['Sine wave'], (A**2/8)))
Similar to the Fourier transform case, we will compute the one-sided power spectra by mutliplying the truncated rfft by their complex conjugates and applying the same scaling as for the one-sided amplitude spectra:
In [14]:
PS1 = [np.real(v*np.conj(v))*scale for v in TFT]
v = [(v[ix], tfreqs[ix], np.absolute(v[0])) for v in PS1 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(tfreqs, PS1[v], label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('One-sided power spectrum (scaled rfft)')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [${V_{rms}}^2$]')
ax.legend()
ax.grid()
We can see that the one-sided power spectrum estimates are the same. The theoretical value for a sine wave one-sided power spectrum is now given by the square of the sine wave one-sided amplitude spectrum, that is, the square of the sine wave amplitude ($A$) divided by 4:
$$\left| G_{1sided} \right| = 2 \cdot \left| S_{2sided} \right| = 2 \cdot \frac{A^2}{8} = \frac{A^2}{4} = \frac{{A_{RMS}}^2}{2}$$
In [15]:
print('{:.6f}, {:.6f}'.format(df['Max']['Sine wave'], (A**2/4)))
The peak corresponding to the sine wave frequency is now more visible above the major peaks in the sample signal one-sided amplitude spectrum. Let us query again for the top 5 values:
In [16]:
df = pd.DataFrame(data=np.column_stack((np.absolute(PS1[2]), tfreqs)), columns=('RSS', 'Freq'))
print(df.nlargest(5, columns='RSS').to_string(index=False))
We have reached some important results thus far, let us summarize them here:
Amplitude spectra | Fourier transform | Power spectra |
---|---|---|
Two-sided (peak) | $\frac{A}{2} = \frac{A_{RMS}}{\sqrt 2}$ | $\frac{A^2}{4} = \frac{{A_{RMS}}^2}{2}$ |
Two-sided (RMS) | $\frac{A}{2 \cdot \sqrt 2} = \frac{A_{RMS}}{2}$ | $\frac{A^2}{8} = \frac{{A_{RMS}}^2}{4}$ |
One-sided (peak) | $A = \sqrt 2 \cdot A_{RMS}$ | $\frac{A^2}{2} = {A_{RMS}}^2$ |
One-sided (RMS) | $\frac{A}{\sqrt 2} = A_{RMS}$ | $\frac{A^2}{4} = \frac{{A_{RMS}}^2}{2}$ |
Now that we have seen how to compute the power spectra from the Fourier transforms, we will use directly the periodogram function implemented in SciPy to obtain the same results:
In [17]:
freqs, Prn = periodogram(rn, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=False, scaling='spectrum')
freqs, Psw = periodogram(sw, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=False, scaling='spectrum')
freqs, Pss = periodogram(ss, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=False, scaling='spectrum')
PS0 = (Prn, Psw, Pss)
v = [(np.absolute(v[ix]), freqs[ix], np.absolute(v[0])) for v in PS0 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=('Random noise', 'Sine wave', 'Masked signal'), columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(fftshift(freqs), fftshift(PS0[v]), label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('Two-sided power spectrum (periodogram)')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [${V_{rms}}^2$]')
ax.legend()
ax.grid()
In [18]:
freqs, Prn = periodogram(rn, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=True, scaling='spectrum')
freqs, Psw = periodogram(sw, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=True, scaling='spectrum')
freqs, Pss = periodogram(ss, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=True, scaling='spectrum')
PS0 = (Prn, Psw, Pss)
v = [(np.absolute(v[ix]), freqs[ix], np.absolute(v[0])) for v in PS0 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=('Random noise', 'Sine wave', 'Masked signal'), columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(freqs, PS0[v], label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('One-sided power spectrum (periodogram)')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [${V_{rms}}^2$]')
ax.legend()
ax.grid()
The periodogram function also allows us to compute the power spectral density of a signal which, in some cases, is more relevant than the power spectra:
In [19]:
freqs, Prn = periodogram(rn, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=True, scaling='density')
freqs, Psw = periodogram(sw, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=True, scaling='density')
freqs, Pss = periodogram(ss, fs=Fs, window=None, nfft=None, detrend=None, return_onesided=True, scaling='density')
PSD0 = (Prn, Psw, Pss)
v = [(np.absolute(v[ix]), freqs[ix], np.absolute(v[0])) for v in PSD0 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.semilogy(freqs, PSD0[v], label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('One-sided power spectral density (periodogram)')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [${V_{rms}}^2/Hz$]')
ax.legend()
ax.grid()
Naturally, the power spectrum can be estimated from the power spectral density by multiplying by the frequency discretisation:
In [20]:
print(df['Max']['Sine wave']*(freqs[1]-freqs[0]))
The peak corresponding to the sine wave frequency is now more visible above the major peaks in the sample signal one-sided amplitude spectrum. Let us query again for the top 5 values:
In [21]:
df = pd.DataFrame(data=np.column_stack((np.absolute(PSD0[2]), tfreqs)), columns=('RSS', 'Freq'))
print(df.nlargest(5, columns='RSS').to_string(index=False))
Matplotlib supplies one function, psd, to estimate the power spectral density (Pxx) of a signal (x) in the mlab module. This function, which is described as a "Welch’s average periodogram method", has the following signature:
matplotlib.mlab.psd(x, NFFT=256, Fs=2, detrend=mlab.detrend_none, window=mlab.window_hanning, noverlap=0, pad_to=None,sides='default', scale_by_freq=None)
The function returns a tuple with the power spectral density estimate and the corresponding frequencies (Pxx, freqs).
In [22]:
Prn, freqs = psd(rn, NFFT=512, Fs=Fs, detrend=detrend_none, window=window_hanning, noverlap=0, pad_to=None,
sides='onesided', scale_by_freq=True)
Psw, freqs = psd(sw, NFFT=512, Fs=Fs, detrend=detrend_none, window=window_hanning, noverlap=0, pad_to=None,
sides='onesided', scale_by_freq=True)
Pss, freqs = psd(ss, NFFT=512, Fs=Fs, detrend=detrend_none, window=window_hanning, noverlap=0, pad_to=None,
sides='onesided', scale_by_freq=True)
PSD1 = (Prn, Psw, Pss)
v = [(np.absolute(v[ix]), freqs[ix], np.absolute(v[0])) for v in PSD1 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.plot(freqs, PSD1[v], label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('Power spectral density')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [V**2/Hz]')
ax.legend()
ax.grid()
The peak corresponding to the sine wave frequency is now more visible above the major peaks in the sample signal one-sided amplitude spectrum. Let us query again for the top 5 values:
In [23]:
df = pd.DataFrame(data=np.column_stack((np.absolute(PSD1[2]), freqs)), columns=('RSS', 'Freq'))
print(df.nlargest(5, columns='RSS').to_string(index=False))
SciPy supplies two functions to estimate the power spectral density (Pxx) of a signal (x) in the signal module, periodogram and welch. Their signatures are the following:
scipy.signal.periodogram(x, fs=1.0, window=None, nfft=None, detrend='constant', return_onesided=True, scaling='density', axis=-1)
scipy.signal.welch(x, fs=1.0, window='hann', nperseg=256, noverlap=None, nfft=None, detrend='constant', return_onesided=True, scaling='density', axis=-1)
The first function is the periodogram whereas the second function uses the Welch’s method. In the particular case that noverlap is 0, this method is equivalent to Bartlett’s method. Both functions return a tuple with the frequencies and the power spectral density estimate (freqs, Pxx).
In [24]:
freqs, Prn = periodogram(rn, fs=Fs, window='hann', nfft=512, detrend=None, return_onesided=True, scaling='density', axis=-1)
freqs, Psw = periodogram(sw, fs=Fs, window='hann', nfft=512, detrend=None, return_onesided=True, scaling='density', axis=-1)
freqs, Pss = periodogram(ss, fs=Fs, window='hann', nfft=512, detrend=None, return_onesided=True, scaling='density', axis=-1)
PSD2 = (Prn, Psw, Pss)
v = [(np.absolute(v[ix]), freqs[ix], np.absolute(v[0])) for v in PSD2 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.plot(freqs, PSD2[v], label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('Power spectral density')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [V**2/Hz]')
ax.legend()
ax.grid()
The peak corresponding to the sine wave frequency is now more visible above the major peaks in the sample signal one-sided amplitude spectrum. Let us query again for the top 5 values:
In [25]:
df = pd.DataFrame(data=np.column_stack((np.absolute(PSD2[2]), freqs)), columns=('RSS', 'Freq'))
print(df.nlargest(5, columns='RSS').to_string(index=False))
In [26]:
freqs, Prn = welch(rn, fs=Fs, window='hann', nperseg=512, noverlap=None, nfft=None, detrend=None,
return_onesided=True, scaling='density', axis=-1)
freqs, Psw = welch(sw, fs=Fs, window='hann', nperseg=512, noverlap=None, nfft=None, detrend=None,
return_onesided=True, scaling='density', axis=-1)
freqs, Pss = welch(ss, fs=Fs, window='hann', nperseg=512, noverlap=None, nfft=None, detrend=None,
return_onesided=True, scaling='density', axis=-1)
PSD3 = (Prn, Psw, Pss)
v = [(np.absolute(v[ix]), freqs[ix], np.absolute(v[0])) for v in PSD3 for ix in (np.argmax(np.absolute(v)),)]
df = pd.DataFrame(data=v, index=labels, columns=('Max', 'Freq', 'DC'))
print(df)
fig, ax = plt.subplots()
ax.hold(True)
for v in (2,1):
ax.plot(freqs, PSD3[v], label=labels[v])
ax.axvline(f, label='{}Hz'.format(f), ls=':')
ax.set_title('Power spectral density')
ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude [V**2/Hz]')
ax.legend()
ax.grid(True)
The peak corresponding to the sine wave frequency is now more visible above the major peaks in the sample signal one-sided amplitude spectrum. Let us query again for the top 5 values:
In [27]:
df = pd.DataFrame(data=np.column_stack((np.absolute(PSD3[2]), freqs)), columns=('RSS', 'Freq'))
print(df.nlargest(5, columns='RSS').to_string(index=False))