``````

In [1]:

import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa

``````

# Signal Representations

## Time Domain

The basic representation of an audio signal is in the time domain.

Sound is air vibrating. An audio signal represents the fluctuation in air pressure caused by the vibration as a function of time.

``````

In [2]:

import urllib.request
urllib.request.urlretrieve('http://audio.musicinformationretrieval.com/c_strum.wav', 'c_strum.wav')

# load the file we just downloaded; x is the signal, fs its sampling frequency
x, fs = librosa.load('c_strum.wav', sr=44100)

from IPython.display import Audio
Audio(x, rate=fs)

``````
``````

Out[2]:

(embedded audio player)

``````

To plot a signal in the time domain, use `librosa.display.waveshow` (called `librosa.display.waveplot` in librosa versions before 0.10):

``````

In [3]:

librosa.display.waveshow(x, sr=fs, alpha=0.5)

``````
``````

Out[3]:

<matplotlib.collections.PolyCollection at 0x112d35890>

``````

Let's zoom in:

``````

In [4]:

plt.plot(x[8000:9000])

``````
``````

Out[4]:

[<matplotlib.lines.Line2D at 0x11321eed0>]

``````

Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the sampling frequency (abbreviated `fs`) or sample rate (abbreviated `sr`). For this workshop, we will mostly work with a sampling frequency of 44100 Hz.
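To see the relationship between sample rate, duration, and number of samples concretely, here is a quick sketch (the 440 Hz sine tone and two-second duration are illustrative choices, not part of the workshop material):

```python
import numpy

fs = 44100                                    # sampling frequency in Hz
duration = 2.0                                # seconds
t = numpy.arange(int(fs*duration))/float(fs)  # the time, in seconds, of each sample
x_sine = numpy.sin(2*numpy.pi*440*t)          # a 440 Hz sine tone

# number of samples = sample rate * duration
print(len(x_sine))   # 88200
```

Each sample is spaced `1/fs` seconds apart, i.e. about 22.7 microseconds at 44100 Hz.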

## Fourier Transform

The Fourier Transform is one of the most fundamental operations in applied mathematics and signal processing.

It transforms our time-domain signal into the frequency domain. Whereas the time domain expresses our signal as a sequence of samples, the frequency domain expresses our signal as a superposition of sinusoids of varying magnitudes, frequencies, and phase offsets.
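As a sanity check on this idea, we can build a signal from two sinusoids and confirm that the Fourier transform recovers their frequencies (a sketch; the frequencies and amplitudes are illustrative choices):

```python
import numpy

fs = 8000
t = numpy.arange(fs)/float(fs)   # one second of samples
# a superposition of two sinusoids with different magnitudes and frequencies
sig = 1.0*numpy.sin(2*numpy.pi*200*t) + 0.5*numpy.sin(2*numpy.pi*1000*t)

mag = numpy.abs(numpy.fft.rfft(sig))
peaks = numpy.argsort(mag)[-2:]  # the two bins with the largest magnitudes
print(sorted(peaks))             # [200, 1000]: with a 1-second window, bin index equals Hz
```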

To compute a Fourier transform in SciPy, use `scipy.fft.fft` (in older SciPy versions, `scipy.fft` was itself the function; NumPy's equivalent is `numpy.fft.fft`):

``````

In [5]:

import scipy.fft

X = scipy.fft.fft(x)
X_mag = numpy.absolute(X)
plt.plot(X_mag)  # plot the magnitude spectrum

``````
``````

Out[5]:

[<matplotlib.lines.Line2D at 0x11326a5d0>]

``````

Zoom in:

``````

In [6]:

plt.plot(X_mag[:5000])

``````
``````

Out[6]:

[<matplotlib.lines.Line2D at 0x11329ed50>]

``````
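Note that the x-axis of these plots is in FFT bin indices, not hertz. To label bins in hertz, compute each bin's center frequency with `numpy.fft.fftfreq` (a sketch, assuming a 44100 Hz sample rate and an illustrative FFT size):

```python
import numpy

fs = 44100
N = 2048                                # FFT size (illustrative choice)
freqs = numpy.fft.fftfreq(N, d=1.0/fs)  # center frequency in Hz of each FFT bin

print(freqs[1])   # fs/N, about 21.5 Hz per bin
# bins 0 through N/2-1 hold nonnegative frequencies; the rest are their negative mirrors
```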

In Essentia, you can also use `essentia.standard.Spectrum` to compute a magnitude spectrum.

## Spectrogram

Musical signals are highly non-stationary, i.e., their statistics change over time. It would be rather meaningless to compute a spectrum over an entire 10-minute song.

Instead, we compute a spectrum for small frames of the audio signal. The resulting sequence of spectra is called a spectrogram.
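The framing idea can be sketched directly in NumPy (the frame and hop sizes here are illustrative, and the signal is synthetic noise rather than our guitar strum):

```python
import numpy

fs = 22050
sig = numpy.random.randn(fs)  # one second of noise as a stand-in signal
frame_size = 1024
hop = 512                     # 50% overlap between adjacent frames

frames = [sig[i:i+frame_size] for i in range(0, len(sig)-frame_size+1, hop)]
# one magnitude spectrum per frame; stacking them column-wise gives the spectrogram
S_manual = numpy.array([numpy.abs(numpy.fft.rfft(f)) for f in frames]).T

print(S_manual.shape)  # (frame_size/2 + 1 frequency bins, number of frames)
```

In practice, each frame is also multiplied by a window function (e.g. Hann) before the FFT to reduce spectral leakage; the libraries below handle this for you.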

### `matplotlib.specgram`

Matplotlib has `specgram`, which computes and displays a spectrogram:

``````

In [7]:

S, freqs, bins, im = plt.specgram(x, NFFT=1024, noverlap=512, Fs=44100)
plt.xlabel('Time (seconds)')
plt.ylabel('Frequency (Hz)')

``````
``````

Out[7]:

<matplotlib.text.Text at 0x1132b68d0>

``````

### `librosa.feature.melspectrogram`

`librosa` has some outstanding spectral representations, including `librosa.feature.melspectrogram`:

``````

In [8]:

S = librosa.feature.melspectrogram(y=x, sr=fs, n_fft=1024)
logS = librosa.power_to_db(S)  # convert the power spectrogram to decibels

``````

To display any type of spectrogram in librosa, use `librosa.display.specshow`:

``````

In [9]:

librosa.display.specshow(logS, sr=fs, x_axis='time', y_axis='mel')

``````
``````

Out[9]:

<matplotlib.image.AxesImage at 0x113018d90>

``````

## Autocorrelation

The autocorrelation of a signal measures its similarity to a delayed copy of itself, as a function of the delay (lag).
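A periodic signal is maximally similar to itself when delayed by exactly one period, so the first strong peak after lag 0 reveals the period. A sketch with a synthetic tone (the frequency and search window are illustrative choices):

```python
import numpy

fs = 4000
t = numpy.arange(fs)/float(fs)
sig = numpy.sin(2*numpy.pi*80*t)  # an 80 Hz tone: period = fs/80 = 50 samples

# keep only the nonnegative lags of the symmetric autocorrelation
r_sig = numpy.correlate(sig, sig, mode='full')[len(sig)-1:]
# skip lag 0 (trivially the maximum) and search around the expected period
peak_lag = 25 + numpy.argmax(r_sig[25:75])
print(peak_lag)   # 50: the tone's period in samples
```

This lag-of-the-first-peak idea underlies simple pitch detectors.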

``````

In [10]:

# Because the autocorrelation produces a symmetric signal, we only care about the "right half".
r = numpy.correlate(x, x, mode='full')[len(x)-1:]
print(x.shape, r.shape)
plt.plot(r[:10000])
plt.xlabel('Lag (samples)')

``````
``````

(204800,) (204800,)

Out[10]:

<matplotlib.text.Text at 0x113002950>

``````
``````

In [11]:

r = librosa.autocorrelate(x, max_size=10000)
plt.plot(r)
plt.xlabel('Lag (samples)')

``````
``````

Out[11]: