In [ ]:
import scipy.io.wavfile as wavfile
import numpy as np
import scipy as sp
import os
import matplotlib.pyplot as plt
%matplotlib inline
You'll spend an embarrassing amount of time as a young scientist loading, processing, and saving data that streams in over time. In this tutorial you'll practice loading a few .wav audio files and plotting them as waveforms, power spectra, and spectrograms.
This folder should contain two .wav audio files. Each is a recording of a sentence from Gulliver's Travels, one read by a male speaker and one by a female speaker. We'll see whether we can tell them apart by analyzing the audio data. (It is probably very easy to hear which is which by listening to the sound files directly.)
Use functions in os to find the files ending in ".wav" in the current folder, and wavfile to load them into the program.
In [ ]:
files = os.listdir('.')
files = [f for f in files if f.endswith('.wav')]
wav_data = [wavfile.read(f) for f in files]
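If the two recordings are not at hand, the same find-and-load pattern can be tried on a synthetic file. The sketch below is only an illustration: the tone, the sampling rate, and the file name demo_tone.wav are made up for the example, and the file is written to a temporary folder rather than the current one.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Hypothetical stand-in for a recording: 0.5 s of a 220 Hz tone as 16-bit samples.
rate = 8000
t = np.arange(rate // 2) / rate
tone = (0.3 * 32767 * np.sin(2 * np.pi * 220 * t)).astype(np.int16)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo_tone.wav")  # made-up file name
    wavfile.write(path, rate, tone)

    # List the folder and keep only names that end in ".wav".
    names = sorted(f for f in os.listdir(folder) if f.endswith(".wav"))

    # wavfile.read returns a (sampling rate, samples) pair.
    loaded_rate, loaded = wavfile.read(os.path.join(folder, names[0]))

print(names, loaded_rate, loaded.dtype, loaded.shape)
```

Each entry of wav_data in the cell above has the same two-part shape: a sampling rate followed by an array of samples.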
Inspect wav_data in the following cell to understand what information is returned. You can also look at the documentation by googling "scipy.io.wavfile".
In [ ]:
Plot both waveforms to see what they look like as a function of time. Keep in mind the scale may be different for the two graphs.
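To plot against time rather than sample index, you need a time axis. A minimal sketch, assuming a made-up sampling rate and a placeholder array in place of a real recording:

```python
import numpy as np

rate = 8000                    # samples per second (assumed for the example)
samples = np.zeros(2 * rate)   # placeholder for a 2-second mono recording

# Sample k was recorded k / rate seconds after the start.
time_s = np.arange(samples.shape[0]) / rate
duration_s = samples.shape[0] / rate

print(time_s[:3], duration_s)
```

Passing time_s as the first argument to plt.plot would then label the x-axis in seconds.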
In [ ]:
plt.plot()# Plot the first one here
plt.title('First speaker')
plt.figure()
plt.plot()# Plot the second one here
plt.title('Second speaker')
Can you tell which speaker is male or female from this plot?
The power spectrum of a signal is defined as the squared magnitude of the signal's Fourier coefficients. It measures the amplitude squared at each frequency in the signal. Let's calculate and plot them.
In [ ]:
sampling_rate, signal_1 = wav_data[0]
sampling_rate, signal_2 = wav_data[1]
In [ ]:
fft_1 = np.fft.fftshift(np.fft.fft(signal_1))
freq_1 = np.fft.fftshift(np.fft.fftfreq(signal_1.shape[0], 1./sampling_rate))
fft_2 = np.fft.fftshift(np.fft.fft(signal_2))
freq_2 = np.fft.fftshift(np.fft.fftfreq(signal_2.shape[0], 1./sampling_rate))
pow_1 = #Get the square of the magnitude of the fft
pow_2 = #Get the square of the magnitude of the fft
Using Google, figure out what the functions fftshift and fftfreq do.
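A small experiment can also help here. The sketch below uses made-up numbers (8 samples at 8 Hz, then a 50 Hz tone sampled at 1000 Hz) and is not the tutorial's data. It prints the frequency bins before and after shifting, then checks that the power of a pure tone peaks at plus and minus its frequency.

```python
import numpy as np

# Frequency bins for 8 samples taken at 8 Hz, in the order np.fft.fft returns them.
raw = np.fft.fftfreq(8, 1.0 / 8.0)
# fftshift reorders the bins so that they run from most negative to most positive.
shifted = np.fft.fftshift(raw)
print(raw)      # [ 0.  1.  2.  3. -4. -3. -2. -1.]
print(shifted)  # [-4. -3. -2. -1.  0.  1.  2.  3.]

# Power spectrum of one second of a 50 Hz sine wave sampled at 1000 Hz.
rate = 1000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 50 * t)
spectrum = np.fft.fftshift(np.fft.fft(tone))
freqs = np.fft.fftshift(np.fft.fftfreq(tone.shape[0], 1.0 / rate))
power = np.abs(spectrum) ** 2

# The two largest values sit at -50 Hz and +50 Hz.
peak_freqs = freqs[np.argsort(power)[-2:]]
print(np.sort(peak_freqs))
```

For a real-valued signal the spectrum is mirrored around 0 Hz, which is why the tone shows up twice.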
In [ ]:
plt.plot(freq_1, pow_1)
plt.title('Power Spectrum for Signal 1')
plt.figure()
plt.plot(freq_2, pow_2)
plt.title('Power Spectrum for Signal 2')
That's probably difficult to parse visually. Copy the code above and plot the log-power spectrum (use np.log on the pow_x variables).
In [ ]:
# Plot the log-power spectrum here
Can you tell from this plot which speaker is male and which is female?
One downside of the power spectrum is that we have lost all time information.
Ideally, for sound data, we'd like a format that has both time and frequency information. One simple way of getting this data is to take the spectrum for small windows and move the window through the file. You'll need to choose a window length in milliseconds.
In [ ]:
window_ms = 100.  # Window length in milliseconds
spects = []
freqs = []
for sampling_rate, signal in wav_data:
    window_pts = int(sampling_rate*window_ms/1000.)  # Samples per window
    n_windows = signal.shape[0] // window_pts  # Whole windows only; leftover samples are dropped
    temp_spect = np.empty(shape=(n_windows, window_pts//2 + 1))
    for ii in range(n_windows):
        # Taper each window with a Kaiser window (beta=7) before taking the FFT
        segment = np.kaiser(window_pts, 7)*signal[ii*window_pts:(ii+1)*window_pts]
        fft = np.fft.fftshift(np.fft.fft(segment))
        pow_spect = np.square(np.absolute(fft))
        temp_spect[ii] = pow_spect[:window_pts//2 + 1]
    freqs.append(np.fft.fftshift(np.fft.fftfreq(window_pts, 1./sampling_rate))[window_pts//2 + 1:])
    spects.append(temp_spect)
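To check that a windowed spectrum really recovers time information, it can help to run the idea on a signal whose answer is known in advance. The sketch below is an illustration under stated assumptions, not the tutorial's method. The test signal (200 Hz for one second, then 800 Hz) and the 8000 Hz sampling rate are invented, and np.fft.rfft is swapped in for the fft/fftshift pair used above because it returns only the non-negative frequencies of a real signal.

```python
import numpy as np

rate = 8000        # assumed sampling rate in Hz
window_ms = 100.0  # window length in milliseconds

# Invented test signal: 200 Hz during the first second, 800 Hz during the second.
t = np.arange(2 * rate) / rate
signal = np.where(t < 1.0,
                  np.sin(2 * np.pi * 200 * t),
                  np.sin(2 * np.pi * 800 * t))

window_pts = int(rate * window_ms / 1000.0)  # samples per window
n_windows = signal.shape[0] // window_pts    # whole windows only
taper = np.kaiser(window_pts, 7)             # same Kaiser taper as above

# Cut the signal into non-overlapping windows, one per row.
frames = signal[:n_windows * window_pts].reshape(n_windows, window_pts)

# Power spectrum of every tapered window: rows are time, columns are frequency.
spect = np.abs(np.fft.rfft(frames * taper, axis=1)) ** 2
freqs = np.fft.rfftfreq(window_pts, 1.0 / rate)

# Strongest frequency in each window.
dominant = freqs[np.argmax(spect, axis=1)]
print(dominant)
```

The printed frequencies switch from 200 Hz to 800 Hz halfway through, a change that the single power spectrum of the whole signal could show only as two peaks with no indication of their order.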
In [ ]:
plt.imshow(np.log(spects[0].T+1.e-10), aspect='auto',
           interpolation='nearest', extent=[0, int(window_ms*len(spects[0])/1000.), freqs[0].min(), freqs[0].max()])
plt.figure()
plt.imshow(np.log(spects[1].T+1.e-10), aspect='auto',
           interpolation='nearest', extent=[0, int(window_ms*len(spects[1])/1000.), freqs[1].min(), freqs[1].max()])
Plotting the y-axis on a log scale can make it easier to see what is happening at different frequencies. You'll need to add "plt.yscale('log')" after each imshow. Can you tell which speaker is male or female now?
In [ ]: