In [ ]:
This exercise will look at recognising which part of a mobile device a user is touching, from the audio registered by a piezo contact microphone. This is data from the Stane project (Paper and video), which used 3D printed surfaces to make super-cheap touch controllers.
The machine learning problem is simple: given a set of recordings of a user rubbing discrete touch zones on this 3D printed case, train a classifier which can distinguish which zone is being touched. This is in essence similar to speech recognition, but with a much simpler acoustic problem and no need to deal with language modeling.
We will use multi-class classification to distinguish the touch zones from the audio alone. We can assume a small number of discrete touch areas, and that there is no model governing how they might be touched (i.e. touches happen at random).
We need to develop a pipeline to process the data. There are several stages common to most supervised learning tasks: pre-processing the raw signals, extracting fixed-length feature vectors, splitting the data into training, validation and test sets, training a classifier, and evaluating its performance.
In [30]:
# standard imports
import numpy as np
import scipy.io.wavfile as wavfile
import scipy.signal as sig
import matplotlib.pyplot as plt
import sklearn.preprocessing, sklearn.cluster, sklearn.tree, sklearn.neighbors, sklearn.ensemble, sklearn.multiclass, sklearn.feature_selection
import ipy_table
import sklearn.svm, sklearn.cross_validation, sklearn.grid_search, sklearn.metrics, sklearn.datasets, sklearn.decomposition, sklearn.manifold
import pandas as pd
import seaborn
import scipy.ndimage
# force plots to appear inline on this page
%matplotlib inline
In [5]:
%cd datasets\stane
%ls
Since these are just plain wave files, we can listen to the data using aplay:
In [67]:
# play the first five seconds of these two files
!aplay stane_2.wav -d 5
!aplay stane_4.wav -d 5
In [6]:
# load each of the files into sound_files
sound_files = []
for texture in "12345":
    # load the wavefile
    fname = "stane_%s.wav" % texture
    sr, data = wavfile.read(fname)
    print "Loaded %s, %s samples at %dHz (%f seconds)" % (fname, len(data), sr, len(data)/float(sr))
    sound_files.append(data)
This has loaded each of the wave files into sound_files[], one for each of our 5 classes. We must process this into fixed-length feature vectors which we can feed to a classifier. This is the major "engineering" part of the machine learning process -- good feature selection is essential to getting good performance.
It's important that we can change the parameters of the feature extraction and learning and be able to rerun the entire process in one go. We define a dictionary called params which will hold every adjustable parameter, and a function called run_pipeline() which will run our entire pipeline. For now it is just an outline: the functions it calls will be defined as we work through the exercise.
In [7]:
params = {'sample_rate': 4096}

def run_pipeline(sound_files, params):
    # this is the outline of our pipeline
    pre_processed = pre_process(sound_files, params)
    features, targets = feature_extract(pre_processed, params)
    train, validate, test = split_features(features, targets, params)
    # train on the training data only, and evaluate on held-out data
    classifier = train_classifier(train, params)
    evaluate(classifier, test, params)
In [8]:
one_second = params["sample_rate"]
# plot two of the files
plot_section_0 = sound_files[0][:one_second]
plot_section_1 = sound_files[1][:one_second]
# generate time indices
timebase = np.arange(len(plot_section_0)) / float(params["sample_rate"])
plt.figure()
plt.plot(timebase, plot_section_0)
plt.xlabel("Time (s)")
plt.figure()
plt.plot(timebase, plot_section_1)
plt.xlabel("Time (s)")
We can also view this in the frequency domain using plt.specgram(). We have to choose an FFT size and overlap (here I used N=256 samples, overlap=128).
In [ ]:
# the cmap= just selects a prettier heat map
_ = plt.specgram(plot_section_0, NFFT=256, Fs=params["sample_rate"], noverlap=128, cmap="gist_heat")
plt.figure()
_ = plt.specgram(plot_section_1, NFFT=256, Fs=params["sample_rate"], noverlap=128, cmap="gist_heat")
In [9]:
def bandpass(x, low, high, sample_rate):
    # scipy.signal.filtfilt applies a linear filter to data (*without* phase distortion)
    # scipy.signal.butter will design a linear Butterworth filter
    nyquist = sample_rate / 2
    b, a = sig.butter(4, [low/float(nyquist), high/float(nyquist)], btype="band")
    return sig.filtfilt(b, a, x)

def pre_process(sound_files, params):
    processed = []
    for sound_file in sound_files:
        # scale the 16-bit integer samples into the range [-1, 1]
        normalised = sound_file / 32768.0
        p = bandpass(normalised, params["low_cutoff"], params["high_cutoff"], params["sample_rate"])
        processed.append(p)
    return processed
In [10]:
def plot_second(x, params):
    one_second = params["sample_rate"]
    plot_section = x[:one_second]
    # generate time indices
    timebase = np.arange(len(plot_section)) / float(params["sample_rate"])
    plt.figure()
    plt.plot(timebase, plot_section)
    plt.ylabel("Amplitude")
    plt.xlabel("Time (s)")
    plt.figure()
    _ = plt.specgram(plot_section, NFFT=256, Fs=params["sample_rate"], noverlap=128, cmap='gist_heat')
    plt.ylabel("Freq (Hz)")
    plt.xlabel("Time (s)")
In [11]:
# test the filtering; these are example values only
params["low_cutoff"]=100
params["high_cutoff"]=1500
processed = pre_process(sound_files, params)
# plot the results
plot_second(sound_files[0], params)
plot_second(processed[0], params)
The next step is to make fixed length feature vectors. This requires some assumptions: we have a continuous signal, so how do we split it up? What processing should we apply to transform the data?
The obvious thing to do with a time series is to split it into windows of a fixed length. These windows can be overlapping (i.e. the next window can include part of the previous one). The function sliding_window() below splits up a 1D time series into such overlapping windows.
In [12]:
def sliding_window(x, length, overlap):
    """Split x into windows of the given length, with the specified overlap"""
    step = length - overlap
    # count only complete windows, so every window has exactly `length` samples
    wins = (len(x) - length) // step + 1
    windows = []
    offset = 0
    for i in range(wins):
        windows.append(x[offset:offset+length])
        offset += step
    return windows
In [16]:
# for example: 512-sample windows with an overlap of 256 samples
# (the trailing semicolon suppresses the very long output)
sliding_window(sound_files[0], 512, 256);
Exercise: Produce a feature matrix for the sound files using sliding_window(), and a corresponding label vector.
Hint: you can use np.full(n, x) to generate a vector [x,x,x,x,...] and np.vstack(l) to stack a list of vectors into a matrix.
Make the window size and overlap part of params (window_length and window_overlap) and write a function features, labels = make_features(processed, params). One possible sketch is given below.
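The following is one possible implementation, not the only one: it assumes integer class labels 0-4 (one per sound file) and the window_length and window_overlap parameters set in the next cell.
In [ ]:
# one possible implementation of make_features (a sketch -- your version may differ)
def make_features(processed, params):
    feature_list = []
    label_list = []
    for label, sound_file in enumerate(processed):
        # split this recording into fixed-length (possibly overlapping) windows
        windows = sliding_window(sound_file, params["window_length"], params["window_overlap"])
        feature_list.append(np.vstack(windows))
        # one integer label per window, equal to this file's class index
        label_list.append(np.full(len(windows), label, dtype=int))
    features = np.vstack(feature_list)
    labels = np.hstack(label_list)
    return features, labels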
In [18]:
# example values: a negative overlap leaves a gap between successive windows
# (the step is window_length - window_overlap), which reduces the number of feature vectors
params['window_overlap'] = -1024
params['window_length'] = 256
features, labels = make_features(processed, params)
print features.shape
The raw audio samples aren't a great feature for classification. We can apply transformations to the feature vectors to make them more informative, and the Fourier transform is one way of doing that. To avoid spectral artifacts, each window must have a window function applied to taper the signal off at the ends and avoid a large discontinuity.
In [29]:
def transform_features(data):
    # apply a window function to each row, then compute the magnitude spectrum
    # try different window functions (e.g. Hann, Blackman-Harris)
    window = sig.hamming(data.shape[1])
    fft_features = np.abs(np.fft.fft(data * window))
    return fft_features

print transform_features(features).shape
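Since each window is real-valued, the magnitude spectrum is symmetric, so half of these features are redundant. As an optional variation (a sketch, not part of the original pipeline), np.fft.rfft keeps only the non-redundant half:
In [ ]:
# optional variation: keep only the non-redundant half of the spectrum for real input
def transform_features_rfft(data):
    window = sig.hamming(data.shape[1])
    # rfft returns data.shape[1]//2 + 1 frequency bins per window
    return np.abs(np.fft.rfft(data * window))

print transform_features_rfft(features).shape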
Build a classifier using sklearn to classify this data. You should make sure you:
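As a starting point only (a minimal sketch, not a complete answer to the exercise), something like the following would train and evaluate a simple baseline. The choice of a k-nearest-neighbours classifier and a random 70/30 split is illustrative; note that when windows overlap, a purely random split lets very similar windows appear in both training and test sets, so a per-file or per-time-block split gives a more honest estimate.
In [ ]:
# a minimal baseline sketch (illustrative only -- not a full solution to the exercise)
fft_features = transform_features(features)

# hold out a test set (sklearn.cross_validation matches the version imported above)
train_X, test_X, train_y, test_y = sklearn.cross_validation.train_test_split(
    fft_features, labels, test_size=0.3, random_state=0)

# fit a simple k-nearest-neighbours classifier
knn = sklearn.neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(train_X, train_y)

# report per-class precision/recall and the confusion matrix on the held-out data
predicted = knn.predict(test_X)
print sklearn.metrics.classification_report(test_y, predicted)
print sklearn.metrics.confusion_matrix(test_y, predicted)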