In [1]:
import numpy, scipy, matplotlib.pyplot as plt, sklearn, librosa, mir_eval, IPython.display, urllib
plt.rcParams['figure.figsize'] = (14, 4)
Sometimes, an unsupervised learning technique is preferred. Perhaps you do not have access to adequate training data, or perhaps the training data's labels are not completely clear. Maybe you just want to quickly sort real-world, unseen, data into groups based on its feature similarity.
In such cases, clustering is a great option!
Download an audio file:
In [2]:
filename = '125_bounce.wav'
urllib.urlretrieve('http://audio.musicinformationretrieval.com/' + filename,
filename=filename)
Out[2]:
Play the audio file:
In [3]:
IPython.display.Audio(filename)
Out[3]:
Load the audio file into an array.
In [4]:
x, fs = librosa.load(filename)
print fs
Plot audio signal:
In [5]:
librosa.display.waveplot(x, fs)
Out[5]:
Detect onsets:
In [6]:
onset_frames = librosa.onset.onset_detect(x, sr=fs, delta=0.04, wait=4)
onset_times = librosa.frames_to_time(onset_frames, sr=fs)
onset_samples = librosa.frames_to_samples(onset_frames)
Listen to detected onsets:
In [7]:
x_with_beeps = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[7]:
Let's compute the zero crossing rate and energy for each detected onset.
Plot the zero crossing rate:
In [8]:
def extract_features(x, fs):
zcr = librosa.zero_crossings(x).sum()
energy = scipy.linalg.norm(x)
return [zcr, energy]
In [9]:
frame_sz = fs*0.090
features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])
print features.shape
Scale the features (using the scale function) from -1 to 1.
In [10]:
min_max_scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = min_max_scaler.fit_transform(features)
print features_scaled.shape
print features_scaled.min(axis=0)
print features_scaled.max(axis=0)
Plot the features.
In [11]:
plt.scatter(features_scaled[:,0], features_scaled[:,1])
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Spectral Centroid (scaled)')
Out[11]:
Time to cluster! Let's initialize the algorithm to find three clusters.
In [12]:
model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)
print labels
Plot the results.
In [13]:
plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Energy (scaled)')
plt.legend(('Class 0', 'Class 1'))
Out[13]:
Listen to onsets assigned to Class 0:
In [14]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[14]:
Class 1:
In [15]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[15]:
In scikit-learn, other clustering algorithms such as affinity propagation can cluster without defining the number of clusters beforehand.
All we need to do is swap out KMeans
for AffinityPropagation
:
In [16]:
model = sklearn.cluster.AffinityPropagation()
labels = model.fit_predict(features_scaled)
print labels
Plot features:
In [17]:
plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.scatter(features_scaled[labels==2,0], features_scaled[labels==2,1], c='y')
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Energy (scaled)')
plt.legend(('Class 0', 'Class 1', 'Class 2'))
Out[17]:
Play a beep upon each frame in the same cluster:
Class 0:
In [18]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[18]:
Class 1:
In [19]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[19]:
Class 2:
In [20]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==2], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[20]: