In [ ]:
    
import numpy, scipy, matplotlib.pyplot as plt, sklearn.preprocessing, sklearn.cluster, librosa, mir_eval, mir_eval.sonify, IPython.display, urllib.request
    
This lab is loosely based on Lab 3 (2010).
Retrieve an audio file, load it into an array, and listen to it.
In [ ]:
    
urllib.request.urlretrieve?
    
In [ ]:
    
librosa.load?
    
In [ ]:
    
IPython.display.Audio?
    
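For example, the three steps might fit together like this (the URL and filename below are placeholders, not files provided by this lab):
In [ ]:
    
# Hypothetical URL and filename -- substitute your own.
# urllib.request.urlretrieve('http://example.com/simple_loop.wav', 'simple_loop.wav')
x, fs = librosa.load('simple_loop.wav')   # returns the signal and its sampling rate
IPython.display.Audio(x, rate=fs)         # playable audio widget in the notebook
    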
Detect onsets in the audio signal:
In [ ]:
    
librosa.onset.onset_detect?
    
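A minimal sketch, assuming x and fs come from librosa.load above:
In [ ]:
    
onset_frames = librosa.onset.onset_detect(y=x, sr=fs)  # onset locations as frame indices
print(onset_frames)
    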
Convert the onsets from units of frames to seconds (and samples):
In [ ]:
    
librosa.frames_to_time?
    
In [ ]:
    
librosa.frames_to_samples?
    
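For example, assuming onset_frames holds the detected onsets:
In [ ]:
    
onset_times = librosa.frames_to_time(onset_frames, sr=fs)   # onsets in seconds
onset_samples = librosa.frames_to_samples(onset_frames)     # onsets in samples
    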
Listen to detected onsets:
In [ ]:
    
mir_eval.sonify.clicks?
    
In [ ]:
    
IPython.display.Audio?
    
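One way to do this (a sketch assuming x, fs, and onset_times from above) is to sonify the onsets as clicks and mix them with the original signal:
In [ ]:
    
clicks = mir_eval.sonify.clicks(onset_times, fs, length=len(x))  # one click per onset
IPython.display.Audio(x + clicks, rate=fs)
    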
Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the librosa API reference.
First, define which features to extract:
In [ ]:
    
def extract_features(x, fs):
    feature_1 = librosa.zero_crossings(x).sum() # placeholder
    feature_2 = 0 # placeholder
    return [feature_1, feature_2]
    
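For example, one possible pair is the zero crossing count and the mean spectral centroid (just one choice among many; the selection is up to you):
In [ ]:
    
def extract_features(x, fs):
    # Timbral feature 1: number of zero crossings in the frame.
    zcr = librosa.zero_crossings(x).sum()
    # Timbral feature 2: spectral centroid averaged over the frame.
    centroid = librosa.feature.spectral_centroid(y=x, sr=fs).mean()
    return [zcr, centroid]
    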
For each onset, extract a feature vector from the signal:
In [ ]:
    
# Assumptions:
# x: input audio signal
# fs: sampling frequency
# onset_samples: onsets in units of samples
frame_sz = int(0.100*fs)  # 100 ms analysis frame starting at each onset
features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])
    
Use sklearn.preprocessing.MinMaxScaler to scale your features to be within [-1, 1]. (Note that the default feature_range is [0, 1]; pass feature_range=(-1, 1).)
In [ ]:
    
sklearn.preprocessing.MinMaxScaler?
    
In [ ]:
    
sklearn.preprocessing.MinMaxScaler.fit_transform?
    
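A minimal sketch, assuming features is the array computed above (note the explicit feature_range):
In [ ]:
    
scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = scaler.fit_transform(features)   # each column now spans [-1, 1]
    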
Use scatter to plot features on a 2-D plane. (Choose two features at a time.)
In [ ]:
    
plt.scatter?
    
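For example, plotting the first two scaled features against each other:
In [ ]:
    
plt.scatter(features_scaled[:,0], features_scaled[:,1])
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
    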
Use KMeans to cluster your features and compute labels.
In [ ]:
    
sklearn.cluster.KMeans?
    
In [ ]:
    
sklearn.cluster.KMeans.fit_predict?
    
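A sketch assuming two clusters (the value of n_clusters is yours to choose):
In [ ]:
    
model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)   # one cluster label per onset
    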
Use scatter, but this time choose a different marker color (or type) for each class.
In [ ]:
    
plt.scatter?
    
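For example, with two clusters, one scatter call per class:
In [ ]:
    
plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
    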
Create a beep for each onset within a class:
In [ ]:
    
# Assumes onset_times (in seconds) and the KMeans labels computed above.
beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
    
In [ ]:
    
IPython.display.Audio?
    
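For example, mix the beeps with the original signal to hear which onsets fall in the chosen class:
In [ ]:
    
IPython.display.Audio(x + beeps, rate=fs)
    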
Use the concatenate_segments function from the feature sonification exercise to concatenate frames from the same cluster into one signal. Then listen to the signal.
In [ ]:
    
def concatenate_segments(segments, fs=44100, pad_time=0.300):
    # Append pad_time seconds of silence after each segment, then join them end to end.
    padded_segments = [numpy.concatenate([segment, numpy.zeros(int(pad_time*fs))]) for segment in segments]
    return numpy.concatenate(padded_segments)
# `segments` is assumed to be a list of audio frames from a single cluster (see the sketch below).
concatenated_signal = concatenate_segments(segments, fs)
    
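For example (assuming onset_samples, labels, frame_sz, x, and fs from the cells above), the frames belonging to cluster 0 can be gathered and auditioned like this:
In [ ]:
    
# Collect the 100 ms frame following each onset assigned to cluster 0.
segments = [x[i:i+frame_sz] for i in onset_samples[labels==0]]
concatenated_signal = concatenate_segments(segments, fs)
IPython.display.Audio(concatenated_signal, rate=fs)
    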
Compare across separate classes. What do you hear?
Use a different number of clusters in KMeans.
Use a different initialization method in KMeans.
Use different features. Compare tonal features against timbral features.
In [ ]:
    
librosa.feature?
    
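For example, a timbral feature vector might use mean MFCCs, while a tonal one might use mean chroma (a sketch; swap either into extract_features above):
In [ ]:
    
# Possible timbral features: mean MFCCs over the frame.
def extract_features_timbral(x, fs):
    return librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13).mean(axis=1)

# Possible tonal features: mean chroma over the frame.
def extract_features_tonal(x, fs):
    return librosa.feature.chroma_stft(y=x, sr=fs).mean(axis=1)
    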
Use different audio files.
In [ ]:
    
#filename = '1_bar_funk_groove.mp3'
#filename = '58bpm.wav'
#filename = '125_bounce.wav'
#filename = 'prelude_cmaj_10s.wav'