This notebook will show you how to create a t-SNE plot of a group of audio clips. Along the way, we'll cover a few basic audio processing and machine learning tasks.
We will make two separate t-SNE plots. The first clusters a group of many audio files from a single directory. The second takes a single audio track (a song) as its input, segments it into many short audio chunks (saving them to a directory), and clusters the resulting chunks.
This notebook requires numpy, matplotlib, scikit-learn, and librosa to run. To install librosa, run the following command in the terminal:
pip install librosa
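If any of the other dependencies are missing, they can be installed the same way, e.g. all at once (the exact command may differ depending on your Python environment):
pip install numpy matplotlib scikit-learn librosa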
Once the required libraries are installed, verify that the following imports work.
In [1]:
%matplotlib inline
import fnmatch
import os
import json

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import librosa
import librosa.display
from sklearn.manifold import TSNE
First we need to scan a directory of audio files and collect all their paths into a single list. This notebook uses a free sample pack called "Vintage Drum Machines", which can be downloaded, along with all the other data needed for ml4a-guides, by running the script download.sh in the data folder, or downloaded and unzipped manually from http://ivcloud.de/index.php/s/QyDXk1EDYDTVYkF.
Once you've done that, or changed the path variable to point to another directory of audio samples on your computer, you can proceed with the next code block to load all the filepaths into memory.
In [2]:
path = '../data/Vintage Drum Machines'
files = []
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.wav'):
        files.append(os.path.join(root, filename))
print("found %d .wav files in %s" % (len(files), path))
In the next block, we're going to create a function that extracts a feature vector from an audio file. We are using librosa, a Python library for audio analysis and music information retrieval, to handle the feature extraction.
The function we're creating, get_features, will take a waveform y at a sample rate sr, and extract features from only the first second of the audio. It is possible to use longer samples, but because we are interested in clustering according to sonic similarity, we focus on short samples with relatively homogeneous content over their duration. Longer samples have distinct sections and would require somewhat more sophisticated feature extraction.
The feature extraction will calculate the first 13 mel-frequency cepstral coefficients (MFCCs) of the audio, along with their first- and second-order derivatives, and concatenate their means into a single 39-element feature vector. Each feature vector is then standardized to zero mean and unit variance.
In [3]:
def get_features(y, sr):
    y = y[0:sr]  # analyze just the first second
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    log_S = librosa.amplitude_to_db(S, ref=np.max)
    mfcc = librosa.feature.mfcc(S=log_S, n_mfcc=13)
    delta_mfcc = librosa.feature.delta(mfcc, mode='nearest')
    delta2_mfcc = librosa.feature.delta(mfcc, order=2, mode='nearest')
    feature_vector = np.concatenate((np.mean(mfcc, 1), np.mean(delta_mfcc, 1), np.mean(delta2_mfcc, 1)))
    feature_vector = (feature_vector - np.mean(feature_vector)) / np.std(feature_vector)
    return feature_vector
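As a quick, optional sanity check (assuming the files list from the previous cell is non-empty), we can run get_features on a single clip and confirm we get a 39-element standardized vector:
# optional sanity check: extract features for a single file
y_check, sr_check = librosa.load(files[0])
fv = get_features(y_check, sr_check)
print(fv.shape)                  # expect (39,)
print(np.mean(fv), np.std(fv))   # roughly 0 and 1 after standardization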
Now we will iterate through all the files and compute their feature vectors, placing them into a new list feature_vectors. We also keep a parallel list sound_paths to index the feature vectors to the correct paths, in case some of the files are empty or corrupted (as a few are in the Vintage Drum Machines sample pack), or something else goes wrong while analyzing them.
In [4]:
feature_vectors = []
sound_paths = []
for i,f in enumerate(files):
    if i % 100 == 0:
        print("get %d of %d = %s" % (i+1, len(files), f))
    try:
        y, sr = librosa.load(f)
        if len(y) < 2:
            print("error loading %s" % f)
            continue
        feat = get_features(y, sr)
        feature_vectors.append(feat)
        sound_paths.append(f)
    except Exception:
        print("error loading %s" % f)
print("calculated %d feature vectors"%len(feature_vectors))
Now we can run t-SNE over the feature vectors to get a 2-dimensional embedding of our audio files, using scikit-learn's TSNE class. Later, before saving, we will normalize the resulting coordinates so that they lie between 0 and 1.
In [5]:
model = TSNE(n_components=2, learning_rate=150, perplexity=30, verbose=2, angle=0.1).fit_transform(feature_vectors)
Let's plot our t-SNE points. We can use matplotlib to quickly scatter them and see their distribution.
In [6]:
x_axis=model[:,0]
y_axis=model[:,1]
plt.figure(figsize = (10,10))
plt.scatter(x_axis, y_axis)
plt.show()
We see our t-SNE plot of our audio files, but it's not particularly interesting! Since we are dealing with audio files, there's no easy way to compare neighboring audio samples to each other. We can use some other, more interactive environment to view the results of the t-SNE. One way we can do this is by saving the results to a JSON file which stores the filepaths and t-SNE assignments of all the audio files. We can then load this JSON file in another environment.
One example of this is provided in the "AudioTSNEViewer" application in ml4a-ofx. This is an openFrameworks application which loads all the audio clips into an interactive 2d grid (using the t-SNE layout) and lets you play each sample by hovering your mouse over it.
In any case, to save the t-SNE to a JSON file, we first normalize the coordinates to between 0 and 1 and save them, along with the full filepaths.
In [7]:
tsne_path = "../data/example-audio-tSNE.json"
x_norm = (x_axis - np.min(x_axis)) / (np.max(x_axis) - np.min(x_axis))
y_norm = (y_axis - np.min(y_axis)) / (np.max(y_axis) - np.min(y_axis))
data = [{"path":os.path.abspath(f), "point":[x, y]} for f, x, y in zip(sound_paths, x_norm, y_norm)]
with open(tsne_path, 'w') as outfile:
json.dump(data, outfile, cls=MyEncoder)
print("saved %s to disk!" % tsne_path)
What if we want to do the same analysis, but instead of analyzing a directory of individual audio clips, we cut a single audio file into many chunks and cluster those instead?
We can do this by adding a few extra lines of code to what we have above. First let's select a piece of audio: find any song on your computer that you want to run this analysis on and set a path to it.
In [8]:
source_audio = '/Users/gene/Downloads/bohemianrhapsody.mp3'
We will now use librosa to calculate the onsets of our audio file. Onsets are the timestamps of the beginnings of discrete sonic events in the audio. We set hop_length (the number of samples per analysis frame) to 512.
In [9]:
hop_length = 512
y, sr = librosa.load(source_audio)
onsets = librosa.onset.onset_detect(y=y, sr=sr, hop_length=hop_length)
How do we interpret these numbers? The original audio, at sample rate sr, is divided into frames that each contain 512 samples (specified by hop_length in the onset detection function). The onset numbers are indices into these frames. So, for example, if the first onset is 20, it corresponds to sample 20 * 512 = 10,240. Given a sample rate of 22,050, this corresponds to 10240 / 22050 ≈ 0.46 seconds into the track.
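librosa can also do this frames-to-seconds conversion for us; the following (optional) snippet should agree with the manual calculation above:
# convert onset frame indices to times in seconds with librosa's helper
onset_times = librosa.frames_to_time(onsets, sr=sr, hop_length=hop_length)
print(onset_times[:5])  # first few onset times, in seconds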
We can view the onsets as vertical lines on top of the waveform using matplotlib and the following code.
In [10]:
times = [hop_length * onset / sr for onset in onsets]
plt.figure(figsize=(16,4))
plt.subplot(1, 1, 1)
librosa.display.waveplot(y, sr=sr)
plt.vlines(times, -1, 1, color='r', alpha=0.9, label='Onsets')
plt.title('Wavefile with %d onsets plotted' % len(times))
Now we will go through each of the detected onsets and crop the original audio to the interval from that onset until the next one. We will create a new folder (set by the path_save_intervals variable in the next cell), save each of the individual audio clips into it, and extract the same feature vector described above for each new clip.
In [11]:
# where to save our new clips to
path_save_intervals = "/Users/gene/Downloads/bohemian_segments/"
# make new directory to save them
if not os.path.isdir(path_save_intervals):
    os.mkdir(path_save_intervals)

# grab each interval, extract a feature vector, and save the new clip to our above path
feature_vectors = []
for i in range(len(onsets)-1):
    idx_y1 = onsets[i  ] * hop_length  # first sample of the interval
    idx_y2 = onsets[i+1] * hop_length  # last sample of the interval
    y_interval = y[idx_y1:idx_y2]
    features = get_features(y_interval, sr)  # get feature vector for the audio clip between y1 and y2
    file_path = '%s/onset_%d.wav' % (path_save_intervals, i)  # where to save our new audio clip
    feature_vectors.append({"file":file_path, "features":features})  # append to our list of feature vectors
    librosa.output.write_wav(file_path, y_interval, sr)  # save to disk (removed in librosa >= 0.8; soundfile.write can be used instead)
    if i % 50 == 0:
        print("analyzed %d/%d = %s" % (i+1, len(onsets)-1, file_path))
In [12]:
# save results to this json file
tsne_path = "../data/example-audio-tSNE-onsets.json"
# feature_vectors has both the features and file paths in it. let's pull out just the feature vectors
features_matrix = [f["features"] for f in feature_vectors]
# calculate a t-SNE and normalize it
model = TSNE(n_components=2, learning_rate=150, perplexity=30, verbose=2, angle=0.1).fit_transform(features_matrix)
x_axis, y_axis = model[:,0], model[:,1] # normalize t-SNE
x_norm = (x_axis - np.min(x_axis)) / (np.max(x_axis) - np.min(x_axis))
y_norm = (y_axis - np.min(y_axis)) / (np.max(y_axis) - np.min(y_axis))
data = [{"path":os.path.abspath(f['file']), "point":[float(x), float(y)]} for f, x, y in zip(feature_vectors, x_norm, y_norm)]
with open(tsne_path, 'w') as outfile:
    json.dump(data, outfile)
print("saved %s to disk!" % tsne_path)
Let's plot the results on another scatter plot. One nice thing we can do is color the points according to their order in the original track.
In [13]:
colors = cm.rainbow(np.linspace(0, 1, len(x_axis)))
plt.figure(figsize = (8,6))
plt.scatter(x_axis, y_axis, color=colors)
plt.show()
Not surprisingly, audio clips that appear close to each other in the original track (and therefore have a similar color) also tend to cluster together in the t-SNE. Two clips that are right next to each other in time probably have similar sound content and therefore similar feature vectors. But we also see different sections (e.g. teal and orange) clustered together, which suggests those sections may have similar audio content as well.