The filterbanks that we use are similar to the ones we use in speech: 25ms sliding windows, 10ms timesteps, 40 Mel-based log filterbanks. Look at the "MFCC_pipeline.png" in this Monkey Sounds folder to get an idea: we stop before the last cepstral transformation (just before the MFCC step) and use N=40 filterbanks.
In [5]:
from pylab import imshow, show
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = 15, 10
import numpy as np
from scipy.io import wavfile
wavfname = "Blue monkey/FA.wav" # I used "sox FA.WAV FA_soxed.wav" because there was a non-data warning by scipy.io.wavfile
srate, sound = wavfile.read(wavfname)
print sound.shape
print srate
if len(sound.shape) == 2: # in mono sound is a list
sound = sound[:, 0] + sound[:, 1]
print sound
from spectral import Mel # you need https://github.com/mwv/spectral
nfbanks = 40
wfbanks = 0.025 # 25ms
rfbanks = 100 # 10ms
fbanks = Mel(nfilt=nfbanks, # nb of filters in mel bank
alpha=0.97, # pre-emphasis
fs=srate, # sampling rate
frate=rfbanks, # frame rate
wlen=wfbanks, # window length
nfft=1024, # length of dft => 512 is not enough and produces a glitch
mel_deltas=False, # speed
mel_deltasdeltas=False # acceleration
)
fbank = fbanks.transform(sound)[0] # first dimension is for
# deltas & deltasdeltas
print fbank.shape
imshow(fbank[:100].T, interpolation='nearest')
Out[5]:
In [7]:
import csv
fname = "FA"
all_monkeys = []
with open('Monkey Sounds.csv', 'rb') as csvfile:
reader = csv.reader(csvfile, delimiter=';', quotechar='|')
keys = reader.next()
print keys
all_monkeys = [dict(zip(keys, row)) for row in reader]
#print sounds
# Selecting blue Monkeys for the rest from now on:
blue = filter(lambda l: l['Species'] == 'Blue monkey', all_monkeys)
t = [filter(lambda (k, v): k in ['Word', 'Filename', 'StartTime (s)', 'EndTime (s)'], d.iteritems()) for d in blue]
#t = [zip(*l)[1] for l in t]
t = [map(lambda (k, v): float(v.replace(',', '.')) if '(s)' in k else v, l) for l in t]
print t
In [75]:
fbanks_arrays = {}
durations = [] # We will also look at the durations of the calls in the same loop pass
for s in t:
durations.append(s[2] - s[1])
if not s[-1] in fbanks_arrays:
fname = s[-1] + ".wav"
srate, wav = wavfile.read(fname)
# rebuilding the Mel object in case the sampling rate "srate" changes accross files
fbanks = Mel(nfilt=nfbanks, alpha=0.97, fs=srate, frate=rfbanks, wlen=wfbanks,
nfft=1024, mel_deltas=False, mel_deltasdeltas=False)
fbanks_arrays[s[-1]] = fbanks.transform(wav)[0]
dur_mean = np.mean(durations)
dur_std = np.std(durations)
print dur_mean
print dur_std
print dur_mean + 3*dur_std
In [82]:
y = []
x = []
for s in t:
y.append(s[0])
# let's take 420ms of sound for the call to cover most cases, this has to be changed / revaluated
middle = s[1] + (s[2]-s[1])/2
start = int((middle - 0.21) * rfbanks)
end = int((middle + 0.21) * rfbanks)
print start, end
assert fbanks_arrays[s[-1]][start:end].shape[0] == 42
x.append(fbanks_arrays[s[-1]][start:end])
y = np.asarray(y)
x = np.asarray(x)
#print y
#print x
print y.shape
print x.shape
In [100]:
# Mean filterbank "template" over all 'Hack' calls (frames on x, Mel bands on y)
imshow(np.mean(x[y=='Hack'], axis=0).T, interpolation='nearest')
Out[100]:
In [101]:
# Mean filterbank "template" over all 'Pyow' calls, for visual comparison
imshow(np.mean(x[y=='Pyow'], axis=0).T, interpolation='nearest')
Out[101]:
In [102]:
from sklearn import svm
clf = svm.SVC()
X = x.reshape(x.shape[0], x.shape[1]*x.shape[2]) # reshaping to (n_samples, n_features)
clf.fit(X, y) # THIS IS WHERE THE MAGIC HAPPENS!
# If the magic is not strong enough, get bigger guns (features engineering and selection,
# less eyeballing/more data-driven duration selection, HMMs, deep neural networks...)
print clf
print y, clf.predict(X) # too few examples (by far!), but seems linearly separable
In [103]:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
xx = pca.fit_transform(X)
print pca
print pca.components_
print pca.explained_variance_ratio_
print xx
xplt = zip(*xx)
colors = {'Pyow': 'r', 'Hack': 'b'}
plt.scatter(xplt[0], xplt[1], color=map(lambda lab: colors[lab], y))
DO_ICA = False
if DO_ICA:
from sklearn.decomposition import FastICA
ica = FastICA(n_components=2)
xx = ica.fit_transform(X)
print ica
print ica.components_
print xx
xplt = zip(*xx)
plt.scatter(xplt[0], xplt[1])
In [16]:
t = [filter(lambda (k, v): k in ['Species', 'Word', 'Filename', 'StartTime (s)', 'EndTime (s)'], d.iteritems()) for d in all_monkeys]
t = [map(lambda (k, v): float(v.replace(',', '.')) if '(s)' in k else v, l) for l in t]
print t
fbanks_arrays = {}
durations = [] # We will also look at the durations of the calls in the same loop pass
for s in t:
durations.append(s[2] - s[1])
if not s[-2] in fbanks_arrays:
fname = s[-1] + '/' + s[-2] + ".wav"
print fname
srate, wav = wavfile.read(fname)
if len(wav.shape) > 1 and wav.shape[1] == 2:
wav = (wav[:, 0] + wav[:, 1]) / 2.
# rebuilding the Mel object in case the sampling rate "srate" changes accross files
fbanks = Mel(nfilt=nfbanks, alpha=0.97, fs=srate, frate=rfbanks, wlen=wfbanks,
nfft=1024, mel_deltas=False, mel_deltasdeltas=False)
fbanks_arrays[s[-2]] = fbanks.transform(wav)[0]
dur_mean = np.mean(durations)
dur_std = np.std(durations)
print dur_mean
print dur_std
print dur_mean + 3*dur_std
In [19]:
y = []
x = []
for s in t:
y.append(s[0])
# let's take 420ms of sound for the call to cover most cases, this has to be changed / revaluated
middle = s[1] + (s[2]-s[1])/2
start = int((middle - 0.21) * rfbanks)
end = int((middle + 0.21) * rfbanks)
print start, end
assert fbanks_arrays[s[-2]][start:end].shape[0] == 42
x.append(fbanks_arrays[s[-2]][start:end])
y = np.asarray(y)
x = np.asarray(x)
#print y
#print x
print y.shape
print x.shape
In [21]:
# Mean filterbank "template" over all 'Snort' calls across species
imshow(np.mean(x[y=='Snort'], axis=0).T, interpolation='nearest')
Out[21]:
In [24]:
from sklearn import svm
clf = svm.SVC()
X = x.reshape(x.shape[0], x.shape[1]*x.shape[2]) # reshaping to (n_samples, n_features)
clf.fit(X, y) # THIS IS WHERE THE MAGIC HAPPENS!
# If the magic is not strong enough, get bigger guns (features engineering and selection,
# less eyeballing/more data-driven duration selection, HMMs, deep neural networks...)
print clf
print y
print clf.predict(X) # too few examples (by far!), but seems linearly separable
In [30]:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
xx = pca.fit_transform(X)
print pca
print pca.components_
print pca.explained_variance_ratio_
print xx
xplt = zip(*xx)
colors = {'Snort': 'g', 'Roar': 'c', 'Pyow': 'r', 'Hack': 'b', 'Chirp': 'm', 'Cheep': 'k'}
plt.scatter(xplt[0], xplt[1], color=map(lambda lab: colors[lab], y))
Out[30]:
In [29]:
from sklearn.decomposition import FastICA
ica = FastICA(n_components=2)
xx = ica.fit_transform(X)
print ica
print ica.components_
print xx
xplt = zip(*xx)
plt.scatter(xplt[0], xplt[1], color=map(lambda lab: colors[lab], y))
Out[29]:
In [42]:
# Decision regions of a linear SVM in the current 2-D projection.
# Sort the call names so the word -> integer mapping is deterministic:
# colors.keys() order depends on dict hashing.
call_names = sorted(colors.keys())
yy = map(lambda call_name: call_names.index(call_name), y)
svc = svm.SVC(kernel='linear').fit(xx, yy)
x_min, x_max = xx[:, 0].min() - 1, xx[:, 0].max() + 1
y_min, y_max = xx[:, 1].min() - 1, xx[:, 1].max() + 1
h = 0.1  # grid step for the decision-surface raster
x_grid, y_grid = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
Z = svc.predict(np.c_[x_grid.ravel(), y_grid.ravel()])
Z = Z.reshape(x_grid.shape)
# "pl" was never imported anywhere in this notebook; pyplot is bound to plt
plt.contourf(x_grid, y_grid, Z, cmap=plt.cm.Paired)
plt.axis('off')
plt.scatter(xx[:, 0], xx[:, 1], c=yy, cmap=plt.cm.Paired)
Out[42]:
In [43]:
# Same decision-region plot with a polynomial kernel.
# (Renamed the local: it was called svm_rbf despite kernel='poly'.)
svm_poly = svm.SVC(kernel='poly').fit(xx, yy)
Z = svm_poly.predict(np.c_[x_grid.ravel(), y_grid.ravel()])
Z = Z.reshape(x_grid.shape)
# fix the undefined "pl" alias: pyplot is imported as plt in this notebook
plt.contourf(x_grid, y_grid, Z, cmap=plt.cm.Paired)
plt.axis('off')
plt.scatter(xx[:, 0], xx[:, 1], c=yy, cmap=plt.cm.Paired)
Out[43]:
In [ ]: