MUSIC014/102 - Music, Information, Neuroscience - Professor Michael Casey, 1/7/2015

Week 1 Lab

Using the Plompt and Levelt dissonance function

We first include some libraries that will allow us to load sound files, plot figures, and playback audio


In [1]:
from pylab import *
from bregman.suite import *
import scipy.signal as signal
import pdb

The next line tells ipython that we'd like to plot all figures directly in the browser


In [2]:
%matplotlib inline

This is our first function definition. The function is called ideal_chromatic_dissonance and has 2 possible parameters: num_harmonics and f0. We set these parameters to 7 and 440 as default values. Whenever we "call" this function, it returns an array of chromatic dissonance values based on the parameters we have sent it for the number of harmonics and the fundamental frequency.


In [3]:
def ideal_chromatic_dissonance(num_harmonics=7, f0=440):
    """
    One octave of chromatic dissonance values
    """
    harms = arange(num_harmonics)+1
    freqs = [f0*i for i in harms]
    amps = [exp(-.5*i) for i in harms]
    freqs2 = array([[f0*2**(k/12.)*i for i in harms] for k in range(0,13)])

    all_amps = r_[amps,amps]
    diss = []
    for f in freqs2:
        all_freqs = r_[freqs,f]
        idx = all_freqs.argsort()
        diss.append(dissonance_fun(all_freqs[idx], all_amps[idx]))
    return array(diss)

We can use the following function to find possible peaks in F, representing the features from a signal. Stored in F.X is the Short-Time-Fourier-Transform. By finding peaks in our frequency transform, we can better understand where most of the energy of a signal is located in terms of its frequencies, rather than working with all frequency values. We'll need this function when calculating the dissonance.


In [4]:
def get_peaks(F):
    """
    Extract peaks from linear spectrum in F
    Algorithm 1: zero-crossings of derivative of smoothed spectrum
    """
    X = F.X.copy()
    b,a = signal.butter(10, .25) # lp filter coefficients
    # Smoothing
    signal.filtfilt(b,a,X,axis=0)
    # Derivative
    Xd = diff(X,axis=0)
    # Zero crossing
    thresh=1e-9
    peak_idx = []
    for i,x in enumerate(Xd.T):
        idx = where((x[:-1]>thresh)&(x[1:]<-thresh))[0] + 1
        if len(idx):
            idx = idx[X[idx,i].argsort()][::-1]
        peak_idx.append(idx)
    return peak_idx

This function produces a signal which we could directly send to the speaker, or pass to a Fourier transform. We'll plot the output of this function later.


In [5]:
def audio_chromatic_scale(f0=440, num_harmonics=7):
    N = 11025
    nH = num_harmonics
    H = vstack([harmonics(f0=f0*2**(k/12.),num_harmonics=nH, num_points=N) for k in arange(13)])
    return H

This function will take a fundamental, number of harmonics, and number of peaks to select from a Fourier transform, and return the dissonance values from each note of a chromatic scale.


In [6]:
def audio_chromatic_dissonance(f0=440, num_harmonics=7, num_peaks=10):
    sr = 44100
    nfft = 8192
    afreq = sr/nfft
    H = audio_chromatic_scale(f0=f0, num_harmonics=num_harmonics)
    h0 = H[0]
    diss = []
    for i,h in enumerate(H):
        F = LinearFrequencySpectrum((h0+h)/2.,nfft=nfft,wfft=nfft/2,nhop=nfft/4)
        P = get_peaks(F)
        frame = []
        for j,p in enumerate(P):
            freqs = afreq*p[:num_peaks] # take middle frame as reference
            mags = F.X[p[:num_peaks],j]
            idx = freqs.argsort()
            frame.append(dissonance_fun(freqs[idx],mags[idx]))        
        diss.append(array(frame).mean())
    return array(diss)

Finally, this function will help us plot/visualize dissonance values for an entire scale, again using the parameters of a fundamental frequency, number of harmonics, and number of peaks.


In [7]:
def dissonance_plot(f0=440, num_harmonics=7, num_peaks=10):
    figure()
    diss_i = ideal_chromatic_dissonance(f0=f0, num_harmonics=num_harmonics)
    diss = audio_chromatic_dissonance(f0=f0, num_harmonics=num_harmonics, num_peaks=num_peaks)
    plot(diss_i / diss_i.max(), linestyle='--', linewidth=2)
    plot(diss / diss.max())
    t_str = 'f0=%d, partials=%d, peaks=%d'%(f0,num_harmonics,num_peaks)
    title('Dissonance (chromatic): '+t_str,fontsize=16)
    legend(['ideal','estimated'])
    xlabel('Pitch class (chroma)',fontsize=14)
    ylabel('Dissonance',fontsize=14)
    grid()

Let's first try the last function and see what dissonance values look like for a chromatic scale. We "call" the function by writing it's name, and setting any possible parameters that we'd like to change from the default values. Here we simply replicate the default behavior by passing the same default values to each parameter.


In [8]:
dissonance_plot(f0=440, num_harmonics=7, num_peaks=10)


Looking at the above plot, on the x-axis, we see the different possible chroma values from the entire scale sequence. On the y-axis, we have the dissonance values. Try changing the values of f0, num_harmonics, and num_peaks to see how they effect the resulting plot. What are each of these values doing?

Let's break down the dissonance computation a bit. We can first use the function audio_chromatic_scale to get a 13 x 11025 samples array of notes.


In [9]:
notes = audio_chromatic_scale(f0 = 440, num_harmonics = 7)
print notes.shape


(13, 11025)

Because there are 13 notes, the first dimension is 13. The second dimension is the number of samples of our waveform, or 11025. We can play each of these samples as an audio signal through the speaker. We can use the hstack function to combine all notes into one long sequence, and the play function to play the entire sequence:


In [10]:
play(hstack(notes))

Or we can visualize it as a signal


In [11]:
plt.plot(hstack(notes))


Out[11]:
[<matplotlib.lines.Line2D at 0x10f37bcd0>]

Because there are so many "samples" or discretely sampled points of the signal/waveform, we cannot really tell from the plot what our signal looks like. Let's just take a short chunk of it from the first note. Remember that our notes array is 13 x 11025. To access the first note, we can say: notes[0]. We index the first note with 0 and the last one with 12. The dimension of each note is 11025 samples. Let's try visualizing just the first note and the first 400 samples of that note:


In [12]:
plt.plot(notes[0][:400])


Out[12]:
[<matplotlib.lines.Line2D at 0x10f31f710>]

That's a pretty weird signal... If we try looking at a simpler signal without so many harmonics, maybe it will look more like a sine wave. Let's try:


In [13]:
F = 440
N = 1
notes = audio_chromatic_scale(f0 = F, num_harmonics = N)
plt.plot(notes[0][:400])
play(hstack(notes))
dissonance_plot(f0 = F, num_harmonics = N, num_peaks = 2)


That looks more familiar. Try changing the parameter of f0 and num_harmonics and both visualize and play the sounds to see how it changes the signal and dissonance values.

Let's now delve a bit into the Fourier Transform itself and see how we can compute the "spectrum". I will copy and paste some of the code from the function audio_chromatic_dissonance which neatly describes the process already


In [14]:
# Let's try a fundamental frequency of 440
F = 440

# With 7 harmonics
H = 7

# sr, or sample rate, describes how many samples we'd like in a single second of audio data
sr = 44100 

# this is how many samples we'll use when calculating the FFT
nfft = 8192

# Let's get the signal 
H = audio_chromatic_scale(f0 = F, num_harmonics = H)

# The first note is stored in H[0]
h0 = H[0]

Now we'll use the linear spectrum function to get a bunch of possible features


In [15]:
F = LinearFrequencySpectrum(h0,nfft=nfft,wfft=nfft/2,nhop=nfft/4)

Inside F is a variable X which stores the Short Time Fourier Transform


In [16]:
print F.X.shape


(4097, 6)

Let's just look at a small portion of this spectrum:


In [17]:
stfts = F.X[:100,:]

And plot it using the x-axis in terms of Frequency Bins.


In [18]:
plt.plot(np.arange(stfts.shape[0]) * sr / nfft, stfts)


Out[18]:
[<matplotlib.lines.Line2D at 0x10f5e8e10>,
 <matplotlib.lines.Line2D at 0x10f413cd0>,
 <matplotlib.lines.Line2D at 0x10f3afb90>,
 <matplotlib.lines.Line2D at 0x10f3aff10>,
 <matplotlib.lines.Line2D at 0x10f3af9d0>,
 <matplotlib.lines.Line2D at 0x116099850>]

The get_peaks function will take as input a STFT and return the peaks:


In [19]:
P = get_peaks(F)

We can see where these peaks are located for the first note, e.g.:


In [20]:
P[0] * sr / nfft


Out[20]:
array([  441,   877,  1318, ..., 22017, 22028, 22039])

Extra-credit: Try writing your own dissonance function which takes any signal and computes the dissonance plot. Try loading a sound file or combining different sine waves to see how the dissonance function is changed.


In [20]: