Zero Crossings

References

  • Sethares, William A. Tuning, timbre, spectrum, scale. Springer Science & Business Media, 2005.
  • Collins, Nick. "Influence in Early Electronic Dance Music: An Audio Content Analysis Investigation." ISMIR. 2012.
  • Stowell, Dan, and Mark D. Plumbley. "Timbre remapping through a regression-tree technique." Sound and Music Computing (SMC) (2010).
  • Peeters, Geoffroy. "A large set of audio features for sound description (similarity and classification) in the CUIDADO project." (2004).
  • CMU slides on Cepstrum and MFCC by Kishore Prahallad
  • Muda, Lindasalwa, Mumtaj Begam, and I. Elamvazuthi. "Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques." arXiv preprint arXiv:1003.4083 (2010).
  • Logan, Beth. "Mel Frequency Cepstral Coefficients for Music Modeling." ISMIR. 2000.

RMS

Spectral Power

Spectral Power Ratio

Can choose 5 log-spaced sub-bands (50–400, 400–800, 800–1600, 1600–3200, and 3200–6400 Hz).
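
A minimal sketch of how the sub-band ratios might be computed for one frame. It assumes a mono frame `x` sampled at `fs` (above 12.8 kHz so the top band fits), a Hann window, and normalization by the summed power of the five bands; the note above does not fix any of these choices, and the name `band_power_ratios` is hypothetical.

```python
import numpy as np

# Band edges in Hz, as listed above.
BAND_EDGES = [(50, 400), (400, 800), (800, 1600), (1600, 3200), (3200, 6400)]

def band_power_ratios(x, fs, bands=BAND_EDGES):
    """Return the fraction of in-band power that falls in each sub-band."""
    windowed = x * np.hanning(len(x))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Sum the power of all FFT bins whose frequency lies in [lo, hi).
    band_power = np.array(
        [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]
    )
    total = band_power.sum()
    return band_power / total if total > 0 else np.zeros(len(bands))

fs = 16000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 1000 * t)   # a 1 kHz tone lies in the 800-1600 Hz band
ratios = band_power_ratios(x, fs)
```

For the 1 kHz test tone, nearly all of the power should land in the third band.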

Spectral Centroid

Spectral 95- and 25-percentiles

Seen in the literature: the 0.25 and 0.95 points, or the 0.8 and 0.95 points, of cumulative spectral energy.
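
One way to read these percentiles is as the frequency below which a given fraction of the total spectral energy lies. The sketch below follows that reading; the function name `spectral_percentile` and the use of an unwindowed power spectrum are assumptions, not something the sources above specify.

```python
import numpy as np

def spectral_percentile(x, fs, fraction):
    """Lowest bin frequency (Hz) at which cumulative energy reaches `fraction` of the total."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cumulative = np.cumsum(power)
    if cumulative[-1] == 0:
        return 0.0
    # First bin whose cumulative energy is >= the requested share of the total.
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(freqs[min(idx, len(freqs) - 1)])

fs = 8000
t = np.arange(4096) / fs
# Two equal-amplitude tones: half the energy at 500 Hz, half at 2000 Hz.
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 2000 * t)
p25 = spectral_percentile(x, fs, 0.25)
p95 = spectral_percentile(x, fs, 0.95)
```

With this test signal the 0.25 point falls on the lower tone and the 0.95 point on the upper one.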

Spectral Rolloff

Spectral Crest Measure

Spectral Slope

Harmonicity

MFCC

MFCCs combine aspects of human hearing (logarithmic frequency perception, the mel scale) with the physics of musical instruments (these systems often have well-defined harmonic overtones). They are commonly used for speech detection and concatenative synthesis of speech.

Check pitch detection methods Python notebook for cepstrum analysis. Review: cepstrum analysis captures spectral envelope. How can we link this to the way the ear perceives sound? (Prahallad)

Mel-Frequency Analysis

  • "Human ear acts as a filter: only concentrates on certain frequency components
  • Filters are non-uniformly spaced on the frequency axis: more filters in the low frequency regions" than in the high frequency regions
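
To make the non-uniform spacing concrete, the sketch below places filter centres at equal steps on the mel scale and converts them back to Hz. It assumes the common mapping mel = 2595 log10(1 + f/700) and a 0 to 4000 Hz range; the helper names `hz_to_mel` and `mel_to_hz` and the choice of ten filters are illustrative only.

```python
import numpy as np

def hz_to_mel(f):
    """Map frequency in Hz to mels (assumed mapping: 2595 * log10(1 + f / 700))."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Ten filter centre frequencies, equally spaced in mel between 0 and 4000 Hz.
centres_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), 12)[1:-1]
centres_hz = mel_to_hz(centres_mel)
spacing_hz = np.diff(centres_hz)   # gap between neighbouring centres, in Hz
```

The gaps in `spacing_hz` grow with frequency, so the same number of mel steps packs more filters into the low end of the spectrum.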

Muda et al.: 7 steps to computing MFCCs

  1. Pre-Emphasis: pass the signal through a filter which emphasizes higher frequencies. $Y(n) = X(n) - 0.95\,X(n-1)$
  2. Framing: split the signal into frames of $N = 2^n$ samples, typically $N = 256$
  3. Hamming windowing
  4. FFT
  5. Mel Filter Bank Processing: "a set of triangular filters are used to compute a weighted sum of filter spectral components so that the output of the process approximates to a Mel scale" (linear below 1000 Hz and logarithmic above). "Each filter's magnitude frequency response is triangular in shape and equal to unity at the centre frequency and decreases linearly to zero at the centre frequency of two adjacent filters. Then, each filter output is the sum of its filtered spectral components." The Mel value for a given frequency $f$ in Hz is computed as $F(\text{Mel}) = 2595 \log_{10}(1 + f/700)$
  6. Discrete Cosine Transform: convert the log Mel spectrum into the time domain using the DCT. The result is the set of Mel Frequency Cepstral Coefficients. This is a type of compression
  7. Delta Energy and Delta Spectrum: analyze how MFCC frames change over time
  • Can calculate the difference of beatwise chroma vectors to get "a one-dimensional measure of harmonic change, where the summation process avoided issues with different absolute pitch centers" (Collins).
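
The seven steps listed above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the procedure from the paper: the hop size of 128 samples, 20 filters, 13 coefficients, taking the natural log of the filter-bank power, the unnormalized DCT-II, and the plain frame-to-frame difference used for the deltas are all choices made here, and every function name is hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters: unity at each centre, zero at the neighbouring centres."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(x, fs, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    # 1. Pre-emphasis: y[n] = x[n] - 0.95 * x[n-1]
    y = np.append(x[0], x[1:] - 0.95 * x[:-1])
    # 2. Framing: overlapping frames of frame_len samples (hop is an assumption)
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx]
    # 3. Hamming window
    frames = frames * np.hamming(frame_len)
    # 4. FFT, kept as a power spectrum
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 5. Mel filter bank, followed by a log (small offset avoids log(0))
    log_mel = np.log(power @ mel_filterbank(n_filters, frame_len, fs).T + 1e-10)
    # 6. DCT-II along the filter axis, keeping the first n_coeffs coefficients
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    coeffs = log_mel @ basis.T
    # 7. Delta: simple difference between consecutive frames
    delta = np.diff(coeffs, axis=0)
    return coeffs, delta

fs = 8000
x = np.random.default_rng(0).standard_normal(fs)   # one second of noise as a stand-in signal
coeffs, delta = mfcc(x, fs)
```

Each row of `coeffs` holds the coefficients for one frame, and `delta` has one row fewer because it compares neighbouring frames.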

Perceptual Loudness

Sensory Dissonance (Sethares Model)

When two sine waves are very close in frequency, their combination sounds pleasant to the ear: a pure tone with some beating. As the frequency difference between the two waves grows, the sense of roughness or dissonance grows, until the two are far enough apart that the ear perceives them as two separate pitches.

Idea: measure dissonance by counting the beats. Experimental results show that beating at 20 to 30 Hz is perceived as roughest.
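
The beat rate follows from a sum-to-product identity. For two sines at frequencies $f_1$ and $f_2$ (equal amplitudes are assumed here to keep the algebra short):

$$
\sin(2\pi f_1 t) + \sin(2\pi f_2 t) = 2\cos\big(\pi (f_1 - f_2)\, t\big)\,\sin\big(\pi (f_1 + f_2)\, t\big)
$$

The slow cosine factor acts as the envelope. Its magnitude repeats $|f_1 - f_2|$ times per second, so tones at 440 Hz and 465 Hz beat at 25 Hz, inside the range reported as roughest.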

  1. make all values of the signal nonnegative
  2. "combined with a low-pass filter, this creates an envelope detector with an output that rides along the outer edge of the signal. The bandpass filter is tuned to have maximum response in frequencies where the beating is most critical." (Sethares, 48)

"LPF is a Remez filter with cutoff at 100 Hz and BPF (which influences the detailed shape of the output signal) was a second-order Butterworth filter with passband between 15 and 35 Hz." (Sethares, 49)

More information is in the appendices, which are not available online.
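
The rectify, low-pass, band-pass chain quoted above can be sketched with SciPy as follows. This is a rough illustration, not the book's implementation: the 201 taps and the 100 to 200 Hz transition band of the Remez filter, the reading of "second-order" as `butter(1, ..., btype="bandpass")`, the RMS summary, and the name `beat_roughness` are all assumptions.

```python
import numpy as np
from scipy import signal

def beat_roughness(x, fs):
    """RMS of the band-passed envelope: larger when beating falls near 15-35 Hz."""
    # Step 1: make all values nonnegative (full-wave rectification).
    rectified = np.abs(x)
    # Step 2: low-pass FIR designed with the Remez exchange algorithm, cutoff 100 Hz.
    # The tap count and the 100-200 Hz transition band are assumptions.
    lpf = signal.remez(201, [0, 100, 200, fs / 2], [1, 0], fs=fs)
    envelope = signal.lfilter(lpf, 1.0, rectified)
    # Butterworth band-pass with a 15-35 Hz passband. butter(1, ...) with
    # btype="bandpass" yields a second-order filter overall (assumed reading).
    sos = signal.butter(1, [15, 35], btype="bandpass", fs=fs, output="sos")
    beats = signal.sosfilt(sos, envelope)
    # Skip the first quarter of the output to avoid the filter start-up transient.
    settled = beats[len(beats) // 4:]
    return float(np.sqrt(np.mean(settled ** 2)))

fs = 8000
t = np.arange(2 * fs) / fs
# 440 Hz + 465 Hz beats at 25 Hz; 440 Hz + 640 Hz differ by 200 Hz.
rough = beat_roughness(np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 465 * t), fs)
smooth = beat_roughness(np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 640 * t), fs)
```

Under these assumptions the pair that beats at 25 Hz should give a clearly larger value than the pair 200 Hz apart, whose envelope fluctuation lies outside the band-pass range.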