In [1]:

%matplotlib inline
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (14, 5)




In [2]:

plt.style.use('seaborn-muted')
plt.rcParams['figure.figsize'] = (14, 5)
plt.rcParams['axes.grid'] = True
plt.rcParams['axes.spines.left'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['axes.spines.bottom'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.xmargin'] = 0
plt.rcParams['axes.ymargin'] = 0
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['image.interpolation'] = None



# Energy and RMSE

The energy (Wikipedia; FMP, p. 66) of a signal corresponds to the total magnitude of the signal. For audio signals, that roughly corresponds to how loud the signal is. The energy in a signal is defined as

$$\sum_n \left| x(n) \right|^2$$

The root-mean-square energy (RMSE) in a signal of $N$ samples is defined as

$$\sqrt{ \frac{1}{N} \sum_n \left| x(n) \right|^2 }$$
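As a quick check of the two definitions, here is a minimal NumPy sketch on a toy two-sample signal. The helper names `signal_energy` and `signal_rms` and the array `x_toy` are made up for illustration and are not part of librosa.

```python
import numpy

def signal_energy(x):
    # Sum of squared magnitudes over all samples.
    x = numpy.asarray(x)
    return numpy.sum(numpy.abs(x) ** 2)

def signal_rms(x):
    # Square root of the mean squared magnitude.
    x = numpy.asarray(x)
    return numpy.sqrt(numpy.mean(numpy.abs(x) ** 2))

x_toy = numpy.array([3.0, -4.0])
print(signal_energy(x_toy))  # 9 + 16 = 25.0
print(signal_rms(x_toy))     # sqrt(25 / 2)
```

Note that energy grows with the length of the signal, while RMSE is normalized by the number of samples, so RMSE values can be compared across signals of different durations.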



In [3]:




In [4]:

sr




Out[4]:

22050




In [5]:

x.shape




Out[5]:

(49613,)




In [6]:

librosa.get_duration(y=x, sr=sr)




Out[6]:

2.2500226757369615



Listen to the signal:



In [7]:

ipd.Audio(x, rate=sr)







Plot the signal:



In [8]:

librosa.display.waveplot(x, sr=sr)




Out[8]:

<matplotlib.collections.PolyCollection at 0x10cd21cc0>



Compute the short-time energy using a list comprehension:



In [9]:

hop_length = 256
frame_length = 512




In [10]:

energy = numpy.array([
    numpy.sum(numpy.abs(x[i:i+frame_length])**2)  # sum of squared magnitudes in this frame
    for i in range(0, len(x), hop_length)
])




In [11]:

energy.shape




Out[11]:

(194,)
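For long signals, the same short-time energy can be computed without a Python-level loop. The following is a sketch only: `short_time_energy` is a hypothetical helper, and it assumes NumPy 1.20 or later for `sliding_window_view`. It zero-pads the end of the signal so that the trailing partial frames sum to the same values as in the list comprehension.

```python
import numpy

def short_time_energy(x, frame_length, hop_length):
    x = numpy.asarray(x, dtype=float)
    # One frame per hop, matching range(0, len(x), hop_length).
    n_frames = int(numpy.ceil(len(x) / hop_length))
    # Zero-pad so the last frame is full length; zeros add nothing to the sum.
    needed = (n_frames - 1) * hop_length + frame_length
    padded = numpy.pad(x, (0, max(0, needed - len(x))), mode='constant')
    # View of every length-frame_length window, then keep one per hop.
    windows = numpy.lib.stride_tricks.sliding_window_view(padded, frame_length)
    windows = windows[::hop_length][:n_frames]
    return numpy.sum(numpy.abs(windows) ** 2, axis=1)
```

Called as `short_time_energy(x, frame_length, hop_length)`, this should return one value per hop, the same number of frames as the list comprehension above.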



Compute the RMSE using librosa.feature.rmse:



In [12]:

rmse = librosa.feature.rmse(x, frame_length=frame_length, hop_length=hop_length, center=True)




In [13]:

rmse.shape




Out[13]:

(1, 194)




In [14]:

rmse = rmse[0]
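Energy and RMSE are closely related: for a full frame of `frame_length` samples, the RMSE is the square root of the frame's energy divided by `frame_length`. The sketch below shows that relationship in plain NumPy. `rms_from_energy` is a hypothetical helper, not a librosa function, and it assumes full-length, uncentered frames. Its values will therefore not line up exactly with the `center=True` result above, which pads the signal before framing.

```python
import numpy

def rms_from_energy(frame_energy, frame_length):
    # RMSE = sqrt(energy / N) for a frame of N samples.
    # For a truncated final frame, dividing by the full frame_length
    # underestimates the RMSE of the samples actually present.
    frame_energy = numpy.asarray(frame_energy, dtype=float)
    return numpy.sqrt(frame_energy / frame_length)

# Two frames of 4 samples each: constant amplitude 0.5, then constant amplitude 1.0.
print(rms_from_energy([1.0, 4.0], frame_length=4))
```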



Plot both the energy and RMSE along with the waveform:



In [15]:

frames = range(len(energy))
t = librosa.frames_to_time(frames, sr=sr, hop_length=hop_length)
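With these arguments, the conversion from frame index to time is simple arithmetic: frame `k` starts at sample `k * hop_length`, which is `k * hop_length / sr` seconds. The sketch below reproduces that calculation with NumPy alone; `frames_to_seconds` is a hypothetical name used for illustration, and it ignores the optional `n_fft` offset that `librosa.frames_to_time` also accepts.

```python
import numpy

def frames_to_seconds(frames, sr, hop_length):
    # Frame index -> starting sample -> seconds.
    frames = numpy.asarray(frames)
    return frames * hop_length / float(sr)

# First three frame times for sr=22050 and hop_length=256.
print(frames_to_seconds([0, 1, 2], sr=22050, hop_length=256))
```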




In [16]:

librosa.display.waveplot(x, sr=sr, alpha=0.4)
plt.plot(t, energy/energy.max(), 'r--')             # normalized for visualization
plt.plot(t[:len(rmse)], rmse/rmse.max(), color='g') # normalized for visualization
plt.legend(('Energy', 'RMSE'))




Out[16]:

<matplotlib.legend.Legend at 0x10cd54cc0>



## Questions

Write a function, strip, that removes leading silence from a signal. Make sure it works for a variety of signals recorded in different environments and with different signal-to-noise ratios (SNR).



In [17]:

def strip(x, frame_length, hop_length):

    # Compute RMSE.
    rmse = librosa.feature.rmse(x, frame_length=frame_length, hop_length=hop_length, center=True)

    # Identify the first frame index where RMSE exceeds a threshold.
    # Stop at the last frame so an all-silent signal does not raise an IndexError.
    thresh = 0.01
    frame_index = 0
    while frame_index < rmse.shape[1] and rmse[0][frame_index] < thresh:
        frame_index += 1

    # Convert units of frames to samples.
    start_sample_index = librosa.frames_to_samples(frame_index, hop_length=hop_length)

    # Return the trimmed signal.
    return x[start_sample_index:]
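A fixed threshold of 0.01 depends on the recording level, so a very quiet recording might be stripped entirely. One possible way to handle signals at different levels is to set the threshold relative to the loudest frame. The sketch below does this with NumPy only. The function name `strip_relative`, the parameter `top_db`, and its default of 40 dB are assumptions chosen for illustration, and the frames are uncentered. librosa also provides `librosa.effects.trim`, which trims leading and trailing silence using a similar decibel threshold.

```python
import numpy

def strip_relative(x, frame_length, hop_length, top_db=40.0):
    x = numpy.asarray(x, dtype=float)
    if len(x) == 0:
        return x

    # Frame-wise RMS over uncentered frames.
    n_frames = 1 + max(0, len(x) - frame_length) // hop_length
    rms = numpy.array([
        numpy.sqrt(numpy.mean(x[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])

    # An all-zero signal has nothing to keep.
    peak = rms.max()
    if peak == 0:
        return x[:0]

    # Threshold sits top_db decibels below the loudest frame.
    thresh = peak * 10.0 ** (-top_db / 20.0)

    # Index of the first frame at or above the threshold.
    first_frame = int(numpy.argmax(rms >= thresh))
    return x[first_frame * hop_length:]
```

Because the threshold scales with the signal, multiplying the input by a constant gain does not change where the cut falls. This does not address noisy recordings by itself: if the background noise is within `top_db` of the peak, nothing is removed, so `top_db` would need tuning per environment.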



Let's see if it works.



In [18]:

y = strip(x, frame_length, hop_length)




In [19]:

ipd.Audio(y, rate=sr)








In [20]:

librosa.display.waveplot(y, sr=sr)




Out[20]:

<matplotlib.collections.PolyCollection at 0x10ce20128>



It worked!