SaturationDetector use example

this algorithm outputs the staring/ending locations of the saturated regions in seconds. Saturated regions are found by means of a tripe criterion:

     1. samples in a saturated region should have more energy than a given threshold.
     2. the difference between the samples in a saturated region should be smaller than a given threshold.
     3. the duration of the saturated region should be longer than a given threshold.

  note: The algorithm was designed for a framewise use and the returned timestamps are related to the first frame processed. Use reset() or configure() to restart the count.



In [6]:

    
import essentia.standard as es
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Audio 
from essentia import array as esarr
plt.rcParams["figure.figsize"] =(12,9)



In [2]:

    
def compute(x, frame_size=1024, hop_size=512, **kwargs):
    saturationDetector = es.SaturationDetector(frameSize=frame_size,
                                     hopSize=hop_size, 
                                     **kwargs)
    ends = []
    starts = []
    for frame in es.FrameGenerator(x, frameSize=frame_size,
                                   hopSize=hop_size, startFromZero=True):
        frame_starts, frame_ends = saturationDetector(frame)

        for s in frame_starts:
            starts.append(s)
        for e in frame_ends:
            ends.append(e)

    return starts, ends

A synthetic gap

Lets start by testing the algorithm in the easiest possible scenario



In [7]:

    
fs = 44100

signal = [0]*fs + [1]*fs + [0]*fs

starts, ends = compute(signal)


times = np.linspace(0, len(signal) / float(fs), len(signal))

plt.plot(times, signal)
plt.title('Synthetic gap')
for idx in range(len(starts)):
    l = plt.axvline(starts[idx], color='r')
    plt.axvline(ends[idx], color='r')
l.set_label('Saturation borders')
plt.legend()









    Out[7]:





<matplotlib.legend.Legend at 0x7f3be2e3f850>

A real saturated signal

In this example, we feed a very noisy clip into the algorithm and show a couple of detected saturated regions.



In [21]:

    
fs = 44100.

audio_dir = '../../audio/'
audio = es.MonoLoader(filename='{}/{}'.format(audio_dir,
                      'recorded/distorted.wav'),
                      sampleRate=fs)()



In [22]:

    
Audio(audio, rate=fs)









    Out[22]:



In [23]:

    
starts, ends = compute(audio)


times = np.linspace(0, len(audio) / float(fs), len(audio))

random_indexes = [3, 12, 31]

fig, ax = plt.subplots(len(random_indexes))
plt.subplots_adjust(hspace=.4)
for idx, ridx in enumerate(random_indexes):
    l1 = ax[idx].axvline(starts[ridx], color='r', alpha=.5)
    ax[idx].axvline(ends[ridx], color='r', alpha=.5)
    ax[idx].plot(times, audio)
    ax[idx].set_xlim([starts[ridx] - .001, ends[ridx] + .001])
    ax[idx].set_title('Saturated region located around {:.2f}s'.format(np.mean([ends[ridx], starts[ridx]])))
    
l1.set_label('Saturated region bounds')
fig.legend()









    Out[23]:





<matplotlib.legend.Legend at 0x7f3be2b65090>

Parameters

this is an explanation of the most relevant parameters of the algorithm

differentialThreshold. This algorithm works on top of the derivative of the signal. This parameter control how small the first derivative of a single sample has to be in order to be considered a saturated sample.
energyThreshold. This parameter filters out saturated regions with smaller energy. The main idea is to discard everything that is not loud.
minimumDuration. This parameter filters out the shortest segments. The main motivation is that if the saturated region is too short it will not be perceived.



In [ ]: