Stochastic signals
Described by the laws of probability: mean, variance, probability distributions
$Yst_l[k]=|H_l[k]||U[k]|e^{j(\sphericalangle H_l[k]+\sphericalangle U[k])} =|\hat{X}_l[k]|e^{j\sphericalangle U[k]}$
$|\hat{X}_l[k]|$: approximation of the magnitude spectrum of the input signal $x[n]$
$\sphericalangle U[k]$: spectral phases of the noise signal
$l$: frame number
Convolution in the time domain corresponds to the product of the two spectra in the frequency domain
In polar coordinates, that product is the product of the two magnitude spectra multiplied by the complex exponential $e^{j(\cdot)}$ of the sum of the two phase spectra
The magnitude spectrum of white noise is flat (a constant), so it can be taken out of the equation.
As phase of model, use phase of white noise.
Take the approximation of the magnitude spectrum of our signal, $|\hat{X}_l[k]|$, and random phases $e^{j\sphericalangle U[k]}$ to model the phase spectrum
Details of the spectral shape are not relevant; what matters is the approximation of the time-varying magnitude spectrum of the input signal.
phase = sequence of random numbers
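A minimal sketch of the idea above, assuming NumPy and a magnitude envelope given in dB over the positive-frequency bins. The function name `synth_stochastic_frame` and the sloped test envelope are illustrative only; they are not taken from any library.

```python
import numpy as np

def synth_stochastic_frame(mag_db, rng):
    """Build one noise frame from a positive-frequency magnitude envelope (dB)."""
    n_pos = mag_db.size                    # number of positive-frequency bins
    n_fft = 2 * (n_pos - 1)                # full FFT size for a real signal
    # Random phases stand in for the phase spectrum of the noise U[k].
    phase = 2 * np.pi * rng.random(n_pos)
    phase[0] = 0.0                         # DC bin of a real signal is real
    phase[-1] = 0.0                        # so is the Nyquist bin
    spectrum = 10 ** (mag_db / 20) * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n_fft)   # real-valued time-domain frame

rng = np.random.default_rng(0)
mag_db = np.linspace(-20.0, -60.0, 257)    # hypothetical smooth envelope
frame = synth_stochastic_frame(mag_db, rng)
```

The frame keeps the requested magnitude spectrum exactly; only the phases are random, which is why the result sounds like shaped noise.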
Linear Predictive Coding
$\hat{x}[n] = \sum\limits_{k=1}^{K}a_kx[n-k]$: each sample is predicted as a linear combination of the $K$ previous samples, the form of an IIR (infinite impulse response) filter; the goal is to find the coefficients $a_k$
$Error=\sum\limits_{n=-\infty}^{\infty}(x[n]-\sum\limits_{k=1}^{K} a_kx[n-k])^2$: squared difference between the original and the approximation, summed over all samples; in practice restricted to finite-length signals; find the coefficients $a_k$ that minimize this error
Obtain a set of filter coefficients $a_k$; the frequency response of the resulting filter approximates the spectrum
voice sound commonly approximated with LPC; a way to approximate the resonances, the formants, of a signal;
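One common way to find the coefficients is the autocorrelation method, sketched below with NumPy only. This is an illustration under assumptions, not necessarily the method used in the lecture: the function name `lpc_coefficients` and the synthetic second-order test signal are made up for the example.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate a_k so that x[n] is approximated by sum_k a_k * x[n-k]."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: x.size - lag], x[lag:]) for lag in range(order + 1)])
    # Normal equations R a = r[1:], with R a symmetric Toeplitz matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Synthetic signal with one known resonance (hypothetical test data).
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.8])
e = rng.standard_normal(20000)
x = np.zeros(e.size)
for n in range(2, x.size):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + e[n]

a = lpc_coefficients(x, 2)

# Spectral envelope: magnitude response of 1 / (1 - sum_k a_k e^{-jwk}).
w = np.linspace(0, np.pi, 512)
A = 1 - np.exp(-1j * np.outer(w, np.arange(1, a.size + 1))) @ a
envelope_db = 20 * np.log10(1 / np.abs(A))
```

The peak of `envelope_db` sits near the resonance of the test signal, which is the sense in which LPC approximates formants.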
We develop from here: the spectrum of a sum of sinusoids is the sum of transforms of the analysis window, each shifted to the frequency and scaled to the amplitude of one sinusoid, plus the spectrum of the residual component.
$Y_l[k]=\sum\limits_{r=1}^{R_l}A_{(r,l)}W[k-\hat{f}_{(r,l)}]+Xr_l[k]=Ys_l[k]+Xr_l[k]$
$W[k]$ = spectrum of analysis window
$R_l$: number of sinusoidal components
$A_{(r,l)}$: amplitude of sinusoid
$\hat{f}_{(r,l)}$: normalized frequency of sinusoid
$Xr_l[k]=X_l[k]-Ys_l[k]$: residual component spectrum
$Ys_l[k]$: sinusoidal component spectrum
$l$: frame number
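Subtract the generated sinusoidal spectrum from a spectrum of the input computed with the same size and window; the result is the residual spectrum (usually 512 samples). Use a Blackman-Harris window so the sinusoids can be subtracted easily
The inverse transform of that gives the residual signal (e.g. the breath noise of a flute)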
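The subtraction can be sketched for a single frame as follows. This is a simplified illustration that assumes the sinusoid's amplitude, frequency and phase are known exactly (in practice they come from peak detection), and it writes the 4-term Blackman-Harris window out by hand; `blackman_harris` and the test signal are hypothetical.

```python
import numpy as np

def blackman_harris(M):
    """4-term Blackman-Harris window (periodic form)."""
    n = np.arange(M)
    return (0.35875
            - 0.48829 * np.cos(2 * np.pi * n / M)
            + 0.14128 * np.cos(4 * np.pi * n / M)
            - 0.01168 * np.cos(6 * np.pi * n / M))

fs = 44100
N = 512                                   # frame and FFT size
n = np.arange(N)
rng = np.random.default_rng(0)

sinusoid = 0.8 * np.cos(2 * np.pi * 440.0 * n / fs)   # sinusoidal component
noise = 0.01 * rng.standard_normal(N)                 # stands in for breath noise
x = sinusoid + noise

w = blackman_harris(N)
X = np.fft.fft(x * w)                     # X_l[k]: spectrum of the input frame
Ys = np.fft.fft(sinusoid * w)             # Ys_l[k]: same size, same window
Xr = X - Ys                               # Xr_l[k]: residual spectrum
xr = np.real(np.fft.ifft(Xr))             # windowed residual signal
```

Because both spectra use the same size and window, the inverse transform of `Xr` is the windowed noise component.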
To improve the stochastic approximation:
Smaller hop size, larger approximation size: a finer grain
With a speech sound (not completely stochastic): hop size 256, approximation 0.2
Getting rid of the phase spectrum and smoothing out the magnitude spectrum gives a whispered type of sound
FFT:
Blackman window—stable sound; good for low sidelobes; good signal to noise ratio
M (Blackman): 6 bins * sampling rate / fundamental frequency, made odd; to further reduce the remaining components, make it bigger
analyze middle of sound
look for high stochastic in high frequencies; less harmonic; not clearly defined as partials
Increasing the max frequency deviation in the harmonic tracking shows unstable harmonics in the high frequencies (with flute)
Stochastic approximation factor: reduces the whole spectrum by 90%
For stochastic, no need for odd-sized window or zero-padding for the FFT
DV?
resample: FFT approach; to downsample a signal
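A sketch of how FFT-based resampling can produce the smoothed magnitude envelope, assuming SciPy's `scipy.signal.resample` and a magnitude spectrum in dB. The function name `stochastic_envelope`, the parameter name `stocf` and the random test spectrum are assumptions for illustration.

```python
import numpy as np
from scipy.signal import resample

def stochastic_envelope(mX, stocf):
    """Downsample a dB magnitude spectrum by stocf, then stretch it back."""
    n_env = max(4, int(mX.size * stocf))          # number of envelope points
    env = resample(np.maximum(-200, mX), n_env)   # coarse envelope (floor at -200 dB)
    return env, resample(env, mX.size)            # envelope and its smooth reconstruction

rng = np.random.default_rng(0)
mX = -40 + 5 * rng.standard_normal(257)           # hypothetical noisy dB spectrum
env, mY = stochastic_envelope(mX, 0.2)
```

A smaller `stocf` keeps fewer points, so the reconstructed spectrum `mY` is smoother and loses more detail.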
xw = x[10000:10000+M] * w
mX = 20 * np.log10(abs([X[:
Blackman is good for stable sound due to low sidelobes; good signal to noise ratio
Window size for a 440 Hz fundamental: 6 * 44100 / 440 ≈ 601 samples
For a 200 Hz fundamental: 6 * 44100 / 200 = 1323 samples
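The arithmetic above can be wrapped in a small helper. This is a sketch that assumes the size is rounded up and then bumped to the next odd number when needed; the function name `odd_window_size` is hypothetical.

```python
import math

def odd_window_size(bins, fs, f0):
    """Smallest odd window size M with M >= bins * fs / f0."""
    m = math.ceil(bins * fs / f0)         # round up so the window is never too short
    return m if m % 2 == 1 else m + 1     # force an odd length

m_440 = odd_window_size(6, 44100, 440)    # Blackman main lobe: 6 bins
m_200 = odd_window_size(6, 44100, 200)
print(m_440, m_200)
```

Rounding up instead of down errs on the side of a slightly longer window, in line with the note that a bigger window reduces the remaining components.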