This notebook imports a local version of sound_field_analysis-py to prototype quickly inside the package. It produces a binaural rendering of a spherical microphone array measurement.
To run it, you need to download the necessary impulse responses as either SOFA or MIRO files and reflect your choice in the variable is_load_sofa. After downloading them from the links provided, place the files in the specified directories.
In [1]:
# this automatically reloads the sfa-py package if anything is changed there
%load_ext autoreload
%autoreload 2
# Requirements
from IPython import display
import numpy as np
# Load the local sound_field_analysis-py package
import sys
sys.path.insert(0, '../')
from sound_field_analysis import io, gen, process, plot, sph, utils
In [2]:
is_load_sofa = True # whether SOFA source data should be loaded, MIRO source data otherwise
sh_max_order = 8 # maximal utilized spherical harmonics rendering order
rf_nfft = 4096 # target length of radial filter in samples
rf_amp_max_db = 0 # soft limiting level of radial filter in dB
is_apply_rfi = True # whether radial filter improvement should be applied
azim_offset_deg = -37 # azimuth head rotation offset in degrees (to make the listener face the sound source)
SOFA (recommended):

| Variable | File | Description | Author | Source |
|---|---|---|---|---|
| DRIR | examples/data/DRIR_CR1_VSA_110RS_L.sofa | WDR Control Room 1, left loudspeaker, spherical Lebedev grid, 110 nodes up to SH order 29 | TH Köln | WDR Impulse Response Compilation (SOFA) |
| HRIR | examples/data/HRIR_L2702.sofa | Neumann KU100, artificial head, spherical Lebedev grid, 2702 nodes up to SH order 8 | TH Köln | sofaconventions.org |
MIRO:

| Variable | File | Description | Author | Source |
|---|---|---|---|---|
| DRIR | examples/data/CR1_VSA_110RS_L.mat ** | WDR Control Room 1, left loudspeaker, spherical Lebedev grid, 110 nodes up to SH order 29 | TH Köln | WDR CR1_VSA |
| HRIR_L, HRIR_R | examples/data/HRIR_L2702.mat ** | Neumann KU100, artificial head, spherical Lebedev grid, 2702 nodes up to SH order 8 | TH Köln | KU100 HRIR Dataset |
| | examples/miro.m | MIRO Matlab Class Definition | TH Köln | miro Class Definition (Required for KU/VSA Datasets) |
** These files contain data in the Matlab MIRO format, which cannot be accessed directly in Python. To read them, download the specified MIRO Matlab Class Definition file and convert the data to a plain Matlab structure first by executing the prepared Matlab script examples/miro_to_struct.m.
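As background: once converted to a plain struct, such a .mat file can be read with SciPy's generic loader, which is what the library wraps for you in io.read_miro_struct. A minimal sketch for inspecting a converted file (the helper name and field names are illustrative assumptions, not the actual MIRO struct layout):

```python
from scipy.io import loadmat

def peek_struct(path):
    """List the top-level variables stored in a converted MIRO struct file."""
    mat = loadmat(path, squeeze_me=True, struct_as_record=False)
    # skip Matlab's bookkeeping entries such as '__header__'
    return [key for key in mat if not key.startswith('__')]
```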
In [3]:
if is_load_sofa:
    # load impulse responses from SOFA files
    DRIR = io.read_SOFA_file('data/DRIR_CR1_VSA_110RS_L.sofa')
    HRIR = io.read_SOFA_file('data/HRIR_L2702.sofa')
    NFFT = HRIR.l.signal.shape[-1]
else:
    # load impulse responses from MIRO structs
    DRIR = io.read_miro_struct('data/CR1_VSA_110RS_L_struct.mat')
    HRIR_L = io.read_miro_struct('data/HRIR_L2702_struct.mat', channel='irChOne')
    HRIR_R = io.read_miro_struct('data/HRIR_L2702_struct.mat', channel='irChTwo')
    NFFT = HRIR_L.signal.signal.shape[-1]

# automatically calculate target processing length
# by summing impulse response lengths of DRIR, HRIR and radial filters
NFFT += DRIR.signal.signal.shape[1] + rf_nfft
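The length choice follows the usual rule that a full linear convolution of signals with lengths N1, N2, N3 spans N1 + N2 + N3 - 2 samples, so an FFT size covering the summed lengths avoids circular wrap-around. A quick standalone check with plain NumPy (illustrative only, not part of the notebook's pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.standard_normal(64), rng.standard_normal(32)

# full linear convolution spans len(a) + len(b) - 1 samples
lin = np.convolve(a, b)

# FFT-based convolution matches only if NFFT covers that span,
# analogous to NFFT = sum of the impulse response lengths above
nfft = len(a) + len(b)
circ = np.fft.irfft(np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft), nfft)

assert np.allclose(lin, circ[:len(lin)])
```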
In [4]:
# FFT and spatFT
if is_load_sofa:
    # transform SOFA data
    HRTF_L = process.FFT(HRIR.l.signal, fs=HRIR.l.fs, NFFT=NFFT, calculate_freqs=False)
    HRTF_R = process.FFT(HRIR.r.signal, fs=HRIR.r.fs, NFFT=NFFT, calculate_freqs=False)
    Hnm_L = process.spatFT(HRTF_L, HRIR.grid, order_max=sh_max_order)
    Hnm_R = process.spatFT(HRTF_R, HRIR.grid, order_max=sh_max_order)
else:
    # transform MIRO data
    HRTF_L = process.FFT(HRIR_L.signal.signal, fs=HRIR_L.signal.fs, NFFT=NFFT, calculate_freqs=False)
    HRTF_R = process.FFT(HRIR_R.signal.signal, fs=HRIR_R.signal.fs, NFFT=NFFT, calculate_freqs=False)
    Hnm_L = process.spatFT(HRTF_L, HRIR_L.grid, order_max=sh_max_order)
    Hnm_R = process.spatFT(HRTF_R, HRIR_R.grid, order_max=sh_max_order)

DRTF = process.FFT(DRIR.signal.signal[:, :int(NFFT)], fs=DRIR.signal.fs, NFFT=NFFT, calculate_freqs=False)
Pnm = process.spatFT(DRTF, DRIR.grid, order_max=sh_max_order)

# compute radial filter
dn = gen.radial_filter_fullspec(max_order=sh_max_order, NFFT=rf_nfft, fs=DRIR.signal.fs,
                                array_configuration=DRIR.configuration, amp_maxdB=rf_amp_max_db)
if is_apply_rfi:
    # improve radial filter (remove DC offset and make causal)
    dn, _, dn_delay_samples = process.rfi(dn, kernelSize=rf_nfft)
else:
    # make radial filter causal
    dn_delay_samples = rf_nfft / 2
    dn *= gen.delay_fd(target_length_fd=dn.shape[1], delay_samples=dn_delay_samples)

# show frequency domain preview of radial filter
dn_legend = list(f'n = {n}' for n in range(sh_max_order + 1))
plot.plot2D(dn, fs=DRIR.signal.fs, viz_type='LogFFT', title='radial filter', line_names=dn_legend)

# adjust radial filter length
dn = utils.zero_pad_fd(dn, target_length_td=NFFT)

# # show time domain preview of radial filter
# plot.plot2D(np.fft.irfft(dn), fs=DRIR.signal.fs, viz_type='Time', title='radial filter', line_names=dn_legend)

# repeat radial filter for each SH degree at its respective order
dn = np.repeat(dn, np.arange(sh_max_order * 2 + 1, step=2) + 1, axis=0)
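The repeat expands the one-filter-per-order matrix into 2n + 1 copies of each row, matching the number of SH degrees m = -n…n at order n, so dn lines up row-by-row with the stacked SH coefficients. A standalone sketch of the index arithmetic, mirroring the line above:

```python
import numpy as np

sh_max_order = 3
# toy filter matrix: one row per order n = 0..N, labeled by n
dn = np.arange(sh_max_order + 1)[:, np.newaxis].astype(float)

# each order n contributes 2n + 1 degrees m, so repeat row n that many times
counts = np.arange(sh_max_order * 2 + 1, step=2) + 1  # -> [1, 3, 5, 7]
dn_expanded = np.repeat(dn, counts, axis=0)

# total rows equal (N + 1)^2, the number of SH coefficients up to order N
assert dn_expanded.shape[0] == (sh_max_order + 1) ** 2
```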
Now, sum up $d_n \, (-1)^m \, P_{n(-m)} \, H_{nm} \, e^{-jm\alpha}$ over all $n$ and $m$ for 360 different head orientations $\alpha$, according to [1, Sec. 2.4].
[1] Carl Andersson, Headphone Auralization of Acoustic Spaces Recorded with Spherical Microphone Arrays, MSc thesis, Chalmers University of Technology, 2017.
In [5]:
m = sph.mnArrays(sh_max_order)[0]  # SH grades stacked by order
m_rev_id = sph.reverseMnIds(sh_max_order)  # reverse indices for stacked SH grades

# compute everything possible before the loop
Pnm_dn = np.float_power(-1.0, m)[:, np.newaxis] * Pnm[m_rev_id] * dn
Pnm_dn_Hnm_L = Pnm_dn * Hnm_L
Pnm_dn_Hnm_R = Pnm_dn * Hnm_R

# select azimuth head orientations to compute
azims_rad = utils.deg2rad(np.arange(0, 360) - azim_offset_deg)

# loop over all head orientations that are to be computed
# (this could be done with one inner product, but the loop helps clarity)
S_L = np.zeros([360, int(NFFT / 2 + 1)], dtype=np.complex_)
S_R = np.zeros([360, int(NFFT / 2 + 1)], dtype=np.complex_)
for azims_id, alpha in enumerate(azims_rad):
    alpha_exp = np.exp(-1j * m * alpha)[:, np.newaxis]
    # these are the spectra of the ear signals
    S_L[azims_id] = np.sum(Pnm_dn_Hnm_L * alpha_exp, axis=0)
    S_R[azims_id] = np.sum(Pnm_dn_Hnm_R * alpha_exp, axis=0)

# IFFT to yield ear impulse responses
BRIR_L = process.iFFT(S_L)
BRIR_R = process.iFFT(S_R)

# normalize BRIRs
BRIR_peak = np.max(np.abs([BRIR_L, BRIR_R]))
BRIR_L *= 0.9 / BRIR_peak
BRIR_R *= 0.9 / BRIR_peak
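As the comment in the loop notes, the per-orientation sum can also be written as a single matrix product: stacking the phase factors $e^{-jm\alpha}$ for all orientations gives a (360 × coefficients) matrix, and one matmul replaces the loop. A standalone sketch with toy shapes (illustrative variables, not the notebook's):

```python
import numpy as np

n_coeffs, n_bins, n_azims = 16, 8, 360
rng = np.random.default_rng(1)
# toy stand-in for Pnm_dn_Hnm: precombined SH coefficients per frequency bin
coeffs = rng.standard_normal((n_coeffs, n_bins)) + 1j * rng.standard_normal((n_coeffs, n_bins))
m = rng.integers(-3, 4, size=n_coeffs)  # stand-in SH degrees
alphas = np.deg2rad(np.arange(n_azims))

# loop version, as in the notebook cell
S_loop = np.zeros((n_azims, n_bins), dtype=complex)
for i, alpha in enumerate(alphas):
    S_loop[i] = np.sum(coeffs * np.exp(-1j * m * alpha)[:, np.newaxis], axis=0)

# single inner product: (azims x coeffs) @ (coeffs x bins)
S_mat = np.exp(-1j * np.outer(alphas, m)) @ coeffs

assert np.allclose(S_loop, S_mat)
```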
In [6]:
# generate grid compatible with SSR (1 degree steps on the horizon)
BRIR_grid = io.SphericalGrid(azimuth=azims_rad + utils.deg2rad(azim_offset_deg),
                             colatitude=np.broadcast_to(utils.deg2rad(90), (360,)))
BRIR_L_tuple = io.ArraySignal(signal=io.TimeSignal(signal=BRIR_L, fs=DRIR.signal.fs), grid=BRIR_grid)
BRIR_R_tuple = io.ArraySignal(signal=io.TimeSignal(signal=BRIR_R, fs=DRIR.signal.fs), grid=BRIR_grid)

# export file
io.write_SSR_IRs(filename='SSR_IRs.wav', time_data_l=BRIR_L_tuple, time_data_r=BRIR_R_tuple, wavformat='float32')
Start SSR with something like the following to listen to the result:
ssr-binaural --hrirs=/Users/jens.ahrens/Documents/coding/sound_field_analysis-py/examples/SSR_IRs.wav ~/Documents/audio/ssr_scenes/single_mono_48k.asd
Note that if you start SSR as instructed above, the head rotation direction is reversed: the file SSR_IRs.wav contains binaural room impulse responses rather than HRIRs, which use a different data arrangement (cf. https://ssr.readthedocs.io/en/latest/renderers.html#binaural-renderer vs. https://ssr.readthedocs.io/en/latest/renderers.html#binaural-room-synthesis-renderer).
To avoid this, start SSR in BRS mode with the scene Exp4.asd; the required commands are noted inside that file. You just need to adjust all paths to your setup.
In [7]:
# define angles in degrees to preview
pre_azims_deg = [0, 90]

# read source file
source, source_fs = io.read_wavefile('data/audio.wav')
if len(source.shape) > 1:
    source = source[0]  # consider only first channel

# select shorter extract from source
source = np.atleast_2d(source[:10 * source_fs])

# resample source to match BRIRs
source = utils.simple_resample(source, original_fs=source_fs, target_fs=DRIR.signal.fs)
source_fs = int(DRIR.signal.fs)

for azim in pre_azims_deg:
    # show preview
    display.display(
        display.Markdown(f'### Head orientation {azim}°'),
        display.Audio(process.convolve(source, np.array([BRIR_L[azim], BRIR_R[azim]])),
                      rate=source_fs))
In [8]:
pre_legend = list(f'left ear, {azim}°' for azim in pre_azims_deg)
pre_IR = list(BRIR_L[azim] for azim in pre_azims_deg)
pre_TF = process.FFT(pre_IR, fs=source_fs, calculate_freqs=False)

# show time domain preview
plot.plot2D(pre_IR, fs=source_fs, viz_type='Time', title='', line_names=pre_legend)
plot.plot2D(pre_IR, fs=source_fs, viz_type='ETC', title='', line_names=pre_legend)

display.display(
    display.Markdown('### <center>causal radial filter introduced delay of {:.0f} samples or {:.0f} ms</center>'.format(
        dn_delay_samples, 1000 * dn_delay_samples / source_fs))
)

# show frequency domain preview
plot.plot2D(pre_TF, fs=source_fs, viz_type='LogFFT', title='', line_names=pre_legend)