Binaural Rendering of Array Data

This notebook imports a local version of sound_field_analysis-py to prototype quickly inside the package. It produces a binaural rendering of a spherical microphone array measurement.

To run it, you need to download the necessary impulse responses either as SOFA or MIRO files and reflect your choice in the variable is_load_sofa. After downloading from the provided links the files need to be placed in the specified directory.


In [1]:
# this automatically reloads the sfa-py package if anything is changed there
%load_ext autoreload
%autoreload 2

# Requirements
from IPython import display
import numpy as np

# Load the local sound_field_analysis-py package
import sys
sys.path.insert(0, '../')
from sound_field_analysis import io, gen, process, plot, sph, utils

Settings


In [2]:
is_load_sofa    = True  # whether SOFA source data should be loaded, MIRO source data otherwise
sh_max_order    = 8  # maximal utilized spherical harmonics rendering order
rf_nfft         = 4096  # target length of radial filter in samples
rf_amp_max_db   = 0  # soft limiting level of radial filter in dB
is_apply_rfi    = True  # whether radial filter improvement should be applied
azim_offset_deg = -37  # azimuth head roation offset in degrees (to make the listener face the sound source)

Load data

SOFA (recommended):

Variable File Description Author Source
DRIR examples/ data/ DRIR_CR1_VSA_110RS_L.sofa WDR Control Room 1, left loudspeaker
spherical Lebedev grid, 110 nodes
up to SH order 29
TH Köln WDR Impulse Response Compilation (SOFA)
HRIR examples/ data/ HRIR_L2702.sofa Neumann K100, artificial head
spherical Lebedev grid, 2702 nodes
up to SH order 8
TH Köln sofaconventions.org

MIRO:

Variable File Description Author Source
DRIR examples/ data/ CR1_VSA_110RS_L.mat ** WDR Control Room 1, left loudspeaker
spherical Lebedev grid, 110 nodes
up to SH order 29
TH Köln WDR CR1_VSA
HRIR_L,
HRIR_R
examples/ data/ HRIR_L2702.mat ** Neumann K100, artificial head
spherical Lebedev grid, 2702 nodes
up to SH order 8
TH Köln KU100 HRIR Dataset
examples/ miro.m MIRO Matlab Class Definition TH Köln miro Class Definition (Required for KU/VSA Datasets)

** These files contain data in the Matlab MIRO format, which cannot be accessed in Python. To read them, you need to download the specified MIRO Matlab Class Definition file. You will have to convert the data to a Matlab structure first by executing the prepared Matlab script examples/ miro_to_struct.m .


In [3]:
if is_load_sofa:
    # load impulse responses from SOFA file
    DRIR   = io.read_SOFA_file('data/DRIR_CR1_VSA_110RS_L.sofa')
    HRIR   = io.read_SOFA_file('data/HRIR_L2702.sofa')
    NFFT   = HRIR.l.signal.shape[-1]
else:
    # load impulse responses from MIRO struct
    DRIR   = io.read_miro_struct('data/CR1_VSA_110RS_L_struct.mat')
    HRIR_L = io.read_miro_struct('data/HRIR_L2702_struct.mat', channel='irChOne')
    HRIR_R = io.read_miro_struct('data/HRIR_L2702_struct.mat', channel='irChTwo')
    NFFT   = HRIR_L.signal.signal.shape[-1]

# automatically calculate target processing length
# by summing impulse response lengths of DRIR, HRIR and radial filters
NFFT += DRIR.signal.signal.shape[1] + rf_nfft


open SOFA file "data/DRIR_CR1_VSA_110RS_L.sofa"
 --> samplerate: 48000 Hz, receivers: 110, emitters: 1, measurements: 1, samples: 17000, format: float64
 --> convention: SingleRoomDRIR, version: 0.3
 --> listener: Earthworks M30 Omni - S/N3551E (Rigid Sphere)
 --> author: Benjamin Bernsch�tz, Philipp Stade, Max R�hl
('degree, degree, metre', 'spherical')

open SOFA file "data/HRIR_L2702.sofa"
 --> samplerate: 48000 Hz, receivers: 2, emitters: 1, measurements: 2702, samples: 128, format: float64
 --> convention: SimpleFreeFieldHRIR, version: 1.0
 --> listener: Neumann KU100
 --> author: Benjamin Bernsch�tz
('degree, degree, metre', 'spherical')

Processing


In [4]:
# FFT and spatFT
if is_load_sofa:
    # transform SOFA data
    HRTF_L = process.FFT(HRIR.l.signal, fs=HRIR.l.fs, NFFT=NFFT, calculate_freqs=False)
    HRTF_R = process.FFT(HRIR.r.signal, fs=HRIR.r.fs, NFFT=NFFT, calculate_freqs=False)
    Hnm_L = process.spatFT(HRTF_L, HRIR.grid, order_max=sh_max_order)
    Hnm_R = process.spatFT(HRTF_R, HRIR.grid, order_max=sh_max_order)
else:
    # transform MIRO data
    HRTF_L = process.FFT(HRIR_L.signal.signal, fs=HRIR_L.signal.fs, NFFT=NFFT, calculate_freqs=False)
    HRTF_R = process.FFT(HRIR_R.signal.signal, fs=HRIR_R.signal.fs, NFFT=NFFT, calculate_freqs=False)
    Hnm_L = process.spatFT(HRTF_L, HRIR_L.grid, order_max=sh_max_order)
    Hnm_R = process.spatFT(HRTF_R, HRIR_R.grid, order_max=sh_max_order)

DRTF = process.FFT(DRIR.signal.signal[:,:int(NFFT)], fs=DRIR.signal.fs, NFFT=NFFT, calculate_freqs=False)
Pnm = process.spatFT(DRTF, DRIR.grid, order_max=sh_max_order)

# compute radial filter
dn = gen.radial_filter_fullspec(max_order=sh_max_order, NFFT=rf_nfft, fs=DRIR.signal.fs,
                                array_configuration=DRIR.configuration, amp_maxdB=rf_amp_max_db)

# show frequency domain preview of radial filter


if is_apply_rfi:
    # improve radial filter (remove DC offset and make casual)
    dn, _, dn_delay_samples = process.rfi(dn, kernelSize=rf_nfft)
else:
    # make radial filter causal
    dn_delay_samples = rf_nfft / 2
    dn *= gen.delay_fd(target_length_fd=dn.shape[1], delay_samples=dn_delay_samples)

# show frequency domain preview of radial filter
dn_legend = list(f'n = {n}' for n in range(sh_max_order + 1))
plot.plot2D(dn, fs=DRIR.signal.fs, viz_type='LogFFT', title='radial filter', line_names=dn_legend)

# adjust radial filter length
dn = utils.zero_pad_fd(dn, target_length_td=NFFT)

# # show time domain preview of radial filter
# plot.plot2D(np.fft.irfft(dn), fs=DRIR.signal.fs, viz_type='Time', title='radial filter', line_names=dn_legend)

# adjust radial filter size according to SH order as grades
dn = np.repeat(dn, np.arange(sh_max_order * 2 + 1, step=2) + 1, axis=0)


Now, sum up $d_n * (-1)^m * P_{n(-m)} * H_{nm} * e^{-jm\alpha}$ for 360 different $\alpha$ being the head orientation; according to [1, Sec. 2.4].

[1] Carl Andersson, Headphone Auralization of Acoustic Spaces Recorded with Spherical Microphone Arrays, MSc thesis, Chalmers University of Technology, 2017.


In [5]:
m = sph.mnArrays(sh_max_order)[0]  # SH grades stacked by order
m_rev_id = sph.reverseMnIds(sh_max_order)  # reverse indices for stacked SH grades

# compute everything you can before the loop
Pnm_dn = np.float_power(-1.0, m)[:, np.newaxis] * Pnm[m_rev_id] * dn
Pnm_dn_Hnm_L = Pnm_dn * Hnm_L
Pnm_dn_Hnm_R = Pnm_dn * Hnm_R

# select azimuth head orientations to compute
azims_rad = utils.deg2rad(np.arange(0, 360) - azim_offset_deg)

# loop over all head orientations that are to be computed
# this could be done with one inner product but the loop helps clarity
S_L = np.zeros([360, int(NFFT/2+1)], dtype=np.complex_)
S_R = np.zeros([360, int(NFFT/2+1)], dtype=np.complex_)
for azims_id, alpha in enumerate(azims_rad):
    alpha_exp = np.exp(-1j * m * alpha)[:, np.newaxis]

    # these are the spectra of the ear signals
    S_L[azims_id] = np.sum(Pnm_dn_Hnm_L * alpha_exp, axis=0)
    S_R[azims_id] = np.sum(Pnm_dn_Hnm_R * alpha_exp, axis=0)

# IFFT to yield ear impulse responses
BRIR_L = process.iFFT(S_L)
BRIR_R = process.iFFT(S_R)

# normalize BRIRs
BRIR_peak = np.max(np.abs([BRIR_L, BRIR_R]))
BRIR_L *= 0.9 / BRIR_peak
BRIR_R *= 0.9 / BRIR_peak

Export IRs to SSR wavs


In [6]:
# generate grid compatible to SSR (1 degree steps on the horizon)
BRIR_grid = io.SphericalGrid(azimuth=azims_rad + utils.deg2rad(azim_offset_deg),
                             colatitude=np.broadcast_to(utils.deg2rad(90), (360,)))
BRIR_L_tuple = io.ArraySignal(signal=io.TimeSignal(signal=BRIR_L, fs=DRIR.signal.fs), grid=BRIR_grid)
BRIR_R_tuple = io.ArraySignal(signal=io.TimeSignal(signal=BRIR_R, fs=DRIR.signal.fs), grid=BRIR_grid)

# export file
io.write_SSR_IRs(filename='SSR_IRs.wav', time_data_l=BRIR_L_tuple, time_data_r=BRIR_R_tuple, wavformat='float32')

Start SSR

Start SSR with something like the following to listen to the result:

ssr-binaural --hrirs=/Users/jens.ahrens/Documents/coding/sound_field_analysis-py/examples/SSR_IRs.wav ~/Documents/audio/ssr_scenes/single_mono_48k.asd

Note that if you start SSR as instructed above, then the head rotation direction is reversed because the file SSR_IRs.wav contains binaural room impulse responses rather than HRIRs, which use a different data arrangement (cf. https://ssr.readthedocs.io/en/latest/renderers.html#binaural-renderer vs. https://ssr.readthedocs.io/en/latest/renderers.html#binaural-room-synthesis-renderer).

To avoid this, start SSR in BRS mode with the scene Exp4.asd. The commands are mentioned in the file Exp4.asd. You just need to adjust all paths to your setup.

Audio preview

Put a source file of your choice (arbitrary sampling rate) at the location examples/ data/ audio.wav.


In [7]:
# define angles in degree to preview
pre_azims_deg = [0, 90]

# read source file
source, source_fs = io.read_wavefile('data/audio.wav')
if len(source.shape) > 1:
    source = source[0]  # consider only first channel

# select shorter extract from source
source = np.atleast_2d(source[:10*source_fs])

# resample source to match BRIRs
source = utils.simple_resample(source, original_fs=source_fs, target_fs=DRIR.signal.fs)
source_fs = int(DRIR.signal.fs)

for azim in pre_azims_deg:
    # show preview
    display.display(
        display.Markdown(f'### Head orientation {azim}°'),
        display.Audio(process.convolve(source, np.array([BRIR_L[azim], BRIR_R[azim]])),
                      rate=source_fs))


Head orientation 0°

Head orientation 90°

Plot preview


In [8]:
pre_legend = list(f'left ear, {azim}°' for azim in pre_azims_deg)
pre_IR = list(BRIR_L[azim] for azim in pre_azims_deg)
pre_TF = process.FFT(pre_IR, fs=source_fs, calculate_freqs=False)

# show time domain preview
plot.plot2D(pre_IR, fs=source_fs, viz_type='Time', title='', line_names=pre_legend)
plot.plot2D(pre_IR, fs=source_fs, viz_type='ETC', title='', line_names=pre_legend)
display.display(
    display.Markdown('### <center>casual radial filter introduced delay of {:.0f} samples or {:.0f} ms</center>'.format(
        dn_delay_samples, 1000 * dn_delay_samples / source_fs))
)

# show frequency domain preview
plot.plot2D(pre_TF, fs=source_fs, viz_type='LogFFT', title='', line_names=pre_legend)


casual radial filter introduced delay of 2048 samples or 43 ms