Sounddevice

sounddevice is a Python module available for Linux, macOS and MS Windows. It wraps the PortAudio library, which lets us record and play PCM audio on these operating systems.


In [ ]:
try:
    import sounddevice as sd  # https://python-sounddevice.readthedocs.io
except ModuleNotFoundError:
    !pip3 install sounddevice --user
    import sounddevice as sd

In [ ]:
help(sd)

In [ ]:
# List the available audio devices
sd.query_devices()

In [ ]:
# List the available host APIs, with the devices and the default devices of each one
sd.query_hostapis()

In [ ]:
# Get information about the default input device
# (a fixed index such as 0 is not guaranteed to be the default on every machine)
sd.query_devices(kind='input')

In [ ]:
# Get information about the default output device
# (a fixed index such as 2 is not guaranteed to be the default on every machine)
sd.query_devices(kind='output')
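
Each entry returned by sd.query_devices() is a dictionary with keys such as name, max_input_channels and max_output_channels. As a minimal sketch of how that information might be used, the helper below (full_duplex_devices is a hypothetical name, not part of sounddevice) picks the devices that can both record and play. The sample entries are made up so that the cell runs without audio hardware.

```python
def full_duplex_devices(devices):
    """Return (index, name) pairs for devices that can both record and play."""
    return [(index, info['name'])
            for index, info in enumerate(devices)
            if info['max_input_channels'] > 0 and info['max_output_channels'] > 0]

# Made-up entries that mimic the dictionaries returned by sd.query_devices()
example_devices = [
    {'name': 'Microphone', 'max_input_channels': 2, 'max_output_channels': 0},
    {'name': 'Speakers', 'max_input_channels': 0, 'max_output_channels': 2},
    {'name': 'Headset', 'max_input_channels': 1, 'max_output_channels': 2},
]
print(full_duplex_devices(example_devices))
```

On a real machine, the same helper could be called as full_duplex_devices(sd.query_devices()).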

The sounddevice.Stream class

A sounddevice.Stream object provides simultaneous input and output of PCM digital audio through NumPy arrays. Its parameters, all of them optional, are summarized below:

  1. samplerate: Sampling frequency (for both input and output) in frames per second.
  2. blocksize: Number of frames (single samples in the case of mono audio, or tuples of samples in the case of multichannel audio, normally stereo) passed to the callback function (see below). By default, blocksize=0, which means that the block size may vary, depending on the host workload and the requested latency setting (see below).
  3. device: Input and output devices.
  4. dtype: The sample format of the numpy.ndarray provided to the stream callback (see below).
  5. latency: The desired latency (the time elapsed between an action that produces sound and the perception of that sound by a listener) in seconds. The special values 'low' and 'high' (the latter being the default) select the device's default low and high latency, respectively. When blocksize=0, the requested latency also influences the block size chosen by the host.
  6. extra_settings: Used for host-API-specific input/output settings.
  7. callback: Callback function, which has the signature:
    callback(indata: ndarray, outdata: ndarray, frames: int, time: CData, status: CallbackFlags) -> None
    1. indata and outdata: Input and output buffers, respectively, as two-dimensional numpy.ndarray objects with one column per channel (i.e. with a shape of (frames, channels)) and with the data type specified by dtype. The output buffer contains uninitialized data, and the callback is expected to fill it with audio data, whose content depends on the application.
    2. frames: Number of frames in indata and outdata.
    3. time: Time-stamps of the first frame in indata, in outdata, and the time at which the callback function was called.
    4. status: Indicates whether underflow or overflow conditions have been detected. An output underflow happens when the audio device consumes data faster than the callback supplies it, so the output buffer runs empty. An input overflow happens when the recorded data is not consumed fast enough and the input buffer fills up. Typically, underflows are much more frequent than overflows.
  8. finished_callback: User-supplied function which will be called when the stream becomes inactive.
  9. clip_off: Set to True to disable clipping.
  10. dither_off: Set to True to disable dithering.
  11. never_drop_input: Set to True to request that, where possible, a full duplex stream will not discard overflowed input samples without calling the stream callback. This only works if blocksize=0.
  12. prime_output_buffers_using_stream_callback: Set to True to call the stream callback to fill initial output buffers, rather than the default behavior of priming the buffers with zeros (silence).
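
To make the callback contract above more concrete, here is a minimal sketch of a callback that ignores the input and fills outdata with a sine tone. The factory make_sine_callback and its parameters are hypothetical names chosen for illustration. The callback is exercised offline with hand-made arrays, assuming float32 samples and two channels, so the cell does not need an audio device. The None values stand in for the time and status arguments that PortAudio would normally supply.

```python
import numpy as np

def make_sine_callback(samplerate, frequency=440.0, amplitude=0.2):
    """Return a stream callback that writes a sine tone to every output channel."""
    frame_index = 0  # Running position, so consecutive blocks join without clicks

    def callback(indata, outdata, frames, time, status):
        nonlocal frame_index
        if status:
            print(status)  # Report underflow/overflow flags, if any
        t = (frame_index + np.arange(frames)) / samplerate
        tone = amplitude * np.sin(2 * np.pi * frequency * t)
        # outdata has shape (frames, channels): broadcast the tone to all columns
        outdata[:] = tone.reshape(-1, 1)
        frame_index += frames

    return callback

# Offline check with hand-made buffers (no audio hardware involved)
callback = make_sine_callback(samplerate=44100)
indata = np.zeros((512, 2), dtype=np.float32)
outdata = np.empty((512, 2), dtype=np.float32)
callback(indata, outdata, 512, None, None)
print(outdata.shape, float(np.abs(outdata).max()))
```

With real hardware, the same callback could be passed to sd.Stream(samplerate=44100, channels=2, callback=callback). Keeping the running frame index inside the closure is one simple way to keep the tone phase-continuous from block to block.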

In [ ]:
try:
    import numpy as np
except ModuleNotFoundError:
    import os
    os.system("pip3 install numpy --user")
    import numpy as np
import time
import sys
try:
    import psutil
except ModuleNotFoundError:
    import os
    os.system("pip3 install psutil --user")
    import psutil

def record_and_play(indata, outdata, frames, time, status):
    # Here, "time" is the callback's time-stamp structure, not the time module
    if status:
        print(status, file=sys.stderr)  # Report underflows/overflows
    outdata[:] = indata  # Loop the recorded block back to the output
    print(f"measured buffer latency = {1000*(time.outputBufferDacTime - time.inputBufferAdcTime):3.3f} milliseconds;",
        f"current time = {time.currentTime:8.2f} seconds;",
        f"CPU usage = {psutil.cpu_percent():4.2f}", end='\r')

def run(frames_per_second, frames_per_block):
    with sd.Stream(samplerate=frames_per_second,
                   blocksize=frames_per_block,
                   dtype=np.int16,
                   channels=2,
                   callback=record_and_play):
        print(f"ideal (minimum possible) buffer latency = {1000*(frames_per_block/frames_per_second):3.3f} milliseconds")
        try:
            while True:  # Keep the stream open until the kernel is interrupted
                time.sleep(1)
        except KeyboardInterrupt:
            print("\nstream stopped")

# Typical configuration
run(frames_per_second = 44100, frames_per_block = 1024)

In [ ]:
# If the sampling frequency is halved, the buffer latency is (ideally) doubled
run(frames_per_second = 22050, frames_per_block = 1024)

In [ ]:
# If the block size is halved, the buffer latency is (ideally) halved too
run(frames_per_second = 44100, frames_per_block = 512)
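
The ideal buffer latency printed by run() is simply the time needed to fill one block, that is, frames_per_block / frames_per_second. As a quick sanity check that needs no audio hardware, the sketch below tabulates that value for the three configurations used above. The helper name ideal_buffer_latency_ms is introduced here for illustration only.

```python
def ideal_buffer_latency_ms(frames_per_block, frames_per_second):
    """Time needed to fill (or play) one block, in milliseconds."""
    return 1000 * frames_per_block / frames_per_second

# (frames_per_second, frames_per_block) pairs used in the cells above
configurations = [(44100, 1024), (22050, 1024), (44100, 512)]
for frames_per_second, frames_per_block in configurations:
    latency = ideal_buffer_latency_ms(frames_per_block, frames_per_second)
    print(f"{frames_per_second} Hz, {frames_per_block} frames/block: {latency:.2f} ms")
```

This gives about 23.22 ms, 46.44 ms and 11.61 ms, respectively. The latency measured inside the callback is expected to be larger, since the host and the device add their own buffering.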