Exercise 1: Python and sounds

This exercise aims to get you familiar with some basic audio operations using Python. It has four parts: 1) Reading an audio file, 2) Basic operations with audio, 3) Python array indexing, and 4) Downsampling audio (changing the sampling rate).

Before doing the exercise, please go through the general information for all the exercises given in README.txt of the exercises directory.

Relevant concepts

Python: Python is a powerful and easy-to-learn programming language that is used in a wide variety of application areas. More information at https://www.python.org/. We will use Python in all the exercises, and in this first one you will start learning it by performing some basic operations with sound files.

Jupyter notebooks: Jupyter notebooks are interactive documents containing live code, equations, visualizations and narrative text. More information at https://jupyter.org/. Jupyter supports Python, and all the exercises here use it.

Wav file: The wav file format is a lossless format for storing sounds on a hard drive. Each audio sample is stored as a 16 bit integer number (sometimes also as a 24 bit integer or a 32 bit float). In this course we will work with only one type of audio file. All the sound files we use in the assignments should be wav files that are mono (one channel), in which the samples are stored in 16 bits, and that use (most of the time) a sampling rate of 44100 Hz. Once read into Python, the samples will be converted to floating point values with a range from -1 to 1, resulting in a one-dimensional array of floating point values.
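The conversion from 16 bit integers to floating point values can be sketched as follows. This is only an illustration on made-up sample values: the helper name int16_to_float is hypothetical, and the normalization factor 2**15 - 1 = 32767 is an assumption, so the exact factor used by a given wav reader may differ.

```python
import numpy as np

INT16_MAX = 2**15 - 1  # 32767; assumed normalization factor


def int16_to_float(samples):
    """Scale 16 bit integer samples to float32 values in [-1, 1] (hypothetical helper)."""
    return samples.astype(np.float32) / INT16_MAX


# Made-up 16 bit samples, not taken from any sound file
raw = np.array([-32767, 0, 16384, 32767], dtype=np.int16)
x = int16_to_float(raw)
print(x.dtype, x.min(), x.max())
```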

Part 1 - Reading in an audio file

The read_audio_samples() function below should read an audio file and return a specified number of consecutive samples of the file, starting at a given sample.

The input to the function is the file name (including the path), plus the index of the first sample and the number of consecutive samples to take. The output should be a numpy array.

If you use the wavread() function from the utilFunctions module available in the software/models directory, the input samples will be automatically converted to a numpy array of floating point numbers with a range from -1 to 1, which is what we want.

Remember that in Python, the index of the first sample of an array is 0 and not 1.
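As a small illustration of zero-based indexing, the sketch below takes a run of consecutive elements from a made-up array. The variable names start and count are hypothetical and chosen only for this example.

```python
import numpy as np

x = np.arange(100, 120)          # 20 made-up values: 100, 101, ..., 119

first = x[0]                     # index 0 is the first element
start, count = 5, 3              # hypothetical start index and length
chunk = x[start:start + count]   # elements at indices 5, 6 and 7

print(first, chunk)
```

Note that the slice end index is exclusive, so start:start + count yields exactly count elements.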


In [1]:
import sys
import os
import numpy as np

sys.path.append('../software/models/')
from utilFunctions import wavread, wavwrite

In [ ]:
# E1 - 1.1: Complete the read_audio_samples() function

def read_audio_samples(input_file, first_sample=50001, num_samples=10):
    """Read num_samples samples from an audio file starting at sample first_sample
    
    Args:
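        input_file (str): path of a wav file
        first_sample (int): index of the first sample to read
        num_samples (int): number of consecutive samples to read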
    
    Returns:
        np.array: numpy array containing the selected samples
    
    """
    
    ### Your code here

You can use as input the sound files from the sounds directory, thus using a relative path to it. If you run the read_audio_samples() function using the piano.wav sound file as input, with the default arguments, it should return the following samples:

array([-0.06213569, -0.04541154, -0.02734458, -0.0093997, 0.00769066, 0.02319407, 0.03503525, 0.04309214, 0.04626606,  0.0441908], dtype=float32)

In [ ]:
# E1 - 1.2: Call read_audio_samples() with the proposed input sound and default arguments

### Your code here

Part 2 - Basic operations with audio

The function min_max_audio() should read an audio file and return the minimum and maximum values of the audio samples in that file. The input to the function is the wav file name (including the path) and the output should be two floating point values returned as a tuple.
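For reference, numpy provides np.min() and np.max() for finding the extreme values of an array. The sketch below applies them to a few made-up values rather than to a sound file, and packs the results into a tuple.

```python
import numpy as np

# Made-up sample values, not taken from any sound file
x = np.array([0.25, -0.5, 0.75, -0.125], dtype=np.float32)

# Convert the numpy scalars to plain Python floats and pack them in a tuple
extremes = (float(np.min(x)), float(np.max(x)))
print(extremes)
```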


In [ ]:
# E1 - 2.1: Complete function min_max_audio()

def min_max_audio(input_file):
    """Compute the minimum and maximum values of the audio samples in the input file
    
    Args:
        input_file(str): file name of the wav file (including path)
    
    Returns:
        tuple: minimum and maximum value of the audio samples, like: (min_val, max_val)
    """
    ### Your code here

If you run min_max_audio() using oboe-A4.wav as input, it should return the following output:

(-0.83486432, 0.56501967)

In [ ]:
# E1 - 2.2: Plot input sound with x-axis in seconds, and call min_max_audio() with the proposed sound file

### Your code here

Part 3 - Python array indexing

Given a numpy array x, the function hop_samples() should return every Mth element of x, starting from the first element. The input arguments to this function are a numpy array x and a positive integer M such that M < number of elements in x. The output of this function should be a numpy array.
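Numpy slices accept an optional step, written as start:stop:step, which is one way to pick regularly spaced elements. The sketch below shows the syntax on a made-up array with arbitrary start and step values; the variable names are hypothetical.

```python
import numpy as np

x = np.arange(12)                  # made-up array: 0, 1, ..., 11

# start=1, no stop, step=3: indices 1, 4, 7, 10
every_third_from_one = x[1::3]

# start=0, stop=12 (exclusive), step=4: indices 0, 4, 8
every_fourth = x[0:12:4]

print(every_third_from_one, every_fourth)
```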


In [ ]:
# E1 - 3.1: Complete the function hop_samples()

def hop_samples(x, M):
    """Return every Mth element of the input array
    
    Args:
        x(np.array): input numpy array
        M(int): hop size (positive integer)
    
    Returns:
        np.array: array containing every Mth element in x, starting from the first element in x
    """
    ### Your code here

If you run the function hop_samples() with x = np.arange(10) and M = 2 as inputs, it should return:

array([0, 2, 4, 6, 8])

In [ ]:
# E1 - 3.2: Plot input array, call hop_samples() with proposed input, and plot output array

### Your code here

Part 4 - Downsampling

One of the required processes to represent an analog signal inside a computer is sampling. The sampling rate is the number of samples obtained in one second when sampling a continuous analog signal to a discrete digital signal. As mentioned, we will be working with wav audio files that have a sampling rate of 44100 Hz, which is a typical value. Here you will learn a simple way of changing the original sampling rate of a sound to a lower sampling rate, and you will learn the implications this has for the audio quality.
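Since the sampling rate is the number of samples per second, dividing a sample index by the sampling rate gives the time of that sample in seconds. The sketch below uses a placeholder signal of silence (not a real sound) to compute a duration and a time axis, which can be useful when plotting with the x-axis in seconds.

```python
import numpy as np

fs = 44100                      # sampling rate in Hz
x = np.zeros(22050)             # placeholder signal: half a second of silence

duration = x.size / fs          # length of the signal in seconds
t = np.arange(x.size) / fs      # time of each sample in seconds

print(duration, t[0], t[-1])
```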

The function down_sample_audio() takes as input an audio file with a given sampling rate and a downsampling factor M. It should apply downsampling by a factor of M and return a down-sampled version of the input samples. The sampling rates and downsampling factors to use have to be integer values.
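Downsampling by a factor of M divides the sampling rate by M, and a sampled signal can only represent frequencies up to half its sampling rate. The arithmetic for one combination of values can be sketched as follows; the variable names fs_new and nyquist_new are chosen for this example only.

```python
fs = 44100                   # original sampling rate in Hz
M = 14                       # downsampling factor

# Check that the new sampling rate is an integer value
remainder = fs % M

fs_new = fs // M             # sampling rate after downsampling, in Hz
nyquist_new = fs_new / 2     # highest frequency representable at fs_new, in Hz

print(remainder, fs_new, nyquist_new)
```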

If you need to create a wav audio file from an array of output samples, you can use the wavwrite() function from the utilFunctions.py module. However, in this exercise there is no need to write an audio file. We will be able to hear the sound without creating a file, just by playing the array of samples.


In [ ]:
# E1 - 4.1: Complete function down_sample_audio()

def down_sample_audio(input_file, M):
    """Downsample by a factor of M the input signal
    
    Args:
        input_file(str): file name of the wav file (including path)
        M(int): downsampling factor (positive integer)
        
    Returns:
        tuple: input samples (np.array), original sampling rate (int), down-sampled signal (np.array), 
               and new sampling rate (int), like: (x, fs, y, fs_new) 
    """
    ### Your code here

Test cases for down_sample_audio():

Test Case 1: Use the file from the sounds directory vibraphone-C6.wav and a downsampling factor of M=14.

Test Case 2: Use the file from the sounds directory sawtooth-440.wav and a downsampling factor of M=14.

To play the output samples, import the IPython.display package as ipd and use ipd.display(ipd.Audio(data=y, rate=fs_new)). To visualize the output samples, import the matplotlib.pyplot package as plt and use plt.plot(y).

You can find some related information at https://en.wikipedia.org/wiki/Downsampling_(signal_processing)


In [ ]:
import IPython.display as ipd
import matplotlib.pyplot as plt

In [ ]:
# E1 - 4.2: Plot and play input sounds, call the function down_sample_audio() for the two test cases, 
# and plot and play the output sounds. 

### Your code here

In [ ]:
# E1 - 4.3: Explain the results of part 4. What happened to the output signals compared to the input ones? 
# Is there a difference between the 2 cases? Why? How could we avoid damaging the signal when downsampling it?

"""

"""