Batch processing of files

Using the Python standard library (specifically the glob and os modules), we can quickly write batch operations, such as processing all files with a certain extension in a directory. For example, we can list all .wav files in the audio directory, use Praat to pre-emphasize these Sound objects, and then write each pre-emphasized sound to a WAV file and an AIFF file.


In [ ]:
# Find all .wav files in a directory, pre-emphasize them, and save each as new .wav and .aiff files
import parselmouth

import glob
import os.path

for wave_file in glob.glob("audio/*.wav"):
    print("Processing {}...".format(wave_file))
    s = parselmouth.Sound(wave_file)
    s.pre_emphasize()  # modifies the Sound object in place
    s.save(os.path.splitext(wave_file)[0] + "_pre.wav", 'WAV') # or parselmouth.SoundFileFormat.WAV instead of 'WAV'
    s.save(os.path.splitext(wave_file)[0] + "_pre.aiff", 'AIFF')

After running this, the audio directory contains the original .wav files together with pre-emphasized copies written as .wav and .aiff files. The reading, pre-emphasis, and writing are all done by Praat, while looping over all .wav files is done by standard Python code.
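
Note that the pattern audio/*.wav also matches the generated *_pre.wav files, so running the loop a second time before cleaning up would pre-emphasize the outputs again. The following is a minimal sketch of one way to guard against that. The helper names is_generated and output_paths and the file names in the list are hypothetical and only serve as illustration; they are not part of Parselmouth.

```python
import os.path

GENERATED_SUFFIX = "_pre"  # hypothetical constant; matches the "_pre" used in the loop above


def is_generated(path, suffix=GENERATED_SUFFIX):
    """Return True if the file stem already ends in the generated suffix."""
    stem = os.path.splitext(path)[0]
    return stem.endswith(suffix)


def output_paths(wave_file, suffix=GENERATED_SUFFIX):
    """Build the .wav and .aiff output paths for one input file."""
    stem = os.path.splitext(wave_file)[0]
    return stem + suffix + ".wav", stem + suffix + ".aiff"


# Hypothetical file names standing in for the result of glob.glob("audio/*.wav")
candidates = ["audio/a.wav", "audio/a_pre.wav", "audio/b.wav"]

# Keep only files that were not produced by an earlier run
inputs = [path for path in candidates if not is_generated(path)]
print(inputs)
```

In the batch loop, the same filter could be applied to the list returned by glob.glob before any Sound is loaded.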


In [ ]:
# List the current contents of the audio/ folder
!ls audio/

In [ ]:
# Remove the generated audio files again, to clean up the output from this example
!rm audio/*_pre.wav
!rm audio/*_pre.aiff
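
The !ls and !rm lines are IPython shell escapes and assume that these Unix commands are available. Where that is not the case, the same clean-up can be sketched with the standard library alone. The helper remove_generated is a hypothetical name, and the demonstration runs on a temporary directory with empty placeholder files rather than on real audio.

```python
import glob
import os
import tempfile


def remove_generated(directory, patterns=("*_pre.wav", "*_pre.aiff")):
    """Delete files in `directory` matching the patterns; return the removed names."""
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(directory, pattern)):
            os.remove(path)
            removed.append(os.path.basename(path))
    return sorted(removed)


# Demonstrate on a temporary directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.wav", "a_pre.wav", "a_pre.aiff"):
        open(os.path.join(tmp, name), "w").close()
    removed = remove_generated(tmp)
    remaining = sorted(os.listdir(tmp))

print(removed)
print(remaining)
```

For the example above, the equivalent call would be remove_generated("audio").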

Similarly, we can use the pandas library to read a CSV file with data collected in an experiment and loop over its rows, for example to extract the mean harmonics-to-noise ratio. The results CSV has the following structure:

condition ... pp_id
0 ... 1877
1 ... 801
1 ... 2456
0 ... 3126

The following code reads such a table, loops over its rows, uses Praat through Parselmouth to run the analysis for each row, and then writes an augmented CSV file to disk. To illustrate, we use an example results file and a set of sound fragments: results.csv, 1_b.wav, 2_b.wav, 3_b.wav, 4_b.wav, 5_b.wav, 1_y.wav, 2_y.wav, 3_y.wav, 4_y.wav, 5_y.wav

In our example, the original CSV file, results.csv, contains the following table:


In [ ]:
import pandas as pd

print(pd.read_csv("other/results.csv"))

In [ ]:
def analyse_sound(row):
    condition, pp_id = row['condition'], row['pp_id']
    filepath = "audio/{}_{}.wav".format(condition, pp_id)
    sound = parselmouth.Sound(filepath)
    harmonicity = sound.to_harmonicity()
    # Exclude frames with the placeholder value -200 before averaging
    return harmonicity.values[harmonicity.values != -200].mean()

# Read in the experimental results file
dataframe = pd.read_csv("other/results.csv")

# Apply parselmouth wrapper function row-wise
dataframe['harmonics_to_noise'] = dataframe.apply(analyse_sound, axis='columns')

# Write out the updated dataframe
dataframe.to_csv("processed_results.csv", index=False)
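
The filter in analyse_sound drops every value equal to -200 before taking the mean; as far as we understand, Praat uses this value as a placeholder for frames in which no harmonicity could be determined (silent or unvoiced frames). The following minimal sketch isolates that step with NumPy on synthetic numbers, and additionally returns NaN when no frame remains, a case in which calling .mean() on the empty selection would emit a warning. The name mean_defined and the example values are hypothetical.

```python
import numpy as np

UNDEFINED_HNR = -200.0  # placeholder value filtered out in analyse_sound


def mean_defined(values, sentinel=UNDEFINED_HNR):
    """Mean of all entries that differ from the sentinel, or NaN if none remain."""
    values = np.asarray(values, dtype=float)
    defined = values[values != sentinel]
    return float(defined.mean()) if defined.size else float("nan")


# Synthetic values, not output from a real recording
example = np.array([[-200.0, 12.0, 8.0, -200.0, 10.0]])
print(mean_defined(example))                  # 10.0
print(mean_defined(np.full((1, 3), -200.0)))  # nan
```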

We can now have a look at the results by reading in the processed_results.csv file again:


In [ ]:
print(pd.read_csv("processed_results.csv"))

In [ ]:
# Clean up, remove the CSV file generated by this example
!rm processed_results.csv