Interaction with the World Homework (#3)

Python Computing for Data Science (c) J Bloom, UC Berkeley, 2016

1) Monty: The Python Siri

Let's make a Siri-like program with the following properties:

  • record your voice command
  • use a webservice to parse that sound file into text
  • based on the text, take one of three types of action:
    • send an email to yourself
    • do some math
    • tell a joke

So, for example, if you say "Monty: email me with subject hello and body goodbye", it will email you with the appropriate subject and body. If you say "Monty: tell me a joke", it will go to the web, find a joke, and print it for you. If you say "Monty: calculate two times three", it should respond by printing the number 6.
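
The mapping from recognized text to an action can be sketched as below. This is a minimal illustration, not the contents of the monty module: the names parse_command, to_number, NUMBERS and OPERATORS are hypothetical, and the vocabulary only covers small number words and four operators.

```python
import operator
import re

# Hypothetical vocabulary; a real assistant would need a much larger one.
NUMBERS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
OPERATORS = {"plus": operator.add, "minus": operator.sub,
             "times": operator.mul, "over": operator.truediv}


def to_number(word):
    """Convert a spoken number word or a digit string to a number."""
    if word in NUMBERS:
        return NUMBERS[word]
    return float(word)


def parse_command(text):
    """Map recognized text to an (action, payload) pair."""
    text = text.lower().strip()
    # Drop the wake word, e.g. "monty: " or "monty, "
    text = re.sub(r"^monty\W*", "", text)

    match = re.match(r"email me with subject (.+?) and body (.+)$", text)
    if match:
        return "email", {"subject": match.group(1), "body": match.group(2)}

    match = re.match(r"calculate (\S+) (\w+) (\S+)$", text)
    if match and match.group(2) in OPERATORS:
        left, op, right = match.groups()
        return "math", OPERATORS[op](to_number(left), to_number(right))

    if "joke" in text:
        return "joke", None

    return "unknown", text
```

Calling parse_command("Monty: calculate two times three") yields the action "math" together with the computed value, which the caller can then print. The email and joke branches only return what to do; sending mail or fetching a joke is left to the caller.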

Hint: you can use speech-to-text apps like Houndify to return the text (but not to perform the actions). You'll need to sign up for free API access and then follow the documentation for using the service within Python.


In [ ]:
# you need to install the following package to continue:
#     pip3 install SpeechRecognition

# load monty class
from monty import Monty

To demo, first load the Gmail login info. To test on other computers, please define the following manually:

  • monty_gmail
  • monty_password
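
For orientation, here is one way such credentials might be used to send mail with the standard library. This is a sketch under assumptions, not the code inside the Monty class: build_message and send_message are hypothetical helper names, and the sketch assumes Gmail's SMTP endpoint smtp.gmail.com on the SSL port 465.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, username, password, host="smtp.gmail.com", port=465):
    """Log in over SSL and send the message (hypothetical helper)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)
```

With monty_gmail, monty_password and a recipient address defined, a call such as send_message(build_message(monty_gmail, recipient, "hello", "goodbye"), monty_gmail, monty_password) would attempt the delivery. Building the message is kept separate from sending so it can be inspected without a network connection.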

In [ ]:
def get_gmail_credential():
    """
    Get Monty's gmail login credentials.
    -----------------------------------------
    | Only works on Yuguang Tong's computer |
    -----------------------------------------
    
    return
    ------
        (username, password)
    """
    import netrc

    host = 'test_gmail'
    secrets = netrc.netrc()
    username, _, password = secrets.authenticators(host)
    return username, password

# this won't work on your computer, change it to your "testing" email address
monty_gmail, monty_password = get_gmail_credential()

# which email address should monty send to?
user_email = 'tongyuguang09@gmail.com'

In [ ]:
# make an instance of monty!
monty = Monty(monty_gmail, monty_password, user_email)

In [ ]:
# run this as many times as you like
# might be a bit slow 
monty.ask()

2) Write a program that identifies musical notes from sound (AIFF) files.

  • Run it on the supplied sound files (12) and report your program’s results.
  • Use the labeled sounds (4) to make sure it works correctly. The provided sound files contain 1-3 simultaneous notes from different organs.
  • Save copies of any example plots to illustrate how your program works.

    https://piazza.com/berkeley/fall2016/ay250/resources -> hw3_sound_files.zip


Hints: You’ll want to decompose the sound into a frequency power spectrum. Use a Fast Fourier Transform. Be careful about “unpacking” the string hexcode into Python data structures. The sound files use 32-bit data. Play around with what happens when you convert the string data to other integer sizes, or to signed vs. unsigned integers. Also, beware of harmonics.
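
The effect of integer size and signedness can be explored on synthetic data before touching the real files. The sketch below is an illustration under assumptions: it uses a made-up 440 Hz tone at an assumed 44100 Hz sampling rate, encodes it as big-endian signed 32-bit integers (AIFF stores samples big-endian), and then decodes the same bytes three ways. The helper peak_frequency is a hypothetical name.

```python
import numpy as np

fs = 44100                       # assumed sampling rate in Hz
t = np.arange(fs) / fs           # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t)

# Encode as big-endian signed 32-bit integers.
raw = (tone * (2**31 - 1)).astype(">i4").tobytes()

# Matching dtype: recovers the waveform.
signed = np.frombuffer(raw, dtype=">i4")
# Unsigned reading: negative samples wrap to large positive values.
unsigned = np.frombuffer(raw, dtype=">u4")
# Wrong size: each 32-bit sample is split into two 16-bit values.
halves = np.frombuffer(raw, dtype=">i2")


def peak_frequency(samples, fs):
    """Return the frequency (Hz) of the largest bin in the amplitude spectrum."""
    spec = np.abs(np.fft.rfft(samples - np.mean(samples)))
    freq = np.fft.rfftfreq(len(samples), d=1.0 / fs)
    return freq[np.argmax(spec)]


print(peak_frequency(signed, fs))
```

Plotting signed, unsigned and halves side by side shows why a wrong dtype distorts the spectrum: only the matching dtype reproduces a clean sine wave.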


In [1]:
import aifc
import numpy as np

In [2]:
def aif_psd(file):
    """
    1, open an AIFF file (known format, 2 channels, 16bit sample size) 
    2, extract amplitude
    3, perform FFT to obtain power spectral density (PSD),
    4, smooth PSD by a low pass filter
    
    
    return 
    ------
    freq: frequency in Hz
    smoothed_spec: sqrt(PSD)
    """
    # open AIFF file
    aif = aifc.open(file, 'rb')
    # get number of frames
    nframes = aif.getnframes()
    # unpack samples as signed 16-bit ints
    # (np.frombuffer replaces np.fromstring, which is deprecated for binary data)
    amp = np.frombuffer(aif.readframes(nframes), dtype=np.int16)

    # we only use left channel to obtain psd
    left_ts = amp[::2]
    spec = np.abs(np.fft.rfft(left_ts))

    # sampling frequency
    fs = aif.getframerate()
    freq = np.arange(len(spec)) * fs / nframes
    
    from smooth import smooth
    
    # use low pass filters to smooth PSD
    smoothed_spec = smooth(spec, window_len=24, window='hamming')
    return (freq, smoothed_spec)
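
The smooth module imported above is not included in this notebook. As a rough stand-in, and only as an assumption about what it does, smoothing a spectrum with a normalized Hamming window can be sketched as follows; smooth_spectrum is a hypothetical name.

```python
import numpy as np


def smooth_spectrum(spec, window_len=24):
    """Low-pass a spectrum by convolving it with a normalized Hamming window."""
    window = np.hamming(window_len)
    window /= window.sum()          # unit area, so total amplitude is preserved
    # mode="same" keeps the output the same length as the input
    return np.convolve(spec, window, mode="same")
```

Keeping the output the same length as the input means the smoothed spectrum still lines up with the frequency axis returned alongside it.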

Demonstrating correctness by identifying notes from labeled samples


In [7]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, output_notebook, show, output_file

To reproduce the results below, put the data directory sound_files/ in the directory containing this notebook.

The demo plots are "labeled_demo.html" and "unlabeled_demo.html"


In [8]:
labeled_names = ['A4_PopOrgan', 'C4+A4_PopOrgan', 'F3_PopOrgan', 'F4_CathedralOrgan']
labeled_files = ['sound_files/'+sample + '.aif' for sample in labeled_names]
labeled_psds = [aif_psd(file) for file in labeled_files]

In [9]:
# output_notebook()
output_file('labeled_demo.html')

In [10]:
s= {}
fmax = 1000
x_range = [0, fmax]
TOOLS="reset,crosshair,pan,wheel_zoom,box_zoom"
for i in range(4):
    s[i] = figure(width=250, height=250, title=labeled_names[i], 
                 x_axis_label='f[Hz]', y_axis_label='PSD', 
                 x_range = x_range, tools=TOOLS)
    freq = np.array(labeled_psds[i][0])
    psd = np.array(labeled_psds[i][1])
    ind = freq < fmax
    freq = freq[ind]
    psd = psd[ind]
    s[i].line(freq, psd)

In [11]:
p = gridplot([[s[0], s[1]], [s[2], s[3]]])
show(p)

Now process unlabeled files


In [12]:
unlabeled_names = [str(sample)+'.aif' for sample in np.arange(1,13)]
unlabeled_files = ['sound_files/'+ name for name in unlabeled_names]
unlabeled_psds = [aif_psd(file) for file in unlabeled_files]

In [13]:
s= {}
fmax = 1500
x_range = [0, fmax]
TOOLS="reset,crosshair,pan,wheel_zoom,box_zoom"
for i in range(12):
    s[i] = figure(width=250, height=250, title=unlabeled_names[i], 
                 x_axis_label='f[Hz]', y_axis_label='PSD', 
                 x_range = x_range, tools=TOOLS)
    freq = np.array(unlabeled_psds[i][0])
    psd = np.array(unlabeled_psds[i][1])
    ind = freq < fmax
    freq = freq[ind]
    psd = psd[ind]
    s[i].line(freq, psd)

In [14]:
# output_notebook()
output_file('unlabeled_demo.html')
grid = np.array([s[i] for i in range(12)]).reshape(4,3).tolist()
p = gridplot(grid)
show(p)

Results

  • 1.aif PSD peaks at ~ 250, 390Hz --> B3, G4
  • 2.aif PSD peaks at ~ 350, 520, 695Hz --> F4, C5
  • 3.aif PSD peaks at ~ 440Hz --> A4
  • 4.aif PSD peaks at ~ 260Hz --> C4
  • 5.aif PSD peaks at ~ 293Hz --> D4
  • 6.aif PSD peaks at ~ 525Hz --> C5
  • 7.aif PSD peaks at ~ 589Hz --> D5
  • 8.aif PSD peaks at ~ 350, 695Hz --> F4
  • 9.aif PSD peaks at ~ 195, 390, 590Hz etc --> G3
  • 10.aif PSD peaks at ~ 260, 390, 1050Hz etc --> C4, G4
  • 11.aif PSD peaks at ~ 245, 990, 1320Hz etc --> B3
  • 12.aif PSD peaks at ~ 65, 131Hz etc --> C2
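
The step from peak frequencies to note names was done by eye here. It can also be sketched in code, assuming equal temperament with A4 = 440 Hz. The helper names note_name and drop_harmonics and the 3% tolerance are hypothetical choices, not part of the solution above. Note that this simple harmonic filter would also discard a genuine second note that sits an octave above another one.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(freq_hz, a4=440.0):
    """Name the equal-tempered note nearest to freq_hz, e.g. 'A4'."""
    # MIDI note number: A4 is 69, and each semitone is a factor of 2**(1/12)
    midi = int(round(69 + 12 * math.log2(freq_hz / a4)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)


def drop_harmonics(peaks_hz, tolerance=0.03):
    """Keep peaks that are not near an integer multiple (>= 2) of a lower kept peak."""
    fundamentals = []
    for f in sorted(peaks_hz):
        is_harmonic = False
        for f0 in fundamentals:
            ratio = f / f0
            nearest = round(ratio)
            if nearest >= 2 and abs(ratio - nearest) <= tolerance * nearest:
                is_harmonic = True
                break
        if not is_harmonic:
            fundamentals.append(f)
    return fundamentals
```

For example, [note_name(f) for f in drop_harmonics([350, 520, 695])] treats 695 Hz as the second harmonic of 350 Hz and names only the two remaining peaks.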