Let's make a Siri-like program with the following properties:
For example, if you say "Monty: email me with subject hello and body goodbye", it will email you with the appropriate subject and body. If you say "Monty: tell me a joke", it will go to the web, find a joke, and print it for you. If you say "Monty: calculate two times three", it should respond by printing the number 6.
Hint: you can use speech-to-text services like Houndify to return the text (but not to perform the actions). You'll need to sign up for free API access and then follow the documentation for using the service from Python.
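Once a speech-to-text service has returned the transcribed sentence, the three example commands above could be dispatched with a few regular expressions. The following is only a minimal sketch, not the `Monty` class used in this notebook: `parse_command`, `NUMBER_WORDS`, and `OPERATORS` are hypothetical names, and the number table is deliberately tiny.

```python
import re

# Hypothetical lookup tables; a real assistant would need much fuller ones.
NUMBER_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
OPERATORS = {"plus": lambda a, b: a + b,
             "minus": lambda a, b: a - b,
             "times": lambda a, b: a * b}


def parse_command(text):
    """Map a transcribed sentence to (action, arguments); ('unknown', {}) if unrecognized."""
    # Drop an optional leading "Monty:" wake word.
    text = re.sub(r"^\s*monty\s*[:,]?\s*", "", text.strip(), flags=re.IGNORECASE)

    m = re.match(r"email me with subject (.+?) and body (.+)$", text, re.IGNORECASE)
    if m:
        return "email", {"subject": m.group(1), "body": m.group(2)}

    if re.match(r"tell me a joke\b", text, re.IGNORECASE):
        return "joke", {}

    m = re.match(r"calculate (\w+) (\w+) (\w+)$", text, re.IGNORECASE)
    if m:
        a, op, b = (word.lower() for word in m.groups())
        if a in NUMBER_WORDS and b in NUMBER_WORDS and op in OPERATORS:
            result = OPERATORS[op](NUMBER_WORDS[a], NUMBER_WORDS[b])
            return "calculate", {"result": result}

    return "unknown", {}
```

The caller would then act on the returned action name, for instance sending the email or printing `arguments["result"]`. Keeping parsing separate from the actions makes the parser easy to test without a microphone or network access.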
In [ ]:
# you need to install the following package to continue:
# pip3 install SpeechRecognition
# load monty class
from monty import Monty
To demo, first load the Gmail login info. To test on other computers, please manually define monty_gmail and monty_password.
In [ ]:
def get_gmail_credential():
    """
    Get Monty's gmail login credentials.

    -----------------------------------------
    | Only works on Yuguang Tong's computer |
    -----------------------------------------

    return
    ------
    (username, password)
    """
    import netrc
    host = 'test_gmail'
    secrets = netrc.netrc()
    username, _, password = secrets.authenticators(host)
    return username, password
# this won't work on your computer, change it to your "testing" email address
monty_gmail, monty_password = get_gmail_credential()
# which email address should monty send to?
user_email = 'tongyuguang09@gmail.com'
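How `Monty` actually sends mail is not shown in this notebook. As a rough sketch, assuming it relies only on the standard library, a message could be built with `email.message.EmailMessage` and sent through Gmail's SMTP server with `smtplib`. The helper names `build_message` and `send_message` are hypothetical, and Gmail may require an app-specific password for this kind of login.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email; nothing is sent here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, username, password):
    """Send msg via Gmail over SSL (needs network access and valid credentials)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, password)
        server.send_message(msg)
```

With the variables defined above, a call might look like `send_message(build_message(monty_gmail, user_email, "hello", "goodbye"), monty_gmail, monty_password)`.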
In [ ]:
# make an instance of monty!
monty = Monty(monty_gmail, monty_password, user_email)
In [ ]:
# run this as many times as you like
# might be a bit slow
monty.ask()
Save copies of any example plots to illustrate how your program works.
https://piazza.com/berkeley/fall2016/ay250/resources -> hw3_sound_files.zip
Hints: You’ll want to decompose the sound into a frequency power spectrum. Use a Fast Fourier Transform. Be careful about “unpacking” the string hexcode into Python data structures. The sound files use 32-bit data. Play around with what happens when you convert the string data to other integer sizes, or to signed vs. unsigned integers. Also, beware of harmonics.
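To see why the unpacking step matters, the sketch below interprets the same raw bytes in several ways with `np.frombuffer`. The bytes are synthetic stand-ins, not data from the homework files, and the 32-bit big-endian layout is an assumption taken from the hint above (AIFF stores samples big-endian; `getsampwidth()` on the opened file reports the actual width in bytes).

```python
import numpy as np

# Synthetic stand-in for raw frame data: two 32-bit big-endian signed samples.
samples = np.array([100000, -100000], dtype=">i4")
raw = samples.tobytes()

as_int32 = np.frombuffer(raw, dtype=">i4")      # matches how the bytes were written
as_uint32 = np.frombuffer(raw, dtype=">u4")     # wrong signedness: negatives wrap around
as_int16 = np.frombuffer(raw, dtype=">i2")      # wrong width: twice as many values
as_little = np.frombuffer(raw, dtype="<i4")     # wrong byte order: values are scrambled

for name, arr in [("int32", as_int32), ("uint32", as_uint32),
                  ("int16", as_int16), ("little-endian", as_little)]:
    print(name, arr.tolist())
```

Only the first interpretation recovers the original samples. The others still produce numbers without raising an error, which is why a wrong choice tends to show up as a noisy or distorted spectrum rather than as a crash.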
In [1]:
import aifc
import numpy as np
In [2]:
def aif_psd(file):
    """
    1. open an AIFF file (known format: 2 channels, 16-bit sample size)
    2. extract the amplitude of the left channel
    3. perform an FFT to obtain the amplitude spectrum, sqrt(PSD)
    4. smooth the spectrum with a low-pass filter

    return
    ------
    freq: frequency in Hz
    smoothed_spec: smoothed sqrt(PSD)
    """
    # smooth is assumed to be a local smooth.py, not an installed package
    from smooth import smooth

    # open the AIFF file and read all frames as raw bytes
    aif = aifc.open(file, 'rb')
    nframes = aif.getnframes()
    fs = aif.getframerate()  # sampling frequency in Hz
    raw = aif.readframes(nframes)
    aif.close()
    # unpack samples as signed 16-bit ints; AIFF stores samples big-endian
    # (np.fromstring is deprecated for binary data, so use np.frombuffer)
    amp = np.frombuffer(raw, dtype='>i2')
    # channels are interleaved; use only the left channel
    left_ts = amp[::2]
    spec = np.abs(np.fft.rfft(left_ts))
    # frequency of each rfft bin
    freq = np.arange(len(spec)) * fs / len(left_ts)
    # smooth the spectrum with a Hamming window
    smoothed_spec = smooth(spec, window_len=24, window='hamming')
    return (freq, smoothed_spec)
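The `smooth` module imported in `aif_psd` is not included in this notebook. For readers without it, here is a hypothetical stand-in, assuming the original is a windowed moving average. The name `smooth_same_length` and the choice of `mode="same"` (so that the output has the same length as `freq`) are assumptions, not the original implementation.

```python
import numpy as np

# Window functions this sketch accepts, keyed by the name passed in.
WINDOWS = {"hamming": np.hamming, "hanning": np.hanning,
           "bartlett": np.bartlett, "blackman": np.blackman}


def smooth_same_length(x, window_len=24, window="hamming"):
    """Convolve x with a normalized window; the output has the same length as x."""
    x = np.asarray(x, dtype=float)
    if window_len < 3 or len(x) < window_len:
        return x  # too short to smooth meaningfully
    w = WINDOWS[window](window_len)
    # Normalizing by w.sum() keeps a constant signal constant away from the edges.
    return np.convolve(x, w / w.sum(), mode="same")
```

Near the two ends of the array the window overlaps implicit zeros, so the first and last few values are biased low. For spectra where the peaks of interest sit well inside the frequency range, this edge effect is usually harmless.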
Demonstrating correctness by identifying notes from labeled samples
In [7]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, output_notebook, show, output_file
In [8]:
labeled_names = ['A4_PopOrgan', 'C4+A4_PopOrgan', 'F3_PopOrgan', 'F4_CathedralOrgan']
labeled_files = ['sound_files/'+sample + '.aif' for sample in labeled_names]
labeled_psds = [aif_psd(file) for file in labeled_files]
In [9]:
# output_notebook()
output_file('labeled_demo.html')
In [10]:
s= {}
fmax = 1000
x_range = [0, fmax]
TOOLS="reset,crosshair,pan,wheel_zoom,box_zoom"
for i in range(4):
    # width/height replace the older plot_width/plot_height keywords
    s[i] = figure(width=250, height=250, title=labeled_names[i],
                  x_axis_label='f[Hz]', y_axis_label='PSD',
                  x_range=x_range, tools=TOOLS)
    freq = np.array(labeled_psds[i][0])
    psd = np.array(labeled_psds[i][1])
    # keep only the part of the spectrum below fmax
    ind = freq < fmax
    freq = freq[ind]
    psd = psd[ind]
    s[i].line(freq, psd)
In [11]:
p = gridplot([[s[0], s[1]], [s[2], s[3]]])
show(p)
In [12]:
unlabeled_names = [str(sample)+'.aif' for sample in np.arange(1,13)]
unlabeled_files = ['sound_files/'+ name for name in unlabeled_names]
unlabeled_psds = [aif_psd(file) for file in unlabeled_files]
In [13]:
s= {}
fmax = 1500
x_range = [0, fmax]
TOOLS="reset,crosshair,pan,wheel_zoom,box_zoom"
for i in range(12):
    # width/height replace the older plot_width/plot_height keywords
    s[i] = figure(width=250, height=250, title=unlabeled_names[i],
                  x_axis_label='f[Hz]', y_axis_label='PSD',
                  x_range=x_range, tools=TOOLS)
    freq = np.array(unlabeled_psds[i][0])
    psd = np.array(unlabeled_psds[i][1])
    # keep only the part of the spectrum below fmax
    ind = freq < fmax
    freq = freq[ind]
    psd = psd[ind]
    s[i].line(freq, psd)
In [14]:
# output_notebook()
output_file('unlabeled_demo.html')
grid = np.array([s[i] for i in range(12)]).reshape(4,3).tolist()
p = gridplot(grid)
show(p)
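The plots make it possible to read off the notes by eye. The same identification could be automated along the following lines. This is a sketch under stated assumptions rather than the method used in this notebook: `freq_to_note` and `fundamental` are hypothetical helpers, the tuning is equal temperament with A4 = 440 Hz, and the `fmin` and `rel_height` defaults are arbitrary. To guard against harmonics, it picks the lowest sufficiently strong peak instead of the tallest one.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_note(f, a4=440.0):
    """Name of the nearest equal-tempered note, e.g. 440 Hz -> 'A4'."""
    midi = int(round(69 + 12 * np.log2(f / a4)))  # MIDI note 69 is A4
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)


def fundamental(freq, spec, fmin=50.0, rel_height=0.2):
    """Lowest frequency >= fmin that is a local maximum of spec and
    at least rel_height times the largest value of spec above fmin."""
    freq = np.asarray(freq, dtype=float)
    spec = np.asarray(spec, dtype=float)
    mask = freq >= fmin
    f, s = freq[mask], spec[mask]
    mid = s[1:-1]
    is_peak = (mid > s[:-2]) & (mid >= s[2:]) & (mid >= rel_height * s.max())
    return f[1:-1][is_peak][0]


# Synthetic check: a 440 Hz tone whose second harmonic is louder than the fundamental.
fs, n = 8000, 8000
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 440 * t) + 1.5 * np.sin(2 * np.pi * 880 * t)
test_spec = np.abs(np.fft.rfft(signal))
test_freq = np.arange(len(test_spec)) * fs / n
print(freq_to_note(fundamental(test_freq, test_spec)))
```

Taking the tallest peak would report A5 for this synthetic tone, while the lowest strong peak gives A4. A chord such as C4+A4 contains more than one fundamental, so this sketch would report only the lower note; separating chord tones from harmonics needs additional logic.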