DIVA is a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements, as described on this page. The code of the model is open source and is available here. You will have to download and unzip it to follow this tutorial. You will also need Matlab installed on your computer (unfortunately); be careful to download the DIVA version matching your Matlab version.
The DIVA model uses an articulatory synthesizer, i.e. a computer simulation of the human vocal tract that generates the sound wave resulting from the movements of articulators such as the jaw, the tongue and the lips. It is this articulatory synthesizer that we will use here, independently of the neural model. For more information, please refer to the documentation in the PDF provided in the DIVA zip archive.
Using the DIVA bindings in Explauto also requires pymatlab, which provides a way to call Matlab code from Python, so please install it too.
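If you want to check that pymatlab itself works before going further, a minimal sanity check could look like the following (this snippet is not part of the original tutorial and assumes pymatlab's session_factory API):
In [ ]:
# Hypothetical sanity check: start a Matlab session through pymatlab and run a trivial command
import pymatlab
session = pymatlab.session_factory()  # starts a Matlab engine in the background
session.run('x = 1 + 1;')             # execute a Matlab statement
print(session.getvalue('x'))          # should print 2 (possibly wrapped in a numpy array)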
Finally, Matlab needs to be aware of your DIVA installation path (i.e. the path of the unzipped DIVA directory). You will have to add it by editing your Matlab search path permanently, for example with Matlab's addpath and savepath commands or through the "Set Path" dialog.
Now you should be able to use DIVA in Explauto. To check this, run:
In [1]:
from explauto.environment.diva import DivaEnvironment, full_config
env = DivaEnvironment(**full_config)
If everything went well, you can continue :)
The synthesizer can be used in two ways: either as a standalone synthesizer, through the DivaSynth class, or as a regular Explauto environment, through the DivaEnvironment class.
Using DIVA as a standalone synthesizer is done by instantiating the DivaSynth class:
In [1]:
from explauto.environment.diva import DivaSynth
synth = DivaSynth()
The two methods of this object require articulatory trajectories as input, i.e. a numpy array of shape $(13, t)$, where $t$ is the number of time steps. Each of the 13 rows of the array corresponds to the trajectory of a particular articulator. For example, the first articulator globally corresponds to an open/close dimension mainly involving the jaw, as shown in the figure below, extracted from the DIVA documentation (the PDF in the zip archive). It illustrates the movements induced by the first 10 articulators (left to right, top to bottom); the last 3 control the pitch, the pressure and the voicing (see the DIVA documentation for more details). All values in the array should be in the range $[-1, 1]$.
Let's consider the following articulatory trajectory:
In [2]:
from numpy import zeros, linspace
art_traj = zeros((13, 1000))  # 13 articulators, 1000 time steps
art_traj[0, :] = linspace(1, -1, 1000)  # the jaw moves linearly from 1 (completely closed) to -1 (completely open)
art_traj[11:13, :] = 1  # maximum pressure and voicing to ensure phonation
Compute the corresponding sound:
In [3]:
sound = synth.sound_wave(art_traj)
Plot the sound wave:
In [4]:
%pylab inline
plot(sound)
Out[4]:
Play the sound:
In [6]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
In [8]:
stream.write(sound.astype(float32).tostring())
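If pyaudio is not available on your machine, an alternative is to write the sound to a wav file instead (a minimal sketch using scipy, assuming the 11025 Hz sample rate used above; the file name is arbitrary):
In [ ]:
# Alternative to pyaudio: save the synthesized sound to a wav file (11025 Hz, as above)
from scipy.io import wavfile
from numpy import float32
wavfile.write('diva_sound.wav', 11025, sound.astype(float32))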
Compute auditory (aud) and somato-sensory (som) features, as well as vocal tract shapes:
In [9]:
aud, som, vt = synth.execute(art_traj)
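As a quick check (not part of the original notebook), you can inspect what execute returns: aud and som presumably contain one feature trajectory per row (hence the .T in the plots below), and vt appears to be complex-valued, each column describing a vocal tract outline in the complex plane, which is why its real and imaginary parts are plotted against each other:
In [ ]:
# Shapes of the auditory features, somatosensory features and vocal tract outlines
print(aud.shape, som.shape, vt.shape)
print(vt.dtype)  # presumably complex: real/imaginary parts are plotted below as x/y coordinates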
In [10]:
subplot(221)
plot(aud.T)  # auditory features over time
subplot(222)
plot(som.T)  # somatosensory features over time
subplot(223)
plot(real(vt[:, 0]), imag(vt[:, 0]))  # vocal tract outline at the first time step
axis('equal')
subplot(224)
plot(real(vt[:, -1]), imag(vt[:, -1]))  # vocal tract outline at the last time step
axis('equal')
Out[10]:
In [11]:
from explauto.environment.diva import DivaEnvironment, configurations
In [12]:
configurations.keys()
Out[12]:
In [13]:
env = DivaEnvironment(**configurations['vowel_config'])
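A quick way to see the size of this configuration (a small check, not in the original notebook, using the same conf attributes as elsewhere in this notebook):
In [ ]:
# Number of motor (articulatory) and sensory (auditory) dimensions of the vowel configuration
print(env.conf.m_ndims, env.conf.s_ndims)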
In [14]:
from explauto import Agent, SensorimotorModel, InterestModel, Experiment
sm_model = SensorimotorModel.from_configuration(env.conf, 'nearest_neighbor', 'default')  # sensorimotor model
im_model = InterestModel.from_configuration(env.conf, env.conf.s_dims, 'discretized_progress')  # interest model defined on the sensory (auditory) dimensions
ag = Agent(env.conf, sm_model, im_model)
expe = Experiment(env, ag)
In [15]:
expe.run(100)
In [16]:
%pylab inline
ax = axes()
expe.log.scatter_plot(ax, (('sensori', range(2)),))
In [10]:
s = env.update(env.random_motors()[0])
In [11]:
s.shape
Out[11]:
In [7]:
ag.conf.s_ndims
Out[7]:
In [39]:
from explauto.environment import environments, available_configurations
environments['diva'] = (DivaEnvironment, configurations, _)
In [42]:
available_configurations('diva').keys()
Out[42]:
In [2]:
env = DivaEnvironment(**configurations['full_config'])
In [3]:
from explauto.models.dmp import DmpPrimitive
n_bfs = 8
n_dmps = env.conf.m_ndims
dmp = DmpPrimitive(env.conf.m_ndims, bfs=n_bfs,
                   used=[False] * n_dmps + [True] * n_dmps * n_bfs + [False] * n_dmps,
                   default=[0.] * n_dmps * (n_bfs + 1) + [0.2] * n_dmps, type='rythmic')  # , run_time=4)
# dmp.dmp.cs.timesteps * 4
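Before exploring with random weights, a small sanity check (not in the original notebook) is to roll out the DMP with all basis-function weights set to zero; the trajectory then only reflects the default start and goal values set above, with no contribution from the forcing term:
In [ ]:
# Rollout with zero weights: the forcing term vanishes, leaving only the default start/goal behaviour
m_zero = [0.] * (n_dmps * n_bfs)
xs_zero = dmp.trajectory(m_zero, n_times=1)
print(xs_zero.shape)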
In [4]:
from numpy.random import randn
m = list(70. * randn(dmp.n_dmps * dmp.n_bfs))
xs = dmp.trajectory(m, n_times=8)
In [21]:
%pylab inline
plot(xs)
Out[21]:
In [22]:
s = env.sound_wave(xs)
In [ ]:
# from explauto.environment import available_configurations
# available_configurations('diva')['full_config']
In [8]:
%pylab inline
In [15]:
plot(s)
Out[15]:
In [6]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
In [19]:
stream.write(s.astype(float32).tostring())
In [23]:
from scipy import interpolate
def interpol(signal, mult):
    """ Interpolate the signal so that its number of time steps is multiplied by mult. """
    x = linspace(0, 1, signal.shape[0])
    f = interpolate.interp1d(x, signal.T)
    xnew = linspace(0, 1, signal.shape[0] * mult)
    return f(xnew).T  # use the interpolation function returned by `interp1d`
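For example (following the commented-out call in the loop below), doubling the number of time steps of the current DMP trajectory before synthesis:
In [ ]:
# Double the temporal resolution of the DMP trajectory before feeding it to the synthesizer
xs_dense = interpol(xs, 2)
print(xs.shape, xs_dense.shape)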
In [38]:
import librosa
In [9]:
for i in range(100):
    m = list(70. * randn(dmp.n_dmps * dmp.n_bfs))
    xs = dmp.trajectory(m, n_times=1)
    # xs = interpol(xs, 2)
    # n_samples = 1000
    s = env.sound_wave(xs)
    stream.write(s.astype(float32).tostring())
    # librosa.output.write_wav('poppy_voc/sound_' + str(i) + '.wav', s.astype(float32), 11025)
In [23]:
xs.shape
Out[23]:
In [19]:
subplot(121)
plot(xs)
subplot(122)
plot(s)
Out[19]:
In [33]:
import librosa
librosa.output.write_wav('sound.wav', s.astype(float32), 11025)
In [34]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
In [22]:
stream.write(s.astype(float32).tostring())
In [11]:
import sys
sys.path.append('../../dmpbbo/python/')
sys.path.append('../../dmpbbo/build_dir/python/')
from dmpbbo import Dmp
In [12]:
n_dmps = env.conf.m_ndims
bfs = 4
dmp = Dmp(n_dmps, bfs)
dmp.tau = 4.
dmp.set_attractor_state([0.2] * n_dmps)
In [12]:
m = list(70. * randn(n_dmps * bfs))
ts, xs, xds, xdds = dmp.trajectory(400, m)
In [13]:
xs = array(xs)
#plot(xdds)
In [10]:
xs.shape
Out[10]:
In [ ]:
sounds = []
for _ in range(100):
    m = list(70. * randn(n_dmps * bfs))
    ts, xs, xds, xdds = dmp.trajectory(400, m)
    s = env.sound_wave(xs)
    sounds.append(s)
    stream.write(s.astype(float32).tostring())
In [23]:
s_44 = librosa.resample(s, 11025, 44100)
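The resampled signal can then be written out at 44100 Hz as well (a small follow-up, not in the original notebook, reusing scipy's wavfile as below):
In [ ]:
# Save the resampled sound at the standard 44100 Hz rate
from scipy.io import wavfile
from numpy import float32
wavfile.write('sound_44100.wav', 44100, s_44.astype(float32))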
In [26]:
from numpy import save
save('sounds.npy', sounds)
In [3]:
from numpy import load
sounds = load('sounds.npy')
In [5]:
from numpy import load, float32
import pyaudio
import time
sounds = load('sounds.npy')
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
for s in sounds:
    stream.write(s.astype(float32).tostring())
    time.sleep(0.4)
In [4]:
from scipy.io import wavfile
In [28]:
wavfile.write('sound_11025.wav', 11025, s.astype(float32))
In [6]:
%pylab inline
plot(sounds[0])
Out[6]:
In [10]:
sounds[0].dtype
Out[10]:
In [37]:
for i, s in enumerate(sounds):
    librosa.output.write_wav('poppy_voc/sound_' + str(i) + '.wav', s.astype(float32), 11025)
    # s /= 600.
    # wavfile.write('sound.wav', 11025, s.astype(float32))
In [26]:
s.dtype
Out[26]:
In [27]:
s.astype(float32).dtype
Out[27]: