The DIVA synthesizer environment

DIVA is a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements, as described on this page. The code of the model is open source and is available here; download and unzip it to follow this tutorial. You will also need Matlab installed on your computer (unfortunately); be careful to download the DIVA version matching your Matlab version.

The DIVA model uses an articulatory synthesizer, i.e. a computer simulation of the human vocal tract that generates the sound wave resulting from the movements of articulators such as the jaw, the tongue, and the lips. It is this articulatory synthesizer that we will use here, independently of the neural model. For more information, please refer to the documentation in the PDF provided in the DIVA zip archive.

Using the DIVA bindings in Explauto also requires pymatlab, which provides a way to call Matlab code from Python, so please install it too.

Finally, Matlab needs to be aware of your DIVA installation path (i.e. the path of the unzipped DIVA directory). You will have to add it to your Matlab search path permanently, e.g. with Matlab's addpath and savepath commands.
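
This can also be done from Python through pymatlab (a minimal sketch, assuming the MatlabSession class exposed by pymatlab, with '/path/to/DIVA' standing for your actual unzipped DIVA directory):


In [ ]:
from pymatlab.matlab import MatlabSession

session = MatlabSession()                # start a Matlab engine
session.run("addpath('/path/to/DIVA')")  # add DIVA to the Matlab search path
session.run("savepath")                  # make the change permanent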

Now you should be able to use DIVA in Explauto. To check this, run:


In [1]:
from explauto.environment.diva import DivaEnvironment, full_config
env = DivaEnvironment(**full_config)

If everything went well, you can continue :)

The synthesizer can be used in two ways, as sketched just after this list:

  • as a standalone articulatory synthesizer, through the DivaSynth class, which provides the Python bindings to execute the standard methods of the Matlab code;
  • as an Explauto environment, through the DivaEnvironment class, which provides the interface to interact with an Explauto Agent.
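
Both classes live in the explauto.environment.diva module; a minimal sketch of the two usages (same names as in the import above):


In [ ]:
from explauto.environment.diva import DivaSynth, DivaEnvironment, full_config

synth = DivaSynth()                   # standalone articulatory synthesizer
env = DivaEnvironment(**full_config)  # synthesizer wrapped as an Explauto environment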

Standalone synthesizer

Using DIVA as a standalone synthesizer is done by instantiating the DivaSynth class:


In [1]:
from explauto.environment.diva import DivaSynth
synth = DivaSynth()

The two methods of this object require articulatory trajectories as input, i.e. a numpy array of shape $(13, t)$, where $t$ is the number of time steps. Each of the 13 rows of the array corresponds to the trajectory of a particular articulator. For example, the first articulator globally corresponds to an open/close dimension mainly involving the jaw, as shown in the figure from the DIVA documentation (the PDF in the zip archive), which illustrates the movements induced by the first 10 articulators (left to right, top to bottom); the last 3 control the pitch, the pressure and the voicing (see the DIVA documentation for more details). All values in the array should be in the range $[-1, 1]$.
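
Since the synthesizer expects values in this range, it can be safer to clip trajectories before synthesis. A small helper could look like this (clip_art is a hypothetical name, not part of Explauto or DIVA):


In [ ]:
import numpy as np

def clip_art(art_traj):
    """Clip an articulatory trajectory to the [-1, 1] range expected by DIVA."""
    return np.clip(art_traj, -1.0, 1.0)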

Let's consider the following articulatory trajectory:


In [2]:
from numpy import zeros, linspace
art_traj = zeros((13, 1000))  # 13 articulators, 1000 time steps
art_traj[0, :] = linspace(1, -1, 1000)  # the jaw moves linearly from 1 (completely closed) to -1 (completely open)
art_traj[11:13, :] = 1  # maximum pressure and voicing to ensure phonation

Compute the corresponding sound:


In [3]:
sound = synth.sound_wave(art_traj)

Plot the sound wave:


In [4]:
%pylab inline
plot(sound)


Populating the interactive namespace from numpy and matplotlib
Out[4]:
[<matplotlib.lines.Line2D at 0x7ff6ad76a050>]

Play the sound:


In [6]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                channels=1,
                rate=11025,
                output=True)

In [8]:
stream.write(sound.astype(float32).tostring())
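
If pyaudio is not available on your machine, you can instead write the sound wave to a WAV file, as done later in this notebook with scipy.io.wavfile (the synthesizer's sample rate is 11025 Hz):


In [ ]:
from numpy import float32
from scipy.io import wavfile

wavfile.write('sound.wav', 11025, sound.astype(float32))  # playable with any audio player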

Compute the auditory (aud) and somatosensory (som) features, as well as the vocal tract shapes:


In [9]:
aud, som, vt = synth.execute(art_traj)

In [10]:
subplot(221)
plot(aud.T)  # auditory features over time
subplot(222)
plot(som.T)  # somatosensory features over time
subplot(223)
plot(real(vt[:, 0]), imag(vt[:, 0]))  # vocal tract shape at the first time step (contour stored as complex numbers)
axis('equal')
subplot(224)
plot(real(vt[:, -1]), imag(vt[:, -1]))  # vocal tract shape at the last time step
axis('equal')


Out[10]:
(-20.0, 140.0, -200.0, 100.0)

Explauto environment

The DivaEnvironment class wraps the synthesizer as an Explauto environment; several predefined configurations are provided:


In [11]:
from explauto.environment.diva import DivaEnvironment, configurations

In [12]:
configurations.keys()


Out[12]:
['default', 'vowel_config', 'full_config', 'low_config']
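
Each configuration defines, among other things, the motor and sensory spaces of the resulting environment. A quick way to compare them could be the following sketch (each instantiation starts its own Matlab session, so this is slow):


In [ ]:
for name in configurations:
    e = DivaEnvironment(**configurations[name])
    print name, e.conf.m_ndims, e.conf.s_ndims  # motor and sensory dimensionalities

Here we use the vowel configuration: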

In [13]:
env = DivaEnvironment(**configurations['vowel_config'])

In [14]:
from explauto import Agent, SensorimotorModel, InterestModel, Experiment

sm_model = SensorimotorModel.from_configuration(env.conf, 'nearest_neighbor', 'default')
im_model = InterestModel.from_configuration(env.conf, env.conf.s_dims, 'discretized_progress')

ag = Agent(env.conf, sm_model, im_model)

expe = Experiment(env, ag)

In [15]:
expe.run(100)


WARNING:explauto.agent.agent:Sensorimotor model not bootstrapped yet
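
This warning is harmless: it means the sensorimotor model has not received any observation yet, in which case the agent starts with random motor commands. If you prefer, the model can be bootstrapped explicitly before running the experiment (a minimal sketch using the same objects as above):


In [ ]:
for m in env.random_motors(n=10):
    s = env.update(m)      # execute a random articulatory command
    sm_model.update(m, s)  # feed the observation to the sensorimotor model

We can then plot the reached sensory points: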

In [16]:
%pylab inline
ax = axes()
expe.log.scatter_plot(ax, (('sensori', range(2)),))


Populating the interactive namespace from numpy and matplotlib

The cells below are a debugging session investigating a mismatch between the number of sensory dimensions returned by the environment and the number expected by the agent:


In [ ]:
%debug


> /home/clement/Documents/Boulot/INRIA_FLOWERS/CODE/explauto/explauto/utils/utils.py(12)bounds_min_max()
     11 def bounds_min_max(v, mins, maxs):
---> 12     res = np.minimum(v, maxs)
     13     res = np.maximum(res, mins)

ipdb> u
> /home/clement/Documents/Boulot/INRIA_FLOWERS/CODE/explauto/explauto/agent/agent.py(114)sensory_primitive()
    113         """
--> 114         return bounds_min_max(s, self.conf.s_mins, self.conf.s_maxs)
    115 

ipdb> p s.shape
(2,)

In [10]:
s = env.update(env.random_motors()[0])

In [11]:
s.shape


Out[11]:
(9,)

In [7]:
ag.conf.s_ndims


Out[7]:
4

In [39]:
environments['diva'] = (DivaEnvironment, configurations, _)

In [42]:
available_configurations('diva').keys()


Out[42]:
['default', 'vowel_config', 'full_config', 'low_config']

In [2]:
env = DivaEnvironment(**configurations['full_config'])

Articulatory trajectories can also be generated more compactly by parameterizing them with dynamic movement primitives (DMPs), here one rhythmic DMP dimension per articulator:


In [3]:
from explauto.models.dmp import DmpPrimitive
n_bfs = 8                  # number of basis functions per articulator
n_dmps = env.conf.m_ndims  # one DMP dimension per articulator (13)
dmp = DmpPrimitive(n_dmps, bfs=n_bfs,
                   used=[False] * n_dmps + [True] * n_dmps * n_bfs + [False] * n_dmps,  # only the basis-function weights are free parameters
                   default=[0.] * n_dmps * (n_bfs + 1) + [0.2] * n_dmps,  # start states and weights default to 0, goals to 0.2
                   type='rythmic')  # , run_time=4
# dmp.dmp.cs.timesteps * 4

In [4]:
from numpy.random import randn
m = list(70. * randn(dmp.n_dmps * dmp.n_bfs))  # random basis-function weights
xs = dmp.trajectory(m, n_times=8)              # unroll the weights into a full articulatory trajectory

In [21]:
%pylab inline
plot(xs)


Populating the interactive namespace from numpy and matplotlib
Out[21]:
[<matplotlib.lines.Line2D at 0x7f14822c2490>,
 <matplotlib.lines.Line2D at 0x7f14822c2710>,
 <matplotlib.lines.Line2D at 0x7f14822c2950>,
 <matplotlib.lines.Line2D at 0x7f14822c2b10>,
 <matplotlib.lines.Line2D at 0x7f14822c2cd0>,
 <matplotlib.lines.Line2D at 0x7f14822c2e90>,
 <matplotlib.lines.Line2D at 0x7f14822cb090>,
 <matplotlib.lines.Line2D at 0x7f1482d6e810>,
 <matplotlib.lines.Line2D at 0x7f14822cb410>,
 <matplotlib.lines.Line2D at 0x7f14822cb5d0>,
 <matplotlib.lines.Line2D at 0x7f14822cb790>,
 <matplotlib.lines.Line2D at 0x7f14822cb950>,
 <matplotlib.lines.Line2D at 0x7f14822cbb10>]

In [22]:
s = env.sound_wave(xs)


11025

In [ ]:
# from explauto.environment import available_configurations
# available_configurations('diva')['full_config']

In [8]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [15]:
plot(s)


Out[15]:
[<matplotlib.lines.Line2D at 0x7f1482de6510>]

In [6]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                channels=1,
                rate=11025,
                output=True)

In [19]:
stream.write(s.astype(float32).tostring())

In [23]:
from scipy import interpolate

def interpol(signal, mult):
    """ Linearly interpolate the signal to mult times its number of time steps. """
    x = linspace(0, 1, signal.shape[0])
    f = interpolate.interp1d(x, signal.T)

    xnew = linspace(0, 1, signal.shape[0] * mult)
    return f(xnew).T  # evaluate the interpolation function returned by interp1d
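
For instance, interpol(xs, 2) doubles the number of time steps of a trajectory (this usage appears commented out in the exploration loop below):


In [ ]:
xs2 = interpol(xs, 2)  # xs2 has twice as many time steps as xs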

In [38]:
import librosa

In [9]:
for i in range(100):
    m = list(70. * randn(dmp.n_dmps * dmp.n_bfs))
    xs = dmp.trajectory(m, n_times=1)
    # xs = interpol(xs, 2)
    #n_samples = 1000
    s= env.sound_wave(xs)
    stream.write(s.astype(float32).tostring())
    #librosa.output.write_wav('poppy_voc/sound_' + str(i) + '.wav', s.astype(float32), 11025)


11025
11025
11025
...
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-9-b7054ecdf516> in <module>()
      4     # xs = interpol(xs, 2)
      5     #n_samples = 1000
----> 6     s= env.sound_wave(xs)
      7     stream.write(s.astype(float32).tostring())
      8     #librosa.output.write_wav('poppy_voc/sound_' + str(i) + '.wav', s.astype(float32), 11025)

/home/clement/Documents/Boulot/INRIA_FLOWERS/CODE/explauto/explauto/environment/diva/diva.pyc in sound_wave(self, art_traj)
     63         synth_art = self.m_default.reshape(1, -1).repeat(len(art_traj), axis=0)
     64         synth_art[:, self.m_used] = art_traj
---> 65         return self.synth.sound_wave(synth_art.T)

/home/clement/Documents/Boulot/INRIA_FLOWERS/CODE/explauto/explauto/environment/diva/diva.pyc in sound_wave(self, art)
     29         self.session.run('sr = sr(1)')
     30         print self.session.getvalue('sr')
---> 31         self.session.run('wave = diva_synth(art, \'sound\')')
     32         return self.session.getvalue('wave')
     33 

/usr/local/lib/python2.7/dist-packages/pymatlab/matlab.pyc in run(self, matlab_statement)
     74         #wrap statement to be able to catch errors
     75         real_statement = wrap_script.format(matlab_statement)
---> 76         self.engine.engEvalString(self.ep,c_char_p(real_statement))
     77         self.engine.engGetVariable.restype=POINTER(mxArray)
     78         mxresult = self.engine.engGetVariable(

KeyboardInterrupt: 

In [23]:
xs.shape


Out[23]:
(628, 13)

In [30]:
xs.shape, ynew.shape


Out[30]:
((628, 13), (1256, 13))

In [19]:
subplot(121)
plot(xs)
subplot(122)
plot(s)


Out[19]:
[<matplotlib.lines.Line2D at 0x6e48710>]

In [33]:
import librosa
librosa.output.write_wav('sound.wav', s.astype(float32), 11025)

In [34]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                channels=1,
                rate=11025,
                output=True)

In [22]:
stream.write(s.astype(float32).tostring())

In [11]:
import sys
sys.path.append('../../dmpbbo/python/')
sys.path.append('../../dmpbbo/build_dir/python/')
from dmpbbo import Dmp

In [12]:
n_dmps = env.conf.m_ndims
bfs = 4
dmp = Dmp(n_dmps, bfs)
dmp.tau = 4.
dmp.set_attractor_state([0.2] * n_dmps)

In [12]:
m = list(70. * randn(n_dmps * bfs))
ts, xs, xds, xdds = dmp.trajectory(400, m)

In [13]:
xs = array(xs)
#plot(xdds)

In [ ]:
dmp.trajectory(

In [10]:
xs.shape


Out[10]:
(400, 13)

In [ ]:
sounds = []
for _ in range(100):
    m = list(70. * randn(n_dmps * bfs))
    ts, xs, xds, xdds = dmp.trajectory(400, m)
    s = env.sound_wave(xs)
    sounds.append(s)
    stream.write(s.astype(float32).tostring())

In [23]:
s_44 = librosa.resample(s, 11025, 44100)

In [26]:
from numpy import save
save('sounds.npy', sounds)

In [3]:
from numpy import load
sounds = load('sounds.npy')

In [29]:
len(test)


Out[29]:
100

In [5]:
from numpy import load, float32
import pyaudio
import time


sounds = load('sounds.npy')

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                channels=1,
                rate=11025,
                output=True)

for s in sounds:
    stream.write(s.astype(float32).tostring())
    time.sleep(0.4)

In [4]:
from scipy.io import wavfile

In [28]:
wavfile.write('sound_11025.wav', 11025, s.astype(float32))

In [6]:
%pylab inline
plot(sounds[0])


Populating the interactive namespace from numpy and matplotlib
Out[6]:
[<matplotlib.lines.Line2D at 0x401be90>]

In [7]:
wavfile.write?

In [10]:
sounds[0].dtype


Out[10]:
dtype('float64')

In [37]:
for i, s in enumerate(sounds):
    librosa.output.write_wav('poppy_voc/sound_' + str(i) + '.wav', s.astype(float32), 11025)
#s /= 600.
#wavfile.write('sound.wav', 11025, s.astype(float32))

In [26]:
s.dtype


Out[26]:
dtype('float64')

In [27]:
s.astype(float32).dtype


Out[27]:
dtype('float32')

In [ ]: