DIVA is a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements, as described on this page. The code of the model is open source and is available here. You will have to download and unzip it to follow this tutorial. You will also need Matlab installed on your computer (unfortunately); be careful to download the DIVA version matching your Matlab version.
The DIVA model uses an articulatory synthesizer, i.e. a computer simulation of the human vocal tract that generates the sound wave resulting from the movements of articulators such as the jaw, the tongue and the lips. It is this articulatory synthesizer that we will use here, independently of the neural model. For more information, please refer to the documentation in the PDF provided in the DIVA zip archive.
Using the DIVA bindings in Explauto also requires pymatlab, which provides a way to call Matlab code from Python, so please install it too.
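If you want to check that pymatlab itself works before going further, a minimal sanity check could look like the following (this snippet is not part of the original tutorial and assumes pymatlab's session_factory API):
In [ ]:
# Hypothetical sanity check: start a Matlab session through pymatlab and run a trivial command
import pymatlab
session = pymatlab.session_factory()  # starts a Matlab engine in the background
session.run('x = 1 + 1;')             # execute a Matlab statement
print(session.getvalue('x'))          # should print 2 (possibly wrapped in a numpy array)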
Finally, Matlab needs to be aware of your DIVA installation path (i.e. the path of the unzipped DIVA directory). You will have to add it by editing your Matlab search path permanently, for example with Matlab's addpath and savepath commands or through the "Set Path" dialog.
Now you should be able to use DIVA in Explauto. To check this, run:
In [1]:
from explauto.environment.diva import DivaEnvironment, full_config
env = DivaEnvironment(**full_config)
If everything went well, you can continue :)
The synthesizer can be used in two ways: either as a standalone synthesizer, through the DivaSynth class, or as a regular Explauto environment, through the DivaEnvironment class.
Using DIVA as a standalone synthesizer is done by instantiating the DivaSynth class:
In [1]:
from explauto.environment.diva import DivaSynth
synth = DivaSynth()
The two methods of this object require articulatory trajectories as input, i.e. a numpy array of shape $(13, t)$, where $t$ is the number of time steps. Each of the 13 rows of the array corresponds to the trajectory of a particular articulator. For example, the first articulator globally corresponds to an open/close dimension mainly involving the jaw, as shown in the figure below, extracted from the DIVA documentation (the PDF in the zip archive). It illustrates the movements induced by the first 10 articulators (left to right, top to bottom); the last 3 control the pitch, the pressure and the voicing (see the DIVA documentation for more details). All values in the array should be in the range $[-1, 1]$.
Let's consider the following articulatory trajectory:
In [2]:
from numpy import zeros, linspace
art_traj = zeros((13, 1000))  # 13 articulators, 1000 time steps
art_traj[0, :] = linspace(1, -1, 1000)  # the jaw moves linearly from 1 (completely closed) to -1 (completely open)
art_traj[11:13, :] = 1  # maximum pressure and voicing to ensure phonation
Compute the corresponding sound:
In [3]:
sound = synth.sound_wave(art_traj)
Plot the sound wave:
In [4]:
%pylab inline
plot(sound)
Out[4]:
Play the sound:
In [6]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
In [8]:
stream.write(sound.astype(float32).tostring())
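If pyaudio is not available on your machine, an alternative is to write the sound to a wav file instead (a minimal sketch using scipy, assuming the 11025 Hz sample rate used above; the file name is arbitrary):
In [ ]:
# Alternative to pyaudio: save the synthesized sound to a wav file (11025 Hz, as above)
from scipy.io import wavfile
from numpy import float32
wavfile.write('diva_sound.wav', 11025, sound.astype(float32))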
Compute auditory (aud) and somato-sensory (som) features, as well as vocal tract shapes:
In [9]:
aud, som, vt = synth.execute(art_traj)
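As a quick check (not part of the original notebook), you can inspect what execute returns: aud and som presumably contain one feature trajectory per row (hence the .T in the plots below), and vt appears to be complex-valued, each column describing a vocal tract outline in the complex plane, which is why its real and imaginary parts are plotted against each other:
In [ ]:
# Shapes of the auditory features, somatosensory features and vocal tract outlines
print(aud.shape, som.shape, vt.shape)
print(vt.dtype)  # presumably complex: real/imaginary parts are plotted below as x/y coordinates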
In [10]:
subplot(221)
plot(aud.T)  # auditory features over time
subplot(222)
plot(som.T)  # somatosensory features over time
subplot(223)
plot(real(vt[:, 0]), imag(vt[:, 0]))  # vocal tract outline at the first time step
axis('equal')
subplot(224)
plot(real(vt[:, -1]), imag(vt[:, -1]))  # vocal tract outline at the last time step
axis('equal')
Out[10]:
In [11]:
from explauto.environment.diva import DivaEnvironment, configurations
In [12]:
configurations.keys()
Out[12]:
In [13]:
env = DivaEnvironment(**configurations['vowel_config'])
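A quick way to see the size of this configuration (a small check, not in the original notebook, using the same conf attributes as elsewhere in this notebook):
In [ ]:
# Number of motor (articulatory) and sensory (auditory) dimensions of the vowel configuration
print(env.conf.m_ndims, env.conf.s_ndims)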
In [14]:
from explauto import Agent, SensorimotorModel, InterestModel, Experiment
sm_model = SensorimotorModel.from_configuration(env.conf, 'nearest_neighbor', 'default')  # sensorimotor model
im_model = InterestModel.from_configuration(env.conf, env.conf.s_dims, 'discretized_progress')  # interest model defined on the sensory (auditory) dimensions
ag = Agent(env.conf, sm_model, im_model)
expe = Experiment(env, ag)
In [15]:
expe.run(100)
In [16]:
%pylab inline
ax = axes()
expe.log.scatter_plot(ax, (('sensori', range(2)),))
In [10]:
s = env.update(env.random_motors()[0])
In [11]:
s.shape
Out[11]:
In [7]:
ag.conf.s_ndims
Out[7]:
In [39]:
from explauto.environment import environments, available_configurations
environments['diva'] = (DivaEnvironment, configurations, _)
In [42]:
available_configurations('diva').keys()
Out[42]:
In [2]:
env = DivaEnvironment(**configurations['full_config'])
In [3]:
from explauto.models.dmp import DmpPrimitive
n_bfs = 8
n_dmps = env.conf.m_ndims
dmp = DmpPrimitive(env.conf.m_ndims, bfs=n_bfs,
                   used=[False] * n_dmps + [True] * n_dmps * n_bfs + [False] * n_dmps,
                   default=[0.] * n_dmps * (n_bfs + 1) + [0.2] * n_dmps, type='rythmic')  # , run_time=4)
# dmp.dmp.cs.timesteps * 4
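Before exploring with random weights, a small sanity check (not in the original notebook) is to roll out the DMP with all basis-function weights set to zero; the trajectory then only reflects the default start and goal values set above, with no contribution from the forcing term:
In [ ]:
# Rollout with zero weights: the forcing term vanishes, leaving only the default start/goal behaviour
m_zero = [0.] * (n_dmps * n_bfs)
xs_zero = dmp.trajectory(m_zero, n_times=1)
print(xs_zero.shape)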
In [4]:
from numpy.random import randn
m = list(70. * randn(dmp.n_dmps * dmp.n_bfs))
xs = dmp.trajectory(m, n_times=8)
In [21]:
%pylab inline
plot(xs)
Out[21]:
In [22]:
s = env.sound_wave(xs)
In [ ]:
# from explauto.environment import available_configurations
# available_configurations('diva')['full_config']
In [8]:
%pylab inline
In [15]:
plot(s)
Out[15]:
In [6]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
In [19]:
stream.write(s.astype(float32).tostring())
In [23]:
from scipy import interpolate
def interpol(signal, mult):
    """ Interpolate the signal so that its number of time steps is multiplied by mult. """
    x = linspace(0, 1, signal.shape[0])
    f = interpolate.interp1d(x, signal.T)
    xnew = linspace(0, 1, signal.shape[0] * mult)
    return f(xnew).T  # use the interpolation function returned by `interp1d`
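For example (following the commented-out call in the loop below), doubling the number of time steps of the current DMP trajectory before synthesis:
In [ ]:
# Double the temporal resolution of the DMP trajectory before feeding it to the synthesizer
xs_dense = interpol(xs, 2)
print(xs.shape, xs_dense.shape)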
In [38]:
import librosa
In [9]:
for i in range(100):
    m = list(70. * randn(dmp.n_dmps * dmp.n_bfs))
    xs = dmp.trajectory(m, n_times=1)
    # xs = interpol(xs, 2)
    # n_samples = 1000
    s = env.sound_wave(xs)
    stream.write(s.astype(float32).tostring())
    # librosa.output.write_wav('poppy_voc/sound_' + str(i) + '.wav', s.astype(float32), 11025)
In [23]:
xs.shape
Out[23]:
In [19]:
subplot(121)
plot(xs)
subplot(122)
plot(s)
Out[19]:
In [33]:
import librosa
librosa.output.write_wav('sound.wav', s.astype(float32), 11025)
In [34]:
import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
In [22]:
stream.write(s.astype(float32).tostring())
In [11]:
import sys
sys.path.append('../../dmpbbo/python/')
sys.path.append('../../dmpbbo/build_dir/python/')
from dmpbbo import Dmp
In [12]:
n_dmps = env.conf.m_ndims
bfs = 4
dmp = Dmp(n_dmps, bfs)
dmp.tau = 4.
dmp.set_attractor_state([0.2] * n_dmps)
In [12]:
m = list(70. * randn(n_dmps * bfs))
ts, xs, xds, xdds = dmp.trajectory(400, m)
In [13]:
xs = array(xs)
#plot(xdds)
In [10]:
xs.shape
Out[10]:
In [ ]:
sounds = []
for _ in range(100):
    m = list(70. * randn(n_dmps * bfs))
    ts, xs, xds, xdds = dmp.trajectory(400, m)
    s = env.sound_wave(xs)
    sounds.append(s)
    stream.write(s.astype(float32).tostring())
In [23]:
s_44 = librosa.resample(s, 11025, 44100)
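The resampled signal can then be written out at 44100 Hz as well (a small follow-up, not in the original notebook, reusing scipy's wavfile as below):
In [ ]:
# Save the resampled sound at the standard 44100 Hz rate
from scipy.io import wavfile
from numpy import float32
wavfile.write('sound_44100.wav', 44100, s_44.astype(float32))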
In [26]:
from numpy import save
save('sounds.npy', sounds)
In [3]:
from numpy import load
sounds = load('sounds.npy')
In [5]:
from numpy import load, float32
import pyaudio
import time
sounds = load('sounds.npy')
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paFloat32,
                 channels=1,
                 rate=11025,
                 output=True)
for s in sounds:
    stream.write(s.astype(float32).tostring())
    time.sleep(0.4)
In [4]:
from scipy.io import wavfile
In [28]:
wavfile.write('sound_11025.wav', 11025, s.astype(float32))
In [6]:
%pylab inline
plot(sounds[0])
Out[6]:
In [10]:
sounds[0].dtype
Out[10]:
In [37]:
for i, s in enumerate(sounds):
    librosa.output.write_wav('poppy_voc/sound_' + str(i) + '.wav', s.astype(float32), 11025)
    # s /= 600.
    # wavfile.write('sound.wav', 11025, s.astype(float32))
In [26]:
s.dtype
Out[26]:
In [27]:
s.astype(float32).dtype
Out[27]: