In principle, this should work on a computer without a GPU. It will help if you have a lot of RAM.
Download http://grzegorz.chrupala.me/data/coco.zip and unzip it in the reimaginet/data/coco directory.
You should have the following files:
dataset.json — MSCOCO sentences; vgg_feats.mat — MSCOCO image vectors; dataset.ipa.jsonl.gz — IPA transcriptions of MSCOCO sentences. Then download http://grzegorz.chrupala.me/data/model-ipa.zip and put it in the examples directory (the same directory as this notebook).
In [1]:
    
import imaginet.task
    
    
    
In [2]:
    
# Load the pretrained phoneme-level imaginet model from the zip bundle
# (model-ipa.zip, downloaded as described above).
model = imaginet.task.load(path="model-ipa.zip")
    
    
In [3]:
    
# Extract the learned input-symbol embedding matrix from the model:
# one row per IPA symbol (see the symbol table below).
emb = imaginet.task.embeddings(model)
print(emb.shape)
    
    
The table of IPA symbols corresponding to the 49 dimensions
In [5]:
    
# Map each embedding row index to its IPA symbol and display the inventory.
symb = imaginet.task.symbols(model)
# Parenthesized print: identical output under Python 2 (parens just group
# the single argument) and valid under Python 3, unlike the original
# Python 2-only `print` statement.
print(" ".join(symb.values()))
    
    
Let's display the embeddings projected to 2D via PCA
In [7]:
    
%pylab inline
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
xy = pca.fit_transform(emb)
    
    
In [8]:
    
# Scatter the symbol embeddings in PCA space and label each point with its
# IPA glyph (DejaVu Sans is selected because it covers the IPA range).
pylab.rc('font', family='DejaVu Sans')
pylab.figure(figsize=(8,8))
pylab.scatter(xy[:,0], xy[:,1], alpha=0.1)
for idx, glyph in symb.items():
    pylab.text(xy[idx, 0], xy[idx, 1], glyph)
    
    
    
Seems mostly random...
In [9]:
    
from imaginet.data_provider import getDataProvider
# Adjust the root to point to the directory above data
# (expects the unzipped MSCOCO files under <root>/data/coco).
prov = getDataProvider('coco', root="..")
    
In [10]:
    
# Materialize all validation-split sentences into a list so they can be
# indexed repeatedly below (each item carries at least 'raw' and 'imgid').
sents = list(prov.iterSentences(split='val'))
    
In [11]:
    
from imaginet.simple_data import phonemes
# Transcribe every validation sentence into its IPA phoneme sequence.
sents_ipa = list(map(phonemes, sents))
    
In [12]:
    
# Compute the model's sentence representation for every IPA transcription.
reps = imaginet.task.representation(model, sents_ipa)
    
In [13]:
    
from scipy.spatial.distance import cdist
# Full pairwise cosine-distance matrix between sentence representations:
# distance[i, j] is the cosine distance between sentences i and j.
distance = cdist(reps, reps, metric='cosine')
    
In [14]:
    
import numpy
def neighbors(k, distance=distance):
    # Print sentence k and its 4 nearest neighbors by cosine distance.
    # NOTE: `distance=distance` binds the matrix at definition time; re-run
    # this cell if `distance` is recomputed.
    # argsort orders indices by increasing distance; [1:5] skips position 0
    # (the sentence itself) and keeps the next 4.
    nn =  numpy.argsort(distance[k,:])[1:5]
    # Python 2 print statements: show the query, then each neighbor marked
    # with ✔ if it describes the same image as the query, ✘ otherwise.
    print sents[k]['raw'], ''.join(sents_ipa[k])
    for n in nn:
        print u"✔" if sents[n]['imgid']==sents[k]['imgid'] else u"✘", \
        sents[n]['raw'], ''.join(sents_ipa[n])
    
In [15]:
    
import random
    
In [16]:
    
# Show nearest neighbors for 10 randomly chosen validation sentences.
for _ in range(10):
    # Bug fix: randint(0, len(sents)) is inclusive at BOTH ends, so it could
    # return len(sents) and raise IndexError; randrange excludes the upper
    # bound.
    neighbors(random.randrange(len(sents)))
    # Blank separator line; print("") behaves the same under Python 2 and 3,
    # unlike the bare Python 2 `print` statement.
    print("")
    
    
In [17]:
    
import imaginet.tracer
# Pick up any local edits to the tracer module without restarting the kernel
# (`reload` is a Python 2 builtin; under Python 3 use importlib.reload).
reload(imaginet.tracer)
    
    Out[17]:
In [18]:
    
# Build a tracer for visualizing hidden-state trajectories; it fits a
# projection on the representations (see tr.fit / tr.proj below).
tr = imaginet.tracer.Tracer()
    
In [19]:
    
# Fit the tracer's projection on the full set of sentence representations.
tr.fit(reps)
    
    
In [20]:
    
# Variance explained by each component of the tracer's fitted projection
# (bare last expression so the notebook displays it).
tr.proj.explained_variance_
    
    Out[20]:
In [21]:
    
from subprocess import check_output

def phon(inp):
    """Return the characters of `inp` as a list, with all whitespace removed."""
    return [ch for ch in inp if not ch.isspace()]

def espeak(words):
    """Transcribe `words` to a list of IPA characters via the espeak CLI."""
    ipa = check_output(["espeak", "-q", "--ipa",
                        '-v', 'en-us',
                        words]).decode('utf-8')
    return phon(ipa)
    
In [22]:
    
def trace(orths, tracer=tr, model=model, eos=True, size=(6,6)):
    """Plot hidden-state trajectories for a list of orthographic phrases.

    Each phrase is transcribed to IPA with espeak, run through the model to
    get per-symbol states, and drawn by the tracer on a fresh figure.
    NOTE: tracer/model defaults are bound at definition time.
    """
    ipas = [espeak(phrase) for phrase in orths]
    states = [imaginet.task.states(model, seq) for seq in ipas]
    pylab.figure(figsize=size)
    tracer.traces(ipas, states, eos=eos)
    
In [23]:
    
# Trajectories for two food phrases vs. two animal phrases.
trace(["A bowl of salad","A plate of pizza","A brown dog", "A black cat"])
    
    
In [24]:
    
# Near-minimal pairs: same subject, different activities.
trace(["a girl skiing", "a girl wind surfing", "a girl water skiing",])
    
    
In [25]:
    
# Progressively longer noun phrases around the same head noun.
trace(["a cow", "a baby cow","a tiny baby cow"])
    
    
In [26]:
    
# Phrases sharing the same words in different orders; hide the axes for a
# cleaner plot (axis() returns the limits tuple, shown as Out[26]).
trace(["some food on a table","a computer on a table","a table with food"])
pylab.axis('off')
    
    Out[26]:
    
In [42]:
    
# Different senses of "bear": live animal vs. teddy bear.
trace(["a bear in a cage", "a brown bear in the zoo","a teddy bear on a chair"])
    
    
In [ ]: