In [1]:
import sys
sys.path.append('..')

Types of vocabularies: Existing ones, designing new ones

Vocabularies are found in the submodule ngvoc.

For the moment, "matrix" and "sparse_matrix" are available. Both have a fixed size (M, W); the only difference is how the information is stored (a numpy matrix versus a scipy sparse.lil_matrix).

To define a vocabulary type, the following methods must be implemented:

  • def get_known_words
  • def get_known_meanings
  • def exists
  • def get_content
  • def add
  • def rm
  • def get_unknown_words
  • def get_unknown_meanings
  • def get_new_unknown_m
  • def get_new_unknown_w
  • def get_random_known_m
  • def get_random_known_w

You can also add a visual method to visualize some aspects of your vocabulary. All of these methods can be inherited from sparse_matrix.VocSparseMatrix and matrix.VocMatrix.

If you want to design a new Vocabulary class, just add it to the ngvoc folder, in a *.py file. An example is given in TEST.py.

To use it in your experiments, pass as voc_cfg a configuration dict containing the key/value pair 'voc_type':'$pyfile.$classname'. You can add any other keys you need; they are passed directly to the __init__ of your class.
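How ngvoc resolves the 'voc_type' string is not shown here. The sketch below illustrates one common way a factory could turn a 'module.ClassName' string into an instance. The names resolve_class and build are hypothetical, the demo uses a standard-library class instead of a real vocabulary, and the assumption that 'voc_type' is consumed by the factory rather than forwarded is ours.

```python
import importlib


def resolve_class(voc_type, package=None):
    """Turn a 'module.ClassName' string into the class object it names."""
    module_name, class_name = voc_type.rsplit('.', 1)
    if package is not None:
        # Relative import inside a package, e.g. package='lib.ngvoc' (assumed layout)
        module = importlib.import_module('.' + module_name, package=package)
    else:
        module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build(voc_type, package=None, **cfg):
    """Instantiate the resolved class, forwarding the remaining config keys."""
    cls = resolve_class(voc_type, package)
    return cls(**cfg)


# Demo with a standard-library class standing in for a vocabulary class
counter = build('collections.Counter', a=2, b=1)
```

With this pattern, adding a new class only requires dropping a file into the package; no registry has to be edited.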

You should call the __init__ of the parent class via super(), so do not forget to pass it the appropriate arguments.

Example:

Let's test the vocabulary class VocTest, defined in TEST.py. The class is a child of VocMatrix and uses its __init__ with M = number/2 and W = 15 (constant). The test method prints the 'testkey' key-value pair from the voc_cfg object. Feel free to test other modifications.
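The contents of TEST.py are not reproduced here. As a rough illustration of the pattern, the sketch below defines a child class that derives M from 'number' and delegates to its parent's __init__ via super(). VocMatrixStandIn is a hypothetical, pure-Python stand-in for matrix.VocMatrix; the method signatures and the class name VocTestSketch are assumptions, not the library's actual API.

```python
class VocMatrixStandIn(object):
    """Hypothetical stand-in for matrix.VocMatrix: a fixed-size (M, W) table."""

    def __init__(self, M, W, **voc_cfg):
        self._M = M
        self._W = W
        # Rows are meanings, columns are words; 0.0 means "no association"
        self._content = [[0.0] * W for _ in range(M)]

    def exists(self, m, w):
        return self._content[m][w] > 0

    def add(self, m, w, val=1.0):
        self._content[m][w] = val

    def get_known_words(self):
        return [w for w in range(self._W)
                if any(row[w] > 0 for row in self._content)]


class VocTestSketch(VocMatrixStandIn):
    """Child class: M is derived from 'number', W is a constant."""

    def __init__(self, number, testkey=None, **voc_cfg):
        self.testkey = testkey
        # Integer division keeps M an int under Python 3
        super(VocTestSketch, self).__init__(M=number // 2, W=15, **voc_cfg)

    def test(self):
        # Returns the string instead of printing, to keep the sketch testable
        return 'testkey:' + str(self.testkey)


voc = VocTestSketch(number=12, testkey='Test')
voc.add(2, 7)
```

Extra configuration keys are accepted through **voc_cfg, so the child can consume the keys it needs and pass the rest up to the parent.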


In [2]:
from lib import ngvoc

In [3]:
voc_cfg = {
    'voc_type':'TEST.VocTest',
    'testkey':'Test',
    'number':12
    }

In [4]:
voc_test = ngvoc.Vocabulary(**voc_cfg)

In [5]:
voc_test


Out[5]:
<lib.ngvoc.TEST.VocTest at 0x7ff67aab4290>

In [6]:
voc_test.test()


testkey:TeSt

In [7]:
print(voc_test)


                                    Words
        [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
         [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
         [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
Meanings [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
         [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
         [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

Designing a Vocabulary with adaptable matrix size

Several things are needed for this:

  • Label the meanings and words globally, outside the vocabulary, being
  • Refer to these global labels