In [1]:
import numpy as np
In [2]:
from dstoolbox.models import W2VClassifier
Mock the fit method so that we don't depend on an external file.
In [3]:
word2idx = {'herren': 0, 'damen': 1, 'stiefel': 2, 'rock': 3}
word_embeddings = np.array([
    [0.1, 0.1, 0.1],
    [10.0, 10.0, 10.1],
    [-0.1, -0.1, -0.1],
    [-0.1, -0.1, 0.1],
])
def mock_fit(self, X=None, y=None):
    from dstoolbox.utils import normalize_matrix
    # Use the toy vocabulary and embeddings defined above instead of
    # loading them from an external word2vec file.
    self.word2idx = word2idx
    idx2word = {val: key for key, val in word2idx.items()}
    self.classes_ = np.array([idx2word[i] for i in range(len(idx2word))])
    self.idx2word = idx2word
    # Normalize the rows so that dot products are cosine similarities.
    self.syn0 = normalize_matrix(word_embeddings)
    return self
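For context, normalize_matrix is assumed here to L2-normalize each row of the embedding matrix, so that dot products between rows equal cosine similarities. A minimal stand-in under that assumption (not the actual dstoolbox implementation) could look like this:
def normalize_matrix_sketch(X):
    # L2-normalize each row; dot products then equal cosine similarities.
    return X / np.linalg.norm(X, axis=1, keepdims=True)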
In [4]:
print(word2idx)
print(word_embeddings)
In [5]:
setattr(W2VClassifier, 'fit', mock_fit)
In [6]:
# The path argument is irrelevant here, since fit is mocked.
clf = W2VClassifier('a/path', topn=3).fit()
The most_similar method works similarly to the gensim method of the same name, but it does not support negative terms. One or several positive terms are accepted, as later cells show.
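Conceptually, most_similar presumably ranks all other words by cosine similarity to the query; since syn0 is row-normalized, that reduces to a dot product. Here is a minimal sketch for a single query word (most_similar_sketch is illustrative, not the dstoolbox implementation):
def most_similar_sketch(clf, word, topn=3):
    # Cosine similarity of the query to every word (rows are normalized).
    idx = clf.word2idx[word]
    sims = clf.syn0 @ clf.syn0[idx]
    # Rank all words except the query itself and keep the topn best.
    order = [i for i in np.argsort(sims)[::-1] if i != idx][:topn]
    return [(clf.idx2word[i], float(sims[i])) for i in order]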
In [7]:
clf.classes_
Out[7]:
In [8]:
clf.most_similar('herren')
Out[8]:
In [9]:
clf.most_similar(['damen'])
Out[9]:
In [10]:
clf.most_similar('rock')
Out[10]:
The predict method works as one would expect from an sklearn classifier: it returns, for each input word, the index of the predicted class. The classes corresponding to these indices can be found in the classes_ attribute. A predict_proba method does not exist, since class probabilities are not well defined for this case.
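In effect, predict presumably performs a 1-nearest-neighbor lookup in embedding space, returning for each word the index of its most similar other word. A sketch under that assumption (excluding the query word itself is also an assumption):
def predict_sketch(clf, X):
    # For each word, predict the index of its nearest other word.
    y_pred = []
    for word in X:
        idx = clf.word2idx[word]
        sims = clf.syn0 @ clf.syn0[idx]
        sims[idx] = -np.inf  # exclude the query word itself
        y_pred.append(int(np.argmax(sims)))
    return np.array(y_pred)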
In [11]:
clf.predict(['herren', 'damen', 'rock'])
Out[11]:
In [12]:
clf.classes_[clf.predict(['herren', 'damen', 'rock'])]
Out[12]:
Similarly to KNeighborsClassifier and other sklearn neighbors-based estimators, the W2VClassifier supports the kneighbors method.
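Presumably, kneighbors returns the indices of the topn nearest words and, unless return_distance=False, their distances as well, mirroring the sklearn API. The following sketch assumes cosine distance (1 minus similarity) and that the topn constructor argument is stored as clf.topn; both are assumptions:
def kneighbors_sketch(clf, X, return_distance=True):
    dists, indices = [], []
    for word in X:
        idx = clf.word2idx[word]
        sims = clf.syn0 @ clf.syn0[idx]
        sims[idx] = -np.inf  # exclude the query word itself
        order = np.argsort(sims)[::-1][:clf.topn]
        indices.append(order)
        dists.append(1.0 - sims[order])  # cosine distance
    if return_distance:
        return np.array(dists), np.array(indices)
    return np.array(indices)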
In [13]:
clf.kneighbors(['herren', 'rock'], return_distance=False)
Out[13]:
In [14]:
clf.kneighbors(['herren', 'rock'])
Out[14]:
In [15]:
clf.most_similar(['herren', 'rock'])
Out[15]:
The new search results in an update of the dictionary; the combined term can thus be retrieved at a later point in time.
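Internally, this presumably amounts to averaging the constituent vectors, re-normalizing the mean, and appending it to the vocabulary under a combined key. A sketch under those assumptions (the key format is hypothetical):
def add_combined_term_sketch(clf, words):
    # Average the constituent vectors and re-normalize the result.
    vec = clf.syn0[[clf.word2idx[w] for w in words]].mean(axis=0)
    vec /= np.linalg.norm(vec)
    # Register the combined term under a new index.
    key = ' '.join(words)  # hypothetical key format
    new_idx = len(clf.word2idx)
    clf.word2idx[key] = new_idx
    clf.idx2word[new_idx] = key
    clf.syn0 = np.vstack([clf.syn0, vec])
    clf.classes_ = np.append(clf.classes_, key)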
In [16]:
clf.classes_
Out[16]:
In [17]:
clf.most_similar('rock')
Out[17]: