Short-Sentence Similarity Using Gensim Word Mover's Distance

1. Gensim Word Mover's Distance model

Reference:

Note: For other similarity functions, refer to the Gensim word2vec documentation:
https://radimrehurek.com/gensim/models/word2vec.html

In [1]:
# Import the dependencies
import gensim


c:\users\manojkumar_meno\appdata\local\programs\python\python35\lib\site-packages\gensim\utils.py:860: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Load Google's pre-trained model

In [ ]:
# Load the word2vec model. The GoogleNews vectors are used here; download the file first, then load it from a local path.
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
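
Because the GoogleNews file must be downloaded separately, a small wrapper can make a missing file easier to diagnose. The following is a minimal sketch, not part of the original notebook: the helper name `load_word_vectors` is hypothetical, and it assumes gensim's `KeyedVectors.load_word2vec_format` with its optional `limit` argument, which caps how many vectors are read.

```python
import os


def load_word_vectors(path, limit=None):
    """Load binary word2vec vectors from a local file (hypothetical helper).

    path  : local path to the downloaded .bin file
    limit : optional cap on the number of vectors read, to save memory
    """
    # Fail early with a clear message if the file has not been downloaded yet.
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "Word vector file not found: %s (download it first)" % path
        )
    # Imported here so the path check works even before gensim is loaded.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)


# Example call, assuming the file sits in the working directory:
# model = load_word_vectors('GoogleNews-vectors-negative300.bin', limit=500000)
```

Reading fewer vectors lowers memory use, at the cost of dropping the words that fall beyond the limit.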

In [34]:
# intent list for the application
intent_list = ['Read the news', 'Hello', 'Get my news', 'Get feed', 'Read my feed']
Algorithm

In [35]:
print("provide your intent")
input_intent = input()
intent_similarity_map = dict()

print("\n")
for each in intent_list:
    # Calculate the distance between the two sentences using the
    # Word Mover's Distance (WMD) algorithm.
    # Note: gensim documents wmdistance as taking lists of tokens
    # (e.g. each.split()); a raw string is iterated character by character.
    distance = model.wmdistance(each, input_intent)
    # Store the distance for this intent in the dictionary
    intent_similarity_map[each] = distance

print(intent_similarity_map)
print("\n")
print("Selected Intent for the given user input")
# Pick the intent with the minimum distance
print(min(intent_similarity_map, key=intent_similarity_map.get))


provide your intent
What is the today's news?


{'Read the news': 1.2377885709077778, 'Get my news': 1.3753294548871398, 'Read my feed': 2.1875672564865503, 'Get feed': 2.0380287568701823, 'Hello': 2.2574398337508508}


Selected Intent for the given user input
Read the news
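
gensim documents `wmdistance` as taking two lists of tokens, so it may be worth tokenizing each sentence before comparing. The sketch below shows one way to structure the intent selection with an explicit tokenization step. The names `tokenize`, `pick_intent` and `jaccard_distance` are hypothetical, and lowercasing is an assumption that may not suit case-sensitive vectors. The Jaccard distance is only a toy stand-in so that the example runs without the GoogleNews model; it is not WMD.

```python
import string


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(string.punctuation) for w in sentence.lower().split())
    return [w for w in words if w]


def jaccard_distance(tokens_a, tokens_b):
    """Toy stand-in distance (NOT WMD): 1 - |A & B| / |A | B|."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def pick_intent(user_text, intents, distance_fn):
    """Return the intent with the smallest distance, plus all distances."""
    user_tokens = tokenize(user_text)
    distances = {
        intent: distance_fn(tokenize(intent), user_tokens)
        for intent in intents
    }
    return min(distances, key=distances.get), distances


intent_list = ['Read the news', 'Hello', 'Get my news', 'Get feed', 'Read my feed']
best, distances = pick_intent("What is the today's news?", intent_list, jaccard_distance)
print(best)
```

With the word2vec model loaded, passing `model.wmdistance` as `distance_fn` would compute Word Mover's Distance on the token lists instead of the stand-in. The resulting distances would then differ from the values recorded above.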