Short-Sentence Similarity using Gensim Word Mover Distance

1. Gensim Word-Movers model

Reference:

Note: Refer to other similarity functions
https://radimrehurek.com/gensim/models/word2vec.html



In [1]:

    
# Importing the dependecies
import gensim









    



c:\users\manojkumar_meno\appdata\local\programs\python\python35\lib\site-packages\gensim\utils.py:860: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")

Load the Google's pre-trained model



In [ ]:

    
#load word2vec model, here GoogleNews is used , this should be downloaded and to be loaded from the local path
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)



In [34]:

    
# intent list for the application
intent_list = ['Read the news', 'Hello', 'Get my news', 'Get feed', 'Read my feed']

Algorithm



In [35]:

    
print ("provide your intent")
input_intent = input()
intent_similiarity_map = dict()

print ("\n")
for each in intent_list:
    
    #calculate distance between two sentences using WMD algorithm (Word Movers Distance)
    distance = model.wmdistance(each, input_intent)
    # map the values into a dictionary
    intent_similiarity_map[each] = distance
    
print (intent_similiarity_map)
print ("\n")
print ("Selected Intent for the given user input")
# pick the intent with minimum distance
print (min(intent_similiarity_map, key = intent_similiarity_map.get))









    



provide your intent
What is the today's news?


{'Read the news': 1.2377885709077778, 'Get my news': 1.3753294548871398, 'Read my feed': 2.1875672564865503, 'Get feed': 2.0380287568701823, 'Hello': 2.2574398337508508}


Selected Intent for the given user input
Read the news