NLTK (Natural Language Toolkit) is a Python library that provides access to many NLP tools and resources. The NLTK WordNet interface is described here: http://www.nltk.org/howto/wordnet.html
The nltk Python package can be installed using pip:
In [ ]:
!pip install nltk
Import nltk and use its internal download tool to get WordNet:
In [ ]:
import nltk
nltk.download('wordnet')
Import the wordnet module:
In [ ]:
from nltk.corpus import wordnet as wn
Access synsets of a word using the synsets function:
In [ ]:
club_synsets = wn.synsets('club')
print(club_synsets)
Each synset has a definition function:
In [ ]:
for synset in club_synsets:
    print("{0}\t{1}".format(synset.name(), synset.definition()))
In [ ]:
dog = wn.synsets('dog')[0]
dog.definition()
List lemmas of a synset:
In [ ]:
dog.lemmas()
List hypernyms and hyponyms of a synset:
In [ ]:
dog.hypernyms()
In [ ]:
dog.hyponyms()
The closure method of synsets allows us to retrieve the transitive closure of the hypernym, hyponym, etc. relations:
In [ ]:
list(dog.closure(lambda s: s.hypernyms()))
The common_hypernyms and lowest_common_hypernyms methods take another synset as their argument:
In [ ]:
cat = wn.synsets('cat')[0]
dog.lowest_common_hypernyms(cat)
In [ ]:
dog.common_hypernyms(cat)
path_similarity scores how similar two synsets are, based on the shortest path connecting them in the hypernym/hyponym taxonomy; the score ranges from 0 to 1:
In [ ]:
dog.path_similarity(cat)
To iterate over all synsets, optionally filtered by POS tag, use all_synsets, which returns a generator:
In [ ]:
wn.all_synsets(pos='n')
In [ ]:
for c, noun in enumerate(wn.all_synsets(pos='n')):
    if c > 5:
        break
    print(noun.name())
Exercise (optional): use WordNet to implement the "Guess the category" game: the program lists lemmas that all share a common hypernym, which the user has to guess.
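One possible solution sketch (the function name, the number of listed lemmas, and the game loop are choices of this sketch, not part of the exercise statement):

```python
import random

from nltk.corpus import wordnet as wn


def guess_the_category(n_lemmas=5):
    """Pick a random noun synset with enough hyponyms, print one lemma of
    each sampled hyponym, and let the user guess the shared hypernym."""
    candidates = [s for s in wn.all_synsets(pos='n')
                  if len(s.hyponyms()) >= n_lemmas]
    category = random.choice(candidates)
    for hyponym in random.sample(category.hyponyms(), n_lemmas):
        print(hyponym.lemmas()[0].name())
    guess = input("Guess the common hypernym: ")
    if guess in {lemma.name() for lemma in category.lemmas()}:
        print("Correct!")
    else:
        print("It was: " + category.name())
```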
Download and unpack pre-trained GloVe word embeddings:
In [ ]:
!wget http://sandbox.hlt.bme.hu/~recski/stuff/glove.6B.50d.txt.gz
!gunzip -f glove.6B.50d.txt.gz
In [ ]:
import numpy as np
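The two helpers called in the commented cell below are specified in exercise 11.E2; here is a minimal sketch of what they might look like, assuming the usual GloVe text format (one word per line, followed by the components of its vector):

```python
import numpy as np


def read_embedding(fn):
    """Read a GloVe-format text file into a word list, a word-to-row index,
    and an embedding matrix with one row per word."""
    words, vectors = [], []
    with open(fn, encoding='utf-8') as f:
        for line in f:
            fields = line.strip().split()
            words.append(fields[0])
            vectors.append([float(x) for x in fields[1:]])
    word_index = {word: i for i, word in enumerate(words)}
    return words, word_index, np.array(vectors)


def normalize_embedding(emb):
    """Scale each row to unit length so dot products become cosine similarities."""
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)
```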
In [ ]:
In [ ]:
In [ ]:
# words, word_index, emb = read_embedding('glove.6B.50d.txt')
# emb = normalize_embedding(emb)
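A possible vec_sim, assuming emb has already been normalized so that cosine similarity reduces to a dot product of rows:

```python
import numpy as np


def vec_sim(word1, word2, word_index, emb):
    """Cosine similarity of two words; assumes rows of emb are unit length."""
    return float(np.dot(emb[word_index[word1]], emb[word_index[word2]]))
```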
In [ ]:
In [ ]:
# vec_sim('cat', 'dog', word_index, emb)
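nearest_n might be implemented like this (a sketch; again assuming unit-length rows, and that the queried word itself is included in the result):

```python
import numpy as np


def nearest_n(word, words, word_index, emb, n=10):
    """Return the n words whose (normalized) vectors are closest to word's."""
    sims = emb @ emb[word_index[word]]
    top = np.argsort(-sims)[:n]
    return [words[i] for i in top]
```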
In [ ]:
In [ ]:
# print(nearest_n('dog', words, word_index, emb))
# print(nearest_n('king', words, word_index, emb))
Use the code written in 11.E2 to analyze word groups in WordNet:
In [ ]:
In [ ]:
# synset_emb = embed_synsets(words, word_index, emb)
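With unit-length synset vectors as above, synset_sim can again be a plain dot product (a sketch):

```python
import numpy as np


def synset_sim(synset1, synset2, synset_emb):
    """Cosine similarity of two synsets' (unit-length) embedding vectors."""
    return float(np.dot(synset_emb[synset1], synset_emb[synset2]))
```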
In [ ]:
In [ ]:
# synset_sim(dog, cat, synset_emb)
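nearest_n_synsets could rank all embedded synsets by cosine similarity to the query (a sketch; the full-vocabulary sort is simple but not the fastest option):

```python
import numpy as np


def nearest_n_synsets(synset, synset_emb, n=10):
    """Return the n synsets whose embeddings are closest to the given synset's."""
    target = synset_emb[synset]
    ranked = sorted(synset_emb,
                    key=lambda s: -np.dot(synset_emb[s], target))
    return ranked[:n]
```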
In [ ]:
In [ ]:
# nearest_n_synsets(wn.synsets('penguin')[0], synset_emb, 10)
In [ ]:
In [ ]:
In [ ]:
In [ ]:
# compare_sims(sample, synset_emb, word_index, emb)