(C) 2016-2019 by Damir Cavar <dcavar@iu.edu>
Version: 1.1, November 2019
You will find more details on WordNet as such on the WordNet website.
Some content and ideas in the following introduction are taken from the NLTK-howto on WordNet.
In [1]:
from nltk.corpus import wordnet
WordNet is a lexical resource that organizes nouns, verbs, adjectives, and adverbs into a taxonomy. Lexical items are, for example, organized in groups of synonyms. In WordNet these synonym groups are called synsets, each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
In [2]:
wordnet.synsets('can')
Out[2]:
The output lists all synsets that contain the word can. Each synset is named by a dot-delimited triple that specifies a word, its part of speech (PoS), and a running number from 01 to n that distinguishes the synsets for that word and PoS. The word in the triple is one lemma of the synset and need not be the word you searched for. The PoS-tag n stands for noun and the PoS-tag v for verb.
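The naming scheme can be made concrete with a small sketch in plain Python that does not need NLTK. The helper `parse_synset_name` and the table `POS_NAMES` are hypothetical names introduced here for illustration; the single-letter codes are the ones WordNet uses for its parts of speech.

```python
# Single-letter part-of-speech codes that appear in synset names.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def parse_synset_name(name):
    """Split a name like 'can.v.01' into (word, pos, sense_number)."""
    # Split from the right so that a word containing dots stays intact.
    word, pos, number = name.rsplit(".", 2)
    return word, pos, int(number)


word, pos, number = parse_synset_name("can.v.01")
print(word, POS_NAMES[pos], number)
```

Splitting from the right is a deliberate choice: the PoS and the number are always the last two fields, whatever the word itself looks like.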
You can request the synset providing the full code:
In [3]:
wordnet.synset('can.v.01')
Out[3]:
You can output the definition of any such synset:
In [6]:
wordnet.synset('displace.v.03').definition()
Out[6]:
In [5]:
wordnet.synsets('can', pos=wordnet.VERB)
Out[5]:
The possible PoS-tags are: ADJ, ADJ_SAT, ADV, NOUN, VERB.
In [7]:
wordnet.synset('can.v.01').lemmas()
Out[7]:
You can map the lemmas to a list of strings using the following list comprehension:
In [8]:
[str(lemma.name()) for lemma in wordnet.synset('can.v.01').lemmas()]
Out[8]:
The NLTK WordNet reader provides access to a multi-lingual WordNet, that is the Open Multilingual WordNet. The multi-lingual data is accessible using ISO-639 language codes (see the ISO-639 Wikipedia page):
In [9]:
wordnet.langs()
Out[9]:
To access the synsets of a word in another language, for example the Polish (pol) word kot, you can pass the language code to the synsets function:
In [11]:
wordnet.synsets('kot', lang='pol')
Out[11]:
We can even request the list of lemmas in a specific language for a given English word, for example the synset 01 for the noun house would have the following lemmas in Croatian (hrv):
In [14]:
wordnet.synset('house.n.01').lemma_names('hrv')
Out[14]:
The same word would have the following lemmas in Japanese:
In [18]:
wordnet.synset('house.n.01').lemma_names('jpn')
Out[18]:
We can save the synset request in a variable called house:
In [15]:
house = wordnet.synset('house.n.02')
We can now request the hypernyms for the word house using the variable:
In [16]:
house.hypernyms()
Out[16]:
Try this for some other words like trout and poodle:
In [17]:
wordnet.synset('trout.n.01').hypernyms()
Out[17]:
In [18]:
wordnet.synset('poodle.n.01').hypernyms()
Out[18]:
In [19]:
wordnet.synset('dog.n.01').hyponyms()
Out[19]:
In [23]:
bird = wordnet.synset('bird.n.01')
bird.member_holonyms()
Out[23]:
We can also request the root hypernym for some word:
In [25]:
wordnet.synset('finger.n.01').root_hypernyms()
Out[25]:
We can request the lowest common hypernym for two words, here for example for leg and arm:
In [26]:
wordnet.synset('leg.n.01').lowest_common_hypernyms(wordnet.synset('arm.n.01'))
Out[26]:
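To illustrate what a hypernym path, a root hypernym, and a lowest common hypernym are, here is a minimal sketch over a toy taxonomy in plain Python. The table `HYPERNYM` and both functions are hypothetical and are not how NLTK implements these methods; in particular, a real synset can have more than one hypernym, which this sketch ignores.

```python
# Toy hypernym table: each concept points to its single parent.
# Hypothetical data for illustration only, not taken from WordNet.
HYPERNYM = {
    "leg": "limb",
    "arm": "limb",
    "limb": "body_part",
    "finger": "digit",
    "digit": "body_part",
    "body_part": "entity",
}


def hypernym_path(concept):
    """Return the chain from the concept up to the root, concept first."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def lowest_common_hypernym(a, b):
    """Return the most specific concept on the paths of both a and b."""
    on_path_of_b = set(hypernym_path(b))
    # Walk upward from a; the first shared concept is the lowest one.
    for concept in hypernym_path(a):
        if concept in on_path_of_b:
            return concept
    return None


print(hypernym_path("finger"))                   # last element is the root
print(lowest_common_hypernym("leg", "arm"))
print(lowest_common_hypernym("leg", "finger"))
```

In this toy table, leg and arm meet at limb, while leg and finger only meet one level higher.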
In addition to hypernyms, hyponyms, and holonyms, WordNet also provides the means to request antonyms, derivationally related forms, and pertainyms. Consider for example the adjective white. Antonymy is a relation between lemmas rather than synsets, so we fetch all lemmas of the synset white.a.01 and request the antonyms for the first lemma:
In [28]:
wordnet.synset('white.a.01').lemmas()[0].antonyms()
Out[28]:
We can also fetch the lemma names of a synset in another language, for example the Slovenian lemma names for the noun cold:
In [29]:
wordnet.synset('cold.n.01').lemma_names('slv')
Out[29]:
In the same way, we can request the Spanish lemma names for a synset, here for the noun dog:
In [30]:
spa_dog = wordnet.synset('dog.n.01').lemma_names('spa')
print(spa_dog)
In [31]:
wordnet.lemma('singer.n.01.singer').derivationally_related_forms()
Out[31]:
We can also request the pertainyms for specific words:
In [32]:
wordnet.lemma('vocal.a.01.vocal').pertainyms()
Out[32]:
For verbs we can, for example, request the verb frames from WordNet. In the following example we request the frames for all the different lemmas of the verb say:
In [36]:
wordnet.synsets("say")
wordnet.synset('say.v.07').definition()
Out[36]:
In [33]:
wordnet.synset('say.v.01').frame_ids()
for lemma in wordnet.synset('say.v.01').lemmas():
    print(lemma, lemma.frame_ids())
    print(" | ".join(lemma.frame_strings()))
In the following example we request the verb-frames for the ditransitive verb to give:
In [37]:
wordnet.synset('give.v.01').frame_ids()
for lemma in wordnet.synset('give.v.01').lemmas():
    print(lemma, lemma.frame_ids())
    print(" | ".join(lemma.frame_strings()))
For many tasks in NLP one needs a lemmatizer or morphological analyzer to map inflected word forms to lemmas. Morphy in the WordNet module of the NLTK can do that. To lemmatize a word, provide the word and the PoS to the morphy function in wordnet:
In [38]:
wordnet.morphy('calls', wordnet.NOUN)
Out[38]:
Morphy can cope with surface forms that result from various rules of English word formation, for example e-insertion or consonant doubling:
In [40]:
wordnet.morphy('stopped', wordnet.VERB)
Out[40]:
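The general idea behind this kind of lemmatizer, suffix rules combined with an exception list and a check against known words, can be sketched in plain Python. Everything below is a simplified assumption for illustration: the rule table is a small subset, the exception list and the mini-lexicon are made up, and the order of checks is not claimed to match what NLTK does internally.

```python
# Suffix rules as (ending, replacement) pairs, tried in order.
# Illustrative subset only, not the full rule table used by WordNet.
RULES = {
    "n": [("ies", "y"), ("ses", "s"), ("s", "")],
    "v": [("ies", "y"), ("ed", "e"), ("ed", ""),
          ("ing", "e"), ("ing", ""), ("s", "")],
}

# Forms that suffix stripping alone cannot recover are looked up directly.
# Hypothetical entries for this sketch.
EXCEPTIONS = {
    "n": {"mice": "mouse"},
    "v": {"stopped": "stop", "went": "go"},
}

# Hypothetical mini-lexicon standing in for the WordNet word lists.
LEXICON = {
    "n": {"call", "city", "mouse"},
    "v": {"bake", "call", "go", "stop"},
}


def toy_morphy(form, pos):
    """Return a lemma for the form, or None if no candidate is known."""
    if form in LEXICON[pos]:
        return form
    if form in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][form]
    for ending, replacement in RULES[pos]:
        if form.endswith(ending):
            candidate = form[:-len(ending)] + replacement
            # Accept a candidate only if it is a known word.
            if candidate in LEXICON[pos]:
                return candidate
    return None


print(toy_morphy("calls", "n"))
print(toy_morphy("stopped", "v"))
print(toy_morphy("baked", "v"))
```

Checking each candidate against the lexicon is what keeps the rules from producing non-words such as "stopp" for "stopped".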
...
Fellbaum, Christiane (2005). WordNet and wordnets. In: Brown, Keith et al. (eds.), Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, 665-670.