Python WordNet using NLTK

(C) 2017-2019 by Damir Cavar

Version: 1.1, November 2019

This tutorial relates to the discussion of Word Sense Disambiguation and various machine learning strategies in the textbook Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach.

This tutorial was developed as part of my course material for the course Machine Learning for Computational Linguistics in the Computational Linguistics Program of the Department of Linguistics at Indiana University.

Using WordNet

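Importing wordnet from the NLTK corpus module (the WordNet data has to be installed once beforehand, for example with nltk.download('wordnet')):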


In [1]:
from nltk.corpus import wordnet

Asking for a synset in WordNet:


In [2]:
wordnet.synsets('cat')


Out[2]:
[Synset('cat.n.01'),
 Synset('guy.n.01'),
 Synset('cat.n.03'),
 Synset('kat.n.01'),
 Synset('cat-o'-nine-tails.n.01'),
 Synset('caterpillar.n.02'),
 Synset('big_cat.n.01'),
 Synset('computerized_tomography.n.01'),
 Synset('cat.v.01'),
 Synset('vomit.v.01')]

A synset is identified by a 3-part name of the form word.pos.nn: a lemma, a part-of-speech tag, and a two-digit sense number. All synsets of cat above except the last two are nouns with the part-of-speech tag n; cat.v.01 and vomit.v.01 are verbs (v). We can restrict the lookup to a specific PoS, here for dog:


In [3]:
wordnet.synsets('dog', pos=wordnet.VERB)


Out[3]:
[Synset('chase.v.01')]

Besides VERB the other parts of speech are NOUN, ADJ and ADV.
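The 3-part synset names can be taken apart with ordinary string operations. The sketch below is plain Python and does not call NLTK; the helper parse_synset_name and the POS_NAMES table are hypothetical, written only for illustration. The letters are the tags that appear in WordNet synset names (n, v, a, r, plus s for satellite adjectives), and the names mirror NLTK's constants NOUN, VERB, ADJ, ADV, and ADJ_SAT. Note that the first part is the synset's head lemma, not necessarily the word that was looked up: guy.n.01 turned up among the results for cat.

```python
# Single-letter part-of-speech tags that appear in WordNet synset names.
POS_NAMES = {
    "n": "NOUN",
    "v": "VERB",
    "a": "ADJ",
    "s": "ADJ_SAT",  # satellite adjective
    "r": "ADV",
}


def parse_synset_name(name):
    """Split a synset name of the form word.pos.nn into its three parts."""
    # Split from the right so that any dots inside the word part are kept.
    parts = name.rsplit(".", 2)
    if len(parts) != 3 or parts[1] not in POS_NAMES or not parts[2].isdigit():
        raise ValueError(f"not a synset name: {name!r}")
    word, pos, sense = parts
    return word, POS_NAMES[pos], int(sense)


for name in ["cat.n.01", "guy.n.01", "cat-o'-nine-tails.n.01", "vomit.v.01"]:
    print(parse_synset_name(name))
```

Splitting from the right is a deliberate choice: the part-of-speech tag and sense number always come last, so only the word part can contain unusual characters.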

We can select a specific synset from the list using the full 3-part name notation:


In [4]:
wordnet.synset('dog.n.01')


Out[4]:
Synset('dog.n.01')

For this particular synset we can fetch the definition:


In [6]:
print(wordnet.synset('dog.n.01').definition())


a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds

Synsets may also come with usage examples. We can count the examples for this synset:


In [7]:
len(wordnet.synset('dog.n.01').examples())


Out[7]:
1

We can print out the example using:


In [8]:
print(wordnet.synset('dog.n.01').examples()[0])


the dog barked all night

We can also output the lemmata for a specific synset:


In [9]:
wordnet.synset('dog.n.01').lemmas()


Out[9]:
[Lemma('dog.n.01.dog'),
 Lemma('dog.n.01.domestic_dog'),
 Lemma('dog.n.01.Canis_familiaris')]

Using a list comprehension, we can reduce this list to just the lemma names:


In [10]:
[lemma.name() for lemma in wordnet.synset('dog.n.01').lemmas()]


Out[10]:
['dog', 'domestic_dog', 'Canis_familiaris']

We can also fetch a specific lemma directly by its full name, the synset name followed by the lemma name:


In [11]:
wordnet.lemma('dog.n.01.dog')


Out[11]:
Lemma('dog.n.01.dog')
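A lemma's full name, as in Lemma('dog.n.01.dog'), is the synset name followed by the lemma name. The plain-Python sketch below is not an NLTK API: the regular expression and the helpers split_lemma_name and readable are hypothetical, written for illustration. It assumes the first ".pos.digits" group marks the end of the synset name, and it turns the underscores that join the words of multiword lemmas such as domestic_dog back into spaces.

```python
import re

# Full lemma name = synset name + "." + lemma name, e.g. 'dog.n.01.domestic_dog'.
# The non-greedy match stops at the first ".pos.digits" group (assumed boundary).
LEMMA_NAME_RE = re.compile(r"^(?P<synset>.+?\.[nvasr]\.\d+)\.(?P<lemma>.+)$")


def split_lemma_name(full_name):
    """Split 'word.pos.nn.lemma' into (synset name, lemma name)."""
    match = LEMMA_NAME_RE.match(full_name)
    if match is None:
        raise ValueError(f"not a lemma name: {full_name!r}")
    return match.group("synset"), match.group("lemma")


def readable(lemma_name):
    """Multiword lemma names join their words with underscores."""
    return lemma_name.replace("_", " ")


for full in ["dog.n.01.dog", "dog.n.01.domestic_dog", "dog.n.01.Canis_familiaris"]:
    synset, lemma = split_lemma_name(full)
    print(synset, "->", readable(lemma))
```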

Multilingual Functions

NLTK's WordNet interface is multilingual: lemma names for other languages come from the Open Multilingual Wordnet. To see which languages are supported, use this command:


In [12]:
sorted(wordnet.langs())


Out[12]:
['als',
 'arb',
 'bul',
 'cat',
 'cmn',
 'dan',
 'ell',
 'eng',
 'eus',
 'fas',
 'fin',
 'fra',
 'glg',
 'heb',
 'hrv',
 'ind',
 'ita',
 'jpn',
 'nld',
 'nno',
 'nob',
 'pol',
 'por',
 'qcn',
 'slv',
 'spa',
 'swe',
 'tha',
 'zsm']

We can ask for the Mandarin Chinese (cmn) lemma names of a synset:


In [16]:
wordnet.synset('dog.n.01').lemma_names('cmn')


Out[16]:
['犬', '狗']

We can also look up a word in another language, here Italian cane, and get the lemmas of all synsets it belongs to. The synset names themselves are the English-based identifiers:


In [17]:
wordnet.lemmas('cane', lang='ita')


Out[17]:
[Lemma('dog.n.01.cane'),
 Lemma('cramp.n.02.cane'),
 Lemma('hammer.n.01.cane'),
 Lemma('bad_person.n.01.cane'),
 Lemma('incompetent.n.01.cane')]
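Conceptually, a synset acts as a language-neutral concept with an English-based identifier, and each language attaches its own lemma names to it. The toy sketch below models that picture with a small dictionary copied from the outputs above. The data is truncated, the helpers build_index and toy_lemmas are hypothetical, and the code does not reproduce NLTK's internal data structures; it only shows how a lookup of an Italian word can pass through an inverted index to the shared synset identifiers.

```python
from collections import defaultdict

# Hypothetical toy data, not real WordNet content: a few synsets with
# lemma names per language code, copied from the outputs shown above.
SYNSET_LEMMAS = {
    "dog.n.01": {
        "eng": ["dog", "domestic_dog", "Canis_familiaris"],
        "cmn": ["犬", "狗"],
        "ita": ["cane"],
    },
    "cramp.n.02": {"ita": ["cane"]},
    "hammer.n.01": {"ita": ["cane"]},
}


def build_index(synset_lemmas):
    """Invert synset -> {lang: words} into (lang, word) -> [synset names]."""
    index = defaultdict(list)
    for synset, by_lang in synset_lemmas.items():
        for lang, words in by_lang.items():
            for word in words:
                index[(lang, word)].append(synset)
    return index


INDEX = build_index(SYNSET_LEMMAS)


def toy_lemmas(word, lang="eng"):
    """Return 'synset.word' keys, shaped like the names in wordnet.lemmas()."""
    return [f"{synset}.{word}" for synset in INDEX.get((lang, word), [])]


print(toy_lemmas("cane", lang="ita"))
print(SYNSET_LEMMAS["dog.n.01"]["cmn"])
```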

Synonyms, hypernyms, holonyms


In [ ]:
dog = wordnet.synset('dog.n.01')

In [ ]:
# direct hypernyms: the more general synsets that dog.n.01 is a kind of
dog.hypernyms()

In [ ]:
# direct hyponyms: more specific kinds of dog
dog.hyponyms()

In [ ]:
# member holonyms: groups that dogs are members of
dog.member_holonyms()

In [ ]:
# the topmost synset(s) of the hypernym hierarchy above dog.n.01
dog.root_hypernyms()

In [ ]:
wordnet.synset('dog.n.01').lowest_common_hypernyms(wordnet.synset('cat.n.01'))
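The calls above navigate the hypernym (is-a) hierarchy. To make root_hypernyms() and lowest_common_hypernyms() more concrete, here is a plain-Python sketch over a small hand-made taxonomy. The taxonomy is hypothetical and simplified, loosely inspired by the dog and cat branches but not WordNet data, and all helpers are illustrative. The sketch keeps the common ancestors that lie deepest, measuring depth as the longest path up to a root; NLTK's own method offers further options.

```python
# Hypothetical, hand-made toy taxonomy (NOT real WordNet data):
# each node maps to its direct hypernyms. "dog" has two parents,
# which is why hypernym queries return lists.
HYPERNYMS = {
    "entity": [],
    "animal": ["entity"],
    "mammal": ["animal"],
    "domestic_animal": ["animal"],
    "carnivore": ["mammal"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "dog": ["canine", "domestic_animal"],
    "cat": ["feline"],
}


def ancestors(node):
    """The node itself plus all of its direct and indirect hypernyms."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in HYPERNYMS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def roots(node):
    """Ancestors that have no hypernyms themselves."""
    return sorted(a for a in ancestors(node) if not HYPERNYMS[a])


def max_depth(node):
    """Length of the longest hypernym path from node up to a root."""
    parents = HYPERNYMS[node]
    return 1 + max(max_depth(p) for p in parents) if parents else 0


def lowest_common_hypernyms(a, b):
    """Shared ancestors of a and b that lie deepest in the hierarchy."""
    common = ancestors(a) & ancestors(b)
    deepest = max(max_depth(c) for c in common)
    return sorted(c for c in common if max_depth(c) == deepest)


print(roots("dog"))                           # ['entity']
print(lowest_common_hypernyms("dog", "cat"))  # ['carnivore']
```

In this toy hierarchy dog and cat share carnivore, mammal, animal, and entity; carnivore is the deepest of these, so it is the lowest common hypernym.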

In [ ]:
good = wordnet.synset('good.a.01')

In [ ]:
# antonymy is a relation between lemmas, not synsets, so we ask a lemma of good.a.01
good.lemmas()[0].antonyms()
