11. Semantics 1: words

11.1 What is (computational) semantics?

We have seen ways to analyze, parse, annotate, translate, etc. written text... but without modeling its meaning.

Semantics is the study of meaning, and in linguistics and NLP it refers to the meaning of linguistic utterances. Applications covered in this course so far are all possible without any explicit representation of what certain linguistic structures mean. Even complex processes like machine translation can be ignorant of what each word, phrase or sentence means, i.e. what information it conveys to speakers of the language.

Since the semantics of an utterance can be thought of as a mapping from linguistic structures to a representation of the world, it is closely connected with the fields of philosophy, logic, and knowledge representation. Some consider semantics to be AI-complete, i.e. at least as difficult as modeling human cognition.

Applications 1: supporting common NLP tasks

Many levels of language processing can benefit from analyzing meaning:

Syntactic parsing

I made spaghetti with meatballs

I made spaghetti with my sister

I shot an elephant in my pajamas
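The PP-attachment ambiguity in these examples can be made concrete with a small grammar. A minimal sketch using NLTK's chart parser (the toy grammar below is written just for the elephant sentence and is an assumption for illustration):

```python
import nltk

# Toy CFG covering only "I shot an elephant in my pajamas";
# the PP "in my pajamas" can attach either to the NP or to the VP.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

parser = nltk.ChartParser(grammar)
parses = list(parser.parse("I shot an elephant in my pajamas".split()))
print(len(parses))  # two parses: one per attachment site
```

Without some model of meaning, a parser has no way to prefer one attachment over the other.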

Machine translation

Applications 2: semantic tasks

Other tasks rely on semantics so heavily that they are considered semantic technologies:

  • question answering
  • recognizing entailment
  • semantic web
  • personal assistants
  • conversational agents

Question answering

the process of generating adequate answers to users' questions, based on some knowledge of the world

Recognizing entailment

deciding whether one statement implies another or not

Bikel, D., & Zitouni, I. (2012). Multilingual natural language processing applications: from theory to practice. IBM Press.

Semantic web

enabling computers to understand what is on the internet and what you can do on the internet

Personal assistants

such as Apple Siri, Amazon Alexa, or Google Now

Conversational agents (chatbots)

Systems that can, to some extent, carry human-like conversations

Semantic analysis / Semantic parsing

The task of mapping linguistic units to some representation of their meaning

  • But what types of units?
    • Words?
    • Phrases, sentences?
    • Paragraphs, documents?
  • And what representation? How do we represent meaning? What is meaning?
    • is it a graph?
    • or a formula in first-order logic?
    • or a real-valued vector?
    • or something else?

(We'll see some examples in 2 weeks)

Semantic analysis is the process of determining the meaning of linguistic units. While all technologies introduced so far can benefit from such analyses, the ones discussed in this and the following two lectures are outright impossible unless we build explicit representations of the information content of linguistic data. Mapping linguistic data to some representation of meaning requires us to choose a semantic representation.

Today there exist dozens of different theories and systems of semantic representation. In syntactic or morphological analysis, there are theoretical concepts that are widely accepted by linguists and used by engineers, such as the constituent structure of a sentence or the concept of verb tense. There is no such agreement on the basic elements of semantic representation.

In a narrow sense, semantic analysis involves modeling the meaning of a sentence only as far as it can be determined without knowing the context in which it is uttered, the previous knowledge (information state) of each speaker, etc. Detecting meaning as a function of linguistic form only is sometimes called syntax-driven semantic analysis or semantic parsing.

In the broader sense, semantic analysis involves modeling all new information that an utterance conveys, and thus includes the process of inference. In this case the analysis of the sentence What did you do today? should at least be aware of the identity of the person this question was addressed to and the exact time when the language was uttered. It is far from trivial to define the limits of this broader process: the true scope of such a question is actually determined by factors such as the nature of the relationship between speakers (the answer is different if the question is asked by someone's boss or by a friend) or the history of interactions between them. The field of linguistics concerned with such factors is called pragmatics.

11.2 Word meaning

We know that dog is a singular common noun, but how do we distinguish it from cat, television, Monday, or peace?

Two major approaches:

  • decomposing meaning into elements or features (e.g. a dog is an animal, four-legged, faithful, etc.) (discrete representation)
  • modeling meaning as distribution - the contexts in which it appears (e.g. dog is likely to appear in the context I take my ... for a walk twice a day) (continuous representation)

Decomposing meaning

dog: animal, four-legged, faithful, barks

peace: period, no war
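The decomposed representations above can be sketched as plain feature sets, with set overlap (Jaccard similarity) as a crude similarity measure. The particular features chosen here are of course an assumption, which is exactly the problem discussed below:

```python
# Hypothetical feature sets; which primitives to use is an open problem.
features = {
    "dog":   {"animal", "four-legged", "faithful", "barks"},
    "cat":   {"animal", "four-legged", "meows"},
    "peace": {"period", "no-war"},
}

def jaccard(a, b):
    """Overlap of two feature sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)

print(jaccard(features["dog"], features["cat"]))    # 2/5 = 0.4
print(jaccard(features["dog"], features["peace"]))  # 0.0
```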

Advantages

  • transparent representation (we understand what each element means)
  • makes it straightforward to model lexical relationships (synonymy, hypernymy, similarity, etc., see 11.3 on what these mean)

Problems

  • what are the primitives of representation, i.e. which elements should be used in such "definitions"?
  • how to determine the exact set of elements in a definition: is faithful an inherent property of dog? Is peace really a period?
  • should representations have additional structure? Is it only a list or maybe a graph?

The distributional approach

Two words are similar in meaning if they appear in similar contexts. Typically we represent each word as a real-valued vector, constructed so that words occurring in similar contexts are mapped to nearby vectors: the smaller the distance (or the higher the cosine similarity) between two vectors, the more similar the contexts of the two words.
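A minimal sketch of the vector view, using made-up 3-dimensional vectors (real embeddings such as GloVe have 50 or more dimensions; the numbers below are assumptions for illustration only):

```python
import math

# Toy word vectors; in practice these come from a trained embedding.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product of the two normalized vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def nearest(word):
    """Rank the other words by cosine similarity to `word`."""
    return sorted((w for w in vectors if w != word),
                  key=lambda w: cosine(vectors[word], vectors[w]),
                  reverse=True)

print(nearest("dog"))  # ['cat', 'car']
```

The same ranking procedure, run over a full embedding, produces the nearest-neighbor lists shown later in this section.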

Advantages

  • Robust: can be constructed from large unannotated corpora, which are abundantly available nowadays
  • Has proven useful in virtually all NLP tasks

Problems:

  • non-transparent representation: we cannot truly understand the meaning of a representation (e.g. the meaning of dimension $i$)
  • cannot handle rare words - and a large part of any data is rare words!

11.3 Lexical relationships: synonymy, homonymy, hypernymy

Synonyms

Pairs of words that mean roughly the same thing are called synonyms

  • dog - canine
  • buy - purchase

Q: are there "perfect synonyms", ever, in any language? Depends on our definition of meaning!

Hypernyms, hyponyms

A word is a hypernym of another if it is a broader or more general concept of which the other is a special case, e.g. mammal is the hypernym of dog, rectangle is the hypernym of square.

We also say that dog is a hyponym of mammal and square is a hyponym of rectangle.

Q: in what way is this similar to the IS_A relationship in programming?
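The analogy the question points at can be sketched with class inheritance: a hyponym is-a special case of its hypernym, just as a subclass is-a special case of its superclass (the class names below are illustrative):

```python
# Hypernymy mirrors subclassing: every Dog is a Mammal,
# but not every Mammal is a Dog.
class Mammal:
    pass

class Dog(Mammal):
    pass

print(isinstance(Dog(), Mammal))  # True: dog IS_A mammal
print(isinstance(Mammal(), Dog))  # False: the relation is not symmetric
```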

Homonyms, homophones

Bank (as in financial institution) and bank (as in the bank of a river) are homonyms (they are spelled the same but have very different meanings)

Q: glass (material) and glass (dish) are not homonyms, but why?

Two and too are homophones: they are pronounced the same but spelled differently

11.4 Lexical ontologies

Some examples are:

  • WordNet
  • FrameNet
  • 4lang

WordNet

  • groups words into sets of synonyms (synsets), and models semantic relationships among them

WordNet example


FrameNet

A resource based on Frame Semantics (see e.g. Fillmore & Baker 2001)

Frames are script-like structures that represent a situation, event, or object, and list their typical participants or props, which are called frame elements (event roles)

Has been used to train semantic parsers / semantic role labelers, e.g. SEMAFOR

11.5 Semantic similarity

Task definition

Measure the degree to which the meanings of two words are similar

e.g. cat and dog are more similar than cat and car

Not a precise definition - that would require a model of meaning

Datasets are created based on the human intuition of hundreds of annotators

Motivation

various NLP tasks benefit from a similarity metric, e.g. machine translation, info retrieval (search).

for any task, extra data for rare words may be obtained through similar but more frequent words

models of word meaning can be evaluated based on their inherent concept of semantic distance/similarity

Distributional approaches

cosine similarity of word vectors is expected to correlate with semantic similarity

e.g. nearest neighbors in the glove.6B.50d embedding:

Distributional approaches - example

words closest to king:

prince 0.824
queen 0.784
ii 0.775
emperor 0.774
son 0.767

words closest to dog:

cat 0.922
dogs 0.851
horse 0.791
puppy 0.775
pet 0.772

Distributional approaches - example

Not as reliable with less frequent words, e.g. opossum:

four-eyed 0.752
raccoon 0.717
songbird 0.704

Or woodpecker:

pileated 0.805
ivory-billed 0.72
red-cockaded 0.71

Ontology-based approaches

Distance between words in lexical graphs such as WordNet is also used as a source of semantic similarity

Path similarity in wordnet between dog and some other synsets:

canine 0.5
wolf 0.33
cat 0.2
refrigerator 0.07
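WordNet's path similarity is 1/(d+1), where d is the length of the shortest path between two synsets in the hypernym graph. A toy sketch with a hand-made graph (the edges are assumptions that roughly mirror WordNet's dog/canine/carnivore/feline/cat chain) reproduces the first three scores above:

```python
from collections import deque

# Toy hypernym graph: undirected edges between a synset and its hypernym.
edges = [
    ("dog", "canine"), ("wolf", "canine"),
    ("canine", "carnivore"),
    ("cat", "feline"), ("feline", "carnivore"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def path_similarity(a, b):
    """1 / (shortest-path length + 1), as in WordNet."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        if node == b:
            return 1 / (d + 1)
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, d + 1))
    return None  # no path: the words are in disconnected components

print(path_similarity("dog", "canine"))  # 0.5
print(path_similarity("dog", "wolf"))    # 1/3
print(path_similarity("dog", "cat"))     # 0.2
```

The low dog/refrigerator score falls out the same way: the shortest path between the two synsets in the full WordNet graph is simply much longer.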