SippyCup is a simple semantic parser, written in Python, created purely for didactic purposes. The design favors simplicity and readability over efficiency and performance. The goal is to make semantic parsing look easy!
SippyCup demonstrates an approach to semantic parsing based around:
We present applications of SippyCup to three different domains:
SippyCup was inspired by, and partly adapted from, the demonstration code published as a companion to Liang & Potts 2015, "Bringing machine learning and compositional semantics together". It was developed primarily for the benefit of students in Stanford's CS224U: Natural Language Understanding, and therefore contains exercises (without solutions) for use in the class. However, it should be of use to anyone interested in learning about semantic parsing.
SippyCup remains a work in progress, and you will find a number of TODOs throughout this codelab and the accompanying Python codebase. You will likely also find errors! You can help to contribute to SippyCup by sending corrections to the author or by sending a pull request to the SippyCup GitHub repository.
Grammar
classGeoNamesAnnotator
GraphKB
classGraphKBExecutor
classGraphKBExecutor
with GeobaseSemantic parsing is a computation which takes a linguistic input and returns as output a structured, machine-readable representation of its meaning, known as the semantic representation (or, informally, "the semantics"). The semantic representation is a computational object that captures key elements of the meaning of the input and enables the machine to respond appropriately. Depending upon the application, it can be a simple string, a tree, a nested map, an XML document, an SQL query, or a real-valued vector. In many cases, the semantic representation can be viewed as a little program, or as a parameterized request to a backend service.
For example, in a question-answering application, we may want to map a linguistic input such as
"how tall is obama"
into a semantic representation such as
(/person/height /m/02mjmr)
Of course, this is useful only if there exists some other component that can make sense of this semantic representation and take appropriate action — for example, by interpreting it as a query against Freebase and retrieving the answer, namely 1.85 meters.
The downstream component which consumes the output of the semantic parser is known as the executor. It is a function which takes a semantic representation as input and performs some computation specified by the semantics. If it returns an output, the output is known as the denotation.
The idea is to achieve a separation of concerns. We wish to endow the overall system with the ability to respond to linguistic inputs. The semantic parser focuses on dealing with the complexities of human language: polysemy, syntactic ambiguity, indexicality and other forms of context-dependence, ellipsis, and so on. It emits an output which captures the key elements of the user's intent in a clear, unambiguous, fully-grounded, machine-readable representation. The executor is thereby freed of the burden of interpreting language, and can focus on its application-specific business.
Of course, semantic representations need not look like the example shown here. As we'll see in a moment, different problem settings may demand quite different semantic representations.
In this codelab, we'll take a close look at three example applications of semantic parsing: one rather artificial, the others more realistic.
Our first case study will follow Liang & Potts 2015 in considering the problem of interpreting natural language arithmetic expressions, such as:
"one plus one"
"minus three minus two"
"three plus three minus two"
"two times two plus three"
In this problem setting, we may wish to produce semantic representations which are binary expression trees:
(+ 1 1)
(- (- 3) 2)
(- (+ 3 3) 2)
(+ (* 2 2) 3)
These representations fully resolve the ambiguity present in the linguistic inputs, and can be evaluated by a simple, deterministic executor to produce a numeric answer.
In our second case study, we'll examine a more realistic application: the domain of queries about travel and transportation, such as:
"birmingham al distance from indianapolish in"
"directions from washington to canada"
"discount travel flights to austin texas"
A possible style of semantic representation (described further in Unit 2) for this domain is:
{domain: travel, type: distance,
origin: {id: 4259418, name: "Indianapolis, IN, US"},
destination: {id: 4049979, name: "Birmingham, AL, US"}}
{domain: travel, type: directions,
origin: {id: 4140963, name: "Washington, DC, US"},
destination: {id: 6251999, name: "Canada"}}
{domain: travel, mode: air,
destination: {id: 4671654, name: "Austin, TX, US"}}
Here the request types and transportation modes have been resolved to canonical values, and the locations have been resolved to unique identifiers in the GeoNames geographical database. It's easy to imagine passing representations like these to a backend API, such a route-planning service, for execution.
Our third case study will focus on the Geo880 corpus developed in Ray Mooney's group at UT Austin, which has for many years served as a standard evaluation for semantic parsing systems. (See, for example, Zelle & Mooney 1996, Tang & Mooney 2001, Zettlemoyer & Collins 2005, and Liang et al. 2011.) The Geo880 corpus contains 880 questions about U.S. geography, often with complex compositional structure. Examples include:
"which states border texas?"
"how many states border the largest state?"
"what is the size of the capital of texas?"
Here again, many styles of semantic representation are possible. One possibility (described further in Unit 3) is:
(.and state (borders /state/texas))
(.count (.and state (borders (.argmax area state))))
((/state/texas capital) population)
The Geo880 corpus is a paired with a simple geographical database known as Geobase. A suitable executor can evaluate these semantic representations against Geobase to return answers.
Academic researchers have explored many other potential applications for semantic parsing, including:
"go to the third junction and take a right"
"how many countries use the euro"
"there is a room with a desk and a red chair"
"phalanxes are twice as effective at defending cities as warriors"
"an amusement park sells two kinds of tickets ..."
In each case, the task is to map a linguistic input into a structured, machine-readable representation of meaning, on which some downstream component can take action.
(TODO: flesh out a section explaining the key challenges of semantic parsing.)
When developing a semantic parsing system for a new domain, one of the first tasks must be to choose a good semantic representation. After all, this representation is the target output of the system, so its design will drive many other design decisions. We can't control the form of the linguistic inputs, but we can control the design of the semantic outputs.
(Actually, in some problem settings, we have to work with a downstream executor over whose request language we have no control. For example, the executor may be an API provided by a third party, or it may be a relational database which only responds to SQL queries. However, even in such situations, we have the opportunity to control the design of our semantic representation, by inserting into the stack an adapter which performs a deterministic conversion from our semantic representation into the request language of the executor.)
So what constitutes a good semantic representation? We can't make rigid generalizations, but we can identify some guiding principles. Above all, in order to achieve a separation of concerns between the semantic parsing system and the executor, the semantic representation must be straightforwardly machine-readable by the executor. Machine-readability means that executor can immediately understand what it needs to do, and get straight to work. (Actually, machine-interpretable would be a more accurate term. In some limited sense, any ASCII-encoded text is machine-readable, but it is usually not machine-interpretable. But we'll stick with the standard term.) With consideration, two hallmarks of machine-readability emerge: ambiguity resolution and canonicalization.
Ambiguity resolution means that if an input has two different meanings, it should have two different semantic representations. For example, consider the input "how big is new york". This has at least four different meanings, depending on whether we take "big" to refer to population or area, and on whether we take "new york" to refer to the city or the state. If all four interpretations are valid, then our semantic parser should generate four semantic representations, perhaps along with some kind of scores. But when the executor receives one of these semantic representations as input, it has no decisions to make — it can take action immediately without having to do further interpretation.
Canonicalization is the obverse of ambiguity resolution: it means that if two inputs have the same meaning, they should have the same semantic representation. For example, consider the inputs "nyc population" and "how many people live in new york city". These have the same meaning, so they should have the same representation — a canonical representation. Or consider the inputs "next thanksgiving" and "november 26". As of this writing (in early 2015), these phrases refer to the same date, so they should have the same representation — perhaps the ISO 8601 representation "2015-11-26". Canonicalization thus means using unique identifiers for entities, and fully grounding expressions whose meaning depends on context.
In some problem settings, canonicalization is less critical than ambiguity resolution, because some downstream executors can tolerate two different ways of saying the same thing. But the great virtue of canonicalization is that it makes checking semantic equivalence as easy as checking equality of the semantic representations. For example, the semantic equivalence of "two forty five pm" and "quarter of three in the afternoon" may not be immediately obvious (to a machine), but becomes trivial if both are represented by 14:45.
In the example applications above, we showed three quite different styles of semantic representation. Indeed, most academic research in semantic parsing has used application-specific semantic representations, although many of these share some common elements (such as the use of the simply typed lambda calculus).
But a natural question is: couldn't we design a single, universal semantic representation, capable of representing in machine-readable form all meanings which can be expressed in natural language? Such a representation would not only spare us the trouble of inventing a new representation for each application; it would also open the door to effortlessly combining semantic parsing models which had been developed for different domains. For example, we can imagine combining an arithmetic model with a travel model to interpret inputs like "four times the distance from boston to miami".
It's a tempting idea, and an old one. Leibniz, for example, dreamed of creating a characteristica universalis which would serve as a sort of algebra of ideas and thereby endow all thought with the exactitude of arithmetic. Since Leibniz's time, there have been many attempts to construct universal logical languages for human use, such as Lojban. However, such languages fail the test of true machine-readability. The ideal representation should enable a machine to take appropriate action, for example, by testing equivalence or performing simple inferences.
One recent creation that might seem like a better candidate is the Abstract Meaning Representation (AMR) being developed at ISI. As a simple example, all of these inputs:
the boy wants the girl to believe him
the boy wants to be believed by the girl
the boy has a desire to be believed by the girl
the boy's desire is for the girl to believe him
the boy is desirous of the girl believing him
are represented by the same AMR:
(w / want-01
:ARG0 (b / boy)
:ARG1 (b2 / believe-01
:ARG0 (g / girl)
:ARG1 b))
(Note that the variable b
appears twice, once as the :ARG0
of want-01
, and once as the :ARG1
of believe-01
. You can learn more about how AMR works by reading the AMR specification.)
These examples might give us hope that AMR can satisfy our goals for a semantic representation. By abstracting away from certain lexical choices ("want" vs. "desire") and syntactic choices (active vs. passive), AMR canonicalizes these inputs to a common representation, and thus renders obvious their semantic equivalence.
But as Hal Daumé III argues in a persuasive blog post, AMR doesn't go nearly far enough in this direction to serve as a representation over which we can reliably perform automated reasoning. It resolves some kinds of ambiguity, but not others. It achieves some canonicalization, but not all. There are many kinds of semantically equivalent transformations — lexical, compositional, logical — whose AMR formulations are not obviously equivalent. In fact, AMR seems to be a bit of a hodge-podge, reflecting some information about lexical choice, some about syntactic structure, and some about semantic content. In many ways, it looks like a souped-up version of semantic role labeling. Like SRL, AMR may be useful for some purposes, but it can't reliably be used as a vehicle for inference. (To be fair, the creators of AMR never promised this — they have been motivated rather by the goals of machine translation.)
(Note that if AMR could be used for automated reasoning, then a good AMR parser would provide a simple solution to the problem of recognizing textual entailment (RTE), that is, recognizing whether one natural-language sentence can be inferred from another. One would simply parse both sentences into AMR, and then apply an automated reasoner to check entailment. To my knowledge, no one has demonstrated that this is possible.)
In short, my view is that AMR is not a suitable target representation for semantic parsing. Indeed, after decades (or even centuries!) of failures, I'm skeptical of any attempt to define a universal, machine-readable semantic representation, and prefer instead to develop application-specific representations as needed.
(TODO)
The codelab you're reading has been published as an IPython Notebook, which makes it easy to combine text and interactive Python code. If you're new to Python, you might find this Python tutorial useful.
However, SippyCup is more than this codelab. First and foremost, it is a library of Python source files, which should be found in the same directory as the codelab you're reading. If not, you can find the codelab and the source files together in the SippyCup GitHub repository. Note that this codelab uses import
statements to pull in definitions of certain utility classes and functions from the accompanying source files. If the source files are missing, running the codelab code interactively will cause those imports to fail.
You will observe that the SippyCup code architecture does not always adhere to the principles of object-oriented design. Functionality which might naturally be captured by instance methods of classes will in some cases appear instead in standalone static functions. This is in order to facilitate an incremental presentation in this codelab format.
Let's get started with Unit 1!
Copyright (C) 2015 Bill MacCartney