Penman API Demo

This notebook demonstrates the basic usage of the Penman API. For an overview of what Penman does, see the project page. For API details, see the Penman documentation.

To start, import the penman module.


In [1]:
import penman
penman.__version__


Out[1]:
'1.0.0'

Basic Decoding and Encoding

A common task is reading a PENMAN string into a graph object. The simplest way to do this is with penman.decode():


In [2]:
g = penman.decode('''
  # ::snt The dog didn't bark
  (b / bark-01
     :polarity -
     :ARG0 (d / dog))''')
g


Out[2]:
<Graph object (top=b) at 140670460826048>

The penman.encode() function can serialize a graph back to PENMAN notation (note that the metadata is also printed):


In [3]:
print(penman.encode(g))


# ::snt The dog didn't bark
(b / bark-01
   :polarity -
   :ARG0 (d / dog))

You may customize things like indentation:


In [4]:
print(penman.encode(g, indent=None))  # single-line


# ::snt The dog didn't bark
(b / bark-01 :polarity - :ARG0 (d / dog))

In [5]:
print(penman.encode(g, indent=6, compact=True))  # compact: attributes right after the concept stay on its line


# ::snt The dog didn't bark
(b / bark-01 :polarity -
      :ARG0 (d / dog))

Graph Introspection and Manipulation

The Graph object returned by decode() has methods for inspecting things like the variables and different types of edges.


In [6]:
g.variables()


Out[6]:
{'b', 'd'}

In [7]:
g.instances()


Out[7]:
[Instance(source='b', role=':instance', target='bark-01'),
 Instance(source='d', role=':instance', target='dog')]

In [8]:
g.attributes()


Out[8]:
[Attribute(source='b', role=':polarity', target='-')]

In [9]:
g.edges()


Out[9]:
[Edge(source='b', role=':ARG0', target='d')]

In [10]:
g.metadata


Out[10]:
{'snt': "The dog didn't bark"}

You may also view and modify the full list of triples and the metadata directly:


In [11]:
g.triples


Out[11]:
[('b', ':instance', 'bark-01'),
 ('b', ':polarity', '-'),
 ('b', ':ARG0', 'd'),
 ('d', ':instance', 'dog')]
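The triples list is the underlying data behind the instances(), attributes(), and edges() views shown earlier. As a rough illustration, here is a plain-Python sketch (not Penman's implementation) that splits triples into those three groups. It assumes every node variable has an :instance triple and that any other triple whose target is a variable is an edge:

```python
def classify(triples):
    """Split (source, role, target) triples into instances, edges, and attributes."""
    # Assumption: node variables are exactly the sources of :instance triples.
    variables = {src for src, role, _ in triples if role == ':instance'}
    groups = {'instances': [], 'edges': [], 'attributes': []}
    for triple in triples:
        _, role, target = triple
        if role == ':instance':
            groups['instances'].append(triple)
        elif target in variables:
            # The target is another node, so this is an edge.
            groups['edges'].append(triple)
        else:
            # The target is a constant, so this is an attribute.
            groups['attributes'].append(triple)
    return groups

triples = [('b', ':instance', 'bark-01'),
           ('b', ':polarity', '-'),
           ('b', ':ARG0', 'd'),
           ('d', ':instance', 'dog')]
groups = classify(triples)
print(groups['edges'])       # [('b', ':ARG0', 'd')]
print(groups['attributes'])  # [('b', ':polarity', '-')]
```

Penman's own methods return Instance, Edge, and Attribute objects rather than plain tuples.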

In [12]:
g.triples.extend([('b', ':location', 'g'), ('g', ':instance', 'garden')])
g.triples


Out[12]:
[('b', ':instance', 'bark-01'),
 ('b', ':polarity', '-'),
 ('b', ':ARG0', 'd'),
 ('d', ':instance', 'dog'),
 ('b', ':location', 'g'),
 ('g', ':instance', 'garden')]

In [13]:
g.metadata['snt'] = "The dog didn't bark in the garden."

In [14]:
print(penman.encode(g))


# ::snt The dog didn't bark in the garden.
(b / bark-01
   :polarity -
   :ARG0 (d / dog)
   :location (g / garden))

Advanced Decoding and Encoding

Penman's decoding strategy has two stages: first it parses a PENMAN string into a tree structure, then it interprets the tree structure to produce a pure graph. When we called the penman.decode() function earlier, we performed the tree parsing and graph interpretation in one call. It is also possible to perform these steps separately with the penman.parse() and penman.interpret() functions, or to parse to a tree without ever interpreting the graph. This is useful if you prefer to work with AMR data as trees rather than as pure graphs, or if you wish to use some of Penman's tree transformations.


In [15]:
t = penman.parse('''
  # ::snt The dog didn't bark
  (b / bark-01
     :polarity -
     :ARG0 (d / dog))''')
t


Out[15]:
Tree(('b', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('d', [('/', 'dog')]))]))

Getting the graph from the tree then requires a separate call to penman.interpret():


In [16]:
g = penman.interpret(t)
g


Out[16]:
<Graph object (top=b) at 140670460395632>

We can also go the other way: call penman.configure() to get a tree from a graph, then penman.format() to get a string again:


In [17]:
t2 = penman.configure(g)
print(penman.format(t2))


# ::snt The dog didn't bark
(b / bark-01
   :polarity -
   :ARG0 (d / dog))

The interface between trees and graphs is defined in the penman.layout module. Both penman.interpret() and penman.configure() are just aliases for penman.layout.interpret() and penman.layout.configure().
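To make the tree-to-graph step more concrete, the following plain-Python sketch flattens a tree node, written as the same nested (var, branches) tuples that Tree objects print, into triples. It is not penman.layout.interpret(): it maps the / label to :instance but skips the model-dependent normalization of inverted -of roles (discussed under Using Models), so it is only faithful for trees without inverted roles.

```python
def interpret(node):
    """Flatten a (var, branches) node into (source, role, target) triples."""
    var, branches = node
    triples = []
    for label, target in branches:
        if isinstance(target, tuple):
            # Nested node: link to its variable, then flatten it recursively.
            triples.append((var, label, target[0]))
            triples.extend(interpret(target))
        elif label == '/':
            # A concept branch becomes an :instance triple.
            triples.append((var, ':instance', target))
        else:
            # An attribute or a re-entrant variable reference.
            triples.append((var, label, target))
    return triples

tree = ('b', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('d', [('/', 'dog')]))])
for triple in interpret(tree):
    print(triple)
```

For this tree the sketch yields the same four triples, in the same order, as g.triples showed earlier.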

Tree Inspection and Manipulation

Tree objects are simple structures whose node attribute holds a (var, branches) pair, where var is the node's variable and branches is a list of (branch_label, target) pairs. A branch_label is like a graph role, but it is not normalized for inversion, and concept branches use the / label instead of the :instance role. A target is either an atomic value (e.g., a string) or, recursively, another node. Tree objects also contain metadata.


In [18]:
t.node


Out[18]:
('b', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('d', [('/', 'dog')]))])

In [19]:
t.metadata


Out[19]:
{'snt': "The dog didn't bark"}

Tree.nodes() traverses the tree and returns a flat list of the nodes in the tree (but the nodes themselves are not flat):


In [20]:
t.nodes()


Out[20]:
[('b',
  [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('d', [('/', 'dog')]))]),
 ('d', [('/', 'dog')])]
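The traversal behind Tree.nodes() can be approximated with a short recursive function over the same (var, branches) tuples. This sketch assumes a depth-first, parent-before-children order, which is consistent with the output above but is not taken from Penman's source:

```python
def nodes(node):
    """Return a flat list of all (var, branches) nodes, parents before children."""
    _, branches = node
    found = [node]
    for _, target in branches:
        if isinstance(target, tuple):
            # Recurse into nested nodes; atomic targets are skipped.
            found.extend(nodes(target))
    return found

tree = ('b', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('d', [('/', 'dog')]))])
print([var for var, _ in nodes(tree)])  # ['b', 'd']
```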

Tree.reset_variables() reassigns the node variables based on their order of appearance in the tree. It takes a format string whose placeholders, such as {prefix}, {i}, and {j} in the cells below, are filled in for each node (see the documentation for the full list):


In [21]:
t.reset_variables('a{i}')
t


Out[21]:
Tree(('a0', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('a1', [('/', 'dog')]))]))

In [22]:
t.reset_variables('{prefix}{j}')
t


Out[22]:
Tree(('b', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('d', [('/', 'dog')]))]))
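Conceptually, resetting variables is a two-pass rename: assign new names in traversal order, then rebuild the tree, also renaming re-entrant references such as :ARG0 d. The hypothetical reset_vars() below is a sketch, not Penman's method. It supports only {prefix} (assumed here to be the first letter of the node's concept) and {i} (a 0-based counter over nodes in traversal order), and it does not resolve name collisions:

```python
def reset_vars(node, fmt):
    """Hypothetical: rename variables using {prefix} and {i} placeholders."""
    mapping = {}

    # Pass 1: assign new names depth-first, parents before children.
    def assign(n):
        var, branches = n
        concept = next((t for label, t in branches if label == '/'), '')
        prefix = next((ch for ch in concept if ch.isalpha()), '_')
        mapping[var] = fmt.format(prefix=prefix, i=len(mapping))
        for _, target in branches:
            if isinstance(target, tuple):
                assign(target)

    # Pass 2: rebuild the tree, renaming nodes and re-entrant references.
    def rename(n):
        var, branches = n
        new_branches = []
        for label, target in branches:
            if isinstance(target, tuple):
                target = rename(target)
            elif label != '/' and target in mapping:
                target = mapping[target]
            new_branches.append((label, target))
        return (mapping[var], new_branches)

    assign(node)
    return rename(node)

tree = ('b', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('d', [('/', 'dog')]))])
print(reset_vars(tree, 'a{i}'))
# ('a0', [('/', 'bark-01'), (':polarity', '-'), (':ARG0', ('a1', [('/', 'dog')]))])
```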

Using Models

In Penman, the interpretation of a graph from a tree relies on a Model to determine things like whether a role is inverted. By default, a basic model with no special roles defined is used, and this is often enough:


In [23]:
g = penman.decode('''
  # ::snt The dog that barked slept.
  (s / sleep-01
     :ARG0 (d / dog
              :ARG0-of (b / bark-01)))''')
g.edges()  # note that edge directions are normalized


Out[23]:
[Edge(source='s', role=':ARG0', target='d'),
 Edge(source='b', role=':ARG0', target='d')]

AMR, however, has some roles that use -of in their primary (non-inverted) form. The default model treats these as inverted, which leads to incorrect graphs:


In [24]:
g = penman.decode('''
  # ::snt I bought a ceramic knife
  (b / buy-01
     :ARG0 (i / i)
     :ARG1 (k / knife
              :consist-of (c / ceramic)))''')
g.edges()


Out[24]:
[Edge(source='b', role=':ARG0', target='i'),
 Edge(source='b', role=':ARG1', target='k'),
 Edge(source='c', role=':consist', target='k')]

Instead, by using the AMR-specific model, these edges are correctly interpreted:


In [25]:
from penman.models import amr
g = penman.decode('''
  # ::snt I bought a ceramic knife
  (b / buy-01
     :ARG0 (i / i)
     :ARG1 (k / knife
              :consist-of (c / ceramic)))''',
          model=amr.model)
g.edges()


Out[25]:
[Edge(source='b', role=':ARG0', target='i'),
 Edge(source='b', role=':ARG1', target='k'),
 Edge(source='k', role=':consist-of', target='c')]
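The difference between the two models comes down to whether a role ending in -of counts as inverted. The sketch below models that choice with a plain set of primary -of roles. The hypothetical AMR_PRIMARY_OF_ROLES set holds only :consist-of, whereas Penman's AMR model defines many more roles:

```python
# Hypothetical, deliberately incomplete: roles that end in "-of" but are primary.
AMR_PRIMARY_OF_ROLES = frozenset({':consist-of'})

def normalize(source, role, target, primary_of_roles=frozenset()):
    """Flip an inverted role to its primary direction unless it is listed as primary."""
    if role.endswith('-of') and role not in primary_of_roles:
        # Swap source and target and drop the "-of" suffix.
        return (target, role[:-3], source)
    return (source, role, target)

print(normalize('k', ':consist-of', 'c'))                        # default-model behavior
print(normalize('k', ':consist-of', 'c', AMR_PRIMARY_OF_ROLES))  # AMR-model behavior
```

With an empty set the sketch mirrors the default model's :consist edge above; with the AMR set it keeps :consist-of, as the AMR model does.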

If you don't want to pass the model object in each call, you can create a PENMANCodec object with a model; its decode() and encode() methods then use that model. The only difference between using the codec object and using the module functions is how the model is specified:


In [26]:
amrcodec = penman.PENMANCodec(model=amr.model)
amrcodec.decode('(k / knife :consist-of (c / ceramic))').edges()


Out[26]:
[Edge(source='k', role=':consist-of', target='c')]

Models are also useful as a source of information for transformations, as shown in the next section.

Transformations

Penman's transformations sometimes modify the content of the graph and other times only restructure how the graph is displayed. They rely on a Model for information on how to apply the transformations.


In [27]:
from penman import transform

Consider the following graph, which is erroneous because an inverted relation has a constant as its target (so after normalization the source of the relation would be a constant rather than a node):


In [28]:
g = amrcodec.decode('''
  (c / chapter
     :domain-of 7)''')  # this will log a warning


WARNING:penman.layout:cannot deinvert attribute: ('c', ':domain-of', '7')

In [29]:
g.attributes()  # note that it is not normalized


Out[29]:
[Attribute(source='c', role=':domain-of', target='7')]

In AMR, the inverted :domain-of relation has the canonical form :mod, which is not inverted and thus eligible for specifying attributes. Note, however, that even decoding with the AMR model above did not convert :domain-of into the more canonical :mod automatically, as doing so would change the triples. To fix the error, you can use transform.canonicalize_roles() with the AMR model. It works on the tree structure, so we first reparse the string as a tree:


In [30]:
t = penman.parse('''
  (c / chapter
     :domain-of 7)''')
t2 = transform.canonicalize_roles(t, model=amr.model)
print(penman.format(t2))


(c / chapter
   :mod 7)
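Role canonicalization on a tree amounts to relabeling branches recursively. Here is a minimal sketch with a hypothetical two-entry table based on the AMR equivalence between :domain-of and :mod (and, symmetrically, :mod-of and :domain). Penman's canonicalize_roles() instead takes its mappings from the model:

```python
# Hypothetical table: noncanonical role -> canonical role.
CANONICAL = {':domain-of': ':mod', ':mod-of': ':domain'}

def canonicalize(node, table=CANONICAL):
    """Return a copy of a (var, branches) tree with branch labels canonicalized."""
    var, branches = node
    new_branches = []
    for label, target in branches:
        if isinstance(target, tuple):
            target = canonicalize(target, table)
        # Labels not in the table (including '/') are kept unchanged.
        new_branches.append((table.get(label, label), target))
    return (var, new_branches)

tree = ('c', [('/', 'chapter'), (':domain-of', '7')])
print(canonicalize(tree))  # ('c', [('/', 'chapter'), (':mod', '7')])
```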

Reification is another kind of transformation, and it works on graphs. There are two kinds of reification in Penman. The first is transform.reify_edges(), which reifies edges as defined by the model; with the AMR model, this follows the AMR guidelines:


In [31]:
g = penman.interpret(t2, model=amr.model)  # get a graph from the tree
g2 = transform.reify_edges(g, model=amr.model)  # :mod -> have-mod-91 is defined by the AMR model
print(amrcodec.encode(g2))


(c / chapter
   :ARG1-of (_ / have-mod-91
               :ARG2 7))
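Edge reification replaces one edge with a new node and two edges. The sketch below works over plain triples with a hypothetical table holding only the :mod to have-mod-91 entry seen above (source as :ARG1, target as :ARG2, matching the output). The _0, _1, ... variable names are an assumption of the sketch, not Penman's naming scheme:

```python
# Hypothetical table: role -> (concept, role for old source, role for old target).
REIFICATIONS = {':mod': ('have-mod-91', ':ARG1', ':ARG2')}

def reify(triples, table=REIFICATIONS):
    """Replace each edge whose role is in the table with a new reified node."""
    result = []
    counter = 0
    for source, role, target in triples:
        if role in table:
            concept, src_role, tgt_role = table[role]
            node = f'_{counter}'  # fresh variable; assumes no existing '_N' names
            counter += 1
            result += [(node, ':instance', concept),
                       (node, src_role, source),
                       (node, tgt_role, target)]
        else:
            result.append((source, role, target))
    return result

for triple in reify([('c', ':instance', 'chapter'), ('c', ':mod', '7')]):
    print(triple)
```

Here ('_0', ':ARG1', 'c') is the normalized form of the :ARG1-of branch in the encoded output above.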

There is also transform.reify_attributes(), which replaces attribute values with nodes. This is another way to deal with the warning above about interpretation being unable to deinvert an attribute. As this procedure is not defined by a model, the function does not take one:


In [32]:
g3 = transform.reify_attributes(g)
print(amrcodec.encode(g3))


(c / chapter
   :mod (_ / 7))
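Attribute reification is simpler: each attribute's constant becomes the concept of a fresh node. A minimal sketch over plain triples, again assuming node variables are exactly the sources of :instance triples and using hypothetical _0, _1, ... variables; like Penman's function, it needs no model:

```python
def reify_attrs(triples):
    """Replace each attribute's constant target with a new node of that concept."""
    variables = {src for src, role, _ in triples if role == ':instance'}
    result = []
    counter = 0
    for source, role, target in triples:
        if role != ':instance' and target not in variables:
            node = f'_{counter}'  # fresh variable; assumes no existing '_N' names
            counter += 1
            result += [(source, role, node), (node, ':instance', target)]
        else:
            result.append((source, role, target))
    return result

print(reify_attrs([('c', ':instance', 'chapter'), ('c', ':mod', '7')]))
```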

Finally, there are some transformations defined by other parts of Penman. We've already seen Tree.reset_variables(). Two others are defined in the penman.layout module.

First, layout.rearrange() will reorder branches in the tree without otherwise changing its structure. For example:


In [33]:
from penman import layout
t = penman.parse('''
  (t / try-01
     :ARG1 (c / chase-01
              :ARG1 (c2 / cat)
              :ARG0 (d / dog))
     :ARG0 d)''')
layout.rearrange(t, key=amr.model.canonical_order)
print(penman.format(t))


(t / try-01
   :ARG0 d
   :ARG1 (c / chase-01
            :ARG0 (d / dog)
            :ARG1 (c2 / cat)))
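Rearranging boils down to sorting each node's branches with a key function, recursively. The arg_order() key below is hypothetical (concept first, then :ARGn roles by number, then everything else) and only stands in for amr.model.canonical_order. The sort mutates the branch lists in place, as layout.rearrange() does in the cell above:

```python
import re

def arg_order(branch):
    """Hypothetical sort key: '/' first, then :ARGn by n, then other roles by name."""
    label = branch[0]
    if label == '/':
        return (0, 0, label)
    match = re.fullmatch(r':ARG(\d+)', label)
    if match:
        return (1, int(match.group(1)), label)
    return (2, 0, label)

def rearrange_branches(node, key=arg_order):
    """Sort the branches of every node in place; list.sort is stable."""
    _, branches = node
    for _, target in branches:
        if isinstance(target, tuple):
            rearrange_branches(target, key)
    branches.sort(key=key)

tree = ('t', [('/', 'try-01'),
              (':ARG1', ('c', [('/', 'chase-01'),
                               (':ARG1', ('c2', [('/', 'cat')])),
                               (':ARG0', ('d', [('/', 'dog')]))])),
              (':ARG0', 'd')])
rearrange_branches(tree)
print(tree)
```

For this tree the result has the same branch order as the formatted output above.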

Second, layout.reconfigure() configures a new tree from a graph, which can change the tree structure more significantly than rearranging:


In [34]:
g = penman.interpret(t)  # or layout.interpret()
t2 = layout.reconfigure(g, key=amr.model.canonical_order)
print(penman.format(t2))


(t / try-01
   :ARG0 (d / dog
            :ARG0-of (c / chase-01
                        :ARG1 (c2 / cat)))
   :ARG1 c)

Command-line Utility

Many of the operations described above are available via the command-line penman utility. For more information, see the documentation.