Exercise 1
In [1]:
# Today is Saturday, the 5th of May 2018, and my name is Julia, living in [address],
# and I'm currently fulfilling an NLTK book assignment.
Exercise 2
In [3]:
# Used in the sense of 'no matter how':
# "However beautiful the strategy, you should occasionally look at the results."
# "However bad you think you're going to be in that room, not being there is worse."
# Used as a connector:
# "However, they kept on, with unabated perseverance."
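The two uses above can be told apart mechanically in many cases: the connector is usually followed by a comma, while the 'no matter how' sense is followed directly by an adjective or adverb. The following is only a rough sketch of that idea; `classify_however` is a hypothetical helper introduced here for illustration, and the comma heuristic is an assumption that will misjudge unusual punctuation.

```python
import re

# Hypothetical helper: a surface heuristic, not a real syntactic analysis.
def classify_however(sentence):
    """Guess which sense a sentence-initial 'however' has."""
    # Drop leading whitespace and opening quotation marks.
    text = sentence.strip().lstrip('"“\'')
    if re.match(r"however\s*,", text, flags=re.IGNORECASE):
        return "connector"
    if re.match(r"however\s+\w+", text, flags=re.IGNORECASE):
        return "no matter how"
    return None  # the sentence does not start with 'however'

examples = [
    "However beautiful the strategy, you should occasionally look at the results.",
    "However bad you think you're going to be in that room, not being there is worse.",
    "However, they kept on, with unabated perseverance.",
]
for s in examples:
    print(classify_however(s), "<-", s)
```

Running such a function over sentences collected from a corpus or a web search would give a first, approximate count of how often each construction occurs.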
Exercise 3
In [14]:
# (Kim arrived) or (Dana left and everyone cheered).
# (Kim arrived or Dana left) and everyone cheered.
import nltk, pprint, re
grammar = nltk.CFG.fromstring("""
S -> NP VP
S -> S Conj S
VP -> "arrived" | "left" | "cheered"
NP -> "Kim" | "Dana" | "everyone"
Conj -> "and" | "or"
""")
sr_parse = nltk.ShiftReduceParser(grammar, trace=2)
sent = 'Kim arrived or Dana left and everyone cheered'.split()
for tree in sr_parse.parse(sent):
    print(tree)
Parsing u'Kim arrived or Dana left and everyone cheered'
[ * Kim arrived or Dana left and everyone cheered]
S [ 'Kim' * arrived or Dana left and everyone cheered]
R [ NP * arrived or Dana left and everyone cheered]
S [ NP 'arrived' * or Dana left and everyone cheered]
R [ NP VP * or Dana left and everyone cheered]
R [ S * or Dana left and everyone cheered]
S [ S 'or' * Dana left and everyone cheered]
R [ S Conj * Dana left and everyone cheered]
S [ S Conj 'Dana' * left and everyone cheered]
R [ S Conj NP * left and everyone cheered]
S [ S Conj NP 'left' * and everyone cheered]
R [ S Conj NP VP * and everyone cheered]
R [ S Conj S * and everyone cheered]
R [ S * and everyone cheered]
S [ S 'and' * everyone cheered]
R [ S Conj * everyone cheered]
S [ S Conj 'everyone' * cheered]
R [ S Conj NP * cheered]
S [ S Conj NP 'cheered' * ]
R [ S Conj NP VP * ]
R [ S Conj S * ]
R [ S * ]
(S
  (S (S (NP Kim) (VP arrived)) (Conj or) (S (NP Dana) (VP left)))
  (Conj and)
  (S (NP everyone) (VP cheered)))
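The shift-reduce parser returns at most one parse, so the trace above only shows the reading "(Kim arrived or Dana left) and everyone cheered". To see both bracketings from the comments at the top of the cell, a chart parser can enumerate all parses. This is a minimal sketch that reuses the same grammar; the variable names `chart_parser` and `trees` are chosen here for illustration.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
S -> S Conj S
VP -> "arrived" | "left" | "cheered"
NP -> "Kim" | "Dana" | "everyone"
Conj -> "and" | "or"
""")

sent = 'Kim arrived or Dana left and everyone cheered'.split()

# ChartParser explores every analysis the grammar licenses,
# so an ambiguous sentence yields more than one tree.
chart_parser = nltk.ChartParser(grammar)
trees = list(chart_parser.parse(sent))
for tree in trees:
    print(tree)
print(len(trees), "parses found")
```

With three clauses and two conjunctions, the grammar allows exactly two groupings, one for each of the bracketed paraphrases.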
In [16]:
grammar = nltk.PCFG.fromstring("""
S -> NP VP [0.6]
S -> S Conj S [0.4]
VP -> "arrived" | "left" | "cheered" [1.0]
NP -> "Kim" | "Dana" | "everyone" [1.0]
Conj -> "and" | "or" [1.0]
""")
viterbi_parse = nltk.ViterbiParser(grammar, trace=2)
sent = 'Kim arrived or Dana left and everyone cheered'.split()
for tree in viterbi_parse.parse(sent):
    print(tree)
Inserting tokens into the most likely constituents table...
Insert: |=.......| Kim
Insert: |.=......| arrived
Insert: |..=.....| or
Insert: |...=....| Dana
Insert: |....=...| left
Insert: |.....=..| and
Insert: |......=.| everyone
Insert: |.......=| cheered
Finding the most likely constituents spanning 1 text elements...
Insert: |=.......| NP -> 'Kim' [0]
Insert: |.=......| VP -> 'arrived' [0]
Insert: |..=.....| Conj -> 'or' [1.0]
Insert: |...=....| NP -> 'Dana' [0]
Insert: |....=...| VP -> 'left' [0]
Insert: |.....=..| Conj -> 'and' [0]
Insert: |......=.| NP -> 'everyone' [1.0]
Insert: |.......=| VP -> 'cheered' [1.0]
Finding the most likely constituents spanning 2 text elements...
Insert: |==......| S -> NP VP [0.6]
Insert: |...==...| S -> NP VP [0.6]
Insert: |......==| S -> NP VP [0.6]
Finding the most likely constituents spanning 3 text elements...
Finding the most likely constituents spanning 4 text elements...
Finding the most likely constituents spanning 5 text elements...
Insert: |=====...| S -> S Conj S [0.4]
Insert: |...=====| S -> S Conj S [0.4]
Finding the most likely constituents spanning 6 text elements...
Finding the most likely constituents spanning 7 text elements...
Finding the most likely constituents spanning 8 text elements...
Insert: |========| S -> S Conj S [0.4]
Discard: |========| S -> S Conj S [0.4]
Discard: |========| S -> S Conj S [0.4]
(S
  (S (NP Kim) (VP arrived))
  (Conj or)
  (S
    (S (NP Dana) (VP left))
    (Conj and)
    (S (NP everyone) (VP cheered)))) (p=0)
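The result p=0 follows from how the grammar string is read: a bracketed probability attaches only to the last alternative on its line, so in the grammar above 'Kim', 'Dana', 'arrived', 'left' and 'and' all receive probability 0 (as the trace shows). The sketch below gives every alternative its own probability. The specific numbers are made up for illustration (an assumption, not estimated from data); they only need to sum to 1 for each left-hand side.

```python
import nltk

# Illustrative probabilities (assumed); each left-hand side sums to 1.
grammar = nltk.PCFG.fromstring("""
S -> NP VP [0.6]
S -> S Conj S [0.4]
VP -> "arrived" [0.4] | "left" [0.3] | "cheered" [0.3]
NP -> "Kim" [0.4] | "Dana" [0.3] | "everyone" [0.3]
Conj -> "and" [0.5] | "or" [0.5]
""")

viterbi_parser = nltk.ViterbiParser(grammar)
sent = 'Kim arrived or Dana left and everyone cheered'.split()

# The Viterbi parser yields only the single most likely tree.
best = next(viterbi_parser.parse(sent))
print(best)
print(best.prob())
```

Both bracketings of this sentence use exactly the same productions, so they have equal probability under any such PCFG; which one the Viterbi parser reports is then a matter of tie-breaking rather than a linguistic preference.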
Exercise 4
In [17]:
from nltk import Tree
help(Tree)
Help on class Tree in module nltk.tree:
class Tree(__builtin__.list)
| A Tree represents a hierarchical grouping of leaves and subtrees.
| For example, each constituent in a syntax tree is represented by a single Tree.
|
| A tree's children are encoded as a list of leaves and subtrees,
| where a leaf is a basic (non-tree) value; and a subtree is a
| nested Tree.
|
| >>> from nltk.tree import Tree
| >>> print(Tree(1, [2, Tree(3, [4]), 5]))
| (1 2 (3 4) 5)
| >>> vp = Tree('VP', [Tree('V', ['saw']),
| ... Tree('NP', ['him'])])
| >>> s = Tree('S', [Tree('NP', ['I']), vp])
| >>> print(s)
| (S (NP I) (VP (V saw) (NP him)))
| >>> print(s[1])
| (VP (V saw) (NP him))
| >>> print(s[1,1])
| (NP him)
| >>> t = Tree.fromstring("(S (NP I) (VP (V saw) (NP him)))")
| >>> s == t
| True
| >>> t[1][1].set_label('X')
| >>> t[1][1].label()
| 'X'
| >>> print(t)
| (S (NP I) (VP (V saw) (X him)))
| >>> t[0], t[1,1] = t[1,1], t[0]
| >>> print(t)
| (S (X him) (VP (V saw) (NP I)))
|
| The length of a tree is the number of children it has.
|
| >>> len(t)
| 2
|
| The set_label() and label() methods allow individual constituents
| to be labeled. For example, syntax trees use this label to specify
| phrase tags, such as "NP" and "VP".
|
| Several Tree methods use "tree positions" to specify
| children or descendants of a tree. Tree positions are defined as
| follows:
|
| - The tree position *i* specifies a Tree's *i*\ th child.
| - The tree position ``()`` specifies the Tree itself.
| - If *p* is the tree position of descendant *d*, then
| *p+i* specifies the *i*\ th child of *d*.
|
| I.e., every tree position is either a single index *i*,
| specifying ``tree[i]``; or a sequence *i1, i2, ..., iN*,
| specifying ``tree[i1][i2]...[iN]``.
|
| Construct a new tree. This constructor can be called in one
| of two ways:
|
| - ``Tree(label, children)`` constructs a new tree with the
| specified label and list of children.
|
| - ``Tree.fromstring(s)`` constructs a new tree by parsing the string ``s``.
|
| Method resolution order:
| Tree
| __builtin__.list
| __builtin__.object
|
| Methods defined here:
|
| __add__(self, v)
|
| __delitem__(self, index)
|
| __eq__(self, other)
|
| __ge__ lambda self, other
|
| __getitem__(self, index)
|
| __gt__ lambda self, other
|
| __init__(self, node, children=None)
|
| __le__ lambda self, other
|
| __lt__(self, other)
|
| __mul__(self, v)
|
| __ne__ lambda self, other
| # @total_ordering doesn't work here, since the class inherits from a builtin class
|
| __radd__(self, v)
|
| __repr__(self)
|
| __rmul__(self, v)
|
| __setitem__(self, index, value)
|
| __str__(self)
|
| __unicode__ = __str__(self)
|
| chomsky_normal_form(self, factor=u'right', horzMarkov=None, vertMarkov=0, childChar=u'|', parentChar=u'^')
| This method can modify a tree in three ways:
|
| 1. Convert a tree into its Chomsky Normal Form (CNF)
| equivalent -- Every subtree has either two non-terminals
| or one terminal as its children. This process requires
| the creation of more"artificial" non-terminal nodes.
| 2. Markov (vertical) smoothing of children in new artificial
| nodes
| 3. Horizontal (parent) annotation of nodes
|
| :param factor: Right or left factoring method (default = "right")
| :type factor: str = [left|right]
| :param horzMarkov: Markov order for sibling smoothing in artificial nodes (None (default) = include all siblings)
| :type horzMarkov: int | None
| :param vertMarkov: Markov order for parent smoothing (0 (default) = no vertical annotation)
| :type vertMarkov: int | None
| :param childChar: A string used in construction of the artificial nodes, separating the head of the
| original subtree from the child nodes that have yet to be expanded (default = "|")
| :type childChar: str
| :param parentChar: A string used to separate the node representation from its vertical annotation
| :type parentChar: str
|
| collapse_unary(self, collapsePOS=False, collapseRoot=False, joinChar=u'+')
| Collapse subtrees with a single child (ie. unary productions)
| into a new non-terminal (Tree node) joined by 'joinChar'.
| This is useful when working with algorithms that do not allow
| unary productions, and completely removing the unary productions
| would require loss of useful information. The Tree is modified
| directly (since it is passed by reference) and no value is returned.
|
| :param collapsePOS: 'False' (default) will not collapse the parent of leaf nodes (ie.
| Part-of-Speech tags) since they are always unary productions
| :type collapsePOS: bool
| :param collapseRoot: 'False' (default) will not modify the root production
| if it is unary. For the Penn WSJ treebank corpus, this corresponds
| to the TOP -> productions.
| :type collapseRoot: bool
| :param joinChar: A string used to connect collapsed node values (default = "+")
| :type joinChar: str
|
| copy(self, deep=False)
|
| draw(self)
| Open a new window containing a graphical diagram of this tree.
|
| flatten(self)
| Return a flat version of the tree, with all non-root non-terminals removed.
|
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> print(t.flatten())
| (S the dog chased the cat)
|
| :return: a tree consisting of this tree's root connected directly to
| its leaves, omitting all intervening non-terminal nodes.
| :rtype: Tree
|
| freeze(self, leaf_freezer=None)
|
| height(self)
| Return the height of the tree.
|
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> t.height()
| 5
| >>> print(t[0,0])
| (D the)
| >>> t[0,0].height()
| 2
|
| :return: The height of this tree. The height of a tree
| containing no children is 1; the height of a tree
| containing only leaves is 2; and the height of any other
| tree is one plus the maximum of its children's
| heights.
| :rtype: int
|
| label(self)
| Return the node label of the tree.
|
| >>> t = Tree.fromstring('(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))')
| >>> t.label()
| 'S'
|
| :return: the node label (typically a string)
| :rtype: any
|
| leaf_treeposition(self, index)
| :return: The tree position of the ``index``-th leaf in this
| tree. I.e., if ``tp=self.leaf_treeposition(i)``, then
| ``self[tp]==self.leaves()[i]``.
|
| :raise IndexError: If this tree contains fewer than ``index+1``
| leaves, or if ``index<0``.
|
| leaves(self)
| Return the leaves of the tree.
|
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> t.leaves()
| ['the', 'dog', 'chased', 'the', 'cat']
|
| :return: a list containing this tree's leaves.
| The order reflects the order of the
| leaves in the tree's hierarchical structure.
| :rtype: list
|
| pformat(self, margin=70, indent=0, nodesep=u'', parens=u'()', quotes=False)
| :return: A pretty-printed string representation of this tree.
| :rtype: str
| :param margin: The right margin at which to do line-wrapping.
| :type margin: int
| :param indent: The indentation level at which printing
| begins. This number is used to decide how far to indent
| subsequent lines.
| :type indent: int
| :param nodesep: A string that is used to separate the node
| from the children. E.g., the default value ``':'`` gives
| trees like ``(S: (NP: I) (VP: (V: saw) (NP: it)))``.
|
| pformat_latex_qtree(self)
| Returns a representation of the tree compatible with the
| LaTeX qtree package. This consists of the string ``\Tree``
| followed by the tree represented in bracketed notation.
|
| For example, the following result was generated from a parse tree of
| the sentence ``The announcement astounded us``::
|
| \Tree [.I'' [.N'' [.D The ] [.N' [.N announcement ] ] ]
| [.I' [.V'' [.V' [.V astounded ] [.N'' [.N' [.N us ] ] ] ] ] ] ]
|
| See http://www.ling.upenn.edu/advice/latex.html for the LaTeX
| style file for the qtree package.
|
| :return: A latex qtree representation of this tree.
| :rtype: str
|
| pos(self)
| Return a sequence of pos-tagged words extracted from the tree.
|
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> t.pos()
| [('the', 'D'), ('dog', 'N'), ('chased', 'V'), ('the', 'D'), ('cat', 'N')]
|
| :return: a list of tuples containing leaves and pre-terminals (part-of-speech tags).
| The order reflects the order of the leaves in the tree's hierarchical structure.
| :rtype: list(tuple)
|
| pprint(self, **kwargs)
| Print a string representation of this Tree to 'stream'
|
| pretty_print(self, sentence=None, highlight=(), stream=None, **kwargs)
| Pretty-print this tree as ASCII or Unicode art.
| For explanation of the arguments, see the documentation for
| `nltk.treeprettyprinter.TreePrettyPrinter`.
|
| productions(self)
| Generate the productions that correspond to the non-terminal nodes of the tree.
| For each subtree of the form (P: C1 C2 ... Cn) this produces a production of the
| form P -> C1 C2 ... Cn.
|
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> t.productions()
| [S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased',
| NP -> D N, D -> 'the', N -> 'cat']
|
| :rtype: list(Production)
|
| set_label(self, label)
| Set the node label of the tree.
|
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> t.set_label("T")
| >>> print(t)
| (T (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))
|
| :param label: the node label (typically a string)
| :type label: any
|
| subtrees(self, filter=None)
| Generate all the subtrees of this tree, optionally restricted
| to trees matching the filter function.
|
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> for s in t.subtrees(lambda t: t.height() == 2):
| ... print(s)
| (D the)
| (N dog)
| (V chased)
| (D the)
| (N cat)
|
| :type filter: function
| :param filter: the function to filter all local trees
|
| treeposition_spanning_leaves(self, start, end)
| :return: The tree position of the lowest descendant of this
| tree that dominates ``self.leaves()[start:end]``.
| :raise ValueError: if ``end <= start``
|
| treepositions(self, order=u'preorder')
| >>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
| >>> t.treepositions() # doctest: +ELLIPSIS
| [(), (0,), (0, 0), (0, 0, 0), (0, 1), (0, 1, 0), (1,), (1, 0), (1, 0, 0), ...]
| >>> for pos in t.treepositions('leaves'):
| ... t[pos] = t[pos][::-1].upper()
| >>> print(t)
| (S (NP (D EHT) (N GOD)) (VP (V DESAHC) (NP (D EHT) (N TAC))))
|
| :param order: One of: ``preorder``, ``postorder``, ``bothorder``,
| ``leaves``.
|
| un_chomsky_normal_form(self, expandUnary=True, childChar=u'|', parentChar=u'^', unaryChar=u'+')
| This method modifies the tree in three ways:
|
| 1. Transforms a tree in Chomsky Normal Form back to its
| original structure (branching greater than two)
| 2. Removes any parent annotation (if it exists)
| 3. (optional) expands unary subtrees (if previously
| collapsed with collapseUnary(...) )
|
| :param expandUnary: Flag to expand unary or not (default = True)
| :type expandUnary: bool
| :param childChar: A string separating the head node from its children in an artificial node (default = "|")
| :type childChar: str
| :param parentChar: A sting separating the node label from its parent annotation (default = "^")
| :type parentChar: str
| :param unaryChar: A string joining two non-terminals in a unary production (default = "+")
| :type unaryChar: str
|
| unicode_repr = __repr__(self)
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| convert(cls, tree) from __builtin__.type
| Convert a tree between different subtypes of Tree. ``cls`` determines
| which class will be used to encode the new tree.
|
| :type tree: Tree
| :param tree: The tree that should be converted.
| :return: The new Tree.
|
| fromstring(cls, s, brackets=u'()', read_node=None, read_leaf=None, node_pattern=None, leaf_pattern=None, remove_empty_top_bracketing=False) from __builtin__.type
| Read a bracketed tree string and return the resulting tree.
| Trees are represented as nested brackettings, such as::
|
| (S (NP (NNP John)) (VP (V runs)))
|
| :type s: str
| :param s: The string to read
|
| :type brackets: str (length=2)
| :param brackets: The bracket characters used to mark the
| beginning and end of trees and subtrees.
|
| :type read_node: function
| :type read_leaf: function
| :param read_node, read_leaf: If specified, these functions
| are applied to the substrings of ``s`` corresponding to
| nodes and leaves (respectively) to obtain the values for
| those nodes and leaves. They should have the following
| signature:
|
| read_node(str) -> value
|
| For example, these functions could be used to process nodes
| and leaves whose values should be some type other than
| string (such as ``FeatStruct``).
| Note that by default, node strings and leaf strings are
| delimited by whitespace and brackets; to override this
| default, use the ``node_pattern`` and ``leaf_pattern``
| arguments.
|
| :type node_pattern: str
| :type leaf_pattern: str
| :param node_pattern, leaf_pattern: Regular expression patterns
| used to find node and leaf substrings in ``s``. By
| default, both nodes patterns are defined to match any
| sequence of non-whitespace non-bracket characters.
|
| :type remove_empty_top_bracketing: bool
| :param remove_empty_top_bracketing: If the resulting tree has
| an empty node label, and is length one, then return its
| single child instead. This is useful for treebank trees,
| which sometimes contain an extra level of bracketing.
|
| :return: A tree corresponding to the string representation ``s``.
| If this class method is called using a subclass of Tree,
| then it will return a tree of that type.
| :rtype: Tree
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| node
| Outdated method to access the node value; use the label() method instead.
|
| ----------------------------------------------------------------------
| Methods inherited from __builtin__.list:
|
| __contains__(...)
| x.__contains__(y) <==> y in x
|
| __delslice__(...)
| x.__delslice__(i, j) <==> del x[i:j]
|
| Use of negative indices is not supported.
|
| __getattribute__(...)
| x.__getattribute__('name') <==> x.name
|
| __getslice__(...)
| x.__getslice__(i, j) <==> x[i:j]
|
| Use of negative indices is not supported.
|
| __iadd__(...)
| x.__iadd__(y) <==> x+=y
|
| __imul__(...)
| x.__imul__(y) <==> x*=y
|
| __iter__(...)
| x.__iter__() <==> iter(x)
|
| __len__(...)
| x.__len__() <==> len(x)
|
| __reversed__(...)
| L.__reversed__() -- return a reverse iterator over the list
|
| __setslice__(...)
| x.__setslice__(i, j, y) <==> x[i:j]=y
|
| Use of negative indices is not supported.
|
| __sizeof__(...)
| L.__sizeof__() -- size of L in memory, in bytes
|
| append(...)
| L.append(object) -- append object to end
|
| count(...)
| L.count(value) -> integer -- return number of occurrences of value
|
| extend(...)
| L.extend(iterable) -- extend list by appending elements from the iterable
|
| index(...)
| L.index(value, [start, [stop]]) -> integer -- return first index of value.
| Raises ValueError if the value is not present.
|
| insert(...)
| L.insert(index, object) -- insert object before index
|
| pop(...)
| L.pop([index]) -> item -- remove and return item at index (default last).
| Raises IndexError if list is empty or index is out of range.
|
| remove(...)
| L.remove(value) -- remove first occurrence of value.
| Raises ValueError if the value is not present.
|
| reverse(...)
| L.reverse() -- reverse *IN PLACE*
|
| sort(...)
| L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
| cmp(x, y) -> -1, 0, 1
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from __builtin__.list:
|
| __hash__ = None
|
| __new__ = <built-in method __new__ of type object>
| T.__new__(S, ...) -> a new object with type S, a subtype of T
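A few of the methods listed in the help text are easier to understand when tried out on a small tree. The following sketch uses the example tree from the docstrings and only calls methods documented above; the comments describe what each call returns.

```python
from nltk import Tree

t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")

# A tree position is a sequence of child indices read from the root down,
# so t[1, 1] is the same node as t[1][1].
print(t[1, 1])

# Tree position of the third leaf ('chased').
print(t.leaf_treeposition(2))

# Position of the lowest node that dominates leaves 3 and 4 ('the cat').
print(t.treeposition_spanning_leaves(3, 5))

# len() counts direct children only; height() counts levels including leaves.
print(len(t), t.height())

# Leaves, (word, tag) pairs, and the productions used in the tree.
print(t.leaves())
print(t.pos())
print(t.productions())
```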
Exercise 5
In [24]:
# a
tree1 = Tree('NP',
             [Tree('JJ', ['old']),
              Tree('NP',
                   [Tree('N', ['men']), Tree('Conj', ['and']), Tree('N', ['women'])])])
print(tree1)
(NP (JJ old) (NP (N men) (Conj and) (N women)))
In [25]:
tree2 = Tree('NP',
             [Tree('NP',
                   [Tree('JJ', ['old']), Tree('N', ['men'])]),
              Tree('Conj', ['and']),
              Tree('NP', ['women'])])
print(tree2)
(NP (NP (JJ old) (N men)) (Conj and) (NP women))
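The same two trees can also be written in bracketed notation and read with `Tree.fromstring`, which is often easier to check by eye than nested constructor calls. This is a small sketch; the names `tree1_str` and `tree2_str` are introduced here only for the comparison.

```python
from nltk import Tree

# Constructor form, as in the cell above.
tree1 = Tree('NP',
             [Tree('JJ', ['old']),
              Tree('NP',
                   [Tree('N', ['men']), Tree('Conj', ['and']), Tree('N', ['women'])])])

# Bracketed-string form of both readings.
tree1_str = Tree.fromstring("(NP (JJ old) (NP (N men) (Conj and) (N women)))")
tree2_str = Tree.fromstring("(NP (NP (JJ old) (N men)) (Conj and) (NP women))")

# The two notations build equal trees.
print(tree1 == tree1_str)

# Both readings cover the same words but differ in structure.
print(tree1_str.leaves() == tree2_str.leaves())
print(tree1_str == tree2_str)
```

The last two lines make the ambiguity explicit: identical leaves, different trees.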
In [22]:
# b
# The string has a single pair of outer brackets so that S is the root label.
tree3 = Tree.fromstring("(S (NP I) (VP (VP (V shot) (NP (Det an) (N elephant))) (PP (P in) (NP (Det my) (N pajamas)))))")
tree3.draw()
In [23]:
# c
tree4 = Tree('S',
             [Tree('NP',
                   [Tree('Det', ['The']), Tree('N', ['woman'])]),
              Tree('VP',
                   [Tree('V', ['saw']),
                    Tree('NP',
                         [Tree('Det', ['a']), Tree('N', ['man'])]),
                    Tree('NP',
                         [Tree('JJ', ['last']), Tree('N', ['Thursday'])])])])
print(tree4)
tree4.draw()
(S
  (NP (Det The) (N woman))
  (VP (V saw) (NP (Det a) (N man)) (NP (JJ last) (N Thursday))))
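`draw()` opens a graphical window, which is not available in every environment (for example on a remote server). The `pretty_print()` method listed in the help output for `Tree` renders the tree as text instead. A minimal sketch, rebuilding the tree from its bracketed form so that the snippet runs on its own:

```python
from nltk import Tree

tree4 = Tree.fromstring(
    "(S (NP (Det The) (N woman)) "
    "(VP (V saw) (NP (Det a) (N man)) (NP (JJ last) (N Thursday))))")

# Text rendering of the tree; no display is required.
tree4.pretty_print()

# The VP has three children here: the verb and two NPs.
print(len(tree4[1]))
```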
Content source: JuliaNeumann/nltk_book_exercises