TODOs

Which node is the root of a URML document?

hypRelation vs. parRelation

  • hypRelation (hypotactic): nucleus-satellite
  • parRelation (paratactic): nucleus-nucleus
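
To make the nucleus-satellite vs. nucleus-nucleus distinction concrete, here is a minimal sketch that lists the child roles of each relation element. It uses the standard library's `xml.etree.ElementTree` rather than `lxml` so that it runs without extra dependencies. The fragment, its `doc.*` IDs and the `<satellite>` child of `hypRelation` are assumptions made for illustration; only the `parRelation` layout with two `<nucleus>` children appears in this notebook.

```python
import xml.etree.ElementTree as ET

# Hypothetical URML fragment. The <satellite> child of <hypRelation> and
# all "doc.*" IDs are assumptions made for this illustration.
URML_SNIPPET = """
<analysis>
  <hypRelation id="doc.1001" type="justify">
    <nucleus id="doc.1"/>
    <satellite id="doc.2"/>
  </hypRelation>
  <parRelation id="doc.1000" type="sequence">
    <nucleus id="doc.3"/>
    <nucleus id="doc.4"/>
  </parRelation>
</analysis>
"""


def relation_roles(xml_string):
    """Map each relation ID to (relation tag, list of child role tags)."""
    root = ET.fromstring(xml_string)
    roles = {}
    for rel in root:
        if rel.tag in ('hypRelation', 'parRelation'):
            roles[rel.get('id')] = (rel.tag, [child.tag for child in rel])
    return roles


roles = relation_roles(URML_SNIPPET)
for rel_id in sorted(roles):
    print(rel_id, roles[rel_id])
```

Under these assumptions, a `hypRelation` should report one `nucleus` and one `satellite`, while a `parRelation` reports only `nucleus` children.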

In [1]:
import os
from lxml import etree

urml_filepath = 'urml-pq-041104.xml'
# urml_filepath = os.path.join(dg.DATA_ROOT_DIR, 'urml-example.xml')

In [2]:
%load_ext gvmagic

In [3]:
import discoursegraphs as dg

In [16]:
urml_corpus = dg.read_urml(urml_filepath, tokenize=False)

In [17]:
len(urml_corpus)


Out[17]:
172

In [6]:
# take the first document graph from the corpus
udg = next(urml_corpus)

%dotstr dg.print_dot(udg)


[Graph rendering of document maz3377: relation nodes urml:maz3377.1000 to urml:maz3377.1009, EDU nodes urml:maz3377.0 to urml:maz3377.12, and one token node per word attached to its EDU.]
Relation edges shown in the graph (all IDs prefixed with urml:maz3377.):
  1002 -justify-> 11; 11 -justify-> 1003
  1003 -contrast-> 1004, 1000; 1000 -sequence-> 1, 2
  1004 -contrast-> 1001, 1006; 1001 -conjunction-> 9, 10
  1006 -elaboration-> 1007; 1007 -joint-> 3, 4; 1007 -elaboration-> 1005
  1005 -list-> 1008, 1009; 1008 -contrast-> 5, 6; 1009 -contrast-> 7, 8

In [7]:
udg.node[udg.root]


Out[7]:
{'layers': {'urml', 'urml:relation'},
 'metadata': defaultdict(<function discoursegraphs.discoursegraph.<lambda>>,
             {'urml:edus': ['urml:maz3377.0',
               'urml:maz3377.1',
               'urml:maz3377.2',
               'urml:maz3377.3',
               'urml:maz3377.4',
               'urml:maz3377.5',
               'urml:maz3377.6',
               'urml:maz3377.7',
               'urml:maz3377.8',
               'urml:maz3377.9',
               'urml:maz3377.10',
               'urml:maz3377.11',
               'urml:maz3377.12']}),
 'urml:rel_name': 'justify',
 'urml:rel_type': 'hypRelation'}

In [8]:
udg.root


Out[8]:
'urml:maz3377.1002'
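
As a cross-check on the root question, one simple heuristic is to look for nodes that never occur as the target of an edge. The sketch below applies it to a handful of relation edges copied by hand from the graph rendered for maz3377. The helper `root_candidates` is hypothetical, and this is not necessarily how `discoursegraphs` determines `udg.root`.

```python
def root_candidates(edges):
    """Return, sorted, all nodes that never occur as an edge target.

    Nodes without any edge at all are not covered by this helper.
    """
    sources = {src for src, _ in edges}
    targets = {tgt for _, tgt in edges}
    return sorted(sources - targets)


# Subset of the relation edges of maz3377, copied from the rendered graph.
edges = [
    ('urml:maz3377.1002', 'urml:maz3377.11'),
    ('urml:maz3377.11', 'urml:maz3377.1003'),
    ('urml:maz3377.1003', 'urml:maz3377.1004'),
    ('urml:maz3377.1003', 'urml:maz3377.1000'),
    ('urml:maz3377.1000', 'urml:maz3377.1'),
    ('urml:maz3377.1000', 'urml:maz3377.2'),
]

candidates = root_candidates(edges)
print(candidates)
```

On this subset the only candidate is the node reported by `udg.root`. On the full graph, nodes without any incoming relation edge would also have to be considered.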

In [10]:
urml_minimal_doc = """
<document id="maz3377">
  <info>
  </info>
  <text>
    <segment id="maz3377.1">Erst rührt sich niemand unter den Dallgower Kommunalpolitikern , </segment>
    <segment id="maz3377.2">nun überschlagen sich alle mit Anträgen zur Gemeindereform und den vorausgehenden Verhandlungen mit den südlichen Nachbarn . </segment>
  </text>
  <analysis status="interpretation">
    <parRelation id="maz3377.1000" type="sequential">
      <nucleus id="maz3377.1"/>
      <nucleus id="maz3377.2"/>
    </parRelation>
  </analysis>
</document>
"""

In [11]:
from lxml import etree

urml_min_doc_etree = etree.fromstring(urml_minimal_doc)
udg_min = dg.readwrite.URMLDocumentGraph(urml_min_doc_etree, tokenize=False)

In [12]:
%dotstr dg.print_dot(udg_min)


[Graph rendering of the minimal document maz3377: relation node urml:maz3377.1000 with urml:sequential edges to the EDU nodes urml:maz3377.1 ("Erst rührt sich niem...") and urml:maz3377.2 ("nun überschlagen sic...").]

TODOs

RST in general: add an option to enforce a single, fully connected tree (with exactly one root) per document.
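
A possible starting point for such an option is a plain check of the "one fully connected tree" condition: exactly one node without a parent, no node with more than one parent, and every node reachable from that root. The function name and the `(source, target)` edge-list representation below are assumptions for illustration, not part of the `discoursegraphs` API.

```python
def is_single_rooted_tree(nodes, edges):
    """Check whether a directed graph forms exactly one rooted tree.

    nodes: iterable of node IDs; edges: iterable of (source, target) pairs.
    """
    parents = {node: [] for node in nodes}
    children = {node: [] for node in nodes}
    for src, tgt in edges:
        # also register nodes that only show up as edge endpoints
        for node in (src, tgt):
            parents.setdefault(node, [])
            children.setdefault(node, [])
        parents[tgt].append(src)
        children[src].append(tgt)

    roots = [node for node, pars in parents.items() if not pars]
    if len(roots) != 1 or any(len(pars) > 1 for pars in parents.values()):
        return False

    # every node must be reachable from the single root
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return len(seen) == len(parents)


# The minimal document above: one relation node dominating two EDUs.
minimal_edges = [('urml:maz3377.1000', 'urml:maz3377.1'),
                 ('urml:maz3377.1000', 'urml:maz3377.2')]
print(is_single_rooted_tree(['urml:maz3377.1000'], minimal_edges))
```

An EDU that is not attached to any relation counts as a second root here, so a document containing one would fail the check.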

