A description of ExportXML (TüBa-D/Z Version 8.0)

root node: <exml-doc>

The root contains a <schema> node, which contains descriptions of all the annotations used in the corpus.
In addition, the root contains a <body> node, which contains all <text> nodes.
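
The layout described above can be walked with Python's standard library. The following is only a minimal sketch: SAMPLE is a tiny stand-in for a real corpus file, and iter_texts is a hypothetical helper name, not part of any ExportXML tooling.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI,
# so xml:id has to be looked up under the expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumption: a drastically shortened document with the same nesting
# (<exml-doc> -> <schema>, <body> -> <text>) as described above.
SAMPLE = """<exml-doc>
 <schema/>
 <body>
  <text xml:id="text_0" origin="T990507.2">
   <sentence xml:id="s1"/>
  </text>
 </body>
</exml-doc>"""


def iter_texts(root):
    """Yield (id, origin) for every <text> directly below <body>."""
    body = root.find("body")
    for text in body.findall("text"):
        yield text.get(XML_ID), text.get("origin")


root = ET.fromstring(SAMPLE)
texts = list(iter_texts(root))
print(texts)
```

For a full corpus file, ET.iterparse would probably be a better fit than loading everything at once, since the file is large.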

<text>

Each <text> node contains a complete document (i.e. a newspaper article).

<text xml:id="text_0" origin="T990507.2">

Here's a list of all the elements that a text can contain as children (sorted by decreasing frequency):
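
A frequency list of this kind could be computed roughly as follows. This is a sketch under assumptions: SAMPLE is a simplified <text> with empty children, and child_tag_counts is a hypothetical helper name.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Assumption: a simplified <text>; real texts contain far more children.
SAMPLE = """<text xml:id="text_0" origin="T990507.2">
 <sentence xml:id="s1"/>
 <sentence xml:id="s2"/>
 <edu-range xml:id="edus37_8_0-8_1"/>
</text>"""


def child_tag_counts(elem):
    """Count the tag names of the direct children of elem."""
    return Counter(child.tag for child in elem)


text = ET.fromstring(SAMPLE)
# most_common() sorts by decreasing frequency.
for tag, count in child_tag_counts(text).most_common():
    print(tag, count)
```

The same helper applied to a <sentence>, <word> or <topic> element gives the corresponding list of direct children for those element types.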

<sentence>

<sentence xml:id="s1">

possible direct children of <sentence>:

other descendants of <sentence>:

<word>

A <word> element describes a token.

<word xml:id="s1_1" form="Veruntreute" pos="VVFIN" morph="3sit" lemma="veruntreuen"  
      func="HD" parent="s1_500" deprel="ROOT"/>
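
The dependency information of a token sits in its dephead and deprel attributes. A minimal sketch of reading it, assuming that a missing dephead (as in the example above) marks the root of the dependency tree; the <sample> wrapper and the function name are hypothetical, and some attributes are omitted for brevity:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# <sample> is a hypothetical wrapper around two <word> elements that
# appear in the examples of this document.
SAMPLE = """<sample>
 <word xml:id="s1_1" form="Veruntreute" pos="VVFIN" lemma="veruntreuen"
       func="HD" parent="s1_500" deprel="ROOT"/>
 <word xml:id="s3_2" form="Ute" pos="NE" lemma="Ute"
       func="-" parent="s3_501" dephead="s3_1" deprel="APP"/>
</sample>"""


def dependency_edges(root):
    """Return (dependent_id, head_id, deprel) per token.

    head_id is None when the token has no dephead attribute.
    """
    return [
        (word.get(XML_ID), word.get("dephead"), word.get("deprel"))
        for word in root.iter("word")
    ]


edges = dependency_edges(ET.fromstring(SAMPLE))
print(edges)
```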

A <word> element might contain additional features as children:

<node>

A <node> describes an element of a syntax tree. The root <node> element does not have a parent attribute,
while non-root nodes do:

<node xml:id="s1_505" cat="SIMPX" func="--">
    <node xml:id="s1_501" cat="LK" func="-" parent="s1_505">
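
Because every non-root <node> names its parent, the constituent tree can be indexed from the parent attributes alone. The sketch below assumes a hypothetical <sentence> fragment built from the two nodes above; index_tree is an illustrative helper name.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumption: a closed-off version of the excerpt above, without words.
SAMPLE = """<sentence xml:id="s1">
 <node xml:id="s1_505" cat="SIMPX" func="--">
  <node xml:id="s1_501" cat="LK" func="-" parent="s1_505"/>
 </node>
</sentence>"""


def index_tree(sentence):
    """Return (root_ids, children) where children maps parent id -> child ids."""
    roots = []
    children = defaultdict(list)
    for node in sentence.iter("node"):
        node_id = node.get(XML_ID)
        parent_id = node.get("parent")
        if parent_id is None:
            roots.append(node_id)      # no parent attribute: a root node
        else:
            children[parent_id].append(node_id)
    return roots, dict(children)


roots, children = index_tree(ET.fromstring(SAMPLE))
print(roots, children)
```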

<ne>

<node name="ne" locality="sentence">
  <enum-attr name="type">
   <val name="PER" description="Person"/>
   <val name="ORG" description="Organisation"/>
   <val name="GPE" description="Gebietskörperschaft"/>
   <val name="LOC" description="Ort"/>
   <val name="OTH" description="andere Eigennamen"/>
  </enum-attr>
 </node>

An <ne> element describes a named entity (a span of one or more nodes or words).

<ne xml:id="ne_23" type="PER">
     <word xml:id="s3_2" form="Ute" pos="NE" morph="nsf" lemma="Ute" func="-" parent="s3_501" dephead="s3_1" deprel="APP"/>
     <word xml:id="s3_3" form="Wedemeier" pos="NE" morph="nsf" lemma="Wedemeier" func="-" parent="s3_501" dephead="s3_2" deprel="APP"/>
    </ne>
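
The surface string of a named entity can be recovered by joining the form attributes of all <word> descendants. A minimal sketch, with a hypothetical <sample> wrapper and helper name, and with some word attributes omitted:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<sample>
 <ne xml:id="ne_23" type="PER">
  <word xml:id="s3_2" form="Ute" pos="NE" lemma="Ute" parent="s3_501"/>
  <word xml:id="s3_3" form="Wedemeier" pos="NE" lemma="Wedemeier" parent="s3_501"/>
 </ne>
</sample>"""


def named_entities(root):
    """Return (ne_id, type, text) for each <ne>, in document order."""
    return [
        (
            ne.get(XML_ID),
            ne.get("type"),
            # iter() also reaches words nested inside <node> children.
            " ".join(word.get("form") for word in ne.iter("word")),
        )
        for ne in root.iter("ne")
    ]


entities = named_entities(ET.fromstring(SAMPLE))
print(entities)
```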

<edu>

  • the EDU that is arg1 of a discourse relation has a <discRel> child;
    the arg2 EDU does not and is referenced via the arg2 attribute

<edu xml:id="edu_55_21_1">
         <discRel relation="Explanation-Cause" marking="-|*um zu" arg2="edu_55_21_2"/>
         <word xml:id="s905_9" form="und" pos="KON" lemma="und" func="-" parent="s905_526" dephead="s905_3" deprel="KON"/>
         <node xml:id="s905_525" cat="FKONJ" func="KONJ" parent="s905_526" span="s905_10..s905_19">

...

       <edu xml:id="edu_55_21_2" span="s905_14..s905_20">
        <node xml:id="s905_524" cat="NF" func="-" parent="s905_525">

<edu-range>

  • can be a child of <text> or <sentence>

  • <edu-range> seems to glue together a number of <edu> elements,
    which may be scattered over a number of sentences

  • <edu-range> may or may not contain a span attribute
    (it seems that the span attribute is present when <edu-range> is
    a descendant of <sentence>)

<edu-range xml:id="edus9_3_1-5_0" span="s128_4..s130_7">
    <node xml:id="s128_525" cat="SIMPX" func="--">
     <edu xml:id="edu_9_3_1">
      <discRel relation="Continuation" marking="-" arg2="edu_9_3_2"/>
      <node xml:id="s128_506" cat="VF" func="-" parent="s128_525">
       <node xml:id="s128_505" cat="NX" func="ON" parent="s128_506">
        <relation type="expletive"/>
        <word xml:id="s128_4" form="Es" pos="PPER" morph="nsn3" lemma="es" func="HD" parent="s128_505" dephead="s128_5" deprel="SUBJ"/>
       </node>
      </node>

...

  <edu-range xml:id="edus37_8_0-8_1">
   <discRel relation="Restatement" marking="-" arg2="edu_37_9_0"/>
   <sentence xml:id="s660">
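
Since the span attribute is optional, code that reads it has to cope with its absence. The sketch below assumes that a span always has the form first..last, as in the examples above; other forms are not covered and raise an error. parse_span is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_span(value):
    """Split 'first..last' into (first, last); return None if there is no span."""
    if not value:
        return None
    first, sep, last = value.partition("..")
    if not sep:
        # Assumption: only the 'first..last' form seen in the examples is handled.
        raise ValueError(f"unexpected span format: {value!r}")
    return first, last


with_span = ET.fromstring(
    '<edu-range xml:id="edus9_3_1-5_0" span="s128_4..s130_7"/>'
)
without_span = ET.fromstring('<edu-range xml:id="edus37_8_0-8_1"/>')

print(parse_span(with_span.get("span")))
print(parse_span(without_span.get("span")))
```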

<topic>

<topic> describes the topic of a span (i.e. a sentence, EDU or EDU range)

A <topic> element can contain these children:

<topic xml:id="topic_9_0" description="Kuli">
  <sentence xml:id="s128">

 ...

 <topic xml:id="topic_37_1" description="Die Pläne der AG">
  <edu-range xml:id="edus37_8_0-8_1">
   <discRel relation="Restatement" marking="-" arg2="edu_37_9_0"/>
   <sentence xml:id="s660">

<relation>

<edge name="relation" parent="word|node">
  <enum-attr name="type">
   <val name="anaphoric" description="Anaphorisches Pronomen"/>
   <val name="cataphoric" description="Kataphorisches Pronomen"/>
   <val name="coreferential" description="Diskurs-altes nicht-Pronomen"/>
  </enum-attr>
  <node-ref name="target"/>
</edge>

A <relation> always has a type attribute and inherits its ID from its parent element:

<node xml:id="s29_501" cat="NX" func="ON" parent="s29_523">
       <relation type="expletive"/>
       <word xml:id="s29_2" form="es" pos="PPER" morph="nsn3" lemma="es" func="HD" parent="s29_501" dephead="s29_14" deprel="SUBJ"/>
      </node>

In the case of a non-expletive relation, it also has a target attribute:

<node xml:id="s4_507" cat="NX" func="ON" parent="s4_513">
      <relation type="coreferential" target="s1_502"/>
      <node xml:id="s4_505" cat="NX" func="HD" parent="s4_507">
       <word xml:id="s4_4" form="die" pos="ART" morph="nsf" lemma="die" func="-" parent="s4_505" dephead="s4_5" deprel="DET"/>
       <ne xml:id="ne_32" type="ORG">
        <word xml:id="s4_5" form="Arbeiterwohlfahrt" pos="NN" morph="nsf" lemma="Arbeiterwohlfahrt" func="HD" parent="s4_505" dephead="s4_3" deprel="SUBJ"/>
       </ne>
      </node>
      <node xml:id="s4_506" cat="NX" func="-" parent="s4_507">
       <ne xml:id="ne_33" type="GPE">
        <word xml:id="s4_6" form="Bremen" pos="NE" morph="nsn" lemma="Bremen" func="HD" parent="s4_506" dephead="s4_5" deprel="APP"/>
       </ne>
      </node>
     </node>
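
Because a <relation> has no ID of its own, the ID of its parent element has to be used as the source of the link. A minimal sketch of collecting such links; the <sample> wrapper and the function name are hypothetical, and the two nodes are shortened versions of the examples above.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<sample>
 <node xml:id="s29_501" cat="NX" func="ON" parent="s29_523">
  <relation type="expletive"/>
  <word xml:id="s29_2" form="es" pos="PPER" lemma="es" func="HD" parent="s29_501"/>
 </node>
 <node xml:id="s4_507" cat="NX" func="ON" parent="s4_513">
  <relation type="coreferential" target="s1_502"/>
 </node>
</sample>"""


def relation_links(root):
    """Return (source_id, type, target_id); target_id is None for expletives."""
    links = []
    for parent in root.iter():
        # findall() only looks at direct children, so each <relation>
        # is attributed to the element that actually contains it.
        for rel in parent.findall("relation"):
            links.append((parent.get(XML_ID), rel.get("type"), rel.get("target")))
    return links


links = relation_links(ET.fromstring(SAMPLE))
print(links)
```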

<secEdge>

<edge name="secEdge" parent="word|node">
  <enum-attr name="cat">
   <val name="UNKNOWN" description="unbekanntes sekundäres Kantenlabel"/>
   <val name="refcontr" description="Dependenzrelation zw. Kontrollverb u. seinem Komplement"/>
   <val name="refint" description="Dependenzrel. zw. phraseninternem Teil u. dessen Modifikator"/>
   <val name="refmod" description="Dependenzrelation bei ambiger Modifikation"/>
   <val name="refvc" description="Dependenzrelation zw. zwei verbalen Objekten im Verbkomplex"/>
  </enum-attr>
  <node-ref name="parent"/>
</edge>

A <secEdge> element has a cat and a parent attribute, but inherits its ID from its parent element.
It describes a secondary edge in a tree-like syntax representation.

<node xml:id="s10_505" cat="VXINF" func="OV" parent="s10_507">
        <secEdge cat="refvc" parent="s10_504"/>
        <word xml:id="s10_6" form="worden" pos="VAPP" lemma="werden%passiv" func="HD" parent="s10_505" dephead="s10_7" deprel="AUX"/>
       </node>
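
A node with a <secEdge> child therefore has more than one parent in the syntax graph. The following sketch lists the primary parent first and the secondary ones after it; parents_of is a hypothetical helper name, and the <word> child is shortened.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node xml:id="s10_505" cat="VXINF" func="OV" parent="s10_507">
 <secEdge cat="refvc" parent="s10_504"/>
 <word xml:id="s10_6" form="worden" pos="VAPP" func="HD" parent="s10_505"/>
</node>"""


def parents_of(elem):
    """Return the primary parent id followed by all secondary-edge parent ids."""
    parents = []
    primary = elem.get("parent")
    if primary is not None:
        parents.append(primary)
    parents.extend(edge.get("parent") for edge in elem.findall("secEdge"))
    return parents


node = ET.fromstring(SAMPLE)
print(parents_of(node))
```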

<connective>

<edge name="connective" parent="word">
  <text-attr name="konn"/>
  <enum-attr name="rel1">
   <val name="Temporal" description="temporal contiguity"/>
   <val name="cause" description="strong causal relation"/>
   <val name="enable" description="weak causal relation"/>
   <val name="evidence" description="argumentative reasoning"/>
   <val name="speech_act" description="circumstances for a speech act (causal)"/>
   <val name="Result" description="causal relation (underspecified)"/>
   <val name="Comparison" description="comparison relation (underspecified)"/>
   <val name="parallel" description="parallel"/>
   <val name="contrast" description="contrast"/>
   <val name="Condition" description="conditional"/>
   <val name="NonFactual" description="counterfactual bevor"/>
   <val name="Concession" description="concessive relation (underspecified)"/>
   <val name="contraexpectation" description="denial-of-expectation"/>
   <val name="antithesis" description="antithesis/Bewertungskontrast"/>
  </enum-attr>
  <enum-attr name="rel2">
   <val name="Temporal" description="temporal contiguity"/>
   <val name="cause" description="strong causal relation"/>
   <val name="enable" description="weak causal relation"/>
   <val name="evidence" description="argumentative reasoning"/>
   <val name="speech_act" description="circumstances for a speech act (causal)"/>
   <val name="Result" description="causal relation (underspecified)"/>
   <val name="Comparison" description="comparison relation (underspecified)"/>
   <val name="parallel" description="parallel"/>
   <val name="contrast" description="contrast"/>
   <val name="Condition" description="conditional"/>
   <val name="NonFactual" description="counterfactual bevor"/>
   <val name="Concession" description="concessive relation (underspecified)"/>
   <val name="contraexpectation" description="denial-of-expectation"/>
   <val name="antithesis" description="antithesis/Bewertungskontrast"/>
  </enum-attr>
</edge>

A <connective> is an annotation of a <word>. It has a konn attribute and one or two relation attributes (rel1 and, optionally, rel2).

<word xml:id="s29_1" form="Als" pos="KOUS" lemma="als" func="-" parent="s29_500" dephead="s29_14" deprel="KONJ">
       <connective konn="als" rel1="Temporal" rel2="enable"/>
      </word>
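
Reading a connective annotation amounts to collecting whichever of rel1 and rel2 are present. A minimal sketch; connective_info is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<word form="Als" pos="KOUS" lemma="als" func="-" parent="s29_500"
      dephead="s29_14" deprel="KONJ">
 <connective konn="als" rel1="Temporal" rel2="enable"/>
</word>"""


def connective_info(word):
    """Return (konn, [relations]) or None if the word is not a connective."""
    conn = word.find("connective")
    if conn is None:
        return None
    # Keep only the relation attributes that are actually set.
    rels = [conn.get(name) for name in ("rel1", "rel2") if conn.get(name)]
    return conn.get("konn"), rels


word = ET.fromstring(SAMPLE)
print(connective_info(word))
```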

<discRel>

<edge name="discRel" parent="edu|topic|edu-range">
  <enum-attr name="relation">
  </enum-attr>
  <enum-attr name="marking">
  </enum-attr>
  <node-ref name="arg2"/>
</edge>

A <discRel> describes the relation between two EDUs. The ID of the other EDU is given in the arg2 attribute.
Note that arg2 can reference either an EDU (e.g. edu_9_3_2) or an EDU range (e.g. edus9_3_1-5_0).

<edu xml:id="edu_9_3_0">
    <discRel relation="Explanation-Speechact" marking="-" arg2="edus9_3_1-5_0"/>
    <node xml:id="s128_504" cat="SIMPX" func="--">
     <node xml:id="s128_501" cat="MF" func="-" parent="s128_504">
      <node xml:id="s128_500" cat="NX" func="ON" parent="s128_501">
       <word xml:id="s128_1" form="Kulisammler" pos="NN" morph="npm" lemma="Kulisammler" func="HD" parent="s128_500" dephead="s128_2" deprel="SUBJ"/>
      </node>
     </node>
     <node xml:id="s128_503" cat="VC" func="-" parent="s128_504">
      <node xml:id="s128_502" cat="VXINF" func="HD" parent="s128_503">
       <word xml:id="s128_2" form="aufgepaßt" pos="VVPP" lemma="auf#passen" func="HD" parent="s128_502" deprel="ROOT"/>
      </node>
     </node>
    </node>
    <word xml:id="s128_3" form=":" pos="$." lemma=":" func="--" deprel="ROOT"/>
   </edu>
<edu xml:id="edu_9_3_1">
      <discRel relation="Continuation" marking="-" arg2="edu_9_3_2"/>
      <node xml:id="s128_506" cat="VF" func="-" parent="s128_525">
       <node xml:id="s128_505" cat="NX" func="ON" parent="s128_506">
        <relation type="expletive"/>
        <word xml:id="s128_4" form="Es" pos="PPER" morph="nsn3" lemma="es" func="HD" parent="s128_505" dephead="s128_5" deprel="SUBJ"/>
       </node>
      </node>
      ...
     </edu>
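
Since arg2 may point to an EDU or to an EDU range, resolving it needs an index over both element types. The sketch below works on a stripped-down, partly hypothetical fragment: the nesting of edu_9_3_2 inside the range and the <sample> wrapper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<sample>
 <edu xml:id="edu_9_3_0">
  <discRel relation="Explanation-Speechact" marking="-" arg2="edus9_3_1-5_0"/>
 </edu>
 <edu-range xml:id="edus9_3_1-5_0" span="s128_4..s130_7">
  <edu xml:id="edu_9_3_1">
   <discRel relation="Continuation" marking="-" arg2="edu_9_3_2"/>
  </edu>
  <edu xml:id="edu_9_3_2"/>
 </edu-range>
</sample>"""


def discourse_relations(root):
    """Return (arg1_id, relation, arg2_id, arg2_tag) for every <discRel>."""
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    result = []
    for arg1 in root.iter():
        for rel in arg1.findall("discRel"):
            target = by_id.get(rel.get("arg2"))
            # Compare with None explicitly: childless elements are falsy.
            tag = target.tag if target is not None else None
            result.append(
                (arg1.get(XML_ID), rel.get("relation"), rel.get("arg2"), tag)
            )
    return result


relations = discourse_relations(ET.fromstring(SAMPLE))
print(relations)
```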

<splitRelation>

<edge name="splitRelation" parent="word|node">
  <enum-attr name="type">
  </enum-attr>
  <text-attr name="target"/>
</edge>

A <splitRelation> annotates its parent element (e.g. as an anaphor). Its parent can be either a <word> or a <node>.
A <splitRelation> has a target attribute, which lists the IDs of several targets (e.g. antecedents), separated by spaces.

<node xml:id="s2527_528" cat="NX" func="-" parent="s2527_529">
         <splitRelation type="split_antecedent" target="s2527_504 s2527_521"/>
         <word xml:id="s2527_32" form="beider" pos="PIDAT" morph="gpf" lemma="beide" func="-" parent="s2527_528" dephead="s2527_33" deprel="DET"/>
         <word xml:id="s2527_33" form="Firmen" pos="NN" morph="gpf" lemma="Firma" func="HD" parent="s2527_528" dephead="s2527_31" deprel="GMOD"/>
        </node>
<word xml:id="s3456_12" form="ihr" pos="PPOSAT" morph="nsm" lemma="ihr" func="-" parent="s3456_507" dephead="s3456_14" deprel="DET">
         <splitRelation type="split_antecedent" target="s3456_505 s3456_9"/>
        </word>
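
Unlike the target of a <relation>, the target of a <splitRelation> has to be split into several IDs. A minimal sketch, assuming the IDs are separated by whitespace as in the two examples above; split_targets is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<node xml:id="s2527_528" cat="NX" func="-" parent="s2527_529">
 <splitRelation type="split_antecedent" target="s2527_504 s2527_521"/>
</node>"""


def split_targets(elem):
    """Return the list of target ids of elem's <splitRelation>, if any."""
    rel = elem.find("splitRelation")
    if rel is None:
        return []
    return rel.get("target", "").split()


node = ET.fromstring(SAMPLE)
print(split_targets(node))
```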
