<exml-doc>
The root contains a <schema> node, which contains descriptions of all the annotations used in the corpus.
In addition, the root contains a <body> node, which contains all <text> nodes.
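The layout described above can be explored with Python's standard library. This is a minimal sketch, assuming a tiny hand-written stand-in for a corpus file (the `SAMPLE` string is hypothetical, and the empty `<schema/>` is only a placeholder). Note that ElementTree exposes `xml:id` under the XML namespace, hence the `XML_ID` constant.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; a real corpus file is far larger.
SAMPLE = """<exml-doc>
  <schema/>
  <body>
    <text xml:id="text_0" origin="T990507.2"/>
  </body>
</exml-doc>"""

# ElementTree stores xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

root = ET.fromstring(SAMPLE)
schema = root.find("schema")        # annotation descriptions
texts = root.findall("body/text")   # one element per document
text_ids = [t.get(XML_ID) for t in texts]
print(root.tag, text_ids)
```

For a full corpus file, `ET.iterparse` may be a better fit than loading everything at once, since it lets you process one `<text>` at a time.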
<text>
Each <text> node contains a complete document (i.e. a newspaper article).
<text xml:id="text_0" origin="T990507.2">
Here's a list of all the elements that a text can contain as children (sorted by decreasing frequency):
<sentence>
<sentence xml:id="s1">
possible direct children of <sentence>:
other descendants of <sentence>:
<word>
A <word> element describes a token.
<word xml:id="s1_1" form="Veruntreute" pos="VVFIN" morph="3sit" lemma="veruntreuen"
func="HD" parent="s1_500" deprel="ROOT"/>
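As a quick illustration, the attributes of the <word> example above can be read into a plain dictionary. This is only a sketch: the choice of dictionary keys is mine, not part of the format. The example word carries `deprel="ROOT"` and no `dephead` attribute, so `get` returns `None` for it.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <word> example from above, as a string.
WORD = ('<word xml:id="s1_1" form="Veruntreute" pos="VVFIN" morph="3sit" '
        'lemma="veruntreuen" func="HD" parent="s1_500" deprel="ROOT"/>')

word = ET.fromstring(WORD)
token = {
    "id": word.get(XML_ID),
    "form": word.get("form"),
    "pos": word.get("pos"),
    "lemma": word.get("lemma"),
    "parent": word.get("parent"),    # ID of the dominating syntax <node>
    "dephead": word.get("dephead"),  # missing on this ROOT word, so None
    "deprel": word.get("deprel"),
}
print(token)
```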
A <word> element might contain additional features as children:
<node>
A <node> describes an element of a syntax tree.
The root <node> element does not have a parent attribute, while non-root nodes do:
<node xml:id="s1_505" cat="SIMPX" func="--">
<node xml:id="s1_501" cat="LK" func="-" parent="s1_505">
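The parent attribute is enough to rebuild the tree without relying on XML nesting. The sketch below assumes a hypothetical two-node sentence built from the snippet above (the nesting and the closing tags are my addition) and sorts nodes into roots and a parent-to-children map.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sentence; only the two <node> start tags come from the text above.
SAMPLE = """<sentence xml:id="s1">
  <node xml:id="s1_505" cat="SIMPX" func="--">
    <node xml:id="s1_501" cat="LK" func="-" parent="s1_505"/>
  </node>
</sentence>"""

sentence = ET.fromstring(SAMPLE)
roots = []                    # nodes without a parent attribute
children = defaultdict(list)  # parent ID -> list of child IDs

for node in sentence.iter("node"):
    node_id = node.get(XML_ID)
    parent_id = node.get("parent")
    if parent_id is None:
        roots.append(node_id)
    else:
        children[parent_id].append(node_id)

print(roots, dict(children))
```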
<ne>
<node name="ne" locality="sentence">
<enum-attr name="type">
<val name="PER" description="Person"/>
<val name="ORG" description="Organisation"/>
<val name="GPE" description="Gebietskörperschaft"/>
<val name="LOC" description="Ort"/>
<val name="OTH" description="andere Eigennamen"/>
</enum-attr>
</node>
An <ne> element describes a named entity (a span of one or more nodes or words):
<ne xml:id="ne_23" type="PER">
<word xml:id="s3_2" form="Ute" pos="NE" morph="nsf" lemma="Ute" func="-" parent="s3_501" dephead="s3_1" deprel="APP"/>
<word xml:id="s3_3" form="Wedemeier" pos="NE" morph="nsf" lemma="Wedemeier" func="-" parent="s3_501" dephead="s3_2" deprel="APP"/>
</ne>
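To get the surface string of a named entity, one can collect the form attributes of all words below the <ne> element. A minimal sketch using the example above; `iter("word")` also finds words nested inside <node> elements, which matters when the entity spans nodes rather than bare words.

```python
import xml.etree.ElementTree as ET

# The <ne> example from above, as a string.
NE = """<ne xml:id="ne_23" type="PER">
  <word xml:id="s3_2" form="Ute" pos="NE" morph="nsf" lemma="Ute" func="-" parent="s3_501" dephead="s3_1" deprel="APP"/>
  <word xml:id="s3_3" form="Wedemeier" pos="NE" morph="nsf" lemma="Wedemeier" func="-" parent="s3_501" dephead="s3_2" deprel="APP"/>
</ne>"""

ne = ET.fromstring(NE)
ne_type = ne.get("type")
# Join the word forms in document order.
surface = " ".join(w.get("form") for w in ne.iter("word"))
print(ne_type, surface)
```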
<edu>
The arg1 EDU has a <discRel> child; the arg2 EDU doesn't:
<edu xml:id="edu_55_21_1">
<discRel relation="Explanation-Cause" marking="-|*um zu" arg2="edu_55_21_2"/>
<word xml:id="s905_9" form="und" pos="KON" lemma="und" func="-" parent="s905_526" dephead="s905_3" deprel="KON"/>
<node xml:id="s905_525" cat="FKONJ" func="KONJ" parent="s905_526" span="s905_10..s905_19">
...
<edu xml:id="edu_55_21_2" span="s905_14..s905_20">
<node xml:id="s905_524" cat="NF" func="-" parent="s905_525">
<edu-range>
<edu-range> can be a child of <text> or <sentence>.
<edu-range> seems to glue together a number of <edu> elements, which may be scattered over a number of sentences.
<edu-range> may or may not contain a span attribute (it appears that the span attribute is present when <edu-range> is a descendant of <sentence>):
<edu-range xml:id="edus9_3_1-5_0" span="s128_4..s130_7">
<node xml:id="s128_525" cat="SIMPX" func="--">
<edu xml:id="edu_9_3_1">
<discRel relation="Continuation" marking="-" arg2="edu_9_3_2"/>
<node xml:id="s128_506" cat="VF" func="-" parent="s128_525">
<node xml:id="s128_505" cat="NX" func="ON" parent="s128_506">
<relation type="expletive"/>
<word xml:id="s128_4" form="Es" pos="PPER" morph="nsn3" lemma="es" func="HD" parent="s128_505" dephead="s128_5" deprel="SUBJ"/>
</node>
</node>
...
<edu-range xml:id="edus37_8_0-8_1">
<discRel relation="Restatement" marking="-" arg2="edu_37_9_0"/>
<sentence xml:id="s660">
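A span value such as `s128_4..s130_7` can be split into its first and last token ID. The helpers below are hypothetical and rest on two assumptions drawn only from the examples shown here: that a span has the form `first..last`, and that a token ID has the form `<sentence id>_<index>`. Other span shapes may exist in the corpus and are not handled.

```python
def parse_span(span):
    """Split 'first..last' into a (first, last) pair of token IDs."""
    first, sep, last = span.partition("..")
    if not sep:
        # Assumption: a value without '..' names a single token.
        return (span, span)
    return (first, last)


def split_token_id(token_id):
    """Split 's130_7' into the sentence ID 's130' and the token index 7."""
    sentence_id, _, index = token_id.rpartition("_")
    return sentence_id, int(index)


first, last = parse_span("s128_4..s130_7")
print(split_token_id(first), split_token_id(last))
```

Comparing the two sentence IDs is a cheap way to check whether a range crosses sentence boundaries, as this one does.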
<topic>
<topic> describes the topic of a span (i.e. a sentence, an EDU or an EDU range).
A <topic> element can contain these children:
<topic xml:id="topic_9_0" description="Kuli">
<sentence xml:id="s128">
...
<topic xml:id="topic_37_1" description="Die Pläne der AG">
<edu-range xml:id="edus37_8_0-8_1">
<discRel relation="Restatement" marking="-" arg2="edu_37_9_0"/>
<sentence xml:id="s660">
<relation>
<edge name="relation" parent="word|node">
<enum-attr name="type">
<val name="anaphoric" description="Anaphorisches Pronomen"/>
<val name="cataphoric" description="Kataphorisches Pronomen"/>
<val name="coreferential" description="Diskurs-altes nicht-Pronomen"/>
</enum-attr>
<node-ref name="target"/>
</edge>
A <relation> always has a type attribute and inherits its ID from its parent element:
<node xml:id="s29_501" cat="NX" func="ON" parent="s29_523">
<relation type="expletive"/>
<word xml:id="s29_2" form="es" pos="PPER" morph="nsn3" lemma="es" func="HD" parent="s29_501" dephead="s29_14" deprel="SUBJ"/>
</node>
In the case of a non-expletive relation, it also has a target attribute:
<node xml:id="s4_507" cat="NX" func="ON" parent="s4_513">
<relation type="coreferential" target="s1_502"/>
<node xml:id="s4_505" cat="NX" func="HD" parent="s4_507">
<word xml:id="s4_4" form="die" pos="ART" morph="nsf" lemma="die" func="-" parent="s4_505" dephead="s4_5" deprel="DET"/>
<ne xml:id="ne_32" type="ORG">
<word xml:id="s4_5" form="Arbeiterwohlfahrt" pos="NN" morph="nsf" lemma="Arbeiterwohlfahrt" func="HD" parent="s4_505" dephead="s4_3" deprel="SUBJ"/>
</ne>
</node>
<node xml:id="s4_506" cat="NX" func="-" parent="s4_507">
<ne xml:id="ne_33" type="GPE">
<word xml:id="s4_6" form="Bremen" pos="NE" morph="nsn" lemma="Bremen" func="HD" parent="s4_506" dephead="s4_5" deprel="APP"/>
</ne>
</node>
</node>
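Because a <relation> has no ID of its own, it is convenient to record it as a triple of (source ID, type, target). ElementTree elements do not know their parent, so the sketch below walks over candidate parents and looks at their direct <relation> children. The sample is a hypothetical reduction of the two examples above; the `s1_502` node is invented so that the target exists.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, stripped-down sample based on the examples above.
SAMPLE = """<text xml:id="text_0">
  <node xml:id="s1_502" cat="NX" func="ON"/>
  <node xml:id="s4_507" cat="NX" func="ON" parent="s4_513">
    <relation type="coreferential" target="s1_502"/>
  </node>
  <node xml:id="s29_501" cat="NX" func="ON" parent="s29_523">
    <relation type="expletive"/>
  </node>
</text>"""

text = ET.fromstring(SAMPLE)
links = []
for parent in text.iter():
    # findall only matches direct children, so each relation is seen once.
    for rel in parent.findall("relation"):
        links.append((parent.get(XML_ID), rel.get("type"), rel.get("target")))

print(links)
```

The expletive relation ends up with `None` as its target, which matches the description above.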
<secEdge>
<edge name="secEdge" parent="word|node">
<enum-attr name="cat">
<val name="UNKNOWN" description="unbekanntes sekundäres Kantenlabel"/>
<val name="refcontr" description="Dependenzrelation zw. Kontrollverb u. seinem Komplement"/>
<val name="refint" description="Dependenzrel. zw. phraseninternem Teil u. dessen Modifikator"/>
<val name="refmod" description="Dependenzrelation bei ambiger Modifikation"/>
<val name="refvc" description="Dependenzrelation zw. zwei verbalen Objekten im Verbkomplex"/>
</enum-attr>
<node-ref name="parent"/>
</edge>
A <secEdge> element has a cat and a parent attribute, but inherits its ID from its parent element.
It describes a secondary edge in a tree-like syntax representation.
<node xml:id="s10_505" cat="VXINF" func="OV" parent="s10_507">
<secEdge cat="refvc" parent="s10_504"/>
<word xml:id="s10_6" form="worden" pos="VAPP" lemma="werden%passiv" func="HD" parent="s10_505" dephead="s10_7" deprel="AUX"/>
</node>
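One way to look at this: a node can have one primary edge (its own parent attribute, labelled with func) plus any number of secondary edges. The sketch below collects both kinds for the example node as (head, dependent, label, kind) tuples; that tuple layout is my own choice, not something the format prescribes.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <secEdge> example from above, as a string.
NODE = """<node xml:id="s10_505" cat="VXINF" func="OV" parent="s10_507">
  <secEdge cat="refvc" parent="s10_504"/>
  <word xml:id="s10_6" form="worden" pos="VAPP" lemma="werden%passiv" func="HD" parent="s10_505" dephead="s10_7" deprel="AUX"/>
</node>"""

node = ET.fromstring(NODE)
node_id = node.get(XML_ID)

# Primary edge: taken from the node's own attributes.
edges = [(node.get("parent"), node_id, node.get("func"), "primary")]
# Secondary edges: one per <secEdge> child.
for sec in node.findall("secEdge"):
    edges.append((sec.get("parent"), node_id, sec.get("cat"), "secondary"))

print(edges)
```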
<connective>
<edge name="connective" parent="word">
<text-attr name="konn"/>
<enum-attr name="rel1">
<val name="Temporal" description="temporal contiguity"/>
<val name="cause" description="strong causal relation"/>
<val name="enable" description="weak causal relation"/>
<val name="evidence" description="argumentative reasoning"/>
<val name="speech_act" description="circumstances for a speech act (causal)"/>
<val name="Result" description="causal relation (underspecified)"/>
<val name="Comparison" description="comparison relation (underspecified)"/>
<val name="parallel" description="parallel"/>
<val name="contrast" description="contrast"/>
<val name="Condition" description="conditional"/>
<val name="NonFactual" description="counterfactual bevor"/>
<val name="Concession" description="concessive relation (underspecified)"/>
<val name="contraexpectation" description="denial-of-expectation"/>
<val name="antithesis" description="antithesis/Bewertungskontrast"/>
</enum-attr>
<enum-attr name="rel2">
<val name="Temporal" description="temporal contiguity"/>
<val name="cause" description="strong causal relation"/>
<val name="enable" description="weak causal relation"/>
<val name="evidence" description="argumentative reasoning"/>
<val name="speech_act" description="circumstances for a speech act (causal)"/>
<val name="Result" description="causal relation (underspecified)"/>
<val name="Comparison" description="comparison relation (underspecified)"/>
<val name="parallel" description="parallel"/>
<val name="contrast" description="contrast"/>
<val name="Condition" description="conditional"/>
<val name="NonFactual" description="counterfactual bevor"/>
<val name="Concession" description="concessive relation (underspecified)"/>
<val name="contraexpectation" description="denial-of-expectation"/>
<val name="antithesis" description="antithesis/Bewertungskontrast"/>
</enum-attr>
</edge>
A <connective> is an annotation of a <word>, featuring one or two rel attributes.
<word xml:id="s29_1" form="Als" pos="KOUS" lemma="als" func="-" parent="s29_500" dephead="s29_14" deprel="KONJ">
<connective konn="als" rel1="Temporal" rel2="enable"/>
</word>
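Since a connective may carry one or two rel attributes, a reader should not assume that rel2 is present. A minimal sketch on the example above that keeps only the relations that are actually there:

```python
import xml.etree.ElementTree as ET

# The <connective> example from above, as a string.
WORD = """<word xml:id="s29_1" form="Als" pos="KOUS" lemma="als" func="-" parent="s29_500" dephead="s29_14" deprel="KONJ">
  <connective konn="als" rel1="Temporal" rel2="enable"/>
</word>"""

word = ET.fromstring(WORD)
conn = word.find("connective")  # None if the word is not a connective

konn = conn.get("konn")
# Keep rel1 and rel2 only when they are present.
relations = [conn.get(name) for name in ("rel1", "rel2")
             if conn.get(name) is not None]
print(konn, relations)
```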
<discRel>
<edge name="discRel" parent="edu|topic|edu-range">
<enum-attr name="relation">
</enum-attr>
<enum-attr name="marking">
</enum-attr>
<node-ref name="arg2"/>
</edge>
A <discRel> describes the relation between two EDUs. The ID of the other EDU is given in the arg2 attribute.
Note that arg2 can reference either an EDU (e.g. edu_9_3_2) or an EDU range (e.g. edus9_3_1-5_0).
<edu xml:id="edu_9_3_0">
<discRel relation="Explanation-Speechact" marking="-" arg2="edus9_3_1-5_0"/>
<node xml:id="s128_504" cat="SIMPX" func="--">
<node xml:id="s128_501" cat="MF" func="-" parent="s128_504">
<node xml:id="s128_500" cat="NX" func="ON" parent="s128_501">
<word xml:id="s128_1" form="Kulisammler" pos="NN" morph="npm" lemma="Kulisammler" func="HD" parent="s128_500" dephead="s128_2" deprel="SUBJ"/>
</node>
</node>
<node xml:id="s128_503" cat="VC" func="-" parent="s128_504">
<node xml:id="s128_502" cat="VXINF" func="HD" parent="s128_503">
<word xml:id="s128_2" form="aufgepaßt" pos="VVPP" lemma="auf#passen" func="HD" parent="s128_502" deprel="ROOT"/>
</node>
</node>
</node>
<word xml:id="s128_3" form=":" pos="$." lemma=":" func="--" deprel="ROOT"/>
</edu>
<edu xml:id="edu_9_3_1">
<discRel relation="Continuation" marking="-" arg2="edu_9_3_2"/>
<node xml:id="s128_506" cat="VF" func="-" parent="s128_525">
<node xml:id="s128_505" cat="NX" func="ON" parent="s128_506">
<relation type="expletive"/>
<word xml:id="s128_4" form="Es" pos="PPER" morph="nsn3" lemma="es" func="HD" parent="s128_505" dephead="s128_5" deprel="SUBJ"/>
</node>
</node>
...
</edu>
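To tell whether arg2 points at an EDU or an EDU range, one can look the ID up instead of guessing from its spelling. The sketch below builds an ID index and records each relation together with the tag of its target. The sample is a hypothetical skeleton of the example above, with all syntax nodes removed and an empty `edu_9_3_2` added so that the reference resolves.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical skeleton; only the IDs and <discRel> elements come from above.
SAMPLE = """<sentence xml:id="s128">
  <edu xml:id="edu_9_3_0">
    <discRel relation="Explanation-Speechact" marking="-" arg2="edus9_3_1-5_0"/>
  </edu>
  <edu-range xml:id="edus9_3_1-5_0" span="s128_4..s130_7">
    <edu xml:id="edu_9_3_1">
      <discRel relation="Continuation" marking="-" arg2="edu_9_3_2"/>
    </edu>
    <edu xml:id="edu_9_3_2"/>
  </edu-range>
</sentence>"""

root = ET.fromstring(SAMPLE)
# Index every element that has an xml:id.
by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID) is not None}

triples = []
for arg1 in root.iter():
    for rel in arg1.findall("discRel"):
        target = by_id.get(rel.get("arg2"))
        target_tag = target.tag if target is not None else None
        triples.append((arg1.get(XML_ID), rel.get("relation"),
                        rel.get("arg2"), target_tag))

print(triples)
```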
<splitRelation>
<edge name="splitRelation" parent="word|node">
<enum-attr name="type">
</enum-attr>
<text-attr name="target"/>
</edge>
A <splitRelation> annotates its parent element (e.g. as an anaphora). Its parent can be either a <word> or a <node>.
A <splitRelation> has a target attribute, which describes the targets (plural! e.g. antecedents) of the relation.
<node xml:id="s2527_528" cat="NX" func="-" parent="s2527_529">
<splitRelation type="split_antecedent" target="s2527_504 s2527_521"/>
<word xml:id="s2527_32" form="beider" pos="PIDAT" morph="gpf" lemma="beide" func="-" parent="s2527_528" dephead="s2527_33" deprel="DET"/>
<word xml:id="s2527_33" form="Firmen" pos="NN" morph="gpf" lemma="Firma" func="HD" parent="s2527_528" dephead="s2527_31" deprel="GMOD"/>
</node>
<word xml:id="s3456_12" form="ihr" pos="PPOSAT" morph="nsm" lemma="ihr" func="-" parent="s3456_507" dephead="s3456_14" deprel="DET">
<splitRelation type="split_antecedent" target="s3456_505 s3456_9"/>
</word>
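In both examples the target attribute holds several IDs separated by whitespace, so splitting on whitespace yields the individual antecedents. A minimal sketch on the second example above, under that assumption:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The second <splitRelation> example from above, as a string.
WORD = """<word xml:id="s3456_12" form="ihr" pos="PPOSAT" morph="nsm" lemma="ihr" func="-" parent="s3456_507" dephead="s3456_14" deprel="DET">
  <splitRelation type="split_antecedent" target="s3456_505 s3456_9"/>
</word>"""

word = ET.fromstring(WORD)
rel = word.find("splitRelation")

source = word.get(XML_ID)
# Assumption: target IDs are whitespace-separated, as in the examples.
targets = rel.get("target").split()
print(source, rel.get("type"), targets)
```

Note that the targets can be of mixed kinds: here `s3456_505` looks like a node ID and `s3456_9` like a word ID.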