Normalization examples

Add stress to Russian words

Note that this example has nothing to do with collation: normalization is a common step in many computational pipelines.

  1. Go to our Web service to add stress to Russian text.
  2. Check all of the checkboxes under the textbox.
  3. Paste in a short Russian poem. You can copy the one below, or use your own:

    На берегу пустынных волн
    Стоял он, дум великих полн,
    И вдаль глядел. Пред ним широко
    Река неслася; бедный чёлн
    По ней стремился одиноко.
    По мшистым, топким берегам
    Чернели избы здесь и там,
    Приют убогого чухонца;
    И лес, неведомый лучам
    В тумане спрятанного солнца,
    Кругом шумел.

  4. Hit the Submit button.

Multilingual collation

What it looks like

How it works

  1. Use natural language processing (NLP) tools to tag individual words with linguistic information.
  2. Use the part of speech (POS) as the normalization value passed to CollateX.
  3. Align on the POS values, but output the originals.
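
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the witness sigla, word forms, and tags are hypothetical stand-ins for real tagger output, and `to_collatex_input` is a helper name invented here. It builds the pre-tokenized JSON shape that CollateX accepts, where `t` holds the original reading and `n` holds the normalized value used for alignment (here, the POS).

```python
import json

# Hypothetical tagged tokens: (original form, POS tag) pairs per witness.
# The forms and tags are illustrative only, not real tagger output.
tagged_witnesses = {
    "A": [("Блаженꙑи", "A-"), ("мужь", "Nb")],
    "B": [("Blessed", "A-"), ("man", "Nb")],
}


def to_collatex_input(witnesses):
    """Build pre-tokenized input: 't' is the original reading, 'n' is the POS."""
    return {
        "witnesses": [
            {
                "id": siglum,
                "tokens": [{"t": form, "n": pos} for form, pos in tokens],
            }
            for siglum, tokens in witnesses.items()
        ]
    }


collatex_input = to_collatex_input(tagged_witnesses)
print(json.dumps(collatex_input, ensure_ascii=False, indent=2))

# With the collatex package installed, the structure could then be passed on,
# for example:
#   from collatex import collate
#   table = collate(collatex_input, segmentation=False)
```

Because alignment compares only the `n` values, the two witnesses can be in different languages: tokens line up where their parts of speech match, while the output still shows the original `t` readings.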

What the linguistic tagging output looks like

<token id="1686779"
form="Блаженꙑи"
lemma="блаженъ"
part-of-speech="A-"
presentation-before=""
morphology="-s---mnpwi"
head-id="1686780"
relation="atr"
presentation-after=" "
citation-part="8"
part="1"
folio="061"
side="r"
line="16"
linebreak="false">
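
One way to turn a tagged token like this into a collation token is to read its `form` and `part-of-speech` attributes. The sketch below assumes a shortened, self-closed copy of the element (the closing `/` and the reduced attribute list are assumptions made so the snippet parses on its own), and `token_to_collatex` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, self-closed copy of the token element (an assumption for this sketch).
sample = (
    '<token id="1686779" form="Блаженꙑи" lemma="блаженъ" '
    'part-of-speech="A-" morphology="-s---mnpwi"/>'
)


def token_to_collatex(xml_text):
    """Map one <token> element to a dict with 't' (original form) and 'n' (POS)."""
    element = ET.fromstring(xml_text)
    return {"t": element.get("form"), "n": element.get("part-of-speech")}


token = token_to_collatex(sample)
print(token)
```

The resulting dictionary keeps the original spelling in `t` for display and the POS tag in `n` for alignment.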
