Transliteration is the conversion of a text from one script to another. For instance, a Latin transliteration of the Greek phrase "Ελληνική Δημοκρατία", usually translated as 'Hellenic Republic', is "Ellēnikḗ Dēmokratía".
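To make the idea concrete, here is a minimal sketch of character-level transliteration in plain Python. It is not how polyglot works internally; the `GREEK_TO_LATIN` table and the `transliterate_greek` function are hypothetical names introduced for illustration. The table covers only the letters in the phrase above and uses a simplified ASCII scheme that drops the macrons and accents shown in the scholarly form.

```python
import unicodedata

# Hypothetical, deliberately tiny table: only the letters needed for the
# example phrase, mapped to plain ASCII (no macrons, no accents).
GREEK_TO_LATIN = {
    "α": "a", "δ": "d", "ε": "e", "η": "e", "ι": "i", "κ": "k",
    "λ": "l", "μ": "m", "ν": "n", "ο": "o", "ρ": "r", "τ": "t",
}


def transliterate_greek(text):
    """Map each Greek letter to a Latin one; pass anything else through."""
    out = []
    # NFD splits an accented letter into its base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue  # drop accents to keep the sketch simple
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # character not in the table: leave unchanged
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(transliterate_greek("Ελληνική Δημοκρατία"))  # Ellenike Demokratia
```

A fixed table like this only works when letters map one-to-one between scripts. Pairs such as English and Arabic need a model that looks at whole words, which is what the downloadable polyglot transliteration models provide.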
In [1]:
from polyglot.transliteration import Transliterator
In [2]:
from polyglot.downloader import downloader
print(downloader.supported_languages_table("transliteration2"))
In [3]:
%%bash
polyglot download embeddings2.en transliteration2.ar
We transliterate each word in the text from its source script into the target script (English to Arabic in this example).
In [7]:
from polyglot.text import Text
In [8]:
blob = """We will meet at eight o'clock on Thursday morning."""
text = Text(blob)
We can iterate over the transliterated words:
In [9]:
for x in text.transliterate("ar"):
    print(x)
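When inspecting results it can help to see each source word next to its transliteration. The helper below is a small sketch with a hypothetical name, `pair_with_transliteration`; it works on any two sequences of strings and assumes that a token with no usable transliteration (punctuation, for example) comes back as an empty string, which it skips.

```python
def pair_with_transliteration(words, transliterations):
    """Pair each source word with its transliteration, skipping empty results."""
    return [(w, t) for w, t in zip(words, transliterations) if t]


# Stand-in data so the sketch runs without polyglot or its models installed.
sample_words = ["at", ".", "on"]
sample_output = ["ات", "", "ون"]

for word, translit in pair_with_transliteration(sample_words, sample_output):
    print(word, translit)
```

With polyglot installed and the models downloaded, the same helper could be called as `pair_with_transliteration(text.words, text.transliterate("ar"))`.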
In [20]:
!polyglot --lang en tokenize --input testdata/cricket.txt | polyglot --lang en transliteration --target ar | tail -n 30