In [1]:
import vcsn
vcsn.automaton('''
context = "lal_char(ab), z
$ -> p <2>
p -> q <3>a,<4>b
q -> q a
q -> $
''')
Out[1]:
See the documentation of vcsn.automaton
for more details about this function. The syntax used to define the automaton is, however, described here.
In order to facilitate the definition of automata, Vcsn provides additional ''magic commands'' to the IPython Notebook. We will see through this guide how use this command.
In [2]:
%%automaton a
context = "lal_char(ab), z"
$ -> p <2>
p -> q <3>a, <4>b
q -> q a
q -> $
The first argument, here a
, is the name of the variable in which this automaton is stored:
In [3]:
a
Out[3]:
You may pass the option -s
or --strip
to strip the automaton from its layer that keeps the state name you have chosen. In that case, the internal numbers are used, unrelated to the user names (actually, the numbers are assigned to state names as they are encountered starting from 0).
In [4]:
%%automaton --strip a
context = "lal_char(ab), z"
$ -> p <2>
p -> q <3>a, <4>b
q -> q a
q -> $
In [5]:
a
Out[5]:
The second argument specifies the format in which the automaton is described, defaulting to auto
, which means "guess the format":
In [6]:
%%automaton a dot
digraph
{
vcsn_context = "lal_char(ab), z"
I -> p [label = "<2>"]
p -> q [label = "<3>a, <4>b"]
q -> q [label = a]
q -> F
}
In [7]:
%%automaton a
digraph
{
vcsn_context = "lal_char(ab), z"
I -> p [label = "<2>"]
p -> q [label = "<3>a, <4>b"]
q -> q [label = a]
q -> F
}
Automata entered this way are persistent: they are stored in the notebook and will be recovered when the page is reopened.
%automaton
: Text-Based Edition of an AutomatonIn IPython "line magic commands" begin with a single %
. The line magic %automaton
takes three arguments:
auto
.h
for horizontal and v
for vertical. Defaults to h
.Contrary to the cell magic, the %automaton
can be used to update an existing automaton:
In [8]:
%automaton a
The real added value is that now you can interactively edit this automaton: changes in the text are immediately propagated on the rendered automaton.
When given a fresh variable name, %automaton
creates a dummy automaton that you can use as a starting point:
In [9]:
%automaton b fado
Beware however that these automata are not persistent: changes will be lost when the notebook is closed.
Vcsn supports differents input and output formats. Some, such as tikz
, are only export-only formats: they cannot be read by Vcsn.
This simple format is work in progress: its precise syntax is still subject to changes. It is roughly a simplification of the dot
syntax. The following example should suffice to understand the syntax. If "guessable", the context can be left implicit.
In [10]:
%%automaton a
context = "lal_char(ab), z"
$ -> p <2>
p -> q <3>a, <4>b
q -> q a
q -> $
This format relies on the "dot" language of the GraphViz toolkit (http://graphviz.org). This is the default format for I/O in Vcsn.
An automaton looks as follows:
In [11]:
%%automaton a dot
// The comments are introduced with //, or /* ... */
//
// The overall syntax is that of Dot for directed graph ("digraph").
digraph
{
// The following attribute defines the context of the automaton.
vcsn_context = "lal_char, b"
// Initial states are denoted by an edge between a node whose name starts
// with an "I". So "0" is a initial state.
I -> 0
// Transitions are edges whose label is that of the transition.
0 -> 0 [label = "a"]
0 -> 0 [label = "b"]
0 -> 1 [label = "c, d"]
// Final states are denoted by an edge to a node whose name starts with "F".
1 -> Finish
}
This format is designed to support import/export with OpenFST (http://openfst.org): it wraps its multi-file format (one file describes the automaton with numbers as transition labels, and one or several others define these labels) into a single format. It is not designed to be used by humans, but rather to be handled by two tools:
efstcompile
to compile such a file into the OpenFST binary file format,efstdecompile
to extract an efsm
file from a binary OpenFST file.As an example, consider the following exchange between Vcsn and OpenFST.
In [12]:
a = vcsn.context('lal_char(ab), zmin').expression('[ab]*a(<2>[ab])').automaton()
a
Out[12]:
In [13]:
efsm = a.format('efsm')
print(efsm)
The following sequence of operations uses OpenFST to determinize this automaton, and to load it back into Vcsn.
In [14]:
import os
# Save the EFSM description of the automaton in a file.
with open("a.efsm", "w") as file:
print(efsm, file=file)
# Compile the EFSM into an OpenFST file.
os.system("efstcompile a.efsm >a.fst")
# Call OpenFST's determinization.
os.system("fstdeterminize a.fst >d.fst")
# Convert from OpenFST format to EFSM.
os.system("efstdecompile d.fst >d.efsm")
# Load this file into Python.
with open("d.efsm", "r") as file:
d = file.read()
# Show the result.
print(d)
# Now read it as an automaton.
d_ofst = vcsn.automaton(d, 'efsm')
d_ofst
Out[14]:
For what it's worth, the above sequence of actions is realized by a.fstdeterminize()
.
Vcsn and OpenFST compute the same automaton.
In [15]:
a.determinize()
Out[15]:
In [16]:
t = a.partial_identity()
t
Out[16]:
In [17]:
tefsm = t.format('efsm')
print(tefsm)
In [18]:
vcsn.automaton(tefsm)
Out[18]:
The EFSM format is a simple format that puts together the various files that OpenFST uses to serialize and deserialize automata: one or two files to describe the labels (called "symbol tables"), and one to list the transitions. More details about these files can be found on FSM Man Pages.
When reading an EFSM file, Vcsn expects the following bits:
a line arc_type=TYPE
which specifies the weightset. If TYPE
is log
or log64
, this is mapped to the log
weightset, if it is standard
, then it is mapped to zmin
or rmin
, depending on whether floatting points were used.
a "here-document" (the Unix name for embedded files, delimited by <<EOF
to a line equal to EOF
) for the first symbol table. If the here-document is named isymbols.txt
, then the automaton is a transducer, otherwise it is considered an acceptor.
if the automaton is a transducer, a second symbol table, osymbols.txt
, to describe the labels of the second tape.
then a final here-document, transitions.fsm
, which list the transitions.
This is the native language of the FAdo platform (http://fado.dcc.fc.up.pt). Weighted automata are not supported.
In [19]:
a = vcsn.B.expression('a+b').standard()
a
Out[19]:
In [20]:
print(a.format('fado'))
This format is made to exchange automata with the Grail (http://grailplus.org). Weighted automata are not supported.
In [21]:
a = vcsn.B.expression('a+b').standard()
a
Out[21]:
In [22]:
print(a.format('grail'))
In [23]:
a = vcsn.Q.expression('<2>a+<3>b').standard()
a
Out[23]:
In [24]:
print(a.format('tikz'))