Questions taken from Stackoverflow

This page lists questions about automata and other rational/regular expressions that were asked on Stackoverflow, and where Vcsn seems to be an appropriate tool to compute the answer.

The set of all strings beginning with 101 and ending with 01010.

First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{0, 1\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.


In [1]:
import vcsn
c = vcsn.context('lal_char(01), b')
c


:0: FutureWarning: IPython widgets are experimental and may change in the future.
Out[1]:
$\{0, 1\}\rightarrow\mathbb{B}$

Then, we build our expression using an unusual operator: $\mathsf{E} \& \mathsf{F}$ denotes the conjunction of expressions $\mathsf{E}$ and $\mathsf{F}$. In this case (unweighted/Boolean automata), it denotes exactly the intersection of languages.


In [2]:
e = c.expression('(101[01]*)&([01]*01010)')
e


Out[2]:
$1 \, 0 \, 1 \, \left(0 + 1\right)^{*} \& \left(0 + 1\right)^{*} \, 0 \, 1 \, 0 \, 1 \, 0$

We want to normalize this extended expression (it has conjunction and complement operators) into a basic expression. To this end, we first convert it to an automaton.


In [3]:
a = e.automaton()
a


Out[3]:
%3 I0 0 0 I0->0 F9 1 1 0->1 1 2 2 1->2 0 3 3 1->3 0 6 6 2->6 1 4 4 3->4 1 4->4 0, 1 5 5 4->5 0 5->6 1 7 7 6->7 0 8 8 7->8 1 9 9 8->9 0 9->F9

and then we convert this automaton into a basic expression:


In [4]:
a.expression()


Out[4]:
$1 \, 0 \, 1 \, 0 \, 1 \, 0 + 1 \, 0 \, 1 \, \left(0 + 1\right)^{*} \, 0 \, 1 \, 0 \, 1 \, 0$

Or, in ASCII:


In [5]:
print(a.expression())


101010+101(0+1)*01010

Regular expression to match text that doesn't contain a word?

I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?

First, let's define that alphabet we work on: from $a$ to $z$ for instance.


In [6]:
import vcsn
c = vcsn.context('lal_char(a-z), b')
c


Out[6]:
$\{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z\}\rightarrow\mathbb{B}$

Then we define our expression, which is extended (it uses the complement operator), so to normalize it, we first convert it into automaton (with _expression_.automaton), from which we extract a basic expression (with _automaton_.expresion).


In [7]:
e = c.expression('(hede){c}')
e


Out[7]:
$\left(h \, e \, d \, e\right)^{c}$

In [8]:
a = e.automaton()
a


Out[8]:
%3 I0 0 0 I0->0 F0 F1 F2 F3 F4 0->F0 1 1 0->1 [^h] 2 2 0->2 h 1->F1 1->1 [^] 2->F2 2->1 [^e] 3 3 2->3 e 3->F3 3->1 [^d] 4 4 3->4 d 4->F4 4->1 [^e] 5 5 4->5 e 5->1 [^]

In [9]:
a.expression()


Out[9]:
$\varepsilon + h \, \left(\varepsilon + e \, \left(\varepsilon + d\right)\right) + \left([\hat{}h] + h \, \left([\hat{}e] + e \, \left([\hat{}d] + d \, \left([\hat{}e] + e \, [\hat{}]\right)\right)\right)\right) \, {[\hat{}]}^{*}$

Or, in ASCII (+ is usually denoted |; \e denotes the empty word; and [^] denotes any character, usually written .):


In [10]:
print(a.expression())


\e+h(\e+e(\e+d))+([^h]+h([^e]+e([^d]+d([^e]+e[^]))))[^]*