Questions taken from Stackoverflow

This page lists questions about automata and other rational/regular expressions that were asked on Stackoverflow, and where Vcsn seems to be an appropriate tool to compute the answer.

Build a Regular Expression and Finite Automata

The set of all strings beginning with 101 and ending with 01010.

First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{0, 1\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.



In [1]:

    
import vcsn
c = vcsn.context('lal_char(01), b')
c









    



:0: FutureWarning: IPython widgets are experimental and may change in the future.






    














    Out[1]:





$\{0, 1\}\rightarrow\mathbb{B}$

Then, we build our expression using an unusual operator: $\mathsf{E} \& \mathsf{F}$ denotes the conjunction of expressions $\mathsf{E}$ and $\mathsf{F}$. In this case (unweighted/Boolean automata), it denotes exactly the intersection of languages.



In [2]:

    
e = c.expression('(101[01]*)&([01]*01010)')
e









    Out[2]:





$1 \, 0 \, 1 \, \left(0 + 1\right)^{*} \& \left(0 + 1\right)^{*} \, 0 \, 1 \, 0 \, 1 \, 0$

We want to normalize this extended expression (it has conjunction and complement operators) into a basic expression. To this end, we first convert it to an automaton.



In [3]:

    
a = e.automaton()
a









    Out[3]:

and then we convert this automaton into a basic expression:



In [4]:

    
a.expression()









    Out[4]:





$1 \, 0 \, 1 \, 0 \, 1 \, 0 + 1 \, 0 \, 1 \, \left(0 + 1\right)^{*} \, 0 \, 1 \, 0 \, 1 \, 0$

Or, in ASCII:



In [5]:

    
print(a.expression())









    



101010+101(0+1)*01010

Regular expression to match text that doesn't contain a word?

I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?

First, let's define that alphabet we work on: from $a$ to $z$ for instance.



In [6]:

    
import vcsn
c = vcsn.context('lal_char(a-z), b')
c









    Out[6]:





$\{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z\}\rightarrow\mathbb{B}$

Then we define our expression, which is extended (it uses the complement operator), so to normalize it, we first convert it into automaton (with _expression_.automaton), from which we extract a basic expression (with _automaton_.expresion).



In [7]:

    
e = c.expression('(hede){c}')
e









    Out[7]:





$\left(h \, e \, d \, e\right)^{c}$



In [8]:

    
a = e.automaton()
a









    Out[8]:



In [9]:

    
a.expression()









    Out[9]:





$\varepsilon + h \, \left(\varepsilon + e \, \left(\varepsilon + d\right)\right) + \left([\hat{}h] + h \, \left([\hat{}e] + e \, \left([\hat{}d] + d \, \left([\hat{}e] + e \, [\hat{}]\right)\right)\right)\right) \, {[\hat{}]}^{*}$

Or, in ASCII (+ is usually denoted |; \e denotes the empty word; and [^] denotes any character, usually written .):



In [10]:

    
print(a.expression())









    



\e+h(\e+e(\e+d))+([^h]+h([^e]+e([^d]+d([^e]+e[^]))))[^]*