``````

In :

from tock import *

``````

# Regular expressions

Regular expressions in Tock use the same three operators and parentheses as Unix regular expressions; however, because a symbol can have more than one character, consecutive symbols must be separated by a space. Also, for the empty string, you must write `&`. And even though the empty set is considered a regular expression, there's no way to write it in Tock.

To create a regular expression in Tock (this one is from Example 1.56):

``````

In :

m = from_regexp('(a b|a)*')

``````

The regular expression is converted into a finite automaton, which you can view, as usual, as either a graph or a table.

``````

In :

to_graph(m)

``````
``````

%3

_START

7

q8

_START->7

0

q1

1

q2

0->1

a

2

q3

1->2

ε

3

q4

2->3

b

6

q7

3->6

ε

4

q5

5

q6

4->5

a

5->6

ε

6->0

ε

6->4

ε

7->6

ε

``````
``````

In :

to_table(m)

``````
``````

Out:

ε
a
b

q1

q2

q2
q3

q3

q4

@q4
q7

q5

q6

@q6
q7

q7
{q1,q5}

@>q8
q7

``````

The states are numbered according to the position in the regular expression they came from (so that listing them in alphabetical order is natural). The letter suffixes are explained below.

We can also pass the `display_steps=True` option to show the automata created for all the subexpressions.

``````

In :

m = from_regexp('(a b|a)*', display_steps=True)

``````
``````

subexpression: a

%3

_START

0

q1

_START->0

1

q2

0->1

a

subexpression: b

%3

_START

0

q3

_START->0

1

q4

0->1

b

subexpression: a b

%3

_START

0

q1

_START->0

1

q2

0->1

a

2

q3

1->2

ε

3

q4

2->3

b

subexpression: a

%3

_START

0

q5

_START->0

1

q6

0->1

a

subexpression: a b|a

%3

_START

6

q7

_START->6

0

q1

1

q2

0->1

a

2

q3

1->2

ε

3

q4

2->3

b

4

q5

5

q6

4->5

a

6->0

ε

6->4

ε

subexpression: (a b|a)*

%3

_START

7

q8

_START->7

0

q1

1

q2

0->1

a

2

q3

1->2

ε

3

q4

2->3

b

6

q7

3->6

ε

4

q5

5

q6

4->5

a

5->6

ε

6->0

ε

6->4

ε

7->6

ε

``````

The `to_regexp` function converts in the opposite direction:

``````

In :

e = to_regexp(m)
e

``````
``````

Out:

ε|a* a|(ε|a* a) (a b (ε|a* a))* a b (ε|a* a)

``````

Again, the `display_steps` option causes all the intermediate steps of the conversion to be displayed.

``````

In :

e = to_regexp(m, display_steps=True)

``````
``````

%3

_START

9

start

_START->9

0

accept

1

q1

2

q2

1->2

a

3

q3

2->3

ε

4

q4

3->4

b

4->0

ε

7

q7

4->7

ε

5

q5

6

q6

5->6

a

6->0

ε

6->7

ε

7->1

ε

7->5

ε

8

q8

8->0

ε

8->7

ε

9->8

ε

eliminate q8

%3

_START

8

start

_START->8

0

accept

1

q1

2

q2

1->2

a

3

q3

2->3

ε

4

q4

3->4

b

4->0

ε

7

q7

4->7

ε

5

q5

6

q6

5->6

a

6->0

ε

6->7

ε

7->1

ε

7->5

ε

8->0

ε

8->7

ε

eliminate q7

%3

_START

7

start

_START->7

0

accept

1

q1

2

q2

1->2

a

3

q3

2->3

ε

4

q4

3->4

b

4->0

ε

4->1

ε

5

q5

4->5

ε

6

q6

5->6

a

6->0

ε

6->1

ε

6->5

ε

7->0

ε

7->1

ε

7->5

ε

eliminate q6

%3

_START

6

start

_START->6

0

accept

1

q1

2

q2

1->2

a

3

q3

2->3

ε

4

q4

3->4

b

4->0

ε

4->1

ε

5

q5

4->5

ε

5->0

a

5->1

a

5->5

a

6->0

ε

6->1

ε

6->5

ε

eliminate q5

%3

_START

5

start

_START->5

0

accept

1

q1

2

q2

1->2

a

3

q3

2->3

ε

4

q4

3->4

b

4->0

ε|a* a

4->1

ε|a* a

5->0

ε|a* a

5->1

ε|a* a

eliminate q4

%3

_START

4

start

_START->4

0

accept

1

q1

2

q2

1->2

a

3

q3

2->3

ε

3->0

b (ε|a* a)

3->1

b (ε|a* a)

4->0

ε|a* a

4->1

ε|a* a

eliminate q3

%3

_START

3

start

_START->3

0

accept

1

q1

2

q2

1->2

a

2->0

b (ε|a* a)

2->1

b (ε|a* a)

3->0

ε|a* a

3->1

ε|a* a

eliminate q2

%3

_START

2

start

_START->2

0

accept

1

q1

1->0

a b (ε|a* a)

1->1

a b (ε|a* a)

2->0

ε|a* a

2->1

ε|a* a

eliminate q1

%3

_START

1

start

_START->1

0

accept

1->0

ε|a* a|(ε|a* a) (a b (ε|a* a))* a b (ε|a* a)

``````