In [1]:
from tock import *

Context-free grammars

You can create a CFG either by reading from a file (using Grammar.from_file) or a list of strings (using Grammar.from_lines). The first rule's left-hand side is assumed to be the start symbol.


In [2]:
g = Grammar.from_lines(["S -> a T b",
                        "S -> b",
                        "T -> T a",
                        "T -> &"])

To convert to a PDA using a top-down construction, use the from_grammar function:


In [3]:
m = from_grammar(g)
to_graph(m)


%3 _START 0 start _START->0 1 0.1 0->1 ε,ε → $ 2 loop 1->2 ε,ε → S 2->2 ε,S → b ε,T → ε a,a → ε b,b → ε 3 accept 2->3 ε,$ → ε 4 1.2 2->4 ε,S → b 6 3.1 2->6 ε,T → a 5 1.1 4->5 ε,ε → T 5->2 ε,ε → a 6->2 ε,ε → T

In [4]:
run(m, "a a b")


%3 _START 3 start,ε _START->3 0 4 0->4 a 1 1->0 a 2 0.1,$ 6 loop,[S] $ 2->6 3->2 5 4->5 b 7 1.2,[b] $ 6->7 8 loop,[b] $ 6->8 9 1.1,[T] b $ 7->9 10 loop,[a] T b … 9->10 11 loop,[T] b $ 10->11 12 3.1,[a] b $ 11->12 13 loop,[b] $ 11->13 14 loop,[T] a b … 12->14 15 3.1,[a] a b … 14->15 16 loop,[a] b $ 14->16 17 loop,[T] a a … 15->17 18 loop,[b] $ 16->18 19 3.1,[a] a a … 17->19 21 loop,[a] a b … 17->21 24 loop,[a] a a … 17->24 20 loop,$ 18->20 19->17 19->24 22 accept,ε 20->22 23 loop,[a] b $ 21->23 25 loop,[a] a b … 24->25 26 loop,[a] a a … 24->26

This diagram is a little hard to read, but one thing to note is the cycle between loop and 3.1. This is caused by the left-recursive rule T -> T a, which the automaton applies an unbounded number of times.

There's also a bottom-up version

The conversion in the reverse direction, from PDA to CFG, is actually related to the algorithm that Tock uses internally to simulate PDAs. We use the remove_useless method to reduce the size of the converted CFG.


In [5]:
to_grammar(read_csv('../examples/sipser-2-14.csv')).remove_useless()


Out[5]:
start: (start,accept)
(q2,q3) → 0 (q2,q2) 1
(q2,q3) → 0 (q2,q3) 1
(q1,q4) → (q2,q3)
(start,accept) → (q1,q4)
(q1,q1) → (q1,q1) (q1,q1)
(q1,q4) → (q1,q1) (q1,q4)
(q1,q4) → (q1,q4) (q4,q4)
(q3,q3) → (q3,q3) (q3,q3)
(q2,q3) → (q2,q3) (q3,q3)
(q2,q3) → (q2,q2) (q2,q3)
(q2,q2) → (q2,q2) (q2,q2)
(q4,q4) → (q4,q4) (q4,q4)
(q1,q1) → ε
(q3,q3) → ε
(q2,q2) → ε
(q4,q4) → ε