In this chapter, we will be looking at parsing Python-like languages in Scheme. Specifically, we will look at LL(1) parser.
An LL parser is a top-down parser for a subset of context-free languages. It parses the input from Left to right, performing Leftmost derivation of the sentence.
An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. If such a parser exists for a certain grammar and it can parse sentences of this grammar without backtracking then it is called an LL(k) grammar.
LL grammars, particularly LL(1) grammars, are of great practical interest, as parsers for these grammars are easy to construct, and many computer languages are designed to be LL(1) for this reason. LL parsers are table-based parsers, similar to LR parsers. LL grammars can also be parsed by recursive descent parsers.
Could we define Python's syntax in LL(1)? Almost. See:
In [25]:
(load "sllgen.ss")
The first step in creating a new language is to design a lexical analysis (lexer, scanner). A lexer defines the lowest grouping of characters. You can design a lexer by simply creating a list of CATEGORY rules. CATEGORY rules have the following format:
(LEX-CATEGORY (EXPR ...) ACTION)
where LEX-CATEGORY is:
a name you which to tag
and EXPR is one of:
TEST | (or-exp EXPR ...) | (arbno EXPR) | (concat EXPR ...) | (separated-list ... DELIM)
and TEST is one of:
"string" | letter | digit | whitespace | any | (not CHAR)
and ACTION is one of:
make-symbol | make-number | skip
So, let's consider what is the lowest groupings of items in our language (e.g., our LEX-CATGORIES). We might want to group into:
Here are the rules for such a lexer:
In [26]:
(define lex2
'((whitespace (whitespace) skip)
(comment ("%" (arbno (not #\newline))) skip)
(identifier (letter (arbno (or-exp letter digit))) make-symbol)
(number (digit (arbno digit)) make-number)))
The next step is to arrange the items from the lexer into higher-level items for our grammar. This takes a similar form:
(PARSE-CATEGORY (LEX-ITEMS) id-exp)
where LEX-ITEMS is composed of
Let's define a grammar compposed of expressions
and declarations
. Let's define an expression as one of:
declaration is:
In [27]:
(define gram2
'((expression (number) lit-exp)
(expression (identifier) var-exp)
(expression
("let" (arbno declaration) "in" expression)
let-exp)
(expression
("mvlet"
(separated-list (separated-list identifier ",") "=" expression ";")
"in" expression)
lets-exp)
(expression
("(" expression (arbno expression) ")")
app-exp)
(declaration (identifier "=" expression) decl)))
In [28]:
(sllgen:make-define-datatypes lex2 gram2)
In [29]:
(sllgen:list-define-datatypes lex2 gram2)
Out[29]:
In [30]:
(define parse2 (sllgen:make-string-parser lex2 gram2))
In [32]:
(parse2 "1")
Out[32]:
In [33]:
(parse2 "hello")
Out[33]:
In [34]:
(parse2 "let x = 1 in x")
Out[34]:
In [35]:
(parse2 "let x = 1 y = 2 in x")
Out[35]:
In [37]:
(parse2 "mvlet x = 1 in x")
Out[37]:
In [39]:
(parse2 "mvlet x, y = 1; a, b = 2 in x")
Out[39]:
In [40]:
(parse2 "(function x)")
Out[40]:
In [41]:
(define gram3
'((expression (number) lit-exp)
(expression (identifier) var-exp)
(expression
("let" (arbno identifier "=" expression) "in" expression)
let-exp)
(expression
("mvlet" (arbno (arbno identifier) "=" expression) "in" expression)
mvlet-exp)
(expression
("(" expression (arbno expression) ")")
app-exp)
(declaration
(identifier "=" (arbno expression))
decl)))
In [42]:
(sllgen:list-define-datatypes lex2 gram3)
Out[42]:
In [43]:
(define parse3 (sllgen:make-string-parser lex2 gram3))
In [44]:
(parse3 "let x = 1 in y")
Out[44]:
In [45]:
(parse3 "mvlet x = 1 y = 8 in y")
Out[45]:
In [46]:
(define gram4
'((expression
("let" (separated-list (separated-list identifier "," ) "=" expression ";" )
"in" expression)
let-exp)
(expression
(number)
lit-exp)))
In [47]:
(define parse4 (sllgen:make-string-parser lex2 gram4))
In [48]:
(sllgen:list-define-datatypes lex2 gram4)
Out[48]:
In [49]:
(parse4 "7")
Out[49]:
In [51]:
(parse4 "let x = 7 in 7")
Out[51]:
In [52]:
(parse4 "let x,y = 7; y = 8 in 7")
Out[52]:
In [53]:
(define lc-lex
'((whitespace (whitespace) skip)
(comment ("//" (arbno (not #\newline))) skip)
(identifier (letter (arbno (or-exp letter digit))) make-symbol)
(number (digit (arbno digit)) make-number)))
In [54]:
(define lc-gram
'(
(lc-exp
("def" "(" (arbno identifier) ")" ":" lc-exp)
lambda-exp)
(lc-exp
(identifier)
var-exp)
(lc-exp
("&" identifier "(" (arbno lc-exp) ")")
app-exp)
(lc-exp
(number)
lit-exp)
))
In [56]:
(define parse-lc (sllgen:make-string-parser lc-lex lc-gram))
(print (parse-lc "x"))
(print (parse-lc "42"))
(print (parse-lc "def (a b): 42"))
(print (parse-lc "def (a b): &add(a b)"))
In [57]:
(sllgen:make-define-datatypes lc-lex lc-gram)
In [ ]:
(define view-lc
(lambda (ast)
(cases lc-exp ast
(lambda-exp (ids body)
(printf "ids: ~s body: ~s\n" ids body))
(var-exp (name)
(printf "name: ~s\n" name))
(lit-exp (value)
(printf "lit: ~s\n" value))
(app-exp (operator operands)
(printf "application: ~s ~s\n" operator operands))
(else (error 'eval-lc "invalid lc-exp form")))))
In [60]:
(view-lc (parse-lc "&add(1 2)"))
In [58]:
(define eval-lc
(lambda (ast env)
(cases lc-exp ast
(lambda-exp (ids body)
(list 'closure-exp ids env body))
(var-exp (name)
(lookup name env))
(lit-exp (value)
value)
(app-exp (operator operands)
(applyit (lookup operator env) operands env))
(else (error 'eval-lc "invalid lc-exp form")))))
(define lookup
(lambda (var env)
(cadr (assv var env))))
(define applyit
(lambda (f args env)
(case f
((add) (+ (eval-lc (car args) env) (eval-lc (cadr args) env)))
(else (error 'applit "no such function")))))
In [59]:
(eval-lc (parse-lc "&add(1 2)") '((add add)))
Out[59]:
In [3]:
%download https://www.cs.rit.edu/~ats/plt-2005-1/a_sllgen.pdf