Parsing

In this chapter, we will be looking at parsing Python-like languages in Scheme. Specifically, we will look at LL(1) parser.

An LL parser is a top-down parser for a subset of context-free languages. It parses the input from Left to right, performing Leftmost derivation of the sentence.

An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. If such a parser exists for a certain grammar and it can parse sentences of this grammar without backtracking then it is called an LL(k) grammar.

LL grammars, particularly LL(1) grammars, are of great practical interest, as parsers for these grammars are easy to construct, and many computer languages are designed to be LL(1) for this reason. LL parsers are table-based parsers, similar to LR parsers. LL grammars can also be parsed by recursive descent parsers.

---After https://en.wikipedia.org/wiki/LL_parser

Could we define Python's syntax in LL(1)? Almost. See:


In [25]:
(load "sllgen.ss")


sllgen.ss 2000-09-25 11:48

Lexical Analysis

The first step in creating a new language is to design a lexical analysis (lexer, scanner). A lexer defines the lowest grouping of characters. You can design a lexer by simply creating a list of CATEGORY rules. CATEGORY rules have the following format:

(LEX-CATEGORY (EXPR ...) ACTION)

where LEX-CATEGORY is:

a name you which to tag

and EXPR is one of:

TEST | (or-exp EXPR ...) | (arbno EXPR) | (concat EXPR ...) | (separated-list ... DELIM)

and TEST is one of:

"string" | letter | digit | whitespace | any | (not CHAR)

and ACTION is one of:

make-symbol | make-number | skip

So, let's consider what is the lowest groupings of items in our language (e.g., our LEX-CATGORIES). We might want to group into:

  • whitespace - tabs and newlines, ignored
  • comments - from here to end of line, ignored
  • identifiers - names
  • numbers - integers, for now

Here are the rules for such a lexer:


In [26]:
(define lex2
   '((whitespace (whitespace) skip)
     (comment ("%" (arbno (not #\newline))) skip)
     (identifier (letter (arbno (or-exp letter digit))) make-symbol)
     (number (digit (arbno digit)) make-number)))

The next step is to arrange the items from the lexer into higher-level items for our grammar. This takes a similar form:

(PARSE-CATEGORY (LEX-ITEMS) id-exp)

where LEX-ITEMS is composed of

  • literals - literal strings, like commas, semicolons, etc.
  • LEX-CATEGORIES

Let's define a grammar compposed of expressions and declarations. Let's define an expression as one of:

  • number
  • identifier
  • let DECLARATION in BODY
  • mvlet identifier = expression, ... in BODY
  • (function expression ...)

declaration is:

  • identifier = expression

In [27]:
(define gram2
  '((expression (number) lit-exp)
    (expression (identifier) var-exp)
    (expression
      ("let" (arbno declaration) "in" expression)
      let-exp)
    (expression
      ("mvlet"
        (separated-list  (separated-list identifier ",") "=" expression ";") 
         "in" expression)
      lets-exp)
    (expression
      ("(" expression (arbno expression) ")")
      app-exp)
    (declaration (identifier "=" expression) decl)))

In [28]:
(sllgen:make-define-datatypes lex2 gram2)

In [29]:
(sllgen:list-define-datatypes lex2 gram2)


Out[29]:
((define-datatype expression expression? (lit-exp (lit-exp12 number?)) (var-exp (var-exp13 symbol?)) (let-exp (let-exp14 (list-of declaration?)) (let-exp15 expression?)) (lets-exp (lets-exp16 (list-of (list-of symbol?))) (lets-exp17 (list-of expression?)) (lets-exp18 expression?)) (app-exp (app-exp19 expression?) (app-exp20 (list-of expression?)))) (define-datatype declaration declaration? (decl (decl21 symbol?) (decl22 expression?))))

In [30]:
(define parse2 (sllgen:make-string-parser lex2 gram2))

In [32]:
(parse2 "1")


Out[32]:
(lit-exp 1)

In [33]:
(parse2 "hello")


Out[33]:
(var-exp hello)

In [34]:
(parse2 "let x = 1 in x")


Out[34]:
(let-exp ((decl x (lit-exp 1))) (var-exp x))

In [35]:
(parse2 "let x = 1 y = 2 in x")


Out[35]:
(let-exp ((decl x (lit-exp 1)) (decl y (lit-exp 2))) (var-exp x))

In [37]:
(parse2 "mvlet x = 1 in x")


Out[37]:
(lets-exp ((x)) ((lit-exp 1)) (var-exp x))

In [39]:
(parse2 "mvlet x, y = 1; a, b = 2 in x")


Out[39]:
(lets-exp ((x y) (a b)) ((lit-exp 1) (lit-exp 2)) (var-exp x))

In [40]:
(parse2 "(function x)")


Out[40]:
(app-exp (var-exp function) ((var-exp x)))

In [41]:
(define gram3
  '((expression (number) lit-exp)
    (expression (identifier) var-exp)
    (expression
      ("let" (arbno identifier "=" expression) "in" expression)
      let-exp)
    (expression
      ("mvlet" (arbno (arbno identifier) "=" expression) "in" expression)
      mvlet-exp)
    (expression
      ("(" expression (arbno expression) ")")
      app-exp)
    (declaration
      (identifier "=" (arbno expression))
      decl)))

In [42]:
(sllgen:list-define-datatypes lex2 gram3)


Out[42]:
((define-datatype expression expression? (lit-exp (lit-exp32 number?)) (var-exp (var-exp33 symbol?)) (let-exp (let-exp34 (list-of symbol?)) (let-exp35 (list-of expression?)) (let-exp36 expression?)) (mvlet-exp (mvlet-exp37 (list-of (list-of symbol?))) (mvlet-exp38 (list-of expression?)) (mvlet-exp39 expression?)) (app-exp (app-exp40 expression?) (app-exp41 (list-of expression?)))) (define-datatype declaration declaration? (decl (decl42 symbol?) (decl43 (list-of expression?)))))

In [43]:
(define parse3 (sllgen:make-string-parser lex2 gram3))

In [44]:
(parse3 "let x = 1 in y")


Out[44]:
(let-exp (x) ((lit-exp 1)) (var-exp y))

In [45]:
(parse3 "mvlet x = 1 y = 8 in y")


Out[45]:
(mvlet-exp ((x) (y)) ((lit-exp 1) (lit-exp 8)) (var-exp y))

In [46]:
(define gram4
  '((expression
      ("let" (separated-list (separated-list identifier "," ) "=" expression ";" )
        "in" expression)
      let-exp)
    (expression
      (number)
      lit-exp)))

In [47]:
(define parse4 (sllgen:make-string-parser lex2 gram4))

In [48]:
(sllgen:list-define-datatypes lex2 gram4)


Out[48]:
((define-datatype expression expression? (let-exp (let-exp57 (list-of (list-of symbol?))) (let-exp58 (list-of expression?)) (let-exp59 expression?)) (lit-exp (lit-exp60 number?))))

In [49]:
(parse4 "7")


Out[49]:
(lit-exp 7)

In [51]:
(parse4 "let x = 7 in 7")


Out[51]:
(let-exp ((x)) ((lit-exp 7)) (lit-exp 7))

In [52]:
(parse4 "let x,y = 7; y = 8 in 7")


Out[52]:
(let-exp ((x y) (y)) ((lit-exp 7) (lit-exp 8)) (lit-exp 7))

In [53]:
(define lc-lex
   '((whitespace (whitespace) skip)
     (comment ("//" (arbno (not #\newline))) skip)
     (identifier (letter (arbno (or-exp letter digit))) make-symbol)
     (number (digit (arbno digit)) make-number)))

In [54]:
(define lc-gram
  '(
   (lc-exp
       ("def" "(" (arbno identifier) ")" ":" lc-exp)
       lambda-exp)
    (lc-exp
       (identifier)
       var-exp)
    (lc-exp
        ("&" identifier "(" (arbno lc-exp) ")")
        app-exp)
    (lc-exp
       (number)
       lit-exp)
  ))

In [56]:
(define parse-lc (sllgen:make-string-parser lc-lex lc-gram))
(print (parse-lc "x"))
(print (parse-lc "42"))
(print (parse-lc "def (a b): 42"))
(print (parse-lc "def (a b): &add(a b)"))


(var-exp x)
(lit-exp 42)
(lambda-exp (a b) (lit-exp 42))
(lambda-exp (a b) (app-exp add ((var-exp a) (var-exp b))))

In [57]:
(sllgen:make-define-datatypes lc-lex lc-gram)

Interpreting SLLGEN AST


In [ ]:
(define view-lc
    (lambda (ast)
        (cases lc-exp ast
            (lambda-exp (ids body)
               (printf "ids: ~s body: ~s\n" ids body))
            (var-exp (name)
               (printf "name: ~s\n" name))
            (lit-exp (value)
               (printf "lit: ~s\n" value))
            (app-exp (operator operands)
               (printf "application: ~s ~s\n" operator operands))
            (else (error 'eval-lc "invalid lc-exp form")))))

In [60]:
(view-lc (parse-lc "&add(1 2)"))


application: add ((lit-exp 1) (lit-exp 2))

In [58]:
(define eval-lc
    (lambda (ast env)
        (cases lc-exp ast
            (lambda-exp (ids body)
               (list 'closure-exp ids env body))
            (var-exp (name)
               (lookup name env))
            (lit-exp (value)
               value)
            (app-exp (operator operands)
               (applyit (lookup operator env) operands env))
            (else (error 'eval-lc "invalid lc-exp form")))))

(define lookup
    (lambda (var env)
        (cadr (assv var env))))

(define applyit
    (lambda (f args env)
        (case f
            ((add) (+ (eval-lc (car args) env) (eval-lc (cadr args) env)))
            (else (error 'applit "no such function")))))

In [59]:
(eval-lc (parse-lc "&add(1 2)") '((add add)))


Out[59]:
3

In [3]:
%download https://www.cs.rit.edu/~ats/plt-2005-1/a_sllgen.pdf


Downloaded 'a_sllgen.pdf'.