In [ ]:
%%HTML
<style>
.container { width:100% } 
</style>

The Shunting Yard Algorithm (Operator Precedence Parsing)


In [ ]:
import re

The function $\texttt{isWhiteSpace}(s)$ checks whether $s$ contains only blanks and tabulators.


In [ ]:
def isWhiteSpace(s):
    whitespace = re.compile(r'[ \t]+')
    return whitespace.fullmatch(s)

The function $\texttt{toInt}(s)$ tries to convert the string $s$ to an integer. If this works out, the integer is returned. Otherwise, the string $s$ is returned unchanged.


In [ ]:
def toInt(s):
    try:
        return int(s)   
    except ValueError:
        return s

The module re provides support for regular expressions. These are needed for tokenizing a string.

The function $\texttt{tokenize}(s)$ takes a string and splits this string into a list of tokens. Whitespace is discarded.


In [ ]:
def tokenize(s):
    regExp = r'''
              0|[1-9][0-9]* |             # number
              \*\*          |             # power operator
              [-+*/()]      |             # arithmetic operators and parentheses
              [ \t]         |             # white space
              sqrt          |
              sin           |
              cos           |
              tan           |   
              asin          |
              acos          |
              atan          |
              exp           |
              log           |
              x             |
              e             |
              pi
              '''
    L = [toInt(t) for t in re.findall(regExp, s, flags=re.VERBOSE) if not isWhiteSpace(t)]
    return list(reversed(L))

In [ ]:
tokenize('x**2 - 2')

The module math provides a number of mathematical functions like exp, sin, log etc.


In [ ]:
import math

The function $\texttt{findZero}(f, a, b, n)$ takes a function $f$ and two numbers $a$ and $b$ such that

  • $a < b$ and
  • $f(a) \leq 0 \leq f(b)$ or $f(a) \geq 0 \geq f(b)$.

It uses the bisection method to find a number $x \in [a, b]$ such that $f(x) \approx 0$. $n$ is the number of iterations.


In [ ]:
def findZero(f, a, b, n):
    assert a < b,            f'{a} has to be less than {b}'
    assert f(a) * f(b) <= 0, f'f({a}) * f({b}) > 0'
    if f(a) <= 0 <= f(b):
        for k in range(n):
            c = 0.5 * (a + b) 
            print(f'f({c}) = {f(c)}, {b-a}')
            if f(c) < 0:
                a = c
            elif f(c) > 0:
                b = c
            else:
                return c
    else:
        for k in range(n):
            c = 0.5 * (a + b) 
            print(f'f({c}) = {f(c)}, {b-a}')
            if f(c) > 0:
                a = c
            elif f(c) < 0:
                b = c
            else:
                return c
    return (a + b) / 2

In [ ]:
def f(x):
    return x ** 2 - 2

In [ ]:
findZero(f, 0, 2, 55)

The function $\texttt{precedence}(o)$ calculates the precedence of the operator $o$.


In [ ]:
def precedence(op):
    "your code here"

The function $\texttt{isUnaryOperator}(o)$ returns True of $o$ is a unary operator.


In [ ]:
def isUnaryOperator(op):
    "your code here"

The function $\texttt{isConstOperator}(o)$ returns True of $o$ is a constant like eor pi.


In [ ]:
def isConstOperator(op):
    "your code here"

The function $\texttt{isLeftAssociative}(o)$ returns True of $o$ is left associative.


In [ ]:
def isLeftAssociative(op):
    "your code here"

The function $\texttt{evalBefore}(o_1, o_2)$ receives to strings representing arithmetical operators. It returns True if the operator $o_1$ should be evaluated before the operator $o_2$ in an arithmetical expression of the form $a \;\texttt{o}_1\; b \;\texttt{o}_2\; c$. In order to determine whether $o_1$ should be evaluated before $o_2$ it uses the precedence and the associativity of the operators.
Its behavior is specified by the following rules:

  • $\texttt{precedence}(o_1) > \texttt{precedence}(o_2) \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{True}$,
  • $o_1 = o_2 \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{isLeftAssociative}(o_1)$,
  • $\texttt{precedence}(o_1) = \texttt{precedence}(o_2) \wedge o_1 \not= o_2 \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{True}$,
  • $\texttt{precedence}(o_1) < \texttt{precedence}(o_2) \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{False}$.

In [ ]:
def evalBefore(stackOp, nextOp):
    "your code here"

In [ ]:
import stack

The class Calculator supports three member variables:

  • the token stack mTokens,
  • the operator stack mOperators,
  • the argument stack mArguments,
  • the floating point number mValue, which is the current value of x.

The constructor takes a list of tokens TL and initializes the token stack with these tokens.


In [ ]:
class Calculator:
    def __init__(self, TL, x):
        self.mTokens     = stack.createStack(TL)
        self.mOperators  = stack.Stack()
        self.mArguments  = stack.Stack()
        self.mValue      = x

The method __str__ is used to convert an object of class Calculator to a string.


In [ ]:
def toString(self):
    return '\n'.join(['_'*50, 
                      'TokenStack: ', str(self.mTokens), 
                      'Arguments:  ', str(self.mArguments), 
                      'Operators:  ', str(self.mOperators), 
                      '_'*50])

Calculator.__str__ = toString

The function $\texttt{evaluate}(\texttt{self})$ evaluates the expression that is given by the tokens on the mTokenStack.
There are two phases:

  1. The first phase is the reading phase. In this phase the tokens are removed from the token stack mTokens.
  2. The second phase is the evaluation phase. In this phase, the remaining operators on the operator stack mOperators are evaluated. Note that some operators are already evaluated in the reading phase.

We can describe what happens in the reading phase using rewrite rules that describe how the three stacks mTokens, mArguments and mOperators are changed in each step. Here, a step is one iteration of the first while-loop of the function evaluate. The following rewrite rules are executed until the token stack mTokens is empty.

  1. If the token on top of the token stack is an integer, it is removed from the token stack and pushed onto the argument stack. The operator stack remains unchanged in this case.
    $$\begin{array}{lc} \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{token} ] & \wedge \\ \texttt{isInteger}(\texttt{token}) & \Rightarrow \\[0.2cm] \texttt{mArguments}' = \texttt{mArguments} + [\texttt{token}] & \wedge \\ \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\ \texttt{mOperators}' = \texttt{mOperators} \end{array} $$ Here, the primed variable $\texttt{mArguments}'$ refers to the argument stack after $\texttt{token}$ has been pushed onto it.

    In the following rules we implicitly assume that the token on top of the token stack is not an integer but rather a parenthesis or a proper operator. In order to be more concise, we suppress this precondition from the following rewrite rules.

  2. If the operator stack is empty, the next token is pushed onto the operator stack. $$\begin{array}{lc} \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{op} ] & \wedge \\ \texttt{mOperators} = [] & \Rightarrow \\[0.2cm] \texttt{mOperators}' = \texttt{mOperators} + [\texttt{op}] & \wedge \\ \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\ \texttt{mArguments}' = \texttt{mArguments} \end{array} $$
  3. If the next token is an opening parenthesis, this parenthesis token is pushed onto the operator stack. $$\begin{array}{lc} \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{'('} ] & \Rightarrow \\[0.2cm] \texttt{mOperators}' = \texttt{mOperators} + [\texttt{'('}] & \wedge \\ \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\ \texttt{mArguments}' = \texttt{mArguments} \end{array} $$
  4. If the next token is a closing parenthesis and the operator on top of the operator stack is an opening parenthesis, then both parentheses are removed. $$\begin{array}{lc} \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{')'} ] & \wedge \\ \texttt{mOperators} =\texttt{mOperatorsRest} + [\texttt{'('}] & \Rightarrow \\[0.2cm] \texttt{mOperators}' = \texttt{mOperatorsRest} & \wedge \\ \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\ \texttt{mArguments}' = \texttt{mArguments} \end{array} $$
  5. If the next token is a closing parenthesis but the operator on top of the operator stack is not an opening parenthesis, the operator on top of the operator stack is evaluated. Note that the token stack is not changed in this case. $$\begin{array}{lc} \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{')'} ] & \wedge \ \texttt{mOperatorsRest} + [\texttt{op}] & \wedge \ \texttt{op} \not= \texttt{'('} & \wedge \ \texttt{mArguments} = \texttt{mArgumentsRest} + [\texttt{lhs}, \texttt{rhs}] & \Rightarrow \[0.2cm]
     \texttt{mOperators}' = \texttt{mOperatorsRest} & \wedge \\
      \texttt{mTokens}' = \texttt{mTokens} & \wedge \\
      \texttt{mArguments}' = \texttt{mArgumentsRest} + [\texttt{lhs} \;\texttt{op}\; \texttt{rhs}]
    
    \end{array} $$ Here, the expression $\texttt{lhs} \;\texttt{op}\; \texttt{rhs}$ denotes evaluating the operator $\texttt{op}$ with the arguments $\texttt{lhs}$ and $\texttt{rhs}$.
  6. If the token on top of the operator stack is an opening parenthesis, then the operator on top of the token stack is pushed onto the operator stack. $$\begin{array}{lc} \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{op}] & \wedge \\ \texttt{op} \not= \texttt{')'} & \wedge \\ \texttt{mOperators} = \texttt{mOperatorsRest} + [\texttt{'('}] & \Rightarrow \\[0.2cm] \texttt{mOperator}' = \texttt{mOperator} + [\texttt{op}] & \wedge \\ \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\ \texttt{mArguments}' = \texttt{mArguments} \end{array} $$

    In the remaining cases neither the token on top of the token stack nor the operator on top of the operator stack can be a a parenthesis. The following rules will implicitly assume that this is the case.

  7. If the operator on top of the operator stack needs to be evaluated before the operator on top of the token stack, the operator on top of the operator stack is evaluated. $$\begin{array}{lc}
     \texttt{mTokens} = \texttt{mTokensRest} + [o_2]                                        & \wedge \\
     \texttt{mOperatorsRest} + [o_1]                                                        & \wedge \\
     \texttt{evalBefore}(o_1, o_2)                                                          & \wedge \\ 
     \texttt{mArguments} = \texttt{mArgumentsRest} + [\texttt{lhs}, \texttt{rhs}]           & \Rightarrow \\[0.2cm]
     \texttt{mOperators}' = \texttt{mOperatorRest}                                          & \wedge \\
     \texttt{mTokens}' = \texttt{mTokens}                                                   & \wedge \\
     \texttt{mArguments}' = \texttt{mArgumentsRest} + [\texttt{lhs} \;o_1\; \texttt{rhs}]
     \end{array} 
    
    $$
  8. Otherwise, the operator on top of the token stack is pushed onto the operator stack. $$\begin{array}{lc}
      \texttt{mTokens} = \texttt{mTokensRest} + [o_2]           & \wedge \\
      \texttt{mOperators} = \texttt{mOperatorsRest} + [o_1]     & \wedge \\
      \neg \texttt{evalBefore}(o_1, o_2)                        & \Rightarrow \\[0.2cm]
     \texttt{mOperators}' = \texttt{mOperators} + [o_2]         & \wedge \\
     \texttt{mTokens}' = \texttt{mTokensRest}                   & \wedge \\
     \texttt{mArguments}' = \texttt{mArguments}
    
    \end{array} $$

In every step of the evaluation phase we

  • remove one operator from the operator stack,
  • remove its arguments from the argument stack,
  • evaluate the operator, and
  • push the result back on the argument stack

In [ ]:
def evaluate(self):
    "your code here"
    
Calculator.evaluate = evaluate
del evaluate

The method $\texttt{popAndEvaluate}(\texttt{self})$ removes an operator from the operator stack and removes the corresponding arguments from the arguments stack. It evaluates the operator and pushes the result on the argument stack.


In [ ]:
def popAndEvaluate(self):
    "your code here"

Calculator.popAndEvaluate = popAndEvaluate

In [ ]:
TL = tokenize('x - cos(x)')
C = Calculator(TL, 1)

In [ ]:
C.evaluate()

In [ ]:
def computeZero(s, left, right):
    TL = tokenize(s)

    def f(x):
        c = Calculator(TL, x)
        return c.evaluate()

    return findZero(f, left, right, 54);

In [ ]:
computeZero('log exp x - cos(x)', 0, 1)

In [ ]: