In [ ]:
%%HTML
<style>
.container { width:100% }
</style>
The function $\texttt{toInt}(s)$ tries to convert the string $s$ to an integer. If this works out, the integer is returned. Otherwise, the string $s$ is returned unchanged.
In [ ]:
def toInt(s):
try:
return int(s)
except ValueError:
return s
In [ ]:
toInt('123')
In [ ]:
toInt('**')
The module re
provides support for regular expressions. These are needed for
tokenizing a string.
In [ ]:
import re
The function $\texttt{tokenize}(s)$ takes a string $s$ representing an arithmetic expression and splits this string into a list of tokens.
The string regExp
in the implementation below is interpreted as follows:
r
in front of the apostrophe '
specifies that the regular expression is defined as a raw string. In a raw string the backslash does not have to be escaped because it is treated as a literal character.</br> |
. [0-9]+
matches a natural number. For example, it matches 0
or 123
. It would also match a string like 007
.
The +
at the end of the substring [0-9]*
specifies that there are any positive number of the characters in the range [0-9]
.</br>\*\*
matches the operator **
.</br>[()+*/%-]
matches a parenthesis or an arithmetical operator. Note that we have
to put the symbol -
last in this group as otherwise this symbol would be
interpreted as a range operator.
In [ ]:
def tokenize(s):
regExp = r'[0-9]+|\*\*|[()+*%/-]'
L = [ toInt(t) for t in re.findall(regExp, s) ]
return list(reversed(L))
In [ ]:
re.findall(r'[0-9]+|\*\*|[()+*%/-]', '1 * 2 * 3**4')
In [ ]:
tokenize('1 * 2 * 3**4')
Given an operator $o$, the expression $\texttt{precedence}(o)$ returns the precedence of the operator $o$. If $o_1$ and $o_2$ are different operators and the precedence of $\texttt{o}_1$ is at least as high than the precedence of $\texttt{o}_2$, then the expression $$ a \;\texttt{o}_1\; b \;\texttt{o}_2\; c $$ should be evaluated as $$ (a \;\texttt{o}_1\; b) \;\texttt{o}_2\; c. $$ Otherwise, the expression $a \;\texttt{o}_1\; b \;\texttt{o}_2\; c$ should be evaluated as $$ a \;\texttt{o}_1\; (b \;\texttt{o}_2\; c). $$
In [ ]:
def precedence(o):
Precedence = { '+': 1, '-': 1, '*': 2, '/': 2, '%': 2, '**' : 3 }
return Precedence[o]
The expression $\texttt{isLeftAssociative}(o)$ is True
iff the operator $o$
associates to the left. If $o$ associates to the
right, this expression is False
. If the operator $o$ is unknown, evaluation of the expression results
in an error.
In [ ]:
def isLeftAssociative(o):
if o in { '+', '-', '*', '/', '%' }:
return True
if o in { '**' }:
return False
assert False, f'unknown operator {o}'
The function $\texttt{evalBefore}(o_1, o_2)$ receives to strings representing arithmetical operators. It returns True
if the operator $o_1$ should be evaluated before the operator $o_2$ in an arithmetical expression of the form $a \;\texttt{o}_1\; b \;\texttt{o}_2\; c$. In order to determine whether $o_1$ should be evaluated before $o_2$ it uses the
precedence and the associativity of the operators.
Its behavior is specified by the following rules:
In [ ]:
def evalBefore(stackOp, nextOp):
if precedence(stackOp) > precedence(nextOp):
return True
if stackOp == nextOp:
return isLeftAssociative(stackOp)
if precedence(stackOp) == precedence(nextOp) and stackOp != nextOp:
return True
if precedence(stackOp) < precedence(nextOp):
return False
assert False, f'incomplete case distinction in evalBefore({stackOp}, {nextOp})'
In [ ]:
import stack
The class Calculator
supports three member variables:
mTokenStack
mOperators
mArguments
The constructor takes a string that is tokenized and pushes the tokens onto the token stack such that the first token is on top of the token stack.
In [ ]:
class Calculator:
def __init__(self, s):
self.mTokens = stack.createStack(tokenize(s))
self.mOperators = stack.Stack()
self.mArguments = stack.Stack()
The method __str__
is used to convert an object of class Calculator
to a string.
In [ ]:
def toString(self):
return '\n'.join(['_'*50,
'Tokens: ', str(self.mTokens),
'Arguments: ', str(self.mArguments),
'Operators: ', str(self.mOperators),
'_'*50])
Calculator.__str__ = toString
del toString
The function $\texttt{evaluate}(\texttt{self})$ evaluates the expression that is given by the tokens on the mTokenStack
.
There are two phases:
mTokens
. mOperators
are evaluated. Note that some operators are already
evaluated in the reading phase.We can describe what happens in the reading phase using
rewrite rules that describe how the three stacks mTokens
, mArguments
and mOperators
are changed in each step. Here, a step is one iteration of the first while
-loop of the function evaluate
.
The following rewrite rules are executed until the token stack mTokens
is empty.
If the token on top of the token stack is an integer, it is removed from the token stack and pushed onto the argument stack.
The operator stack remains unchanged in this case.
$$\begin{array}{lc}
\texttt{mTokens} = \texttt{mTokensRest} + [\texttt{token} ] & \wedge \\
\texttt{isInteger}(\texttt{token}) & \Rightarrow \\[0.2cm]
\texttt{mArguments}' = \texttt{mArguments} + [\texttt{token}] & \wedge \\
\texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\
\texttt{mOperators}' = \texttt{mOperators}
\end{array}
$$
Here, the primed variable $\texttt{mArguments}'$ refers to the argument stack after $\texttt{token}$
has been pushed onto it.
In the following rules we implicitly assume that the token on top of the token stack is not an integer but rather a parenthesis or a proper operator. In order to be more concise, we suppress this precondition from the following rewrite rules.
\texttt{mOperators}' = \texttt{mOperatorsRest} & \wedge \\
\texttt{mTokens}' = \texttt{mTokens} & \wedge \\
\texttt{mArguments}' = \texttt{mArgumentsRest} + [\texttt{lhs} \;\texttt{op}\; \texttt{rhs}]
\end{array}
$$
Here, the expression $\texttt{lhs} \;\texttt{op}\; \texttt{rhs}$ denotes evaluating the operator $\texttt{op}$ with the arguments
$\texttt{lhs}$ and $\texttt{rhs}$.If the token on top of the operator stack is an opening parenthesis, then the operator on top of the token stack is pushed onto the operator stack. $$\begin{array}{lc} \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{op}] & \wedge \\ \texttt{op} \not= \texttt{')'} & \wedge \\ \texttt{mOperators} = \texttt{mOperatorsRest} + [\texttt{'('}] & \Rightarrow \\[0.2cm] \texttt{mOperator}' = \texttt{mOperator} + [\texttt{op}] & \wedge \\ \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\ \texttt{mArguments}' = \texttt{mArguments} \end{array} $$
In the remaining cases neither the token on top of the token stack nor the operator on top of the operator stack can be a a parenthesis. The following rules will implicitly assume that this is the case.
\texttt{mTokens} = \texttt{mTokensRest} + [o_2] & \wedge \\
\texttt{mOperatorsRest} + [o_1] & \wedge \\
\texttt{evalBefore}(o_1, o_2) & \wedge \\
\texttt{mArguments} = \texttt{mArgumentsRest} + [\texttt{lhs}, \texttt{rhs}] & \Rightarrow \\[0.2cm]
\texttt{mOperators}' = \texttt{mOperatorRest} & \wedge \\
\texttt{mTokens}' = \texttt{mTokens} & \wedge \\
\texttt{mArguments}' = \texttt{mArgumentsRest} + [\texttt{lhs} \;o_1\; \texttt{rhs}]
\end{array}
$$ \texttt{mTokens} = \texttt{mTokensRest} + [o_2] & \wedge \\
\texttt{mOperators} = \texttt{mOperatorsRest} + [o_1] & \wedge \\
\neg \texttt{evalBefore}(o_1, o_2) & \Rightarrow \\[0.2cm]
\texttt{mOperators}' = \texttt{mOperators} + [o_2] & \wedge \\
\texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\
\texttt{mArguments}' = \texttt{mArguments}
\end{array}
$$In every step of the evaluation phase we
In [ ]:
def evaluate(self):
while not self.mTokens.isEmpty():
print(self) # only for debugging
nextOp = self.mTokens.top(); self.mTokens.pop()
if isinstance(nextOp, int):
self.mArguments.push(nextOp)
continue
if self.mOperators.isEmpty():
self.mOperators.push(nextOp)
continue
if nextOp == "(":
self.mOperators.push(nextOp)
continue
stackOp = self.mOperators.top()
if stackOp == "(" and nextOp == ")":
self.mOperators.pop()
continue
if nextOp == ")":
self.popAndEvaluate()
self.mTokens.push(nextOp)
continue
if stackOp == '(':
self.mOperators.push(nextOp)
continue
if evalBefore(stackOp, nextOp):
self.popAndEvaluate()
self.mTokens.push(nextOp)
else:
self.mOperators.push(nextOp)
while not self.mOperators.isEmpty():
print(self) # only for debugging
self.popAndEvaluate()
print(self)
return self.mArguments.top()
Calculator.evaluate = evaluate
del evaluate
The method $\texttt{popAndEvaluate}(\texttt{self})$ removes the two topmost numbers $\texttt{rhs}$ and $\texttt{lhs}$ from the argument stack and removes the topmost operator $\texttt{op}$ from the operator stack. It applies the operator $\texttt{op}$ to these numbers by computing $\texttt{lhs} \;\texttt{op}\; \texttt{rhs}$ and then pushes this value back on the argument stack.
In [ ]:
def popAndEvaluate(self):
rhs = self.mArguments.top(); self.mArguments.pop()
lhs = self.mArguments.top(); self.mArguments.pop()
op = self.mOperators.top(); self.mOperators.pop()
result = None
if op == '+':
result = lhs + rhs
if op == '-':
result = lhs - rhs
if op == '*':
result = lhs * rhs
if op == '/':
result = lhs // rhs
if op == '%':
result = lhs % rhs
if op == '**':
result = lhs ** rhs
assert result != None, f'ERROR: *** Unknown Operator *** "{op}"'
self.mArguments.push(result)
Calculator.popAndEvaluate = popAndEvaluate
del popAndEvaluate
In [ ]:
C = Calculator('2*(3+4)**2')
In [ ]:
C.evaluate()
In [ ]:
In [ ]: