HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Parsing mathematical expressions in Reverse Polish Notation

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
polishreverseexpressionsparsingnotationmathematical

Problem

I've wanted to make a calculator for a long time now, so I wrote one of my first parsers, where the input is in reverse polish notation, \$1 \ 1 +\$ becomes \$2\$.

It supports to following operators; +, -, *, /, // floor division, ^ or ** exponent, and % or mod modulo.

While it's the best calculator I've written, the tokens are limited to only ones that take two operators. And I don't really like the idea of having a separate dictionary for ones that take more or less. This is a shame as now I can't add functions like sqrt or sin, or constants like pi or e, without having to have a different group for each.

Personally I dislike the error you get when you ^C out of the program, and so I silenced it. I also made a new error CalculatorError to not mask unknown errors, to easily exit out of the current equation if more than one function is called, make_pop, and to have a human readable error message.

```
TOKENS = {
'+': lambda a, b: a + b,
'-': lambda a, b: a - b,
'': lambda a, b: a b,
'/': lambda a, b: a / b,
'^': lambda a, b: a ** b,
'': lambda a, b: a b,
'%': lambda a, b: a % b,
'mod': lambda a, b: a % b,
'//': lambda a, b: a // b
}

class CalculatorError(Exception):
pass

def make_pop(stack):
def pop():
try:
return stack.pop()
except IndexError:
raise CalculatorError('Not enough arguments for function')
return pop

def call():
stack = []
pop = make_pop(stack)
buf = []

calculation = iter(input('> ') + ' ')

for char in calculation:
if char not in ' \t':
buf.append(char)
continue

token = ''.join(buf)
if not token:
continue

if token in TOKENS:
a = pop()
b = pop()
function = TOKENS[token]
try:
return_value = function(b, a)
except ZeroDivisionError:
raise Calcul

Solution

I'm not quite sure what you're asking for, but I was bored so I typed this up anyway. Hopefully it helps. ;-)

Working from the outside in:

Invocation

if __name__ == '__main__':
    print('Press ^C to exit')
    try:
        main()
    except KeyboardInterrupt:
        pass
    # For some reason my Windows raises EOF on ^C...
    except EOFError:
        print()
    print('Goodbye!')


This is all fine, but it probably should be in the main() function, and you just write

if __name__ == '__main__':
    main()


Mostly, this is a matter of convention, but also, it's handy to have a function that can be run from another Python script that will just do everything.

Organizing the driver code

The process of evaluating an RPN expression is something that makes a lot of sense to put as its own function. That's a self-contained task you could easily want to reuse elsewhere. So I would suggest making a function evaluate_rpn(expr) which does exactly that. You can then use this function in your main loop as follows:

def main():
    print(...)
    try:
        while True:
            try:
                print(evaluate_rpn(input('> ')))
            except CalculatorError as e:
                print(e)
    except KeyboardInterrupt:
    # and so on


Evaluating

This new evaluate_rpn() function is going to do mostly the same thing as your call() function currently does, except that instead of prompting using input(), it will take a string argument, and instead of printing the result, it will return it. Let me pick on a few things in that function, though.

Let the stack just be a stack

stack = []
pop = make_pop(stack)


The only real purpose of make_pop is to make it so that you get a CalculatorError instead of an IndexError when a function or operator doesn't have enough arguments. But I don't think it makes sense to do that. See, stack is just a stack. (Well, a list that is serving as a stack.) It knows nothing of numbers or arguments or calculators and really has no business raising a CalculatorError. The code in evaluate_rpn() is the thing that knows it is a calculator. That's where you should be raising CalculatorErrors. So instead of using make_pop, just watch for IndexErrors during the evaluation, catch them, and raise a CalculatorError in the handling code. As a side effect you won't need make_pop() anymore; you can just dispense with it entirely.

try:
    a = pop()
    b = pop()
except IndexError:
    raise CalculatorError(...)


Do tokenization the easy way

Your input strings consist of tokens (numbers, operators, functions) separated by whitespace, right? There's a built-in method to split a whitespace-separated string: string.split(). You don't need to bother with a buffer, or with iterating through the string's characters.

for token in calculation.split():
    # and so on


Don't overuse variables

You don't really have to save a and b and return_value separately; you only use them once each.

if token in TOKENS:
    try:
        stack.append(TOKENS[token](stack.pop(), stack.pop())
    except ZeroDivisionError:
        raise CalculatorError(...)
    except IndexError: # from a couple paragraphs ago
        # and so on


If different parts of this expression could raise the same type of error, then that would be a good reason to break it down, so that you could tell exactly where a given error comes from. But that's not an issue here. You'll only ever get ZeroDivisionErrors for one reason.

Same goes for the float-parsing part:

else:
    try:
        stack.append(float(token))
    except ValueError:
        raise CalculatorError(...)


For complicated sequences of code, you have to make a subjective judgement about where to use variables and where not to in order to make the code as clear as possible. But in this case, I think most programmers would agree that the variables are fairly unnecessary.

Let output formatting be controlled by the formatting routines

Evaluating an RPN expression doesn't include formatting - in particular, deciding whether the result is an integer or not is a separate task from just getting the result in the first place. So at the end of evaluate_rpn, just return a float. When you print it, instead of print(result) you can do

print('{:g}'.format(result))


and it will automatically strip trailing zeros as needed. Of course, this does display large numbers in scientific notation (2.89146e+07); if that's a problem, then you might have to do your own formatting.

Use descriptive variable names

Your TOKENS list doesn't actually contain tokens; it contains operators, and could be generalized to contain functions. So I'd suggest naming it OPERATORS or some such thing.

Allowing arbitrary numbers of arguments

This is really more about adding a feature than reviewing your existing code, but I might as well point you in the right direction. All you have to do i

Code Snippets

if __name__ == '__main__':
    print('Press ^C to exit')
    try:
        main()
    except KeyboardInterrupt:
        pass
    # For some reason my Windows raises EOF on ^C...
    except EOFError:
        print()
    print('Goodbye!')
if __name__ == '__main__':
    main()
def main():
    print(...)
    try:
        while True:
            try:
                print(evaluate_rpn(input('> ')))
            except CalculatorError as e:
                print(e)
    except KeyboardInterrupt:
    # and so on
stack = []
pop = make_pop(stack)
try:
    a = pop()
    b = pop()
except IndexError:
    raise CalculatorError(...)

Context

StackExchange Code Review Q#106407, answer score: 5

Revisions (0)

No revisions yet.