Parsing mathematical expressions into a binary tree — Part 1

This is a rewrite of my original article as I realized the problem was more complicated than I initially thought.

Parse and evaluate “(1+2)*3” expression.

This is a two step process. First I decided to tackle the problem without worrying about parentheses. Turns out that there is an algorithm named “Shunting-yard algorithm” that parses mathematical expressions specified in infix notation (normal human readable notation, in which the two expressions above are specified). Follow the Wikipedia link above to read about the algorithm.

However this procedure will return the output queue in the Reverse Polish Notation(RPN), so the output will look like this: “1 2 3 * +” What I want is to build a binary tree in the form:

 + 
1 *
2 3

To convert my output into binary tree form, it is informative to read about how RPN is parsed. Based on that knowledge I came up with my own version of the algorithm that creates a binary tree which if traversed post-order will produce the correct result of a mathematical expression. Note that I replaced the output queue with an output stack.

First I defined some auxiliary classes and methods. Stack class is really just a wrapper for an array, with an additional peek method. Operator is a parent to fours of its subclasses: Plus, Minus, Multiply, Divide, which each define how they are applied two two arguments arg1 and arg2. createOperator is a factory method for the Operator class.

https://gist.github.com/alexandervasyuk/8229d7be345ab7341c30

The main part of the application is the infixToBinary function. In its core it is a for loop that parses the string (one bug I quickly discovered was that numbers can have more than one digit):

https://gist.github.com/alexandervasyuk/b98cfe77f5dd33fb5ade

and the logic of the algorithm:

https://gist.github.com/alexandervasyuk/564dc77637c3004ee69b

If a token is a number it is pushed onto the outputStack, where it waits to be converted into a node of the resulting binary tree by either one of two functions: createSubtree() or updateTree(). Essentially my modification consists of placing subtrees onto the output stack for the updateTree() to pick up when in the final while loop it assembles the entire binary tree. Note that for this version of the algorithm intermediate subtrees are computed if the next operator on the stack is less than or equal to the current one, not just less than. A quick example :“3 — 5 +2". The correct answer to this is 0, but if you only choose to create intermediate subtrees for operators less than the current one, then you will end up with -4, because the algorithm will create a tree looking like this:

 -
3 +
5 2

I learned it the hard way.

The createSubtree function creates intermediate subtrees that are then placed on the output stack:

https://gist.github.com/alexandervasyuk/807decb6ec6a4c7a52ff

The updateTree walks through the stack creating the final tree:

https://gist.github.com/alexandervasyuk/32b364645d4bbdd3b00c

To test the tree use this evaluation function

https://gist.github.com/alexandervasyuk/71b8ea18961a9ad5f832

Read Part 2