This is a supporting article to a series of articles about writing a Windows 3 Emulator in C#. See here for more information on about Win3mu.
This post was supposed to be about the Win3mu debugger’s expression engine that I mentioned in the previous post but instead of describing that very specific implementation I thought it might be more useful to explain how to build a generic math expression engine.
The concepts are essentially the same and this is probably more widely useful.
What is an Expression Engine?
An expression engine is a library that can take a string like the following and calculate a resulting value.
"2 * pi * r"
You could use this to let a user type math expressions into an input field, or to define custom functions for a graph plotting application for example.
The implementation I describe here will support:
- Double precision numbers only
- The basic math operations (add, subtract, multiply, divide)
- Correct order of operation
- Ability to call C# through reflection
I’ve made the source code for this available and there are branches for each of the seven steps described below. Feel free to do with this code as you wish but note that it’s been written in a manner to best demonstrate the topics at hand — as you extend it you’ll almost certainly find better ways to implement some parts of it.
eg: C# will never understand “byte ptr es:[di]” which Win3mu’s debugger does need.
This is a fairly fast moving tutorial with lots of code samples but by the end of it you should have a solid understanding of how to build a flexible and powerful expression evaluation engine.
Step 1 — The Tokenizer
The first thing we need to evaluate an expression is a tokenizer. The tokenizer’s job is to break the expression string into a series of tokens where each token describes one piece of the input string.
For example the string “10 + 20” would be tokenized to:
Token Number "10"
Token Number "20"
Lets start by defining a Token enumerated type:
and a Tokenizer class that reads from a TextReader and provides a way to read the current token and move to the next:
The implementation of the Tokenizer class is reasonably straight forward — it skips whitespace, looks for special characters and parses numbers into doubles. See full source code here.
The tokenizer is used like so:
Step 2 — The Parser and Add/Subtract
The next step is to read the stream of tokens generated by the tokenizer and generate an expression tree — a hierarchy of nodes that represent the structure of the expression.
For example the expression “10 + 20” would be represented by three nodes:
We’ll need a couple of different node types but they’ll all derive from a common base class “Node” which has a method “Eval” that returns the node’s value.
The simplest kind of node is a “NodeNumber” which represents a literal number (ie: the 10 and 20 in the above example).
Add and Subtract are both binary operations — they have two operands and perform some operation on them. Let’s define a node for binary operations:
The important things to note about NodeBinary are:
- It takes two other Node’s as its operands. This is how the “tree” is constructed.
- It takes a Func<> that defines the actual operation
- The Eval function first evaluates the two operand nodes and then calls the Func<> to “do the operation”.
We could now manually construct an expression tree like this:
Of course what we really want to do is construct this tree from our stream of tokens. This is the job of the Parser.
I’ve included the entire first pass of the Parser class here. Take the time to understand it because it’s the core of the expression engine and everything that follows builds upon it.
- Parser reads tokens from a supplied Tokenizer
- The ParseLeaf method creates a NodeNumber if the current token is a number
- The ParseAddSubtract method parses a leaf for the left hand side and then checks if it’s followed by an Add or Subtract token. If it is then it parses another leaf for the right-hand side and joins them with an appropriate NodeBinary.
- ParseAddSubtract will handle a sequence of add/subtract operations. (eg: “10 + 20–30 + 40” etc…)
Parser also includes a couple of static convenience methods (see full code) to automatically create a TextReader and Tokenizer for a string. The following test cases now work:
See what just happened? This is now a functioning expression engine that’s actually evaluating expressions. It’s just a little limited.
Part 3 — Unary Operations
One subtle problem with the above is that it doesn’t support negative numbers. eg: this expression would fail with a syntax error:
"10 + -20"
What we need is a “negate” operator. NodeUnary is identical to NodeBinary with one less argument (see here) and can be used for the negate operation.
There’s two interesting aspects to consider here:
- There’s also a “positive” operator. eg: “10 + +20” is valid
- The right hand side of a unary operator can be another unary operator. eg: “10 + -+20” is also valid. “ten plus negative positive twenty”.
To support this the Parser class has been updated with a new ParseUnary method and ParseAddSubtract is updated to call it instead of ParseLeaf for the left and right hand sides of the operation.
Here’s the test cases:
Part 4 — Multiply, Divide, Parenthesis and Order Of Operation
So far the Parser class has the following methods to generate expression nodes:
It may not be obvious at first glance but the order these methods call each other determines “order of operation” with lower priority parse methods calling higher priority methods.
Multiply and Divide are higher priority operations than Add and Subtract, but lower than Negate so adding support for Multiply and Divide simply means adding some new tokens, writing a new method ParseMultiplyDivide (exactly like ParseAddSubtract) and sitting it in the call hierarchy between ParseAddSubtract and ParseUnary. ie:
- ParseAddSubtract should call ParseMultiplyDivide for its operands
- ParseMultiplyDivide should call ParseUnary for its operands
As an example, consider this expression:
10 + 20 * 30
After ParseAddSubtract has parsed the token for add, it calls ParseMultiplyDivide which will consume the “20 * 30” to yield a multiply node which becomes the right hand node for the add operation.
A similar approach is used for parentheses, or bracketed expression:
- Add new tokens for OpenParens and CloseParens and update the tokenizer to support them
- Update ParseLeaf to recognise these tokens and parse the bracketed expression.
Because the bracketed expression is essentially an entire expression in itself it’s parsed using ParseAddSubtract:
The parser now supports the entire expression syntax and correctly handles order of operation:
Part 5 — Variables
So far the expression engine might be handy for letting a user do simple math in a text input field but its limited in that it only supports literal numbers. The next thing to add is variables.
First, the tokenizer needs to be updated to recognize “identifiers”. ie: variable names.
Next there’s a new node class “NodeVariable” that stores the name of the variable and calls an “IContext” to get the variable’s value when the expression is evaluated.
IContext provides a way for the expression to call out to external code. The other node types all need to be updated to pass this variable through.
The final step is to update ParseLeaf to generate the new node:
Here’s a simple example of a context that provides two variables- “pi” and “r”:
Part 6 — Functions
Functions are really just a more advanced version of variables — an identifier followed by a bracketed and comma separated list of arguments expressions.
NodeFunctionCall is just like NodeVariable but in addition to the name of the function it also accepts an array of expression nodes that are the function arguments. Also, IContext is updated with a new method “CallFunction”
You should be able to guess how the parsing of functions work so I won’t include the code here. (You can find it here).
Here’s how it would be used:
Part 7 — Reflection Context
The above example works well enough however if you have a lot of variables or functions it’s not a very elegant solution. Instead of hand coding all your variables and functions, .NET Reflection can be used to call the methods and properties of a C# object.
ReflectionContext does exactly this — it takes a target object and provides an IContext that will call the the methods and parameters of that object.
We now have a super easy way for expressions to call a library of C# code:
Taking it Further
We’ve now got a handy little expression engine. Here’s some ways you might like to extend it:
- Other Types — instead of working just with doubles add support for strings, bools and other types.
- Simplification — if you’re after a fast expression engine you could write code to walk the expression tree and simplify it.
- Bit-wise Operators— bit-wise and, or, xor etc…
- Comparison Operators — greather than, less than etc…
- Ternary Operator — “x < y ? 10 : 20”
- Conditional Operators —And (&&) and Or (||) and don’t forget about the short-circuit evaluation.
- Support for object members. eg: “rocket.y” and “rocket.launchIn(10)”
- Variable and Function scopes.
You could even use this as the basis for an interpreted language or as part of a HTML template engine.
The full source code for SimpleExpressionEngine is available here:
Back to the Debugger
The next post will return to the debugger. Stay tuned!
Hi, I’m Brad Robinson — an independent software developer living in Sydney Australia. I write software for musicians and as an indie developer I rely on word of mouth.
If you enjoyed this article please consider sharing it by hitting the “recommend heart” below or by sharing on Facebook/Twitter. It’s a small gesture but makes a real difference.
Also, if your feed is lacking in hex dumps, disassembly listings and screen shots of old Windows 3 games you might like to follow me on Twitter.