Operator priority and associativity in EBNF grammar

Mateusz Bednarski
5 min readApr 21, 2022

--

Defining grammar in EBNF notation is reasonably simple except for operator priority and associativity. I found it difficult too. That’s why I will explain step-by-step how to set operator precedence and associativity for simple grammar.

All examples are written using Python with lark library. But the concept is precisely the same for other EBNF tools. If you want to experiment along, you can use the following snippet (lark and matplotlib must be installed).

Or, if you prefer, here is a colab notebook: https://colab.research.google.com/drive/1k5SHu-USUI98Cyo008e0x-oT8vM_YOq6?usp=sharing

Motivating example

Consider we would like to create a grammar for basic arithmetic operations like 1+2+3 or 7/8+9–2. Later we will expand it to support unary operators (e.g.: -5), exponentiation (2 ^ 3 ), and parentheses (e.g.: (1+2)*3).

BINARY_OP: "+" | "-" | "*" | "/"
!expr : expr BINARY_OP expr
| NUMBER

The grammar is very simple. We have four operators that can work on two numbers. The expression can be either a single number or a sequence of binary operations.

Parse tree for 9/3

But there is a problem. Consider a straightforward case: 1+2+3. Our grammar is ambiguous and can produce two different results!

One interpretation: (1+2)+3=3+3=6
Another interpretation! 1+(2+3)=1+5=6

The first one is equivalent to (1+2)+3=3+3=6. The second one is equivalent to 1+(2+3)=3+3=6. For addition, it is not a problem as a result is the same. But, for subtraction we have:

One interpretation, correct: (1–2)-3=-1–3=-4
Another interpretation, incorrect: 1-(2–3)=1-(-1)=2

The (1–2)-3=-1–3=-4 is not the same as 1-(2–3)=1-(-1)=0!

Associativity

The reason is associativity. For addition a+b+c = (a+b)+c = a+(b+c) but it not the case for subtraction. Our current grammar does not define operator associativity so that it can be interpreted either way. Fortunately, all four operations are left-associative (meaning (a⊗b)⊗c, not a⊗(b⊗c), where ⊗ denotes any operator).

The fix is simple:

BINARY_OP: "+" | "-" | "*" | "/"
!expr : expr BINARY_OP NUMBER
| NUMBER
Now we have only one possibility (left-asoc subtraction)

We control left/right associativity by selecting which side of the operator recursion is performed. If we turn it another way around, operators would be right-associative.

BINARY_OP: "+" | "-" | "*" | "/"
!expr : NUMBER BINARY_OP expr
| NUMBER
If we switch recursion to right, operator is right-asoc

Priority/precedence

We solved the first issue, but still, priorities are not correct. For example:

BINARY_OP: "+" | "-" | "*" | "/"
!expr : expr BINARY_OP NUMBER
| NUMBER
1+2*3 = (1+2)*3

Grammar is unambiguous but incorrect. Let’s try to fix it.

First, we need to select how many priority levels we have. For now, there are two: */ before +-. For each level, we introduce a separate expression type. As */ needs to be computed before +-, we conclude that multiplicative expression should be “under” additive. Let’s see the rules:

ADD_OP: "+" | "-" 
MUL_OP: "*" | "/"
!add_expr: add_expr ADD_OP mul_expr
| mul_expr
!mul_expr: mul_expr MUL_OP NUMBER
| NUMBER
!expr: add_expr
1+2*3 = 1 + (2*3)

Starting from the bottom:
Multiplicative expression is either

  1. mul_expr */ number
  2. number

It is the same as before — we only limited allowed operators to */.

Now, the add_expr should be computed using the result of mul_expr.
Additive expression is either:

  1. add_expr +- mul_expr — recursion at the left (for left-assoc) and mul_expr at the right (it is already computed in previous rule)
  2. mul_expr — in case there are no +- operators

The expression itself now is equivalent to additive expression (last rule).

You may think it is not going to work for all cases. But it is. Let’s see some examples:

1+2–3=(1+2)-3
1/2+3/4=(1/2)+(3/4)
1+2*3+4=1+(2*3)+4=(1+(2*3))+4

Exponentiation

Now let’s apply our knowledge to introduce the exponentiation operator ^. It will be right-associative and has the highest priority.

ADD_OP: "+" | "-" 
MUL_OP: "*" | "/"
POWER_OP: "^"
!add_expr: add_expr ADD_OP mul_expr
| mul_expr
!mul_expr: mul_expr MUL_OP pow_expr
| pow_expr
!pow_expr: NUMBER POWER_OP pow_expr
| NUMBER
!expr: add_expr

I hope you see the pattern here :) Let’s try this.

1 + 2 ^ 3 ^ 4 * 5 = 1 + 2 ^ (3 ^ 4) * 5 = 1 + (2 ^ (3 ^ 4)) * 5 = 1 + ((2 ^ (3 ^ 4)) * 5)

Unary operators

It is time to add unary operators. Let’s say we support only unary minus for expressions like -2+3 or 7*-5. To do this, we introduce a new non-terminal: primary. Primary can be either a number or a unary operator followed by a number.

ADD_OP: "+" | "-" 
MUL_OP: "*" | "/"
POWER_OP: "^"
!add_expr: add_expr ADD_OP mul_expr
| mul_expr
!mul_expr: mul_expr MUL_OP pow_expr
| pow_expr
!pow_expr: primary POWER_OP pow_expr
| primary
!primary: "-" NUMBER
| NUMBER
!expr: add_expr
-2+3 = (-2)+3
7*-5

Parentheses

The last element is parentheses support. By definition, an expression in parenthesis has the highest priority. It turns out that the implementation is trivial: all we have to do is extend the definition of primary to include parenthesized expression.

ADD_OP: "+" | "-" 
MUL_OP: "*" | "/"
POWER_OP: "^"
!add_expr: add_expr ADD_OP mul_expr
| mul_expr
!mul_expr: mul_expr MUL_OP pow_expr
| pow_expr
!pow_expr: primary POWER_OP pow_expr
| primary
!primary: "-" NUMBER
| NUMBER
| "(" expr ")"
!expr: add_expr
2*(3–1)
Monster ((-2)+7)-(3/(4+(-1)))

Closing

We covered how to correctly define EBNF grammar for binary expressions with priorities and associativity. We can easily add additional precedence levels (programming languages typically have about 15 of them) by following the same rules.

This post is a spin-off for a series about creating a programming language and a compiler. If you are interested, check the introductory one:

--

--

Mateusz Bednarski

AI enthusiast. Focused mostly on NLP and good software engineering practices for machine learning projects. Currently working at Roche.