Making Languages

Part 1: Grammar

← Back | Forward →

Grammar is a formal description of a language, how it works and which words are allowed to follow which other words. It’s similar in the creation of programming languages.

Grammar is arranged in lists of rules. These rules have a left-hand side and a right-hand side (like X → Y). Symbols on the left-hand (like X)side can be replaced by symbols on the right-hand side (like Y). They’re a kind of placeholder.

The right-hand side contains placeholders and terminal symbols. These terminal symbols can’t be reduced any further. Using this list of rules; a parser can create patterns to match source code. If the patterns can be created (by expanding/reducing) a list of symbols, and they match some code then the code can be parsed.

Creating Grammars

To illustrate this; imagine I have a programming language that only handles the following code:

5 + 7

All we need are addition operations: plus symbols with numbers on either side. The rules for this simple language could be:

E → int + E | int

According to that grammar; any number of addition operations can be chained to produce a final result. So imagine we wanted to parse the following code:

1 + 2 + 3 + 4 + 5

We would want a machine to extrapolate, from our starting symbol E to
int + int + int + int + int because that would then match the code we are trying to parse. To match this code, the parser would need to expand the rules until the pattern is the same:

→ E
→ int + E
→ int + int + E
→ int + int + int + E
→ int + int + int + int + E
→ int + int + int + int + int

Let’s consider something a little more complex. Let’s imagine we have a language which will parse the following code:

(1 + 2) * 3

This language would introduce a couple of new features to the grammar we saw before. We can parenthesise expressions and perform multiplication. It can be described in a grammar resembling:

E → E + E | E * E | int | (E)

This grammar would allow for the kind of transformations we need to be able to parse our code:

→ E
→ E * E
→ (E) * E
→ (E + E) * E
→ (int + E) * E
→ (int + int) * E
→ (int + int) * int

Purposes Of Grammar

Grammars are important for three reasons:

  1. Grammar serves as the most fundamental documentation of how a language works. They can be written so that anybody wanting to make a parser can know exactly who the language is supposed to work.
  2. Grammar is used to check code. If code cannot be matched by the grammar then it is not valid and can be rejected by the parser.
  3. Advanced structures (Abstract Syntax Trees) can be generated by using Grammar. We’ll discuss these structures later.

Grammars are also used to generate compilers. Tools, such as Bison swallow these grammars and spit out compilers. Definitely a more efficient method than creating a compiler by hand, but also quite boring.

← Back | Forward →