A Meta-Grammar for PEG Parsers

Guido van Rossum
9 min readSep 9, 2019

This week we make the parser generator “self-hosted”, meaning the parser generator generates its own parser.

[This is part 7 of my PEG series. See the Series Overview for the rest.]

So we have a parser generator, a piece of which is a parser for grammars. We could call this a meta-parser. The meta-parser works similar to the generated parsers: GrammarParser inherits from Parser, and it uses the same mark() / reset() / expect() machinery. However, it is hand-written. But does it have to be?

It’s a tradition in compiler design to have the compiler written in the language that it compiles. I fondly remember that the Pascal compiler I used when I first learned to program was written in Pascal itself, GCC is written in C, and the Rust compiler is of course written in Rust.

How is this done? There’s a bootstrap: a compiler for a subset or earlier version of the language is written in a different language. (I recall that the initial Pascal compiler was written in FORTRAN!) Then a new compiler is written in the target language and compiled with the bootstrap compiler. Once the new compiler works well enough, the bootstrap compiler is scrapped, and each next version of the language or compiler is constrained by what the previous version of the compiler can compile.

Let’s do this for our meta-parser. We’ll write a grammar for grammars (a meta-grammar) and then we’ll generate a new meta-parser from that. Luckily I had planned this from the start, so it’s a pretty simple exercise. The actions we added in the previous episode are an essential ingredient, because we don’t want to have to change the generator — so we need to be able to generate a compatible data structure.

Here’s a simplified version of the meta-grammar without actions:

start: rules ENDMARKER
rules: rule rules | rule
rule: NAME ":" alts NEWLINE
alts: alt "|" alts | alt
alt: items
items: item items | item

I’ll show how to add actions from the bottom to the top. Recall from part 3 that there are Rule objects which have attributesname and alts. Originally, alts was just a list of lists of strings (the outer list representing the alternatives, the inner lists representing the items of each alternative), but in order to add actions I changed things so that alternatives are represented by Alt objects with attributes items and action. Items are still represented by plain strings. For item we get:

Guido van Rossum