This week I’m not showing any new code for the parser generator I’ve described in the previous parts. Instead, I’ll try to describe what I did at the Core Developer Sprint last week before it all evaporates from my memory. Most of this relates to PEG one way or another. Then I’ll show some code anyway, because I like to talk about code, and it roughly shows the path I see to a PEG-based parser for CPython 3.9.

[This is part 9 of my PEG series. See the Series Overview for the rest.]

Every year for the past four years, a bunch of Python core developers have gotten together for a week-long sprint at an exotic location. These sprints are sponsored by the PSF as well as by the company hosting the sprint. The first two years were hosted by Facebook in Mountain View, last year was Microsoft’s turn in Bellevue, and this year’s sprint was hosted by Bloomberg in London. (I have to say that Bloomberg’s office looks pretty cool.) Kudos to core dev Pablo Galindo Salgado for organizing! …

After making my PEG parser generator self-hosted in the last post, I’m now ready to show how to implement various other PEG features.

[This is part 8 of my PEG series. See the Series Overview for the rest.]

We’ll cover the following PEG features:

  • Named items: NAME=item (the name can be used in an action)
  • Lookaheads: &item (positive) and !item (negative)
  • Grouping items in parentheses: (item item ...)
  • Optional items: [item item ...] and item?
  • Repeated items: item* (zero or more) and item+ (one or more)
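To make the list above a bit more concrete, here is a minimal sketch of how lookaheads, optional items and repeated items could be expressed as helper methods on top of a mark() / reset() / expect() style parser. This is not the code from the series: the class name `MiniParser`, the helper names (`positive_lookahead`, `negative_lookahead`, `optional`, `loop`) and the idea that tokens are plain strings are all assumptions made for illustration. Named items and grouping are left out of the sketch.

```python
class MiniParser:
    """Hypothetical toy parser; tokens are plain strings (an assumption)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def mark(self):
        return self.pos

    def reset(self, pos):
        self.pos = pos

    def expect(self, tok):
        # Consume and return the next token if it equals tok, else None.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == tok:
            self.pos += 1
            return tok
        return None

    def positive_lookahead(self, func, *args):
        # &item: succeed if item matches, but never consume input.
        pos = self.mark()
        ok = func(*args) is not None
        self.reset(pos)
        return ok

    def negative_lookahead(self, func, *args):
        # !item: succeed if item does NOT match; never consume input.
        return not self.positive_lookahead(func, *args)

    def optional(self, func, *args):
        # item?: always succeeds. The result is wrapped in a list so that
        # "matched nothing" ([]) is distinguishable from failure (None).
        pos = self.mark()
        result = func(*args)
        if result is None:
            self.reset(pos)
            return []
        return [result]

    def loop(self, nonempty, func, *args):
        # item* (nonempty=False) or item+ (nonempty=True).
        pos = self.mark()
        nodes = []
        while (node := func(*args)) is not None:
            nodes.append(node)
        if nonempty and not nodes:
            self.reset(pos)
            return None
        return nodes


p = MiniParser(["a", "a", "b"])
saw_a = p.positive_lookahead(p.expect, "a")   # peeks, position unchanged
many_a = p.loop(True, p.expect, "a")          # item+ over "a"
maybe_x = p.optional(p.expect, "x")           # item? that matches nothing
```

The design choice worth noting is that every helper restores the position on failure, so a caller trying the next alternative always starts from a known place.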

Let’s start with named items. These are handy when we have multiple items in one alternative that refer to the same rule, like…

This week we make the parser generator “self-hosted”, meaning the parser generator generates its own parser.

[This is part 7 of my PEG series. See the Series Overview for the rest.]

So we have a parser generator, a piece of which is a parser for grammars. We could call this a meta-parser. The meta-parser works similarly to the generated parsers: GrammarParser inherits from Parser, and it uses the same mark() / reset() / expect() machinery. However, it is hand-written. But does it have to be?
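As a reminder of what that machinery looks like, here is a heavily simplified sketch, not the actual classes from the earlier parts. The `Parser` base class below works on a list of plain string tokens (an assumption; the real one sits on top of a tokenizer), and `ToyGrammarParser` with its `name()` and `rule()` methods is a hypothetical stand-in that only understands rules shaped like `NAME ':' NAME+`.

```python
class Parser:
    """Toy base class: tokens are plain strings (an assumption for brevity)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def mark(self):
        # Remember the current position so we can backtrack to it.
        return self.pos

    def reset(self, pos):
        # Backtrack to a previously marked position.
        self.pos = pos

    def expect(self, arg):
        # Consume and return the next token if it equals arg, else None.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == arg:
            self.pos += 1
            return arg
        return None


class ToyGrammarParser(Parser):
    """Hypothetical hand-written meta-parser for: NAME ':' NAME+"""

    def name(self):
        # Accept any identifier-like token.
        if self.pos < len(self.tokens) and self.tokens[self.pos].isidentifier():
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    def rule(self):
        pos = self.mark()
        if (n := self.name()) and self.expect(":"):
            items = []
            while (item := self.name()) is not None:
                items.append(item)
            if items:
                return (n, items)
        # Nothing matched: undo any partial progress.
        self.reset(pos)
        return None


parsed = ToyGrammarParser(["expr", ":", "term", "PLUS", "term"]).rule()
```

Every method of this shape follows the same pattern (mark, try to match, reset on failure), which is exactly the kind of repetitive code a generator can produce.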

It’s a tradition in compiler design to have the compiler written in the language that it compiles. I fondly remember that the Pascal compiler I used when I first learned to program was written in Pascal itself; GCC is written in C, and the Rust compiler is of course written in Rust. …
