This week I’m not showing any new code for the parser generator I’ve described it the previous parts. Instead, I’ll try to describe what I did at the Core Developer Sprint last week before it all evaporates from my memory. Most of this relates to in PEG one way or another. Then I’ll show some code anyway, because I like to talk about code, and it roughly shows the path I see to a PEG-based parser for CPython 3.9.

[This is part 9 of my PEG series. See the Series Overview for the rest.]

Every year for the past four…

After making my PEG parser generator self-hosted in the last post, I’m now ready to show how to implement various other PEG features.

[This is part 8 of my PEG series. See the Series Overview for the rest.]

We’ll cover the following PEG features:

  • Named items: NAME=item (the name can be used in an action)
  • Lookaheads: &item (positive) and !item (negative)
  • Grouping items in parentheses: (item item ...)
  • Optional items: [item item ...] and item?
  • Repeated items:item* (zero or more) and item+ (one or more)

Let’s start with named items. These are handy when we have multiple items in one…

This week we make the parser generator “self-hosted”, meaning the parser generator generates its own parser.

[This is part 7 of my PEG series. See the Series Overview for the rest.]

So we have a parser generator, a piece of which is a parser for grammars. We could call this a meta-parser. The meta-parser works similar to the generated parsers: GrammarParser inherits from Parser, and it uses the same mark() / reset() / expect() machinery. However, it is hand-written. But does it have to be?

It’s a tradition in compiler design to have the compiler written in the language that…

My series of blog posts about PEG parsing keeps expanding. Instead of updating each part to link to all other parts, here’s the table of content:

  1. PEG Parsers
  2. Building a PEG Parser
  3. Generating a PEG Parser
  4. Visualizing PEG Parsing
  5. Left-recursive PEG Grammars
  6. Adding Actions to a PEG Grammar
  7. A Meta-Grammar for PEG Parsers
  8. Implementing PEG Features
  9. PEG at the Core Developer Sprint

A video of a talk I gave about this topic at North Bay Python is up on YouTube: Writing a PEG parser for fun and profit

Image for post
Image for post

Update: April 2, 2020. In case you are wondering what’s happening, we now have PEP 617 up, which proposes to replace the current parser in CPython with a PEG-based parser.

License for this article, the series, and the code shown: CC BY-NC-SA 4.0

Grammars are better if you also can add (some) semantics inline with the grammar rules. In particular, for the Python parser I’m building I need to control the AST node returned by each alternative, since the AST format is given.

[This is part 6 of my PEG series. See the Series Overview for the rest.]

Many grammars use the convention of allowing actions to be added to rules, usually a block of code inside {curly braces}. More precisely, actions are attached to alternatives. The code in an action block is usually written in the language that the rest of the…

I’ve alluded to left-recursion as a stumbling block a few times, and it’s time to tackle it. The basic problem is that with a recursive descent parser, left-recursion causes instant death by stack overflow.

[This is part 5 of my PEG series. See the Series Overview for the rest.]

Consider this hypothetical grammar rule:

expr: expr '+' term | term

If we were to naively translate this grammar fragment into a fragment of a recursive descent parser we’d get something like the following:

def expr():
if expr() and expect('+') and term():
return True
if term():
return True
return False


Last week I showed a simple PEG parser generator. This week I’ll show what the generated parser actually does when it’s parsing a program. I took a dive into the retro world of ASCII art, in particular a library named “curses” which is available in the Python standard library for Linux and Mac, and as an add-on for Windows.

[This is part 4 of my PEG series. See the Series Overview for the rest.]

Let’s have a look at the visualization in progress. …

Now that I’ve sketched the infrastructure for a parser and a simple hand-written parser in part 2, let’s turn to generating a parser from a grammar, as promised. I’ll also show how to implement packrat parsing using a @memoize decorator.

[This is part 3 of my PEG series. See the Series Overview for the rest.]

Last time we ended with a hand-written parser. With some limitations to the grammar, it’s easy to generate such parsers automatically from the grammar. (We’ll lift those limitations later.)

We need two things: something that reads the grammar, constructing a data structure representing the grammar…

Inspired by only a partial understanding of PEG parsing I decided to build one. The result may not be a great general-purpose PEG parser generator — there are already many of those (e.g. TatSu is written in Python and generates Python code) — but it was a good way to learn about PEG, and it furthers my goal of replacing CPython’s parser with one built from a PEG grammar.

[This is part 2 of my PEG series. See the Series Overview for the rest.]

In this section I lay the groundwork for understanding how the generated parser works, by showing…

Some years ago someone asked whether it would make sense to switch Python to a PEG parser. (Or a PEG grammar; I don’t recall exactly what was said by whom, or when.) I looked into it a bit and wasn’t sure what to think, so I dropped the subject. Recently I’ve learned more about PEG (Parsing Expression Grammars), and I now think it’s an interesting alternative to the home-grown parser generator that I developed 30 years ago when I started working on Python. (That parser generator, dubbed “pgen”, was just about the first piece of code I wrote for Python.)

Guido van Rossum

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store