Extending the Python Syntax
How to add sugar to the Python syntax with main wrappers
I love Python. It’s an extremely powerful language that’s very simple to write and understand. It also has an extensive ecosystem of libraries that allow you to do pretty much anything you can think of (even fly). Recently, it has become the lingua franca of Data Science and Machine Learning. I use it constantly in my day job, so I’d say I’m fairly proficient at it (I do have Python readability at Google after all 😏¹).
However, Python is not perfect. I’ve recently started learning Lisp in my free time.² While learning, I stumbled across what I think is a very cool (but minor) feature of Lisp. When defining functions, Lisp allows you to use previous arguments as default values.
For example, suppose you’re writing a make_rectangle function that takes as input the x, y coordinates of the top-left corner, the width, and the height of a rectangle, and returns the x, y coordinates of its 4 corners. We can do this pretty easily in Python:
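Here is a minimal sketch of such a function. It assumes screen-style coordinates where y grows downward, so the bottom edge sits at y + height. That matches the output printed at the end of this article, but the corner order is my own choice.

```python
def make_rectangle(x, y, width, height):
    """Return the 4 corners of a rectangle whose top-left corner is (x, y).

    Assumes screen coordinates (y grows downward). Corners are returned as
    top-left, top-right, bottom-right, bottom-left.
    """
    return ((x, y), (x + width, y), (x + width, y + height), (x, y + height))


print(make_rectangle(0, 0, 100, 200))  # ((0, 0), (100, 0), (100, 200), (0, 200))
```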
Now suppose that we want the function to return squares if it’s only passed the width argument. If you’ve never programmed before, you might expect that a reasonable thing to do is to set height=width, like so:
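Unfortunately, Python doesn’t support this. Default values are evaluated once, when the def statement itself runs, and at that point width doesn’t exist yet. If you try to run it, you’ll get an error: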
NameError: name 'width' is not defined.
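A small experiment (my own, not part of the original code) makes the cause visible. The parser accepts the "improved" definition just fine; the error only appears when the def statement executes and Python evaluates the default in the enclosing scope. If a global width happens to exist, the default quietly captures that value instead of following the argument:

```python
# The "improved" definition, kept in a string so we can run it in isolation.
improved_source = """
def make_rectangle(x, y, width, height=width):
    return ((x, y), (x + width, y), (x + width, y + height), (x, y + height))
"""

# The parser has no complaints: compiling succeeds.
code = compile(improved_source, "improved_make_rectangle.py", "exec")

# Executing the def evaluates the default `width` immediately, in the
# enclosing (module) scope, where no `width` exists yet.
try:
    exec(code, {})
except NameError as exc:
    error_message = str(exc)
print(error_message)  # name 'width' is not defined

# If a global `width` happens to exist, the default silently captures its
# value once, at definition time, instead of following the argument.
namespace = {"width": 7}
exec(code, namespace)
print(namespace["make_rectangle"](0, 0, 100))  # ((0, 0), (100, 0), (100, 7), (0, 7))
```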
Personally, I feel that this is a failing of the language. Sure, it’s pretty easy to work around the limitation and get similar behavior:
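The usual workaround, sketched below, is to default the argument to a sentinel such as None and fill it in at the top of the body. I used an explicit is None check here; the one-liner height = height or width is shorter but would also replace a deliberate height=0 with width.

```python
def make_rectangle(x, y, width, height=None):
    # None is a sentinel meaning "no height given": build a square instead.
    if height is None:
        height = width
    return ((x, y), (x + width, y), (x + width, y + height), (x, y + height))


print(make_rectangle(0, 0, 100))  # ((0, 0), (100, 0), (100, 100), (0, 100))
```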
But that pisses me off! The language should do what I want, not the other way around. Instead of bending to the will of Python, I’ve created a __main__ wrapper that lets me write the code I want.
As always, full working code is available on my GitHub.
A __main__ wrapper is a Python program that wraps another Python program’s __main__ module, i.e. the code that runs when you execute that program as a script. The wrapper can run whatever code it wants before calling through to the main program. It turns out that a lot of tools are written this way, such as pdb and profile. If you’re interested in all the details, David Beazley has an interesting talk on the topic [1].
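To make the idea concrete, here is a minimal sketch of a wrapper built on the standard library’s runpy.run_path. This is my own illustration, not necessarily how pdb or profile are implemented, and the wrap function name is hypothetical. It does a bit of work of its own (here, just a message on stderr) and then runs the target file as __main__:

```python
import runpy
import sys


def wrap(path, args):
    """Run the program at `path` as __main__, after doing some work of our own."""
    # Anything can happen here before the wrapped program starts.
    print(f"[wrapper] running {path}", file=sys.stderr)
    # Make the wrapped program see the command line it expects.
    sys.argv = [path, *args]
    # run_name="__main__" makes the program's `if __name__ == "__main__":` blocks fire.
    return runpy.run_path(path, run_name="__main__")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python3 wrapper.py PROGRAM [ARGS...]", file=sys.stderr)
    else:
        wrap(sys.argv[1], sys.argv[2:])
```

Note that run_path reads and runs the file as-is, so it gives us no chance to rewrite the source first. That is why a syntax-extending wrapper has to read and compile the program’s code itself.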
Since the wrapper runs before the main program, it can modify the main program’s code before executing it. This means that we can use the __main__ wrapper to transform code like improved_make_rectangle.py into valid Python like valid_make_rectangle.py. This way, we can write in the more pleasant syntax while the Python interpreter still executes it correctly.
Discourse On Method
(A quick tangent on other possible approaches; feel free to skip.)
I thought a lot about how to add this functionality without the wrapper, perhaps through function decorators, context managers, etc. However, I came to realize that it’s likely impossible, since improved_make_rectangle.py raises an error as soon as its def statement executes, before a decorator ever gets to see the function.
Another avenue I thought about exploring was to modify the Python interpreter itself. I didn’t go with this approach for primarily two reasons:
- It sounds a lot more tedious to pull off since you’d have to modify the underlying C code.
- It’s less portable because it involves forking the Python interpreter. Anyone who wants to use the new syntax can only do so via the forked interpreter. Conversely, the __main__ wrapper is just regular Python.
If you know any better ways to implement this functionality, please let me know in the comments. I’m very interested in finding out!
Extending the Python Syntax
The first thing we need to do is create a wrapper that is able to run other programs. As demonstrated in Beazley’s talk [1], this is fairly simple:
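Here is a minimal sketch of such a wrapper, assuming it is saved as wrap_main.py (the module name used in the terminal session at the end of this article). The run_main_program name and the exact environment setup are my reconstruction, and preprocess is only a placeholder for now, so the real code on GitHub may differ:

```python
import os
import sys
import types


def preprocess(source):
    """Rewrite `source` into valid Python. Placeholder: returns it unchanged."""
    return source


def run_main_program(path, args):
    """Run the program at `path` as __main__, after rewriting its source."""
    with open(path) as f:
        source = f.read()
    # Rewrite the code, then compile it under its real filename so that
    # tracebacks still point at the original file.
    code = compile(preprocess(source), path, "exec")

    # Set up the environment the program would see if it were run directly.
    main_module = types.ModuleType("__main__")
    main_module.__file__ = path
    sys.modules["__main__"] = main_module
    sys.argv = [path, *args]
    sys.path.insert(0, os.path.dirname(os.path.abspath(path)))

    exec(code, main_module.__dict__)
    return main_module


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python3 -m wrap_main PROGRAM [ARGS...]", file=sys.stderr)
    else:
        run_main_program(sys.argv[1], sys.argv[2:])
```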
Most of the code just makes sure the Python environment is set up correctly before we execute the main program. The interesting part is where we read the main program’s code, preprocess (i.e., rewrite) it into valid Python, and compile it for execution. We can now wrap any Python program by running it like this:
python3 -m wrap_main some_random_program.py
Of course, the preprocess function is up to us to write. What we want to do is go through every line of code in the main program and find all function headers, i.e. all lines that start with def (ignoring any leading whitespace).³ Then, for each header we find, we want to see if any of its arguments reference previously defined ones. If so, we want to transform them into the valid_make_rectangle.py equivalent by setting their defaults to None and, for each one, adding argument = argument or prev_argument to the top of the function body. Here’s how this looks in code:
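A minimal sketch of preprocess, under a few assumptions of my own on top of the ones discussed here: every header fits on one line, has no type annotations, and isn’t immediately followed by a docstring (the inserted assignment would displace it). The body of _build_arg_to_prev_arg is my own compressed reconstruction, included only so the block runs on its own:

```python
import re

INDENT = " " * 4  # assumption: 4 spaces per indentation level


def _build_arg_to_prev_arg(header):
    """Map each argument whose default names an earlier argument to that argument."""
    params = header.strip()
    params = params[params.index("(") + 1 : params.rindex(")")]  # text between the parentheses
    params = "".join(params.split())  # drop all whitespace
    seen, arg_to_prev_arg = [], {}
    for param in params.split(","):
        name, _, default = param.partition("=")
        if default and default in seen:
            arg_to_prev_arg[name] = default
        seen.append(name)
    return arg_to_prev_arg


def preprocess(source):
    """Rewrite `def f(a, b=a):` headers into valid Python (one-line headers only)."""
    out = []
    for line in source.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith("def "):
            out.append(line)
            continue
        indent = line[: len(line) - len(stripped)]  # computed before we edit `line`
        arg_to_prev_arg = _build_arg_to_prev_arg(stripped)
        # Replace each offending default with None...
        for arg, prev in arg_to_prev_arg.items():
            line = re.sub(rf"\b{arg}\s*=\s*{prev}\b", f"{arg}=None", line)
        out.append(line)
        # ...and fill it in at the top of the function body, in argument order.
        for arg, prev in arg_to_prev_arg.items():
            out.append(f"{indent}{INDENT}{arg} = {arg} or {prev}")
    return "\n".join(out) + "\n"
```

Run on the improved make_rectangle, this produces def make_rectangle(x, y, width, height=None): followed by height = height or width as the first line of the body. Because of the or, an explicit height=0 also falls back to width; generating an is None check instead would avoid that.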
We’re making a lot of implicit assumptions that would need to be ironed out in a more robust implementation. For example, we’re assuming that the main program’s code uses 4 spaces for each indentation level and that it follows the PEP 8 style guide (otherwise, parsing a header would fail if a keyword argument had whitespace around the =, such as some_arg = 12). While we could relax these assumptions with some effort, what we have here is a good first step.
Finally, you may have noticed the cryptic call to the _build_arg_to_prev_arg function inside preprocess. This function is responsible for parsing a function header and extracting all arguments that reference previously defined arguments. To do this, it first removes everything that’s not a function argument, such as whitespace, parentheses, the def keyword, etc. Then, for each argument, it checks whether it references any previously defined argument and, if so, adds it to a dictionary. Finally, the dictionary is returned. This translates into the following code:
We now have all the components in place to make our wrapper work. Here is the improved_make_rectangle.py code again, saved as wrapper_test.py, with a simple __main__ block to sanity check the implementation for both rectangles and squares:
As before, running it without the wrapper raises a NameError:
$ python3 wrapper_test.py
Traceback (most recent call last):
File "wrapper_test.py", line 1, in <module>
def make_rectangle(x, y, width, height=width):
NameError: name 'width' is not defined
However, when we run with the wrapper, the messages correctly print out in the terminal:
$ python3 -m wrap_main wrapper_test.py
Rectangle: ((0, 0), (100, 0), (100, 200), (0, 200))
Square: ((0, 0), (100, 0), (100, 100), (0, 100))
In this article, we added some extra syntactic sugar to Python by writing a __main__ wrapper. As it currently stands, the wrapper is pretty brittle and would fail spectacularly if you tried to use it in real-world situations. Also, the amount of code we had to write to get rid of a few extra lines is pretty large. So, all things considered, the wrapper is probably not worth it. However, __main__ wrappers themselves are a general meta-programming technique that allows you to extend Python to do very cool and interesting things. If nothing else, they’re fun to write 🙂.
Thanks for reading!
July 15, 2019
P.S. For the sake of brevity, I’ve skipped over some minor things in this article. If you want the full details, please check out the full, working code on my GitHub.
¹ Of all things to brag about, having Python readability at Google is probably the dumbest. In case it wasn’t obvious, I’m only kidding.
³ It’s worth pointing out that finding function definitions this way is very brittle. While this is OK for quick prototyping, something more robust would need to follow the function definition grammar in the Python language reference.
[1] D. Beazley, “Modules and Packages: Live and Let Die!”, PyCon 2015.