Hacking Print As A Statement Into Python 3 (Or How I Spent Way Too Much Effort On An April Fool’s Joke)

Alexander Cougar Lourenco
Apr 5, 2018


Recently (not so recently) we began overhauling all of our systems to be Python 3 compatible. The effort was spurred along by a colleague who was a devoted Python 3 aficionado, pushing us all into what we knew we should have been doing in the first place: migrating to the most recent version.

I won’t dwell on it, as there are lots of other wonderful articles on Python migrations, except to say that the migration went extremely well, the only casualty being my muscle memory as I retrained my fingers to type print() instead of print. This was a minor frustration, but a frustration nonetheless, and it led to the idea for the prank!

Could you hack print as a statement back into Python 3?

I discussed this with another colleague, and a plan was hatched: we would build a custom version of Python 3, hopefully in time for April Fool’s, and pitch a fake ICO in which we would casually note that the coin wallet, while written in Python 3, was in fact using print as a statement.

Onwards!

I didn’t (and still don’t, really) know much about Python’s C guts. I did know, however, that to add print as a statement back I’d have to change the CPython interpreter’s grammar, so that’s where I started. Thankfully, the Python Developer’s Guide has a section on how to do exactly this, so I figured I would just run through it. Being the lazy programmer that I am, I figured that the easiest path to success would be to diff the files it mentions between Python 2 and Python 3, re-insert anything that made mention of print as a statement, and hope that worked. (Spoiler: it ended up kind of working!)

Let’s see how that went!

Step 1: Grammar/Grammar

This part was pretty easy: the grammar is just a collection of rules describing the form each kind of statement takes, so the parser knows how to recognize each one. In this case, it was just a matter of copying print_stmt back in from the Python 2 branch, without having to figure out whatever DSL this file is written in.

(Sidebar: I had a bit of trouble finding out what language this file was in fact written in, so I’ll leave it here for posterity: it’s a variant of Extended Backus-Naur Form (EBNF), a notation that can express any context-free grammar!)
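For reference, the rule that comes back from Python 2’s Grammar/Grammar is print_stmt: 'print' ( [ test (',' test)* [','] ] | '>>' test [ (',' test)+ [','] ] ). To get a feel for what that EBNF actually says, here is a toy recognizer for it in C. It is only a sketch: tokens are assumed to be pre-split strings, any token other than print, >> and , stands in for a whole test expression, and the helper names are made up rather than taken from CPython.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for one `test` expression: any token that is not
   one of the keyword/punctuation tokens used by print_stmt. */
static bool is_test(const char *tok) {
    return strcmp(tok, "print") != 0 && strcmp(tok, ">>") != 0 &&
           strcmp(tok, ",") != 0;
}

/* True if toks[0..n) matches
   'print' ( [ test (',' test)* [','] ] | '>>' test [ (',' test)+ [','] ] ) */
bool matches_print_stmt(const char *toks[], int n) {
    int i = 0;
    assert(n >= 0);
    if (n == 0 || strcmp(toks[i++], "print") != 0)
        return false;
    if (i == n)                              /* bare `print` */
        return true;
    if (strcmp(toks[i], ">>") == 0) {        /* '>>' test [ (',' test)+ [','] ] */
        i++;
        if (i == n || !is_test(toks[i++]))
            return false;
        int pairs = 0;                       /* count (',' test) pairs */
        while (i + 1 < n && strcmp(toks[i], ",") == 0 && is_test(toks[i + 1])) {
            i += 2;
            pairs++;
        }
        /* a trailing ',' is only allowed after at least one pair */
        if (pairs > 0 && i < n && strcmp(toks[i], ",") == 0)
            i++;
        return i == n;
    }
    /* test (',' test)* [','] */
    if (!is_test(toks[i++]))
        return false;
    while (i + 1 < n && strcmp(toks[i], ",") == 0 && is_test(toks[i + 1]))
        i += 2;
    if (i < n && strcmp(toks[i], ",") == 0)  /* optional trailing ',' */
        i++;
    return i == n;
}
```

With this, print a, b, and print >>f, x are accepted, while print >>f, (a dest with nothing after the comma) is rejected, which is what the grammar rule says.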

Sorted! Next.

Step 2: Parser/Python.asdl

The ASDL changes went fine: I just copied and pasted them back from Python 2 to Python 3. The tl;dr on the ASDL file is that it describes the interpreter’s AST (abstract syntax tree): it defines the node types that the grammar we made above gets turned into (I don’t think “turned into” is quite the right phrase here, but eh), along with the fields each node carries, which get filled in with the values parsed out of the statement by the code we’ll write in step 3. The only real change was swapping out the bool from the Python 2 definition; an int will suffice for our purposes here:

| Print(expr? dest, expr* values, int nl)

Roughly, that means a “Print” node takes an optional expression dest (the ? suffix), a sequence of zero or more expressions values (the * suffix), and an int nl.
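Here is roughly the shape of what that one ASDL line becomes on the C side once make regen-ast has run: a Print variant inside the statement node, plus a constructor that fills it in. This is a simplified sketch, not the generated code: expr_ty and asdl_seq are hypothetical stand-ins for CPython’s types, and the real generated constructor also takes source-position arguments and a memory arena.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for CPython's expr_ty and asdl_seq. */
typedef struct expr { const char *text; } *expr_ty;
typedef struct { int size; expr_ty *elements; } asdl_seq;

typedef enum { Print_kind /* ...plus every other statement kind */ } stmt_kind;

/* One statement node: a kind tag plus a union with one struct per kind. */
typedef struct stmt {
    stmt_kind kind;
    union {
        struct {
            expr_ty dest;      /* expr? : NULL when there is no ">>" */
            asdl_seq *values;  /* expr* : zero or more expressions */
            int nl;            /* int   : 1 = end the line with a newline */
        } Print;
    } v;
} *stmt_ty;

/* Simplified analogue of the generated Print() constructor. */
stmt_ty Print(expr_ty dest, asdl_seq *values, int nl) {
    assert(values != NULL);
    stmt_ty s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->kind = Print_kind;
    s->v.Print.dest = dest;
    s->v.Print.values = values;
    s->v.Print.nl = nl;
    return s;
}
```

Field access then looks like s->v.Print.dest, which is the same shape the compiler_print code later in this post relies on.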

But what actually builds a Print node? The AST code that we’ll write in step 3!

Step 3: Python/ast.c

Now that we have our context-free grammar and the ASDL that wrangles it into some sort of callable, we need to have a function that handles it!

In case you’re wondering, as I did, why you never have to define a Print function anywhere: running make regen-ast after the previous step takes care of it for you, generating the constructor along with the proper routing and bootstrapping in Python-ast.c and Python-ast.h.

This file is a hulking mass of statement handlers: this is where we define what we actually do with the dest, values, and nl that we declared in the previous step. This is all the logic that says “if the token after print is >>, the next thing is a dest; otherwise everything is a value”, etc. etc.
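As a rough sketch of that splitting logic (not the actual ast.c code, which walks parse-tree nodes rather than strings), here is a C function that takes an already-validated print statement as a list of toy tokens and pulls out dest, values and nl. Python 2’s version follows the same pattern: check whether the second child is >>, start reading values after the dest if so, and set nl based on whether the last child is a comma. The print_info struct and the MAX_VALUES limit are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_VALUES 16

/* Hypothetical flattened result; the real ast.c builds expr nodes instead. */
typedef struct {
    const char *dest;                 /* NULL when there is no ">>" */
    const char *values[MAX_VALUES];
    int n_values;
    int nl;                           /* 0 when the statement ends in ',' */
} print_info;

/* Assumes toks[0..n) has already been checked against print_stmt. */
void split_print_stmt(const char *toks[], int n, print_info *out) {
    int start = 1;                    /* first value follows 'print' */
    assert(n >= 1 && strcmp(toks[0], "print") == 0);
    out->dest = NULL;
    out->n_values = 0;
    if (n >= 3 && strcmp(toks[1], ">>") == 0) {
        out->dest = toks[2];          /* print >>dest, ... */
        start = 4;                    /* skip 'print', '>>', dest, ',' */
    }
    for (int i = start; i < n; i += 2) {  /* values sit at every other token */
        assert(out->n_values < MAX_VALUES);
        out->values[out->n_values++] = toks[i];
    }
    out->nl = strcmp(toks[n - 1], ",") != 0;  /* trailing ',' suppresses newline */
}
```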

I’m definitely not going to think too hard about all this, so I copy/paste the print stuff from Python 2’s ast.c into here, and apart from a few bits you have to tinker with (like renamed calls and the fact that nl is no longer a bool), it works out okay.

Success! Let’s move on to Step 4.

Step 4: make regen-grammar

Nailed it! (I actually forgot to do this and struggled for a bit, which is the only reason I’m mentioning it here: it’s just make regen-grammar).

Step 5: Python/symtable.c

This file controls… well, to be honest, I’m not extraordinarily confident that I totally get this file. From what I can understand of the little bit I’ve read, symbol tables are generally used for type-checking and lexical scope: the symbol table is built by walking the AST recursively (I think?), and that way the compiler knows which variables are bound to which scope, what we can return, what we can assign (for example, knowing that you can’t declare a function’s parameter global inside that same function), etc.

I built the (probably misunderstood) summary above from an excellent writeup that I did not have the time to read thoroughly.

Fortunately, we don’t need to grok this too well to continue in our journey: just graft in the Python 2 print stuff, and continue onwards!
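To make the parameter-and-global example a bit more concrete, here is a toy, single-scope symbol table in C. The flag names echo the ones in CPython’s symtable.h, but the values, types and helper functions are invented for this sketch; real CPython reports this case as a SyntaxError (“name 'x' is parameter and global”). As far as I can tell, the print statement itself barely touches any of this: Python 2’s symtable.c just visits dest and each of the values, recording the names they use.

```c
#include <assert.h>
#include <string.h>

/* Flag names borrowed from CPython's symtable.h; values are made up. */
#define DEF_PARAM  (1 << 0)  /* bound as a function parameter */
#define DEF_GLOBAL (1 << 1)  /* declared with a `global` statement */
#define USE        (1 << 2)  /* read somewhere in the block */
#define MAX_NAMES  32

typedef struct { const char *name; int flags; } symbol;
typedef struct { symbol syms[MAX_NAMES]; int n; } scope;  /* one function's scope */

static symbol *lookup(scope *sc, const char *name) {
    for (int i = 0; i < sc->n; i++)
        if (strcmp(sc->syms[i].name, name) == 0)
            return &sc->syms[i];
    return NULL;
}

/* Records `flag` for `name` in this scope. Returns 0 on success, or -1 when
   a parameter is declared global (the "parameter and global" error). */
int add_flag(scope *sc, const char *name, int flag) {
    symbol *s = lookup(sc, name);
    if (s == NULL) {                      /* first time we see this name */
        assert(sc->n < MAX_NAMES);
        s = &sc->syms[sc->n++];
        s->name = name;
        s->flags = 0;
    }
    if (flag == DEF_GLOBAL && (s->flags & DEF_PARAM))
        return -1;
    s->flags |= flag;
    return 0;
}
```

For def f(x): global x, the parameter x is recorded first, so the later global declaration for it is refused, while declaring some other name global works fine.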

Step 6: Python/compile.c

This file is where the compiler turns the AST into bytecode (the stuff that gets cached in .pyc files) for the interpreter to run. Bytecode is just a sequence of opcodes (LOAD_FAST, ROT_TWO, etc.) that push things onto and pop things off a stack. We’ll just do the copy + paste tango again; hopefully, we don’t need to know anything about what that mea-

Well, shoot.
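For comparison, here is the opcode sequence that Python 2’s compiler_print emits, modelled in C. It is a sketch: the opcode names are the real Python 2 ones, but LOAD is a made-up placeholder for whatever VISIT would emit for an expression, and code_buf and emit_print are hypothetical. The interesting part is the dest case: the file object is pushed once, duplicated for every item, and ROT_TWO puts that copy on top of the value so PRINT_ITEM_TO can pop both; the final PRINT_NEWLINE_TO (or a POP_TOP, for a trailing comma) disposes of the last copy. These print opcodes were removed in Python 3.

```c
#include <assert.h>

/* Opcode names from Python 2's compiler; LOAD is a placeholder for
   whatever VISIT(c, expr, e) would emit for an arbitrary expression. */
typedef enum {
    LOAD, DUP_TOP, ROT_TWO, POP_TOP,
    PRINT_ITEM, PRINT_ITEM_TO, PRINT_NEWLINE, PRINT_NEWLINE_TO
} opcode;

typedef struct { opcode ops[64]; int n; } code_buf;

static void addop(code_buf *c, opcode op) {
    assert(c->n < 64);
    c->ops[c->n++] = op;
}

/* Mirrors the shape of Python 2.7's compiler_print. */
void emit_print(code_buf *c, int has_dest, int n_values, int nl) {
    if (has_dest)
        addop(c, LOAD);                 /* push the file object */
    for (int i = 0; i < n_values; i++) {
        if (has_dest) {
            addop(c, DUP_TOP);          /* stack: ..., file, file */
            addop(c, LOAD);             /* stack: ..., file, file, value */
            addop(c, ROT_TWO);          /* stack: ..., file, value, file */
            addop(c, PRINT_ITEM_TO);    /* pops file, then value */
        } else {
            addop(c, LOAD);
            addop(c, PRINT_ITEM);       /* pops value, writes to sys.stdout */
        }
    }
    if (nl)
        addop(c, has_dest ? PRINT_NEWLINE_TO : PRINT_NEWLINE);
    else if (has_dest)
        addop(c, POP_TOP);              /* discard the leftover file */
}
```

So print >>f, x, compiles to LOAD, DUP_TOP, LOAD, ROT_TWO, PRINT_ITEM_TO, POP_TOP, and print a, b ends with PRINT_NEWLINE.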

Step 7: Python/ceval.c

The practical side of me, at this point, wants to admit defeat: just slap a fake version header on a compiled 2.7 interpreter and call it a day. But the part of me that tilts at windmills is so close!

I forge onwards, even though now I am off the beaten track of the developer guide. The error comes out of ceval.c, so I hop in there to see if it’s an easy fix.

Two things are obvious to me right away: I have no idea what’s going on, and the 2.7 ceval.c is significantly different from the 3.5 ceval.c.

An example, from 2.7:

    TARGET_NOARG(INPLACE_MODULO)
    {
        w = POP();
        v = TOP();
        x = PyNumber_InPlaceRemainder(v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x != NULL) DISPATCH();
        break;
    }

And the same opcode in 3.5:

    TARGET(INPLACE_MODULO) {
        PyObject *right = POP();
        PyObject *left = TOP();
        PyObject *mod = PyNumber_InPlaceRemainder(left, right);
        Py_DECREF(left);
        Py_DECREF(right);
        SET_TOP(mod);
        if (mod == NULL)
            goto error;
        DISPATCH();
    }

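In 2.7, it appears that w, v, and x are scratch variables declared once at the top of PyEval_EvalFrameEx and reused by every opcode, and it’s damned difficult to figure out what the heck they’re doing or assigned to at any given time. Also, due to this, the 2.7 TARGET blocks can “fall” through to the next one while keeping the values assigned to those shared variables (PRINT_ITEM_TO, for instance, falls straight through into PRINT_ITEM), while the 3.5 blocks have no such luxury: each one declares its own locals and jumps to error explicitly when something fails.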
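To illustrate the 3.5 style, here is a toy stack machine in C: every case declares its own locals, nothing leaks from one opcode to the next, and failures jump to a single error label. Everything here (op_t, instr, run, the integer-only stack) is made up for illustration and far simpler than CPython’s TARGET/DISPATCH machinery. Note also that C’s % differs from Python’s for negative numbers, so OP_MODULO is not a faithful INPLACE_MODULO.

```c
#include <assert.h>

typedef enum { OP_PUSH, OP_MODULO, OP_RETURN } op_t;
typedef struct { op_t op; int arg; } instr;

/* Runs `code`; returns 0 and stores the result in *out, or -1 on error. */
int run(const instr *code, int *out) {
    int stack[16];
    int sp = 0;                           /* number of items on the stack */
    for (const instr *pc = code;; pc++) {
        switch (pc->op) {
        case OP_PUSH: {
            assert(sp < 16);
            stack[sp++] = pc->arg;
            break;                        /* like DISPATCH(): next instruction */
        }
        case OP_MODULO: {
            assert(sp >= 2);
            int right = stack[--sp];      /* like POP() */
            int left = stack[sp - 1];     /* like TOP() */
            if (right == 0)
                goto error;               /* explicit jump, like 3.5's goto error */
            stack[sp - 1] = left % right; /* like SET_TOP() */
            break;
        }
        case OP_RETURN:
            assert(sp >= 1);
            *out = stack[--sp];
            return 0;
        }
    }
error:
    return -1;
}
```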

I copy and paste the 2.7 opcodes in there to see if they work, and they explode all over the place on me: after some tinkering (turning those globals into proper local variables, replacing old functions with their 3.5 renamed equivalents) I’m getting close, but there’s one function that’s stymieing me: PyFile_SoftSpace. The rest of the functions appear to have 3.5 equivalents, but this one does not.

int PyFile_SoftSpace(PyObject *p, int newflag)
This function exists for internal use by the interpreter. Set the softspace attribute of p to newflag and return the previous value. p does not have to be a file object for this function to work properly; any object is supported (thought its only interesting if the softspace attribute can be set). This function clears any errors, and will return 0 as the previous value if the attribute either does not exist or if there were errors in retrieving it. There is no way to detect errors from this function, but doing so should not be needed.

To be honest, I’ve googled around pretty extensively and I’m still not 100% sure I understand what the heck this function is for. I’ve found a few mentions of softspace in the 3.0 changelog:

The print() function doesn’t support the “softspace” feature of the old print statement. For example, in Python 2.x, print “A\n”, “B” would write “A\nB\n”; but in Python 3.0, print(“A\n”, “B”) writes “A\n B\n”.

but I don’t really understand why that was ever a thing, or why the C interpreter needed a whole rat’s nest of logic that I didn’t understand in order to keep track of it. The Python reference on readthedocs has this to say about it:

Remarks
Classes that are trying to simulate a file object should also have a writable softspace attribute, which should be initialized to zero. This will be automatic for most classes implemented in Python (care may be needed for objects that override attribute access); types implemented in C will have to provide a writable softspace attribute.
Note
This attribute is not used to control the print statement, but to allow the implementation of print to keep track of its internal state.
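For what it’s worth, here is my best reconstruction of what softspace was doing, as a toy C model of the string case of Python 2.7’s PRINT_ITEM and PRINT_NEWLINE. The idea: after printing an item, the file “owes” a separating space, which only gets written if another item follows. The one exception is an item ending in whitespace other than a plain space (such as "\n"), which is why print "A\n", "B" produces "A\nB\n" with no stray space. The toy_file type and helper names are invented; the real code calls PyFile_SoftSpace on an actual file object and also handles unicode and non-string items.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy output "file" carrying Python 2's softspace flag. */
typedef struct {
    char buf[256];
    int softspace;
} toy_file;

static void write_str(toy_file *f, const char *s) {
    assert(strlen(f->buf) + strlen(s) < sizeof f->buf);
    strcat(f->buf, s);
}

/* Roughly what PRINT_ITEM did for a string item. */
void print_item(toy_file *f, const char *s) {
    size_t len = strlen(s);
    if (f->softspace)
        write_str(f, " ");            /* separator owed by the previous item */
    f->softspace = 0;
    write_str(f, s);
    /* Owe a space next time, unless the item ended in whitespace other than ' '. */
    if (len == 0 || !isspace((unsigned char)s[len - 1]) || s[len - 1] == ' ')
        f->softspace = 1;
}

/* Roughly what PRINT_NEWLINE did. */
void print_newline(toy_file *f) {
    write_str(f, "\n");
    f->softspace = 0;
}
```

Feeding it "A\n", "B" and a newline yields "A\nB\n", matching the changelog example, while "A", "B" yields "A B\n".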

At any rate, stripping it out and trying to reconstruct the logic is not working out for me. Time’s running out at this point, and the April Fool’s deadline is nigh. I am close to admitting defeat, but then: inspiration!

What if I just avoided the file and dest logic entirely? What if I just used the PRINT_EXPR bytecode, which exists already? (The eagle-eyed among you might predict what happens next, given the images in the article!)

    static int
    compiler_print(struct compiler *c, stmt_ty s)
    {
        int i, n;
        assert(s->kind == Print_kind);
        n = asdl_seq_LEN(s->v.Print.values);
        if (s->v.Print.dest) {
            /* PRINT_EXPR can't write to a file, so "print >>f, ..." only
               evaluates f and then discards it, keeping the stack balanced. */
            VISIT(c, expr, s->v.Print.dest);
            ADDOP(c, POP_TOP);
        }
        for (i = 0; i < n; i++) {
            expr_ty e = (expr_ty)asdl_seq_GET(s->v.Print.values, i);
            VISIT(c, expr, e);      /* push the value */
            ADDOP(c, PRINT_EXPR);   /* pop it and pass it to sys.displayhook */
        }
        /* nl is ignored: sys.displayhook already ends each value with "\n". */
        return 1;
    }

What this is doing is piping each value to the print expression opcode (the one the interactive interpreter uses when you type an expression at the prompt and hit enter, with no print at all). Not ideal, and it means the print statement can’t print to a file, but at this point I’ll take it if it works, and it does!
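One side effect of this shortcut worth spelling out: PRINT_EXPR hands each value to sys.displayhook (the same hook the interactive prompt uses), which skips None entirely and writes repr(value) followed by a newline rather than str(value). So print "hello" comes out with its quotes, as 'hello'. Here is a toy C model of that behaviour for string values; toy_displayhook and its naive quoting are invented for illustration (the real repr escapes quotes and special characters, and displayhook also stores the value in builtins._).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of sys.displayhook for a str value. NULL stands in for None,
   which displays nothing; anything else is written as repr(value) + "\n". */
void toy_displayhook(const char *value, char *out, size_t out_size) {
    assert(out_size > 0);
    if (value == NULL) {
        out[0] = '\0';
        return;
    }
    /* two quotes + newline + terminating NUL = 4 extra bytes */
    assert(strlen(value) + 4 <= out_size);
    /* Naive repr: wrap in single quotes, no escaping at all. */
    snprintf(out, out_size, "'%s'\n", value);
}
```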

It builds without any problems, except that now a bunch of Python standard library modules that rely on print as a function are dying when the build tries to compile them (hoisted by my own petard!)

The solution to this, as it turns out, was easier than I thought (my first attempt was a ham-handed in-place sed regex that destroyed those files): I found 3to2, a converter that turns Python 3 code into Python 2 code, and ran it against all of the Python standard library modules (an act which felt deliciously transgressive).

I finally run the compiler, cross my fingers, and…

SUCCESS.

For good measure, I renamed the print builtin to threeprint in Python/bltinmodule.c, which keeps print around as a function (and adds a troll-y little easter egg).

The actual demo went really well!

It was a fun, light-hearted April Fool’s demo, and much humor was had (though as you can see I botched whatever it was I was trying to do, and I had to drop into an interpreter).

You’ll notice the flurry of sudo cp up top: as it turns out, it’s really difficult to install python3 packages with pip on this hacked together interpreter since so much of it assumes print is a function (and why wouldn’t it?). I got around this by pip installing to the global Python 3 interpreter on the VM and then copying those packages to the custom Python interpreter, making sure to 3to2 any files that barfed upon import.

Conclusion

While this started out as a fun prank, I actually learned a ton about Python’s internal workings, and it was really illuminating as a deep dive into how the CPython interpreter works (and also how well Python’s minimalism serves it in keeping its C implementation pretty clean).

If anyone is interested, the branch with my hacked-in print statement lives here: https://github.com/hbbtstar/cpython


Thanks for reading, and I would ask anyone who knows more about this than I do (which is probably everybody) to comment on where I got things wrong so I can correct them!
