How Does the Python Interpreter Execute Your Code?
Introduction
Python is a beloved programming language in many fields. Many AI developers and researchers write Python code on top of the TensorFlow and PyTorch frameworks. Python is also widely used in data analytics, with a rich set of data visualization tools like matplotlib. And because Python makes it easy to bind to C/C++ code, programmers can write Python while using C++ frameworks like Qt or ROS.
But do you know how the Python interpreter executes your code under the hood? If you are curious about the implementation of the Python interpreter, here’s a series of articles explaining and discussing the implementation of CPython, an implementation of Python written in C.
Python was created by a Dutch programmer named Guido van Rossum in 1991. Its original implementation, written in C, is called CPython, and CPython is still the official release that you can download from the official website of the Python Software Foundation. There are many other Python implementations written in different programming languages, like Jython written in Java or PyPy written in Python itself. Here, we will explain the Python interpreter based on the official CPython implementation.
Build CPython from Source
In this series of articles, our explanation is based on CPython of version 3.19 and the Ubuntu environment. You can build CPython from source with the commands below.
The Beginning of the Python Interpreter
Everyone has their own way of understanding code written by someone else. Ours is to find the entry point where execution begins. In CPython, Programs/python.c is the entry point that runs first when you type python in the terminal. Programs/python.c is shown below, and it’s pretty simple, right? Here, the main function in Line 13 wraps the Py_BytesMain function in Line 15, which is defined in Modules/main.c, the top-level module of the Python interpreter.
If you follow the definition of the Py_BytesMain function, you will see that it in turn wraps the pymain_main function in Line 9, defined in the same file (Modules/main.c).
The pymain_main function first initializes the configurations by calling the pymain_init function in Line 4 and checks the returned status. If everything is fine, it calls the Py_RunMain function in Line 13 to actually execute the Python code that the programmer provides.
Let’s first look at the part that initializes the configurations, which is done by the pymain_init function. The CPython configurations consist of three parts, which are defined in Include/cpython/initconfig.h:
- PyPreConfig: preinitialization configurations
- PyConfig: runtime configurations
- Configurations that are used when compiling the Python interpreter
PyPreConfig configurations are related to the user environment or the operating system. One of the most important things that PyPreConfig does is set the Python memory allocator. PyConfig defines runtime configurations such as the execution mode, which specifies the source of the Python code (for example, a file or stdin).
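Some of these settings can be observed from Python itself through sys.flags. The following is only an illustrative sketch, not code from CPython: it starts a child interpreter with the -X utf8 and -I command-line options, whose values end up in the utf8_mode and isolated configuration fields, and reads the resulting flags back. The helper name read_flags is hypothetical.

```python
import json
import subprocess
import sys


def read_flags(*options):
    """Start a child interpreter with the given options and return some of its sys.flags."""
    # The probe runs inside the child and reports two flags as JSON.
    probe = (
        "import json, sys; "
        "print(json.dumps({'utf8_mode': sys.flags.utf8_mode, "
        "'isolated': sys.flags.isolated}))"
    )
    result = subprocess.run(
        [sys.executable, *options, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# -X utf8 enables UTF-8 mode; -I runs the interpreter in isolated mode.
flags = read_flags("-X", "utf8", "-I")
print(flags)
```

Both options are handled before any Python code runs, which is why the child process sees them already applied when the probe starts.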
Now, let’s look at the part that executes the code, which is done by the Py_RunMain function. Note that pymain_main calls Py_RunMain after initializing the various configurations. Py_RunMain executes the Python code by calling pymain_run_python in Line 6 and then finalizes the interpreter, releasing the resources it allocated.
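The control flow described so far (initialize, check the status, run, then finalize) can be modeled in a few lines of Python. This is a toy model under loud assumptions: the function names are borrowed from the C source for readability, but the bodies, the finalize helper, and the error condition are hypothetical stand-ins, and the real C code does not use try/finally.

```python
calls = []  # records the order in which the stand-in functions run


def pymain_init(args):
    """Stand-in for initialization; returns an error message, or None on success."""
    calls.append("pymain_init")
    if not args:
        return "no arguments given"  # hypothetical failure condition
    return None


def pymain_run_python(args):
    """Stand-in for the step that actually runs the user's code."""
    calls.append("pymain_run_python")
    return 0


def finalize():
    """Stand-in for releasing the interpreter's resources (hypothetical helper)."""
    calls.append("finalize")


def py_run_main(args):
    """Run the code, then release resources whether or not the run succeeded."""
    try:
        return pymain_run_python(args)
    finally:
        finalize()


def pymain_main(args):
    """Initialize first; only run if initialization succeeded."""
    error = pymain_init(args)
    if error is not None:
        calls.append("exit_with_error")
        return 1
    return py_run_main(args)


exit_code = pymain_main(["script.py"])
print(exit_code, calls)
```

The point of the model is the ordering: nothing runs before initialization succeeds, and finalization happens after the run step regardless of its exit code.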
pymain_run_python loads the initialized PyConfig configuration and figures out which execution mode the Python interpreter should run in. There are three different methods of feeding Python code to the interpreter: 1) file, 2) I/O stream, and 3) string. For example, if config->run_filename is set (that is, it is not NULL), the Python interpreter calls pymain_run_file in Line 62 with the PyConfig argument, executing the code written in the file. We will follow this execution mode, so let’s look at the definition of the pymain_run_file function.
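The three ways of feeding code to the interpreter can be tried from the outside. The sketch below is an illustration rather than anything taken from CPython: it runs the same one-line program as a file, through standard input, and as a string passed with -c. The helper run_interpreter and the file name answer.py are made up for this example.

```python
import os
import subprocess
import sys
import tempfile

SOURCE = "print(6 * 7)\n"


def run_interpreter(args, stdin_text=None):
    """Run a child interpreter with the given arguments and return its output."""
    result = subprocess.run(
        [sys.executable, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# 1) File: the path is passed on the command line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "answer.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SOURCE)
    from_file = run_interpreter([path])

# 2) I/O stream: the source is piped to standard input ("-" means read stdin).
from_stdin = run_interpreter(["-"], stdin_text=SOURCE)

# 3) String: the source is passed with the -c option.
from_string = run_interpreter(["-c", SOURCE])

print(from_file, from_stdin, from_string)
```

All three runs execute the same program; only the way the source reaches the interpreter differs, which is what the execution mode in PyConfig records.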
The pymain_run_file function is a wrapper around the pymain_run_file_obj function in Line 16, which in turn calls _PyRun_AnyFileObject.
_PyRun_AnyFileObject handles two different modes: interactive loop mode and simple file mode. Since we are assuming a scenario where the programmer provides the code in a file, control goes to _PyRun_SimpleFileObject in Line 23.
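As a rough illustration of this choice, the sketch below picks a mode depending on whether the input stream is attached to a terminal. This is an assumption made for the example: the real check in CPython considers more than this, and the function name choose_mode is hypothetical.

```python
import io


def choose_mode(stream):
    """Pick a mode: terminals get the interactive loop, everything else is a plain file."""
    if stream.isatty():
        return "interactive loop"
    return "simple file"


# An in-memory stream behaves like a regular file, not like a terminal.
script = io.StringIO("print('hello')\n")
mode = choose_mode(script)
print(mode)
```

Calling choose_mode(sys.stdin) from a terminal session would take the other branch, which corresponds to the interactive prompt you get when you run python without arguments.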
_PyRun_SimpleFileObject checks whether the file already contains bytecode. If so, it calls run_pyc_file in Line 50 to execute the bytecode; if not, it calls pyrun_file in Line 59 to execute the Python code in the file.
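One way to tell bytecode from source is the magic number that every .pyc file starts with. The sketch below is a simplified stand-in for the check, not CPython's actual logic: it compiles a small script with py_compile and compares the first bytes of each file with importlib.util.MAGIC_NUMBER. The helper looks_like_bytecode and the file names are hypothetical.

```python
import importlib.util
import os
import py_compile
import tempfile


def looks_like_bytecode(path):
    """Return True if the file starts with this interpreter's bytecode magic number."""
    with open(path, "rb") as f:
        header = f.read(len(importlib.util.MAGIC_NUMBER))
    return header == importlib.util.MAGIC_NUMBER


with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "hello.py")
    with open(source_path, "w", encoding="utf-8") as f:
        f.write("print('hello')\n")

    # Compile the source to bytecode next to it; compile() returns the .pyc path.
    pyc_path = py_compile.compile(source_path, cfile=os.path.join(tmp, "hello.pyc"))

    source_is_bytecode = looks_like_bytecode(source_path)
    pyc_is_bytecode = looks_like_bytecode(pyc_path)

print(source_is_bytecode, pyc_is_bytecode)
```

The magic number changes between Python versions, so a .pyc produced by a different interpreter version would not pass this check.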
pyrun_file creates a PyArena in Line 5, which allocates and manages memory for Python objects. It then constructs an Abstract Syntax Tree (AST) in Line 11 from the code in the input file. Finally, it calls the run_mod function in Line 20 to execute the code in its AST form, passing the PyArena along.
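The same parse-then-run sequence can be approximated from Python with the standard library. This is an analogy under stated assumptions, not what pyrun_file literally does: ast.parse stands in for building the AST, and compile plus exec stand in for run_mod. The arena has no Python-level counterpart here because the interpreter manages that memory internally. The source string and the file name "<example>" are made up.

```python
import ast

source = "result = sum(range(5))\n"
filename = "<example>"

# Step 1: turn the source text into an Abstract Syntax Tree.
tree = ast.parse(source, filename=filename, mode="exec")

# Step 2: compile the AST into a code object.
code = compile(tree, filename, "exec")

# Step 3: execute the code object in a fresh namespace.
namespace = {}
exec(code, namespace)

print(type(tree).__name__, namespace["result"])
```

Printing ast.dump(tree) is a handy way to inspect what the parser produced before the code is compiled.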
Now that we have followed the flow from typing python into the terminal to the beginning of the compilation phase of the input Python code, let’s take a look at how CPython conducts lexical and syntax analysis with its lexer and parser in our next article!
Reference
- Anthony Shaw, “CPython Internals”
- http://www.python.org
- https://jython-devguide.readthedocs.io/en/latest/compiler.html