Simplifying Python workflows with sitecustomize.py

Jean-Baptiste Barth
Alan Product and Technical Blog
4 min read · Nov 20, 2023
Photo by NASA on Unsplash

Alan’s backend is written in Python, using Flask under the hood. We manage a few applications within the same codebase, and sometimes found ourselves needing a way to execute code at startup time. We discovered we could do this with the site module. Let’s see how!

Using Python’s site module

In Python, the site module is automatically imported during startup. It’s responsible for a variety of initialization tasks for Python, but the most interesting part for us is that it automatically imports sitecustomize and/or usercustomize modules if present, typically in the site-packages directory.

This behavior can be disabled (for instance with the interpreter’s -S flag), but it’s on by default, so you have nothing special to do if you want to use it. You can read more about it in the docs.
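As a quick check of that default, the sketch below starts a child interpreter with and without -S and reads sys.flags.no_site, the standard flag that records whether the site import was skipped. The helper name site_disabled is made up for this illustration.

```python
import subprocess
import sys


def site_disabled(*interpreter_flags):
    """Return True if a child interpreter started with these flags skips `site`."""
    # site_disabled is a hypothetical helper name, not part of any library.
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", "import sys; print(sys.flags.no_site)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print("default:", site_disabled())      # site is imported, so sitecustomize can run
    print("with -S:", site_disabled("-S"))  # site is skipped, and sitecustomize with it
```

If a command unexpectedly ignores your sitecustomize.py, checking this flag is a cheap first step.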

If you set the PYTHONPATH=. environment variable, any sitecustomize.py at the root of your repository will be imported before any application code runs, as long as you run your commands from that root. For developers, we enforce this environment variable with direnv, a tool that loads and unloads environment variables as you change directories with cd.

You can check that the sitecustomize.py file is imported by running your code with PYTHONVERBOSE=1:

% touch sitecustomize.py
% PYTHONPATH=. PYTHONVERBOSE=1 flask --help 2>&1 | grep sitecustomize
# code object from /Users/jean-baptiste.barth/dev/alan/backend/sitecustomize.py
import 'sitecustomize' # <_frozen_importlib_external.SourceFileLoader object at 0x102f6ff50>
# cleanup[2] removing sitecustomize
# destroy sitecustomize

Note that I didn’t have to modify my application code for the above to work, which gives us the ability to hook into the startup of any Python command!
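To see the mechanism in isolation, here is a minimal, self-contained sketch. It writes a throwaway sitecustomize.py into a temporary directory, points PYTHONPATH at it, and runs a child interpreter whose code never imports the hook. The names run_with_sitecustomize, HOOK, APP and demo_marker are assumptions made up for this example.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_sitecustomize(hook_source, child_code):
    """Run child_code in a fresh interpreter whose PYTHONPATH holds a sitecustomize.py."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sitecustomize.py").write_text(hook_source)
        env = dict(os.environ, PYTHONPATH=tmp)
        result = subprocess.run(
            [sys.executable, "-c", child_code],
            env=env, capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()


# The hook leaves a marker on the sys module (demo_marker is a made-up attribute).
HOOK = "import sys\nsys.demo_marker = 'set by sitecustomize'\n"
# The "application" only reads the marker; it never imports sitecustomize itself.
APP = "import sys; print(getattr(sys, 'demo_marker', 'hook did not run'))"

if __name__ == "__main__":
    print(run_with_sitecustomize(HOOK, APP))
```

The child process sees the marker even though its own code knows nothing about the hook, which is exactly the property the rest of this article relies on.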

Using sitecustomize.py to check dependencies

While dependencies are tightly controlled in our production environments, that doesn’t happen automatically on engineers’ laptops. Engineers pull the latest code and try to run it, and sometimes run into issues because one dependency or another is no longer up to date.

Of course we have a way to specify and lock our dependencies (we currently use poetry for that), but forcing everybody to run poetry install --sync every time they pull the code is not practical. poetry could offer a way to check its dependencies itself (similar to bundler/setup in the Ruby world), but we would still need to ensure it is called in each relevant code path.

Enter sitecustomize.py:

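import os
import subprocess
import sys

running_command = sys.argv[0]

# Only hook into the Python entry points developers actually run
python_commands = ("flask", "ipython", "pytest", "python")
if sys.platform == "darwin" and any(
    running_command.endswith(f"/{command}") for command in python_commands
):
    # The return code is ignored, so stale dependencies only trigger a warning
    subprocess.call(
        "bash bin/checksum_dependencies compare >&2",
        shell=True,
        cwd=os.path.dirname(__file__),
    )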

The above code executes the bin/checksum_dependencies compare command when running on a developer laptop (“darwin” is the platform name for macOS) and only when one of a few specific Python commands is being run. This script is then responsible for checking that dependencies are up to date, by taking a simple checksum of poetry.lock and any other relevant files and comparing it with the latest checksum it saved on disk.
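The actual bin/checksum_dependencies script is a bash script that is not shown here. Purely as an illustration of the idea, a Python sketch of the same compare-and-warn logic could look like the following. Every function name, the choice of SHA-256, and the checksum file location are assumptions, not the real implementation.

```python
import hashlib
import sys
from pathlib import Path


def compute_checksum(paths):
    """Hash the contents of every tracked file into one hex digest."""
    digest = hashlib.sha256()  # SHA-256 is an arbitrary choice for this sketch
    for path in sorted(map(str, paths)):
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def dependencies_changed(tracked_files, checksum_file):
    """Return True when the tracked files no longer match the saved checksum."""
    checksum_file = Path(checksum_file)
    saved = checksum_file.read_text().strip() if checksum_file.exists() else None
    return compute_checksum(tracked_files) != saved


def save_checksum(tracked_files, checksum_file):
    """Record the current checksum, for example right after a successful install."""
    Path(checksum_file).write_text(compute_checksum(tracked_files))


def warn_if_outdated(tracked_files, checksum_file):
    """Print a warning on stderr when dependencies look stale, and report the result."""
    changed = dependencies_changed(tracked_files, checksum_file)
    if changed:
        print('Dependencies have changed. Please run "poetry install --sync"', file=sys.stderr)
    return changed
```

In this sketch, save_checksum would be called after each successful install, and warn_if_outdated(["poetry.lock"], ".dependencies_checksum") from the startup hook. The checksum file name is hypothetical.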

When the checksum changes, it means your dependencies are out of date, and you’ll get a loud, red warning like this:

% ipython                     
******************************************************************
* Dependencies have changed. Please run "poetry install --sync " *
******************************************************************

Auto-loading the Flask Application context

We have numerous commands executed in a Flask-enabled context. Those commands typically rely on Flask’s current_app to load some configuration or log useful debugging information. Sometimes we make the effort to wrap the code in a custom flask <command> of our own, and that works fine.

But sometimes that’s just not practical: a given Python package might expose an 800-line CLI command that is not Flask-aware, and wrapping each and every option in our own codebase would mean unnecessary duplication. For instance, we recently started using the simpleflow library to host some complex worker code, and the simpleflow workers need to execute code that is aware of our Flask context.

With sitecustomize.py, we can automatically load the Flask app and execute the command in this context:

import sys


def auto_load_app():
    try:
        from flask.cli import ScriptInfo

        app = ScriptInfo().load_app()
        context = app.app_context()
        context.push()
    except Exception as e:
        print(f"Got error: {e}", file=sys.stderr)
        # We exit explicitly here: by default, an error raised in sitecustomize.py
        # is only reported briefly on stderr and Python carries on with startup
        sys.exit(1)


running_command = sys.argv[0]
if running_command.endswith("/simpleflow"):
    auto_load_app()

Note that this one is not specific to the development phase: we actually use it in production.

Conclusion

The site module offers a flexible way to customize the Python environment at startup time, and it has proven very useful for us.

It’s definitely a sharp knife though: it has surprising failure modes, and errors in sitecustomize.py can basically lead to our application not working anymore. So it’s crucial to understand its workings before making any alterations, and triple test any behavior we put into this script. We definitely only use it for a few, very specific use cases, and keep application startup in our good old create_app() Flask entrypoint.
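One of those failure modes can be observed with a small experiment. The sketch below runs a child interpreter against two throwaway hooks: one that raises an ordinary exception and one that calls sys.exit(1). The helper run_with_hook and both hook bodies are made up for this illustration.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_hook(hook_source):
    """Start a child interpreter with the given sitecustomize.py on PYTHONPATH."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sitecustomize.py").write_text(hook_source)
        env = dict(os.environ, PYTHONPATH=tmp)
        env.pop("PYTHONVERBOSE", None)  # keep stderr short and predictable
        return subprocess.run(
            [sys.executable, "-c", "print('app ran')"],
            env=env, capture_output=True, text=True,
        )


# An ordinary exception: site reports it on stderr and startup carries on.
raising = run_with_hook("raise RuntimeError('broken hook')\n")

# An explicit exit: SystemExit is not swallowed, so the application never starts.
exiting = run_with_hook("import sys\nsys.exit(1)\n")

if __name__ == "__main__":
    print("raising hook:", raising.returncode, raising.stdout.strip())
    print("exiting hook:", exiting.returncode, exiting.stdout.strip())
```

In the first case the application still runs, with a broken hook and only a short message on stderr. That is easy to miss, and it is why a hook that must succeed needs to exit explicitly when it fails.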

We only scratched the surface in this article, but know that the site module is also widely used by many Python packages you know and love, especially through .pth files, which provide functionality similar to the startup hook described above, but at the Python package level.
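For the curious, here is a minimal sketch of how a .pth file behaves, using only the standard library. It creates a temporary directory containing a .pth file and asks site.addsitedir to process it, which is what site does for site-packages at startup. The file name demo_hook.pth, the extra_modules directory and the demo_pth_marker attribute are assumptions for this example. Note that the function modifies sys.path in the current process.

```python
import site
import sys
import tempfile
from pathlib import Path


def demo_pth_hook():
    """Create a throwaway directory with a .pth file and let `site` process it."""
    site_dir = Path(tempfile.mkdtemp())
    extra_dir = site_dir / "extra_modules"
    extra_dir.mkdir()
    # A .pth file holds one entry per line: a path to append to sys.path,
    # or a line starting with "import" that site executes as code.
    (site_dir / "demo_hook.pth").write_text(
        "extra_modules\n"
        "import sys; sys.demo_pth_marker = 'set by demo_hook.pth'\n"
    )
    site.addsitedir(str(site_dir))
    return str(extra_dir), getattr(sys, "demo_pth_marker", None)


if __name__ == "__main__":
    extra_dir, marker = demo_pth_hook()
    print("marker:", marker)
    print("extra dir on sys.path:", any(p.endswith("extra_modules") for p in sys.path))
```

The executable "import" line is what lets packages run code at interpreter startup from a .pth file, so it deserves the same caution as sitecustomize.py.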
