Optimising Python3 with Nuitka and Docker Scratch
A Docker container is good, a lighter Docker container is better — Anonymous Cloud Engineer poet
Lately, I have been working with Python3 to implement new features in an existing CLI. I enjoyed discovering multi-threading and concurrency in Python. Baking my scripts into a Docker container was really straightforward, as you will see below.
However, in a world where the time it takes to pull your container matters, the lighter your container is, the better.
In this post, I will explain how I packaged my Python code to move from a container of approximately 80Mb to a container of 15Mb.
A bit of background
I am a software engineer turned cloud engineer, working for DAZN in Amsterdam. I have been working a lot with Golang, and one of the things I missed with Python was the ability to compile it into a single binary.
The Python script that we will bake into a very tiny container is a CLI performing HTTP(s) requests against an API. The CLI uses multi-threading to dedicate one thread to network operations and another to I/O operations. On top of the multi-threading, I am using the asyncio library to run the different tasks dispatched to the threads concurrently. The CLI is baked into a container used as an action for GitHub Actions. Since GitHub Actions bills based on time units, I decided to put extra effort into optimising the performance of my CLI and the way it is distributed. For the performance, I explained what I did in the code. Regarding the distribution, I opted for a Docker container containing my CLI, pushed to ECR.
Now that the background is established, let's dive into the core topic of this article: how to optimise the Docker container.
The beginning
As mentioned in my introduction, my first approach to the Docker container was really basic, mainly because I needed to deliver a working version of our container ASAP. Therefore, I chose to use a Debian base image and wrote the following Dockerfile:
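(A minimal sketch of that first Dockerfile; the Pipfile paths, the src/ layout and the main.py entry point are placeholders for the actual project.)

```dockerfile
FROM debian:bullseye-slim

# Install our runtime (Python3) and our package manager (pipenv).
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-pip \
 && pip3 install pipenv \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install the Python dependencies.
COPY Pipfile Pipfile.lock ./
RUN pipenv install --system --deploy

# Copy the sources inside the container and define an entrypoint.
COPY src/ ./src/
ENTRYPOINT ["python3", "src/main.py"]
```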
As you can see, we are doing the following things:
- installing our runtime (Python3) and our package manager (pipenv)
- installing the Python dependencies
- copying our sources inside of the container
- defining an entrypoint.
Really basic and straightforward — BUT (there is always a BUT) — it's also really heavy. We need a complete operating system + a runtime + a package manager + their dependencies. The base image (debian:bullseye-slim) already weighs 30Mb compressed. By adding all the required dependencies + the code, we end up with a Docker container of approximately 80Mb. For sure, it does the job, but it's 80Mb to pull every time I want to run my GitHub action.
Why not use Alpine?
As I already had in mind compiling Python and optimising my script's distribution, I preferred not to have to deal with musl libc, or to have to install bash in Alpine (very bad) in order to be able to build. Also, the decompressed container with Alpine still weighs about 124Mb. That's 50Mb better than with Debian, but still above 100Mb.
The problem
I mentioned Golang earlier; one of the things I love about Golang is that, like C or C++, it is a compiled language that can be compiled into one single static binary. A static binary can be executed inside the Docker scratch container, which is the tiniest Docker container you could use for your app (it's an empty container with no folders / files, understanding only system calls).
How to do some Python à la Golang?
My idea was the following one: compile the Python code, include the runtime in the binary / dist folder, compress the binary to make it as light as possible, and distribute a container using Docker scratch.
To execute your scripts, the Python runtime "compiles" your py files to pyc files. The pyc files contain the byte code generated from your code. The byte code is then executed by the runtime. Ideally, what we want is to compile all the pyc files and the runtime into one standalone binary.
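You can make that byte-compilation step visible yourself; this layer is not part of the final build, it only surfaces the intermediate files (assuming the sources live in src/):

```dockerfile
# Eagerly byte-compile the sources; CPython normally does this on first import.
# Afterwards, src/__pycache__/ contains the .pyc byte code the runtime executes.
RUN python3 -m compileall src/
```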
The solution
As I was looking for a Python compiler able to generate a standalone binary that does not require the Python runtime on the system where it is executed, I came across Nuitka. Nuitka is a Python project, written in Python, which compiles Python to C and uses libpython to execute the code. Additionally, I decided to use UPX to compress the binary generated by Nuitka.
The process of compiling Python and then compressing it is time-consuming. As I still want to be able to run my container locally, I used multi-stage builds in my Dockerfile. A multi-stage build allows us to use multiple FROM statements in one Dockerfile, each of which can be named using the keyword as. Files can be copied across stages, stages can be used as base images for other stages, and, using the Docker client, you can pass the flag --target to choose which stage to build.
For our application, the different stages are the following ones:
- “base”: takes care of the base setup (runtime, package manager, source code)
- “compressor”: optimises the code
- final stage: the container using scratch, which will be pushed to ECR
Here is how it translates into a Dockerfile:
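(A condensed sketch of its structure; each stage is detailed in the rest of this section, and names and paths such as myapp and /app are placeholders.)

```dockerfile
# Stage 1: base setup, same layers as the first Dockerfile.
FROM debian:bullseye-slim as base
# ... runtime, pipenv, dependencies and sources ...

# Stage 2: compile with Nuitka, check with ldd, compress with upx.
FROM base as compressor
# ... build tool-chain installation and code transformation (detailed below) ...

# Final stage: an empty container holding only the dist folder.
FROM scratch
# ... copy of the dist folder from the compressor stage (shown at the end) ...
```

Locally, docker build --target base . still gives me a plain, runnable development image.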
The compressor stage is what we are interested in. The first layer performs the installation of the packages needed for Nuitka, and Nuitka itself:
- python3.9-dev (the source of Python)
- build-essential (generic Debian package including all the necessary tools for compiling languages)
- ccache (compiler cache to optimise the performance of Nuitka)
- clang (faster compiler)
- libfuse-dev (for Nuitka)
- upx (binary compressor).
I used pip to install Nuitka in order to use its latest version, which is still in an experimental state for Debian.
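That first layer looks roughly like this (a sketch; the package list is the one above):

```dockerfile
FROM base as compressor

# Tool-chain required by Nuitka, plus upx for the final compression step.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      python3.9-dev build-essential ccache clang libfuse-dev upx \
 && rm -rf /var/lib/apt/lists/*

# Latest Nuitka from PyPI rather than the older Debian package.
RUN pip3 install nuitka
```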
The second layer of the stage performs the transformations of the code. We run Nuitka first. We instruct it to build a standalone binary (--standalone); this instruction will, by default, follow the imports and include them and their dependencies in our dist folder. We also ask Nuitka not to include pytest. We are also passing some flags to Python for the transformation: nosite (don't include the site-specific configuration) and -O (optimise the byte code). Nuitka has a plugin system allowing specific transformations depending on the modules you are working with in Python. Here, we are using the following plugins:
- anti-bloat: takes care of stripping out the unused imports
- implicit-imports: detects all the implicitly required imports
- data-files: includes all non-Python files required by the scripts
- pylint-warnings: displays pylint warnings while parsing the sources

After defining the plugins, we pass the --clang flag to use clang instead of gcc for faster builds. The --warn-implicit-exceptions and --warn-unusual-code flags are there for printing warnings. The --prefer-source-code flag instructs Nuitka to look for the sources of the modules used by our script, in order to compile them into our binary. The final argument, myapp, is the Python file containing my __main__.
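Put together, the compilation layer looks roughly like this (a sketch; myapp.py stands in for my actual entry file):

```dockerfile
# Compile the CLI and everything it imports into a standalone dist folder.
RUN python3 -m nuitka \
      --standalone \
      --nofollow-import-to=pytest \
      --python-flag=nosite \
      --python-flag=-O \
      --plugin-enable=anti-bloat \
      --plugin-enable=implicit-imports \
      --plugin-enable=data-files \
      --plugin-enable=pylint-warnings \
      --clang \
      --warn-implicit-exceptions \
      --warn-unusual-code \
      --prefer-source-code \
      myapp.py
```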
Once Nuitka has finished the compilation, it creates a new folder called myapp.dist containing our binary and the .so files it links to. After the compilation, we use ldd to ensure that all the system .so files are included in our dist folder. The 3 copies following the ldd calls include libgcc for executing our binary, and the libresolv and libnss_dns libraries for DNS resolution (required for requests and urllib3).
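The check and the copies look roughly like this (the library paths are the ones of a Debian bullseye amd64 image; adjust them for other platforms):

```dockerfile
# Inspect which shared objects the binary links against ("not found" entries
# mean something is missing from the dist folder).
RUN ldd myapp.dist/myapp

# System libraries that the compilation does not pull into the dist folder:
# libgcc to execute the binary, libresolv/libnss_dns for DNS resolution.
RUN cp /lib/x86_64-linux-gnu/libgcc_s.so.1 myapp.dist/ \
 && cp /lib/x86_64-linux-gnu/libresolv.so.2 myapp.dist/ \
 && cp /lib/x86_64-linux-gnu/libnss_dns.so.2 myapp.dist/
```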
Finally, our last step is running upx to compress our binary. We pass upx the -9 flag to get the best compression. In the specific case of our application, we reduced the size of the binary by 31%.
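The corresponding layer is a one-liner:

```dockerfile
# Compress the compiled binary in place; -9 is upx's highest compression level.
RUN upx -9 myapp.dist/myapp
```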
Why not use the --onefile flag of Nuitka to generate a static binary?
I tried to use the --onefile option of Nuitka inside Docker. The main issue I faced is that it requires elevated privileges to run the compilation. As I didn't want to perform the compilation at runtime, I decided to stick to --standalone, which doesn't require access to AppImages.
The final stage uses the Docker scratch container and copies the entire dist folder to the root of our container.
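Which gives the last lines of the Dockerfile (the entrypoint name matches the compiled binary, myapp in this sketch):

```dockerfile
FROM scratch

# Ship only the self-contained dist folder: no shell, no libc, no package manager.
COPY --from=compressor /app/myapp.dist /

ENTRYPOINT ["/myapp"]
```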
Doing all this allows us to have a container of approximately 15Mb instead of the original 80Mb, which translates to a size reduction of 80%. In addition to the size reduction of the image, we also have one unique layer now. In the case of our GitHub Actions workflow, we observed an improvement in the execution time: it is 4 to 6 seconds faster.
Having lighter containers (with few layers) allows us to reduce the time it takes to pull the container and to extract it. Reducing the size of your container by such a factor and starting to use Docker scratch for baking your applications is quite interesting in a container-based environment (ECS, Fargate, Kubernetes and so on), as pulling will be faster and a container from scratch will include and execute only what is needed for your application.
I hope this article will help you optimise your containers. If you have any questions, don't hesitate to comment!