Docker container with Python for ARM64/AMD64
Note: This is Python 3.8; for the Python 3.9 version, go here: https://alex-ber.medium.com/docker-container-with-python-3-9-for-arm64-amd64-f2cdf167230f
END OF UPDATE
This month Anaconda 2021.05 was released. For the first time, it contains support for the “64-bit AWS Graviton2 (ARM64) platform”. Why should you care? Let’s start with a very short description of what ARM64 is.
ARM (Advanced RISC Machine, originally Acorn RISC Machine; RISC = Reduced Instruction Set Computer) is a compact, energy-saving chipset design rather than a high-performance one. Accordingly, such chipsets are primarily used in mobile and power-constrained devices such as smartphones, tablets, and IoT hardware.
x64 chips, accordingly, are usually used in higher-performance devices such as desktop PCs, higher-end notebooks, servers, and other business-class hardware.
OK, but why would you choose, roughly speaking, a lower-performance CPU? First of all, maybe your code is going to run natively on a Raspberry Pi or a smartphone. The more interesting case, however, is in order to pay less.
If your code is going to run in AWS anyway, by choosing ARM64 you can significantly reduce your bill. Of course, you can’t do it for every application, but there are some classes of applications for which you can. Let’s look at the official AWS page:
Amazon EC2 A1 instances deliver significant cost savings for scale-out and Arm-based applications such as web servers, containerized microservices, caching fleets, and distributed data stores that are supported by the extensive Arm ecosystem… Most architecture-agnostic applications that can run on Arm cores could also benefit from A1 instances.
As a side note, this implies that if you’re working with neural networks, ARM64 may be the wrong choice. On the other hand, if you’re using Pandas for some background processing (ETL, for example), it may be fine. And if you have web services/microservices, it is the right choice to make.
"If you want to use ARM targets to reduce your bill, such as Raspberry Pis and AWS A1 instances, or even keep using your old i386 servers, deploying everywhere can become a tricky problem as you need to build your software for these platforms".
docker run --platform linux/arm64 alexberkovich/alpine-python3:latest works on an AWS A1 instance.
The slim version for arm64 is in beta and can be removed without prior notice.
You can read about the history of my alpine-anaconda3 project in the appendix below.
The source code is available as part of my alpine-anaconda3 project.
You can install Docker container with Python from Dockerhub:
docker pull alexberkovich/alpine-anaconda3
or for AMD64/Intel x86-64:
docker pull --platform linux/amd64 alexberkovich/alpine-anaconda3
or for ARM64/v8:
docker pull --platform linux/arm64 alexberkovich/alpine-anaconda3
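To see which architecture Docker actually pulled, you can inspect the local image:

```shell
# Show the OS/architecture of the locally pulled image
docker image inspect alexberkovich/alpine-anaconda3:latest \
  --format '{{.Os}}/{{.Architecture}}'
```

On an Intel machine this prints linux/amd64; on a Graviton2 instance, linux/arm64.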
You can extend this Docker image and simply add:
FROM alexberkovich/alpine-anaconda3:latest
COPY conf/requirements.txt etc/requirements.txt
RUN pip install -r etc/requirements.txt
Note: There are also slim and python versions of this Docker file; the slim version contains almost no Python packages. For the python version, change anaconda3 to python3. For the slim version, just append -slim to the name of the Docker image. See the appendix below to read more.
So, I had an existing “base Python Docker Image” that runs Python 3.8.5 on AMD64. I wanted to create a multi-arch Docker image for AMD64 and ARM64.
What I really wanted was that, from the user’s perspective, there is a single Docker image that works both on AMD64 and on ARM64.
The tool to achieve this is multi-arch builds and images. I’m actually using what is described there as “the hard way with docker manifest”. I find it unacceptable that my build would be deployed directly to Docker Hub. For example, when creating the slim version I want to aggressively remove all intermediate containers, which is impossible with the alternative approach.
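The “hard way” boils down to pushing the per-architecture images first and then stitching them together into a manifest list; a minimal sketch, using the tag names from this story:

```shell
# Push the per-architecture images first
docker push alexberkovich/alpine-anaconda3:0.3.3-amd64
docker push alexberkovich/alpine-anaconda3:0.3.3-arm64v8

# Create a manifest list that aggregates them under one name
docker manifest create alexberkovich/alpine-anaconda3:latest \
    alexberkovich/alpine-anaconda3:0.3.3-amd64 \
    alexberkovich/alpine-anaconda3:0.3.3-arm64v8

# Mark the ARM image with its architecture/variant
docker manifest annotate alexberkovich/alpine-anaconda3:latest \
    alexberkovich/alpine-anaconda3:0.3.3-arm64v8 --arch arm64 --variant v8

# Push the manifest list itself
docker manifest push alexberkovich/alpine-anaconda3:latest
```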
Under the hood
I started with the existing Docker image. It works for AMD64. So, the first thing I wanted to do was create another version of it that works for ARM64, but with the latest version of Anaconda.
Because my Docker image is based on Alpine Linux, I had to update the versions of some of the OS-level tools it uses.
Then I discovered that I needed to add the OS-level package hdf5-dev. On AMD64, when I pip install h5py, it actually downloads a wheel and installs it. For ARM64 there is no wheel available, so pip installs it from source, and this requires hdf5-dev to be installed at the OS level.
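On Alpine, that build-from-source path looks roughly like this (package names as used in this story):

```shell
# HDF5 headers/libraries must be present before pip compiles h5py from source
apk add --no-cache hdf5-dev
pip install h5py
```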
In general, many Python packages have no wheel for ARM64, so they will be installed from source. Luckily, I have everything in place (GCC, etc.) to achieve this.
It may surprise you, but Anaconda 2021.05 for AMD64 and Anaconda 2021.05 for ARM64 don’t contain exactly the same packages. As I’ve said before, the workaround for this is installation from source.
There is one important exception, however.
The mkl package (source code: https://github.com/IntelPython/mkl-service). It is available only through Anaconda and only for AMD64. For ARM64, openblas can be used instead. There are some more Intel packages that are not available for ARM64, but there are replacements for them.
At the source level, I’ve created a single Dockerfile from which I create 2 separate Docker images: one for AMD64, one for ARM64. I’ve created a manifest that aggregates those 2 Docker images, so you can write
docker pull alexberkovich/alpine-anaconda3:latest (this is actually a manifest; it deliberately looks like a tag) and Docker will pull one of the supported versions of the Docker image. You can control this explicitly by supplying
--platform=linux/arm64/v8 to the docker pull / docker run command, or even in the FROM clause of the Dockerfile, something like this:
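A minimal sketch (the image tag here is illustrative):

```dockerfile
# Pin the target platform explicitly in the FROM clause
FROM --platform=linux/arm64/v8 alexberkovich/alpine-anaconda3:latest
```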
In the Dockerfile itself there are a couple of if statements; sometimes I do different things depending on whether I’m building for AMD64 or for ARM64. For example,
curl has a different version. In Alpine Linux I can use only the latest version of an OS-level package, and the latest versions turned out to be different for AMD64 and for ARM64.
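I build the two images separately, but for illustration, this kind of per-architecture branching inside a single Dockerfile can be sketched with BuildKit’s TARGETARCH build argument (the branch bodies here are placeholders):

```dockerfile
# TARGETARCH is automatically populated by BuildKit (e.g. amd64, arm64)
ARG TARGETARCH
RUN if [ "$TARGETARCH" = "arm64" ]; then \
        echo "ARM64-specific package setup goes here"; \
    else \
        echo "AMD64-specific package setup goes here"; \
    fi
```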
In Docker Hub, there are actually 3+3+3(+3) different entities.
alexberkovich/alpine-anaconda3:latest is a manifest file that, at the time of writing, aggregates alexberkovich/alpine-anaconda3:0.3.3-amd64 and alexberkovich/alpine-anaconda3:0.3.3-arm64v8. The last two are regular tagged images (built for a specific CPU architecture).
alexberkovich/alpine-anaconda3:latest-slim is a manifest file that, at the time of writing, aggregates alexberkovich/alpine-anaconda3:0.3.3-slim-amd64 and alexberkovich/alpine-anaconda3:0.3.3-slim-arm64v8. The last two are slim tagged images (built for a specific CPU architecture).
alexberkovich/alpine-python3:latest is a manifest file that, at the time of writing, aggregates alexberkovich/alpine-python3:0.3.3-amd64 and alexberkovich/alpine-python3:0.3.3-arm64v8. The last two are Python-based (without Anaconda) tagged images (built for a specific CPU architecture).
alexberkovich/alpine-anaconda3:latest is the same as alexberkovich/alpine-anaconda3:0.3.3 (this will change when a new version is released).
alexberkovich/alpine-anaconda3:latest-slim is the same as alexberkovich/alpine-anaconda3:0.3.3-slim (this will change when a new version is released).
alexberkovich/alpine-python3:latest is the same as alexberkovich/alpine-python3:0.3.3 (this will change when a new version is released).
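You can see which per-architecture images a given manifest aggregates with:

```shell
# Lists the amd64 and arm64v8 entries behind the single "latest" name
docker manifest inspect alexberkovich/alpine-anaconda3:latest
```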
About 1.5 years ago I created what I call my “base Python Docker Image”. It was based on Python 3.7 and Anaconda. The reason I created it: I found that having a requirements.txt and trying to “just” pip install it doesn’t work on any existing Docker image.
If I want to install some brand-new package, suddenly it fails to work because another dependency is too old.
Sometimes I want to remove some package (because it prevents me from installing another one, and I don’t need this specific one), but I can’t (it may require uninstalling more packages, or it can be a distutils project that pip just can’t uninstall, for example ruamel_yaml).
Sometimes installing the latest new package will update half of the existing dependencies, for example graphviz.
Another problem: suppose I have package A and package B that I want to install, and both depend on the cffi package (a Foreign Function Interface for Python calling C code; many packages that have a C extension use it). If I run pip install B and then pip install A, everything works fine, but if I put them in requirements.txt, suddenly it doesn’t work. This can happen if I put them in the order A, B. If cffi is preinstalled in the Docker image, the order doesn’t matter.
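A sketch of that fix in Dockerfile form, assuming A and B are listed in requirements.txt:

```dockerfile
FROM alexberkovich/alpine-anaconda3:latest
# Preinstalling cffi makes the install order in requirements.txt irrelevant
RUN pip install cffi
COPY conf/requirements.txt etc/requirements.txt
RUN pip install -r etc/requirements.txt
```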
It also has some OS-level packages so that pip install of any package will “just work”. This means I have some SSL-related packages, such as openssl-dev, installed. In order to be able to successfully build practically any package from source, I have a couple of C/C++ compilers and even a Fortran compiler.
In the latest release I have added the hdf5-dev OS-level package. It appears that h5py (read and write HDF5 files from Python) requires it to be built from source.
I changed the Python version to 3.8 in previous versions. There are some packages (for example, pandas/sklearn) whose versions I preserve exactly.
In the latest release, I’ve updated Anaconda to the latest version, so most of the packages are up to date. Still, there are some packages with pinned versions (this is still true for pandas/sklearn, but there are more, for example h5py and pyyaml). There are various reasons for these decisions, but if they don’t fit your needs, you should be able to easily update these versions to whatever you need. Just be aware that you may also need to update other dependencies. With all the infrastructure installed, once you’ve figured out which packages you want to update to which versions, you should be able to do it with a simple pip install.
My original motivation for creating the Docker image was development. It wasn’t designed for creating Docker containers that actually run the code in production, so the size of the Docker image didn’t bother me.
If I’m creating some simple microservice/web service, I don’t need graphviz in it. Actually, most of the installed packages are not in use.
So, I have created the “slim” version of my “base Python Docker Image” that contains almost no Python packages.
I’m using the regular base version to create requirements.txt with all the packages needed for the specific application. Then I’m using
alexberkovich/alpine-anaconda3-slim as the base version (FROM) to actually run it in production.
If I need to add a new dependency, I do it on the regular “base Python Docker Image”, update requirements.txt, and change the base image to alexberkovich/alpine-anaconda3-slim.
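This round trip can be sketched as follows (pip freeze and the my-app tag are illustrative details, not necessarily the exact commands used):

```shell
# Capture the resolved dependency set from the full development image
docker run --rm alexberkovich/alpine-anaconda3:latest pip freeze > conf/requirements.txt

# The production Dockerfile then starts from the slim base, e.g.:
#   FROM alexberkovich/alpine-anaconda3-slim:latest
#   COPY conf/requirements.txt etc/requirements.txt
#   RUN pip install -r etc/requirements.txt
docker build -t my-app .
```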
You can install slim Docker container with Python from Dockerhub:
docker pull alexberkovich/alpine-anaconda3-slim
or for AMD64/Intel x86-64:
docker pull --platform linux/amd64 alexberkovich/alpine-anaconda3-slim
or for ARM64/v8:
docker pull --platform linux/arm64 alexberkovich/alpine-anaconda3-slim
Finally, today I have added the python version. It doesn’t include Anaconda at all. I’m using the Alpine Linux package manager to install the
python-dev OS-level package. Then I’m adding some packages to get it close to the slim version.
It serves as an alternative solution to the slim version. Note that some packages, such as sip, are unavailable in this version. In the AMD64 version, packages like mkl_random are unavailable, making numpy impossible to use. In the ARM64 version, llvmlite, for example, is unavailable, making it impossible to build many packages from source.
You can install Docker container with plain Python from Dockerhub:
docker pull alexberkovich/alpine-python3
or for AMD64/Intel x86-64:
docker pull --platform linux/amd64 alexberkovich/alpine-python3
or for ARM64/v8:
docker pull --platform linux/arm64 alexberkovich/alpine-python3