Lightweight OCaml Docker Images with Multi-Stage Builds
UPDATE 2017-09-22: Marek Kubica (@leonidasfromxiv) on Twitter has kindly pointed out a missing step in the instructions below, namely that I forgot to install depexts on the deployment image. The article has been updated to reflect this in the multi-stage build section, using an adaptation of Marek’s solution from the Twitter thread. Thanks for the feedback!
Another weekend, another thing to explore!
In case you missed it, Docker announced a feature called multi-stage builds quite recently, back in July. The documentation that I linked to has a section called “Before multi-stage builds” which eloquently describes the problem that multi-stage builds solve: the required use of the Builder Pattern (no, it’s different from the Builder design pattern), where you maintain at least two Dockerfiles for your project, one for building and one for the production image, in order to produce lightweight images.
It works, and I used it for some of my projects, but the Builder Pattern often introduces the need for some shell scripting tricks to wire things together. Multi-stage builds solve that!
As with new features of any widely used tool, tutorials have sprouted up showing how to use multi-stage builds with every kind of technology X. The official docs use Go, and there are ones for Node.js, .NET, Java, another one for Go, Python, Angular, some static files SPA + nginx, Elixir, React + CRA, Erlang, and I think many more.
As there are none for OCaml at the time of this writing, and by coincidence I’ve been exploring OCaml in my free time recently, why don’t I try and write one?
Sounds like a great idea! Let’s get to it. The goal of this tutorial is to run a simple OCaml program in a small, lightweight Alpine Docker image. You, the reader, will follow me through my exploration on this topic. I hope you enjoy it!
First of all, why OCaml? Well, why not? :D
OCaml is statically and strongly typed, (mostly) functional, fast, safe, and compiled to native code, resulting in a single executable binary. The compiler is blazingly fast, and its Hindley-Milner type system (the same foundation that Haskell’s builds on) is sophisticated enough that you can stay type-safe without over-annotating your program. I’ve always been looking for a fast, compiled-to-native, preferably functional language, and OCaml seems to fit the bill so well.
Lately I’ve also played around with BuckleScript, the OCaml-to-JS compiler, but that’s another story for another time!
The Project we’re building
The project, haversiner, is a small server built around the haversine formula. If you have no idea what that is, basically it answers the question “how far is it from A to B?” assuming that A and B are two places on Earth identifiable by coordinates.
Normal OCaml and opam Build Workflow
Before we get into the Docker stuff, let’s take a look at how we build this project.
This is a standard opam project so I’ll be using several opam commands throughout, and I’m also using Jane Street’s awesome jbuilder build tool for building and generating artefacts. I’m using opam v1.2.2 and jbuilder v1.0+beta12. What I’m going to do here is what I see as the normal workflow of developing opam projects.
First, we clone the project:
$ git clone https://github.com/bobbypriambodo/haversiner.git
$ cd haversiner
Nothing special here. Next, we run this (assuming we have opam installed):
$ opam switch 4.04.2-haversiner --alias-of 4.04.2
$ eval `opam config env`
Here we create a “switch”. To understand what it is, know that opam v1 by default treats all package installations globally. This might sound unintuitive if you come from, say, the Node.js world, where every project has its own self-contained node_modules directory. If we run opam install somelibrary, opam will install it globally.
That sounds difficult! How do you deal with multiple versions of dependencies for multiple projects? Well, it is difficult, and in opam v1, switches solve that. Each switch is a self-contained OCaml compiler plus packages, and switches know nothing about one another. I like to use a single switch for a single project (utilizing the --alias-of flag); that way I keep my packages self-contained for each project. The eval command just sets the correct environment variables to make sure your PATH points to the correct binaries of the switch.
A bit cumbersome indeed, and opam v2 has a solution called local switches. But it’s still in beta, so…
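As a rough illustration of why the eval step is needed: opam only prints shell variable assignments, and it is eval that applies them to the current shell. The sketch below uses a made-up function, print_switch_env, as a stand-in for opam config env, and a made-up variable, DEMO_SWITCH_BIN, instead of touching the real PATH. The directory shown is an assumption about a default opam v1 layout.

```shell
# Hypothetical stand-in for `opam config env`: it only *prints* assignments.
print_switch_env() {
  printf 'DEMO_SWITCH_BIN="%s"; export DEMO_SWITCH_BIN;\n' \
    "$HOME/.opam/4.04.2-haversiner/bin"
}

# Calling it alone changes nothing in the current shell.
print_switch_env > /dev/null
echo "before eval: ${DEMO_SWITCH_BIN:-<unset>}"

# eval executes the printed text, so the variable is now set and exported.
eval "$(print_switch_env)"
echo "after eval:  $DEMO_SWITCH_BIN"
```

The real command works the same way, except that the printed assignments cover PATH and a few OCaml-specific variables.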
Okay! So this command will build the compiler for OCaml 4.04.2 (which takes quite a while), alias it as 4.04.2-haversiner, and we’re good to go. Next, we need to run this:
$ opam pin add -yn haversiner .
We run this from inside the cloned directory. The command opam help pin will tell you that opam pin will “pin” a given package to a specific version or source. I’ve said that opam treats packages globally, and the pin command simply tells opam that whenever I refer to the haversiner package, I am talking about this (.) directory. It will read your opam file, in this case haversiner.opam. The -yn flags answer the two prompts of opam pin. The first prompt asks whether we want to create haversiner as a new package (we answer yes), and the second whether we want to install the newly created package (we answer no).
The next step is to install the dependencies of the project, and we use the following commands:
$ opam depext haversiner
$ opam install --deps-only haversiner
With these commands, opam will read the depends section of our haversiner.opam and install the packages listed there. The depext command is used for installing OS-specific system packages that our libraries might need, and opam install is quite self-explanatory.
Lastly, we build the project!
$ make build
jbuilder build @install
As you can see on the last line, it produces a single executable called main.exe. It is located in the directory _build/default/bin/. Let’s try it!
Running server on port 3000.
Ooh, nice! We get it running on port 3000. We can test it with curl. I’ve provided a sample payload at test/payload.json containing the coordinates of the Nashville airport (BNA) and the Los Angeles airport (LAX), which we can use like this:
$ curl -XPOST http://localhost:3000 -d @test/payload.json
The distance is roughly 2886.44 km, which is about the same as the result on Rosetta Code! We successfully built and ran our project.
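As a sanity check on that number, the haversine formula can be evaluated directly in the shell with awk. This is a sketch under assumptions, not the project’s code: the coordinates are the BNA and LAX values used in the Rosetta Code task (the contents of test/payload.json are not shown here), and the Earth radius of 6371 km is my own choice.

```shell
# Assumed inputs: BNA (36.12, -86.67) and LAX (33.94, -118.40) in degrees,
# and a mean Earth radius of 6371 km.
distance_km=$(awk -v lat1=36.12 -v lon1=-86.67 \
                  -v lat2=33.94 -v lon2=-118.40 \
                  -v radius_km=6371 '
BEGIN {
  pi = atan2(0, -1)
  to_rad = pi / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  # Haversine of the central angle between the two points.
  a = sin(dlat / 2) ^ 2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2) ^ 2
  # awk has no asin(), so the angle is recovered with atan2 instead.
  c = 2 * atan2(sqrt(a), sqrt(1 - a))
  printf "%.2f", radius_km * c
}')
echo "$distance_km km"
```

With these inputs the result should land on the same 2886.44 km; a different radius constant shifts it by about a kilometre.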
Let’s run it with Docker!
The goal that we want to achieve is to run this program in the lightweight Alpine Linux image. Ubuntu-based images will likely end up at more than 500MB (often more than 1GB), which is a drag to push and pull. With Alpine, the base image is only 3.97MB! That’s like downloading an MP3 file of a song from the internet.
We’ve built the binary previously in this tutorial, so can we just put it into an Alpine image and run it?
Not so fast.
Unfortunately, I’m not using Alpine Linux as my OS, as I’m using OSX. This is bad, because the binary was compiled with OSX as the target. I also don’t think OCaml’s current cross-compilation story is as easy as Go’s. We need to somehow compile it with Alpine as the target. So what do we do? Well, we build it with Docker!
Let’s write a Dockerfile for the build step:
We use the ocaml/opam:alpine image, which is an Alpine image with OCaml and opam installed on top of it. The next instructions are similar to what we’ve done in this tutorial: install dependencies and build. The chown part and opam config exec are nitty-gritty details that I needed to add in order to circumvent permission errors and the build somehow not finding the switch’s binaries.
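A minimal sketch of what such a build Dockerfile could look like, written from the description above rather than copied from the project. The file name Dockerfile.build, the /home/opam/haversiner working directory, and the exact ordering of the steps are my assumptions; the opam and make commands are the same ones used earlier in this tutorial.

```shell
# Write a sketch of the build Dockerfile (hypothetical file name).
cat > Dockerfile.build <<'EOF'
# Alpine image with OCaml and opam preinstalled.
FROM ocaml/opam:alpine

# Assumed location for the sources inside the image.
WORKDIR /home/opam/haversiner
COPY . .

# The copied files belong to root, while the image runs as the opam user,
# hence the chown. `opam config exec` runs make with the switch environment.
RUN sudo chown -R opam . && \
    opam pin add -yn haversiner . && \
    opam depext haversiner && \
    opam install --deps-only haversiner && \
    opam config exec -- make build
EOF
```

Saved under this name, it would be built with docker build -f Dockerfile.build -t haversiner . rather than the plain form of the command.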
If you run docker build -t haversiner ., it will build the image successfully, resulting in an image that contains our source and the resulting executable! Of course, we could naively use this image as our production image (by adding CMD _build/default/bin/main.exe), but through docker images, we see that the ocaml/opam image’s base size is already 935MB (!!). That’s a lot! Surely we can do better.
Enter Docker multi-stage build (finally!).
Here’s the updated Dockerfile utilizing the new, shiny feature of Docker:
UPDATE 2017-09-22: the above snippet has been updated; it now takes note of the depexts in the build image and transfers them to the deployment image for installation. Without this, our deployment image might be missing OS libraries that are not statically linked into our binary. Kudos to Marek Kubica from Twitter for the solution, which I adapted.
The only difference between this and the previous one is the addition of several lines. If you noticed, we add another FROM instruction there, and that’s the main point of multi-stage builds! We can essentially have multiple FROM instructions inside a single Dockerfile, which act as stages with their own sets of instructions, and only the final stage will be saved as the resulting image.
Here, we drop all the baggage from the ocaml/opam image, build a new image based on alpine, and copy the build artifact from the previous stage to this new image. With a binary size of 4.4MB, the resulting image, on my computer, is only 10.5MB in size! That’s a huge cut from nearly 1GB.
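The two-stage structure described here can be sketched as follows. Again, this is an illustration under assumptions and not the project’s actual file: the stage name build, the file name Dockerfile.multistage, the /app and /home/opam/haversiner directories, and the haversiner.exe target name are mine, and the step that installs the runtime system packages (the depexts) is only marked with a comment because the package names depend on the project’s dependencies.

```shell
# Write a sketch of the multi-stage Dockerfile (hypothetical file name).
cat > Dockerfile.multistage <<'EOF'
# Stage 1: build with the full OCaml/opam toolchain.
FROM ocaml/opam:alpine AS build
WORKDIR /home/opam/haversiner
COPY . .
RUN sudo chown -R opam . && \
    opam pin add -yn haversiner . && \
    opam depext haversiner && \
    opam install --deps-only haversiner && \
    opam config exec -- make build

# Stage 2: only this stage ends up in the final image.
FROM alpine
WORKDIR /app
# Runtime system libraries would be installed here with `apk add`;
# which ones are needed is not covered by this sketch.
COPY --from=build /home/opam/haversiner/_build/default/bin/main.exe ./haversiner.exe
EXPOSE 3000
CMD ["./haversiner.exe"]
EOF
```

Everything before the second FROM is discarded once the build finishes; only the files pulled across with COPY --from=build survive into the final image.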
UPDATE 2017-09-22: the above numbers were from before I installed the depexts. Afterwards, I get a 38.5MB image, which is honestly a bit disappointing compared to the previous result, but still a huge improvement over 1GB!
So, how about we test it?
$ docker run -p 3000:3000 -d --rm --name haversiner haversiner
<some container ID output here>
$ curl -XPOST http://localhost:3000 -d @test/payload.json
Yes! Our Docker image works as expected and is ready to serve requests, with a dazzlingly small size!
If you also noticed, we don’t use any opam switch commands in the Dockerfile, simply because each Docker container will be self-contained, and that way we don’t need to think about managing multiple packages for multiple projects. Profit!
In this article, we saw how to build a normal opam project. We also saw how to run our program inside Docker, and how Docker’s new multi-stage builds feature allows us to easily build and create lightweight production images with zero cost.
“But Bobby, why Docker?” The OCaml developers among you are surely very eager to point me to the amazing MirageOS project for unikernels and containerization. I assure you, it’s next on my learning list! There are still so many things to learn! :D
Thanks for reading, I hope you get something out of this post! Tell me in the comments if I can improve anything.