Lightweight OCaml Docker Images with Multi-Stage Builds

What’s a Docker tutorial without a shipping container reference?

UPDATE 2017-09-22: Marek Kubica (@leonidasfromxiv) on Twitter kindly pointed out a step missing from the instructions below: installing depexts on the deployment image. The multi-stage build section has been updated to include it, using an adaptation of Marek’s solution from the Twitter thread. Thanks for the feedback!

Another weekend, another thing to explore!

In case you missed it, Docker announced a feature called multi-stage builds back in July. The documentation that I linked to has a section called “Before multi-stage builds” which eloquently describes the problem that multi-stage builds solve: the required use of the Builder Pattern (no, it’s different from the Builder design pattern), where you maintain at least two Dockerfiles for your project, one for building and one for the production image, in order to produce lightweight images.

It works, and I used it for some of my projects, but the Builder Pattern often requires some shell scripting tricks to wire things together. Multi-stage builds solve that!

As with any new feature of a widely used tool, tutorials have sprouted up showing how to use multi-stage builds with every kind of technology X. The official docs use Go, and there are ones for Node.js, .NET, Java, another one for Go, Python, Angular, some static files SPA + nginx, Elixir, React + CRA, Erlang, and I think many more.

As there are none for OCaml at the time of this writing, and by coincidence I’ve been exploring OCaml in my free time recently, why don’t I try and write one?

Sounds like a great idea! Let’s get to it. The goal of this tutorial is to run a simple OCaml program in a small, lightweight Alpine Docker image. You, the reader, will follow me through my exploration of this topic. I hope you enjoy it!

Why OCaml

First of all, why OCaml? Well, why not? :D

OCaml is statically and strongly typed, (mostly) functional, fast, safe, and compiled to native code, resulting in a single executable binary. The compiler is blazingly fast, and its Hindley-Milner type system (the same foundation Haskell’s is built on) is so sophisticated that you rarely need to annotate your program and still get type safety. I’ve always been looking for a fast, compiled-to-native, preferably functional language, and OCaml seems to fit the bill so well.

Lately I’ve also played around with BuckleScript, the OCaml-to-JS compiler, but that’s another story for another time!

The Project we’re building

In this tutorial, we’re going to reuse my Haversiner project, a simple OCaml web API that calculates the great-circle distance between two coordinates using the Haversine formula.

If you have no idea what that is, basically it answers the question “how far is it from A to B?” assuming that A and B are two places on Earth identifiable by coordinates.
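For reference, the formula itself is compact. The block below writes it with the conventional symbols, which are not names taken from the Haversiner source: φ₁ and φ₂ are the two latitudes, λ₁ and λ₂ the two longitudes (all in radians), r is the radius of the sphere, and d is the resulting great-circle distance. The value of r is left open here as an assumption; a mean Earth radius of about 6371 km is a common choice.

```latex
d = 2r \arcsin\left(
      \sqrt{
        \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right)
        + \cos\varphi_1 \, \cos\varphi_2 \,
          \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)
      }
    \right)
```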

Normal OCaml and opam Build Workflow

Before we get into Docker stuff, let’s take a look at how we build this project.

This is a standard opam project so I’ll be using several opam commands throughout, and I’m also using Jane Street’s awesome jbuilder build tool for building and generating artefacts. I’m using opam v1.2.2 and jbuilder v1.0+beta12. What I’m going to do here is what I see as the normal workflow of developing opam projects.

First, we clone the project:

$ git clone https://github.com/bobbypriambodo/haversiner.git
$ cd haversiner

Nothing special here. Next, we run this (assuming we have opam installed):

$ opam switch 4.04.2-haversiner --alias-of 4.04.2
$ eval `opam config env`

Here we create a “switch”. To understand what it is, know that opam v1 by default treats all package installations globally. This might sound unintuitive if you come from, say, the Node.js world, where every project has its own self-contained node_modules directory. If we run opam install somelibrary, opam will install it globally.

That sounds difficult! How do you deal with multiple versions of dependencies for multiple projects? Well, it is difficult, and in opam v1, switches solve that. Each switch is a self-contained OCaml compiler with its own set of packages, and switches know nothing about one another. I like to use a single switch for a single project (utilizing the --alias-of flag); that way I keep my packages self-contained for each project. The eval command just sets the correct environment variables so that your PATH points to the binaries of the current switch.

A bit cumbersome indeed, and opam v2 has a solution called local switches. But it’s still in beta, so…

Okay! So the opam switch command will build the compiler for OCaml 4.04.2 (which takes quite a while), alias it as 4.04.2-haversiner, and we’re good to go. Next, we need to run this:

$ opam pin add -yn haversiner .

We run this from inside the cloned directory. The command opam help pin will tell you that opam pin will “pin” a given package to a specific version or source. I’ve said that opam treats packages globally, and the pin command simply tells opam that whenever I refer to the haversiner package, I am talking about this (.) directory. It will read your opam file, which in this case is haversiner.opam.

The -yn flags take care of the two prompts of opam pin. -y answers yes when opam asks whether we want to create haversiner as a new package, and -n (--no-action) only records the pin, so opam does not offer to install the newly pinned package.

The next step is to install the dependencies of the project, and we use the following commands:

$ opam depext haversiner
$ opam install --deps-only haversiner

With these commands, opam reads the depends section of our haversiner.opam and installs what is listed there. The depext command installs the OS-specific system packages that our libraries might need (more on that here), and opam install --deps-only installs the OCaml dependencies without installing haversiner itself.

Lastly, we build the project!

$ make build
jbuilder build @install
ocamldep bin/main.depends.ocamldep-output
ocamldep lib/haversiner.depends.ocamldep-output
ocamldep lib/haversiner.dependsi.ocamldep-output
ocamlc lib/haversine.{cmi,cmti}
ocamlc lib/haversiner_server.{cmi,cmti}
ocamlc lib/coordinate_parser.{cmi,cmti}
ocamlc lib/haversine.{cmo,cmt}
ocamlc lib/coordinate_parser.{cmo,cmt}
ocamlc bin/main.{cmi,cmo,cmt}
ocamlc lib/haversiner_server.{cmo,cmt}
ocamlc lib/haversiner.cma
ocamlopt lib/haversine.{cmx,o}
ocamlopt lib/coordinate_parser.{cmx,o}
ocamlopt lib/haversiner_server.{cmx,o}
ocamlopt bin/main.{cmx,o}
ocamlopt lib/haversiner.{a,cmxa}
ocamlopt lib/haversiner.cmxs
ocamlopt bin/main.exe

As you can see in the last line, it produces a single executable called main.exe, located in the directory _build/default/bin/. Let’s try it!

$ _build/default/bin/main.exe
Running server on port 3000.

Ooh, nice! We get it running on port 3000. We can test it with curl: I’ve provided a sample payload at test/payload.json containing the coordinates of the Nashville (BNA) and Los Angeles (LAX) airports, which we can use like this:

$ curl -XPOST http://localhost:3000 -d @test/payload.json
{"distance":2886.444442837984}

The distance is roughly 2886.44 km, which is close to the results listed on Rosetta Code (about 2887.26 km there, computed with a slightly larger Earth radius of 6372.8 km)! We successfully built and ran our project.

Let’s run it with Docker!

The goal that we want to achieve is to run this program in the lightweight Alpine Linux image. Ubuntu-based images will likely end up over 500MB (often more than 1GB), which is a drag to push and pull. With Alpine, the base image is only 3.97MB! That’s like downloading an MP3 file of a song from the internet.

We’ve already built the binary earlier in this tutorial. Can we just put it into an Alpine image and run it?

Not so fast.

Unfortunately, I’m not using Alpine Linux as my OS; I’m using OS X. This is bad, because the binary was compiled with OS X as the target. I also don’t think OCaml’s current cross-compilation story is as easy as Go’s. We need to somehow compile it with Alpine as the target. So what do we do? Well, we build it with Docker!

Let’s write a Dockerfile for the build step:
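The original Dockerfile is not included in this text, so what follows is only a minimal sketch of what such a build image could look like, based on the steps described in this article. The working directory, the layer ordering, and the assumption that the image runs as an opam user with passwordless sudo are assumptions of this sketch, not details taken from the project. The ocaml/opam:alpine tag is the one named in the article and may no longer be published under that name.

```dockerfile
# Build image sketch (not the project's actual Dockerfile).
FROM ocaml/opam:alpine

# Assumption: the image's default user is "opam", with home /home/opam.
WORKDIR /home/opam/haversiner

# Copy only the opam file first, so the slow dependency layer stays cached
# as long as the dependencies do not change.
COPY haversiner.opam .
RUN opam pin add -yn haversiner . && \
    opam depext haversiner && \
    opam install --deps-only haversiner

# Copy the rest of the source. Copied files are owned by root, so hand them
# over to the opam user (assumes passwordless sudo is available in the image).
COPY . .
RUN sudo chown -R opam . && \
    opam config exec -- make build
```

Built with docker build -t haversiner ., a file like this would leave main.exe under /home/opam/haversiner/_build/default/bin/ inside the image.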

We use the ocaml/opam:alpine image, which is an Alpine image with OCaml and opam installed on top of it. The next instructions are similar to what we’ve done earlier in this tutorial: install dependencies and build.

The chown part and opam config exec are nitty-gritty details that I needed to add in order to work around permission errors and jbuilder somehow not being found.

If you run docker build -t haversiner ., it will build the image successfully, resulting in an image that contains our source and the resulting executable! Of course, we could naively use this image as our production image (by adding CMD _build/default/bin/main.exe), but docker images shows that the ocaml/opam image’s base size is already 935MB (!!). That’s a lot! Surely we can do better.

Enter Docker multi-stage build (finally!).

Here’s the updated Dockerfile utilizing the new, shiny feature of Docker:
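The original file is not included in this text, so the following is only a sketch of a two-stage Dockerfile. It assumes that the build image runs as an opam user with passwordless sudo and that the sources live under /home/opam/haversiner; neither detail is taken from the project. One deliberate simplification: instead of carrying the depext list over from the build stage, it takes the runtime packages through a build argument, RUNTIME_PACKAGES, which is a name invented for this sketch.

```dockerfile
# Stage 1: build (sketch, not the project's actual Dockerfile).
FROM ocaml/opam:alpine AS build

# Assumption: the default user is "opam" with home /home/opam and sudo rights.
WORKDIR /home/opam/haversiner
COPY haversiner.opam .
RUN opam pin add -yn haversiner . && \
    opam depext haversiner && \
    opam install --deps-only haversiner
COPY . .
RUN sudo chown -R opam . && \
    opam config exec -- make build

# Stage 2: deployment image. Only this stage ends up in the final image.
FROM alpine
WORKDIR /app

# Hypothetical build argument: space-separated Alpine packages that the
# binary needs at run time (the system libraries that depext installed
# in the build stage). Empty by default.
ARG RUNTIME_PACKAGES=""
RUN if [ -n "$RUNTIME_PACKAGES" ]; then apk add --no-cache $RUNTIME_PACKAGES; fi

# Copy just the executable out of the build stage.
COPY --from=build /home/opam/haversiner/_build/default/bin/main.exe /app/haversiner.exe

# The server listens on port 3000.
EXPOSE 3000
CMD ["/app/haversiner.exe"]
```

To try it, pass the package list with --build-arg RUNTIME_PACKAGES="..." on the docker build command line, filling in whichever system packages the binary links against.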

UPDATE 2017-09-22: this section’s Dockerfile has been updated. It now records the depexts in the build image and transfers the list to the deployment image, where they get installed. Without this, our deployment image might be missing OS libraries that are not statically linked into our binary. Kudos to Marek Kubica on Twitter for the solution, which I adapted.

The only difference between this and the previous one is the addition of several lines. As you may have noticed, we add another FROM instruction there, and that’s the main point of multi-stage builds! We can have multiple FROM instructions inside a single Dockerfile, each starting a stage with its own set of instructions, and only the final stage is saved as the resulting image.

Here, we drop all the baggage from the ocaml/opam image, build a new image based on alpine, and copy the build artifact from the previous stage to this new image. With a binary size of 4.4MB, the resulting image, on my computer, is only 10.5MB in size! That’s a huge cut from nearly 1GB.

UPDATE 2017-09-22: the above numbers were from before I installed the depexts. Afterwards, I get a 38.5MB image, which is honestly a bit disappointing compared to the previous result, but still a huge improvement over nearly 1GB!

So, how about we test it?

$ docker run -p 3000:3000 -d --rm --name haversiner haversiner
<some container ID output here>
$ curl -XPOST http://localhost:3000 -d @test/payload.json
{"distance":2886.444442837984}

Yes! Our Docker image works as expected and is ready to serve requests, with a dazzlingly small size!

You may also have noticed that we don’t use any opam switch commands in the Dockerfile. Each Docker image is self-contained, so we don’t need to think about managing packages for multiple projects. Profit!

Conclusion

In this article, we saw how to build a normal opam project. We also saw how to run our program inside Docker, and how Docker’s new multi-stage builds feature allows us to build lightweight production images without maintaining a second Dockerfile or glue scripts.

“But Bobby, why Docker?” The OCaml developers among you are surely very eager to point me to the amazing MirageOS project for unikernels and containerization. I assure you, it’s next on my learning list! There are still so many things to learn! :D

Thanks for reading, I hope you get something out of this post! Tell me in the comments if I can improve anything.