Optimized Docker builds for Haskell Stack

Tim Spence
Published in Permutive
3 min read · Aug 2, 2019

Stack recently removed support for automatically building Docker images for Haskell apps on the grounds that “with the advent of Docker multistage builds, this functionality is no longer useful.”

However, I’ve yet to find a good resource on how to actually write a multistage Docker build for a Stack-based Haskell application. I posted an initial solution on r/Haskell but was rightly challenged: that solution recompiles every dependency on every build, which is expensive in a CI environment.

So I went back to the drawing board to come up with a Docker build for Haskell applications which would use multistage builds to optimize both final image size and build time.

Multistage Dockerfile
Script for multistage build

We’ve defined 3 Docker images here:

  • The first is responsible for fetching libgmp (which the binaries GHC compiles are dynamically linked against) and compiling dependencies. It should only need to be rebuilt if stack.yaml or package.yaml is modified. Fortunately, that is exactly the behaviour that Docker’s layer caching gives us.
  • The second builds our application, re-using Stack’s cache of compiled dependencies from the first stage.
  • Finally, the third image holds our compiled application. Note that it’s based on Ubuntu rather than fpco/stack-build and hence is much more lightweight. For one of our applications, the dependencies image was 10.3 GB whereas the final app image was just 142 MB!
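A minimal sketch of three stages along these lines is shown below. The image tag (lts-13.27), the /opt/build and /opt/app paths, the /root/.stack location and the executable name my-app are assumptions for illustration; adjust them to match your resolver and project.

```dockerfile
# Stage 1: dependencies. Every layer here depends only on stack.yaml and
# package.yaml, so Docker's layer cache rebuilds it only when they change.
# Assumption: the tag matches the resolver in your stack.yaml.
FROM fpco/stack-build:lts-13.27 AS dependencies
WORKDIR /opt/build

# Fetch the libgmp package so the final stage can install it.
RUN apt-get update \
 && apt-get download libgmp10 \
 && mv libgmp10_*.deb libgmp.deb

COPY stack.yaml package.yaml /opt/build/
RUN stack build --system-ghc --dependencies-only

# Stage 2: build the application on top of the compiled dependencies.
FROM fpco/stack-build:lts-13.27 AS build
# Assumption: the build runs as root, so Stack's cache lives in /root/.stack.
COPY --from=dependencies /root/.stack /root/.stack
COPY . /opt/build/
WORKDIR /opt/build
RUN stack build --system-ghc
RUN mv "$(stack path --system-ghc --local-install-root)/bin" /opt/build/bin

# Stage 3: a small runtime image containing only libgmp and the binaries.
FROM ubuntu:16.04 AS app
WORKDIR /opt/app
COPY --from=dependencies /opt/build/libgmp.deb /tmp/libgmp.deb
RUN dpkg -i /tmp/libgmp.deb && rm /tmp/libgmp.deb
COPY --from=build /opt/build/bin .
# Hypothetical executable name; use the one declared in package.yaml.
CMD ["/opt/app/my-app"]
```

Copying only the two manifests before the dependency build is what keeps the expensive layer cacheable: source changes arrive in the second stage and never touch it.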

This all works because of the build script, which uses Docker’s --cache-from feature to re-use layers from the dependencies stage. If you find that CI builds are still too slow (you are downloading a ~10GB image, after all), you may want to docker save your dependencies image to a persistent shared folder, provided the filesystem is faster than the network, although I couldn’t possibly condone such a hacky solution. 😜 Also, watch out for concurrent builds if you do this: you might need to rely on atomic renames or similar.
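As a rough sketch of such a build script, the functions below pull the previous dependencies image, build both targets with --cache-from, and (for the shared-folder variant) write the docker save archive under a temporary name before renaming it into place. The image name registry.example.com/my-app, the stage names dependencies and app, the DOCKER override variable and the dependencies.tar file name are all assumptions for illustration.

```shell
#!/bin/sh
set -eu

# Assumptions: override DOCKER to dry-run, IMAGE to point at your registry.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-registry.example.com/my-app}"

build_images() {
    deps="$IMAGE-dependencies:latest"
    # The very first build has nothing to pull, so tolerate a failure here.
    "$DOCKER" pull "$deps" || true
    # Rebuild the dependencies stage, seeding the cache from the pulled image.
    "$DOCKER" build --target dependencies --cache-from "$deps" -t "$deps" .
    # Build the final stage; its early layers match the dependencies image.
    "$DOCKER" build --target app --cache-from "$deps" -t "$IMAGE:latest" .
    "$DOCKER" push "$deps"
    "$DOCKER" push "$IMAGE:latest"
}

# Shared-folder variant: write to a temporary file in the same directory,
# then rename, so a concurrent reader never sees a half-written archive.
save_dependencies() {
    cache_dir="$1"
    tmp="$(mktemp "$cache_dir/dependencies.tar.XXXXXX")"
    "$DOCKER" save -o "$tmp" "$IMAGE-dependencies:latest"
    mv -f "$tmp" "$cache_dir/dependencies.tar"
}

load_dependencies() {
    cache_dir="$1"
    if [ -f "$cache_dir/dependencies.tar" ]; then
        "$DOCKER" load -i "$cache_dir/dependencies.tar"
    fi
}
```

A CI job would call build_images from the repository root. With the shared folder, it would call load_dependencies /path/to/cache before the build in place of the pull, and save_dependencies /path/to/cache afterwards. The rename is only atomic when the temporary file and the target are on the same filesystem, which is why the temporary file is created inside the cache directory.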

In fact, downloading the images was so slow that we switched to using the official haskell:8.6.5 image as a base instead of fpco/stack-build. However, this ships an older version of Stack and doesn’t seem to be updated as frequently, so I wouldn’t necessarily recommend it unless the build times are too painful for you. If you do switch, remember to also change the base image of your final stage to match the base image of haskell (debian:stretch instead of ubuntu:16.04), so that your app runs against the same version of glibc, etc. that it was compiled against.
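In Dockerfile terms, the switch only touches the FROM lines. A sketch, assuming the stages are named dependencies, build and app and that their remaining steps stay as they were:

```dockerfile
# Stages 1 and 2: the official image in place of fpco/stack-build.
FROM haskell:8.6.5 AS dependencies
# (dependency steps unchanged)

FROM haskell:8.6.5 AS build
# (build steps unchanged)

# Stage 3: match the distribution underneath haskell:8.6.5.
FROM debian:stretch AS app
# (runtime steps unchanged)
```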

Caveat Emptor

Note that this solution exhibits degenerate behaviour when a change that modifies stack.yaml or package.yaml is being built concurrently with at least one other change. The builds then disagree on the contents of those files, so each one invalidates the dependencies layer cached by the other and the compiled dependencies are rebuilt on every build. The best solution I’ve come up with, if you need to modify the Stack manifests, is to make a small commit directly to master which modifies only these files. That commit should then be a common ancestor of any work in progress, so all branches agree on the contents of {stack,package}.yaml.
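One way to enforce that convention is a small guard in CI. The sketch below reads a list of changed paths on stdin and succeeds only if the change either leaves the manifests alone or touches nothing but the manifests. The function name is hypothetical, and it assumes both manifests live at the repository root.

```shell
#!/bin/sh

# Succeeds (exit 0) if the changed paths on stdin contain no manifest,
# or contain only manifests; fails if manifests are mixed with other files.
manifest_change_is_isolated() {
    manifests=0
    others=0
    while IFS= read -r path; do
        case "$path" in
            stack.yaml|package.yaml) manifests=$((manifests + 1)) ;;
            '') ;;  # ignore blank lines
            *) others=$((others + 1)) ;;
        esac
    done
    [ "$manifests" -eq 0 ] || [ "$others" -eq 0 ]
}
```

It could be fed from Git, for example with git diff --name-only HEAD~1 HEAD | manifest_change_is_isolated, failing the build when a commit bundles a manifest change with other work.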
