Quick tip: pre-fetched ivy2/maven dependencies in Docker base images

Given I generally develop on the JVM (Scala or Java), and given that I happen to be a big fan of containerisation in general, one of the biggest pain points I’ve come across with this combination is slooooooooooooooow docker image builds.

Local caching of ivy2 and/or maven dependencies is what stops all us JVM developers from throwing in the towel; who wants to Download the Internet with every sbt/mvn compile, for goodness’ sake?! The problem is that when you package JVM apps into docker images, you’re kind of breaking the rules of docker image building if your Dockerfiles rely on anything outside of the build context, and this means you’re ultimately facing a Download the Internet scenario with every build. It’s quite frankly intolerable.

So, I decided to experiment with something, but bear in mind it’s a work in progress; the workflow doesn’t feel quite “right” at this point. I would welcome thoughts about how to make this feel more seamless :-)

The crux of the pattern is to create a “base” image Dockerfile, either inside or outside of your project, that is built from a dummy project with a simple build file (pom.xml/build.sbt) referencing the same dependencies you use in your real project. As long as you actually run your build tool as part of building the base image (i.e. as a RUN step, not as a CMD), e.g.

RUN sbt compile

then the image will actually be pre-baked with the ivy2/maven caches populated in the usual locations: ~/.ivy2 and ~/.m2
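To make that concrete, here is a rough sketch of what the base image Dockerfile might look like for an sbt project. Treat it as an illustration rather than the definitive setup: SBT_IMAGE is a hypothetical build argument standing in for whatever image you use that has a JDK and sbt installed, /deps is an arbitrary working directory, and I’m assuming the dummy project consists of just a build.sbt plus a project/ directory.

```dockerfile
# Base image sketch: pre-populates the dependency cache.
# SBT_IMAGE is a placeholder (an assumption, not a real image name);
# pass any image that has a JDK and sbt on the PATH, e.g.
#   docker build --build-arg SBT_IMAGE=<your-sbt-image> ...
ARG SBT_IMAGE
FROM ${SBT_IMAGE}

# Arbitrary directory for the dummy project.
WORKDIR /deps

# Copy only the build definition of the dummy project (no application
# source), listing the same dependencies as the real project.
COPY build.sbt ./
COPY project/ ./project/

# RUN executes at image build time, so the resolved dependencies are
# stored in an image layer. A CMD would only run when a container
# starts, and nothing would be baked into the image.
RUN sbt compile
```

One caveat: sbt 1.3 and later resolve through Coursier by default, so the cache typically ends up under ~/.cache/coursier rather than ~/.ivy2. The pattern works the same way, since the cache still lands in the image.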

You can then use this image as the base image (the FROM line) for building the container image for your real project. When your build takes place, you’ll notice that you’re not Downloading the Internet any longer, and you’ll be a happy person. If you subsequently add dependencies to your project, you’ll have to remember to add them to your base image build too; otherwise they’ll be downloaded afresh on every build of the real project.
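The real project’s Dockerfile then only needs to start from that image. Again, this is just a sketch: myorg/jvm-deps-base is a made-up tag for the base image, and /app is an arbitrary directory.

```dockerfile
# Real project sketch: builds on top of the pre-baked base image.
# "myorg/jvm-deps-base" is a hypothetical tag; use whatever tag you
# gave the base image when you built it.
FROM myorg/jvm-deps-base:latest

# Arbitrary directory for the real project.
WORKDIR /app

# Copy the real project from the build context.
COPY . .

# Dependencies already present in the inherited cache are reused;
# only anything missing from the base image gets downloaded here.
RUN sbt compile
```

Because this image inherits the base image’s filesystem (and, unless you change it, the same user and home directory), sbt finds the cache in the same place it was written.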

As I said, I’m not quite sure yet how to turn this into a cleaner workflow — there’s something a little bit hacky about it in my mind. Maybe it would be nice to have some separate process that is regularly re-generating the base image based on your project’s dependencies, or at least something like that.

In the meantime, this is a solution I’m happy enough with because it’s a MASSIVE time saver.

And that’s good enough for me in the short-term :-)
