A visual guide to gitlab-ci caching

Matthieu FRONTON
4 min read · Oct 25, 2019

If you’ve ever worked with gitlab-ci you may have needed, at some point, to use a cache to share content between jobs.

But the decentralized nature of gitlab-ci, the question of where and how the configuration must be set, and the overlapping concepts of cache and artifacts may have made you struggle.

They did for me.

So while going back and forth between the documentation, my hosts, my runners and my projects, I took notes on my understanding and ultimately ended up with a visual guide for my co-workers.

I thought some people outside of my company might find it useful.
Hope you’ll like it.

Before we start : cache vs artifacts

The two concepts may seem to overlap because both are about sharing content between jobs. But they differ fundamentally:

  • If your job does not rely on the output of the previous one (i.e. it can produce the content by itself, but runs faster if the content already exists), then use cache
  • If your job does rely on the output of the previous one (i.e. it cannot produce the content by itself), then use artifacts and dependencies

Just remember the following: “Cache is here to speed up your job, but it may not exist: don’t rely on it”. This article is about cache.
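
To make the difference concrete, here is a minimal sketch of a .gitlab-ci.yml that uses both. The scripts (install-deps.sh, build.sh, run-tests.sh) and the directories (vendor/, dist/) are hypothetical placeholders, not something from a real project: the build job would still work with an empty vendor/ (cache), while the test job cannot work without dist/ (artifacts).

```yaml
stages:
  - build
  - test

build:
  stage: build
  script:
    - ./install-deps.sh   # hypothetical: downloads dependencies into vendor/ if missing
    - ./build.sh          # hypothetical: writes the build output to dist/
  cache:
    # Only a speed-up: the job must still succeed when vendor/ is empty
    paths:
      - vendor/
  artifacts:
    # Required by later jobs: they cannot rebuild dist/ by themselves
    paths:
      - dist/

test:
  stage: test
  dependencies:
    - build               # fetch the artifacts produced by the build job
  script:
    - ./run-tests.sh dist/   # hypothetical test script
```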

Simple Model

We’ll use a simple representation of the gitlab-ci pipeline and ignore, for now, the fact that jobs can be executed on any runner and any host. It will help to get the basics.

Let’s say you have 1 project, 2 docker runners and 3 branches.

Local Cache : Docker Volume

If you want a local cache between all your jobs running on the same runner, use the cache statement in your .gitlab-ci.yml.
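
As a rough sketch, assuming a hypothetical install-deps.sh script that fills a vendor/ directory, a global cache statement could look like this. Without an explicit key, all jobs share the same cache on a given runner.

```yaml
# Global cache statement: applies to every job in the pipeline
cache:
  paths:
    - vendor/             # hypothetical directory to keep between jobs

job_a:
  script:
    - ./install-deps.sh   # hypothetical: fills vendor/ when it is empty

job_b:
  script:
    - ./install-deps.sh   # finds vendor/ already populated if it ran on the same runner
```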

Using the predefined variable “CI_COMMIT_REF_NAME” as the cache key, you can ensure the cache is tied to a specific branch.
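
A minimal sketch of a per-branch cache, with vendor/ as a hypothetical cached directory. If your branch names contain a “/”, the predefined CI_COMMIT_REF_SLUG variable may be the safer choice for the key.

```yaml
cache:
  # One cache per branch: jobs on "master" and on "develop" do not share content
  key: "$CI_COMMIT_REF_NAME"
  paths:
    - vendor/             # hypothetical directory
```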

Using the predefined variable “CI_JOB_NAME” as the cache key, you can ensure the cache is tied to a specific job.
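
Here is a sketch of a per-job cache, with hypothetical job names, scripts and paths. The second job combines both variables, which should give one cache per job and per branch.

```yaml
compile:
  script:
    - ./build.sh          # hypothetical script
  cache:
    # One cache per job, shared across branches
    key: "$CI_JOB_NAME"
    paths:
      - vendor/

lint:
  script:
    - ./lint.sh           # hypothetical script
  cache:
    # One cache per job and per branch
    key: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    paths:
      - vendor/
```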

Local Cache : Bind Mount

If you don’t want to use a volume for caching (for debugging, easier disk space cleanup, etc.), you can set a “bind mount” while registering the runner. With this setup you do not need the cache statement in your .gitlab-ci.yml.

In fact this setup even allows you to share a cache between jobs running on the same host without requiring you to set up a distributed cache (which we’ll talk about later…).

Distributed Cache

If you want a shared cache between all your jobs running on multiple runners and hosts, use the [runners.cache] section in your config.toml.

Using the predefined variable “CI_COMMIT_REF_NAME” as the cache key, you can ensure the cache is tied to a specific branch across multiple runners and hosts.

Real Life Model

The above is a simplified model where you know on which runner/host your jobs run. It allows you to understand the concepts, and you can sometimes even use it in real life (register local runners, use tags to select the ones you want and you’re set…).

In real life this may not be true (autoscaling, …), but this article is more a TL;DR than a reference guide. The above should be enough to help you understand the basics required to play with more advanced setups.
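
For instance, assuming you registered a local runner with a hypothetical tag named docker-host-1, pinning jobs to it could be sketched as follows. Both jobs land on a runner carrying that tag, so a local cache is enough for them to share content.

```yaml
build:
  tags:
    - docker-host-1       # hypothetical tag given to the runner at registration
  script:
    - ./build.sh          # hypothetical script
  cache:
    paths:
      - vendor/           # hypothetical directory

test:
  tags:
    - docker-host-1       # same tag, so the job is picked up by a runner with that tag
  script:
    - ./run-tests.sh      # hypothetical script
  cache:
    paths:
      - vendor/
```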

Just to give you a sneak peek, here is an exercise for you:

Set up a cache between all the jobs of a specific stage, running on any runner and any host, but only between pipelines of the same branch.

Have fun :)

Matthieu FRONTON

Director - Cybersecurity & Digital Architect @ frog part of Capgemini Invent. Formerly Head of DevOps Strategy @ La Poste. Full Time Digitaloholic