Mastering Gradle Caching and Incremental Builds

Fedor Korotkov
Published in CirrusLabs
Jan 17, 2018

Caching is a crucial part of any sophisticated system on almost every layer: from caching RAM pages inside a CPU to caching DNS records on routers that served the blog post you are reading.

Not surprisingly, modern build systems also use caching extensively to speed up builds. In this blog post we’ll take a closer look at what Gradle caches and how incremental builds work, in order to optimally configure a Gradle build for a continuous integration system (we’ll show configuration snippets for Cirrus CI, but they can easily be adapted to any CI).

Roughly speaking, Gradle does two kinds of caching: global caching for all projects and task caching on a per-project basis.

Global Caching

Global caching is probably the better-known kind. By default Gradle stores this cache in the user’s home directory. ~/.gradle/caches contains many folders; let’s see what Gradle stores in each of them:

  • ~/.gradle/caches/jars-3 and ~/.gradle/caches/$GRADLE_VERSION contain jars needed to launch Gradle itself and to start executing a build.
  • ~/.gradle/caches/transforms-1 contains transformed dependencies from all DependencyHandlers.
  • ~/.gradle/caches/modules-2 contains all the actual jar files and other metadata for all external dependencies of all projects that Gradle ever resolved on the current machine.
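
To see how much space each of these folders takes on a given machine, something like the sketch below can help. It is only an illustration: cache_usage is a hypothetical helper name, and the demo runs against a scratch directory with made-up file names so that it does not touch a real ~/.gradle/caches.

```shell
#!/usr/bin/env bash
set -eu

# cache_usage DIR: print the size in KB of each direct subfolder of DIR,
# largest first. (Hypothetical helper, not part of Gradle.)
cache_usage() {
  du -sk "$1"/*/ | sort -rn
}

# Demo on a scratch directory that mimics the layout described above.
demo="$(mktemp -d)"
mkdir -p "$demo/modules-2" "$demo/jars-3"
head -c 200000 /dev/zero > "$demo/modules-2/dep.jar"
head -c 1000 /dev/zero > "$demo/jars-3/small.jar"

usage="$(cache_usage "$demo")"
echo "$usage"
```

On a real machine the call would be cache_usage ~/.gradle/caches.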

Now that we know what Gradle stores in the global cache, we need to briefly discuss how caching works in most continuous integration systems. There are basically two options:

  1. A CI agent executes one build after another and persists state between builds. This can lead to corrupted state on the agent and to builds that are hard to reproduce.
  2. A CI build is executed in a disposable container or a disposable virtual machine. This is a known best practice that more and more people use. In that case continuous integration systems provide a mechanism to persist particular folders between builds. Usually this means that the CI system archives such folders and uploads them to a remote store.

For case #1 there is no need to configure anything. The global cache is simply always there. But it is there for every single build of every single branch of every single repository that a particular agent executes builds for, so it will grow enormously!

For case #2 we simply need to cache the ~/.gradle/caches folder. But there is a catch! On every build Gradle modifies some files in the ~/.gradle/caches folder. That makes a CI system think the cache has changed during the build and needs to be re-uploaded. To keep the ~/.gradle/caches folder stable between builds, we need to:

  • Delete all *.lock files that Gradle workers create for synchronized access to the caches.
  • Delete ~/.gradle/caches/$GRADLE_VERSION, because it contains caches of file timestamps, which change on every CI build.
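
The effect of these two steps can be sketched on a scratch directory. This is a rough illustration, not how any particular CI system works internally: the fingerprint function is an assumption about how a change in the folder might be detected, the directory 4.4 stands in for $GRADLE_VERSION, and the file names are made up.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory standing in for ~/.gradle/caches; "4.4" plays the role
# of $GRADLE_VERSION. File names inside are made up for the demo.
caches="$(mktemp -d)"
mkdir -p "$caches/modules-2"
echo "jar bytes" > "$caches/modules-2/dep.jar"

# simulate_build N: touch only the volatile files, as a build would.
simulate_build() {
  mkdir -p "$caches/4.4"
  echo "timestamps from build $1" > "$caches/4.4/file-timestamps.bin"
  echo "lock from build $1" > "$caches/modules-2/modules-2.lock"
}

# fingerprint: one checksum over the path and contents of every file.
# Assumption: the CI system detects changes in some comparable way.
fingerprint() {
  (cd "$caches" && find . -type f | LC_ALL=C sort | while read -r f; do
    printf '%s ' "$f"; cksum < "$f"
  done | cksum)
}

# cleanup: the two steps from the list above.
cleanup() {
  rm -rf "$caches/4.4"
  find "$caches" -name "*.lock" -type f -delete
}

simulate_build 1; dirty1="$(fingerprint)"; cleanup; clean1="$(fingerprint)"
simulate_build 2; dirty2="$(fingerprint)"; cleanup; clean2="$(fingerprint)"

if [ "$dirty1" != "$dirty2" ]; then echo "without cleanup: cache looks changed"; fi
if [ "$clean1" = "$clean2" ]; then echo "with cleanup: cache is unchanged"; fi
```

Without the cleanup, two builds that resolved no new dependencies still leave different folder contents behind. With it, the folder is identical and there is nothing to re-upload.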

To summarize, here is an example of .cirrus.yml config file for Cirrus CI:

container:
  image: gradle:4.4-jdk8

check_task:
  gradle_cache:
    folder: ~/.gradle/caches
  check_script: gradle check
  cleanup_before_cache_script:
    - rm -rf ~/.gradle/caches/$GRADLE_VERSION/
    - find ~/.gradle/caches/ -name "*.lock" -type f -delete

Task Caching

Historically, plugin authors invented optimizations for their tasks on their own, and there was no standardized way to do it until Gradle introduced incremental build support, also known as UP-TO-DATE checks. The idea of UP-TO-DATE checks is pretty simple: if a task (1) has a known set of inputs and outputs and (2) does not have side effects, then the inputs deterministically define the outputs, and task execution can be skipped if the inputs haven’t changed since the previous build. It’s just like a pure function!
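
The idea can be sketched in a few lines of shell. This is a toy model and not Gradle's implementation: the task, its file names, and the use of cksum as a stand-in for Gradle's much more elaborate input fingerprinting are all assumptions made for the example.

```shell
#!/usr/bin/env bash
set -eu

work="$(mktemp -d)"
echo "hello" > "$work/input.txt"

# run_task: a toy "task" with one declared input and one declared output.
# It remembers the input checksum from its previous run and skips the work
# when the checksum is unchanged and the output still exists.
run_task() {
  local in="$work/input.txt" out="$work/output.txt" state="$work/.fingerprint"
  local current
  current="$(cksum < "$in")"
  if [ -f "$out" ] && [ -f "$state" ] && [ "$(cat "$state")" = "$current" ]; then
    echo "UP-TO-DATE"
    return 0
  fi
  tr '[:lower:]' '[:upper:]' < "$in" > "$out"   # the actual work
  printf '%s\n' "$current" > "$state"
  echo "EXECUTED"
}

first="$(run_task)"
second="$(run_task)"
echo "changed" > "$work/input.txt"
third="$(run_task)"
echo "$first $second $third"   # prints: EXECUTED UP-TO-DATE EXECUTED
```

Note that the sketch remembers only the most recent run, which is exactly the limitation discussed next.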

The introduction of the Build Cache in Gradle 3.5 took this idea to a completely different level. Before that, UP-TO-DATE checks worked only between consecutive local invocations of Gradle, which was a huge pain point for people using feature branches in Git: switching branches would most likely make every UP-TO-DATE check fail on the next build. The build cache, on the other hand, stores task outputs keyed by their inputs, so results can be reused not only between consecutive invocations but also after switching branches or on another machine.
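
A minimal sketch of the difference, again as a toy model: outputs are stored under a key derived from the input, so going back to an earlier input finds the earlier output. The cache layout and the cksum-based key are assumptions for illustration, and Gradle's real cache key is computed from more than file contents.

```shell
#!/usr/bin/env bash
set -eu

work="$(mktemp -d)"
cache="$work/build-cache"
mkdir -p "$cache"

# build: produce output.txt from input.txt, reusing a stored output when
# one exists for the same input fingerprint.
build() {
  local key
  key="$(cksum < "$work/input.txt" | tr ' ' '-')"
  if [ -f "$cache/$key" ]; then
    cp "$cache/$key" "$work/output.txt"
    echo "FROM-CACHE"
  else
    tr '[:lower:]' '[:upper:]' < "$work/input.txt" > "$work/output.txt"
    cp "$work/output.txt" "$cache/$key"
    echo "EXECUTED"
  fi
}

# Simulate switching from master to a feature branch and back again.
echo "master sources" > "$work/input.txt";  r1="$(build)"
echo "feature sources" > "$work/input.txt"; r2="$(build)"
echo "master sources" > "$work/input.txt";  r3="$(build)"
echo "$r1 $r2 $r3"   # prints: EXECUTED EXECUTED FROM-CACHE
```

With only an UP-TO-DATE check, the third build would have executed again, because the previous run used different inputs.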

To enable the build cache for your Gradle project, simply put org.gradle.caching=true in your gradle.properties file. By default Gradle stores the Build Cache locally in the ~/.gradle/caches/build-cache-1 folder, which means it will be automatically cached in CI as we discussed above. But it’s not that simple! The Build Cache will grow over time, and by default it can reach up to 5 GB. That’s quite a lot! In Gradle 4.6 the Build Cache will switch to a time-based eviction policy, which means everything that was unused in the last 7 days will be evicted from the cache. But the Build Cache will still contain a lot of artifacts that are not required for every build!
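
Turning the cache on is a one-line change to gradle.properties. If that change needs to be scripted, for example in a setup step, a sketch could look like this. The enable_build_cache helper is a hypothetical name, and the demo uses a scratch directory rather than a real project.

```shell
#!/usr/bin/env bash
set -eu

# enable_build_cache DIR: make sure DIR/gradle.properties turns the build
# cache on, without adding the line twice. (Hypothetical helper name.)
enable_build_cache() {
  local props="$1/gradle.properties"
  touch "$props"
  if ! grep -qxF 'org.gradle.caching=true' "$props"; then
    echo 'org.gradle.caching=true' >> "$props"
  fi
}

# Demo on a scratch project directory.
project="$(mktemp -d)"
enable_build_cache "$project"
enable_build_cache "$project"   # second call is a no-op
cat "$project/gradle.properties"
```

For a one-off run, the --build-cache command-line flag has the same effect. Either way, the local cache directory keeps growing as described above.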

Thankfully, there is an option to use a Remote Build Cache. Instead of storing everything locally Gradle will use a remote storage and download cached build artifacts only when they are needed.

Both Gradle Enterprise and Cirrus CI have a built-in HTTP Build Cache, and it’s very easy to set up. Here is an example of how to change the settings.gradle file to set up the HTTP Build Cache for Cirrus CI:

// Detect CI and the current branch from environment variables.
ext.isCiServer = System.getenv().containsKey("CI")
ext.isMasterBranch = System.getenv()["CIRRUS_BRANCH"] == "master"
ext.buildCacheHost = System.getenv().getOrDefault("CIRRUS_HTTP_CACHE_HOST", "localhost:12321")

buildCache {
    local {
        // Use the local cache only outside of CI.
        enabled = !isCiServer
    }
    remote(HttpBuildCache) {
        url = "http://" + buildCacheHost + "/"
        enabled = isCiServer
        // Only builds of the master branch populate the remote cache.
        push = isMasterBranch
    }
}

You can use the same configuration with Gradle Enterprise by simply changing buildCacheHost to point to your Gradle Enterprise instance.

Now let’s see how different caching configurations affect build performance.

Performance Testing

As an example, let’s see how caching affects build times for the Groovy project. We are going to measure the time to build the Groovy distribution, because in that case test execution time won’t mask the effect of caching. We are going to compare three cache configurations:

  1. No caching at all.
  2. Only global caching of ~/.gradle/caches folder.
  3. Global caching and enabled Remote Build Cache.

Here is a .cirrus.yml config with these three tasks that we used for testing. Note that we were using a container with 4 CPUs and 12 GB of memory.

Cirrus UI of a build to test performance

As we can see, global caching barely affected the build time, since the Groovy project doesn’t have many external dependencies. But the build with the Remote Build Cache enabled was six(!) times faster! The logs of the build_cache task show that the build cache is actually working:

189 actionable tasks: 89 executed, 100 from cache
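
A summary line like the one above can be turned into a hit rate with a small helper. The sketch below is only an illustration; cache_hit_rate is a hypothetical name, and it assumes the line has exactly the shape shown above.

```shell
#!/usr/bin/env bash
set -eu

# cache_hit_rate: read a Gradle summary line on stdin and print how many
# tasks came from the cache. (Hypothetical helper name.)
cache_hit_rate() {
  awk '{
    total = $1
    for (i = 2; i < NF; i++)
      if ($i == "from" && $(i + 1) == "cache") hits = $(i - 1)
    printf "%d of %d tasks from cache (%d%%)\n", hits, total, int(hits * 100 / total)
  }'
}

result="$(echo "189 actionable tasks: 89 executed, 100 from cache" | cache_hit_rate)"
echo "$result"   # prints: 100 of 189 tasks from cache (52%)
```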

Note that not every task is worth caching, for example Copy tasks or some archive tasks like Jar: it’s faster to perform the operation locally than to download the result from the build cache. This is why profiling with build scans is so important: it shows how long it took to resolve task outputs from the cache.
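
The trade-off is simple arithmetic: caching pays off only when fetching the output is faster than producing it. The sketch below uses a hypothetical worth_caching helper and made-up numbers purely for illustration. Real values would come from a build scan.

```shell
#!/usr/bin/env bash
set -eu

# worth_caching EXEC_MS SIZE_KB KB_PER_SEC LATENCY_MS
# Prints "cache" if fetching the output is estimated to be faster than
# re-running the task, otherwise "run". All inputs are rough estimates.
worth_caching() {
  local exec_ms="$1" size_kb="$2" kb_per_sec="$3" latency_ms="$4"
  local fetch_ms=$(( latency_ms + size_kb * 1000 / kb_per_sec ))
  if [ "$fetch_ms" -lt "$exec_ms" ]; then echo "cache"; else echo "run"; fi
}

# Made-up numbers: a Copy-like task that is quick to run but has a large
# output, and a compile-like task that is slow to run with a small output.
copy_like="$(worth_caching 50 20000 10000 30)"
compile_like="$(worth_caching 60000 5000 10000 30)"
echo "copy-like: $copy_like, compile-like: $compile_like"
# prints: copy-like: run, compile-like: cache
```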

Now let’s make sure that the build cache works for incremental builds as well. To do so, we are going to change the implementation of a class in one subproject. Since the change is compatible at the Application Binary Interface level, Gradle shouldn’t recompile anything but the changed subproject, thanks to the compilation avoidance feature added in Gradle 3.4.

Cirrus UI of an incremental build

And the logs of the build_cache task show exactly what we would expect:

189 actionable tasks: 91 executed, 98 from cache

Only two extra tasks were executed, and the build with the Remote Build Cache enabled was still six times faster! The corresponding build scan confirms that only tasks of the changed subproject were executed:

Build Scan for the incremental build

And the remaining 98 tasks came from the remote cache:

Build cache tab of a build scan

We hope after reading this blog post there are no questions left about Build Cache or Incremental Builds. If there are still any left, please don’t hesitate to ask!

Note: for Android projects we have a separate blog post showcasing Cirrus CI.

We highly encourage you to try out Cirrus CI and give us feedback! Cirrus CI is free for Open Source projects and very easy to set up!
