GitHub All-Stars #4: GoogleContainerTools/jib
Ok, that was a long break since the last episode. But there was a reason.
In the fourth article in the series, I decided to try a slightly different approach. My previous reviews covered projects from a sole developer. This time I decided to take a look at the work of a team from one of the FAMGA companies: Google’s Jib.
I did it for several reasons. It was a trending project, but its theme also caught my eye: it is a Java project (Java is my main work language, so it was fun to see how it’s used in a modern open source project) and the project’s domain is freakin’ interesting (building your own Docker images at a very low level). Last but not least, I was just plain curious about how Google structures its open source projects.
Having that in mind, let’s take a look at what Jib is doing.
If you wonder what Jib is doing, a perfect introduction can be found in the post from the original creators. In short, the goal of this library is Continuous Deployment of container images for Java applications, straight from build tools (Jib supports both Gradle and Maven). It achieves this by building container images in pure Java, without using the Docker daemon. Because it targets a single platform, it can split the image into much finer-grained layers and, through that, cache them more efficiently.
It supports Google Container Registry, as well as Amazon Elastic Container Registry and Docker Hub. It’s great to see that Google is targeting not only its own platform but is creating a reusable tool that can be used by a variety of programmers, no matter which cloud platform they are using, at least as long as they are not using Azure :)
Bit of Stats
Due to some feedback I received after the previous articles, I’ll provide additional details about the projects. As open source projects change swiftly, this will help pin down the specific “point in time” that is covered:
Commit hash: dea7a2ae244629a5f97ee1fec7be5df49bf07156
Number of "Stars": 3893
In comparison to the projects I described in previous editions, this one seems to be far more mature, which brings a far more interesting structure. It also allows us to learn how Google structures its open source projects.
Among the rather typical directories, there are two that caught my interest: kokoro and proposals. A quick visit to the latter shows that it’s a place where people can suggest changes to contributors using Pull Requests (clever concept). The former is more interesting: there is no explanation, and while the content of the files suggests that this is some kind of CI-connected space, nothing more can be found.
Fortunately, a quick web search helps us understand what it is.
In the Protobuf repository we find the answer:
The files in this directory serve as plumbing for running Protobuf tests under Kokoro, our internal CI.
Ok, so Kokoro is some kind of Google tooling. Maybe one day it will be open sourced (like Bazel).
In the repository, we can also find examples, very short and succinct, as well as docs, which should be self-explanatory. The rest of the directories cover what animals like us like the most: code. It is split between three directories: one for core logic (jib-core) and two for build tool plugins, for Maven (jib-maven-plugin) and Gradle (jib-gradle-plugin) respectively. As they are quite similar to each other, I decided to cover only the Gradle version, as Jib itself is Gradle-built.
Starting with jib-gradle-plugin, let’s take a look at build.gradle. It gives us a bit of insight into the internal and external tools Google is using. The list of plugins itself is worth mentioning. Apart from the standards, like java-gradle-plugin, com.gradle.plugin-publish, checkstyle or maven (which allows offline builds), there are some unusual choices.
- com.github.sherter.google-java-format: Gradle plugin for google-java-format, which automatically formats code during the build to comply with Google standards
- net.ltgt.apt: helper for annotation processing, automatically configuring both compilers and IDEs
- net.ltgt.errorprone: plugin for yet another static checker, Error Prone, integrating it into the build
- net.researchgate.release: plugin which augments Gradle with a release task similar to the one known from Maven, doing a lot of pre-release checks for you.
The next thing worth a look is the dependencies:
- google-http-java-client: not a well-known project. It’s interesting (but rather not surprising) that Google has its own HTTP library. I’ll definitely try it in some project in the future to evaluate it more deeply.
- apache-commons-compress: there is also a place for Apache Commons and its compression library. While doing the code review, I will try to find where it is used
- javassist: for bytecode manipulation. As this is a very interesting and powerful library, it will also be interesting to find out what it is used for
- guava: I don’t think any comment is needed here ;) It would be surprising NOT to find Guava in a Google project
- a few classics: jackson, slf4j, junit, mockito
- There is also a single annotation processor, com.uber.nullaway: a plugin for the previously mentioned Error Prone, the static analysis tool, that brings the type-based nullability checking known from Kotlin to the Plain Old Java World.
Gradle Plugin Code
After checking this (really) interesting build.gradle, it is time to see what can be found in the code itself. The structure of Gradle plugins is strictly defined, which definitely helps us explore its code (convention over configuration FTW!). We will start with the file JibPlugin.java.
The first thing that catches the eye is the fact that Gradle 4.6 is required to even run the plugin. It’s worth pointing out that Google is not toying with backward compatibility in projects like that.
JibExtension has a prominent place in the plugin. This is a simple POJO that holds the configuration set by the user in build.gradle, which is used to control the build process. As an example, we can set which cloud provider is used, or the configuration of a specific container image, such as exposed ports or format.
Those properties can be acquired by retrieving the ObjectFactory from the Project (both structures are part of the Gradle standard library). Interesting code generation is used here: invoking newInstance on ObjectFactory creates an instance of the passed class, filled with user-set properties. Everything happens in a rather straightforward way there. The whole class has one interesting method, handleDeprecatedParameters, but we will show it later, at the point where it is used.
JibPlugin initialises several Gradle tasks (buildImageTask, buildDockerTask, dockerContextTask, buildTarTask). All of them depend on the same task, the generation of classes, and all are very similar. That’s why we will be covering just the first, most generic one, buildImageTask, for the rest of the article. Let’s take a look at how it’s built.
Each user-defined task extends Gradle’s AbstractTask class (or a subclass of it, such as DefaultTask). The method containing the business logic needs to be marked with the @TaskAction annotation.
This is also the place where we find the previously mentioned jibExtension.handleDeprecatedParameters, which informs users about the deprecated parameters they are using. When I said that every task does the same, I meant it: they are literally executing the same methods. We can find a lot of copy-pasted code, and logging is a good example. I understand the reason: premature generalization is pure evil, and it’s worth waiting a bit longer to see how our patterns evolve before deciding if we need a given IAbstractController or can just write the implementation. Still, as there are four different subsequent checks like that for simple things, I hope some abstraction will emerge there in future versions.
In the next few lines there is a bit of orchestration of parameters, which is not worth covering.
The real meat starts when we get to BuildConfiguration, the point where all the previous orchestration leads. BuildConfiguration is our transport object between the Gradle plugin and jib-core. It uses the builder pattern and is the only thing that truly differs between plugin tasks.
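To make the builder-based transport object a bit more tangible, here is a minimal sketch of the pattern. The class ImageBuildSettings and all of its fields are hypothetical names I made up for illustration; the real BuildConfiguration in jib-core has a different and much larger set of properties.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified transport object. The class and field names are
// illustrative only and are not taken from jib-core.
final class ImageBuildSettings {
    private final String baseImage;
    private final String targetImage;
    private final List<Integer> exposedPorts;

    private ImageBuildSettings(Builder builder) {
        this.baseImage = builder.baseImage;
        this.targetImage = builder.targetImage;
        // Defensive copy so the built object stays immutable.
        this.exposedPorts =
            Collections.unmodifiableList(new ArrayList<>(builder.exposedPorts));
    }

    static Builder builder() {
        return new Builder();
    }

    String getBaseImage() { return baseImage; }
    String getTargetImage() { return targetImage; }
    List<Integer> getExposedPorts() { return exposedPorts; }

    static final class Builder {
        private String baseImage;
        private String targetImage;
        private final List<Integer> exposedPorts = new ArrayList<>();

        Builder setBaseImage(String baseImage) {
            this.baseImage = baseImage;
            return this;
        }

        Builder setTargetImage(String targetImage) {
            this.targetImage = targetImage;
            return this;
        }

        Builder addExposedPort(int port) {
            exposedPorts.add(port);
            return this;
        }

        ImageBuildSettings build() {
            // Validate once, at the boundary between plugin and core.
            if (baseImage == null || targetImage == null) {
                throw new IllegalStateException("base and target image are required");
            }
            return new ImageBuildSettings(this);
        }
    }

    public static void main(String[] args) {
        ImageBuildSettings settings = ImageBuildSettings.builder()
            .setBaseImage("openjdk:8-jre")
            .setTargetImage("gcr.io/my-project/my-app")
            .addExposedPort(8080)
            .build();
        System.out.println(settings.getBaseImage() + " -> " + settings.getTargetImage());
    }
}
```

The nice property of this shape is that each plugin task can fill the builder differently while the core only ever sees a finished, validated, immutable object.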
An additional piece of configuration that needs to be set is the cache configuration. Fortunately, it’s filled with some sensible defaults. There is also the option of reusing the cache between projects or restricting it to specific projects.
After all this configuration heavy lifting in the Gradle plugin, it is time to use a class from the jib-core module: BuildStepsRunner is our next stop. Real Business Logic, beware!
Now, finally, we got to the core.
BuildStepsRunner is yet another orchestrator (Java ❤), this time containing “templates” which are used to create specific images. For the rest of the text, we will be using the one from the forBuildImage method, which builds an image for a Docker registry, but without delegating to the Docker daemon. It will allow us to see the steps used to build a Docker image at a very low level (suffice it to say that we will be operating at the filesystem stream level). While a bit messy, it brings a lot of interesting information.
BuildSteps is the place where the real work happens. It builds a pipeline from small steps, which show what operations need to happen if we want to create a Docker-compatible image. We will go step by step to see what is executed in subsequent parts.
The first method is runRetrieveTargetRegistryCredentialsStep (when you see names like that, you know you are in the Java world), which runs RetrieveRegistryCredentialsStep. It’s a good moment to see how our BuildSteps are constructed.
Each step implements the AsyncStep and Callable interfaces.
AsyncStep has a single method, getFuture, which returns a ListenableFuture from the Guava package. This is Google’s approach to implementing something similar to CompletableFuture from the Java core API. It both predates CompletableFuture and allows this concept to be used with JDKs older than 8.
These futures are the glue of the asynchronous invocation which happens in particular BuildSteps.
The business logic of every step is in the call method from the Callable interface. The callable is submitted to a ListeningExecutorService, which is Guava’s extension of ExecutorService from java.util.concurrent. This way the previously mentioned ListenableFuture is created and returned to the StepRunner.
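The shape of such a step can be sketched with JDK classes only. Note the swap: I am using CompletableFuture and a plain ExecutorService here instead of Guava’s ListenableFuture and ListeningExecutorService, so that the example runs without any dependency. GreetingStep is a hypothetical step invented for this illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncStepSketch {
    // Simplified stand-in for the AsyncStep interface described in the text,
    // returning the JDK's CompletableFuture instead of Guava's ListenableFuture.
    interface AsyncStep<T> {
        CompletableFuture<T> getFuture();
    }

    // Hypothetical step: the business logic lives in call(), and the
    // constructor submits the step itself to the executor.
    static final class GreetingStep implements AsyncStep<String>, Callable<String> {
        private final String name;
        private final CompletableFuture<String> future;

        GreetingStep(ExecutorService executor, String name) {
            this.name = name;
            this.future = CompletableFuture.supplyAsync(() -> {
                try {
                    return call();
                } catch (Exception e) {
                    // supplyAsync cannot throw checked exceptions directly.
                    throw new CompletionException(e);
                }
            }, executor);
        }

        @Override
        public CompletableFuture<String> getFuture() {
            return future;
        }

        @Override
        public String call() {
            return "hello, " + name;
        }
    }

    // Runs a single step on its own executor and waits for the result.
    static String runGreeting(String name) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return new GreetingStep(executor, name).getFuture().join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runGreeting("jib"));
    }
}
```

The caller never invokes call directly; it only holds on to the future, which is what lets later steps depend on earlier ones without blocking.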
The logic in every call is wrapped with a Timer object that is used for performance measurement. It was meant to be removed before the first release (spoiler: it wasn’t :))
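One common way to wrap logic with a timer in Java is an AutoCloseable used in try-with-resources. The sketch below shows that idea only; the Timer class here is my own hypothetical version, not the one from jib-core, whose constructor and logging differ.

```java
final class TimerSketch {
    // Hypothetical timer: starts on construction, stops and reports on close().
    static final class Timer implements AutoCloseable {
        private final String label;
        private final long startNanos = System.nanoTime();
        private long elapsedNanos = -1;

        Timer(String label) {
            this.label = label;
        }

        @Override
        public void close() {
            elapsedNanos = System.nanoTime() - startNanos;
            System.out.println(label + " took " + (elapsedNanos / 1_000_000) + " ms");
        }

        long getElapsedNanos() {
            return elapsedNanos;
        }
    }

    // Times the given body and returns the elapsed nanoseconds.
    static long timeStep(String label, Runnable body) {
        Timer timer = new Timer(label);
        try (Timer ignored = timer) {
            body.run();
        }
        return timer.getElapsedNanos();
    }

    public static void main(String[] args) {
        timeStep("Retrieving registry credentials", () -> {
            // The step's business logic would go here.
        });
    }
}
```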
In RetrieveRegistryCredentialsStep, the first of many steps needed to generate a Docker image, registry credentials are retrieved in the form of an Authorization object. There is a complex algorithm with multiple different approaches to getting them.
First, Jib tries to use credential helpers, which are implementations of the Credential Helper Protocol. If a credential helper name was passed in BuildConfiguration (specifically, in the from.credHelper property in build.gradle), Jib tries to call the given helper and retrieve credentials that way. The whole logic covering that is stored in the com.google.cloud.tools.jib.registry.credentials.DockerCredentialHelper class.
Another approach is to use known credentials, passed via from.auth as a plain text username and password. These are converted to an Authorization object.
There is also a bit of magic added to the library in the form of sensible defaults. If neither from.auth nor from.credHelper is passed, Jib will try to use one of the known credential helpers, based on the registry to which we want to push the image (which itself is calculated earlier in the ImageReference class).
As a last resort, there is a fallback to DockerConfigCredentialRetriever, which tries to use the Docker config file (~/.docker/config.json) to find those creds.
If no credentials are found in any place, null is returned and the image registry is assumed to be public.
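The fallback chain described above boils down to “ask each source in order and take the first answer”. Here is a minimal sketch of that idea, together with the standard HTTP Basic encoding of a username and password. All names are hypothetical, and I use Optional where the text says the real code returns null.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.function.Supplier;

final class CredentialLookupSketch {
    // Asks every source in order and returns the first credential found.
    // An empty result means "assume the registry is public".
    @SafeVarargs
    static Optional<String> firstAvailable(Supplier<Optional<String>>... sources) {
        for (Supplier<Optional<String>> source : sources) {
            Optional<String> found = source.get();
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    // Standard HTTP Basic scheme: base64 of "username:password".
    static String basicAuthorization(String username, String password) {
        String pair = username + ":" + password;
        return "Basic "
            + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Optional<String> authorization = firstAvailable(
            () -> Optional.empty(),                                  // configured credential helper
            () -> Optional.of(basicAuthorization("user", "pass")),   // explicit username/password
            () -> Optional.empty(),                                  // helper inferred from registry
            () -> Optional.empty());                                 // Docker config file
        System.out.println(authorization.orElse("no credentials, assuming public registry"));
    }
}
```

Because every source is a Supplier, the later (and potentially slower) lookups are only executed when the earlier ones come back empty.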
In the next step, AuthenticatePushStep, the credentials retrieved in the previous step are used to create a RegistryAuthenticator. It needs to be initialised, and this process contains a few interesting steps.
Based on BuildConfiguration (registry, image), RegistryAuthenticator is created from RegistryClient (the API client for image registries, which we will use a lot further on).
To push anything to a Docker registry, we need to respond to the challenge the registry passes in an HTTP header. RegistryEndpointCaller encapsulates that logic and prepares a RegistryAuthenticator, ready to respond to the challenge.
In the next step, we exchange our previously created Authorization object for a new one, this time containing a Bearer token.
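In the Docker registry token flow, the challenge arrives in a WWW-Authenticate header that names a realm (the token endpoint) and a service. The sketch below parses such a header and builds the URL to ask for a token. It is my own simplified illustration, not Jib’s code: the class name and example hosts are made up, and a real client would also URL-encode the parameters and parse the JSON response.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BearerChallengeSketch {
    // Matches key="value" pairs such as realm="https://auth.example.com/token".
    private static final Pattern PARAMETER = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Parses the parameters of a "Bearer ..." WWW-Authenticate header value.
    static Map<String, String> parse(String header) {
        if (!header.startsWith("Bearer ")) {
            throw new IllegalArgumentException("Not a Bearer challenge: " + header);
        }
        Map<String, String> parameters = new LinkedHashMap<>();
        Matcher matcher = PARAMETER.matcher(header);
        while (matcher.find()) {
            parameters.put(matcher.group(1), matcher.group(2));
        }
        return parameters;
    }

    // Builds the token request URL for the given scope (not URL-encoded here).
    static String tokenUrl(String header, String scope) {
        Map<String, String> parameters = parse(header);
        return parameters.get("realm")
            + "?service=" + parameters.get("service")
            + "&scope=" + scope;
    }

    public static void main(String[] args) {
        String header =
            "Bearer realm=\"https://auth.example.com/token\",service=\"registry.example.com\"";
        System.out.println(tokenUrl(header, "repository:my/app:pull"));
    }
}
```

The request to that URL is where the credentials from the previous step are spent; the response carries the Bearer token used for all later registry calls.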
Now that we have authenticated ourselves, it is time to pull the original image layers from the repository. Even though Jib’s goal is to build the Java application layers on its own, there is still a backbone, like the OS or JDK binaries, which needs to be downloaded from the repository.
This step’s business logic is wrapped in an enormous try-catch. At first, it always tries to go without authentication, and in case of an exception, the orchestration of credentials is triggered, based on the previous tasks. That’s why we will go straight to the pullBaseImage method, which is used to create the Image object.
The first thing that needs to be downloaded is the manifest file. We are using RegistryClient for that (I promised it would be needed in many places). The manifest is a file that contains all the information about the image: both the base configuration and information about all the layers of the Docker image. There are two versions of the Docker manifest (to make everything easier, called 2.1 and 2.2), and Jib supports both of them. We will cover only the newer schema, V2.2.
With a V2.2 manifest, the first thing we need to download is the configuration blob. This blob can later be converted into a ContainerConfigurationTemplate.
It contains the base information about the image: when it was created, what OS is used, what the entry point is, and also the root filesystem for the container.
After we read our configuration, it is time to convert our manifest into real image layers. The method JsonToImageTranslator.toImage iterates over all the layers from the manifest and converts each to a ReferenceNoDiffIdLayer using a BlobDescriptor. At this moment, no image data has been downloaded yet.
When all the layers are initialised, the next concern is their content. The first check tests whether the number of layers equals the number of diffIds in the rootfs (Root File System) section of the configuration object. In the next step, those layers are paired with diffIds to create referenceLayers (blobDescriptor + diffId), which are passed to the image builder.
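The count check and the pairing are simple enough to sketch in a few lines. The types below are hypothetical simplifications (plain strings instead of BlobDescriptor and digest classes), so treat it as an illustration of the idea rather than the actual JsonToImageTranslator logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LayerPairingSketch {
    // Hypothetical reference layer: a compressed blob digest plus the
    // digest of the uncompressed content (the diff ID).
    static final class ReferenceLayer {
        final String blobDigest;
        final String diffId;

        ReferenceLayer(String blobDigest, String diffId) {
            this.blobDigest = blobDigest;
            this.diffId = diffId;
        }
    }

    // Pairs the i-th manifest layer with the i-th diff ID from the configuration.
    static List<ReferenceLayer> pair(List<String> blobDigests, List<String> diffIds) {
        if (blobDigests.size() != diffIds.size()) {
            throw new IllegalArgumentException(
                "Manifest lists " + blobDigests.size() + " layers but configuration lists "
                    + diffIds.size() + " diff IDs");
        }
        List<ReferenceLayer> layers = new ArrayList<>(blobDigests.size());
        for (int i = 0; i < blobDigests.size(); i++) {
            layers.add(new ReferenceLayer(blobDigests.get(i), diffIds.get(i)));
        }
        return layers;
    }

    public static void main(String[] args) {
        List<ReferenceLayer> layers = pair(
            Arrays.asList("sha256:aaa", "sha256:bbb"),
            Arrays.asList("sha256:111", "sha256:222"));
        for (ReferenceLayer layer : layers) {
            System.out.println(layer.blobDigest + " <-> " + layer.diffId);
        }
    }
}
```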
In the next step, ImageBuilder is decorated with the original configuration details, like the createDate of the image or the entry point. After all these steps, we have a scheme for creating an image. Please remember one thing: no data apart from the configuration has been downloaded yet, we just know what we need to ask our registry for.
Orchestration of all the data downloads happens in PullAndCacheBaseImageLayersStep, which gets them layer after layer.
Before describing the logic of the call method, I will point out one thing: ImmutableList.builderWithExpectedSize, another interesting method from Guava. It’s a more performant way of building an ImmutableList, but only if you pass the exact number of elements you will add during its construction.
As we can see, all that really happens in PullAndCacheBaseImageLayersStep is the aggregation of PullAndCacheBaseImageLayerStep instances, and this is our next stop.
The code that covers downloading layers is surprisingly straightforward, especially in comparison to the convoluted stuff connected to authentication. It basically does what you would expect it to do: it first checks CacheReader to see whether the given layer has already been downloaded, and if this is the first time Jib is touching it, a binary blob with its data is downloaded. The only unusual thing is the fact that the OutputStream used to receive the downloaded data is derived from CacheWriter.
Due to that, the downloaded layer will be written directly to the cache. The whole logic connected to downloading data is in BlobPuller, but don’t expect much magic there: the call to the endpoint
apiRouteBase + registryEndpointRequestProperties.getImageName() + "/blobs/" + blobDigest
of the Registry HTTP API is executed, and the response is passed to the mentioned BlobPuller.
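The “check the cache first, otherwise download straight into the cache” idea can be sketched with a directory of files keyed by digest. Everything here is a hypothetical simplification: BlobSource stands in for the HTTP download, and unlike this sketch a production cache would write to a temporary file first so that a failed download never leaves a half-written entry behind.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LayerCacheSketch {
    // Hypothetical stand-in for the registry download: writes a blob to a stream.
    interface BlobSource {
        void writeTo(OutputStream out) throws IOException;
    }

    private final Path cacheDirectory;
    private int downloads = 0;

    LayerCacheSketch(Path cacheDirectory) {
        this.cacheDirectory = cacheDirectory;
    }

    // Returns the cached file for the digest, downloading it only on a cache miss.
    Path getOrDownload(String digest, BlobSource source) {
        // ':' is not a valid file name character on every platform.
        Path cached = cacheDirectory.resolve(digest.replace(':', '_'));
        if (Files.exists(cached)) {
            return cached;
        }
        // The stream handed to the download writes directly into the cache.
        try (OutputStream out = Files.newOutputStream(cached)) {
            source.writeTo(out);
            downloads++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cached;
    }

    int getDownloads() {
        return downloads;
    }

    // Requests the same layer twice and reports how many downloads happened.
    static int demo() {
        try {
            LayerCacheSketch cache = new LayerCacheSketch(Files.createTempDirectory("layer-cache"));
            BlobSource source = out -> out.write("layer-bytes".getBytes(StandardCharsets.UTF_8));
            cache.getOrDownload("sha256:abc", source);
            cache.getOrDownload("sha256:abc", source);
            return cache.getDownloads();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("downloads: " + demo());
    }
}
```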
I’ll point it out once again: at this point no data has been downloaded yet. We have the algorithm, but the application is yet to get the data. This convoluted process is an inherent part of the asynchronicity of all the tasks.
In the next step, PushLayersStep, those aggregated pull layers are passed down once again. But this time, their Futures are (at last) executed (or rather will be executed when PushLayersStep.call is!), and for every result, the makePushBlob method is run.
It creates a PushBlobStep for each of the pulls.
It has one job: if any of the blobs is not yet pushed to the registry, it does that using BlobPusher, which starts the upload with a POST to the
apiRouteBase + registryEndpointRequestProperties.getImageName() + "/blobs/uploads/?mount"
registry endpoint (the mount query parameter takes a BlobDigest as an argument).
So at the moment, for every LayerPull we have an additional LayerPush step, ready to be executed.
After we did the work for the base image layers, it is time for the main reason why Jib was developed: the preparation of layers for our Java application. All the previous steps were orchestrated for this one.
BuildAndCacheApplicationLayerStep is probably the most important part of the library — it contains all the logic that covers checking if the specific layer is stale or not.
In BuildAndCacheApplicationLayerStep.makeList, BuildAndCacheApplicationLayerSteps are “manually” created for dependencies, resources, and classes (for some builds, “snapshot-dependencies” and “extra files” are also added). This method is a helper that creates as many of them as necessary.
In Jib terminology, these are called Application Layers and are prepared using BuildAndCacheApplicationLayerStep. The specific source directories for each category are retrieved from the Gradle build.
The first interesting thing happening there is a check whether there are any files (e.g. sources) that were modified after the last known layer was created, which is done using CacheReader. If there are no newer files, then the old layer can be reused, and we get our first real performance gain.
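A staleness check of this kind can be sketched by walking a source root and comparing modification times with the time the cached layer was created. This is my own minimal version with hypothetical names, not the CacheReader implementation. As a simplification it looks only at regular files, so a deleted file would go unnoticed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

final class LayerFreshnessSketch {
    // True if any regular file under sourceRoot was modified after the layer was created.
    static boolean isModifiedAfter(Path sourceRoot, FileTime layerCreated) {
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            return paths.filter(Files::isRegularFile).anyMatch(path -> {
                try {
                    return Files.getLastModifiedTime(path).compareTo(layerCreated) > 0;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary source root with one file whose modification time is
    // fileMillis, then checks it against a layer created at layerMillis.
    static boolean demo(long fileMillis, long layerMillis) {
        try {
            Path root = Files.createTempDirectory("classes");
            Path file = Files.createFile(root.resolve("App.class"));
            Files.setLastModifiedTime(file, FileTime.fromMillis(fileMillis));
            return isModifiedAfter(root, FileTime.fromMillis(layerMillis));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rebuild needed: " + demo(2_000_000L, 1_000_000L));
        System.out.println("rebuild needed: " + demo(1_000_000L, 2_000_000L));
    }
}
```

When the check returns false, the expensive part (building, compressing and hashing the layer) can be skipped entirely.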
If there are newer files, the layer needs to be created. Every source root retrieved from the Gradle configuration is accumulated, and all of them are written to the layer. The whole process is executed by CacheWriter, the class which covers a major part of the process.
writeLayer is the method where the layer is persisted. To create a new blob of data, Jib uses CountingDigestOutputStream, a subclass of DigestOutputStream. Compared to the standard class, CountingDigestOutputStream also counts bytes, which allows it to produce the BlobDescriptor needed to create the layer metadata.
Following that, compression is added and the data is written to the previously created UnwrittenLayer, then the BlobDescriptor is retrieved. After that, the temporarily created layer file is moved to the location where a blob with that specific digest should be placed according to the Docker specification. Finally, using both data and metadata, the layer is put in the cache and its metadata is calculated. The resulting cachedLayer is the artifact of the BuildAndCacheApplicationLayerStep step.
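To show how a digest and a byte count can be collected while compressed data streams through, here is a small sketch built on the JDK’s DigestOutputStream and GZIPOutputStream. CountingDigestStream and Descriptor are hypothetical names of mine that only mirror the idea; the in-memory sink stands in for the layer file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.GZIPOutputStream;

final class CountingDigestSketch {
    // DigestOutputStream that additionally counts the bytes passing through.
    static final class CountingDigestStream extends DigestOutputStream {
        private long byteCount = 0;

        CountingDigestStream(OutputStream out) {
            super(out, sha256());
        }

        @Override
        public void write(int b) throws IOException {
            super.write(b);
            byteCount++;
        }

        @Override
        public void write(byte[] data, int offset, int length) throws IOException {
            super.write(data, offset, length);
            byteCount += length;
        }

        long getByteCount() {
            return byteCount;
        }
    }

    // Hypothetical result: digest and size of the compressed blob, plus its bytes.
    static final class Descriptor {
        final String digest;
        final long size;
        final byte[] compressed;

        Descriptor(String digest, long size, byte[] compressed) {
            this.digest = digest;
            this.size = size;
            this.compressed = compressed;
        }
    }

    // Content is gzipped first; the digest and count are taken over the compressed bytes.
    static Descriptor compress(byte[] content) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CountingDigestStream counting = new CountingDigestStream(sink);
        try (GZIPOutputStream gzip = new GZIPOutputStream(counting)) {
            gzip.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String digest = "sha256:" + hex(counting.getMessageDigest().digest());
        return new Descriptor(digest, counting.getByteCount(), sink.toByteArray());
    }

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    private static String hex(byte[] bytes) {
        StringBuilder builder = new StringBuilder();
        for (byte b : bytes) {
            builder.append(String.format("%02x", b));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        Descriptor descriptor = compress("class files go here".getBytes(StandardCharsets.UTF_8));
        System.out.println(descriptor.digest + " (" + descriptor.size + " bytes)");
    }
}
```

Doing both in one pass means the layer never has to be read back just to learn its digest and size.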
Now that we have all the layers, the container image can be built.
In BuildImageStep, both the PullAndCacheBaseImageLayerStep and BuildAndCacheApplicationLayerStep lists are accumulated and their “Futures” are executed (we need to remember we have been working in the asynchronous world the whole time).
After that, the afterCachedLayersSteps (nomen omen) method is run. It finally constructs the image. Image.builder() takes all the layers and decorates them with additional configuration parameters, like entry points or ports, to finally have a scheme for the container image.
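The “collect all layer futures, then assemble once they are done” pattern can be sketched with CompletableFuture.allOf. Again, this is a JDK-only stand-in for what Jib does with Guava futures, and the layer names are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class ImageAssemblySketch {
    // Waits for every base and application layer, then returns them in image order.
    static CompletableFuture<List<String>> assemble(
            List<CompletableFuture<String>> baseLayers,
            List<CompletableFuture<String>> applicationLayers) {
        List<CompletableFuture<String>> all = new ArrayList<>(baseLayers);
        all.addAll(applicationLayers);
        return CompletableFuture
            .allOf(all.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> {
                List<String> layers = new ArrayList<>();
                for (CompletableFuture<String> layer : all) {
                    // Every future is already complete here, so join() does not block.
                    layers.add(layer.join());
                }
                return layers;
            });
    }

    // Simulates two base layers and two application layers produced asynchronously.
    static List<String> demo() {
        List<CompletableFuture<String>> base = Arrays.asList(
            CompletableFuture.supplyAsync(() -> "base-os"),
            CompletableFuture.supplyAsync(() -> "jdk"));
        List<CompletableFuture<String>> application = Arrays.asList(
            CompletableFuture.supplyAsync(() -> "dependencies"),
            CompletableFuture.supplyAsync(() -> "classes"));
        return assemble(base, application).join();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The layers may finish in any order, but the resulting list keeps base layers before application layers, which is the order that matters for the final image.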
Surprisingly little effort seems to go into this step, but we need to remember that BuildImageStep is finally the step that generates “physical” layers, rather than only preparing them to be downloaded, which is nicely reflected in the method signature.
Now it’s time to bring everything we did in the previous steps to fruition in PushContainerConfigurationStep. It executes buildImageStep (asynchronicity) and then afterBuildConfigurationFutureFuture (Java naming, once again!), converting the image into a JSON file (the same format we started our whole adventure with). A PushBlobStep is returned from this task.
PushApplicationLayersStep and FinalizingStep
Finally, we have PushApplicationLayersStep and FinalizingStep, whose responsibility is to execute the rest of the Futures accumulated by the whole flow. That basically ends the whole long process and allows us to be happy with our (finally) created image.
It was a very long and probably a bit tedious journey, but I’m very happy you were able to endure it with me. Myself, I learned (or in some cases, I was able to highlight for you) a lot of interesting stuff from this project:
- How Gradle plugins are structured
- What kinds of dependencies Google uses for their projects (which itself is really interesting stuff!)
- How Docker images are built: manifests, streams, layers. I’m far more proficient in Docker internals than I was a week ago
But probably the most important thing for me: if you asked whether I would be able to write such a tool myself after writing this blog post, my response would be an uncertain “maybe”. Compared to my previous state, when I wouldn’t have had any idea how to start, I treat it as enormous progress. That’s why it’s worth writing (and, I hope, also reading) articles like this.