Playing with a trio of aces: Bazel, AWS and GitLab

Francisco Lorente
gft-engineering

--

Playing with Bazel in a project is quite a challenge, but adding AWS and GitLab to the mix makes it even more interesting. We had the opportunity to combine these exceptional technologies while building a project called Digital Bank Launcher at GFT. I’ll introduce the project context and share some of the things we did along the way as we iterated towards a comfortable solution.

Digital Bank Launcher (DBL) is what we call an “accelerator”: a starting set of pieces, or project scaffolding, together with a way of working, a methodology, that allows us to deliver a new bank, starting from scratch, in a very short time. DBL supports different core banking systems and provides the foundation for sustainable development speed and productivity over time. A very interesting aspect of this approach is that it is valid for other projects as well.

Long story short: on the frontend side we were building iOS and Android apps (native and hybrid, to be able to cope with different client requirements) and an Angular front end for web channels and for front, middle and back office applications. In the middleware, we used a set of Spring Boot microservices deployed on AWS Fargate, plus a set of DynamoDB tables and other satellite AWS components, all tied together by a GraphQL API deployed in AWS AppSync with its accompanying resolvers. On the backend side, we used an external core banking product. On top of this high-level architecture, many other systems are in place, such as seamless integration with AWS X-Ray or automated end-to-end testing with AWS Device Farm, just to name a few.

Being Bazel-ready was the first challenge. We configured the project so that when a developer ran the command

 > bazel build ...

all the technologies were compiled and unit tested. Next, we needed to integrate it with AWS.

Linking Bazel to AWS

One of the goals was to be able to use the “bazel run” command to deploy to AWS. First, we built a set of bash shell scripts to automate and completely test any AWS deployment. Each of these parameterized scripts was able to run the right “aws cloudformation deploy” command for its associated CloudFormation YAML template stack, e.g. to deploy GraphQL and all its resolvers to AppSync, to deploy the DynamoDB tables, or even to upload test data to them.
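To give an idea of their shape, here is a minimal sketch of one such parameterized script; the script name, stack name and template file are made up for illustration:

#!/usr/bin/env bash
# deploy-appsync.sh (illustrative): deploy one CloudFormation stack for a given app/stage pair
set -euo pipefail

APP_NAME="$1"   # e.g. "dbl"
STAGE="$2"      # e.g. "dev", or later, the name of an ephemeral environment

# Create or update the stack; CloudFormation works out the difference itself
aws cloudformation deploy \
  --template-file appsync-stack.yaml \
  --stack-name "${APP_NAME}-${STAGE}-appsync" \
  --parameter-overrides ApplicationName="${APP_NAME}" Stage="${STAGE}" \
  --capabilities CAPABILITY_NAMED_IAM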

We created specific executable custom rules in Bazel for deployment, whose output was a simple bash shell script that called the external bash script to do the real deployment in AWS.
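As a rough sketch of what such a rule can look like (the rule, attribute and file names here are hypothetical, not the exact ones we used), the implementation only needs to generate a wrapper script and declare the deployment script and templates as runfiles:

# deploy.bzl (illustrative): an executable rule whose output delegates to the external script
def _aws_deploy_impl(ctx):
    # Generate a small wrapper; "bazel run" executes it, and it forwards its
    # arguments to the real deployment bash script kept in runfiles.
    wrapper = ctx.actions.declare_file(ctx.label.name + ".sh")
    ctx.actions.write(
        output = wrapper,
        content = '#!/usr/bin/env bash\nexec "{deploy}" "$@"\n'.format(
            deploy = ctx.file.deploy_script.short_path,
        ),
        is_executable = True,
    )
    # The deployment script and the CloudFormation templates must be available at run time
    runfiles = ctx.runfiles(files = [ctx.file.deploy_script] + ctx.files.srcs)
    return [DefaultInfo(executable = wrapper, runfiles = runfiles)]

aws_deploy = rule(
    implementation = _aws_deploy_impl,
    executable = True,
    attrs = {
        "deploy_script": attr.label(allow_single_file = True, mandatory = True),
        "srcs": attr.label_list(allow_files = True),  # CloudFormation templates, resolvers, ...
    },
)

A BUILD target using a rule like this can then be invoked as, say, bazel run //deploy:appsync -- dbl dev; the pipeline snippets later in this post assume targets of that shape. Now that we were ready to play with “bazel run” commands, we needed to trigger everything with an automation tool.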

Building a GitLab pipeline

We already had the pieces, so we were able to set up a traditional pipeline in .gitlab-ci.yml, defining several stages and jobs, and it worked!
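Stripped down to its essence, that first pipeline looked something like this (job names and Bazel targets are illustrative):

# .gitlab-ci.yml (illustrative): the initial, stage-based pipeline
stages:
  - build
  - deploy

build_all:
  stage: build
  script:
    - bazel build //...
    - bazel test //...

deploy_aws:
  stage: deploy
  script:
    - bazel run //deploy:appsync -- dbl dev
    - bazel run //deploy:dynamodb -- dbl dev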

But we also wanted a more advanced developer experience that took advantage of the power and flexibility of the cloud. We wanted any new feature being built by a single developer, or a group of them, to be testable in a complete and isolated environment by the developers, by the testers, even by the Product Owner, in order to reduce friction with other features being built at the same time. That meant deciding when to run the pipeline.

We had to decide when to run these deployments, and we chose to do it on Merge Request actions (what GitLab calls “pipelines for merge requests”). When a Merge Request is created, the pipeline runs. It also runs on any other push related to that particular Merge Request.
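One way to express this in .gitlab-ci.yml is a workflow rule on the pipeline source (GitLab also accepts the older only: merge_requests form):

# Illustrative: run the pipeline only for merge request events
workflow:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'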

Creating feature/bugfix ephemeral environments

Many nocturnal blooming cacti have ephemeral flowers, lasting no longer than a day. Juan Carlos Fonseca Mata, CC BY-SA 4.0

In order to achieve such an isolated feature environment, we parameterized each AWS deployment bash script to deploy everything dynamically, naming resources by application name and stage. This is what we have called an “ephemeral environment”.
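One possible way to wire this up, reusing the hypothetical targets from the earlier sketches, is to derive the stage name from the merge request branch, so that every Merge Request gets its own set of stacks:

# Illustrative: each merge request deploys to its own ephemeral stage,
# producing stacks such as dbl-feature-login-appsync, dbl-feature-login-dynamodb, ...
deploy_ephemeral:
  stage: deploy
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - bazel run //deploy:appsync -- dbl "$CI_COMMIT_REF_SLUG"
    - bazel run //deploy:dynamodb -- dbl "$CI_COMMIT_REF_SLUG"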

From the developer’s point of view, these would be the usual actions:

  • Create a Merge Request to the master repo, from a branch in a fork, as the initial step of a new feature/bugfix development process. In the background, the pipeline runs and, through Bazel commands, creates a new full set of the application’s AWS resources that make up the complete ephemeral environment. Of course, this first pipeline run takes longer because everything is being built for the first time, but only on this occasion.
  • Commit and push as required. Each push fires the pipeline again, running Bazel to update that ephemeral environment in AWS, modifying only what has changed. It’s a way to get continuous delivery of each developer’s individual work.

Of course, these environments are deleted after the Merge Request is closed, as it doesn’t make sense to keep them beyond that point.

We’re talking about ephemeral branches/forks, but indeed if there is something stable by definition in the project, it’s the mainline/master branch. For this specific case the approach was not different, we needed to have such latest project content approved by the team (not yet released to production) visible/testable for everybody. We keep a stable environment that it’s updated after each Merge Request is merged.

Optimizing GitLab for Bazel

Although we were satisfied with how everything was running, we also realized that the Bazel experience on laptops was very different from what we were seeing in the pipelines, where build times were very high. It seemed very inefficient. It was even worse when we set up more than one GitLab runner!

Optimizing the GitLab runner cache didn’t gain us much, so we needed to understand a bit better how Bazel and GitLab jobs interact. Finally, after many attempts, we decided to try the same approach as on a laptop: Bazel builds your own branch, and getting a runner assigned is only a way of getting CPU for your specific build. Bazel’s optimizations then pay off when the same Merge Request pipeline runs again and again.

By default, in each job GitLab fetches your Git code into a specific directory to work with it. That directory path is composed dynamically from the GitLab runner’s internal id and instance number, and this works against the output location chosen by Bazel, which derives it from an MD5 hash of the workspace root path.

Therefore, we used an advanced option for GitLab jobs in order to define a specific working path for the current Merge Request.

GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_PROJECT_NAME/$CI_COMMIT_REF_SLUG/

(Using this option requires configuring, in your config.toml file and for each runner, the [runners.custom_build_dir] section with enabled = true, and also setting the limit = 1 option for each runner.)
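For reference, a minimal sketch of the relevant part of config.toml (runner name, URL, token and other settings omitted):

# config.toml (sketch): the settings relevant to this setup
[[runners]]
  executor = "shell"
  limit = 1                      # at most one concurrent job per runner
  [runners.custom_build_dir]
    enabled = true               # lets jobs override the build path via GIT_CLONE_PATH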

In our case we’re using GitLab runner with shell option on a dedicated Mac. Such different output base is what allowed Bazel to run multiple builds for the same client on the same machine concurrently.

One additional tip with this way of working: after the first job of the pipeline has run, it doesn’t make sense for the following jobs to do a new Git fetch. An optimization is to disable it in order to save a few more seconds.

GIT_STRATEGY: none

We have been talking about optimizing Bazel for each Merge Request pipeline run, but to optimize a bit more we also set up a shared disk cache across the machine, defined in the .bazelrc file.

build --disk_cache=~/.cache/bazel

We didn’t get as wonderful results as we expected, but it helped a bit.

In the end, we have something better: we are able to execute the “bazel build ...” command in the pipeline in a reasonable time, close to the local laptop case (not counting the first build, of course, just as with a first build on a laptop).

Optimizing AWS deployment

So far we had left the AWS deployment aside. The “bazel run” calls were set up in pipeline jobs after the “bazel build” jobs, and we were delegating the infrastructure update to AWS by always running the CloudFormation stacks. Why not? Who knows better than AWS whether a change is needed? Given a YAML template, a change in an AppSync VTL resolver could be detected by AWS, which would apply the right update.

This was one of the key elements that made us think about parallelization. If an infrastructure change doesn’t depend on anything else, why can’t it run in parallel with the “bazel build” of the software components?

Then we squeezed the GitLab options a bit more: we introduced the DAG (Directed Acyclic Graph) approach in order to set up the dependencies between GitLab jobs explicitly, beyond the logical “stage”.

It took a bit more time to define these dependencies, and they can be complex to define and maintain if you have many jobs. But the outcome is that each job runs as soon as possible, exactly when all its preconditions are met.
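In .gitlab-ci.yml, the DAG is expressed with the needs keyword; a sketch using the same illustrative jobs as before:

# Illustrative: with "needs", jobs no longer wait for entire stages
bazel_build:
  stage: build
  script:
    - bazel build //...

deploy_infrastructure:
  stage: deploy
  needs: []                  # no preconditions: starts immediately, in parallel with bazel_build
  script:
    - bazel run //deploy:dynamodb -- dbl "$CI_COMMIT_REF_SLUG"

deploy_services:
  stage: deploy
  needs: ["bazel_build"]     # starts as soon as bazel_build finishes, not when its whole stage does
  script:
    - bazel run //deploy:services -- dbl "$CI_COMMIT_REF_SLUG"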

In our case, we were able to parallelize the update of the AWS infrastructure. In these jobs we also had to set up a Git path different from the one used by the build jobs.

Other costly cases, like the AWS Device Farm testing jobs, were set up in the rules as manual tasks, so the team can trigger them on demand as needed.
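In .gitlab-ci.yml that is just a matter of marking the job as manual, for instance:

# Illustrative: expensive Device Farm runs are triggered on demand from the pipeline UI
device_farm_tests:
  stage: test
  when: manual
  script:
    - bazel run //test:device_farm -- dbl "$CI_COMMIT_REF_SLUG"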

The second question about AWS deployment was: if I know there are no changes in some specific AWS components, why am I asking AWS to verify those empty changes? That verification takes AWS some time to complete!

This is where Bazel comes in handy again. One of the strengths of Bazel is its dependency management: it will only build something if it’s absolutely required. So why not follow the same approach with deployments, and deploy only if the AWS source dependencies have changed? That was the challenge: taking advantage of our custom Bazel deployment rules so that a deployment only fires if a specific component (in this case, the custom rule’s source files) has changed.

We achieved this by modifying the output file of the rule so that it includes the current build timestamp as part of the generated code. When the generated bash script runs, it calls the AWS deployment bash script with that timestamp as a parameter, which helps it decide whether to deploy. Of course, this requires keeping the latest deployment timestamp up to date on the AWS side, which in this case we did using SSM parameters.
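A rough sketch of the check on the deployment-script side could look like this; the parameter naming scheme and variable names are made up, and the real “aws cloudformation deploy” call is elided:

#!/usr/bin/env bash
# Sketch: skip the CloudFormation call when nothing has changed since the last deployment.
# BUILD_TIMESTAMP is regenerated by the Bazel rule only when its source files change.
set -euo pipefail

COMPONENT="$1"
STAGE="$2"
BUILD_TIMESTAMP="$3"
PARAM_NAME="/dbl/${STAGE}/${COMPONENT}/last-deploy-timestamp"

LAST_DEPLOYED=$(aws ssm get-parameter --name "$PARAM_NAME" \
  --query 'Parameter.Value' --output text 2>/dev/null || echo "never")

if [ "$LAST_DEPLOYED" = "$BUILD_TIMESTAMP" ]; then
  echo "No changes detected for ${COMPONENT}, skipping deployment."
  exit 0
fi

# ... run the real "aws cloudformation deploy" for this component here ...

# Record the timestamp so the next pipeline run can skip this component if unchanged
aws ssm put-parameter --name "$PARAM_NAME" --value "$BUILD_TIMESTAMP" \
  --type String --overwrite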

This solution increased the complexity a bit, as it required additional tooling, but it helped accelerate the development and validation experience, and therefore reduced the cost of building any new feature or bugfix.

Summary

In this project we have been exploring many options, and we are still doing so right now. Step by step, we have enhanced the delivery workflow, improving the developer experience, and now we have a deeper knowledge of this trio of aces. But we are also aware that nothing is immutable: even at this point, many opportunities for enhancement lie in the road ahead.
