Android CI with Kubernetes

So you want to scale your CI/CD beyond what’s possible with Docker? Or you’ve seen the word Kubernetes and you’re eager to apply it to your Android development process? Either way, you’ve come to the right place. In the next several minutes you’ll learn why you need something besides Docker and how to prepare your CI for a team of dozens of Android developers.

The approach described in this article has been tested by running ~6 million UI tests and ~160 million unit tests (so far) while staying sane managing the hardware and 100+ real Android devices. All of this is based on about a year of work helping a very large team of developers succeed in a test-driven environment. We (the mobile DevOps team at Agoda) would like to thank everyone who wrote code to help us or asked for help with a use case that was not achievable before.

All the source code is available on GitHub. Please star, share and spread the word.

This is a follow-up to Android CI with docker. We strongly suggest reading it first to understand the context of Docker containers and how they can be used in a CI/CD pipeline.

Gathering requirements

We still want everything to be as fast as possible, but fast should describe not only how quickly our code builds, but also how quickly we can scale any component.

We also want everything related to the infrastructure of our builds to be versioned. Ideally, we keep it in VCS and can restore a previous state if something goes wrong.

For any Android CI it’s critical that we cover both JUnit/Robolectric and instrumented tests.


After dockerizing your build pipeline you should already have a container to build, test and deploy your application. For simplicity, let’s assume that this is an all-in-one Android build container. After all, most of us use a single machine for development, so it stands to reason that the container is universal as well. For example, we’ve prepared docker-android as a baseline container.

Although some parts of this can be applied to cloud instances of Kubernetes, other parts require you to set up a bare-metal cluster, which is out of the scope of this article. We assume that you already have a working bare-metal Linux-based Kubernetes cluster. If not, check out the Typhoon project, which helps you create a minimal and free Kubernetes cluster.

We also assume that you’re at least somewhat familiar with Kubernetes terms (Pod, Deployment, StatefulSet, Service, etc.). If that is not the case, check out this awesome cheat sheet.

You should also know how to version deployments to the cluster. By the way, we use Helm.

CI/CD installation

We will deploy everything to the Kubernetes cluster. The basic CI/CD components are:

  • Server + Agent (Jenkins/TeamCity/etc.)
  • Docker repository (Nexus/Artifactory/etc.)

In our case the agent is supposed to be Android-aware, so we strongly advise creating a Dockerfile that includes the agent of your choice and all its dependencies: agent software, Java, Gradle, android-sdk, etc. You can also start with the baseline image we have created.

CI/CD process for infrastructure

Our app’s code goes through the usual CI/CD pipeline. We also have infrastructure as code. So you know what? We will use the same pipeline for building and deploying our infrastructure:

  1. A change to our infrastructure code is pushed (Helm package or Dockerfile changes)
  2. An agent with proper access to the cluster deploys our infrastructure as a Helm package

Unit testing and deploying

To scale our all-in-one image we will create a Kubernetes Deployment of this image. Depending on our needs we can create as many replicas as we want and scale dynamically by redeploying our Helm package.

The image has to be aware of the cgroups restrictions, so take a look at the container-limit script in the example container. It gathers the number of available CPUs and the available RAM, then assigns a proper number of Gradle worker threads and sets the heap size through -Xmx/-Xms. The JVM is often forked, for example if you use the maxParallelForks option for running JUnit tests, so you will end up with a lot of JVM instances that all have the same limits.
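As a rough illustration of the idea (not the actual container-limit script), the calculation could be sketched as follows. The function names, the 75% heap fraction and the division of the heap between forked JVMs are assumptions made for this example; the inputs are meant to come from the cgroup v1 files cpu.cfs_quota_us, cpu.cfs_period_us and memory.limit_in_bytes.

```python
import os


def gradle_limits(cpu_quota_us, cpu_period_us, memory_limit_bytes,
                  forked_jvms=1, heap_fraction=0.75):
    """Derive a Gradle worker count and a per-JVM heap size (MiB) from cgroup limits.

    forked_jvms and heap_fraction are illustrative knobs, not values from the article.
    """
    if cpu_quota_us <= 0 or cpu_period_us <= 0:
        # A non-positive quota means "no CPU limit": fall back to what the host reports
        workers = os.cpu_count() or 1
    else:
        workers = max(1, cpu_quota_us // cpu_period_us)

    # Leave some headroom for non-heap memory, then split the rest between the JVMs
    usable_mib = int(memory_limit_bytes * heap_fraction) // (1024 * 1024)
    heap_mib = max(256, usable_mib // max(1, forked_jvms))
    return workers, heap_mib


def gradle_args(workers, heap_mib):
    """Build Gradle command-line arguments from the computed limits."""
    return [
        f"--max-workers={workers}",
        f"-Dorg.gradle.jvmargs=-Xms{heap_mib}m -Xmx{heap_mib}m",
    ]


if __name__ == "__main__":
    # Example: a container limited to 4 CPUs and 8 GiB, with 4 forked test JVMs
    workers, heap = gradle_limits(400_000, 100_000, 8 * 1024 ** 3, forked_jvms=4)
    print(gradle_args(workers, heap))
```

Dividing the usable memory by the number of forked JVMs is one way to keep the sum of all heaps below the container limit; your own split will depend on how your build actually forks.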

Keep in mind that you should always monitor resource usage by collecting the CPU/RAM usage of your containers. Best case scenario: the usage doesn’t change between releases. Sometimes you might face issues related to upgrades in build tools, poorly written tests, etc. For example, here are our Grafana dashboards:

[Figures: Grafana panels for CPU usage and RAM usage]

Cluster for instrumented tests

This is where things get really interesting. As for the devices we have two options:

  • Android devices connected via USB
  • Android emulators

Both have pros and cons so what the hell, let’s provide both to the developers!

For that we will use a cool project called OpenSTF. It allows us to manage busy/free devices, connect to them over the network instead of the USB bus, and it also provides nice WebUI tools to interact with Android remotely if, for example, you want to debug something or change device settings.

Unfortunately deploying and monitoring it is actually a huge pain for us.

Real devices

The real provider nodes need to join the Kubernetes cluster. Choosing proper hardware for USB providers is hard, so we’ll refer you to the official recommendations. We will mark these nodes with a label and taint them so that nothing runs there apart from the device providers.

kubectl label nodes node-x.cluster.local
kubectl taint nodes node-x.cluster.local

Now that we have the hardware labeled we need to provide these devices, so we will run the STF providers using a DaemonSet with a proper nodeSelector.

To access the USB hardware, the provider pod needs to:

  • Map /dev/bus/usb as a volume
  • Map /sys/bus/usb and /sys/devices to allow the USB providers to self heal (more on that later)
  • Run container as privileged
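To make these requirements concrete, here is a minimal sketch that builds such a DaemonSet manifest as a plain Python dictionary. It is not our Helm chart: the resource name, the app label, the image, and the label and taint keys passed in are all hypothetical placeholders that you would replace with your own.

```python
import json


def provider_daemonset(image, node_label_key, node_label_value, taint_key):
    """Build a DaemonSet manifest for a USB provider pod (all names are placeholders)."""
    # Host paths the provider needs: the USB bus itself plus sysfs for self-healing
    host_paths = {
        "dev-bus-usb": "/dev/bus/usb",
        "sys-bus-usb": "/sys/bus/usb",
        "sys-devices": "/sys/devices",
    }
    labels = {"app": "stf-provider"}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "stf-provider"},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule only on the labeled provider nodes...
                    "nodeSelector": {node_label_key: node_label_value},
                    # ...and tolerate the taint that keeps everything else away
                    "tolerations": [{"key": taint_key, "operator": "Exists"}],
                    "containers": [{
                        "name": "provider",
                        "image": image,
                        "securityContext": {"privileged": True},
                        "volumeMounts": [
                            {"name": name, "mountPath": path}
                            for name, path in host_paths.items()
                        ],
                    }],
                    "volumes": [
                        {"name": name, "hostPath": {"path": path}}
                        for name, path in host_paths.items()
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = provider_daemonset("registry.example/stf-provider:latest",
                                  "example.com/usb-provider", "true",
                                  "example.com/usb-provider")
    print(json.dumps(manifest, indent=2))
```

The JSON output can be applied to a cluster like any other manifest, but in practice you would template the same structure in your Helm package.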


We cannot run Android inside the container with just any kernel, so our only viable option is to use another layer of virtualization. The emulator should also be able to connect to OpenSTF and be compatible with the StfService.apk that runs on all the devices connected to OpenSTF.

Android emulators use KVM on Linux, so we will need access to the host’s KVM. We will use privileged containers with access to /dev/kvm. The downside of this is, of course, security, but this is supposed to be your internal cluster, so we can make this tradeoff. This also restricts your choice of OS for the Kubernetes nodes to Linux. We don’t have experience running Windows containers, but we suspect that it’s also possible to run the Windows version of the emulator on a Windows host.

The emulators require the nodes to have a KVM device. To distinguish such nodes we will label them as “kvm”:

kubectl label nodes node-x.cluster.local

The emulator should be run with the ranchu engine (QEMU 2), which supports multiple CPUs amongst other things. OpenSTF also requires multi-touch, which is disabled by default in the emulator. We also want full control over what we spawn in terms of the device: API version (KitKat? Oreo?), screen size (phone? tablet?). Overall this means being able to control the command-line parameters and the config.ini file for the AVD. For example, if you want to specify a 7-inch tablet you’ll set

ANDROID_CONFIG=";hw.lcd.density=160;hw.lcd.height=600;hw.lcd.width=1024; WSVGA (Tablet);avd.ini.displayname=7 WSVGA (Tablet) API 23;"
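One plausible way to consume such a variable is sketched below. It assumes that ANDROID_CONFIG is a semicolon-separated list of key=value pairs that override or extend the AVD's config.ini; the function names are made up for this example, items without an equals sign are simply skipped, and the real container may handle the variable differently.

```python
def parse_overrides(android_config):
    """Split a ';'-separated string into a dict of config.ini overrides."""
    overrides = {}
    for item in android_config.split(";"):
        key, sep, value = item.partition("=")
        # Skip empty items and anything that is not a key=value pair
        if sep and key.strip():
            overrides[key.strip()] = value.strip()
    return overrides


def apply_overrides(config_ini_text, overrides):
    """Return config.ini text with existing keys replaced and new keys appended."""
    remaining = dict(overrides)
    lines = []
    for line in config_ini_text.splitlines():
        key = line.partition("=")[0].strip()
        if key in remaining:
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)
    # Keys that were not present in the original file go to the end
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    overrides = parse_overrides(";hw.lcd.density=160;hw.lcd.height=600;hw.lcd.width=1024;")
    print(apply_overrides("hw.lcd.density=240\nhw.ramSize=2048\n", overrides))
```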

Screen resolution has a big impact on performance. There is also a choice of rendering engines for the emulator:

  • auto
  • host
  • angle_indirect (Windows only)
  • swiftshader_indirect
  • off

As we’re using Linux, we basically have a slow option (off), a stable one (swiftshader_indirect) and a fast one (host). SwiftShader is a CPU implementation of OpenGL ES, which is needed to render the Android screen output, so swiftshader_indirect is the best stable option for now. If you have GPU-equipped hosts in the Kubernetes cluster you can also try the host option, but this is out of the scope of this article. One problem we see with it is that rendering will be inconsistent between nodes with and without a GPU because of differences in the OpenGL ES implementation, so for the sake of consistent results we prefer the stable solution.

The Docker image of the emulator is mostly occupied by the system and userdata partitions (~3Gi and ~1Gi, depending on the API version). In reality they contain much less data, so we trade some startup time for size: during the docker build we compress these partitions with gzip, and during startup they are uncompressed. After all the optimizations the size of the resulting image is ~1.5Gi, which is sane compared to the initial 4Gi+.
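The build-time and startup-time halves of this trade-off can be sketched with Python's standard gzip module. This is only an illustration of the technique: the actual image may well use the gzip command-line tool, and the function names here are invented for the example.

```python
import gzip
import os
import shutil


def compress_image(path):
    """Build-time step: replace a partition image with its gzip-compressed copy."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks, images are large
    os.remove(path)
    return gz_path


def decompress_image(gz_path):
    """Startup-time step: restore the original partition image before boot."""
    path = gz_path[:-len(".gz")]
    with gzip.open(gz_path, "rb") as src, open(path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(gz_path)
    return path
```

Partition images that are mostly empty compress extremely well, which is why the saving in image size is worth the extra seconds at container startup.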

For debugging purposes we’ve added a virtual framebuffer with a VNC server (x11vnc) so that you’re able to connect to the virtual Android device.

By default the emulator will have 2 CPUs. The RAM usage varies with the API version, but you should expect numbers around 3Gi-10Gi depending on the usage.

[Figures: 480x800 emulator, API 26; 600x1024 emulator, API 25]

The emulator we’re using is the x86 version, not x86_64: after benchmarking with AnTuTu we found the x86 version to be noticeably faster.

You can find the source code for the emulator container here.


Now we need to connect the devices to the actual testing agents. Remember that all-in-one image? We need a small piece there: a client which can request devices from OpenSTF and connect them to the local adb server using the TCP socket provided by the OpenSTF provider component. For example, requesting 30 devices with API version 25 looks like this:

stf-client connect -f sdk:25 -n 30

When you have a lot of devices in your OpenSTF, it becomes hard to decide which devices should be connected. For that we’ve implemented filtering by all the keys and values available for a device. You can request all currently available keys with

stf-client keys

If you need all the values of a particular key, for example SDK versions, you execute

stf-client values sdk

If you want something more interesting, there is a notes field on every device. You can fill it in and then query devices by this note. For example, say we want to connect tablets with different API versions and different screen sizes. We label them with notes:tablet and then execute:

stf-client connect -f notes:tablet -n 10
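Conceptually, a key:value filter with a device count boils down to something like the sketch below. This is not the stf-client source: the device dictionaries, their field names and both function names are assumptions made purely to illustrate the matching logic.

```python
def parse_filter(expression):
    """Split a 'key:value' filter such as 'sdk:25' into its two parts."""
    key, sep, value = expression.partition(":")
    if not sep or not key:
        raise ValueError(f"expected key:value, got {expression!r}")
    return key, value


def select_devices(devices, filters, count):
    """Return up to `count` devices that match every key:value filter."""
    parsed = [parse_filter(expression) for expression in filters]
    matching = [
        device for device in devices
        # Compare as strings so that 'sdk:25' matches both 25 and "25"
        if all(str(device.get(key)) == value for key, value in parsed)
    ]
    return matching[:count]


if __name__ == "__main__":
    # Hypothetical device records, shaped loosely like a device listing
    devices = [
        {"serial": "emulator-1", "sdk": 25, "notes": "tablet"},
        {"serial": "emulator-2", "sdk": 26, "notes": "tablet"},
        {"serial": "usb-1", "sdk": 25, "notes": ""},
    ]
    print(select_devices(devices, ["notes:tablet"], 10))
```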

If some devices are lost, stf-client will automatically connect more, for example if a device is rebooted for maintenance or if the adb implementation on the device disconnects from the adb server.

Now we just run the instrumentation tests, and while this happens you can log in to the nice WebUI of OpenSTF and watch the tests execute.

Stability & more features: adb-butler

As you probably know, if something can break, it will break. Self-healing to the rescue! We’ve embedded several self-healing scripts into the image with the adb server:

  • check connected /sys/bus/usb/devices and rebind the driver if something is missing from adb devices -l
  • check devices in Offline or Unauthorized state and do adb reconnect
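The second check can be approximated by parsing the output of adb devices -l and collecting the serials whose state is offline or unauthorized. The sketch below shows only that parsing step, under the assumption that each device line starts with the serial followed by the state; the function name is ours, and the real scripts in the image may be implemented differently.

```python
def unhealthy_devices(adb_output):
    """Return (serial, state) pairs for devices that need a reconnect."""
    unhealthy = []
    for line in adb_output.splitlines():
        parts = line.split()
        # Skip the header, blank lines and adb daemon messages
        if len(parts) < 2 or line.startswith("List of devices"):
            continue
        serial, state = parts[0], parts[1]
        if state in ("offline", "unauthorized"):
            unhealthy.append((serial, state))
    return unhealthy


if __name__ == "__main__":
    sample = (
        "List of devices attached\n"
        "emulator-5554          device product:sdk_gphone_x86 model:Android_SDK transport_id:1\n"
        "0123456789ABCDEF       offline transport_id:2\n"
        "FEDCBA9876543210       unauthorized usb:1-1 transport_id:3\n"
    )
    print(unhealthy_devices(sample))
```

A healing loop would run this periodically and trigger adb reconnect whenever the returned list is not empty.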

We’ve also added the following to the adb-butler:

  • clean up of soon-to-be-unavailable emulators
  • ability to automatically add notes to stf devices
  • automatic installation of LinkedIn’s test-butler for emulators


adb-butler also collects metrics about the currently connected devices. They are exposed as a /custom-metrics/devices file in the InfluxDB line protocol. A built-in Telegraf sidecar container outputs these metrics in Prometheus format, and they are scraped automatically by adding true annotation on the provider pods.
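For readers unfamiliar with the InfluxDB line protocol, a single metric line has the shape measurement,tag=value field=value followed by an optional timestamp. The sketch below formats such a line with integer fields only; the measurement name, tags and field used in the example are hypothetical and not necessarily what adb-butler writes.

```python
TAG_SPECIAL_CHARS = ", ="          # must be escaped in tag keys, tag values and field keys
MEASUREMENT_SPECIAL_CHARS = ", "   # must be escaped in measurement names


def _escape(text, special_chars):
    """Backslash-escape the characters that have a meaning in line protocol."""
    for char in special_chars:
        text = text.replace(char, "\\" + char)
    return text


def device_metric_line(measurement, tags, fields, timestamp_ns=None):
    """Format one line-protocol line; field values are written as integers."""
    tag_part = "".join(
        "," + _escape(key, TAG_SPECIAL_CHARS) + "=" + _escape(str(value), TAG_SPECIAL_CHARS)
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        _escape(key, TAG_SPECIAL_CHARS) + "=" + str(int(value)) + "i"
        for key, value in fields.items()
    )
    line = _escape(measurement, MEASUREMENT_SPECIAL_CHARS) + tag_part + " " + field_part
    if timestamp_ns is not None:
        line += " " + str(timestamp_ns)
    return line


if __name__ == "__main__":
    print(device_metric_line("devices", {"node": "node-x", "type": "usb"}, {"connected": 12}))
```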

Most of the problems come from USB connections: emulators prove to be much more reliable in that regard, so the dashboards with the number of USB connected devices really make sense.

Test runner

Even though we do all of the above, outages still happen during the run and the only way to get around that is to make the test runner aware of these. For that we actually forked the fork (pun intended). It’s a work in progress, but it solves a lot of real-world problems we faced:

  • reconnecting to devices on the go (i.e. in the middle of the run)
  • rerunning the test on a different device if a failure happens and the device is out
  • visualizing the associations between the tests and the devices to identify potentially faulty devices
  • balancing the execution time of tests

That last one is tricky, so I’ll describe the problem first. All the runners we’ve seen assume that all tests take the same amount of time to execute. The runner is also not aware of the flakiness of a test, i.e. how many retries are expected. If you want all the devices to finish at roughly the same time, the test runner needs an algorithm that takes the retries and the time variance into account. Sorting the tests in descending order by the variance of the test time plus the expected value of the test time does the trick. In simple terms, you prefer to run unstable and longer tests first.
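The ordering itself fits in a few lines. The sketch below is an illustration rather than the runner's code: it assumes you have a history of past durations per test (retries show up as longer or more variable durations), and the function name and the sample data are made up.

```python
from statistics import mean, pvariance


def schedule_order(history):
    """Order tests so that long and unstable ones run first.

    history maps a test name to a list of past durations in seconds.
    The score is variance + expected value, as described in the text.
    """
    def score(name):
        durations = history[name]
        return pvariance(durations) + mean(durations)

    return sorted(history, key=score, reverse=True)


if __name__ == "__main__":
    history = {
        "fast_stable": [1.0, 1.0, 1.0],
        "slow_stable": [10.0, 10.0, 10.0],
        "flaky": [2.0, 12.0, 2.0, 12.0],
    }
    print(schedule_order(history))
```

Here the flaky test is scheduled before the slow but stable one because its large variance dominates the score, which means an unexpected retry happens early in the run instead of delaying the finish of a single device.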

Currently the stats for previous runs are collected only from TeamCity.

There is also the problem that, starting with Espresso 3, test executions are isolated from each other. This brings stability but trades off some of the performance. The pause between tests is exactly the time spent executing adb shell am instrument. We wanted to batch very fast tests together so that they don’t introduce this delay, while still having a clean environment between batches.
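One simple way to form such batches is to pack consecutive tests until a time budget is reached, so that each batch maps to one instrumentation invocation. The sketch below is an assumption about how this could look, not a description of the runner's implementation; the function name and the budget parameter are hypothetical.

```python
def batch_tests(expected_seconds, max_batch_seconds):
    """Group tests into batches whose expected total time stays within a budget.

    expected_seconds maps a test name to its expected duration. A test that is
    longer than the budget on its own ends up in a batch by itself.
    """
    batches = []
    current, current_time = [], 0.0
    for name, seconds in expected_seconds.items():
        # Close the current batch if adding this test would exceed the budget
        if current and current_time + seconds > max_batch_seconds:
            batches.append(current)
            current, current_time = [], 0.0
        current.append(name)
        current_time += seconds
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    expected = {"a": 0.2, "b": 0.3, "c": 5.0, "d": 0.1, "e": 0.1}
    print(batch_tests(expected, max_batch_seconds=1.0))
```

The budget controls the trade-off: a larger budget means fewer instrumentation startups, while a smaller one means a clean environment more often.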

Some statistics about our UI tests: we have ~1.1k UI tests written with our nice Kotlin DSL on top of the Espresso framework. That is around 10 hours of execution on one device. The test time is around 35 minutes on 20 devices (including the build time) and 30 minutes on 30 devices.


Due to restrictions in the Android SDK license agreement we’re not able to share Docker images, but not to worry! We still provide all the Dockerfiles needed, so it’s all just a make away:

make PROXY=registry-url/ build tag push


Next we plan to improve the solution by providing better filtering logic in stf-client, better documentation for features in the fork runner and also other sources of metrics for previous test runs. And, of course, we will make it even faster!

Happy building!