Flank: Smart Test Runner for Firebase

In nautical terms, flank speed is the maximum speed a ship can attain. That’s exactly what we wanted to achieve for our Android test automation, and that requirement resulted in the creation of Flank, the latest addition to our TestArmada tool suite.

This article describes our Android test automation requirements, why we selected Firebase and created Flank, how Flank is helping us, and how it can help any Android developer using Firebase.

Android Automation @ WalmartLabs

We support both instrumentation-based (Espresso) and non-instrumentation-based (Appium) test automation for our developers. Espresso and other instrumentation-based test frameworks are generally more feature-rich and faster than Appium. However, Appium has the appeal of test reuse: it lets you reuse your web tests for mobile web, or your native app tests across both iOS and Android. Teams targeting a combination of iOS and Android, or web and mobile web, often go with Appium, while native Android developers prefer Espresso for obvious reasons.

We use external device cloud vendors to run both our mobile-web and native app tests, and we are always on the lookout for new tools and services that make test automation more efficient at WalmartLabs. That’s how we learned about Firebase and became part of their beta program early last year.

Firebase Test Lab

There are three key characteristics on which we evaluate all test automation tools and services:

  1. Speed
  2. Reliability
  3. Actionable Insights

With Firebase Test Lab, however, the most attractive characteristic was something else entirely: pricing. When you use an external vendor, the build-vs-buy question is always at the back of your mind. At $1 per hour for virtual devices and $5 per hour for real devices, Firebase makes that question obsolete. Did I mention that they bill test execution time rounded up to the minute, and do not charge for device setup and tear-down time?

The cost savings were significant, so we went on to evaluate Firebase against our other criteria. It scored well across the board, but we identified one problem: Firebase does not support test sharding.

That was a big problem for us, as we have strict SLAs on PR job execution time. We like to distribute our tests, preferably one per device, to get the fastest possible execution time, whereas Firebase only allowed us to run tests sequentially on a single device.

This gap was the core reason we created Flank.


We started discussing our requirements for Flank with the Google Firebase team. They were extremely cooperative and shared a lot of ideas and tips on how to build an efficient runner that supports sharding.

Renas Reda from our team started working on it, and soon the first version of Flank was ready with these features:

  1. Test Sharding: Your Espresso tests are automatically distributed across multiple Firebase devices (up to one test per device), and Flank outputs a combined test report as if they had all run on the same device.
  2. Throttling Support: If you have lots of tests and are worried that Firebase may not have enough devices to run them all at once, you can specify a throttling limit, and Flank will ensure your test suite never uses more than that number of devices.
  3. Device Config Support: Pick one or more devices for your tests through a config file.
  4. Test Reporting: Flank outputs reports in the standard XUnit format, so you can integrate it with any CI system. Flank also includes the device name, API level, locale, and orientation for each test, so developers get the complete picture.
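The sharding and throttling features can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not Flank’s actual implementation: `run_shard`, `FakeDevicePool`, and the test names are hypothetical, and a thread pool simply models "at most N devices busy at once". Results come back in shard order, so flattening them gives one combined report.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def shard_one_per_device(tests):
    """Put each test in its own shard, i.e. one test per device."""
    return [[test] for test in tests]


def run_throttled(shards, run_shard, max_devices):
    """Run all shards while never occupying more than max_devices devices at once.

    `run_shard` is a hypothetical callable standing in for "run this shard on one
    Firebase device and return its results"; it is not a Flank or Firebase API.
    """
    with ThreadPoolExecutor(max_workers=max_devices) as pool:
        # map() returns results in shard order, which makes merging reports easy
        return list(pool.map(run_shard, shards))


class FakeDevicePool:
    """Hypothetical stand-in for devices: records the peak number of busy devices."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, shard):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # pretend the shard is running on a device
        with self._lock:
            self._active -= 1
        return [(test, "passed") for test in shard]


tests = [f"com.example.LoginTest#test{i}" for i in range(12)]  # hypothetical names
devices = FakeDevicePool()
results = run_throttled(shard_one_per_device(tests), devices, max_devices=3)
# Flatten per-shard results into one combined report, as if run on a single device
combined = [row for shard_result in results for row in shard_result]
print(len(combined), devices.peak <= 3)  # 12 True
```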

This worked great for us: our tests were running faster, execution cost was low, and everybody was happy. Then we started working on cost projections to get finance approval and noticed something very strange: Firebase was charging us 2–3X more than the expected price.

Smart Flank

We started debugging the price discrepancy. Our use case was a test suite with 312 test cases that took about 2 hours to run locally on a single device. On Firebase, we ran the same tests with 312 shards, and the whole suite took ~15 minutes overall. Since the local run took ~2 hours, we expected to be charged around $2 per execution (2 device-hours at $1 per hour), but we were being charged over $5.

It took us some time, but we found the culprit. Remember how I mentioned that Firebase rounds test execution time up to the minute? That meant every device was billed for at least a minute. Most of our tests took less than 30 seconds to execute, 10 seconds on average. With one test per shard, rounding up to the minute meant our 312 tests were billed as 312 minutes, regardless of their actual execution time. In our case, that came to 312/60 hours × $1 per hour = $5.20.
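To make the rounding effect concrete, here is a minimal cost model in Python. It assumes, as described above, that each shard’s execution time is billed rounded up to the next whole minute and that setup and tear-down are free; for simplicity it also assumes every test takes exactly the 10-second average. The function names are hypothetical.

```python
import math

# Rates quoted in this article; current Firebase Test Lab pricing may differ.
VIRTUAL_RATE_PER_HOUR = 1.00   # $ per device-hour, virtual devices
PHYSICAL_RATE_PER_HOUR = 5.00  # $ per device-hour, real devices


def billed_minutes(shard_seconds):
    """Round each shard's execution time up to the next whole minute and sum.

    Setup and tear-down time are not billed, so only execution time is passed in.
    """
    return sum(math.ceil(seconds / 60) for seconds in shard_seconds)


def billed_cost(shard_seconds, rate_per_hour=VIRTUAL_RATE_PER_HOUR):
    """Dollar cost of a run, given each shard's execution time in seconds."""
    return billed_minutes(shard_seconds) / 60 * rate_per_hour


# Simplifying assumption: all 312 single-test shards take the 10-second average.
one_test_per_shard = [10] * 312
print(billed_minutes(one_test_per_shard))         # 312
print(round(billed_cost(one_test_per_shard), 2))  # 5.2
# The same 52 minutes of test time, if it were billed without per-shard rounding:
print(round(sum(one_test_per_shard) / 3600 * VIRTUAL_RATE_PER_HOUR, 2))  # 0.87
```

Under these assumptions, per-shard rounding alone multiplies the bill roughly sixfold relative to raw test time, which is why packing more tests into each shard pays off.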

Once we learned that, we started working on a new Flank feature that takes into account the test durations from the last run and shards tests so that each shard runs for close to X minutes, where X is configurable and defaults to two minutes. Internally, we called this feature “Smart Flank”.
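The article does not spell out Smart Flank’s exact algorithm, so here is one plausible way to build shards that each run for "close to X minutes". It is a minimal sketch, assuming last-run durations are available as a dict of seconds per test and that tests with no history fall back to a hypothetical 10-second default. It uses the longest-processing-time-first greedy heuristic, a swapped-in technique that is not necessarily what Flank does.

```python
import heapq
import math


def smart_shards(tests, last_run_seconds, target_seconds=120, default_seconds=10.0):
    """Group tests into shards that each run for roughly target_seconds.

    last_run_seconds maps test name -> duration from the previous run. Tests with no
    history (e.g. newly added ones) fall back to default_seconds, a hypothetical guess.
    """
    if not tests:
        return []
    durations = {t: last_run_seconds.get(t, default_seconds) for t in tests}
    total = sum(durations.values())
    # Enough shards that the average shard stays at or under the target time.
    shard_count = max(1, math.ceil(total / target_seconds))
    shards = [[] for _ in range(shard_count)]
    # Min-heap of (accumulated seconds, shard index): always fill the lightest shard.
    heap = [(0.0, i) for i in range(shard_count)]
    heapq.heapify(heap)
    # Longest-processing-time-first: place the slowest tests before the fast ones.
    for test in sorted(tests, key=durations.get, reverse=True):
        load, i = heapq.heappop(heap)
        shards[i].append(test)
        heapq.heappush(heap, (load + durations[test], i))
    return shards


tests = [f"Test{i}" for i in range(312)]  # hypothetical test names
history = {t: 10.0 for t in tests}        # assumed: every test took 10 s last time
shards = smart_shards(tests, history)
print(len(shards), {len(s) for s in shards})  # 26 {12}
```

Because the shard count is derived from the average, an individual shard can still run slightly over `target_seconds` when durations vary; a production version might also leave headroom below each whole minute, since billing rounds up.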

Smart Flank was soon ready, and we were pleasantly surprised to find that it not only kept our cost down but also reduced test execution time. Because multiple tests now share the same device, we saved on setup and tear-down time, which showed up in our overall execution time. Our 15-minute test run now finished in around 7 minutes, and the cost was down to $2. In other words, this new feature let us run tests at roughly 2X speed while paying roughly a third of our previous cost. Pretty smart, yes?
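A quick check of the headline numbers, using only the figures above and assuming the $1 per hour virtual-device rate applied to both runs:

```latex
\begin{aligned}
\text{speedup} &= \frac{15\ \text{min}}{7\ \text{min}} \approx 2.1\times \\
\frac{\text{Smart Flank cost}}{\text{one-test-per-shard cost}} &= \frac{\$2.00}{\$5.20} \approx 0.38 \\
\text{billed time} &= \frac{\$2.00}{\$1\ \text{per device-hour}} = 2\ \text{device-hours} = 120\ \text{min} \quad (\text{vs. } 312\ \text{min})
\end{aligned}
```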

What’s next

We have open-sourced Flank and would love to evolve it further with community support and feedback. Flank should work with any JUnit-based Android test suite, and it’s probably the best way to run your Android test cases on Firebase with no changes required in your codebase.

Here is our GitHub repo: https://github.com/TestArmada/flank

Acknowledgements

  1. Renas Reda: For creating Flank
  2. Google Firebase Team (Louis Amira, Nalin Mittal, and Ahmed Mounir Gad): Thanks for all your support and ideas.