Building our In-house Virtual Device Lab “Caroufarm”
To meet the growing demand for mobile devices in automated testing, we at Carousell chose to build an in-house virtual device lab as a cost-effective alternative to real-device cloud vendors. This article walks through the different stages of the project and the challenges we faced along the way.
In the previous article on Automated end-to-end tests and how they fit into our testing culture at Carousell, Martin explained the need to run UI smoke tests for each pull request to give developers faster feedback. He also touched upon how we started to build an in-house simulator farm to execute those tests. Please read that article to learn more about why we built our own virtual device lab instead of using one of the existing cloud vendors.
New life for old MacBooks (“Caroufarm” v1)
Overview of the solution
Test framework: Our test framework is written in Java, builds on Selenium and Appium for automation, and uses Cucumber for writing tests in BDD format.
Simulators running on MacBooks: We use phased-out MacBooks, each of which hosts around eight simulators and Appium servers.
Selenium Grid: We use Selenium Grid to execute tests in parallel against multiple simulators. For each simulator, we maintain a node config.
- Downloading the app built from the commit in the Pull Request
- Resetting the entire Selenium Grid and simulators
Finally, the tests are executed against the Selenium Grid URL. Selenium Grid takes care of distributing the tests to different simulators.
This setup worked well until there was an increase in the number of PRs and the number of tests we wanted to execute.
Some of the issues we encountered with increasing load are:
- The Selenium Grid we relied on for test distribution was not distributing tests effectively. We observed that some simulators were free while tests were still queued, which increased the wait time.
- During a test run, we open several simulators in parallel, and during the reset their processes were not properly terminated. Tests started to accumulate and caused the simulators to hang. (Again, an increase in wait time.)
- We ended up running a job to terminate the processes and reset the whole simulator farm every ~4 PRs (instead of daily, as initially planned). (Again, an increase in wait time.)
- Adding/spawning additional simulators based on the current need was not convenient enough.
- We ended up spending significant time maintaining our infrastructure: Wi-Fi connectivity issues, software updates and hardware maintenance (remember, we used old MacBooks!)
Let’s build it for Android! (“Caroufarm” v2)
With these learnings, we set out to create an emulator farm for Android devices. We also added another ambitious goal of running our whole regression suite (~150 tests) on this infrastructure, which means it should be easy to scale it up based on demand.
Overview of the solution
Dockerize the environment
One of the root causes of many problems we faced was that the processes were not entirely isolated, which forced us to reset the whole infrastructure multiple times.
To overcome this, we first dockerized the whole test environment. Each Docker container holds an Appium server, an Android emulator, the test framework code and other configs. This containerisation made scaling easy, as we only need to increase the number of containers as needed. (The Docker image we use is inspired by Selenoid; however, we made quite a few changes to make it work for our requirements.)
Package the test suite & framework as an executable jar
As mentioned above, our framework is built on a Java-based tech stack, so we created a jar with all dependencies and deployed it to our Maven repository, which made it easy to distribute the test suite to the different containers.
Use queues to distribute the tests
Since Selenium Grid was not distributing the tests effectively, we decided to distribute them using a queuing service (AWS SQS). A simple queue observer running in each Docker container polls the queue for jobs.
When a test job is triggered, it sends the details of every test to the queue. The queue observer then picks up tests one by one, executing each after starting the emulator and the Appium server. Once a test completes, its result is sent to the result queue.
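The polling loop described above can be sketched roughly as follows. This is a minimal illustration, not our actual observer: in-memory `BlockingQueue`s stand in for the SQS job and result queues, and every name in it (`QueueObserverSketch`, `TestResult`, `drain`, the `testRunner` callback) is hypothetical. The sketch also stops once the queue stays empty so that it terminates, whereas a real observer would keep polling.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Sketch of a per-container queue observer; all names here are hypothetical. */
final class QueueObserverSketch {

    /** Outcome of one test, as it would be posted to the result queue. */
    static final class TestResult {
        final String testName;
        final boolean passed;

        TestResult(String testName, boolean passed) {
            this.testName = testName;
            this.passed = passed;
        }
    }

    /**
     * Polls the job queue until it stays empty for pollTimeoutMillis,
     * running one test at a time and publishing each result.
     * Returns the number of tests executed.
     */
    static int drain(BlockingQueue<String> jobQueue,
                     BlockingQueue<TestResult> resultQueue,
                     Predicate<String> testRunner,
                     long pollTimeoutMillis) {
        int executed = 0;
        while (true) {
            String testName;
            try {
                testName = jobQueue.poll(pollTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                break;
            }
            if (testName == null) {
                break; // nothing arrived within the timeout
            }
            // A real observer would start the emulator and the Appium server here.
            boolean passed;
            try {
                passed = testRunner.test(testName);
            } catch (RuntimeException e) {
                passed = false; // a crashing test is reported as failed, not lost
            }
            resultQueue.add(new TestResult(testName, passed));
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        BlockingQueue<String> jobs =
                new LinkedBlockingQueue<>(Arrays.asList("login", "search", "checkout"));
        BlockingQueue<TestResult> results = new LinkedBlockingQueue<>();
        // Hypothetical runner: pretend every test except "checkout" passes.
        int executed = drain(jobs, results, name -> !name.equals("checkout"), 100);
        System.out.println("Executed " + executed + " tests");
        for (TestResult r : results) {
            System.out.println(r.testName + " -> " + (r.passed ? "PASSED" : "FAILED"));
        }
    }
}
```

Because each observer pulls its next test only when it is free, no container sits idle while tests are still waiting, which is the behaviour we were missing with Selenium Grid.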
- The time taken for running the tests was reduced by ~⅓
- We can scale it and run the regression suite on the same infrastructure.
- Resetting of the whole infrastructure happens once a day
Moving on-premise iOS solution to the cloud
We are implementing the same infrastructure for iOS automation, with some tweaks to overcome the technical limitation that iOS simulators cannot be run within a Docker container.
Right now, we use both the simulator device farm and real devices (BrowserStack) for our automation runs. Whether a test runs on a physical or a virtual device depends on the Jenkins configuration. The next goal of this initiative is to abstract the routing logic: based on device availability and test requirements, Caroufarm will choose the best available device automatically.
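To make the routing idea concrete, here is one way such a selection rule could look. It is purely a sketch of a design that does not exist yet: the `Device` and `Kind` types, the `choose` method, and the rule of preferring virtual devices unless a test explicitly needs a real one are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Sketch of automatic device routing; all names and rules here are hypothetical. */
final class DeviceRouterSketch {

    /** Declared in order of preference: virtual devices are tried first. */
    enum Kind { VIRTUAL, REAL }

    static final class Device {
        final String name;
        final Kind kind;
        final boolean available;

        Device(String name, Kind kind, boolean available) {
            this.name = name;
            this.kind = kind;
            this.available = available;
        }
    }

    /**
     * Picks an available device that satisfies the test requirement.
     * Assumed rule: use a virtual device when the test allows it,
     * and fall back to a real device otherwise.
     */
    static Optional<Device> choose(List<Device> devices, boolean needsRealDevice) {
        return devices.stream()
                .filter(d -> d.available)
                .filter(d -> !needsRealDevice || d.kind == Kind.REAL)
                .min(Comparator.comparing((Device d) -> d.kind));
    }

    public static void main(String[] args) {
        List<Device> pool = Arrays.asList(
                new Device("real-device-1", Kind.REAL, true),
                new Device("emulator-1", Kind.VIRTUAL, false),
                new Device("emulator-2", Kind.VIRTUAL, true));
        System.out.println(choose(pool, false).map(d -> d.name).orElse("none"));
        System.out.println(choose(pool, true).map(d -> d.name).orElse("none"));
    }
}
```

With this rule, a test with no special requirement lands on a free emulator, while a test that needs real hardware is routed only to a real device or waits if none is free.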
Shoutout to the QA team (Martin, Long, Abhijeet, Syam, Ngan, Chia Hung and Eva) for making this possible.