Automated end-to-end tests and how they fit into our testing culture
We move fast at Carousell. We update and release brand-new versions of our Android and iOS apps every week. That adds up to a whopping 50 versions each year, each targeting multiple marketplaces with different feature sets and country-specific customisations!
On top of that, there are nightly releases and the occasional hotfix, and we continuously deploy new features to our Web platform.
At Carousell, we take updates and feature roll-outs seriously. But with updates this frequent, it raises the question: how do we test all of this to ensure our users get the best possible experience?
Nightmares of release day
Let’s rewind to early last year and look at how we traditionally tested our releases. Friday, our release day, was always a dreaded day for our test engineers. We would come to work, install the previous night’s release candidate, and then verify its stability by manually executing our “sanity tests”.
These sanity tests are a small set of regression tests to ensure the core features of the app still work:
- Can users still sign up?
- Can they log in?
- Can they list and buy items?
- Does the chat work?
- And so on...
Back then, we had around 50 such test cases to verify. Verifying them manually was tedious, and we had to repeat it for every build.
If we found a bug, we worked with the development team to create a hotfix and then performed yet another round of sanity testing to rule out any side effects of the fix.
This “fun” process repeated until we had a stable release. Only then could we start the roll-out, usually late on Friday (sometimes very late) or even on Saturday morning. It could be quite draining, and sometimes I’d wonder whether I was testing the sanity of the app or my own.
Let’s automate it
Sanity testing, with its repetitive processes, was a prime candidate for automation. However, our automation engineers were so caught up in the manual testing process that this effort never gained traction.
Having them dispersed across multiple development teams and three development centres around the region didn’t help either. To break out of this sanity testing cycle, we decided to create a dedicated test engineering team to work full-time on automation.
First, we revamped our testing framework. Since we have to test the same (or similar) features on three platforms (Android, iOS and Web), we decided to build something that lets us specify the test steps and most of the page objects only once and reuse them across all platforms. Our framework is written in Java and uses Cucumber with Appium (for mobile testing) and Selenium (for web testing).
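Our framework's code isn't shown here, so the following is only a minimal, hypothetical sketch of the "write the steps once, run them on every platform" idea in plain Java: one logical page object whose step sequence is shared, plus a per-platform locator table. `UiDriver`, `Locator`, `LoginPage` and every locator value are made up for illustration; in a real framework `UiDriver` would wrap an Appium or Selenium driver rather than being a bare interface.

```java
import java.util.Map;

// Hypothetical sketch: shared test steps, per-platform element locators.
enum Platform { ANDROID, IOS, WEB }

// A lookup strategy plus value, e.g. "accessibility id" or "css selector".
final class Locator {
    final String strategy;
    final String value;

    Locator(String strategy, String value) {
        this.strategy = strategy;
        this.value = value;
    }

    @Override
    public String toString() {
        return strategy + "=" + value;
    }
}

// Hypothetical stand-in for the actions an Appium/Selenium wrapper would offer.
interface UiDriver {
    void type(Locator locator, String text);
    void tap(Locator locator);
}

final class LoginPage {
    // One entry per logical element; each maps to one locator per platform.
    private static final Map<String, Map<Platform, Locator>> LOCATORS = Map.of(
        "username", Map.of(
            Platform.ANDROID, new Locator("id", "username_input"),
            Platform.IOS, new Locator("accessibility id", "usernameInput"),
            Platform.WEB, new Locator("css selector", "input[name='username']")),
        "password", Map.of(
            Platform.ANDROID, new Locator("id", "password_input"),
            Platform.IOS, new Locator("accessibility id", "passwordInput"),
            Platform.WEB, new Locator("css selector", "input[name='password']")),
        "submit", Map.of(
            Platform.ANDROID, new Locator("id", "login_button"),
            Platform.IOS, new Locator("accessibility id", "loginButton"),
            Platform.WEB, new Locator("css selector", "button[type='submit']")));

    private final Platform platform;
    private final UiDriver driver;

    LoginPage(Platform platform, UiDriver driver) {
        this.platform = platform;
        this.driver = driver;
    }

    // Resolves a logical element name to the locator for the current platform.
    Locator locator(String element) {
        Map<Platform, Locator> byPlatform = LOCATORS.get(element);
        if (byPlatform == null || !byPlatform.containsKey(platform)) {
            throw new IllegalArgumentException("No locator for " + element + " on " + platform);
        }
        return byPlatform.get(platform);
    }

    // The step sequence is written once and runs unchanged on every platform.
    void logIn(String username, String password) {
        driver.type(locator("username"), username);
        driver.type(locator("password"), password);
        driver.tap(locator("submit"));
    }
}
```

Only the locator table differs per platform; a Cucumber step such as "When I log in" could call `logIn` without knowing which platform it is running on.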
Second, we began automating the existing sanity tests and started adding new ones. With this done, we went one step further and set ourselves a more ambitious goal.
What if, instead of running the tests once a day, we used them to verify every single pull request before its author could merge the change into our code base?
This check would give our developers faster feedback about the stability of their changes, and so the term “fast feedback tests” (FFT) was born.
We split our regression tests into three sets: Regression, Sanity and FFT, each a subset of the previous one. Regression includes all our automated end-to-end tests and runs once a night. Execution time is not critical here, so we can add more complex and specific test cases. Sanity tests are tailored to finish within an hour and are used to verify the stability of each release candidate or hotfix.
FFT includes only the most basic tests for the core functionality of our app. The goal is to receive a test result within 30 minutes of an engineer pushing a pull request to the Android or iOS code base. Since building the iOS app from source takes about 15 minutes (Android is faster), that leaves only about 15 minutes for the tests themselves.
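One way to guarantee that FFT stays inside Sanity and Sanity inside Regression is to record, for each scenario, only the smallest set it belongs to. In Cucumber this kind of selection is usually done with tags and tag expressions (a hypothetical `@fft` tag, for example); the sketch below models the same nesting in plain Java rather than through Cucumber's API, and `TestSet`, `Scenario` and `SuiteSelector` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Ordered from smallest to largest: every FFT scenario is also in Sanity,
// and every Sanity scenario is also in Regression.
enum TestSet { FFT, SANITY, REGRESSION }

final class Scenario {
    final String name;
    final TestSet smallestSet; // the smallest set this scenario belongs to

    Scenario(String name, TestSet smallestSet) {
        this.name = name;
        this.smallestSet = smallestSet;
    }
}

final class SuiteSelector {
    // A scenario runs in the requested set if its smallest set is no larger
    // (enum order), which keeps FFT within Sanity within Regression by construction.
    static List<String> select(List<Scenario> all, TestSet requested) {
        return all.stream()
            .filter(s -> s.smallestSet.compareTo(requested) <= 0)
            .map(s -> s.name)
            .collect(Collectors.toList());
    }
}
```

Storing a single "smallest set" per scenario, instead of three independent flags, makes it impossible for a test to end up in FFT but be missing from Sanity.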
Given these goals, we had to think about how to scale our testing infrastructure. We started with an in-house device farm of two Android devices and two iPhones and soon realised that this wouldn’t get us anywhere close to our target execution times.
Move it to the cloud
We started looking into cloud solutions that offer access to physical Android and iOS test devices. After extensive testing and some tweaking of our framework, we began running our Sanity and Regression tests on 20 (and counting) phones in the cloud.
For every run, we distribute the test cases across all available devices. UI tests are inherently slower than backend tests, but heavy parallelisation still lets us get results within the ambitious time limits we set ourselves. That said, we also improved our framework and test cases to reduce execution time further. It was tempting to speed things up simply by adding more and more devices, so we constantly had to remind ourselves not to fall into this costly trap.
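How test cases are assigned to devices isn't specified above. A common choice is the longest-processing-time-first (LPT) heuristic: sort tests by estimated duration, slowest first, and give each one to the device with the least work queued so far. This is a sketch under the assumption that per-test duration estimates exist (for example from previous runs); `TestCase`, `DeviceQueue` and `TestDistributor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TestCase {
    final String name;
    final int estimatedSeconds; // assumed estimate, e.g. from earlier runs

    TestCase(String name, int estimatedSeconds) {
        this.name = name;
        this.estimatedSeconds = estimatedSeconds;
    }
}

final class DeviceQueue {
    final String deviceId;
    final List<TestCase> tests = new ArrayList<>();
    int totalSeconds = 0;

    DeviceQueue(String deviceId) {
        this.deviceId = deviceId;
    }
}

final class TestDistributor {
    // LPT: assign the slowest remaining test to the currently least-loaded device.
    static List<DeviceQueue> distribute(List<TestCase> tests, List<String> deviceIds) {
        if (deviceIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one device");
        }
        List<DeviceQueue> queues = new ArrayList<>();
        // Min-heap on queued seconds; ties broken by device id for determinism.
        PriorityQueue<DeviceQueue> leastLoaded = new PriorityQueue<>(
            Comparator.comparingInt((DeviceQueue q) -> q.totalSeconds)
                .thenComparing(q -> q.deviceId));
        for (String id : deviceIds) {
            DeviceQueue q = new DeviceQueue(id);
            queues.add(q);
            leastLoaded.add(q);
        }
        List<TestCase> sorted = new ArrayList<>(tests);
        sorted.sort(Comparator.comparingInt((TestCase t) -> t.estimatedSeconds).reversed());
        for (TestCase t : sorted) {
            DeviceQueue q = leastLoaded.poll(); // remove before mutating its load
            q.tests.add(t);
            q.totalSeconds += t.estimatedSeconds;
            leastLoaded.add(q); // re-insert so the heap reflects the new load
        }
        return queues;
    }

    // Wall-clock time of the run: the busiest device finishes last.
    static int makespanSeconds(List<DeviceQueue> queues) {
        return queues.stream().mapToInt(q -> q.totalSeconds).max().orElse(0);
    }
}
```

With six tests of 300, 240, 200, 180, 120 and 60 seconds on two devices, this yields loads of 540 and 560 seconds against an ideal of 550. It also shows why more devices are not a cure-all: the run can never finish faster than its single slowest test, so shortening long tests pays off where adding devices no longer does.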
For Fast Feedback Tests (FFT), we needed even more flexibility. The number of pull requests fluctuates, peaking before code freeze (Thursday night) and dropping to almost nothing over the weekend. None of the solutions we looked into fully met our needs.
Almost all vendors charge for a fixed number of devices, which doesn’t let us scale up on days with more pull requests and therefore more test executions. Of course, we could buy a bigger package, but then many of the devices would sit unused during lull periods.
We took one of Carousell’s core values, “being relentlessly resourceful”, to heart and decided to give the old MacBooks stacking up in our offices a second life. Instead of running our Fast Feedback Tests on real devices, we opted for simulators and started building our own in-house iPhone simulator farm (Android is next) on those MacBooks.
Using Ansible, Appium, Selenium Grid and a few more tweaks to our existing test framework, we built a testing farm that lets us scale up and down as needed. (Stay tuned for an in-depth write-up about this project!)
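The farm itself is left for a later write-up, so this is only a hypothetical sketch of the scaling decision: how many simulators to keep running for the current queue of FFT jobs, bounded by what the MacBooks can host. Provisioning (Ansible) and Selenium Grid registration are not shown, and `SimulatorScaler` and its throughput parameter are assumed names and estimates, not part of our actual setup.

```java
// Hypothetical scaling policy for an elastic simulator farm.
final class SimulatorScaler {
    private final int testsPerSimulator; // assumed tests one simulator finishes per scaling window
    private final int minSimulators;     // a few kept warm so quiet-hour pull requests start immediately
    private final int maxSimulators;     // total capacity of the host machines

    SimulatorScaler(int testsPerSimulator, int minSimulators, int maxSimulators) {
        if (testsPerSimulator <= 0 || minSimulators < 0 || maxSimulators < minSimulators) {
            throw new IllegalArgumentException("invalid scaler bounds");
        }
        this.testsPerSimulator = testsPerSimulator;
        this.minSimulators = minSimulators;
        this.maxSimulators = maxSimulators;
    }

    // Number of simulators to run for the given number of queued tests.
    int desiredSimulators(int queuedTests) {
        if (queuedTests < 0) {
            throw new IllegalArgumentException("queuedTests must be >= 0");
        }
        // Ceiling division: enough simulators to clear the queue in one window.
        int needed = (queuedTests + testsPerSimulator - 1) / testsPerSimulator;
        // Clamp between the warm minimum and the hardware maximum.
        return Math.max(minSimulators, Math.min(maxSimulators, needed));
    }
}
```

On a Thursday-night peak the result saturates at the hardware limit, while over a quiet weekend it falls back to the warm minimum, which is exactly the elasticity a fixed-size device plan doesn't offer.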
Fridays are a lot more relaxing now. Once the nightly build is ready, it triggers our Sanity (and Regression) tests. In the morning, we just need to check the results. Most weeks we are good to release before lunchtime. Instead of repetitive manual testing, we can use the time to write new test cases or further improve our automation framework.
The best part: if we need a hotfix release, we can run another batch of our tests at the press of a button. Mobile automation has been around for some time, so our improvement might not sound too spectacular at first. However, after half a year of manual regression testing, our appreciation for automated UI tests has reached new heights.
Everyone’s a tester
However, the main change we’re driving at Carousell is a cultural one. Instead of relying on a QA or testing team, we strive to create an environment where everyone feels responsible for, and takes ownership of, the quality we deliver. The test engineer’s primary objective is to show everyone their role in our overall testing strategy, get them excited to be part of it, and enable them to contribute to the quality of our product.
One year ago, we started automating features in a dedicated test engineering team. Now that we have things rolling, we are getting everyone else involved again. Product managers use Gherkin to describe how a feature is supposed to behave, and developers refine these scenarios and write the automated tests for them. Instead of trying to keep up with automating the features of a growing number of teams, we are going to establish automated UI tests as part of each team’s own development process.
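Cucumber connects each Gherkin step to a Java step definition by matching the step text against an expression. The toy runner below is not Cucumber's API; it only illustrates that binding, to make the hand-off concrete: a product manager writes the scenario text, and a developer supplies the code behind each step. The scenario content, `ToyStepRunner` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of Gherkin step binding; not the Cucumber API.
final class ToyStepRunner {
    // A scenario as a product manager might write it (hypothetical content).
    static final String EXAMPLE_SCENARIO = String.join("\n",
        "Scenario: Buyer chats with a seller",
        "  Given I am logged in as \"alice\"",
        "  When I open the listing \"Vintage camera\"",
        "  Then I can start a chat with the seller");

    // Step patterns in definition order, each with the code a developer wrote for it.
    private final Map<Pattern, Consumer<List<String>>> steps = new LinkedHashMap<>();

    void define(String regex, Consumer<List<String>> action) {
        steps.put(Pattern.compile(regex), action);
    }

    // Runs every Given/When/Then/And/But line through the first matching definition.
    void run(String scenario) {
        for (String raw : scenario.split("\n")) {
            String line = raw.trim();
            String text = line.replaceFirst("^(Given|When|Then|And|But)\\s+", "");
            if (text.equals(line)) {
                continue; // not a step, e.g. the "Scenario:" line or a blank line
            }
            boolean matched = false;
            for (Map.Entry<Pattern, Consumer<List<String>>> entry : steps.entrySet()) {
                Matcher m = entry.getKey().matcher(text);
                if (m.matches()) {
                    // Capture groups become the step's arguments.
                    List<String> args = new ArrayList<>();
                    for (int i = 1; i <= m.groupCount(); i++) {
                        args.add(m.group(i));
                    }
                    entry.getValue().accept(args);
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalStateException("Undefined step: " + text);
            }
        }
    }
}
```

For example, `runner.define("I am logged in as \"([^\"]*)\"", args -> ...)` receives `alice` as its single argument when the scenario above runs. Because the scenario stays readable prose, product managers can review it, while developers own the step definitions behind it.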
We still have a long way to go, or as we like to say at Carousell, “we are always just 1% done”, but we’re already starting to reap the benefits of our testing strategy, and it’s only going to get better from here.