Setting up test data for browser tests ⚡

François Wouts
The Airtasker Tribe
Dec 13, 2018 · 7 min read

Automated testing is a key aspect of a solid software engineering culture. Whenever we make a change in our codebase, we want to be confident that we haven’t broken anything. Having a good suite of tests validate every commit gives us that confidence.

This post looks at the testing strategy for Airtasker’s web frontend, and in particular our “browser tests”. Also called UI tests, they spin up a real browser and simulate a real user using the website to ensure it behaves as expected.

Before I dive into the details, let me tell you a tiny bit about Airtasker. Airtasker is a platform where users can post tasks (such as “entertain my pet tortoise”), and others can post offers on tasks (e.g. “I’ll take your tortoise on a tour of the world’s biggest rollercoasters and we’ll become best friends”). Once the poster receives an offer they like, the tasker is assigned the task and off they go.

The old way

Our old implementation of browser tests had two separate sets of test cases:

  1. “Absolute state” generators, running every few hours.
  2. Conventional test cases, which ran as often as required.

To illustrate, one of our “absolute state” test cases was named TASK_ASSIGNED. It went through the following steps in our staging environment:

1. Open two separate browsers (one signed in as a poster, another signed in as a tasker).
2. In the Poster browser: click the "Post a task" button, set a random title, set a random description, go to the next screen, enter a location, pick a random date, set a budget, confirm and store the URL.
3. In the Tasker browser: open the URL and make an offer.
4. In the Poster browser: view the offers and accept the offer.
5. Record state TASK_ASSIGNED with the task details in DynamoDB.

“Absolute state” test cases were tests in their own right, but they also had the additional property of generating a specific state, which was reused by the other test cases.

Conventional test cases were based on one of the absolute states. For example, the test task_cancellation_by_poster looked up the TASK_ASSIGNED state from DynamoDB and randomly opened one of the pre-created tasks before running its more specific steps (i.e. clicking the “Cancel” button in the Poster browser).
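
To make that lookup concrete, it boiled down to something roughly like the sketch below. This is illustrative only: the table name, key schema and SDK usage are assumptions, not our actual code.

```js
// Illustrative sketch of the old state lookup (table name and schema are made up).
const AWS = require("aws-sdk");
const dynamo = new AWS.DynamoDB.DocumentClient();

async function pickTaskInState(state) {
  const result = await dynamo
    .query({
      TableName: "browser-test-absolute-states",
      KeyConditionExpression: "#state = :state",
      ExpressionAttributeNames: { "#state": "state" },
      ExpressionAttributeValues: { ":state": state },
    })
    .promise();
  const tasks = result.Items || [];
  // Pick a random pre-created task. Two tests running in parallel could pick
  // the same one — the source of the flakiness described below.
  return tasks[Math.floor(Math.random() * tasks.length)];
}

// Usage: const task = await pickTaskInState("TASK_ASSIGNED");
```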

Problems with the old way

There were a few serious flaws in our previous approach to setting up test data. Let’s walk through them, one at a time.

Test data bound to a specific environment

The cron job that generated absolute states ran in our staging environment. New tasks were stored in our staging database, which meant that all tests had to run against our staging environment too. It was therefore impossible for a developer to run tests against their local stack.

Because of this limitation, frontend engineers would have to push their work to GitHub, wait for CircleCI to build an image, then yell out “Taking stage!” on Slack hoping that nobody would mind. Finally, they could trigger tests to run against their newly deployed staging frontend.

This was a massive productivity killer. Not only was it a bottleneck, it also meant that we discovered issues too late in the development cycle. We needed a way to easily run tests against a local stack, or any deployed environment for that matter (except production, to avoid corrupting real data).

Lesson learned: allow tests to run in any environment.

Unreliable test data #1

If two tests expecting the same initial state ran in parallel, they would occasionally end up picking the same task from DynamoDB. One of the two tests (or both) would then fail with a cryptic error. If one test tries to confirm payment of a task while another tries to cancel the task, you’re going to have a bad time.

Lesson learned: make each test self-contained.

Unreliable test data #2

Because the absolute states were only regenerated every few hours, you could run out of test data if you ran too many tests. The same problem occurred whenever an automated process wiped the staging database, which happened every week or two.

Lesson learned: generate test data on demand.

Browser flakiness

The role of our absolute state generators was to set up state for browser tests, but they did so via the browser (two browsers, in fact: poster + tasker). They were building blocks for browser tests, yet they themselves relied on the browser.

This meant that our absolute state generators suffered from the same flakiness you’d expect from a browser test, and they regularly failed at random. That made it very hard to debug failing tests, as you couldn’t always trust that the initial state was what you intended.

Lesson learned: do not use the browser to set up test data.

The new way

We’ve established that the old way of setting up test data wasn’t great. As it turns out, there were quite a few other issues with our browser test suite. We decided to start fresh, but not without applying the lessons we’d learnt.

Let’s recap the lessons learned so far:

  • allow tests to run in any environment (prod excepted)
  • make each test self-contained
  • generate test data on demand
  • do not use the browser to set up test data

The solution was right there, staring at us: what if we used our internal REST API directly?

We decided to scrap the cron job and get rid of the concept of “absolute states” entirely. Instead, each test case would make any API calls required to set up the test data it needs.

Here’s an example copy-pasted from our new test suite:

## Successfully submit an offer
* ⚡️ Sign in with poster account
* ⚡️ Create single worker task and persist as “make an offer task”
* ⚡️ Sign in with tasker account
* 🐌 Navigate to make an offer task
* 🐌 Click the Make an Offer button
* 🐌 Click the continue button
* 🐌 Enter offer comment and click next
* 🐌 Verify the fee breakdown is displayed
* 🐌 Click submit offer
* 🐌 Verify offer was successfully placed

Yes, this is Markdown. We use Gauge to write our test scenarios. Under the hood, each of these bullet points is matched to an async function call in JavaScript.

And yep, these are ⚡️ (thunderbolt) and 🐌 (snail) emojis.

When you see a ⚡️, it means the step exclusively uses the API. We chose a thunderbolt because it’s much faster than using the browser: in the step Sign in with poster account, we simply make an API call to /users/create, which creates a new poster account and returns an authentication token that we can use in subsequent calls. That’s a lot quicker and a lot more reliable than clicking “Sign up”, entering user details, clicking “Submit” and waiting for the loading spinner to disappear.
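
Under the hood, an ⚡️ step is just one of those async JavaScript functions making an HTTP call. Here’s a minimal sketch, assuming the gauge-js global step function and node-fetch; the /users/create path comes from above, but the payload, response shape, environment variable and shared testData store are assumptions:

```js
const fetch = require("node-fetch");

// Simplified per-scenario store for data shared between steps (illustrative).
const testData = {};

step("⚡️ Sign in with poster account", async () => {
  const response = await fetch(`${process.env.API_BASE_URL}/users/create`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ role: "poster" }), // illustrative payload
  });
  const { authToken } = await response.json(); // illustrative response shape
  testData.posterAuthToken = authToken; // reused by subsequent ⚡️ steps
});
```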

The actual test starts when you see a snail. That designates a browser step: it will always be slower than directly calling the API, hence the 🐌. As far as we know, thunderbolts travel faster than snails. We’re happy to be proven otherwise though.
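
A 🐌 step, by contrast, drives the browser. A rough sketch, assuming a recent WebdriverIO’s global $ helper (the selector is made up, and in practice we call WebdriverIO through a thin abstraction layer, covered later):

```js
// Illustrative sketch of a 🐌 browser step.
step("🐌 Click the Make an Offer button", async () => {
  const button = await $("[data-test-id='make-an-offer']"); // selector is illustrative
  await button.waitForDisplayed();
  await button.click();
});
```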

But you’re bypassing the UI!

You may ask: What about testing the browser UI for creating a task? Aren’t you bypassing it entirely, since you’re calling the API?

The answer is yes. We’re intentionally bypassing the UI, because this specific test was Successfully submit an offer. That’s a test for the tasker’s UI flow. We do have another test whose sole job is to test the poster side of things:

## Create a single worker task
* ⚡️ Sign in with poster account
* 🐌 Open the post task screen
* 🐌 Enter task title
* 🐌 Enter task description
* 🐌 Go to the next page
* 🐌 Enter task location
* 🐌 Select Today as due date
* 🐌 Go to the next page
* 🐌 Enter task budget “100”
* 🐌 Finish post a task

Sometimes the same step can be done either via the API or via the browser, which is why we distinguish our steps with an emoji prefix. Enter valid credit card details is one such example. We have two steps to pick from:

⚡️ Enter valid credit card details
🐌 Enter valid credit card details

Which one do you pick? It depends on the purpose of the test scenario you’re writing. If the test scenario is Enter payment details, then you probably want to use 🐌 since entering credit card details is an essential part of the flow of entering payment details. On the other hand, if the test scenario is Post a task with payment details already provided, then entering a credit card is definitely not part of the flow and it should be done via the API instead.
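
Both variants end up registered side by side in the step implementations — the emoji prefix is what makes them distinct steps as far as Gauge is concerned. A sketch of the pair, continuing the earlier sketches (fetch and testData as defined there; selectors, endpoint and payload are illustrative):

```js
// 🐌 variant: exercises the payment form UI itself (selectors are illustrative).
step("🐌 Enter valid credit card details", async () => {
  await (await $("[data-test-id='card-number']")).setValue("4242424242424242");
  await (await $("[data-test-id='card-expiry']")).setValue("12/30");
  await (await $("[data-test-id='card-cvc']")).setValue("123");
  await (await $("[data-test-id='save-card']")).click();
});

// ⚡️ variant: reaches the same end state with a single API call
// (endpoint, payload and auth handling are illustrative).
step("⚡️ Enter valid credit card details", async () => {
  await fetch(`${process.env.API_BASE_URL}/payment_methods`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${testData.posterAuthToken}`, // from the ⚡️ sign-in step
    },
    body: JSON.stringify({ number: "4242424242424242", expiry: "12/30", cvc: "123" }),
  });
});
```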

So far, so good

We’ve been rolling with this solution for a few months now, and coverage has already tripled compared to our previous browser test suite. We haven’t had a single test failure caused by API steps (⚡️). Our tests are now a lot simpler, because you only need one or two API calls to achieve the equivalent of a fairly complex UI flow. We’ve adopted the exact same approach in both our Android and iOS test frameworks. It requires a small mindset change, but engineers end up getting a lot more done. Each test case is small, focused and self-contained. That’s exactly what we want.

We did implement quite a few other improvements with our new test suite, both in terms of infrastructure and process. We switched from Cucumber to Gauge to make tests easier to write. We introduced a thin abstraction layer on top of WebdriverIO to enforce consistent practices. We used a record-replay proxy server to make our tests hermetic and reliable (that’s part of the reason why our API steps are so reliable). We made engineers responsible for writing and maintaining browser tests.

We’ll delve into these other aspects in upcoming blog posts. Stay tuned!
