How to Run e2e Tests for Every Commit Using testim.io, GitLab CI, and Cloudflare

Or Kaplan
Remitly Israel (formerly Rewire)
Sep 29, 2021 · 6 min read

Running e2e tests on every commit while maintaining stability is not an easy task. In this post, you’ll see how our team at Rewire uses testim.io to check every commit to our web app, so that we can maintain high product quality and limit serious bugs in production.

Looking for a fresh start

Running e2e tests before merging our code to the main branch is not a new concept. At Rewire, our flow works like this:

  1. The developer opens a PR from a feature branch in order to integrate a new piece of code into the main branch
  2. At least one other team member must perform a code review and approve it
  3. A set of automated tests runs, including integration and e2e tests
  4. Once all steps are completed, the developer can merge the code (using Marge-bot)
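
The flow above might look roughly like this in `.gitlab-ci.yml` (stage names, job names, and scripts are illustrative sketches, not our actual configuration):

```yaml
stages:
  - build
  - test
  - e2e

build:
  stage: build
  script:
    - npm ci
    - npm run build
  artifacts:
    paths:
      - dist/

integration-tests:
  stage: test
  script:
    - npm run test:integration

e2e-tests:
  stage: e2e
  script:
    - ./scripts/run-e2e.sh   # spins up the fresh environment and runs the e2e suite
```

With merge checks enabled on the main branch, a merge request can only be merged once every stage has passed.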

Before choosing to work with testim.io, we used Protractor to run e2e tests, which exposed the following challenges:

  1. A developer had to run and maintain the tests, which took a lot of time and effort
  2. The tests failed too often, some of them sporadically, which made failures hard to follow
  3. The tests ran on a Kubernetes pod, which required special shared-memory settings that affected their stability and performance
  4. The test code was not clean, so it was really difficult to maintain

Why did we choose testim.io?

Frustrated with our previous e2e automated test flows, we looked for a better solution. We liked testim.io from day one: its intuitive platform could free our developers from writing complex e2e tests and let them focus on building a better product and improving quality elsewhere. That way, any team member could generate a test that simulates the most important user flows.

Some of our favorite features include:

  1. Record a test on a live environment and replay it when needed
  2. AI-based element detectors — the test didn’t change if we changed the location or the color of the button, which was really important to us as we were in the middle of rebranding our app
  3. Easy to run from our CI (as you already know, we use GitLab CI)
  4. A great, intuitive UI that provides console logs, network logs, and a visual assistant, which makes failure points really easy to detect

Still, there was one conflict: testim.io runs on its own Selenium grid, on its own cloud instances, which was a bit challenging for us, as it required us to expose our temporary environment externally in a secure manner.

Challenges in setting up the e2e environment

In order to keep our e2e tests stable, we wanted to build an environment that wouldn’t be affected by other tests or human errors. Therefore, we decided to create a fresh environment for every new build. To do so, we leveraged the GitLab CI Kubernetes executor infrastructure to simulate the main components of our live environment, in the following order:

  1. Fresh database instance
  2. Redis instance
  3. Our API services

All of these services were loaded as part of the executor pod context and were not accessible from the internet, which proved to be a great challenge for us as testim.io requires a public endpoint for the tests.
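
As a sketch (the image names, aliases, and scripts here are assumptions, not our exact setup), GitLab CI can attach such satellite containers to the executor pod with the `services` keyword:

```yaml
e2e-tests:
  stage: e2e
  services:
    - name: postgres:13        # fresh database instance per build
      alias: db
    - name: redis:6            # fresh Redis instance per build
      alias: redis
  variables:
    POSTGRES_DB: app
    POSTGRES_USER: app
    POSTGRES_PASSWORD: app
  script:
    - ./scripts/start-api.sh   # boot the API services against db/redis
    - ./scripts/run-e2e.sh
```

Everything here listens on the pod’s internal network only, which is exactly why exposing it to testim.io required extra work.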

In addition, we had to use external services that have no sandbox environment, so we had to mock their responses as part of the user flow.

Eureka — Connecting the dots

To make our environment accessible, we had to expose 3 main services:

  1. API service — naturally requires access to our temporary database and provides a consistent state to our web app
  2. Web service — serves our single page application to the browser
  3. Mocking service (aka e2e service) — exposes mocks in order to support external service provider behavior, such as payments, storage, etc.

Exposing our e2e architecture

Our e2e environment high-level diagram

In the diagram above, you can find our set of solutions for the aforementioned challenges. Each service was exposed differently:

  1. Using Cloudflare Argo tunnels, we were able to expose our API and Mocking services to the internet using a temporary DNS name. In order to do so, we built a special docker image that includes the Argo tunnel binary and is able to fetch the required certificates from our HashiCorp Vault instance. This image was the base image for the executor that ran the test.
  2. We were able to expose the most up-to-date version of our web app to the internet, which was set to access a specific API endpoint (as demonstrated in our article about building staging environments using Cloudflare Workers). In that case, we had to provide testim.io with the relevant web app URL and API version to test against.
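
A minimal sketch of the tunnel bootstrap, assuming hypothetical hostnames, ports, and Vault secret paths (none of these are our production values):

```shell
#!/bin/sh
set -eu

# Randomize the tunnel name to avoid collisions between parallel builds.
SUFFIX=$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' \n')
TUNNEL_HOST="api-${SUFFIX}.e2e.example.com"
echo "$TUNNEL_HOST"

# Fetch the origin certificate from Vault (assumed secret path):
# vault kv get -field=cert secret/cloudflare/origin > /etc/cloudflared/cert.pem

# Expose the local API service through a Cloudflare Argo tunnel.
if command -v cloudflared >/dev/null 2>&1; then
  cloudflared tunnel --hostname "$TUNNEL_HOST" --url http://localhost:3000 &
fi
```

The random suffix is what keeps two parallel pipelines from fighting over the same tunnel name.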

Now, all that’s left to do is run testim.io’s CLI with the relevant variables in order to access the web app and relevant services that are now exposed externally.
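
The invocation might look like the following sketch; the URL is illustrative, and the token and project ID would come from CI secret variables (`--token`, `--project`, `--grid`, and `--base-url` are standard testim CLI flags):

```shell
#!/bin/sh
set -eu

# Illustrative placeholders; in CI these come from pipeline variables/secrets.
TESTIM_TOKEN="${TESTIM_TOKEN:-<your-token>}"
TESTIM_PROJECT="${TESTIM_PROJECT:-<your-project-id>}"
WEB_APP_URL="${WEB_APP_URL:-https://app-abc123.e2e.example.com}"

CMD="testim --token $TESTIM_TOKEN --project $TESTIM_PROJECT --grid Testim-Grid --base-url $WEB_APP_URL"
echo "$CMD"
# eval "$CMD"   # run for real once credentials are configured
```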

Sounds easy, right?

Well, in real life nothing comes easily. As we quickly learned, there were many issues to tackle to make the test run smoothly, such as:

  1. Orchestrating the services. We had to make sure our satellite services started only after the main executor service was loaded, since they needed build artifacts that the main container in the pod copied as part of its bootstrap process.
  2. Argo tunnel bootstrap time is not predictable, so we had to test the tunnel connectivity before running the testim.io CLI.
  3. CORS issues — unfortunately, this bug was not easy to find. When running the test, testim.io sends an Origin header that differs from the base URL host, so we had to support custom origins when replying with our CORS headers.
  4. Exposing the logs on failure. Since the test did not always pass, we had to provide the developers with the logs of the temporary services. We didn’t connect these services to our central logging system, so for each build we exposed artifacts that contained the logs.
  5. Logs were not enough. The developers also had to see the content of the database that stores the app state, so we added a database dump upon failure, along with a restore script.
  6. DNS collisions — as multiple tests might run in parallel, and two Argo tunnels cannot run with the same name, we needed to randomize the tunnel names. Two tunnels with the same name would be problematic, as a test might consume the wrong API.
  7. DNS pollution — each test creates a few DNS records, so we had to clean them up to avoid accumulating junk at our DNS provider.
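
The connectivity check from the list above can be sketched as a small retry helper; the health-check URL and timings are illustrative assumptions:

```shell
#!/bin/sh
# Retry helper: run a command until it succeeds or attempts run out.
# Used to probe the tunneled endpoint before starting the testim.io CLI,
# since the Argo tunnel's bootstrap time is not predictable.
retry() {
  attempts=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (hypothetical endpoint):
# retry 30 2 curl -sf -o /dev/null https://api-abc123.e2e.example.com/health
```

Only when the probe succeeds does the pipeline hand the exposed URLs over to the testim.io CLI.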

After overcoming all of those challenges, we got to a pretty stable and quick e2e test that covers the main user flow on every pull request pipeline.

Finally, we were happy with the results

Combining the set of tools we selected, we were able to build an effective e2e test that validates our latest version while keeping our developers happy by letting them deliver quickly.

  • Testim.io enables us to build stable tests that are easy to construct, run, maintain and debug
  • GitLab CI provides the infrastructure to build and run complex testing architecture
  • Cloudflare Workers make it easy to expose the web app and reduce the load on the build servers
  • Finally, Argo tunnel makes it easy to expose our internal services to the testim.io platform

That’s it. Stay tuned for the next post in this series, where we’ll talk about how we monitor our Cloudflare Workers and the custom infrastructure we built for that purpose.
