The web E2E testing journey at heycar

Elaine Peronico
heycar
Dec 2, 2021
Illustration by Vijay Verma from Ouch!

If you are a software developer, chances are you have already heard or read about end-to-end (E2E) testing. If you haven’t, don’t worry: the explanation is really straightforward.

Essentially, E2E testing “refers to a software testing method that involves testing an application’s workflow from beginning to end. This method basically aims to replicate real user scenarios so that the system can be validated for integration and data integrity.” In other words, the tests simulate the various flows a user goes through while using the application and make sure they work as expected.

As you may have noticed, end-to-end testing is very important to guarantee the complete functionality of the whole application, especially from the user’s perspective, and we acknowledged that at heycar. Therefore, back in 2019 we started researching more about the topic, how it would work for us, which tools or libraries we should use, which flows were important enough to be tested, how much that would cost and so on.

In this article, I’m going to talk about the journey of implementing E2E tests at the company, more specifically on the frontend: all the difficulties we faced adopting the necessary infrastructure, the tools we used, the solutions we tried and all the learning that came along with this experience. Spoiler alert: it isn’t over yet!

How it started

As a first step, we gathered some members of the engineering team and decided to compare a few different tools that could be a good fit for our expectations, needs and budget. Cypress, Puppeteer and TestCafe were the ones we focused on.

If you have never heard of them, they are all Frontend E2E browser testing frameworks or libraries:

  • Cypress is easy to install and use for both developers and Quality Assurance (QA) analysts/engineers and it has a great open source community. Also, it has very cool features, such as recordings, snapshots, readable errors, automatic waiting, etc. But it can get quite expensive for companies.
  • Puppeteer runs tests faster than Cypress, uses Jest (just like our unit tests do) and works well with continuous integration (CI) software thanks to its headless mode. Nevertheless, it only works with Chrome and Firefox (the latter in experimental mode) and has very poor to no support for cross-device testing services such as BrowserStack. That was a very important point for us at the time, because our initial goal was to run the tests on BrowserStack in order to cover the widest possible range of browsers and devices.
  • TestCafe, on the other hand, is free, easy to install and use, has great browser and cross-device support and an uncomplicated debugging process, as well as super useful features like HTTP request mocking and interception, among other things.

Taking all the mentioned facts and a couple more into consideration, we analysed the pros and cons and decided to go with TestCafe.
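To give a feel for the tool, here is a minimal sketch of what a TestCafe test for a vehicle-filtering flow might look like. The page URL and `data-test` selectors are hypothetical, not heycar’s actual markup:

```javascript
import { Selector } from 'testcafe';

// Hypothetical URL and data-test attributes, for illustration only.
fixture('Vehicle search')
    .page('https://www.example.com/search');

test('filtering by brand shows matching results', async t => {
    const resultCards = Selector('[data-test="result-card"]');

    await t
        .click(Selector('[data-test="filter-brand"]'))
        .click(Selector('[data-test="brand-option"]').withText('Audi'))
        // TestCafe waits for selectors automatically before asserting.
        .expect(resultCards.count).gt(0);
});
```

A file like this is executed through the TestCafe runner rather than directly, e.g. `npx testcafe chrome tests/search.js`.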


Then, we started to set everything up and began to create a few test cases, starting with what we considered to be the most important flows on the website, such as filtering and favoriting vehicles, creating different types of contact requests and checking content pages — for SEO purposes. And it worked… at first.

The timeout error nightmare…

Unfortunately, the problems started not too long after creating the first set of tests.

heycar has a vast coding ecosystem, and this sometimes affects the speed of the website locally or in the CI environment. As previously said, E2E tests focus on user scenarios: they look for the website’s HTML elements (e.g. texts, buttons, links, images and others) to ensure that the flows work and that the user sees the expected content afterwards. Slowness therefore often leaves the page in a loading state, with the correct element not displayed in time, and the test fails with a timeout (it waited for too long)… our biggest issue to this day.
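To make this failure mode concrete, here is a minimal sketch in plain Node.js (not TestCafe’s actual implementation) of how an E2E tool’s automatic waiting behaves: it repeatedly polls for the element and gives up after a timeout, so a page that renders too slowly fails the test even when nothing is functionally broken. The element and delays below are simulated.

```javascript
// Poll a predicate until it returns a truthy value or the timeout elapses,
// mimicking how E2E frameworks wait for an element to appear.
async function waitFor(predicate, { timeout = 2000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const result = await predicate();
    if (result) return result; // the "element" showed up in time
    if (Date.now() >= deadline) {
      throw new Error(`waitFor: timed out after ${timeout} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Simulate a page that only renders its button after a 100 ms delay.
let element = null;
setTimeout(() => { element = '<button>Contact seller</button>'; }, 100);

waitFor(() => element, { timeout: 1000 })
  .then((el) => console.log('found:', el))     // fast enough: test passes
  .catch((err) => console.error(err.message)); // too slow: timeout failure
```

On a slow page the 100 ms render delay becomes seconds, the deadline passes first, and the run ends in exactly the kind of timeout error described above.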

The good documentation and support from the community helped us find different solutions to experiment with, which we did tirelessly, and some of them worked for a while. However, the timeout failure problem kept coming back for quite a long time.

The CI build was failing so often that the developers stopped checking the errors as well as ceased updating existing tests according to the new features and flows that were being implemented. In other words, we were losing hope :(


Ok, maybe we should start over

After months of no concrete solution, we — the frontend team — decided to go back to square one and look for other tools and/or libraries. This time, Cypress (yes, we hadn’t given up on this one yet) and Playwright were the ones under analysis, but we took a different approach: we ran our current set of tests with each tool for about 15 days, several times a day, at production level.

Playwright, in case you are not familiar with it, is a library built by Microsoft to enable cross-browser web automation that is “ever-green, capable, reliable and fast”, and it supports the most important and widely used browser engines: Chromium, Firefox and WebKit (the engine behind Safari).
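As a rough illustration, the same kind of filtering flow written with Playwright Test might look like this; again, the URL and selectors are hypothetical, not heycar’s actual markup:

```javascript
const { test, expect } = require('@playwright/test');

// Hypothetical URL and data-test attributes, for illustration only.
test('filtering by brand shows matching results', async ({ page }) => {
  await page.goto('https://www.example.com/search');
  await page.click('[data-test="filter-brand"]');
  await page.click('[data-test="brand-option"]');
  // Web-first assertions retry automatically until a timeout, which is
  // where "waiting for too long" failures surface on a slow page.
  await expect(page.locator('[data-test="result-card"]').first()).toBeVisible();
});
```

Such a spec runs via `npx playwright test`, and the same file can be executed against Chromium, Firefox and WebKit projects.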

So, why did we give Cypress another shot? Well, I haven’t mentioned this before, but at heycar we have several projects distributed across different repositories. This whole experience so far is about the main — and consequently the biggest — one, which we will refer to as “uno-app” here.

On a different — yet also important — repository (let’s call it the “hanabi-app”) the end-to-end tests run on Cypress and we thought that it would be worth investigating it (again) on a deeper level for uno-app.

Although Cypress was the easiest to set up and run, mostly because of its beautiful and useful UI, it turned out to be extremely slow on the uno-app repository, and it would get really expensive given the number of tests and how many times we had to run them each day. Playwright, however, performed surprisingly well.

30 days passed — 15 for each tool — and by comparison Playwright seemed to be the solution for us, so we went with it. The tool works quite well for the tests we currently have implemented, but it is still quite unstable (at least in our case). So, guess what? We were not that confident about it anymore.


Expanding our horizons to synthetic testing

Datadog is commonly used by companies — including heycar — to monitor services, tools and databases, because it provides clear log reports and efficient alert dashboards. But their products do not stop there: they also provide automated browser testing, and that’s what we aimed for.

The combination of Playwright and Datadog has shown promising results for us so far. Together with some extra ideas we are currently brainstorming and implementing, such as a local CI Docker container and a few performance-improvement tasks, this may be the final stop of this demanding, yet exciting, journey.


Conclusion

This whole experience has taught us, and keeps teaching us, a lot about the importance of having a high-performing website. Sometimes you just have to step back and start over in order to find the path to success. More importantly, solutions to the issues that arise will turn up; you just have to keep trying different approaches and keep looking for something that works in your context.

Have we solved the entire problem? Not yet. Are there going to be more issues to fix? Most probably. As mentioned previously, our codebase is complex, but we will continue doing our best to keep the improvements coming. As a matter of fact, the company is really interested in the topic and even included it in our department’s goals. I can see the light at the end of the tunnel getting closer… let’s hope I’m right 😬

Resources

If you want to read more about the concepts briefly described in this article, feel free to take a look at these online pages:
