Setting up Automation Testing at Zolostays

Kushagra Mangal
Zolo Engineering
Apr 6, 2020

As many of us have heard, “with great power comes great responsibility”. Similarly, when you build software that is going to be used by customers, you need to tread carefully to ensure that your customers get the best experience from your product. This is where testing comes into the picture.

At Zolo, our development pipeline initially looked like this:

Old pipeline
  1. Get feedback to create a feature
  2. Develop the feature
  3. Deploy to QA servers
  4. Wait for days to get QA sign-off
  5. Fix any bugs found by QA
  6. Finally, after QA sign-off, deploy to production.

This process had the following issues:

  1. Waiting for QA to completely test the product and find all the bugs, on mobile and web across all browsers, is a lengthy cycle.
  2. As it is a manual process, there is a high probability that some tests will be missed.
  3. The feedback loop between deployment and bugs being reported to developers is slow.
  4. Each test needs to run in an isolated testing flow, but this is difficult to maintain in manual testing.

Because of these issues, and after a few regression bugs slipped into production, we needed to ensure this never happened again.

Our journey to automated testing

Listing out the necessities

Before starting the research process, we set out the bare-minimum objectives we intended to achieve:

1. Test flows to be written in JavaScript

The most common question asked among our team was: “Why do you need test flows in JavaScript?”

The idea behind this is to enable front-end developers to support testing and write test cases themselves. If they can drive testing, they keep that approach in mind and ensure that whatever they build is highly testable.

2. Scalable and Cross-Platform System

We wanted this framework to be capable of testing in multiple browsers, and to scale as the number of tests grows.

3. Minimize feedback loop time between deployment and report of bugs to developers

We needed this framework to be integrated with the deployment pipeline so that all tests run automatically as soon as a build reaches the development servers, ensuring that a better-quality product reaches the QA servers.

This gave us a clear view of how we want our new pipeline to be:

Proposed System

By the end of this implementation, we had improved our deployment pipeline according to the proposed system:

  1. A requirement for a new feature arises and developers build it as soon as possible.
  2. Code is pushed to the development server, where it is tested automatically; if no bugs are found, it is deployed to the QA servers, otherwise the developer gets an immediate email with the bugs.
  3. QA validates the report and gives a quick sign-off.
  4. Code is ready for production, so it is first deployed to the pre-prod environment (with another run of the tests) and then deployed to production.

Research and Proof Of Concepts

So the different approaches we tried out as POCs were:

1. Selenium with Java
This was started by our QA team as an attempt to implement an automated testing mechanism on their local systems.

It was a good approach in terms of customizability and community, but it had certain issues that led us to look at other systems.

  1. As it was in Java, it was difficult for front-end developers who work in React and JavaScript.
  2. We had to manually handle the problems caused by the Selenium initialization process and its waiting and retry mechanisms.
  3. Being more customizable, it was also more complex to implement.
  4. Build time with Maven.

Still, it could have worked, but we wanted to explore other approaches, so next we tried Cypress.

2. Cypress

This is a framework built in JavaScript that works natively in the browser. It seemed like a dream come true for preventing any kind of flaky tests.

The setup process was a lot simpler: install with npm, then write test cases in the Mocha style.

So we checked out the best practices for working with it and started building some flows, helper plugins, and base testing code. Thanks to its simplicity, we got most of the basic setup and flows written in a matter of 3 to 4 days. It was looking like a great system for us.
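To give a sense of what these flows looked like, here is a minimal sketch of a Mocha-style Cypress spec; the URL and data-test selectors are illustrative, not our actual test code.

// A sketch of a Mocha-style Cypress spec (illustrative selectors and URL)
describe('Login flow', () => {
  it('opens the login form from the home page', () => {
    cy.visit('https://www.zolostays.com');            // assumed entry point
    cy.get('[data-test=login-button]').click();       // hypothetical selector
    cy.get('[data-test=phone-input]').type('9999999999');
    cy.get('[data-test=request-otp]').click();
    cy.contains('OTP').should('be.visible');          // assert the next step rendered
  });
});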

We built a Docker image for it and tried to integrate it with our Jenkins pipeline. This Docker image was self-contained, i.e. it included the necessary browsers and the tests to run.

Then we started facing issues that brought us back to reality: nothing can be perfect, and this was too good to be true.

The following were the issues we faced:

  1. It supported only Chrome at the time — this was still an acceptable tradeoff for all our internal products, as testing with it was a lot more stable than with Selenium.
  2. Headless browser support only for Electron — this was a deal-breaker, as Electron was still on Chromium 73 while Chrome was at version 78, and test cases behaved differently in Electron than in Chrome.
  3. Huge Docker image — still an acceptable tradeoff.
  4. Tests used to hang when running in Docker — this was a fixable issue, which I will add here for anyone else who runs into this problem.
    It was fixed with the following code in the plugins file (though we did not test exhaustively to confirm that it fixed the issue completely).
Cypress fix for chrome in a docker image
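A sketch of that fix, assuming Cypress's documented before:browser:launch hook and the --disable-dev-shm-usage Chrome flag (the exact callback signature varies between Cypress versions):

// plugins/index.js
module.exports = (on, config) => {
  on('before:browser:launch', (browser = {}, launchOptions) => {
    if (browser.family === 'chromium' && browser.name !== 'electron') {
      // Chrome tends to hang or crash in containers because /dev/shm is tiny;
      // this flag tells it to use /tmp for shared memory instead.
      launchOptions.args.push('--disable-dev-shm-usage');
    }
    return launchOptions;
  });
};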

  5. Issues with frames — this was a major blocker, as it prevented us from running any payment flows.

These issues, which were mostly acceptable for internal products but not for customer-facing ones, left Cypress as an alternative we could consider for our internal products only.

3. Nightwatch

So we went back to Selenium, but with JavaScript. Here we had two choices: WebdriverIO or Nightwatch.

Both of these frameworks were good, but Nightwatch's managed handling of the Selenium drivers led us to try Nightwatch.

We first tested and made sure it didn't have the issues we faced with Cypress, then came up with a file structure for the framework:

.
├── application            # Name of the application being tested
├── commands               # Command helpers for Nightwatch
├── configurations         # Configuration folder
│   ├── config.js          # Contains some config functions
│   └── globals.js         # Nightwatch globals file
├── dataset                # Datasets used for test cases
├── helpers                # Helper flows from CRM
├── pages                  # Page objects for different pages
├── reports                # Reports folder
├── tests                  # Tests folder
│   ├── desktop            # Desktop tests
│   └── mobile             # Mobile tests
├── nightwatch.json        # Base configuration for Nightwatch
├── nightwatch.local.js    # Local configuration for Nightwatch
├── nightwatch.server.js   # Remote testing configuration
├── Dockerfile             # Dockerfile for the test suite
├── package.json           # Dependency list
├── Jenkinsfile            # Jenkins pipeline file
└── README.md

We wrote a few of our major testing flows, which included similar flows but different test cases for desktop and mobile, as our website is adaptive and serves content according to the device. We stuck with Nightwatch as it met our objectives. You can read about our adaptive web implementation here:

Nightwatch and Zalenium — POC

With Nightwatch, we built some of the base test flows for mobile and desktop using page objects and common commands. We had commands for our reusable flows, a JSON dataset (as this was a POC, we just wanted to dive in quickly with a static dataset), and a few utilities, along with the default reporting supported by Nightwatch.
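As an illustration of the page-object approach (the URL, selectors, and flow below are made up for the example, not our actual code), a page object and a test using it looked roughly like this:

// pages/loginPage.js: a sketch of a Nightwatch page object
module.exports = {
  url: 'https://www.zolostays.com',                        // assumed base URL
  elements: {
    loginButton: { selector: '[data-test=login-button]' }, // hypothetical selectors
    phoneInput: { selector: '[data-test=phone-input]' },
  },
  commands: [{
    // A reusable flow exposed on the page object.
    openLogin() {
      return this.navigate()
        .waitForElementVisible('@loginButton', 10000)
        .click('@loginButton');
    },
  }],
};

// tests/desktop/login.js: a test case built on that page object
module.exports = {
  'user can open the login form': (browser) => {
    const login = browser.page.loginPage();
    login.openLogin().assert.visible('@phoneInput');
    browser.end();
  },
};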

Now that our POC framework was working fine locally, we needed to integrate it with our CI/CD (Jenkins), with the tests running on remote servers.

For remote testing, we needed a grid of browsers that could run these tests and give us reports on them. Selenium Grid was the best option available for this, but for better scalability of the grid on our Kubernetes cluster, we preferred Zalenium.

So we set up Zalenium in our Kubernetes cluster, with its Docker containers running Chrome and Firefox.
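To show how the test suite talks to that grid, here is a minimal sketch of a remote configuration. The hostname and the mobile device name are illustrative, Zalenium simply exposes a standard Selenium Grid endpoint (usually on port 4444), and exact option names vary slightly between Nightwatch versions.

// nightwatch.server.js: a sketch of pointing Nightwatch at the Zalenium hub
module.exports = {
  src_folders: ['tests'],
  page_objects_path: ['pages'],
  custom_commands_path: ['commands'],

  selenium: {
    start_process: false,                                  // the grid already runs in the cluster
  },

  test_settings: {
    default: {
      selenium_host: 'zalenium.default.svc.cluster.local', // hypothetical in-cluster service name
      selenium_port: 4444,
      desiredCapabilities: { browserName: 'chrome' },
    },
    firefox: {
      desiredCapabilities: { browserName: 'firefox' },
    },
    mobile_chrome: {
      desiredCapabilities: {
        browserName: 'chrome',
        'goog:chromeOptions': {
          // Emulate a phone for the mobile test suite.
          mobileEmulation: { deviceName: 'Pixel 2' },
        },
      },
    },
  },
};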

With this, our automation POC was looking good: we created a Docker image with our test cases that could be integrated into our Jenkins CI/CD pipeline, and the POC was a success. Our architecture looked like this:

Testing architecture
Deployment flow
Testing Flow

POC is fine, what about actual implementation?

So after the POC, we needed to add certain things to our framework before it could be used with production systems:

  1. Support for testing in different environments
  2. Testing in three browsers — Safari, Firefox, and Chrome
  3. Better test reporting
  4. More controlled deployments

1. Support for testing in different environments

We added configurable environments to our test service. This enables testing our product in different environments without changing any code.
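As a rough sketch of what this looks like (the environment names, URLs, and variable name are illustrative), the base URL is resolved from an environment variable inside the configuration folder:

// configurations/config.js: a sketch of environment selection
const baseUrls = {
  development: 'https://dev.zolostays.example',   // illustrative URLs
  qa: 'https://qa.zolostays.example',
  preprod: 'https://preprod.zolostays.example',
};

const env = process.env.TEST_ENV || 'development'; // hypothetical variable name

module.exports = {
  env,
  baseUrl: baseUrls[env],
};

The same suite can then be run against any environment, e.g. TEST_ENV=qa npx nightwatch -c nightwatch.server.js.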

2. Testing in three browsers — Safari, Firefox, and Chrome

With Safari in the picture, a lot of issues needed to be handled, as it has different requirements and behaves differently in certain ways.

Also, the Zalenium Docker images we were using could not provide Safari, so we needed to decide whether to add physical macOS machines to the grid or use cloud testing tools like LambdaTest or BrowserStack. Based on our requirements, we decided to connect LambdaTest to our Zalenium hub. As it had native integrations, this process was simple; the real problem was how differently Safari behaves compared to the rest of the browsers.

These were the main issues we faced with the Safari browser:

  1. In Safari 13, performing a click via the Selenium driver was broken, so Selenium click commands didn't work on Safari.
  2. Safari had issues with how it handled frames.
  3. Safari didn't have configurations for running in a mobile environment the way Chrome and Firefox do.

For these, we wrote a customClick command, used Appium for mobile testing, and made changes to our framework so that the tests could remain the same across all browsers.
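A sketch of such a command (the name matches the one mentioned above, but the implementation details are illustrative): it falls back to a JavaScript click executed in the page, which sidesteps the broken native click in Safari 13.

// commands/customClick.js: a Nightwatch custom command
exports.command = function customClick(selector, callback) {
  this.execute(
    function (sel) {
      // Runs in the browser: click the element directly through the DOM.
      document.querySelector(sel).click();
    },
    [selector],
    callback
  );
  return this; // keep the command chainable
};

Tests then call browser.customClick('[data-test=pay-button]') instead of the native click, in every browser, so the flows stay identical.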

3. Better test reporting

This was an important aspect, as the reports generated by Nightwatch by default were not that good. As we were already using Allure reports, we decided to create an npm package that could be used as an Allure reporter with Nightwatch. It can be found at the following URL:

This allowed us to get reports structured so that we can see which browser was used for each test and what happened at each step; if a step had an error and a screenshot was taken for it, the screenshot is embedded in the report.

Still, certain features remain to be added to this package, including browser console logs, custom tagging of tests, etc.
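For context, Nightwatch accepts a custom reporter module (passed via --reporter) that exposes a write function; a minimal sketch of that interface, with the Allure-specific conversion left out, looks like this:

// reporter.js: the interface Nightwatch expects from a custom reporter
module.exports = {
  write(results, options, done) {
    // `results` holds the per-module, per-test outcomes collected by Nightwatch.
    // A real reporter converts these into Allure result files, embedding
    // screenshots for failed steps along the way.
    console.log(`Test modules reported: ${Object.keys(results.modules || {}).length}`);
    done(); // tell Nightwatch that report writing has finished
  },
};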

4. More controlled deployments

This required changes in our Jenkins pipeline to add a testing stage that checks for failed test cases and Lighthouse score regressions; only if all is well does the deployment proceed.

This also included emailing the test reports, with a link to the Jenkins build, to the concerned teams.

This article didn’t include much of the actual implementation steps to deploy zalenium or code of how to use Nightwatch. For that, I will be writing another article with a complete walkthrough on all the steps that need to be done to complete this setup. I will update it here on completion. Thanks and keep testing :)

PS: If you are into automation and find terms like Infrastructure Automation and CI/CD as exciting as we do, then drop us a line at join-tech@zolostays.com. We would love to take you out for a coffee ☕ to discuss the possibility of you working with us.
