The Journey to Reliable Automated Testing at Zillow

Jeff Strada
7 min read · Oct 25, 2021


Over the years, we learn and gain experience, sometimes from our own mistakes.

I’m Jeff Strada, a Software Engineering Manager at Zillow.


And I’m writing this short technical article to help other teams who might be struggling with automated testing.

Hopefully, you can learn from our journey:

1. We focused too much on unit tests

The only purpose of testing is to check if something works as expected.

We know there are different types of automated tests: unit tests, functional tests, integration tests, etc.

Here’s a dangerous truth: you can get away with skipping some of them, as long as you’re doing functional tests.

When dealing with a website, functional testing means interacting with the elements through the browser, just like a real user would.

Our first mistake was that we focused too much on unit tests, and not enough on functional tests.

A browser is more than just a JavaScript interpreter. Checking if a bunch of functions return the correct output only proves that the functions are working, and nothing else.
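
To make the contrast concrete, here’s a minimal sketch in Python, using unittest and Selenium’s Python bindings. The format_price helper, the URL and the selector are made up for illustration; the unit test only proves the function works, while the functional test drives a real browser.

```python
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By


def format_price(cents: int) -> str:
    """Hypothetical helper under test."""
    return f"${cents / 100:,.2f}"


class UnitLevel(unittest.TestCase):
    # A unit test: proves the function returns the right output,
    # and nothing else.
    def test_format_price(self):
        self.assertEqual(format_price(123456), "$1,234.56")


class FunctionalLevel(unittest.TestCase):
    # A functional test: drives a real browser, like a real user.
    # Requires chromedriver to be installed.
    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_price_is_rendered(self):
        # The URL and selector are placeholders for your own page.
        self.driver.get("https://example.com/listing/42")
        price = self.driver.find_element(By.CSS_SELECTOR, ".price")
        self.assertEqual(price.text, "$1,234.56")


if __name__ == "__main__":
    unittest.main()
```

The unit test can pass forever while the page itself is broken; only the second kind catches that.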

2. Cross-browser testing is critical

A few years ago, it seemed like Chrome was going to achieve full market domination.

But it didn’t happen.

Safari is stronger than ever; it’s currently the second most used desktop browser in the world.

This was also visible in our own stats.

Testing only on Chrome means ignoring at least 57% of our users. And that’s what we were doing.

This catastrophic mistake was also fueled by the impression that modern frameworks (React, Vue, etc.) prevent cross-browser issues.

And we couldn’t have been further from the truth.

Our tests were green, and our users were reporting more and more issues.

It was confusing and embarrassing.
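
The fix was conceptually simple: run the same checks on every browser we actually support. Here’s a minimal sketch with Selenium’s Python bindings, assuming the matching drivers (chromedriver, geckodriver, safaridriver) are installed; the homepage check is a placeholder.

```python
from selenium import webdriver

# One launcher per browser we actually support. Safari's driver
# ships with macOS; the others need to be installed separately.
BROWSERS = {
    "chrome": webdriver.Chrome,
    "firefox": webdriver.Firefox,
    "safari": webdriver.Safari,  # macOS only
}


def check_homepage(driver) -> None:
    # Placeholder check; replace with your real assertions.
    driver.get("https://example.com")
    assert "Example" in driver.title


for name, launch in BROWSERS.items():
    driver = launch()
    try:
        check_homepage(driver)
        print(f"{name}: OK")
    finally:
        driver.quit()
```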

3. Our struggle with Selenium

Selenium seemed to be the default option for functional automated testing.

And Cypress was immediately excluded for the following reasons:
1. Does not work well with iframes.
2. Can’t be used to test on Safari and Internet Explorer 11.
3. Does not work with multiple browser tabs.
4. Can’t be used to access multiple domains in the same test.

Selenium didn’t have any of those limitations, and a small team started building an internal test automation framework.

After 10 months, it was still in an “almost ready” state, without delivering any results, except some trial runs.

What took longer than expected:
1. Dealing with waits, element states and timeouts (see the sketch after this list).
2. Certain actions didn’t work out-of-the-box on Safari and Firefox.
3. Critical chromedriver bugs that never got fixed.
4. Collaboration and integration with CI/CD wasn’t easy.
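
To show what the first point means in practice, here’s a hedged sketch in Python. A naive click is flaky when the element isn’t ready yet, so every action ends up wrapped in an explicit wait; the URL, the selector and the 10-second timeout are illustrative.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/login")

# Naive version; fails intermittently if the button renders
# or becomes enabled a moment too late:
# driver.find_element(By.ID, "submit").click()

# Robust version: block until the element is actually clickable,
# or raise a TimeoutException after 10 seconds.
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "submit"))
).click()

driver.quit()
```

Multiply that wrapper by every click, type and assertion in a suite, and you can see where the 10 months went.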

4. Selenium Grid was the nail in the coffin

The only way to run Selenium tests at scale is through a Selenium Grid, either one you build yourself or an out-of-the-box one provided by a cloud vendor.

But there was a major downside in the Selenium Grid architecture:

It was too slow.

And we can find the root cause in the Selenium Grid Architecture Diagram:

[Figure: Selenium Grid architecture diagram]

As you can see, each action goes over HTTP to the hub.

That means that every Click instruction you issue has to travel over the network until it reaches the Selenium node.

Performing a Click shouldn’t take more than a few milliseconds, but the HTTP route added a few seconds.

That means that a basic test that shouldn’t take more than 20 seconds ended up taking 3 minutes.
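
Here’s what that looks like from the test’s side. With a Remote WebDriver pointed at a grid hub, every command in the script is serialized into an HTTP request to the hub, which forwards it to a node; the hub URL below is a placeholder for your own grid.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

driver = webdriver.Remote(
    command_executor="http://selenium-hub.internal:4444/wd/hub",  # placeholder
    options=Options(),
)

driver.get("https://example.com")              # one HTTP round trip
element = driver.find_element(By.ID, "menu")   # another round trip
element.click()                                # and another

driver.quit()
```

A few hundred commands per test, each paying that network tax, is how 20 seconds becomes 3 minutes.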

We tried an alternative approach: spinning up a Docker-based Selenium Grid on our own machines.

[Figure: Docker-based Selenium Grid]

But here’s why using a Docker-based Selenium Grid was a bad idea:
1. No way to install Safari, Internet Explorer and mobile browsers.
2. Headless browsers in containers behave differently than regular browsers on Windows and macOS machines.

We hit this wall ourselves when our containerized Chrome tests were green, but macOS users kept reporting issues with our website that they encountered in Chrome.

5. Understanding what we want

We made a clear list of requirements:
1. Tests need to be executed on all major browsers.
2. Tests need to be executed fast, bypassing Selenium Grid.
3. Tests need to be easy to create, modify, maintain and adapt.
4. Tests should be easy to integrate with CI/CD flows.
5. Multiple users should be able to easily collaborate on the same tests.
6. Tests need to deliver results as soon as possible, not in 10 months.

6. Reaching out for help

We reached out to our network to see if their approaches ticked these boxes and whether we could learn from them.

Here’s the most memorable response I got:

It looks like your goal is to deliver fruits to the market, and you’ve convinced yourself that the only way to do that is to build a truck and a road from scratch.
You don’t have to reinvent the wheel.

It turns out there’s a flawed approach in the industry, one that’s still alive in a lot of teams.

We focus too much on building a test automation framework, instead of building the actual tests.

7. The simple cure: No-code testing tools

In those discussions, we were a bit puzzled to find out that teams from Amazon, the Washington Post and Airbnb weren’t using Selenium or Cypress.

They’re using these no-code tools, which seemed absurd at the time.

Their logic was that it’s the only feasible way for tests to keep up with development during sprints.

It turns out they have the same flexibility as most scripting languages.

With some hesitation, we decided to try it for ourselves.

Just 8 Weeks Later

I was looking at our stats in Endtest earlier and it’s satisfying to see what we’ve achieved.

We now have functional tests that are running incredibly fast on all browsers.

And our tests have things like If Statements, Else Statements, Loops and Variables, without us having to write any code.

There are 3 ways in which we run the tests:

1. From the Endtest user interface.
2. From our CI/CD system, with the Endtest API (a sketch of the general shape follows this list).
3. From the Scheduler.
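
I won’t reproduce the exact Endtest API here; the sketch below only shows the general shape of triggering a run from a CI pipeline over HTTP. The endpoint, parameters and token are made up for illustration, so check the Endtest documentation for the real calls.

```python
import os

import requests

# Made-up endpoint and parameters, shown only to illustrate the
# shape of a CI trigger; consult your tool's docs for the real API.
response = requests.post(
    "https://api.example-testing-tool.com/v1/runs",
    data={
        "token": os.environ["TESTING_TOOL_TOKEN"],  # made-up secret name
        "suite": "checkout-smoke",                  # made-up suite name
        "browsers": "chrome,firefox,safari",
    },
    timeout=30,
)
response.raise_for_status()
print("Run started:", response.json())
```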

There are other similar tools that you can try; we went with this one because it seemed to be what others were using as well.

Other options can be just as good, as long as they meet your cross-browser and speed requirements.

I can’t help but feel silly after realizing the struggles I put my team through.

What’s wrong with the industry?

I decided to dig around, to talk to users, to see why there’s such a wide gap in the industry.

On one side, you have teams who deliver quality software at incredible speeds.

On the other side, you have teams who are struggling and claim that automated testing is the bottleneck that stops them from being agile.

It seems one of the root causes is the existence of Gatekeepers.

My friend Tobias said it in a harsh way:

A gatekeeper in this case is someone who’s built a career out of copy+pasting Selenium or Cypress code from StackOverflow.

Building a test automation framework, step by step.

In most teams, they seem to be the ones who block any initiative to try other approaches.

As expected, this tends to happen mostly in companies that don’t focus so much on innovation.

For those of you who have been in the industry for more than 8 years, you might notice the similarity with some Sys Admins who were the most vocal opponents of Amazon Web Services.

We need our servers to be in our own datacenter, I cannot configure or debug anything if they’re on the East Coast.
If Amazon is down, our website will be down.

If you dig even deeper, you might even notice a conspiracy: some contributors to the open-source Selenium / Selenium Grid libraries are paid employees of for-profit companies that sell testing infrastructure as a service.

It might be one of the key reasons why these open-source solutions, despite their slow speed and performance, have managed to stay relevant for so many years.
