Failing fast with reliable and actionable system-level tests

Posted on February 25, 2016 by Ramsay Ashby

It’s almost become accepted that when you have Selenium tests, or any test that sits at the top of the software stack, you’re going to have random and intermittent failures.

There is some truth in that; it stands to reason that the more dependencies you have, the greater the likelihood of one of them causing a failure, whether that failure stems from network issues, third-party libraries or components, browsers, or even the work of other teams.

It’s a problem across the industry, and one that Skyscanner is not immune to. Within the Web Publishing Squad, we wanted to be able to rely on our automated tests as a means of quickly detecting defects in our products and systems. For tests to be valuable, we need confidence that any notification of a test failure is caused by a genuine product issue: one that, if left unresolved and released, would impact our users.

Since its inception at the start of last year, our test suite has always been stable, with a test pass rate close to 99% and a step pass rate around 90%. These numbers were good in their own right, but they still fell short of the level of stability we wanted.

Web Publishing acceptance/system test pass rate, October 2015

While we were generally happy with this performance, it still meant that numerous builds were failing erroneously each day, and those failures took time to investigate. There was also a very real risk of the tests ‘crying wolf’ and stopping genuine failures from receiving the attention they required. We fell foul of this on more than one occasion.

There’s nothing more frustrating than a production issue that not only should have been detected in pre-prod, but actually was detected and then ignored. That’s not ‘fail-fast’… it’s just a fail.

Over the past few months we have iteratively and incrementally been working to improve the stability and reliability of our tests.

Part of this meant embracing imperfection rather than battling against it.

Our systems, like any company’s, aren’t perfect, and sometimes these imperfections cause test failures. Sometimes a service goes down momentarily, sometimes there’s a network issue, and sometimes something entirely inexplicable happens that causes things to fail.

These failures are genuine in the sense that something went wrong with the product, but no action needed to be taken by the Squad; the tests would simply be re-run and would subsequently pass because the problem had resolved itself.

It became apparent that we had to embrace these imperfections rather than battle against them: righteousness is worth very little when it costs hours of productivity each week and increases the likelihood of issues escaping to production.

Embracing imperfections

The first step was to ensure that everything within the tests was wrapped in the most robust logic it could be. Most of this is simply good practice and so was already in place, but it also meant adding some localised retry logic for interactions that didn’t quite behave as they should.
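
To make the idea concrete, here is a rough sketch of what localised, interaction-level retry can look like, assuming Python and Selenium WebDriver; the function name, retry count and delay are illustrative only, not our real framework code.

```python
# Illustrative sketch only, not the actual framework code.
import time

from selenium.common.exceptions import WebDriverException


def retry_interaction(action, attempts=3, delay=0.5):
    """Run a single UI interaction, retrying a few times on transient WebDriver errors."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except WebDriverException:
            # Broad catch for illustration; a real framework would be more selective,
            # e.g. only stale-element or click-intercepted errors.
            if attempt == attempts:
                raise  # out of retries: let the test fail for real
            time.sleep(delay)


# Usage: wrap only the flaky interaction, never the whole test.
# retry_interaction(lambda: driver.find_element("id", "search-button").click())
```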

The second step turned the dial up to eleven and involved some improvements to our test framework to enable test-level retry logic. This is a phenomenally powerful tool for us as it truly is a ‘silver bullet’ that can remove any test ‘flakiness’. But as Churchill, Roosevelt and Spiderman all said, “with great power comes great responsibility”.

We didn’t want to start masking product instabilities through the use of retry logic; what we did want to do was shield the team from failures that we wouldn’t care about, and instead provide knowledge and confidence that a test failure is something worth listening to.
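
As a sketch rather than our real implementation, and assuming a Python test runner, test-level retry can be as simple as a decorator that re-runs a failing test a limited number of times before reporting the failure:

```python
# Illustrative sketch only; the decorator name and retry count are hypothetical.
import functools


def retry_test(max_runs=2):
    """Re-run an entire test on failure so a transient blip doesn't fail the build.

    The last failure is re-raised, so a genuine defect that fails consistently
    still gets reported loudly.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(max_runs):
                try:
                    return test_fn(*args, **kwargs)
                except Exception as exc:  # broad by design for this illustration
                    last_error = exc
            raise last_error
        return wrapper
    return decorator


# @retry_test(max_runs=2)
# def test_homepage_search():
#     ...
```

Off-the-shelf equivalents exist too (for example pytest’s rerun-on-failure plugins), and they are worth considering before rolling your own.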

As a Test Engineer, I believe a substantial part of my skill set is understanding, controlling, mitigating and ultimately embracing risk. Seldom would a Test Engineer try to eliminate risk entirely, because doing so is slow, expensive and generally ineffective; that is why we choose to use matrices and techniques like equivalence partitioning. With the retry logic, we are controlling multiple elements of risk, along with a substantial ‘convenience’ factor.

Reflecting several months on from our initial implementation of these features, we seem to have struck the balance quite nicely: nothing has slipped through as a result of these changes, and we can enjoy the serenity of a highly reliable test suite.

Web Publishing acceptance/system test pass rate, February 2016

Web Publishing Acceptance Test Facts

  • Test step style: BDD
  • Parallelisation: Yes, at an individual scenario level and with simultaneous execution on different environments, such as our internal Selenium Grid and Sauce Labs (see the sketch after this list)
  • Lowest daily average test pass rate in last 60 days: 93.2%
  • Longest consecutive 100% pass rate in last 60 days: 4 days
  • Longest consecutive 100% pass rate ever: 7 days
  • Typical execution time: all builds/projects typically complete in under 5 minutes, including mobile OSs on Sauce Labs
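
To make the parallelisation point above concrete, here is a hypothetical driver factory that lets the same scenarios run against either an internal Selenium Grid or Sauce Labs, assuming Python and Selenium’s Remote WebDriver; the endpoint URLs, credentials and browser choice are placeholders rather than real configuration.

```python
# Illustrative sketch only; URLs and credentials are placeholders.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions


def build_remote_driver(target="grid"):
    """Create a browser session on either an internal Selenium Grid or Sauce Labs."""
    if target == "grid":
        hub_url = "http://selenium-grid.internal:4444/wd/hub"  # hypothetical hub address
    else:
        hub_url = "https://USERNAME:ACCESS_KEY@ondemand.saucelabs.com/wd/hub"
    options = ChromeOptions()  # browser choice is just an example
    return webdriver.Remote(command_executor=hub_url, options=options)
```

Each parallel scenario asks the factory for its own driver, so the same test code can run simultaneously on both environments.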

Learn with us

Take a look at our current job roles available across our 10 global offices.

We’re hiring!