Think before you Check

Photo by Tim Gouw on Unsplash

Before you stop reading thinking this is just another article extolling the virtues of using the term ‘Checking’ for test automation, I urge you to persevere…

This is an article about a ‘check’ we’ve recently come across in one of our test automation suites that has been intermittently causing builds to fail for some time now, and how we should identify other such ‘checks’ and eradicate them from our test automation pipelines.

The Check

Written in 2014, the ‘check’ consisted of an NUnit test that compared the latest Azure version with the version we expected and failed the build if the two were different. The exact reason for the check is unclear, but one can only assume that, at the time, the product in question was encountering many problems with every Azure release.

The problem with that is?

Fair enough, I hear you say; what’s wrong with that?

In the past 5 years, the expected version has been updated more than 40 times, most of them in the past 2-3 years. That means the test has unexpectedly failed over 40 times due to something totally out of our control, and something we know will happen many more times in the future.

Using this model, if Microsoft started making Azure releases every day, which is not beyond the realms of possibility, the build would be failing almost all the time.

We should not care what version of Azure is currently available. If a release of Azure causes a problem with our software, then the tests that are exercising functionality of our product should be catching these issues.

This ‘check’ is simply a notification that the Azure version has changed. It is now used as a trigger for someone to manually invalidate the SQL Compare record/replay traces, update the expected version and rerun the tests.

What’s the message?

In conclusion, this isn’t really about that specific Azure test (see below); it’s a reminder not to abuse test automation by using it as a notification system.

It may seem like a good idea to write a test that checks some aspect of the system or environment, but what does it tell you about the product when it passes or fails? Automated tests should exist to provide information about the quality/stability of a product.

If there are undesirable behaviours in the product caused by such environmental changes, then these should be found by tests run against the actual product, and if not, then additional tests should be created.

Status, environmental, or any other alerting information is far better placed on a monitoring dashboard, or delivered via some other notification system, than in your development pipeline.
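
As a rough illustration of that separation, the version comparison can live in a small job that reports a change instead of failing a build. This is only a sketch under assumptions: the function name version_change_notice and the version strings are hypothetical, and where the message ends up (dashboard, chat channel, email) is left open.

```python
from typing import Optional


def version_change_notice(last_seen: str, current: str) -> Optional[str]:
    """Return a message when the platform version has changed, else None."""
    if current == last_seen:
        return None
    return f"Azure version changed: {last_seen} -> {current}"


# Placeholder versions; a real job would read the last seen value from
# storage and query the service for the current one.
notice = version_change_notice("1.0.0", "1.0.1")
if notice is not None:
    # A real job would send this to a dashboard or notification channel.
    print(notice)
```

Run on a schedule outside the pipeline, something like this keeps the information visible without turning a provider release into a red build.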

I know you’re dying to know…

So… what did we do with this Azure ‘check’? Long story short, we deleted it!

After a brief discussion about the consequences of not having the test, we decided that the existing tests would likely catch a major issue should one be introduced, and we removed it.

We have a couple of tasks on the backlog to investigate using the Azure version information to trigger rebuilds, among other things, but for now our tests will no longer fail every time a new version of Azure is rolled out (unless it breaks our product, of course!).