Whitelist Testing vs. Blacklist Testing

From a IT security point of view, the current approach to GUI test automation is careless or even dangerous. And here is why…

A general principle in IT security is to forbid everything and only allow what is really needed. This reduces your attack surface and with it the number of problems you can encounter. For most situations (e.g. when configuring a firewall), this means to apply a whitelist: forbid everything and allow only individual, listed exceptions. And make sure to review and document them.

Compare this to the current state of the art of test automation of software GUIs. With tools like Selenium — the quasi standard in web test automation — it is the other way around. These tools allow every change in the software under test (SUT), unless you manually create an explicit check. With regard to changes, this is a blacklisting approach. If you are familiar with software test automation, you know that this is for good reasons. It is because of both the brittleness of every such check and the maintenance effort it brings about. But apart from why it is that way, does it make sense? After all, false negatives (missing checks) will decay trust in your test automation.

To be defensive would mean to check everything and only allow individual and documented exceptions. Every other change to the software should be highlighted and reviewed. This is comparable to the “track changes” mode in Word or version control as used in source code. And it is the only way to not miss the dancing pony on the screen, that you didn’t create a check for. At the end of the day, this is what automated tests are for: to find regressions.

Tracking changes in Word is what we want for our software also

Of course, for that approach to work in practice, there are a few necessary preconditions:

  1. We need the execution of the system under test (SUT) to be repeatable (e.g. use the same test data). This is a very sensible idea anyway. And it is way easier with today’s tools of virtualization and containerization than it was a couple of years before.
  2. We need to deal with the multiplication of changes. Every change to the software shows up in multiple tests, probably multiple times. E.g. if the logo on a web site changes, this may well affect each and every test. Yet it should be necessary to review a change only once.

The dose makes the poison

There is an ideal amount of checks for every software. Everything than can change without ever being a problem should not be checked. And everything that must not change should be checked.

Whitelisting (of unproblematic changes) vs. blacklisting (of problematic changes)

There are two important considerations when choosing between the two approaches:

  1. How do you reach that middle ground in the most effective way?
  2. What “side” is less risky to approach it from, if the perfect spot is missed?

IT security guidelines recommend to err on the side of caution. So in case both approaches create an equal amount of effort, you should choose whitelisting. But, of course, you usually don’t have equal amounts of effort.

Whitelisting vs. blacklisting in reality

A real-life example

Imagine you have a software that features a table. In your GUI test, you should put a check for every column of every row. With seven columns and rows, this would mean 49 checks — just for the table. And if any of the displayed data ever changes, you have to copy & paste the changes manually to adjust the checks.

Software under test with a table

Starting with a whitelisting approach, the complete table is checked per default. You then only need to exclude volatile data or components (typically build-number or current date and time). And if the data ever changes, maintaining the test is way easier, because you usually (depending on the tool) have efficient ways to update the checks. Guess which of the two approaches is less effort…


Text-based vs pixel-based whitelist tests

There are already tools out there that let you create whitelist tests. Some are purely visual/pixel-based, such as PDiff, Applitools and the like. This approach comes with its benefits and drawbacks. It is universally applicable — no matter if you check a web site or a PDF document. But on the other hand, if the same change appears multiple times, it is hard to treat it with one go. Whitelisting of changes (i.e. excluding parts of the image) can be a problem.

Approval Test and TextTest are text-based tools that are much more robust. But PDFs, web sites, or software GUIs have to be converted to plain text or images for comparison. Ignoring changes is usually done via regular expressions.

Shameless self-promotion:

I am only aware of one tool that is semantic, can be applied to software GUIs, is not pixel-based (although it can be), and easily lets you ignore volatile elements: ReTest.

If you liked this post, clap (as much as you want), twitter, share or otherwise help raise awareness. Thank you!