Achieving acceptable quality in software products

Author: Alexey Skorokhodov.

Someone at my previous company complained about Selenium testing being unreliable, and I decided to write up some thoughts. This is a generic approach, so "your mileage may vary".

The problem.

I have seen people trying to "make UI testing more reliable because developers are frustrated and don't look at results anymore" (c) over and over and over again. You join a company and hear these complaints, and a few years later, when you leave, you hear exactly the same complaints.

Of course, many Selenium tests can be made more reliable by using various "Best Practices". This is the right thing to do if you decide to use Selenium testing, but it does not eliminate the inherent problems I will describe here. This article is not about how to make Selenium tests better; just google recipes for that, there are plenty. This is about the bigger issue.

To start, I will say that I do like the idea of automating everything, but… Not “at any price”…

Let’s see:

* How much money did we spend in the last 3 years on Selenium tests?

* How many bugs did those tests find?

* What is the price of one found bug in $ value? What is the ROI?

* Is it worth it? Could the same or better results be achieved some other way?

After all, the end goal is NOT to make Selenium tests more reliable. The end goal, I believe, is to spend as little money as possible on achieving an acceptable level of quality in our software products.

This is an important thing to remember. Now, with that in mind -

Let’s categorize bugs.

Of course, there is an unlimited number of possible ways to group bugs. Here is the one I decided to use for this article:

1. Logical errors.

Bugs like incorrect IF() expressions. These are typically easy to catch with unit tests.
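
Mini-example (Java/JUnit 5), a minimal sketch: the discount rule, amounts, and method name are made up for illustration.

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class DiscountTest {

    // Hypothetical production code with a classic logical error:
    // the discount should apply when the total is at least $100,
    // but the comparison was written the wrong way around.
    static boolean isEligibleForDiscount(int totalCents) {
        return totalCents <= 100_00; // bug: should be >=
    }

    @Test
    void largeOrderGetsDiscount() {
        // Fails immediately and points straight at the broken IF().
        assertTrue(isEligibleForDiscount(250_00));
    }
}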

2. Wiring errors.

Functions work fine by themselves but fail in production because someone forgot to initialize or wire something together, or did it incorrectly. These are caught by integration tests.
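
Mini-example (Java/JUnit 5), a minimal sketch with hypothetical class names: each class passes its unit tests in isolation, but the object graph was assembled with a missing dependency, which only a test of the wired-together system exposes.

import org.junit.jupiter.api.Test;

class EmailService {
    void send(String to, String body) { /* works fine on its own */ }
}

class OrderService {
    private EmailService emailService; // the wiring step that was forgotten

    void placeOrder(String customerEmail) {
        // Unit tests that mock EmailService pass happily;
        // only the real, wired-together path blows up.
        emailService.send(customerEmail, "Order confirmed"); // NullPointerException
    }
}

class OrderServiceIntegrationTest {
    @Test
    void placeOrderSendsConfirmation() {
        // Build the object graph the way production does and exercise
        // one real flow; this fails with NullPointerException.
        new OrderService().placeOrder("user@example.com");
    }
}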

3. Multi-threading errors.

For the purpose of this article I will consider these a separate category. They can mostly be found during code review by humans. Some can be found by tests, but not reliably, and only when you know exactly where to look and how to write a proper test. In that case you already need to know the code, which brings us back to code review.
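
Mini-example (Java), the classic lost-update race: a test around this code may pass on one run and fail on the next, which is exactly why a reviewer who knows that count++ is not atomic will find the bug faster than any test suite.

class Counter {
    private int count = 0;

    // Looks harmless, but count++ is a read, an increment, and a write;
    // two threads can interleave and lose updates.
    void increment() { count++; }

    int get() { return count; }
}

class CounterRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Runnable work = () -> { for (int i = 0; i < 100_000; i++) counter.increment(); };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        // Expected 200000. It often prints less, but sometimes it does
        // print 200000, which is why a test asserting the total is flaky.
        System.out.println(counter.get());
    }
}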

4. “Ugly” errors.

Everything "works", but the end result is unusable. For example, the "Buy" button is not visible on the website because of overlapping elements or a wrong color. These are "rendering errors". They can only be caught by humans.

Mini-example (CSS):

body { display: none !important; }
or
body { max-width: 1px; max-height: 1px; overflow: hidden; }

Many Selenium tests will pass with this :). The same goes for overlapping and/or hidden buttons or other elements. From Selenium's point of view, everything is there, but the app is unusable for people.
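
Mini-example (Java/Selenium), a minimal sketch; the URL and element id are hypothetical. A typical presence-style check passes as long as the button exists in the DOM, and even isDisplayed() can return true for an element buried under an overlapping banner.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class BuyButtonCheck {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://shop.example.com/product/42"); // hypothetical URL
            // Presence check: passes if the element is in the DOM at all.
            // isDisplayed() checks CSS visibility rules, not whether another
            // element is painted on top of it.
            if (!driver.findElement(By.id("buy-button")).isDisplayed()) {
                throw new AssertionError("Buy button not displayed");
            }
        } finally {
            driver.quit();
        }
    }
}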

Automated UI testing is an attempt to go after ALL categories of errors at once, while being inherently flaky and slow, and inherently unable to detect multiple kinds of "ugly" errors.

Assuming that your unit and integration tests pass (and they cover enough), what is it that you are really trying to test at the browser level?
That Google Chrome can parse valid HTML and show it on the screen?
Or that a JavaScript library you are using can render a correct JSON response?

You should not test Google Chrome itself; Google does that for you. And you should not do automated testing of third-party JavaScript libraries; that is the job of those third parties.

Given that unit and integration tests pass, there is little that can break in the rendering layer. And many of those things that CAN break cannot be detected by automated UI-level testing without this being prohibitively expensive.

Now, this is where people say things like “well, yeah, that is all great, but we don’t have enough unit and integration tests, so we will write Selenium tests and then try making them more reliable” (please see this excellent article: Riding a Dead Horse).

And let’s go back a few chapters -

How much money did we spend on Selenium tests? How many unit and integration tests could we get at a similar price, and in a similar time frame, going forward?

Proposals.

1. Test allocation.

Have ~45% of tests at the unit level, ~45% at the so-called "integration level", and then only a few basic tests in Selenium. Plus some manual UI testing. Yes, manual. As much as people hate this idea, several kinds of bugs are prohibitively expensive to find any other way. You should not need to retest the whole site manually. Why? Let's see:

When do people usually run automated Selenium tests? Usually for each new build. Why? Because a change made by a software engineer could have broken something.

But is retesting the whole website with Selenium for all code changes the right approach to begin with?

I would argue that code changes should not affect the app in multiple places, but that requires a modular approach and strict boundaries between modules. That is a separate topic.

In general, if you made a code change that affects the whole site (say, a JavaScript popup automatically showing on all pages because it was added to something like "footer.jsp"), then all web UI-level tests will fail anyway. So having just a few "happy path" UI-level tests should suffice, as in the sketch below.
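
A minimal sketch of one such happy-path test (Java/Selenium; the URL and element locators are hypothetical): a single walk through the core flow of the site. A site-wide breakage, like the popup above, fails this one test, so dozens of page-by-page Selenium tests add little on top of it.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class HappyPathSmokeTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // One end-to-end walk through the money-making path.
            driver.get("https://shop.example.com/");            // hypothetical URL
            driver.findElement(By.linkText("Bestsellers")).click();
            driver.findElement(By.id("buy-button")).click();
            driver.findElement(By.id("checkout")).click();
        } finally {
            driver.quit();
        }
    }
}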

2. Actionable item.

To achieve the "test allocation" proposal above, reallocate all resources from writing useless Selenium tests to unit and integration tests.

3. Code reviews.

If your toilet pipes are leaking, you need to fix them before you start mopping floors.

Spending time (==money) on continuously mopping floors and installing multiple sensors to measure the amount of “stuff” leaking from broken pipes is not going to get you anywhere.

Fix the broken pipes first!

Testing is not the only way to achieve quality. It is only one of the layers in the dev process. Product quality starts with good development processes, which include design discussions, continuous code reviews, using the "right tool for the job", understanding WHAT needs to be tested and at WHAT level, etc.

Incorporate code reviews into ALL development processes, including writing tests. If you believe this will slow you down, ask yourself: if you don't have time to "do things right", then why will you have time to fix all the bugs later?

If code reviews slow you down — this is probably because the code had issues and you had to go through multiple iterations to make it better. The problem lies not in the review process, but in the bad code to begin with.

If people do not treat code review requests with high priority — this is a process issue and can be solved.

There are several ways to make code reviews *useful*. These include having strict review guidelines, having experienced engineers, etc. There is no point in having reviews if they are a pure formality and the general attitude is "whatever, it looks okay to me, commit this".

By making GOOD code reviews A MUST in all dev activities, you:

  • share knowledge
  • teach people how to do things better
  • and find many logical errors/bugs at the same time!

This way you build better teams day after day, rather than keep adding [possibly low-quality] code that may be a constant source of bugs later. This is not about some abstract "achieving zen and pure beauty while programming". This is about saving a lot of money for your company.