Starting Web Testing From Zero

Darren Hooper
TextNow Engineering Blog
7 min read · Nov 19, 2019
The Testing Pyramid with E2E, Integration, Unit in descending order and increasing size.
The Testing Pyramid

At some point you’ve likely seen the above chart or a variant of it — the testing pyramid as originally described by Mike Cohn in his book Succeeding with Agile. It describes a breakdown of the main types of testing and the proportions of each. The sections themselves have been a matter of debate: E2E vs UI, Integration vs Service, Unit vs Component. Others have proposed additional levels for API testing, component testing and more. There is a significant amount of useful and interesting discussion around the right way to approach a holistic testing strategy.

All of which seems hopelessly distant when you’re staring at a codebase with no automation and the bare minimum of a manual testing strategy. All of this sounds great, but how in the world do you get there? This was the situation I found myself in when I started at TextNow five years ago. Our testing capabilities have expanded massively since then, and I wanted to share some of our learnings over that half decade.

Let me start at the beginning, and introduce you to our original testing methodology.

It All Started With…

Picture of a person
It’s like looking into a mirror.

Having a single dev (me) both coding and testing was fraught with issues. The split focus meant slower production and less testing. Our initial solution was to throw more bodies at the problem. Using a third-party testing company, we were able to develop a suite of manual tests that could be run on every release. This meant we could be reasonably confident that when we put something out, we weren’t unknowingly breaking everything else.

This was an improvement, but it came with a number of drawbacks — ones endemic to any strategy based solely on manual testing. Much of what came later was an effort to mitigate their severity.

Unit test the world

The first drawback was the low frequency of the testing. Tests were run on a per-release basis, which, at the time, meant every two weeks. Things could be broken for weeks before anyone noticed. When problems were found, it became a scavenger hunt to figure out how and why the issue arose. An added challenge was that you then had to wait for another run to confirm that the problem was fixed properly and hadn’t introduced any consequent problems. This process was inefficient, but testing more frequently was cost- and time-prohibitive. We needed a cost-effective process that also informed devs, as quickly as possible, that they’d broken something — preferably even before the code was committed.

Our first pass at solving the low frequency of testing was unit testing: testing individual units or components of software in isolation. This task wasn’t quite as simple as it sounded; the code hadn’t been written with testing in mind. At times things were so entangled that trying to test a single line would pull in half the codebase. Work had to be done to isolate code and let us test one thing at a time.

Dependency injection — supplying a module’s dependencies from the outside so they can be swapped for mocks — was the key to this task. It wasn’t trivial in JavaScript, especially with an older technology like RequireJS. But through perseverance, a tool called SquireJS, and a substantial amount of cursing, we were able to get dependency injection, and thus, isolation. A new rule was instituted: any new code had to have tests before it could be merged. As for the old stuff? If you were touching it, add tests; otherwise, leave it alone. It didn’t seem practical to try to do everything all at once.
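To make the idea concrete, here is a minimal sketch of dependency injection in plain modern JavaScript. Our actual setup used RequireJS with SquireJS to swap whole modules; the `createSignup`, `api`, and `analytics` names below are hypothetical, chosen purely for illustration.

```javascript
// Hypothetical example: a sign-up module that receives its dependencies
// as arguments instead of creating them itself.
function createSignup({ api, analytics }) {
  return {
    async register(email) {
      const user = await api.createUser(email); // a network call in production
      analytics.track('signup', { id: user.id });
      return user;
    },
  };
}

// In a unit test, the real modules are replaced with in-memory fakes,
// so only the sign-up logic itself is exercised.
const events = [];
const signup = createSignup({
  api: { createUser: async (email) => ({ id: 1, email }) },
  analytics: { track: (name, data) => events.push({ name, data }) },
});

signup.register('test@example.com').then((user) => {
  console.log(user.id, events[0].name); // the fakes were exercised, not the network
});
```

Because the dependencies arrive from outside, the test never touches the network or the real analytics pipeline, which is exactly the isolation we were after.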

Chart showing code coverage
This was more successful in some places than others

Manual testing en masse

So we had the ability to quickly tell developers they’d broken a line or a function. However, that didn’t actually let us know if the whole system was working; oftentimes issues would arise in the integration of multiple components. This meant continued manual testing was required. For manual testing to be valuable, the tests had to remain up-to-date and properly described.

Graph showing near 100% pass rate
That much green can be awfully deceiving

Properly describing every detail and every expectation turned out to be the more serious issue. There is often a large amount of implicit knowledge known to only one or two people. Things that are automatic to you must be laid out in excruciating detail for others. All this, of course, is assuming that you remember to write it down at all.

This pitfall was made apparent to us in the final days of a large project. The decision was made to do an internal regression of the whole application; the number of issues found was distressing. An investigation was launched into how these issues could have crept in unnoticed. There were supposed to be tests to cover many of these behaviours. The investigation showed that many of these tests were out of date or poorly described. All the confidence we had in our manual testing evaporated in moments.

So it was determined: We would do the testing ourselves, and we would do it better. We knew what we were looking for, so if we tested our own stuff, fewer things would be missed. It worked. Testing things ourselves drastically cut the number of issues getting into production…but it was also taking an eternity. While the team had grown significantly since I started, it was still taking us the better part of the day to test everything. This solution was expensive, overly time-consuming, and monotonous.

Luckily, there was a technology that was excellent at doing the same thing again and again, the same way, every time.

Robots took our jobs

Fast, accurate, consistent, hands-free

As you can see above, computers are fantastic at following instructions; in fact, that’s all they do. Using a tool called Selenium, we were able to translate those messy human instructions, open for interpretation, into cold, factual, computer-readable instructions. This not only let us run our testing in a fraction of the time, but we had a source of truth for how things were supposed to work. If someone changed a behaviour and it no longer matched spec, a test broke, no debate.
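The translation itself looks roughly like this: a manual test case (“log in, then expect the inbox to appear”) becomes a script of unambiguous steps. In the real suite those steps drive a Selenium WebDriver session; in this sketch a stubbed in-memory page stands in for the browser so the idea runs anywhere, and all names are illustrative.

```javascript
// Sketch: a manual test case expressed as machine-checkable steps.
// In production these calls would go through Selenium; here a stub
// page object plays the part of the browser.
async function loginShowsInbox(page) {
  await page.type('#email', 'user@example.com');
  await page.type('#password', 'hunter2');
  await page.click('#login');
  if (!(await page.isVisible('#inbox'))) {
    throw new Error('Expected #inbox after login'); // failure is binary: no debate
  }
  return 'pass';
}

// A trivial in-memory "page": clicking login with an email filled in
// reveals the inbox.
function fakePage() {
  const fields = {};
  let inboxVisible = false;
  return {
    async type(sel, text) { fields[sel] = text; },
    async click(sel) { if (sel === '#login' && fields['#email']) inboxVisible = true; },
    async isVisible(sel) { return sel === '#inbox' && inboxVisible; },
  };
}

loginShowsInbox(fakePage()).then(console.log); // "pass"
```

The key property is that nothing is open to interpretation: either `#inbox` is visible after the steps run, or the test throws.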

This was a revolution in how our team operated. At the unit level, we could know if things were broken before ever even committing. At the end-to-end level, we could find errors in mere hours rather than days or weeks. This also allowed us to expand our testing significantly. If the tests were already there, it was minimal trouble to run on more browsers and more operating systems, ensuring that our product worked for all of our customers.

What to do with all our free time?

The continued push to test more quickly and more robustly had a happy consequence: it led us to add other tools. After all, we still had to do some manual testing, and we wanted it to be as simple as possible.

We developed the ability to create users in a wide variety of states, which sped up the setup of complicated scenarios. It also allowed for greater parallelization of automation, as we no longer had to worry about multiple runs using the same account.
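A minimal sketch of the idea, with entirely hypothetical state names and fields: a factory mints a test user in a named starting state, giving each one a unique ID so parallel runs can never collide on the same account. A real implementation would call a backend provisioning endpoint.

```javascript
// Hypothetical test-user factory. States and fields are illustrative.
let counter = 0;

function createTestUser(state = 'fresh') {
  const id = `test-${Date.now()}-${++counter}`; // unique per run and per call
  const presets = {
    fresh:     { verified: false, credits: 0,  conversations: 0 },
    verified:  { verified: true,  credits: 0,  conversations: 0 },
    powerUser: { verified: true,  credits: 50, conversations: 200 },
  };
  if (!(state in presets)) throw new Error(`Unknown state: ${state}`);
  return { id, state, ...presets[state] };
}

const a = createTestUser('powerUser');
const b = createTestUser('powerUser');
console.log(a.id !== b.id); // true: two runs can't share an account
```

Seeding complicated states this way replaced minutes of manual clicking per scenario with a single call.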

Since collisions weren’t a concern, wouldn’t it be nice to run the full test suite on every PR? Introducing our new environment-per-PR capabilities, courtesy of Kubernetes.

GitHub comment giving a link to a custom environment.
It’s a seemingly little thing, but the quality of life improvement was staggering
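The shape of such a setup, sketched very loosely: each PR gets its own namespace and deployment running the image built from that branch. The PR number, image registry, and resource names below are placeholders, not our actual configuration.

```yaml
# Hypothetical sketch: one preview environment per pull request.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-pr-1234          # placeholder PR number
  namespace: preview-pr-1234 # namespace created per PR, deleted on merge
spec:
  replicas: 1
  selector:
    matchLabels: { app: web-pr-1234 }
  template:
    metadata:
      labels: { app: web-pr-1234 }
    spec:
      containers:
        - name: web
          image: registry.example.com/web:pr-1234 # image tagged per PR build
```

Tearing the namespace down when the PR closes keeps the cluster from accumulating stale previews.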

It became a breeze to just send a link to our design team or a stakeholder who wanted a preview, rather than hauling your laptop over to their desk — a feat that was increasingly difficult now that we had offices in two different countries.

One concern that can arise from this: what happens to the manual QA testers? Aren’t they out of a job? Not even remotely. Their jobs just got far more interesting. Now they could do the things that humans are actually good at: usability testing, exploratory testing, and so on. In our case, they were able to become fully fledged developers. So we were getting even more done, and with increased confidence.

In summary

We’ve come a long way: from a single developer coding and testing, to outsourced manual testing, to automation, and finally a combination of all three. We’ve made our product more resilient to changes and less prone to bug-riddled releases being set out into the world. And most importantly, we have more time to spend on developing, and less on investigating.

Now, if we think back to that testing pyramid we started with, there is one more layer to uncover: integration-level tests. But…we’ll save that for the next post 😉.
