Quality is not synonymous with testing. Automated testing in a CI environment is necessary, and is often seen as the primary measure of quality, but in fact does very little to produce a ‘quality’ product.
There are those who proclaim “automation is not real testing” and those who insist “manual testing is just a waste of time, automate all the things”. Both groups are right, yet wrong. You need automation and you need manual testing, but you need much more to create a product that is of shippable quality.
I use the term “shippable quality” deliberately (and probably controversially) because the acceptable level of quality of a piece of software will often depend on its stage in the life cycle and the intended audience. For example, acceptable software quality for an alpha release will be different to a v1 release.
Alan Page and Brent Jensen coined the term Modern Testing, along with seven principles that define it, and it is very close to the approach we are taking at Redgate. I’d like to describe how one of our teams operates, and perhaps this will help show how quality just isn’t testing!
When I started working with one of our dev teams in the role of quality coach, it became apparent that they were already ahead of the game regarding testing. Their coding culture included writing unit tests (with a strong shared sense of what that meant), integration tests, and a nod to some end-to-end UI smoke tests. More on the details of these later.
As we (should) know, quality isn’t just about software testing, and it is certainly in the eye of the beholder. The team place a huge emphasis on building the right product, performing user research to test out ideas, functionality and workflows to ensure they are building the right thing at every step. This is a massive advantage, as they are giving themselves every opportunity to fail fast, learn and iterate. Rather than waiting for one big-bang release of the ‘perfect’ product, they run a targeted beta programme in which a known good-enough release is made available, getting it into the hands of users with their own domains, databases and environments. This is certainly not a case of letting the users do the testing; it is a case of getting as much real user feedback as early as possible to inform decisions about the product.
But what about the software testing? I’ve already mentioned unit, integration and UI tests, but what is the team actually doing?
The team are building a culture that focuses on quality. It’s important to point out that the team are not striving for perfection; they are striving to build great, valuable software. Achieving this requires fitting together many pieces of a jigsaw puzzle:
Pull requests (with templates)
The team uses pull requests to merge onto the master branch. These pull requests have a template in which the author must state what testing they have done on the change they are proposing. Introducing what is effectively a checklist helps ensure the author and reviewers do not forget important tasks.
Code reviews are a very important part of the quality story, and serve to spread codebase knowledge among the team.
Having this template makes it easier for the rest of the team to review and critique not only the product code, but the testing activities as well. This peer enforcement is so powerful and far more effective than any top-down management.
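As a rough sketch of what such a template can look like (this is an illustrative example, not the team’s actual template; on GitHub, for instance, it would typically live at `.github/PULL_REQUEST_TEMPLATE.md`):

```markdown
## What does this change do?

<!-- Short description of the change and why it is needed -->

## Testing performed

- [ ] Unit tests added or updated
- [ ] Integration tests added or updated
- [ ] Manually tested (describe what you did below)
- [ ] Not tested, because:

## Notes for reviewers
```

The checkboxes are the point: the author has to actively claim what testing happened, and reviewers can challenge the claim.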
Unit tests

The team’s approach is that every code change, and any new code, must have unit tests unless there is a very good reason not to. Having unit tests helps to shape the architecture of the code, encouraging SOLID principles. Unit tests have no external dependencies, and each tests a single function of the code.
The backend of the application is written in C#, with the tests written in NUnit and coverage measured by NCover. Whilst code coverage is not a particularly meaningful metric on its own, it does give the team data to help identify high-risk areas with low unit test coverage.
The frontend code is a combination of TypeScript, React and Electron. The team chose to use Jest, a testing framework developed by Facebook that has support for testing React.
Integration tests

The integration tests are very similar to the unit tests, using the same testing frameworks on the backend and frontend. The difference is that they integrate several parts of the code, and can have external dependencies. Where possible, those dependencies are mocked out to improve reliability, reduce fragility and increase speed.
UI smoke tests (spectron/jest)
Automated UI tests have often been hailed as the replacement for manual testing, but anyone who has worked on them for any extended period will tell you that they are a royal pain in the arse! UI tests tend to be fragile and slow, and will break quicker than you can say ‘checking not testing’.
However, they do have their place in the wider test strategy. On this project, they are used to ‘smoke test’ the installed application. As the application is built on Electron, the team use Spectron (the WebdriverIO-based test framework for Electron) to drive it.
There will be a handful of scenarios that install and launch the application, check window titles, the initial page and so on, then walk through a very basic workflow. As part of the continuous integration pipeline, this gives the team greater confidence that their application at least installs and starts correctly!
Of course the temptation (and certainly the tendency of many UI test authors) is to go off on a tangent and start functionally testing the UI. This is very wasteful.
React component tests (enzyme/jest)
To test the UI while avoiding many of the pitfalls of end-to-end style tests, the team have opted to use a framework called enzyme. This enables the testing of individual React UI components by shallow-rendering them and then querying a JavaScript DOM to check things like event handling and whether elements within the component render successfully.
This allows the behaviours and logic within a component to be tested together in the context of a loaded, rendered component, even where some of that logic has already been covered by unit tests in isolation.
Continuous integration (TeamCity)

All Redgate’s products are built in the TeamCity CI system. This provides sterile, reproducible and consistent environments in which to build and run tests. The results and various artefacts are made available for the current build as well as all historic builds.
Any test failures will result in the build being marked as failed.
SonarQube (coverage/static analysis)
Redgate’s products are also subject to static code analysis using SonarQube. This inspects the code, identifies various ‘code smells’, code duplication and other less desirable issues, and gives the code a rating. It also takes any code coverage results from the test runs into account.
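Analysis is typically driven by a small `sonar-project.properties` file checked into the repository. The property keys below are standard SonarQube scanner settings, but the values are purely illustrative, not this project’s actual configuration:

```properties
# Illustrative configuration - project key, name and paths are made up
sonar.projectKey=example-project
sonar.projectName=Example Project
sonar.sources=src
sonar.exclusions=**/node_modules/**

# Feed existing coverage output from the test runs into the analysis
sonar.javascript.lcov.reportPaths=coverage/lcov.info
```

The CI build runs the scanner against this configuration, so every push gets the same analysis with no manual steps.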
SonarQube is a useful way of keeping a highly visible track of the quality of the code being pushed into the builds.
Exploratory testing sessions
Many people would look at the number of automated tests in this project and really question the need to do anything else. After all, the functionality is pretty much all tested, right?
However, the team have quite rightly identified the need to run exploratory testing sessions. These aren’t run on a regular cadence, but whenever the team feel they have done enough work on the product to introduce significant risk. I feel this is a very sensible approach.
The sessions follow a session-based test management format, which is explained in more detail in an article by Jon Bach (http://www.satisfice.com/articles/sbtm.pdf).
These sessions are invaluable for getting everyone on the team using their product in ways they wouldn’t normally, and taking the time to use it to achieve something. This will often raise usability or functional issues that they just hadn’t noticed until now.
User research calls
I mentioned earlier the importance the entire team place on user research calls. In fact, they spend about 20% of their time either on calls with users or discussing the findings.
The calls are organised and led by the UX designer on the team, but the whole team have input and participate. It’s so useful for the developers to see the struggles users have with parts of the product, and to get immediate feedback on new features. The feedback and findings from these calls can have a direct impact on what gets pulled into the next sprint.
Following several rounds of user research calls, the team put out a private beta release to a selected audience. This is currently underway and is being very closely managed: each of the recipients is being contacted to find out how they are getting on, and Intercom (see next section) is used to communicate with users.
Feedback via Intercom
Embedded into the product is a feedback mechanism to enable the user to communicate directly with the development team during the field trial stages of the project. This is done using a system called Intercom, and you’ll be familiar with the type of technology if you’ve ever used an embedded chat system on a web page.
What they haven’t done yet…
There are several obvious omissions from the above list. The team have made a conscious decision not to focus on performance or security testing at this stage. This is not to say they aren’t concerned with the performance or security of the product, but they feel their efforts are currently better spent elsewhere.
However, the performance of the product, specifically user-interaction performance, is important, and is assessed during the user research sessions.
The intention is to introduce tests for these areas later in the project. There are pros and cons to deferring them but, within the context of this product, it is an appropriate approach.
So quality really isn’t just about testing…
So quality is much more than just testing. Sure, testing gives us a level of confidence in the quality of our software, but it is one part of a multi-faceted approach. Gareth Bragg, one of our other quality coaches, introduces his model of quality with the concept of three dimensions. We tried to fit this model to the facets of quality mentioned in this article, and it evolved into overlapping dimensions where certain actions affect more than one. What was interesting was that we felt the only activity that sat in the sweet spot across all three dimensions was monitoring. That makes sense, really: monitoring is somewhat broad, but can inform all areas of a product. While it doesn’t directly affect quality, it provides the data on which to base quality improvements.
We may write a follow-up article to Gareth Bragg’s original article, with more details about how we evolved his original model to this one…
Oh, did I mention that the team doesn’t have any testers? The team consists of 4 software engineers, 1 technical lead and 1 UX designer, yet they are still managing to do all of this. There is still a fair way to go, as the product is not feature-complete and is only now in private beta, but by developing the product in vertical slices, there is a good, justifiable feeling that they are building the right thing to a good standard.