Microservices: High quality code development — The Theory

By Daniel Perazza, Emilio Gerbino and Nicolás Quintana

Published in intive Developers · 12 min read · Apr 25, 2023

Introduction

Only what is measured can be improved. But are you sure you have the right measure? How can you tell your microservice is doing the right thing? And if you have, say, 87% unit test coverage, is it meaningful coverage, or just coverage? How can you tell the difference, and how can you prove your development has the appropriate level of quality?

The purpose of this article is to address these questions by presenting a theoretical approach that combines two different techniques: Semantic Testing and Mutation Testing.

Before reading on, make sure you are familiar with two concepts, unit testing and microservices, as they are the basis on which we will build everything else.

Hope you enjoy the journey.

Addressing issues

Scenario

Adopting a microservices architecture clearly has several advantages, but it also presents some challenges. One of them is guaranteeing the quality of each microservice (or small set of microservices), with a clear purpose: making the whole integration smoother. Otherwise, a lot of effort, and of course a lot of time, would be required to make all the microservices work together seamlessly.

So, a key concept is to ensure that any digital capability (defined as a small set of microservices with a single functional purpose) behaves exactly as expected. This is particularly important, as it greatly simplifies the troubleshooting required when putting the whole system together before going into production, and, once in production, it helps to quickly diagnose and fix any issue.

One traditional technique microservices developers use is relying on a good set of unit tests to ensure the quality of the delivered component. Unit tests are great, but they are just the beginning: the building block that is necessary, but not sufficient, for everything else.

The very first problem is how to differentiate mere coverage from “meaningful” coverage. In other words, this means not just covering lines of code, but making sure to write the set of unit tests that covers all real conditions. Failing here may create a false sense of confidence in the code while leaving room for potential problems that will eventually surface. Only high quality UT (unit tests) will solve this issue.

The second problem is that, even with a good, high quality set of UT, a digital capability may still be hard to integrate with the rest of the microservices in the system. How can that happen? UT guarantee the quality of very small pieces of code, but we still need to guarantee the behavior of the entire capability before integrating it with all the others. In other words, UT certify the pieces separately, but we must also make sure that, when put together, the resulting behavior is correct. Failing here will definitely lead to rework.

Let’s go deeper into these concepts, but first some basic theory.

Some basics

The goal of the software development process is to create software that meets the domain requirements and contains no defects. However, achieving this goal is almost impossible: most of the time, not all requirements are known from the beginning and, moreover, software can become so complex that there is no way to be absolutely sure it contains no defects.

Quality in microservices is the combination of several factors, the most important being: a team or organization willing to work on this topic; an understanding of testing time and effort; building testable applications; the theory and principles of testing; and tools such as static analysis, reports, CI/CD, mutation testing and test coverage.

Theory and principles of testing

There are many different types of tests, and the overlap between them can be confusing when going into implementation. The test pyramid summarizes them in 4 categories or types:

  • Unit tests: UT ensure that a function, an object or any small piece of code does what is expected. The idea of a unit refers to a piece of code with a unique purpose and clear boundaries. Unit tests operate at the most basic level and scrutinize the behavior of those small, well defined pieces of code, exercising the software against specific inputs and comparing the results. Dependencies should be mocked. They provide early feedback, as they are fast and easy to implement.
  • Component tests: They test several units of code working in coordination as a component. This kind of test ensures that the component behaves as expected. Dependencies outside the component should be mocked.
  • Integration tests: They evaluate how all the parts work together. Some dependencies should be mocked, while others remain in place to check the integration. They are quite similar to component tests; the difference is that they focus on the integration.

Any given project may have both integration and component tests, or just one of them, testing both the integration and the purpose of the component.

  • E2E tests: They test the whole system, similar to the way a client would use the product. They validate functionalities and some non-functional requirements. All the pieces of software are exactly the same as the production ones.
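To make the unit level concrete, here is a minimal sketch (the PriceService and its currency-client dependency are invented for illustration, not taken from any real project): the external dependency is mocked so the test exercises only one small, well defined piece of code and gives fast, isolated feedback.

```python
from unittest.mock import Mock

# Hypothetical unit under test: a service that depends on an external
# currency client. At the unit level, that dependency is mocked.
class PriceService:
    def __init__(self, currency_client):
        self.currency_client = currency_client

    def price_in(self, amount_usd, currency):
        rate = self.currency_client.get_rate("USD", currency)
        return round(amount_usd * rate, 2)

def test_converts_using_mocked_rate():
    client = Mock()
    client.get_rate.return_value = 0.92   # the dependency is mocked, not real
    service = PriceService(client)
    assert service.price_in(100, "EUR") == 92.0
    client.get_rate.assert_called_once_with("USD", "EUR")  # interaction check

test_converts_using_mocked_rate()  # fast and easy: early feedback
```

Because the currency client is mocked, the test stays fast and deterministic, which is exactly what the base of the pyramid demands.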

Testing metrics

Developers apply multiple techniques to ensure that the code behaves the way it is supposed to. It is therefore imperative that developers create good quality unit tests, considering that a poorly designed or inefficient test suite can be as harmful as having no tests at all.

To do this, developers need metrics that help them track how well the software is tested, and this is where code coverage comes into play.

Code coverage is a metric that indicates to what extent the source code is executed by the test suites. It is composed of several sub-metrics that check whether all statements, branches (or conditionals) and paths are exercised when running the test suites. In other words, code coverage states how much of our code is actually reached by our tests.

Satisfying this metric, i.e. achieving a high percentage of code coverage, neither guarantees that defects will be detected nor proves their absence. Furthermore, coverage metrics do not assess how well the results of the program are checked, which can end up in tests that trigger a defect but do not detect it.

A code coverage report of 100% only means that the tests have executed all the code; it does not tell us whether the code is correctly tested or doing what it is intended to do.

So this kind of metric cannot accurately assess the quality of the checks performed to detect defects, although some correlation can be shown. It is still useful, though, since it helps developers identify the sections of the source code that are not being evaluated by their test suites.

The next section presents solutions to overcome these problems.

Potential solutions

As described in the previous section, UT are the base: necessary, but not enough to address the issues. The lack of UT is simply catastrophic, as a good UT suite provides the following benefits:

● Reduces the number of bugs that reach production.

● Encourages better, more modular code design, and enforces the SRP (single responsibility principle).

● Allows adding new features without breaking existing ones.

● Supports refactoring, which can be tackled without major regressions.

● Reduces development costs through early detection of problems that would otherwise have a larger impact in later stages.

● Helps document the code, since tests explicitly describe how a certain section of the software has to work.

So, this is a great first step. But one step further is to introduce “Mutation Testing” and “Semantic Testing” to ensure both high quality at a fine grain and whole-system quality with regards to the digital capability being developed.

Let’s go deeper into these last two concepts.

Mutation testing

Mutation testing is a technique used to evaluate and improve the quality of unit tests, and in consequence, the quality of the code. This technique consists of mutating the source code by introducing artificial defects; the resulting altered versions are called mutants. The mutated software is then evaluated against the existing tests to see whether the alterations are detected.

A mutant is considered detected, or killed, when at least one test that passes the control run (against the original code) fails when run against the mutated code. On the other hand, if no test fails against the mutated code, the mutant is considered undetected, or a surviving mutant, meaning that the test suite is unable to identify the defect and differentiate the original code from the defective one.

The defects created by mutation testing are produced by mutation operators: well-defined rules that describe how to change elements of the software, aiming to mimic typical errors that programmers make. Usually, a mutation operator can be applied to the source code in multiple places, each application leading to a new mutant.
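As an illustrative sketch (the function and tests are invented for this example), here is a single mutation operator at work: a relational operator replacement turning a strict comparison into a non-strict one. The mutant survives weak tests but is killed by a boundary-value test.

```python
# Original: free shipping for orders strictly over 50.
def free_shipping(total):
    return total > 50

# Mutant produced by a relational operator replacement: '>' becomes '>='.
def free_shipping_mutant(total):
    return total >= 50

# Weak tests pass against BOTH versions, so this mutant would survive them:
for total in (100, 10):
    assert free_shipping(total) == free_shipping_mutant(total)

# A boundary-value test kills the mutant: it passes against the original
# code but would fail if run against the mutated code.
assert free_shipping(50) is False        # original: 50 is not strictly > 50
assert free_shipping_mutant(50) is True  # the mutant differs here: killed
```

A surviving mutant like this one points directly at the missing boundary test, which is exactly how mutation testing improves a suite.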

The results of this testing process are summarized in the mutation score, which provides a quantitative measure of test suite quality. The mutation score is the percentage of detected mutants over the total number of mutants generated.
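As a tiny sketch, the score calculation is straightforward:

```python
def mutation_score(killed, total):
    """Percentage of generated mutants detected (killed) by the test suite."""
    if total == 0:
        raise ValueError("no mutants were generated")
    return 100.0 * killed / total

# e.g. 180 of 200 generated mutants were detected by at least one failing test
print(mutation_score(180, 200))  # -> 90.0
```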

It is important to note that mutation testing does not test the software directly; instead, it tests the quality of the already existing tests and helps to improve them. The assumption is that tests that detect more mutants will also detect more potential defects. This is based on two hypotheses, “the competent programmer hypothesis” and “the coupling effect”. The first states that programmers have enough knowledge of the correct software that they tend to develop code close to it, deviating from it only because of small defects. The second indicates that a test suite sensitive enough to detect these small deviations from the correct version will also detect more complex errors.

Semantic Testing

When implementing tests, there are many suitable frameworks and tools to choose from. Some tools allow writing tests in natural language, others require a development language, and some support both.

Let’s imagine paying full attention to writing the tests, in natural language, without coding, without dealing with implementation details. Just focusing on the desired quality in a very easy-to-understand fashion. This is actually possible by using semantic tests.

Semantic tests enable developers, testers and other stakeholders to write and validate:

● Acceptance criteria cases.

● Any kind of requirement, expressed as a test in natural language.

● Any other tests defined in a test plan or test suite.

● BDD (Behavior Driven Development) scenarios.

So, what is a semantic test exactly?

First of all, it is a concept that focuses on what to validate rather than how to implement that validation.

Secondly, it is an implementation technique that lets you see a test as a view plus its implementation. There are tools, frameworks and applications in the software world that allow this, and even your own developed version could be used.

To summarize the key idea of semantic tests, and to extend the concept to other cases: there is a business-readable test, or set of steps, that validates and verifies some functionality or acceptance criteria in order to assure quality. Behind this natural, human-friendly test lies the machinery needed to implement the validation and verification.
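A minimal sketch of that machinery in plain Python (the step phrases, the STEPS registry, run_scenario and the fake account store are all invented for illustration; real BDD-style tools do this far more robustly): the scenario reads as natural language, while a registry maps each phrase to the code that performs the actual validation.

```python
# Toy in-memory state the steps act on (purely illustrative).
accounts = {"alice": 100}

def check(condition):
    assert condition, "semantic check failed"

# The "machinery": each business-readable phrase maps to an implementation.
STEPS = {
    "the user has a balance of 100": lambda: accounts.update(alice=100),
    "the user withdraws 40":         lambda: accounts.update(alice=accounts["alice"] - 40),
    "the remaining balance is 60":   lambda: check(accounts["alice"] == 60),
}

def run_scenario(scenario):
    for phrase in scenario:
        STEPS[phrase]()   # execute the code behind each readable step
    return "passed"

# The "view": a test anyone can read, with no implementation details in sight.
result = run_scenario([
    "the user has a balance of 100",
    "the user withdraws 40",
    "the remaining balance is 60",
])
print(result)  # -> passed
```

The scenario list is the business-readable view; the STEPS registry is the machinery behind it, which is the separation the semantic approach is built on.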

Semantics Monitoring

Nowadays, with observability tools, this semantic concept can be extended to monitoring systems, adding one more feature to our dashboards, alerts, etc. These tools provide the mechanism to define a test, or set of steps, in a view with all the associated metrics and actions, as well as the capability to implement those steps.

So anybody in charge of monitoring the applications, microservices and other components, already familiar with the observability tools can take advantage of this and apply the semantic concept to further enrich the monitoring ecosystem.

Adding concepts together

Mix everything together and the magic is done! Surely not, but by combining these techniques the results will be much better.

Mutation concepts make tests more accurate, sensitive and reliable; furthermore, they can be applied at any level of the test pyramid, although for simplicity and time reasons it is enough to apply them just to unit tests. Combining mutation concepts with good assertions produces good quality tests. In other words, implementing mutation testing transforms plain, simple coverage into “meaningful” coverage, with all its associated advantages.

Semantic tests allow us to focus on validating and verifying functionality regardless of its implementation. While mutation testing is great for certifying the quality of the UT, semantic testing is great for certifying the quality of the whole component or microservice, and it does so in a very easy-to-understand, human-friendly manner that everybody can acknowledge.

Semantic monitoring is the natural next step. It brings exponential benefits to the existing monitoring ecosystem since not only is information received from the system, but feedback is also obtained by exercising the whole component functionality under monitoring through semantic tests.

When both approaches are combined in a single project, quality is significantly boosted. Semantic tests cover most of the functionality and its underlying code, while mutation testing ensures truly reliable UT coverage. Having just one of the two approaches in place would paint only half of the picture; with both together, quality gets heavily reinforced.

Ok, enough theory, let’s move to the practice :)

Pros and Cons

Mutation Testing

This type of test enables developers to detect potential bugs and ensures that the tests are created with quality in mind. It also allows developers to understand and find cases and ambiguities in the source code that were not checked before.

And finally, mutation testing also ensures that by killing small mutants, bigger, more costly and riskier bugs are also eliminated.

On the other hand, the major factors that can affect the application of this method are:

  • Cost of producing and evaluating mutants: Depending on the number of mutants generated and the time it takes to run the test suites, the whole process can become extremely time consuming. The number of mutants generated also depends on the choice of mutation operators, so the more operators developers wish to use, the more mutants will be generated. With a test suite that takes about 5 seconds to run, a code base that generates 2,500 mutants can take roughly 3.5 hours to evaluate.
  • Cost of equivalent mutants: Equivalent mutants are changes to the original code that differ only in syntax, not in semantics, meaning the generated mutant behaves exactly like the original code. These mutants cannot be detected; they create false surviving mutants that distort the final mutation score and pose a problem for developers when interpreting the results and trying to eliminate mutants.
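To illustrate with an invented function (not from any specific codebase or tool), here is a classic equivalent mutant: replacing the loop condition’s strict comparison with an inequality changes the syntax but not the behavior, so no test can ever kill it.

```python
# Original: count items until the sentinel value 0 is reached.
def count_until_zero(values):
    i = 0
    while i < len(values) and values[i] != 0:
        i += 1
    return i

# Mutant: the relational operator '<' is replaced by '!='. Because i starts
# at 0 and only ever grows by exactly 1, it can never jump past len(values),
# so the two conditions are indistinguishable: this mutant is EQUIVALENT.
# No test can kill it, and it shows up as a false "surviving" mutant.
def count_until_zero_mutant(values):
    i = 0
    while i != len(values) and values[i] != 0:
        i += 1
    return i

# Both versions agree on every input, including the empty list:
for sample in ([], [0], [1, 2, 3], [1, 0, 2]):
    assert count_until_zero(sample) == count_until_zero_mutant(sample)
```

Spotting and discarding such mutants is manual work, which is why they inflate the cost of interpreting mutation testing results.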

Semantic Testing

Let’s mention some pros of semantic testing: it abstracts testing to the business domain without bothering with the code for those tests, and it enforces quality by leaving coding issues behind.

Another benefit is test understandability. Semantic testing, as a first step towards semantic monitoring, focuses on finding a standard way of testing software: concentrating on “what” the system should be doing, in a clear, easy to understand way. This is the key to ensuring the complete behavior of the whole digital capability.

The last advantage is improved reporting. Because of the way semantic testing works, it makes reports easier to produce and richer.

However, a disadvantage of this approach is the extra complexity it introduces. In the worst scenario, without a deep understanding of the underlying tools, tests may become unmaintainable and unscalable.

Going into practice

Glad you made it this far in the article! To find out how to move into practice, please read our next article, “Microservices: High quality code development — Going into practice”. For sure you will enjoy it :)

Annexes:

  • “Assessing Test Quality” — by David Schuler
  • “The Practical Test Pyramid”
  • “Microservices: High Quality Code development — Going into practice” — by Daniel Perazza, Emilio Gerbino and Nicolás Quintana
