Self-Certification: A Developer-Centric Test Pyramid for Microservices
--
TL;DR
Migrating from a traditional test strategy to a fully automated one with continuous deployment has been an exciting journey. We had to change our processes, technology stacks and the status quo to make that happen. This article is a deep dive into those aspects, followed by some lessons learned. An example git repo is linked at the bottom.
Backstory
A couple of years ago, we set out to redefine our overall automated testing strategy and reshape it from a classic ice-cream cone into a proper pyramid. For many good reasons, over years of rapid product growth we had added plenty of quality gates despite their cost. As our product entered the next phase of its evolution, with double the staff and functional scope, the conventional approaches that had once served us well were now keeping us from delivering faster.
We had tons of unit tests maintained by devs. We had an API test suite per microservice, owned by Quality Engineers (QEs). Although we made some effort to avoid overlaps, most API tests did become redundant, and with the API tests around, our focus on verifying functionality through unit tests gradually faded. We also had a myriad of Selenium E2E tests, again owned by QEs. And then yes, we had manual regression too. With so many E2E and API tests, our regression was tedious:
- Most of the time the tests failed for various reasons
- They took too long to execute, and having to re-run them usually affected release cadence
- We had so little confidence in the tests that we mostly relied on manual judgment to proceed
At the end of the sprint, we would bundle up everything and make a Go/No-Go decision after reviewing test results and known bugs. With a growing number of features and teams in two different geo locations, we could not keep up with this process, and we knew we could never achieve continuous deployment that way. So, as a collective, we decided to invest in simplifying this model.
Redefining Test Strategy
To get rolling, we started off with a few goals.
- Make components independently deployable
Instead of releasing a batch of components, we wanted each component to go live independently. We were not in the stone age; we did have decent CI/CD pipelines to deploy components to any environment with a click. It was the process that kept us from doing this, so we wanted it changed.
- Complete regression under 1 hour
Whatever test suites must run before a component gets deployed to production only get a time budget of 1 hour.
- Let anyone easily execute all kinds of tests
No more test owners! Engineers and teams must own the quality of their work. Therefore, all kinds of tests for a given component should be easily executable by anyone, either by intuition or with a simple manual like a README.md file.
- Make tests reliable
What's the point in having so many tests if they are crying wolf all the time? We set a clear guideline: if a test is flaky, it either has to be fixed for good or removed from the suite. (This is not easy for frontend components; more on that later.)

New Test Strategy — Component Self-Certification
In order to perform CD, we explored ways to self-certify architecture components and came across a great article on Martin Fowler's site, The Practical Test Pyramid. It explains how each component can have its own pyramid of deterministic tests. While most of those concepts aligned well with our objectives, we had to customize them a bit for our context.
Self-Certification Framework
Kinds of Tests
- Unit Tests: to test logic and the service layer
- Integration Tests: to test classes that directly handle external dependencies
- Functional Tests: to test REST endpoints or the outer layer of the component
In order to satisfy external dependencies for functional and integration tests, we use Testcontainers and APIs hosted in the QA environment. Testcontainers is a cool library that manages the lifecycle of Docker containers so that we can conveniently wire them into our application during test execution. For API dependencies we connect to real services in QA. (Which has its pros and cons — more on that later.)
Every component in our architecture is to have a self-sufficient test pyramid of its own, co-located with the source code. Running gradle test (or yarn test) executes the whole test suite locally as well as on the build server.
Boilerplate Code
At a glance, our test suite looks similar to a regular unit test suite that gets executed with the gradle test command (or yarn test). But underneath, a minimal amount of boilerplate code powers the self-certification suite with benefits such as:
- Flexibility to execute the full suite or individual unit, integration or functional suites (see the sketch after this list)
- Ability to execute individual tests from the IDE
- Parallel test execution
- Shared docker containers to improve performance
- Gradle Flyway Plugin manages DB DDLs (DB change management)
- Sharing scope is isolated between functional and integration suites
- Consolidated code coverage report with Jacoco/Istanbul
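For flavor, here is a minimal sketch of one common way to mark the split between suites, using JUnit 4 categories; the marker interfaces and class names are illustrative assumptions about one possible setup, not necessarily the exact mechanism in our boilerplate.

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Marker interfaces used to tag each kind of test (names are illustrative)
interface IntegrationTest {}
interface FunctionalTest {}

// Tagging a test class lets the build either run everything as one suite
// or filter down to a single category when only that suite is wanted
@Category(IntegrationTest.class)
public class CustomerRepositoryCategorizedTest {

    @Test
    public void belongsToTheIntegrationSuite() {
        // test body elided
    }
}
```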
We normally use JUnit rules to prepare test environments, but to keep functional and integration tests from becoming heavy, with each test class spinning up its own Docker containers, we created a few abstract classes with static fields that hold the Docker dependencies. The setup below worked for us.
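A minimal sketch of such a base class, assuming Testcontainers with a PostgreSQL and a Redis container; the TestEnvironment name matches the class referenced later, while the specific containers and property names are illustrative.

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.PostgreSQLContainer;

// Sketch of TestEnvironment.java: containers live in static fields so they are
// started once per JVM and shared by every functional/integration test class.
public abstract class TestEnvironment {

    protected static final PostgreSQLContainer<?> POSTGRES =
            new PostgreSQLContainer<>("postgres:13-alpine");

    protected static final GenericContainer<?> REDIS =
            new GenericContainer<>("redis:6-alpine").withExposedPorts(6379);

    static {
        POSTGRES.start();
        REDIS.start();

        // Hand the container coordinates to the application under test
        // (property names assume a Spring Boot service and are illustrative)
        System.setProperty("spring.datasource.url", POSTGRES.getJdbcUrl());
        System.setProperty("spring.datasource.username", POSTGRES.getUsername());
        System.setProperty("spring.datasource.password", POSTGRES.getPassword());
        System.setProperty("spring.redis.host", REDIS.getHost());
        System.setProperty("spring.redis.port", String.valueOf(REDIS.getMappedPort(6379)));
    }
}
```

Test classes simply extend this base class; because the containers live in static fields they are started once and reused, and Testcontainers cleans them up when the test run ends.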
Unit Tests
Nothing special here, but a couple of noteworthy points about good old unit tests (a short sketch follows the list).
- Make them super fast by avoiding field injection, which forces tests to spin up a heavy Spring container. Use constructor injection instead.
- Mock system date related operations to stay deterministic.
- If you have to use PowerMock, it may be a hint that some refactoring is needed, so use it wisely.
- Mockito and AssertJ can make unit tests cleaner and simpler to read.
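A minimal sketch of what that looks like in practice, assuming Mockito and AssertJ; OrderService, PaymentClient and the fixed clock are illustrative names rather than code from our repo.

```java
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import org.junit.Test;

public class OrderServiceTest {

    // Collaborators are mocked, so no Spring container is needed at all
    private final PaymentClient paymentClient = mock(PaymentClient.class);

    // A fixed clock keeps date-related logic deterministic
    private final Clock clock = Clock.fixed(Instant.parse("2021-01-01T00:00:00Z"), ZoneOffset.UTC);

    // Constructor injection lets us build the class under test directly
    private final OrderService service = new OrderService(paymentClient, clock);

    @Test
    public void placesOrderWhenPaymentSucceeds() {
        when(paymentClient.charge("order-1")).thenReturn(true);

        assertThat(service.placeOrder("order-1").isConfirmed()).isTrue();
        verify(paymentClient).charge("order-1");
    }
}
```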
Integration Tests
Integration tests become very convenient with Testcontainers. With the previously mentioned TestEnvironment.java, external dependencies such as caches and databases can be wired to Docker containers whose lifecycle lies within the test execution. This works well with NodeJS too.
The value of an integration test over a unit test is that it exercises our queries, configurations, connectivity, marshalling/unmarshalling and error handling.
It’s important not to overload integration tests with many permutations that ideally unit tests could cover.
Tip: When sharing a relational DB among multiple tests, test data won't conflict if the test methods are annotated with @Transactional, as Spring does not commit the changes to the DB (see the sketch below).
Tip: Include Flyway or Liquibase in your microservice build to manage database changes. Doing so will seamlessly create the database schemas in the database Testcontainers.
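A minimal sketch of an integration test against the shared Postgres container, assuming Spring Boot with JUnit 4 and the TestEnvironment base class sketched earlier; CustomerRepository and Customer are illustrative.

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.transaction.annotation.Transactional;

@RunWith(SpringRunner.class)
@SpringBootTest
public class CustomerRepositoryIntegrationTest extends TestEnvironment {

    @Autowired
    private CustomerRepository repository;

    @Test
    @Transactional // Spring rolls back, so tests sharing the DB don't conflict
    public void persistsAndReadsBackACustomer() {
        repository.save(new Customer("jane@example.com"));

        // Exercises the real query, mapping and connectivity against Postgres,
        // with the schema created by Flyway at startup
        assertThat(repository.findByEmail("jane@example.com")).isPresent();
    }
}
```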
Functional Tests
Any component has a functional layer that is invoked or triggered by an external source. Functional tests should test these interactions.
Typically, for an API, a functional test could be written per endpoint. As with integration tests, functional tests should not be overloaded with anything that unit or integration tests can cover. Usually a couple of functional tests that validate the response structure, backward compatibility and status codes can cover the component's end-to-end functionality.
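A minimal sketch of such an endpoint test, assuming Spring Boot's TestRestTemplate on top of the shared TestEnvironment; the /orders endpoint and the expected fields are illustrative.

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class OrdersEndpointFunctionalTest extends TestEnvironment {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    public void returnsAnOrderWithTheExpectedStatusAndStructure() {
        ResponseEntity<String> response =
                restTemplate.getForEntity("/orders/order-1", String.class);

        // A handful of assertions on the status code and response shape
        // are enough to cover the end-to-end path through the component
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
        assertThat(response.getBody()).contains("\"orderId\"");
    }
}
```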
Functional tests also work with APIs in the QA environment to satisfy API dependencies, and use Testcontainers to simulate other dependencies such as databases and caches. But what about AWS dependencies such as SQS, EFS, SNS etc.? Take a look at LocalStack, which simulates such dependencies with excellent feature parity. Another cool thing is that Testcontainers supports LocalStack with an out-of-the-box module.
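A minimal sketch of that module in use, assuming the AWS SDK v1 SQS client; the LocalStack image tag and the shape of the base class are illustrative.

```java
import static org.testcontainers.containers.localstack.LocalStackContainer.Service.SQS;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.utility.DockerImageName;

public abstract class AwsTestEnvironment {

    // One LocalStack container shared by all functional tests that need AWS services
    protected static final LocalStackContainer LOCALSTACK =
            new LocalStackContainer(DockerImageName.parse("localstack/localstack:1.4"))
                    .withServices(SQS);

    static {
        LOCALSTACK.start();
    }

    // SQS client pointed at the LocalStack endpoint instead of real AWS
    protected static AmazonSQS sqsClient() {
        return AmazonSQSClientBuilder.standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        LOCALSTACK.getEndpointOverride(SQS).toString(), LOCALSTACK.getRegion()))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials(LOCALSTACK.getAccessKey(), LOCALSTACK.getSecretKey())))
                .build();
    }
}
```

The endpoint, credentials and region exposed by the container can then be handed to the application under test in the same way as the database properties above.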
Consolidated Coverage Report
Because we use a single test target, i.e. ./gradlew test, to execute all three kinds of test suites, Jacoco can create a consolidated coverage report across all of them.
Pull Request Check
As the whole suite, consisting of functional, integration and unit tests, can be executed with a single Gradle command such as ./gradlew test, it can be wrapped in a Jenkins pipeline and integrated with GitHub as a PR check. Until the whole suite executes successfully, the Merge button stays disabled.
Real APIs vs Mocks - Choose Your Poison!
Whenever we start the discussion on whether or not to mock our API dependencies, it turns into a religious debate, like tabs vs. spaces.
#1 Using Mocks
Mocking eradicates non-determinism from tests, and we all know how important that is. However, one cannot get away with simply mocking their API dependencies: those mocks have to be maintained continuously. If your external services do not have a good track record of keeping backward compatibility in schema and logic, you'd leak bugs into production. If you want to head down this path, there are many mocking tools that can record and replay API responses, and for backward compatibility there are tools like pact.io for consumer-driven contract testing. These add overhead to your dev cycle, but the effort can be worth it if the external services are flaky.
#2 Using Real Services
On the other hand, we can just connect our HTTP clients to their corresponding QA API providers. What this means is that we will not be able to merge a PR if the change did not work with the QA APIs, which is a good thing because we would be proactively preventing a commit that would break the QA environment. This approach is also very easy to implement as there are no mocks to prepare.
However, the pain comes regularly when these APIs are flaky or the infrastructure is not stable, which makes tests non-deterministic and therefore less credible.
But here's the thing: even in production we could run into these kinds of issues! Whatever our tests experience in a flaky environment could happen in real life, and we can take those lessons into our code to improve the application's resilience and error handling. Still, this does not solve the problem of non-determinism. The only other thing we can do to minimize it is to make the QA environment more stable. This involves isolating test data for automated tests, setting SLAs for uptime, holding external teams accountable for infra issues in non-prod, and so on.
At our organization, we wanted to go down path #2 as it also helps us challenge the status quo and improve our ecosystem. But this might not work for every team.
North-star — Continuous Deployment
Migrating to component-level test suites is meant to lead us towards implementing Continuous Deployment throughout the system. It is not an uncommon practice these days, but converting from a conventional test strategy to a pure CD model is not an easy task.
Continuous Deployment helps us to,
- Make small changes to production continuously — therefore delivering value to customers in a true agile way
- Avoid coordinating releases on multiple components across multiple teams
- Avoid having long living branches
This is the high-level approach that we are working towards.
- When a developer raises a pull request, the self-certification test suite runs as part of the PR checks and enables the "Merge" button on success. This way we cannot merge a PR if tests fail.
- Once the PR is merged, the pipeline runs the self-certification suite again and deploys to the QA environment
- If enabled, the component then gets automatically promoted to the production Canary stage for x% of the traffic
- If the Canary tests succeed, the deployment gets rolled out fully.
- If the Canary fails, a rollback process is triggered.
- Half-baked features are rolled out to production but hidden behind a feature flag using Optimizely.
What about E2E tests?
E2E tests do not work well with a Continuous Deployment model as they challenge the autonomy of a microservice. Therefore we decoupled our E2E tests from the release process, with the intent of decommissioning them in the future. Instead, we started investing in Cypress, Canary deployments and synthetic tests to achieve similar value.
What about Performance and Load Tests?
We perform load tests as a periodic exercise to keep an eye on how our systems behave under production-like load. These tests are also decoupled from component self-certification as they need a dedicated environment to run on.
Performance tests, on the other hand, are a suite that we co-locate with the component alongside the self-certification suite, to compare the component's performance against a previously executed benchmark. They do not need a high-end environment the way load tests do. This deserves a separate article.
Our Learnings
- Dealing with real services instead of mocks can be hard in the beginning, but it gets easier as we take measures to improve environment stability
- However, connecting a frontend to real services as part of self-certification can be overwhelming, so the ROI needs to be kept in check here
- Having APM metrics and logs in DataDog made troubleshooting faster
- Dashboards to oversee the migration effort helped us immensely in tracking progress. (Add a Gradle task to pump coverage/test results to a dashboard.)
Changing our test strategy was a lot of work, but it was the right thing to do. We still have to keep evolving it until we reach our desired state.
Full Example
This git repo contains the full boilerplate for a Java-based microservice API.