Avoid integration testing against shared environments

Several times in my career I have faced projects with integration tests that ran against shared environments such as staging. Those environments were typically used for many different purposes, including regression and manual testing, internal demos and experiments. Staging environments were usually up to date, but neither completely stable nor offering 100% uptime. Often those integration tests were developed during a migration from a monolithic to a microservices architecture, which forced the team to learn new ways of deploying and discovering services. I would like to point out some issues with testing against shared environments and propose available alternatives.
While it is easy to start writing integration tests against a shared environment (due to a lack of time or of experience with dynamic service deployment), several aspects make this approach hard to maintain in the long term:
Slow tests: remote communication is slower than local communication and suffers from instability and unpredictable latency.
Not isolated: test cases may create or delete entities used or referenced by other running test cases, causing failures. Tests might interfere with each other and fail when run concurrently, and a change in test parallelism can introduce unexpected failures later.
Not repeatable: test cases depend on the version, performance, health, network latency and reachability of staging resources (forget working offline from the beach).
With so many variables, tests become flaky and unstable: without any code changes they randomly pass or fail, both locally and on CI. If security requirements change, access to the shared environment may be restricted or removed, so the integration tests can only be run from the office or via VPN. Once the flakiness, or the hassle of running the integration tests, reaches a certain level, failures are explained away as general instability and ignored. At this point the tests lose their most important quality: assurance.
Let’s go back and review how this could be prevented.
As a first step, we could wrap each dependency in a Docker container and start it exclusively for each integration test (take a look at the Testcontainers library, which can help with that). This reduces network latency and improves test performance. It also helps with isolation, since each test interacts with dedicated resources (the test setup becomes more complex, but the dependencies become more explicit). Next, we pin the Docker image version (instead of using the default :latest), making the tests repeatable, as every build pulls the same release version of the dependency (please never override release artifacts). Once a new dependency version becomes available, the tests are run against it before upgrading, so we can check beforehand whether the change is compatible.
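As a minimal sketch of the version-pinning idea, a Docker Compose file for one service's test dependencies might look like the following (the service names and versions are hypothetical; the point is that no image uses :latest):

```yaml
# docker-compose.test.yml - dedicated, pinned dependencies for one service's tests
services:
  postgres:
    image: postgres:15.4        # pinned release tag, never :latest
    environment:
      POSTGRES_PASSWORD: test
  kafka:
    image: apache/kafka:3.7.0   # upgrading means changing this line in a reviewed commit
```

Upgrading a dependency then becomes an explicit, reviewable change: bump the tag, run the tests, and only merge if they pass.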
This is a good start, but over time the tests become slow again as the number of dependencies and their complexity grow. At some point a laptop's memory may not be enough to run all the dependencies, and shared dependencies get discussed again, bringing us back to the original problem.
To avoid this, I would recommend writing integration tests that run in isolation for each service, much like unit tests run against a single class. Nowadays most inter-service synchronous communication is implemented with REST APIs, and asynchronous communication with event queues such as Kafka or ActiveMQ. It is possible to mimic a REST API using WireMock or MockServer, having them match requests and return predefined responses. For asynchronous communication and queues, it is possible to assert on the content of the queue. This approach is well known as contract testing. Contract tests help to define and verify the public interfaces used for communication and the expected behavior. This approach also quickly exposes architectural problems, for example if state is shared between services via a database, which would raise serious questions about whether the services are truly independent. Contract-driven tests are also very helpful for ensuring compatibility across service versions and for establishing an upgrade order for services with breaking changes.
Once there are many services with a complex dependency graph, I would recommend introducing a new layer to the testing pyramid: end-to-end (e2e) tests. They should live in a separate repository with no production code. These tests cover business scenarios spanning multiple services. Quite often e2e tests are implemented with Selenium, Cypress or a similar technology, as the web browser is the standard way to interact with the services' functionality. Usually development teams run e2e tests on dedicated environments before a release, and again after the release against dedicated regression accounts on production. This gives teams confidence that the release went well and that production is in the expected state. With tens of releases per day, or external services with unpredictable release schedules, I would also suggest running e2e tests periodically (every 10 to 15 minutes) and using their output as a high-level liveness check for business functions such as payments or order processing. Keep the number of e2e tests as small as possible for the sake of execution time, and consider running each e2e test in a separate pipeline for the fastest possible feedback (no need to learn that customers cannot check out or add a review when login is already known to be broken).
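The periodic setup can be sketched with any CI scheduler; the following is a hypothetical GitHub Actions workflow (the workflow name, spec path and commands are illustrative, not prescriptive):

```yaml
# e2e-liveness.yml - run one business-scenario e2e test every 15 minutes,
# independently of deployments, as a liveness check for that function.
name: e2e-liveness-checkout
on:
  schedule:
    - cron: "*/15 * * * *"
jobs:
  checkout-flow:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx cypress run --spec cypress/e2e/checkout.cy.js
```

Keeping one scenario per workflow gives independent signals: a broken login pipeline goes red without hiding the status of checkout or payments.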
Conclusion
Integration testing is an important aspect of product development that heavily affects architecture. It is very important for a team to decide on test strategies, tools and techniques early. And it is worth remembering that shared state is harmful in testing, too.
Thanks Matthias Hofschen for reviewing this article.
