Simplified Integration Testing

The story of increasing our end-to-end test execution speed by a factor of 22

Integration/end-to-end (E2E) tests do come with a lot of pain. Practically all the issues Googlers faced with their tests hit us too, with an additional (fortunately short-lived) sidetrack of writing test scripts in Python while the whole codebase was in Java. (No kids, don’t try this at home!) However, last summer we found something promising.

Simplified architecture in our case: during testing, all of our components/services are hacked into one JVM, and an in-memory DB is used instead of the regular PostgreSQL

How did we get here?

In the very early phase of our project, our quality architect clearly expressed the will to focus on E2E test automation: previous experience showed that relying on unit tests alone resulted in little benefit on the bug-discovery side, since a lot of manual testing effort was still needed despite the energy spent on writing and maintaining unit tests. Most of the tests produced by the team were either tautological or simply meaningless, which could/should have called for improvements in our unit test writing strategy and mocking approach, but given the team’s general disappointment in UTs and the urge to reduce manual regression testing effort, the decision was to concentrate our limited resources on automating integration (E2E) tests. And that came with a price we were hardly prepared or willing to pay.

Thought it was a bunch of mumbo-jumbo. An endless geek-debate about good and evil, the dark side and the light. Crazy thing is… it’s true. All of it.

Compared to the complexity of our networking and virtualization architecture, in the beginning we spent relatively little effort on automated deployment (meaning we had an automated deployment script, but didn’t use any deployment orchestration tools such as Puppet or Ansible).

Also, we were using J2EE (GlassFish 3.1) without any in-container testing framework (e.g. Arquillian), which made in-container (non-E2E) integration testing impossible.

On top of all this, in the early phases we had the “excellent” idea of having our testers help write test code (in Java), but without adequate code review. Yes, it was disastrous. They were genuinely good at phrasing test cases with JBehave, but implementing them in Java demands a different skillset. Robert Martin is right: “…having dirty tests is equivalent to, if not worse than, having no tests.”

All this together naturally led to severe consequences. By early last year we reached a state where our daily build took ~6.5 hrs:

  • ~25 mins for source checkout + build
  • ~60 mins for deploying 15+ VMware virtual machines
  • the rest of the time for running ~700 test cases

in a very flaky environment, causing a significant part of our nightly builds to be invalid due to random networking and similar glitches. Funnily enough, acceptance-level tests (~250 test cases, running, in theory, before any code delivery) took more than 2 hrs to run on a simplified development test environment, which, together with our “dirty” test code, rendered even the most basic bugfix activities painfully slow and risky.

Key decision: leave J2EE and move to Spring

From moving to Spring we expected major benefits on the testability side, together with the advantages of leaving the concept of an “application server” behind (which, we knew, would always remain a pain in the neck for our Operations team) and starting to use Spring Boot with embedded Tomcat instead. We also hoped for increased flexibility in handling different versions of different libraries. The resulting ability to test our backend architecture within the container was a significant improvement over the previous unit tests from a bug-detection perspective, and it also provided much faster feedback than the already existing physical E2E tests.
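To give a feel for what this in-container testing looks like, here is a minimal sketch in the spirit of today’s Spring Boot test support (BackendApplication, OrderService and the createOrder call are made-up names for illustration, not our actual code): the whole backend context is started inside the test JVM and a service is exercised directly, with no deployed environment at all.

```java
import static org.junit.Assert.assertNotNull;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

// Minimal sketch of an in-container test: the Spring context (and the
// hypothetical BackendApplication) starts inside the test JVM.
@RunWith(SpringRunner.class)
@SpringBootTest(classes = BackendApplication.class)
public class OrderServiceIntegrationTest {

    @Autowired
    private OrderService orderService; // hypothetical backend service bean

    @Test
    public void createsAnOrderForAnExistingCustomer() {
        Long orderId = orderService.createOrder("customer-42", "PRODUCT-1");
        assertNotNull(orderId);
    }
}
```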

As far as our architecture is concerned: in the very beginning we decided to use Java everywhere, even on the client side (considering the relatively low number of expected concurrent users, around a few hundred, and the benefit of a homogeneous skillset across all layers, Vaadin is a great choice). Also, we tried to separate our client-side business logic from the rendering library so that we could test the interesting part of the system without any dependency on view-specific frameworks.

FOR THE RECORD: with this, we actually failed in many places. Unfortunately, for a long time we didn’t have a strong enough code review protocol in place, the necessity of which we had to learn the hard way. We still have some business/test-relevant code in the Vaadin-dependent layer, and it is being cleaned out continuously. On the other hand, we have lately found ways to incorporate the Vaadin objects into the tests as well, but that is not yet in production.

The main concept was to use the Presenter parts of an MVP-like setup during testing, without the Views: the non-rendering-specific parts of the screens are instantiated and run from the test code, communicating with the deployed backend services via REST/HTTP. Thus we can create and execute “almost E2E” tests in pure Java, without Selenium or other browser-dependent tools.

Separating screen logic from rendering libraries in a standard multi-layered architecture
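A rough sketch of that separation (all names are illustrative, not our real classes): the presenter depends only on a view interface and a backend client, so test code can drive it with a trivial recording view while the backend calls still go over REST/HTTP to the deployed services.

```java
import java.util.List;

// Illustrative MVP-style presenter with no Vaadin or other rendering
// dependency; OrderView, OrderDto, OrderBackendClient and BackendException
// are hypothetical types (separate source files in practice).
public interface OrderView {
    void showOrders(List<OrderDto> orders);
    void showError(String message);
}

public class OrderPresenter {

    private final OrderView view;
    private final OrderBackendClient backend; // REST/HTTP client towards the deployed services

    public OrderPresenter(OrderView view, OrderBackendClient backend) {
        this.view = view;
        this.backend = backend;
    }

    public void onScreenOpened(String customerId) {
        try {
            view.showOrders(backend.findOrdersOf(customerId));
        } catch (BackendException e) {
            view.showError(e.getMessage());
        }
    }
}
```

In a test, OrderView is replaced by a simple stub that records what would have been displayed, so the assertions run against the same presenter logic the real screens use.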

And then came the idea:

what if we could get rid of the deployed backend components, using the in-container test capabilities of Spring? Could we launch them as a single application? It would eliminate the need for any physical test environment: the only thing we would need is a single JVM, and the different components could communicate using a local-call adapter instead of REST/HTTP (so long, test environment network glitches). With different Maven profiles for SIT and “physical” integration testing, it’s achievable. And yes, let’s use an in-memory DB during the tests: again, no infrastructure is needed, plus it would be lightning fast, right? Right!
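To give a flavour of the local-call adapter, here is a sketch only (we do the actual switching with Maven profiles; the Spring profile names, classes and the endpoint below are assumptions made up for illustration):

```java
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

// One client interface, two adapters: REST/HTTP against the physical
// environment, or a direct in-JVM call in SIT mode. All names are illustrative.
public interface InventoryClient {
    int availableStock(String productCode);
}

@Profile("physical-it") // hypothetical profile name for the physical environment
@Component
class RestInventoryClient implements InventoryClient {

    private final RestTemplate rest = new RestTemplate();

    @Override
    public int availableStock(String productCode) {
        // hypothetical endpoint; in the physical setup this is a deployed service
        return rest.getForObject(
                "http://inventory-service/stock/{code}", Integer.class, productCode);
    }
}

@Profile("sit") // hypothetical profile name for the single-JVM mode
@Component
class LocalInventoryClient implements InventoryClient {

    private final InventoryService inventoryService; // the real service bean, same JVM

    LocalInventoryClient(InventoryService inventoryService) {
        this.inventoryService = inventoryService;
    }

    @Override
    public int availableStock(String productCode) {
        return inventoryService.stockFor(productCode);
    }
}
```

In SIT mode the datasource is simply pointed at an in-memory database (H2, for example) instead of PostgreSQL, so the whole suite runs with no infrastructure whatsoever.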

When the first measurements showed a 20–80x speed increase, we didn’t dare believe what we saw.

Currently we can run ~60% of our tests in the simplified mode. Total test running time is around 5 hrs on the physical environment, while the same tests in SIT mode (60% of them, to be precise) run in less than 8 minutes: 60% of the 5-hour suite is roughly 180 minutes, and 180/8 gives the 22.5x speed-up. Yes, this mode doesn’t exercise REST/HTTP or PostgreSQL, nor a lot of the 3rd-party components we need to integrate with; however, the vast majority of our bugs come from the parts of the code which do get tested this way, without deploying into the (still somewhat flaky) physical environment (an additional 20 minutes, even with Puppet).

Nowadays we are aiming for short (1–3 team-day) stories, where the story acceptance criteria are mostly a set of these integration test cases. In our interpretation this captures the essence of TDD, bringing its core benefits even without unit testing (of non-framework-level elements). On the other hand, we still have a lot of debate about whether (and how) we should add more unit tests, if not for the sake of bug detection then for code and design quality itself. As the (admittedly stingy) person responsible for product & delivery, I tend to agree that code reviews and static code analysis should cover this part; however, I’m pretty sure we’ll find some places where unit tests can bring value.
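The acceptance criteria themselves are still phrased as JBehave scenarios; purely as an illustration (scenario wording, class and fixture names are invented), a step class behind such a criterion could look roughly like this:

```java
import static org.junit.Assert.assertFalse;

import org.jbehave.core.annotations.Given;
import org.jbehave.core.annotations.Then;
import org.jbehave.core.annotations.When;

// Backs a scenario such as:
//   Given a registered customer cust-42
//   When the customer orders PRODUCT-1
//   Then the order appears in the order history
// TestBackend is a hypothetical fixture wrapping the presenters/services.
public class OrderHistorySteps {

    private final TestBackend backend = new TestBackend();
    private String customerId;

    @Given("a registered customer $customerId")
    public void aRegisteredCustomer(String customerId) {
        this.customerId = customerId;
        backend.registerCustomer(customerId);
    }

    @When("the customer orders $productCode")
    public void theCustomerOrders(String productCode) {
        backend.placeOrder(customerId, productCode);
    }

    @Then("the order appears in the order history")
    public void theOrderAppearsInTheHistory() {
        assertFalse(backend.orderHistoryOf(customerId).isEmpty());
    }
}
```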

Way forward and conclusion

The next possible step for increasing speed is quite a conventional one. No, we won’t get rid of Java, but building and testing only the necessary parts can be an option. We also see potential in increasing SIT-mode coverage to around 80% (the remaining E2E tests are really I/O-specific and the like, so we wouldn’t gain much with those). As far as development is concerned, by using hot code replacement solutions like devtools we now aim to skip the context fire-up time (currently ~2 minutes for all the components) and use SITs similarly to unit tests, with subsets of them run continuously by the devs locally, immediately after any bigger change in the code.
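One sketch of how the context fire-up cost can be kept to once per local run (assuming the SIT tests share a single Spring configuration; class and profile names are made up): Spring’s test framework caches the application context across test classes that declare the same configuration, so only the first SIT test class pays the ~2 minute start-up.

```java
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.junit4.SpringRunner;

// All SIT test classes extend this base class, so they share one cached
// application context within a test run (separate source files in practice).
@RunWith(SpringRunner.class)
@SpringBootTest(classes = AllComponentsSitApplication.class) // hypothetical single-JVM config
@ActiveProfiles("sit")
public abstract class AbstractSitTest {
}

public class OrderFlowSitTest extends AbstractSitTest {

    @Test
    public void customerCanPlaceAnOrder() {
        // drive presenters and services here, exactly like on the physical environment
    }
}
```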

The main message is that integration/E2E tests do indeed come with a lot of effort; however, if you can cheat, you might find it worth spending that effort here rather than pouring it into extending the base level of your test pyramid. In our case the result is that we can provide fairly reliable feedback about our E2E functionality in less than 10 minutes instead of 5 hrs, by (re)using the same tests as on the physical environment.