DevTestOps: End To End/System Tests Tips and Tricks

dm03514 · Dm03514 Tech Blog · May 7, 2018

These are a couple of tips and tricks I’ve encountered while writing, executing, maintaining, and retiring end to end (or system) tests. I have found that the following techniques help address slow, unreliable, difficult-to-maintain end to end tests. Achieving reliable, easy-to-maintain end to end/system tests isn’t magic, and the techniques below should help make these critical tests achievable.

Make Tests Hermetic

What: Tests can be executed locally, and in CI, using repeatable, clean state.

Why: Hermetic tests help with speed because services are provisioned with only the absolute minimal configuration necessary to receive and respond to requests. They also provide a clean-room assurance that the system fulfills its expected business capabilities.

Approach: Tests should provision the data they need on each run. This goes as far as DELETING, or cleaning up, state from previous runs. It usually means only a single service test can execute at a time, which encourages keeping the number of tests small while providing amazing benefits for debugging failures and reasoning about the state of the system when a test fails or errors.

The service should be reset to a clean, known state before each test.

In practice this looks like starting a test service, backed by stubs or docker-based dependencies, and then executing a test against that service:

$ make start-test-service
$ make test-service-some-test-targeting-local-service
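A sketch of what the reset behind such a command might encapsulate (the compose file name and flags are assumptions, not from this post, and the commands are echoed for illustration rather than executed):

```shell
#!/bin/sh
# Sketch of a per-run reset: destroy ALL previous state, then re-provision.
# The compose file name is an assumption; commands are echoed for
# illustration rather than executed.
set -eu

reset_test_state() {
  # `down --volumes` removes containers AND their data volumes, so no data
  # persists between test runs.
  echo "docker-compose -f docker-compose.test.yml down --volumes"
  echo "docker-compose -f docker-compose.test.yml up -d"
}

reset_test_state
```

Tearing down volumes, not just containers, is what makes the state genuinely clean between runs.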

Caveats:

  • Resetting service dependencies to a clean state between tests comes with the tradeoff that these tests are only executable locally, or on CI.

Antipatterns:

  • State is persisted between tests
  • The test suite executes multiple tests concurrently (locally or CI)
  • Provisioning a system for test execution requires more than a single command

Execute Tests AGAINST an Environment

What: Provisioning the System Under Test and its dependencies should take place outside of the test suite.

Why: This is a separation of concerns. Tests should focus on exercising and asserting something. Adding the responsibility of managing other processes adds significant complexity. Having tests only focused on applying input to a target system and asserting on output makes it easier to create, monitor and debug tests.

Approach: Having another process (ie docker-compose) be responsible for starting, monitoring, and making sure that dependencies are fully initialized and able to accept connections massively simplifies tests. Once the service and its dependencies are ready, the command returns. Tests can then be executed immediately, without waiting further for initialization.
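For example, docker-compose can own readiness itself via healthchecks, so the service under test only starts once its dependencies accept connections (the service and image names below are assumptions for illustration):

```yaml
# docker-compose.test.yml (sketch; service and image names are assumptions)
services:
  db:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 1s
      retries: 30
  service-under-test:
    build: .
    depends_on:
      db:
        condition: service_healthy   # only start once the db reports healthy
```

With this in place the test suite never has to probe the database itself; readiness lives entirely in the provisioning layer.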

Resetting data and managing test fixtures should occur at the application level; the test framework, or the tests themselves, need to be able to handle this.

When executing a set of tests, the input to the test suite should be resource identifiers (ie queue names/locations, URLs). This keeps communication separate and allows the tests to more closely mimic a standard client of the service.
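A sketch of a test entry point that receives only resource identifiers (the variable names and defaults are assumptions for illustration):

```shell
#!/bin/sh
# run-system-tests.sh -- sketch: the suite is pointed at resources, it never
# starts them. Variable names and defaults are assumptions for illustration.
set -eu

SERVICE_URL="${SERVICE_URL:-http://localhost:8080}"
QUEUE_NAME="${QUEUE_NAME:-orders-test}"

echo "running system tests against ${SERVICE_URL} (queue: ${QUEUE_NAME})"
# A real script would now invoke the suite with those identifiers, e.g.:
# go test ./systemtests/... -service-url "$SERVICE_URL" -queue "$QUEUE_NAME"
```

Because the suite only receives a URL and a queue name, the same tests can be pointed at a local docker-compose environment or a CI-provisioned one without modification.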

Caveats:

  • This is a design structure to help guide layering in tests; it doesn’t mean tests can safely be executed against any environment. For example, local/CI hermetic tests that perform destructive DB operations should never target a shared or production-like environment.

Antipatterns:

  • Sleeps in the test, while waiting for resources
  • Tests that start or manage the service and/or its dependencies within the test suite

Make Test Commands Executable

What: All steps necessary to configure and start a service for testing are encapsulated in executable commands.

Why: Commands should be encapsulated and executable to make provisioning a test environment, and executing tests, as easy (and descriptive) as possible. Starting a system for testing may involve considerable configuration, often because of the stubs and other overhead required to achieve hermetic, reliable tests. Since these are system tests, they will most likely require networking configuration and potentially multiple data stores. Provisioning all of this needs to be trivial.

Approach: Test commands are encapsulated behind descriptive entry points. Instead of starting a service and passing it all the flags necessary to configure it for testing, provide a higher level command such as ./start-test-dependencies.sh

The same decisions that go into scoping functions, and deciding what to encapsulate and to what degree, apply here. Applying software engineering to infrastructure and test interactions makes tooling easier to operate, easier to understand, and easier to debug. Contrast this with infrastructure glue that may be tens or hundreds of lines of bash.
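As a sketch, provisioning can be grouped into small, named functions behind one descriptive entry point (the service names are assumptions, and commands are echoed for illustration rather than executed):

```shell
#!/bin/sh
# Sketch: small named functions instead of one long glue script.
# Service names are assumptions; commands are echoed for illustration.
set -eu

COMPOSE="docker-compose -f docker-compose.test.yml"

start_db()           { echo "$COMPOSE up -d db"; }
start_queue()        { echo "$COMPOSE up -d queue"; }
start_payment_stub() { echo "$COMPOSE up -d stub-payment"; }

# The single descriptive entry point callers actually use.
start_test_dependencies() {
  start_db
  start_queue
  start_payment_stub
}

start_test_dependencies
```

Each function is small enough to read at a glance, and the entry point names the intent rather than the mechanics.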

Antipatterns:

  • Instructions to configure and start a service are in a README
  • Provisioning a system for test execution requires more than a single command
  • No functionally encapsulated groupings of well defined test/infrastructure operations, ie long bash scripts

Use the Same Commands to Prepare a System for Testing Locally and in CI

What: Commands that are used to provision, and execute, tests should be executed locally by engineers AND in CI by automation.

Why: Executing the same commands in CI that developers use locally is what keeps local tooling from drifting out of sync with CI. This is the key to keeping dev commands in sync with reality. I have personally seen (and created) documentation that doesn’t stay in sync; even commands with a lot of utility can easily drift if they aren’t executed regularly. In my experience, out-of-sync automation has all of the downsides of out-of-sync documentation: it can mislead and waste time, with the added risk of altering a production environment in an out-of-date, incorrect way.

Approach: Encapsulate provisioning and test execution into functions, and make sure to execute those exact functions in CI.
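A sketch of this: one file defines the commands, and both developers and the CI pipeline execute the exact same functions (the target names come from the earlier examples in this post; the bodies echo for illustration rather than execute):

```shell
#!/bin/sh
# test-commands.sh -- sketch: the single source of truth for test commands,
# sourced locally by developers and executed verbatim by CI.
# Bodies echo for illustration; a real version would run the make targets.
set -eu

provision() { echo "make start-test-service"; }
run_tests() { echo "make test-service-some-test-targeting-local-service"; }

# Locally:  . ./test-commands.sh && provision && run_tests
# In CI:    the pipeline step sources this same file and calls the
#           same functions, so the two can never drift apart.
provision
run_tests
```

Because CI sources the identical file, any change a developer makes locally is exercised on every CI run, which is what keeps the tooling honest.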

Caveats:

  • CI can be viewed as a client of the code, just like any developer. Make sure commands support the different configurations each client needs, ie what a human developer may need vs a machine.

Antipatterns:

  • Tooling to provision and execute system tests locally is different than in CI
  • No tooling is available to provision and execute system tests locally

Focus on Quality over Quantity

What: Focus on reliability, and false positive rate over test coverage.

Why: A smaller number of tests with narrower coverage that are reliable and easy to execute, debug, and maintain is more valuable than a larger number of end to end tests with higher coverage that are difficult to work with or produce false positives.

Approach: Create a single end to end or service test to prove out the service and test provisioning. Flush out all timing-related errors. Take the opportunity to investigate failures and ensure logging allows quick identification of issues. Once that single test and its tooling are stable, slowly and incrementally add more tests.

Building tests on top of a low quality testing foundation is a recipe for failure. What makes the tradeoff harder is that the time savings from being able to validate a service locally, or in CI, independent of a production-like environment, are massive; being able to determine whether a service “works” locally saves enormous amounts of time. Higher level testing is a tradeoff: too many tests fail through the huge cost of maintenance, while too little investment in testing tools fails by wasting huge amounts of time verifying services after deployment and integration into a production-like environment. I have found the middle ground to be a handful (1–3) of high level service tests, broadly scoped to cross-cut core business functionality. Because the number is low, the difficulty of pinpointing component failures during tests stays manageable.

Antipatterns:

  • Tests fail regularly
  • When a test fails, the solution is to re-run it
  • Debugging a failed test takes hours or longer

Don’t Sleep

What: No sleep statements, ever, in any part of end to end/system test code :). Tests and tooling should poll for state, or wait to be notified of events.

Why: Sleep durations will always be wrong. They have to be over-provisioned, longer than ever expected, resulting in lots of wasted time in the normal case.

If sleep durations are too short the result is flaky tests, or tests that fail in some environments but not others, ie CI may run on less powerful machines and fail more often.

Approach: Have reactive tests. Tests should respond to events when they happen, with sane timeouts. This removes the flakiness around timing and creates more resilient tests. The most common strategy, and the one selenium uses, is polling: an operation is retried until it succeeds or a timeout is reached. It is a happy medium that avoids the complexities of architecting for asynchronicity. Test-Engine uses this approach extensively:

fulfillment:
  type: poll.Poller
  interval: 100ms
  timeout: 5m
  action:
    type: http.Http
    url: http://localhost:9999/gstreamer/analysis_complete

It is easy to understand, reason about, and implement, and it often doesn’t require any architectural changes.
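A minimal polling helper in shell might look like the following, assuming nothing beyond POSIX tools (the short interval sleep inside the loop is the bounded retry delay polling requires, not a fixed-duration guess):

```shell
#!/bin/sh
# poll: retry a command until it succeeds, or fail after a timeout.
# Usage: poll <timeout_seconds> <interval_seconds> <command...>
poll() {
  timeout="$1"; interval="$2"; shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "poll: timed out after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"   # short retry interval, not a fixed-duration guess
    elapsed=$((elapsed + interval))
  done
}

# Hypothetical usage: wait for a service's health endpoint to respond.
# poll 30 1 curl -fsS -o /dev/null http://localhost:9999/health
```

The condition is checked as fast as the interval allows, so the test proceeds the moment the system is ready instead of waiting out a pessimistic fixed sleep.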

Caveats:

  • It may be easier to sleep than to write small reactive snippets, but the benefit of tests that respond to events, instead of marching forward on a schedule, is gigantic for minimizing false positives

Antipatterns:

  • Sleep, anywhere
  • Not including timeouts with reactive code, ie polling for a condition until some timeout, then failing
