HTTP-Based Contract Testing @ Gamesys

Danny Noam
Published in The Startup
14 min read · Aug 7, 2020

Abstract

Contract Testing: a modern testing strategy much-discussed, though ultimately evasive of a universal understanding. Let’s get academic and wordy: it’s become an umbrella term for describing tests between disparate, heterogeneous components that interact via some binding interface. This interface is described by a contract, a specification of sorts, which defines the inputs and outputs of a given provider. In layman’s terms, it’s testing that two components are speaking the same, agreed-upon language.

In distributed architectures, systems are typically composed of many components, and component interactions within a given system can be likened to vertices and edges. The vertices, of course, represent the individual components, and the edges represent the communication mechanisms that bridge two vertices together. The edge between two connecting vertices can be represented by a contract — a human-readable specification for bidirectional communication over a network, typically some abstraction above TCP (i.e. HTTP), but occasionally even UDP. In essence, a contract should be viewed as network-specific, whilst the components are language-agnostic.

Figure 1a: Edges and vertices

A contract can come in many different flavours: it could be a hastily-written, infuriatingly error-strewn Word document provided by some junior engineer, a 300-page tedium-riddled PDF littered with SOAP examples or, preferably for REST over HTTP, an accurate and concise OpenAPI contract. With today’s focus on the latter, what exactly are we trying to test? At Gamesys, we have concocted a number of different patterns that extend and implement the concept of Contract Testing. One of those patterns, which is the focus of today’s article, is HTTP-Based Contract Testing.

Actor Glossary

Before deep-diving into the murky waters of HTTP-Based Contract Testing, it’s worth familiarising yourself with the cast. There are four main actors:

Contract: The specification of a given interface. For the purposes of this article, we are referring specifically to OpenAPI contracts, and will use the terms interchangeably. This should be considered a runtime-binding agreement between a provider and her consumers, as well as a documentation artifact.
Provider: An HTTP-based application that provides a service.
Consumer: An HTTP-based application that makes service requests to a provider.
Contract Repository: A centralised store for contracts, accessible from all consumers and providers.
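
For illustration, a heavily trimmed OpenAPI contract for a hypothetical registration-api provider (the path and fields are invented for this sketch):

```yaml
openapi: 3.0.3
info:
  title: registration-api
  version: 1.0.0
paths:
  /registrations/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: integer
      responses:
        '200':
          description: A single registration
          content:
            application/json:
              schema:
                type: object
                required: [id, username]
                properties:
                  id:
                    type: integer
                  username:
                    type: string
```

Both humans and tooling read the same artifact: a consumer learns which fields are guaranteed (`required`), and test tooling learns which requests and responses are legal.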

A rough-and-ready topology of how these actors interact is illustrated below.

Figure 1b: Contract Testing Actor Topology

The Abridged & Academic Version

Provider-Side Contract Testing: During a build, providers MUST run provider-side contract tests to ensure the accuracy and syntactic correctness of a given contract. Provided these tests pass, the contract should be subsequently pushed to a centralised contract repository (whether that be Artifactory, GitHub, etc.).

Additional steps should also be taken to ensure that, in accordance with semantic versioning, contracts MUST NOT be pushed to the centralised contract repository, for a given major version, if they contain breaking changes. The syntactic correctness of a given contract, and semantic versioning checks, are not within the scope of this article.

Figure 1c: The build lifecycle for provider-side contract tests

Consumer-Side Contract Testing: During a build, consumers MUST pull the latest contract for a given major API version and run consumer-side contract tests to ensure they interact appropriately with a given provider, as specified by a contract.

Figure 1d: The build lifecycle for consumer-side contract tests

Context

At Gamesys — between all the legacy system patches and regulatory deadlines thrust upon us — we are aggressively transitioning to a microservice-based architecture. This is not a silver bullet architecture, despite becoming a resumé buzzword in recent years. If products have sensible, logical boundaries, and teams are organised appropriately around these verticals, companies can enjoy a faster time-to-market, greater team autonomy and a higher frequency of deployments to production as a direct result of this architecture. Gone are the days of a meticulous release cadence, a bi-weekly scrum-of-scrums and launch-day panic.

Historically, our main development platform at Gamesys has been technically-partitioned — disregarding middleware and networking infrastructure, we had a front-end monolith and a back-end monolith, with a single database. This, of course, was a reflection of our organisational structure — as Conway’s Law famously suggests. To oversimplify, our platform was essentially one giant CRUD application which, whilst problematic in regards to deployments and inter-area organisational dependencies, meant most interfaces, or contracts, were enforced at compile-time.

Undertaking this fine-grained componentization of our platform, of course, means intra-process communication evolves into inter-process communication. Contracts are no longer enforced by compile-time interfaces, with development violations flagged up by our IDEs. Instead, our platform becomes chatty, our network bandwidth becomes saturated, and compile-time problems become runtime problems.

And let’s immediately focus on the first rule of networks: they don’t work. Networks are fallible — packets will arrive out of order, communication will become unbearably slow, and you will even have to handle the occasional bogon.

With this in mind — is it tenable to continue development of ‘spin-up-the-world’ style integration tests in the modern, microservice era? You’re no longer launching two components, and testing their integration — a given user journey may require 10+ disparate components, all communicating over the wire. Some of these components may have transitive, middleware dependencies (e.g. Kafka) that, actually, you really, really don’t care about, but have to spin up anyway. Which transitive dependencies should you mock? Why should you care? What was once spinning up two, albeit heavy, applications, and testing their integration, has evolved into a spiderweb of entropic chaos.

Figure 1e: This might have once been a monolith, with compile-time interfaces enforced…

And with that scaremongering out of the way…

The Goals

With HTTP-Based Contract Testing, we wanted to ensure that:

  • Providers MUST adhere to the contract they provide;
  • Consumers MUST adhere to the contract they consume;
  • Providers MUST NOT make breaking changes to the contract they provide (though this is out-of-scope for this article)

Provider-Side Contract Testing

Ultimately, a contract is a representation, or a specification, of the inputs and outputs of a given application or service. It’s development-time documentation, and as such, it’s of paramount importance that the contract is indeed accurate. If a contract allows you to GET a fish, it would be really disappointing to find yourself dealing with an elephant at runtime.

How we construct these contracts is of very little relevance. At Gamesys, we have experimented with both auto-generated contracts (using OpenAPI 3 Java annotation libraries), and hand-crafted contracts. Both have pros and cons but, ultimately, the output — syntactically and semantically accurate contracts — should remain the primary focus.

Admittedly, ‘accuracy’ is a loaded word, so let’s drill a little deeper:

  • Is my application serving the correct URLs, given the correct HTTP verbs, as specified by the contract?
  • Is my application serving/creating resources as specified by the contract?

This is where the wonderful Dredd comes to the fore. Dredd, for the purposes of our provider-side contract tests, assumes the role of a consumer. Given an OpenAPI v2/v3 contract, and the URL to your running application, it will fire a series of requests that are syntactically correct as per your contract. During a given test, as long as your application is capable of handling those requests, and returning responses that adhere to the contract specification, the test passes and, crucially, we can confidently assert our contract is accurate.
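
A Dredd run against a locally running provider might look like the following sketch (the contract path and port are assumptions, not ours):

```shell
# Install the Dredd CLI (a Node.js tool)
npm install -g dredd

# Fire contract-derived requests at the running application;
# the first argument is the contract, the second the base URL.
dredd api/openapi.yml http://localhost:8080
```

A non-zero exit code fails the build, which is what gates the push to the contract repository.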

Figure 1f: Provider-Side Contract Testing

It’s important to note that we do not want to conflate business rules with these tests. Let’s illustrate this with an example — we’re creating a competitor for Medium. We have an endpoint that allows you to create (POST) an article. When we first launch this service, we’d like to undercut Medium so, as long as you have one forum post, you can post an article.

Our capricious product-owners, blind-sided by our success, decide to make amendments to our business rules — to create an article, users must now have twenty forum posts. Our contract tests should be entirely insulated from this change — the specification, or interface, remains unaltered. Yes, the prerequisite requirements for interacting with said endpoint have changed, but the inputs and outputs remain unchanged.

A well-tested application would have additional behaviour-driven tests that need amending, rather than contract tests. It’s the much-quipped separation of concerns — the primary focus of provider-side contract testing is to ensure our contracts are accurate. Most enterprises have a sensible testing taxonomy defined, and tests should have bounded concerns, otherwise we may find ourselves mindlessly sleepwalking back into the brittle tests of yesteryear.

As Figure 1f attempts to illustrate, we isolate our provider-side contract tests from evolving business rules by mocking out any service layers, where business rules reside, in our spun-up application. In a horizontally-layered, technically-partitioned application architecture (e.g. MVC), this is simple to achieve. Dredd will make requests, and the spun-up application will handle them with a mocked service layer, thereby ensuring, by virtue of stubs, that any rule changes in this layer have no impact on our tests.
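
As a sketch of that separation (in JavaScript for brevity; the names here are invented, not lifted from our codebase), a controller can receive its service layer via injection, so provider-side contract tests can swap in a stub:

```javascript
// Hypothetical MVC-style controller; the service layer, where business
// rules live, is injected so contract tests can replace it with a stub.
function createArticleController(articleService) {
  return {
    // Handles POST /articles. The controller only shapes the HTTP
    // request/response; prerequisite rules (e.g. "twenty forum posts")
    // live in the service and are irrelevant to the contract.
    async createArticle(requestBody) {
      const article = await articleService.create(requestBody);
      return { status: 201, body: article };
    },
  };
}

// During provider-side contract tests, the stub always succeeds, so the
// tests exercise only the request/response shapes the contract defines.
const stubService = {
  create: async (body) => ({ id: 1, title: body.title }),
};
const controller = createArticleController(stubService);
```

Dredd then hits the application wired up with the stub; changing the twenty-posts rule touches only the real service, leaving these tests green.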

Okay, so we’ve run provider-side contract tests and they’ve come up green — what now? As consumer-driven contract testing rightly stresses, the value of providers is at consumption. In our case, a contract is both human-readable and machine-readable documentation for how to interact with a given provider. We need to make it available for consumption by consumers.

Ostensibly, this should be as simple as pushing to a binary repository manager, such as Artifactory, and going on our merry little way. Unfortunately, due care must be exercised. How do we indicate to consumers what the latest version of a contract is? Is latest synonymous with released? What if an application is rolled back on production (and thus, the contract that corresponds to that version is no longer in active use)?

Pushing to our Centralised Contract Repository

For context, as part of our deployment process, we use release gates at Gamesys — a scattering of *-build, *-rc and *-release repositories in Artifactory. In essence, these correspond to the different development environments that our binaries have been tested on. What’s important to reinforce, once again, is that a contract is a specification of an application’s inputs and outputs — almost like a shallow copy of a given provider, and given that we have robust provider-side contract tests in place, we can guarantee that the two do not diverge.

With that said, it was decided that, as our contract is indeed a reflection of the application, we would promote our contracts through release gates in an identical manner. We created repositories that aligned with a given contract name and major version. For example, we would push our contracts to contract-repository-build/registration-api-v1 following tests in our build. When promoting our application from build to rc, we would similarly promote to contract-repository-rc, and so forth.

We found aligning the promotion strategy of a contract with the application itself somewhat reduces the cognitive complexity of contract testing — if you know how an application is promoted through the various release gates, there’s nothing to relearn for contract artifacts. An application and contract are, theoretically, inseparable artifacts, and treating them both as equal citizens reinforces this belief.

This also ensures that concerns such as rollback are handled appropriately, in-line with your existing enterprise strategy — as long as consumers are able to accurately identify the latest version of a contract (e.g. through timestamp metadata), then we alleviate the concerns of consumers erroneously pulling an out-of-date contract and testing against them. Of course, this is why it’s important that you enact a mature methodology for identifying the live version of a given contract, whether that’s via a timestamp, custom tag or latest minor/patch version.
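
For instance, resolving the live contract by latest minor/patch within a major version might be sketched as follows (a simplification; real repositories also expose timestamps and tags):

```javascript
// Given the versions available in the contract repository, pick the
// highest minor/patch for the requested major version.
function latestForMajor(versions, major) {
  return versions
    .map((version) => version.split('.').map(Number))
    .filter(([maj]) => maj === major)
    .sort((a, b) => a[0] - b[0] || a[1] - b[1] || a[2] - b[2])
    .map((parts) => parts.join('.'))
    .pop();
}
```

A consumer pinned to v1 would call `latestForMajor(available, 1)` during its build, never silently picking up a v2 contract.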

Consumer-Side Contract Testing

Unlike a provider, a consumer can consume from many different providers. Consumers often act as aggregators, accumulating data from various different sources and making sense of them in the context of the domain they represent. This doesn’t present any issues as such, but it does imply that the number of consumer-side tests grows linearly with the number of providers you consume from. That might be a lot of tests.

With that caveat out of the way — for those applications that consume from hundreds of providers — let us dig deeper. Prism is the tool-of-choice for running these tests at Gamesys. In a nutshell, Prism assumes the role of a provider — you provide it an OpenAPI contract, and it will spin up a server that handles requests as per the contract specification.

The general idea is that, in a well-crafted application that somewhat adheres to the single responsibility principle, you cherry-pick a client from your codebase — a client whose sole responsibility is communicating with a provider — and use this client to fire requests at Prism. If those requests are syntactically correct, and do not violate the constraints imposed by a given contract, Prism will respond as appropriate. If your application can handle the provided responses (i.e. deserialize the JSON responses), then your tests should be considered successful.
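
A minimal sketch of such a cherry-picked client (the names, the /articles endpoint and the injected transport are assumptions for illustration):

```javascript
// A client whose sole responsibility is talking to one provider.
// The transport (fetchImpl) is injected, so the same client can be
// pointed at a Prism mock in tests or the real provider in production.
function createArticlesClient(baseUrl, fetchImpl) {
  return {
    async getArticle(id) {
      const response = await fetchImpl(`${baseUrl}/articles/${id}`);
      if (!response.ok) {
        throw new Error(`Unexpected status ${response.status}`);
      }
      return response.json();
    },
  };
}
```

In a consumer-side contract test, `baseUrl` would point at Prism (which listens on port 4010 by default); a request Prism rejects, or a response the client cannot handle, fails the test.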

Figure 1g: Consumer-side contract testing with Prism

Network communication is bidirectional and, in the case of consumer-side contract tests, the successful handling of a response may require you to don your thinking cap. If you’re working with a strongly-typed language such as Java, where response data is typically deserialized and mapped into runtime objects, and your objects are appropriately annotated (e.g. @NotNull on required fields), you can assert with a high level of confidence that you’re not violating a contract agreement if a response is deserialized successfully.

So, what about languages like JavaScript, where JSON (the de facto serialization standard on the internet) is the native format, and no deserialization is required? In JavaScript, it’s common for response data to be assigned to a variable for later evaluation, without implicitly or explicitly checking the contents of a response. For example:

// axios.get returns a promise; the URL here is illustrative
const responseData = (await axios.get('/articles')).data;

We may not use this response data immediately. What if our application expects responseData to contain a dog, but it contains a duck? Our application will blow up in unexpected ways. Assuming our providers are well-behaved, and don’t publish breaking changes to a contract (in the case of an enforced, enterprise-wide contract-testing strategy!), we’d like to find out if our application is violating a contract as fast as humanly possible. Runtime outages are very expensive.

Consumer-Side Contract Testing (with no deserialization)

Luckily, at Gamesys, we quickly realised that JSON Schema, a ‘specification vocabulary’ for JSON objects, had a high degree of parity with OpenAPI, with plans for complete parity in the future. This meant that all we needed to do was, in our consumers’ client code, compare the response data against a schema, as defined by the provider contract. If the response doesn’t match, handle it as appropriate. Many may be wincing at the thought of shoehorning elements of Java-esque deserialization into our beautiful front-ends, and it might not be ideal for your project. But, aside from the up-front technical toll of baking these schemas into our codebase, it cost us very little, with high returns in the form of deployment confidence.
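
To illustrate the idea (a drastically simplified sketch; a real codebase would use a mature validator such as Ajv, with schemas derived from the provider’s contract):

```javascript
// Toy validator: checks only `type` and `required`, which is enough
// to show the shape of the technique. NOT a full JSON Schema implementation.
function validateAgainstSchema(data, schema) {
  if (schema.type === 'object') {
    if (typeof data !== 'object' || data === null) return false;
    for (const field of schema.required || []) {
      if (!(field in data)) return false;
    }
    for (const [key, propSchema] of Object.entries(schema.properties || {})) {
      if (key in data && !validateAgainstSchema(data[key], propSchema)) return false;
    }
    return true;
  }
  return typeof data === schema.type;
}

// A schema of the kind you might lift from a provider's contract
// (the fields are invented for this example):
const articleSchema = {
  type: 'object',
  required: ['id', 'title'],
  properties: { id: { type: 'number' }, title: { type: 'string' } },
};
```

A response that deserializes to a duck rather than a dog now fails loudly at the client boundary, instead of blowing up somewhere deep in the application later.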

Of course, performance is king on the client-side. We diligently performed performance tests and ascertained that, unless you’re running with a Nokia 3310 boasting an integrated Pentium chipset, performance is unlikely to be affected by schema validation. Nevertheless, a compromise could be to only run schema-validation during consumer-side contract tests — you’re unlikely to recover if providers decide to violate their end of the bargain at runtime.

Figure 1h: Using JSON Schema to validate your responses in JavaScript

What are the alternatives?

Historically, we’ve run two types of integration tests at Gamesys:

Classical Integration Tests: Spinning up an application, with real instances of their dependent applications and middleware, and running tests.

Figure 1i: Classical Integration Testing

Fixture-based Integration Tests: Spinning up an application, with mocked instances of their dependent applications and middleware, and running tests.

Figure 1j: Fixture-based Integration Tests

Both options are fatally flawed. Classical integration tests, as you will have likely experienced yourself, are beset by transient network issues, evolving dependencies and general slowness. Their brittle nature has been a focal point of developer ire over the years — resulting in frustration, reduced productivity and a diminished ability to release frequently.

Fixture-based integration tests, whilst certainly preferable, are subject to human error. To provide total confidence, they rely on developers diligently updating fixtures when APIs change. We’re in the business of automation, so this approach seems, unsurprisingly, nonsensical when juxtaposed with superior testing options such as contract testing.

Why not consumer-driven contract testing?

It’s a question of trade-offs, and assessing appropriateness for our platform. A core tenet of consumer-driven contract testing is that consumers dictate what they need from providers. Whilst it’s hard to disagree that, yes, the value in service providers is at the point of consumption, we feel a contract should represent a balanced and mutual agreement between a provider and her consumers. Ultimately, a provider is the owner and expert of their provided context — consumers and providers should negotiate on which data points should be exposed. To us, consumer-driven contract testing represents a dangerous subversion of that control.

Pact, the market leader for consumer-driven contract testing, offers a powerful ecosystem. Unfortunately, this also means there is an element of architectural disruption and vendor lock-in associated with adoption. Tests are written in a Pact-specific DSL, and there can be a monetary cost associated with its commercial offerings. We’re not actively looking to discredit or discourage this approach, but simply to assess the pros and cons within the context of your enterprise.

Pitfalls

Like all approaches and patterns that we choose to adopt, there are trade-offs to be made. Our chosen approach is low-cost and low-risk in regards to any form of vendor lock-in. Arguably, once prototyped, it is also faster to bootstrap a given project with, as it relies on existing testing methodologies (e.g. plain old assertions) and libraries (such as JUnit).

However, Dredd and Prism may lack the sort of commercial long-term support that Pact offers. Both Dredd and Prism are open-source projects and, as such, rely on the goodwill of contributors to actively maintain them. Open-source, of course, is a double-edged sword in this regard — we’re potentially reliant on volunteers pushing updates and fixes to these projects, but equally we’re capable of making pull requests ourselves, as necessary.

It’s entirely possible that Dredd and Prism may one day be discontinued relics that require replacing. The good news, in response to this, is that they have limited responsibilities, and it’s unlikely that we’ll be left high-and-dry in the event of their demise. Other tools already exist, and new tools will almost certainly come into the fray in the coming years.

Conclusion

Truth be told, we’re only just starting on our contract testing journey at Gamesys. We’ve definitely experienced bumps in the road — Dredd and Prism both have flaws but, for all intents and purposes, early indications suggest that this approach is a much more reliable alternative to both classical integration tests and fixture-based integration tests.

We’ve yet to experience any flakiness running these tests — barring some network misconfiguration on our pipelines — and, when understood by developers, they provide both improved documentation and a greater sense of confidence when deploying to production.

Danny Noam
Technical Lead @ Gamesys