Spec to Gherkin to Code

A Relay Play based on Swagger and OAS

Steven F. Lott
Feb 1, 2018 · 16 min read

The phrase “Spec to Gherkin to Code” almost sounds like a sportscaster calling out a baseball double-play from shortstop to second baseman to first baseman. The ideal of well-synchronized teamwork is how we move from an API description written to the OpenAPI specification (OAS, formerly Swagger) to a Gherkin-based acceptance test to actual working code. What interests me are the details of the hand-off and how information is preserved in each stage of the process.

This seems to parallel the essence of building a team by balancing strengths and weaknesses. The rules for a sport lead to a need to build specialized skills. It takes nine people to make a baseball team; it takes several tools and languages to build a working application. It can be an error to lift up one player as “most valuable” when all of them are required.

It’s important to keep the end in mind: working code. But running application code — in isolation — isn’t enough. In addition to the code, we need some basis for a claim the code really works. If you’re a baseball fan, and rooting for a well-executed defensive play, you’re happy to see the umpire calling the runners out. It’s essential to have an umpire provide the official witness to what just transpired. In the software realm, the umpire’s call is replaced by automated testing. As we look at various testing tools, I’m reminded of the three baseball umpires, struggling to be fair:

Umpire #1: “There are balls and there are strikes, and I call them as they are.”

Umpire #2: “I’m only human, the very best I can do is to call them as I see them.”

Umpire #3: “You’re both wrong: they’re nothing until you call them.”

Umpire #3’s view parallels automated testing and Acceptance Test Driven Development (ATDD). Our code doesn’t work until we have automated tests confirming our subjective impression of correct behavior. Without the test results, there is no usable code.

I think this helps clarify the role of the OpenAPI specification and Gherkin. The final, working code is only a part of the solution. The test cases are at least as important as the code and in some cases, the test cases may be the single most important artifact because they may be the only meaningful definition of correct behavior.

However, there can be problems trying to manage Gherkin, OpenAPI, and code, and getting them to cooperate successfully. Our players have different languages, but they should all be playing one game. We can create a small Python tool to assure that the languages match.

RESTful API’s and OpenAPI Specification

I’m going to start with OpenAPI specification because I’m biased toward writing RESTful API’s and an API description written to the OpenAPI specification provides a wealth of details that are easily transformed to Gherkin. Note — I’m going to focus on OAS version 2 instead of OAS version 3 in this post.

While an OpenAPI specification is, technically, a simple JavaScript Object Notation (JSON) document, it can be difficult to read and write JSON. Because of this, a lot of people prefer to use YAML (YAML Ain’t Markup Language) notation for writing API descriptions to the OpenAPI specification. If you go to https://swagger.io/ and fire up the online editor, it works in YAML notation.

An OpenAPI specification is constrained by the rules of the underlying HTTP. The rules are not elaborated in the specification. Formally, they’re defined in the RFCs that define HTTP semantics. While behavior details are not stated in the spec, they will need to be stated when creating Gherkin for an acceptance test.

An API description written to the OpenAPI specification defines the URL paths handled by a server. Within each path there are operations. These are the verbs that apply to the resources defined by the paths. HTTP imposes some limitations on the available verbs, requiring some cleverness in choosing operations appropriately. Gherkin doesn’t have the narrow focus that the OpenAPI specification has.

Here’s a concrete example keeping with the baseball theme in our intro:

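swagger: '2.0'
info:
  title: A Quick Demo
  version: '1.0'
paths:
  /umpires:
    get:
      summary: The umpires
      responses:
        200:
          description: The umpire list
          schema:
            type: object
            properties:
              status:
                type: string
                example: "OK"
              data:
                type: array
                items:
                  type: string
                example: ["Tinker", "Evers", "Chance"]
        default:  # key assumed; only the description of this response is known
          description: Error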

We’ve defined a RESTful API with one resource, /umpires, and only one operation, GET for the resource. The name /umpires is plural to emphasize it’s a classification of resources, not an individual instance of a resource.

The schema section provides a definition of the response document. The ins and outs of the schema definition can be hard to visualize. An example helps: here’s the example document:

{
  "status": "OK",
  "data": [
    "Tinker",
    "Evers",
    "Chance"
  ]
}

This document was built from the examples provided in the OpenAPI specification. The example values are central to getting from the Spec to Gherkin to Code.
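As a minimal sketch of how such a document could be assembled, the function below walks a schema and collects the example values it finds. The function name example_from_schema() is hypothetical, the schema is the /umpires response written as a Python dict, and only the object and array cases are handled; a real tool would need to cover more of the schema vocabulary.

```python
from typing import Any, Dict


def example_from_schema(schema: Dict[str, Any]) -> Any:
    """Build an example value from the `example` entries in an OAS 2 schema."""
    # An explicit example at this level wins over anything nested below it.
    if "example" in schema:
        return schema["example"]
    if schema.get("type") == "object":
        return {
            name: example_from_schema(sub_schema)
            for name, sub_schema in schema.get("properties", {}).items()
        }
    if schema.get("type") == "array":
        return [example_from_schema(schema.get("items", {}))]
    return None  # No example was provided for this part of the schema.


# The /umpires response schema, as it would look after loading the YAML.
umpires_schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "example": "OK"},
        "data": {
            "type": "array",
            "items": {"type": "string"},
            "example": ["Tinker", "Evers", "Chance"],
        },
    },
}

document = example_from_schema(umpires_schema)
```

A None in the result is a useful signal: it marks a place where the spec has a schema but no example, so no concrete test value can be generated.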

Acceptance Testing and Gherkin

One very popular Acceptance Testing tool is Cucumber. This tool works with a Gherkin language specification and Ruby-based step definitions. I’m more familiar with Behave because it’s a good Python implementation of Cucumber. Read more in Writing better user stories with Gherkin and Cucumber.

As with the OpenAPI specification, Gherkin tends to be declarative in nature. The specifications are not working code, per se, they’re a description of the code’s expected behavior. Gherkin statements are written passively to allow a flexible implementation with imperative code.

Here’s an example of a Gherkin scenario specification:

Scenario: Get the list of umpires
  Given a test microservice
  And umpires ['Tinker', 'Evers', 'Chance']
  When request is "/umpires"
  Then response body includes ['Tinker', 'Evers', 'Chance']

This has a formalized structure with three step definitions:

  • Given — this specifies the context.
  • When — this will specify a RESTful API request.
  • Then — this describes the expected response.

The Gherkin structure is an elegant way to specify behavior at any level of granularity. It can be applied to functions, class definitions, modules, packages, microservices, frameworks, databases, etc. Everything where there’s behavior.

If Gherkin feature files aren’t code, how are they turned into a test?

We need to write step definitions to bridge between the features written in the terminology of the problem domain, and test cases that work in the solution domain. When using Behave, we might have a steps/umpire.py file with these kinds of definitions:

from behave import given, when, then

@given('a test microservice')
def create_client(context):
    # Some test client setup goes here.
    pass

@given("umpires ['Tinker', 'Evers', 'Chance']")
def set_database(context):
    # Some database setup goes here.
    pass

@when('request is "/umpires"')
def make_get_umpires_request(context):
    # The RESTful API request goes here.
    pass

@then("response body includes ['Tinker', 'Evers', 'Chance']")
def assert_response(context):
    # Check the response goes here.
    pass

The code shown here has placeholders based on the Gherkin scenarios. The pass statements must be replaced with working test code. I’ve thrown in some comments to help show what kind of code needs to be written.

For example, if the server is implemented in Flask (http://flask.pocoo.org/), then the create_client() function would need to build the application and create a Flask test client for the application.

When the tests are run, Behave will use both the feature files and the step specifications to perform the tests and confirm the results. The output will include a log as well as a JSON document with the official results to show the software behaves acceptably.

The JSON output created by running Behave or pytest-bdd is the embodiment of an official Umpire’s signals of “Safe” or “Out”. This is what leads to the eruption of joy or sorrow from the fans.

Gherkin is often touted as a good way to get non-technical folks invested in the correct behavior of software. The potential cost of Gherkin is the extra complexity of working in Yet Another Language. As we noted above, we’re expressing common ideas in three different languages.

  • OpenAPI specification — while the syntax of the API description may be YAML or JSON, it’s a declarative language to formalize behavior.
  • Gherkin — another formalization of the behavior. It’s less strict than OpenAPI specification.
  • The Code — this is still another formalization of the desired behavior. For me, this is in Python. It can, of course, involve other languages like HTML, CSS, SQL, the Jinja template language, etc.

The Development Process

Once we’ve written our API description to the OpenAPI specification, we can begin coding.

Once we’ve got a Gherkin feature definition, we can begin testing.

Once we’ve got a baseball and a bat, we can play ball.

Much of the work is writing and refactoring the code until the tests pass and the code looks good enough to publish. Ideally, we’re using Behave (or pytest-bdd) to confirm the software works. It’s handy to configure the repository so each commit will run Behave, and confirm the code works. In some cases, it can be simpler to use Git pull requests to trigger running Behave. This confirms the code to be merged will meet its required behavior.

When we look at the Spec-to-Gherkin-to-Code play, it’s helpful to narrow the super-flexible Gherkin descriptions so they match the API description we wrote exactly.

Here’s my preference.

  1. Write the API description to the OpenAPI specification. I like to start here because it helps me stay focused on RESTful API limitations and constraints.
  2. Transform that API description to Gherkin. I’ll outline a tool for this below. It’s a difficult problem in general, but you can easily develop a tool in Python that supports the necessary customizations and tweaks.
  3. Provide the step definitions so the Gherkin can be used to run tests. The first few times, this involves some learning. Good step definitions can be generic enough to be highly reusable.
  4. Write code that works. Refactor the code until it’s good.

Converting OpenAPI Spec to Gherkin

Moving from an API description written to the OpenAPI specification to Gherkin is restructuring the OAS API description (or spec for short) details into a different notation. It’s important to avoid loss and confusion in the transformation process.

The OpenAPI specification’s nested structure can be summarized like this:

paths:
  path:
    operation:
      … (request details)
      responses:
        code: … (response details)

A specification often has a number of paths. For each path item, there are one or more operations. Each operation has one or more responses.

A Gherkin document, based on an API description written to the OpenAPI specification, will have one (or more) feature descriptions. (Tags can be used to distinguish features for complex applications.) Each feature will include a number of scenarios. Each scenario describes one of the responses to a (path, operation) combination.

A Gherkin template for a scenario can look like this:

Scenario: {operation summary} leading to {response description}
  Given {context}
  And some context tied to the response
  When {operation} on {path}
  And some reason for the response
  Then validate expected {response}

Most of the {placeholders} above come directly from the OAS description. The exceptions are the “context tied to the response” and “reason for the response”.

These “some context” and “some reason” phrases are examples of details which are implied — but not stated — in the spec. This happens because OpenAPI specification is specific to RESTful API’s and the examples provided focus on the “Happy Path” of successful responses. Any “Unhappy” responses of 401 UNAUTHORIZED or 404 NOT FOUND are often summarized in the description property of a response. That’s why I claim the conditions leading to an unhappy response are implied.
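Filling in this template is straightforward string assembly. Here is a small sketch, assuming a hypothetical render_scenario() helper: the required arguments come straight from the spec, while the implied “some context” and “some reason” clauses are passed in separately because the spec doesn’t state them.

```python
from typing import List, Optional


def render_scenario(
    summary: str,
    description: str,
    context: str,
    operation: str,
    path: str,
    response: str,
    extra_given: Optional[List[str]] = None,
    extra_when: Optional[List[str]] = None,
) -> str:
    """Fill the scenario template; the extra clauses are the implied details."""
    lines = [f"Scenario: {summary} leading to {description}"]
    lines.append(f"  Given {context}")
    lines.extend(f"  And {clause}" for clause in (extra_given or []))
    lines.append(f"  When {operation} on {path}")
    lines.extend(f"  And {clause}" for clause in (extra_when or []))
    lines.append(f"  Then validate expected {response}")
    return "\n".join(lines)


# Happy path: everything needed is already in the spec.
text = render_scenario(
    summary="The umpires",
    description="The umpire list",
    context="a test microservice",
    operation="GET",
    path="/umpires",
    response='{"status": "OK", "data": ["Tinker", "Evers", "Chance"]}',
)
print(text)
```

For an unhappy path, the caller has to supply the extra_given and extra_when clauses by hand, which is exactly the information the spec leaves implied.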

Here’s a concrete example. We’ll add a path to the API description we wrote using the OpenAPI specification to get details for an umpire. This implies an error for an unknown umpire.

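  /umpires/{umpire}:  # path name assumed from the parameter below
    get:
      summary: An umpire
      parameters:
        - name: umpire
          in: path
          description: the umpire's name
          type: string
          required: true
      responses:
        200:
          description: An umpire's details
          schema:
            type: object
            properties:
              name:  # property key assumed
                type: string
                example: "Tinker"
        404:
          description: Unknown umpire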

This describes a response status code of 404 for the case when the Umpire is unknown. Developers who understand RESTful API’s (and the OpenAPI specification) know what this means. When working with Gherkin step definitions, we’ll need to make this explicit to be sure we can distinguish the two scenarios from each other.

We have several ways to expose the details of the causes for this kind of error:

  • Try to expand the spec description entry using stylized natural language. Adding erroneous examples to define these additional Gherkin scenarios will clutter the spec.
  • Add extension properties in the form of “x-given” and “x-when” in the spec. Extension properties must be quietly ignored by tools that can’t make use of them, so they’re safe to add.

404:
  description: Unknown umpire
  x-given: Umpires are ["Tinker"]
  x-when: request is for "Evers"

These extension values provide the needed concrete examples to build a Gherkin error scenario. This idea can be extended to provide a number of error scenarios.
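A tool could pick these extensions out of a response and turn them into step clauses. The sketch below assumes a hypothetical error_scenario() helper and the “a test microservice” and “response status is” wording used elsewhere in this post; the extension names x-given and x-when are the ones proposed above.

```python
from typing import Any, Dict, List


def error_scenario(status: str, response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect Given/When/Then clauses for one response, using x- extensions."""
    given = ["a test microservice"]
    when: List[str] = []
    # The extensions are optional; a response without them adds no clauses.
    if "x-given" in response:
        given.append(response["x-given"])
    if "x-when" in response:
        when.append(response["x-when"])
    then = [f"response status is {status}"]
    return {"given": given, "when": when, "then": then}


# The 404 response from the spec, as it would look after loading the YAML.
not_found = {
    "description": "Unknown umpire",
    "x-given": 'Umpires are ["Tinker"]',
    "x-when": 'request is for "Evers"',
}

steps = error_scenario("404", not_found)
```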

While adding error details can make it easier to convert the API description written to the OpenAPI specification to meaningful Gherkin, they will clutter things up. It seems much better to use the OpenAPI specification for the happy paths, and describe the unhappy paths in Gherkin only.

Technical Gherkin

There’s one more essential topic required to make the crucial relay from Spec to Gherkin to Code. We have to change our focus slightly. Gherkin, when used by the business owners, is like a baseball slugger swinging for the fences. The shot is strategic: it will be a home run.

Gherkin can be narrowed to have a technical focus. Here’s an example.

Scenario: Get list of Umpires
  Given a test app and client
  And a valid Authorization token is 'dizzy'
  And a test database
  And umpire names of ["Tinker", "Evers", "Chance"]
  When headers include {"x-Auth-Token": "dizzy", "Accept": "application/json"}
  And request operation is 'Get'
  And request path is '/umpires'
  Then response status is 200
  And response body is {"status": "OK", "data": ["Tinker", "Evers", "Chance"]}

The general shape of the Gherkin scenario has the Given-When-Then structure. The details, however, have changed from a product owner’s abstract view to specific technical details describing a RESTful API implementation.

The steps have been parameterized using some boilerplate text of “headers include” followed by the parameter values. The idea is to create step definitions based on the Behave tool’s ability to parse the step strings. One general Python step implementation can work with a variety of parameter values.

Here’s a parameterized step definition using Behave’s notation:

@then('response status is {status:d}')
def check_status(context, status):
    # Behave's context uses attribute access; an earlier step must set context.status.
    assert context.status == status

This @then definition can be reused by all the Gherkin scenarios generated from the API description we’ve written to the OpenAPI specification. Having a library of reusable step definitions means that a change to the spec will change the Gherkin-based processing. This has an immediate benefit in executing the changed test scenarios.

OpenAPI specification to Gherkin Script

While this seems pretty clear and straightforward, there are a number of complicating factors.

  • Naming. The API description we’ve written to the OpenAPI specification allows both summaries and descriptions for an operation as well as each response. Details of how these fields are used can vary and each organization will establish different conventions for naming. An OpenAPI-to-Gherkin tool must reflect the organizational naming conventions.
  • Optional Features. The specification for an operation may define many optional parameters. The description text may include some guidance on what the parameters mean and what default behaviors are when parameters are omitted. Enumerating the scenarios can be difficult to automate, and manual intervention may be required to provide the detailed test cases showing how the parameters interact.
  • Local Standards. An established family of microservices might have some standard features assumed, rather than defined, in each OpenAPI API document. While the description can be copied into each OpenAPI API document, it might be difficult to work out all the necessary Gherkin-focused variations to be tested. It may be better to handle these cases outside the OpenAPI specification.
  • External Dependencies. When working with a simple microservice, the state of a database can be described with x-given extensions. Trying to describe errors and problems with multiple, complex external dependencies through simple x-given extensions may become awkwardly complex. In these cases, some manually-created step definitions may be easier than working out an OAS extension.
  • Unhappy Path Scenarios. Rather than clutter our API description with error-processing details, it may be better for an OpenAPI-to-Gherkin tool to inject the error processing scenarios.

A simple and generic OpenAPI-to-Gherkin application seems needlessly difficult. Most of the practical uses would involve extensions and optional features. Instead it seems better to build a customized two-step transformation pipeline.

  1. OpenAPI In. This builds the scenario definitions from the API description we wrote to the OpenAPI specification. Customizations and extensions can be handled as necessary when ingesting the source.
  2. Gherkin Out. This writes the Gherkin feature files from the scenario definitions. This involves a careful orchestration of features between the step definitions and the Gherkin text. Customizations and extensions can be injected here.

Between these two is an internal representation of the feature as a sequence of scenarios. Each scenario is a collection of values extracted from the OpenAPI document and reassembled into a Python dictionary with keys like these:

  • Path- The path which goes into the When step.
  • Operation- The operation which goes into the When step.
  • Request- Details of the request and the parameters. Parts of this populate the scenario name. Parts of this will populate the When steps to make the request. In the case when there are optional parameters, each combination could lead to a separate scenario. Beware of combinatoric explosion here: the size of the powerset of n optional parameters is 2^n.
  • Status- This is the overall final status for this scenario.
  • Response- This is the expected response example document.
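One way to pin down this internal representation is a TypedDict. This is only a sketch: the class name Scenario and the helper optional_combinations() are hypothetical, and the keys are written in lower case to match the dictionary example that follows.

```python
from typing import Any, Dict, TypedDict


class Scenario(TypedDict):
    """One (path, operation, response) combination taken from the spec."""
    path: str
    operation: str
    request: Dict[str, Any]
    status: str
    response: Dict[str, Any]


def optional_combinations(n_optional: int) -> int:
    """Number of subsets of n optional parameters, i.e. the powerset size."""
    return 2 ** n_optional


umpires_ok: Scenario = {
    "path": "/umpires",
    "operation": "get",
    "request": {"summary": "The umpires"},
    "status": "200",
    "response": {"description": "The umpire list"},
}
```

Even a modest operation shows why the warning about combinations matters: ten optional parameters already give optional_combinations(10) == 1024 candidate scenarios.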

Here’s an example:

{'path': '/credentials',
 'operation': 'post',
 'request': {
     'parameters': [
         {'in': 'body',
          'name': 'User Credentials',
          'schema': {
              'description': 'input',
              'properties': {'username': {'type': 'string'}}}}],
     'summary': 'Post new user credentials'},
 'status': '200',
 'response': {'description': 'it worked'}}

And yes, you’re right, the parameter doesn’t have any example detail. It’s only the schema. Filling in these details is part of an initial quality check. If a simple tool can’t generate proper Gherkin tests, then the spec needs to be expanded.

The essential Python programming to create this has the following kind of top-level function:

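import sys
from typing import Dict, Optional, TextIO

OptFile = Optional[TextIO]  # assumed definition of the OptFile alias

def make_gherkin(oaspec: Dict, file: OptFile = sys.stdout) -> None:
    common = get_common_features(oaspec)
    emit_feature_header(oaspec, file)
    for scenario in scenario_iter(oaspec):
        emit_scenario(make_scenario(common, scenario), file)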

The output is created by two functions.

  • emit_feature_header() writes the “Feature:” section at the top of the file. The template for writing a feature header is trivial.
  • emit_scenario() writes the Gherkin steps for a given scenario. This is also relatively simple. It’s a matter of annotating the various steps with words like ‘Given’, ‘When’, ‘Then’, and ‘And’.
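The two output functions could be sketched as follows. This is an illustration, not the original implementation: it assumes the scenario dictionary has a 'scenario' pair of summary and description plus 'given', 'when', and 'then' lists of clauses, and the emit_steps() helper and the “Untitled API” fallback are hypothetical.

```python
import io
import sys
from typing import Any, Dict, List, TextIO


def emit_feature_header(oaspec: Dict[str, Any], file: TextIO = sys.stdout) -> None:
    """Write the Feature: line, using the title from the spec's info section."""
    title = oaspec.get("info", {}).get("title", "Untitled API")
    print(f"Feature: {title}", file=file)
    print(file=file)


def emit_steps(keyword: str, clauses: List[str], file: TextIO) -> None:
    """The first clause gets the keyword; the rest are chained with And."""
    for index, clause in enumerate(clauses):
        prefix = keyword if index == 0 else "And"
        print(f"    {prefix} {clause}", file=file)


def emit_scenario(scenario: Dict[str, List[str]], file: TextIO = sys.stdout) -> None:
    """Write one scenario: a title line followed by Given, When, Then steps."""
    summary, description = scenario["scenario"]
    print(f"  Scenario: {summary} leading to {description}", file=file)
    emit_steps("Given", scenario["given"], file)
    emit_steps("When", scenario["when"], file)
    emit_steps("Then", scenario["then"], file)
    print(file=file)


buffer = io.StringIO()
emit_feature_header({"info": {"title": "A Quick Demo", "version": "1.0"}}, buffer)
emit_scenario(
    {
        "scenario": ["The umpires", "The umpire list"],
        "given": ["test app and client", "a test database"],
        "when": ["request operation is get", "request path is /umpires"],
        "then": ["status is 200 The umpire list"],
    },
    buffer,
)
print(buffer.getvalue())
```

Writing to a file-like object keeps the functions easy to test: the same code works with sys.stdout, an open feature file, or an in-memory buffer.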

The data gathering is done by three functions:

  • get_common_features() extracts common details from the OpenAPI document. This includes the security information, MIME types, and security definitions. This extracts fields like basePath from the OpenAPI document.
  • make_scenario() extracts details from OpenAPI and repackages those details into the various Gherkin steps. This is the most complex part of the processing.
  • scenario_iter() is a generator function that yields a sequence of scenarios based on the various combinations of paths, operations, and optional parameters.
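As a rough sketch of the first of these, get_common_features() could simply pull a handful of top-level fields out of the document. The field names below are top-level fields in OAS version 2; the choice of which ones to collect, the defaults, and the sample values are assumptions for illustration.

```python
from typing import Any, Dict


def get_common_features(oaspec: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the document-wide settings that every scenario shares."""
    return {
        "basePath": oaspec.get("basePath", ""),
        "consumes": oaspec.get("consumes", []),
        "produces": oaspec.get("produces", []),
        "security": oaspec.get("security", []),
        "securityDefinitions": oaspec.get("securityDefinitions", {}),
    }


# A hypothetical spec fragment; the values are made up for the demonstration.
sample_spec = {
    "swagger": "2.0",
    "basePath": "/v1",
    "produces": ["application/json"],
    "paths": {},
}

common = get_common_features(sample_spec)
```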

The scenario_iter() function can look like this:

from typing import Dict, Iterable

def scenario_iter(oaspec: Dict) -> Iterable[Dict]:
    for path in oaspec['paths']:
        for operation in oaspec['paths'][path]:
            request = oaspec['paths'][path][operation].copy()
            # Optional: compute the powerset of the optional parameters
            responses = request.pop('responses')
            for response_status in responses:
                response = responses[response_status].copy()
                yield {
                    'path': path,
                    'operation': operation,
                    'request': request,
                    'status': response_status,
                    'response': response}

This function iterates through paths, operations, and responses. It yields each unique combination as a distinct scenario. The details can be used to emit details in Gherkin notation.

This example doesn’t compute the powerset of the optional parameters. The itertools documentation shows how to build a powerset() iterator using the chain.from_iterable() function. This can add a helpful level of sophistication for the alternative scenarios when working with a complex OpenAPI specification.
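Here is a sketch of that idea. The powerset() function follows the recipe in the itertools documentation; parameter_subsets() is a hypothetical helper that keeps the required parameters in every combination, and the parameter names are made up for the demonstration.

```python
from itertools import chain, combinations
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def powerset(iterable: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    """powerset([1, 2]) yields (), (1,), (2,), (1, 2)."""
    items = list(iterable)
    return chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )


def parameter_subsets(parameters: List[Dict[str, Any]]) -> Iterator[List[Dict[str, Any]]]:
    """Yield every combination: all required parameters plus a subset of optional ones."""
    required = [p for p in parameters if p.get("required", False)]
    optional = [p for p in parameters if not p.get("required", False)]
    for chosen in powerset(optional):
        yield required + list(chosen)


# Hypothetical parameters: one required, two optional.
parameters = [
    {"name": "umpire", "in": "path", "required": True},
    {"name": "league", "in": "query"},
    {"name": "year", "in": "query"},
]

subsets = [[p["name"] for p in subset] for subset in parameter_subsets(parameters)]
```

Two optional parameters give four combinations, each of which could become its own scenario inside the loop over operations.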

The make_scenario() function is where the bulk of the transformation happens. The goal for this function is to emit a dictionary that follows the overall structure of the Gherkin scenario. This is where the implicit details of the API description we wrote are expanded into explicit Gherkin text.

The tail end of make_scenario() will look like this:

scenario = {
    'scenario': [summary, description],
    'given': [],
    'when': [],
    'then': [],
}
scenario['given'].append('test app and client')
scenario['given'].extend([f"{sec_key} security" for sec_key in security_keys])

if req_header:
    req_header_json = json.dumps(req_header)
    scenario['when'].append(f"headers are {req_header_json}")
if req_query:
    req_query_json = json.dumps(req_query)
    scenario['when'].append(f"query is {req_query_json}")
if req_body:
    req_body_json = json.dumps(req_body)
    scenario['when'].append(f"body is {req_body_json}")
scenario['when'].append(f"request operation is {operation}")
scenario['when'].append(f"request path is {basePath}{path}")

scenario['then'].append(f"status is {status} {response['description']}")
# Only emit a body check when the response defines more than a description.
if set(response.keys()) > {'description'}:
    response_json = json.dumps(example_response)
    scenario['then'].append(f"response is {response_json}")

Each of the three Gherkin steps is a list of clauses. Each clause is a formatted string that will be both meaningful to people and usable by the step matching rules of Behave or pytest-bdd. A good choice of strings reflects a balance between technical details and meaningful terminology.

Note that each of these clauses has a direct parallel with a step fixture. For example, we append “headers are {…}” to the when clause, so the “headers are” text must be matched by a step definition. We could use something very detailed like @when("headers are {…}"). With pytest-bdd, it’s more flexible to use @when(parsers.re(r"headers are (?P<header>.*)")) so a generic step definition can be reused in multiple scenarios.
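To see what the generic pattern buys us, here is a sketch that uses only the standard library’s re and json modules rather than the Behave or pytest-bdd matching machinery. The function name parse_headers_step() is hypothetical; the regular expression is the one shown above.

```python
import json
import re
from typing import Any, Dict, Optional

# The same pattern a generic step definition would be registered with.
HEADERS_STEP = re.compile(r"headers are (?P<header>.*)")


def parse_headers_step(step_text: str) -> Optional[Dict[str, Any]]:
    """Return the headers from a matching step, or None if the step doesn't match."""
    match = HEADERS_STEP.fullmatch(step_text)
    if match is None:
        return None
    # The clause was written with json.dumps(), so json.loads() reverses it.
    return json.loads(match.group("header"))


headers = parse_headers_step(
    'headers are {"x-Auth-Token": "dizzy", "Accept": "application/json"}'
)
```

Because the generated clause is JSON, one step definition handles any set of headers, and the round trip from the spec to Gherkin text to Python values loses nothing.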

Now that we’ve done the first relay, from Spec to Gherkin, we’re ready for the next relay, which completes the play from Spec to Gherkin to Code. Tools like Behave and pytest are the umpires, making the final call that the play was successful.

Spec to Gherkin to Code

We often call the outline to a project a “pitch.” Let’s not push this analogy too far, but the defensive team (coach, pitcher, and catcher) is grooming the pitch. When it’s hit to the infield, there’s a lot of defensive play that can go wrong, allowing the runners to advance around the bases.

Writing an API description to the OpenAPI specification is a common first step in designing an API. This description is based on the pitch, and it can drive the API design and implementation.

While there are numerous test tools for API descriptions written to the OpenAPI specification, they suffer from a limitation: the OpenAPI examples only cover the happy path, leaving all the “other” paths unspecified. If we write our own tools to create tests from an extended OpenAPI specification, we can readily include non-happy path examples.

Gherkin-based tools like Behave produce formal documentation showing that a suite of acceptance tests have all passed. If we don’t have this formal notification of all the tests passing, we’re playing sports without any officials. It’s fun, but disputes are inevitable. It really helps to have an impartial call on the close plays.

Tossing the ball from Spec to Gherkin to Code makes it easy to be sure the play is called correctly.


DISCLOSURE STATEMENT: These opinions are those of the author. Unless noted otherwise in this post, Capital One is not affiliated with, nor is it endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are the ownership of their respective owners. This article is © 2018 Capital One.

Capital One Tech

The low down on our high tech from the engineering experts at Capital One. Learn about the solutions, ideas and stories driving our tech transformation.
