Normalising requirements in multi-operation protocol specs

Richard D Jones
Jun 1, 2018


In writing a specification, we strive for normalisation (i.e. avoiding repeating the same things) for a few reasons:

  1. For purity in the spec. By saying things once and once only you can be clear and definitive.
  2. For your own sanity. Each time the spec evolves, for example in response to feedback from your community, you need to make all the changes in all the right places. If you have repeated yourself everywhere, this becomes a much more difficult task.
  3. To make it easier to get the spec right. If you normalise the spec at every turn, you are forced to sanity-check the consequences of your decisions. You can’t change a rule without reviewing how it affects every other place that relies on it.

This is not too hard to do when you are dealing with a small focused specification, like a new HTTP header. It becomes significantly harder when dealing with a protocol specification. And incidentally, a protocol specification is not very different from an API specification, so these comments apply if you’re also documenting your real-world API.

I encountered this challenge in the context of the SWORDv3 specification that I have been working on with the University of Oxford. Here we have 17 different protocol operations within the scope of the spec (GETting, POSTing, PUTting, DELET(E)ing to all the various resources), and in many cases they share common requirements. How would we normalise the specification of those requirements?

A good example, which most protocol specifications will have to deal with, is that of Authentication. Assuming all your operations are authenticated, it’s pretty easy just to pull that out and say

All protocol operations MUST be authenticated

Things get a little tricky when a rule applies to some operations and not others, depending on the details of the operation. Suppose you want clients to provide Digest headers when they send body content in the request. Now your rule for this is something like:

All protocol operations which include body content in the request MUST provide a Digest header.

That’s not too bad, though it’s getting wordy. Then you come across the need for something like:

All protocol operations which modify an existing resource MUST send concurrency control headers except if the resource is of types X, Y or Z

or

All protocol operations which modify an existing resource and contain body content (i.e. not empty body requests) MUST respond with status code X or Y depending on what the server does with the content, except if the resource is of types A, B or C.

When you get to this kind of state, it’s tempting to think that normalisation has broken down, and it isn’t feasible or sensible to normalise any further (or, indeed, this far). And it’s certainly the case that somewhere around this point you have to start to accept a certain amount of de-normalised requirements. But we can do a little better than the above.

This comes down to figuring out which aspects of your protocol control the requirements, such that you can express the total requirements for a protocol operation as a combination of those aspects. The exact details will depend on what’s important in your protocol, but here’s how it worked out for SWORDv3:

  1. The request type: whether this was a create, update, append, replace, delete, etc.
  2. The body content: whether the body was empty or not and, if not, what kind of content it was
  3. The resource: which URL class the request was being sent to (e.g. the parent resource, or one of its sub-resources)

With those we can define the requirements on a protocol operation by the unique combination, such as

Create | With a JSON Document | on the Service URL

The next step is to break your aspects down into their possible values, and (if possible) a hierarchy. For example, the request type could be split up into Create, Retrieve, Append, Replace, Delete, as the distinct kinds of operation that are possible. We can then further place this in a hierarchy, adding a few intermediate layers for convenience, like this:

All Requests
|- Modify
|  |- Create
|  \- Update
|     |- Append
|     \- Replace
|- Retrieve
\- Delete
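As a rough illustration, the hierarchy above could be held as a nested dictionary, with a small helper that finds the chain of labels leading to any request type. This is only a sketch: the names `REQUEST_TREE` and `path_to` are made up for this example and are not part of SWORDv3.

```python
# Hypothetical representation of the request-type hierarchy as nested dicts.
# Each key is a label; its value holds that label's children.
REQUEST_TREE = {
    "All Requests": {
        "Modify": {
            "Create": {},
            "Update": {"Append": {}, "Replace": {}},
        },
        "Retrieve": {},
        "Delete": {},
    }
}


def path_to(name, tree=REQUEST_TREE, trail=()):
    """Return the labels from the root down to `name`, or None if absent."""
    for label, children in tree.items():
        here = trail + (label,)
        if label == name:
            return list(here)
        found = path_to(name, children, here)
        if found:
            return found
    return None


print(path_to("Append"))
print(path_to("Delete"))
```

Every label on the path to a request type is a level at which a requirement could be attached, so a rule written against "Modify" would also cover "Append".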

Now taking advantage of those two steps, we can start to write our requirements as a set of statements that apply to a protocol operation based on a vector of those aspects.

Starting with our simple Authentication example, which was “All protocol operations MUST be authenticated”, we could write:

Request: All Requests
Content: Any Content
Resource: Any Resource
Requirements: MUST authenticate

Now for our next example, “All protocol operations which include body content in the request MUST provide a Digest header”:

Request: All Requests
Content: Body Content
Resource: Any Resource
Requirements: MUST provide a Digest header

How about “All protocol operations which modify an existing resource MUST send concurrency control headers except if the resource is of types X, Y or Z”? We can structure our resource hierarchy so that resources X, Y and Z are kept apart from all the others: call everything except X, Y and Z “Resource Set A”, and put X, Y and Z in “Resource Set B”. Then we can just say:

Request: Modify
Content: Any Content
Resource: Resource Set A
Requirements: MUST send concurrency control headers

And finally, “All protocol operations which modify an existing resource and contain body content (i.e. not empty body requests) MUST respond with status code X or Y depending on what the server does with the content, except if the resource is of types A, B or C”:

Request: Modify
Content: Body
Resource: Existing
Requirements: MUST respond with X or Y

This is a useful enhancement to how we represent our requirements in a couple of ways:

  • It’s easier to read, as it is consistently presented and removes a lot of the fluff of human language
  • It can be tabularised
  • Because it can be tabularised and uses a consistent vocabulary that you’ve created, it can become machine-readable

Since it can be tabularised, we should do that, and we’ll end up with something like this (where “All” or “Any” have been replaced by “*” for convenience).

Request | Content | Resource       | Requirements
--------+---------+----------------+--------------------------------------
*       | *       | *              | MUST authenticate
*       | Body    | *              | MUST provide a Digest header
Modify  | *       | Resource Set A | MUST send concurrency control headers
Modify  | Body    | Existing       | MUST respond with X or Y

As an implementer of the spec, if I want to know the requirements for the protocol operation where I “Modify a resource in Resource Set A with body content in the request”, then when I take the above along with the hierarchy for each aspect, I can quickly derive the full list of actual requirements (in that case, it’s all of them).
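That derivation can be sketched in a few lines of Python. This is a minimal sketch under some assumptions of my own: each hierarchy is stored as a child-to-parent map with "*" as the root, the content hierarchy has just "Body" and "Empty" below the root, and "Existing" is treated as an extra label on the resource rather than as a node in the tree. The names `labels` and `requirements_for` are hypothetical.

```python
# Child -> parent maps for each aspect; "*" stands for "All" / "Any".
REQUEST_PARENTS = {
    "Modify": "*", "Retrieve": "*", "Delete": "*",
    "Create": "Modify", "Update": "Modify",
    "Append": "Update", "Replace": "Update",
}
CONTENT_PARENTS = {"Body": "*", "Empty": "*"}  # assumed hierarchy
RESOURCE_PARENTS = {"Resource Set A": "*", "Resource Set B": "*"}

# The four rows of the requirements table.
RULES = [
    ("*", "*", "*", "MUST authenticate"),
    ("*", "Body", "*", "MUST provide a Digest header"),
    ("Modify", "*", "Resource Set A", "MUST send concurrency control headers"),
    ("Modify", "Body", "Existing", "MUST respond with X or Y"),
]


def labels(value, parents):
    """Return the value plus every ancestor up to the root '*'."""
    found = {value}
    while value in parents:
        value = parents[value]
        found.add(value)
    return found


def requirements_for(request, content, resource, existing):
    """List every requirement whose rule matches the given operation."""
    request_labels = labels(request, REQUEST_PARENTS)
    content_labels = labels(content, CONTENT_PARENTS)
    resource_labels = labels(resource, RESOURCE_PARENTS)
    if existing:
        # Assumption: existence is modelled as an additional resource label.
        resource_labels.add("Existing")
    return [
        requirement
        for rule_request, rule_content, rule_resource, requirement in RULES
        if rule_request in request_labels
        and rule_content in content_labels
        and rule_resource in resource_labels
    ]


for line in requirements_for("Replace", "Body", "Resource Set A", existing=True):
    print(line)
```

A rule applies when each of its three cells appears among the labels of the operation, which is why a rule written against "Modify" is picked up by a "Replace" request.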

For your normalised spec, then, this is virtually all you need to do. Choose a nice way to present the information, like in a table as above, and you have done your best to normalise an otherwise complex set of requirements.

We can still go a little further, though, since this tabular data is machine-readable.

Don’t write that table out as HTML, or in ASCII art in your spec. Instead, you should put it in a CSV. CSVs are great for machine-readability, with the additional bonus that you can open them in your favourite spreadsheet editor.
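For illustration, here is what the table above might look like as CSV, read with Python's standard `csv` module. The column names are taken from the table; the variable `CSV_TEXT` and the function `load_rules` are hypothetical, and an in-memory string stands in for a real file so that the sketch is self-contained.

```python
import csv
import io

# Hypothetical contents of a requirements CSV file, mirroring the table above.
CSV_TEXT = """Request,Content,Resource,Requirements
*,*,*,MUST authenticate
*,Body,*,MUST provide a Digest header
Modify,*,Resource Set A,MUST send concurrency control headers
Modify,Body,Existing,MUST respond with X or Y
"""


def load_rules(source):
    """Read requirement rows from any file-like object containing CSV."""
    return list(csv.DictReader(source))


rules = load_rules(io.StringIO(CSV_TEXT))
for rule in rules:
    print(rule["Request"], rule["Content"], rule["Resource"], "->", rule["Requirements"])
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO` object.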

When you’ve done that, all you need is a short bit of code in your preferred language to read in the CSV, and do two cool things with it:

  1. Output it as HTML or ASCII art for the spec itself
  2. Analyse it exhaustively for every protocol operation or combination of aspects to produce extremely de-normalised requirements documentation, which you can hand to your implementers as supporting documentation (without ever having to maintain that documentation directly).
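The first of those outputs might look something like the sketch below, which pads each column to its widest cell and draws a separator under the header. The function name `render_ascii` is made up for this example, and the rows are written inline here rather than loaded from a CSV to keep it short.

```python
HEADER = ["Request", "Content", "Resource", "Requirements"]
ROWS = [
    ["*", "*", "*", "MUST authenticate"],
    ["*", "Body", "*", "MUST provide a Digest header"],
    ["Modify", "*", "Resource Set A", "MUST send concurrency control headers"],
    ["Modify", "Body", "Existing", "MUST respond with X or Y"],
]


def render_ascii(header, rows):
    """Pad every column to its widest cell and join cells with ' | '."""
    table = [header] + rows
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]

    def fmt(row):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return " | ".join(cells).rstrip()

    separator = "-+-".join("-" * width for width in widths)
    return "\n".join([fmt(header), separator] + [fmt(row) for row in rows])


print(render_ascii(HEADER, ROWS))
```

Because the table is generated, the copy in the spec can never drift away from the CSV that the analysis code reads.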

Suppose in our ongoing example that there are the following aspects:

  • Requests: Create, Retrieve, Append, Replace, Delete
  • Content: Empty Body, Binary File, JSON File
  • Resources: A, B, C and X, Y, Z

If every request can be carried out on every resource with every content body, you have 5 × 3 × 6 = 90 possible cases that your implementers could be handling. (OK, this is not totally realistic, as some of those combinations are unlikely, but the point remains.) Provided your code knows each aspect and its hierarchy, you can quickly generate the full list of requirements for any combination.
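Enumerating those cases is a one-liner with `itertools.product`. A small sketch, using the aspect values listed above (the variable names are my own):

```python
from itertools import product

REQUESTS = ["Create", "Retrieve", "Append", "Replace", "Delete"]
CONTENTS = ["Empty Body", "Binary File", "JSON File"]
RESOURCES = ["A", "B", "C", "X", "Y", "Z"]

# Every (request, content, resource) combination an implementer might face.
operations = list(product(REQUESTS, CONTENTS, RESOURCES))

print(len(operations))  # 5 * 3 * 6 combinations
for request, content, resource in operations[:3]:
    print(f"{request} | {content} | Resource {resource}")
```

Each tuple can then be fed to whatever code matches an operation against the requirements table.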

We haven’t defined enough requirements here to make the examples very interesting, and doing so would be very tedious for both of us, so let’s just try a few simple cases. Assuming Resource type A is in Resource Set A, and that it doesn’t yet exist:

Create | JSON Body | Resource A
* MUST authenticate
* MUST provide a Digest header
* MUST send concurrency control headers

Or how about

Retrieve | Empty Body | Resource B
* MUST authenticate

and so on.

I took this approach for the SWORDv3 protocol specification, and it has been highly successful, making it possible to actually manage the requirements across 17 protocol operations, with 6 request types, 7 body contents, and 7 resource types. If you want to take a look you can see the following:

Hope you find this useful!

Richard is Founder and Senior Partner at Cottage Labs, a software development consultancy specialising in all aspects of the data lifecycle. He’s occasionally on Twitter at @richard_d_jones
