Your first OpenAPI document (Part II: data model)

Document your REST API using the OpenAPI standard (Part II: data model)

In the previous article we ended up with the following OpenAPI document draft:

In this post we focus on the data model definition, i.e. the components/schemas section of the OpenAPI document, while in the next one we will cover the paths section.

A model represents an entity of our application domain with an associated type:

  • string
  • number
  • integer
  • boolean
  • array
  • object

Usually a non-trivial model has an object type, which provides the foundation for any custom data type — OpenAPI calls it Schema Object.

The Schema Object allows the definition of input and output data types. These types can be objects, but also primitives and arrays.

At its core every custom data type can have the following fields:

  • type* string : a built-in data type (a reference to an existing model uses $ref in place of type)
  • description string : a human-friendly description of what a model represents in our domain
  • nullable boolean : if the field can be null; remember that null is not a type of its own in OpenAPI 3.0, so this flag is the only way to allow it (default: false)
  • readOnly boolean : if the field can appear in a response, but should never be sent in a request (default: false)
  • writeOnly boolean : if the field can appear in a request, but should never be sent in a response (default: false)
  • deprecated boolean : if the field is deprecated and planned for removal
  • default any : default value for an optional parameter (value must be consistent with field declared type)
  • enum array : a list of admitted values (useful for fields with a well-known and fixed set of possible values, e.g. person gender or task status; value must be consistent with field declared type)

A side note: there is no per-field required attribute to state that a field must be present (which is different from nullable and must not be confused with it); we rely on the required list of the enclosing object instead (see the next section for more details).
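The common fields listed above can be sketched on a small schema. The BookExtras model and its id, binding and subtitle properties are hypothetical names chosen only for illustration; they are not part of the Book model built in this article.

```yaml
components:
  schemas:
    BookExtras:              # hypothetical model, for illustration only
      type: object
      required:
        - id                 # presence is declared here, not on the field itself
      properties:
        id:
          type: string
          description: Identifier assigned by the server
          readOnly: true     # returned in responses, never sent in requests
        binding:
          type: string
          description: How the book is distributed
          enum: [hardcover, paperback, ebook]
          default: paperback # must be one of the enum values
        subtitle:
          type: string
          description: Optional subtitle
          nullable: true     # OpenAPI 3.0 way of accepting null
```

Note how required and nullable answer two different questions: id must be present, while subtitle may be absent or explicitly null.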

In the following sections we’ll review the specific fields available for each built-in data type while we try to build an OpenAPI definition for our hypothetical Book domain entity:
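As an illustration, a book could be written down as the YAML document below. The choice of book and the concrete values are assumptions made for the sake of the example; only the field names come from the sections that follow.

```yaml
# Illustrative data only
title: The Lord of the Rings
language: en
publicationDate: "1954-07-29"   # first volume; quoted so YAML keeps it a string
wikipedia: https://en.wikipedia.org/wiki/The_Lord_of_the_Rings
authors:
  - J. R. R. Tolkien
characters:
  - Frodo Baggins
  - Gandalf
genres:
  - fantasy
  - adventure
volumes:
  - The Fellowship of the Ring
  - The Two Towers
  - The Return of the King
```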

How a book could be represented as a YAML document

Object

Since a book is a complex entity with multiple fields, we define it as an OpenAPI object and start by listing all its fields and filling in basic type information. An object has the following specific fields:

  • type* string : must be object
  • properties* object : a list of property names and corresponding types
  • required array : a list of required properties

An object is like a class in any object-oriented programming language, with its own type and definition. And like in OOP, with OpenAPI we can define a new object from scratch or:

  • combine existing objects, i.e. declaring object fields with types of other objects (composition)
  • extend an existing object, preserving its definition while changing existing properties (e.g. providing a default value) or adding new ones (inheritance)

In our Book example, we may assume language and title are mandatory fields, so we add them under the required section, and we also add two integer fields to store two very common database record timestamps: creation date and last modification date.
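A first outline of the Book object could look like the sketch below. The timestamp property names createdAt and updatedAt are assumptions (the text only says that two integer timestamps are added), and array items are left open at this stage.

```yaml
components:
  schemas:
    Book:
      type: object
      required:
        - language
        - title
      properties:
        title:
          type: string
        language:
          type: string
        publicationDate:
          type: string
        wikipedia:
          type: string
        authors:
          type: array
          items: {}          # element type refined later
        characters:
          type: array
          items: {}
        genres:
          type: array
          items: {}
        volumes:
          type: array
          items: {}
        createdAt:           # assumed name: creation date
          type: integer
        updatedAt:           # assumed name: last modification date
          type: integer
```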

Now we can carefully review each field to enrich its definition with additional type constraints, depending on its type.

String (and file)

  • enum array : list of admitted values (e.g. person gender)
  • format string : one of several validation formats: date, date-time and binary (for files) are defined by OpenAPI itself, while others such as email or uri are commonly supported by tools; full list available here
  • maxLength integer : maximum number of characters
  • minLength integer : minimum number of characters
  • pattern string : a regular expression the string must match; the expression is not anchored, so it can match any part of the string unless you wrap it in ^...$

An empty string is a perfectly valid value unless minLength or pattern is explicitly set to reject it.

In our Book example:

  • title: to avoid empty strings being treated as valid titles, we explicitly require it to be at least one character long
  • publicationDate: since it’s a date, we set the date format
  • language: we use pattern to check for a syntactically valid ISO 639-1 two-letter language code (note the ^...$ anchors, which force a match on the whole string)
  • wikipedia: we use the uri format (the standard JSON Schema name for URLs) to ensure it is a valid URL; we could also use pattern to check that it actually points to a Wikipedia page
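The string constraints described above can be sketched as follows. The regular expression only checks the shape of a language code (two lowercase letters), not whether the code is actually assigned; the commented-out Wikipedia pattern is a hypothetical example of the extra check mentioned in the text.

```yaml
title:
  type: string
  minLength: 1               # rejects the empty string
publicationDate:
  type: string
  format: date               # full-date, e.g. 2001-12-31
language:
  type: string
  pattern: '^[a-z]{2}$'      # anchored: the whole value must match
wikipedia:
  type: string
  format: uri
  # Hypothetical stricter check:
  # pattern: '^https://[a-z]+\.wikipedia\.org/wiki/.+$'
```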

Number

Numbers can be integer or floating-point values and have the following specific properties:

  • format string : numeric precision for integers (int32, int64) and floating point numbers (float, double)
  • maximum number : maximum value (included in the range unless exclusiveMaximum is true)
  • exclusiveMaximum boolean : if maximum interval endpoint is excluded
  • minimum number : minimum value (included in the range unless exclusiveMinimum is true)
  • exclusiveMinimum boolean : if minimum interval endpoint is excluded
  • multipleOf number : the field value must be a multiple of a given positive number (reliable with integers, less so with floating point numbers due to the limited precision of floating point math)

Timestamp fields with format, minimum and readOnly constraints

A timestamp is usually expressed as the number of seconds (or milliseconds, if additional precision is desired) elapsed since January 1st, 1970 UTC (commonly referred to as the Unix epoch), so we define both timestamp fields as 64-bit integers using the int64 format, with a minimum value of zero.

This format is quite useful from a computational perspective when we need to sort dates or establish the relative order between two of them, since only a simple numeric comparison is required.

Beyond that, if we choose to manage (i.e. update) timestamp fields straight from the database, we can take advantage of the readOnly property.
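A minimal sketch of the two timestamp fields, assuming they are named createdAt and updatedAt (the names are not given in the text) and that the database is the only writer:

```yaml
createdAt:
  type: integer
  format: int64              # 64-bit integer
  minimum: 0                 # no dates before the Unix epoch
  readOnly: true             # set by the backend, never sent by clients
updatedAt:
  type: integer
  format: int64
  minimum: 0
  readOnly: true
```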

Array

Each array has the following specific properties:

  • maxItems integer : maximum length
  • minItems integer : minimum length
  • uniqueItems boolean : if all values must be unique

The type of the array elements, declared under the items section, can be:

  • a single data type, including object and array
  • a mixed data type: oneOf
  • a reference to another model: $ref
  • anything: {}

This flexibility allows us to define multidimensional arrays of any type. Here’s a brief recap of the most common array declarations:
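The declarations in the list above can be sketched side by side. The property names are placeholders chosen for this example, and Author is assumed to be defined under components/schemas.

```yaml
singleType:                  # every element is a string
  type: array
  items:
    type: string
mixedType:                   # each element is either a string or an integer
  type: array
  items:
    oneOf:
      - type: string
      - type: integer
modelReference:              # elements follow another model
  type: array
  items:
    $ref: '#/components/schemas/Author'
arbitrary:                   # elements of any type
  type: array
  items: {}
matrix:                      # two-dimensional array of numbers
  type: array
  items:
    type: array
    items:
      type: number
```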

Array definitions: single or mixed type, model reference(s), arbitrary and multi-dimensional

In our Book example:

  • authors: an author is a domain entity in its own right, so if we define an Author model under the components/schemas section, we can refer to it as the type of the array elements using the $ref attribute followed by the path of the model in our OpenAPI document (#/components/schemas/Author, where # represents the document root)

A basic example of an Author model

  • characters: in our simplified model we keep it as a list of strings, but the same reasoning could be applied here and lead us to the definition of a Character model
  • genres: a simple array of strings
  • volumes: if we look again at the Book example, we could say each volume is a book in its own right, so we can create a recursive data type of sorts, where each book can refer to other books in its fields (like an SQL recursive foreign-key relationship)

For all these fields it makes sense to ensure values are unique by setting uniqueItems to true. We definitely do not want the same author, genre, character or volume to appear multiple times in the corresponding array field.

How equality is actually checked will depend on later design and implementation choices.
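Putting the array fields together, the relevant part of the document could look like the sketch below. The Author properties (name, wikipedia) are assumptions, since the text does not list them; the Book part follows the description above, including the recursive reference used by volumes.

```yaml
components:
  schemas:
    Author:                  # property list is an assumption
      type: object
      required:
        - name
      properties:
        name:
          type: string
          minLength: 1
        wikipedia:
          type: string
          format: uri
    Book:
      type: object
      properties:
        authors:
          type: array
          uniqueItems: true
          items:
            $ref: '#/components/schemas/Author'
        characters:
          type: array
          uniqueItems: true
          items:
            type: string
        genres:
          type: array
          uniqueItems: true
          items:
            type: string
        volumes:
          type: array
          uniqueItems: true
          items:
            $ref: '#/components/schemas/Book'   # a book made of books
```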

In the next couple of sections we are going to see how we can further reuse and combine models to represent more complex entity domains and relationships using inheritance, composition and polymorphism.

Composition (allOf)

The OpenAPI Specification allows combining and extending model definitions using the allOf property of JSON Schema, in effect offering model composition. allOf takes an array of object definitions that are validated independently but together compose a single object.

With the allOf keyword you can do something more sophisticated than simply adding fields with model types (e.g. our Book authors): you can actually combine one or more models merging all their fields into a new model, which nevertheless has no type relationship with the merged one(s).

This is the most basic application of the reuse principle but it’s very common and often recommended over inheritance because of the greater flexibility and maintainability.

In our example, we may want to make a distinction between fictional and non-fictional works.
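One possible way to express that distinction with allOf is sketched below. The model names FictionBook and NonFictionBook and the extra properties setting and subject are hypothetical; Book is assumed to be defined under components/schemas as in the previous sections.

```yaml
components:
  schemas:
    FictionBook:             # hypothetical model
      allOf:
        - $ref: '#/components/schemas/Book'
        - type: object
          properties:
            setting:         # hypothetical field
              type: string
              description: Place or world where the story unfolds
    NonFictionBook:          # hypothetical model
      allOf:
        - $ref: '#/components/schemas/Book'
        - type: object
          properties:
            subject:         # hypothetical field
              type: string
              description: Topic the book covers
```

Each entry under allOf is validated on its own, so a FictionBook must satisfy every Book constraint (including its required list) in addition to the extra properties.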

Inheritance and polymorphism (oneOf / anyOf)

While composition offers model extensibility, it does not imply a hierarchy between the models.

To support inheritance and polymorphism, the OpenAPI Specification provides the discriminator field in combination with anyOf or oneOf keyword.

The semantics are quite straightforward: the model must conform to one or more (anyOf) or exactly one (oneOf) of the referenced models. The optional discriminator field can remove ambiguity by naming a property, which must be present in all referenced models, whose value identifies the model to validate against.

oneOf and anyOf are mostly used when defining the schema of an API endpoint request or response as indication of the fact that we may accept or return one or more different models.
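A minimal sketch of oneOf combined with a discriminator follows. All names here (AnyBook, FictionBook, NonFictionBook, the kind property and its values) are hypothetical, and the two models are reduced to the bare minimum to keep the example self-contained.

```yaml
components:
  schemas:
    FictionBook:
      type: object
      required:
        - kind               # the discriminator property must be present
      properties:
        kind:
          type: string
        setting:
          type: string
    NonFictionBook:
      type: object
      required:
        - kind
      properties:
        kind:
          type: string
        subject:
          type: string
    AnyBook:
      oneOf:                 # exactly one of the two models must match
        - $ref: '#/components/schemas/FictionBook'
        - $ref: '#/components/schemas/NonFictionBook'
      discriminator:
        propertyName: kind
        mapping:             # value of kind -> model to validate against
          fiction: '#/components/schemas/FictionBook'
          nonFiction: '#/components/schemas/NonFictionBook'
```

Without the mapping section, tools fall back to matching the value of kind against the schema names themselves.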

In the next article we are going to see how to describe our API endpoints and use these keywords. In the meantime, keep in mind that anyOf, oneOf and allOf offer a great amount of flexibility in terms of model reuse and combination; more information and examples are available in the official documentation.

Conclusion

After a field-by-field review, I leave you with the Book and Author models we came up with. Note, though, that the Book example provided at the beginning of this article would need to be slightly rewritten to be valid according to this specification, mostly because of the authors and volumes fields.