Your first OpenAPI document (Part II: data model)
Document your REST API using the OpenAPI standard, Part II: data model
In the previous article we ended up with the following OpenAPI document draft:
In this post we focus on the data model definition, i.e. the components/schemas section of the OpenAPI document, while in the next one we will cover the API endpoints.
A model represents an entity of our application domain with an associated type:
Usually a non-trivial model has an object type, which provides the foundation for any custom data type; OpenAPI calls it Schema Object.
The Schema Object allows the definition of input and output data types. These types can be objects, but also primitives and arrays.
At its core, every custom data type can have the following fields:
- type (string): a built-in data type or a reference to an existing model
- description (string): a human-friendly description of what a model represents in our domain
- nullable (boolean): whether the field accepts null; remember that null is not a data type of its own, so this field is the only way to control this behavior (default: false)
- readOnly (boolean): whether the field can appear in a response but should never be sent in a request (default: false)
- writeOnly (boolean): whether the field can appear in a request but should never be sent in a response (default: false)
- deprecated (boolean): whether the field is deprecated and planned for removal
- default (any): default value for an optional parameter (the value must be consistent with the declared type of the field)
- enum (array): a list of admitted values (useful for fields with a well-known and fixed set of possible values, e.g. person gender or task status; each value must be consistent with the declared type of the field)
A side note: there is no per-field required attribute to specify that a certain field must have a value (which is different from, and must not be confused with, nullable). We need to rely on the required field of the enclosing object instead (see the next section for more details).
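To make the common fields more concrete, here is a minimal sketch in OpenAPI 3.0 YAML. The Task model and its id and status fields are hypothetical and only serve as an illustration (task status is one of the enum examples mentioned above).

```yaml
# Hypothetical model, used only to illustrate the common fields
Task:
  type: object
  required:            # mandatory fields are listed on the object, not on the field
    - status
  properties:
    id:
      type: integer
      description: Identifier assigned by the server
      readOnly: true   # may appear in responses, never sent in requests
    status:
      type: string
      description: Current state of the task in its workflow
      nullable: false  # null is rejected (this is also the default)
      enum:
        - todo
        - in_progress
        - done
      default: todo    # consistent with the declared type and with enum
```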
In the following sections we’ll review the specific fields available for each built-in data type while we try to build an OpenAPI definition for our hypothetical Book domain entity:
Since a book is a complex entity with multiple fields, we define it as an OpenAPI object and start by outlining all its fields with basic type information:
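A first outline could look like the sketch below. The field names are the ones discussed in the rest of this article; everything else is deliberately left generic at this stage, and the array element types are placeholders to be refined later.

```yaml
components:
  schemas:
    Book:
      type: object
      properties:
        title:
          type: string
        publicationDate:
          type: string
        language:
          type: string
        wikipedia:
          type: string
        authors:
          type: array
          items: {}      # placeholder: element type not decided yet
        characters:
          type: array
          items: {}
        genres:
          type: array
          items: {}
        volumes:
          type: array
          items: {}
```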
An object has the following specific properties:
- type (string): must be object
- properties (object): a list of property names and corresponding types
- required (array): a list of required properties
An object is like a class in any object-oriented programming language, with its own type and definition. And like in OOP, with OpenAPI we can define a new object from scratch or:
- combine existing objects, i.e. declaring object fields with types of other objects (composition)
- extend an existing object, preserving its definition while changing existing properties (e.g. providing a default value) or adding new ones (inheritance)
In our Book example, we may assume language and title are mandatory fields, so we add them under the required section, and we also add two integer fields to store two very common database record timestamps: creation date and last modification date.
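As a sketch, the Book object would change as follows. The timestamp field names createdAt and updatedAt are assumptions, since the article does not name them.

```yaml
Book:
  type: object
  required:
    - language
    - title
  properties:
    title:
      type: string
    language:
      type: string
    # other Book fields omitted for brevity
    createdAt:         # hypothetical name: creation date
      type: integer
    updatedAt:         # hypothetical name: last modification date
      type: integer
```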
Now we can carefully review each field to enrich its definition with additional type constraints, depending on its type.
String (and file)
- enum (array): list of admitted values (e.g. person gender)
- format (string): one of several built-in validation formats (e.g. binary for files); the full list is available in the official documentation
- maxLength (integer): maximum number of characters
- minLength (integer): minimum number of characters
- pattern (string): a regular expression the string must match; note that you will get a partial match unless you use ^...$ delimiters
An empty string is a perfectly valid value unless minLength or pattern is explicitly set to prevent such a value from being accepted.
- title: to avoid empty strings being treated as valid titles, we explicitly require it to be at least one character long
- publicationDate: since it's a date, we set format to date
- language: we use pattern to check for a syntactically valid ISO 639-1 two-letter language code (note that I used ^...$ delimiters for a global match)
- wikipedia: we use the uri format to ensure it is a valid URL; we could also use pattern to check that it actually points to a Wikipedia page
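The four string fields could then be declared as in the sketch below. The regular expression is an assumption (two lowercase letters), and uri is the format name defined by JSON Schema for URL-like strings; whether a given tool enforces format is implementation-dependent.

```yaml
title:
  type: string
  minLength: 1              # rejects the empty string
publicationDate:
  type: string
  format: date              # full-date, e.g. 1954-07-29
language:
  type: string
  pattern: '^[a-z]{2}$'     # anchored, so the whole value must match
wikipedia:
  type: string
  format: uri
```

Note that the pattern only checks the shape of the code, not whether it is an assigned ISO 639-1 code; an enum would be needed for that.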
Numbers can be integer or floating-point values and have the following specific properties:
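- format (string): numeric precision for integers (int32, int64) and floating point numbers (float, double)
- maximum (number): maximum value (included in the range unless exclusiveMaximum is true)
- exclusiveMaximum (boolean): if true, the maximum interval endpoint is excluded
- minimum (number): minimum value (included in the range unless exclusiveMinimum is true)
- exclusiveMinimum (boolean): if true, the minimum interval endpoint is excluded
- multipleOf (number): the field value must be a multiple of a given positive value (safe for integers, not for floating point numbers due to the limited precision of floating point math)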
A timestamp is usually expressed as the number of seconds (or milliseconds, if additional precision is desired) elapsed since January 1st, 1970 UTC (commonly referred to as the Unix Epoch), so we define both of them as 64-bit integers using the int64 format and with a minimum value of zero.
This format is quite useful from a computational perspective when we need to sort dates or establish the relative order between two of them, since only a simple numeric comparison is required.
Beyond that, if we choose to manage (i.e. update) timestamp fields straight from the database, we can take advantage of the readOnly property.
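A possible declaration of the two timestamps, in OpenAPI 3.0 syntax, is sketched below. The names createdAt and updatedAt are assumptions, and the rating field is purely hypothetical: it is there only to show the boolean exclusiveMinimum keyword.

```yaml
createdAt:                  # hypothetical name
  type: integer
  format: int64
  minimum: 0                # 0 is allowed (inclusive bound)
  readOnly: true            # managed by the database, never sent by clients
updatedAt:                  # hypothetical name
  type: integer
  format: int64
  minimum: 0
  readOnly: true
rating:                     # hypothetical field, not part of the Book model
  type: number
  format: float
  minimum: 0
  exclusiveMinimum: true    # 0 itself is rejected
  maximum: 5
```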
Each array has the following specific properties:
- maxItems (integer): maximum length
- minItems (integer): minimum length
- uniqueItems (boolean): whether all values must be unique
The type of the array elements, declared under the items section, can be:
- a single data type, including object and array
- a mixed data type
- a reference to another model
This flexibility allows us to define multidimensional arrays with any type of our choice. Here's a brief recap of the most common array declarations:
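The following sketch shows one declaration per case in OpenAPI 3.0 YAML. All property names are hypothetical; the mixed case relies on oneOf, which is how OpenAPI 3.0 expresses alternative element types.

```yaml
stringList:                 # single data type
  type: array
  items:
    type: string
mixedList:                  # mixed data type: each element is a string or an integer
  type: array
  items:
    oneOf:
      - type: string
      - type: integer
modelList:                  # reference to another model
  type: array
  items:
    $ref: '#/components/schemas/Author'
matrix:                     # two-dimensional array of numbers
  type: array
  items:
    type: array
    items:
      type: number
```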
- authors: author is a domain entity on its own, so if we define an Author model under the components/schemas section, we can easily refer to it as the type of the array elements using the $ref attribute followed by the path of the model in our OpenAPI document (# represents the document root)
- characters: in our simplified model we keep it as a list of strings, but the same reasoning could be applied here and lead us to the definition of a Character model
- genres: a simple array of strings
- volumes: if we look again at the Book example, we could say each volume is a book on its own, so we can create a recursive data type of sorts, where each book can refer to other books inside any of its fields (like an SQL recursive foreign-key relationship)
For all these fields it makes sense to ensure values are unique by setting uniqueItems to true. We definitely do not want the same author, genre, character or volume to appear multiple times in the corresponding array field.
How the equality comparison is actually performed will depend on subsequent design and implementation choices.
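Put together, the four array fields of Book might be declared as in this sketch, which assumes an Author model exists under components/schemas.

```yaml
authors:
  type: array
  uniqueItems: true
  items:
    $ref: '#/components/schemas/Author'
characters:
  type: array
  uniqueItems: true
  items:
    type: string
genres:
  type: array
  uniqueItems: true
  items:
    type: string
volumes:
  type: array
  uniqueItems: true
  items:
    $ref: '#/components/schemas/Book'   # recursive reference to the model itself
```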
In the next couple of sections we are going to see how we can further reuse and combine models to represent more complex entity domains and relationships using inheritance, composition and polymorphism.
The OpenAPI Specification allows combining and extending model definitions using the allOf property of JSON Schema, in effect offering model composition.
allOf takes an array of object definitions that are validated independently but together compose a single object.
With the allOf keyword you can do something more sophisticated than simply adding fields with model types (e.g. our Book authors): you can actually combine one or more models, merging all their fields into a new model, which nevertheless has no type relationship with the merged one(s).
This is the most basic application of the reuse principle but it’s very common and often recommended over inheritance because of the greater flexibility and maintainability.
In our example, we may want to make a distinction between fictional and non-fictional works.
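One way to sketch that distinction with allOf is shown below. The model names FictionBook and NonFictionBook and the subject field are assumptions made for illustration; each model takes all the fields of Book and adds its own.

```yaml
components:
  schemas:
    FictionBook:              # hypothetical model name
      allOf:
        - $ref: '#/components/schemas/Book'
        - type: object
          properties:
            characters:
              type: array
              uniqueItems: true
              items:
                type: string
    NonFictionBook:           # hypothetical model name
      allOf:
        - $ref: '#/components/schemas/Book'
        - type: object
          properties:
            subject:          # hypothetical field
              type: string
```

An instance of FictionBook must validate against both entries of allOf, which is why the result behaves like a merge of their fields.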
Inheritance and polymorphism (oneOf / anyOf)
While composition offers model extensibility, it does not imply a hierarchy between the models.
To support inheritance and polymorphism, the OpenAPI Specification provides the discriminator field in combination with anyOf and oneOf.
The semantics are quite straightforward: the model must conform to any (one or more) of the referenced models with anyOf, or to exactly one of them with oneOf. The optional discriminator field can remove ambiguity by naming a property, which must be present in all referenced models, whose value identifies the model to validate against.
oneOf and anyOf are mostly used when defining the schema of an API endpoint request or response, to indicate that we may accept or return one or more different models.
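A minimal sketch of oneOf with a discriminator follows. The Novel, Essay and Work models and the kind property are all hypothetical; the point is only that every referenced model declares the discriminating property.

```yaml
components:
  schemas:
    Novel:                    # hypothetical model
      type: object
      required:
        - kind
      properties:
        kind:
          type: string
        title:
          type: string
    Essay:                    # hypothetical model
      type: object
      required:
        - kind
      properties:
        kind:
          type: string
        subject:
          type: string
    Work:                     # exactly one of the two models must match
      oneOf:
        - $ref: '#/components/schemas/Novel'
        - $ref: '#/components/schemas/Essay'
      discriminator:
        propertyName: kind
        mapping:              # value of kind -> model to validate against
          novel: '#/components/schemas/Novel'
          essay: '#/components/schemas/Essay'
```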
oneOf, anyOf and allOf offer a great amount of flexibility in terms of model reuse and combination. In the next article we are going to see how to describe our API endpoints and use such keywords. In the meantime, you can find additional information and examples in the official documentation.
After a field-by-field review, I leave you with the Book and Author models we came up with. The Book example provided at the beginning of this article should be slightly rewritten to be valid according to this specification, though, mostly because of the authors and volumes fields.
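What follows is a reconstruction under stated assumptions rather than the definitive models: the Author model is reduced to a single hypothetical name field, the timestamp names createdAt and updatedAt are assumed, and the info block is a placeholder. It is written against OpenAPI 3.0.

```yaml
openapi: 3.0.3
info:
  title: Library API          # placeholder title
  version: 0.1.0
paths: {}                     # endpoints are covered in the next article
components:
  schemas:
    Author:
      type: object
      required:
        - name
      properties:
        name:                 # hypothetical field
          type: string
          minLength: 1
    Book:
      type: object
      required:
        - language
        - title
      properties:
        title:
          type: string
          minLength: 1
        publicationDate:
          type: string
          format: date
        language:
          type: string
          pattern: '^[a-z]{2}$'
        wikipedia:
          type: string
          format: uri
        authors:
          type: array
          uniqueItems: true
          items:
            $ref: '#/components/schemas/Author'
        characters:
          type: array
          uniqueItems: true
          items:
            type: string
        genres:
          type: array
          uniqueItems: true
          items:
            type: string
        volumes:
          type: array
          uniqueItems: true
          items:
            $ref: '#/components/schemas/Book'
        createdAt:            # hypothetical name
          type: integer
          format: int64
          minimum: 0
          readOnly: true
        updatedAt:            # hypothetical name
          type: integer
          format: int64
          minimum: 0
          readOnly: true
```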