Source: Unsplash [https://unsplash.com/photos/HfFoo4d061A]

Designing RESTful query DSL

Gayan Perera
5 min readJun 12, 2023

--

Recently I had a chat with a colleague of mine and we ended up discussing the best way to design a restful API to query data repository. This kept me thinking about the best way to implement something generally. So this story will focus on a hypothetical use case to explain one of my suggestions to satisfy this requirement.

Now you might be thinking why query ? Arn’t API should focus on business queries that are fairly straightforward to implement like GET /suppliers?name=ACME ?

Yes, for services that provide or cover business use cases, it is fairly straightforward. But what if your service is a BFF (Backend For Frontend) service where the domain is to expose an analytical database/repository to consumers who are mainly using the APIs to visualize the data based on user-defined queries? Like PowerBI

Now before we move forward keep in mind, you should never expose such a query API from the core business services, Trying to do that might drag you to a very difficult situation maintaining your core business service down the line. I will be writing a different story about how one should separate such core business functionality and analytic needs in a future story, So stay tuned.

Requirement

Let's take a hypothetical example of a health analytical database, where you have anonymous patient diagnostic information and demographic data related to those patients for a given country.

Now in such a database, we have patient information such as

  • BirthYear
  • Gender
  • City
  • Marital status

And demographic data such as

  • City
  • Population

And we also have medical information such as

  • DiagnosisCode (ICD-10)
  • DiagnosisDate
  • AgeAtDiagnose

Now the consumers would like to query data using queries like below

  • Find all patients who are diagnosed with diabetes after the age of 30 years.
  • Find all patients who are diagnosed with depression before the age of 40 years and who are not married and who live in a metropolitan area with a population exceeding 900k residents.

Now above queries are some examples, but the number of queries is not limited since they are based on visualizations that are needed by consuming applications. So therefore we cannot define more granular APIs for this need.

A naive approach

A very naive approach is to define an API that provides a list of Boolean Expressions which can be connected using Logical Operators and each Expression containing operands and operators such as equality and relational. So such API would look as follows

openapi: 3.0.2
info:
title: PopulationDiagnosisData API
version: 1.0.0
servers:
- url: http://localhost:8080/search

paths:
/search:
post:
requestBody:
content:
application/json:
schema:
type: array
items:
$ref: '#/components/schemas/Criteria'
required: true
operationId: search
responses:
"200":
description: ""
content:
application/json:
schema:
$ref: "#/components/schemas/DataPoint"

components:
schemas:
Operators:
enum:
- eq
- neq
- gt
- lt
- gte
- lte
type: string
Criteria:
required:
- attribute
- operator
- value
type: object
properties:
attribute:
$ref: '#/components/schemas/Attributes'
operator:
$ref: '#/components/schemas/Operators'
value:
type: string
Attributes:
enum:
- age_at_diagnose
- city_population
- diagnose_code
- gender
- marital_status
- year_of_birth
- diagnose_date
- city
type: string

DataPoint:
type: object

An example query would look like

[
{
"attribute": "age_at_diagnose",
"operator": "lt",
"value": "30"
},
{
"attribute": "diagnose_code",
"operator": "eq",
"value": "AA0090"
}
]

As you can see if we want to find many diagnosis codes we need to add more constructs to support operators such as IN with this approach. And it's also hard to implement precedence over a group of expressions. Trying to evolve this to support such scenarios will end up with a complex RESTful API spec which is hard to follow or you might end up defining something similar to the bellow approach.

A DSL Approach

DSL stands for Domain-Specific Language, I had some experience creating a few DSL during the past and also using several DSLs in the past. So let's see how we can define a DSL to solve this problem with better usability for our consumers. What we will try to achieve is the readability of defined queries and make them fluent. I would like to mention that some inspiration for this came from

openapi: 3.0.2
info:
title: PopulationDiagnosisData API
version: 1.0.0
servers:
- url: http://localhost:8080/search
paths:
/search:
post:
requestBody:
content:
application/json:
schema:
$ref: "#/components/schemas/Query"
required: true
responses:
"200":
description: ""
content:
application/json:
schema:
$ref: "#/components/schemas/DataPoint"
operationId: search
components:
schemas:
Attributes:
enum:
- age_at_diagnose
- city_population
- diagnose_code
- gender
- marital_status
- year_of_birth
- diagnose_date
- city
type: string
Query:
oneOf:
- $ref: "#/components/schemas/MatchAllQuery"
- $ref: "#/components/schemas/MatchAnyQuery"
- $ref: "#/components/schemas/EqualQuery"
- $ref: "#/components/schemas/NotEqualQuery"
- $ref: "#/components/schemas/RangeQuery"

MatchAllQuery:
required:
- match_all
type: object
properties:
match_all:
minItems: 1
uniqueItems: false
type: array
items:
$ref: "#/components/schemas/Query"

MatchAnyQuery:
required:
- match_any
type: object
properties:
match_any:
minItems: 1
uniqueItems: false
type: array
items:
$ref: "#/components/schemas/Query"
EqualQuery:
required:
- equals
type: object
properties:
equals:
$ref: "#/components/schemas/EqualCondition"

NotEqualQuery:
required:
- not_equals
type: object
properties:
not_equals:
$ref: "#/components/schemas/EqualCondition"
EqualCondition:
required:
- attribute
- value
type: object
properties:
attribute:
$ref: "#/components/schemas/Attributes"
value:
type: string
RangeQuery:
type: object
required:
- range
properties:
range:
$ref: "#/components/schemas/RangeCondition"

Range:
type: object
properties:
value:
type: string

GreaterThanRange:
required:
- gt
properties:
gt:
$ref: "#/components/schemas/Range"

LessThanRange:
required:
- lt
properties:
lt:
$ref: "#/components/schemas/Range"

RangeCondition:
required:
- attribute
- values
type: object
properties:
attribute:
$ref: "#/components/schemas/Attributes"
values:
maxItems: 2
minItems: 1
uniqueItems: true
type: array
items:
oneOf:
- $ref: "#/components/schemas/GreaterThanRange"
- $ref: "#/components/schemas/LessThanRange"

DataPoint:
type: object

A sample query for this DSL would look like as follows

{
"match_all": [
{
"range": {
"attribute": "age_at_diagnose",
"values": [
{
"lt": {
"value": "30"
}
}
]
}
},
{
"range": {
"attribute": "year_of_birth",
"values": [
{
"gt": {
"value": "1980"
}
},
{
"lt": {
"value": "2000"
}
}
]
}
},
{
"match_any": [
{
"equals": {
"attribute": "diagnose_code",
"value": "AA0090"
}
},
{
"equals": {
"attribute": "diagnose_code",
"value": "AA0091"
}
}
]
}
]
}

The above example would be translated to

Find all data points which are diagnosed with codes AA0090 or AA0091 and diagnosed before the age 30 of patient who are borned between 1980 and 2000.

As you can see the DSL is extensible for more operators and the DSL is more domain friendly rather than mathematical. It's also readable and provides a way of defining precedence over a group of expressions. For example

            "match_any": [
{
"equals": {
"attribute": "diagnose_code",
"value": "AA0090"
}
},
{
"equals": {
"attribute": "diagnose_code",
"value": "AA0091"
}
}
]

will be evaluated as a single expression and the result will be combined with the outer expression.

I hope you can take some inspiration from this approach for your next query-based API design. If you would like to see an implementation of this API spec design let me know in the comments.

--

--