API Bites — Filtering Conventions

Improving API Composability with Consistent Filtering Conventions

TRGoodwill
API Central
Sep 14, 2022


Composable Data

Not every client system will be interested in all of the information provided by a business resource API, even if it is modeled around core business data and streamlined for composability. A predictably implemented, parameter-driven ‘read’ affordance model that includes field and collection filters can reduce over-fetching and enhance composability without risking a proliferation of client-coupled response document models.

Query parameters commonly employed for this purpose include ‘fields’, ‘include’ and ‘sort’ parameters and field-based filtering. The exact name and syntax of the parameters matters less than the imperative that they are transparent in intent, and applied to a business resource context.

For example:

GET api.myorg.com/club/v1/applications?applicantId=12B34C&fields=applicationId,name

Data Field Selection

Fields Parameter

Consumers can specify the fields they wish to receive in the response payload by listing them in a ‘fields’ query parameter.

Example GET requesting only the applicationId, familyName and givenName fields in the response:

GET api.myorg.com/club/v1/applications?fields=applicationId,familyName,givenName

Query parameter field names should always align with field naming conventions (e.g. camelCase, snake_case).

The response will return the requested fields, as well as any additional mandatory fields:

{
"applicationId": "65648987234",
"familyName": "SMITH",
"givenName": "John"
}

It is important to note that any mandatory (‘required’) fields specified in the schema MUST be returned in the response body regardless.
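
As a rough illustration of the rule above, the sketch below applies a ‘fields’ parameter to a resource while always retaining mandatory fields. The `MANDATORY_FIELDS` set, the `select_fields` helper and the extra `status` field are assumptions made for the example, not part of any particular framework or of the API described here.

```python
from typing import Any, Optional

# Hypothetical: the fields marked 'required' in the resource schema.
MANDATORY_FIELDS = {"applicationId"}


def select_fields(resource: dict[str, Any], fields_param: Optional[str]) -> dict[str, Any]:
    """Return the requested fields plus any mandatory fields."""
    if not fields_param:
        # No 'fields' parameter supplied: return the full representation.
        return dict(resource)
    requested = {name.strip() for name in fields_param.split(",") if name.strip()}
    keep = requested | MANDATORY_FIELDS
    # Keep the resource's own field order; unknown field names are ignored.
    return {key: value for key, value in resource.items() if key in keep}


application = {
    "applicationId": "65648987234",
    "familyName": "SMITH",
    "givenName": "John",
    "status": "PENDING",  # hypothetical extra field, filtered out below
}

selected = select_fields(application, "familyName,givenName")
print(selected)
```

Here `applicationId` is returned even though the caller did not ask for it, because the sketch treats it as a required field. Whether unknown field names are ignored or rejected is a design decision that the API standard would need to settle.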

Note: Some implementations of this pattern mandate that the filter be echoed in the response, e.g. "fields": "applicationId,familyName,givenName". This increases transparency, but it is somewhat superfluous and verbose. API design standards should clearly define the pattern for universal application.

Include Parameter

A consuming system may consistently require one or more nested sub-resources when retrieving data. APIs with sub-resources defined in the same specification document as the parent resource may support an ‘include’ parameter to return nested sub-resource payloads.

Performing an unqualified GET operation on a resource would typically return only top-level resource data:

GET /applicants/12345

Response:

{
"id": "12345",
"familyName": "SMITH",
"givenName": "Jane"
}

Retrieving sub-resources would require subsequent API calls. In the example below, retrieving ‘familyMembers’ and ‘memberReferrals’ sub-resources requires a second and third API call:

GET /applicants/12345/family-members
GET /applicants/12345/member-referrals

Alternatively, providing an ‘include’ parameter with a comma-separated list of required sub-resources would return the parent resource and nested arrays of sub-resource data in a single API call:

GET /applicants/12345?include=familyMembers,memberReferrals

Response:

{
"id": "12345",
"familyName": "SMITH",
"givenName": "Jane",
"familyMembers": [
{ "familyName": "SMITH", "givenName": "Barbara"},
{ "familyName": "SMITH", "givenName": "Bob"}
],
"memberReferrals": [
{ "memberId": "3456", "referralDate": "2021-01-01" }
]
}

Note that sub-resource arrays of indeterminate size cannot be supported, as pagination of included sub-resources is not possible.

Support for the ‘include’ parameter requires adding the query parameter, its allowed values and a description to the API specification, and modelling nested sub-resources as optional response content, ideally as schema references shared with the sub-resource content definitions. Consider the impact on the intelligibility of the response document. Implement only in support of genuine high-value use-cases.
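
The server-side handling of an ‘include’ parameter might look something like the sketch below. The loader functions, the `SUB_RESOURCE_LOADERS` registry and the `UnsupportedInclude` error are all hypothetical stand-ins; a real service would fetch the parent and sub-resources from its own data layer.

```python
from typing import Any, Callable, Optional


def load_family_members(applicant_id: str) -> list[dict[str, Any]]:
    # Hypothetical lookup; returns canned data for the example.
    return [
        {"familyName": "SMITH", "givenName": "Barbara"},
        {"familyName": "SMITH", "givenName": "Bob"},
    ]


def load_member_referrals(applicant_id: str) -> list[dict[str, Any]]:
    # Hypothetical lookup; returns canned data for the example.
    return [{"memberId": "3456", "referralDate": "2021-01-01"}]


# Allowed 'include' values, mirroring what the API specification would list.
SUB_RESOURCE_LOADERS: dict[str, Callable[[str], list[dict[str, Any]]]] = {
    "familyMembers": load_family_members,
    "memberReferrals": load_member_referrals,
}


class UnsupportedInclude(ValueError):
    """Raised when an 'include' value is not one of the allowed values."""


def get_applicant(applicant_id: str, include_param: Optional[str] = None) -> dict[str, Any]:
    applicant: dict[str, Any] = {
        "id": applicant_id,
        "familyName": "SMITH",
        "givenName": "Jane",
    }
    if include_param:
        for name in (part.strip() for part in include_param.split(",")):
            loader = SUB_RESOURCE_LOADERS.get(name)
            if loader is None:
                raise UnsupportedInclude(name)
            # Nest the sub-resource array under its own name.
            applicant[name] = loader(applicant_id)
    return applicant


print(get_applicant("12345", "familyMembers"))
```

Keeping the allowed values in one registry makes it straightforward to keep the implementation aligned with the values documented in the specification.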

Alternatively, an ‘embed’ query parameter mechanism works in much the same way, returning nested sub-resources in an explicit “_embedded” object. There is no global standard — choose the mechanism that is the best fit.

Simple Filtering

The use of parameters to filter a collection of resources based on field values should be broadly supported.

For example, the query parameters “?lastName=Jones” will filter the collection of resources for instances in which the field “lastName” matches the value “Jones”.

The query parameters “?lastName=Jones&birthDate=1990-02-02” will filter the collection for resources in which the field lastName matches Jones and the field birthDate matches 2 February 1990.

A filter on a collection, such as “…/customers?lastName=Jones” must return a collection, even when a single instance is returned.

The equal (=) operator is the only operator supported by this technique. For other operators and conditions see the ‘Complex Filtering’ section below.
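
A minimal sketch of equality-only filtering follows. The `CUSTOMERS` data and the `filter_collection` helper are invented for illustration; the point is that every supplied parameter must match, and the result is always a collection.

```python
from typing import Any

# Hypothetical in-memory collection standing in for a data store.
CUSTOMERS = [
    {"id": "1", "lastName": "Jones", "birthDate": "1990-02-02"},
    {"id": "2", "lastName": "Jones", "birthDate": "1985-07-15"},
    {"id": "3", "lastName": "Smith", "birthDate": "1990-02-02"},
]


def filter_collection(items: list[dict[str, Any]], params: dict[str, str]) -> list[dict[str, Any]]:
    """Keep the items for which every parameter equals the named field."""
    return [
        item
        for item in items
        if all(
            field in item and str(item[field]) == value
            for field, value in params.items()
        )
    ]


# ?lastName=Jones&birthDate=1990-02-02
result = filter_collection(CUSTOMERS, {"lastName": "Jones", "birthDate": "1990-02-02"})
print(result)
```

Even though only one customer matches both conditions, `result` is still a list containing that single item.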

Multi-value parameters (an array of values) may be supported.

In Swagger 2, a multi-value parameter must be defined with type: array and collectionFormat: csv.

parameters:
  - in: query
    name: color
    type: array
    collectionFormat: csv
    items:
      type: string

In OpenAPI 3, a query param array may be defined by the schema keyword.

parameters:
  - in: query
    name: color
    schema:
      type: array
      items:
        type: string
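
On the receiving side, a multi-value parameter may arrive either as a comma-separated list (color=red,blue) or as a repeated parameter (color=red&color=blue). In OpenAPI 3 this depends on the parameter's style and explode settings, where the default for query parameters is the repeated form. The sketch below, using only Python's standard library, accepts both forms; the `multi_value` helper name is an assumption for the example.

```python
from urllib.parse import parse_qs


def multi_value(query_string: str, name: str) -> list[str]:
    """Collect every value of a query parameter, splitting CSV values."""
    raw_values = parse_qs(query_string).get(name, [])
    values: list[str] = []
    for raw in raw_values:
        # A single raw value may itself be a comma-separated list.
        values.extend(part for part in raw.split(",") if part)
    return values


csv_form = multi_value("color=red,blue", "color")
repeated_form = multi_value("color=red&color=blue", "color")
print(csv_form, repeated_form)
```

Accepting both forms is a lenient choice; an enterprise standard may prefer to fix a single form and reject the other.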

Passing Personally Identifying Information (PII) in query parameters should be avoided. Where common query use-cases would require passing PII, consider encapsulating input data with a ‘search’ function (covered below).

Complex Filtering

Patterns and conventions for advanced filtering are not so well worn. However, the (new-ish) idempotent, cacheable HTTP QUERY method offers an opportunity to shape a standardized enterprise approach to moderately complex queries, and to safely encapsulate PII. It should be employed in preference to overloaded query parameter schemes for complex queries.

Payload Encapsulated HTTP QUERY Method

When supported end-to-end (it is unsupported by OpenAPI 3.0 at the time of writing), the idempotent, cacheable HTTP QUERY method, with well-articulated, standardized enterprise syntax and semantics (covering, among other things, logic operators), may be applied transparently to a business resource to enable more complex queries. e.g.

QUERY /v1/subscriptions  {...}

It is important that query method syntax and semantics are strictly applied to the abstracted business resource representation, and never directly reflect database structures or queries.

Payload Encapsulated Search

Where the HTTP QUERY method is not fully supported, a ‘search’ API may encapsulate search parameters within a POST payload. The pattern is often applied for reasons of confidentiality and/or to constrain inputs. The context of a search function should be clearly reflected in the path. e.g.

POST api.myorg.com/club/v1/applications/search  {...}

A standardized QUERY-method-ready enterprise syntax is recommended.
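
To make the idea of a payload-encapsulated query more concrete, the sketch below evaluates a small filter document against a collection. The payload syntax shown here ("and"/"or" groups of field, op and value conditions) is entirely hypothetical and is not the syntax of the HTTP QUERY method or of any standard; it only illustrates that the same document could be carried by a POST search today and by QUERY later. The field names and sample data are likewise invented.

```python
import operator
from typing import Any

# HYPOTHETICAL comparison operators for an illustrative enterprise syntax.
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "le": operator.le,
}


def matches(resource: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Evaluate one condition, or a nested and/or group, against a resource."""
    if "and" in condition:
        return all(matches(resource, c) for c in condition["and"])
    if "or" in condition:
        return any(matches(resource, c) for c in condition["or"])
    compare = COMPARATORS[condition["op"]]
    field = condition["field"]
    return field in resource and compare(resource[field], condition["value"])


def search(items: list[dict[str, Any]], payload: dict[str, Any]) -> list[dict[str, Any]]:
    return [item for item in items if matches(item, payload["filter"])]


# Hypothetical business resource representations (not database rows).
APPLICATIONS = [
    {"applicationId": "A1", "status": "PENDING", "submittedDate": "2022-03-01"},
    {"applicationId": "A2", "status": "APPROVED", "submittedDate": "2022-05-10"},
    {"applicationId": "A3", "status": "PENDING", "submittedDate": "2022-06-20"},
]

payload = {
    "filter": {
        "and": [
            {"field": "status", "op": "eq", "value": "PENDING"},
            {"field": "submittedDate", "op": "ge", "value": "2022-04-01"},
        ]
    }
}

found = search(APPLICATIONS, payload)
print(found)
```

The conditions refer only to fields of the business resource representation, in keeping with the guidance that query semantics should never mirror database structures. ISO 8601 date strings are compared as text here, which works only because that format sorts chronologically.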

Explicit Read Affordances

There will occasionally be obvious, common, high-traffic use-cases that suggest an explicit read affordance. Parameter-driven read affordance invocation patterns may in fact reveal such use-cases. It is important that these affordances are generic, and not tightly coupled to the niche requirements of a specific client. e.g.

GET api.myorg.com/club/v1/applications/renewing

Need-to-Know or Job-to-be-Done Pattern

When some specific process or regulatory obligation applies to a business resource, a dedicated read affordance may be provided to encapsulate a number of complex filter conditions. In this case, the affordance name might be a verb or singular noun describing the job or query context. e.g.

GET api.myorg.com/club/v1/applications/ready-for-review

Filtering Case Sensitivity

As a general guide, it is better to filter case-insensitively and return similar matches than to return no matches at all.

For consistency and clarity, a decision to filter with case insensitivity or otherwise should be made at an enterprise level, clearly documented, and globally applied.
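
One possible way to implement case-insensitive matching consistently is sketched below using Python's `str.casefold`, which is designed for caseless comparison and handles more than plain ASCII. The helper name is an assumption, and the sketch covers only exact caseless equality, not the broader idea of returning similar matches.

```python
def matches_ignoring_case(field_value: str, filter_value: str) -> bool:
    """Compare two strings without regard to case."""
    return field_value.casefold() == filter_value.casefold()


print(matches_ignoring_case("SMITH", "smith"))
print(matches_ignoring_case("Smith", "Smyth"))
```

Centralizing the comparison in one shared helper is a simple way to apply the enterprise-level decision uniformly across APIs.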

Query Services

The utility of query services extends well beyond support for composability, and implementation is likely to have ramifications for enterprise, product and application architecture.

When using Elasticsearch, Lucene or an equivalent search product, align with the syntax provided by the product. Otherwise consider well-documented, standardized mechanisms such as OData filter syntax or a GraphQL service, in all cases addressed in the context of the business resource.

OData:
GET /orders/product-history?filter=productId eq 9876 and date ge …
HTTP QUERY:
QUERY /orders/product-history {filter="productId eq 9876 and date …
GraphQL:
POST /orders/graphql {"query": "query productOrderHistory($product …

The topic of Query Services and Data-as-a-Product is touched on here.

Sorting

Data can often be handled more efficiently by clients when it is provided in a specific order, hence it is important to provide the flexibility to clients to specify sorting order when retrieving a collection.

An endpoint may support multiple sort fields by allowing comma-separated (“,”) sort fields. Sort fields should be applied in the order specified.

There is more than one sorting syntax, and most schemes offer a similar range of sorting options. In a simple scheme advocated by multiple specifications and standards, including JSON:API, the sort order for each sort field is ascending unless the field is prefixed with a minus (“-”), in which case it is descending. e.g.:

?sort=name,lastModified
?sort=-name

It can be useful to support dot-separated (“.”) sort fields to request sorting based upon relational attributes. For example, a sort field of author.name could be used to request that the primary data be sorted based upon the name attribute of the author object.

If the server does not support sorting as specified in the query parameter sort, it should return 400 Bad Request.
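
The sorting rules described in this section can be sketched as follows. The `SORTABLE_FIELDS` allow-list, the `BadRequest` exception (standing in for an HTTP 400 response) and the sample data are assumptions for the example rather than part of any framework.

```python
from typing import Any

# Hypothetical allow-list of sort fields the endpoint supports.
SORTABLE_FIELDS = {"name", "lastModified", "author.name"}


class BadRequest(ValueError):
    """Stands in for an HTTP 400 Bad Request response."""


def parse_sort(sort_param: str) -> list[tuple[str, bool]]:
    """Turn 'name,-lastModified' into [(field, descending), ...]."""
    keys = []
    for token in sort_param.split(","):
        descending = token.startswith("-")
        field = token[1:] if descending else token
        if field not in SORTABLE_FIELDS:
            raise BadRequest(f"unsupported sort field: {field!r}")
        keys.append((field, descending))
    return keys


def resolve(item: dict[str, Any], dotted: str) -> Any:
    """Follow a dot-separated path such as 'author.name'."""
    value: Any = item
    for part in dotted.split("."):
        value = value[part]
    return value


def apply_sort(items: list[dict[str, Any]], sort_param: str) -> list[dict[str, Any]]:
    result = list(items)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields the precedence given in the parameter.
    for field, descending in reversed(parse_sort(sort_param)):
        result.sort(key=lambda item: resolve(item, field), reverse=descending)
    return result


articles = [
    {"name": "b", "lastModified": "2022-01-02", "author": {"name": "Zed"}},
    {"name": "a", "lastModified": "2022-01-03", "author": {"name": "Amy"}},
    {"name": "a", "lastModified": "2022-01-01", "author": {"name": "Bob"}},
]

by_name_then_newest = apply_sort(articles, "name,-lastModified")
print([(a["name"], a["lastModified"]) for a in by_name_then_newest])
```

Validating the whole parameter before sorting means a request containing any unsupported field is rejected outright instead of being partially applied.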

Query Parameter Syntax and Amazon API Gateways

Query parameter names must align with enterprise field naming conventions (e.g. camelCase or snake_case). Query parameter expressions and values must be URL safe and percent-encoded.

Note that Amazon API Gateways do not fully implement RFC 3986 and are more restrictive — query parameters must conform to the regular expression:

^[a-zA-Z0-9:._$-]+$

An enterprise that makes use of Amazon API gateways should avoid field selection and filtering syntax that makes use of unsupported characters (such as square brackets ‘[]’).
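
A design-time check against this restriction could be as simple as the sketch below. It uses the regular expression quoted above, with the digit range written using an ASCII hyphen, and assumes the restriction is being checked against query parameter names; the helper name is hypothetical.

```python
import re

# Pattern quoted in the article, with the digit range written as 0-9.
GATEWAY_SAFE = re.compile(r"[a-zA-Z0-9:._$-]+")


def is_gateway_safe(parameter_name: str) -> bool:
    """True if the whole name uses only the permitted characters."""
    return GATEWAY_SAFE.fullmatch(parameter_name) is not None


for candidate in ("fields", "author.name", "filter[status]"):
    print(candidate, is_gateway_safe(candidate))
```

Running a check like this over the parameter names in an API specification would flag bracket-based syntax such as `filter[status]` before it reaches a gateway.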

Wrap-up

A parameter-driven ‘read’ affordance model providing field and collection filters, together with collection sorting mechanisms, can reduce over-fetching and enhance the composability and usability of resource APIs. However, complex filtering and high-traffic use-cases will sometimes suggest dedicated use-case affordances or search functions. In either case, it is important that agreed conventions are consistently applied, and that the context and intent of the requested filter or query is clear.


TRGoodwill
API Central

Tim has several years' experience in the delivery and evolution of interoperability frameworks and platforms, and currently works out of Berlin for Accenture ASG.