API Bites — Binary and Multi-Part Content

Handling Binary and Multi-Part Content in a Business Resource Context

TRGoodwill
API Central
4 min read · Oct 6, 2022

Binary and Multipart Content

Some resources have one or more binary documents associated with them. For example, identity verification might require multiple supporting documents. REST interfaces are, however, primarily concerned with data that can be serialized and parsed (and/or validated) by the services and platforms that process HTTP requests.

Implementation details for large binary uploads are often necessarily different from those for small JSON payloads (for example, virus scanning, different tuning of HTTP variables for efficient compression, different DDoS protection strategies, etc.). From a client point of view, consumers of business resource APIs are not primarily interested in raw binary data, but rather in core business facts. For these reasons, care needs to be taken with the modeling of binary data to avoid unnecessary imposts on performance and availability.

File Upload and Download

If at all possible, binary data should be modeled as a dedicated sub-resource on a separate path to facilitate upload and download as discrete operations.

Singleton

The simplest use case (and cleanest interface) is the POST of a binary file to a dedicated binary sub-resource path (with a descriptive name such as ‘[name]-document’ or ‘[name]-image’) in a separate API call, or a series of separate API calls.

This definition would correspond to the following HTTP request:
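As a minimal sketch of such a request, the Python below serializes a POST whose body is nothing but the file bytes. The host, the customer id, the ‘profile-image’ path and the helper name are illustrative assumptions, not part of any particular API.

```python
def build_binary_upload_request(host: str, path: str, content: bytes, media_type: str) -> bytes:
    """Serialize a minimal HTTP/1.1 POST whose body is the raw file bytes."""
    header_lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Type: {media_type}",      # the body is the file itself, not JSON
        f"Content-Length: {len(content)}",
    ]
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + content


# Hypothetical host, path and payload, for illustration only.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
request = build_binary_upload_request(
    "api.example.com", "/customers/12345/profile-image", png_bytes, "image/png"
)

# Show only the request line and headers; the body is binary.
print(request.split(b"\r\n\r\n", 1)[0].decode("ascii"))
```

Because the media type travels in the Content-Type header, the path itself stays free of format details, which keeps the interface small.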

If the binary file is a singleton (e.g. ‘profile-image’) then your work is done. However, if the binary file belongs to a collection (e.g. ‘profile-images’), the file upload must return a unique id, and the API will need to be able to describe the collection to API clients.

Server Generated Metadata

To facilitate a one-step binary upload to a collection, and/or when there is a requirement for server-generated metadata around binary content (id, file format, size, EXIF data, upload date, etc.), a new ‘anchor’ sub-resource should be created and an id returned as per enterprise payload conventions. Modeling the binary data as a singleton child of the created sub-resource on a separate path (clearly identified with a name such as ‘binary-[file-type]’) avoids embedding the binary data in the generated structured data. This is an unconventional abstraction, but it can be mitigated by a convention of specifying the path to the binary data in the returned Location header, and/or returning a URL to the binary content in the payload.

Request:

Response:
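A rough sketch of this exchange from the server's side might look like the following. It assumes an in-memory store, a hypothetical ‘profile-images’ collection and invented field names (‘mediaType’, ‘sizeBytes’, ‘binaryUrl’); a real service would follow its own enterprise payload conventions.

```python
import json
import uuid
from datetime import datetime, timezone

STORE = {}  # in-memory stand-in for real binary storage (assumption)


def handle_image_upload(collection_path: str, content: bytes, media_type: str):
    """Create an 'anchor' sub-resource for an uploaded file and describe it."""
    image_id = str(uuid.uuid4())
    anchor_path = f"{collection_path}/{image_id}"
    # The raw bytes live on a singleton child path, apart from the metadata.
    binary_path = f"{anchor_path}/binary-image"
    STORE[binary_path] = content

    metadata = {
        "id": image_id,
        "mediaType": media_type,
        "sizeBytes": len(content),
        "uploadedAt": datetime.now(timezone.utc).isoformat(),
        "binaryUrl": binary_path,  # URL to the binary content in the payload
    }
    headers = {"Location": binary_path, "Content-Type": "application/json"}
    return 201, headers, json.dumps(metadata)


# Request: raw bytes posted to the collection. Response: generated metadata.
status, headers, body = handle_image_upload(
    "/customers/12345/profile-images", b"\x89PNG\r\n\x1a\n", "image/png"
)
print(status, headers["Location"])
print(body)
```

The response carries only structured data, and a client that wants the bytes follows the Location header or the ‘binaryUrl’ field.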

Upload of Binary Data and Metadata

When client upload of metadata associated with binary content is required, simple binary upload might still be supported in a multi-step approach.

Request and Response step 1:

Again, in this case the path for uploading the binary content should be returned in the payload and/or as a link relation. This convention should be articulated in enterprise API design standards.

Request and Response step 2:

Note: a multi-step approach can result in incomplete/invalid records — ensure that a strategy is in place to manage this possibility.
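The two steps, and one possible way of spotting records whose binary never arrived, can be sketched as below. Everything here is an assumption made for illustration: the in-memory store, the ‘identity-checks’ path, the status values and the one-hour cut-off.

```python
import uuid
from datetime import datetime, timedelta, timezone

DOCUMENTS = {}  # in-memory stand-in for a document store (assumption)


def create_document(metadata: dict, now: datetime) -> dict:
    """Step 1: store client metadata and return where the binary should go."""
    doc_id = str(uuid.uuid4())
    upload_path = f"/identity-checks/42/documents/{doc_id}/binary-document"
    DOCUMENTS[doc_id] = {
        "metadata": metadata,
        "status": "PENDING_UPLOAD",
        "createdAt": now,
        "content": None,
    }
    return {"id": doc_id, "status": "PENDING_UPLOAD", "links": {"upload": upload_path}}


def upload_binary(doc_id: str, content: bytes) -> dict:
    """Step 2: attach the raw bytes to the record created in step 1."""
    record = DOCUMENTS[doc_id]
    record["content"] = content
    record["status"] = "COMPLETE"
    return {"id": doc_id, "status": "COMPLETE", "sizeBytes": len(content)}


def find_stale_records(now: datetime, max_age: timedelta) -> list:
    """Return ids of records still waiting for their binary after max_age."""
    return [
        doc_id
        for doc_id, record in DOCUMENTS.items()
        if record["status"] == "PENDING_UPLOAD" and now - record["createdAt"] > max_age
    ]


start = datetime(2022, 10, 6, 9, 0, tzinfo=timezone.utc)
completed = create_document({"documentType": "passport"}, start)
abandoned = create_document({"documentType": "utility-bill"}, start)
upload_binary(completed["id"], b"%PDF-1.7")

# Two hours later, only the record that never received its file is stale.
stale = find_stale_records(start + timedelta(hours=2), max_age=timedelta(hours=1))
print(stale == [abandoned["id"]])
```

Whether stale records are deleted, flagged or retried is a business decision; the point is only that the incomplete state is explicit and queryable.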

While the simple solution is almost always best, especially when defining intelligible, robust interfaces, there are occasions when more complex interactions must be supported.

Multipart

When a resource or sub-resource includes document metadata and/or more than one binary file, support for multipart messages may be required.

In OpenAPI 3.0, you can describe a mixed-format payload containing binary files with multipart requests. Use the requestBody keyword to describe request payloads containing a file or multiple files. File uploads typically use the ‘multipart/form-data’ media type. Mixed-data requests usually use the ‘multipart/mixed’ media type. Care should be taken not to exceed message size limits or API SLAs.

By default, the Content-Type of individual request parts is set automatically according to the type of the schema properties that describe the request parts:
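The defaults described in the OpenAPI 3.0 specification can be expressed as a small lookup. The function below is an illustrative sketch rather than part of any OpenAPI tooling; it takes a property schema as a plain dict and returns the Content-Type a part would receive when no encoding is declared.

```python
def default_part_content_type(schema: dict) -> str:
    """Return the default Content-Type OpenAPI 3.0 assigns to a multipart part."""
    schema_type = schema.get("type")
    if schema_type == "array":
        # Arrays take the default of their item type.
        return default_part_content_type(schema.get("items", {}))
    if schema_type == "object":
        return "application/json"
    if schema_type == "string" and schema.get("format") in ("binary", "base64"):
        return "application/octet-stream"
    # Remaining primitives: string, number, integer, boolean.
    return "text/plain"


examples = {
    "id": {"type": "integer"},
    "address": {"type": "object"},
    "tags": {"type": "array", "items": {"type": "string"}},
    "profileImage": {"type": "string", "format": "binary"},
}
for name, schema in examples.items():
    print(name, default_part_content_type(schema))
```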

To declare a specific Content-Type for a request part (such as ‘image/png’ and/or ‘image/jpeg’), use the encoding/{property-name}/contentType field, as per the following example:

This definition would correspond to the following HTTP request:
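To make the shape of such a request concrete, here is a hedged sketch that assembles a ‘multipart/form-data’ body by hand, with an explicit Content-Type on each part. The part names (‘metadata’, ‘image’), the file name and the helper are assumptions; in practice an HTTP client library would normally build the body.

```python
import json
import uuid
from typing import NamedTuple, Optional


class Part(NamedTuple):
    name: str
    content_type: str
    body: bytes
    filename: Optional[str] = None


def encode_multipart_form_data(parts: list) -> tuple:
    """Return (Content-Type header value, body) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    chunks = []
    for part in parts:
        disposition = f'form-data; name="{part.name}"'
        if part.filename:
            disposition += f'; filename="{part.filename}"'
        header = (
            f"--{boundary}\r\n"
            f"Content-Disposition: {disposition}\r\n"
            f"Content-Type: {part.content_type}\r\n\r\n"  # declared per part
        )
        chunks.append(header.encode("utf-8") + part.body + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


metadata = json.dumps({"documentType": "passport", "issuingCountry": "DE"}).encode("utf-8")
content_type, body = encode_multipart_form_data([
    Part("metadata", "application/json", metadata),
    Part("image", "image/png", b"\x89PNG\r\n\x1a\n", filename="passport.png"),
])
print(content_type)
print(len(body), "bytes")
```

Declaring ‘image/png’ on the image part, rather than relying on the ‘application/octet-stream’ default, lets gateways and validators reject unexpected formats early.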

Refer to the OpenAPI 3.0 Specification for File Upload and Considerations for file uploads.

Embedded Binary Content

Small binary content may be embedded in a JSON payload as a base64-encoded string. Embedded binary content may introduce latency wherever payloads are parsed or schema validated (including API gateways), and should be employed sparingly if at all. Define a maxLength property to constrain binary file sizes and safeguard API performance and availability.
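As a small sketch of that constraint, the code below base64-encodes a file into a JSON payload and rejects anything longer than an assumed maxLength. The limit of 2732 characters and the field names are hypothetical; the relevant arithmetic is that padded base64 produces 4 characters for every 3 raw bytes, so a maxLength on the string translates directly into a cap on file size.

```python
import base64
import json

MAX_LENGTH = 2732  # hypothetical schema maxLength for the base64 string


def max_raw_bytes(max_length: int) -> int:
    """Largest raw payload whose padded base64 form fits in max_length characters."""
    return (max_length // 4) * 3


def embed_binary(name: str, content: bytes, max_length: int = MAX_LENGTH) -> str:
    """Return a JSON document carrying content as a base64 string, or raise if too large."""
    encoded = base64.b64encode(content).decode("ascii")
    if len(encoded) > max_length:
        raise ValueError(
            f"encoded content is {len(encoded)} characters; limit is {max_length}"
        )
    return json.dumps({"name": name, "content": encoded})


payload = embed_binary("signature.png", b"\x89PNG\r\n\x1a\n")
print(payload)
print(max_raw_bytes(MAX_LENGTH), "raw bytes fit within the limit")
```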

Encoding

Unicode Transformation Format-8 (UTF-8) is the standard encoding type for all text and textual representations of data through APIs, and is the default encoding for JSON (RFC 7159, since superseded by RFC 8259, which requires UTF-8 for JSON exchanged between systems).

Wrap-up

If at all possible, model binary data as an independently fetchable sub-resource on its own path. Target the cleanest, most intelligible interface for the use case. If multipart message support is required, provide as much definition to the interface as possible. If embedded binary content is unavoidable, apply additional constraints to safeguard performance and availability.

Tim has several years experience in the delivery and evolution of interoperability frameworks and platforms, and currently works out of Berlin for Accenture ASG