API Bites — Binary and Multi-Part Content

Handling Binary and Multi-Part Content in a Business Resource Context

TRGoodwill
API Central
Oct 6, 2022

Binary and Multipart Content

Some resources have one or more binary documents associated with them. For example, identity verification might require multiple supporting documents. REST interfaces are, however, primarily concerned with data that can be serialized and parsed (and/or validated) by the services and platforms that process HTTP requests.

Implementation details for large binary uploads are often necessarily different from those for small JSON payloads (for example, virus scanning, different tuning of HTTP variables for efficient compression, and different DDoS protection strategies). From a client point of view, consumers of business resource APIs are primarily interested in core business facts rather than raw binary data. For these reasons, care needs to be taken with the modeling of binary data to avoid unnecessary imposts on performance and availability.

File Upload and Download

If at all possible, binary data should be modeled as a dedicated sub-resource on a separate path to facilitate upload and download as discrete operations.

Singleton

The simplest use case (and cleanest interface) is the POST of a binary file to a dedicated binary sub-resource path (with a descriptive name such as ‘[name]-document’ or ‘[name]-image’) in a separate API call, or a series of separate API calls.

paths:
  /members/{id}/profile-image:
    post:
      summary: Add a profile image
      requestBody:
        content:
          image/*:
            schema:
              type: string
              format: binary

This definition would correspond to the following HTTP request:

POST /members/12C4/profile-image HTTP/1.1
Content-Length: 808
Content-Type: image/png

[file content goes here]
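From the client side, the request above could be assembled along the following lines. This is a minimal sketch using only the Python standard library; the function name and the `https://api.example.com` base URL are assumptions made for illustration, not part of any real API.

```python
import urllib.request


def build_profile_image_upload(base_url: str, member_id: str,
                               image_bytes: bytes,
                               content_type: str = "image/png") -> urllib.request.Request:
    """Build (but do not send) a POST of raw image bytes to the singleton sub-resource."""
    url = f"{base_url}/members/{member_id}/profile-image"
    return urllib.request.Request(
        url,
        data=image_bytes,  # the body is the file content itself, not JSON
        method="POST",
        headers={"Content-Type": content_type},
    )


# Hypothetical host and member id, used for illustration only.
request = build_profile_image_upload("https://api.example.com", "12C4", b"\x89PNG...")
# Sending is a separate step: urllib.request.urlopen(request)
```

Note that the body carries nothing but the file content, which is what keeps this interface simple to validate, scan and tune separately from JSON traffic.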

If the binary file is a singleton (e.g. ‘profile-image’) then your work is done. However, if the binary file belongs to a collection (e.g. ‘profile-images’), the file upload must return a unique id, and the API will need to be able to describe the collection to API clients.

Server Generated Metadata

To facilitate a one-step binary upload to a collection, and/or when there is a requirement for server-generated metadata around binary content (id, file format, size, EXIF data, upload date, etc.), a new ‘anchor’ sub-resource should be created and an id returned as per enterprise payload conventions. Modeling the binary data as a singleton child of the created sub-resource on a separate path (clearly identified with a name such as ‘binary-[file-type]’) avoids embedding the binary data in the generated structured data. This is an unconventional abstraction that might be mitigated with a convention of specifying the path to the binary data in the returned Location header, and/or returning a URL to the binary content in the payload.

Request:

POST /members/12C4/profile-images HTTP/1.1
Content-Length: 808
Content-Type: image/png

[file content]

Response:

HTTP/1.1 201 Created
Content-Type: application/json
Location: /members/12C4/profile-images/98B3/binary-image

{
  "id": "98B3",
  "imageUrl": "/members/12C4/profile-images/98B3/binary-image"
}
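On the server side, the anchor-resource pattern described above might be sketched as follows. This is an illustration only: the in-memory `ImageStore`, the function name, the metadata field names and the choice of a 201 status are all assumptions, and a real service would use durable storage and its own framework's request handling.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ImageStore:
    """In-memory stand-in for real metadata and blob storage (hypothetical)."""
    metadata: dict = field(default_factory=dict)
    blobs: dict = field(default_factory=dict)


def create_profile_image(store: ImageStore, member_id: str,
                         content: bytes, content_type: str):
    """Handle a raw binary POST to /members/{member_id}/profile-images.

    Returns a (status, headers, json_body) tuple.
    """
    image_id = uuid.uuid4().hex
    anchor_path = f"/members/{member_id}/profile-images/{image_id}"
    binary_path = f"{anchor_path}/binary-image"

    # The binary lives on its own path, apart from the structured metadata.
    store.blobs[binary_path] = content
    store.metadata[anchor_path] = {
        "id": image_id,
        "contentType": content_type,
        "sizeBytes": len(content),
        "uploadedAt": datetime.now(timezone.utc).isoformat(),
        "imageUrl": binary_path,
    }

    headers = {"Content-Type": "application/json", "Location": binary_path}
    body = {"id": image_id, "imageUrl": binary_path}
    return 201, headers, body
```

The response payload stays small and structured, while the server-generated metadata and the binary content remain independently addressable.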

Upload of Binary Data and Metadata

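When the client must also upload metadata associated with the binary content, simple binary upload can still be supported with a multi-step approach: the client first creates the metadata resource, then uploads the binary to the path returned by the server.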

Request and Response step 1:

POST /members/12C4/profile-images HTTP/1.1

{...}

HTTP/1.1 201 Created
Content-Type: application/json
Location: /members/12C4/profile-images/98B3

{
  "id": "98B3",
  "imageUrl": "/members/12C4/profile-images/98B3/binary-image"
}

Again, the path for uploading the binary content should in this case be returned in the payload and/or as a link relation. This convention should be articulated in enterprise API design standards.

Request and Response step 2:

POST /members/12C4/profile-images/98B3/binary-image HTTP/1.1
Content-Length: 808
Content-Type: image/png

[file content]

HTTP/1.1 201 Created
Location: /members/12C4/profile-images/98B3/binary-image

Note: a multi-step approach can result in incomplete or invalid records, for example metadata created in step 1 whose binary never arrives in step 2. Ensure that a strategy is in place to manage this possibility.
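One possible strategy, offered here as an assumption rather than a recommendation from any standard, is to timestamp each step and periodically sweep for anchor records whose binary was never uploaded within a grace period. The `ImageRecord` type, its field names and the one-hour default are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ImageRecord:
    id: str
    created_at: datetime                            # set in step 1
    binary_uploaded_at: Optional[datetime] = None   # set when step 2 completes


def find_incomplete(records: List[ImageRecord], now: datetime,
                    grace: timedelta = timedelta(hours=1)) -> List[ImageRecord]:
    """Return records whose binary has not arrived within the grace period."""
    return [
        r for r in records
        if r.binary_uploaded_at is None and now - r.created_at > grace
    ]


now = datetime(2022, 10, 6, 12, 0, tzinfo=timezone.utc)
records = [
    ImageRecord("A", now - timedelta(hours=2)),     # stale, no binary
    ImageRecord("B", now - timedelta(minutes=5)),   # still within grace period
    ImageRecord("C", now - timedelta(hours=3),
                binary_uploaded_at=now - timedelta(hours=2)),  # complete
]
print([r.id for r in find_incomplete(records, now)])  # ['A']
```

What happens to the stale records (deletion, flagging, or client notification) is a business decision; the sketch only identifies them.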

While the simple solution is almost always best, especially when defining intelligible, robust interfaces, there are occasions when more complex interactions must be supported.

Multipart

When a resource or sub-resource includes document metadata and/or more than one binary file, support for multipart messages may be required.

In OpenAPI 3.0, you can describe a mixed-format payload containing binary files with multipart requests. Use the requestBody keyword to describe request payloads containing a file or multiple files. File uploads typically use the ‘multipart/form-data’ media type. Mixed-data requests usually use the ‘multipart/mixed’ media type. Care should be taken not to exceed message size limits or API SLAs.

By default, the Content-Type of individual request parts is set automatically according to the type of the schema properties that describe the request parts:

-----------------------------------------------------------------------
| Schema Property Type                     | Content-Type             |
-----------------------------------------------------------------------
| Complex value or array of complex values | application/json         |
-----------------------------------------------------------------------
| Primitive or array of primitives         | text/plain               |
-----------------------------------------------------------------------
| String in binary or base64 format        | application/octet-stream |
-----------------------------------------------------------------------

To declare a specific Content-Type for a request part (such as ‘image/png’ and/or ‘image/jpeg’), use the encoding/{property-name}/contentType field, as per the following example:

paths:
  /attachments:
    post:
      summary: Add a new file
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                certificationNumber:
                  type: integer
                certificateImage:
                  type: string
                  format: binary
            encoding:
              certificateImage:
                contentType: image/png, image/jpeg

This definition would correspond to the following HTTP request:

POST /attachments HTTP/1.1
Content-Length: 428
Content-Type: multipart/form-data; boundary=abcde12345

--abcde12345
Content-Disposition: form-data; name="certificationNumber"
Content-Type: text/plain

12345
--abcde12345
Content-Disposition: form-data; name="certificateImage"; filename="image1.png"
Content-Type: image/png

[file content goes here]
--abcde12345--

Refer to the OpenAPI 3.0 Specification for File Upload and Considerations for file uploads.
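To make the wire format concrete, the following sketch assembles a ‘multipart/form-data’ body by hand for the certificationNumber and certificateImage properties from the definition above. The function name and sample values are assumptions, the sketch does not escape quotes in names or filenames, and in practice an HTTP client library would normally build this for you.

```python
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart_form_data(
    fields: Dict[str, object],
    files: Dict[str, Tuple[str, str, bytes]],
    boundary: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Encode text fields and files as multipart/form-data.

    fields: {name: value}; files: {name: (filename, content_type, content)}.
    Returns (Content-Type header value, body bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"Content-Type: text/plain",
            b"",  # blank line separates part headers from part content
            str(value).encode("utf-8"),
        ]
    for name, (filename, content_type, content) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            content,
        ]
    lines += [f"--{boundary}--".encode(), b""]  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


content_type, body = encode_multipart_form_data(
    {"certificationNumber": 12345},
    {"certificateImage": ("image1.png", "image/png", b"\x89PNG...")},
    boundary="abcde12345",
)
print(content_type)  # multipart/form-data; boundary=abcde12345
```

Each part carries its own Content-Type, which is exactly what the encoding object in the OpenAPI definition declares.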

Embedded Binary Content

Small binary content may be embedded in a JSON payload as a base64-encoded string. Embedded binary content may introduce latency wherever payloads are parsed or schema-validated (including at API gateways), and should be employed sparingly, if at all. Define a maxLength property to constrain binary file sizes and safeguard API performance and availability.

application/json:
  schema:
    type: object
    properties:
      customerId:
        type: string
      thumbnailImage:
        type: string
        format: byte
        maxLength: 4096
        description: Base64-encoded selfie thumbnail
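Note that maxLength constrains the length of the encoded string, not the size of the underlying file. Standard padded base64 produces four characters for every three bytes, so the arithmetic can be sketched as below; the helper names are made up for this illustration.

```python
import base64
import math


def base64_encoded_length(raw_bytes: int) -> int:
    """Characters needed for standard padded base64 of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def max_raw_bytes(max_length: int) -> int:
    """Largest raw payload whose padded base64 form fits in max_length characters."""
    return (max_length // 4) * 3


payload = bytes(3072)                 # 3 KiB of zero bytes as a stand-in image
encoded = base64.b64encode(payload)
print(len(encoded))                   # 4096
print(max_raw_bytes(4096))            # 3072
print(base64_encoded_length(3073))    # 4100, one byte over the limit
```

In other words, a maxLength of 4096 admits a binary of at most 3072 bytes, which is worth stating in the property description so that clients can size thumbnails accordingly.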

Encoding

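Unicode Transformation Format-8 (UTF-8) is the standard encoding for all text and textual representations of data exchanged through APIs, and is the default encoding for JSON (RFC 7159, since obsoleted by RFC 8259, which requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem).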

Wrap-up

If at all possible, model binary data as an independently fetchable sub-resource on its own path. Target the cleanest, most intelligible interface for the use case. If multipart message support is required, provide as much definition to the interface as possible. If embedded binary content is unavoidable, apply additional constraints to safeguard performance and availability.

TRGoodwill
API Central

Tim has several years' experience in the delivery and evolution of interoperability frameworks and platforms, and currently works out of Berlin for Accenture ASG.