Handling Huge Payloads Using Apigee And Google Cloud Storage

Paul Williams
Google Cloud - Community
Feb 1, 2023

Sometimes, it’s necessary to transmit large data sets as part of an API transaction. Typical use cases involve medical imaging, document management, GIS, and other cases in which large files are inputs or outputs of processes fronted by APIs. The core challenge is that most API management gateways struggle to intercept and make decisions about the validity of an API call when the primary payload is very large.

Why is this a problem?

API Management Systems exist as a proxy layer between a big, bad, dangerous world, and a network with valuable data and systems that must be protected from unauthorized, malicious, or naive actors. Within the typical API Management layer, any given transaction must be stored (typically in memory, but possibly on disk) and evaluated against a set of access, security, and protection policies before being allowed to continue. For the typical transaction of several kilobytes this is a trivial activity — the data are briefly stored while being analyzed and then passed along.

However, for larger payloads, this buffering becomes increasingly expensive. With too many concurrent streams, the system responsible for processing these requests can come under memory or storage pressure, causing instability or transaction failure.

What approaches work around this issue?

Of course, there are workarounds that allow transactions with large primary payloads to run through the API management system, given a few tradeoffs. In Apigee, there are two approaches: Proxy Streaming and the Envoy Adapter for Apigee X. In particular, the Envoy Adapter has been tested and works for payloads of arbitrary size; in my testing it handled multi-gigabyte uploads with no sign of pressure. Proxy Streaming has more impact on the runtime, and may create compute pressure in the runtime layer as the number of concurrent streams increases.

Both of these work around the large-payload problem, but they share similar tradeoffs: only a subset of policies can be executed, specifically those that make immediate decisions based on the request headers. Furthermore, in the upload case, the upload stream remains open and consumes bandwidth until the upload is complete, even if the request would have been denied by access, quota, or other policy constraints.

The recommended approach: Staging Files

Instead of posting large data as part of a primary payload, we recommend staging files in Google Cloud Storage (GCS) using signed URLs to control upload and download access.

Why Staging?

Staging is a good way to take advantage of the power and flexibility of Cloud Storage. Clients can use features like resumable uploads, parallel transmission, event triggers, and data pipelines.

How would that work?

There is one main mechanism for staging uploads, and one typical mechanism for download staging.

Upload Mechanism

With this mechanism, the client calls Apigee with a “GET” request to fetch a signed URL that can be used to upload a file to your system. When the upload completes, an application trigger fires so that additional processing can run and the file’s state can be updated for the client to see the next time it checks.
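
As a concrete illustration, here is a minimal sketch of the signing step as the backend target behind the Apigee proxy might implement it, using the google-cloud-storage Python client. The bucket and object names are hypothetical; a real deployment would derive the object name and expiration from the incoming request and its policies.

```python
# pip install google-cloud-storage
from datetime import timedelta

from google.cloud import storage


def create_upload_url(bucket_name: str, object_name: str) -> str:
    """Return a short-lived V4 signed URL the client can use to PUT one object."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),  # time-limited access
        method="PUT",                      # upload only
        content_type="application/octet-stream",
    )


# Hypothetical names; the Apigee proxy would return this URL to the caller.
print(create_upload_url("example-staging-bucket", "uploads/scan-0001.dcm"))
```

The client then uploads directly to Cloud Storage with a PUT to that URL, sending the same Content-Type that was used when signing.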

Sequence diagram describing a typical staging use case.
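
The “trigger fires when the upload is completed” step can be implemented in several ways; one common option is an event-driven Cloud Function on the bucket’s object-finalized event. Below is a rough sketch using the Functions Framework, with the follow-up processing left as a placeholder.

```python
# pip install functions-framework
import functions_framework


@functions_framework.cloud_event
def on_upload_complete(cloud_event):
    """Fires when Cloud Storage finalizes an object, i.e. an upload has landed."""
    data = cloud_event.data
    bucket = data["bucket"]
    name = data["name"]
    size = data.get("size")

    # Placeholder for the real work: validation, scanning, thumbnailing, and
    # updating the file's status so the client sees it as ready on its next check.
    print(f"Upload complete: gs://{bucket}/{name} ({size} bytes)")
```

Such a function is typically deployed with an Eventarc trigger on the google.cloud.storage.object.v1.finalized event type, filtered to the staging bucket.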

Variants of this mechanism may use “POST” requests to signal the creation of a new resource. The choice of HTTP verb is orthogonal to the general design; the verb and path should instead be design considerations of the overall application.

Download Mechanism

Similarly, in this mechanism, a request for a file is answered with a redirect to a signed GCS URL, from which the client downloads the object directly.

A sequence diagram that describes a possible download staging workflow.
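
To make this concrete, the backend target behind the Apigee proxy might mint a signed GET URL and answer with an HTTP 302 so that the client fetches the bytes directly from Cloud Storage. Here is a minimal sketch using Flask; the route, bucket name, and object layout are assumptions for illustration.

```python
# pip install flask google-cloud-storage
from datetime import timedelta

from flask import Flask, redirect
from google.cloud import storage

app = Flask(__name__)
gcs = storage.Client()

BUCKET = "example-staging-bucket"  # hypothetical bucket name


@app.route("/files/<path:object_name>")
def download(object_name: str):
    """Redirect the caller to a short-lived signed URL for exactly one object."""
    blob = gcs.bucket(BUCKET).blob(object_name)
    url = blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=10),  # time-limited
        method="GET",                      # download only
    )
    return redirect(url, code=302)
```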

Is it secure?

Yes, a file staging area can be secured. The signed URL pattern grants time-limited access for a specific purpose: a properly configured signed URL allows a client to upload or download exactly one object for a short time, limiting the ability of bad actors to attack the data being transmitted. URLs are signed with cryptographic credentials (such as a service account key) that are defined and controlled by system administrators.
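
For example, the signing identity can be a dedicated service account whose key is provisioned and rotated by administrators, rather than whatever credentials the runtime happens to hold. A brief sketch, with the key path, project, bucket, and object names as placeholders:

```python
from datetime import timedelta

from google.cloud import storage
from google.oauth2 import service_account

# A dedicated signer identity; its key file is managed and rotated by admins.
signer = service_account.Credentials.from_service_account_file(
    "/secrets/url-signer.json"  # hypothetical path
)

client = storage.Client(project="example-project", credentials=signer)
blob = client.bucket("example-staging-bucket").blob("uploads/scan-0001.dcm")

# Exactly one object, one method, and only a few minutes of validity.
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=5),
    method="PUT",
)
```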

Access control decisions are still made in the Apigee API layer, so you retain all of its security, identity management, traffic shaping, reporting, auditing, and observability features.

Has it been done before?

A comprehensive example of how to use signed URLs with Apigee, including documentation and video demonstrations, is available on GitHub.

Bonus Benefits

As a bonus, this architecture removes application processing from the upload and download flows. This effectively creates an asynchronous file handling structure that shifts the burden of file transmission off of the application’s compute resources. We expect this to reduce the marginal cost of transaction processing and increase the maximum transactions per compute node.

Get Started With Apigee

Getting started has never been easier. Anyone can create an Apigee evaluation project, a trial Apigee installation at zero cost for 60 days. With Apigee Pay-As-You-Go pricing, there’s no subscription negotiation or special licensing requirement: just enable Apigee in your GCP project and start developing.

For more comprehensive help, see our Getting Started documentation and the Apigee — Google Cloud Community. See Apigee API Management pricing for more details.

Paul Williams is a professional services consultant with more than 20 years of experience, currently working for Google Cloud to help enterprises launch cloud initiatives.