Going beyond standard HTTP timeouts in GCP Workflows — the theory

Yurii Serhiichuk
Google Cloud - Community
Jan 10, 2023
The HTTP 504 Gateway Timeout error

So you’ve adopted serverless and started using GCP Workflows for orchestration. Everything is going great until, suddenly, one of your service calls starts timing out.

A sample GCP Workflow

You check that your Cloud Functions, App Engine, or Cloud Run timeout is already set to the maximum (probably 60 minutes) and start digging. Eventually, you realize that the Workflows HTTP connector allows at most 30 minutes for an HTTP service call. So what can you do now?

First of all, you may want to reconsider whether you’re doing everything right and whether serverless, with its tighter limits, is a good fit for your task. But if it is, welcome under the hood.

Some analysis

Synchronous HTTP calls in GCP Workflows time out after at most 30 minutes, so let’s look at asynchronous solutions instead.

Workflows does support callbacks, but what if you need to call a serverless GCP service and get notified when it finishes? If Cloud Functions or Cloud Run responds to the request right away and keeps working in the background, that work is effectively cut off as soon as the response is sent, because CPU is only allocated while a request is being handled. (Yes, you can enable “CPU always allocated”, but then you pay for CPU time far beyond what you actually need, and we don’t want that.)

So we need something that holds the HTTP request open for as long as the work takes and then sends a callback to Workflows so the execution can continue.

The GCP services that come to mind for asynchronous execution are Pub/Sub and Cloud Tasks. But Pub/Sub push subscriptions give an HTTP endpoint at most 10 minutes to respond. A pull subscription is not an option either, because Cloud Functions and Cloud Run need an ongoing HTTP request to do any work.

So what about Cloud Tasks? Standard HTTP targets are also limited to 30 minutes, but… App Engine targets allow up to 24 hours for App Engine services with basic (or manual) scaling! And that’s our loophole.

GCP services HTTP call timeouts
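Put another way, a request can only live as long as the tightest timeout along its path. The sketch below encodes the ceilings discussed here, plus the 60-minute Cloud Run request limit, as plain numbers. The hop names are made up for illustration, and the values are Google Cloud's documented maximums at the time of writing, so double-check them against the current docs.

```python
from typing import Sequence

# Maximum time (in seconds) each hop allows for a single HTTP request.
# Hop names are illustrative labels, not GCP identifiers.
HOP_LIMITS = {
    "workflows_http_call": 30 * 60,                 # Workflows HTTP step timeout
    "pubsub_push": 10 * 60,                         # max ack deadline for push delivery
    "cloud_tasks_http_target": 30 * 60,             # max dispatch deadline, HTTP targets
    "cloud_tasks_app_engine_target": 24 * 60 * 60,  # basic/manual scaling services
    "app_engine_basic_scaling": 24 * 60 * 60,       # request timeout on App Engine
    "cloud_run_service": 60 * 60,                   # max Cloud Run request timeout
}


def effective_timeout(path: Sequence[str]) -> int:
    """Return the longest a request on this path can run: the smallest hop limit."""
    return min(HOP_LIMITS[hop] for hop in path)


direct = ["workflows_http_call", "cloud_run_service"]
via_tasks = ["cloud_tasks_app_engine_target", "app_engine_basic_scaling", "cloud_run_service"]
print(effective_timeout(direct) // 60)     # 30
print(effective_timeout(via_tasks) // 60)  # 60
```

The second path shows the catch. Routing through Cloud Tasks and App Engine removes the 30-minute Workflows ceiling, but the destination service's own limit still applies (60 minutes for Cloud Run).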

The solution

So here’s what we’re going to do to go beyond the usual 10- or 30-minute timeouts.

Long-running GCP Workflows HTTP requests setup

So whenever you need a long-running HTTP request in a workflow, you need to:

  1. Create a callback URL that the workflow can be notified on.
  2. Create an asynchronous Cloud Tasks task that targets the Task Runner App Engine service and carries your service call details (including the callback URL) as the payload.
  3. In the Task Runner service, unwrap the Cloud Tasks payload and make a synchronous HTTP call to the destination service.
  4. Send a callback with the service call results (if any) back to the workflow.
  5. Continue the workflow execution once the callback arrives.
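Step 2 comes down to a single Cloud Tasks `tasks.create` request, POSTed to `https://cloudtasks.googleapis.com/v2/{parent}/tasks`, where `parent` is `projects/PROJECT/locations/LOCATION/queues/QUEUE`. The sketch below builds that request body in its REST/JSON shape. It assumes a Task Runner App Engine service named `task-runner` that listens on `/run` (both hypothetical). The envelope keys (`url`, `method`, `body`, `callback_url`) are an invented contract between the workflow and the Task Runner, not a GCP format.

```python
import base64
import json
from typing import Any, Optional

# Hypothetical names: match these to your own Task Runner deployment.
TASK_RUNNER_SERVICE = "task-runner"
TASK_RUNNER_PATH = "/run"
MAX_DISPATCH_DEADLINE = "86400s"  # 24 hours, the ceiling for App Engine targets


def build_create_task_body(
    target_url: str,
    callback_url: str,
    method: str = "POST",
    body: Optional[Any] = None,
) -> dict:
    """Build a Cloud Tasks tasks.create request body targeting the Task Runner."""
    # Everything the Task Runner needs to make the call and report back.
    envelope = {
        "url": target_url,
        "method": method,
        "body": body,
        "callback_url": callback_url,
    }
    return {
        "task": {
            "appEngineHttpRequest": {
                "httpMethod": "POST",
                "appEngineRouting": {"service": TASK_RUNNER_SERVICE},
                "relativeUri": TASK_RUNNER_PATH,
                "headers": {"Content-Type": "application/json"},
                # The REST API expects the HTTP body as base64-encoded bytes.
                "body": base64.b64encode(json.dumps(envelope).encode()).decode(),
            },
            # How long Cloud Tasks waits for the Task Runner to respond.
            "dispatchDeadline": MAX_DISPATCH_DEADLINE,
        }
    }
```

`dispatchDeadline` is what lifts the limit here. Keep in mind that Cloud Tasks retries a task whose handler fails, according to the queue's retry configuration. Either make the Task Runner safe to run more than once or cap the queue's attempts.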

The updated example

So how might the example GCP Workflow look now that we have the solution outlined?

A sample workflow with long-running HTTP calls approach embedded
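For illustration, here is one possible shape of such a workflow. Workflows also accepts definitions written in JSON, so this sketch builds the definition as a Python dict and prints it. It uses the `events.create_callback_endpoint` and `events.await_callback` standard library functions and the Cloud Tasks connector. The project, queue, service URL, Task Runner names, and payload envelope are hypothetical placeholders.

```python
import json

# Hypothetical placeholders: replace with your own project, queue, and service.
PARENT = "projects/my-project/locations/europe-west1/queues/long-calls"
TARGET_URL = "https://my-service.example.com/process"

workflow = {
    "main": {
        "steps": [
            # 1. Create a callback URL the Task Runner can POST results to.
            {"create_callback": {
                "call": "events.create_callback_endpoint",
                "args": {"http_callback_method": "POST"},
                "result": "callback_details",
            }},
            {"build_payload": {
                "assign": [{"payload": {
                    "url": TARGET_URL,
                    "method": "POST",
                    "body": None,
                    "callback_url": "${callback_details.url}",
                }}],
            }},
            # 2. Enqueue a task that targets the Task Runner App Engine service.
            {"enqueue_task": {
                "call": "googleapis.cloudtasks.v2.projects.locations.queues.tasks.create",
                "args": {
                    "parent": PARENT,
                    "body": {"task": {
                        "appEngineHttpRequest": {
                            "httpMethod": "POST",
                            "appEngineRouting": {"service": "task-runner"},
                            "relativeUri": "/run",
                            "body": "${base64.encode(json.encode(payload))}",
                        },
                        "dispatchDeadline": "86400s",
                    }},
                },
            }},
            # 5. Wait (in seconds) for the callback, then continue.
            {"await_result": {
                "call": "events.await_callback",
                "args": {"callback": "${callback_details}", "timeout": 86400},
                "result": "callback_request",
            }},
            {"done": {"return": "${callback_request.http_request.body}"}},
        ]
    }
}

print(json.dumps(workflow, indent=2))
```

The `await_callback` timeout matches the 24-hour dispatch deadline, so the workflow does not give up before Cloud Tasks does.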

That’s basically it. All that’s left is to implement the Task Runner service, set up the infrastructure, and let your workflow calls run for much longer.

I will cover the Task Runner implementation, along with an example infrastructure setup and workflow, in the next article.
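In the meantime, here is a rough idea of what steps 3 and 4 could look like inside the Task Runner. This is a minimal sketch using only the Python standard library, not the implementation covered in the next article. `handle_task` is a hypothetical function you would wire to the Task Runner's route. The transport is injectable so the sketch can be tested without network access. Authentication to the destination service is left out. Posting to a Workflows callback URL does require an OAuth 2.0 access token for an identity allowed to send callbacks, so `get_access_token` is a parameter. On App Engine you could obtain that token from the metadata server.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# (method, url, body, headers, timeout_seconds) -> (status, response_body)
Transport = Callable[[str, str, Optional[bytes], dict, float], tuple]


def urllib_transport(method, url, body, headers, timeout):
    """Default transport built on urllib; non-2xx responses are returned, not raised."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def handle_task(
    raw_body: bytes,
    get_access_token: Callable[[], str],
    transport: Transport = urllib_transport,
    call_timeout_s: float = 23 * 60 * 60,  # headroom under the 24h dispatch deadline
) -> int:
    """Run the long call described by a task envelope and report back via callback."""
    envelope = json.loads(raw_body)

    # Step 3: synchronous call to the destination service.
    body = envelope.get("body")
    data = None if body is None else json.dumps(body).encode()
    status, result = transport(
        envelope.get("method", "POST"),
        envelope["url"],
        data,
        {"Content-Type": "application/json"},
        call_timeout_s,
    )

    # Step 4: notify the waiting workflow through its callback URL.
    callback_body = json.dumps(
        {"status": status, "result": result.decode(errors="replace")}
    ).encode()
    cb_status, _ = transport(
        "POST",
        envelope["callback_url"],
        callback_body,
        {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {get_access_token()}",
        },
        60,
    )
    if not 200 <= cb_status < 300:
        # Surface the failure; otherwise the workflow waits until its own timeout.
        raise RuntimeError(f"callback failed with HTTP {cb_status}")

    # A 2xx response tells Cloud Tasks the task succeeded, so it is not retried.
    return 200
```

The 23-hour call timeout leaves room under the 24-hour dispatch deadline for the final callback. If anything raises, the framework returns an error status, and Cloud Tasks may retry the task.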

I developed this approach while building a serverless data processing platform at Travelshift, where we build next-generation travel experience solutions. You can check them out at Guide to Europe and Guide to Iceland.


GCP Champion Innovator, 6x GCP Certified, tech-savvy Cloud Engineer. Troubleshooter and problem solver.