A Cloud Function enqueues two tasks to Cloud Tasks: one to be handled by App Engine, another to be handled by Cloud Run.

Cloud Tasks is a little stateful

Adam Ross
Google Cloud - Community


Serverless products like Cloud Run are intended to host stateless services, allowing amazing horizontal scaling. External services like Firestore and Cloud Storage are often used to bring “state” back into the picture. In this post I’ll explore how Cloud Tasks can bring bits of statefulness to your application architecture for those use cases where a queueing system can help.

In this post I’m using the newly GA’d Cloud Tasks HTTP Targets feature, which allows each task you create to be configured for delivery to a web service. For a more general introduction to Cloud Tasks, go read Asynchronous Code Execution with Google Cloud Tasks, then come back.

Applying “Stateful” to a Queue?

The premise of a queueing system is that information goes in, and eventually disappears when the work is done. Before explaining in more detail how this applies in Cloud Tasks, let’s talk definitions:

From Wikipedia and TechTarget:

a system is described as stateful if it is designed to remember preceding events or user interactions; the remembered information is called the state of the system.

While Cloud Tasks is not going to indefinitely remember preceding events, it does remember data from these events after they have otherwise been discarded by stateless components of the system, such as your Cloud Function or Cloud Run service. Let’s explore how Cloud Tasks uses this otherwise forgotten information.

Cloud Tasks supports task-specific properties

In Cloud Tasks, you create a task which represents a bit of work to be done later, at a pace determined by the configuration of a particular queue. Each task can carry unique properties which propagate details of the current request into the queue. This allows you to skip storing the data for this work item in a database.

Sequence Diagram: Cloud Run can create Cloud Tasks which carry custom HTTP parameters propagated to the task handler.
Unique request properties pass outside the lifetime of the Cloud Run process, and propagate to the Cloud Tasks handler in App Engine. Diagram created on https://bramp.github.io/js-sequence-diagrams/

When you create a task, you are defining an HTTPS request which will be dispatched to a handler which processes the request. That handler can pull anything out of the HTTP request, and use that to look up data in other systems.

At a minimum, every HTTP Targets task needs a URL for dispatch of the task. Of course, a URL can include arbitrary path and query string components as one mechanism to carry some data. Beyond that, the HTTP Method and Body can be overridden, and arbitrary HTTP headers added to the request. Here’s an example of using gcloud to create a task with a customized query string for “search parameters” and a slightly more subtle header to carry a user ID.

gcloud tasks create-http-task --queue my-queue \
--url "https://search.example123-uc.a.run.app?s=piano&color=red" \
--header "Request-User-Id: user123"
Created task [projects/my-project/locations/us-central1/queues/my-queue/tasks/1234567890].

Instead of storing a request object in a database, the key details of this request are stored on the task. On the other hand, the key account details for user123 are kept private and looked up by ID as needed.
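On the receiving end, the handler has to read those details back out of the dispatched request. Here is a minimal Python sketch of that step, assuming the handler is given the request URL and a dictionary of headers. The function name extract_task_details and the keys of the returned dictionary are made up for illustration; only the s and color query parameters and the Request-User-Id header come from the example above.

```python
from urllib.parse import parse_qs, urlsplit


def extract_task_details(url, headers):
    """Pull the search parameters and the user ID out of a task request."""
    query = parse_qs(urlsplit(url).query)
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return {
        "search": query.get("s", [None])[0],
        "color": query.get("color", [None])[0],
        "user_id": normalized.get("request-user-id"),
    }


details = extract_task_details(
    "https://search.example123-uc.a.run.app?s=piano&color=red",
    {"Request-User-Id": "user123"},
)
print(details)
```

The user ID is only a lookup key here; the handler would still fetch the account details from wherever they are stored.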

Cloud Tasks lets you check task status

Once the task is created, dispatch will depend on how the queue is configured, the performance of the handlers at the other end of the URL, and how successful processing is generally. Until the task is completed, you can inspect its details to confirm it has the HTTP properties you expect, and check on how delivery is going.

You can check on the delivery details of a task using gcloud or the API. gcloud has a handy command which leverages the Cloud Tasks API to get the task object:

gcloud tasks describe projects/my-project/locations/us-central1/queues/my-queue/tasks/1234567890
createTime: '2019-09-16T19:01:09Z'
dispatchCount: 9
dispatchDeadline: 600s
firstAttempt:
  dispatchTime: '2019-09-16T19:01:09.525182Z'
lastAttempt:
  dispatchTime: '2019-09-16T19:01:37.984184Z'
  responseStatus:
    code: 9
    message: 'FAILED_PRECONDITION(9): HTTP status code 405'
  responseTime: '2019-09-16T19:01:38.007307Z'
  scheduleTime: '2019-09-16T19:01:37.982882Z'
name: projects/my-project/locations/us-central1/queues/my-queue/tasks/1234567890
scheduleTime: '2019-09-16T19:02:03.607307Z'
view: BASIC

In these details we see Cloud Tasks has attempted delivery 9 times, and the latest attempt returned an HTTP status code of 405. It will keep trying (for up to 30 days, depending on your configuration) until it gets a 2xx response code.

In this case it looks like the task handler is not expecting the Cloud Tasks default of using HTTP POST requests. There’s no task update option, so the options are to change the handler service to support POST or delete and recreate the task.

Let’s try recreating the task using the GET method:

gcloud tasks delete projects/my-project/locations/us-central1/queues/my-queue/tasks/1234567890
Deleted task [1234567890]
gcloud tasks create-http-task --queue my-queue \
--url "https://search.example123-uc.a.run.app?s=piano&color=red" \
--header "Request-User-Id: user123" \
--method GET
Created task [projects/my-project/locations/us-central1/queues/my-queue/tasks/2468101214161820].

Running the same describe command again shows the old task no longer exists; the API returns a 404 error:

{
"error": {
"code": 404,
"message": "Requested entity was not found.",
"status": "NOT_FOUND"
}
}

If we try describing our new task, we might see it’s already been processed:

{
"error": {
"code": 404,
"message": "The task no longer exists, though a task with this name existed recently. The task either successfully completed or was deleted.",
"status": "NOT_FOUND"
}
}

This means we can request the task details and use the response to answer many questions, including:

  • Is the handler service broken? Which status code is it sending?
  • Does the task have all the properties we expect?
  • Has the task been completed?
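The questions above can be answered in code as well. The following Python sketch assumes the describe response has already been parsed into a dictionary, using the field names shown in the outputs earlier in this post. The helper names task_state and last_failure, and the state labels they return, are hypothetical.

```python
def task_state(response):
    """Classify a describe response as gone, error, waiting, or dispatched."""
    error = response.get("error")
    if error is not None:
        # NOT_FOUND means the task completed, was deleted, or never existed.
        return "gone" if error.get("status") == "NOT_FOUND" else "error"
    if int(response.get("dispatchCount", 0)) == 0:
        return "waiting"
    # The task still exists after dispatch, so no 2xx response has landed yet.
    return "dispatched"


def last_failure(response):
    """Return the message from the most recent attempt, if one was recorded."""
    status = response.get("lastAttempt", {}).get("responseStatus", {})
    return status.get("message")


failing_task = {
    "dispatchCount": 9,
    "lastAttempt": {
        "responseStatus": {
            "code": 9,
            "message": "FAILED_PRECONDITION(9): HTTP status code 405",
        },
    },
}
deleted_task = {
    "error": {
        "code": 404,
        "message": "Requested entity was not found.",
        "status": "NOT_FOUND",
    },
}

print(task_state(failing_task), "-", last_failure(failing_task))
print(task_state(deleted_task))
```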

You may not want to use Cloud Tasks in lieu of a status field in your database since the “completion” message will disappear.

Cloud Tasks handles timing

When you create a task, you can set a schedule-time property to have your task delivered up to 30 days in the future. This means your front-line services can keep adding work even during maintenance periods for backend systems.
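As a rough illustration, here is one way to compute a schedule time in Python, formatted like the timestamps used elsewhere in this post. The helper name schedule_time and the MAX_DELAY constant are made up for this sketch, and the 30-day check simply mirrors the limit mentioned above.

```python
from datetime import datetime, timedelta, timezone

MAX_DELAY = timedelta(days=30)  # assumption: mirrors the 30-day limit above


def schedule_time(now, delay):
    """Return a UTC timestamp string for `delay` after the aware datetime `now`."""
    if delay < timedelta(0) or delay > MAX_DELAY:
        raise ValueError("delay must be between 0 and 30 days")
    when = (now + delay).astimezone(timezone.utc)
    return when.strftime("%Y-%m-%dT%H:%M:%SZ")


now = datetime(2019, 9, 16, 19, 1, 9, tzinfo=timezone.utc)
print(schedule_time(now, timedelta(hours=10, minutes=28, seconds=51)))
# prints 2019-09-17T05:30:00Z
```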

Cloud Tasks lets you uniquely name a task

The task IDs in the commands above are auto-generated numbers, but as the developer you can specify your own ID, as long as it is unique within your queue. If it’s unique, it will be the only ID you need to deal with.

If it’s not, Cloud Tasks will helpfully send back an error. You might think, GREAT another error to handle; however, this is a fantastic feature. It means you can implement a circuit breaker around the creation of asynchronous workloads.

For example, suppose you have a Content Management System which needs to update a sitemap at the end of any 24-hour period that had a content update. You don’t want to build the sitemap if there’s no change, and you don’t want to build it more than once.

Using Cloud Tasks deduplication and scheduling features, you can easily build a “trip-line” mechanism that will run your job at 2am if at least one content update has been made:

Sequence Diagram: Editor publishes many pieces of content, each tries to create a sitemap rebuild task. Only one is executed.
No matter how many times a given event is triggered, only one instance of the named task will be created, and only one is needed. Diagram created on https://bramp.github.io/js-sequence-diagrams/

gcloud tasks create-http-task sitemap-rebuild --queue my-queue \
--url "https://cms.example123-uc.a.run.app/sitemap-rebuild" \
--schedule-time 2019-09-17T05:30:00Z
# First Time
Created task [projects/my-project/locations/us-central1/queues/my-queue/tasks/sitemap-rebuild].
# Second Time, get an HTTP 409 error
ERROR: (gcloud.tasks.create-http-task) ALREADY_EXISTS: Requested entity already exists
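In application code, the trip-line comes down to treating the “already exists” error as success. The Python sketch below shows that pattern against an in-memory stand-in. FakeQueue, AlreadyExists, and trip_sitemap_rebuild are all hypothetical names for illustration and are not part of the Cloud Tasks client library; a real implementation would catch whatever error your client raises for ALREADY_EXISTS.

```python
class AlreadyExists(Exception):
    """Stand-in for the ALREADY_EXISTS (HTTP 409) error from Cloud Tasks."""


class FakeQueue:
    """In-memory stand-in for a queue; not the Cloud Tasks client."""

    def __init__(self):
        self.tasks = {}

    def create_task(self, task_id, url, schedule_time):
        if task_id in self.tasks:
            raise AlreadyExists(task_id)
        self.tasks[task_id] = {"url": url, "scheduleTime": schedule_time}


def trip_sitemap_rebuild(queue, schedule_time):
    """Request a rebuild; return True only for the call that created the task."""
    try:
        queue.create_task(
            "sitemap-rebuild",
            "https://cms.example123-uc.a.run.app/sitemap-rebuild",
            schedule_time,
        )
        return True
    except AlreadyExists:
        # A rebuild is already queued, so this content update needs no new task.
        return False


queue = FakeQueue()
results = [trip_sitemap_rebuild(queue, "2019-09-17T05:30:00Z") for _ in range(3)]
print(results)           # [True, False, False]
print(len(queue.tasks))  # 1
```

Every content update can call trip_sitemap_rebuild without checking first whether a rebuild is pending; the queue itself acts as the check.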

In this post we’ve seen many ways that Cloud Tasks can provide little bits of application state that serverless apps might need for success:

  • persist request-specific details for later processing
  • share details on the progress of task processing
  • schedule delivery to your timeline requirements
  • build “trip-lines” so you don’t need to do extra work to prevent your task handlers from doing extra work

To walk through an example of using Cloud Tasks to transmit state to an asynchronous workload, check out the postcard tutorial in the Cloud Tasks docs.
