Callbacks with AWS Step Functions
Pausing your execution until an asynchronous task is finished
When working with Amazon Web Services, Step Functions prove an invaluable resource for defining your system’s workflows. If you find yourself working with asynchronous steps in these flows, you may be wondering how best to integrate them into your Step Functions, especially if these steps take hours, days, or even months to finish.
Fear not, for AWS announced callbacks for Step Functions in late May 2019. Offered at no extra cost, callbacks make it easier to define asynchronous tasks and configure the Step Functions to wait for these tasks to finish before proceeding to the next step.
Using the AWS Cloud Development Kit (CDK) for TypeScript, let’s implement Step Functions that use a callback task. Starting with an SQS queue and a Lambda function to consume from this queue, let’s configure the task to push messages onto the queue and wait for the Lambda function to return the outcome — either a success or failure.
We begin with the SQS Queue and Lambda Function CDK constructs. The queue is configured to retain messages for at most a minute, whilst the Lambda function is set to run Python 3.7 code. The code resides in the file lambda/app.py, with the entry-point method lambda_handler. The Lambda function will automatically consume messages from the queue as soon as they arrive, reading them in batches and processing each one sequentially. We’ll cover this Python code in greater detail when we’ve defined the callback task.
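To make the batch consumption concrete, here is a minimal sketch of what such a handler might look like. It relies on the standard shape of an SQS-triggered Lambda event (a list under "Records", each with a "body" string). The process_message helper and the message fields are assumptions for illustration, not the article's actual code.

```python
import json


def process_message(payload):
    """Hypothetical per-message step; here it just lists the payload's keys."""
    return sorted(payload.keys())


def lambda_handler(event, context):
    """Entry point invoked by Lambda with a batch of SQS messages."""
    results = []
    # An SQS-triggered invocation delivers the batch under "Records";
    # each record carries the raw message text in its "body" field.
    for record in event.get("Records", []):
        payload = json.loads(record["body"])
        results.append(process_message(payload))
    return results


# Local usage with a hand-built event containing two messages.
sample_event = {
    "Records": [
        {"body": json.dumps({"id": "a", "taskToken": "token-1"})},
        {"body": json.dumps({"id": "b", "taskToken": "token-2"})},
    ]
}
print(lambda_handler(sample_event, None))
```

Messages are handled one after another inside a single invocation, which matches the sequential processing described above.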
First we need to create the Step Functions for the stack—consisting of a sequence of states to define our execution flow. These states take the form of tasks, choices, and more to deliver a wealth of functionality for our systems. We’ll keep this one simple — defining a single callback task.
Using the SendToQueue Step Functions tasks construct, we want to push messages onto the SQS queue. These messages will consist of a randomly generated UUID and a task token, the latter of which is defined automatically in the Step Functions’ Context. By setting the integration pattern to WAIT_FOR_TASK_TOKEN, we take advantage of the callback functionality, forcing the Step Functions to wait until this task finishes before proceeding to the next.
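It can help to see roughly what this task looks like in the Amazon States Language, the JSON format in which state machines are ultimately defined. The sketch below builds that state as a plain Python dictionary. The queue URL and the message field names ("id" and "taskToken") are assumptions for illustration, and the UUID here is generated once when the definition is built.

```python
import json
import uuid

# Hypothetical queue URL, used only for illustration.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/callback-queue"


def build_callback_task(queue_url, message_id):
    """Return a task state that sends to SQS and then waits for a callback."""
    return {
        "Type": "Task",
        # The ".waitForTaskToken" suffix selects the callback pattern.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                "id": message_id,  # assumed field name
                # "$$" reads from the context object; the ".$" key suffix
                # marks the value as a path rather than a literal string.
                "taskToken.$": "$$.Task.Token",  # assumed field name
            },
        },
    }


task = build_callback_task(QUEUE_URL, str(uuid.uuid4()))
print(json.dumps(task, indent=2))
```

The task token never has to be created by hand: the path "$$.Task.Token" asks Step Functions to fill it in from the context object at run time.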
The token is important to us: we need to return this when the task finishes, otherwise the Step Functions will wait until they eventually time out. If we were to follow default settings, we would be waiting a year before this happens!
With that in mind, let’s configure a Step Functions state machine to execute this task. We’ll reduce the timeout to two minutes and mark the callback task both as the starting state and the end state for this machine.
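As a rough picture of the resulting definition, the sketch below wraps a single task state in an Amazon States Language document with a two-minute execution timeout. The state name "SendToQueue" and the trimmed-down placeholder task are assumptions for illustration; a real task state would also carry its queue parameters.

```python
import json


def build_state_machine(task_name, task_state, timeout_seconds=120):
    """Wrap one task state in a state machine definition."""
    state = dict(task_state)  # copy so the caller's dict is left untouched
    state["End"] = True       # the only state is also the end state
    return {
        "StartAt": task_name,               # ...and the starting state
        "TimeoutSeconds": timeout_seconds,  # whole-execution timeout
        "States": {task_name: state},
    }


# Placeholder task state, trimmed for brevity (no Parameters shown).
placeholder_task = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
}

definition = build_state_machine("SendToQueue", placeholder_task)
print(json.dumps(definition, indent=2))
```

With "TimeoutSeconds" set to 120, an execution whose callback never arrives is stopped after two minutes instead of lingering for the default period.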
With an understanding of the role that the task tokens play, now is a good time to dive into the Python Lambda code. There are a few points of particular interest to explain. First, we send a task heartbeat to the Step Functions using Boto 3, the AWS SDK for Python. This heartbeat acknowledges to the state machine that the task (identified by its token) remains in progress and resets the task’s heartbeat clock. It does not extend the two-minute timeout set on the state machine itself.
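The heartbeat step might look something like the sketch below. The send_task_heartbeat call is the Boto 3 Step Functions client method. The "taskToken" field name in the message body is an assumption, and RecordingClient is a hypothetical stand-in so the sketch runs without AWS credentials.

```python
import json


def send_heartbeats(event, sfn_client):
    """Send one heartbeat per SQS record and return the tokens seen."""
    tokens = []
    for record in event["Records"]:
        message = json.loads(record["body"])
        token = message["taskToken"]  # assumed field name
        # Tells Step Functions the task behind this token is still running.
        sfn_client.send_task_heartbeat(taskToken=token)
        tokens.append(token)
    return tokens


class RecordingClient:
    """Hypothetical stand-in for boto3.client("stepfunctions")."""

    def __init__(self):
        self.calls = []

    def send_task_heartbeat(self, taskToken):
        self.calls.append(("heartbeat", taskToken))
        return {}


client = RecordingClient()
event = {"Records": [{"body": json.dumps({"id": "a", "taskToken": "token-1"})}]}
print(send_heartbeats(event, client))
```

Inside the deployed Lambda function, the stand-in would be replaced with a real client created via boto3.client("stepfunctions").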
Now we have to decide whether to mark the task as a success or failure. This is done at random with Python’s random package, which picks either True or False. If the result is True, we’ll return a task success to the state machine; otherwise we’ll return a task failure. In both cases we provide the task token to identify which task has finished.
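A minimal sketch of this coin flip follows, using the Boto 3 client methods send_task_success and send_task_failure. The output payload, the error name, and the cause text are assumptions for illustration, and RecordingClient is again a hypothetical stand-in for the real client.

```python
import json
import random


def report_outcome(sfn_client, task_token, rng=random):
    """Randomly report the task as succeeded or failed; return the choice."""
    succeeded = rng.choice([True, False])
    if succeeded:
        sfn_client.send_task_success(
            taskToken=task_token,
            # The output must be a JSON string, not a Python dict.
            output=json.dumps({"status": "succeeded"}),
        )
    else:
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="RandomFailure",              # hypothetical error name
            cause="The random choice was False",  # hypothetical cause text
        )
    return succeeded


class RecordingClient:
    """Hypothetical stand-in for boto3.client("stepfunctions")."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, taskToken, output):
        self.calls.append(("success", taskToken, output))

    def send_task_failure(self, taskToken, error, cause):
        self.calls.append(("failure", taskToken, error, cause))


client = RecordingClient()
report_outcome(client, "example-token")
print(client.calls)
```

Passing the random source in as a parameter is only a convenience for testing; calling the random module directly works just as well.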
Almost there! With the Lambda function configured to send task heartbeats, failures, and successes, we need to grant it permission to perform these operations. Using AWS Identity and Access Management (IAM), we need to define a policy statement listing these actions and assign it to the Lambda function.
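For reference, the policy statement boils down to an IAM document along the lines of the sketch below, shown as a Python dictionary. The three action names are the IAM actions for the callback operations. The wildcard resource is an assumption made for brevity and is broader than you would normally want.

```python
import json

# IAM actions matching the three callback operations the function performs.
CALLBACK_ACTIONS = [
    "states:SendTaskHeartbeat",
    "states:SendTaskSuccess",
    "states:SendTaskFailure",
]


def build_callback_policy(resource="*"):
    """Return an IAM policy document allowing the callback actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(CALLBACK_ACTIONS),
                # "*" is an assumption for this sketch; narrow it if you can.
                "Resource": resource,
            }
        ],
    }


print(json.dumps(build_callback_policy(), indent=2))
```

Without these permissions, the calls made from the Lambda function would be rejected and the state machine would keep waiting until it timed out.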
After building and deploying the updated CDK stack (npm run build followed by cdk deploy), we’ll find our new state machine in the Step Functions console. To take it for a spin, select “Start execution” at the top right corner, then click “Start execution” on the next window without entering any further details.
After the second “Start execution” click, we’re presented with a diagram to illustrate the execution flow. Almost immediately you should find the execution either succeeds (green) or fails (red); a manual refresh of the page will be required if the state remains in progress (blue). Because the outcome is chosen at random, repeating this should deliver a mix of successes and failures.
There you have it: a Step Functions callback task that sends messages to SQS, where a Lambda function consumes them and then reports the task outcome back to the Step Functions. All the code seen here is available to fork on GitLab if you fancy testing this stack for yourself.