3 Ways to Deal with Long-Running Tasks in Lambda Using Step Functions

Sushant Raje
Jul 20, 2023

Running long or compute-heavy tasks with AWS Lambda alone can be challenging, because a single invocation can run for at most 15 minutes. AWS Step Functions can help you work around this limit. In this guide, we’ll explore three ways Step Functions can help you manage long-running tasks with AWS Lambda.

Step Functions in a nutshell

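AWS Step Functions is a serverless workflow orchestration service that helps you build and coordinate multi-step applications. Because it is fully managed, there are no servers to provision, and it scales with the number of executions you start, within your account’s service quotas.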

Step Functions is based on the concepts of tasks and state machines. You define state machines using the JSON-based Amazon States Language. A state machine is a series of steps that make up your application’s workflow, and each step is called a state. A state can be one of several types: Pass, Task, Choice, Wait, Parallel, Map, Succeed, or Fail.

A Task state represents a unit of work performed by another AWS service, such as AWS Lambda. The output of one state becomes the input of the next state.

Once you have defined your workflow, Step Functions executes the states in the order the definition specifies. If a state fails and it has a retry policy (the Retry field), Step Functions retries it up to the configured number of attempts. If the state still fails after the retries are exhausted, Step Functions fails the execution, unless a Catch field routes the error to another state.
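To make the retry behaviour concrete, here is a minimal sketch of a state machine definition with one Lambda task that is retried and then handed to a failure state. The state names (DoWork, HandleFailure), the function ARN, the account ID, and the retry numbers are placeholders chosen for illustration, not values from a real project.

```json
{
  "Comment": "Sketch: retry a Lambda task, then route to a Fail state",
  "StartAt": "DoWork",
  "States": {
    "DoWork": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:DoWork",
        "Payload.$": "$"
      },
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 5,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "HandleFailure"
        }
      ],
      "End": true
    },
    "HandleFailure": {
      "Type": "Fail",
      "Error": "DoWorkFailed",
      "Cause": "DoWork still failed after all retries"
    }
  }
}
```

With these assumed values, a failed DoWork attempt is retried after 5, 10, and 20 seconds. If the third retry also fails, the Catch block sends the execution to HandleFailure.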

Step Functions also provides a graphical console, Workflow Studio, where you can drag and drop states to arrange and visualize a workflow. This makes it simpler to configure multi-step applications.

Here are three ways to deal with long-running Lambda functions using Step Functions.

Map State

If you are iterating over a list and each item takes a long time to process, the Map state can speed up the processing.

The Map state in AWS Step Functions runs a set of workflow steps for each item in a dataset. Iterations can run in parallel, up to the limit set by the state’s MaxConcurrency field, so the whole dataset is processed faster and each Lambda invocation handles only one item.
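As an illustration, the following sketch shows a Map state that invokes a Lambda function once per element of an input array. It assumes the execution input has an items array; the state names, the function ARN, the account ID, and the MaxConcurrency value are placeholders for this example.

```json
{
  "Comment": "Sketch: process each element of $.items with a Lambda function",
  "StartAt": "ProcessItems",
  "States": {
    "ProcessItems": {
      "Type": "Map",
      "ItemsPath": "$.items",
      "MaxConcurrency": 10,
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "INLINE"
        },
        "StartAt": "ProcessItem",
        "States": {
          "ProcessItem": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:ProcessItem",
              "Payload.$": "$"
            },
            "OutputPath": "$.Payload",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

Starting an execution with input such as {"items": [1, 2, 3]} would run ProcessItem once per element, with at most 10 iterations in flight at a time under the assumed setting. The output of the Map state is an array of the per-item results.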

Iterator Pattern

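With the iterator pattern, the Lambda function checks its remaining execution time before each unit of work and shuts down gracefully when less than 60 seconds remain. It returns a Boolean flag that indicates whether processing is complete. If it is not, the state machine invokes the Lambda function again to continue where the previous invocation stopped.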

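// Check the remaining time before each long-running unit of work and only
// start it while at least 60 seconds remain, so the function can exit cleanly.
// Return an IsCompleted flag so the state machine can decide whether to
// invoke the function again. Intermediate state can be passed to the next
// invocation or saved in a database or Redis cache.
// ProcessNextItem() is a hypothetical helper: it does one unit of work and
// returns true once nothing is left to process.
bool isCompleted = false;
while (!isCompleted && context.RemainingTime.TotalSeconds >= 60)
{
    isCompleted = ProcessNextItem();
}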
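On the Step Functions side, the loop can be sketched with a Task state followed by a Choice state. This example assumes the Lambda function returns a JSON object with a Boolean IsCompleted field, plus whatever intermediate state it needs on the next invocation. The state names, the function ARN, and the account ID are placeholders.

```json
{
  "Comment": "Sketch: re-invoke a Lambda function until it reports completion",
  "StartAt": "ProcessBatch",
  "States": {
    "ProcessBatch": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:ProcessBatch",
        "Payload.$": "$"
      },
      "OutputPath": "$.Payload",
      "Next": "IsCompleted"
    },
    "IsCompleted": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.IsCompleted",
          "BooleanEquals": true,
          "Next": "Done"
        }
      ],
      "Default": "ProcessBatch"
    },
    "Done": {
      "Type": "Succeed"
    }
  }
}
```

Each pass through the loop feeds the previous output back in as the next input, so the function can resume from the state it returned. One thing to keep in mind with this design is that a Standard workflow execution has a quota of 25,000 history events, so a very long loop may need to be split across executions.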

Callback Pattern

With the callback pattern, you move the long-running code out of Lambda into a service such as EC2, EKS, or AWS Batch. The state machine starts that work by invoking a Lambda function or by pushing a message to SQS or SNS, and it passes along a callback token (the task token). Step Functions then pauses at that state. Once the work is finished, the worker uses the same token to report success or failure, and the execution resumes.

A Standard workflow execution can stay in a running state for up to one year (365 days), so it can wait far longer than a single Lambda invocation could. If the timeout and heartbeat properties are configured on the state, the state fails when the task runs longer than the timeout, or when more time than the heartbeat interval passes between two heartbeats.
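As a sketch of the state machine side, the definition below sends a message to an SQS queue and waits for the token to come back. The queue URL, the state name, the account ID, and the timeout values are assumptions made for this example. The .waitForTaskToken suffix and the $$.Task.Token context value are what make the state pause until a callback arrives.

```json
{
  "Comment": "Sketch: hand work to an external worker and wait for its callback",
  "StartAt": "StartLongRunningJob",
  "States": {
    "StartLongRunningJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/long-running-jobs",
        "MessageBody": {
          "Input.$": "$",
          "TaskToken.$": "$$.Task.Token"
        }
      },
      "TimeoutSeconds": 86400,
      "HeartbeatSeconds": 600,
      "End": true
    }
  }
}
```

With these assumed values, the worker that reads the message has up to 24 hours to finish and must send a heartbeat at least every 10 minutes. It reads TaskToken from the message body and passes it back when it reports the result.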

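// Report that the task identified by the task token succeeded.
// Output must be a JSON string; it becomes the output of the waiting state.
await amazonStepFunctionsClient.SendTaskSuccessAsync(new SendTaskSuccessRequest
{
    Output = "<JSON response object to process by step function>",
    TaskToken = "<callback token received with the task>"
});

// Report that the task identified by the task token failed.
// Error is a short error code; Cause carries the human-readable details.
await amazonStepFunctionsClient.SendTaskFailureAsync(new SendTaskFailureRequest
{
    Error = "<error code to process by step function>",
    Cause = "<error details>",
    TaskToken = "<callback token received with the task>"
});

// While the work is still running, report a heartbeat so that the
// heartbeat interval configured on the state does not expire.
await amazonStepFunctionsClient.SendTaskHeartbeatAsync(new SendTaskHeartbeatRequest
{
    TaskToken = "<callback token received with the task>"
});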

Conclusion

We explored three ways to deal with long-running Lambda functions using Step Functions. The Map state processes the items of a list in parallel. The iterator pattern uses the RemainingTime property to shut the Lambda function down gracefully and invoke it again to continue the execution. The callback pattern moves the long-running work out of Lambda and resumes the workflow when that work reports back with the callback token.
