Automating workflows using AWS Step Functions

Nitin · Published in Unibuddy
Feb 22, 2022 · 6 min read

At Unibuddy we’re always looking for ways to improve processes, both for our customers and for ourselves. This time, we’re looking at AWS Step Functions: how we use workflow automation and the different ways of building workflows. We’ll also write working code for one of the use cases.

What is a Workflow?

A workflow is a sequence of tasks that processes a set of data. You can think of a workflow as the path that describes how tasks go from being undone to done. Workflows manage failures, retries, parallelization, service integrations, and observability so developers can focus on higher-value business logic.

You can build workflows with tools such as Apache Airflow or AWS Step Functions. In this post, we focus on AWS Step Functions.

What are AWS Step Functions?

AWS Step Functions is a low-code, visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications.

Key features

  • Built-in retry mechanism
  • Multiple AWS services can be used in a state machine
  • Supports long-running tasks (for up to one year)
  • Automated scaling
  • Pay per use

State machines

The workflows you build with Step Functions are called state machines, and each step of your workflow is called a state.

How to define state machines?

State machines are defined using the JSON-based Amazon States Language (ASL).
You can define state machines in several ways:

  1. Step Functions’ graphical console
  2. AWS SAM
  3. AWS CDK

A state machine can consist of multiple states, such as Pass, Wait, and Task.
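To make this concrete, here is a minimal ASL sketch chaining a Pass, a Wait, and a Task state. The state names and the Lambda ARN are illustrative, not from the original post:

```json
{
  "Comment": "Illustrative three-state machine: Pass -> Wait -> Task",
  "StartAt": "PrepareInput",
  "States": {
    "PrepareInput": {
      "Type": "Pass",
      "Result": { "greeting": "Hello World!" },
      "Next": "WaitTenSeconds"
    },
    "WaitTenSeconds": {
      "Type": "Wait",
      "Seconds": 10,
      "Next": "InvokeLambda"
    },
    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
      "End": true
    }
  }
}
```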

Use cases

Sequential execution of tasks

You create a workflow that runs a group of Lambda functions (steps) in a specific order. One Lambda function’s output passes to the next Lambda function as input. The last step in your workflow returns the final result. With Step Functions, you can see how each step in your workflow interacts with the others, so you can make sure that each step performs its intended function.

Branching

Using a Choice state, you can have Step Functions make decisions based on the Choice state’s input.
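A Choice state branches on its input using comparison rules. A hedged sketch in ASL (the field `$.status` and the target state names are illustrative):

```json
"CheckApproval": {
  "Type": "Choice",
  "Choices": [
    { "Variable": "$.status", "StringEquals": "APPROVED", "Next": "IssueCard" },
    { "Variable": "$.status", "StringEquals": "REJECTED", "Next": "NotifyRejection" }
  ],
  "Default": "ManualReview"
}
```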

Human in the loop

When you apply for a credit card, your application might be reviewed by a human. Because Step Functions executions can run for up to one year, a state machine can wait for human approval and proceed to the next state only after approval or rejection.

Parallel Processing

Use a Parallel state when the number of branches is known in advance; you cannot change the number of branches at runtime.
E.g. a customer converts a video file into five different display resolutions, so viewers can watch the video on multiple devices. Using a Parallel state, Step Functions passes the video file to each branch, so Lambda functions can process it into the five display resolutions at the same time.
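In ASL, a Parallel state lists its fixed branches explicitly. A sketch with two of the five resolutions (ARNs and state names are illustrative):

```json
"TranscodeVideo": {
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "To1080p",
      "States": {
        "To1080p": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transcode-1080p",
          "End": true
        }
      }
    },
    {
      "StartAt": "To720p",
      "States": {
        "To720p": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transcode-720p",
          "End": true
        }
      }
    }
  ],
  "End": true
}
```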

Dynamic parallelism

This is similar to parallel processing, except that branches are created dynamically based on the input. For example, a customer orders three items, and you need to prepare each item for delivery: check its availability, gather it, and package it. Using a Map state, Step Functions has Lambda process each of the customer’s items in parallel. Once all of the items are packaged for delivery, Step Functions goes to the next step in your workflow, which is to send the customer a confirmation email with tracking information.

Creating a state machine

Prerequisite

You should have adequate knowledge of AWS IAM, AWS CDK, and AWS Lambda.

We chose AWS CDK to create the state machine, since we already use CDK extensively for other AWS services.
Below is an example of a state machine that invokes a Lambda function and passes “Hello World!” as input to a Succeed state. If you look carefully, the Lambda code is written in JavaScript, whereas the state machine is defined in Python. That means you can write your business logic in the Lambda language of your choice.

hello_step_stack.py
app.py
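The embedded code snippets do not render here, but a minimal sketch of what hello_step_stack.py might contain, assuming CDK v2 (construct names and the `lambda/` asset path are illustrative):

```python
from aws_cdk import Stack, aws_lambda as _lambda
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks
from constructs import Construct


class HelloStepStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Lambda handler written in JavaScript, loaded from the lambda/ directory
        hello_fn = _lambda.Function(
            self, "HelloLambda",
            runtime=_lambda.Runtime.NODEJS_14_X,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        # Task state that invokes the Lambda and forwards its payload
        invoke = tasks.LambdaInvoke(
            self, "InvokeHelloLambda",
            lambda_function=hello_fn,
            output_path="$.Payload",
        )

        # Chain the Lambda invocation into a Succeed state
        definition = invoke.next(sfn.Succeed(self, "HelloWorldSucceed"))

        sfn.StateMachine(self, "HelloStateMachine", definition=definition)
```

Synthesizing and deploying the stack is then wired up in app.py as with any other CDK app.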

You can use the AWS SDK to start an execution of the state machine.
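For example, with boto3 (the Python AWS SDK), assuming AWS credentials are configured; the state machine ARN below is a placeholder:

```python
import json

import boto3

# Step Functions client; region and credentials come from the environment
sfn_client = boto3.client("stepfunctions")

response = sfn_client.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:HelloStateMachine",
    input=json.dumps({"greeting": "Hello World!"}),
)
print(response["executionArn"])
```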

How it works at Unibuddy

  1. We ran a proof of concept to check whether we could use Step Functions for long-running tasks

As we already know, we can integrate and run other AWS services in any state of the workflow. For example, if we want to run a huge migration that takes around 10 hours to complete, we can use AWS Fargate to run that task.
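Step Functions’ `.sync` service-integration pattern makes the state machine wait until the Fargate task exits. A sketch of such a Task state in ASL (cluster and task-definition ARNs are placeholders, and details such as network configuration are omitted):

```json
"RunMigration": {
  "Type": "Task",
  "Resource": "arn:aws:states:::ecs:runTask.sync",
  "Parameters": {
    "LaunchType": "FARGATE",
    "Cluster": "arn:aws:ecs:us-east-1:123456789012:cluster/migrations",
    "TaskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/migration:1"
  },
  "End": true
}
```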

step_functions_poc_stack.py
Executed state machine
Execution history

As you can see in the execution history, after the TaskSubmitted event, the state machine waited for the task to exit.

2. Breaking a large task into smaller tasks using the Map state

Our customers can download historic data of their prospects. These reports have huge amounts of data and may take a lot of time to generate if not divided into sub-tasks.

Currently, we use Celery to generate these reports, but it has a few problems:
1. We cannot track the state of individual tasks.
2. Long-running tasks have the potential to get killed.
3. There is no retry mechanism for individual tasks.

Using the Step Functions Map state, we can dynamically break a large task into smaller tasks based on the date-range input. Each instance of GenerateSubReport creates a report for a smaller date range and passes its output to MergeSubReports. Once all the subtasks have finished, we merge their output in the MergeSubReports Lambda function, and then, in the next step, we notify the user.
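A sketch of what that Map state might look like in ASL. The ARN and the `$.dateRanges` input field are illustrative; note the per-item Retry, which addresses the Celery problems above:

```json
"GenerateSubReports": {
  "Type": "Map",
  "ItemsPath": "$.dateRanges",
  "Iterator": {
    "StartAt": "GenerateSubReport",
    "States": {
      "GenerateSubReport": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:GenerateSubReport",
        "Retry": [
          { "ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 3 }
        ],
        "End": true
      }
    }
  },
  "Next": "MergeSubReports"
}
```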

Conclusion

Whenever you need to run a few tasks sequentially or in parallel, or you have an ETL workflow and want to pay only for the processing, Step Functions is a handy choice because of its graphical representation and easy retry mechanism.

This POC was done in two weeks of personal project time given to every engineer at Unibuddy. If you want to work at Unibuddy, we are hiring.
Check out the link: https://grnh.se/9b713f3f3us
