Orchestrating backend services with AWS Step Functions

The problem

In many use cases, a process needs to execute multiple tasks. We build microservices or serverless functions, such as AWS Lambda functions, to carry out these tasks. Almost all of these services are stateless, so queues or databases are needed to maintain the state of individual tasks and of the process as a whole. Writing code that orchestrates these tasks can be painful, and the result is hard to debug and maintain. It’s not easy to maintain the state of a process in an ecosystem of microservices and serverless functions.

AWS Step Functions

Step Functions has been around since it was announced at re:Invent 2016. According to the AWS docs: AWS Step Functions is a web service that enables you to coordinate the components of distributed applications and microservices using visual workflows.

What Step Functions does

  • Enables us to define visual workflows for processes with little to no code
  • Scales out automatically
  • Deals with failures and timeouts of tasks
  • Maintains an auditable log of all state changes

Step Functions is based on the concepts of Tasks, States and the State Machine. All the work is done by Tasks. A Task can be a Lambda function or an Activity, which is fulfilled by a worker running on other AWS-hosted resources or even on our own machines.

The State Machine

The State Machine is the flow in which we want to execute our tasks to complete the required process. It is the central component of Step Functions. A State Machine is defined in JSON, written in the syntax of the Amazon States Language.

States

The states are blocks which represent some action. Here is a list of states available in the States Language

  • Pass : Does nothing. Mainly used for debugging or as a placeholder state
  • Task : Executes some code or runs an activity
  • Choice : Adds branching logic to the state machine (similar to if/else)
  • Wait : Waits for a specified time before moving on to the next state
  • Succeed : Terminates the execution with a success
  • Fail : Terminates the execution with a failure
  • Parallel : Adds parallel branching to the state machine

Refer to the Step Functions docs on States for details.

One can use the AWS Step Functions console to start building simple flows (State Machines). There are pre-configured examples available to start with. In this article we’ll go through a working example which extracts EXIF metadata from an image, resizes the image and saves the results to S3. We will assume that the Lambda functions have already been deployed. I’ll have a blog up about these Lambda functions soon. We will use Step Functions to orchestrate the Lambda functions to process the images.

Image metadata extraction example with Step Functions

Goal : Extract metadata from images, resize each image to a medium size and a thumbnail, and upload the details to a database.

We have two lambda functions already set up

  1. get-exif-lambda : Downloads the image from S3 and extracts EXIF data
    Input : S3 image URI
    Output : Image metadata
  2. resize-image-lambda : Downloads the image from S3 and resizes it to the desired size
    Input : S3 image URI, desired size of the output image
    Output : S3 URIs of the processed images

Creating the State Machine

A state machine can be created with the AWS CLI or the AWS Step Functions console. Creating it from the console is easier because you can see a visual representation of it. The console also makes it easy to set up the IAM role that gives the state machine permission to use the required AWS resources.
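For scripted deployments, a definition can also be registered through an SDK. The snippet below is a minimal sketch, assuming Python with boto3, whose Step Functions client exposes create_state_machine. The one-state placeholder definition, the machine name and the role ARN are illustrative and not part of the example in this article.

```python
import json


def build_definition():
    """Return a one-state placeholder definition as a JSON string."""
    definition = {
        "StartAt": "Placeholder",
        "States": {
            # A Pass state does no work, which makes it a safe first deployment.
            "Placeholder": {"Type": "Pass", "End": True},
        },
    }
    return json.dumps(definition)


def create_state_machine(sfn_client, name, role_arn):
    """Register the definition and return the new state machine's ARN.

    sfn_client is expected to be boto3.client("stepfunctions"); passing it in
    keeps this function free of AWS credentials and easy to test with a fake.
    """
    response = sfn_client.create_state_machine(
        name=name,
        definition=build_definition(),
        roleArn=role_arn,
    )
    return response["stateMachineArn"]
```

With boto3 installed and credentials configured, a call would look like create_state_machine(boto3.client("stepfunctions"), "image-pipeline", "<state-machine-role arn>"), where "image-pipeline" is a made-up name.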

Open the Step Functions console and create a new state machine with the following JSON.

{
  "StartAt": "GetExif",
  "States": {
    "GetExif": {
      "Type": "Task",
      "Resource": "<get-exif-lambda arn>",
      "Next": "ResizeImage"
    },
    "ResizeImage": {
      "Type": "Parallel",
      "Next": "WriteToDb",
      "ResultPath": "$.resizedLinks",
      "Branches": [
        {
          "StartAt": "MediumSize",
          "States": {
            "MediumSize": {
              "Type": "Task",
              "Resource": "<resize-image-lambda arn>",
              "Parameters": {
                "thumbnail": false,
                "source.$": "$.source",
                "maxHeight": 600,
                "maxWidth": 600
              },
              "End": true
            }
          }
        },
        {
          "StartAt": "Thumbnail",
          "States": {
            "Thumbnail": {
              "Type": "Task",
              "Resource": "<resize-image-lambda arn>",
              "Parameters": {
                "thumbnail": true,
                "source.$": "$.source",
                "maxHeight": 128,
                "maxWidth": 128
              },
              "End": true
            }
          }
        }
      ]
    },
    "WriteToDb": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "image-details",
        "Item": {
          "key": {
            "S.$": "$.source.key"
          },
          "exif": {
            "S.$": "$.exif"
          },
          "mediumURL": {
            "S.$": "$.resizedLinks[0]"
          },
          "thumbnailURL": {
            "S.$": "$.resizedLinks[1]"
          }
        }
      },
      "End": true
    }
  }
}

In the next step you’re required to provide an IAM role for the state machine. Select “Create an IAM role”, enter a valid role name and proceed. This creates a new IAM role that gives the state machine access only to the resources it uses. The state machine created will look like this

Visual representation of the state machine

The flow starts at the GetExif state which executes a lambda function to retrieve metadata from the image.
Then it executes two image conversion tasks : MediumSize and Thumbnail in parallel.

The metadata and the links to the resized images are then passed to WriteToDb, which writes them to DynamoDB.
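To make the flow concrete, here is a rough local sketch of the same three steps in plain Python. The two functions are stubs standing in for the deployed Lambdas and their return values are invented. The sketch also assumes that the input carries a source object with bucket and key fields, and that the EXIF function echoes source back alongside the metadata, since later states read $.source.

```python
from concurrent.futures import ThreadPoolExecutor


def get_exif(event):
    """Stub for get-exif-lambda: returns the source plus made-up metadata."""
    return {"source": event["source"], "exif": "stub-exif"}


def resize_image(params):
    """Stub for the resize Lambda: returns a made-up output URI."""
    kind = "thumbnail" if params["thumbnail"] else "medium"
    source = params["source"]
    return "s3://{}/{}/{}".format(source["bucket"], kind, source["key"])


def run_flow(event):
    # GetExif: without a ResultPath, the task result replaces the state input.
    state = get_exif(event)

    # ResizeImage: both branches receive the same input and run concurrently.
    branch_params = [
        {"thumbnail": False, "source": state["source"], "maxHeight": 600, "maxWidth": 600},
        {"thumbnail": True, "source": state["source"], "maxHeight": 128, "maxWidth": 128},
    ]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # map() yields results in branch order, like a Parallel state's output array.
        links = list(pool.map(resize_image, branch_params))
    state["resizedLinks"] = links  # mirrors "ResultPath": "$.resizedLinks"

    # WriteToDb: build the item here instead of calling DynamoDB.
    return {
        "key": state["source"]["key"],
        "exif": state["exif"],
        "mediumURL": state["resizedLinks"][0],
        "thumbnailURL": state["resizedLinks"][1],
    }


print(run_flow({"source": {"bucket": "photos", "key": "cat.jpg"}}))
```

The point of the sketch is the ordering: the Parallel state collects one result per branch, in branch order, which is why the definition can address the medium image as resizedLinks[0] and the thumbnail as resizedLinks[1].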

This is a simple example. More complex workflows can be designed with the other states available in the States Language.

We can also add error handling and retry behavior to states using the Retry and Catch fields.
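As a rough illustration, the sketch below adds the States Language Retry and Catch fields to a Task state and prints the resulting JSON. The retry numbers are arbitrary, and HandleFailure is a hypothetical state that would still have to be defined in the state machine.

```python
import copy
import json


def with_error_handling(task_state, fallback_state):
    """Return a copy of a Task state with Retry and Catch fields added."""
    state = copy.deepcopy(task_state)
    state["Retry"] = [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,  # wait before the first retry
            "MaxAttempts": 3,      # give up after three retries
            "BackoffRate": 2.0,    # double the wait on each further retry
        }
    ]
    state["Catch"] = [
        {
            "ErrorEquals": ["States.ALL"],  # any error the retries did not resolve
            "ResultPath": "$.error",        # keep the input and attach the error details
            "Next": fallback_state,
        }
    ]
    return state


get_exif = {"Type": "Task", "Resource": "<get-exif-lambda arn>", "Next": "ResizeImage"}
print(json.dumps(with_error_handling(get_exif, "HandleFailure"), indent=2))
```

Retriers are tried first. The Catch entry only takes effect once the retries are exhausted or the error matches no retrier.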

Manipulating Input and Output of a State

Understanding how to pass data from one state to another is important for building State Machines. The States Language allows us to manipulate and control the JSON data that flows between states.

In the Amazon States Language, these fields manipulate and control the flow of data between states :

  • InputPath
  • OutputPath
  • ResultPath
  • Parameters

These fields are applied to the JSON data in a fixed sequence: InputPath first, then Parameters, then the task itself runs, then ResultPath and finally OutputPath.

InputPath and Parameters

Let’s assume the state receives the following input

{
  "message": {
    "title": "Msg Title",
    "content": "Hello World!"
  },
  "timestamp": 12312432
}

We can add InputPath and Parameters to the state

"InputPath": "$.message",
"Parameters": {
"messageType": "text",
"messageTitle.$": "$.title",
"messageContent.$": "$.content"
}

This will give us the following as input to the worker

{
  "messageType": "text",
  "messageTitle": "Msg Title",
  "messageContent": "Hello World!"
}
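The same transformation can be reproduced locally. This is a simplified sketch, not the service’s implementation: the resolve helper only understands plain dotted paths such as $.message, and apply_parameters only handles a flat Parameters object.

```python
def resolve(path, data):
    """Resolve a simple '$.a.b' path; '$' alone returns the whole document."""
    for key in path.split(".")[1:]:
        data = data[key]
    return data


def apply_parameters(parameters, data):
    """Build the worker input: keys ending in '.$' are looked up, others are copied."""
    result = {}
    for key, value in parameters.items():
        if key.endswith(".$"):
            result[key[:-2]] = resolve(value, data)
        else:
            result[key] = value
    return result


state_input = {
    "message": {"title": "Msg Title", "content": "Hello World!"},
    "timestamp": 12312432,
}
parameters = {
    "messageType": "text",
    "messageTitle.$": "$.title",
    "messageContent.$": "$.content",
}

selected = resolve("$.message", state_input)           # InputPath
worker_input = apply_parameters(parameters, selected)  # Parameters
print(worker_input)
```

Note that the paths inside Parameters are resolved against the output of InputPath, which is why $.title works without the message prefix.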

ResultPath

Assume the worker returns the following output for the input in the previous example

"HELLO WORLD!"

We can add ResultPath to insert the worker’s output into the state’s input

"ResultPath": "$.taskOutput"

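ResultPath is applied relative to the state’s raw input, that is, the input before InputPath and Parameters were applied. This adds the result of the worker to the original input

{
  "message": {
    "title": "Msg Title",
    "content": "Hello World!"
  },
  "timestamp": 12312432,
  "taskOutput": "HELLO WORLD!"
}

OutputPath

The OutputPath field filters the data to be sent to the next state. In this case "OutputPath": "$.message.content" will send "Hello World!" as input to the next state, while "OutputPath": "$.taskOutput" would send "HELLO WORLD!".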
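The four fields can be tied together in one small simulation. It is a sketch under a few assumptions: only plain dotted paths are supported, Parameters must be flat, and the shout function is a made-up stand-in for the worker. It follows the States Language specification in placing the result relative to the state’s raw input.

```python
import copy


def resolve(path, data):
    """Resolve a simple '$.a.b' path; '$' alone returns the whole document."""
    for key in path.split(".")[1:]:
        data = data[key]
    return data


def run_state(raw_input, task, input_path="$", parameters=None,
              result_path="$", output_path="$"):
    effective = resolve(input_path, raw_input)  # 1. InputPath
    if parameters is not None:                  # 2. Parameters
        effective = {
            (k[:-2] if k.endswith(".$") else k):
            (resolve(v, effective) if k.endswith(".$") else v)
            for k, v in parameters.items()
        }
    result = task(effective)                    # 3. the task runs
    if result_path == "$":                      # 4. ResultPath
        combined = result                       # default: the result replaces the input
    else:
        combined = copy.deepcopy(raw_input)     # placed relative to the raw input
        *parents, leaf = result_path.split(".")[1:]
        target = combined
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = result
    return resolve(output_path, combined)       # 5. OutputPath


def shout(worker_input):
    """Hypothetical worker: upper-cases the message content."""
    return worker_input["messageContent"].upper()


raw = {"message": {"title": "Msg Title", "content": "Hello World!"}, "timestamp": 12312432}
params = {"messageType": "text", "messageTitle.$": "$.title", "messageContent.$": "$.content"}

merged = run_state(raw, shout, input_path="$.message", parameters=params,
                   result_path="$.taskOutput")
print(merged["taskOutput"])  # HELLO WORLD!

filtered = run_state(raw, shout, input_path="$.message", parameters=params,
                     result_path="$.taskOutput", output_path="$.message.content")
print(filtered)  # Hello World!
```

Leaving every field at its default of $ passes the whole input to the task and returns only the task result, which matches the behaviour of a state that sets none of these fields.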

Read the Input and Output Processing documentation for more details on this.

Logging, Debugging and Monitoring

We can run tests, debug and monitor Step Functions executions on the Step Functions console. If you are using Lambda functions to run the tasks, their logs are delivered to each Lambda’s CloudWatch log group as usual.

All Step Functions executions are listed in the console with their status, and we can dive into an execution to check the details of the underlying states.
Each step of a particular execution can be viewed for debugging. The CloudWatch logs link redirects to the log stream of that AWS service.
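The same step-by-step history is available outside the console. The sketch below assumes Python with boto3, whose Step Functions client exposes get_execution_history. The helper names are made up, and the client is passed in so the code can be tried with a fake.

```python
def summarize_history(sfn_client, execution_arn):
    """Return (event id, event type) pairs for one execution, oldest first."""
    events = []
    kwargs = {"executionArn": execution_arn}
    while True:
        page = sfn_client.get_execution_history(**kwargs)
        events.extend((e["id"], e["type"]) for e in page["events"])
        token = page.get("nextToken")
        if not token:
            return events
        kwargs["nextToken"] = token  # long histories are returned in pages


def failed_events(events):
    """Keep only events whose type name ends in 'Failed'."""
    return [event for event in events if event[1].endswith("Failed")]
```

With real credentials, summarize_history(boto3.client("stepfunctions"), "<execution arn>") would return the recorded events of that execution, and failed_events narrows the list down to the entries worth inspecting first.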

Step Functions also integrates with CloudWatch metrics for monitoring failures in production. The metrics are available under CloudWatch > Metrics > States.

Conclusion

Step Functions is an easy-to-use service for orchestrating backend workflows. Very complex flows can be designed easily with the States Language. It maintains the state of all the tasks, orchestrates them to run when needed and only when needed, and scales automatically. Step Functions plugs easily into an existing architecture. Since a state machine execution can stay alive for up to 1 year, it can also be used for long-running workflows using activity workers.

Step Functions has probably the coolest console among all the AWS services.