Energise Yourself With Express Step Functions

Published in

Engineers @ The LEGO Group

10 min read · Apr 17, 2023

I love energy drinks. I do. I’m a self-confessed addict. Sometimes it can be difficult to choose which flavour or brand I want to drink to help me achieve my base level of functionality. As I’m an engineer working in the serverless space, I figured I could leverage the power of the almighty AWS to help make the hardest of choices for me.

Existing Architecture

I put together a little service that would allow me to post a payload to an endpoint, and the result would be a drink selected for me. The diagram below shows a high-level overview of my existing architecture in the energy drink selector service.

A payload is sent to an API Gateway endpoint of /drinks/select containing information about can size and a few other options.

A Lambda function is then triggered to handle the payload and dispatch an event with the payload information. The Lambda function returns some unique identifiers, such as drinkReference and sessionReference. The caller can use these to find any data they may need later.

A standard Step Function listens for the event and stores session data, drink selections and drink information in DynamoDB tables.

Within the same Step Function, a Lambda function selects the drink. When the drink selector Step Function finishes, it dispatches an event whose payload contains the chosen drink.

So what’s the problem?

That white gold that we all love: sugar. When looking at the drinks that were being selected, I realised that every result was a sugar-filled drink. Now, I could stop drinking these drinks altogether, which would render this service (and blog post) obsolete. Or I could introduce a sugar-free option, thereby saving the service (and the blog post!). In the interest of the service, the blog post, and my health, I rounded up a selection of sugar-free drinks and added them to the drinks DynamoDB table.

Considerations

The sugar or sugar-free logic needs to be introduced with minimal impact on the service that is already running. Some change to the existing architecture will likely be needed.

To facilitate the choice of a sugar or sugar-free option, the payload that is received by the service will need to be updated. Adding a flag into the payload as a boolean is a fairly straightforward update that would work.
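As a rough illustration, the updated payload and a backwards-compatible check on the new flag might look like the sketch below. The field names `canSize` and `sugar` and the helper `isValidSugarFlag` are assumptions made for this example; the real payload is not shown in this post.

```javascript
// Hypothetical payloads: the field names are assumptions for illustration.
const existingPayload = { canSize: 500 };                // sent by clients that have not been updated
const sugarFreePayload = { canSize: 500, sugar: false }; // opts in to a sugar-free drink

// The flag is optional so existing clients keep working;
// when present it must be a real boolean, not a string such as "false".
function isValidSugarFlag(payload) {
  return payload.sugar === undefined || typeof payload.sugar === "boolean";
}

console.log(isValidSugarFlag(existingPayload));    // true
console.log(isValidSugarFlag(sugarFreePayload));   // true
console.log(isValidSugarFlag({ sugar: "false" })); // false
```

Keeping the flag optional is what lets older clients carry on sending the payload they send today.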

With the addition of a new flag in the payload, it is important to consider how we would process the incoming payload. Let’s break down what will need to happen to facilitate the addition of the new logic:

Check the incoming payload for the sugar or sugar-free option
Run some logic based on whether the choice is sugar or sugar-free

Although we only have two new steps that we need to execute, the existing architecture won’t allow for these without some fundamental changes. So, how can we approach this in such a way that allows us to run these new steps and not interfere with what’s already there?

Sure, we could pretty easily set up a new Lambda function that the /drinks/select endpoint invokes, and have it send an event to the event bus to trigger the existing drink select handler. However, this feels more like writing integration code, and we don’t want to write specialised integration code if we don’t have to. I do, however, have a sneaking suspicion that there is a better way to approach things.

It’s time to express ourselves…

Horrible pun aside, I introduce to you the Express Step Function. An AWS Express Step Function is the younger sibling of the standard Step Function. It is similar to its older sibling in that it allows you to build, run and coordinate a series of steps in a workflow as a state machine.

However, some key differences make an express Step Function an interesting option for our needs.

Duration

Express Step Functions are designed for fast, event-driven workflows with a maximum duration of five minutes. This is of particular interest as the new steps needed are likely to be low-intensity and quick-completion tasks.

Cost

The pricing model for Express Step Functions differs from standard Step Functions. Standard Step Function pricing is worked out by looking at the number of state transitions, i.e., each time a step in the Step Function is completed. For express Step Functions, however, the pricing is based on the number of executions run, duration, and memory consumption. Let’s look at the following example according to the AWS pricing calculator:

Standard Step Function
- 5000 workflow requests per month
- 10 state transitions per workflow
- Monthly price: 1.15 USD
Express Step Function
- 5000 workflow requests per month
- Duration of each workflow 1000ms
- 128MB of memory consumed by each workflow
- Monthly price: 0.02 USD

As you can see, there is a significant difference in pricing between the two, with express Step Functions being the cheaper option.
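The arithmetic behind those two figures can be sketched as follows. The rates used here are assumptions (0.025 USD per 1,000 state transitions with 4,000 free transitions per month for standard; 1.00 USD per million requests and 0.00001667 USD per GB-second for express). They vary by region and change over time, so check the pricing calculator for current values.

```javascript
// Assumed rates (USD); they vary by region and over time, so treat them as placeholders.
const STANDARD_PRICE_PER_TRANSITION = 0.000025; // 0.025 USD per 1,000 state transitions
const STANDARD_FREE_TRANSITIONS = 4000;         // assumed monthly free tier
const EXPRESS_PRICE_PER_REQUEST = 0.000001;     // 1.00 USD per million requests
const EXPRESS_PRICE_PER_GB_SECOND = 0.00001667;

// Standard workflows: pay per state transition beyond the free tier.
function standardMonthlyCost(requests, transitionsPerRequest) {
  const billable = Math.max(0, requests * transitionsPerRequest - STANDARD_FREE_TRANSITIONS);
  return billable * STANDARD_PRICE_PER_TRANSITION;
}

// Express workflows: pay per request plus duration multiplied by memory.
// Simplified: ignores any rounding of duration and memory that billing applies.
function expressMonthlyCost(requests, durationMs, memoryMb) {
  const requestCost = requests * EXPRESS_PRICE_PER_REQUEST;
  const gbSeconds = requests * (durationMs / 1000) * (memoryMb / 1024);
  return requestCost + gbSeconds * EXPRESS_PRICE_PER_GB_SECOND;
}

console.log(standardMonthlyCost(5000, 10).toFixed(2));       // 1.15
console.log(expressMonthlyCost(5000, 1000, 128).toFixed(2)); // 0.02
```

The gap narrows as workflows run longer or use more memory, so it is worth re-running the numbers for your own workload.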

Sync versus Async

Step Functions can be invoked by several different AWS services. A standard workflow runs asynchronously: the caller gets an acknowledgement that the execution has started, not its result. An express workflow can also be started synchronously through the StartSyncExecution API, in which case the caller waits for the workflow to finish and receives its output.

Sync versus async matters here because we need to return the same payload that the existing service returns. The API endpoint triggered a Lambda function that returned some unique references, so the Step Function invocation must do the same. Setting the workflow type to express allows us to do this.

Updated Architecture

As we can see in the diagram above, the overall architectural changes made are minimal. Instead of triggering a Lambda function from API Gateway, we trigger an express Step Function. This express Step Function runs checks on the incoming payload and then routes to a Lambda function that handles either a sugar or a sugar-free payload depending on the sugar boolean in the payload. The Lambda functions dispatch events that are then picked up by the drink selector Step Function. The final stage of the process then returns the drink choice to the client.

A closer look at the Express Step Function

Let’s first think about what needs to happen with the express Step Function:

Ensure that the value returned is the same as what was returned before the changes — Very important!
Run the desired checks on the payload received (sugar or sugar-free choice)
Run the required function as per the payload check

Sync vs Async

The first of these is that we need to ensure that the value returned is the same as before any changes are made. One of the reasons the express Step Function is a good choice in this scenario is that it can be configured to run synchronously.

As we’re invoking the Step Function directly from a call to API Gateway, we must ensure that we use the StartSyncExecution API call. As the service is built using Serverless Framework, the events block of the Step Function can be set as follows:

events:
  - http:
      path: /drinkSelect
      method: post
      private: true # require an API key
      cors: true
      # Run the express workflow synchronously so the caller receives its output
      action: StartSyncExecution

Without the action set in the http block, the Step Function would run asynchronously by default.

Response Configuration

Even though we have now set the Step Function to run synchronously, we’re not out of the woods yet. If we look at the response syntax for StartSyncExecution, we can see that it doesn’t match the response our clients expect:

{
   "billingDetails": { 
      "billedDurationInMilliseconds": number,
      "billedMemoryUsedInMB": number
   },
   "cause": "string",
   "error": "string",
   "executionArn": "string",
   "input": "string",
   "inputDetails": { 
      "included": boolean
   },
   "name": "string",
   "output": "string",
   "outputDetails": { 
      "included": boolean
   },
   "startDate": number,
   "stateMachineArn": "string",
   "status": "string",
   "stopDate": number,
   "traceHeader": "string"
}

Don’t panic, though! We can configure both the request and the response templates that the API Gateway endpoint will use. By using API Gateway mapping templates, we can ensure that the incoming payload and the outgoing response match what is expected. Remember, the changes we are making must not disrupt the existing working service.

The request

With the incoming request, we need to ensure that the correct attributes are being sent to the Step Function. In the original architecture, API Gateway passed the request through to the drink select Lambda function. This means the Lambda function expects the incoming data to be wrapped in an object with the key ‘body’. The YAML snippet below shows how we can ensure that the input to the Step Function is an object with a body key and the input JSON as the value. The mapping template is written in VTL (Velocity Template Language), which lets us manipulate the elements of the payload.

request:
  template:
    application/json: |
      #set( $body = $util.escapeJavaScript($input.json('$')) )
      {
        "input": "{\"body\": $body}",
        "stateMachineArn": "arn:aws:states:${self:provider.region}:${aws:accountId}:stateMachine:service-drink-select-${self:provider.stage}"
      }
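To make the effect of this mapping template concrete, the sketch below reproduces in JavaScript roughly what the template builds. It is an illustration only: API Gateway does this work in VTL, and the function name `buildStartSyncExecutionRequest`, the example payload, and the placeholder ARN are all hypothetical.

```javascript
// Illustration only: mirrors what the VTL request template produces.
function buildStartSyncExecutionRequest(rawRequestBody, stateMachineArn) {
  const payload = JSON.parse(rawRequestBody);
  return {
    // StartSyncExecution expects "input" to be a JSON string, not an object.
    input: JSON.stringify({ body: payload }),
    stateMachineArn,
  };
}

const exampleRequest = buildStartSyncExecutionRequest(
  '{"canSize": 500, "sugar": false}', // hypothetical client payload
  "arn:aws:states:eu-west-1:123456789012:stateMachine:service-drink-select-dev" // placeholder ARN
);

console.log(typeof exampleRequest.input);                 // string
console.log(JSON.parse(exampleRequest.input).body.sugar); // false
```

The important detail is the double encoding: the client's JSON ends up nested under body inside a string, which is why the template escapes it before embedding it.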

The response

As we need to match the response to what was previously returned, we need to override the default response. Again, looking at the original architecture and the return value from the drink select Lambda function, we can see how the response needs to be formatted.

return {
    body: {
      drink_reference: drinkReference,
      session_reference: sessionReference,
    }
};

As we can see, the body of the response contains the unique references we saw earlier. We can use the following YAML and VTL to ensure that we are returning the references correctly and that they are formatted as existing clients expect.

response:
  template:
    application/json: |
      #set ($bodyObj = $util.parseJson($input.body))
      #set ($outputObj = $util.parseJson($bodyObj.output))
      #set ($context.responseOverride.status = $outputObj.statusCode)
      $outputObj.body

You will also notice that we override the response status with the statusCode from the output. This is because StartSyncExecution always returns a 200 OK response, even if the execution fails. Overriding this with our own status code gives us better visibility, and you know I love some observability.
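The same unwrapping, written out in JavaScript purely as an illustration, looks something like this. The helper `toClientResponse` and the sample values are hypothetical, and the sketch assumes the workflow's output is an object with statusCode and body fields, as the response template implies.

```javascript
// Illustration only: mirrors the VTL response template above.
function toClientResponse(syncExecutionResult) {
  // "output" arrives as a JSON string, so it has to be parsed first.
  const output = JSON.parse(syncExecutionResult.output);
  return { status: output.statusCode, body: output.body };
}

// Trimmed-down, made-up StartSyncExecution result.
const exampleResult = {
  status: "SUCCEEDED",
  output: JSON.stringify({
    statusCode: 200,
    body: { drink_reference: "drink-123", session_reference: "session-456" },
  }),
};

const clientResponse = toClientResponse(exampleResult);
console.log(clientResponse.status);               // 200
console.log(clientResponse.body.drink_reference); // drink-123
```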

Building the Step Function

I will start this section by emphasising the following:

Use Workflow Studio when building your Step Functions

Trust me. Your life will be much easier, and you will pull out far less of your hair if you at least start the build using Workflow Studio. Having a visual interface where you can drag and drop things and prototype any ideas you have is invaluable.

Going back to our remaining needs from the Step Function, we can see in the diagram below how we can:

Run the desired checks on the payload received
Run the required function as per the payload check

The choice state “Is Sugar Drink” looks at the input that it has received, checks for the existence of the sugar boolean, and checks if it is set to false. The Step Function will flow down the sugar-free path if this condition is met. If the condition is not met, the default flow will be followed, which is the sugar path. Each path has its respective handlers and fail states that will catch any errors thrown from the handlers.

With the choice state in place, we run the desired checks on the input to the Step Function and then run the required function for the choice. Because the existing sugar drink handler is the default, any service using the endpoint will still work as expected even if it doesn’t update its payload.
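For readers who prefer to see the rule written down, the choice state could be expressed roughly as follows. The definition is shown as a JavaScript object that mirrors the Amazon States Language JSON. The state names “Sugar Free Handler” and “Sugar Handler” and the path `$.body.sugar` are assumptions, since the actual definition isn’t reproduced in this post. The small `nextState` function only imitates the rule so it can be checked locally; Step Functions evaluates the real state itself.

```javascript
// Hypothetical Choice state, mirroring Amazon States Language (ASL) JSON.
const isSugarDrink = {
  Type: "Choice",
  Choices: [
    {
      // Take the sugar-free path only when the flag exists AND is false.
      And: [
        { Variable: "$.body.sugar", IsPresent: true },
        { Variable: "$.body.sugar", BooleanEquals: false },
      ],
      Next: "Sugar Free Handler",
    },
  ],
  // Anything else, including a missing flag, follows the existing sugar path.
  Default: "Sugar Handler",
};

// Imitates the rule above so the behaviour can be checked locally.
function nextState(input) {
  const sugar = input.body ? input.body.sugar : undefined;
  return sugar === false ? isSugarDrink.Choices[0].Next : isSugarDrink.Default;
}

console.log(nextState({ body: { sugar: false } })); // Sugar Free Handler
console.log(nextState({ body: { sugar: true } }));  // Sugar Handler
console.log(nextState({ body: { canSize: 500 } })); // Sugar Handler
```

Checking for presence before comparing the value is what keeps older payloads, which have no flag at all, on the default path.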

That’s all looking good, but why do it this way?

When making updates that are in or around existing services, it is of the utmost importance that we don’t disturb what’s already there. We may have a small energy drink selection service, but the sentiment rings true as you scale. What if you needed to implement some new functionality around a service that handles thousands of orders per day? Updating and adding to existing logic could open the door to a world of pain.

The path of single-responsibility thinking allows us to keep moving forward and updating our services with minimal to no impact on already running services. It also makes things easier from a deployment point of view. You can continue deploying your work without any knock-on effects on the existing service. For example, you could set up a temporary endpoint to test the functionality of the new express Step Function that is being introduced and then only make the swap to the live endpoint when fully happy.

There is a lot to consider when making any updates or additions to your services. Taking the time to step back and look at how your goal can be achieved with the least impact is always something worth doing.