Serverless Quick Tip #2: Asynchronous Micro Services with Serverless and AWS Step Functions

Alexander Magnus Partsch
Published in TrustBob Blog · Dec 3, 2018

All micro services I have developed on a FaaS platform like AWS Lambda had one aspect in common: they easily exceeded the time limit of a common HTTP request (30 seconds when using AWS API Gateway).

Therefore I had a design similar to this diagram:

  1. Trigger: A short-lived AWS Lambda function with an API Gateway event that kicks off processing by calling Transform asynchronously.
  2. Transform: One or more functions that do the actual processing.
  3. Status: Another short-lived AWS Lambda function with an API Gateway event to query the status of a process.
  4. Status Database: A DynamoDB table or ElastiCache Redis instance that holds the state of all processes.
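The Trigger and Status pieces of this pattern can be sketched as follows. This is a minimal sketch, assuming Python handlers, boto3 and a DynamoDB table named `process-status` with a `processId` hash key; all names are illustrative, not taken from an actual service:

```python
import json
import uuid

STATUS_TABLE = "process-status"  # assumed table name, illustrative only

def new_process_id():
    """Generate a unique id for one asynchronous processing run."""
    return str(uuid.uuid4())

def trigger(event, context):
    """Short-lived Lambda behind API Gateway: record a PENDING state,
    then kick off the Transform step asynchronously."""
    import boto3  # lazy import: keeps the module importable outside AWS
    process_id = new_process_id()
    table = boto3.resource("dynamodb").Table(STATUS_TABLE)
    table.put_item(Item={"processId": process_id, "status": "PENDING"})
    # ... invoke the Transform function asynchronously here ...
    return {"statusCode": 202, "body": json.dumps({"processId": process_id})}

def status(event, context):
    """Short-lived Lambda behind API Gateway: look up the state of a run."""
    import boto3
    process_id = event["pathParameters"]["processId"]
    table = boto3.resource("dynamodb").Table(STATUS_TABLE)
    item = table.get_item(Key={"processId": process_id}).get("Item")
    if item is None:
        return {"statusCode": 404, "body": ""}
    return {"statusCode": 200, "body": json.dumps({"status": item["status"]})}
```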

For 4. Status Database the choice between Redis and DynamoDB was always quite clear: if you want to keep the states after completion, use DynamoDB. However, provisioning DynamoDB capacity for highly trafficked serverless functions can get quite complicated. On the other hand, configuring an AWS ElastiCache Redis instance with Serverless requires a lot of boilerplate code. See this Gist.

Not to mention that neither solution is purely serverless when you look at the scalability and pricing AWS offers there.

Additional side concerns were:

  • When you have file-system-reliant code in AWS Lambda, you want to enforce retry and backoff policies, since each function instance only gets 512 MB of disk space in /tmp, which is shared across subsequent invocations on the same function instance.
  • If multiple functions make up the Transform part, error handling, parallelisation and synchronisation can introduce a lot of boilerplate code in each function.
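To make the second point concrete, here is the kind of retry/backoff boilerplate each function would otherwise carry itself. This is a generic sketch, not code from the service described here:

```python
import time
import functools

def with_retry(max_attempts=2, interval=30, backoff_rate=2, sleep=time.sleep):
    """Re-run the wrapped function on failure, multiplying the wait by
    backoff_rate after each attempt. This is exactly the per-function
    boilerplate that a declarative Retry clause replaces."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = interval
            for attempt in range(max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the error
                    sleep(wait)
                    wait *= backoff_rate
        return wrapper
    return decorator
```

Multiply this by every function in the service and the appeal of moving it into configuration becomes obvious.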

Introducing AWS Step Functions

To summarise, or in case you skipped the first section, these are the problems we face with asynchronous serverless micro services:

  1. Persisting the state in DynamoDB or AWS ElastiCache / Redis is costly and/or complex.
  2. Error handling, re-try/backoff behaviour and flow control require a lot of boilerplate code on a per function basis.

AWS sells Step Functions (SFN for short) as a tool for building distributed applications with visual workflows. This means you can define an execution flow between different types of AWS services, foremost AWS Lambda.

I like to explain technologies by example, so let’s take a micro service that transforms vectorised PDFs into transparent PNGs. Let’s say that, for whatever reason, we want to split the transformation into two steps / two functions:

  1. Convert: Converts the PDF to a PNG file.
  2. Transform: Makes the white background of the PNG file transparent.

Additionally, we would like to generate a thumbnail of the PNG while it is transforming. So we end up with a very simple flowchart: Convert runs first, then Transform and Thumbnail run in parallel.

After you have set up your serverless project and defined all your functions in the `serverless.yml` file, install two plugins:

  1. Serverless Step Functions: To define the state machine in the serverless.yml.
  2. Serverless Pseudo Parameters: Required by serverless-step-functions to refer to the defined functions.
sls plugin install -n serverless-step-functions
sls plugin install -n serverless-pseudo-parameters

These two commands should have added the following three lines at the bottom of your serverless.yml file:

plugins:
  - serverless-step-functions
  - serverless-pseudo-parameters

Let’s assume the functions are defined in serverless.yml as follows:

functions:
  convert:
    handler: handler.convert
  transform:
    handler: handler.transform
  thumbnail:
    handler: handler.thumbnail

To define the state machine as in our flowchart, we can apply the following YAML configuration after the functions section:

stepFunctions:
  stateMachines:
    pdfTransform:
      name: PDFTransform
      description: "Takes vectorised PDFs and transforms them to PNGs with transparent background, also generates thumbnails for them."
      definition:
        StartAt: Convert
        States:
          Convert:
            Type: Task
            Next: Processing
            Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:my-service-${opt:stage}-convert
          Processing:
            Type: Parallel
            End: true
            Branches:
              - StartAt: Transform
                States:
                  Transform:
                    Type: Task
                    Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:my-service-${opt:stage}-transform
                    End: true
              - StartAt: Thumbnail
                States:
                  Thumbnail:
                    Type: Task
                    Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:my-service-${opt:stage}-thumbnail
                    End: true
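One detail worth knowing about the Parallel state: both branches receive the output of Convert as their input, and the Parallel state’s own output is an array with one entry per branch, in branch order. If Convert returned, say, `{"pngKey": "uploads/invoice.png"}` (a purely illustrative payload), the Parallel state would emit something like:

```json
[
  {"pngKey": "uploads/invoice.png", "transparent": true},
  {"thumbnailKey": "thumbnails/invoice.png"}
]
```

Keep that array shape in mind if a later state needs to consume the combined result.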

Deploy your service:

sls deploy

Then open your AWS Management Console, select the proper region, open the Step Functions menu and click on PDFTransform. You can execute your state machine here for testing.

Very well, our functions are deployed to AWS Lambda and are orchestrated via Step Functions.

Next we want to implement a retry/backoff policy for our Convert function. For functions that suffer from contended file system space on AWS Lambda, I usually configure two retries, starting with an interval of the average runtime and doubling it on each attempt.

Let’s say that in the case of Convert the average runtime is 30 seconds. To configure the retries we simply add a Retry clause to the task definition:

Convert:
  Type: Task
  Next: Processing
  Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:my-service-${opt:stage}-convert
  Retry:
    - ErrorEquals:
        - States.TaskFailed
      IntervalSeconds: 30
      MaxAttempts: 2
      BackoffRate: 2
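With these values, Step Functions waits IntervalSeconds (30 s) before the first retry and multiplies the wait by BackoffRate for the second, i.e. 60 s. A quick check of that arithmetic:

```python
def retry_waits(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry attempt: IntervalSeconds before
    the first, multiplied by BackoffRate for every further attempt."""
    waits = []
    wait = interval_seconds
    for _ in range(max_attempts):
        waits.append(wait)
        wait *= backoff_rate
    return waits

# The Retry clause above: IntervalSeconds=30, MaxAttempts=2, BackoffRate=2
print(retry_waits(30, 2, 2))  # → [30, 60]
```

So in the worst case the Convert state occupies the execution for roughly three runtimes plus 90 seconds of waiting before it finally fails.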

Fine! Next we add error handling: when Transform or Thumbnail fails, I want to make sure the converted PNG file gets deleted and an error is reported.

First, we define a new AWS Lambda function, which we call Rollback:

functions:
  # ...
  rollback:
    handler: handler.rollback
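A handler for it might look like this: a rough sketch assuming Python, boto3, and an S3 bucket whose name and object key are hypothetical placeholders, not details from the original service. When a Catch clause routes here, Step Functions passes an object of the form `{"Error": ..., "Cause": ...}` as the state input:

```python
import json

def parse_cause(event):
    """For failed Lambda tasks, Cause is a JSON string holding the
    function's error details; fall back to the raw string otherwise."""
    try:
        return json.loads(event.get("Cause", "{}"))
    except json.JSONDecodeError:
        return {"raw": event.get("Cause")}

def rollback(event, context):
    """Terminal error state: delete the intermediate PNG and report the error."""
    details = parse_cause(event)
    import boto3  # lazy import: keeps the module importable outside AWS
    # Bucket and key are hypothetical -- adapt to wherever Convert stores its PNG.
    boto3.client("s3").delete_object(Bucket="my-service-artifacts",
                                     Key="converted/output.png")
    print("rolled back after error:", event.get("Error"), details)
```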

Second, define it as a new terminal state in our state machine:

Rollback:
  Type: Task
  Resource: arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:my-service-${opt:stage}-rollback
  End: true

And last: Define this state as the next step when our Processing parallel task fails:

Processing:
  Type: Parallel
  End: true
  Catch:
    - ErrorEquals:
        - States.TaskFailed
      Next: Rollback
  Branches:
    # ...

After re-deploying, execute the state machine again to see the new Rollback path in the visual workflow.

To complete our web micro service, we need to define the Trigger and Status functions. Since all they do is start a state machine execution and query its status via the AWS SDK, you can define them very generically: See this Gist.
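Should the Gist be unavailable, such handlers can be sketched roughly like this, in Python with boto3; the environment variable and handler names are my assumptions, not the Gist’s actual code:

```python
import json
import os
import uuid

# ARN of the deployed state machine, e.g. injected via an environment
# variable in serverless.yml (variable name assumed for this sketch).
STATE_MACHINE_ARN = os.environ.get("STATE_MACHINE_ARN", "")

def execution_arn(state_machine_arn, name):
    """Derive an execution ARN from the state machine ARN and execution name."""
    return state_machine_arn.replace(":stateMachine:", ":execution:") + ":" + name

def trigger(event, context):
    """Start an execution; its name doubles as the id clients poll with."""
    import boto3  # lazy import: keeps the module importable outside AWS
    name = str(uuid.uuid4())
    boto3.client("stepfunctions").start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=name,
        input=event.get("body") or "{}",
    )
    return {"statusCode": 202, "body": json.dumps({"executionName": name})}

def status(event, context):
    """Return the execution status: RUNNING, SUCCEEDED, FAILED, ..."""
    import boto3
    name = event["pathParameters"]["name"]
    result = boto3.client("stepfunctions").describe_execution(
        executionArn=execution_arn(STATE_MACHINE_ARN, name))
    return {"statusCode": 200, "body": json.dumps({"status": result["status"]})}
```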

Additionally, you have to grant your functions the StartExecution and DescribeExecution permissions:

iamRoleStatements:
  - Effect: "Allow"
    Action:
      - "states:StartExecution"
    Resource:
      - "arn:aws:states:#{AWS::Region}:#{AWS::AccountId}:stateMachine:PDFTransform"
  - Effect: "Allow"
    Action:
      - "states:DescribeExecution"
    Resource:
      - "arn:aws:states:#{AWS::Region}:#{AWS::AccountId}:execution:PDFTransform:*"

And done. You can find the complete serverless.yml example here:

https://gist.github.com/codecitizen/2e939c85b19dd96f0a607a7d34630d7b

AWS Step Functions solves a lot of common issues serverless micro services have. It removes boilerplate code and gives a good visualisation of your architecture.

Of course it introduces a hard vendor lock-in on AWS, but honestly, I haven’t yet been able to implement a function with the Serverless Framework that wasn’t AWS-locked.
