Building Workflows with Amazon Simple Workflow Service vs Step Functions

Asanka Nissanka
AVM Consulting Blog
6 min read · Dec 20, 2017

Update: This article was initially published on 20th December 2017. However, I completely rewrote it on 24th August 2019 to include new updates from AWS for each service I am discussing here.

Workflow is a commonly used paradigm in applications. It’s basically used for coordinating work across distributed components. Many use cases can be implemented with a workflow model; here are a few, just to give a heads-up on the context.

Example Use Cases for Workflows

  • Order management systems
  • Multi-stage message processing systems
  • Billing management systems
  • Video encoding systems
  • Image conversion systems

The architecture or design of a system like the ones above should satisfy certain requirements in terms of resiliency, availability, fault tolerance and scale.

Concerns When Designing a Workflow Based System

  • Manage transition between states (Orchestration logic)
  • Monitor execution (Monitor state transition)
  • Control execution (Pause and start with a human action)
  • Scaling (Manage scaling at state level)
  • Error handling (Retry or fallback accordingly)
  • Integration with other services

Considering all the concerns above, it’s a wise decision to use a platform or library that is built for this purpose rather than re-inventing the wheel. Using a managed service would be even wiser, since that takes away the unnecessary burden of maintaining such a platform on our own. Huh, now you probably know where I am going with this :)

Workflow PaaS Offerings in AWS

Amazon Web Services has two main dedicated managed services for workflow implementation.

  • Simple Workflow Service (SWF)
  • Step Functions

I wrote an article on how to use SWF to design a workflow before Step Functions was introduced and you can find it here if you are interested.

Okay, so now let’s look at how each of the above services addresses our concerns.

Manage transition between states

In SWF, orchestration is handled by a component called the “Decider”, and there can be multiple deciders per workflow. In Step Functions, it is handled by the “State Definition”, and there can be only one state definition per workflow.

How the orchestration logic is implemented with these components is the main differentiating factor between the two services. In SWF, we have to implement the deciders ourselves, much like a worker task. A decider polls for decision tasks and schedules worker activities based on the events recorded in the workflow execution history. I have done this, and trust me, it is a complex task. In Step Functions, however, this is as simple as defining a JSON object. To be exact, it’s a JSON-based language named Amazon States Language that makes it easy. We can describe state machines declaratively using this language, and it is super easy compared to implementing SWF deciders.
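To make the contrast concrete, here is a minimal sketch of the same two-step workflow in both styles. The workflow itself (the `ValidateOrder` and `ChargeCustomer` steps), the `decide` function, and the Lambda ARNs are hypothetical and only for illustration. The event and decision type names follow the ones SWF uses, but the history is heavily simplified and the polling loop is left out entirely.

```python
import json

# Hypothetical two-step workflow: "ValidateOrder" then "ChargeCustomer".
STEPS = ["ValidateOrder", "ChargeCustomer"]


def decide(events):
    """SWF style: inspect the execution history and return the next decisions.

    `events` is a simplified history: a list of dicts with an "eventType" key.
    A real decider would receive the history by polling for decision tasks
    and would have to handle many more event types (timeouts, signals, ...).
    """
    if any(e["eventType"] == "ActivityTaskFailed" for e in events):
        return [{"decisionType": "FailWorkflowExecution"}]

    completed = sum(1 for e in events if e["eventType"] == "ActivityTaskCompleted")
    if completed == len(STEPS):
        return [{"decisionType": "CompleteWorkflowExecution"}]

    next_step = STEPS[completed]
    return [{
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": {"name": next_step, "version": "1"},
            "activityId": f"{next_step}-{completed}",
        },
    }]


# Step Functions style: the same ordering, declared in Amazon States Language.
# The Lambda ARNs are placeholders.
STATE_MACHINE = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
            "Next": "ChargeCustomer",
        },
        "ChargeCustomer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ChargeCustomer",
            "End": True,
        },
    },
}

definition_json = json.dumps(STATE_MACHINE, indent=2)
```

Even in this toy version, the decider has to reconstruct "where are we?" from the history on every call, while the state definition simply names the next state.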

Monitor Execution

A handy feature of Step Functions is that you can visualize the workflow in a graphical view. Shown below is one such visualization of a workflow for a customer order management system.

Visual Representation of a Step Function Workflow

In SWF there is no such visualization. However, there is a management panel which shows a list of all executions along with logs. We can drill down into each execution and check what happened. In my opinion, debugging with this management panel is not that fun 😉.

Control Execution

This is mainly required when we want a human action in between the states. For example, we can think of an authorization or approval step that decides whether to proceed with or terminate the execution. In SWF this is handled via a feature called “Signals”, and in Step Functions it is handled via “Callback Tasks”. Both let us pause the workflow and resume it later via an API call; with callback tasks, the paused step hands out a token that we pass back to resume it.

In addition to pausing and resuming, according to the SWF documentation, signals can be used to inject information into the workflow execution as well. Anyway, I haven’t used them in such a way.
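On the Step Functions side, the callback pattern can be sketched roughly as follows. This is an illustration under assumptions, not a complete approval flow: the queue URL, the `ProcessOrder` state name, and the `resume` helper are hypothetical. The `.waitForTaskToken` resource suffix and the `$$.Task.Token` context path come from Amazon States Language, and `send_task_success` / `send_task_failure` are the boto3 Step Functions client calls used to hand the token back.

```python
import json

# Task state that pauses the execution until the task token is returned.
# The queue URL and the "ProcessOrder" state name are placeholders.
WAIT_FOR_APPROVAL = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",
        "MessageBody": {
            "orderId.$": "$.orderId",
            "taskToken.$": "$$.Task.Token",
        },
    },
    "Next": "ProcessOrder",
}


def resume(sfn_client, task_token, approved):
    """Resume (or fail) the paused execution once a human has decided.

    `sfn_client` is expected to be a boto3 Step Functions client,
    e.g. boto3.client("stepfunctions"). It is passed in so that this
    sketch has no hard dependency on boto3.
    """
    if approved:
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True}),
        )
    else:
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="Rejected",
            cause="The approver rejected the request",
        )
```

Whatever consumes the queue message (an approval UI, an email handler) only needs to keep the token around and call `resume` once the decision is made.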

Scaling

In terms of scaling, my main concern here is how well the platform scales when managing complex and heavy tasks. If we talk about complex tasks, SWF supports creating child workflows within a workflow. The number of child workflows can be managed via the decider, so in a way we can use this to achieve some sort of parallelism as well. But remember, there is a limit on the maximum number of child workflows you can start within a workflow (1000 per workflow) and on the number of child workflows you can start per second. You can find these limits in the SWF service limits.

Step Functions supports this same feature in the form of nested workflows. Here we can start workflow executions directly from the task states. You can start as many nested workflow executions as you like until you hit the “StartExecution” API action limit. This is an important feature when we want to orchestrate complex processes by composing modular, reusable workflows.
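As a rough sketch of what fanning out to child workflows looks like from the decider's point of view, the function below builds one `StartChildWorkflowExecution` decision per order. The `ProcessOrder` workflow type, the `start_children` helper, and the choice to reject oversized batches up front are assumptions for illustration; the cap simply reuses the 1000 figure mentioned above.

```python
import json

MAX_OPEN_CHILDREN = 1000  # the per-workflow limit quoted in the text above


def start_children(order_ids, workflow_name="ProcessOrder", version="1"):
    """Build one StartChildWorkflowExecution decision per order.

    The workflow type name and version are hypothetical. The decider would
    return these decisions when responding to a decision task.
    """
    if len(order_ids) > MAX_OPEN_CHILDREN:
        raise ValueError(
            f"{len(order_ids)} child workflows requested, limit is {MAX_OPEN_CHILDREN}"
        )
    return [
        {
            "decisionType": "StartChildWorkflowExecution",
            "startChildWorkflowExecutionDecisionAttributes": {
                "workflowType": {"name": workflow_name, "version": version},
                "workflowId": f"{workflow_name}-{order_id}",
                "input": json.dumps({"orderId": order_id}),
            },
        }
        for order_id in order_ids
    ]
```

The decider is then also responsible for watching the history for each child's completion or failure event before deciding what to do next.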
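A nested workflow is just another task state in the parent definition. The sketch below builds such a state; the helper name, the child state machine ARN, and the `orderId` input field are placeholders. The `states:startExecution.sync` resource makes the parent wait for the child execution to finish, and dropping the `.sync` suffix would start the child without waiting.

```python
def nested_workflow_state(child_state_machine_arn, next_state):
    """Return a task state that runs another state machine as a child.

    With the ".sync" suffix the parent waits until the child execution
    completes before moving on to `next_state`.
    """
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::states:startExecution.sync",
        "Parameters": {
            "StateMachineArn": child_state_machine_arn,
            # Pass part of the parent's input down to the child execution.
            "Input": {"orderId.$": "$.orderId"},
        },
        "Next": next_state,
    }


# Placeholder ARN for a hypothetical "PaymentWorkflow" state machine.
PAYMENT_STEP = nested_workflow_state(
    "arn:aws:states:us-east-1:123456789012:stateMachine:PaymentWorkflow",
    next_state="ShipOrder",
)
```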

Error Handling

Error handling is mostly the same in both services. However, it’s the orchestration logic that defines what to do next in case of an error. SWF records error events in the execution history, and we have to detect them in the decider and write the logic to take the necessary actions. Step Functions produces error codes which we can capture within the state definition, and we can define the necessary actions in the definition itself.
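In Step Functions, this error handling lives in the `Retry` and `Catch` fields of a state. Below is a minimal sketch: the state names and the Lambda ARN are placeholders, and `retry_delays` is a hypothetical helper that only illustrates how `IntervalSeconds` and `BackoffRate` combine into the wait before each retry.

```python
# Task state with declarative error handling.
# The Lambda ARN and the state names are placeholders.
CHARGE_CUSTOMER = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ChargeCustomer",
    "Retry": [
        {
            "ErrorEquals": ["States.Timeout"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ],
    "Catch": [
        {
            # Anything not recovered by the retries falls through to here.
            "ErrorEquals": ["States.ALL"],
            "Next": "NotifyFailure",
        }
    ],
    "Next": "ShipOrder",
}


def retry_delays(retrier):
    """Seconds waited before each retry attempt for one Retry entry."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


delays = retry_delays(CHARGE_CUSTOMER["Retry"][0])  # [2.0, 4.0, 8.0]
```

In SWF, the equivalent logic would be code in the decider that counts failure events in the history and decides whether to reschedule the activity or give up.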

Integration with other services

There is no explicit way in SWF to integrate with other services, since we are responsible for implementing both the orchestration logic and the tasks (activities). Here we have the flexibility to integrate with any service we want. In Step Functions, however, there are two options. We can either integrate with other services in our task logic or integrate directly via the state definition. The latter option is really convenient, since we can keep the task logic simple and small while the Amazon States Language handles the integration pain. More details on this can be found in the documentation.

On a final note, both services support task execution via Lambda functions, but in SWF the deciders can’t be implemented using Lambda. Step Functions doesn’t have this limitation, since the coordination logic is executed by the service itself. So using Step Functions we can build 100% serverless workflows.

Now the question is, when to use what? This decision basically depends on how much control you need over your orchestration logic. Amazon’s recommendation is to first check whether Step Functions can serve your purpose and, if not, consider SWF.
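As an illustration of integrating directly via the state definition, the task state below publishes to an SNS topic without any task code of our own. The topic ARN, the message fields, and the state name are placeholders; `arn:aws:states:::sns:publish` is one of the service integration resources that Step Functions provides.

```python
import json

# Task state that calls SNS directly from the state definition.
# The topic ARN is a placeholder.
NOTIFY_CUSTOMER = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sns:publish",
    "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-updates",
        # Keys ending in ".$" are resolved from the state's input at runtime.
        "Message": {
            "orderId.$": "$.orderId",
            "status": "SHIPPED",
        },
    },
    "End": True,
}

state_json = json.dumps({"NotifyCustomer": NOTIFY_CUSTOMER}, indent=2)
```

With the first option, the same notification would instead be a Lambda function (or activity worker) that calls SNS through the SDK, which is also the only route available in SWF.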

This is from the SWF FAQs

AWS customers should consider using Step Functions for new applications. If Step Functions does not fit your needs, then you should consider Amazon Simple Workflow (SWF)

Anyway, it’s always a best practice to read the FAQs of each service before implementing a workflow system. In particular, read up on the service limits of each service and check whether they meet your requirements.
