Step Functions - Dynamic Parallelism (Fan-Out Explained)
AWS finally released support for dynamic parallelism in Step Functions on September 18th, 2019. It was one of the most requested features in Step Functions.
AWS Step Functions is a fully managed service that makes coordinating tasks easier by letting you design and run workflows that are made of steps, each step receiving as input the output of the previous step.
With dynamic parallelism, Step Functions now supports two very interesting messaging patterns out of the box:
- Fan-out: delivers a message to multiple destinations that are determined at runtime. This is very useful in workflows such as order processing or batch data processing. With it, we can split a single array message into n individual messages and fan out the processing of those messages.
- Scatter-gather: broadcasts a single message to multiple destinations (scatter) and then aggregates the responses back for the next steps (gather).
Now that we’ve established the patterns, let’s explore the fan-out pattern in depth with an example.
State Machine Definition
The example is a very simple state machine definition that uses fan-out.
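As a rough illustration of what such a definition can look like, the sketch below builds one as a Python dictionary and prints it as Amazon States Language JSON. It is not the original definition: the state names (`ProcessChunks`, `ProcessChunk`), the `$.chunks` items path, and the Lambda ARN are assumptions made for illustration.

```python
import json

# Hypothetical Lambda ARN, used only for illustration; replace with a real one.
PROCESS_CHUNK_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-chunk"

definition = {
    "Comment": "Fan-out sketch using a Map state",
    "StartAt": "ProcessChunks",
    "States": {
        "ProcessChunks": {
            "Type": "Map",             # dynamic parallelism
            "ItemsPath": "$.chunks",   # assumed: the array to iterate over
            "MaxConcurrency": 0,       # 0 places no limit on parallelism
            "Iterator": {              # sub-workflow run once per array item
                "StartAt": "ProcessChunk",
                "States": {
                    "ProcessChunk": {
                        "Type": "Task",
                        "Resource": PROCESS_CHUNK_ARN,
                        "End": True,
                    }
                },
            },
            "End": True,
        }
    },
}

# Serialize to the JSON form that Step Functions accepts as a definition.
print(json.dumps(definition, indent=2))
```

An execution input such as {"chunks": [1, 2, 3]} would then produce three iterations, one per array element.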
Using the Map State
The Type is set to Map here, which indicates dynamic parallelism. Note that this is different from the Parallel state, which supports pre-defined, static parallelism.
To configure a Map state, you define an Iterator, which is a complete sub-workflow. When a Step Functions execution enters a Map state, it will iterate over a JSON array in the state input. In the case of the above example, it will iterate over the chunks array.
For each item, the Map state will execute one sub-workflow, potentially in parallel. When all sub-workflow executions complete, it will return an array containing the output for each item processed by the Iterator.
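The behaviour described above can be imitated locally to build intuition. The following is only a sketch of the semantics, not how Step Functions is implemented: `process_chunk` is a hypothetical stand-in for the Iterator, and a thread pool stands in for the parallel sub-workflow executions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical stand-in for the Iterator sub-workflow."""
    return {"chunk": chunk, "doubled": chunk * 2}


def run_map_state(state_input, items_key="chunks"):
    """Run process_chunk once per array item and gather the outputs."""
    items = state_input[items_key]
    if not items:
        return []
    with ThreadPoolExecutor(max_workers=len(items)) as pool:
        # pool.map yields results in input order, regardless of which
        # item finishes first, so the output array lines up with the input.
        return list(pool.map(process_chunk, items))


results = run_map_state({"chunks": [1, 2, 3]})
print(results)
```

The result is a single array with one output per input item, which is the shape the next state in the workflow would receive.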
The Step Functions console detects the Map state and provides an additional dropdown to view the sub-results. You can also view the Map iteration index and exactly what input that particular Lambda function was invoked with. In this contrived example, the input received was 1.
Another important control is the MaxConcurrency field. The default value is 0, which places no limit on parallelism, so iterations are invoked as concurrently as possible.
A MaxConcurrency value of 1 invokes the Iterator sequentially, one array element at a time.
Note that the tasks in this example invoke a Lambda function, but with Step Functions you can also use other AWS service integrations and run code on EC2 instances, AWS Batch, and so on.
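To see the effect of such a limit, here is a small local simulation. It is a sketch under assumptions, not the Step Functions implementation: `run_with_max_concurrency` and `slow_square` are hypothetical helpers, and a thread pool's worker count plays the role of MaxConcurrency, with 0 treated as "no limit".

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_max_concurrency(items, worker, max_concurrency=0):
    """Run worker over items and report the peak number running at once.

    max_concurrency=0 means no limit, mirroring the MaxConcurrency default.
    """
    if not items:
        return [], 0
    limit = len(items) if max_concurrency == 0 else max_concurrency
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(item):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return worker(item)
        finally:
            with lock:
                running -= 1

    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(tracked, items))
    return results, peak


def slow_square(n):
    """Hypothetical task that takes a little time to finish."""
    time.sleep(0.01)
    return n * n


sequential_results, sequential_peak = run_with_max_concurrency(
    [1, 2, 3, 4], slow_square, max_concurrency=1
)
unbounded_results, unbounded_peak = run_with_max_concurrency([1, 2, 3, 4], slow_square)
print(sequential_results, sequential_peak)
print(unbounded_results, unbounded_peak)
```

With a limit of 1, the peak never exceeds one running task, while the results stay in input order in both cases. In the unbounded case the observed peak depends on scheduling, much as the real service treats concurrency as "as much as possible" rather than a guarantee.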
And that’s it. This feature opens up so many possibilities to use Step Functions in your workflow and also optimize existing ones. Good luck!