Creating Scalable Solutions with Dynamic Parallelism

Wondering how to create a scalable solution using Step Functions? You’re in the right place!

Step Functions is a serverless orchestrator that makes it easier to sequence different AWS services into business-critical applications. It provides quite a bit of convenient functionality out of the box: automatic retry handling, orchestration, monitoring and alerting. In Gousto Recipes — Building Pipelines with Step Functions, you can see how we have been using Step Functions at Gousto.

In this blog post, we will go into more detail on how to take advantage of Dynamic Parallelism (also known as fan-out).

What problem were we trying to solve?

As Gousto grows and opens more factories in different parts of the UK, the Planning team needs to plan what to buy for each factory much further in advance, so there is a business requirement to forecast the upcoming 4 weeks.

What did the architecture look like?

  • Configurator: set up variables for an execution, such as weeks to forecast, Order Simulator constraints and factory caps.
  • Data Extractor: load data for each week required by Order Simulator and Site Specific Forecast.
  • Order Simulator: generate simulated orders for each week.
  • Site Specific Forecast: build a forecast of how many ingredients and products each factory will need.
  • Publisher: send an email to the Planning team with the forecast and provide data to downstream services.
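
As a rough sketch of that layout, the original pipeline could be expressed in Amazon States Language as a simple chain of Task states. Note that the Lambda ARNs below are illustrative placeholders, not our production definition:

```json
{
  "Comment": "Sketch of the original (pre-Map) pipeline: every Lambda handled all weeks",
  "StartAt": "Configurator",
  "States": {
    "Configurator": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:configurator",
      "Next": "Data Extractor"
    },
    "Data Extractor": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:data-extractor",
      "Next": "Order Simulator"
    },
    "Order Simulator": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:order-simulator",
      "Next": "Site Specific Forecast"
    },
    "Site Specific Forecast": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:site-specific-forecast",
      "Next": "Publisher"
    },
    "Publisher": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:publisher",
      "End": true
    }
  }
}
```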

In this architecture, each Lambda function was responsible for loading data or performing some computation for all weeks. For example, if we wanted to forecast for weeks A and B, Data Extractor would load data for both weeks, and Order Simulator and Site Specific Forecast would each perform their operations for A and B. Finally, Publisher would send an email to the Planning team with the corresponding forecasts.

This architecture worked pretty well for almost a year; however, it does not scale, for two main reasons:

  1. The maximum execution duration of a Lambda function is 15 minutes. The Order Simulator was already taking 10 minutes to run for 2 weeks, so there was not much buffer to add an extra week.
  2. Forecasting an extra week would mean changing the code of every component in the pipeline.

Map State to the Rescue

Essentially, we had to make two changes to the pipeline:

  1. Make each Lambda function perform its tasks over a single week.
  2. Update the state machine to wrap Data Extractor, Order Simulator and Site Specific Forecast in a Map state.

You can find below a code snippet with the state machine.
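
The snippet below is a simplified sketch along those lines. The state names follow the pipeline above, but the Configurator output, the items path (menu_weeks), the Parameters mapping and the MaxConcurrency value are illustrative assumptions rather than our exact production definition:

```json
{
  "Comment": "Sketch of the pipeline with the Menu Week Map state (Pass states for simplicity)",
  "StartAt": "Configurator",
  "States": {
    "Configurator": {
      "Type": "Pass",
      "Comment": "Stands in for the Configurator Lambda; emits the weeks to forecast plus shared settings",
      "Result": {
        "menu_weeks": [1, 2, 3, 4],
        "constraints": {
          "factory_caps": "illustrative-value"
        }
      },
      "Next": "Menu Week Map"
    },
    "Menu Week Map": {
      "Type": "Map",
      "Comment": "Runs one iteration of the Iterator per menu week, up to 4 at a time",
      "ItemsPath": "$.menu_weeks",
      "MaxConcurrency": 4,
      "Parameters": {
        "menu_week.$": "$$.Map.Item.Value",
        "constraints.$": "$.constraints"
      },
      "Iterator": {
        "StartAt": "Data Extractor",
        "States": {
          "Data Extractor": {
            "Type": "Pass",
            "Next": "Order Simulator"
          },
          "Order Simulator": {
            "Type": "Pass",
            "Next": "Site Specific Forecast"
          },
          "Site Specific Forecast": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "Next": "Publisher"
    },
    "Publisher": {
      "Type": "Pass",
      "Comment": "Receives the array of per-week results produced by the Map state",
      "End": true
    }
  }
}
```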

Let’s have a look at the Menu Week Map state; for the sake of simplicity, we are using Pass states:

  • ItemsPath: a reference path identifying where in the effective input the array to iterate over is found (in our use case, a list of week numbers coming from Configurator).
  • Parameters: a set of key-value pairs that builds the input for each iteration; these values are accessible to all states within the Iterator.
  • Iterator: where you define the states of the Map. There are two simple rules: states inside an Iterator can only transition to other states defined inside the same Iterator, and no state outside the Iterator field can transition to a state within it.
  • MaxConcurrency: number that provides an upper bound on how many invocations of the Iterator may run in parallel.

I have only covered the basics here, so I highly recommend visiting Map — AWS Step Functions.

How long does it take to run?

Now, with Dynamic Parallelism in place, the pipeline takes half the time and forecasts 4 weeks! In fact, it would take roughly the same amount of time even if we had to forecast N weeks, because each Lambda function performs its operations over a single week and all weeks run concurrently.

Conclusion

Interested in joining us and building some interesting scalable solutions? Check out our openings - Gousto Jobs!
