Big BPM is coming

Published in

gft-engineering

6 min readApr 22, 2020

Beyond a request-response, there are many use cases requiring a complex state management, like responding to asynchronous events or communicating with other unreliable external systems. The usual approach to implement these use cases is a hodgepodge of stateless services, databases, batch/cron jobs and queuing systems. This negatively affects the developer’s productivity since most of the code is dedicated to plumbing, hiding the real business logic behind a mountain of low-level and low-value technical details. Also, such systems often have availability problems, since it is difficult to keep all parts running.

These use cases have been addressed by the Business Process Management (BPM) software products that have achieved some success, to a greater or lesser extent, although with difficulties to overcome acceptable levels of maintainability and scalability.

Cadence is an Open Source project by Uber that proposes a programming model with an agnostic fault state that conceals most of the complexities of building scalable distributed applications. In essence, Cadence provides virtual persistence that is not linked to a specific process and preserves the complete state of the application, including evidence in case of hardware and software failures. This allows you to write code using the full power of a programming language like Go or Java, while Cadence takes care of the durability, availability and scalability of the application.

Cadence consists of a programming (framework) and execution model (managed service or backend). The framework allows developers to create and coordinate tasks in familiar languages (Go and Java are available now, and soon Python and C # will be available through a proxy currently in development).

The backend has no status and is based on persistent storage. Currently, Cassandra and MySQL are compatible but an adapter can be added to any other database that provides transactions of individual multi-row fields. There are different models of service implementation. At Uber, they have clusters of hundreds of hosts that are shared by hundreds of applications.

Workflow

One of the main abstractions of Cadence is a Workflow with no fault status. The status of the Workflow code, including the local variables and the threads it creates, is immune to Cadence service failures. This is a very powerful concept as it encapsulates status, processing threads, durable timers and event handlers.

With Cadence, all logic can be encapsulated in a simple and lasting function that directly implements business logic. Because the function has status, the developer does not need to use any additional system to ensure durability and fault tolerance. The main restriction is that the workflow code must be deterministic, which means that it must produce exactly the same result for the same input, even it is executed several times. This discards any external API call from the workflow code, since external calls can intermittently fail or change their output at any time. That is why all communication with the external world must occur through Activities. For the same reason, the workflow code must use the Cadence APIs to obtain the current time, suspend and create new threads.

Below is an example of the workflow that implements the subscription administration use case:

This code directly implements the business logic. If any of the operations invoked (also known as activities) takes a lot of time, many instances or a lot of processing, the code will not change. There are no inconveniences if in chargeMonthlyFee the process is blocked for a day if the payment processing service is idle for so long. In the same way that blocking for 30 days inside the loop is a normal operation within the workflow code.

It also supports child Workflows (threads) that can be used to component reusable behaviors. Cadence has virtually no scalability limits on the number of open workflow instances. Even if there are hundreds of millions of consumers, the code above will not change.

Activities

In its simplest form, a Cadence Activity is a function or method of an object. Cadence does not recover the status of the activity in case of failures, therefore, an Activity function can contain any code without restrictions.

Activities are invoked asynchronously through task lists. A task list is essentially a queue used to store an activity task until it is picked up by an available Worker. The Worker processes an Activity by invoking its implementation function. When the function returns the result, the worker informs the Cadence service, which in turn notifies the Workflow of the completion.

Cadence allows you to configure for each Activity different policies for Timeouts, Retries, Long running, Cancellation, Routing, etc.

Event handling

Workflows with non-fault status can be indicated on an external event. A signal is always point to point and destined to a specific workflow instance. The signals are always processed in the order in which they are received. There are multiple scenarios for which the signals are useful.

Cadence supports the aggregation of events and their correlation without intending to replace solutions such as Apache Flink or Apache Spark. But in certain scenarios, it just fits better. For example, all events added always apply to a business entity with clear identification. And then, when a certain condition is met, actions must be taken.

Many business processes involve human participants. The standard Cadence pattern for implementing an external interaction is to execute an activity that creates a human task in an external system. It can be an email with a form, or a record in some external database, or a mobile application notification. When a user changes the status of the task, a signal is sent to the corresponding Workflow.

Synchronous Query

The workflow code has been with the Cadence framework that isolates it from several software and hardware failures. The state changes constantly during workflow execution. To expose this internal state to the external world, Cadence provides a synchronous query function. From the point of view of the workflow implementer, the query is exposed as a synchronous callback that is invoked by external entities. Multiple callbacks of this type can be provided per type of workflow exposing different information to different external systems.

How does it work ?

When a workflow invokes an Activity, it sends the ScheduleActivityTask decision to the Cadence service. As a result, the service updates the workflow status and sends an Activity task to a Worker who implements the Activity. Instead of calling the Worker directly, an intermediate queue is used. Therefore, the service adds an Activity Task to this queue and a Worker receives the task through a long poll request.

Similarly, when a workflow needs to handle an external event, a Decision task is created. A list of Decision tasks is used to dispatch to the workflow worker able to run it.

While Cadence’s task lists are queues, they have some differences with commonly used queue technologies. The main one is that they do not require explicit registration and are created on demand. The number of task lists is not limited. A common use case is to have a list of tasks by work process and use it to deliver Activity tasks to the process. Another use case is to have a list of tasks by a group of workers.

A typical Cadence-based application consists of a Cadence service, workers for workflow and activity, and external customers. Both types of workers, as well as external clients, are roles and can be placed in a single process if necessary.

The Cadence service is composed of Front End (FE) services that implement the API, history service (HS) that manages queues, handles events, stores and mutates workflow states and the Matching service (MS) responsible for dispatching tasks.

Moving forward…

You can find more information at cadenceworkflow.io and you can easily launch it in your laptop using:

curl -O https://raw.githubusercontent.com/uber/cadence/master/docker/docker-compose.yml && docker-compose up

I encourage you to share your questions and thoughts in the comments of this post.