Cleaner Microservice Orchestration With Zeebe+Lambda

Brian McCann
The Startup
Published in
11 min readJul 18, 2020

In this post I’ll talk about Zeebe. Zeebe is a source available workflow engine that helps you define, orchestrate, and monitor business processes across microservices. I’ll show you how to quickly get a full application that leverages Zeebe and Lambda running on AWS. I’m piggybacking off a post by the company’s co-founder in which he showed how you can use Zeebe with Lambda. You may want to start with that one first for a general overview of Zeebe and how it fits in with Serverless, then if you’re interested in quickly getting all the components running on AWS this post will show you how.

If you embrace AWS Lambda or are considering adopting it you may see Lambda as an alternative to Kubernetes and prefer the simplicity, reliability, and out of the box scaling that the Serverless way offers. I was motivated to write this for that type of developer. For Zeebe to work we need something running all the time so we can’t be fully “serverless”. My concession is to use ECS to bridge the gap. ECS is a simpler and cheaper alternative to Kubernetes when you’re getting started with containers. If you are a serverless first developer you or your organization might not want to take on the overhead of running and learning k8s if almost all your workloads work fine on Lambda anyway. Zeebe is a super powerful tool that allows you to compose your Lambda functions to build arbitrarily complex stateful applications. Today I’ll show how to get up and running quick with Zeebe and Lambda using ECS.

This is what we’ll make

Use Case?

In case you don’t know about Zeebe I’ll try to describe what it does by describing a use case. A little while ago I was working on a backend that would allow customers to put a down payment on a vehicle then go pick it up at a dealership. The whole process wasn’t too complicated but did involve 5–6 discreet steps involving external API calls. Any of those steps had a number of potential failure paths and based on the outcome of each step you would want to take a different action. On top of that there are different types of errors. There are errors due to conditions in the physical world like a customer putting their address in wrong and there are technical errors like network outages or a bug in the code. You can imagine how the number of scenarios grows exponentially each time you add a step. General purpose programming languages on their own are great at a lot of things, Orchestrating a complicated sequence of steps is not one of them. Using event driven architecture can help us manage emerging complexity by separating business logic into individual microservices whose isolated behavior is simpler to reason about. What we lose is end to end visibility of the business process, so that debugging is a challenge and refactors are scary. This is where a tool like Zeebe, or its predecessor Camunda can help. These tools let us represent our processes as diagrams that are executable. Here is an example:

One of the main advantages is that our business logic is now organized into manageable and isolated components. In addition, and in my opinion most importantly, this strategy separates the concern of Orchestration from everything else. This allows us to write functional style microservice components that are easy to read and refactor. All of our orchestration is managed by a dependable and tested framework, and the orchestration logic is cleanly separated as opposed to peppered throughout our microservices (as tends to happen). The boxes with gears are “service tasks” which means they represent pieces of code external to Zeebe that execute. The process engine (Zeebe) is responsible for executing the instructions and will invoke the code we’ve written where specified. In today’s example the service tasks (little boxes with gears ) will be implemented as Lambda functions. There are a lot of other powerful features exposed by BPMN (business process modeling notation), the ones pictured above are just a few.

We need a way for Zeebe to trigger the Lambda functions because Lambda functions by design are dormant until you trigger them. We will use the code provided by the zeebe-lambda-worker to create a link between the orchestrator (Zeebe) and the business logic (Lamba). The worker will listen for events on the Zeebe broker, pick up work tasks as they come in, and forward the payload to Lambda. The original post covers how all this works. Here is what I’m adding

  • How to spin up an ECS cluster and run the zeebe-lambda-worker on it
  • How to use an IAM role for authorization instead of AWS credentials

Here are the steps involved in this walkthrough:

  1. Get set up with Camunda Cloud
  • Sign up for a Free Camunda Cloud trial account
  • Launch a Zeebe instance

2. Use the Serverless Framework to deploy several Lambda functions and an IAM role

3. Get set up on ECS

  • Create an ECS cluster using a setup shell script
  • Deploy the zeebe-lambda-worker to ECS

4. Test that everything works end to end

Sign up for Camunda Cloud

Camunda and Zeebe are the same company. Zeebe is their newer product and Camunda Cloud is their managed cloud offering which happens to run on the newer Zeebe engine. It’s confusing, I know, probably has something to do with marketing as Camunda is an established and fairly well known brand.

Getting set up with a development Zeebe instance is fairly straightforward. This post from the zeebe blog walks you through the steps. The whole thing is a good read but you can skip everything after the create a client part if you just want to get up and running. We’ll need the broker address information and the client information later. Once we’re set up with a free Zeebe instance we can deploy the Serverless framework project.

Deploy the serverless project

prerequisite: You need to have an AWS account + IAM user set up with admin privileges. This is a great guide if you need help.

clone the repo https://github.com/bmccann36/trip-booking-saga-serverless

cd into this directory trip-booking-saga-serverless/functions/aws

In the serverless.yml file update the region property If you’d like to deploy to a region other than us-east-1.

I have made barely any changes to this from the original original forked repository. I removed the http event triggers from the functions since we won’t need them for this. This will make the project deploy and tear down faster since there are no API gateway resources. I also defined an IAM role that will give our zeebe-lambda-worker (which we have yet to deploy) permission to invoke the Lambda functions.

The policy portion of the role we’ll give to the ECS lambda-worker

Run the command sls deploy -v to deploy the Lambda functions (the -v gives you verbose output).

Connecting Lambda + Zeebe with ECS

The last step to get our demo application working end to end is to deploy the zeebe-lambda-worker. The source code can be found here. I also forked this from Zeebe and made a small change which was to add support for IAM role authorization. Since our ECS task will assume a role that gives it permission to invoke the Lambda functions it needs to, we do not need to supply AWS credentials to the worker code. I added an if / else statement so that a role will be used if no AWS credentials are supplied. This works because the AWS sdk sources credentials hierarchically. The role referred to in this snippet is the one we created earlier when we deployed the Serverless framework project.

If you prefer to use accessKey/secretKey make sure those keys are not the same keys you use as an admin user to work with AWS. You should create a role or user with credentials that allow the least permission possible. In this case we only need permission to invoke a few specific Lambda functions.

I have already packaged the zeebe-worker as a Docker image and pushed to a public repo. The ECS setup which we will get to in a moment will pull this image. If you want you can modify the worker source code and/or build the image yourself and publish to your own repository. You’ll need to package it as a .jar first as it is Java code.

Prerequisite: ecs-cli installed on your machine. You can get it with aws binary or homebrew

If you run into issues at any point look at the AWS walkthrough for reference. I mostly followed the steps outlined there to do this with a few minor modifications. I have chosen to use a launch type of EC2.

Fill in some configuration

To make it easy to configure your ECS cluster I’ve included some configuration template files, all of which have the word “SAMPLE” in the file name. To use them, cd into zeebe-event-adapter/ in the trip-booking-saga repo. Then copy all the files with the word SAMPLE in front and save them with “SAMPLE” removed from the file name. This will cause them to be ignored by git. I’ve configured the .gitignore to ignore these files so that you or I don’t accidentally commit sensitive credentials to source control. Next, populate the values in each file as outlined below.

(file) aws.env 
AWS_REGION=<FILL IN>
---
(file) Camunda.env
ZEEBE_CLIENT_CLOUD_CLUSTERID=<FILL IN>
ZEEBE_CLIENT_CLOUD_CLIENTID=<FILL IN>
ZEEBE_CLIENT_CLOUD_CLIENTSECRET=<FILL IN>
---
(file) ecs-params.yml
...
# modify this line
task_role_arn: arn:aws:iam::<YOUR AWS ACCOUNT ID>:role/zeebe-lambda-worker-role
...

The values we supply in Camunda.env will be used by the zeebe-lambda-worker to connect to the Camunda Cloud Zeebe instance we created in the first step.

Once we’ve filled in the values in each of these files we’re ready to deploy. I’ve combined all the steps into one bash script so everything can be deployed with one command. Just run the setupEcs.sh script and supply your region as the only argument.

i.e. bash setupEcs.sh us-east-1

If you run into problems at any step you may want to try running the commands in the script one by one manually, or refer back to the AWS tutorial.

# note you may get this warning INFO[0010] (service aws) was unable to place a task because no container instance met all of its requirements.Don’t get impatient and kill the process, this just means your infra isn’t quite ready yet, it should resolve on its own.

If you are used to working with Kubernetes the ECS terms will be confusing. An ECS service is not the same as K8s service at all and has nothing to do with networking. Instead in ECS a service just defines that you want to keep a “task” (which is basically a container with some config attached) running. It’s the same idea as a service you’d set up on a Linux server.

Test that it works

  • *disclaimer The main purpose of this post is to show how to easily deploy the zeebe-lamba-connector to work with Lambda functions so I won’t cover Zeebe usage much. If you want to learn more about Zeebe check out Bernd Rücker’s posts or try the zeebe quickstart.

Probably the simplest way to deploy and start the workflow is with the Zeebe Modeler tool which you can download here . You can also use the command line tool zbctl. Once you have the modeler downloaded, just open the trip-booking.bpmn file (in zeebe/aws directory) in the zeebe modeler. Use the upload and play buttons to upload the workflow to the cluster and start an instance of the process.

If the process instance worked successfully end to end you will see the completed instance in the operate console.

If we navigate to ECS, select the service we’ve deployed and click on logs, we can see the successful results from the Lambda functions being passed back and forth through the zeebe-lambda-worker.

Also if we navigate to the Lambda console, select one of the Lambdas involved in our workflow then select monitoring → view logs in cloudWatch, we can see the logs of the Lambda function itself.

Cleaning up ECS resources when you don’t want them anymore

take down the service ecs-cli compose service rm — cluster-config zb-cfg

remove the whole cluster ecs-cli down — force — cluster-config zb-cfg

Closing thoughts

I am curious to hear what others think about pairing Zeebe with Lambda. Does the combination make sense? What would you do differently? Also for anyone who uses or has used step functions I am curious to hear how this solution compares.

I had not used ECS much before this and I was pleasantly surprised by how easy it was to provision and use. It seems like they’ve added a lot of features since the last time I looked at it to try to keep pace with Kubernetes. I also like how they use a docker-compose file and docker like commands so you can work with your containers in the cloud more or less the same way you would locally. My one big concern and hesitation on getting good at ECS is that its clearly not the industry standard. Practically as engineers it just makes way more sense to use what everyone else is using because thats what everyone already knows.

I think Zeebe is a very promising new product, I also think they have their work cut out for them because there are a lot of competitors in this space. Some popular tools that fill the same Niche are Netflix’s Conductor and AWS’s own Step Functions. They face competition not just from competitor frameworks but also homespun orchestration solutions using event buses and other lower level orchestration/choreography tools. I think that a lot of orgs and developers don’t even realize that there is a tool that can help them glue together or orchestrate microservices. In a lot of cases people tend to write this code themselves. In many cases it is a conscious decision because they feel that adding in another Library just creates additional complexity, makes their app harder to understand for new developers, or perhaps even limits what they can do. I totally get that concern as I’ve had these problems myself. For my personal development and collaboration style I think Zeebe makes a lot of sense for a broad range of tech problems.

I hear about a lot of people reaching for Kafka to achieve EDA. I have nothing against Kafka and I think it is a technically impressive piece of engineering. An experienced dev can probably get up and running with it in a day but they definitely won’t be using it optimally or even getting benefits from it for a long time because it is so complicated with tons of use case specific config required. Kafka is more low level than Zeebe and therefore applies to a broader range of use cases. However I think that if you are just using Kafka as an event bus to drive a collection of choreographed microservices, Zeebe may offer a simpler and actually more powerful solution. With Kafka things like event replay, retry strategies, and monitoring solutions at a business process level, are capabilities we need to build ourselves. Zeebe is appealing to me because they’ve found a way to generalize these needs. This allows us as developers to simply use their API to endow our apps with these advanced features and ultimately solve a business problem.

--

--

Brian McCann
The Startup

I take things and make them serverless. Living in Greenpoint Brooklyn, employed as Developer @CedrusDigital. Opinions are my own.