How a Camunda Connector helps you handle files in the Cloud

Stefan Schultz
Feb 19, 2024


A bucket full of files in the cloud (AI generated)

Introduction

This article focuses on file handling in processes, the hurdles involved, and how a Camunda connector can help you automate your file-intensive processes.

The following example shows a process that downloads a file from cloud storage, analyses the data, and generates a report based on the contents. Afterwards, it uploads the report back to cloud storage. The process also uses data from the file to decide how to continue its flow.

Example of a process handling files in the cloud

What you will learn from this article:

  • the limitations of the various Camunda engine setups when handling files
  • the reasons why putting files into process variables is bad practice
  • how using a connector makes file handling easier and more maintainable

Independently of the underlying architecture of the process engine, you should always ask yourself whether it is a good idea to put whole files into the engine. With simple data used to drive your process forward, e.g. for decisions, this is easy: you just set it as process variables. Files, on the other hand, come with a certain overhead: they have metadata attached to them and they can be quite large. So it is generally not a good idea to send them to the process engine as variables to drive your process forward.
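To make this concrete, here is a minimal sketch of the difference (the variable names, the file name and the helper class are mine, not from any Camunda API): a small decision value fits naturally into the variable map, while a file would have to be encoded and carried along as part of the process state.

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.Map;

class VariablePayloads {

    // Fine: small values that drive a gateway decision.
    static Map<String, Object> decisionVariables() {
        return Map.of("riskScore", 42, "approved", true);
    }

    // Problematic: the whole file becomes part of the process state.
    // Binary content has to be encoded (e.g. Base64) to travel as a variable,
    // which makes the payload roughly a third larger than the original file.
    static Map<String, Object> fileVariables() throws Exception {
        byte[] reportBytes = Files.readAllBytes(Path.of("report.pdf")); // hypothetical file
        return Map.of("report", Base64.getEncoder().encodeToString(reportBytes));
    }
}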

Why is file handling not trivial?

With Camunda Platform 7, developers are able to embed the engine into their process application. The provided Java API offers very fine-grained options to interact with the engine and the processes deployed to it. Business objects can be serialized in many ways and sent to the engine, where they can be retrieved and deserialized by other activities with relatively low overhead. The biggest drawback is that the payload ends up in the database as blobs (binary large objects), taking up space in the engine's runtime tables and also in the history tables of your Camunda instance.

Schema of an embedded engine in a Java application
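As a small sketch of that embedded style (the delegate class, variable name and report generation are made up for illustration), a JavaDelegate can hand a payload straight to the engine, where it is persisted as described above:

import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

public class StoreReportDelegate implements JavaDelegate {

    @Override
    public void execute(DelegateExecution execution) {
        byte[] reportBytes = generateReport(); // hypothetical report generation

        // The payload is persisted as a blob in the engine's runtime tables and,
        // depending on the configured history level, in the history tables as well.
        execution.setVariable("report", reportBytes);
    }

    private byte[] generateReport() {
        return "report content".getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}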

This gets more complicated when you want to run Camunda Platform 7 as a remote engine with an external task client. This scenario might be necessary when using a programming language that is not supported out of the box, or when running a central Camunda installation. It means switching out the Java API and communicating with the engine through its REST API instead, turning your application into a distributed system in which networking plays an important role. Sending a payload over the network to the engine now becomes a relevant factor, especially if the payload contains large files.

Schema of a Java application with a remote engine
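Here is a minimal sketch of that setup using Camunda's Java external task client (the base URL, topic name and variable names are invented for illustration). Every fetched variable and every completion now makes a round trip over HTTP:

import org.camunda.bpm.client.ExternalTaskClient;
import java.util.Map;

public class ReportWorker {

    public static void main(String[] args) {
        // Connects to the remote engine's REST API instead of an embedded Java API.
        ExternalTaskClient client = ExternalTaskClient.create()
            .baseUrl("http://localhost:8080/engine-rest")
            .asyncResponseTimeout(10_000)
            .build();

        client.subscribe("generate-report")
            .lockDuration(30_000)
            .handler((task, service) -> {
                // The variable payload was serialized and sent over the network;
                // a large file embedded in a variable would have to make this trip too.
                String input = task.getVariable("inputData");
                service.complete(task, Map.of(
                    "reportDone", true,
                    "inputLength", input == null ? 0 : input.length()));
            })
            .open();
    }
}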

With the introduction of Camunda Platform 8, a remote engine called Zeebe is now the standard. This removes the possibility to embed an engine and additionally leverages the advantages of a distributed system, e.g. improved fault tolerance and scalability. The process engine consists of a set of independent nodes, each one capable of executing instances of the processes deployed to it. For fault tolerance, all data is replicated between the nodes, including process variables. The communication with the cluster is implemented using gRPC (Google Remote Procedure Call).

Schema of a Java application with distributed remote engine
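For comparison, here is a minimal job worker using the official Zeebe Java client (gateway address, job type and variable names are illustrative). All variable payloads are exchanged with the cluster over gRPC and become part of the replicated state:

import io.camunda.zeebe.client.ZeebeClient;
import java.util.Map;

public class AnalyseFileWorker {

    public static void main(String[] args) {
        try (ZeebeClient client = ZeebeClient.newClientBuilder()
                .gatewayAddress("localhost:26500") // self-managed cluster, plaintext for local testing
                .usePlaintext()
                .build()) {

            client.newWorker()
                .jobType("analyse-file")
                .handler((jobClient, job) -> {
                    // Variables arrive as JSON; whatever we send back is replicated
                    // across the brokers and exported to Operate and Optimize.
                    Map<String, Object> variables = job.getVariablesAsMap();
                    System.out.println("Received " + variables.size() + " variables over gRPC");

                    jobClient.newCompleteCommand(job.getKey())
                        .variables(Map.of("analysed", true))
                        .send()
                        .join();
                })
                .open();

            Thread.sleep(Long.MAX_VALUE); // keep the worker running
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}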

Camunda 8 introduces some limitations that have to be taken into account when trying to put files into variables: engine limitations and infrastructure limitations.

Zeebe, the engine behind Camunda Platform 8, only supports variables in JSON format, with a maximum size of approx. 3 MB. You could add the content of your file as a character string to the payload if it is relatively small; your file is then part of the engine's process state. Zeebe replicates that state to other brokers according to the replication factor to ensure fault tolerance, so you have now multiplied your file size by the replication factor. Additionally, the engine exports the data to Operate for runtime access and to Optimize for process optimisation, adding two more copies.

This takes away disk space from your cluster and has an impact on the overall performance of the nodes, whether self-managed or in Camunda's SaaS offering. The underlying storage of a partition has a compaction and clean-up mechanism, but depending on the load you might reach performance limitations before a clean-up is triggered. You can read more about the fundamentals in Camunda's documentation on resource planning and about Facebook's RocksDB in its official documentation.
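A rough back-of-the-envelope sketch of what embedding a file in a variable costs (the file name and numbers are illustrative, not measured; the snippet uses java.nio.file.Files and java.util.Base64):

// Not recommended: turning a file into a JSON-compatible string variable.
byte[] raw = Files.readAllBytes(Path.of("input.csv"));    // e.g. ~2 MB on disk
String encoded = Base64.getEncoder().encodeToString(raw); // ~33% larger as Base64

// With a replication factor of 3 the broker state holds three copies of that string,
// and the exports to Operate and Optimize add two more, so a single 2 MB file
// can easily account for well over 10 MB across the cluster and the web apps.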

What is the alternative to sending files to the engine?

In the previous section I told you about the disadvantages of storing files in variables. The alternative is to store the file locally and only hand over a reference to it. But this approach is also not free of obstacles. Generally there is no difference between Camunda 7 and 8: you can use e.g. a simple output stream in your worker or delegate to write the file to the local disk and return the file handle itself:

// Write the file content to the local disk and hand back only the file handle.
// (Uses java.nio.file.Files, java.nio.file.Path and java.io.OutputStream;
// the surrounding method should declare or handle IOException.)
Path file = baseDir.resolve(filePath);
try (OutputStream stream = Files.newOutputStream(file)) {
    stream.write(content);
    return file;
}

Now you can hand the path to the engine as a variable and let other activities access it. This also works if you want to access the file from another process, because Camunda allows you to hand over variables at process start as well. But what happens if the dependent process doesn't share the local file system of your process application? Then you need a central place to put your file and share the path. This allows you to keep files locally to share them between activities, and to store them globally (e.g. in the cloud) to access them across processes, without the need for a shared filesystem.

Schema of a shared and separated file system, and a cloud alternative
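Continuing the earlier worker sketch, this is roughly how the hand-over could look in Camunda 8 (the variable names, process id and bucket path are made up): the job is completed with just the path, and a dependent process that does not share the file system gets a globally reachable location instead.

// Inside the job handler: hand over only the reference, not the content.
jobClient.newCompleteCommand(job.getKey())
    .variables(Map.of("reportPath", file.toString()))
    .send()
    .join();

// Starting a dependent process that does not share the local file system:
// pass a globally accessible location (e.g. an object in cloud storage) instead.
zeebeClient.newCreateInstanceCommand()
    .bpmnProcessId("process-report")   // hypothetical process id
    .latestVersion()
    .variables(Map.of("reportLocation", "s3://my-bucket/reports/result.pdf"))
    .send()
    .join();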

What are the advantages of using a connector?

There are many good reasons to use a connector in your project. The most important ones are:

  • open-source
  • accessibility through the marketplace
  • reusability

The most compelling one is the open-source and community nature of a Camunda connector. You can see the code on GitHub, create a pull request with changes, or even fork the repository. Even if the project is no longer maintained, you still have access to the sources.

Another advantage is accessibility through the marketplace. Customers can search for certain functionality and add the connector to their processes. The only downside is that partner and community connectors cannot be added directly to the SaaS solution; you have to deploy and run them yourself.

Finally, the connector and its runtime can easily be reused between different processes and applications in your company. You don't need to reinvent the wheel every time you need some special kind of logic. Packing it into a connector makes it available to other teams in your company and provides a unified configuration interface in the Camunda Modeler. The AWS S3 connector, for example, only needs two configurations: the AWS authentication and the location of the file; the rest is handled by the connector.

What features does the AWS S3 connector provide?

Currently the connector provides these main features to support better file handling:

  1. Simple configuration interface for AWS authentication and file location
  2. CRUD operations in Amazon S3 buckets
  3. Location sharing with result variables
  4. High quality standards

With this set of features we can already implement most of the main scenarios for handling files: we can, for example, download an input file, parse it, and use its content to do some processing. The downloaded files are shared via the local filesystem, and their location is returned by the connector. If you decide to generate a result, you can also save it locally and upload it back to the cloud, since the file handling APIs are accessible to connectors and even to a JobWorker (or a JavaDelegate) running in the same application. Another process can react to that upload by listening for events on the bucket via Camunda's existing Amazon Simple Notification Service (SNS) inbound connector and process that file as well. Additionally, the AWS S3 connector is implemented and tested to the same high standards we would apply when implementing it for a paying customer. This makes it easy for customers to focus on the functionality and not worry about quality and reliability in production.
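As a hedged sketch of the download-then-process part of that scenario (the job type and the name of the result variable holding the local path are assumptions of mine, not taken from the connector's documentation, and "client" is the ZeebeClient from the earlier sketch): the S3 connector downloads the object to the local file system, and a plain job worker in the same application picks the file up via the shared path.

// Runs in the same application (and file system) as the connector runtime;
// "client" is the ZeebeClient from the earlier sketch.
client.newWorker()
    .jobType("process-downloaded-file")   // hypothetical job type
    .handler((jobClient, job) -> {
        // Assumed name of the result variable written by the preceding S3 download task.
        String localPath = (String) job.getVariablesAsMap().get("downloadedFilePath");

        var lines = java.nio.file.Files.readAllLines(java.nio.file.Path.of(localPath));

        // Hand back only small, decision-relevant data; the file itself stays on disk.
        jobClient.newCompleteCommand(job.getKey())
            .variables(java.util.Map.of("recordCount", lines.size()))
            .send()
            .join();
    })
    .open();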

Interested in using it? You can find the connector in the Camunda Marketplace for Connectors and the source code in the GitHub repository.


Stefan Schultz

I'm a Principal Software Engineer, Process Automation Expert and People Lead @ Consid GmbH