Automating the Delivery of Apache Flink Applications

Rashid
6 min read · Jun 30, 2024

Automation has emerged as a critical practice in modern software development. Traditional development methodologies often struggle to meet today’s business demands to innovate and deliver value to customers rapidly, causing delays, errors, and increased costs. This puts pressure on organizations to look for alternatives that let them modernize existing applications with speed and agility, deliver modern applications, and keep pace with business demands.

This blog explores one of these automation practices, GitOps, which aims to make software delivery more efficient and consistent.

What is GitOps?

GitOps is an operational framework for managing infrastructure and applications as code using source control. Several tools can assist in implementing the framework; today, we will examine Argo CD more closely.

Argo CD is a powerful open-source platform that enables declarative continuous delivery for Kubernetes. It automatically synchronizes and deploys resources when developers make changes in source control, such as GitHub or Bitbucket repositories, following a set of predefined rules and configurations.

The platform allows you to define an application as a single unit, including its components, dependencies, and configuration, and then quickly deploy it across multiple environments.

The process works as follows:

  1. You store your application code in a Git repository, which serves as the single source of truth.
  2. Argo CD continuously monitors your Git repository and synchronizes it with the desired environment, such as an OpenShift cluster (a minimal Application manifest is sketched after this list).
  3. When changes are detected, Argo CD applies them to the environment. This means that you can update your application code in the Git repository, and Argo CD will automatically deploy those changes without any manual intervention.
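
For illustration, here is a minimal sketch of an Argo CD Application manifest that wires a Git repository to a target cluster; the repository URL, path, names, and namespaces are hypothetical placeholders.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: flink-emea-orders        # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/flink-gitops.git  # hypothetical repository
    targetRevision: main
    path: deploy                 # folder holding the Kubernetes manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: flink
  syncPolicy:
    automated:                   # sync automatically when the repository changes
      prune: true                # delete resources removed from Git
      selfHeal: true             # revert manual drift back to the Git state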

One of the key benefits of using Argo CD is its ability to roll back to a previous version of your application if something fails during deployment. If an issue arises, you can revert to a specific commit or revision in your Git repository, and Argo CD will restore the previous state of your application.
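
For illustration, one declarative way to express such a rollback is to pin the Application’s targetRevision to a known-good commit instead of a moving branch; the repository URL and commit SHA below are hypothetical.

spec:
  source:
    repoURL: https://github.com/example/flink-gitops.git  # hypothetical repository
    targetRevision: 3f2a9c1  # hypothetical known-good commit to restore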

Figure 1. Example of an EDA architecture using GitOps

It is also worth mentioning that Argo CD maintains a history of changes, which provides valuable insight and auditing capabilities. It records all changes made to the environment, including deployments, rollbacks, and other events, so developers can view detailed logs of Argo CD’s actions, providing a clear audit trail for compliance and troubleshooting purposes.

Let us put GitOps into action

The following section explores how the GitOps practice can simplify software development with Apache Flink. As a distributed processing framework, Apache Flink can significantly benefit from GitOps practices.

Here are some of the key benefits:

  • Version Control: With GitOps, Apache Flink resources are versioned and managed in source control, such as Git. This allows developers to track changes, collaborate more effectively, and maintain a history of all modifications.
  • Declarative Deployments: GitOps relies on declarative deployment, where developers describe the desired state of their Apache Flink applications rather than the steps to achieve it. The GitOps tooling does the heavy lifting, automatically applying changes and ensuring the deployment matches the developers’ specification.
  • Immutable Infrastructure: GitOps uses a single source of truth to apply changes, so resources are replaced with a new deployment when modified. The GitOps principle recommends making changes in source control, i.e., a Git repository, rather than modifying resources in place. This practice helps maintain consistency and reduces the risk of errors.
  • Automation: Integrated with CI/CD pipelines, GitOps provides a secure and controlled environment for deploying Apache Flink applications. It allows easy rollbacks and versioning of deployments, enabling teams to recover quickly from issues or test new versions of their applications.

Apache Flink lets you write jobs in multiple languages, such as Java, Scala, and SQL. The following example shows how to build and automate an Apache Flink SQL job.

Define stream processing logic

In the example below, the flow listens to raw events for new orders and filters out everything except orders from the EMEA region, as specified in the WHERE clause.

The code and changes are expected to be stored and maintained in a source code repository.

-- Source table: raw new-order events read from a Kafka topic
CREATE TABLE `New Orders` (
  `id`         STRING,
  `customerid` STRING,
  `price`      DOUBLE,
  `quantity`   BIGINT,
  `region`     STRING,
  `ordertime`  STRING,
  `event_time` TIMESTAMP(3) METADATA FROM 'timestamp',
  WATERMARK FOR `event_time` AS `event_time` - INTERVAL '1' MINUTE
) WITH (
  'connector' = 'kafka',
  'topic' = 'ORDERS.NEW',
  'properties.bootstrap.servers' = 'bootstrap:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Keep only the EMEA orders
CREATE TEMPORARY VIEW `EMEA Orders` AS
SELECT * FROM `New Orders`
WHERE `region` = 'EMEA';

-- Sink table: filtered orders published back to Kafka
CREATE TABLE `Publish EMEA Orders` (
  `id`         STRING,
  `customerid` STRING,
  `price`      DOUBLE,
  `quantity`   BIGINT,
  `region`     STRING,
  `ordertime`  STRING,
  `event_time` TIMESTAMP(3)  -- matches the precision of the source column
) WITH (
  'connector' = 'kafka',
  'topic' = 'ORDERS.NEW.EMEA',
  'properties.bootstrap.servers' = 'bootstrap:9092',
  'format' = 'json'
);

-- Continuously copy the filtered stream into the sink
INSERT INTO `Publish EMEA Orders` SELECT * FROM `EMEA Orders`;

Developers can use any tool to write flow scripts; IBM Event Automation, for example, offers a low-code user interface that enables a broad range of users to work with events and the Apache Flink processing engine. The SQL statements can be exported from the Event Processing UI and saved to a file, for example emea-order.sql.

Build a custom container image

Use a Dockerfile to define the image instructions, as follows:

# Start from the IBM Event Automation Flink base image
FROM icr.io/cpopen/ibm-eventautomation-flink/ibm-eventautomation-flink:1.1.8

# Bake the SQL Runner jar and the exported SQL scripts into the image
RUN mkdir /opt/flink/usrlib
ADD target/flink-sql-runner-example-*.jar /opt/flink/usrlib/sql-runner.jar
ADD sql-scripts /opt/flink/usrlib/sql-scripts

Apache Flink does not currently support submitting external SQL scripts directly as jobs (so-called "fried" provisioning). The community has created a Flink Java application called SQL Runner that executes SQL scripts in a TableEnvironment as if they were Flink application jars. The script we defined earlier is loaded into the /opt/flink/usrlib/sql-scripts folder (so-called "baked" provisioning).

Continuous Delivery of Flink jobs

As mentioned earlier, once Argo CD is hooked up to source control, it continuously monitors the repository and applies changes to the registered environments. Argo CD is responsible for creating the required resources; in this case, a FlinkDeployment custom resource is provided.
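
Below is a minimal sketch of such a FlinkDeployment, assuming the Apache Flink Kubernetes Operator’s flink.apache.org/v1beta1 API and the custom image built above; the image tag, namespace, Flink version, and resource sizes are hypothetical placeholders.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: emea-orders
  namespace: flink
spec:
  image: registry.example.com/flink-sql/emea-orders:1.0.0  # hypothetical tag for the custom image
  flinkVersion: v1_18
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/sql-runner.jar        # the SQL Runner baked into the image
    args: ["/opt/flink/usrlib/sql-scripts/emea-order.sql"]  # the exported SQL script
    parallelism: 1
    upgradeMode: stateless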

Here are some tips for working with GitOps.

  • Store your resources as code in a version control system like Git. This allows you to track changes to your infrastructure and applications over time and quickly revert to previous versions if needed.
  • Use a pull-based deployment model, where changes are pulled from Git and deployed to your infrastructure. This helps ensure that your environments always reflect the latest changes defined in your Git repository.
  • Take responsibility for your resources by continuously monitoring them to ensure they run as expected. This could involve monitoring tools like Prometheus or Grafana to track metrics, with dashboards to see what is happening in your environment in real time (a sample metrics configuration is sketched after this list).
  • Consider using a GitOps tool like Argo CD to automate the deployment of your infrastructure based on changes made to your Git repository. Such tools keep your infrastructure up to date with the latest changes defined in Git and can also automate the rollback of failed deployments.
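
As a sketch of the monitoring tip above, Flink can expose metrics to Prometheus through its metrics reporter; the fragment below shows the relevant flinkConfiguration entries on a FlinkDeployment, assuming the Prometheus reporter is available in the image.

spec:
  flinkConfiguration:
    # Expose Flink metrics on each pod for Prometheus to scrape
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    metrics.reporter.prom.port: "9249"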

In summary, a GitOps approach involves using automation tools, such as Argo CD, to ensure that your infrastructure and applications are always in sync with the desired state defined in your Git repository, the single source of truth. Apache Flink can benefit from GitOps practices by leveraging version control for workflows and configurations, declarative deployments, immutable infrastructure, CI/CD pipelines, and observability and monitoring features. By adopting these practices, teams can improve the efficiency, consistency, and reliability of their Apache Flink deployments, as well as of any other modern applications.
