Apache Flink Stateful Functions deployment

Dmytro Vedetskyi
DevOops World … and the Universe
6 min read · Jan 26, 2021

Overview:

Stateful Functions is an API that simplifies the building of distributed stateful applications with a runtime built for serverless architectures. It brings together the benefits of stateful stream processing — the processing of large datasets with low latency and bounded resource constraints — along with a runtime for modeling stateful entities that supports location transparency, concurrency, scaling, and resiliency.

Flink StateFun functions can be written in several programming languages, such as Java, Python, Go, Rust, or Haskell.

Flink job deployment types:

Flink can execute applications in one of three ways:

  • in Application Mode
  • in Per-Job Mode
  • in Session Mode

The above modes differ in:

  • the cluster lifecycle and resource isolation guarantees
  • whether the application’s main() method is executed on the client or on the cluster.

Application Mode

Pros:
* Submitting applications that consist of multiple jobs
* Specific cluster configuration per application
* Behaves like a session cluster scoped to a single application

Cons:
* Separate cluster per application
* Resource overhead (CPU, RAM, network)

Per-Job Mode

Pros:
* Resource isolation
* A dedicated cluster is spun up for each submitted job

Cons:
* Separate cluster per job

Session Mode

Pros:
* Shared cluster and resources
* A single cluster serves all jobs

Cons:
* Jobs can affect each other
* Resources are shared across all jobs
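
As a rough command-line sketch of how the mode is chosen (assuming a YARN setup on Flink 1.11+; Kubernetes and standalone deployments have analogous targets), using the StateFun entry class shown later in this article:

# Application Mode: main() runs on the cluster, which exists only for this application
./bin/flink run-application -t yarn-application -c org.apache.flink.statefun.flink.core.StatefulFunctionsJob ./statefun-example.jar

# Per-Job Mode: a dedicated cluster is spun up for this single job
./bin/flink run -t yarn-per-job -c org.apache.flink.statefun.flink.core.StatefulFunctionsJob ./statefun-example.jar

# Session Mode: submit to an already running session cluster at <jobmanager-host>
./bin/flink run -m <jobmanager-host>:8081 -c org.apache.flink.statefun.flink.core.StatefulFunctionsJob ./statefun-example.jar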

Stateful Functions building and deployment

Stateful Functions applications can be packaged as either standalone applications or Flink jobs that can be submitted to a cluster.

Building

Docker Image

The recommended deployment mode for Stateful Functions applications is to build a Docker image. This way, user code does not need to package any Apache Flink components. The provided base image allows teams to package their applications with all the necessary runtime dependencies quickly.

Example Dockerfile

FROM flink-statefun:2.2.0

RUN mkdir -p /opt/statefun/modules/statefun-example
RUN mkdir -p /opt/statefun/modules/remote

COPY target/statefun-example*jar /opt/statefun/modules/statefun-example/
COPY module.yaml /opt/statefun/modules/remote/module.yaml

The official Docker image is still in development, so you can use the Ververica Docker image flink-statefun:2.2.0 instead.

Flink jar job

If you prefer to package your job to submit to an existing Flink cluster, simply include statefun-flink-distribution as a dependency in your application.

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>statefun-flink-distribution</artifactId>
  <version>2.2.0</version>
</dependency>

To start your function, use:

./bin/flink run -c org.apache.flink.statefun.flink.core.StatefulFunctionsJob ./statefun-example.jar

The following configuration is strictly required for running a StateFun application; add it to the cluster’s flink-conf.yaml:

classloader.parent-first-patterns.additional: org.apache.flink.statefun;org.apache.kafka;com.google.protobuf

Deployment

Remote Functions

Functions can be executed remotely, with message and state access provided as part of the invocation request. This way, functions can be managed independently, like stateless processes. The state/messaging tier (i.e., the Flink processes) and the function tier are deployed, managed, and scaled independently.

Function invocations happen through an HTTP/gRPC protocol and go through a service that routes invocation requests to any available endpoint, for example a Kubernetes (load-balancing) service, the AWS request gateway for Lambda, etc. Because invocations are self-contained (contain message, state, access to timers, etc.) the target functions can be treated like any stateless application.
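
Remote endpoints are declared in the module.yaml copied into the image above. A minimal sketch, assuming a hypothetical function type example/greeter served by a python-worker service on port 8000 (names and values are illustrative):

version: "2.0"
module:
  meta:
    type: remote
  spec:
    functions:
      - function:
          meta:
            kind: http
            type: example/greeter                          # namespace/name of the remote function
          spec:
            endpoint: http://python-worker:8000/statefun   # where invocation requests are sent
            states:
              - seen_count                                 # persisted state shipped with each invocation
            maxNumBatchRequests: 500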

Co-located Functions

An alternative way of deploying functions is co-location with the Flink JVM processes. In such a setup, each Flink TaskManager would talk to one Function process sitting “next to it”. A common way to do this is to use a system like Kubernetes and deploy pods consisting of a Flink container and the function side-car container; the two communicate via the pod-local network.

This mode supports different languages while avoiding routing invocations through a Service/LoadBalancer, but it cannot scale the state and compute parts independently.
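
A minimal sketch of such a pod, assuming hypothetical image names (this is illustrative, not an official manifest):

apiVersion: v1
kind: Pod
metadata:
  name: statefun-worker
spec:
  containers:
    - name: flink-worker            # Flink TaskManager running the StateFun runtime
      image: my-registry/flink-statefun-worker:2.2.0
    - name: function-sidecar        # the function process, reachable over the pod-local network
      image: my-registry/greeter-function:latest
      ports:
        - containerPort: 8000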

Embedded Functions

Embedded Functions are similar to the execution mode of Stateful Functions 1.0 and to Flink’s Java/Scala stream processing APIs. Functions are run in the JVM and are directly invoked with the messages and state access. This is the most performant way, though at the cost of only supporting JVM languages. Updates to functions mean updating the Flink cluster.
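
A minimal sketch of an embedded function written with the Java SDK (GreeterFunction and its greeting logic are illustrative):

import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;
import org.apache.flink.statefun.sdk.annotations.Persisted;
import org.apache.flink.statefun.sdk.state.PersistedValue;

public class GreeterFunction implements StatefulFunction {

  // Persisted state is managed by Flink and restored after failures.
  @Persisted
  private final PersistedValue<Integer> seenCount =
      PersistedValue.of("seen-count", Integer.class);

  @Override
  public void invoke(Context context, Object input) {
    int seen = seenCount.getOrDefault(0) + 1;
    seenCount.set(seen);
    // Reply to the calling function/ingress with a plain String greeting.
    context.reply("Hello for the " + seen + "th time!");
  }
}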

Flink Docker deployment

Kudo deployment

Kudo makes it easy to deploy a Flink cluster together with JAR jobs.

Here is an example of a job-uploader container that downloads a JAR and submits it to the Flink cluster.

The job will use the Kafka instance deployed via Kudo for its events.

command:
  - "/usr/local/bin/bash"

args: ['-c',
  'export JOB_FILENAME=$(basename $DOWNLOAD_URL);
   echo "DOWNLOAD_URL: $DOWNLOAD_URL FILE: $JOB_FILENAME JOBMANAGER: $JOBMANAGER";
   apk add --no-cache jq curl;
   curl -s $DOWNLOAD_URL -o $JOB_FILENAME;
   curl -s -X POST -H "Expect:" -F "jarfile=@$JOB_FILENAME" $JOBMANAGER:8081/jars/upload;
   while true; do date;
   export JAR_ID=$(curl -s $JOBMANAGER:8081/jars | jq -r ".files[].id");
   if [ -z "$JAR_ID" ];
   then
     echo "No uploaded jar detected";
   else
     echo "Found jar $JAR_ID";
     export SUBMIT_MSG=$(curl -s -X POST -H "Expect:" $JOBMANAGER:8081/jars/$JAR_ID/run?program-args=--kafka_host%20kafka-kafka-1.kafka-svc.default.svc.cluster.local:9093 | jq -r ".errors");
     echo "RESPONSE: $SUBMIT_MSG";
     if [ "$SUBMIT_MSG" == "null" ];
     then
       echo "SUBMITTED JOB!";
       exit 0;
     else
       echo "Failed to submit job: $SUBMIT_MSG";
     fi;
   fi;
   echo "=====================";
   sleep 5; done;']

K8s Helm chart

StateFun has a Dockerfile and a build script in its repository that are very useful for building images:

https://github.com/apache/flink-statefun/tree/master/tools/docker

A Helm chart is there as well; it deploys a Flink cluster with your function in place:
https://github.com/apache/flink-statefun/tree/master/tools/k8s
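
A hedged sketch of rendering and applying that chart (the release name and the --set value are illustrative; check the chart’s values.yaml for the real options):

# render the manifests from the chart in tools/k8s and apply them
# (worker.replicas is an illustrative value name, not necessarily the chart's)
helm template statefun ./tools/k8s --set worker.replicas=2 > statefun.yaml
kubectl apply -f statefun.yaml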

Docker compose

version: "2.1"
services:
zookeeper:
image: wurstmeister/zookeeper
ports:
- "2181:2181"
kafka-broker:
image: wurstmeister/kafka:2.12-2.0.1
ports:
- "9092:9092"
environment:
HOSTNAME_COMMAND: "route -n | awk '/UG[ \t]/{print $$2}'"
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
depends_on:
- zookeeper
volumes:
- /var/run/docker.sock:/var/run/docker.sock
master:
build:
context: .
expose:
- "6123"
ports:
- "8081:8081"
environment:
- ROLE=master
- MASTER_HOST=master
volumes:
- ./checkpoint-dir:/checkpoint-dir
db:
image: postgres
environment:
- POSTGRES_DB=postgres
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
ports:
- "5432:5432"
worker:
build:
context: .
expose:
- "6121"
- "6122"
depends_on:
- master
- kafka-broker
links:
- "master:master"
- "kafka-broker:kafka-broker"
environment:
- ROLE=worker
- MASTER_HOST=master
volumes:
- ./checkpoint-dir:/checkpoint-dir
python-worker:
build:
context: ./greeter
expose:
- "8000"
event-generator:
build:
context: generator
dockerfile: Dockerfile
depends_on:
- kafka-broker
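
To bring the stack up locally (standard Docker Compose commands; the Flink web UI is then available on localhost:8081, as mapped for the master service above):

# build the images and start everything in the background
docker-compose up -d --build

# follow the logs of the StateFun master
docker-compose logs -f master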

You can download a skeleton project here https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/downloads/walkthrough.zip

Summary:

Stateful Functions is a useful and modern serverless solution. It is a framework built on top of the Apache Flink runtime, which means it inherits Flink’s deployment and operations model, and there are no new concepts you need to learn.

It is easy to deploy, with several deployment types to choose from depending on your needs and business goals. Which method is better is for you to decide, as each of them has its place. Stateful Functions can be delivered using common, familiar methods, and its main advantages are built-in high availability and scalability.

Flink also helps you monitor Stateful Functions and shows their status in its web UI.

URLs:

https://en.wikipedia.org/wiki/Apache_Flink
https://flink.apache.org/news/2020/11/11/release-statefun-2.2.1.html
https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/concepts/distributed_architecture.html
https://github.com/apache/flink-statefun/tree/master/tools
https://github.com/kudobuilder/operators/tree/master/repository/flink
https://github.com/apache/flink-statefun/tree/master/tools/docker
https://github.com/apache/flink-statefun/tree/master/tools/k8s
https://ci.apache.org/projects/flink/flink-docs-stable/deployment/
