Auto Scaling Apache Flink Pipelines using Kubernetes, Flinkoperator, and HPA

Introduction to Lyft Operator

Flinkk8sOperator is an open-source Kubernetes operator from Lyft (https://github.com/lyft/flinkk8soperator) that manages Apache Flink applications on Kubernetes. The operator acts as a control plane to manage the complete deployment lifecycle of a Flink application.

apiVersion: flink.k8s.io/v1beta1
kind: FlinkApplication
metadata:
name: wordcount-operator-example
namespace: flink-operator
annotations:
labels:
environment: development
spec:
image: docker.io/lyft/wordcount-operator-example:{sha}
deleteMode: None
flinkConfig:
taskmanager.heap.size: 200
taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 10m
state.backend.fs.checkpointdir: file:///checkpoints/flink/checkpoints
state.checkpoints.dir: file:///checkpoints/flink/externalized-checkpoints
state.savepoints.dir: file:///checkpoints/flink/savepoints
web.upload.dir: /opt/flink
jobManagerConfig:
resources:
requests:
memory: "200Mi"
cpu: "0.1"
replicas: 1
taskManagerConfig:
taskSlots: 2
resources:
requests:
memory: "200Mi"
cpu: "0.1"
flinkVersion: "1.8"
jarName: "wordcount-operator-example-1.0.0-SNAPSHOT.jar"
parallelism: 3
entryClass: "org.apache.flink.WordCount"
  1. Operator bootstraps Flink Cluster by creating Job Manager and Task Manager pods.
  2. When the Flink Cluster is ready, the operator performs job submit using the Job Manager Rest APIs. If the Cluster fails to start, the operator updates the application status to deploy failed.

Auto Scaling in Kubernetes using HPA

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a replication controller, deployment, replica set, or stateful set based on observed CPU/memory utilization (or with custom metrics support on some Prometheus application-provided metrics).

  1. Understanding how HPA works
  • During each period, the controller queries the per-pod resource metrics (like CPU) from the metrics server (running in Kubernetes Control Plane) against the metrics specified in each HPA resource.
  • The controller then takes the mean of the utilization across all targeted pods and produces a ratio used to scale the number of desired replicas.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
labels:
app: foo
version: 1.0.0
name: foo
namespace: default
spec:
maxReplicas: 4
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: foo
targetCPUUtilizationPercentage: 50
  • ceil(80⁄50 * 2) = 4 (Desired Replicas)
  • Additional Replicas to be provisioned -> 4–2 = 2 (Desired Replicas — Current Replicas)
  • ceil(20⁄50 * 4) = 2 (Desired Replicas)
  • Replicas to be terminated -> 2–4 = -2 (Desired Replicas — Current Replicas)

Auto Scaling Flink pipelines using HPA

As explained in the previous section HPA can only with Kubernetes Deployment, Replica Set, and Stateful Set because these resources have subresource called “Scale”.

subresources:
status: {}
scale:
labelSelectorPath: .status.selector
specReplicasPath: .spec.taskManagerConfig.replicas
statusReplicasPath: .status.clusterStatus.numberOfTaskManagers

Putting it all together

When a Flink application with HPA definition is installed to the Cluster, the HPA controller will calculate desired TaskManager pods using the below algorithm.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
labels:
app: foo
version: 1.0.0
name: foo
namespace: default
spec:
maxReplicas: 4
minReplicas: 1
scaleTargetRef:
apiVersion: flink.k8s.io/v1beta1
kind: FlinkApplication
name: foo
targetCPUUtilizationPercentage: 40

Conclusion

In this post, I showed how to put together incredibly powerful patterns in Kubernetes — HPA, Operator, Custom Resources to scale a distributed Apache Flink Application. For all the criticism of complexity in Kubernetes, use cases like these really showcase where Kubernetes really shines.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Krishna Arava

Krishna Arava

Cloud Native Engineer — CKA, CKAD certified @Cisco. Follow me https://twitter.com/krisharava