This post details my experience setting up a secure multi-node NiFi (1.9.2) cluster on Kubernetes (AWS EKS). I used the OIDC protocol for authentication, with a Keycloak server acting as an identity broker to my organization’s Google IAM. There were a few ‘gotchas’ that took me way longer to figure out than they should have. The information on this topic is pretty scattered, and I had to dig deep through NiFi’s documentation, source code, user mailing list archives and a lot of blog posts. So I figured documenting my experience would be a useful thing to do.
Stateful sets: When deploying NiFi to Kubernetes, I find it best to deploy it as a stateful set. A stateful set guarantees stable identities for its pods and can also enforce ordered creation, update and deletion of those pods. Pods, by nature, are ephemeral. They can be terminated and recreated whenever Kubernetes sees fit. Usually, when a pod that is part of a replica set, a daemon set or a deployment is recreated, its identity, i.e. its name, changes as well. For example, if you were to deploy a single NiFi pod as a replica set, on initial creation the pod name might be nifi-abcd, but when the pod is destroyed and recreated, its name might be nifi-pqrs. Just like the pod, its identity is ephemeral as well.
This is clearly not a desirable situation to be in, because nodes in a NiFi cluster are constantly communicating with each other (sending heartbeats, syncing state and whatnot). The cluster can’t stay healthy if the identities of the participating nodes are not stable. This is where using a stateful set to deploy NiFi comes in real handy. If I set the number of replicas in my stateful set to 3, Kubernetes creates 3 pods with the stable identities nifi-0, nifi-1 and nifi-2. These identities are persistent and outlive any pod recreations.
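To make the naming rule concrete, here is a minimal sketch of how a stateful set derives its pod names: the stateful set's name plus an ordinal index, starting at 0 by default. The function name statefulset_pod_names is made up for illustration, and the stateful set name nifi is assumed from the pod names used in this post.

```python
from typing import List


def statefulset_pod_names(name: str, replicas: int) -> List[str]:
    """Return the pod names of a stateful set: <name>-<ordinal>.

    Ordinals start at 0 (the Kubernetes default) and go up to replicas - 1.
    This helper is hypothetical; it only mirrors the naming convention.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


if __name__ == "__main__":
    # Assumed stateful set name "nifi" with 3 replicas.
    for pod_name in statefulset_pod_names("nifi", 3):
        print(pod_name)
```

Because the names depend only on the stateful set name and the ordinal, a recreated pod comes back under the same name it had before.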
A stateful set also provides a stable network identity to each pod. Typically, direct pod-to-pod communication rarely happens in Kubernetes. Any outbound network call that a pod makes by hostname first has to be resolved by kube-dns, and only then is the call routed to the appropriate destination. If you don’t already know, pod identities are not resolvable by kube-dns; only Kubernetes service identities are resolvable…
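As a rough illustration of what a stable network identity looks like, the sketch below builds the DNS name a stateful set pod gets when the stateful set is governed by a headless service: pod name, service name, namespace, then the cluster domain. The service name nifi-headless, the namespace default and the helper pod_dns_name are assumptions made up for this example; cluster.local is the default cluster domain and may differ in your cluster.

```python
def pod_dns_name(
    pod: str,
    service: str,
    namespace: str = "default",
    cluster_domain: str = "cluster.local",
) -> str:
    """Build the per-pod DNS name for a stateful set pod.

    Pattern: <pod>.<governing service>.<namespace>.svc.<cluster domain>
    All argument values used below are illustrative assumptions.
    """
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Hypothetical headless service "nifi-headless" fronting a 3-replica stateful set "nifi".
    for ordinal in range(3):
        print(pod_dns_name(f"nifi-{ordinal}", "nifi-headless"))
```

These names stay the same across pod recreations, which is what lets NiFi nodes keep addressing each other by hostname.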