This blog details my experience setting up a secure multi-node NiFi (1.9.2) cluster on Kubernetes (AWS EKS). I used the OIDC protocol for authentication (I set up a Keycloak server that acts as an identity broker to my organization’s Google IAM). There were a few ‘gotchas’ that took me way longer to figure out than they should have. The information on this topic is pretty scattered and I had to dig deep through NiFi’s documentation, source code, user mail archives and a lot of blog posts. So I figured documenting my experiences would be a useful thing to do.
Stateful sets — While deploying NiFi to Kubernetes, I find it best to deploy NiFi as a stateful set. A stateful set guarantees stable identities for Kubernetes pods and can also enforce ordered creation, update and deletion of the pods. Pods, by nature, are ephemeral. They can be terminated and recreated as the Kubernetes scheduler deems fit. Usually, when a pod that is part of a replica set, a daemon set or a deployment is recreated, its identity, i.e., its name, changes as well. For example, if you were to deploy a single NiFi pod as part of a replica set, on initial creation the pod name might be nifi-abcd, but when the pod is destroyed and recreated, its name might be nifi-pqrs. Just like the pod, its identity is ephemeral as well.
This is clearly not a desirable situation to be in, because nodes in a NiFi cluster should always be communicating with each other (sending heartbeats, syncing state and whatnot). The cluster can’t be in a healthy state if the participating node identities are not stable. This is where using a stateful set to deploy NiFi comes in real handy. If I specify the number of replicas in my stateful set to be 3, then Kubernetes creates 3 pods with stable node identities of nifi-0, nifi-1 and nifi-2. These identities are persistent and outlive any pod recreations.
A stateful set also provides stable network identities to each pod. Typically, direct pod-to-pod communication rarely ever happens in Kubernetes. Any outbound network call from within a pod has to first be resolved by kube-dns, and only then is the network call routed to the appropriate destination. If you don’t already know, pod identities are not resolvable by kube-dns; only Kubernetes service identities are. Stateful sets allow the creation of stable network identities for pods by creating what is known as a headless service. A headless service is a Kubernetes service that does not have a cluster IP associated with it — it inserts entries into the DNS tables that point directly to each pod that was created as part of the stateful set. If I created a headless service called nifi-hs along with my stateful set in the namespace nifi, then the entries in the DNS table would look like —
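A minimal sketch of such a headless service; the app: nifi selector and the 8443 port are assumptions that must match your stateful set:

```yaml
# Headless service: clusterIP: None is what makes kube-dns create
# per-pod DNS records instead of a single service VIP.
apiVersion: v1
kind: Service
metadata:
  name: nifi-hs
  namespace: nifi
spec:
  clusterIP: None
  selector:
    app: nifi
  ports:
    - name: https
      port: 8443
```

With three replicas, the resulting DNS entries point directly at the pods:

nifi-0.nifi-hs.nifi.svc.cluster.local
nifi-1.nifi-hs.nifi.svc.cluster.local
nifi-2.nifi-hs.nifi.svc.cluster.local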
Why is this information relevant, you ask? Well, when a NiFi node that has its nifi.cluster.is.node property set to true starts up, it goes and registers itself with the Zookeeper instance that is specified in the nifi.zookeeper.connect.string property. This information is then propagated to the other participating nodes in the cluster (or is used during the initial cluster setup for the discovery of other nodes). The NiFi node registers with Zookeeper the value specified in its nifi.cluster.node.address property as the endpoint at which it can be reached. This endpoint should be the FQDN of the pod, which, in the case of a stateful set, is defined by the corresponding headless service. The FQDN for the pod nifi-0 will be nifi-0.nifi-hs.nifi.svc.cluster.local — the same can be confirmed by running hostname -f in the container or by going through the contents of the /etc/hosts file in the container.
Pod Scheduling — My experience with Kubernetes before this endeavor was fairly limited and I was under the (false) impression that Kubernetes handles everything related to scheduling and I don’t have to worry about planning for capacity or what my network should look like, etc. I was wrong. When it comes to stateful sets, you have to think ahead and plan for capacity. Ideally, you’d want dedicated nodes for your NiFi deployment so that other pods don’t interfere, i.e., compete for CPU and memory with the NiFi workloads. You’d also want to ensure that each NiFi pod is deployed on a separate worker node — high availability and all. To make this happen, you would have to add labels and taints to your worker nodes so that the NiFi pods can have corresponding affinities and tolerations.
To ensure not more than one NiFi pod is scheduled on a single worker node —
# pod spec -> affinity
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          # Pods should have the corresponding label app: nifi
          - key: "app"
            operator: In
            values:
              - nifi
      topologyKey: "kubernetes.io/hostname"
To ensure that the NiFi pods are scheduled on specific worker nodes —
# This should be done while provisioning the EKS cluster - ideally
# via an IaC tool like eksctl
kubectl label node nifi-node-0 nifi=1
kubectl label node nifi-node-1 nifi=1
kubectl label node nifi-node-2 nifi=1

# pod spec -> affinity
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: "nifi"
            operator: In
            values:
              - "1"
You can also ensure no other pods are scheduled on the worker nodes running NiFi by specifying the right taints (on the worker node) and tolerations (on the NiFi pods).
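A sketch of that taint/toleration pairing, assuming the same three dedicated workers; the dedicated=nifi key and value are arbitrary names of my choosing:

```
# Taint the dedicated workers so ordinary pods are repelled from them
kubectl taint node nifi-node-0 dedicated=nifi:NoSchedule
kubectl taint node nifi-node-1 dedicated=nifi:NoSchedule
kubectl taint node nifi-node-2 dedicated=nifi:NoSchedule

# pod spec -> tolerations (only the NiFi pods tolerate the taint)
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "nifi"
    effect: "NoSchedule"
```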
Volume Binding Mode — In your stateful set’s volumeClaimTemplates section, ensure that you specify a storage class that has a volume binding mode of WaitForFirstConsumer instead of Immediate. When a PVC that uses a storage class with volume binding mode Immediate is created, the underlying PV is also immediately created and bound, without any knowledge of where the pod will eventually be scheduled. This means that in scenarios where a pod cannot be scheduled on a worker node that can reach the PV, the pod becomes un-schedulable. (I’m not even going to try and explain the weird combination of events that led to my NiFi pods getting stuck in scheduler limbo because of volume node affinity conflicts — that is perhaps a subject for a dedicated blog post.) To avoid your NiFi pods getting stuck in the ‘Pending’ state, make sure that your storage class has the volume binding mode WaitForFirstConsumer. In this mode, a PV is not created and bound until the pod is created and scheduled on a worker node.
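A sketch of such a storage class for EBS-backed volumes; the name and the gp2 volume type are illustrative:

```yaml
# Storage class that delays PV creation until the pod is scheduled
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nifi-gp2
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
```

The stateful set’s volumeClaimTemplates then reference it via storageClassName: nifi-gp2.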
UI & Clustering — The way NiFi is designed, the UI and the runtime of NiFi are tightly coupled. In a NiFi cluster, it doesn’t matter which node is serving the UI, all nodes will return the same state of the cluster. In a way, the NiFi cluster is Strongly Consistent in the sense that in the case of a network partition, it does not allow any modifications to the cluster state (i.e., the cluster is unavailable for writes, although existing processors still continue to run on available nodes). You can verify this behavior by running
kubectl delete pod -n nifi nifi-2 to remove a node from the NiFi cluster. Then try making changes in the UI; you’ll see the following error message on the UI —
NiFi enforces consistency at the UI layer as well. It does this by performing what it calls HTTP Request Replication. When any NiFi node that is part of a cluster receives an HTTP request, it proxies the same request to all the other nodes in the cluster, merges the responses received from the other nodes and then finally sends an HTTP response back to the client (the UI in this case).
Secure Cluster Communication — When configured to be secure, a NiFi cluster expects not only the users to be authenticated but also the nodes themselves. Node authentication is required for the HTTP Request Replication that happens when a clustered NiFi node receives an HTTP request. That is why, while configuring NiFi to be secure, you have to specify identities for both users and nodes. For a 3 node cluster, conf/authorizers.xml may look something like this -
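A sketch of that file for the three pods above; the CN=admin, OU=NIFI initial admin identity is an assumption, and the node DNs must match the certificates generated by the TLS toolkit later on:

```xml
<authorizers>
    <userGroupProvider>
        <identifier>file-user-group-provider</identifier>
        <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
        <property name="Users File">./conf/users.xml</property>
        <property name="Initial User Identity 1">CN=admin, OU=NIFI</property>
        <property name="Initial User Identity 2">CN=nifi-0.nifi-hs.nifi.svc.cluster.local, OU=NIFI</property>
        <property name="Initial User Identity 3">CN=nifi-1.nifi-hs.nifi.svc.cluster.local, OU=NIFI</property>
        <property name="Initial User Identity 4">CN=nifi-2.nifi-hs.nifi.svc.cluster.local, OU=NIFI</property>
    </userGroupProvider>
    <accessPolicyProvider>
        <identifier>file-access-policy-provider</identifier>
        <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
        <property name="User Group Provider">file-user-group-provider</property>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
        <property name="Node Identity 1">CN=nifi-0.nifi-hs.nifi.svc.cluster.local, OU=NIFI</property>
        <property name="Node Identity 2">CN=nifi-1.nifi-hs.nifi.svc.cluster.local, OU=NIFI</property>
        <property name="Node Identity 3">CN=nifi-2.nifi-hs.nifi.svc.cluster.local, OU=NIFI</property>
    </accessPolicyProvider>
    <authorizer>
        <identifier>managed-authorizer</identifier>
        <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
        <property name="Access Policy Provider">file-access-policy-provider</property>
    </authorizer>
</authorizers>
```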
The userGroupProvider defines user identities for each node, which are then referred to in the accessPolicyProvider section’s node identities. We modify the conf/authorizers.xml file on container startup (covered in the following sections).
Secure Configuration — While configuring a NiFi node to be secure, you have to specify the nifi.web.https.host property. Not specifying it means that the embedded Jetty server will bind to all available hostnames, which is the expected behavior. But what threw me off was that when the nifi.web.https.host property is empty, any incoming request that has to be replicated will be sent to localhost (because NiFi does not know which of the hostnames it should replicate the request to, it uses localhost instead). This does not work in a cluster, and you end up getting an error in the UI. Refer to this JIRA ticket for more details.
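On each pod, the property should therefore be set to that pod’s own FQDN. For nifi-0, that would look like this (in practice the startup script fills it in from hostname -f); the 8443 port is an assumption:

```
# conf/nifi.properties (on nifi-0)
nifi.web.https.host=nifi-0.nifi-hs.nifi.svc.cluster.local
nifi.web.https.port=8443
```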
TLS Toolkit — When provisioning a cluster, it is best to use the TLS toolkit in client/server mode. This way, all NiFi nodes receive their certificates from a common certificate authority. I set up a single node replica-set that acts as a CA and a service to front it —
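A sketch of that CA replica-set and its fronting service; the apache/nifi-toolkit image name and the 8443 CA port are assumptions (verify both for your toolkit version), and the token is a shared secret you choose:

```yaml
# CA pod: runs the TLS toolkit in server mode
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nifi-ca
  namespace: nifi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nifi-ca
  template:
    metadata:
      labels:
        app: nifi-ca
    spec:
      containers:
        - name: nifi-ca
          image: apache/nifi-toolkit:1.9.2   # assumed image name
          args: ["tls-toolkit.sh", "server", "-c", "nifi-ca-cs", "-t", "<token>"]
          ports:
            - containerPort: 8443
---
# Service fronting the CA; its name must match the -c value
apiVersion: v1
kind: Service
metadata:
  name: nifi-ca-cs
  namespace: nifi
spec:
  selector:
    app: nifi-ca
  ports:
    - port: 8443
      targetPort: 8443
```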
Notice in the above gist — tls-toolkit.sh server -c nifi-ca-cs -t <token>. We are instructing tls-toolkit to run in server mode (i.e., act as a CA). The -c option indicates the CA hostname. This means that the CA (the toolkit running in server mode) expects the incoming HTTP request’s host header (sent to it by the toolkit running in client mode) to have the same value that is provided to the -c option. As the NiFi nodes will talk to the CA via a service, -c should be set to the service name of the CA — nifi-ca-cs. In the NiFi pod’s startup script, we’ll instruct the tls-toolkit to run as a client and specify the CA endpoint —
tls-toolkit.sh client -c nifi-ca-cs -t <token> --dn "CN=$(hostname -f), OU=NIFI". Once the keystore and truststore are generated, you’ll have to copy them into the container and reference them in the nifi.properties file. Pierre Villard has an excellent blog on deploying a secure multi-node cluster on Google Cloud. The scripts below have been taken from his blog and slightly modified to fit the Kubernetes scenario.
Stale Zookeeper State — You’ll likely run into this problem when the hostnames that NiFi binds to change. For example, I initially set up the cluster to be non-secure and then made the same cluster secure. Because of this, a NiFi node that had previously registered an HTTP replication endpoint with Zookeeper as nifi-0.nifi-hs.nifi.svc.cluster.local:8080 now registered its HTTP replication endpoint as nifi-0.nifi-hs.nifi.svc.cluster.local:8443 — but the previously registered endpoint is not removed. This causes problems when a node tries to replicate a request to an endpoint that no longer exists — your NiFi cluster will not be available. The only way to overcome this is to delete NiFi’s cluster state from Zookeeper.
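One way to do that, assuming NiFi’s nifi.zookeeper.root.node is left at its default of /nifi and that you can exec into a Zookeeper pod; on Zookeeper 3.4.x, use rmr instead of deleteall:

```
# Connect to Zookeeper from inside one of its pods
zkCli.sh -server localhost:2181

# Inside the CLI: recursively remove NiFi's cluster state
deleteall /nifi
```

The NiFi nodes will re-register their (current) endpoints on restart.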
Handling SSL Termination at Load Balancer — Once the cluster was set up, I had to expose the NiFi pods to the Internet. While there are multiple ways of doing this in Kubernetes (Ingress, Load Balancer, Node Ports), I used a Load Balancer to expose the NiFi pods to the Internet. I did not use Node Ports because I did not want to manage the port assignments myself. We were using an Nginx Ingress for our Kubernetes cluster, and setting up SSL passthrough with proxy request headers was painful. Setting up SSL passthrough is not the most straightforward with Nginx Ingress and neither is it performant — this is because Nginx is an L7 load balancer that simulates SSL passthrough by doing an SSL re-encryption, unlike something like HAProxy, which can act as an L4 load balancer and allows true SSL passthrough via TCP proxying. Passthrough also means the proxy cannot modify HTTP headers — which is required when deploying NiFi behind a proxy.
So I ended up using a Load Balancer service. Now the problem with the load balancer service (and this is specific to the organization that I’m working for right now) is that it creates an ELB, and SSL termination is enforced on all ELBs by our infrastructure team. This meant that while HTTPS traffic was arriving at the ELB, the traffic from the ELB to the pod was plain HTTP — which was a problem because NiFi won’t enable authentication UNLESS it receives traffic over HTTPS. So to overcome this (without wrestling with the org policies and the infrastructure team), I wrote a very simple Node.js proxy and injected it as a sidecar to the NiFi pods. The load balancer was then configured to send traffic to the sidecar’s port instead of the NiFi container ports. The sidecar encrypts the incoming HTTP traffic and proxies the encrypted traffic to the NiFi pod.
Now that we have introduced a proxy, we have to ensure that we configure the nifi.web.proxy.host and nifi.web.proxy.context.path properties correctly.
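For example (the hostname and port are illustrative — use whatever clients actually hit on the load balancer; the context path stays empty because the sidecar does not add a path prefix):

```
# conf/nifi.properties
nifi.web.proxy.host=nifi.example.org:443
nifi.web.proxy.context.path=
```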
Sticky Sessions for OIDC — When NiFi is configured to use OIDC as an authentication mechanism, it redirects the user to the OIDC provider’s login page. On successful authentication, the user is redirected back to the NiFi application. This works fine when NiFi is deployed on a single node, but when it is deployed as a cluster behind a load balancer, the NiFi node that initiated the OIDC login flow may be different from the NiFi node that receives the on-success callback. This new node will not have the server token that was sent to the OIDC provider as part of the login flow, and you’ll end up seeing an error on your UI along the lines of Unable to Continue Login Sequence. See here for more details. This problem is pretty straightforward to overcome — all you have to do is configure the load balancer to use sticky sessions based on client IP. This can be achieved in Kubernetes by setting service.spec.sessionAffinity to “ClientIP”.
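A sketch of the Load Balancer service with that setting; the 9443 target port is an assumption matching the sidecar’s listen port from the previous section:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nifi-lb
  namespace: nifi
spec:
  type: LoadBalancer
  selector:
    app: nifi
  sessionAffinity: ClientIP   # pin each client to one NiFi node
  ports:
    - port: 443
      targetPort: 9443        # the sidecar's port, not NiFi's
```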
Configure Policies — After you (the initial admin) log in, you’ll notice that all the buttons in the top bar (add a processor, create a funnel, add a process group, etc.) are disabled. This is because you have to explicitly grant yourself permissions to modify the flow. You can right-click on the canvas, click ‘Manage Access Policies’ and grant yourself the required policies.
Persisting users and authorizations across pod terminations — When the NiFi container starts, it looks at the contents of conf/authorizers.xml and creates two more files — conf/users.xml and conf/authorizations.xml — which contain the initial user list and the policies associated with each user, respectively. Any new users you add, any new policies you create or any new policy associations you make to a user are stored in these two files. For this reason, it is important to make these files persistent. The best way to do that is to modify conf/authorizers.xml so that the users and authorizations files are read from a persistent disk mount location instead of the ephemeral conf directory that comes with the container —
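That change amounts to rewriting two property values; /opt/nifi/data is an assumed persistent-volume mount path:

```xml
<!-- conf/authorizers.xml: point the generated files at the
     persistent volume instead of the ephemeral conf directory -->
<property name="Users File">/opt/nifi/data/users.xml</property>
<property name="Authorizations File">/opt/nifi/data/authorizations.xml</property>
```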
Hopefully, the above points come in handy while you are setting up your own secure NiFi cluster using OIDC. I understand that the code samples I’ve provided are not fully functional. I’ll be working to get the fully functional code (manifests, charts, etc.) published a few weeks from now. I’ll update this post once it’s done. Till then, here is the NiFi pod’s startup script —
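A condensed sketch of such a startup script; the paths, the CA_TOKEN and NIFI_ZK_CONNECT_STRING environment variables, and the prop_replace helper (a pattern borrowed from the official NiFi image’s scripts) are assumptions:

```shell
#!/bin/sh
set -e

NIFI_HOME=/opt/nifi/nifi-current
TOOLKIT_HOME=/opt/nifi/nifi-toolkit-current
FQDN="$(hostname -f)"

# Replace a property's value in nifi.properties in place
prop_replace() {
  sed -i "s|^$1=.*$|$1=$2|" "$NIFI_HOME/conf/nifi.properties"
}

# Ask the CA service for a signed certificate; this drops a keystore,
# a truststore and a config.json (with the generated passwords) into
# the current directory
cd "$NIFI_HOME/conf"
"$TOOLKIT_HOME/bin/tls-toolkit.sh" client -c nifi-ca-cs -t "$CA_TOKEN" \
  --dn "CN=$FQDN, OU=NIFI"

# Wire the stable pod identity into the relevant properties
prop_replace nifi.web.https.host "$FQDN"
prop_replace nifi.web.https.port 8443
prop_replace nifi.cluster.is.node true
prop_replace nifi.cluster.node.address "$FQDN"
prop_replace nifi.zookeeper.connect.string "$NIFI_ZK_CONNECT_STRING"

exec "$NIFI_HOME/bin/nifi.sh" run
```

The keystore and truststore paths and the passwords emitted in config.json also need to be written into nifi.properties the same way; Pierre Villard’s scripts show how.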