Splunk Operator for Kubernetes (SOK) — Indexers on K8s, Search Heads outside the K8s cluster

Gareth Anderson



What was the goal?

Our goal was to use the Splunk Operator for Kubernetes (SOK) to create Splunk indexer clusters running on our new Kubernetes environment. The issue is that indexers running on Kubernetes (K8s) cannot be accessed by search heads that are not also running on K8s.

Why are the search heads unable to connect to the indexers?

Kubernetes (K8s) establishes an internal network that enables communication between K8s pods in the cluster; however, external servers cannot directly communicate with pods inside the K8s cluster.

To enable incoming traffic to a Kubernetes-based Splunk cluster manager pod, we can utilise a Kubernetes service that allows external servers to access the REST port.

The challenge lies in the way the cluster manager replies to requests for the indexer generation. The JSON payload includes the IP addresses of the indexers, which are pods within K8s, so the returned IPs cannot be reached by servers that are not inside the K8s cluster.

Due to this issue, any search heads that are running outside the Kubernetes cluster would not have access to indexers running within K8s.

Why not move the search heads into K8s?

In our scenario we had on-premise, bare-metal Splunk search heads; it did not make sense to rebuild that hardware as K8s nodes.

The other reasons I had against moving search heads into K8s were:

  • Minimal benefit for Splunk — we did not have a need for more search head cluster members
  • Added complexity — K8s and the SOK introduce new challenges that we were not prepared for
  • Finally, not all search heads were under our management — we had two search heads that we had zero access to, so we could not move them inside K8s

How can you handle ingress traffic into Kubernetes?

Within Kubernetes there are a few ways to expose a port outside the cluster, the main options being:

  • HostPort
  • NodePort
  • LoadBalancer / Gateway API

Since we were on-premise the logical options were HostPort or NodePort.
The LoadBalancer service is commonly used in cloud environments; however, we didn’t have MetalLB or an alternative K8s bare-metal load balancer available, so this option was eliminated.

A K8s service can expose the cluster manager’s REST port to Splunk search heads outside the K8s cluster; however, there are complications with how Splunk works with the indexer “generation”.

The issue discussed here is specific to Splunk indexer clusters and searching. Kubernetes services do not pose an issue for ingress of Splunk data via HEC, S2S, web interface ports (8000) or REST ports.
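
As a rough sketch, a NodePort-style service for the cluster manager’s REST port could look like the example below. The namespace, labels and node port are illustrative only; check the labels the SOK applies to your cluster manager pod and adjust the selector to match. The same generation problem applies whether you use a HostPort or a NodePort.

apiVersion: v1
kind: Service
metadata:
  name: cluster-manager-external
  namespace: splunk
spec:
  type: NodePort
  selector:
    app: cluster-manager   # illustrative label; match the labels on your SOK cluster manager pod
  ports:
  - name: splunkd-rest
    port: 8089             # Splunk management/REST port
    targetPort: 8089
    nodePort: 30089        # reachable on every K8s node at <node IP>:30089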

Why does a service not work?

This is an example sub-section of a JSON response from a Splunk cluster manager:

{
  "host_port_pair": "10.192.9.18:8089",
  "peer": "splunk-example-idxc-site2-indexer-12",
  "site": "site2",
  "status": "Up"
},

The host_port_pair is an internal K8s pod IP address.

Despite successfully connecting to the cluster manager through a service, the search head hits an issue when making subsequent REST calls to the indexers, because the K8s pod IPs returned by the cluster manager are not routable outside the K8s cluster.
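
You can see this for yourself by querying the generation endpoint through the exposed service. A minimal sketch, assuming the NodePort service shown earlier and placeholder credentials:

# Query the cluster manager generation endpoint through the NodePort
# (host name, node port and credentials are placeholders)
curl -sk -u admin:changeme \
  "https://k8snode.company.com:30089/services/cluster/master/generation?output_mode=json"

# The peers in the response are listed with pod IPs such as 10.192.9.18:8089.
# From a server outside the cluster the follow-up call to an indexer simply times out,
# because the 10.192.x.x pod IP is only routable on the K8s overlay network:
curl -sk -u admin:changeme "https://10.192.9.18:8089/services/server/info"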

A visual depiction would look like this (note that NodePort should be a HostPort in the diagram below):

Visualisation of a call to /services/cluster/master/generation/GUID hitting the CM via a HostPort, then receiving back a list of K8s pod IPs. The next network call fails because a K8s pod IP isn’t valid outside a K8s node
Search head -> Cluster Manager communication

Solution — external search heads

Once the HPE team completed a proof of concept, implementing the solution became straightforward. The solution involved installing the flanneld and kube-proxy software on the search heads located outside the Kubernetes cluster.

At the Kubernetes level we created a “node” to represent each search head, with taints of NoSchedule and NoExecute. We used the attribute unschedulable: true to ensure a pod could not be scheduled on the node; we called these nodes “external nodes”.
The external nodes used a manually allocated IP in the K8s CIDR at the end of the range to avoid clashes with standard K8s IP allocation.

The flanneld configuration on the external search heads queries the K8s API server, and the node name matched the “node” we created in K8s.

To add further context, flanneld implements the “overlay network” and kube-proxy generates rules, iptables rules in our case.

A YAML example of an external node:

apiVersion: v1
kind: Node
metadata:
  name: externalhost.company.com
  labels:
    node-role.kubernetes.io/external-host: ""
spec:
  podCIDR: 10.207.255.225/32
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/external-host
  - effect: NoExecute
    key: node-role.kubernetes.io/external-host
  unschedulable: true

kube-proxy and flanneld were configured to use a kubeconfig file so they could authenticate as a K8s node. Below is the relevant line from the kube-proxy systemd unit file:

ExecStart=/usr/local/bin/kube-proxy \
--bind-address=0.0.0.0 \
--cluster-cidr=10.192.0.0/12 \
--kubeconfig=/root/.kube/admin.conf \
--nodeport-addresses=127.0.0.1/8
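
For completeness, flanneld ran in a similar way on the search heads. Below is a minimal sketch of its systemd unit, assuming the Kubernetes subnet manager is used and NODE_NAME matches the external node object; the flags, paths and interface name are illustrative, not our exact configuration:

# Excerpt from an illustrative flanneld systemd unit file
[Service]
Environment=NODE_NAME=externalhost.company.com
ExecStart=/usr/local/bin/flanneld \
--kube-subnet-mgr \
--kubeconfig-file=/root/.kube/admin.conf \
--iface=eth0 \
--ip-masq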

How does this solution work?

The flanneld software provides network connectivity to the Kubernetes pod IPs. This gives the external search heads access to any pod’s internal IP as if they were running inside the K8s cluster.

kube-proxy allows the service-to-pod mapping to function as expected; in our case it maintains the iptables rules. In retrospect, we may not have required kube-proxy, but it was part of the POC and therefore part of the solution.
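
If you want to verify this from one of the external search heads, a couple of read-only checks can help. This assumes the default flannel VXLAN interface name (flannel.1) and kube-proxy’s standard iptables chain names:

# Routes to the pod CIDRs should point at the flannel VXLAN interface
ip route | grep flannel.1

# kube-proxy maintains the KUBE-SERVICES chains that map service IPs to pod IPs
iptables-save | grep KUBE-SERVICES | head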

In the initial testing we updated /etc/resolv.conf to use the K8s DNS service IP, which allowed the use of K8s internal DNS names. Since search head to indexer communication was direct to the IP addresses of the indexer pods, we reverted this change in production and used our default company DNS servers.
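
For reference, the temporary change during testing looked roughly like the below; the DNS service IP and search domains are placeholders and will differ per cluster:

# /etc/resolv.conf on an external search head during initial testing (values are examples only)
nameserver 10.96.0.10
search svc.cluster.local cluster.local company.com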

Within the on-premise environment we saw no issues with this setup; search head to indexer communication worked well and there was no noticeable difference compared to connecting to non-K8s indexers.

In the cloud environment we encountered issues with cloud to on-premise traffic. The VXLAN/UDP packets had drop issues and troubleshooting was very complicated. This may be a consideration if you choose to go down this path.
One suggestion I have here is to ensure the same MTU is available throughout the network path; otherwise the packets will be fragmented, which can increase the drop rate.
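A quick way to check for a consistent MTU is to send non-fragmentable pings along the path. A sketch, assuming a 1500-byte underlay MTU and the roughly 50-byte VXLAN overhead (the host name and IP are placeholders):

# 1472 data bytes + 28 bytes of ICMP/IP headers = 1500; -M do forbids fragmentation
ping -M do -s 1472 remote-k8s-node.company.com

# Pod-to-pod traffic inside the VXLAN tunnel has ~50 bytes less headroom, so test that too
ping -M do -s 1422 10.192.9.18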
I have further details on the challenges we found with the SOK in the article Splunk Operator for Kubernetes (SOK) — Lessons from our implementation.

Summary

While the Splunk Operator for Kubernetes generally assumes that search heads will be in the same K8s cluster as the indexers, we managed to create a solution, using flanneld and kube-proxy, that allowed external search heads to continue to function.

I want to recognize that the HPE team were responsible for the proof of concept of “external nodes”.


Gareth Anderson

SplunkTrust member, working as a technical lead on technologies including Splunk, Kubernetes and Linux.