Setup Splunk on Kubernetes

Swarup Donepudi
7 min read · Feb 6, 2018


To my surprise, I did not find a single Medium story explaining how to set up centralized logging with managed Splunk Cloud for microservices running on Kubernetes clusters. Our team manages close to 10 Kubernetes clusters, most of which are multi-tenant while some are single-tenant. We were using journald as the Docker log driver. Development teams were also required to ensure that any log their microservice wants to see in Splunk is written to standard output. Journald was chosen as the log driver long before this project was undertaken; the simple reason for choosing it as the log aggregator was that we could use the powerful journalctl CLI tool to get to our logs, and journald is being adopted by almost all *nix platforms. We were running Kubernetes on the CoreOS operating system. Going into the project, I assumed that this problem had probably already been solved by a bunch of engineers. Apparently, according to this GitHub issue, it is not such an easy problem after all. However, I found this blog on splunk.com super helpful; it explained exactly what I was looking for. I am reusing the pictures from that blog, since they perfectly represent the design of the setup.

In the above infographic, each Application corresponds to a microservice running as a Docker container on a Kubernetes worker node. Since journald is the Docker logging driver, it captures all of the events the applications write to standard output. This is all explained in the blog on splunk.com that I referenced earlier.

I am going to explain how to set up each of the components listed below, which together take the log events generated by Docker containers to Splunk, in its own section. There are a few problems with this seemingly simple setup, which I will explain in those sections.

  1. Convert journald logs from binary to JSON format
  2. Splunk Deployment Server
  3. Splunk Forwarder on each node

Convert journald logs from binary to JSON format:

Splunk cannot read journald's default binary format, so you have to write the events to a non-binary file and then forward that file to Splunk Cloud. Most of the problems I faced while stabilizing the log aggregation originated from this conversion; I will cover them in my next story. Assuming that everything in life is simple, the following command should take care of continuously converting the binary journal logs and writing them to a readable JSON file on disk.

journalctl --no-tail -f --since "$(date --date="3 minutes ago" +"%F %H:%M:%S")" -o json > /var/splunk/journald.json

However, since this should run continuously, this needs to be run as a systemd unit. Here is the systemd unit that I used to set this up nicely.

Unit file: /etc/systemd/system/journald-to-json-for-splunk.service

Contents of the file:

[Unit]
Description=Write JSON-formatted journald logs to /var/splunk/journald.json for the Splunk forwarder.

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=/usr/bin/mkdir -p /var/splunk
ExecStart=/bin/bash -c '/usr/bin/journalctl --no-tail -f --since "$(date --date="3 minutes ago" +"%F %H:%M:%S")" -o json > /var/splunk/journald.json'
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Start the systemd unit

systemctl daemon-reload
systemctl start journald-to-json-for-splunk

As long as the systemd unit is running, journald logs are continuously converted to JSON and written to the /var/splunk/journald.json file.
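To confirm that the unit is healthy and that the output is actually parseable JSON, a quick check along these lines should work (a sketch; jq is assumed to be available on the node, and MESSAGE/_SYSTEMD_UNIT are standard journald fields):

# Make sure the unit is enabled on boot and currently running
systemctl enable journald-to-json-for-splunk
systemctl status journald-to-json-for-splunk

# Every line in the output file should be one self-contained JSON event
tail -n 1 /var/splunk/journald.json | jq '.MESSAGE, ._SYSTEMD_UNIT'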

Setup Splunk Deployment Server

Up until this point we have not really done anything that actually takes logs off the host machine and forwards them to Splunk Cloud. There are two steps to complete in order to forward logs from a machine to Splunk Cloud, and one of them is setting up a Splunk Deployment Server. For an agent (a Splunk forwarder) to send logs from a host machine to Splunk Cloud, the agent needs the following information.

  1. What is the Splunk Cloud URL?
  2. The credentials required to authenticate with Splunk Cloud.
  3. Which file to watch?
  4. Which index on Splunk Cloud to send to?
  5. Any metadata to add to each event?
  6. Any filters to apply before sending?

I call all of this information configuration. While some of it is the same for every single forwarder, some of it differs: #1, #2 and #3 are the same for all forwarders, while #4, #5 and #6 can differ from one forwarder to another. In practice, a group of forwarders will usually share the exact same configuration. Managing this configuration is made easy by a special piece of Splunk software called the "Deployment Server". Running a deployment server is optional, though; you can add this configuration directly to a forwarder. The infographic below, from the Splunk docs, explains how the Deployment Server distributes this configuration; a sketch of what the configuration itself looks like follows it.
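As a rough illustration of what that configuration looks like on a forwarder, items #1–#6 end up in a handful of standard Splunk .conf files. This is only a sketch; the host name, sourcetype and metadata values below are placeholders, and in practice the credentials app you download from Splunk Cloud carries the real output settings and certificates:

# outputs.conf — where to send events (#1) and how to authenticate (#2)
[tcpout:splunkcloud]
server = inputs.example.splunkcloud.com:9997

# inputs.conf — which file to watch (#3), target index (#4) and extra metadata (#5)
[monitor:///opt/splunk/journald.json]
index = myindex
sourcetype = journald_json
_meta = cluster::example-cluster

# props.conf / transforms.conf would hold any filtering rules (#6), though a
# universal forwarder can only do limited filtering on its own.

The Deployment Server's job is essentially to push per-group bundles ("apps") of files like these to the right forwarders.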

We dockerized the Splunk Deployment Server and pushed the Docker image to our registry using a CI/CD pipeline. All of the configuration required by the deployment server is passed in at deployment time. This dockerized Splunk deployment server is deployed as a "Deployment" resource on Kubernetes. Optionally, you can connect to the web interface of this Splunk deployment server. If you are planning on doing that, do not run more than one instance, as Kubernetes does not provide session stickiness unless you configure it to.

Setup Splunk Forwarder

This is probably the most important of all three components. The Splunk forwarder does almost all of the heavy lifting of getting logs from the host machine to Splunk Cloud. All of the configuration the forwarder needs is received from the deployment server. However, when a forwarder starts up, it has to know which deployment server to talk to in order to get that configuration; that bootstrap piece is sketched below. We have dockerized the splunk-forwarder too, and the image is built and pushed to an internal Docker registry through the same CI/CD process.
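For context, that bootstrap setting is just a tiny deploymentclient.conf baked into (or generated by an entrypoint script in) the forwarder image. A sketch, using the same host, port and client-name values that appear as environment variables in the DaemonSet further down:

# deploymentclient.conf — tells the forwarder which deployment server to phone home to
[deployment-client]
clientName = example-client-<node-name>

[target-broker:deploymentServer]
targetUri = deployment-server-internal-url:32004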

Every time a new forwarder connects to a deployment server, the deployment server identifies the server class of the client based on its name. After identifying the group/server-class the forwarder belongs to, the deployment server pushes the corresponding configuration down to it. A sketch of such a server-class definition follows, and after that are the Kubernetes manifest files required to make it all work.
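On the deployment server side, that name-based grouping lives in serverclass.conf. The class and app names below are made up for illustration; the whitelist pattern matches the clientName prefix our forwarders report:

# serverclass.conf — map client names to a server class, and the class to deployment apps
[serverClass:example-client-forwarders]
whitelist.0 = example-client-*

[serverClass:example-client-forwarders:app:journald_inputs]
stateOnClient = enabled
restartSplunkd = true

Note that restartSplunkd = true is exactly what triggers the PID 1 restart problem described below in the forwarder section.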

##########################
# Splunk Deployment Server
##########################
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: splunk-deployment-server
  namespace: splunk
  labels:
    app: splunk-deployment-server
spec:
  replicas: 1
  revisionHistoryLimit: 10
  template:
    metadata:
      labels:
        app: splunk-deployment-server
    spec:
      containers:
      - name: splunk-deployment-server
        image: my-splunk-deployment-server:latest
        env:
        - name: SPLUNK_INDEX
          value: myindex
        ports:
        - name: splunk-api
          containerPort: 8089
          protocol: TCP
        - name: splunk-web
          containerPort: 8000
          protocol: TCP
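The forwarders further down reach this Deployment via deployment-server-internal-url on port 32004, which implies a Service sitting in front of it. A minimal sketch, assuming a NodePort that maps the Splunk management port 8089 to 32004 (the real Service name and URL in our clusters are internal and not shown here):

apiVersion: v1
kind: Service
metadata:
  name: splunk-deployment-server
  namespace: splunk
spec:
  type: NodePort
  selector:
    app: splunk-deployment-server
  ports:
  - name: splunk-api
    port: 8089
    targetPort: 8089
    nodePort: 32004   # matches DEPLOYMENT_SERVER_PORT used by the forwarders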

The Splunk Deployment Server is a simple Kubernetes Deployment. The Splunk Forwarder is a DaemonSet. However, setting up the Splunk forwarder required a more complicated setup, which includes adding an init container. This is because the first time a forwarder connects to a deployment server, the deployment server sends the configuration to the forwarder, and for that configuration to take effect the Splunk daemon in the forwarder needs to be restarted. Since the Splunk daemon is PID 1 in the Splunk forwarder Docker container, the container dies and Kubernetes spins up a new one instantly. The configuration that the deployment server sent to the forwarder gets destroyed along with the container. This makes splunk-forwarders semi-stateful. Since the configuration is not present in the newly spun-up container, the forwarder contacts the deployment server again, and this results in cyclic behavior. To avoid this situation, we add a Kubernetes init container that fetches the configuration into volumes shared with the main container, so it survives container restarts/recreations.

###############################################################################
# Splunk Forwarder Daemonset
###############################################################################
---
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: splunk-forwarder
  namespace: splunk
spec:
  template:
    metadata:
      labels:
        app: splunk-forwarder
      annotations:
        pod.beta.kubernetes.io/init-containers: '[
          {
            "name": "wait-for-deployment-server",
            "image": "busybox",
            "imagePullPolicy": "Always",
            "command": ["sh", "-c", "until ping deployment-server-internal-url -c 1; do sleep 3; done;"]
          },
          {
            "name": "splunk-forwarder-initializer",
            "image": "my-splunk-forwarder:latest",
            "imagePullPolicy": "Always",
            "env": [
              {
                "name": "CLIENT_NAME_PREFIX",
                "value": "example-client"
              },
              {
                "name": "CLIENT_NAME_SUFFIX",
                "value": "initializer"
              },
              {
                "name": "DEPLOYMENT_SERVER_HOST",
                "value": "deployment-server-internal-url"
              },
              {
                "name": "DEPLOYMENT_SERVER_PORT",
                "value": "32004"
              }
            ],
            "volumeMounts": [
              {
                "name": "forwarder-config",
                "mountPath": "/opt/splunk/etc"
              },
              {
                "name": "forwarder-var",
                "mountPath": "/opt/splunk/var"
              }
            ]
          }
        ]'
    spec:
      hostNetwork: true
      volumes:
      # Must match the file written by the journald-to-json-for-splunk systemd unit
      - name: var-splunk-journald-json
        hostPath:
          path: /var/splunk/journald.json
      - name: forwarder-var
        emptyDir: {}
      - name: forwarder-config
        emptyDir: {}
      containers:
      - name: splunk-forwarder-container
        image: my-splunk-forwarder:latest
        env:
        # This variable is required to register the forwarder with the deployment server
        - name: CLIENT_NAME_PREFIX
          value: example-client
        - name: CLIENT_NAME_SUFFIX
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        # This variable is required to register the forwarder with the deployment server
        - name: DEPLOYMENT_SERVER_HOST
          value: deployment-server-internal-url
        - name: DEPLOYMENT_SERVER_PORT
          value: "32004"
        volumeMounts:
        - name: var-splunk-journald-json
          mountPath: /opt/splunk/journald.json
        - name: forwarder-config
          mountPath: /opt/splunk/etc
        - name: forwarder-var
          mountPath: /opt/splunk/var
        ports:
        - name: splunk-daemon
          containerPort: 8089
          protocol: TCP
        - name: event-collector
          containerPort: 8088
          protocol: TCP
        - name: nw-input
          containerPort: 1514
          protocol: TCP
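Once the DaemonSet is up, a quick way to verify that a forwarder actually pulled its configuration from the deployment server is to exec into one of the pods. This is only a sketch: <forwarder-pod> is a placeholder, and the paths assume Splunk is installed under /opt/splunk as in our image.

# Which deployment server is this forwarder polling? (may prompt for the
# container's Splunk admin credentials)
kubectl -n splunk exec <forwarder-pod> -- /opt/splunk/bin/splunk show deploy-poll

# btool reads the on-disk config, so this shows whether the deployed inputs
# (e.g. the journald.json monitor) actually landed in /opt/splunk/etc
kubectl -n splunk exec <forwarder-pod> -- /opt/splunk/bin/splunk btool inputs list --debug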

There are many more details with respect to host networking, port configurations and volume mounts; this story would blow up in size if I documented every one of them here. I will try to write more posts about the various problems I had to deal with while stabilizing Splunk logging.
