How to monitor distributed logs in Kubernetes with the EFK stack.

Christiaan Vermeulen
11 min read · Sep 2, 2019


No more searching endlessly for the correct logs.

When running one pod for a service, it’s quite simple to get the logs.
kubectl logs <pod-name>, right?

When scaling the deployment to, say, 10 replicas, we can still follow the logs for the entire deployment.
kubectl logs deploy/<deployment-name>, right?

Great!

But what happens when your node disappears? What happens when you recreate your cluster, or want to search for a certain IP in your ingress logs, or see a dashboard of all your log data? How do you apply ML to your logs?

We’ll be using ElasticSearch (Storage), Fluentd (Logging Layer), and Kibana (Visualization) to store, aggregate & visualise logs.

There are a variety of different ways to add distributed logging, but I have found this approach quite simple to start off with.

You need a few things.

  1. An existing Kubernetes Cluster.
  2. The kubectl binary installed locally

Getting a Kubernetes Cluster

There are a multitude of ways to get a Kubernetes cluster set up, but I find the easiest is to use a DigitalOcean managed cluster. They already have all the networking and storage configured, and all you have to do is create the cluster and download your kubeconfig.

You can sign up for Kubernetes using this link
The above is a referral link with $50 free usage :)

You can also spin up clusters using tools like minikube, microk8s, or even using kubeadm to create your own cluster.

For this tutorial you might need slightly beefier nodes, so select two of the $40, 8 GB, 4 vCPU machines. You’ll only be running these for a little while, so don’t worry too much about cost; you’ll end up using less than $2 of your free $50.

Installing kubectl

Check out the up-to-date Kubernetes docs for installing kubectl
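
As a concrete sketch (taken from the current kubectl install docs for Linux; the URL and steps may differ for your OS or by the time you read this), installation and a quick sanity check look roughly like this. The kubeconfig filename below is just an example for whatever you downloaded from DigitalOcean.

$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ chmod +x kubectl && sudo mv kubectl /usr/local/bin/
$ kubectl version --client

# Point kubectl at the kubeconfig you downloaded from DigitalOcean
$ export KUBECONFIG=~/Downloads/efk-tutorial-kubeconfig.yaml
$ kubectl get nodes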

Create a project directory

We’ll want a place to store all of our Kubernetes manifests to be re-applied to a new cluster later or to recreate this one.

Create a directory called efk-tutorial anywhere on your machine and cd into it.

Create an empty git repo in your directory, create a README.md file, and commit that as a starting point

$ git init
$ echo "# EFK Tutorial" >> README.md
$ git add README.md
$ git commit -m "Initial commit"

You’re set up to start writing some manifests.

Deploy a workload which generates logs

If you already have a workload running that generates logs, you can skip this part and collect your own logs instead.

If you are using this blog to learn, you’ll want a workload that spits out predictable logs.

We will use a utility Docker image whose sole purpose is to spit out random dragon names. You can check out the source code here

The manifest below creates a namespace called random-generator and deploys a pod that writes a JSON log entry with a random dragon name every second.
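
Conceptually, the container is little more than a loop that prints a JSON line every second. The snippet below is an illustrative sketch only; it is not the actual source code, and the names are hard-coded here just for the example:

$ names=("Siovaeloi" "Qandocruss" "Frarvurth")
$ while true; do
>   echo "{\"name\": \"${names[$((RANDOM % ${#names[@]}))]}\"}"
>   sleep 1
> done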

Create a new file called random-generator.yml and add the following content

# ./random-generator.yml
# The namespace for our log generator
kind: Namespace
apiVersion: v1
metadata:
  name: random-generator
---
# The Deployment which will run our log generator
apiVersion: apps/v1
kind: Deployment
metadata:
  name: random-generator
  namespace: random-generator
  labels:
    app: random-generator
spec:
  selector:
    matchLabels:
      app: random-generator
  template:
    metadata:
      labels:
        app: random-generator
    spec:
      containers:
        - name: random-generator
          imagePullPolicy: Always
          # You can build the image off the source code and push to your own docker hub if you prefer.
          image: chriscmsoft/random-generator:latest

Apply to your Kubernetes cluster using

$ kubectl apply -f random-generator.yml
namespace/random-generator created
deployment.apps/random-generator created

You can now output the log entries to see what the logs from the container look like

$ kubectl logs deploy/random-generator -n random-generator
{"name": "Siovaeloi, Protector Of The Weak"}
{"name": "Qandocruss, Champion Of The White"}
{"name": "Frarvurth, The Voiceless"}
[...]

We’ve got logs to work with!

Set up the directory structure

The completed directory structure will look more or less like this

$ tree
.
├── README.md
├── logging
│   ├── elasticsearch
│   │   ├── service.yml
│   │   └── statefulset.yml
│   ├── fluentd
│   │   ├── daemonset.yml
│   │   └── service-account.yml
│   ├── kibana
│   │   ├── deployment.yml
│   │   └── service.yml
│   └── namespace.yml
└── random-generator.yml

4 directories, 9 files

Create a directory in your project called logging. This is where we will store all our Kubernetes resources for logging.

$ mkdir -p logging
$ cd logging

We’ll create a namespace in Kubernetes called logging, where we will run all of our logging workloads.

Create a file called namespace.yml and insert the contents

# logging/namespace.yml
kind: Namespace
apiVersion: v1
metadata:
  name: logging

Apply the namespace and check it has been created

$ kubectl apply -f namespace.yml
namespace/logging created
$ kubectl get namespaces
NAME               STATUS   AGE
[...]
logging            Active   9s
random-generator   Active   106m

Deploy ElasticSearch

ElasticSearch is where our log data will be stored, so we need it first.

There are a couple of ways to deploy ElasticSearch in your cluster:

  1. StatefulSet - the easiest, but not recommended for production, as you’ll have to maintain it yourself.
  2. KubeDB - much better at running ElasticSearch; it will manage it, take backups, expose metrics, etc. Much better for production.
  3. Helm chart - essentially a StatefulSet, with a few extra Kubernetes resources.

We are going to run ElasticSearch using a StatefulSet, as it’s the easiest to grasp on a first try. If you’d like to use another method, check out the links above.
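
For reference, if you do want to try the Helm route instead, it looks roughly like this (Helm 3 syntax; chart names and default values change between releases, so check the chart’s own docs, and run it after creating the logging namespace below):

$ helm repo add elastic https://helm.elastic.co
$ helm repo update
$ helm install elasticsearch elastic/elasticsearch --namespace logging

We’ll stick with the plain StatefulSet for the rest of this tutorial.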

Create a directory called logging/elasticsearch. This is where we will store all the configs for ElasticSearch.

$ mkdir -p elasticsearch
$ cd elasticsearch

Next we want to create the statefulset for running ElasticSearch

Create a file in logging/elasticsearch called statefulset.yml and add the contents for the statefulset.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
          ports:
            - containerPort: 9200
              protocol: TCP
            - containerPort: 9300
              protocol: TCP
          volumeMounts:
            - name: elastic-data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: kubernetes-logging
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: discovery.seed_hosts
              value: "elasticsearch-0.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "elasticsearch-0"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: elastic-data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: elastic-data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: do-block-storage
        resources:
          requests:
            storage: 10Gi

Apply that and ElasticSearch should start up.

$ kubectl apply -f statefulset.yml
statefulset.apps/elasticsearch created
$ kubectl rollout status statefulset/elasticsearch -n logging
Waiting for 1 pods to be ready...
partitioned roll out complete: 1 new pods have been updated...
$ kubectl get pods -n logging
NAME              READY   STATUS    RESTARTS   AGE
elasticsearch-0   1/1     Running   0          7m27s

You can see the pods running in your cluster after a few minutes

We need to add a Kubernetes Service for ElasticSearch to be easily discovered by other components.

Still in your logging/elasticsearch folder, add a file called service.yml and add the contents for a service pointing at elasticsearch

kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node

Apply that in your Kubernetes cluster

$ kubectl apply -f service.yml
service/elasticsearch created
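
If you want to confirm the Service actually picked up the ElasticSearch pod, list its endpoints. You should see the pod’s IP paired with ports 9200 and 9300 (the IP itself will differ in your cluster):

$ kubectl get endpoints elasticsearch -n logging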

You can now port-forward that service to see that ElasticSearch is working correctly

$ kubectl port-forward svc/elasticsearch 9200 -n logging

and then open http://localhost:9200/_cluster/health/ in your browser

You should see a JSON summary of the cluster health.
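
The exact numbers will differ, but the response from http://localhost:9200/_cluster/health/?pretty should look roughly like this (the status may show yellow on a single-node cluster once indices with replicas exist, which is fine for this tutorial):

$ curl "http://localhost:9200/_cluster/health/?pretty"
{
  "cluster_name" : "kubernetes-logging",
  "status" : "green",
  "number_of_nodes" : 1,
  [...]
}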

Your ElasticSearch is now working 👌

Next we’ll add Kibana

Kibana is probably the simplest to set up.

All you’ll need is a Deployment and a Service.

The Deployment.

Back to your logging directory, add a new directory called kibana. This is where we will store everything related to Kibana.

# Change back to your logging directory first
$ cd ../
$ mkdir kibana
$ cd kibana

Create a new file called deployment.yml with the following contents

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: docker.elastic.co/kibana/kibana:7.2.0
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
          env:
            - name: ELASTICSEARCH_URL
              value: http://elasticsearch:9200
          ports:
            - containerPort: 5601

Apply that and you should see a pod running for Kibana

$ kubectl apply -f deployment.yml
deployment.apps/kibana created
$ kubectl rollout status deploy/kibana -n logging
[...]
deployment "kibana" successfully rolled out
$ kubectl get pods -n logging
NAME                      READY   STATUS              RESTARTS   AGE
elasticsearch-0           1/1     Running             0          32m
kibana-67f95cc5f4-pqbwt   0/1     ContainerCreating   0          28s

Next, we’ll create a Kubernetes Service for Kibana

In a new file called service.yml add the contents for a kibana service

apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  ports:
    - port: 5601
  selector:
    app: kibana

Apply that and then port-forward Kibana.

$ kubectl apply -f service.yml
service/kibana created
$ kubectl port-forward svc/kibana 5601 -n logging

Go to http://localhost:5601 in your browser.

You now have Kibana.

Next we add Fluentd

Fluentd will grab the logs from all your containers and push them into ElasticSearch, so you can view them in Kibana. See how this whole thing fits together?

Fluentd is installed as a DaemonSet. A DaemonSet is a workload that is not scaled by a replica count; instead it runs one pod on every node (and you can restrict which nodes it should run on). Because each pod runs on a specific node, it can mount that node’s filesystem and monitor it independently. When you add a node, the DaemonSet automatically schedules a pod onto the new node. Here, Fluentd mounts the Docker log directories on each node and pushes the log entries up into ElasticSearch.

Back to your logging directory, add a new directory called fluentd

$ cd ../
$ mkdir fluentd
$ cd fluentd/

Create a file called service-account.yml with the following contents. This gives Fluentd permission to read pod and namespace metadata from the Kubernetes API, which it uses to enrich each log entry.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
  labels:
    app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - namespaces
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: logging

Create a new file called daemonset.yml with the following contents

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            - name: FLUENT_ELASTICSEARCH_SCHEME
              value: "http"
            - name: FLUENTD_SYSTEMD_CONF
              value: disable
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

Apply both files & watch the rollout

$ kubectl apply -f service-account.yml -f daemonset.yml
serviceaccount/fluentd created
clusterrole.rbac.authorization.k8s.io/fluentd created
clusterrolebinding.rbac.authorization.k8s.io/fluentd created
daemonset.apps/fluentd created
$ kubectl rollout status daemonset/fluentd -n logging
Waiting for daemon set spec update to be observed...
Waiting for daemon set "fluentd" rollout to finish: 1 out of 2 new pods have been updated...
Waiting for daemon set "fluentd" rollout to finish: 0 of 2 updated pods are available...
Waiting for daemon set "fluentd" rollout to finish: 1 of 2 updated pods are available...
daemon set "fluentd" successfully rolled out

Now that Fluentd is running, we can go back to Kibana and check the logs for all our pods

Set up the index pattern in Kibana

Port-forward Kibana again

$ kubectl port-forward svc/kibana 5601 -n logging

Give it a few seconds to load.

Once it has loaded, click on the management icon, and go to index patterns

Click Create index pattern.

Enter logstash-* in the field for the index pattern.

Click Next step, select @timestamp as the time filter field, and click Create index pattern.

You should now have a valid index pattern

Checking logs in Kibana

You should now be able to see all your logs in Kibana.

On the Kibana dashboard, go to the Discover page

You should now see a page full of logs:

You have now set up logging, and you can search the logs from any Kubernetes deployment or pod.

Fluentd will pick up anything that gets written to your containers’ logs.

Searching for only random generator pods

Filter to only our random generator containers by entering kubernetes.container_name : random-generator in the search bar

You will now only see logs for the random generator

On the left you should see a block of fields. Select only kubernetes.host, kubernetes.pod_name, and log, and you should be able to see the name of the dragon, the host the pod is running on, and the pod name that generated the log entry.

Now you can see your logs in a much clearer light.

It scales too

What happens when we scale the random generator to, say, 10 pods?

In the histogram at the top of the Discover page, you should see roughly 30 entries per 30-second period, which is correct, because we are logging once per second.

Let’s scale the random generator and see what happens.

$ kubectl scale deploy/random-generator -n random-generator --replicas 10
deployment.extensions/random-generator scaled
$ kubectl rollout status deploy/random-generator -n random-generator
Waiting for deployment "random-generator" rollout to finish: 1 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 2 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 3 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 4 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 5 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 6 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 7 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 8 of 10 updated replicas are available...
Waiting for deployment "random-generator" rollout to finish: 9 of 10 updated replicas are available...
deployment "random-generator" successfully rolled out

Back in Kibana, you should now see roughly 300 entries per 30-second period.

See the log count going up? That shows our new pods are logging as well.

We now have distributed logging, and we can see all our container logs in one place. You can also filter by stream if you only want to see errors, since Fluentd differentiates between output written to stdout and stderr.
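
For example, assuming the default metadata added by the fluentd-kubernetes-daemonset image, entering the following in the search bar shows only lines that containers wrote to stderr:

stream : stderr

and combining it with the earlier filter shows only errors from the random generator:

kubernetes.container_name : random-generator and stream : stderr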

Play around in Kibana to build some visualisations, and maybe even try out their Machine Learning section.

Here are some docs for Kibana

If you have any questions, be sure to post them in the comments below!
