Logging with EFK in GKE

Ivan Kenneth Wang
Published in Ninja Van Tech
7 min read · Jun 18, 2019

Solving missing logs with Elasticsearch, Fluentd, and Kibana in Google Kubernetes Engine.

A few weeks back, my colleague, Mani Kuramboyina, and I were assigned to get to the bottom of a production issue — missing logs.

Source: http://gph.is/2dzxul8

The issue had started a few days before we began investigating it, and we thought…

Could it be that we had reached the throughput threshold of incoming logs to our Elasticsearch cluster? If so, should we add a queueing mechanism?

To simulate the issue properly, we decided on a setup as close as possible to our production environment, so the obvious choice was our Dev GKE cluster. Figure 1 shows what our setup looks like.

Figure 1: Our Dev GKE Cluster as our test environment

Some key facts:

  1. A Google Cloud Load Balancer is set up via a Kubernetes Service to accept incoming requests to our Kong containers, through which every single request to the cluster is served. These Kong containers are managed by a Kubernetes Deployment that is autoscaled by a Kubernetes Horizontal Pod Autoscaler.
  2. The Docker daemon on each node is configured with the JSON file logging driver. This means anything a container writes to stdout also ends up in a log file under /var/lib/docker. Kubernetes symbolically links these log files under /var/log/containers and /var/log/pods.
  3. Fluentd runs as a Kubernetes DaemonSet; it reads logs from /var/log/containers/* and forwards them to an Elasticsearch cluster. Fluentd handles log rotation, and its buffers are configured with overflow_action set to block to prevent losing logs. (A quick sanity check for points 2 and 3 is sketched right after this list.)
  4. Finally, we have Kibana set up to read the logs from Elasticsearch.
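As a quick sanity check of points 2 and 3, you can confirm the logging driver on a node and that Fluentd is running on every node. This is only a sketch: the node name, the Fluentd DaemonSet name, and its namespace depend on your cluster.

# On a node (e.g. via gcloud compute ssh <node-name>): confirm Docker's logging driver
~ # docker info --format '{{.LoggingDriver}}'
# expected: json-file

# From a machine with cluster access: confirm the Fluentd DaemonSet covers every node
$ kubectl get daemonset --all-namespaces | grep fluentd
$ kubectl get nodes --no-headers | wc -l
# the DESIRED/READY counts above should match the node count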

Before we begin, let’s ensure we can replicate the missing logs issue in our Dev environment.

We set up a distributed Apache JMeter cluster inside our Dev GKE cluster, and tried load testing our Kong load balancers as shown in Figure 2 below. The JMeter cluster was installed via Helm; here is the command to do so:

$ helm install --namespace load-testing --name distributed-jmeter stable/distributed-jmeter
Figure 2: Initial testing
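If you go the same route, a quick way to confirm the chart's master and server pods came up (the pod names are whatever the chart generates):

$ kubectl get pods --namespace load-testing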

And yes. We replicated it easily.

We implemented our hypothesis by putting a queueing mechanism between Fluentd and Elasticsearch. We installed Apache Kafka, and Figure 3 shows that setup.

Figure 3: Hypothesis as a solution

Side note: before starting the tests, we enabled the Fluentd Elasticsearch Gen Hash Id plugin to make sure we wouldn't get duplicate logs.

OK. Re-test… Ugh, still missing logs.

We ran a series of tests with a different number of samples each round, at different times of the day, and the results were not so conclusive.

Figure 4: Initial test results with Apache JMeter

The results show that the higher the rate of incoming logs the cluster receives, the higher the chance of missing logs.

The likely explanation for why we didn't see many missing logs at 10 AM is that our engineers were just coming into the office and the load on our Dev GKE cluster had not yet reached its peak. At 1 PM, some of our engineers had stepped out for lunch, so the load was only slightly higher than at 10 AM. Lastly, at 3 PM all engineers were busy working; the Dev cluster was receiving a higher load, and even with smaller samples we were already encountering missing logs.

But where exactly were the missing logs coming from now? Could Apache Kafka be the culprit this time?

Hold on. Pause. We told ourselves we could not continue testing with this setup. We had to be certain how many logs each container was actually producing and how many of those went missing. Each Kong container wasn't just serving the requests we triggered from JMeter, but also those from our very own engineers. Moreover, we had multiple replicas of Kong containers, so it wasn't clear how many logs each Kong container was actually producing.

To do this, we dropped the JMeter setup. We also removed Kafka and went back to the original setup shown in Figure 1. But this time around, we spawned a new container whose only job was to print a known number of log lines.

I achieved this by running a shell script as follows:

#!/bin/bash
# Recreate the test pod and print 200,000 numbered log lines to stdout
kubectl delete pod test-pod
kubectl run -t -i test-pod --image bash --restart Never -- -c 'for i in {1..200000}; do echo "test-pod-batch-4..."$i; done'
Figure 5: Test results with single pod logging

The results were more consistent this time around. But to get to the bottom of the issue, I SSH-ed into the node where the pod ran and checked whether the container had printed all the logs into the Docker JSON log file. The commands I executed were as follows:

~ # ls /var/log/containers/ | grep test
test-pod_default_test-pod-762c44.log
~ # ls -lh /var/log/containers/test-pod_default_test-pod-762c44.log
lrwxrwxrwx 1 root root 82 Jun 8 06:43 test-pod_default_test-pod-762c44.log -> /var/log/pods/default_test-pod_b818fffa/test-pod/0.log
~ # ls -lh /var/log/pods/default_test-pod_b818fffa/test-pod/
total 4.0K
lrwxrwxrwx 1 root root 165 Jun 8 06:43 0.log -> /var/lib/docker/containers/762c44/762c44-json.log
~ # ls -lh /var/lib/docker/containers/762c44/ | grep log
-r-------- 1 root root 9.1M Jun 8 06:43 762c44-json.log
-rw-r----- 1 root root 9.6M Jun 8 06:43 762c44-json.log.1
~ # cat /var/lib/docker/containers/762c44/762c44-json.log | wc -l
96709
~ # cat /var/lib/docker/containers/762c44/762c44-json.log.1 | wc -l
103291

These log file names are intentionally shortened for ease of reading.
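On the Elasticsearch side, one way to check how many of those lines actually arrived is a _count query against the log message. This is just a sketch: the Elasticsearch endpoint, the index pattern, and the log field name are assumptions that depend on your Fluentd configuration.

$ curl -s 'http://elasticsearch:9200/logstash-*/_count' \
    -H 'Content-Type: application/json' \
    -d '{"query":{"match_phrase":{"log":"test-pod-batch-4"}}}'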

Right there and then: there were two log files. The rotated log file had 103,291 lines and the current log file had 96,709 lines, which adds up to exactly the 200,000 lines the container printed, so Docker had captured everything. My assumption then was that the log file rotated too fast (i.e. as soon as the container was spawned) and Fluentd didn't even have the chance to tail the initial log file. But hey, we were getting close to the bottom of the issue.

At this point, our CTO happened to be sitting beside me and we continued investigating the situation. By the way, thanks Shaun Chong!

We tweaked the command a bit to make the simulation more realistic; when a real service starts, a burst of logs doesn't appear all of a sudden. To do this, we added a delay just before the burst.

The revised command is as follows:

#!/bin/bash
# Recreate the test pod, wait 10 seconds after startup, then print 200,000 numbered log lines
kubectl delete pod test-pod
kubectl run -t -i test-pod --image bash --restart Never -- -c 'echo "Initializing..."; sleep 10; for i in {1..200000}; do echo "test-pod-batch-5..."$i; done'

OK, now we're talking. There were no more missing logs.

To further prove our observation, we decided to do one last test: double the sample size. We should only get the logs from the initial log file and the eventual current log file. The log files in between should not be captured by Fluentd, because the rotations happen almost instantly, so those logs should not appear in Elasticsearch.

Figure 6: Almost instant logrotate causing Fluentd to miss some logs

t=0: container initializing, dockerd pulls the bash image
t=2: 10MB, container started, initial log file created, Fluentd tails it and inserts to ES
At t=12, a burst of logs caused a few log rotations almost instantaneously:
t=12: 10MB, log file rotated, Fluentd fails to capture it
t=12: 10MB, log file rotated, Fluentd fails to capture it
t=12: 7MB, current log file, Fluentd tails it and inserts to ES
At t=12, Fluentd had inserted 17MB (10MB + 7MB) worth of logs into ES.
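One way to watch this happening on the node while the burst runs (a sketch; <container-id> is whatever ID Docker assigned to the test pod's container):

# Poll the container's JSON log files once a second; during the burst,
# rotated files appear and are pruned within a second or two.
~ # while true; do ls -lh /var/lib/docker/containers/<container-id>/ | grep json.log; echo "---"; sleep 1; done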

OK, that proved the culprit. Case closed… or not yet; we still had to find a solution.

Investigating GKE's default configuration, we found it to be a maximum of 10MB per log file and a maximum of 5 rotated log files.

In other words, we could only tolerate a burst of less than 20MB worth of logs (previous 10MB + current 10MB). Otherwise, we lose the logs in between when log rotation is executed almost instantaneously.

A quick search indicated that GKE supports customizing these parameters¹. To do this, simply duplicate the instance template generated by GKE and add the following under the metadata key kube-env:

kube-env:
DOCKER_LOG_MAX_SIZE: {{ log_max_size }}mb
DOCKER_LOG_MAX_FILE: {{ log_max_file }}

Visual example:

Figure 7: Modifying docker log settings of GKE template

Before you hit save, make sure you apply the same Pod subnet settings under Networking > Alias IP ranges > Subnet range.

Figure 8: Apply alias IP range for the Pod Subnet
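Once the nodes have been recreated from the new template, a simple way to sanity-check the change on a node (a sketch; exactly where the flags live varies by node image version, and <container-id> is whatever Docker assigned):

# Look for the new log options in the Docker daemon's flags; depending on the node
# image, they may instead appear in a config file such as /etc/default/docker
~ # ps aux | grep [d]ockerd | tr ' ' '\n' | grep log-opt

# Or rerun the test pod burst and confirm the current log file can now
# grow past the old 10MB limit before rotating
~ # ls -lh /var/lib/docker/containers/<container-id>/ | grep json.log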

With the settings applied in Figure 7, we can now tolerate a burst of up to 200MB worth of logs on each container. Of course, your use case might be different from ours, so adjust the configuration to your own requirements. But don't forget to take your node's disk size into account, i.e.:

For every node in the node pool:
node disk size > max pods per node * DOCKER_LOG_MAX_FILE * DOCKER_LOG_MAX_SIZE
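As an illustrative example (not our actual values): with GKE's default of 110 pods per node, DOCKER_LOG_MAX_FILE=5 and DOCKER_LOG_MAX_SIZE=20MB, container logs alone could consume up to 110 * 5 * 20MB = 11GB per node, on top of images and writable layers.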

And that’s how we solved our missing logs issue in production.

Please subscribe if you like this article. Have questions/suggestions? Comment and I’ll do my best to answer them.

[1]: Murat Arslan. Rotation of Google Container Engine Application Logs
https://murat.io/post/logrotate-container-logs-in-google-container-engine/
