Application Logging in Kubernetes with fluentd

Rosemary Wang
6 min read · Feb 15, 2018


I had a bit of confusion understanding how fluentd manipulates logging messages, and what better way to learn than to apply it to my current Kubernetes explorations? After some trial and error, with my tea ducky floating happily in a nice mug of tea and a dash of my favorite search engine, I managed to grasp how structuring my log messages helps fluentd parse them more effectively.

Who doesn’t want to explore fluentd with a tea ducky?!

First off, I needed to deploy fluentd to my Kubernetes cluster. There are a lot of patterns, best described on Kubernetes’s Logging Architecture page. I started out thinking I could use the fluent/fluentd Docker image deployed as a DaemonSet. However, that image doesn’t include the fluent-plugin-kubernetes_metadata_filter plugin, which attaches Kubernetes metadata to each log record. The snippet below shows the filter directive that uses it.

<filter kubernetes.**>
  type kubernetes_metadata
</filter>

Maybe there is an image built with the plugin? I found one in the fluentd-kubernetes-daemonset repository. The repository has Kubernetes templates for running fluentd as a DaemonSet against various backends, such as Elasticsearch, each with a corresponding fluentd image. While I liked the completeness of fluentd-kubernetes-daemonset, it contained more than I needed to figure out how fluentd parses an application’s logs. I took the Elasticsearch DaemonSet manifest as a sample, since it had some basic configuration (a trimmed-down sketch follows the list below). I opted to:

  1. remove the Elasticsearch instance and pipe to stdout.
  2. clean up the logs.
  3. isolate the application logs.
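As a reference point, here is roughly what my trimmed-down DaemonSet looked like once the Elasticsearch-specific pieces came out. This is only a sketch based on the upstream Elasticsearch manifest; the image tag, names, and namespace are placeholders, so check the fluentd-kubernetes-daemonset repository for the real thing.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        # placeholder tag; pick a current one from the fluentd-kubernetes-daemonset repository
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        volumeMounts:
        # fluentd needs to read the container logs from the host
        - name: varlog
          mountPath: /var/log
        - name: dockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers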

Pipe to stdout (for experimentation)

I didn’t have an Elasticsearch instance, and I didn’t want to set one up; fluentd should have a configuration that simply prints everything it parses to stdout. But how do I customize the fluentd configuration?

  • Should I create a new Docker image? Not ideal. I’d have to do the extra work of rebuilding it every time I make a configuration change.
  • Maybe I can mount the configuration as a volume? That could work, since I can update the configuration and restart the pods whenever there’s a new one.

What is the default configuration? As it turns out, there are multiple configuration templates that compose the final fluentd configuration. The first file that stood out to me was kubernetes.conf.erb. It has far more context on directing system logs to other backends, but I focused on three specific sections:

<match fluent.**>
  @type null
</match>

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.*
  format json
  read_from_head true
</source>

...

<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

The first match directive filters out fluentd’s own system logs. If a message’s tag starts with fluent, fluentd drops it by routing it to the null output. The source directive tells fluentd where to look for logs; in this case, the containers in my Kubernetes cluster log to /var/log/containers/*.log. As the container logs are written on the host, fluentd tails them and retrieves a message for each line. There’s also a position file that fluentd uses to bookmark its place within each log. Finally, the kubernetes_metadata filter attaches Kubernetes metadata, such as pod name and namespace, to each record.
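For context, the reason format json works here is that Docker’s default json-file logging driver writes each container log line as a JSON object with log, stream, and time keys. A line in one of those files looks something like this (the contents are made up):

{"log":"10.0.0.1 - - [15/Feb/2018:12:00:00 +0000] \"GET /hello HTTP/1.1\" 200 12\n","stream":"stdout","time":"2018-02-15T12:00:00.000000000Z"}

fluentd parses that JSON and keeps the original message in the log field.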

Now that I’ve defined my source, I need to figure out how to send my messages to a target. Another configuration file, fluentd.conf.erb, has a match directive targeting an Elasticsearch cluster. Maybe I can change the snippet below to point at a different type of target?

<match **>
  @type elasticsearch
  log_level info
  ...
</match>

Yes: swapping in the @type stdout directive redirects the matched log messages to stdout instead. My configuration now looks like this:

<match fluent.**>
  @type null
</match>

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.*
  format json
  read_from_head true
</source>

<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

<match **>
  @type stdout
</match>

I mounted it as a Kubernetes ConfigMap at /fluentd/etc, where the fluentd configuration lives in the Alpine Linux-based image. After I apply this configuration, I should see my fluentd pod accumulating these log messages and outputting the structured messages in its own logs, which I checked using kubectl logs <fluentd pod>.
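For reference, the mount looked roughly like this. The names are my own, and the ConfigMap key matters: the official fluentd image reads /fluentd/etc/fluent.conf by default, so the key becomes that file name.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
data:
  fluent.conf: |
    # the full configuration from above goes here

# ...and in the DaemonSet's pod template:
      containers:
      - name: fluentd
        volumeMounts:
        - name: config
          mountPath: /fluentd/etc
      volumes:
      - name: config
        configMap:
          name: fluentd-config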

The output of kubectl logs on the fluentd pod is almost unreadable with all of the escaped strings…

Well, this is a bit of a mess. I can’t read any of it. It’s all been escaped repeatedly…

Clean Up the Logs

My logs contained so many escaped strings that they turned into post-modern ASCII art. How could I clean this up? I realized that the pod was recording itself: the fluentd source tails the logs of every container, including fluentd’s own, so fluentd recursively recorded and re-escaped its own log messages. The solution is to ignore the log messages from the fluentd pod. I could use a match directive to filter out the noise. What should the match expression be?

I caught the output of kubectl logs on the fluentd pod right before the escaped strings consumed my screen. I wanted to see how fluentd labels the messages. The format?

kubernetes.<path to logs>.<pod name>_<namespace>_<container name>-<container id>.log

This is what I could extract from the snippet below:

The log message is labelled based on the log path, pod name, namespace, container name, and container ID.
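So a message from the fluentd pod itself carries a tag along these lines (the pod name and container ID here are made up, and a real container ID is much longer):

kubernetes.var.log.containers.fluentd-x7k2p_kube-system_fluentd-0123456789abcdef.log

which is why matching on **fluentd** catches it.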

Based on the labelling, I opted to eliminate anything that had fluentd in the name and redirect it to null. I added the following match directive before the more generic kubernetes.** directive because match order matters.

<match kubernetes.var.log.containers.**fluentd**.log>
  @type null
</match>

<match kubernetes.**>
  @type stdout
</match>

After fixing that, I can read my logs!

The logs look much nicer, no more recursively escaped strings.

Isolate the Application Logs

For easier reading, I only want to see my application logs, so I used a similar match directive to filter out any logs coming from the kube-system namespace.

<match kubernetes.var.log.containers.**fluentd**.log>
  @type null
</match>

<match kubernetes.var.log.containers.**kube-system**.log>
  @type null
</match>

<match kubernetes.**>
  @type stdout
</match>

After removing all of the kube-system logs, I created a simple nginx service and called its endpoint a few times. I snagged one of the lines and pretty-printed it to take a closer look.

fluentd log output for nginx’s application.

nginx’s access logs default to a plain, delimited text format. I can see it under the log field of the blob and can tell that the call came from my Postman. What happens when I pass in an application with JSON-formatted log messages? I used my hello world application to test it out, since its logs are output as JSON.

A successful call to my hello world application is reflected in the fluentd logging. I made a GET call to my /hello endpoint.
A failed call to my hello world application is parsed out into level and message by fluentd. It tells me there’s a 404!

I see the raw JSON log message under the log field. The neat part is that fluentd recognizes the JSON fields within the log field and extracts them as their own fields. This means I can search on my log level, request method, request URI, or any other field. I also get the Kubernetes metadata, like container name, namespace, and more.
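To make that concrete, here is an illustrative record with made-up values. Before extraction, the application’s JSON payload is just a string inside the log field:

{
  "log": "{\"level\":\"info\",\"message\":\"GET /hello\",\"status\":200}\n",
  "stream": "stdout"
}

Afterwards, those keys surface as searchable fields alongside the Kubernetes metadata, roughly:

{
  "level": "info",
  "message": "GET /hello",
  "status": 200,
  "stream": "stdout",
  "kubernetes": {
    "namespace_name": "default",
    "pod_name": "helloworld-1234",
    "container_name": "helloworld"
  }
}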

Summary

I realized that structuring my logs as JSON may not be human-readable via docker logs or kubectl logs, but it can be really useful for log aggregation. While fluentd has formats and parsers to help me reshape non-JSON logs for my favorite target, I feel like if I go the distributed logging route, I might just want my application logs formatted as JSON by default to make them easier to parse and index. Plus, they’re structured! Overall, a pretty cool exercise to get familiar with fluentd and how its match directives work. Using these configurations, I can redirect logs to stdout, Elasticsearch, or anywhere else. For the full sample configuration with the Kubernetes DaemonSet, see my GitHub.
