Compose your infrastructure, don’t micromanage it

TLDR: Leverage Kubernetes annotations across your cluster to declaratively configure monitoring and logging.

Monitoring and logging are two of the largest surfaces where applications make contact with infrastructure. This post talks about how to approach both in a scalable, composable, and simplified way.

If you are familiar with Kubernetes, Prometheus, Fluentd, and the ELK stack, feel free to skip the background.

Background

Kubernetes

Kubernetes is a popular cluster management system that takes some simple concepts and composes them in a beautiful way with a metadata-rich API.

A core concept is the pod, which is one or more co-located containers. A pod can be started on its own, but if it terminates or the server it runs on restarts, nothing ensures that the workload comes back online.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:alpine
    ports:
    - containerPort: 80

The real power of Kubernetes comes from the higher-level API types that describe what pods should look like and manage the scheduling (and querying) of pods. These objects find the pods they manage through key-value pairs on API objects called Labels. For example, when a Deployment API object is created, Kubernetes' controllers ensure that the specified number of pods are running, and the scheduler places those pods across the available servers.

If you submitted the following Deployment to Kubernetes, two pods running Nginx would start up on your cluster, each exposing port 80.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
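Labels are how a Deployment (or a Service) finds the pods it is responsible for: it selects every pod whose labels match its selector. The Python sketch below models equality-based matching, the semantics of a `matchLabels` selector, against a hypothetical list of pods. It illustrates the logic only and does not talk to the Kubernetes API.

```python
from typing import Dict, List


def matches_selector(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based matching: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return the names of the pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches_selector(selector, pod["labels"])]


# Hypothetical pods; extra labels (like "tier") do not prevent a match.
pods = [
    {"name": "nginx-1", "labels": {"app": "nginx"}},
    {"name": "nginx-2", "labels": {"app": "nginx", "tier": "frontend"}},
    {"name": "webapp-1", "labels": {"app": "webapp"}},
]

print(select_pods({"app": "nginx"}, pods))  # ['nginx-1', 'nginx-2']
```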

Another set of metadata on API objects is annotations. Annotations are arbitrary key-value pairs similar to Labels, but rather than being used to select and group objects, they are there for other applications to read and make decisions from. The annotations below, for example, are read by Kubernetes' AWS integration when it configures a load balancer.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "80"

Fluentd

Fluentd is a log management application that collects streams of data from disparate sources, transforms, filters, and routes them, and forwards them to a variety of aggregation, visualization, and storage systems.

Prometheus

To quote the documentation, “Prometheus is an open-source systems monitoring and alerting toolkit” that has become widely used by operations teams of all sizes as a default monitoring system.

Our use of Prometheus at Skuid has grown over the past year from monitoring key infrastructure components like Kubernetes and Jenkins to being where we monitor application health and performance.

Elasticsearch, Logstash, Kibana (ELK)

The ELK stack is an open-source log management solution. Elasticsearch is a document search server, Logstash is a server-side log shipping tool (similar to Fluentd), and Kibana is a dashboard and visualization server that queries Elasticsearch.

At Skuid, we use AWS Elasticsearch and Kibana, and rather than Logstash, we use Fluentd for log shipping. Fluentd sends our logs to CloudWatch Logs, which triggers a Lambda function that passes the log messages on to AWS Elasticsearch.

First steps with composition

When we initially configured Prometheus, the scrape_configs section looked something like this:

scrape_configs:
- job_name: 'jenkins'
  kubernetes_sd_configs:
  - role: endpoints
  metrics_path: /prometheus
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    action: keep
    regex: jenkins
- job_name: 'alertmanager'
  kubernetes_sd_configs:
  - role: endpoints
  metrics_path: /metrics
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    action: keep
    regex: alertmanager
...

Essentially, for each service we wanted to monitor, we added a new entry to the configuration file. After digging into the Prometheus documentation, we realized that Prometheus' Kubernetes service discovery exposes any annotations we add as labels: prometheus.io/scrape on a Service, for example, becomes __meta_kubernetes_service_annotation_prometheus_io_scrape. Using relabeling, we updated our Kubernetes services and Prometheus configuration to the following:

Services

apiVersion: v1
kind: Service
metadata:
  name: webapp
  labels:
    app: webapp
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '80'
    prometheus.io/path: '/metrics'
    prometheus.io/scheme: 'http'

Prometheus

scrape_configs:
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels:
    - "__meta_kubernetes_service_annotation_prometheus_io_scrape"
    action: keep
    regex: true
  - source_labels:
    - "__meta_kubernetes_service_annotation_prometheus_io_scheme"
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels:
    - "__meta_kubernetes_service_annotation_prometheus_io_path"
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels:
    - "__address__"
    - "__meta_kubernetes_service_annotation_prometheus_io_port"
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
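To make the relabeling steps concrete, here is a simplified Python model of how the four rules above transform one discovered endpoint. It is a sketch, not Prometheus' implementation: it handles only the `keep` and `replace` actions, and the endpoint IP, container port, and starting `__scheme__`/`__metrics_path__` values are hypothetical. It does follow two Prometheus rules that matter here: source label values are joined with `;`, and a regex must match the whole joined string.

```python
import re
from typing import Dict, List, Optional


def relabel(labels: Dict[str, str], rules: List[dict]) -> Optional[Dict[str, str]]:
    """Apply simplified keep/replace relabel rules; return None if the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        # Missing labels count as empty strings; values are joined with ";".
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        # Prometheus anchors regexes, so the whole joined value must match.
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None  # target is not scraped
        elif action == "replace" and match is not None:
            # Translate Prometheus-style $1 references into Python's \1 syntax.
            template = re.sub(r"\$(\d+)", r"\\\1", rule.get("replacement", "$1"))
            labels[rule["target_label"]] = match.expand(template)
    return labels


ANNOTATION = "__meta_kubernetes_service_annotation_prometheus_io_"

RULES = [
    {"source_labels": [ANNOTATION + "scrape"], "action": "keep", "regex": "true"},
    {"source_labels": [ANNOTATION + "scheme"], "action": "replace",
     "target_label": "__scheme__", "regex": "(https?)"},
    {"source_labels": [ANNOTATION + "path"], "action": "replace",
     "target_label": "__metrics_path__", "regex": "(.+)"},
    {"source_labels": ["__address__", ANNOTATION + "port"], "action": "replace",
     "target_label": "__address__", "regex": r"(.+)(?::\d+);(\d+)",
     "replacement": "$1:$2"},
]

# Hypothetical endpoint discovered for the annotated webapp Service.
discovered = {
    "__address__": "10.2.3.4:8080",
    "__scheme__": "http",
    "__metrics_path__": "/metrics",
    ANNOTATION + "scrape": "true",
    ANNOTATION + "port": "80",
    ANNOTATION + "path": "/metrics",
    ANNOTATION + "scheme": "http",
}

target = relabel(discovered, RULES)
print(target["__address__"])  # 10.2.3.4:80

# An endpoint without the scrape annotation is dropped by the keep rule.
print(relabel({"__address__": "10.2.3.5:9000"}, RULES))  # None
```

The port annotation wins over the discovered container port because the last rule rewrites `__address__` from both values at once.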

The result is that rather than writing configuration for each new application we want to monitor, we use Kubernetes annotations to compose each target's scheme, metrics path, and address. With this setup, a new service only needs the relevant annotations to be scraped.

Annotations as a pattern

Like monitoring, logging was a management task that required a configuration change every time we created a new service. Here's an example of our logging configuration:

# Read all kubernetes container logs
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag k8s.*
format json_in_json
read_from_head true
</source>
# Add K8s metadata
<filter k8s.var.log.**>
@type kubernetes_metadata
</filter>
# Flatten nested documents
<filter k8s.var.log.**>
@type flatten_hash
flatten_array false
separator _
</filter>
# Add stream_name field
<filter k8s.var.log.**>
@type record_transformer
# Required so ${...} can evaluate Ruby such as Socket.gethostname
enable_ruby true
<record>
# Add a unique stream name to prevent a race condition
# for the stream sequence token for CloudWatch Logs
stream_name ${Socket.gethostname}
</record>
</filter>
# Retag each container's logs by container name
<match k8s.var.log.**>
@type rewrite_tag_filter
rewriterule1 kubernetes_container_name grafana cwl.grafana
rewriterule2 kubernetes_container_name webapp cwl.webapp
rewriterule3 kubernetes_container_name prometheus cwl.prometheus
rewriterule4 kubernetes_container_name nginx cwl.nginx
</match>
...
# Add the es_type like this for every service
<filter cwl.grafana>
@type record_transformer
<record>
es_type grafana
es_index_prefix cwl-grafana-
</record>
</filter>
...
# Send to CloudWatch Logs
<match cwl.**>
@type cloudwatch_logs
log_group_name Kubernetes
auto_create_stream true
log_stream_name_key stream_name
</match>
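The routing step in this configuration is what forced an edit per service. The following simplified Python model of the `rewriteruleN` lines treats each rule as a (key, regex, new tag) triple where the first matching rule wins and a record that matches no rule gets no new tag. It shows the problem: a container named `billing`, a hypothetical new service, is not routed until someone adds a rule. This reflects our reading of rewrite_tag_filter's behavior and is not the plugin's code.

```python
import re
from typing import Dict, List, Optional, Tuple

# (record key, regex pattern, new tag), mirroring rewriterule1..rewriterule4.
RULES: List[Tuple[str, str, str]] = [
    ("kubernetes_container_name", "grafana", "cwl.grafana"),
    ("kubernetes_container_name", "webapp", "cwl.webapp"),
    ("kubernetes_container_name", "prometheus", "cwl.prometheus"),
    ("kubernetes_container_name", "nginx", "cwl.nginx"),
]


def rewrite_tag(record: Dict[str, str],
                rules: List[Tuple[str, str, str]] = RULES) -> Optional[str]:
    """Return the tag of the first rule whose pattern matches, or None if none match."""
    for key, pattern, new_tag in rules:
        value = record.get(key)
        if value is not None and re.search(pattern, value):
            return new_tag
    return None


print(rewrite_tag({"kubernetes_container_name": "nginx"}))    # cwl.nginx
print(rewrite_tag({"kubernetes_container_name": "billing"}))  # None: needs a new rule
```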

This worked for our first handful of services, but as you can imagine, it quickly got out of hand as more services were added. Taking a cue from Prometheus, we started rethinking how we could declaratively define the forwarding of our logs. We ended up adding an annotation to the pod template in each of our Deployments:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        fluentd.org/keys: '{"nginx": {"es_type": "nginx", "es_index": "cwl-nginx-"}}'
    spec:
      containers:
      - name: nginx
        ...
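Writing JSON by hand inside a YAML string is easy to get wrong, since a single missing quote leaves the Fluentd side with a value it cannot parse. One option is to generate the annotation value instead. The helper below, `fluentd_keys_annotation`, is a hypothetical example of doing that with Python's standard `json` module.

```python
import json


def fluentd_keys_annotation(container: str, es_type: str, es_index: str) -> str:
    """Serialize the per-container settings that the Fluentd pipeline reads back."""
    return json.dumps({container: {"es_type": es_type, "es_index": es_index}})


value = fluentd_keys_annotation("nginx", "nginx", "cwl-nginx-")
print(value)  # {"nginx": {"es_type": "nginx", "es_index": "cwl-nginx-"}}

# Round-trip check: the value parses back to the same structure.
assert json.loads(value) == {"nginx": {"es_type": "nginx", "es_index": "cwl-nginx-"}}
```

The printed string contains only double quotes, so it can sit inside a single-quoted YAML scalar as-is.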

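When a pod uses the above annotation, the kubernetes-metadata-filter plugin adds it to each of the pod's log records (along with other metadata). Because the plugin replaces dots in annotation keys with underscores and the flatten_hash filter joins nested keys with _, the record ends up with fields like the following: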

{
  "kubernetes_container_name" : "nginx",
  "kubernetes_annotations_fluentd_org/keys" : "{\"nginx\": {\"es_type\": \"nginx\", \"es_index\": \"cwl-nginx-\"}}",
  ...
}
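The field name `kubernetes_annotations_fluentd_org/keys` comes from two steps: the metadata filter replaces the dot in `fluentd.org/keys` with an underscore (its de-dot behavior, which we assume is left at its default), and `flatten_hash` joins nested keys with `_`. A minimal Python model of both steps, applied to a hypothetical log record:

```python
from typing import Any, Dict


def de_dot(key: str, separator: str = "_") -> str:
    """Replace dots in a label/annotation key (assumed default metadata-filter behavior)."""
    return key.replace(".", separator)


def flatten(record: Dict[str, Any], separator: str = "_", prefix: str = "") -> Dict[str, Any]:
    """Join nested dict keys with `separator`; non-dict values (including lists) are kept."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, separator, name))
        else:
            flat[name] = value
    return flat


keys_json = '{"nginx": {"es_type": "nginx", "es_index": "cwl-nginx-"}}'
# Hypothetical record after metadata enrichment, before flattening.
record = {
    "log": "GET / HTTP/1.1 200",
    "kubernetes": {
        "container_name": "nginx",
        "annotations": {de_dot("fluentd.org/keys"): keys_json},
    },
}

for field in flatten(record):
    print(field)
# log
# kubernetes_container_name
# kubernetes_annotations_fluentd_org/keys
```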

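We also wrote a Fluentd plugin, fluent-plugin-json-lookup, that extracts key-value pairs from a JSON-encoded field. It parses the JSON annotation, looks up the entry for the record's container name, and merges that entry's key-value pairs into the record, so the above message becomes: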

{
  "kubernetes_container_name" : "nginx",
  "es_type": "nginx",
  "es_index": "cwl-nginx-"
}

The Fluentd configuration looks like this:

<filter pattern>
@type json_lookup
lookup_key kubernetes_container_name
json_key kubernetes_annotations_fluentd_org/keys
remove_json_key true
</filter>
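The plugin's behavior can be summarized in a few lines. The following Python function is a sketch of the same lookup under our own assumptions (for example, that a missing or malformed annotation simply results in nothing being merged); it is not the plugin's source, which is a Ruby Fluentd filter. The parameter names mirror `lookup_key`, `json_key`, and `remove_json_key` from the configuration above.

```python
import json
from typing import Any, Dict


def json_lookup(record: Dict[str, Any], lookup_key: str, json_key: str,
                remove_json_key: bool = True) -> Dict[str, Any]:
    """Merge the JSON entry selected by record[lookup_key] into a copy of the record."""
    out = dict(record)
    raw = out.pop(json_key, None) if remove_json_key else out.get(json_key)
    if raw is None:
        return out  # no annotation on this pod
    try:
        table = json.loads(raw)
    except (TypeError, ValueError):
        return out  # assumption: malformed JSON merges nothing
    entry = table.get(out.get(lookup_key)) if isinstance(table, dict) else None
    if isinstance(entry, dict):
        out.update(entry)
    return out


message = {
    "kubernetes_container_name": "nginx",
    "kubernetes_annotations_fluentd_org/keys":
        '{"nginx": {"es_type": "nginx", "es_index": "cwl-nginx-"}}',
}
result = json_lookup(message, "kubernetes_container_name",
                     "kubernetes_annotations_fluentd_org/keys")
print(result)
# {'kubernetes_container_name': 'nginx', 'es_type': 'nginx', 'es_index': 'cwl-nginx-'}
```

Keying the JSON by container name lets a single pod-level annotation describe several containers in the same pod.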

In the end, we reduced our Fluentd configuration (excluding log ingestion and the CloudWatch Logs upload) to:

# Add K8s metadata
<filter k8s.var.log.**>
@type kubernetes_metadata
# Include all pod annotations so fluentd.org/keys reaches the record
annotation_match [".+"]
</filter>
# Flatten nested documents
<filter k8s.var.log.**>
@type flatten_hash
flatten_array false
separator _
</filter>
# Unpack the fluentd annotations
<filter k8s.var.log.**>
@type json_lookup
lookup_key kubernetes_container_name
json_key kubernetes_annotations_fluentd_org/keys
remove_json_key true
</filter>

With a quick update to our lambda function, we were able to forward any logs we wanted from Kubernetes without having to touch our Fluentd configuration!

Summary

Now whenever we launch a new service, log ingestion and metric scraping are fully automated in a declarative, composable way, and we can focus on the services rather than on managing them.


If this has been helpful to you, let me know! How are you using Kubernetes annotations to simplify your cluster operations?