Snap and Kubernetes: together at last

This is the story of how we integrated Snap, our open telemetry framework, with Kubernetes to help build a better story for monitoring and scaling. We call the resulting proof of concept (PoC) kubesnap, and you can get it up and running in GCE right now.

A demo of kubesnap, presented by Nicholas Weaver at the 7/21 Kubernetes Community Meeting, can be seen below (it starts at 1:00 and is about 23 minutes long):

What is Snap?

Snap is an open telemetry framework designed to simplify the collection, processing and publishing of system data through a single API. To learn more about Snap you can check out the Snap repository on GitHub or the Snap landing page.

What is Kubernetes?

From the Kubernetes README: “Kubernetes is an open source system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.”

For our proof-of-concept integration we focused on two components of Kubernetes:

  • cAdvisor: cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers.
  • Heapster: Heapster collects and interprets various signals like compute resource usage, lifecycle events, etc., and exports cluster metrics via REST endpoints.

For the rest of this post we will assume that you are familiar with basic concepts in Kubernetes (e.g. Pods, DaemonSets, ConfigMaps, and the Horizontal Pod Autoscaler) and Snap (e.g. the Snap daemon, Tribe, and plugins).

Integration overview

From an operational perspective, Snap is deployed as a DaemonSet in the Kubernetes environment to allow for easy distribution of Snap daemons across all nodes. Since we need to pass the Snap daemons information about the Tribe setup (which Snap daemon is the Tribe seed and how many nodes there are in the Tribe), we decided to use a ConfigMap to record this information during cluster startup, so that the Snap pods can access it later on.
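
For illustration, here is a minimal sketch of what such a ConfigMap could look like. The name (snap-config) and keys (tribe.seed, tribe.nodes) match what the Snap DaemonSet shown later in this post consumes; the values are placeholders:

apiVersion: v1
kind: ConfigMap
metadata:
  name: snap-config
  namespace: kube-system
data:
  tribe.seed: "10.240.0.2"  # placeholder: IP of the Snap daemon acting as the Tribe seed
  tribe.nodes: "4"          # placeholder: number of nodes in the Tribe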

From a functionality perspective, we wanted to show that Snap is able to integrate with Kubernetes and can serve as the data source for Heapster and, subsequently, the Horizontal Pod Autoscaler. Since Heapster can only support one data source at a time (at the time of writing), Snap had to supply everything cAdvisor normally does, so we extended our existing Docker collector plugin to support retrieval of all Docker metrics that are retrievable by cAdvisor. To give Heapster access to metrics collected by Snap, we created a Heapster publisher plugin, which exposes the Heapster API and allows Heapster to talk to the Snap daemons. The API call to get metrics is almost the same as the one Heapster uses to get metrics from the kubelet; it differs only in port number. All we had to do was add support in Heapster for the Snap data source and reuse the data parsers from the standard “kubernetes” data source, so metrics from Snap are returned in the same format as cAdvisor’s.

For this PoC we chose Google Compute Engine (GCE), part of the Google Cloud Platform, as the deployment environment, and used the e2e test scripts to start a Kubernetes cluster on GCE. To start kubesnap, you simply create a new VM on GCE, clone the kubesnap repo, and start the provisioning script. The script clones all the needed repos, installs Docker and gcloud on the VM, and starts kubesnap, which involves building and starting the Kubernetes cluster. You can start the Kubernetes cluster with:

go run hack/e2e.go -v --up

It can be destroyed later using the following command (while in the kubernetes directory):

go run hack/e2e.go -v --down

To wrap it up we created a simple installer for the Google Compute Engine environment, so others can easily experiment with kubesnap. More details on how to install kubesnap can be found on GitHub.

Questions on How

Here are some frequently asked questions that arise when exploring the “how” of this project.

So, is cAdvisor still running in kubesnap?

Yes, cAdvisor is still running as part of the kubelet and its API is still available; Heapster, though, is configured to talk to the Snap daemons. It’s pretty easy to set a data source for Heapster (cluster/addons/cluster-monitoring/influxdb/heapster-controller.yaml):

Setting snap as a data source for Heapster
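
In sketch form, the relevant fragment of heapster-controller.yaml looks something like the following; the image version and sink value are placeholders, and the exact --source URI used by the PoC may differ:

containers:
  - name: heapster
    image: gcr.io/google_containers/heapster:v1.1.0   # placeholder version
    command:
      - /heapster
      - --source=snap:''        # select the Snap provider instead of "kubernetes"
      - --sink=influxdb:http://monitoring-influxdb:8086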

If you want to switch back to cAdvisor, just change the source in heapster-controller.yaml and restart Heapster. And if Heapster gains support for multiple data sources in the future, you could use Snap and cAdvisor at the same time.

What exactly was changed in Kubernetes and Heapster to support kubesnap?

For detailed changes in Kubernetes, please see: https://github.com/kubernetes/kubernetes/compare/master...andrzej-k:snap_tribe

For a list of changes in Heapster, please see: https://github.com/kubernetes/heapster/compare/master...andrzej-k:snap

So what specifically changed?

To keep it simple, here is the list of changes we made in Heapster, Kubernetes, and Snap:

1) Heapster changes

In metrics/sources/factory.go, we added support for Snap as a data source:

Snap as a data source in Heapster

We created metrics/sources/snap/snap.go, which communicates with the Snap daemons to get metrics. As mentioned previously, the REST API call is similar to the one used by the “kubernetes” provider; the only difference is the port number. Also, at least for now, all of the data processing is done exactly as in the original “kubernetes” provider, since Snap reuses the data format exposed by the kubelet (cAdvisor).
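
To make that concrete, here is a hedged sketch of the kind of request involved. The path mirrors the kubelet stats endpoint that Heapster’s “kubernetes” provider queries, 8777 is the publisher port from the DaemonSet below, and the request body fields shown are assumptions:

# Illustrative only: the same style of call Heapster makes to the kubelet,
# but aimed at the Snap Heapster publisher's port on a node.
curl -s -X POST "http://<node-ip>:8777/stats/container/" \
  -H "Content-Type: application/json" \
  -d '{"num_stats": 1, "subcontainers": true}'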

2) Kubernetes changes

We defined a DaemonSet for Snap (cluster/addons/snap/snap.yaml):

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: snap
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      name: snap
      labels:
        daemon: snapd
    spec:
      hostPID: true
      hostNetwork: true
      containers:
      - name: snap
        image: gcr.io/snap4kube-1/snap
        env:
        - name: SNAP_SEED_IP
          valueFrom:
            configMapKeyRef:
              name: snap-config
              key: tribe.seed
        - name: SNAP_TRIBE_NODES
          valueFrom:
            configMapKeyRef:
              name: snap-config
              key: tribe.nodes
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - mountPath: /sys/fs/cgroup
          name: cgroup
        - mountPath: /var/run/docker.sock
          name: docker-sock
        - mountPath: /var/lib/docker
          name: fs-stats
        - mountPath: /usr/local/bin/docker
          name: docker
        - mountPath: /proc_host
          name: proc
        ports:
        - containerPort: 8181
          hostPort: 8181
          name: snap-api
        - containerPort: 8777
          hostPort: 8777
          name: publisher-api
        imagePullPolicy: Always
        securityContext:
          privileged: true
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
      - name: fs-stats
        hostPath:
          path: /var/lib/docker
      - name: docker
        hostPath:
          path: /usr/bin/docker
      - name: proc
        hostPath:
          path: /proc

Other changes in Kubernetes include:

  • The data source for heapster-controller is set to Snap (cluster/addons/cluster-monitoring/influxdb/heapster-controller.yaml)
  • The Tribe seed is set to the Kubernetes master node IP (cluster/gce/util.sh)
  • A ConfigMap is created to store the Tribe data, i.e. the seed IP and the number of nodes in the Tribe (cluster/kube-up.sh and cluster/validate-cluster.sh); a sketch of this step follows below
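
As a rough sketch of that last item (the variable names here are illustrative; the real scripts wire this up differently), the ConfigMap could be created during cluster startup with:

kubectl create configmap snap-config --namespace=kube-system \
  --from-literal=tribe.seed=${KUBE_MASTER_IP} \
  --from-literal=tribe.nodes=${NUM_NODES}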

3) Snap changes

Snap is pretty well suited to running inside a Docker container, but we also needed to make sure that our Docker collector plugin could retrieve metrics for the containers on a node while itself running inside a container. This works well, and we wrapped everything up in this Dockerfile.

Two new plugins were created:

  • kubesnap docker collector: the Snap Docker collector plugin was extended to retrieve all sorts of data available from the kubelet, including Kubernetes labels and network and filesystem statistics.
  • kubesnap heapster publisher: the Heapster publisher plugin is a hybrid that doesn’t entirely fit Snap’s model for publishers. It receives all incoming metrics, retains them, and exposes them via an HTTP endpoint conforming to the Heapster API.

The Heapster publisher is composed of 3 elements:

  • A regular publisher module conforming to the Publisher interface, where incoming metrics are recognized and grouped into standard and custom ones
  • A buffering module, which keeps a history of received metrics aggregated across all monitored containers
  • An HTTP module, which exposes an API for retrieving and filtering the collected metrics

The Heapster publisher supports several config options to tune its processing capabilities:

stats_depth: 0    # maximum number of records maintained per container (0 = unlimited)
server_port: 8777 # TCP port number for the HTTP endpoint
stats_span: "10m" # maximum time span of collected metrics per container

To automate the Snap startup process, we created a simple wrapper that starts the Snap daemon, determines whether it is the Tribe seed, attaches itself to an agreement, and loads the plugins (the Docker collector and the Heapster publisher). Then, once all nodes have registered into the Tribe, it starts a task that collects Docker metrics and pushes them to the Heapster publisher, so Heapster can query for those metrics later. The script can be found here.
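
A hypothetical sketch of such a wrapper is below; the plugin paths, task manifest name, and exact snapd/snapctl flags are illustrative, while the environment variables come from the DaemonSet above:

#!/bin/bash
# Start the Snap daemon in tribe mode; non-seed members point at the seed.
if [ "${MY_POD_IP}" = "${SNAP_SEED_IP}" ]; then
  snapd --tribe --tribe-addr "${MY_POD_IP}" &
else
  snapd --tribe --tribe-addr "${MY_POD_IP}" --tribe-seed "${SNAP_SEED_IP}" &
fi

# Attach this daemon to an agreement so plugins and tasks propagate across the Tribe.
snapctl agreement create kubesnap 2>/dev/null || true
snapctl agreement join kubesnap "$(hostname)"

# Load the Docker collector and Heapster publisher plugins (paths illustrative).
snapctl plugin load /opt/snap/plugins/snap-plugin-collector-docker
snapctl plugin load /opt/snap/plugins/snap-plugin-publisher-heapster

# Wait until all nodes have registered into the Tribe, then start the task
# that collects Docker metrics and pushes them to the Heapster publisher.
until [ "$(snapctl member list | wc -l)" -gt "${SNAP_TRIBE_NODES}" ]; do
  sleep 5
done
snapctl task create -t /opt/snap/tasks/docker-to-heapster.json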

Get kubesnap up and running

We made it simple to get kubesnap running in GCE. Try it out and let us know what you think!