Monitor the Systems You Love on Kubernetes

Emily Gu
10 min read · Oct 25, 2016

--

Services are shifting from monoliths to microservices, containerization, and orchestration in the cloud. Monitoring the health of both hardware and software at scale becomes crucial to minimize application downtime. How do we ensure interoperability among so many different tools, dashboards, and underlying platforms and architectures? How do we make sense of data measured in petabytes and zettabytes? There are many analytics and monitoring tools out there. In this blog post, we focus on how Snap can simplify your analytics and monitoring solution on Kubernetes.

Snap is an open source telemetry framework written in Go. Its gRPC server and client communication pattern allows interoperability among different languages. The Snap ecosystem comprises hardware and software collector, processor, and publisher plugins; you can browse the available plugins in the plugin catalog. Here are some features worth noting:

  • Collected data can be published to different data stores
  • Containerizable and packaged as microservices
  • Extendable to create your own automated monitoring solution
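
Under the hood, Snap wires these plugins together with a task: a manifest that schedules a set of collected metrics and routes them to processors and publishers. As a hedged sketch (the metric namespace and InfluxDB settings below are illustrative assumptions, not taken from the example repository):

```yaml
# Hypothetical Snap task manifest: collect one psutil metric every
# second and publish it to InfluxDB. Host, database, and credentials
# are placeholder assumptions.
---
version: 1
schedule:
  type: "simple"
  interval: "1s"
workflow:
  collect:
    metrics:
      "/intel/psutil/load/load1": {}
    publish:
      - plugin_name: "influxdb"
        config:
          host: "influxdb"
          port: 8086
          database: "snap"
          user: "admin"
          password: "admin"
```

A task along these lines would be started with snapctl task create -t task.yml once the relevant plugins are loaded.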

In this blog post, we will collect sample metrics from cpu, memory, disk, psutil, and Elasticsearch to monitor the health of an Elasticsearch cluster and the underlying system at the same time.

Elasticsearch is similar to a distributed, document-based NoSQL database or data warehouse with lightning-fast full-text search. It’s a very popular backend for monitoring other systems, but how do we monitor Elasticsearch itself and the system beneath it?

For this blog post we explore two different tools, docker-compose and Minikube, to simplify monitoring the health of an Elasticsearch cluster and the underlying system using the Snap framework. With both tools we will run an instance of Snap with plugins that collect the health data of the Elasticsearch nodes in a cluster along with system cpu, memory, disk, and psutil data. Then we will use InfluxDB and Grafana to visualize the collected data.

All scripts used in this blog post are available on GitHub:

Monitoring with docker-compose

First, let’s run the example using docker-compose on your laptop. Docker-compose is ideal for development. This will help us understand how to deploy the example to Kubernetes later. You can see the similarities and differences between docker-compose and Kubernetes in their respective deployment YAML files.
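
As a rough orientation, the compose file has one service per moving part of the stack. The sketch below is not the actual snap-es-monitor.yml; the image names are the ones visible in the docker ps output later in this post, and everything else is an assumption:

```yaml
# Hedged sketch of a compose file for this stack; consult the
# repository for the real snap-es-monitor.yml.
version: "2"
services:
  main:                          # Snap daemon with plugins pre-loaded
    build: .
    ports:
      - "8181"                   # Snap REST API
  snap-elasticsearch:            # Elasticsearch node monitored by Snap
    image: elasticsearch:latest
    ports:
      - "9200:9200"
  es-node:                       # data nodes; scalable with `scale`
    image: elasticsearch:latest
  influxdb:
    image: tutum/influxdb:latest
    ports:
      - "8083:8083"
      - "8086:8086"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```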

Before you start, you need docker/docker-machine and docker-compose installed on your laptop. Then you can proceed:

1. Start the example. This starts an instance of Snap with the Elasticsearch, cpu, meminfo, disk, and psutil collector plugins pre-loaded and collecting sample metrics, an instance of InfluxDB, an instance of Grafana, and an Elasticsearch cluster with one master node and one data node.

▶ cd snap-elasticsearch-monitor/docker-compose/
docker-compose -f snap-es-monitor.yml up -d
Creating network "dockercompose_default" with the default driver
Creating dockercompose_snap-elasticsearch_1
Creating influxdb
Creating dockercompose_es-node_1
Creating grafana
Creating snap-elasticsearch-monitor
Creating dockercompose_master_1
Creating dockercompose_node_1

2. View the Grafana dashboards. Go to http://[dockerhost]:3000 to view live metric collection. There are two pre-created dashboards: Snap Elasticsearch Monitor and Snap System Monitor.

To see the IP address of your Docker host if you use docker-machine:

echo $DOCKER_HOST
tcp://192.168.99.100:2376

For example: http://192.168.99.100:3000. If you run Docker natively, it’s http://127.0.0.1:3000.

3. Scale the Elasticsearch cluster. The example Elasticsearch cluster has one master and one data node. You can scale your cluster up or down; for example, to scale up to 5 data nodes:

▶ docker-compose -f snap-es-monitor.yml scale snap-elasticsearch=1 es-node=5
Desired container number already achieved
Creating and starting dockercompose_es-node_2 ... done
Creating and starting dockercompose_es-node_3 ... done
Creating and starting dockercompose_es-node_4 ... done
Creating and starting dockercompose_es-node_5 ... done

After scaling up you can view your updated docker containers:

▶ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ead4c2c3e88 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_4
e680486b0642 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_3
49807affc8f7 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_2
34821be64e08 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_5
3ca984da410c dockercompose_main "sh -c '/usr/local/bi" 3 minutes ago Up 3 minutes 8181/tcp snap-elasticsearch-monitor
24b5dc5bcb35 elasticsearch:latest "/docker-entrypoint.s" 3 minutes ago Up 3 minutes 9200/tcp, 9300/tcp dockercompose_es-node_1
d800fa7a2c4e grafana/grafana:latest "/run.sh" 3 minutes ago Up 3 minutes 0.0.0.0:3000->3000/tcp grafana
ebe9c2397e1a tutum/influxdb:latest "/run.sh" 3 minutes ago Up 3 minutes 0.0.0.0:8083->8083/tcp, 0.0.0.0:8086->8086/tcp influxdb
55f9ef583fba elasticsearch:latest "/docker-entrypoint.s" 3 minutes ago Up 3 minutes 0.0.0.0:9200->9200/tcp, 9300/tcp dockercompose_snap-elasticsearch_1

There are now 5 Elasticsearch data nodes and 1 master node.

4. Stop the running containers. Once you are done, you can stop the example containers; this allows you to re-start them later if you wish.

docker-compose -f snap-es-monitor.yml stop
Stopping dockercompose_es-node_4 ... done
Stopping dockercompose_es-node_3 ... done
Stopping dockercompose_es-node_2 ... done
Stopping dockercompose_es-node_5 ... done
Stopping snap-elasticsearch-monitor ... done
Stopping dockercompose_es-node_1 ... done
Stopping grafana ... done
Stopping influxdb ... done
Stopping dockercompose_snap-elasticsearch_1 ... done

If you want to stop and remove the containers, use down instead:

 docker-compose -f snap-es-monitor.yml down
Stopping dockercompose_es-node_4 ... done
Stopping dockercompose_es-node_3 ... done
Stopping dockercompose_es-node_2 ... done
Stopping dockercompose_es-node_5 ... done
Stopping snap-elasticsearch-monitor ... done
Stopping dockercompose_es-node_1 ... done
Stopping grafana ... done
Stopping influxdb ... done
Stopping dockercompose_snap-elasticsearch_1 ... done
Removing dockercompose_es-node_4 ... done
Removing dockercompose_es-node_3 ... done
Removing dockercompose_es-node_2 ... done
Removing dockercompose_es-node_5 ... done
Removing snap-elasticsearch-monitor ... done
Removing dockercompose_es-node_1 ... done
Removing grafana ... done
Removing influxdb ... done
Removing dockercompose_snap-elasticsearch_1 ... done
Removing network dockercompose_default

Monitoring with Kubernetes

Second, we can run the example inside Kubernetes with Minikube on your laptop. With just a few steps you’ll have the example running inside an enterprise-grade container management system.

Before you start, you should have either Kubernetes or Minikube, along with kubectl, installed. This example uses Minikube.

  1. Start Minikube.
minikube start
Starting local Kubernetes cluster...
Kubectl is now configured to use the cluster.

2. Set up your console Docker environment. This allows you to run Docker commands in your console.

▶ eval $(minikube docker-env)

3. Start the example inside Kubernetes. This creates and starts an Elasticsearch cluster with one master and one data node, along with an instance of Snap with the Elasticsearch, cpu, meminfo, disk, and psutil collector plugins pre-loaded, an instance of Grafana, and an instance of InfluxDB.

▶ cd snap-elasticsearch-monitor/kubernetes/
kubectl create -f deployment --namespace kube-system
deployment "snap-elasticsearch" created
You have exposed your service on an external port on all nodes in your
cluster. If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:31527,tcp:31462) to serve traffic.
See http://releases.k8s.io/release-1.3/docs/user-guide/services-firewalls.md for more details.
service "elasticsearch" created
replicationcontroller "es-data" created
You have exposed your service on an external port on all nodes in your
cluster. If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:32657) to serve traffic.
See http://releases.k8s.io/release-1.3/docs/user-guide/services-firewalls.md for more details.
service "monitoring-grafana" created
replicationcontroller "heapster" created
replicationcontroller "influxdb-grafana" created
service "monitoring-influxdb" created
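
The deployment directory above holds the Kubernetes manifests. For orientation, a Deployment for the Snap pod might look roughly like the following sketch; only the image names are taken from the docker ps output later in this post, and the labels and ports are assumptions:

```yaml
# Hedged sketch of a snap-elasticsearch Deployment (Kubernetes 1.3-era
# API group); see the repository for the real manifests.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: snap-elasticsearch
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: snap-elasticsearch
    spec:
      containers:
        - name: main
          image: candysmurfhub/snap-es-mon-k8s:latest  # Snap + plugins
          ports:
            - containerPort: 8181                      # Snap REST API
        - name: elasticsearch
          image: elasticsearch:latest
          ports:
            - containerPort: 9200
```

Two containers per pod is consistent with the 2/2 READY column shown for snap-elasticsearch in the kubectl get pods output below.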

4. Forward the InfluxDB and Grafana ports to your local machine. Because we are running the example in containers, to view the InfluxDB and Grafana dashboards locally you need to forward the ports of the influxdb-grafana pod.

Start by finding the pod name when they are all in Running status:

kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
snap-elasticsearch-3470735521-w8p5p 2/2 Running 0 2m
es-data-6rdi6 1/1 Running 0 2m
heapster-yf7eu 1/1 Running 0 2m
influxdb-grafana-jsatd 2/2 Running 0 2m
kube-addon-manager-minikubevm 1/1 Running 39 6d
kubernetes-dashboard-sllpy 1/1 Running 20 6d

Here the pod is named: influxdb-grafana-jsatd

Note: you will need to wait for all containers to be in a Running state before proceeding.

5. Create Snap sample tasks to collect the sample metrics.

Start by finding the running container ID:

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9c6d4c240bf1 kubernetes/heapster:canary "/heapster --source=k" 12 minutes ago Up 12 minutes k8s_heapster.6c2648e6_heapster-irt4k_kube-system_d3659d5f-9169-11e6-96f0-8203398eff0a_92f9930c
20558b9581bb elasticsearch:latest
07b00720507f candysmurfhub/snap-es-mon-k8s:latest "sh -c '/usr/local/bi" 12 minutes ago Up 12 minutes k8s_main.d6053766_snap-elasticsearch-1681843414-q3oyc_kube-system_d347e719-9169-11e6-96f0-8203398eff0a_e7d7dcac
cf5504f9b2f1 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_influxdb-grafana-xilzh_kube-system_d36fdf75-9169-11e6-96f0-8203398eff0a_e9e6d561
7bfd879ea011 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.69a02673_snap-elasticsearch-1681843414-q3oyc_kube-system_d347e719-9169-11e6-96f0-8203398eff0a_fe3f64b2
51756b97c8f8 gcr.io/google_containers/pause-amd64:3.0
...

Then go into the container and check if plugins are loaded:

▶ docker exec -it 07b007 bash
bash-4.3# snapctl plugin list
NAME VERSION TYPE SIGNED STATUS
file 2 publisher false loaded
influxdb 15 publisher false loaded
cpu 6 collector false loaded
disk 3 collector false loaded
elasticsearch 3 collector false loaded
meminfo 3 collector false loaded
psutil 8 collector false loaded

If you see the following, don’t panic! It takes time to start. Check back in a bit.

bash-4.3# snapctl plugin list
Error: URL target is not available. Get http://localhost:8181/v1/plugins: dial tcp [::1]:8181: getsockopt: connection refused

Create Snap tasks in the container:

bash-4.3# ./usr/local/bin/create_tasks
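
The create_tasks helper presumably submits task manifests to the Snap daemon via snapctl. A hedged sketch of what one of them might look like (the wildcard metric namespace and InfluxDB settings are assumptions; check the elasticsearch collector’s documentation for the real namespaces):

```yaml
# Hypothetical task: collect all Elasticsearch metrics every 10s and
# publish them to the monitoring-influxdb service.
---
version: 1
schedule:
  type: "simple"
  interval: "10s"
workflow:
  collect:
    metrics:
      "/intel/elasticsearch/*": {}
    publish:
      - plugin_name: "influxdb"
        config:
          host: "monitoring-influxdb"
          port: 8086
          database: "snap"
          user: "root"
          password: "root"
```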

Note: Snap supports auto-discovery to automatically load plugins and create tasks; however, at the time of writing, Kubernetes does not support pod dependencies. Once initialization containers can be added to specify dependencies, this step can be skipped.

Then exit out of the container.
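
To make the dependency idea concrete: an init container could hold the Snap container back until Elasticsearch answers, letting tasks be created automatically on boot. A hedged sketch (spec.initContainers stabilized in later Kubernetes releases; at the time of this post it was only available behind a beta pod annotation):

```yaml
# Hypothetical pod spec fragment: wait for Elasticsearch before the
# Snap container starts.
spec:
  initContainers:
    - name: wait-for-elasticsearch
      image: busybox
      command:
        - sh
        - -c
        - until wget -qO- http://elasticsearch:9200/_cluster/health; do sleep 2; done
  containers:
    - name: main
      image: candysmurfhub/snap-es-mon-k8s:latest
```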

6. View the Grafana dashboard locally. First, run the command to forward the ports:

kubectl port-forward influxdb-grafana-jsatd 8083:8083 3000:3000 --namespace kube-system
Forwarding from 127.0.0.1:8083 -> 8083
Forwarding from [::1]:8083 -> 8083
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000

Now log in to the Grafana dashboard at http://127.0.0.1:3000/ (username/password is admin/admin). There are four pre-created dashboards: Cluster, Pod, Snap Elasticsearch Monitor, and Snap System Monitor.

7. Scale up the Elasticsearch cluster. This example scales the Elasticsearch data pods from 1 to 3:

kubectl scale --replicas=3 rc es-data --namespace kube-system
replicationcontroller "es-data" scaled

After scaling, there are three Elasticsearch data pods:

▶ kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
snap-elasticsearch-3470735521-1b7a7 2/2 Running 0 3m
es-data-86o1d 1/1 Running 0 3m
es-data-kqz1w 1/1 Running 0 33s
es-data-yplqe 1/1 Running 0 33s
heapster-y792r 1/1 Running 0 3m
influxdb-grafana-uq759 2/2 Running 0 3m
kube-addon-manager-minikubevm 1/1 Running 39 6d
kubernetes-dashboard-sllpy 1/1 Running 20 6d

8. Remove the example deployment. Once you’re finished, you should clean up your Kubernetes pods, services, replication controllers, and deployments.

▶ ./kubernetes/cleanup.sh
deployment "snap-elasticsearch" deleted
replicationcontroller "es-data" deleted
replicationcontroller "heapster" deleted
replicationcontroller "influxdb-grafana" deleted
service "elasticsearch" deleted
service "monitoring-grafana" deleted
service "monitoring-influxdb" deleted

9. Stop minikube.

▶ minikube stop
Stopping local Kubernetes cluster...
Machine stopped.

And reset your Docker environmental variables:

▶ eval $(minikube docker-env -u)

Using kubectl to check the Kubernetes cluster

If you’d like to check the health of the Kubernetes cluster itself, you can use kubectl. Here are a few kubectl commands to explore:

1. Check what deployments you have running:

kubectl get deployments --namespace kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
snap-elasticsearch 1 1 1 1 32s

2. Check what pods you have running:

kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
snap-elasticsearch-3470735521-w8p5p 2/2 Running 0 2m
es-data-6rdi6 1/1 Running 0 2m
heapster-yf7eu 1/1 Running 0 2m
influxdb-grafana-jsatd 2/2 Running 0 2m
kube-addon-manager-minikubevm 1/1 Running 39 6d
kubernetes-dashboard-sllpy 1/1 Running 20 6d

3. Check what services you have running:

kubectl get svc --namespace kube-system
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch 10.0.0.64 <nodes> 9200/TCP,9300/TCP 3m
kube-dns 10.0.0.10 <none> 53/UDP,53/TCP 4d
kubernetes-dashboard 10.0.0.63 <nodes> 80/TCP 6d
monitoring-grafana 10.0.0.27 <nodes> 80/TCP 3m
monitoring-influxdb 10.0.0.178 <none> 8083/TCP,8086/TCP 3m

4. Check what replication controllers you have running:

kubectl get rc --namespace kube-system
NAME DESIRED CURRENT AGE
es-data 1 1 4m
heapster 1 1 4m
influxdb-grafana 1 1 4m
kubernetes-dashboard 1 1 6d

5. To get more details on a pod:

 kubectl describe pods es-data-kqz1w --namespace kube-system
Name: es-data-kqz1w
Namespace: kube-system
Node: minikubevm/10.0.2.15
Start Time: Thu, 29 Sep 2016 08:49:08 -0700
Labels: name=esdata
Status: Running
IP: 172.17.0.7
Controllers: ReplicationController/es-data
Containers:
  esdata:
    Container ID: docker://32810123e5cd381769eb1b3be9316a67713a1e5fbaca1b0a0accc230ed4f5b6b
    Image: elasticsearch:latest
    Image ID: docker://sha256:22287ab1f811bd81eaaf3a9c112a8100b6532996a0a3a428ec5bcace8802db59
    Port:
    State: Running
      Started: Thu, 29 Sep 2016 08:49:15 -0700
    Ready: True
    Restart Count: 0
    Environment Variables:
      DISCOVERY_ZEN_PING_UNICAST_HOSTS: elasticsearch
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-c0kys:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-c0kys
QoS Tier: BestEffort
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
47m 47m 1 {default-scheduler } Normal Scheduled Successfully assigned es-data-kqz1w to minikubevm
47m 47m 1 {kubelet minikubevm} spec.containers{esdata} Normal Pulling pulling image "elasticsearch:latest"
46m 46m 1 {kubelet minikubevm} spec.containers{esdata} Normal Pulled Successfully pulled image "elasticsearch:latest"
46m 46m 1 {kubelet minikubevm} spec.containers{esdata} Normal Created Created container with docker id 32810123e5cd
46m 46m 1 {kubelet minikubevm} spec.containers{esdata} Normal Started Started container with docker id 32810123e5cd

Next

In this blog post we learned how to use docker-compose and Minikube to run an example of Snap with plugins that collect health data from the Elasticsearch nodes in a cluster along with system cpu, memory, disk, and network statistics. We also used InfluxDB and Grafana to visualize the collected data. Hopefully these instructions were easy to follow and spark your imagination. Moving forward, we could explore running this example on GCP, Amazon, or DigitalOcean, and publishing to different sinks such as KairosDB, Cassandra, or Heka using the Snap framework.

Do you want to apply different security considerations to different Kubernetes pods? Do you want to separate Docker containers into different network layers while still letting them talk to each other? If you have any suggestions for improving this example, please feel free to open a PR. I look forward to your feedback!

Acknowledgements

I’d like to thank the following people for helping make this blog post possible: Nan Liu, sarahjhh and Matthew Brender.
