Monitor the Systems You Love on Kubernetes

Emily Gu
10 min read · Oct 25, 2016

--

Services are shifting from monoliths to microservices, containerization, and orchestration in the cloud. Monitoring the health of both hardware and software at scale becomes crucial to minimize application downtime. How do we ensure interoperability among so many different tools, dashboards, and underlying platforms and architectures? How do we make sense of data measured in petabytes and zettabytes? There are many analytics and monitoring tools out there. In this blog post, we focus on how Snap can simplify your analytics and monitoring solution on Kubernetes.

Snap is an open source telemetry framework written in Go. Its gRPC server and client communication pattern allows interoperability among different languages. The Snap ecosystem comprises hardware and software collector, processor, and publisher plugins; you can browse the available plugins in the plugin catalog. Here are some features worth noting:

  • Collected data can be published to different data stores
  • Containerizable and packaged as microservices
  • Extendable to create your own automated monitoring solution
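
Under the hood, Snap wires these plugins together with a task: a manifest that schedules a set of collected metrics and routes them to processors and publishers. As a hedged sketch (the metric namespace and InfluxDB settings below are illustrative assumptions, not taken from the example repository):

```yaml
# Hypothetical Snap task manifest: collect one psutil metric every
# second and publish it to InfluxDB. Host, database, and credentials
# are placeholder assumptions.
---
version: 1
schedule:
  type: "simple"
  interval: "1s"
workflow:
  collect:
    metrics:
      "/intel/psutil/load/load1": {}
    publish:
      - plugin_name: "influxdb"
        config:
          host: "influxdb"
          port: 8086
          database: "snap"
          user: "admin"
          password: "admin"
```

A task along these lines would be started with snapctl task create -t task.yml once the relevant plugins are loaded.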

In this blog post, we will collect sample metrics from cpu, memory, disk, psutil, and Elasticsearch to monitor the health of an Elasticsearch cluster and the underlying system at the same time.

Elasticsearch is similar to a distributed, document-based NoSQL database or data warehouse with lightning-fast full-text search. It’s a very popular backend for monitoring other systems, but how do we monitor Elasticsearch itself and the system beneath it?

For this blog post we explore two different tools, docker-compose and Minikube, to simplify monitoring the health of an Elasticsearch cluster and the underlying system using the Snap framework. With both tools we will run an instance of Snap with plugins that collect the health data of the Elasticsearch nodes in a cluster along with system cpu, memory, disk, and psutil data. Then we will use InfluxDB and Grafana to visualize the collected data.

All scripts used in this blog post are available on GitHub:

Monitoring with docker-compose

First, let’s run the example using docker-compose on your laptop. Docker-compose is ideal for development. This will help us understand how to deploy the example to Kubernetes later. You can see the similarities and differences between docker-compose and Kubernetes in their respective deployment YAML files.
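
As a rough orientation, the compose file has one service per moving part of the stack. The sketch below is not the actual snap-es-monitor.yml; the image names are the ones visible in the docker ps output later in this post, and everything else is an assumption:

```yaml
# Hedged sketch of a compose file for this stack; consult the
# repository for the real snap-es-monitor.yml.
version: "2"
services:
  main:                          # Snap daemon with plugins pre-loaded
    build: .
    ports:
      - "8181"                   # Snap REST API
  snap-elasticsearch:            # Elasticsearch node monitored by Snap
    image: elasticsearch:latest
    ports:
      - "9200:9200"
  es-node:                       # data nodes; scalable with `scale`
    image: elasticsearch:latest
  influxdb:
    image: tutum/influxdb:latest
    ports:
      - "8083:8083"
      - "8086:8086"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```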

Before you start, you need docker/docker-machine and docker-compose installed on your laptop. Then you can proceed:

1. Start the example. This starts an instance of Snap with the Elasticsearch, cpu, meminfo, disk, and psutil collector plugins pre-loaded and collecting sample metrics, an instance of InfluxDB, an instance of Grafana, and an Elasticsearch cluster with one master node and one data node.

▶ cd snap-elasticsearch-monitor/docker-compose/
docker-compose -f snap-es-monitor.yml up -d
Creating network "dockercompose_default" with the default driver
Creating dockercompose_snap-elasticsearch_1
Creating influxdb
Creating dockercompose_es-node_1
Creating grafana
Creating snap-elasticsearch-monitor
Creating dockercompose_master_1
Creating dockercompose_node_1

2. View the Grafana dashboards. Go to http://[dockerhost]:3000 to view live metric collection. There are two pre-created dashboards: Snap Elasticsearch Monitor and Snap System Monitor.

To see the IP address of your Docker host if you use docker-machine:

echo $DOCKER_HOST
tcp://192.168.99.100:2376

For example: http://192.168.99.100:3000. If you run Docker natively, it’s http://127.0.0.1:3000.

3. Scale the Elasticsearch cluster. The example Elasticsearch cluster has one master and one data node. You can scale your cluster up or down; for example, to scale up to 5 data nodes:

▶ docker-compose -f snap-es-monitor.yml scale snap-elasticsearch=1 es-node=5
Desired container number already achieved
Creating and starting dockercompose_es-node_2 ... done
Creating and starting dockercompose_es-node_3 ... done
Creating and starting dockercompose_es-node_4 ... done
Creating and starting dockercompose_es-node_5 ... done

After scaling up you can view your updated docker containers:

▶ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ead4c2c3e88 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_4
e680486b0642 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_3
49807affc8f7 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_2
34821be64e08 elasticsearch:latest "/docker-entrypoint.s" 2 minutes ago Up 2 minutes 9200/tcp, 9300/tcp dockercompose_es-node_5
3ca984da410c dockercompose_main "sh -c '/usr/local/bi" 3 minutes ago Up 3 minutes 8181/tcp snap-elasticsearch-monitor
24b5dc5bcb35 elasticsearch:latest "/docker-entrypoint.s" 3 minutes ago Up 3 minutes 9200/tcp, 9300/tcp dockercompose_es-node_1
d800fa7a2c4e grafana/grafana:latest "/run.sh" 3 minutes ago Up 3 minutes 0.0.0.0:3000->3000/tcp grafana
ebe9c2397e1a tutum/influxdb:latest "/run.sh" 3 minutes ago Up 3 minutes 0.0.0.0:8083->8083/tcp, 0.0.0.0:8086->8086/tcp influxdb
55f9ef583fba elasticsearch:latest "/docker-entrypoint.s" 3 minutes ago Up 3 minutes 0.0.0.0:9200->9200/tcp, 9300/tcp dockercompose_snap-elasticsearch_1

There are now 5 Elasticsearch data nodes and 1 master node.

4. Stop the running containers. Once you are done, you can stop the example containers; this allows you to re-start them later if you wish.

docker-compose -f snap-es-monitor.yml stop
Stopping dockercompose_es-node_4 ... done
Stopping dockercompose_es-node_3 ... done
Stopping dockercompose_es-node_2 ... done
Stopping dockercompose_es-node_5 ... done
Stopping snap-elasticsearch-monitor ... done
Stopping dockercompose_es-node_1 ... done
Stopping grafana ... done
Stopping influxdb ... done
Stopping dockercompose_snap-elasticsearch_1 ... done

If you want to stop and remove the containers, use down instead:

 docker-compose -f snap-es-monitor.yml down
Stopping dockercompose_es-node_4 ... done
Stopping dockercompose_es-node_3 ... done
Stopping dockercompose_es-node_2 ... done
Stopping dockercompose_es-node_5 ... done
Stopping snap-elasticsearch-monitor ... done
Stopping dockercompose_es-node_1 ... done
Stopping grafana ... done
Stopping influxdb ... done
Stopping dockercompose_snap-elasticsearch_1 ... done
Removing dockercompose_es-node_4 ... done
Removing dockercompose_es-node_3 ... done
Removing dockercompose_es-node_2 ... done
Removing dockercompose_es-node_5 ... done
Removing snap-elasticsearch-monitor ... done
Removing dockercompose_es-node_1 ... done
Removing grafana ... done
Removing influxdb ... done
Removing dockercompose_snap-elasticsearch_1 ... done
Removing network dockercompose_default

Monitoring with Kubernetes

Second, we can run the example inside Kubernetes with Minikube on your laptop. With just a few steps you’ll have the example running inside an enterprise-grade container management system.

Before you start, you should have either Kubernetes or Minikube, along with kubectl, installed. This example uses Minikube.

  1. Start Minikube.
minikube start
Starting local Kubernetes cluster...
Kubectl is now configured to use the cluster.

2. Set up your console Docker environment. This allows you to run Docker commands in your console.

▶ eval $(minikube docker-env)

3. Start the example inside Kubernetes. This creates and starts an Elasticsearch cluster with one master and one data node, along with an instance of Snap with the Elasticsearch, cpu, meminfo, disk, and psutil collector plugins pre-loaded, an instance of Grafana, and an instance of InfluxDB.

▶ cd snap-elasticsearch-monitor/kubernetes/
kubectl create -f deployment --namespace kube-system
deployment "snap-elasticsearch" created
You have exposed your service on an external port on all nodes in your
cluster. If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:31527,tcp:31462) to serve traffic.
See http://releases.k8s.io/release-1.3/docs/user-guide/services-firewalls.md for more details.
service "elasticsearch" created
replicationcontroller "es-data" created
You have exposed your service on an external port on all nodes in your
cluster. If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:32657) to serve traffic.
See http://releases.k8s.io/release-1.3/docs/user-guide/services-firewalls.md for more details.
service "monitoring-grafana" created
replicationcontroller "heapster" created
replicationcontroller "influxdb-grafana" created
service "monitoring-influxdb" created
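
The deployment directory above holds the Kubernetes manifests. For orientation, a Deployment for the Snap pod might look roughly like the following sketch; only the image names are taken from the docker ps output later in this post, and the labels and ports are assumptions:

```yaml
# Hedged sketch of a snap-elasticsearch Deployment (Kubernetes 1.3-era
# API group); see the repository for the real manifests.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: snap-elasticsearch
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: snap-elasticsearch
    spec:
      containers:
        - name: main
          image: candysmurfhub/snap-es-mon-k8s:latest  # Snap + plugins
          ports:
            - containerPort: 8181                      # Snap REST API
        - name: elasticsearch
          image: elasticsearch:latest
          ports:
            - containerPort: 9200
```

Two containers per pod is consistent with the 2/2 READY column shown for snap-elasticsearch in the kubectl get pods output below.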

4. Forward the InfluxDB and Grafana ports to your local machine. Because we are running the example in containers, to view the InfluxDB and Grafana dashboards locally you need to forward the ports of the influxdb-grafana pod.

Start by finding the pod name when they are all in Running status:

kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
snap-elasticsearch-3470735521-w8p5p 2/2 Running 0 2m
es-data-6rdi6 1/1 Running 0 2m
heapster-yf7eu 1/1 Running 0 2m
influxdb-grafana-jsatd 2/2 Running 0 2m
kube-addon-manager-minikubevm 1/1 Running 39 6d
kubernetes-dashboard-sllpy 1/1 Running 20 6d

Here the pod is named: influxdb-grafana-jsatd

Note: you will need to wait for all containers to be in a Running state before proceeding.

5. Create Snap sample tasks to collect the sample metrics.

Start by finding the running container ID:

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9c6d4c240bf1 kubernetes/heapster:canary "/heapster --source=k" 12 minutes ago Up 12 minutes k8s_heapster.6c2648e6_heapster-irt4k_kube-system_d3659d5f-9169-11e6-96f0-8203398eff0a_92f9930c
20558b9581bb elasticsearch:latest
07b00720507f candysmurfhub/snap-es-mon-k8s:latest "sh -c '/usr/local/bi" 12 minutes ago Up 12 minutes k8s_main.d6053766_snap-elasticsearch-1681843414-q3oyc_kube-system_d347e719-9169-11e6-96f0-8203398eff0a_e7d7dcac
cf5504f9b2f1 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_influxdb-grafana-xilzh_kube-system_d36fdf75-9169-11e6-96f0-8203398eff0a_e9e6d561
7bfd879ea011 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.69a02673_snap-elasticsearch-1681843414-q3oyc_kube-system_d347e719-9169-11e6-96f0-8203398eff0a_fe3f64b2
51756b97c8f8 gcr.io/google_containers/pause-amd64:3.0
...

Then go into the container and check if plugins are loaded:

▶ docker exec -it 07b007 bash
bash-4.3# snapctl plugin list
NAME VERSION TYPE SIGNED STATUS
file 2 publisher false loaded
influxdb 15 publisher false loaded
cpu 6 collector false loaded
disk 3 collector false loaded
elasticsearch 3 collector false loaded
meminfo 3 collector false loaded
psutil 8 collector false loaded

If you see the following, don’t panic! It takes time to start. Check back in a bit.

bash-4.3# snapctl plugin list
Error: URL target is not available. Get http://localhost:8181/v1/plugins: dial tcp [::1]:8181: getsockopt: connection refused

Create Snap tasks in the container:

bash-4.3# ./usr/local/bin/create_tasks
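
The create_tasks helper presumably submits task manifests to the Snap daemon via snapctl. A hedged sketch of what one of them might look like (the wildcard metric namespace and InfluxDB settings are assumptions; check the elasticsearch collector’s documentation for the real namespaces):

```yaml
# Hypothetical task: collect all Elasticsearch metrics every 10s and
# publish them to the monitoring-influxdb service.
---
version: 1
schedule:
  type: "simple"
  interval: "10s"
workflow:
  collect:
    metrics:
      "/intel/elasticsearch/*": {}
    publish:
      - plugin_name: "influxdb"
        config:
          host: "monitoring-influxdb"
          port: 8086
          database: "snap"
          user: "root"
          password: "root"
```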

Note: Snap supports auto-discovery to automatically load plugins and create tasks; however, at the time of writing, Kubernetes does not support pod dependencies. Once initialization containers can be added to specify dependencies, this step can be skipped.

Then exit out of the container.
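
To make the dependency idea concrete: an init container could hold the Snap container back until Elasticsearch answers, letting tasks be created automatically on boot. A hedged sketch (spec.initContainers stabilized in later Kubernetes releases; at the time of this post it was only available behind a beta pod annotation):

```yaml
# Hypothetical pod spec fragment: wait for Elasticsearch before the
# Snap container starts.
spec:
  initContainers:
    - name: wait-for-elasticsearch
      image: busybox
      command:
        - sh
        - -c
        - until wget -qO- http://elasticsearch:9200/_cluster/health; do sleep 2; done
  containers:
    - name: main
      image: candysmurfhub/snap-es-mon-k8s:latest
```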

6. View the Grafana dashboard locally. First, run the command to forward the ports:

kubectl port-forward influxdb-grafana-jsatd 8083:8083 3000:3000 --namespace kube-system
Forwarding from 127.0.0.1:8083 -> 8083
Forwarding from [::1]:8083 -> 8083
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000

Now log in to the Grafana dashboard at http://127.0.0.1:3000/ (username/password is admin/admin). There are four pre-created dashboards: Cluster, Pod, Snap Elasticsearch Monitor, and Snap System Monitor.

7. Scale up the Elasticsearch cluster. This example scales the Elasticsearch data pods from 1 to 3:

kubectl scale --replicas=3 rc es-data --namespace kube-system
replicationcontroller "es-data" scaled

After scaling, there are three Elasticsearch data pods:

▶ kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
snap-elasticsearch-3470735521-1b7a7 2/2 Running 0 3m
es-data-86o1d 1/1 Running 0 3m
es-data-kqz1w 1/1 Running 0 33s
es-data-yplqe 1/1 Running 0 33s
heapster-y792r 1/1 Running 0 3m
influxdb-grafana-uq759 2/2 Running 0 3m
kube-addon-manager-minikubevm 1/1 Running 39 6d
kubernetes-dashboard-sllpy 1/1 Running 20 6d

8. Remove the example deployment. Once you’re finished, you should clean up your Kubernetes pods, services, replication controllers, and deployments.

▶ ./kubernetes/cleanup.sh
deployment "snap-elasticsearch" deleted
replicationcontroller "es-data" deleted
replicationcontroller "heapster" deleted
replicationcontroller "influxdb-grafana" deleted
service "elasticsearch" deleted
service "monitoring-grafana" deleted
service "monitoring-influxdb" deleted

9. Stop minikube.

▶ minikube stop
Stopping local Kubernetes cluster...
Machine stopped.

And reset your Docker environmental variables:

▶ eval $(minikube docker-env -u)

Using kubectl to check the Kubernetes cluster

If you’d like to check the health of the Kubernetes cluster itself, you can use kubectl. Here are a few kubectl commands to explore:

1. Check what deployments you have running:

kubectl get deployments --namespace kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
snap-elasticsearch 1 1 1 1 32s

2. Check what pods you have running:

kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
snap-elasticsearch-3470735521-w8p5p 2/2 Running 0 2m
es-data-6rdi6 1/1 Running 0 2m
heapster-yf7eu 1/1 Running 0 2m
influxdb-grafana-jsatd 2/2 Running 0 2m
kube-addon-manager-minikubevm 1/1 Running 39 6d
kubernetes-dashboard-sllpy 1/1 Running 20 6d

3. Check what services you have running:

kubectl get svc --namespace kube-system
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch 10.0.0.64 <nodes> 9200/TCP,9300/TCP 3m
kube-dns 10.0.0.10 <none> 53/UDP,53/TCP 4d
kubernetes-dashboard 10.0.0.63 <nodes> 80/TCP 6d
monitoring-grafana 10.0.0.27 <nodes> 80/TCP 3m
monitoring-influxdb 10.0.0.178 <none> 8083/TCP,8086/TCP 3m

4. Check what replication controllers you have running:

kubectl get rc --namespace kube-system
NAME DESIRED CURRENT AGE
es-data 1 1 4m
heapster 1 1 4m
influxdb-grafana 1 1 4m
kubernetes-dashboard 1 1 6d

5. To get more details on a pod:

 kubectl describe pods es-data-kqz1w --namespace kube-system
Name: es-data-kqz1w
Namespace: kube-system
Node: minikubevm/10.0.2.15
Start Time: Thu, 29 Sep 2016 08:49:08 -0700
Labels: name=esdata
Status: Running
IP: 172.17.0.7
Controllers: ReplicationController/es-data
Containers:
  esdata:
    Container ID: docker://32810123e5cd381769eb1b3be9316a67713a1e5fbaca1b0a0accc230ed4f5b6b
    Image: elasticsearch:latest
    Image ID: docker://sha256:22287ab1f811bd81eaaf3a9c112a8100b6532996a0a3a428ec5bcace8802db59
    Port:
    State: Running
      Started: Thu, 29 Sep 2016 08:49:15 -0700
    Ready: True
    Restart Count: 0
    Environment Variables:
      DISCOVERY_ZEN_PING_UNICAST_HOSTS: elasticsearch
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-c0kys:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-c0kys
QoS Tier: BestEffort
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
47m 47m 1 {default-scheduler } Normal Scheduled Successfully assigned es-data-kqz1w to minikubevm
47m 47m 1 {kubelet minikubevm} spec.containers{esdata} Normal Pulling pulling image "elasticsearch:latest"
46m 46m 1 {kubelet minikubevm} spec.containers{esdata} Normal Pulled Successfully pulled image "elasticsearch:latest"
46m 46m 1 {kubelet minikubevm} spec.containers{esdata} Normal Created Created container with docker id 32810123e5cd
46m 46m 1 {kubelet minikubevm} spec.containers{esdata} Normal Started Started container with docker id 32810123e5cd

Next

In this blog post we learned how to use docker-compose and Minikube to run an example of Snap with plugins that collect health data from the Elasticsearch nodes in a cluster along with system cpu, memory, disk, and network statistics. We also used InfluxDB and Grafana to visualize the collected data. Hopefully these instructions were easy to follow and spark your imagination. Moving forward, we could explore running this example on GCP, Amazon, or DigitalOcean, and publishing to different sinks such as KairosDB, Cassandra, or Heka using the Snap framework.

Do you want to apply different security considerations to different Kubernetes pods? Do you want to separate Docker containers into different network layers while still letting them talk to each other? If you have any suggestions for improving this example, please feel free to open a PR. I look forward to your feedback!

Acknowledgements

I’d like to thank the following people for helping make this blog post possible: Nan Liu, sarahjhh and Matthew Brender.
