Monitor your Kubernetes Cluster
Keeping an eye on logs and metrics is a necessary evil for cluster admins. The benefits are clear: metrics help you set reasonable performance goals, while log analysis can uncover issues that impact your workloads. The hard part, however, is getting a slew of applications to work together in a useful monitoring solution.
In this post, I’ll cover monitoring a Kubernetes cluster with Graylog (for logging) and Prometheus (for metrics). Of course, that’s more than just wiring three things together. In fact, it’ll end up looking like this:
As you know, Kubernetes isn’t just one thing — it’s a system of masters, workers, networking bits, etc(d). Similarly, Graylog comes with a supporting cast (apache2, mongodb, etc), as does Prometheus (telegraf, grafana, etc). Connecting the dots in a deployment like this may seem daunting, but the right tools can make all the difference.
I’ll walk through this using conjure-up and the Canonical Distribution of Kubernetes (CDK). I find the conjure-up interface really helpful for deploying big software, but I know some of you hate GUIs and TUIs and probably other UIs too. For those folks, I’ll do the same deployment again from the command line.
Before we jump in, note that Graylog and Prometheus will be deployed alongside Kubernetes and not in the cluster itself. Things like the Kubernetes Dashboard and Heapster are excellent sources of information from within a running cluster, but my objective is to provide a mechanism for log/metric analysis whether the cluster is running or not.
The Walk Through
First things first, install conjure-up if you don’t already have it. On Linux, that’s simply:
sudo snap install conjure-up --classic
There’s also a brew package for macOS users:
brew install conjure-up
You’ll need at least version 2.5.2 to take advantage of the recent CDK spell additions, so be sure to run sudo snap refresh conjure-up or brew update && brew upgrade conjure-up if you have an older version installed.
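If you’d rather script that version check, here’s a minimal sketch. It assumes you’ve already pulled the installed version string out of your package manager (for example, from the output of snap list conjure-up) and that your sort supports -V, as GNU coreutils does. The version_ge helper and the INSTALLED variable are my own names for illustration, not part of conjure-up.

```shell
# version_ge HAVE WANT: succeed when version HAVE is >= version WANT.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    # After a version sort, the smaller version comes first.
    # If that first line is WANT, then HAVE is at least WANT.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical input: set INSTALLED to whatever your package manager reports.
INSTALLED="${INSTALLED:-2.5.2}"

if version_ge "$INSTALLED" "2.5.2"; then
    echo "conjure-up $INSTALLED is new enough"
else
    echo "conjure-up $INSTALLED is too old; refresh or upgrade it"
fi
```

A version sort matters here because a plain string comparison would rank 2.10.0 below 2.5.2.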
Once installed, run it:
conjure-up
You’ll be presented with a list of various spells. Select CDK and press
At this point, you’ll see additional components that are available for the CDK spell. We’re interested in Graylog and Prometheus, so check both of those and hit
You’ll be guided through various cloud choices to determine where you want your cluster to live. After that, you’ll see options for post-deployment steps, followed by a review screen that lets you see what is about to be deployed:
In addition to the typical K8s-related applications (etcd, flannel, load-balancer, master, and workers), you’ll see additional applications related to our logging and metric selections.
The Graylog stack includes the following:
- apache2: reverse proxy for the graylog web interface
- elasticsearch: document database for the logs
- filebeat: forwards logs from K8s master/workers to graylog
- graylog: provides an api for log collection and an interface for analysis
- mongodb: database for graylog metadata
The Prometheus stack includes the following:
- grafana: web interface for metric-related dashboards
- prometheus: metric collector and time series database
- telegraf: sends host metrics to prometheus
You can fine-tune the deployment from this review screen, but the defaults will suit our needs. Click Deploy all Remaining Applications to get things going.
The deployment will take a few minutes to settle as machines are brought online and applications are configured in your cloud. Once complete, conjure-up will show a summary screen that includes links to various interesting endpoints for you to browse:
Exploring Logs
Now that Graylog has been deployed and configured, let’s take a look at some of the data we’re gathering. By default, the filebeat application will send both syslog and container log events to graylog (that’s /var/log/*.log and /var/log/containers/*.log from the kubernetes master and workers).
Grab the apache2 address and graylog admin password as follows:
juju status --format yaml apache2/0 | grep public-address
juju run-action --wait graylog/0 show-admin-password
Browse to http://<your-apache2-ip> and log in with admin as the username and <your-graylog-password> as the password. Note: if the interface is not immediately available, please wait, as the reverse proxy configuration may take up to 5 minutes to complete.
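If you want to script this step instead of eyeballing the grep output and refreshing your browser, something like the following sketch could work. The helper names extract_public_address and retry_until are mine, not Juju’s, and the example at the bottom assumes curl is installed and that the apache2/0 unit exists in your model.

```shell
# extract_public_address: read `juju status --format yaml` output on stdin
# and print the first public-address value found.
extract_public_address() {
    awk '$1 == "public-address:" { print $2; exit }'
}

# retry_until ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping
# DELAY seconds between tries. Fails after ATTEMPTS unsuccessful tries.
retry_until() {
    retry_attempts="$1"
    retry_delay="$2"
    shift 2
    retry_i=1
    while [ "$retry_i" -le "$retry_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep when another attempt is still to come.
        if [ "$retry_i" -lt "$retry_attempts" ]; then
            sleep "$retry_delay"
        fi
        retry_i=$((retry_i + 1))
    done
    return 1
}

# Example (needs a deployed model, so it is left commented out):
#   ip="$(juju status --format yaml apache2/0 | extract_public_address)"
#   retry_until 30 10 curl --silent --fail --output /dev/null "http://$ip"
```

Thirty attempts spaced 10 seconds apart covers roughly the 5 minutes the reverse proxy may need.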
Once logged in, head to the Sources tab to get an overview of the logs collected from our K8s master and workers:
Drill into those logs by clicking the System / Inputs tab and selecting Show received messages for the filebeat input:
From here, you may want to play around with various filters or set up Graylog dashboards to help identify the events that are most important to you. Check out the Graylog Dashboard docs for details on customizing your view.
Exploring Metrics
Our deployment exposes two types of metrics through our grafana dashboards: system metrics include things like cpu/memory/disk utilization for the K8s master and worker machines, and cluster metrics include container-level data scraped from the K8s cAdvisor endpoints.
Grab the grafana address and admin password as follows:
juju status --format yaml grafana/0 | grep public-address
juju run-action --wait grafana/0 get-admin-password
Browse to http://<your-grafana-ip>:3000 and log in with admin as the username and <your-grafana-password> as the password. Once logged in, check out the cluster metric dashboard by clicking the Home drop-down box and selecting Kubernetes Metrics (via Prometheus):
We can also check out the system metrics of our K8s host machines by switching the drop-down box to Node Metrics (via Telegraf):
As with Graylog, your Grafana dashboards can be customized to monitor specific metrics that you care about. The Grafana community maintains hundreds of dashboards that may be useful to you. Learn more at the Grafana Labs Dashboards site.
The Other Way
As alluded to in the intro, I prefer the wizard-y feel of conjure-up to guide me through complex software deployments like Kubernetes. Now that we’ve seen the conjure-up way, some of you may want to see a command line approach to achieve the same results. Still others may have deployed CDK previously and want to extend it with the Graylog/Prometheus components described above. Regardless of why you’ve read this far, I’ve got you covered.
The tool that underpins conjure-up is Juju. Everything that the CDK spell did behind the scenes can be done on the command line with Juju. Let’s step through how that works.
Starting From Scratch
If you’re on Linux, install Juju like this:
sudo snap install juju --classic
For macOS, Juju is available from brew:
brew install juju
Now set up a controller for your preferred cloud. You may be prompted for any required cloud credentials:
juju bootstrap
We then need to deploy the base CDK bundle:
juju deploy canonical-kubernetes
Starting From CDK
With our Kubernetes cluster deployed, we need to add all the applications required for Graylog and Prometheus:
## deploy graylog-related applications
juju deploy xenial/apache2
juju deploy xenial/elasticsearch
juju deploy xenial/filebeat
juju deploy xenial/graylog
juju deploy xenial/mongodb
## deploy prometheus-related applications
juju deploy xenial/grafana
juju deploy xenial/prometheus
juju deploy xenial/telegraf
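Eight near-identical commands are a good candidate for a loop. Here’s a rough sketch that wraps them in a dry-run helper so you can preview what would happen before touching your cloud. The run and deploy_apps functions and the DRY_RUN variable are names I made up for this example; only juju deploy itself comes from the steps above.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# deploy_apps APP...: deploy each named application with `juju deploy`.
deploy_apps() {
    for app in "$@"; do
        run juju deploy "$app"
    done
}

# Preview mode: prints the commands instead of running them.
# Set DRY_RUN=0 (or remove this line) to deploy for real.
DRY_RUN=1

# graylog-related applications
deploy_apps xenial/apache2 xenial/elasticsearch xenial/filebeat \
    xenial/graylog xenial/mongodb

# prometheus-related applications
deploy_apps xenial/grafana xenial/prometheus xenial/telegraf
```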
Now that the software is deployed, connect the applications together so they can communicate:
## relate graylog applications
juju relate apache2:reverseproxy graylog:website
juju relate graylog:elasticsearch elasticsearch:client
juju relate graylog:mongodb mongodb:database
juju relate filebeat:beats-host kubernetes-master:juju-info
juju relate filebeat:beats-host kubernetes-worker:juju-info
juju relate filebeat:logstash graylog:beats
## relate prometheus applications
juju relate prometheus:grafana-source grafana:grafana-source
juju relate telegraf:prometheus-client prometheus:target
juju relate kubernetes-master:juju-info telegraf:juju-info
juju relate kubernetes-worker:juju-info telegraf:juju-info
At this point, all the applications can communicate with each other, but we have a bit more configuration to do (e.g., setting up the apache2 reverse proxy, telling prometheus how to scrape k8s, importing our grafana dashboards, etc):
## configure graylog applications
juju config apache2 enable_modules="headers proxy_html proxy_http"
juju config apache2 vhost_http_template="$(base64 <vhost-tmpl>)"
juju config filebeat logpath="/var/log/*.log" kube_logs=True
juju config graylog elasticsearch_cluster_name="<es-cluster>"
## configure prometheus applications
juju config prometheus scrape-jobs="<scraper-yaml>"
juju run-action --wait grafana/0 import-dashboard \
Some of the above steps need values specific to your deployment. You can get these in the same way that conjure-up does:
- <vhost-tmpl>: fetch our sample template from github
- <es-cluster>: get the cluster name with juju config elasticsearch cluster-name
- <scraper-yaml>: fetch our sample scraper from github; substitute appropriate values for
- <dashboard-json>: fetch our sample k8s dashboard from github
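Once you’ve saved one of those files locally, the placeholder inside $(base64 ...) becomes the path to that file. Below is a small sketch of how that substitution might look. The encode_file helper and the vhost.tmpl filename are hypothetical. Stripping newlines is my own choice to keep the value on a single line (GNU base64 wraps its output at 76 columns); I’m assuming the charm decodes unwrapped base64 just as happily as wrapped.

```shell
# encode_file PATH: print the contents of PATH as one line of base64.
encode_file() {
    # Read via stdin so this works with both GNU and BSD base64,
    # then drop any line wrapping the tool may have added.
    base64 < "$1" | tr -d '\n'
}

# Example (needs a deployed model, so it is left commented out).
# vhost.tmpl is a hypothetical local copy of the sample template:
#   juju config apache2 vhost_http_template="$(encode_file vhost.tmpl)"
```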
Finally, you’ll want to expose the apache2 and grafana applications to make their web interfaces accessible:
## expose relevant endpoints
juju expose apache2
juju expose grafana
Now that we have everything deployed, related, configured, and exposed, you can log in and poke around using the same steps from the Exploring Logs and Exploring Metrics sections above.
The Wrap Up
My goal here was to show you how to deploy a Kubernetes cluster with rich monitoring capabilities for logs and metrics. Whether you prefer a guided approach or command line steps, I hope it’s clear that monitoring complex deployments doesn’t have to be a pipe dream. The trick is to figure out how all the moving parts work, make them work together repeatably, and then break/fix/repeat for a while until everyone can use it.
This is where tools like conjure-up and Juju really shine. Leveraging the expertise of contributors to this ecosystem makes it easy to manage big software. Start with a solid set of apps, customize as needed, and get back to work!
Give these bits a try and let me know how it goes. You can find enthusiasts like me on Freenode IRC in #conjure-up and #juju. Thanks for reading!