Our journey implementing Sensu to monitor Kubernetes in production

Andy Repton · Published in The Sensu Blog · Jan 30, 2018 · 9 min read

This is a guest post to the Sensu Blog by Andy Repton, a member of the Sensu community. He offered to share his experience as a user in his own words, which you can do too by emailing community@sensu.io. Learn all about the community at sensuapp.org/community.

Schuberg Philis, a Dutch Managed Services provider based in Amsterdam, is a company dedicated to 100% customer satisfaction; we distinguish ourselves by running the Mission Critical Workloads of our customers. When Kubernetes sprang onto the scene we were excited by the possibilities it represented and were eager to start using it in our engineering teams. As of today we run over 20 Kubernetes clusters in production for a variety of teams across multiple private and public clouds.

As part of being production ready we require detailed functional monitoring, and since we started deploying Kubernetes we’ve used Sensu as the monitoring solution for it all. This has supported growing demand from both our internal teams and external customers, while also helping us meet our 100% uptime guarantee. The journey of creating and updating our monitoring for a container landscape, while also migrating our existing infrastructure from Nagios to Sensu, presented us with some interesting challenges and opportunities.

We believe as a company that one of the secrets to uptime is not standing still but rather continuously evolving to avoid fragility. This aligns well with our customers’ expectations, which have recently started a significant shift towards container technologies. As we expanded into containerised workloads we needed a powerful, scalable monitoring solution that would support not only this new world but also our existing virtual and physical infrastructure. The solution we chose was Sensu.

Why we built it (where we started)

Nearly two years ago my colleague Michael Russell and I started building what would become our Mission Critical Kubernetes platform. We took stock Kubernetes and added high availability, security best practices, and monitoring to allow our individual customer teams at Schuberg Philis to easily start using this new technology.

We take monitoring very seriously because of our uptime requirements. Our existing infrastructure used Nagios as a monitoring solution but we had started to hit the limits of what it was capable of handling. On average our monitoring solution processes around 65,000 events per day, and at over 5,000 individual checks, all pull-based from our central servers, the system was starting to strain under the load. In addition, as we started looking into more dynamic infrastructure (where our systems may only exist for days or even hours before being replaced) the amount of time required to regenerate and manage the configuration became challenging.

While deciding on our solution we knew there was a huge benefit to ensuring our replacement tool had familiar semantics. We decided not to opt for a telemetry-only alerting system (like Prometheus) because it would mean a costly rewrite of all of our existing checks, and we would have to retrain the 250 engineers we have writing checks. Sensu provided the bridge: we could reuse the same monitoring logic and run our existing Nagios checks over NRPE without a rewrite, while Sensu’s scalable transport layer solved the scaling issues we were facing.

Rolling out exciting new infrastructure gave us an opportunity to reevaluate our monitoring framework. We decided on Sensu as the right choice.

What we built

We run over 20 production Kubernetes clusters deployed across private and public clouds, supporting dozens of teams internally and externally, all monitored by Sensu. Within our clusters the flow is similar from team to team; the sections below describe how we built it up.

Our cluster deployment is slightly unusual. From the base we use a combination of Terraform and Ansible to deploy an initial Kubernetes cluster that we call the Management Kubernetes Cluster. This cluster is where we deploy the master components of our other Kubernetes clusters: the API server, Controller Manager, and Scheduler. When we started with Kubernetes, lifecycle management of the master components was a major challenge, and one that Kubernetes itself excels at solving. By having our master components in a separate cluster, problems such as secret management, rolling upgrades, and high availability became remarkably simple. Since we adopted this approach we’ve noticed it becoming a more common practice (see this article from The New Stack) and it has worked very well for us.
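As a purely illustrative sketch of the pattern (these are not our actual manifests, and every name, image tag, flag, and secret below is a placeholder), a customer cluster’s API server becomes just another Deployment inside the Management Kubernetes Cluster:

```
# Hypothetical example: one customer cluster's API server running as a
# Deployment in the management cluster. Certificates live in an ordinary
# Kubernetes Secret, and a rolling upgrade is a normal Deployment rollout.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: customer-a-apiserver
  namespace: customer-a-control-plane
spec:
  replicas: 2
  selector:
    matchLabels:
      app: customer-a-apiserver
  template:
    metadata:
      labels:
        app: customer-a-apiserver
    spec:
      containers:
      - name: kube-apiserver
        image: k8s.gcr.io/kube-apiserver:v1.9.2   # placeholder image and version
        command:
        - kube-apiserver
        - --etcd-servers=https://customer-a-etcd:2379
        - --service-cluster-ip-range=10.96.0.0/12
        - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
        - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
        volumeMounts:
        - name: pki
          mountPath: /etc/kubernetes/pki
          readOnly: true
      volumes:
      - name: pki
        secret:
          secretName: customer-a-apiserver-pki
```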

Once our clusters were deployed we moved our focus to monitoring.

Iteration One: The existing Kubernetes plugin

https://github.com/sensu-plugins/sensu-plugins-kubernetes

With this first iteration we covered the basics: apiserver health, etcd health, and so on. This monitoring was focused on the cluster itself and was a great starting point for us. With our master components now fully monitored for availability we moved our focus to the nodes; the sketch below shows roughly what this first stage looked like on the Sensu server side.
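These are not our exact definitions, and the check script names and options should be verified against the version of sensu-plugins-kubernetes you install, but the shape was roughly this:

```
{
  "checks": {
    "kube-apiserver-available": {
      "command": "check-kube-apiserver-available.rb -s https://127.0.0.1:6443",
      "subscribers": ["kubernetes-master"],
      "interval": 60
    },
    "kube-nodes-ready": {
      "command": "check-kube-nodes-ready.rb -s https://127.0.0.1:6443",
      "subscribers": ["kubernetes-master"],
      "interval": 60
    }
  }
}
```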

Iteration Two: Adding Prometheus and node checks

While some work in the Sensu community had produced a set of plugins, we noted the excitement and momentum behind the Prometheus project (https://github.com/prometheus/prometheus) for scraping Kubernetes metrics. We felt that leveraging it to scrape metrics inside our clusters was a sensible move. The challenge was that there was no Sensu plugin to scrape Prometheus, so we wrote one ourselves and open sourced it:

https://github.com/schubergphilis/sensu-plugins-prometheus-checks

Using this, combined with the node_exporter, we were now monitoring the nodes and the health of our kubelets. Additionally, we started scraping the Kubernetes API to collect node metrics exposed through its metrics endpoints.

We centralised the checks so that we could adjust the configuration as it evolved without needing to change anything on the nodes. Deploying a new monitoring alert no longer requires re-running our configuration management tooling.

The current production setup on our clusters runs the node_exporter as a DaemonSet, a Prometheus deployment that scrapes it, and a Sensu client deployment that gathers all of this information and sends it on to our central Sensu server over an encrypted connection.

Iteration Three: Adding our check-k8s functional checks

More requirements arose as we improved upon the project. It was not sufficient to merely care about the health of the cluster; we also had to care about the health of the systems running within it. For example, we needed to make sure our pods were healthy and that our replica sets were not bunching all of their pods onto a single node. This encouraged us to start writing another plugin: check-k8s.

We dynamically set Kubernetes limits and requests so we don’t overload our nodes, but for some of our clusters the focus is on the average CPU and memory usage of the entire cluster rather than of individual nodes. We adjusted the plugin to give us the average load over all CPUs in the cluster so we know when we need to scale the nodes up or down:
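The snippet below sketches the idea. Treat it as a hypothetical configuration entry rather than the plugin’s exact schema (and note that the metric name depends on your node_exporter version); the point is that a single PromQL query averaged over every core in the cluster drives the warning and critical thresholds:

```
# Hypothetical check entry; see the sensu-plugins-prometheus-checks
# repository for the real configuration format.
cluster_cpu_average:
  # Convert the per-core idle counters into a cluster-wide usage percentage.
  # Older node_exporters expose node_cpu, newer ones node_cpu_seconds_total.
  query: 100 - (avg(irate(node_cpu{mode="idle"}[5m])) * 100)
  warn: 80   # warn at 80% average cluster CPU
  crit: 90   # page at 90%, time to add or resize nodes
```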

This setup and the resulting plugins have allowed us to iterate rapidly on our infrastructure. Below I’ll go into some of the challenges and solutions that we encountered during our journey of implementing Sensu for our Kubernetes setup.

Our challenges and solutions during our journey

As we iterated our monitoring deployments we learned a lot. Here are some of our experiences from implementation:

The migration path

What was the best way to migrate from Nagios to Sensu? Our existing monitoring needed to continue working without interruption. As we already used NRPE as a standard, we added the Sensu client to each virtual machine and reused the existing checks without further adjustment.

Alert routing from inside a shared Kubernetes cluster

We process so many events across our teams that our setup routes alerts to different teams based on the hostname of the server; with multiple teams potentially sharing a Kubernetes cluster, we needed to keep this functionality.

With our Kubernetes setup we can have several teams running dedicated workloads in the same cluster. Our administration team ensures DNS, the Dashboard, and the Sensu client are all running in the kube-system namespace, a development team could have a microservices architecture in the production namespace, and a security team might have an automated cron job running in the security namespace. These all needed to be monitored, with alerts going to their respective teams, and it seemed wasteful to run multiple Sensu clients for this.

Luckily Sensu allows us to spoof any hostname we like, so we built this functionality in, setting a different hostname per namespace.

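As a minimal sketch of the idea (names here are placeholders rather than our production values), each namespace gets its own Sensu client configuration, with the client name carrying both the cluster and the namespace. In Sensu, every namespace then shows up as its own client, and the existing hostname-based routing sends its alerts to the right team:

```
# One ConfigMap like this per namespace; the Sensu client in that namespace
# mounts it as /etc/sensu/conf.d/client.json. The "name" field is the
# spoofed hostname that Sensu sees.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sensu-client-config
  namespace: production
data:
  client.json: |
    {
      "client": {
        "name": "CLUSTERNAME-production",
        "subscriptions": ["kubernetes", "production-team"],
        "address": "127.0.0.1"
      }
    }
```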

How to keep multiple replicas separated

While Kubernetes is great at keeping a set number of replicas running at all times and recovering from node failures, with our smaller clusters we often found that all of our replicas would end up running on the same node. This was understandably frustrating: if that node crashed, all of our replicas died together.

While Pod Anti-Affinity is the best practice today, in the early days it was not available to keep pods away from each other on nodes. To compensate, our check-k8s script alerts us if, for any reason, all the replicas are on the same node:
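The corresponding check definition is sketched below. The command and its flags are purely illustrative (the real options live in the check-k8s repository); the logic is simply to list the pods behind a deployment, compare the nodes they are scheduled on, and raise an alert when they all match:

```
{
  "checks": {
    "api-replica-spread": {
      "command": "check-k8s --namespace production --deployment api --check replica-spread",
      "standalone": true,
      "interval": 120
    }
  }
}
```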

Avoiding non-ready Pods

One day one of our development teams asked for some help debugging an application that was running while its service endpoint was not responding. After some poking around we realised that all of the pods were running but not ‘ready’: they were stuck waiting for a third-party service to start up. By checking that enough of our pods are ready, we can alert when a subset gets stuck.

Combining these two checks gave us a novel way to create an alert: deploy a single-replica deployment and use its readiness probe as the alert condition. When the pod goes non-ready, we’ll receive a page:
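A minimal sketch of such an alerter is below (the image, endpoint, and thresholds are placeholders): a single replica whose readiness probe curls an external dependency, so when the dependency disappears the pod goes non-ready and the readiness check pages us:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: third-party-alerter
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: third-party-alerter
  template:
    metadata:
      labels:
        app: third-party-alerter
    spec:
      containers:
      - name: alerter
        image: curlimages/curl:8.5.0          # any image with curl will do
        command: ["sh", "-c", "while true; do sleep 3600; done"]
        readinessProbe:
          exec:
            command:
            - curl
            - --fail
            - --max-time
            - "5"
            - https://third-party.example.com/health
          periodSeconds: 30
          failureThreshold: 3
```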

This example uses cURL, but as you can run any image you like, any level of complexity can be baked into the container and used as an alerter. Clearing the alert is as simple as removing the deployment and cleaning the agent from Sensu.

Selectively alerting namespaces

We found a common practice emerging of one namespace per DTAP (Development, Test, Acceptance, Production) environment, so we decided to build into our plugin the ability to label a namespace to determine whether it should page or not. With the label set to false, the plugin will ‘fake’ an OK to the Sensu server.

It has never been easier for a developer on the team to turn off alerting on their pods for any reason. All our alerts, which are available here, look for a “pagethis=true” label. If it’s changed to false, the Sensu check will always return 0. There’s no need to edit check config files or rerun Chef; it’s as easy as an API call or a kubectl command.
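For example (a sketch, using the label key from our checks), marking a development namespace as non-paging is nothing more than a label on the namespace object, which can be flipped at any time through the API or kubectl:

```
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    pagethis: "false"   # checks looking for pagethis=true will fake an OK here
```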

Keeping Sensu config configurable

Alerts must be simple to configure and easy to turn on and off. Where was the right place to do that? In the Sensu config, Kubernetes labels, or elsewhere?

We wanted it to be incredibly simple to add and edit our checks, so we designed two different ways to achieve this: the first uses a Kubernetes ConfigMap for the Sensu client config, and the second uses our check-k8s plugin described above to create ‘readiness probe’ checks.

A large challenge was identifying the right metrics to check. Some of our teams didn’t care about CPU usage or memory alerts; their systems were designed to autoscale when resource usage was high. To make the solution flexible, we use a Kubernetes ConfigMap to allow each team to determine the checks they would like enabled.

Use our code

To deploy our setup inside your own Kubernetes cluster you’ll need the following code to set up the targets. Note: we deploy into a ‘monitoring’ namespace.

The namespace:
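A minimal version looks like this:

```
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```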

The Prometheus configmap:
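This is a simplified sketch, assuming Prometheus’s Kubernetes service discovery finds the node_exporter pods by label and scrapes the API server through the in-cluster service account; our real scrape configuration carries more jobs and relabelling than shown here:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
    scrape_configs:
      # Scrape the node_exporter daemon set via the pod IPs.
      - job_name: node-exporter
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            regex: node-exporter
            action: keep
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: node
      # Scrape the Kubernetes API server for node and API metrics.
      - job_name: kubernetes-apiservers
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            regex: default;kubernetes;https
            action: keep
```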

The node_exporter daemon set:
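Again simplified; pin the image version you actually want to run:

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.15.2
        args:
          - --path.procfs=/host/proc
          - --path.sysfs=/host/sys
        ports:
        - containerPort: 9100
          name: metrics
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
```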

The Prometheus deployment and service:
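A sketch of both objects; the ServiceAccount and RBAC rules that let Prometheus list pods, nodes, and endpoints are omitted for brevity:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus    # needs read access to pods, nodes, and endpoints
      containers:
      - name: prometheus
        image: prom/prometheus:v2.0.0
        args:
          - --config.file=/etc/prometheus/prometheus.yml
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
```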

And then for Sensu, the Sensu client config map for the Prometheus check and the client-checks.json (replace CLUSTERNAME with the host you would like it to appear as in Sensu):
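A sketch of the two files in one ConfigMap, with the Prometheus check wired up as a standalone check. The check command is illustrative; see the sensu-plugins-prometheus-checks README for the exact invocation:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: sensu-client-config
  namespace: monitoring
data:
  client.json: |
    {
      "client": {
        "name": "CLUSTERNAME",
        "subscriptions": ["kubernetes"],
        "address": "127.0.0.1"
      }
    }
  # The command below is a placeholder for the real plugin invocation.
  client-checks.json: |
    {
      "checks": {
        "prometheus-checks": {
          "command": "check-prometheus.rb",
          "standalone": true,
          "interval": 60
        }
      }
    }
```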

Generate your secrets for your certs and config:
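As a sketch, the same can be expressed as two Secret manifests. All values below are placeholders for your own certificates and transport settings, and the transport assumes Sensu’s default RabbitMQ setup; the encrypted connection back to our central Sensu server is the one mentioned earlier:

```
apiVersion: v1
kind: Secret
metadata:
  name: sensu-client-certs
  namespace: monitoring
type: Opaque
stringData:
  cert.pem: |
    -----BEGIN CERTIFICATE-----
    (paste your Sensu client certificate here)
    -----END CERTIFICATE-----
  key.pem: |
    -----BEGIN RSA PRIVATE KEY-----
    (paste the matching private key here)
    -----END RSA PRIVATE KEY-----
---
apiVersion: v1
kind: Secret
metadata:
  name: sensu-client-transport
  namespace: monitoring
type: Opaque
stringData:
  transport.json: |
    {
      "transport": { "name": "rabbitmq" },
      "rabbitmq": {
        "host": "sensu.example.com",
        "port": 5671,
        "vhost": "/sensu",
        "user": "sensu",
        "password": "CHANGE_ME",
        "ssl": {
          "cert_chain_file": "/etc/sensu/ssl/cert.pem",
          "private_key_file": "/etc/sensu/ssl/key.pem"
        }
      }
    }
```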

And deploy Sensu:
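Finally, a sketch of the Sensu client Deployment itself, mounting the ConfigMap and Secrets from the previous steps. The image and paths are placeholders for whichever Sensu 1.x client image you build or pull:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sensu-client
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sensu-client
  template:
    metadata:
      labels:
        app: sensu-client
    spec:
      containers:
      - name: sensu-client
        image: your-registry/sensu-client:1.2.1   # placeholder: any Sensu 1.x client image
        command: ["/opt/sensu/bin/sensu-client", "-d", "/etc/sensu/conf.d"]
        volumeMounts:
        - name: config
          mountPath: /etc/sensu/conf.d
        # Nested mount; if your Sensu build does not read nested config
        # directories, pass this path as an additional -d argument instead.
        - name: transport
          mountPath: /etc/sensu/conf.d/transport
        - name: certs
          mountPath: /etc/sensu/ssl
      volumes:
      - name: config
        configMap:
          name: sensu-client-config
      - name: transport
        secret:
          secretName: sensu-client-transport
      - name: certs
        secret:
          secretName: sensu-client-certs
```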

I’d like to explicitly thank my colleagues Michael Russell, Kevin Struis, Otávio Fernandes, Mitch Hulscher, and the other engineers at Schuberg Philis for all of their hard work developing and implementing this monitoring system.

Thank you for reading! As always, our solution is constantly improving as we learn and adapt. Please reach out if you have any feedback or run into any challenges with our code!

