From Monolith to Microservice Architecture on Kubernetes, part 4 — Monitoring, health checks, logging & tracing

Jeroen Rosenberg
Aug 9, 2017

In this blog series we’ll discuss our journey at Cupenya of migrating our monolithic application to a microservice architecture running on Kubernetes. In the previous parts of the series we’ve seen how the core components of the infrastructure, the Api Gateway and the Authentication Service, were built and how we converted our main application to a microservice running in Kubernetes.

As mentioned, there are a number of gains to splitting our monolith up into microservices. For us it was mainly about:

  • Increased agility and smaller & faster deployments
  • Individual scalability of services
  • More fine-grained control over service SLAs (i.e. some services are crucial and need to have failover in place and some are not)
  • The ability to have autonomous teams responsible for a subset of services

However, running microservices also brings a lot of challenges. Instead of having a single log file, we now have a ton of them. There are also a lot more application stacks to monitor. Services might crash; this needs to be detected so that they can be restarted, and dependent services need to handle the downtime. It’s much harder to troubleshoot performance issues. User requests now might, and probably will, span a set of microservices, so we would need to trace requests across service (network) boundaries to make sense of them and detect bottlenecks.

In this post I’ll show how we’ve addressed these challenges with regard to logging, automatic recovery, monitoring & tracing.

Logging

Let’s start with logging. As mentioned, running (a lot of) microservices will result in a lot of logs scattered around your cluster. We would need to aggregate our logs somehow to have a complete timeline of log events across all our services. In this section I’ll show how we’ve approached this.

For starters we’ve configured all our microservices to log to stdout and stderr. For our microservices written in Scala we rely on Logback which natively implements the SLF4J API so we could switch to other logging frameworks with little effort. To split the logging output between stdout and stderr based on the logging level we are using the following logback.xml:
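Here is a minimal sketch of such a logback.xml. The appender names, the exact level split (ERROR and above to stderr, everything else to stdout) and the log pattern are illustrative; our actual configuration may differ in the details:

<configuration>
  <!-- everything below ERROR goes to stdout -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <target>System.out</target>
    <filter class="ch.qos.logback.classic.filter.LevelFilter">
      <level>ERROR</level>
      <onMatch>DENY</onMatch>
      <onMismatch>NEUTRAL</onMismatch>
    </filter>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- ERROR and above goes to stderr -->
  <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
    <target>System.err</target>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
      <level>ERROR</level>
    </filter>
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="STDOUT" />
    <appender-ref ref="STDERR" />
  </root>
</configuration>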

This splitting makes it really trivial to forward logs to a log aggregator with a little help from Kubernetes. Everything a containerized application writes to stdout and stderr is handled and redirected somewhere by the container engine. In our case the Docker container engine redirects those two streams to a logging driver, which is configured in Kubernetes to write to a file in JSON format. There the logs are picked up by the Fluentd logging agent. Since we are on Google Container Engine (GKE), this agent is configured to push them to Stackdriver.

Stackdriver

Stackdriver is the default logging solution for clusters deployed on GKE. You get it by default when setting up a new cluster unless you explicitly opt out. By default, Stackdriver logging collects only your container’s stdout and stderr streams. To collect any logs your application writes to a file (for example), see the sidecar approach in the Kubernetes logging overview. As mentioned, with our Logback configuration we didn’t need to.

In the Stackdriver UI you can view all the aggregated logs as one nice stream. You can also easily filter down log entries by selecting the Kubernetes cluster, namespace or even specific pods. This makes it easy to look up log entries from several pods within a certain timespan.

However, in our experience searching for log entries by textual criteria or scrolling through logs to find something can be inefficient through the web UI. Also, by default there’s a 7-day retention policy, so logs older than 7 days are discarded. For those reasons we additionally set up a Stackdriver Sink to dump our logs to a Cloud Storage bucket on a daily basis. That way we can download the logs at any given time and go wild on them with our favourite tools grep and less. You could also easily set up a sink to export to Google BigQuery and do further analysis from there.
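If you prefer the command line over the web console, such a sink can be created along these lines (the sink and bucket names are made up for illustration):

$ gcloud logging sinks create my-logs-sink storage.googleapis.com/my-logs-bucket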

Health checking and automatic recovery

The second topic I’d like to address is automatic recovery of (micro)services. Our microservices might at some point transition to a broken state from which they cannot recover except by being restarted. This raises a few questions. How do we detect that a microservice is broken? How do we recover it? And how do we notify dependent services of this downtime?

Liveness and readiness probes

To answer the first question: Kubernetes provides liveness probes to detect and remedy situations where a microservice is in an unrecoverable broken state. Such a probe can be a simple command to execute (e.g. checking a pid file), an HTTP GET request (e.g. on a health check API) or a TCP connection. The kubelet, the primary “node agent” that runs on each node, polls the liveness probe at a configurable interval. If the specified command, HTTP request or TCP connection is unsuccessful, it will kill and restart the container. That way we can keep our (crucial) microservices up and running automatically.

Sometimes, for instance during initial startup or after being restarted, our microservices are temporarily unable to serve traffic. They might still be loading their configuration or warming up their caches. We don’t want to send them traffic just yet in those situations. Readiness probes are meant to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services. Since dependent microservices communicate only through Kubernetes Services, requests to an unavailable microservice immediately get bounced. If you run multiple replicas of a microservice, dependent services wouldn’t even notice any downtime!

Perfect. So how do we define those readiness and liveness probes? Let’s look at the following snippet of a deployment descriptor of a typical Cupenya microservice.
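The snippet below is a trimmed-down sketch rather than our actual descriptor; the image, port, replica count and the /ping liveness endpoint are illustrative, but the probe structure is what matters:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-microservice
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: my-microservice
    spec:
      containers:
      - name: my-microservice
        image: eu.gcr.io/my-project/my-microservice:1.0.0
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /ping
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5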

For the liveness probe we use a simple endpoint that just responds with 200 OK on a GET request. Each microservice also implements a standardised /health endpoint where we do more elaborate checks to see if the service is really in a state where it can serve traffic (e.g. check if it can reach a database), which is used for the readiness probe. This endpoint also follows a standardised response model.

Show me the code already!

Alright, alright… Let’s look at the implementation of the health check service in Scala:
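The snippet below is a simplified sketch of that service; the names and the exact aggregation logic are assumptions. The HealthCheckResult model and HealthCheckStatus it refers to are shown right after.

import scala.concurrent.{ ExecutionContext, Future }

// Wrapper around the results of all registered health checks
case class HealthCheckResults(results: List[HealthCheckResult])

// Each health check implements this trait
trait HealthCheck {
  def runCheck(): Future[HealthCheckResult]
}

class HealthCheckService(checks: List[HealthCheck])(implicit ec: ExecutionContext) {
  // Run all registered checks and combine them into a single HealthCheckResults
  def runChecks(): Future[HealthCheckResults] =
    Future.sequence(checks.map(_.runCheck())).map(HealthCheckResults(_))

  // The combined statuses determine the overall health (and thus the HTTP status code)
  def isHealthy(results: HealthCheckResults): Boolean =
    results.results.forall(_.status == HealthCheckStatus.Ok)
}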

As you can see, the response model is a HealthCheckResults wrapper around a list of HealthCheckResult objects. Each service can implement multiple health checks (e.g. check whether it can reach a database and whether some internal component is available). The combined statuses of those checks determine the overall health check status, expressed through an HTTP status code.

Now let’s look at the HealthCheckResult model itself:
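A sketch of the model; the field types are assumptions based on the description that follows:

object HealthCheckStatus extends Enumeration {
  val Ok, Critical = Value
}

case class HealthCheckResult(
  name: String,
  status: HealthCheckStatus.Value,
  timestamp: Long,                    // epoch millis of when the check ran
  latency: Option[Long] = None,       // e.g. time to reach an external service, in millis
  message: Option[String] = None,
  tags: Map[String, String] = Map.empty
)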

It’s pretty straightforward. A HealthCheckResult has a name, status, timestamp and optional latency (e.g. time to reach an external service), message and tags.

So based on this tiny “framework” we can implement custom health checks for our microservice. Below is an example of such a health check.
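A sketch of such a check, assuming the official mongo-scala-driver; the driver we actually use, and therefore the exact call, may differ:

import org.mongodb.scala.MongoClient
import scala.concurrent.{ ExecutionContext, Future }

// Heartbeat check: if we can still list databases, MongoDB is reachable
class MongoHealthCheck(client: MongoClient)(implicit ec: ExecutionContext) extends HealthCheck {
  override def runCheck(): Future[HealthCheckResult] = {
    val started = System.currentTimeMillis()
    client.listDatabaseNames().toFuture()
      .map { _ =>
        HealthCheckResult(
          name = "mongodb",
          status = HealthCheckStatus.Ok,
          timestamp = started,
          latency = Some(System.currentTimeMillis() - started)
        )
      }
      .recover { case ex =>
        HealthCheckResult(
          name = "mongodb",
          status = HealthCheckStatus.Critical,
          timestamp = started,
          message = Some(ex.getMessage)
        )
      }
  }
}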

Here we’ve implemented a very basic heartbeat health check to verify we can still list databases on a MongoDB server.

Now we can simply wire the health check service and route into our application using the following code:
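A minimal sketch of that wiring with Akka HTTP; JSON marshalling of the results is omitted for brevity, and the MongoDB connection string is an assumed value:

import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import org.mongodb.scala.MongoClient
import scala.concurrent.ExecutionContext.Implicits.global // in a real app: the actor system's dispatcher

val mongoClient = MongoClient("mongodb://mongodb-svc:27017") // assumed connection string
val healthCheckService = new HealthCheckService(List(new MongoHealthCheck(mongoClient)))

val healthRoute: Route =
  path("health") {
    get {
      onSuccess(healthCheckService.runChecks()) { results =>
        // 200 OK when all checks pass, 503 Service Unavailable otherwise
        if (healthCheckService.isHealthy(results)) complete(StatusCodes.OK)
        else complete(StatusCodes.ServiceUnavailable)
      }
    }
  }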

After this, the route /health can be used for a Kubernetes readiness probe as shown earlier in this section. Kubernetes will make sure our microservice won’t receive traffic unless this endpoint responds with status code 200. In this case that will only happen when it can reach MongoDB.

Monitoring

The next topic I’d like to discuss in this post is monitoring. As with logging, running microservices means having lots of separate instances to monitor and lots of metrics scattered around. We needed something that

  • Could collect and aggregate all those metrics
  • Has great visualisations and allows for custom persistent dashboards
  • Provides alerting capabilities
  • Scales price-wise with a growing number of services & nodes
  • Supports tracing across network & service boundaries

To collect metrics, our main application already uses Kamon. Kamon is great and has a lot of supported reporter backends such as StatsD, Datadog and New Relic.

Datadog vs Prometheus

Before the migration to a microservices infrastructure we were using Datadog as a backend. This is a great cloud solution which is easy to set up and takes away the hassle of maintenance. However, with the new setup we were afraid that the per-host pricing model would become quite pricey sooner or later. Hence, we looked for alternatives to Datadog that would still be easy to integrate with Kamon.

We chose Prometheus with Grafana for visualization on top. There’s an excellent post on how to install Prometheus and Grafana in Kubernetes using the Helm chart provided in the Kubernetes GitHub project. In this post we’re going for the manual route, though.

Prometheus has a pull model for getting metrics in, as opposed to Datadog’s push model (there’s a nice post on the rationale behind this philosophy). It basically requires a /metrics REST endpoint to periodically fetch metrics from. Since we want to leverage our existing Kamon implementation and are not interested in building an additional REST endpoint in each microservice, we need a “bridge” in between to get metrics from Kamon to Prometheus.

Statsd Exporter — Getting Kamon metrics into Prometheus

Fortunately there’s an out-of-the-box implementation of such a bridge available, called statsd exporter. The statsd exporter receives metrics from a StatsD environment, in our case from Kamon, and provides a /metrics endpoint for Prometheus to pull from. The great thing about this tool is that it even supports the DogStatsD format that we are used to with Datadog. Therefore, we didn’t have to change anything in our Kamon configuration except the endpoint. Below is a snippet of our config.
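A sketch of the relevant part of our application.conf. The exact keys depend on the Kamon version and reporter module in use, and the tag names and environment variables are illustrative:

kamon {
  datadog {
    # the statsd exporter service in the monitoring namespace
    hostname = "statsd-exporter-svc.monitoring"
    port = 9125

    # tags added to every metric so we can identify the source deployment
    default-tags {
      namespace = ${?KUBERNETES_NAMESPACE}
      deployment = ${?DEPLOYMENT_NAME}
    }
  }
}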

The default-tags section ensures we add tags to our metrics, which makes it easy to identify the source of metrics from a particular instance, in this case a deployment. As you can see, the hostname property refers to a statsd-exporter-svc in the monitoring namespace. This assumes a statsd-exporter Kubernetes deployment and service. Fortunately the statsd exporter project has an official Docker image. Let’s start by creating the monitoring namespace:

$ kubectl create namespace monitoring

Now we can create a statsd exporter deployment and service:
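A sketch of statsd-exporter.yaml; the image tag is illustrative, and the ports follow the statsd exporter defaults (9125/UDP for StatsD ingress, 9102 for the /metrics endpoint):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: statsd-exporter
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: statsd-exporter
    spec:
      containers:
      - name: statsd-exporter
        image: prom/statsd-exporter:latest
        ports:
        - containerPort: 9125   # StatsD ingress
          protocol: UDP
        - containerPort: 9102   # /metrics endpoint for Prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: statsd-exporter-svc
  labels:
    app: statsd-exporter
spec:
  selector:
    app: statsd-exporter
  ports:
  - name: statsd
    port: 9125
    protocol: UDP
  - name: metrics
    port: 9102
    protocol: TCP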

We can create the resources by running:

$ kubectl --namespace monitoring apply -f statsd-exporter.yaml

Deploying Prometheus

Great! Now our app is pushing metrics to the statsd exporter through Kamon. All we need to do now is deploy a Prometheus instance to Kubernetes, configured to pull metrics from the new statsd exporter endpoint. Let’s first create the necessary Prometheus configuration locally:
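A minimal sketch of prometheus.yml; the job name and scrape interval are illustrative:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'statsd-exporter'
    static_configs:
      - targets: ['statsd-exporter-svc:9102']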

Now we can create a Kubernetes ConfigMap from the file to make it easy to inject it into our Prometheus pod(s).

$ kubectl --namespace monitoring create configmap prometheus-conf --from-file prometheus.yml

All we need now is a descriptor file for the Prometheus deployment. For this, too, an official Docker image is available.
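A sketch of prometheus.yaml; the image tag and service name are illustrative, but note the projected volume built from the prometheus-conf ConfigMap and its mount on /etc/prometheus:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-conf
          mountPath: /etc/prometheus
      volumes:
      - name: prometheus-conf
        projected:
          sources:
          - configMap:
              name: prometheus-conf
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
spec:
  type: LoadBalancer
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090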

As you can see in the volumes section, we conveniently create a so-called projected volume from our newly created prometheus-conf ConfigMap. Through volumeMounts we then mount this volume on /etc/prometheus for Prometheus to pick up our config file. Once we instruct kubectl to create the resources we have a fully working setup:

$ kubectl --namespace monitoring apply -f prometheus.yaml

Because we use the service type: LoadBalancer, it will get a public IP which you can easily map to your own domain or put behind a firewall.

Updating the Prometheus configuration on-the-fly

To make changes to the Prometheus configuration without redeploying we could simply change the local configuration file and recreate the ConfigMap:

$ kubectl --namespace monitoring create configmap prometheus-conf --from-file prometheus.yml -o yaml --dry-run | kubectl --namespace monitoring replace -f -

The --dry-run flag ensures we are not creating a new ConfigMap but are piping the updated content to the replace command, resulting in updating our existing prometheus-conf ConfigMap.

Mounted ConfigMaps are updated automatically when their content changes. The kubelet checks whether the mounted ConfigMap is fresh on every periodic sync. However, it uses its local TTL-based cache for getting the current value of the ConfigMap. As a result, the total delay from the moment the ConfigMap is updated to the moment new keys are projected to the pod can be as long as the kubelet sync period plus the TTL of the ConfigMap cache in the kubelet.

After this period elapses we need to restart all Prometheus pods for them to pick up the new configuration, since the config is only loaded at startup:

$ kubectl --namespace monitoring scale --replicas=0 deployment/prometheus
$ kubectl --namespace monitoring scale --replicas=1 deployment/prometheus

Note that we don’t have to recreate the deployment.

Adding Grafana

Although we can now already inspect our metrics through the Prometheus UI, it’s rather difficult to create nice graphs and dashboards there. Therefore we decided to introduce Grafana. It’s very feature-rich and easy to use. Grafana allows for beautiful visualisations, customised dashboards and alerts. It also supports Prometheus as a data source for metrics.

Let’s add Grafana to our monitoring namespace. First let us create an admin password which we can inject through a Kubernetes Secret. A Secret is like a ConfigMap, but meant for sensitive data: its values are base64-encoded and can additionally be encrypted at rest.

$ echo -n "very-secret-password" > ./password.txt
$ kubectl --namespace monitoring create secret generic grafana-admin-secret --from-file=./password.txt

Now that we have our grafana-admin-secret we can inject it in a Grafana deployment:
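A sketch of grafana.yaml; the image tag and service port are illustrative, but note how the GF_SECURITY_ADMIN_PASSWORD environment variable is populated from the password.txt key of our grafana-admin-secret:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-admin-secret
              key: password.txt
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-svc
spec:
  type: LoadBalancer
  selector:
    app: grafana
  ports:
  - port: 80
    targetPort: 3000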

We are populating the GF_SECURITY_ADMIN_PASSWORD environment variable from our grafana-admin-secret. Grafana uses this variable to set the password for the admin user.

Now we can ask kubectl to create the resources again and be done with it:

$ kubectl --namespace monitoring apply -f grafana.yaml

Also here we use the service type: LoadBalancer so it will get a public IP (which we can map to our domain again). We can open the web interface and log in with the admin account and the password we’ve set in the previous steps. From there we can add Prometheus as a data source, create custom dashboards and set up alerts.

Tracing

The last topic I’d like to touch on in this post is tracing. As mentioned, in a microservice environment (user) requests probably will span a set of microservices, so we would need to trace requests across service (network) boundaries to have the full picture. At the time of writing we haven’t fully solved this problem yet at Cupenya, but we are working on it. In this section I’ll briefly provide some pointers and share our approach.

As we’ve seen in the previous section, we are using Kamon for collecting metrics in our application. Kamon also provides a tracing module. The core concept of this tracing module is the TraceContext. The TraceContext keeps track of all relevant metadata belonging to a single trace. At the highest level, a trace tells the story of a transaction or workflow as it propagates through (different microservices in) our system. A trace is in essence a directed acyclic graph (DAG) of spans. A span is a timed operation representing a contiguous segment of work. In a microservice infrastructure each microservice contributes its own span or even multiple spans. A parent span may explicitly start other spans, either sequentially or in parallel.

In order to fully keep track of a trace in a microservice environment we need each service to

  • Check if there is an active TraceContext
  • Create a new TraceContext with a unique ID if none is active
  • Create spans for each operation
  • Pass all that information back into the TraceContext
  • Propagate the TraceContext to other (remote) service calls

To pass the TraceContext around in a distributed system we need some kind of protocol to support this. In the case of HTTP communication you could think of putting the unique ID of a TraceContext into the request headers along with the parent span ID. Wouldn’t it be cool if there was a language-agnostic open standard for all of this?

Fortunately there is the OpenTracing initiative. The yet to be released (at the time of writing) version 1.0 of Kamon will support this standard. I hope to be able to post more about this soon, but for now I’ll just share a few snippets of Scala code to give you an idea of what it will look like.

Let’s first look into working with spans. For tracking spans for synchronous operations we could make use of the following generic logic:
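The snippet below is a rough sketch of that logic. Since Kamon 1.0 wasn’t released yet, the API calls (buildSpan, asChildOf, storeContext and friends) are assumptions based on a pre-release snapshot and may differ in the final version; the addDefaultTags helper is the one shown after the next paragraph.

import kamon.Kamon
import kamon.trace.Span

// Trace a synchronous operation: build a span, add default tags, register it as a
// child of the parent span (if any), activate it for the duration of the block and
// finally finish it.
def traceSync[T](operationName: String, parentSpan: Option[Span] = None)(block: => T): T = {
  val spanBuilder = addDefaultTags(Kamon.buildSpan(operationName)) // hypothetical helper, see below
  parentSpan.foreach(spanBuilder.asChildOf)
  val span = spanBuilder.start()
  // "activate": make the span the current one while the block runs
  val scope = Kamon.storeContext(Kamon.currentContext().withKey(Span.ContextKey, span))
  try block
  finally {
    scope.close()  // "deactivate"
    span.finish()
  }
}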

First we create a so-called SpanBuilder, which implements a builder pattern for adding metadata to spans. We then add some default tags (which I’ll get to in a bit), add the new span as a child of the parent span context if available and start the new span. Then we activate the span, execute our block of code and finally deactivate and finish it. An example of the default-tags function is shown below.
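A sketch of such a function; the tag names and the environment variables (set on every pod through our deployment descriptors or the Kubernetes downward API) are made up for illustration, and the exact Kamon types are again assumptions:

import kamon.trace.Tracer.SpanBuilder

// Add a set of default tags, taken from environment variables set by Kubernetes
def addDefaultTags(spanBuilder: SpanBuilder): SpanBuilder =
  spanBuilder
    .withTag("hostname", sys.env.getOrElse("HOSTNAME", "unknown"))
    .withTag("namespace", sys.env.getOrElse("KUBERNETES_NAMESPACE", "unknown"))
    .withTag("deployment", sys.env.getOrElse("DEPLOYMENT_NAME", "unknown"))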

It gets a SpanBuilder as an argument and adds various tags to it such as environment variables set by Kubernetes.

For asynchronous operations it looks a little different, but not too much:
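A rough sketch, with the same caveat about the pre-release Kamon API:

import scala.concurrent.{ ExecutionContext, Future }
import kamon.Kamon

// Trace an asynchronous operation: the span is not registered as a child of the
// current span (it runs in parallel) and can only be finished once the Future completes.
def traceAsync[T](operationName: String)(block: => Future[T])(implicit ec: ExecutionContext): Future[T] = {
  val span = addDefaultTags(Kamon.buildSpan(operationName)).start() // start immediately
  val result = block
  result.onComplete(_ => span.finish()) // finish only when the future succeeds or fails
  result
}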

As you can see it’s very similar, with the difference that we are not adding the span as a child of a parent span context, because it’s a parallel operation. Also, we can start the span immediately, but we can only finish it after the future operation succeeds or fails.

For REST calls it gets even more interesting. Let’s look at an example Directive implemented for Akka HTTP:
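A sketch of such a directive; the TextMap adapter, the codec calls and the x-trace-token handling follow the pre-release Kamon API we were experimenting with, so treat the exact names as assumptions:

import akka.http.scaladsl.model.headers.RawHeader
import akka.http.scaladsl.server.Directive1
import akka.http.scaladsl.server.Directives._
import kamon.Kamon
import kamon.context.TextMap
import kamon.trace.Span

def traced(operationName: String): Directive1[Span] =
  extractRequest.flatMap { request =>
    // Kamon extracts a trace context from a TextMap, so adapt the request headers to one
    val carrier = new TextMap {
      private val headers = request.headers.map(h => h.name -> h.value)
      override def values: Iterator[(String, String)] = headers.iterator
      override def get(key: String): Option[String] =
        headers.collectFirst { case (k, v) if k.equalsIgnoreCase(key) => v }
      override def put(key: String, value: String): Unit = () // read-only carrier
    }
    val incomingContext = Kamon.contextCodec().HttpHeaders.decode(carrier)
    val span = addDefaultTags(Kamon.buildSpan(operationName))
      .asChildOf(incomingContext.get(Span.ContextKey))
      .start()

    // Echo the trace token back to the caller and finish the span once the request completes
    mapResponseHeaders(_ :+ RawHeader("x-trace-token", span.context().traceID.string)) &
      mapRouteResult { result => span.finish(); result } &
      provide(span)
  }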

Here we basically follow the same pattern as with asynchronous operations, but we have to extract the current trace context from the request headers. Kamon supports extracting this context from a TextMap, so we are transforming the Akka HTTP request headers into a TextMap. When completing the request we put a custom x-trace-token header in the response. All that’s missing now is to report the spans and trace context to some distributed tracing system like Dapper, Zipkin or Jaeger for visualisation. There’s already a kamon-jaeger module available.

Well, that’s all for now. I hope I’ve given you some pointers on how to address the issues with monitoring, automatic recovery, logging & tracing in microservice environments. I’ll keep a close eye on the developments regarding OpenTracing and Kamon 1.0 and hope to be able to post more about this soon.

In my next post in this series I’ll discuss how we achieved deployment automation & continuous delivery. Stay tuned!

Jeroen Rosenberg

Dev of the Ops. Founder of Amsterdam.scala. Passionate about Agile, Continuous Delivery. Proud father of three.
