The Observability problem

Ronal Celaya
Published in etermax technology
Feb 18, 2021 · 7 min read

Kubernetes is a great solution for orchestrating Docker containers. It lets you deploy and manage microservices, control replicas, and handle autoscaling. You can also check the status of every deployment and inspect the logs of every pod or container, for example:

kubectl logs this-is-a-pod -n pod-namespace --follow --tail=100

The command above shows the last 100 lines of logs and follows new ones. When a pod has more than one container, use -c to specify the container name:

kubectl logs this-is-a-pod -n pod-namespace --follow --tail=100 -c container-name

Logs of a pod's previously terminated containers can be inspected using -p:

kubectl logs this-is-a-pod -n pod-namespace --tail=100 -p

You can find more options for Kubernetes logging in the kubectl documentation.

Having said that, if you have more than one cluster, how do you solve problems like monitoring or checking the logs of every pod? Using kubectl for those tasks would be less than practical.

Configuring a centralized system that gathers logs from every cluster and at the same time makes them queryable seemed to be the answer. Here are the things we took into account when deploying our logging solution.

A little bit of context…

At etermax, we have been using Kubernetes since 2019 to run our games' backend as microservices across several clusters. Each cluster was born as an isolated one with its own monitoring stack. As the number of microservices grew, logging became a big issue. Our teams were struggling with challenges such as storage issues and log query timeouts. We needed a solution to centralize, manage, and store logs, and at the same time to remove or at least reduce the cluster management responsibility from our development teams.

We considered ELK (Elasticsearch, Logstash, and Kibana) and Loki as solutions for our logging problems. It seemed that either ELK or Loki could work, so we started to weigh aspects like storage and integration (we already use Prometheus and Grafana). While ELK uses Elasticsearch as its storage solution, Loki has multiple options, like Cassandra or DynamoDB. Other benefits offered by Loki are:

  • Loki is developed by Grafana Labs, so it has full integration with Grafana and our current metrics stack.
  • Loki can run as a set of microservices, giving us an advantage when it comes to scaling.
  • Our teams are already collecting logs with Promtail, making the migration easy to manage.

The benefits mentioned above made Loki the best option for us.

In the image below, Grafana Labs shows the architecture of a Loki installation with Promtail running on each node and sending data to Loki. Promtail provides features like service discovery, labeling, and parsing, and allows mutating the discovered labels. Once data is read and processed, Promtail pushes it to Loki, which stores the logs and makes them queryable from Grafana.

Source: https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/
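
As a rough illustration of that flow, here is a minimal Promtail scrape configuration using Kubernetes service discovery and relabeling. The job name, pipeline stage, and label choices are illustrative, not our exact setup:

scrape_configs:
  - job_name: kubernetes-pods                  # illustrative job name
    kubernetes_sd_configs:
      - role: pod                              # discover every pod running on the node
    pipeline_stages:
      - docker: {}                             # parse Docker's JSON log format
    relabel_configs:
      # turn discovered Kubernetes metadata into queryable labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # point Promtail at the log files of the discovered pods
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        target_label: __path__
        replacement: /var/log/pods/*$1/*.log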

Backend storage

One of the big problems with saving logs is managing storage and data retention. We have often had issues with PersistentVolumes and PersistentVolumeClaims running out of space and crashing pods.

Loki offers a way to manage storage through its Table Manager module. The module takes care of saving data, managing retention and expiration, and, in some cases, establishing write and read limits.

Table Manager saves two kinds of data: indexes and chunks. The options to save indexes are:

  • Cassandra
  • BigTable
  • DynamoDB
  • BoltDB

and the options for chunks are:

  • Cassandra
  • GCS
  • File System
  • S3

Since we are using Amazon EKS as our cluster manager, our first choice for external services is always AWS, so we chose DynamoDB and S3. Through Table Manager, Loki fully manages the DynamoDB tables, configuring retention periods and read/write capacity. However, it is not all good news regarding storage: Table Manager does not offer retention control over data written to S3, leaving us the job of controlling retention with bucket lifecycle policies that must match those configured in Table Manager.
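
For reference, here is a minimal sketch of what that combination can look like in the Loki configuration, assuming weekly index tables and illustrative dates, prefixes, bucket names, and regions:

schema_config:
  configs:
    - from: 2020-01-01                 # illustrative start date
      store: aws                       # index entries go to DynamoDB
      object_store: s3                 # chunks go to S3
      schema: v11
      index:
        prefix: loki_index_            # illustrative table name prefix
        period: 168h                   # one new table per week

storage_config:
  aws:
    s3: s3://us-east-1/our-loki-chunks # illustrative bucket
    dynamodb:
      dynamodb_url: dynamodb://us-east-1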

Loki as microservices

We need a solution that can scale on demand. For example, if we have a problem querying logs, we would like to modify only the services in charge of reading, not the ones in charge of writing; or, if there is a problem with Table Manager, leave the services that use S3 out of it.

Loki can run in two modes: monolithic and microservices. In monolithic mode, all components run in a single process, which removes the possibility of scaling components individually. In microservices mode, each service can scale horizontally, for example when we need more read services than write services.

Microservices sounds good, but how can we deploy each component individually and connect them so they work together? Well, Grafana Labs provides configuration files that help with the Loki setup. Those files are the ones they use in their own production deployment and may not fit your needs exactly, but they serve as inspiration.
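
As a rough sketch of what deploying a single component looks like, the Deployment below runs only the querier by passing the -target flag; the names, namespace, and image version are illustrative, not our actual manifests:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-querier                   # illustrative name
  namespace: logging                   # illustrative namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki-querier
  template:
    metadata:
      labels:
        app: loki-querier
    spec:
      containers:
        - name: querier
          image: grafana/loki:2.1.0    # illustrative version
          args:
            - -config.file=/etc/loki/loki.yaml
            - -target=querier          # run only the querier component
          ports:
            - containerPort: 3100
          volumeMounts:
            - name: config
              mountPath: /etc/loki
      volumes:
        - name: config
          configMap:
            name: loki-config          # illustrative ConfigMap holding loki.yaml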

After struggling with Tanka and refactoring some configurations, our final deployment looks like the figure below.

Our Loki deployment architecture

Loki's architecture is based on Cortex and consists of several microservices. The architecture is divided into three parts: the Read path, the Write path, and the Control path.

The Read path starts when a query is made from Grafana to the gateway and routed to the query-frontend. The query-frontend splits the query into smaller queries and places them in an internal queue that feeds the queriers, which execute these smaller queries in parallel. The queriers ask the ingesters for in-memory data; if the ingesters do not return data for a query, the same query is executed against the backend storage.

The Write path starts with requests from Promtail. The streams are received by the distributor and hashed before being sent to the ingesters. Each stream in the ingester is split into chunks and kept in memory before being flushed to the backend storage.

The Control path has two parts: the Table Manager and Consul. The Table Manager creates the DynamoDB tables used to save chunk indexes and takes care of table retention. Consul is used for consistent hashing and to register ingester states.

Right now, each component has one replica. We have an ingress configured to expose the push endpoint on our internal network, waiting for log streams from other clusters. Table Manager creates a new table each week and retains tables for four weeks. Inactive tables, that is, tables from past weeks, have lower write and read limits than the table used in the current week. To increase performance, Loki uses several Memcached instances in its read and write paths. Further information about the Loki architecture can be found at https://grafana.com/docs/loki/latest/.
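
A minimal sketch of that Table Manager setup, with weekly tables, a 28-day retention, and illustrative throughput limits:

table_manager:
  retention_deletes_enabled: true
  retention_period: 672h               # 4 weeks of weekly tables
  index_tables_provisioning:
    provisioned_write_throughput: 1000 # illustrative limits for the active table
    provisioned_read_throughput: 300
    inactive_write_throughput: 1       # lower limits for past weeks' tables
    inactive_read_throughput: 100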

Promtail

As stated before, other teams are using their own Loki installations along with Promtail to push logs. This new strategy should solve their problems without creating new ones.

Promtail has the particularity of being able to push logs to two or more consumers. We can ask each team to start pushing logs to our Loki while they keep doing the same to their own Loki setup. This makes the migration as easy as changing one line in their Promtail configuration, as shown below. Using this approach, we can ensure the continuity of their logs, or roll back if something does not work as expected.

Another characteristic of Promtail is labeling. Users can send logs with labels like the cluster name or custom application names. This feature makes it easy to have two or more clusters pushing logs to the same Loki server.
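
A minimal sketch of what that looks like in a team's Promtail configuration, keeping their existing Loki and adding the centralized one as a second client; the URLs and the cluster label value are illustrative:

clients:
  # the team's existing Loki, unchanged
  - url: http://team-loki.team-namespace.svc.cluster.local:3100/loki/api/v1/push
  # the centralized Loki, reached through the internal ingress
  - url: http://central-loki.internal.example.com/loki/api/v1/push
    external_labels:
      cluster: games-cluster-1         # identifies which cluster the logs come from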

Visualization: Grafana

Loki lets you query logs effortlessly thanks to its integration with Grafana. Our cluster already has a Grafana installation, and adding Loki is as easy as adding another data source. Besides, if the Prometheus metrics come from the same source as the Loki logs, Grafana can combine both data sources to create useful dashboards. For example, we can create a dashboard showing log lines and log counts along with information about pod performance, giving us the big picture of pod behavior.
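
For reference, provisioning Loki as a Grafana data source can look roughly like this; the URL is an assumption about where the Loki gateway is exposed inside the cluster:

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.logging.svc.cluster.local:3100   # illustrative internal address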

Grafana Explore

Summing up

Loki gave us a great solution to our logging problems. Every time we have a problem with a pod, our first step is to search Grafana for metrics and logs. Since Promtail labels the logs, we can query them by pod name or container name, filter by namespace, or just check the logs from all pods.
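
As an illustration, assuming the labels produced by our Promtail relabeling (cluster, namespace, pod, container), queries in Grafana Explore look roughly like this:

{cluster="games-cluster-1", namespace="my-namespace"}             # all logs from one namespace in one cluster
{pod="this-is-a-pod"} |= "error"                                  # only lines containing "error" from one pod
sum(count_over_time({container="container-name"}[5m]))            # log line count over 5-minute windows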

At present, our production logging system is running and receiving logs from one Kubernetes cluster, using DynamoDB and S3 as backend storage, with 7-day table periods and 28-day retention.

We know that this is not a perfect solution, but we also know that observability is an everyday task, and this is our first step.
