Grafana Loki Configuration Nuances

Igor Latkin
Lonto
May 31, 2023

Hi! My name is Igor, and I'm a co-founder and system architect at Lonto. We handle a wide range of tasks, from corporate systems and custom projects to mobile development and DevOps. The experience we've gained lets us help our customers deal with their infrastructure and its bottlenecks using a variety of tools.

This article is based on my talk at Slurm's School of Monitoring. In it, I want to share the best practices for setting up Grafana Loki to collect logs in your infrastructure that we have gathered over years of using it.

In my opinion, the barrier to entry for Loki is quite low, and there are plenty of tutorials on the web. So I will focus on the more complex and less obvious settings that I have often run into when working with Grafana Loki.

This is the article outline:

  1. The task of collecting logs
  2. Ways to launch Loki
  3. Grafana Loki architecture
  4. Minimum Loki configuration
  5. Cluster and High Availability solutions configuration
  6. Timeouts
  7. Message sizes
  8. Chunks
  9. Parallelism
  10. Optimizing the Write Path

The task of collecting logs

Four basic questions to ask yourself before trying to integrate any logging system:

  1. How do I collect logs?
  2. How do I extract the right metadata from them to make it easier to process logs in the future?
  3. How do I store this data so that I can record and find it faster (probably the most difficult question)?
  4. How do I query the logs?

Every system, be it syslog, Elasticsearch, systems built on ClickHouse, or even Grafana Loki itself, answers these questions differently.

So when we discuss the architecture, we’ll get back to how Grafana Loki differs from Elasticsearch conceptually, and why it wins in terms of log storage cost.

The log collection pipeline usually looks simple and straightforward:

So, we have a variety of data sources we get logs from: Kubernetes clusters, virtual machines, Docker containers, and others. They go through a collect, process, and filter phase.

Then the logs are saved in a certain form depending on the storage you use: for example, in database tables if you work with ClickHouse, or in an S3 bucket with Grafana Loki. Note that every user who retrieves data on the other side may have a different usage pattern: for example, pulling logs for a whole year, or only for the last 10 minutes and filtering them.

Ways to launch Loki

There are three ways to launch Loki which, by and large, differ in scale.

Single-binary

This method is the easiest and is mostly used in the primary Loki tutorials. The logic is the following: we take a binary file, run it, and connect it to a Storage.

The Storage role can be played either by the file system the process is running on or by a remote S3 bucket; it doesn't matter here. This approach has its advantages. For example, the ease of launching: you need only a minimal configuration, which we will check out below. The disadvantage is poor fault tolerance: if the machine crashes, the logs are not written at all.

The situation can be improved:

  • Run the same Loki process with the same config on two different virtual machines
  • Combine them into a cluster using the memberlist section
  • In front of them, put any proxy, like Nginx or HAProxy
  • The main thing is to connect everything to one Storage

Accordingly, you can scale the process further, i.e. run three, four, or five nodes, etc.

As a result, you will end up with a scheme more or less like the following:

Note that each Loki instance here does both writing and reading, so the load is spread evenly across all instances. In practice, though, the load profile is quite uneven: sometimes there are a lot of reads, at other times a lot of writes.

SSD: Simple Scalable Deployment

The second method builds on the first one and lets you separate the read and write processes. That way, for example, we can run the more disk-dependent processes on one kind of hardware and the less dependent ones on another.

You need to pass the -target=write or -target=read flag at startup, and each of these processes then runs only the components responsible for the corresponding path: write or read. Similarly, you need to put a proxy in front of all instances. It will proxy:

  • write queries → to write nodes
  • the rest of the queries → to read nodes

Grafana Labs considers this the recommended way to run Loki and actively develops it.
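To make the target split more concrete, here is a minimal docker-compose sketch of my own (the image tag, file names, and service names are assumptions, and the shared config must already contain the memberlist and storage sections shown later in this article):

version: "3.8"
services:
  loki-write:
    image: grafana/loki:2.8.2
    # runs only the components of the write path
    command: "-config.file=/etc/loki/config.yaml -target=write"
    volumes:
      - ./loki-config.yaml:/etc/loki/config.yaml
  loki-read:
    image: grafana/loki:2.8.2
    # runs only the components of the read path
    command: "-config.file=/etc/loki/config.yaml -target=read"
    volumes:
      - ./loki-config.yaml:/etc/loki/config.yaml
  # a proxy (Nginx, HAProxy) goes in front and routes push requests to loki-write
  # and everything else to loki-read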

Microservices mode

Microservices mode is a more advanced way where we run each Loki component independently.

There are a lot of components, but they are easily separated from each other, and they can be divided into two or even three groups.

  1. The write component group: Distributors and ingesters that write to the Storage.
  2. The read component group: “query-frontend,” “querier,” and “index gateway.” These are the components that deal with execution of queries.
  3. All other utility components: e.g. caches, “compactor,” and others.

Grafana Loki architecture

Let’s dwell a little more on the architecture to understand what is configured in different sections.

First, let’s check out how the data in the block is indexed. Unlike Elasticsearch, which indexes all documents in full-text and in their entirety by default, Grafana Loki goes the other way: it indexes not the contents of the logs, but their metadata only, i.e. time and labels.

These labels are very similar to Prometheus labels. I think many of you are familiar with them.

We end up storing a very small index in Grafana Loki because there is very little data in it. Here I want to point out that with Elasticsearch the index is often more bloated than the data itself.

Unindexed data is kept as is, in the order in which it arrives. If you want to filter it, you can use the "grep"-like search built into Loki.

A stream is a unique set of labels: even if logs come from the same source, a different label set produces a different stream.

In this case, the component="supplier" label generates a new stream. We will need this concept later, since rate limits and other restrictions often apply per stream.

Chunk is a set of several log lines. You take the log lines, put them into one entity, call it a “chunk,” compress it, and put it in the Storage.

Now let’s go back to the architecture and take a closer look at the write and read paths and their differences.

Write path. The logging entry point in Loki is an entity called Distributor. This is a stateless component. Its task is to distribute the query to one or more ingesters.

Ingesters are stateful components. They compose a so-called hash ring, a system of consistent hashing. It makes it easier to add and remove the ingesters from the cluster. All ingesters are connected to the same Storage.

Read path is more complicated, but the principle is similar. There are stateless components: query-frontend and querier. Query-frontend is a component that helps to split the query, so that it can be executed faster.

For example, let's imagine that you need to query data for a month-long period. Using the query-frontend, we divide the query into smaller intervals and send them in parallel to several queriers. After that we merge the results and return them to the client.

Querier is a component that queries logs from the Storage. If the ingesters hold fresh data that has not yet been flushed to the Storage, the querier asks them for it too.
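Both of these behaviours can be tuned. Below is a minimal sketch, assuming the Loki 2.x option names; the values are illustrative and close to the defaults:

limits_config:
  # the query-frontend splits a long range query into sub-queries of this length
  split_queries_by_interval: 30m

querier:
  # the querier asks ingesters only for data newer than this; older data is read from the Storage
  query_ingesters_within: 3h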

Minimum Loki configuration

Filesystem

This configuration focuses on working with the file system when Loki is running in a single instance mode. I will tell you about the most important configuration sections.

Loki is a tool that is constantly evolving to simplify and improve the way you work with its configuration. One of the best changes was introduced a couple of versions ago: the "common" section. When several items repeat across the configuration, the "common" section lets you define them once and have them applied to the different parts of the config. That is, where you previously had to configure the ring, storage, and other elements separately for the ingester, querier, and distributor, you can now do it all in one place.

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-09-07
      store: boltdb-shipper
      object_store: filesystem
      schema: v12
      index:
        prefix: loki_index_
        period: 24h

Here you can see that storage is set up as a file system, and it shows where the chunks are stored. There you can also specify where to store the index, alerting rules, etc.

schema_config specifies how the data (chunks and indexes) is stored. Little has changed here for a long time, but new schema versions appear from time to time. Therefore, I recommend reading the changelogs periodically so that you can update your Loki schemas in time and benefit from the latest improvements.
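To illustrate how such a schema update is usually applied, here is a sketch: a new period is appended to schema_config with a start date in the future, and the old period is left untouched. The future date, the tsdb store, and the new index prefix are placeholders of mine, not something required by Loki:

schema_config:
  configs:
    # the existing period keeps serving all data written before the new date
    - from: 2020-09-07
      store: boltdb-shipper
      object_store: filesystem
      schema: v12
      index:
        prefix: loki_index_
        period: 24h
    # the new period takes effect only for data written from this future date onward
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        prefix: loki_tsdb_index_
        period: 24h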

Once several instances appear, you need to combine them into a cluster, which runs on the "memberlist" protocol (you can also use third-party systems such as etcd or Consul). Memberlist is a gossip protocol that automatically discovers Loki nodes.

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist

schema_config:
  configs:
    - from: 2020-09-07
      store: boltdb-shipper
      object_store: filesystem
      schema: v12
      index:
        prefix: loki_index_
        period: 24h

memberlist:
  join_members:
    - loki:7946

There are many different ways to automatically configure a cluster, and they work just fine. For example, you can specify different settings in the memberlist.join_members section:

  • Single host address
  • List of addresses
  • dns+loki.local:7946 — Loki will make an A/AAAA DNS query to get a list of hosts
  • dnssrv+_loki._tcp.loki.local — Loki will make a SRV DNS query to get not only the hosts but also the ports list
  • dnssrvnoa+_loki._tcp.loki.local — an SRV DNS query without the follow-up A/AAAA lookup

Why do you need it? There are components within Loki that need to know about each other. For example, distributors need to know about ingesters, so they are registered in the same ring. After that, distributors know which ingesters to send a write request to. Another example is the compactor, which must run as a single instance per cluster.

S3 as a storage

S3 is the recommended way to store logs in Loki, especially if you deploy Loki into a Kubernetes cluster. When we use S3 as the storage, the configuration changes slightly:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki
  storage:
    s3: # the "filesystem" section changes to "s3"
      s3: https://storage.yandexcloud.net
      bucketnames: loki-logs
      region: ru-central1
      access_key_id:
      secret_access_key:
  replication_factor: 1
  ring:
    kvstore:
      store: memberlist

schema_config:
  configs:
    - from: 2020-09-07
      store: boltdb-shipper
      object_store: s3
      schema: v12
      index:
        prefix: loki_index_
        period: 24h

memberlist:
  join_members:
    - loki:7946

Tips:

  • Use https:// instead of s3://. That way you guarantee the use of an encrypted connection
  • You can specify several bucketnames to distribute storage
  • You can use ACCESS_KEY_ID and SECRET_ACCESS_KEY environment variables to configure S3 access keys

We change storage to S3, specify the endpoint, bucketnames, and other configurations that apply to S3.

Note that bucketnames is plural, meaning that you can specify several buckets at once. Loki will then distribute the chunks evenly over all the specified buckets to reduce the load on any single one. You need this, for example, when your hosting provider enforces per-bucket RPS limits.
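Here is a sketch combining both of these tips. The bucket names and environment variable names are placeholders, and the ${...} expansion assumes Loki is started with the -config.expand-env=true flag:

common:
  storage:
    s3:
      s3: https://storage.yandexcloud.net
      # several buckets are listed as a comma-separated string
      bucketnames: loki-logs-01,loki-logs-02,loki-logs-03
      region: ru-central1
      # taken from the environment at startup
      access_key_id: ${ACCESS_KEY_ID}
      secret_access_key: ${SECRET_ACCESS_KEY}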

Cluster and High Availability solutions configuration

Suppose we write logs to several nodes. One of them fails, and you cannot query its data because it did not have time to flush it to the storage.

High Availability in Loki is provided through the replication_factor option. With this setting, the distributor sends each write request to more than one ingester replica.

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki
  storage:
    s3:
      s3: https://storage.yandexcloud.net
      bucketnames: loki-logs
      region: ru-central1
      access_key_id:
      secret_access_key:
  replication_factor: 3 # take note of this field
  ring:
    kvstore:
      store: memberlist

schema_config:
  configs:
    - from: 2020-09-07
      store: boltdb-shipper
      object_store: s3
      schema: v12
      index:
        prefix: loki_index_
        period: 24h

memberlist:
  join_members:
    - loki:7946

replication_factor:

  • The distributor sends chunks to multiple ingesters
  • With replication_factor = 3, you need at least 3 ingester nodes
  • This allows 1 of the 3 nodes to fail
  • A write must succeed on minSuccess = floor(replication_factor / 2) + 1 ingesters, so up to replication_factor - minSuccess of them may fail

The distributor sends the chunks to several ingesters at once.

For data deduplication with boltdb-shipper, several caches are used: the chunk_cache_config section for chunks and write_dedupe_cache_config for indexes. Queriers also take part in deduplication by filtering out duplicated log lines (based on the timestamp and the contents).
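For orientation, here is a rough sketch of where these sections sit in a Loki 2.x config; the cache backends and the memcached address are placeholders of mine, not something prescribed above:

chunk_store_config:
  chunk_cache_config:
    # in-process cache for chunks
    embedded_cache:
      enabled: true
      max_size_mb: 512
  write_dedupe_cache_config:
    # external memcached used to deduplicate index writes
    memcached_client:
      host: memcached-index-writes.loki.svc
      service: memcached-client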

Timeouts

This is quite a tricky topic: when Loki is set up incorrectly, you will often see errors like 502 and 504.

To get rid of these errors, the first thing to do is to increase the timeouts to values sufficient for your project. The second is to configure the several types of timeouts consistently.

  1. The http_server_{write,read}_timeout configures a base timeout for the web server response time.
  2. The querier.query_timeout and querier.engine.timeout set the maximum running time of the engine that directly executes read queries.
server:
  http_listen_port: 3100
  http_server_write_timeout: 310s
  http_server_read_timeout: 310s

querier:
  query_timeout: 300s
  engine:
    timeout: 300s

If you use a proxy like NGINX in front of Loki, you should increase the timeouts there as well (proxy_read_timeout and proxy_send_timeout).

server {
    proxy_read_timeout 310s;
    proxy_send_timeout 310s;
}

You should also increase the timeout on the Grafana side. You can do this in the [dataproxy] section of the config.

[dataproxy]
timeout = 310

The best approach is to make the querier timeouts the smallest of the four (300s in the example). That way the querier gives up first, while everything in front of it (the HTTP server, Nginx, Grafana) is given a bit more time.

The default timeouts are very small, so I recommend increasing these values.

Message sizes

The topic may seem complicated because the origin of some errors is not obvious.

Message sizes, grpc_server_max_{recv,send}_msg_size, limit the possible size of log messages passed between components. The default limits are very small. For example, if a large stack trace produces a 20 MB log line, there is no way it can fit into the default limit, so it has to be increased.

server:
  http_listen_port: 3100
  grpc_server_max_recv_msg_size: 104857600 # 100 MB
  grpc_server_max_send_msg_size: 104857600 # 100 MB

ingester_client:
  grpc_client_config:
    max_recv_msg_size: 104857600 # 100 MB
    max_send_msg_size: 104857600 # 100 MB

  • The default is 4 MB
  • Directly affects the size of logs processed by Loki

Chunk encoding is also worth a mention. The default is gzip, which gives the maximum compression. Grafana Labs recommends switching to snappy, and based on my experience I agree with them: the logs may take a little more space in the storage, but reading and writing data becomes more efficient.

ingester:
  chunk_encoding: snappy

The default encoding is gzip:

  • Better compression
  • Slower queries

We recommend using snappy encoding:

  • Compression is a little worse
  • However, very fast encoding/decoding

Chunks

There are many chunk settings concerning their size and lifetime. I recommend not editing them too much, and you have to really understand what you are doing when you change these values.

ingester:
  chunk_idle_period: 2h
  chunk_target_size: 1536000
  max_chunk_age: 2h

The defaults are pretty good:

  • I don’t recommend changing chunk_block_size and chunk_retain_period at all
  • The chunk_target_size can be increased if your chunks are mostly full; this lets them grow bigger before being flushed
  • chunk_idle_period defines how long a chunk lives in the ingester's memory if no new records arrive in it. So, if your streams are mostly slow and the chunks end up half empty, it is better to increase this period. The default is 30 minutes

Parallelism

Another important issue is concurrency.

querier:
  max_concurrent: 8

limits_config:
  max_query_parallelism: 24
  split_queries_by_interval: 15m

frontend_worker:
  match_max_concurrent: true

  • querier.max_concurrent shows how many queries one querier can handle in parallel. It is recommended to set it to roughly twice the number of CPU cores. The default is 10 (be careful with these numbers)
  • limits_config.max_query_parallelism is the maximum parallelism a tenant gets. The querier.max_concurrent values must be consistent with max_query_parallelism according to the formula:
    [queriers count] * [max_concurrent] >= [max_query_parallelism]

In our example, at least 3 queriers must be running to provide a parallelism of 24 (3 × 8 ≥ 24).

Optimizing the Write Path

There are several write-related settings here: ingestion_rate_mb and ingestion_burst_size_mb.

limits_config:
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 30

The defaults are low, so I recommend increasing them. This will allow you to write much bigger volumes of logs and to do it more often. Keep in mind that these limits apply per tenant, so you have to be careful with them.

There is a separate setting for streams: per_stream_rate_limit.

limits_config:
  per_stream_rate_limit: "3MB"
  per_stream_rate_limit_burst: "10MB"

The values in the example are more or less the defaults. If they start to constrain you, I recommend splitting one stream into several by adding a label: the traffic is then spread across streams, and each of them stays further below its own rate limit. If that is not possible, you can of course try to raise the limits.

Summing up

In this article I tried to give a detailed explanation of how to work with Loki in the context of configuration:

  • The task of collecting logs
  • Loki architecture
  • 3 types of Loki launch strategies
  • High Availability in Loki with replication_factor
  • Configuration of timeouts and message sizes
  • Parallelism — watch these settings very carefully!
  • Optimizing the write path is generally not a difficult process. You just need to look at the indicators in your monitoring and optimize accordingly
