Yes, you can use Loki to efficiently retrieve big data for analysis

Ronen Schaffer
12 min read · Oct 20, 2021

How we evaluated Loki as a log storage system focused on pushing and pulling massive amounts of data

A lot of us are fans of Loki, the open-source system built to store logs for the long term. The ability to query and retrieve specific logs makes it even more useful. Here’s the thing: we needed an efficient way to retrieve a whole batch of logs in one go. For example, if you want to run AI pipeline use cases, you need a solution that can deliver the log data within a reasonable amount of time and with a reasonable amount of CPU and memory. The ability to ingest and extract a massive number of log records didn’t exist, so we set out to create it. This blog shares the story of how we configured and tuned Loki to do more, along with our evaluation. The exciting part is that it works really well, as you’ll see from the numbers. You can even try it for yourself.

What we found, in a nutshell, is this: the amount of memory and CPU resources required for our use case is very reasonable, including for ingesting and extracting large amounts of log records. There is also a lot of value in having a single solution that can handle both ‘needle in a haystack’ queries and queries for big data analytics.

Is this the only possible way to efficiently retrieve loads of log data? Another option would be to put all the data in object storage and extract it from there, but you’d lose the ability to easily fetch specific individual logs. Elasticsearch can also do this, but Loki is built specifically for logs: it’s open source and can retrieve logs in a distributed, large-scale way. You could also do this with Spark and object storage, but it’s still easier with Loki.

A bit more about Loki

If you’re not already familiar with it, Loki is a promising persistent log aggregation system inspired by Prometheus, developed in open source, and led by Grafana Labs. It’s designed as a distributed micro-service system that’s highly scalable and configurable. Loki’s design is also optimized for ‘needle in a haystack’ use cases, to support the quick and efficient retrieval of a small subset of logs using a time-based query. The sweet spot is the way Loki balances the requirements for indexed data with those for the raw log storage, including its ability to persist logs in object storage like AWS S3 and IBM Cloud Object Storage.

Why did we choose the AI use case?

Our evaluation covers a big data analytics use case. It’s not very complicated, yet it’s very common. We knew from the documentation and our prior work that Loki handles focused log queries quite well. Now we wanted to see how well Loki could handle the process of storing and retrieving a very large number of logs for big data analysis.

Figure 1. Microservices and components used for the evaluation

About our evaluation

Loki consists of several types of components; in this evaluation we used only these four:

  • ingester
  • distributor
  • querier
  • query-frontend

The ingesters and the queriers are responsible for the actual work of indexing logs and retrieving them, respectively. The distributors and the query-frontends act as load balancers (among other things) in front of the ingesters and queriers.
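
To make this split concrete, here is a minimal sketch (not taken from our evaluation code) of how a client talks to these components over Loki’s HTTP API: pushes go to a distributor and queries go to a query-frontend, which fans the work out to the queriers. The service names, port, and labels below are illustrative assumptions.

import time
import requests

# Assumed in-cluster service names and port; adjust to your deployment.
DISTRIBUTOR = "http://loki-distributor:3100/loki/api/v1/push"
QUERY_FRONTEND = "http://loki-query-frontend:3100/loki/api/v1/query_range"

# Write path: logs are sent to a distributor, which forwards them to the ingesters.
now_ns = time.time_ns()
push_body = {
    "streams": [{
        "stream": {"job": "promtail", "filename": "hour-00"},            # stream labels
        "values": [[str(now_ns), '{"level":"INFO","message":"hello"}']],  # (ts_ns, line)
    }]
}
requests.post(DISTRIBUTOR, json=push_body).raise_for_status()

# Read path: queries go to a query-frontend, which splits the work across the queriers.
resp = requests.get(QUERY_FRONTEND, params={
    "query": '{job="promtail", filename="hour-00"}',
    "start": now_ns - 3600 * 10**9,   # one hour back, in nanoseconds
    "end": now_ns + 1,
    "limit": 100,
    "direction": "forward",
})
resp.raise_for_status()
for stream in resp.json()["data"]["result"]:
    for ts, line in stream["values"]:
        print(ts, line)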

For the evaluation, we used one day (24 hours) of log data compressed with gzip; each hour was stored as an object in IBM Cloud Object Storage. An entire day of logs is 256GiB uncompressed. Each hour contains about 4 million log records, which take up 0.5GiB compressed and 10GiB uncompressed. The log data used for this evaluation was based on application log lines from multiple microservices that form a highly scaled production application.

Example: Log record snippet (excluding confidential information):

{
  "_account": "1f3434521",
  "_cluster": "ld34",
  "_host": "pmp021-tf-mm-6945b94868-mszbf",
  "_ingester": "theingester",
  "_label": {
    "service": "theservice",
    "pod-template-hash": "6945bfg948",
    "slot": "gyh345",
    "component": "tf_gg",
    "tenant": "ddd021"
  },
  "_logtype": "json",
  "_tag": [
    "agent-v2production",
    "mothership-6677788"
  ],
  "_file": "/var/log/containers/hhh021-tf-mm-23455b94868-mszbf_mode-runtime-a3a6117d7776661sdf4235dfg568bb7463c65c845671.log",
  "_line": "{\"thread\":\"cleaner-task\",\"level\":\"INFO\",\"loggerName\":\"com.ibm.model\",\"message\":\"Cleaner registry pruning task took 0ms for 0/1 entries\",\"endOfBatch\":false,\"loggerFqcn\":\"org.apache.logging.log4j.spi.AbstractLogger\",\"instant\":{\"epochSecond\":1575158500,\"nanoOfSecond\":829000000},\"contextMap\":{},\"threadId\":54,\"threadPriority\":5}",
  "_ts": 1575158500478,
  "_app": "model-runtime",
  "pod": "hillel021-tf-mm-6945b94868-eran",
  "namespace": "logical",
  "container": "modelmesh-runtime",
  "containerid": "a3a6117d77766613b181basd345df96f6edabb463c65c845671",
  "node": "kube-ronen11-crfd21a23456496adac5sdf34559e13d0-w810\n",
  "_ip": "78.342.44.248",
  "_id": "115342345700747",
  "__key": "logline:1f34622021:model-runtime:pmp021-tf-mm-6943456868-mszbf:ld60",
  "_bid": "e4aaab6b-c7ce-4baa-8028-0d9c1asdfsdf2b:13209:ld60",
  "thread": "cleaner-task",
  "level": "INFO",
  "loggerName": "com.ibm.model",
  "message": "Cleaner registry pruning task took 0ms for 0/1 entries",
  "endOfBatch": false,
  "loggerFqcn": "org.apache.logging.log4j.spi.AbstractLogger",
  "threadId": 54,
  "threadPriority": 5,
  "o_instant": {
    "epochSecond": 1575158500,
    "nanoOfSecond": 829000000
  },
  "o_contextMap": {}
}

Distribution of logs per hour

Figure 2. Distribution of logs per hour

In Figure 2 you can see the distribution of the logs used for this evaluation over time, at a granularity of one hour. There are 105M log lines in total over the day-long evaluation period, with storage totaling 256GiB. We can also see variations in the amount of logs emitted over time. The average hourly count of log lines is 4.3M and the average hourly log size is 11GiB.

The environment

For this evaluation we deployed an OpenShift cluster on AWS:

  • OpenShift version: 4.7.16
  • 3 master nodes of type m4.xlarge (4 vCPU, 16 GiB)
  • 3 worker nodes of type m5d.4xlarge (16 vCPU, 64 GiB)

We used AWS S3 as persistent log storage for Loki.

The Loki environment included:

  • Loki version 2.3.0
  • 4 ingesters and 2 distributors
  • 4 queriers and 2 query frontends

The architecture we used for evaluation

Figure 1 shows the set of microservices and components we used for the evaluation. The flow moves from left to right. First, the raw logs are stored compressed in object storage. Then, they are read by a set of ingest containers that “write” the logs into Loki. We used multiple containers in parallel so we could accelerate and optimize the ingest process as much as possible. Both the number of containers and their interleaving behavior are configurable parameters that we varied during the evaluation.

Next, we had Grafana Loki with the set of microservices for the write path and the read path, allowing those logs to be written efficiently into the store. We used S3 for the object storage.

The next step involved another set of containers. These “log reader” containers query Loki and pull data from it, with /dev/null as the destination. For our evaluation there was no real reason to keep the logs. In an actual AI pipeline, we would pull them from Loki and hand them over to an analytics engine, but that was outside the scope of this evaluation, and we assumed the engine could consume log records at whatever rate we produce them.

The write path

We started the write path with a Kubernetes pod for each log file (single hour) and ran the following pipeline.

1. Download the corresponding compressed log file

2. Decompress the log file

3. Run Promtail to push the logs to Loki

Each pod was started with a delay of 180 seconds from the previous one to mitigate the load on Loki. This staggering is important for the ingest pipeline, as it balances memory and CPU usage on the cluster.
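
As an illustration, here is a hedged Python sketch of steps 1–3 as one uploader pod might run them; the object-storage URL and the Promtail config path are hypothetical placeholders, not our actual values.

import gzip
import shutil
import subprocess
import sys

import requests

def upload_hour(hour: int) -> None:
    # 1. Download the compressed log file for this hour.
    #    (The bucket URL is a placeholder, not our actual object-storage location.)
    url = f"https://example-object-storage/logs/hour-{hour:02d}.json.gz"
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        with open("/tmp/logs.json.gz", "wb") as out:
            shutil.copyfileobj(resp.raw, out)

    # 2. Decompress it.
    with gzip.open("/tmp/logs.json.gz", "rb") as src, open("/tmp/logs.json", "wb") as dst:
        shutil.copyfileobj(src, dst)

    # 3. Launch Promtail to push the logs into Loki. The config path is a placeholder;
    #    the config points Promtail at the Loki distributors and at /tmp/logs.json.
    #    Promtail keeps tailing after the push, so stopping it is left out of this sketch.
    subprocess.Popen(["promtail", "-config.file=/etc/promtail/config.yaml"])

if __name__ == "__main__":
    upload_hour(int(sys.argv[1]))   # each pod handles one hour, started 180s apart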

We applied only 2 labels to the logs: a constant job: promtail label and another label to indicate the filename/hour (one of the 24 log files). We initially tried to set the original log timestamp as Loki’s timestamp, but because the logs weren’t ordered by time, the out-of-order entries were rejected by Loki. To overcome this, we settled for using the ingestion time as the timestamp Loki assigns, since the original timestamp was less important for this evaluation. We’ll try to tackle this issue again down the road.

To pull bulk amounts of log records based on time, there was no need for extended tagging, and therefore no reason to add extra labels to the log lines. That said, from what we observed during the evaluation, as long as the cardinality of the labels was not very high, adding labels did not impact the performance of the ingest or query processes.
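
In push-API terms (a sketch rather than our actual Promtail configuration), the timestamp choice looks like this: parsing each record’s _ts field preserves the original order and gets rejected when entries arrive out of order, whereas stamping entries at ingestion time always yields increasing timestamps within a stream.

import json
import time

def to_entry(raw_line: str, hour_label: str, use_original_ts: bool):
    """Build the stream labels and one (timestamp, line) pair for Loki's push API."""
    labels = {"job": "promtail", "filename": hour_label}   # the only two labels we used
    if use_original_ts:
        # Original timestamp taken from the record (milliseconds -> nanoseconds).
        # With Loki 2.3, entries that arrive out of order within a stream are rejected.
        ts_ns = json.loads(raw_line)["_ts"] * 1_000_000
    else:
        # Ingestion time: what we settled for in this evaluation.
        ts_ns = time.time_ns()
    return labels, (str(ts_ns), raw_line)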

The read path

Similar to the write path, we started a pod for every hour of logs and ran LogCLI to pull all the logs of the corresponding time range. Before starting the read path, we redeployed Loki’s components to ensure a fresh state and force the queriers to fetch the data from storage rather than from the ingesters’ cache.
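
A downloader pod is essentially a thin wrapper around LogCLI. Here is a hedged sketch of what such a pod could run; the Loki address, time range, and limit value are illustrative assumptions rather than our exact values.

import subprocess

LOKI_ADDR = "http://loki-query-frontend:3100"   # assumed in-cluster address

def download_hour(start: str, end: str) -> None:
    # Pull every log line in the time range and discard it: the evaluation measures
    # the read path itself, not a downstream analytics engine.
    with open("/dev/null", "wb") as devnull:
        subprocess.run(
            [
                "logcli", "query", '{job="promtail"}',
                "--addr", LOKI_ADDR,
                "--from", start,
                "--to", end,
                "--limit", "10000000",   # larger than any hourly line count
                "--batch", "50000",      # see the batching discussion below
                "--forward",
                "--output", "raw",
            ],
            stdout=devnull,
            check=True,
        )

download_hour("2021-06-01T05:00:00Z", "2021-06-01T06:00:00Z")   # an example hour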

The adventure begins…

Initially, we tried deploying Loki using Helm, with its out-of-the-box default settings. We had a bit of a surprise when the memory consumption of the ingesters and the queriers exploded during upload and download, respectively. What’s more, the download rate was so slow that it would have taken several days to pull one hour of log data.

We tweaked Loki’s configuration until we got acceptable results. Basically, we went through a lot of iterations where we ran our experiment, faced an error or a performance issue, and searched for the right setting to tweak to overcome the problem.

During the read path, we used LogCLI to query and pull the logs. Behind the scenes, LogCLI breaks the query into smaller queries, or ‘batches’, by adjusting the ‘to’ and ‘from’ parameters. By analyzing the output of the --stats flag, we found that the first batch had to read all the chunks of the query even though only a fraction of them was returned. As the query moved on to subsequent batches, fewer chunks were read. This is quite inefficient, since the queriers read more chunks than needed for each batch, and the chunks of the last batch end up being read for all the previous batches as well. To mitigate this, we reduced the number of batches by increasing the batch size to 50,000.
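
To get a feeling for the difference, here is a quick back-of-the-envelope calculation for an average hour of 4.3M lines, assuming LogCLI’s default batch size of 1,000 (our assumption) versus the 50,000 we configured:

import math

lines_per_hour = 4_300_000          # average hourly line count (Figure 2)

for batch_size in (1_000, 50_000):  # assumed LogCLI default vs. the value we set
    sub_queries = math.ceil(lines_per_hour / batch_size)
    print(f"batch={batch_size}: ~{sub_queries} sub-queries per hourly pull")

With the larger batch size, each hourly pull issues roughly 86 sub-queries instead of thousands, so far fewer chunks are read more than once.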

Our results

Here you can see graphs showing relevant time-series metrics collected from Loki and stored in Thanos during our evaluation.

Total number of log entries received by the distributors

Figure 3. Total number of log entries received by the distributors

This graph does a good job of showing the first stage of the ingest process, which took about 75 minutes to feed more than 100M log lines into Loki. It’s easy to see that the load is spread evenly between the 2 deployed distributors and that the total number of logs at the end of the ingest process exactly matches the total number of log entries reported by the uploading processes. We can also see the steady and consistent slope showing the behavior of the distributors over time.

Total chunk bytes stored by the ingesters

Figure 4. Total chunk bytes stored by the ingesters

The second stage in the Loki ingest pipeline is persisting the data into log storage via the ingester processes. As the graph shows, we used four ingesters.

Although our raw log data is about 256GiB, the ingesters stored only about 14GB. This is because we configured Loki to use LZ4 compression for storing the data. We also verified the size on the S3 bucket.
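
As a rough sanity check on those numbers (taking 256GiB and 14GB exactly as reported):

raw_bytes = 256 * 2**30      # 256 GiB of uncompressed log data
stored_bytes = 14 * 10**9    # ~14 GB of LZ4-compressed chunks in the bucket

print(f"effective compression ratio: ~{raw_bytes / stored_bytes:.0f}x")   # -> ~20x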

Total number of chunks created in the ingesters

Figure 5. Total number of chunks created in the ingesters

As shown in Figure 5, the load is spread across all 4 ingesters, although one of them (indicated in red in the graph) works slightly harder than the others. This can be explained by the stickiness of stream-to-ingester assignment in Loki. We verified that the aggregated value at the end of the process indeed matched the number of chunks we found in the S3 bucket (617 chunk objects).

Number of chunks in the ingesters’ memory

Figure 6. Number of chunks in the ingesters’ memory

There are at most 22 chunks in memory in the system and at most 14 for a single ingester.

Write path performance

Below are the memory and CPU graphs of the uploaders, ingesters, and distributors during the write path.

Figure 7. Memory of the uploaders
Figure 8. CPU of the uploaders

Each log hour is uploaded into Loki by a standalone uploading process. To improve efficiency and maintain a good distribution of the required CPU, memory, and I/O resources over time, we interleaved the upload processes. Each upload process takes about 5 minutes on average, of which Promtail, the component that actually pushes the logs into Loki, takes about 3 minutes.

Figure 9. Memory of the ingesters
Figure 10. CPU of the ingesters

These graphs show that the memory and CPU of the ingesters correlate nicely with the distribution of logs per hour and the number of chunks in memory.

Figure 11. Memory of the distributors
Figure 12. CPU of the distributors

Looking at the CPU and memory consumption graphs of the Loki distributors and ingesters, we can see that although the distributors and the ingesters consume comparable amounts of CPU in this use case, the ingesters consume significantly more memory. The memory peak of the ingesters is 7.1GB in total and 3.7GB for a single ingester, while the distributors’ total memory reaches a maximum of 120MB. We can also see that the CPU behavior is less consistent over time.

The CPU of the distributors and ingesters drops shortly after the last uploader finishes, which indicates that their queues are short. This was expected, since we staggered the start of the uploaders to avoid congesting the system.

Read path performance

Below are the memory and CPU graphs of the downloaders, queriers, and query-frontends, respectively, during the read path.

Figure 13. Memory of the downloaders
Figure 14. CPU of the downloaders

On average, a download takes about 4 minutes, compared to about 3 minutes for an upload. This is in stark contrast to our earlier attempts, when downloads took a few hours or didn’t complete at all!

The memory pattern of the downloaders differs from that of the uploaders. The uploaders use Promtail, which stays alive even after the upload is done; as a Go program, it doesn’t release memory back to the operating system unless the machine is under memory pressure. The downloaders use LogCLI, which is also a Go program, but it terminates once all the data is retrieved, and its memory is released.

Figure 15. Memory of the queriers
Figure 16. CPU of the queriers
Figure 17. Memory of the query-frontends
Figure 18. CPU of the query-frontends

We can see here that the CPU and memory of the queriers and query-frontends correlate with the distribution of log records per hour.

We were happy to see that the load spreads evenly across the 4 queriers and the 2 query-frontends. The queriers are both CPU- and memory-intensive, not only compared to the query-frontends but also compared to the ingesters and distributors. The memory peak of the queriers is 10.2GB in total and 4.4GB for a single querier.

Try it for yourself

After fine-tuning Loki, we can confidently conclude that it can be used efficiently for AI pipeline use cases. The amount of memory and CPU resources Loki requires is very reasonable, including for ingesting and extracting large amounts of log records. We only spent a couple of weeks on this evaluation, and we assume that with more time we could further optimize Loki’s settings and get even better performance.

We would love to hear from anyone interested in trying out this code, sharing ideas, or offering suggestions. You can find our code here. Go for it!

Contribution to the community

During the work on this evaluation, we found 2 bugs that were addressed by the community:

https://github.com/grafana/loki/issues/3855

https://github.com/grafana/loki/issues/3838
