How and why to load test your logging environment?

Vishnu ks · Published in techietalks · Nov 24, 2020 · 4 min read

Are you looking for a way to load test or performance test your logging environment, with or without third-party tools? This article is for you!

In my logging environment, Logstash is installed on our application servers to ship logs to Kafka. From Kafka, several Logstash servers consume the logs, apply some filtering, and then write them to multiple indices in Elasticsearch depending on the tags present in the logs.

Before onboarding the application logs to our freshly created logging environment, I wanted to simulate a load test with actual log data. My objectives for this experiment were to understand the following factors:

  • The maximum index rate that can be achieved in Elasticsearch
  • How well the Logstash event receive and emit rates can be tuned by adjusting JVM options, pipeline workers, batch size, and by optimising nested parsing steps in the Logstash pipeline configuration
  • How well Kafka handles log spikes, and how the number of partitions should be tweaked
  • The network throughput and bandwidth consumed by each component

Let’s begin!

To perform this load test, you should already have set up and tested connectivity between Kafka, Logstash, Elasticsearch and Kibana. The load test consists of the following steps:

  • Data preparation
  • Creating a Kafka topic with n number of partitions
  • Creating an Elasticsearch index with n number of shards
  • Start generating load in Kafka
  • Tweak the parameters and repeat from step 2

The value of n mentioned above can be tweaked as the experiment progresses. Start with small numbers and work your way towards the optimum value. Beware! The optimal n is small in most cases, and as the value increases, so does the complexity, so be cautious while playing with it. You should also enable Logstash monitoring with Elasticsearch and Kibana.
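With the default Logstash distribution, a minimal way to do that is a couple of lines in logstash.yml; the Elasticsearch host below is a placeholder for your own monitoring cluster:

# logstash.yml
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["http://yourelasticsearch:9200"]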

Step 1: Data preparation

TL;DR: If you have already created the payload file somehow, skip to the next step.

To prepare the payload for the load test, I logged into our application server and downloaded all the application log files to my test Linux server.
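How you copy the files does not matter much; a plain scp works, where the server name and paths below are placeholders:

scp -r appuser@yourappserver:/var/log/yourapp/ /data/loadtest/logs/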

After that, I copied the Logstash configuration from that application server, which is normally used to send logs to our test logging environment. I then changed all the input file path patterns to match the paths of the log files I had downloaded locally. I also edited the output configuration to use the file output plugin, writing the formatted logs to a file named payload.json.
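I have not reproduced my exact pipeline here, but a minimal sketch of such a payload-generation config looks roughly like this; the local paths are placeholders (matching the scp example above), and your existing filter blocks stay in between:

input {
  file {
    path => "/data/loadtest/logs/*.log"     # local copies of the application logs
    start_position => "beginning"           # read the files from the start
    sincedb_path => "/dev/null"             # don't remember read position between runs
  }
}
# ... your existing filter {} blocks go here, unchanged ...
output {
  file {
    path => "/data/loadtest/payload.json"   # formatted events, one JSON document per line
    codec => json_lines
  }
}

Running Logstash once against this config leaves you with payload.json containing one formatted event per line, which is the newline-delimited format the producer performance tool in step 4 reads.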

Step 2: Creating a Kafka topic with n number of partitions

Create the Kafka topic by executing the following command:

/opt/kafka/current/bin/kafka-topics.sh --bootstrap-server yourkafkabroker:9092 --create --topic loadtest --partitions 10 --replication-factor 1

After creating the topic, you can verify whether it was created by executing the command below:

/opt/kafka/current/bin/kafka-topics.sh --bootstrap-server yourkafkabroker:9092 --describe --topic loadtest

Step 3: Creating an Elasticsearch index with n number of shards

Create the index in Elasticsearch by executing the below PUT request in the Kibana Dev Tools console:

PUT /loadtest
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 0
    }
  }
}
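You can confirm that the settings were applied with a quick GET request in the same console:

GET /loadtest/_settings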

Step 4: Start generating load in Kafka

Here we will use Kafka's native producer performance tool. SSH into a Kafka broker and execute the command below:

/opt/kafka/current/bin/kafka-producer-perf-test.sh --topic loadtest --throughput 1000000 --num-records 90000000 --producer-props bootstrap.servers=youranotherbroker:9092 --payload-file payload.json

Make sure to replace youranotherbroker with your Kafka broker FQDN.
You can try adjusting the --throughput and --num-records parameters.

Lastly…

Now you have generated sufficient load on Kafka. Keep an eye on all the components. Adjust the configurations one by one, not all at once.
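One useful thing to watch on the Kafka side is consumer lag, which tells you whether the Logstash consumers are keeping up with the producer. Assuming your Logstash Kafka input uses the default consumer group name logstash (check the group_id in your pipeline if you changed it), something like this works:

/opt/kafka/current/bin/kafka-consumer-groups.sh --bootstrap-server yourkafkabroker:9092 --describe --group logstash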

My observations

  • For Logstash, I adjusted the number of pipeline workers and the batch size in each experiment. By default, the number of pipeline workers equals the number of CPU cores; for me, doubling that number was enough to reach around 60% CPU saturation under load. The default batch size is 125; I doubled it in each experiment and found the sweet spot at 2000 (see the config sketch after this list).
  • For Elasticsearch, I had to increase the bulk queue size from 500 to 2000 to accept the batches of logs sent from Logstash. I also observed an improvement in indexing rate when I changed the number of primary shards from 1 to 10. Managing a cluster with a large number of shards is difficult, though, so create proper ILM policies to keep the total number of shards in your cluster at an optimum.
  • For Kafka, I tried increasing the number of partitions and found a drastic improvement in the event consumption rate. Tweak the number of partitions based on your requirements.
  • I observed the network traffic on each server by running nethogs and found it stable. NetHogs is a CLI utility that shows the bandwidth consumption of each process running on a server.
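For reference, the two Logstash knobs mentioned above live in logstash.yml (they can also be set per pipeline in pipelines.yml); the numbers below are only the values that worked for my setup, not a recommendation:

# logstash.yml
pipeline.workers: 16        # defaults to the number of CPU cores; 16 is just an example — I doubled the default
pipeline.batch.size: 2000   # defaults to 125; increase gradually and watch JVM heap usage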

I have tried my best to generalize the idea of testing a logging environment in this article. Hope you enjoyed it. Thank you!! 🙏
