The Elastic stack is quite easy to set up and get running. Getting the most out of it and keeping it in good condition, on the other hand, is a pretty complex science. In this article, we will take a look at some basic rules and best practices to keep your stack healthy and robust.
As we are talking about the Elastic stack, we will cover Logstash in this article and Beats and Elasticsearch in the upcoming sections. Although Logstash is not as complicated as Elasticsearch itself, it is crucial for some use cases and deserves the same level of attention when it comes to proper configuration.
The way Logstash works is quite simple: it ingests data, processes it, and then outputs it somewhere. Each of these phases requires different tuning and has different requirements. The processing phase relies heavily on raw processor power, while the output phase depends on the speed of the system to which Logstash sends the data.
To be able to solve a problem, you need to know where it is. If you are able to use the Monitoring UI (part of X-Pack/Features) in Kibana, you have all the information served in an easy-to-understand graphical way.
If you are not that lucky, you can still get information about a running Logstash instance by calling its API, which by default listens on port 9600.
For example, to get statistics about your pipelines, call:
curl -XGET 'http://localhost:9600/_node/stats/pipelines?pretty'
and you will get all the information in JSON format.
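To better understand the JSON output (and also the Monitoring UI), set the id field for each input, filter, and output in your Logstash pipeline definition. Without an explicit id, Logstash generates one for you, and a generated id is hard to map back to a particular plugin in your configuration.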
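Once every plugin has a readable id, the stats response becomes easy to summarise with a few lines of code. The following is a minimal sketch, not an official client: it assumes the response contains a `pipelines` object whose entries list `filters` and `outputs` with `id` and `events` fields, and the plugin ids and numbers in `SAMPLE_STATS` are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_pipeline_stats(base_url="http://localhost:9600"):
    """Fetch per-pipeline statistics from a running Logstash instance."""
    with urlopen(f"{base_url}/_node/stats/pipelines") as response:
        return json.load(response)


def slowest_plugins(stats):
    """Return (pipeline, plugin id, ms per event) tuples, slowest first."""
    rows = []
    for pipeline_name, pipeline in stats.get("pipelines", {}).items():
        plugins = pipeline.get("plugins", {})
        for kind in ("filters", "outputs"):
            for plugin in plugins.get(kind, []):
                events = plugin.get("events", {})
                handled = events.get("in", 0)
                if handled:
                    ms_per_event = events.get("duration_in_millis", 0) / handled
                    rows.append((pipeline_name, plugin["id"], ms_per_event))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical, heavily trimmed response used here instead of a live call.
SAMPLE_STATS = {
    "pipelines": {
        "main": {
            "plugins": {
                "filters": [
                    {"id": "grok_access_log", "name": "grok",
                     "events": {"in": 1000, "out": 1000, "duration_in_millis": 4000}},
                    {"id": "json_app_log", "name": "json",
                     "events": {"in": 1000, "out": 1000, "duration_in_millis": 500}},
                ],
                "outputs": [
                    {"id": "es_main", "name": "elasticsearch",
                     "events": {"in": 1000, "out": 1000, "duration_in_millis": 2000}},
                ],
            }
        }
    }
}

for pipeline_name, plugin_id, ms in slowest_plugins(SAMPLE_STATS):
    print(f"{pipeline_name:8} {plugin_id:20} {ms:.2f} ms/event")
```

Against a live instance you would pass the result of `fetch_pipeline_stats()` to `slowest_plugins()` instead of the sample; the plugin at the top of the list is usually the first candidate for tuning.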
By using pipelines, you can split your data processing into logical parts, and you also gain the ability to set some options on a per-pipeline basis. This is even easier now with pipeline-to-pipeline communication: you can transfer data between pipelines, which opens the door to more creative architectural patterns. You can find more info and helpful examples in the official documentation.
Pipelines are configured in a YAML file (pipelines.yml), which is loaded at Logstash startup. An example configuration with three pipelines looks like this:
The example uses pipeline configs stored in files (instead of strings). Long and complicated parsing definitions are better split into multiple files, which is why we use wildcards in the path.config properties. Logstash then concatenates every file matching the given expression in alphabetical order. To avoid filters being applied in a different order than expected, we name the conf files with a numeric prefix.
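To see why the numbering prefix deserves a little care, here is a small sketch of the ordering. It assumes plain lexicographic ordering of file names (which is what Python's `sorted` does for strings), and the file names are hypothetical.

```python
def load_order(filenames):
    """Order file names the way a plain alphabetical sort would."""
    return sorted(filenames)


# Hypothetical config file names, with and without zero-padded prefixes.
unpadded = ["1-input.conf", "2-filter-json.conf", "10-output.conf"]
padded = ["01-input.conf", "02-filter-json.conf", "10-output.conf"]

print(load_order(unpadded))
# ['1-input.conf', '10-output.conf', '2-filter-json.conf']
print(load_order(padded))
# ['01-input.conf', '02-filter-json.conf', '10-output.conf']
```

Without zero padding, the file starting with `10-` sorts before the one starting with `2-`, so prefixes of equal width are the safer choice.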
The persistent queue allows Logstash to write incoming events to the filesystem and then load them from there before processing. This mechanism brings you:
- Buffering under high load. When the load is too high to be processed in real time, you don't have to store the data in your application. Transmit it to Logstash immediately, store it in a queue, and process it continuously. Not having to keep log files on persistent disk storage is a huge benefit when your application runs in a containerized environment.
- Without a persistent queue, consider what happens when your application is under high load: Logstash hits its processing limit and tells Filebeat to stop sending new data, so Filebeat stops reading the log file. The only place where your logs are then stored is the running container, and if the container crashes, you can lose a portion of the logs.
- With a persistent queue on your Logstash “ingest” pipeline, the logs are transferred into Logstash almost immediately and written into the queue, so a container crash no longer costs you those logs.
- A safety net in case of Logstash failure: data that has not been processed yet is still available and will be processed once Logstash is back up.
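The idea behind the safety net can be illustrated with a toy disk-backed queue. This is only a sketch of the principle (persist first, acknowledge later, replay what was not acknowledged); it is not how Logstash implements its queue internally, and the class and file names are invented for the example.

```python
import json
import os
import tempfile


class TinyPersistentQueue:
    """Toy disk-backed queue: events go to a journal file, and a separate
    checkpoint file remembers how many of them were acknowledged."""

    def __init__(self, directory):
        self.journal = os.path.join(directory, "journal.ndjson")
        self.checkpoint = os.path.join(directory, "checkpoint")

    def write(self, event):
        # Persist the event before the sender is told it was accepted.
        with open(self.journal, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
            fh.flush()
            os.fsync(fh.fileno())

    def _acked(self):
        try:
            with open(self.checkpoint, encoding="utf-8") as fh:
                return int(fh.read() or 0)
        except FileNotFoundError:
            return 0

    def unprocessed(self):
        """Events that were written but not yet acknowledged."""
        try:
            with open(self.journal, encoding="utf-8") as fh:
                events = [json.loads(line) for line in fh]
        except FileNotFoundError:
            return []
        return events[self._acked():]

    def ack(self, count):
        """Mark the next `count` events as fully processed."""
        acked = self._acked() + count
        with open(self.checkpoint, "w", encoding="utf-8") as fh:
            fh.write(str(acked))


def demo():
    with tempfile.TemporaryDirectory() as directory:
        queue = TinyPersistentQueue(directory)
        for n in range(3):
            queue.write({"message": f"line {n}"})
        queue.ack(1)  # only the first event made it through the outputs
        # Simulate a restart: a fresh object only sees what is on disk.
        restarted = TinyPersistentQueue(directory)
        return [event["message"] for event in restarted.unprocessed()]


print(demo())  # ['line 1', 'line 2']
```

After the simulated restart, the two events that were never acknowledged are still there to be processed, which is exactly the property we rely on.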
As you probably know, Logstash uses “workers” to parse and output data, and the number of workers defines the maximum number of parallel parse-output streams.
Another fact worth knowing is that Logstash works with batches: it ingests a few messages, then a worker parses them (optional) and outputs them. The processing of a batch is complete only when Logstash gets an acknowledgment of successful transmission from all outputs. Only after that can the worker take another portion of messages. If we output to a super-fast Elasticsearch cluster and to one slow, backup-only cluster, we still have to wait for the slow one, and the worker is blocked in the meantime. When all workers are busy, Logstash cannot process and output new messages.
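A bit of back-of-the-envelope arithmetic shows how much the slow output costs. The sketch below assumes a worker cannot finish a batch sooner than its slowest output acknowledges it, so the result is an optimistic upper bound that ignores parsing time; the worker count, batch size, and latencies are illustrative numbers, not measurements.

```python
def max_events_per_second(workers, batch_size, output_latencies_ms):
    """Optimistic throughput ceiling: each worker finishes a batch no
    sooner than the slowest output acknowledges it."""
    slowest_ms = max(output_latencies_ms)
    return workers * batch_size * 1000 / slowest_ms


# Fast cluster only: 50 ms per batch.
print(max_events_per_second(4, 125, [50]))       # 10000.0
# Fast cluster plus a slow backup cluster: 500 ms per batch.
print(max_events_per_second(4, 125, [50, 500]))  # 1000.0
```

In this made-up scenario, adding the slow backup output cuts the ceiling by a factor of ten, no matter how fast the primary cluster is.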
To set the number of workers, use the pipeline.workers property in logstash.yml.
The default value equals the number of the host's CPU cores. As suggested in the docs, it is recommended to increase the number of workers when we see that the CPU is not fully used.
Another useful pipeline configuration parameter is pipeline.batch.size. It tells Logstash how many events it should process at once: a worker takes up to the defined number of events from the queue, processes them, and outputs them. The general rule is that larger batches are processed more efficiently, but we should be careful because of the increased memory overhead and, eventually, OOM crashes. More info on how to find the sweet spot can be found in an article about tuning and profiling Logstash.
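To get a feeling for the memory side of the trade-off, it helps to estimate how many events can be in flight at once. This is a rough sketch: it assumes every worker holds one full batch in memory, and the average event size is a number you would have to measure yourself. The real heap footprint is higher than the raw payload size.

```python
def inflight_events(workers, batch_size):
    """Maximum number of events held in memory by all workers together."""
    return workers * batch_size


def inflight_bytes(workers, batch_size, avg_event_bytes):
    """Lower-bound estimate of the raw payload kept in memory."""
    return inflight_events(workers, batch_size) * avg_event_bytes


# Illustrative numbers: 8 workers, batches of 1000, about 2 KiB per event.
print(inflight_events(8, 1000))       # 8000
print(inflight_bytes(8, 1000, 2048))  # 16384000
```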
A positive side effect of the batch size setting is that some plugins take it into account and can optimize their operation. For example, the elasticsearch output plugin writes all events from a batch using the _bulk API, which is an efficient way to put large portions of data into Elasticsearch.
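To make the link between batches and the _bulk API more concrete, here is a minimal sketch of how a batch of events could be turned into a single bulk request body. It only illustrates the newline-delimited format (one action line followed by one document line per event); it is not the plugin's actual code, and the index name is hypothetical.

```python
import json


def build_bulk_body(events, index):
    """Build a newline-delimited _bulk body that indexes every event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # document line
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


batch = [{"message": "first"}, {"message": "second"}]
print(build_bulk_body(batch, "logs-example"), end="")
```

A batch of 125 events therefore becomes one HTTP request with 250 lines instead of 125 separate requests, which is where most of the efficiency comes from.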
Be careful with groks
Grok filters are very CPU-intensive, especially if we have multiple long expressions in one grok. The first match wins, so a line that matches only the last expression has to pay for every failed attempt before it. Keep this in mind when writing your parsing configuration.
When it comes to parsing log files, we always recommend logging in JSON format and keeping most data in a structured form. The ideal situation is when you don't have to use groks at all and can leave all parsing to the json filter plugin.
To back this up with some real-world data: we had a Java application with log4j logging into a text file and a grok that did the parsing (timestamp, level, class name, and a lot more). There were also log lines with network requests and responses more than 1000 characters long. After changing the log format to JSON and storing network-related items in separate JSON fields, the Logstash throughput rose 7 times!
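The difference between the two approaches can be sketched in a few lines. Python's `re` module stands in for grok here (grok uses a different regex engine and its own pattern library), and the log layout and field names are hypothetical; the point is only that the JSON variant needs no pattern at all.

```python
import json
import re

# Regular expression standing in for a grok pattern (hypothetical log layout).
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<classname>[\w.]+) - (?P<message>.*)$"
)


def parse_text_line(line):
    """Extract fields from a plain-text log line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


def parse_json_line(line):
    """A JSON log line already carries its structure."""
    return json.loads(line)


TEXT_LINE = "2020-05-04T10:15:30Z INFO com.example.OrderService - order 42 created"
JSON_LINE = json.dumps({
    "timestamp": "2020-05-04T10:15:30Z",
    "level": "INFO",
    "classname": "com.example.OrderService",
    "message": "order 42 created",
})

print(parse_text_line(TEXT_LINE) == parse_json_line(JSON_LINE))  # True
```

Both calls produce the same fields, but the text variant depends on a pattern that has to be kept in sync with the log layout and gets more expensive as lines grow longer.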
Logstash runs on the JVM, so we need to take care of some Java settings.
Most notable is the heap size, set through the well-known -Xms and -Xmx settings. The two most important rules:
- set -Xmx and -Xms to the same value. This way your application avoids the very expensive heap resize operation
- set -Xmx to a value large enough that the garbage collector does not run too often
Point number 2 in particular is very hard to achieve and requires a lot of effort to find the optimal value. You can find useful tips & tricks in the official documentation.
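One way to judge whether the garbage collector runs too often is to compare two snapshots of the JVM statistics and see how much of the elapsed time went into collections. The sketch below assumes the snapshots have the shape of the `jvm` section of the node stats API (`uptime_in_millis` and `gc.collectors.*.collection_time_in_millis`); treat those field names as an assumption and check them against your Logstash version. The numbers are invented.

```python
def gc_time_fraction(earlier, later):
    """Share of wall-clock time spent in garbage collection between two
    snapshots of the JVM statistics."""

    def total_gc_ms(sample):
        collectors = sample["jvm"]["gc"]["collectors"]
        return sum(c["collection_time_in_millis"] for c in collectors.values())

    elapsed = later["jvm"]["uptime_in_millis"] - earlier["jvm"]["uptime_in_millis"]
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in chronological order")
    return (total_gc_ms(later) - total_gc_ms(earlier)) / elapsed


# Hypothetical snapshots taken one minute apart.
first = {"jvm": {"uptime_in_millis": 60000, "gc": {"collectors": {
    "young": {"collection_time_in_millis": 300, "collection_count": 20},
    "old": {"collection_time_in_millis": 100, "collection_count": 1},
}}}}
second = {"jvm": {"uptime_in_millis": 120000, "gc": {"collectors": {
    "young": {"collection_time_in_millis": 900, "collection_count": 55},
    "old": {"collection_time_in_millis": 700, "collection_count": 4},
}}}}

print(f"{gc_time_fraction(first, second):.1%}")  # 2.0%
```

If this share keeps climbing under normal load, the heap is probably too small for the configured number of workers and batch size.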
Logstash is a crucial part of the Elastic stack, and it is a robust tool. Like most such tools, Logstash requires some fiddling with configuration and some thought about performance impact before you write your parsing configuration. The tips above are things we had to tweak in our production environment, and they might give you some essential insight into Logstash performance tuning.