Logstash? ELK?

Murilo Machado
8 min read · Nov 7, 2019


Logstash is an open source event processing engine. It works with pipelines that handle text input, filtering, and output, and the results can be sent to ElasticSearch or any other tool.

It supports data from many sources and can send the processed and parsed data to many other destinations.

Logstash is a great tool for data manipulation: it can be used to keep only the data required for later analysis, or to feed dashboards and metrics. It can also evaluate conditionals to decide what should be done with each chunk of captured data.

The Logstash pipeline is pretty simple:
- First we configure what we are going to receive and what should be read.
- Then we manipulate the captured data and filter only what we want.
- Finally we ship that data to any destination we like.

Pipeline = Input + filter + output
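
As a minimal sketch of that idea (an illustrative skeleton only, not a full configuration), a pipeline file mirrors those three stages directly:

input {
  # where events come from
  stdin {}
}

filter {
  # optional: parse, transform, or drop events here
}

output {
  # where events end up
  stdout {}
}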

The input and output stages can use codecs, which change how the data they receive or send is represented.

Logstash is most often used to handle log file formats, but it also deals easily with other data formats, such as CSV, JSON, XML, and many others.
Logstash also has great synergy with ElasticSearch and Kibana, the tools that together make up the ELK stack (ElasticSearch, Logstash, and Kibana).

They work great together: Logstash can manipulate data in many ways and send it to ElasticSearch, which Kibana can later query to create metrics, graphs, and dashboards.

Running a single event

To run Logstash we need the JDK installed on our machine, and Logstash itself.

After installing the JDK, download and unzip the Logstash distribution into some project folder.


We can run Logstash with one of the following commands:
bin/logstash -e 'input { stdin {} } output { stdout { codec => "dots" } }'
bin/logstash -f path/to/dot.conf

In the first case, we are running Logstash and telling it to capture any input typed in our current shell session.

In the second case, we are running Logstash with the configuration read from the file at the specified path (this is the more appropriate way to run Logstash).

If we write the same pipeline we used in the first case into a .conf file and run it, the result will be pretty much the same, but much easier to read, since we can break the lines of the pipeline logic inside a text file named, for example, filename.conf.

Our file would look like this:

input {
  stdin {}
}

output {
  stdout {
    codec => "dots"
  }
}

We can choose between many other codecs, but the most used ones are the "dots" codec and the "rubydebug" codec. The dots codec is a good way to get an idea of how many events have been processed, and the rubydebug codec is great for reading and understanding the data that is being parsed.
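
For instance (a minimal variation of the example above), swapping the codec to rubydebug makes each event print as a readable hash instead of a dot:

output {
  stdout {
    # print each event as a structured, human-readable hash
    codec => rubydebug
  }
}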

We can also use the output stage to send that parsed or captured data to our ElasticSearch instance.


The output configuration would look like this:

output {
  elasticsearch { hosts => ["localhost:9200"] }
}

By default, Logstash saves everything we process as a hash with the key "message" and our entire captured text as the value. If we want the input to be parsed as JSON, we must use the json codec, which is pretty simple.


input {
  stdin {
    codec => json
  }
}

By using this codec, we save the keys and values from our JSON in a nicely structured format, instead of storing everything under the same message value. It only works for inputs that are valid JSON, though; otherwise we receive a _jsonparsefailure tag, and the message is saved as a plain string, just like the case where we did not use any input codec.

A very good thing to be aware of with the JSON input codec is that if we send it an array of objects, it emits one event per object in that array.
This way we can, for example, produce multiple events from a single input.
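
A minimal sketch of that behavior (the field name user is just an example): with the json codec on stdin, pasting a two-object array produces two separate events.

input {
  stdin {
    codec => json
  }
}

output {
  stdout { codec => rubydebug }
}

# Typing the line below into the shell:
#   [{"user":"alice"},{"user":"bob"}]
# yields two separate events, roughly:
#   { "user" => "alice", ... }
#   { "user" => "bob",   ... }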

Saving the output into a file

In order to save output into a file, we add the file plugin to the output stage of our pipeline and define the path of the file where we want to save our data (it can be a relative path or an absolute path).

Like this:

output {
  file {
    path => "output.txt"
  }
}

or

output {
  file {
    path => "/var/log/output.txt"
  }
}

Sending, parsing, and saving data with the HTTP plugin

In order to send data to Logstash over HTTP, we have the http input plugin, which is pretty simple to use. We just add an http block inside the input stage of the pipeline and, inside it, define our host and port. If we use it together with the json codec, we must send our requests with the header "Content-Type: application/json" so that the data is parsed in the correct format. Otherwise, we receive the _jsonparsefailure tag in the message response.


input {
  stdin {
    codec => json
  }
  http {
    host => "127.0.0.1"
    port => 9991
  }
}

Just to test that this is working, we can send a simple cURL request:
curl -X POST 127.0.0.1:9991 -H 'Content-Type: application/json' -d '{"a":{"b":[1,2,3]}}'

And then we just check our output file for the complete response.

Right now, if we put together all the tips gathered in this little guide, our dot.conf file may look pretty much like this:

input {
  stdin {
    codec => json
  }
  http {
    host => "127.0.0.1"
    port => 9991
  }
}

output {
  stdout {
    codec => rubydebug
  }
  file {
    path => "output.txt"
  }
}

At this moment we are using the input and output stages of the Logstash pipeline, but there is also the filter stage, which sits between the input and the output and is used to format, parse, and filter data.

The first filter we are going to use is the mutate filter, which supports many different operations.

To start, we will use its convert option, which converts the data types of values and arrays of values. With it, we can easily convert a string value into an integer, for example.


This is what the mutate filter with the convert option looks like:

filter {
  mutate {
    convert => { "fieldName" => "dataType" }
  }
}

The mutate filter with convert is a good way to pin down the data types we want in our output, since we cannot assume that the logs of a given application will always arrive with the same formats.
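
A small sketch, assuming hypothetical fields named response and duration that arrive as strings:

filter {
  mutate {
    # "404" (string) becomes 404 (integer); "0.043" becomes 0.043 (float)
    convert => {
      "response" => "integer"
      "duration" => "float"
    }
  }
}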

Speaking of filters, there are also the common options, which can be used with many filter plugins.
Here they are, with a brief explanation of how each works.

  • add_field adds a new field to the triggered event;
  • remove_field removes a field from the event;
  • add_tag and remove_tag do pretty much the same, but with tags (a small sketch combining them follows below);
A pretty simple example of one of these options is this:

filter {
  mutate {
    remove_field => [ "host" ]
  }
}

In this case, it simply removes the "host" field that is usually added to events received through the http plugin.
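
As a complementary sketch (the field name, value, and tag here are hypothetical), add_field and add_tag work the same way:

filter {
  mutate {
    # attach a static field and a tag to every event passing through this filter
    add_field => { "environment" => "staging" }
    add_tag   => [ "mutated" ]
  }
}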

LogStash execution model

When talking about the Logstash execution model, we first have to understand the steps Logstash follows to process an event:

1 - An input is sent;
2 - An event is triggered;
3 - The event is added to the work queue;
4 - The batch delay runs (the time window used to decide which worker will pick up that event);
5 - Finally, the worker runs the pipeline on the batch it captured.

Understanding these steps, we can see that Logstash can handle multiple inputs at the same time, since they are managed by the work queue.

We also have the batch delay to handle multiple events being triggered at the same time.

In the case of ElasticSearch, Logstash takes advantage of this batch delay to join many outputs destined for ElasticSearch and send them in a single bulk request, shipping a bunch of outputs at the same time.
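
These knobs can be tuned when starting Logstash; a sketch using the standard pipeline flags (the values are arbitrary examples):

bin/logstash -f path/to/dot.conf -w 4 -b 250 -u 100
# -w / --pipeline.workers: how many workers pull batches from the queue
# -b / --pipeline.batch.size: maximum number of events per batch, per worker
# -u / --pipeline.batch.delay: milliseconds to wait for a batch to fill before flushing it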

LogStash automatic reload flag

In Logstash, we have an automatic reload option, which is enabled by running Logstash with the flag --config.reload.automatic.

This is a very good approach for testing new dot.conf files with many different setups, without having to restart Logstash all the time. The only thing to be aware of is that this option does not work with the stdin plugin.
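
For example (the path is just a placeholder):

bin/logstash -f path/to/dot.conf --config.reload.automatic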

Understanding sincedb files

In Logstash we have sincedb files, which contain information about which files have been processed before and where Logstash should resume processing new data appended to any log file being watched.


Those sincedb files are stored under the following path:
LogstashRoot/data/plugins/inputs/file

If we read any of those files, named .sincedb_<hash>, we can see what has already been processed from each watched file and where Logstash should keep watching to know whether the file has new data to process.

If at some point we want to reprocess a file that has already been processed, from the beginning, we can simply delete the sincedb file that refers to it.
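
Another common approach (a sketch; the log path is just an example) is to use the file input's own options to skip the sincedb bookkeeping entirely and always read from the top:

input {
  file {
    path => "/var/log/myapp.log"
    # read the file from the beginning instead of only tailing new lines
    start_position => "beginning"
    # point the sincedb to /dev/null so no progress is remembered between runs
    sincedb_path => "/dev/null"
  }
}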

Parsing file and creating custom keys and values

In order to create a more concise and user-friendly output, we may need to parse the output into keys and values, defining what we want to capture and analyze later.
To do that, we have the grok plugin, which handles this parsing and transformation based on its own syntax and Oniguruma regex patterns.
By using this powerful plugin, we can create logs that are much easier to understand and to display in graphs and dashboards.

Grok has a pretty handy syntax. It works with predefined aliases that capture the data found in most log structures, so you can avoid writing the enormous regexes we would usually have to create.


You can also give key names to those captured patterns, using the following format:
%{SYNTAX:SEMANTIC}
%{WORD:name}

The SYNTAX part references the Grok pattern we are using, and the SEMANTIC is the key name under which the captured value will be stored. All the built-in Grok patterns are listed in the Logstash documentation.
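
A small sketch borrowed from the classic HTTP log example (the field names are just illustrative): given a line like 55.3.244.1 GET /index.html 15824 0.043, grok can split it into named fields.

filter {
  grok {
    # parses e.g. "55.3.244.1 GET /index.html 15824 0.043"
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}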


We can also create our own patterns using Oniguruma regex syntax, with named captures of the form (?<field_name>regex).
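
A sketch of a custom named capture (the field name queue_id and the pattern are borrowed from a common example): the regex between the parentheses is plain Oniguruma, and the match is stored under the given field name.

filter {
  grok {
    # capture 10 or 11 hexadecimal characters into a field called "queue_id"
    match => { "message" => "(?<queue_id>[0-9A-F]{10,11})" }
  }
}
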
To put all of this together, let's generate some nginx logs to parse. For example, with Docker Swarm we can run a few nginx replicas, pipe their logs into a file, and hit them in a loop:

docker service create --name nginx -p 8080:80 --limit-memory=64M --limit-cpu=0.2 --replicas=5 nginx
docker service logs $(docker service ls -q) -f &> /var/log/nginxReplicas.log
while true; do curl 0.0.0.0:8080; done

A complete pipeline that reads nginx logs, parses them with grok, enriches them, and ships them to ElasticSearch would look like this:

input {
  file {
    path => ["/var/log/nginx/access.log", "/var/log/nginx/error.log"]
    type => "nginx"
  }
}

filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
    overwrite => [ "message" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
  useragent {
    source => "agent"
  }
}

output {
  elasticsearch {
    hosts => ["https://eb843037.qb0x.com:30024/"]
    user => "5d53675f1e0dd8be3ada"
    password => "3b193023f7"
    index => "nginx-%{+YYYY.MM.dd}"
    document_type => "nginx_logs"
  }
  stdout { codec => rubydebug }
}
