How to Use Grok to Structure Unstructured Data in Logstash

Elastic (ELK) Stack Tips and Tricks for Transforming Log Data

Songtham Tung
Jan 29, 2019 · 5 min read

If you’re using the Elastic (ELK) Stack and are interested in mapping custom Logstash logs to Elasticsearch, then this post is for you.


ELK is an acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Together, they form a log management platform.

  • Elasticsearch is a search and analytics engine.
  • Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch.
  • Kibana lets users visualize data with charts and graphs in Elasticsearch.

Beats came later; it is a family of lightweight data shippers. The introduction of Beats turned the ELK Stack into the Elastic Stack, but that is beside the point here.

This article focuses on Grok, a feature within Logstash that can transform your logs before they are forwarded to a stash. For our purposes, I will only cover processing data from Logstash to Elasticsearch.

Grok


Grok is a filter within Logstash that is used to parse unstructured data into something structured and queryable. It sits on top of regular expressions (regex) and uses text patterns to match lines in log files.
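
Under the hood, each named Grok pattern is just a reusable regular expression. Here are a few entries from Elastic's grok-patterns file (quoted from the repository linked later in this post; assumed accurate for that version):

WORD \b\w+\b
NUMBER (?:%{BASE10NUM})
USERNAME [a-zA-Z0-9._-]+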

As we will see in the following sections, using Grok makes a big difference when it comes to effective log management.

Without Grok your Log Data is Unstructured

[Image: A single log line in Kibana]

Without Grok, when logs are sent from Logstash to Elasticsearch and rendered in Kibana, the data appears only in the message value.

Querying for meaningful information is difficult in this situation because all of the log data is stored under a single key. The logs would be far more useful if each part of the message were organized into its own field.

Log Data

Unstructured

localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0

If you take a closer look at the raw data, you can see that it’s actually made up of different parts, each separated by a space delimiter.

More experienced developers can probably guess what each part means and that this is a log message from an API call. The representation of each item is outlined below.

Structured

  • localhost == environment
  • GET == method
  • /v2/applink/5c2f4bb3e9fda1234edc64d == url
  • 400 == response_status
  • 46ms == response_time
  • 5bc6e716b5d6cb35fc9687c0 == user_id

As we can see in the structured data, there is an order to unstructured logs. The next step then is to programmatically refine the raw data. This is where Grok shines.

Grok Patterns

Built-In

Logstash comes with over 100 built-in patterns for structuring unstructured data. You should definitely take advantage of these when possible for common system logs like Apache, Linux, HAProxy, AWS, and so forth.
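
For example, a standard Apache access log can be parsed with a single built-in pattern. A minimal sketch of a filter block (COMBINEDAPACHELOG is one of the built-in pattern names from the patterns file linked below):

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}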

However, what happens when you have custom logs like the example above? You have to build your own custom Grok pattern.

Custom

It takes trial and error to build your own custom Grok pattern. I used the Grok Debugger and the grok-patterns reference to figure it out.

Please note that the syntax for Grok patterns is: %{SYNTAX:SEMANTIC}
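
For instance, applied to the first token of our log line:

%{WORD:environment}    # SYNTAX is the built-in WORD pattern; SEMANTIC is the field name
# matching "localhost" produces the field "environment" => "localhost"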

The first thing I tried was the Discover tab in the Grok Debugger. I hoped the tool could auto-generate the Grok pattern, but it wasn't much help: it only found two matches.

[Image: Grok Debugger 'Discover' only matched 2 words]

Starting from that discovery, I began building my own pattern in the Grok Debugger using the syntax documented on Elastic's GitHub page.

https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns

After experimenting with different patterns, I was finally able to structure the log data the way I wanted.

[Image: Structuring unstructured log data with Grok Debugger]

https://grokdebug.herokuapp.com/

localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0

%{WORD:environment} %{WORD:method} %{URIPATH:url} %{NUMBER:response_status} %{WORD:response_time} %{USERNAME:user_id}

{
  "environment":     [["localhost"]],
  "method":          [["GET"]],
  "url":             [["/v2/applink/5c2f4bb3e9fda1234edc64d"]],
  "response_status": [["400"]],
  "BASE10NUM":       [["400"]],
  "response_time":   [["46ms"]],
  "user_id":         [["5bc6e716b5d6cb35fc9687c0"]]
}
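
One quirk worth noting: the extra BASE10NUM key shows up because the built-in NUMBER pattern is itself defined in terms of BASE10NUM, so both captures fire on "400". Also, grok captures every field as a string by default; if you would rather have response_status indexed as a number, grok supports an optional type suffix, as in this sketch:

%{NUMBER:response_status:int}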

With the Grok pattern in hand and the data mapped, the final step is to add it to Logstash.

Update Logstash.conf

On the server where you installed the ELK Stack, open the Logstash config.

sudo vi /etc/logstash/conf.d/logstash.conf

Paste in the changes.

input {
  file {
    path => "/your_logs/*.log"
  }
}

filter {
  grok {
    match => { "message" => "%{WORD:environment} %{WORD:method} %{URIPATH:url} %{NUMBER:response_status} %{WORD:response_time} %{USERNAME:user_id}" }
  }
}

output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
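
Optionally, since response_time is captured as a string like "46ms", you could strip the unit and convert the field to an integer with a mutate filter. A sketch of an extra filter block, assuming the field names above:

filter {
  mutate {
    # strip the trailing "ms", then convert the remaining digits to an integer
    gsub    => [ "response_time", "ms$", "" ]
    convert => { "response_time" => "integer" }
  }
}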

After you save the changes, restart Logstash and check its status to make sure that it’s still working.

sudo service logstash restart
sudo service logstash status
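
If Logstash fails to start, you can validate the config file first (the binary path below assumes a default package install):

sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf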

Lastly, to make sure that the changes take effect, be sure to refresh the Elasticsearch index for Logstash in Kibana!

[Image: Refresh the Elasticsearch index for Logstash in Kibana]

With Grok your Log Data is Structured!

[Image: Grok automatically structures unstructured logs]

As we can see in the image above, Grok automatically maps the log data into Elasticsearch fields. This makes it easier to manage your logs and to quickly query for information. Instead of digging through log files to debug, you can simply filter by the field you're looking for, like environment or url.
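
For example, with the fields broken out, Kibana's search bar accepts field-level queries like these (illustrative Lucene query strings using the field names we defined):

environment:localhost AND response_status:400
url:"/v2/applink/5c2f4bb3e9fda1234edc64d"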

Try giving Grok expressions a shot! If you have another way of doing this or run into any problems with the examples above, just drop a comment below to let me know.

Thanks for reading — and please follow me here on Medium for more interesting software engineering articles!

Resources

https://www.elastic.co/blog/do-you-grok-grok

https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns

https://grokdebug.herokuapp.com/
