4 Commonly Used Logstash Filters

Hung-Tao Hsieh
Mar 31, 2018 · 7 min read

Recently, a project requirement pushed me down the EFLK rabbit hole, and only then did I realize how capable my colleagues are.

This write-up is aimed at beginners like me who have only just started exploring the stack xD

1. grok

grok is the most commonly used Logstash filter plugin; you use it to give structure to your unstructured logs. For example, say a string like this arrives from Filebeat:

[32mINFO [0m[03-29|16:36:02|print/print.go:20] fire wqwdqw                                    print=testing tickat=2018-03-29T16:36:02+0000

I can then structure it with a grok expression:

\[%{DATA:textColor}%{LOGLEVEL:logLevel} %{NOTSPACE:notSpace}\[%{DATA:textColor}\[%{MONTHNUM:month}-%{MONTHDAY:day}\|%{TIME:time}\|%{GREEDYDATA:path}\] %{GREEDYDATA:context}  %{GREEDYDATA:keypairs}

In the Logstash config, that becomes:

filter {
  grok {
    match => { "log" => "\[%{DATA:textColor}%{LOGLEVEL:logLevel} %{NOTSPACE:notSpace}\[%{DATA:textColor}\[%{MONTHNUM:month}-%{MONTHDAY:day}\|%{TIME:HHmmss}\|%{GREEDYDATA:logPath}\] %{GREEDYDATA:logContext} %{GREEDYDATA:keyPairs}" }
  }
}

The output looks like this, and it is already in a shape Elasticsearch can index:

{
  "logPath" => "print/print.go:20",
  "month" => "03",
  "log" => "\e[32mINFO \e[0m[03-31|04:28:32|print/print.go:20] fire print=testing tickat=2018-03-29T16:36:02+0000",
  "day" => "31",
  "notSpace" => "\e",
  "logLevel" => "INFO",
  "keyPairs" => "print=testing tick at=2018-03-31T04:28:32+0000",
  "logContext" => "fire ",
  "HHmmss" => "04:28:32",
  "textColor" => [
    [0] "32m",
    [1] "0m"
  ]
}
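To get a feel for what the expression does, here is a minimal Python sketch that approximates the match against the sample line. It is not the real grok engine: the shorthand patterns are hand-expanded simplifications, and the boundary between logContext and keyPairs is approximated by requiring keyPairs to start at the first key=value token.

```python
import re

# Rough Python equivalent of the grok expression above -- a sketch, not the
# real grok engine. The grok shorthands are expanded by hand:
#   DATA -> .*?   NOTSPACE -> \S+   GREEDYDATA -> .*
#   LOGLEVEL / MONTHNUM / MONTHDAY / TIME -> simplified character classes.
# Python named groups must be unique, so the two textColor captures are
# numbered here; grok itself collects repeated names into one array field.
pattern = re.compile(
    r"\[(?P<textColor1>.*?)(?P<logLevel>INFO|WARN|ERROR|DEBUG) "
    r"(?P<notSpace>\S+)\[(?P<textColor2>.*?)\["
    r"(?P<month>\d{2})-(?P<day>\d{2})\|"
    r"(?P<HHmmss>\d{2}:\d{2}:\d{2})\|(?P<logPath>.*?)\] "
    r"(?P<logContext>.*?)\s+(?P<keyPairs>\S+=.*)"
)

# \x1b is the literal ESC byte that shows up as \e in the rubydebug output
log = ("\x1b[32mINFO \x1b[0m[03-29|16:36:02|print/print.go:20] "
       "fire wqwdqw  print=testing tickat=2018-03-29T16:36:02+0000")

fields = pattern.search(log).groupdict()
print(fields["logLevel"])  # INFO
print(fields["logPath"])   # print/print.go:20
print(fields["keyPairs"])  # print=testing tickat=2018-03-29T16:36:02+0000
```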

Validation

Several online tools can validate your grok expressions, such as the Grok Debugger. Kibana also ships one (I am on 6.2.3): click Dev Tools in the left sidebar, then Grok Debugger.

Patterns

grok ships with a set of built-in patterns, officially maintained and very handy.

2. mutate

As the name implies, mutate adds or removes fields. Continuing the example above: the output shows that month, day, and HHmmss can be merged into a single field of their own, like so:

mutate {
  add_field => {
    "logTime" => "%{month}/%{day} %{HHmmss}"
  }
}

Next, there is some useless data left over, such as textColor and notSpace; remove it:

mutate {
  # format time
  add_field => {
    "logTime" => "%{month}/%{day} %{HHmmss}"
  }
  # remove dirty data
  remove_field => [ "keyPairs", "textColor", "notSpace", "month", "day", "HHmmss" ]
}
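As an illustration, the effect of this mutate block on a single event can be sketched in Python, treating the event as a plain dict of grok-extracted fields (the %{field} references in add_field are Logstash's sprintf-style interpolation):

```python
# Sketch of the mutate block above applied to one event, with the event
# modeled as a plain dict of the fields grok extracted.
event = {
    "month": "03", "day": "31", "HHmmss": "04:28:32",
    "logLevel": "INFO", "logContext": "fire ",
    "keyPairs": "print=testing tickat=2018-03-31T04:28:32+0000",
    "textColor": ["32m", "0m"], "notSpace": "\x1b",
}

# add_field: "%{month}/%{day} %{HHmmss}" interpolates existing fields
event["logTime"] = "{month}/{day} {HHmmss}".format(**event)

# remove_field: drop the intermediate fields
for field in ["keyPairs", "textColor", "notSpace", "month", "day", "HHmmss"]:
    event.pop(field, None)

print(event["logTime"])  # 03/31 04:28:32
```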

3. kv

Logs often contain many keys and values, and kv structures them for you. Better yet, no loop is needed: every key and value lands in its own field, which to me is the smartest part. Continuing the example above, the data inside keyPairs looks like this:

{
  ...
  "notSpace" => "\e",
  "logLevel" => "INFO",
  "keyPairs" => "print=testing tick at=2018-03-31T04:28:32+0000",
  "logContext" => "fire ",
  ...
}

So we can write:

kv {
  source => "keyPairs"
  field_split_pattern => '((?:"[^"]*"|[^: ])*)=((?:"[^"]*"|[^ ])*)'
}

The output looks like this:

{
  "print" => "testing",
  "tickat" => "2018-03-31T07:19:57+000"
}
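Conceptually, kv just scans the source field for key=value tokens and turns each one into a field. A simplified Python sketch (ignoring the quoted-string handling in the field_split_pattern above):

```python
import re

key_pairs = "print=testing tickat=2018-03-31T04:28:32+0000"

# Each whitespace-delimited key=value token becomes its own field --
# no loop over known key names is needed.
fields = dict(re.findall(r"(\S+?)=(\S+)", key_pairs))
print(fields)  # {'print': 'testing', 'tickat': '2018-03-31T04:28:32+0000'}
```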

And we're done!

Parsing

I used field_split_pattern above, but you can actually drop it: the default splitter is already quite powerful, and it can parse key-value data embedded in the message on its own. The one thing to watch out for is that a key cannot contain two words, for example:

logger.Info("value not in key, append new kv, key= ", "key1", "value2")
--------
logstash
key : key1=value2

logger.Info("value not in key, append new kv, key= ")
--------
logstash
key :

logger.Info("Two words in the key", "tick at", t)
--------
logstash
at : yyyyMMddHHmmss
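The same simplified sketch from above shows why a two-word key breaks: the default splitter only sees whitespace-delimited tokens, so everything before the last word of the key is lost (the input string here mirrors the third example):

```python
import re

# "tick at=..." -- the key was meant to be "tick at", but only the token
# touching the "=" survives, so the field name collapses to "at".
fields = dict(re.findall(r"(\S+?)=(\S+)", "tick at=2018-03-31T04:28:32+0000"))
print(fields)  # {'at': '2018-03-31T04:28:32+0000'}
```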

4. Output

Add the following line under output, and Logstash will start printing events to stdout. The codec can also be json!

stdout { codec => rubydebug }

The complete logstash.config:

input {
  beats {
    port => {{ .Values.logstash.inputs.beat.port }}
  }
}

filter {
  grok {
    match => { "log" => "\[%{DATA:textColor}%{LOGLEVEL:logLevel} %{NOTSPACE:notSpace}\[%{DATA:textColor}\[%{MONTHNUM:month}-%{MONTHDAY:day}\|%{TIME:HHmmss}\|%{GREEDYDATA:logPath}\] %{GREEDYDATA:logContext} %{GREEDYDATA:keyPairs}" }
  }
  kv {
    source => "keyPairs"
    field_split_pattern => '((?:"[^"]*"|[^: ])*)=((?:"[^"]*"|[^ ])*)'
  }
  mutate {
    # format time
    add_field => {
      "logTime" => "%{month}/%{day} %{HHmmss}"
    }
    # remove dirty data
    remove_field => [ "keyPairs", "textColor", "notSpace", "month", "day", "HHmmss" ]
  }
}

output {
  elasticsearch {
    hosts => [ "elasticsearch.{{ .Values.namespace }}:{{ .Values.logstash.outputs.elasticsearch.port }}" ]
    user => "{{ .Values.logstash.outputs.elasticsearch.user }}"
    password => "{{ .Values.logstash.outputs.elasticsearch.password }}"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
