This article introduces how to deploy all the components required to set up a high-availability data pipeline with an ELK cluster.
ELK Cluster architecture
To enable high availability, I configure an ELK cluster with a minimum of two nodes (CentOS 7) on GCP Compute Engine.
- Filebeat: Filebeat fetches log files and ships them into Kafka.
- Kafka: Kafka sits between the producer (Filebeat) and the consumer (Logstash). It brokers the data flow, decoupling processing from data producers and buffering unprocessed messages.
- Logstash: Logstash collects logs from Kafka, processes them, and ships them to Elasticsearch for indexing.
- Elasticsearch: Elasticsearch stores the data from Logstash as JSON documents.
- Kibana: Kibana is a web interface for searching and displaying messages from the Elasticsearch cluster, served behind Nginx or Apache.
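For reference, a minimal Filebeat sketch that ships an Nginx access log into Kafka might look like the following. Installing Filebeat and Kafka is outside the scope of this article; the broker address, topic, and fields.tag value here are assumptions that must match the Logstash input and filter shown later.
$ cat /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access.log
    # Logstash routes events by this tag in its filter and output blocks
    fields:
      tag: nginx-access
output.kafka:
  # Kafka broker assumed to be reachable at 10.140.9.61:9092
  hosts: ["10.140.9.61:9092"]
  topic: "logstash"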
Elasticsearch
Installing from the RPM repository
Create a file elasticsearch-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.
$ cat /etc/yum.repos.d/elasticsearch-7.x.repo
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
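Elastic signs all of its packages; you can import the public signing key up front so that yum does not prompt for it during the install:
$ sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch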
Install Elasticsearch from the elasticsearch-7.x repository.
$ sudo yum install --enablerepo=elasticsearch-7.x elasticsearch
Check that Elasticsearch is installed:
$ rpm -q elasticsearch
elasticsearch-7.5.1-1.x86_64
Configuring Elasticsearch
Elasticsearch has three configuration files at /etc/elasticsearch/:
- elasticsearch.yml
- jvm.options
- log4j2.properties
Back up the original config file and set some basic configurations.
$ sudo ls -l /etc/elasticsearch/
$ sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.orig
Since I created two nodes on GCP, I need to define both GCP instances as master-eligible nodes.
$ sudo vim /etc/elasticsearch/elasticsearch.yml
# Use a descriptive name for your cluster:
cluster.name: elk-node
# Use a descriptive name for the node:
node.name: elk-node-1
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
network.publish_host: 10.140.9.69
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.seed_hosts: ["elk-node-1", "elk-node-2"]
# Bootstrap the cluster using an initial set of master-eligible nodes:
cluster.initial_master_nodes: ["elk-node-1", "elk-node-2"]
# Enable X-Pack monitoring
xpack.monitoring.collection.enabled: true
xpack.monitoring.history.duration: 7d
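On the second node, elasticsearch.yml is identical except for the node name and publish address (a sketch; the IP matches the host entries defined below):
node.name: elk-node-2
network.publish_host: 10.140.9.70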
Edit the /etc/hosts file to define the private IPs of all nodes.
$ sudo vim /etc/hosts
10.140.9.69 elk-node-1
10.140.9.70 elk-node-2
Systemd configuration
According to the Elasticsearch documentation, system limits must be specified via systemd. The systemd service file /usr/lib/systemd/system/elasticsearch.service contains the limits that are applied by default.
To override them, run the following command, which adds a file /etc/systemd/system/elasticsearch.service.d/override.conf.
$ sudo systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity
Save the file and run Elasticsearch.
$ sudo systemctl start elasticsearch.service
$ sudo systemctl enable elasticsearch.service
By default, Elasticsearch runs on port 9200. To confirm that everything is working as expected, send a curl request; the Elasticsearch response should look something like this:
$ curl 'http://localhost:9200'
{
  "name" : "elk-node-1",
  "cluster_name" : "elk-node",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
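To confirm that both nodes have joined the cluster, you can also query the _cat/nodes and _cluster/health APIs; the exact output will vary with your environment.
$ curl 'http://localhost:9200/_cat/nodes?v'
$ curl 'http://localhost:9200/_cluster/health?pretty'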
Logstash
Installing Java 8 or Java 11
Logstash requires Java 8 or Java 11. Run the following command to install OpenJDK:
$ sudo yum install java-11-openjdk
Confirm that the installation was successful:
$ java -version
openjdk version "11.0.5" 2019-10-15 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.5+10-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode, sharing)
Installing Logstash from the RPM repository
Create a file logstash-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.
$ cat /etc/yum.repos.d/logstash-7.x.repo
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
The repository is ready for use; install Logstash:
$ sudo yum install logstash
Check that Logstash is installed:
$ rpm -q logstash
logstash-7.5.1-1.noarch
Configuring Logstash
I need to configure a Logstash pipeline that collects logs from Kafka, processes them, and ships them to Elasticsearch for indexing. Add the following to a file with a .conf suffix in the /etc/logstash/conf.d/ directory.
input
- The Kafka broker listens on port 9092.
- The consumer_threads option sets the number of input threads.
- The decorate_events option (disabled by default) adds Kafka metadata such as the topic and message size to the event.
- The auto_offset_reset option has no default value; the value latest automatically resets the offset to the latest offset.
output
- The Elasticsearch output sets the destination index for each event.
- Elasticsearch listens on localhost:9200.
filter
- The Grok filter plugin parses unstructured log data into something structured and queryable.
- The syntax for a grok pattern is %{SYNTAX:SEMANTIC}.
- Logstash ships with a list of supported grok patterns.
- Grok Debugger and Grok Constructor are useful tools for constructing and testing grok filters against logs.
- remove_field is applied only when the filter runs successfully.
$ cat /etc/logstash/conf.d/logstash.conf
input {
kafka {
id => "mykafka"
bootstrap_servers => "10.140.9.61:9092"
topics => ["logstash","kafka"]
consumer_threads => 2
decorate_events => true
codec => "json"
auto_offset_reset => "latest"
}
}
filter {
if [fields][tag] == "nginx-access" {
grok {
match => { "message" => ["%{IPORHOST:[remote_ip]} - %{DATA:[user_name]} \[%{HTTPDATE:[time]}\] \"%{WORD:[method]} %{DATA:[url]} HTTP/%{NUMBER:[http_version]}\" %{NUMBER:[response_code]} %{NUMBER:[body_sent][bytes]} \"%{DATA:[referrer]}\" \"%{DATA:[agent]}\""] }
remove_field => "message"
}
} else if [fields][tag] == "nginx-error" {
grok {
match => { "message" => ["%{DATA:[time]} \[%{DATA:[level]}\] %{NUMBER:[pid]}#%{NUMBER:[tid]}: (\*%{NUMBER:[connection_id]} )?%{GREEDYDATA:[message]}"] }
remove_field => "message"
}
}
}
output {
if [fields][tag] == "nginx-error" {
elasticsearch {
id => "nginx-error"
hosts => "localhost:9200"
index => "nginx-error-%{+YYYY.MM.dd}" }
} else if [fields][tag] == "nginx-access" {
elasticsearch {
id => "nginx-access"
hosts => "localhost:9200"
index => "nginx-access-%{+YYYY.MM.dd}" }
} else if [fields][tag] == "system" {
elasticsearch {
id => "system"
hosts => "localhost:9200"
index => "system-%{+YYYY.MM.dd}" }
} else if [fields][tag] == "laravel" {
elasticsearch {
id => "laravel"
hosts => "localhost:9200"
index => "laravel-%{+YYYY.MM.dd}" }
}
}
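Before starting Logstash, you can optionally check the pipeline configuration for syntax errors (paths assume the default RPM layout):
$ sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit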
Back up the original config file (/etc/logstash/logstash.yml) and set some basic configurations.
$ sudo cp /etc/logstash/logstash.yml /etc/logstash/logstash.yml.orig
$ sudo vim /etc/logstash/logstash.yml
Set xpack.monitoring.enabled to true to enable X-Pack monitoring of Logstash on this node.
xpack.monitoring.enabled: true
Save the file and run Logstash.
$ sudo systemctl start logstash.service
$ sudo systemctl enable logstash.service
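If events do not show up in Elasticsearch later, the Logstash log is the first place to look (the path assumes the default RPM layout):
$ sudo tail -f /var/log/logstash/logstash-plain.log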
Kibana
Installing from the RPM repository
Create a file kibana-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.
$ cat /etc/yum.repos.d/kibana-7.x.repo
[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Install Kibana from the kibana-7.x repository.
$ sudo yum install --enablerepo=kibana-7.x kibana
Check that Kibana is installed:
$ rpm -q kibana
kibana-7.5.1-1.x86_64
Configuring Kibana
Kibana has one configuration file at /etc/kibana/
:
- kibana.yml
Back up the original config file and set some basic configurations.
$ sudo cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.orig
$ sudo vim /etc/kibana/kibana.yml
# Enables you to specify a file where Kibana stores log output.
logging.dest: /var/log/kibana/kibana.log
# Set the value of this setting to true to suppress all logging output other than error messages.
logging.quiet: true
Create the log output path.
$ sudo mkdir /var/log/kibana
$ sudo chown kibana:kibana /var/log/kibana/
Run Kibana.
$ sudo systemctl start kibana.service
$ sudo systemctl enable kibana.service
Kibana is a web application that you access through port 5601.
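Before putting Nginx in front of Kibana, you can confirm it is up with a quick status request (Kibana listens on localhost by default):
$ curl -s 'http://localhost:5601/api/status'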
Nginx
Installing and configuring Nginx
Installing Nginx.
$ sudo yum install nginx
Create a new Nginx configuration file in the /etc/nginx/conf.d/ directory.
$ sudo vim /etc/nginx/conf.d/elk.conf
Enter the following configuration. It passes all requests matching location / to the proxied Kibana upstream at 127.0.0.1:5601, and requests under /els to Elasticsearch at 127.0.0.1:9200.
upstream kibana_cluster {
ip_hash;
server 127.0.0.1:5601;
}
upstream elasticsearch_cluster {
ip_hash;
server 127.0.0.1:9200;
}
server {
listen 80;
location / {
proxy_pass http://kibana_cluster;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
}
location /els {
proxy_pass http://elasticsearch_cluster;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
}
}
Save the file and run Nginx.
$ sudo systemctl start nginx.service
$ sudo systemctl enable nginx.service
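If Nginx fails to start or returns 502 errors, two things worth checking on CentOS 7 are the configuration syntax and SELinux, which by default prevents Nginx from connecting to upstream ports (the setsebool step is only needed when SELinux is enforcing):
$ sudo nginx -t
$ sudo setsebool -P httpd_can_network_connect 1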
Starting the data pipeline
To check that Logstash is aggregating the data and shipping it into Elasticsearch, list the indices. You should see laravel-*, system-*, and nginx-error-* indices listed.
$ curl -XGET 'http://localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .monitoring-es-7-2019.12.31 FuHUIbLZTl64uZJVyQngJQ 1 1 50087 66408 77.5mb 38.6mb
green open .monitoring-es-7-2019.12.30 bm38BvRVSASsBBDow4RgWA 1 1 81278 83484 64.9mb 36.2mb
green open .monitoring-kibana-7-2019.12.30 jOFF3QUiSkuAKFxVLCA8ag 1 1 9670 0 4.6mb 2.3mb
green open .monitoring-kibana-7-2019.12.31 3uBfNonqTyCwYO8vYOJz2w 1 1 4756 0 2.6mb 1.2mb
green open .apm-agent-configuration I-rTKWgqSMe-Eql11Uzfqg 1 1 0 0 566b 283b
green open .monitoring-logstash-7-2019.12.30 pKwei8AaQfuBTq1BbfaUEw 1 1 110226 0 11.7mb 5.8mb
green open system-2019.12.31 oP7LDpdxSAiiaT4zmObryQ 1 1 121 0 11.1mb 2.1mb
green open .monitoring-logstash-7-2019.12.31 pP5eYPvQTuu3IhVtwabheA 1 1 54346 0 6.6mb 3.3mb
green open system-2019.12.30 eYs2jEV8RE6ouk4alk35_g 1 1 41012 0 38mb 19mb
green open .kibana_1 KSKeVeUpQBCXbIaGz-gwmA 1 1 6 1 43.7kb 21.8kb
green open laravel-2019.12.31 uG8lWj27QAmP_aS5QfFLNQ 1 1 7 0 140.5kb 70.2kb
green open .kibana_task_manager_1 5BVg0wlpTPixahsS7m3pmA 1 1 2 1 71.1kb 35.5kb
green open nginx-error-2019.12.30 736xOVGpSOa474ErdHu1Kw 1 1 340 0 313.7kb 156.8kb
green open nginx-error-2019.12.31 BoEb2-AaTvGOg-XDSv6Y1A 1 1 202 0 317.7kb 165.2kb
Alternatively, you can check it in Kibana under Dev Tools.
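In the Dev Tools console, the equivalent request is:
GET _cat/indices?v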
If all is working as expected, follow these steps to define an index pattern in Kibana.
1. On the Management tab, click Index Patterns.
2. Click the Create index pattern button. It will display the list of indices for which logs are available.
3. Enter laravel* in the index pattern field and click the Next step button.
4. In Configure settings, select @timestamp as the Time Filter field name. Click the Create index pattern button to add the index pattern.
5. It will present a table of all fields and associated data types in the index. You can manage your index pattern, set it as the default, or delete it when you no longer need it.
Kibana Searching
Navigate to the Discover tab to take a look at the data. By default, the time range in Kibana is set to the last 15 minutes.
Up to version 6.2, the only way to query in Kibana was the Lucene syntax.
Since version 7.0, KQL (Kibana Query Language) is the default language for querying. If you like, you can revert to Lucene.
KQL
When using KQL without the drop-down menu, you can omit the field name and query a term as a plain string. Querying the string "cp-order-dev-1", for example, the query will return:
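A few more KQL examples, using field names produced by the grok patterns and Filebeat tags above (the values are made up for illustration; each line is a separate query):
response_code: 404
method: "POST" and not response_code: 200
fields.tag: "nginx-access" or fields.tag: "nginx-error"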
KQL autocomplete helps you select valid fields and values in your index pattern, so I suggest you take advantage of picking items from these drop-downs:
Lucene
Kibana’s legacy query language is based on the Lucene query syntax. For more detailed information, see the Query string syntax docs.
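For comparison, similar filters in Lucene query string syntax (again, field names come from the grok patterns above):
response_code:404 AND fields.tag:"nginx-access"
agent:curl* AND NOT response_code:200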