Deploying an ELK Cluster in GCP

Jason Li
Jan 21, 2020


This article introduces how to deploy all the components required to set up a high-availability data pipeline with an ELK cluster.

ELK Cluster architecture

To enable a high-availability cluster, I configure an ELK cluster with a minimum of two nodes (CentOS 7) on GCP Compute Engine.

ELK Cluster architecture
  • Filebeat: Filebeat fetches log files and ships them into Kafka (a minimal example follows this list).
  • Kafka: Kafka sits between the producer (Filebeat) and the consumer (Logstash). It brokers the data flow to decouple processing from the data producers and to buffer unprocessed messages.
  • Logstash: Logstash collects logs from Kafka, processes them, and ships them to Elasticsearch for indexing.
  • Elasticsearch: Elasticsearch stores the data from Logstash as JSON documents.
  • Kibana: Kibana is a web interface for searching and displaying messages from the Elasticsearch cluster, and is served through Nginx or Apache.
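
Filebeat itself is configured on the application servers and is not covered in detail in this article, but a minimal sketch of a filebeat.yml that ships Nginx access logs into the Kafka topic consumed later on might look like the following. The log path is illustrative; the fields.tag value and broker address are chosen to match the Logstash configuration shown further below.

filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access.log   # illustrative path
    fields:
      tag: nginx-access             # matched by the Logstash filter conditionals below

output.kafka:
  # Kafka broker address, matching the Logstash kafka input below
  hosts: ["10.140.9.61:9092"]
  topic: "logstash"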

Elasticsearch

Installing from the RPM repository

Create a file elasticsearch-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.

$ cat /etc/yum.repos.d/elasticsearch-7.x.repo
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
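
The repo file references the Elastic signing key; per the Elastic installation docs, you can also import it explicitly before installing:

$ sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch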

Installing Elasticsearch from the elasticsearch-7.x repository.

$ sudo yum install --enablerepo=elasticsearch-7.x elasticsearch

Checking that the Elasticsearch package is installed:

$ rpm -q elasticsearch
elasticsearch-7.5.1-1.x86_64

Configuring Elasticsearch

Elasticsearch has three configuration files at /etc/elasticsearch/ :

  • elasticsearch.yml
  • jvm.options
  • log4j2.properties

Back up the original config file and set some basic configurations.

$ sudo ls -l /etc/elasticsearch/
$ sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.orig

Since I created two nodes on GCP, I need to define the node names of the GCP instances as master-eligible nodes.

$ sudo vim /etc/elasticsearch/elasticsearch.yml
# Use a descriptive name for your cluster:
cluster.name: elk-node
# Use a descriptive name for the node:
node.name: elk-node-1
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
network.publish_host: 10.140.9.69
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.seed_hosts: ["elk-node-1", "elk-node-2"]
# Bootstrap the cluster using an initial set of master-eligible nodes:
cluster.initial_master_nodes: ["elk-node-1", "elk-node-2"]
# enable x-pack Monitoring
xpack.monitoring.collection.enabled: true
xpack.monitoring.history.duration: 7d
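
On the second node, the same file is nearly identical; only the node name and publish address change. A sketch, assuming elk-node-2 uses the private IP defined in the /etc/hosts step below:

cluster.name: elk-node
node.name: elk-node-2
network.host: 0.0.0.0
network.publish_host: 10.140.9.70
discovery.seed_hosts: ["elk-node-1", "elk-node-2"]
cluster.initial_master_nodes: ["elk-node-1", "elk-node-2"]
xpack.monitoring.collection.enabled: true
xpack.monitoring.history.duration: 7d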

Edit the /etc/hosts file to define the private IPs of all nodes.

$ sudo vim /etc/hosts
10.140.9.69 elk-node-1
10.140.9.70 elk-node-2

Systemd configuration

According to the Elasticsearch documentation, system limits must be specified via systemd. The systemd service file /usr/lib/systemd/system/elasticsearch.service contains the limits that are applied by default.

To override them, run the following command to add a file /etc/systemd/system/elasticsearch.service.d/override.conf.

$ sudo systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity

Save the file and run Elasticsearch.

$ sudo systemctl start elasticsearch.service
$ sudo systemctl enable elasticsearch.service
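
Note that the LimitMEMLOCK override only has an effect if bootstrap.memory_lock: true is also set in elasticsearch.yml (not shown in the configuration above). If you do enable memory locking, you can verify it took effect once the node is up:

$ curl 'http://localhost:9200/_nodes?filter_path=**.mlockall&pretty'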

By default, Elasticsearch runs on port 9200. To confirm that everything is working as expected, send a curl request; the Elasticsearch response should look something like this:

$ curl 'http://localhost:9200'
{
  "name" : "elk-node-1",
  "cluster_name" : "elk-node",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
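
Since this is a two-node cluster, it is also worth checking that both nodes have joined; the _cluster/health API reports the node count and overall status:

$ curl 'http://localhost:9200/_cluster/health?pretty'

Once the second node is up, number_of_nodes should report 2 and the status should eventually be green.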

Logstash

Installing Java 8 or Java 11

Logstash requires Java 8 or Java 11. Run the following command to install OpenJDK 11:

$ sudo yum install java-11-openjdk

To confirm that the installation was successful:

$ java -version
openjdk version "11.0.5" 2019-10-15 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.5+10-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode, sharing)

Installing Logstash from the RPM repository

Create a file logstash-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.

$ cat /etc/yum.repos.d/logstash-7.x.repo
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

The repository is ready for use. Install Logstash:

$ sudo yum install logstash

Checking that the Logstash package is installed:

$ rpm -q logstash
logstash-7.5.1-1.noarch

Configuring Logstash

I need to configure a Logstash pipeline that collects logs from Kafka, processes them, and ships them to Elasticsearch for indexing. Add the following to a file with a .conf suffix in the /etc/logstash/conf.d/ directory.

input

  • The Kafka broker listens on port 9092.
  • The consumer_threads option sets the number of input threads.
  • The decorate_events option defaults to false. Setting it to true adds Kafka metadata, such as the topic and message size, to the event.
  • The auto_offset_reset option has no default value. The value latest automatically resets the offset to the latest offset.

output

  • The Elasticsearch output sets the destination index for each event.
  • Elasticsearch listens on localhost:9200.

filter

  • The grok filter plugin parses unstructured log data into something structured and queryable.
  • The syntax for a grok pattern is %{SYNTAX:SEMANTIC}.
  • Logstash ships with a list of supported grok patterns.
  • Grok Debugger and Grok Constructor are useful tools for constructing and testing grok filters on logs.
  • The remove_field option only takes effect when the filter is considered to have applied successfully.

$ cat /etc/logstash/conf.d/logstash.conf
input {
  kafka {
    id => "mykafka"
    bootstrap_servers => "10.140.9.61:9092"
    topics => ["logstash","kafka"]
    consumer_threads => 2
    decorate_events => true
    codec => "json"
    auto_offset_reset => "latest"
  }
}
filter {
  if [fields][tag] == "nginx-access" {
    grok {
      match => { "message" => ["%{IPORHOST:[remote_ip]} - %{DATA:[user_name]} \[%{HTTPDATE:[time]}\] \"%{WORD:[method]} %{DATA:[url]} HTTP/%{NUMBER:[http_version]}\" %{NUMBER:[response_code]} %{NUMBER:[body_sent][bytes]} \"%{DATA:[referrer]}\" \"%{DATA:[agent]}\""] }
      remove_field => "message"
    }
  } else if [fields][tag] == "nginx-error" {
    grok {
      match => { "message" => ["%{DATA:[time]} \[%{DATA:[level]}\] %{NUMBER:[pid]}#%{NUMBER:[tid]}: (\*%{NUMBER:[connection_id]} )?%{GREEDYDATA:[message]}"] }
      remove_field => "message"
    }
  }
}
output {
  if [fields][tag] == "nginx-error" {
    elasticsearch {
      id => "nginx-error"
      hosts => "localhost:9200"
      index => "nginx-error-%{+YYYY.MM.dd}"
    }
  } else if [fields][tag] == "nginx-access" {
    elasticsearch {
      id => "nginx-access"
      hosts => "localhost:9200"
      index => "nginx-access-%{+YYYY.MM.dd}"
    }
  } else if [fields][tag] == "system" {
    elasticsearch {
      id => "system"
      hosts => "localhost:9200"
      index => "system-%{+YYYY.MM.dd}"
    }
  } else if [fields][tag] == "laravel" {
    elasticsearch {
      id => "laravel"
      hosts => "localhost:9200"
      index => "laravel-%{+YYYY.MM.dd}"
    }
  }
}

Back up the original config file (/etc/logstash/logstash.yml) and set some basic configurations.

$ sudo cp /etc/logstash/logstash.yml /etc/logstash/logstash.yml.orig
$ sudo vim /etc/logstash/logstash.yml

Set the following to true to enable X-Pack monitoring and ship Logstash metrics to Elasticsearch on the node.

xpack.monitoring.enabled: true
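
Before starting the service, it can be worth validating the pipeline configuration; a quick check using the paths from the default RPM layout:

$ sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit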

Save the file and run Logstash.

$ sudo systemctl start logstash.service
$ sudo systemctl enable logstash.service

Kibana

Installing from the RPM repository

Create a file kibana-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.

$ cat /etc/yum.repos.d/kibana-7.x.repo
[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Installing Kibana from the kibana-7.x repository.

$ sudo yum install --enablerepo=kibana-7.x kibana

Checking that the Kibana package is installed:

$ rpm -q kibana
kibana-7.5.1-1.x86_64

Configuring Kibana

Kibana has one configuration file at /etc/kibana/:

  • kibana.yml

Back up the original config file and set some basic configurations.

$ sudo cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.orig
$ sudo vim /etc/kibana/kibana.yml
# Enables you to specify a file where Kibana stores log output.
logging.dest: /var/log/kibana/kibana.log
# Set the value of this setting to true to suppress all logging output other than error messages.
logging.quiet: true
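
The defaults for the server address and the Elasticsearch connection already suit this setup, since Nginx will proxy to 127.0.0.1:5601 and Elasticsearch listens locally. If your layout differs, these are the settings you would typically adjust (shown here commented out with their default values):

#server.port: 5601
#server.host: "localhost"
#elasticsearch.hosts: ["http://localhost:9200"]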

Create the log output path.

$ sudo mkdir /var/log/kibana
$ sudo chown kibana:kibana /var/log/kibana/

Run Kibana:

$ sudo systemctl start kibana.service
$ sudo systemctl enable kibana.service

Kibana is a web application that you access through port 5601.
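
A quick way to confirm Kibana is up (locally, before putting Nginx in front of it) is its status API, which should return HTTP 200 once the service has started:

$ curl -s -o /dev/null -w '%{http_code}\n' 'http://localhost:5601/api/status'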

Kibana

Nginx

Installing and configuring Nginx

Installing Nginx.

$ sudo yum install nginx

Create a new Nginx configuration file in /etc/nginx/conf.d/:

$ sudo vim /etc/nginx/conf.d/elk.conf

Enter the following configuration. It passes all requests handled by the location / block to the proxied Kibana at 127.0.0.1:5601, and requests under /els to Elasticsearch at 127.0.0.1:9200.

upstream kibana_cluster {
  ip_hash;
  server 127.0.0.1:5601;
}

upstream elasticsearch_cluster {
  ip_hash;
  server 127.0.0.1:9200;
}

server {
  listen 80;

  location / {
    proxy_pass http://kibana_cluster;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
  }

  location /els {
    proxy_pass http://elasticsearch_cluster;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
  }
}
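
Before starting Nginx, it is worth checking the configuration syntax. On CentOS 7 with SELinux enforcing, Nginx also needs to be allowed to connect to the upstream ports; the boolean below is only required if SELinux is enabled:

$ sudo nginx -t
$ sudo setsebool -P httpd_can_network_connect 1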

Save the file and run Nginx.

$ sudo systemctl start nginx.service
$ sudo systemctl enable nginx.service

Starting the data pipeline

To check that Logstash is aggregating the data and shipping it into Elasticsearch, list the indices. You should see laravel-*, system-* and nginx-error-* indices listed.

$ curl -XGET 'http://localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .monitoring-es-7-2019.12.31 FuHUIbLZTl64uZJVyQngJQ 1 1 50087 66408 77.5mb 38.6mb
green open .monitoring-es-7-2019.12.30 bm38BvRVSASsBBDow4RgWA 1 1 81278 83484 64.9mb 36.2mb
green open .monitoring-kibana-7-2019.12.30 jOFF3QUiSkuAKFxVLCA8ag 1 1 9670 0 4.6mb 2.3mb
green open .monitoring-kibana-7-2019.12.31 3uBfNonqTyCwYO8vYOJz2w 1 1 4756 0 2.6mb 1.2mb
green open .apm-agent-configuration I-rTKWgqSMe-Eql11Uzfqg 1 1 0 0 566b 283b
green open .monitoring-logstash-7-2019.12.30 pKwei8AaQfuBTq1BbfaUEw 1 1 110226 0 11.7mb 5.8mb
green open system-2019.12.31 oP7LDpdxSAiiaT4zmObryQ 1 1 121 0 11.1mb 2.1mb
green open .monitoring-logstash-7-2019.12.31 pP5eYPvQTuu3IhVtwabheA 1 1 54346 0 6.6mb 3.3mb
green open system-2019.12.30 eYs2jEV8RE6ouk4alk35_g 1 1 41012 0 38mb 19mb
green open .kibana_1 KSKeVeUpQBCXbIaGz-gwmA 1 1 6 1 43.7kb 21.8kb
green open laravel-2019.12.31 uG8lWj27QAmP_aS5QfFLNQ 1 1 7 0 140.5kb 70.2kb
green open .kibana_task_manager_1 5BVg0wlpTPixahsS7m3pmA 1 1 2 1 71.1kb 35.5kb
green open nginx-error-2019.12.30 736xOVGpSOa474ErdHu1Kw 1 1 340 0 313.7kb 156.8kb
green open nginx-error-2019.12.31 BoEb2-AaTvGOg-XDSv6Y1A 1 1 202 0 317.7kb 165.2kb
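
To verify that the grok filters are producing the expected fields, you can also pull a single document from one of the indices (the index name here is taken from the listing above):

$ curl 'http://localhost:9200/nginx-error-2019.12.31/_search?size=1&pretty'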

Alternatively, you can check it in Kibana under Dev Tools.

Checking the indices on Kibana under Dev Tools

If all is working as expected, follow these steps to define an index pattern in Kibana.

1. On the Management tab, click Index Patterns.

Index Patterns

2. Click the Create index pattern button. It will display the list of indices for which logs are available.

Create index pattern

3. Enter laravel* in the index pattern field and click the Next step button.

Enter name in the index pattern

4. In Configure settings, select @timestamp as the Time Filter field name. Click the Create index pattern button to add the index pattern.

Select @timestamp as the Time Filter field name

5. It will present a table of all fields and associated data types in the index. You can manage your index pattern, set it as the default, or delete it when you no longer need it.

Manage the index fields

Kibana Searching

Navigate to the Discover tab to take a look at the data. The default time range in Kibana is set to the last 15 minutes.

Default time range in Kibana is set to the last 15 minutes

Up until version 6.2, the only way to query in Kibana was using Lucene syntax.

Since version 7.0, KQL (Kibana Query Language) has been the default language for querying. If you like, you can revert to Lucene.

KQL

When using KQL without the drop-down menu, you can omit the field name and query a term as a plain string. Querying the string "cp-order-dev-1" returns:

KQL autocomplete helps you select valid fields and values in your index pattern, so I suggest taking advantage of the drop-down suggestions:

Using the drop down menu
The query results

Lucene

Kibana's legacy query language is based on the Lucene query syntax. For more detailed information, see the Query string syntax docs.
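
As a quick comparison, here is roughly the same query written in both languages, using fields extracted by the Nginx access-log grok filter earlier in this article (the values are illustrative):

KQL:    response_code : 404 and method : "GET"
Lucene: response_code:404 AND method:GET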
