This article introduces how to deploy all the components required to set up a high-availability data pipeline with an ELK cluster.
ELK Cluster architecture
To enable high availability, I configure an ELK cluster with a minimum of two nodes (CentOS 7) on GCP Compute Engine.
- Filebeat: Filebeat fetches log files and ships them into Kafka.
- Kafka: Kafka sits between the producer (Filebeat) and the consumer (Logstash). It brokers the data flow, decoupling processing from data producers and buffering unprocessed messages.
- Logstash: Logstash collects logs from Kafka, processes them, and ships them to Elasticsearch for indexing.
- Elasticsearch: Elasticsearch stores the data from Logstash as JSON documents.
- Kibana: Kibana is a web interface for searching and displaying messages from the Elasticsearch cluster, served behind Nginx or Apache.
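For reference, a minimal Filebeat sketch that ships an Nginx access log into Kafka might look like the following. Installing Filebeat and Kafka is outside the scope of this article; the broker address, topic, and fields.tag value here are assumptions that must match the Logstash input and filter shown later.
$ cat /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access.log
    # Logstash routes events by this tag in its filter and output blocks
    fields:
      tag: nginx-access
output.kafka:
  # Kafka broker assumed to be reachable at 10.140.9.61:9092
  hosts: ["10.140.9.61:9092"]
  topic: "logstash"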
Elasticsearch
Installing from the RPM repository
Create a file elasticsearch-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.
$ cat /etc/yum.repos.d/elasticsearch-7.x.repo
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
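Elastic signs all of its packages; you can import the public signing key up front so that yum does not prompt for it during the install:
$ sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch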
Install Elasticsearch from the elasticsearch-7.x repository.
$ sudo yum install --enablerepo=elasticsearch-7.x elasticsearch
Check that Elasticsearch is installed:
$ rpm -q elasticsearch
elasticsearch-7.5.1-1.x86_64
Configuring Elasticsearch
Elasticsearch has three configuration files at /etc/elasticsearch/:
- elasticsearch.yml
- jvm.options
- log4j2.properties
Back up the original config file and set some basic configurations.
$ sudo ls -l /etc/elasticsearch/
$ sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.orig
Since I created two nodes on GCP, I need to define both GCP instances as master-eligible nodes.
$ sudo vim /etc/elasticsearch/elasticsearch.yml
# Use a descriptive name for your cluster:
cluster.name: elk-node
# Use a descriptive name for the node:
node.name: elk-node-1
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
network.publish_host: 10.140.9.69
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.seed_hosts: ["elk-node-1", "elk-node-2"]
# Bootstrap the cluster using an initial set of master-eligible nodes:
cluster.initial_master_nodes: ["elk-node-1", "elk-node-2"]
# Enable X-Pack monitoring
xpack.monitoring.collection.enabled: true
xpack.monitoring.history.duration: 7d
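On the second node, elasticsearch.yml is identical except for the node name and publish address (a sketch; the IP matches the host entries defined below):
node.name: elk-node-2
network.publish_host: 10.140.9.70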
Edit the /etc/hosts file to define the private IPs of all nodes.
$ sudo vim /etc/hosts
10.140.9.69 elk-node-1
10.140.9.70 elk-node-2
Systemd configuration
According to the Elasticsearch documentation, system limits must be specified via systemd. The systemd service file /usr/lib/systemd/system/elasticsearch.service contains the limits that are applied by default.
To override them, run the following command, which adds a file /etc/systemd/system/elasticsearch.service.d/override.conf.
$ sudo systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity
Save the file and run Elasticsearch.
$ sudo systemctl start elasticsearch.service
$ sudo systemctl enable elasticsearch.service
By default, Elasticsearch runs on port 9200. To confirm that everything is working as expected, send a curl request; the Elasticsearch response should look something like this:
$ curl 'http://localhost:9200'
{
  "name" : "elk-node-1",
  "cluster_name" : "elk-node",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
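To confirm that both nodes have joined the cluster, you can also query the _cat/nodes and _cluster/health APIs; the exact output will vary with your environment.
$ curl 'http://localhost:9200/_cat/nodes?v'
$ curl 'http://localhost:9200/_cluster/health?pretty'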
Logstash
Installing Java 8 or Java 11
Logstash requires Java 8 or Java 11. Run the following command to install OpenJDK:
$ sudo yum install java-11-openjdk
Confirm that the installation was successful:
$ java -version
openjdk version "11.0.5" 2019-10-15 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.5+10-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode, sharing)
Installing Logstash from the RPM repository
Create a file logstash-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.
$ cat /etc/yum.repos.d/logstash-7.x.repo
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
The repository is ready for use; install Logstash:
$ sudo yum install logstash
Check that Logstash is installed:
$ rpm -q logstash
logstash-7.5.1-1.noarch
Configuring Logstash
I need to configure a Logstash pipeline that collects logs from Kafka, processes them, and ships them to Elasticsearch for indexing. Add the following to a file with a .conf suffix in the /etc/logstash/conf.d/ directory.
input
- The Kafka broker listens on port 9092.
- The consumer_threads option sets the number of input threads.
- The decorate_events option (disabled by default) adds Kafka metadata such as the topic and message size to the event.
- The auto_offset_reset option has no default value; the value latest automatically resets the offset to the latest offset.
output
- The Elasticsearch output sets the destination index for each event.
- Elasticsearch listens on localhost:9200.
filter
- The Grok filter plugin parses unstructured log data into something structured and queryable.
- The syntax for a grok pattern is %{SYNTAX:SEMANTIC}.
- Logstash ships with a list of supported grok patterns.
- Grok Debugger and Grok Constructor are useful tools for constructing and testing grok filters against logs.
- remove_field is applied only when the filter runs successfully.
$ cat /etc/logstash/conf.d/logstash.conf
input {
kafka {
id => "mykafka"
bootstrap_servers => "10.140.9.61:9092"
topics => ["logstash","kafka"]
consumer_threads => 2
decorate_events => true
codec => "json"
auto_offset_reset => "latest"
}
}
filter {
if [fields][tag] == "nginx-access" {
grok {
match => { "message" => ["%{IPORHOST:[remote_ip]} - %{DATA:[user_name]} \[%{HTTPDATE:[time]}\] \"%{WORD:[method]} %{DATA:[url]} HTTP/%{NUMBER:[http_version]}\" %{NUMBER:[response_code]} %{NUMBER:[body_sent][bytes]} \"%{DATA:[referrer]}\" \"%{DATA:[agent]}\""] }
remove_field => "message"
}
} else if [fields][tag] == "nginx-error" {
grok {
match => { "message" => ["%{DATA:[time]} \[%{DATA:[level]}\] %{NUMBER:[pid]}#%{NUMBER:[tid]}: (\*%{NUMBER:[connection_id]} )?%{GREEDYDATA:[message]}"] }
remove_field => "message"
}
}
}
output {
if [fields][tag] == "nginx-error" {
elasticsearch {
id => "nginx-error"
hosts => "localhost:9200"
index => "nginx-error-%{+YYYY.MM.dd}" }
} else if [fields][tag] == "nginx-access" {
elasticsearch {
id => "nginx-access"
hosts => "localhost:9200"
index => "nginx-access-%{+YYYY.MM.dd}" }
} else if [fields][tag] == "system" {
elasticsearch {
id => "system"
hosts => "localhost:9200"
index => "system-%{+YYYY.MM.dd}" }
} else if [fields][tag] == "laravel" {
elasticsearch {
id => "laravel"
hosts => "localhost:9200"
index => "laravel-%{+YYYY.MM.dd}" }
}
}
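Before starting Logstash, you can optionally check the pipeline configuration for syntax errors (paths assume the default RPM layout):
$ sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit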
Back up the original config file (/etc/logstash/logstash.yml) and set some basic configurations.
$ sudo cp /etc/logstash/logstash.yml /etc/logstash/logstash.yml.orig
$ sudo vim /etc/logstash/logstash.yml
Set xpack.monitoring.enabled to true to enable X-Pack monitoring of Logstash on this node.
xpack.monitoring.enabled: true
Save the file and run Logstash.
$ sudo systemctl start logstash.service
$ sudo systemctl enable logstash.service
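If events do not show up in Elasticsearch later, the Logstash log is the first place to look (the path assumes the default RPM layout):
$ sudo tail -f /var/log/logstash/logstash-plain.log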
Kibana
Installing from the RPM repository
Create a file kibana-7.x.repo in the /etc/yum.repos.d/ directory for CentOS-based distributions.
$ cat /etc/yum.repos.d/kibana-7.x.repo
[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Install Kibana from the kibana-7.x repository.
$ sudo yum install --enablerepo=kibana-7.x kibana
Check that Kibana is installed:
$ rpm -q kibana
kibana-7.5.1-1.x86_64
Configuring Kibana
Kibana has one configuration file at /etc/kibana/
:
- kibana.yml
Back up the original config file and set some basic configurations.
$ sudo cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.orig
$ sudo vim /etc/kibana/kibana.yml
# Enables you to specify a file where Kibana stores log output.
logging.dest: /var/log/kibana/kibana.log
# Set the value of this setting to true to suppress all logging output other than error messages.
logging.quiet: true
Create the log output path.
$ sudo mkdir /var/log/kibana
$ sudo chown kibana:kibana /var/log/kibana/
Run Kibana.
$ sudo systemctl start kibana.service
$ sudo systemctl enable kibana.service
Kibana is a web application that you access through port 5601.
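Before putting Nginx in front of Kibana, you can confirm it is up with a quick status request (Kibana listens on localhost by default):
$ curl -s 'http://localhost:5601/api/status'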
Nginx
Installing and configuring Nginx
Installing Nginx.
$ sudo yum install nginx
Create a new Nginx configuration file in the /etc/nginx/conf.d/ directory.
$ sudo vim /etc/nginx/conf.d/elk.conf
Enter the following configuration. It passes all requests matching location / to the proxied Kibana upstream at 127.0.0.1:5601, and requests under /els to Elasticsearch at 127.0.0.1:9200.
upstream kibana_cluster {
ip_hash;
server 127.0.0.1:5601;
}
upstream elasticsearch_cluster {
ip_hash;
server 127.0.0.1:9200;
}
server {
listen 80;
location / {
proxy_pass http://kibana_cluster;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
}
location /els {
proxy_pass http://elasticsearch_cluster;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
}
}
Save the file and run Nginx.
$ sudo systemctl start nginx.service
$ sudo systemctl enable nginx.service
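If Nginx fails to start or returns 502 errors, two things worth checking on CentOS 7 are the configuration syntax and SELinux, which by default prevents Nginx from connecting to upstream ports (the setsebool step is only needed when SELinux is enforcing):
$ sudo nginx -t
$ sudo setsebool -P httpd_can_network_connect 1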
Starting the data pipeline
To check that Logstash is aggregating the data and shipping it into Elasticsearch, list the indices. You should see laravel-*, system-*, and nginx-error-* indices listed.
$ curl -XGET 'http://localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .monitoring-es-7-2019.12.31 FuHUIbLZTl64uZJVyQngJQ 1 1 50087 66408 77.5mb 38.6mb
green open .monitoring-es-7-2019.12.30 bm38BvRVSASsBBDow4RgWA 1 1 81278 83484 64.9mb 36.2mb
green open .monitoring-kibana-7-2019.12.30 jOFF3QUiSkuAKFxVLCA8ag 1 1 9670 0 4.6mb 2.3mb
green open .monitoring-kibana-7-2019.12.31 3uBfNonqTyCwYO8vYOJz2w 1 1 4756 0 2.6mb 1.2mb
green open .apm-agent-configuration I-rTKWgqSMe-Eql11Uzfqg 1 1 0 0 566b 283b
green open .monitoring-logstash-7-2019.12.30 pKwei8AaQfuBTq1BbfaUEw 1 1 110226 0 11.7mb 5.8mb
green open system-2019.12.31 oP7LDpdxSAiiaT4zmObryQ 1 1 121 0 11.1mb 2.1mb
green open .monitoring-logstash-7-2019.12.31 pP5eYPvQTuu3IhVtwabheA 1 1 54346 0 6.6mb 3.3mb
green open system-2019.12.30 eYs2jEV8RE6ouk4alk35_g 1 1 41012 0 38mb 19mb
green open .kibana_1 KSKeVeUpQBCXbIaGz-gwmA 1 1 6 1 43.7kb 21.8kb
green open laravel-2019.12.31 uG8lWj27QAmP_aS5QfFLNQ 1 1 7 0 140.5kb 70.2kb
green open .kibana_task_manager_1 5BVg0wlpTPixahsS7m3pmA 1 1 2 1 71.1kb 35.5kb
green open nginx-error-2019.12.30 736xOVGpSOa474ErdHu1Kw 1 1 340 0 313.7kb 156.8kb
green open nginx-error-2019.12.31 BoEb2-AaTvGOg-XDSv6Y1A 1 1 202 0 317.7kb 165.2kb
Alternatively, you can check it in Kibana under Dev Tools.
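In the Dev Tools console, the equivalent request is:
GET _cat/indices?v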
If all is working as expected, follow these steps to define an index pattern in Kibana.
1. On the Management tab, click Index Patterns.
2. Click the Create index pattern button. It will display the list of indices for which logs are available.
3. Enter laravel* in the index pattern field and click the Next step button.
4. In Configure settings, select @timestamp as the Time Filter field name. Click the Create index pattern button to add the index pattern.
5. It will present a table of all fields and associated data types in the index. You can manage your index pattern, set it as the default, or delete it when you no longer need it.
Kibana Searching
Navigate to the Discover tab to take a look at the data. By default, the time range in Kibana is set to the last 15 minutes.
Up to version 6.2, the only way to query in Kibana was the Lucene syntax.
Since version 7.0, KQL (Kibana Query Language) is the default language for querying. If you like, you can revert to Lucene.
KQL
When using KQL without the drop-down menu, you can omit the field name and query a term as a plain string. Querying the string "cp-order-dev-1", for example, the query will return:
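A few more KQL examples, using field names produced by the grok patterns and Filebeat tags above (the values are made up for illustration; each line is a separate query):
response_code: 404
method: "POST" and not response_code: 200
fields.tag: "nginx-access" or fields.tag: "nginx-error"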
KQL autocomplete helps you select valid fields and values in your index pattern, so I suggest you take advantage of picking items from these drop-downs:
Lucene
Kibana’s legacy query language is based on the Lucene query syntax. For more detailed information, see the Query string syntax docs.
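For comparison, similar filters in Lucene query string syntax (again, field names come from the grok patterns above):
response_code:404 AND fields.tag:"nginx-access"
agent:curl* AND NOT response_code:200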