Log Centralization: Using Filebeat and Logstash

Mohamed Jawad P
Apr 24, 2018 · 6 min read


Logs give information about system behavior. Every service produces logs with different content in a different format, because the format depends on the nature of the service. For example, the logs generated by a web server, by a normal user, or by the system itself will be entirely different. So the logs vary with the service that produces them.

Logs are very important for troubleshooting and for security. By analyzing them we gain a good understanding of how the system works, as well as the cause of a failure if one occurs.

Common use cases of log analysis are debugging, performance analysis, security analysis, predictive analysis, and IoT monitoring. Log analysis captures what an application did and when, in a form that is easy to examine. Logs also carry timestamps, which reveal the behavior of the system over time.

Logs are written to different files per service. For example, the web server's logs go to the apache.log file, while auth.log contains authentication logs. If we had 100 or 1,000 systems in our company and something went wrong, we would have to check every system to troubleshoot the issue. And if we had 10,000 systems, that gets pretty difficult to manage, right?

Now suppose all the logs from every system are collected into a single server, along with their time, date, and hostname. It becomes pretty easy to troubleshoot and analyze. That's the power of centralizing logs.

Here I am using 3 VMs/instances to demonstrate log centralization. The architecture is described below:

On VM 1 and VM 2, I have installed a web server and Filebeat; on VM 3, Logstash is installed.

Filebeat: Filebeat is a log data shipper for local files. The Filebeat agent is installed on each server that needs to be monitored; it watches all the logs in the log directory and forwards them to Logstash. Filebeat works based on two components: prospectors/inputs and harvesters.

Inputs are responsible for managing the harvesters and finding all the sources to read from. A harvester reads a single file line by line and sends the content to the output; the harvester is also responsible for opening and closing its file.

Logstash: Logstash collects data from disparate sources and normalizes it into the destination of your choice, and it can extend well beyond that use case. Any type of event can be modified and transformed with a broad array of input, filter, and output plugins. Inputs generate events, filters modify them, and outputs ship them elsewhere. Here we will ship to a file named with the hostname and timestamp.
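As a mental model, every Logstash pipeline is just those three blocks. A minimal illustrative skeleton (not one of the configs used below):

# read lines typed on the terminal, apply no filters, print events back out
input  { stdin { } }
filter { }
output { stdout { } }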

Before getting started with the configuration: I am using Ubuntu 16.04 on all the instances.

VM1 & VM2 [Filebeat Setup]

1. Install the web server

sudo apt-get install apache2

2. Start the service

sudo service apache2 start
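To confirm Apache is up and serving (a quick check of my own, not part of the original steps):

curl -I http://localhost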

3. Download Filebeat using curl

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-amd64.deb

4. Install Filebeat using dpkg

sudo dpkg -i filebeat-6.2.4-amd64.deb
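A quick sanity check that the package installed (the deb puts a filebeat wrapper on the PATH; otherwise use /usr/share/filebeat/bin/filebeat):

filebeat version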

5. Configure Filebeat to ship the logs to Logstash. For that, edit the /etc/filebeat/filebeat.yml file:

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log
    - /var/log/*/*.log

...

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["192.168.xx.xx:5044"]

Here Filebeat will ship all the logs inside /var/log/ to Logstash.

Put a # in front of all other outputs (for example, the default output.elasticsearch section), and in the hosts field specify the IP address of the Logstash VM.
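Before starting the service, the configuration can be sanity-checked; the test subcommand is available in Filebeat 6.x, and the output check needs Logstash already listening on the other end:

sudo filebeat test config
sudo filebeat test output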

6. Start the Filebeat service

sudo service filebeat start

7. Filebeat also ships modules for certain applications, for example Apache, MySQL, etc. Their configurations live under /etc/filebeat/modules.d/. To enable the Apache module:

./filebeat modules enable apache2
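To see which modules exist and which are enabled (also a 6.x subcommand):

./filebeat modules list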

8. (Optional) To run Filebeat in the foreground:

./filebeat -e

or

./filebeat -e -c filebeat.yml -d "publish"

VM3 [Logstash Setup]

For the installation of Logstash, we require Java.

1. Install Java 1.8.0 (an example package install follows step 2)

2. Download and install the Elastic public signing key

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
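If Java is not already present, the OpenJDK 8 runtime from the standard Ubuntu 16.04 repositories is one way to satisfy step 1 (my suggestion; the original does not name a package):

sudo apt-get install openjdk-8-jre-headless
java -version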

3. On Debian-based systems, you may need to install the apt-transport-https package for HTTPS repository URIs

sudo apt-get install apt-transport-https

4. Save the repository definition to /etc/apt/sources.list.d/elastic-6.x.list:

echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

5. Run sudo apt-get update and the repository is ready for use. You can then install Logstash with:

sudo apt-get update && sudo apt-get install logstash
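To confirm the install (the deb package places Logstash under /usr/share/logstash):

/usr/share/logstash/bin/logstash --version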

6. Configure Logstash to capture Filebeat's output. For that, create a pipeline with input, filter, and output plugins.

Create a pipeline file, logstash.conf, in the Logstash home directory. Here I am using Ubuntu, so I am creating logstash.conf in the /usr/share/logstash/ directory:

# INPUT HERE
input {
  beats {
    port => 5044
  }
}
# FILTER HERE
filter {
  grok {
    match => { "message" => "^%{NOTSPACE:VirtualHost}" }
  }
}
# OUTPUT HERE
output {
  stdout { codec => rubydebug }
}
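The grok pattern above simply captures the first whitespace-free token of each line into a field named VirtualHost. For a typical combined-format Apache access-log line, that token is the client address; the sample line below is my own illustration:

# sample input line:
#   192.168.1.10 - - [24/Apr/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 3380
# field added by grok:
#   "VirtualHost" => "192.168.1.10"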

With this pipeline we receive all the logs from both VMs mixed together, which makes them very difficult to differentiate and analyze. So, depending on the service, we need to write separate files with their own tags. For this, I am using the Apache logs.

So create an apache.conf in the /usr/share/logstash/ directory:

# INPUT HERE
input {
  beats {
    port => 5044
  }
}
# FILTER HERE
filter {
  if [source] == "/var/log/apache2/error.log" {
    mutate {
      remove_tag => [ "beats_input_codec_plain_applied" ]
      add_tag => [ "apache_logs" ]
    }
  }
  if [source] == "/var/log/apache2/access.log" {
    mutate {
      remove_tag => [ "beats_input_codec_plain_applied" ]
      add_tag => [ "apache_logs" ]
    }
  }
}
# OUTPUT HERE

To get plain output on the console, add this as the output plugin:

output {
  stdout { codec => rubydebug }
}
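As an aside, the two mutate blocks are identical, so both conditions could also be collapsed into a single if with or (my own simplification, not from the original article):

filter {
  if [source] == "/var/log/apache2/error.log" or [source] == "/var/log/apache2/access.log" {
    mutate {
      remove_tag => [ "beats_input_codec_plain_applied" ]
      add_tag => [ "apache_logs" ]
    }
  }
}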
[Screenshot: standard output showing events with the "apache_logs" tag from VM 1 and VM 2]

To write the logs to separate files named with the instance's hostname and a timestamp:

# OUTPUT HERE
output {
  if "apache_logs" in [tags] {
    file {
      path => "/home/ubuntu/apache/%{+YYYY}/%{+MM}/%{+dd}/%{+HH}/apache-%{host}-%{+YYYY-MM-dd-HH-zz}.log"
      codec => "json"
    }
  }
}
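Once events arrive, the files appear under the dated directory tree. A quick way to watch them land (my own check, following the path pattern above):

find /home/ubuntu/apache -type f -name '*.log'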
[Screenshot: logs saved to files named with the hostname and timestamp]

7. To verify your configuration, run the following command:

bin/logstash -f apache.conf --config.test_and_exit

8. If the configuration file passes the configuration test, start Logstash with the following command:

bin/logstash -f apache.conf --config.reload.automatic

9. To run Logstash:

./bin/logstash -f apache.conf

NOTE: You can create multiple pipelines, configure them in the /etc/logstash/pipelines.yml file, and run them together.
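A minimal sketch of that file, assuming the apache.conf created above (the pipeline.id name is my own):

# /etc/logstash/pipelines.yml
- pipeline.id: apache
  path.config: "/usr/share/logstash/apache.conf"

Keep in mind that apache.conf and logstash.conf both bind Beats to port 5044, so they cannot run side by side as-is; give each pipeline its own port if you run more than one.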
