Log collectors — How to set them up

Eni Sinanaj
Jan 17 · 8 min read

Requirements

All the log collectors tried below will use Elasticsearch as the backend system and Kibana as its web interface.

Installed Elasticsearch and Kibana

Elasticsearch stores the logs and Kibana serves as the Elasticsearch user interface. We'll use them with their default configuration, so just download them from the official website.

Let's just run them both with the following commands. (*nix/macOS)
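Assuming both archives were downloaded and unpacked, the commands look roughly like this (run each from the respective unpacked directory):

```shell
# From the unpacked Elasticsearch directory (listens on http://localhost:9200 by default)
./bin/elasticsearch

# In a second terminal, from the unpacked Kibana directory (UI on http://localhost:5601 by default)
./bin/kibana
```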

Fluentd

This tool comes with a service that needs to be installed on the system: td-agent.

td-agent is a tool that collects the logs and conveys them to a storage system, in this case Elasticsearch.
I was able to set it up using the system log and a REST API that Fluentd listens to as inputs.

To make it work with Elasticsearch I had to install a plugin that connects the two tools.
The Fluentd installation guide is very straightforward.
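The plugin in question is fluent-plugin-elasticsearch, installed with td-agent's bundled gem command:

```shell
# Install the Elasticsearch output plugin into td-agent's embedded Ruby
sudo td-agent-gem install fluent-plugin-elasticsearch
```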

I can send logs to Fluentd and visualise them in Elasticsearch in two ways: through an HTTP endpoint that Fluentd exposes, and through the system log.

The configuration for this listener is done in /etc/td-agent/td-agent.conf.

An input is configured:
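A sketch of an HTTP input; the port is Fluentd's conventional default for this source, but it is an assumption here:

```
<source>
  @type http
  port 8888
  bind 0.0.0.0
</source>
```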

And then a match which is used to send data to Elasticsearch
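A sketch of the match section, assuming Elasticsearch runs locally on its default port (the catch-all tag pattern is also an assumption):

```
<match **>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
</match>
```

With logstash_format enabled, the plugin writes to daily logstash-* indexes, which Kibana picks up easily.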

To send a system log to Fluentd just run:
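Assuming a syslog source is configured in td-agent.conf, a test entry can be written with logger; Fluentd's HTTP input (when enabled) also accepts events directly via curl. The tag names here are made up:

```shell
# Write a test entry to the system log
logger -t myapp "hello from syslog"

# Or post an event straight to the HTTP input
curl -X POST -d 'json={"message":"hello"}' http://localhost:8888/myapp.test
```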

Also in this case there is an input section and a match section in the configuration file.

With Fluentd, filters can be used to transform incoming logs before they are stored. There aren't many predefined filters, but new plugins can be developed to extend Fluentd.

Output sources: the fluent-plugin-elasticsearch output plugin handles the integration with Elasticsearch.

Bottomline

Documentation: Poor
Community: Poor
Distributed extensions
Personally, I had some issues setting it up; it's not straightforward.
fluentd-ui helps a bit with the setup.

Graylog

Prerequisites:

  • Java (>= 8)
  • MongoDB (>= 2.4)
  • Elasticsearch (>= 2.x)

Installation is somewhat tricky by just following the documentation. A few missing steps in the documentation led to errors when trying to set up the system.

It was necessary to insert a password secret and a sha2 hashed password for the root admin in the configuration file.

Configuration file: /etc/graylog/server/server.conf
Web UI reachable from the browser at: http://localhost:9000

This solution needs MongoDB and Elasticsearch, as specified above, but it doesn't need Kibana to visualise the data, because it offers its own UI.

From the web interface it’s possible to fully manage the Graylog system.

  • Manage input sources
  • Manage extractors (methods to transform, show and extract messages)
  • Configure alerts

All configuration is done directly through the web interface: input sources, transformations, outputs, alerts and so on. It is a full-stack environment.
Pipelines can be created to build a chain of transformations to log messages.

Custom plugins can be written to handle:

  • Inputs: Accept/write any messages into Graylog
  • Outputs: Forward ingested messages to other systems as they are processed
  • Services: Run at startup and able to implement any functionalities
  • Alert Conditions: Decide whether an alert will be triggered depending on a condition
  • Alert Notifications: Called when a stream alert condition has been triggered
  • Processors: Transform/drop incoming messages (can create multiple new messages)
  • Filters: (Deprecated) Transform/drop incoming messages during processing
  • REST API Resources: An HTTP resource exposed as part of the Graylog REST API
  • Periodical: Called at periodical intervals during server runtime
  • Decorators: Used during search time to modify the presentation of messages
  • Authentication Realms: Allow implementing different authentication mechanisms (like single sign-on or 2FA)

Nice overall impression. It comes with many tools out of the box but has to be configured.

Bottomline

Documentation: Tricky
Overall impression: fully fledged
Offers a great tool for managing logs. In fact, it is considered a log manager rather than a log collector.

Logstash

Installation and setup is easy by following through the documentation provided online.

Can be integrated with different input sources out of the box and accepts complex filters to manage and extract data from messages. Grok patterns can be used, like in Graylog, to extract data from log messages. Messages are sent to Elasticsearch and visualized with Kibana.

Easy to configure, but it doesn't offer a UI tool for the configuration; just a config file (or set of files).

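As a sketch, a minimal logstash.conf reading a log file and shipping it to a local Elasticsearch might look like this (the path, grok pattern and index name are all assumptions):

```
input {
  file {
    path => "/var/log/app/*.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```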

Bottomline

Documentation: Tricky
Considered one of the two most popular options (the other being Fluentd)
Beautifully integrated with Elasticsearch, Kibana and Elastic Beats
Overall impression: works very well, especially with Filebeat
The persistent event queue option must be enabled, because the fixed-size in-memory queue, which holds only 20 events, might not be sufficient.
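The persistent queue is enabled in logstash.yml; the size limit below is an example value:

```yaml
queue.type: persisted
queue.max_bytes: 1gb
```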

Beats

Filebeat, Heartbeat, Metricbeat, Auditbeat

Filebeat

Filebeat is a nice system that reads from many different sources at the same time and feeds one or more outputs. I tried it with Logstash, making it read all the log files in

- /var/log/*.log

and it seems to handle the stream very well.

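A minimal sketch of that setup, assuming Logstash listens on the default Beats port 5044:

```yaml
# filebeat.yml
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]
```

Filebeat is then started in the foreground with `./filebeat -e -c filebeat.yml`.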

Outputting the Filebeat stream directly to Elasticsearch works as well, but the data, although already somewhat structured, cannot be transformed and extracted as it can be through Logstash filters.

Filebeat ensures at-least-once delivery of messages to the defined output.

Documentation: Good

Heartbeat

Heartbeat is an Elastic Beat designed exclusively to monitor uptime. It continuously pings the configured web resources over HTTP. The preconfigured setup monitors a local Elasticsearch installation at http://127.0.0.1:9200.
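Additional monitors are added in heartbeat.yml; a sketch (the URL and interval are assumptions):

```yaml
heartbeat.monitors:
- type: http
  urls: ["http://127.0.0.1:9200"]
  schedule: '@every 10s'
```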

Configuration: Easy
Documentation: Trivial

Metricbeat

Metricbeat collects metrics from your systems and services. From CPU to memory, Redis to NGINX, and much more, it is a lightweight way to ship system and service statistics. It provides roughly the same reporting/monitoring that Icinga does; we'll get to that in a bit.
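Services are covered by modules, toggled from the command line; the module names below are just examples:

```shell
# List available modules, then enable a couple of them
./metricbeat modules list
./metricbeat modules enable system nginx
```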

Configuration: Easy
Documentation: Trivial

Auditbeat

Auditbeat collects Linux audit data and monitors file integrity across the system. It comes with predefined Kibana dashboards, as do the other Beats.

To install the available Kibana dashboards, ensure that Kibana is up and running and then run the following:
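On recent Beats versions, the dashboards are loaded with the setup subcommand (older releases shipped a separate import script instead):

```shell
# Load the predefined dashboards into the running Kibana instance
./auditbeat setup --dashboards
```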

The same dashboard-installation command is available for all Beats. Generally, every Beat comes with predefined dashboards that visualise log data beautifully.

To run Auditbeat:
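A sketch, assuming the default config file sits next to the binary:

```shell
# Run in the foreground, logging to stderr, with an explicit config file
sudo ./auditbeat -e -c auditbeat.yml
```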

Bottomline

Configuration: Easy
Documentation: Trivial
Beautiful integration with Kibana; the predefined dashboards and visualisations are awesome
The only downside is that I didn't figure out how to use the predefined dashboards when the logs go through Logstash first

Icinga2

Icinga 2 is an open source monitoring system which checks the availability of your network resources, notifies users of outages and generates performance data for reporting.

Prerequisites:

Installation is easy: it's enough to add the package repository and then install the icinga2 package. It doesn't work right out of the box, though, because some external plugins need to be installed as well.
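On Ubuntu, for example, the steps look roughly like this (the repository codename icinga-bionic is an example; check the official docs for your distribution):

```shell
# Add the Icinga package repository, then install the daemon
# and the standard check plugins
curl -fsSL https://packages.icinga.com/icinga.key | sudo apt-key add -
echo "deb https://packages.icinga.com/ubuntu icinga-bionic main" \
  | sudo tee /etc/apt/sources.list.d/icinga.list
sudo apt update
sudo apt install icinga2 monitoring-plugins
```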

Icinga is used to check Hosts and Services so it’s a service monitoring system.

Configuration seems simple, but there are too many files and the documentation doesn't completely explain how to set up the environment.

Offers an API for configuration but no UI out of the box.
It can be integrated with Elasticsearch / Logstash through an Elastic Beats component (icingabeat).

The UI can be installed but needs:

  • mysql
  • apache2 (web server)
  • php

It can be reached at: http://localhost/icingaweb2
It has a lot of configuration files and a lot of checkers.

Bottomline

Documentation: Somewhat incomplete
A fork of Nagios. The out-of-the-box configuration gives a really great system monitoring service, but it's not a log collector. It's a monitoring system that can be used to check and keep under surveillance the state of different network objects.


I also created a simple Java application and a simple Node.js application. Both applications continuously produce logs of different levels.

The Java application runs with gradle clean run.

In the Java app I configured different appenders with Logback:

  • Log file
  • Log file with logs formatted in JSON
  • UDP Gelf formatted logs sent to Graylog
  • Logs sent to Logstash

I configured Logstash to read the log files. The first one passes through some filters using a GROK pattern I defined. The second one I just declared as a JSON type of log; this way Logstash manages to identify the fields by itself. Just to mix things up, the Graylog server stores log messages on my Elasticsearch instance.
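For the JSON file, declaring the codec on the input is enough for Logstash to map the fields (the path is an assumption):

```
input {
  file {
    path => "/path/to/app.json"
    codec => "json"
  }
}
```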

So basically, from the same logs produced by my Java application and output in four different ways, I end up with four indexes on Elasticsearch holding the same data. This makes it possible to compare the various methods.

Opinion first: JSON-formatted logs were better structured. GROK parsing ends up failing every now and again.

I created a dashboard displaying the same data from the four indexes put together. Basically, the charts show the count for each error level logged.

Because of the GROK failures, some entries were missing in the plain-text log file case. The other indexes were easier to play around with.
