Tools for Log Analysis

Mathew Kenny Thomas
Published in Tensult Blogs · 7 min read · Apr 30, 2018

This blog has been moved from Medium to blogs.tensult.com. All the latest content will be available there. Subscribe to our newsletter to stay updated.

In a computing context, a log is an automatically produced, time-stamped record of events related to a particular system. All systems and applications produce log files.

Log analysis helps us understand what has happened and derive useful metrics for monitoring, performance, digital marketing, and more. Log analytics enables real-time analysis of large-scale data, yielding insights for a wide variety of applications such as digital marketing, application monitoring, fraud detection, ad tech, and IoT. To read more about the basics of log analytics, click here.
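As a tiny illustration of deriving a metric from raw logs, the sketch below parses Apache-style access-log lines with a regular expression and counts responses by HTTP status code (the sample lines and the simplified pattern are made up for the example):

```python
import re
from collections import Counter

# Simplified pattern for the Apache common log format (real logs vary).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def status_counts(lines):
    """Count responses per HTTP status code -- one small metric from raw logs."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts

sample = [
    '10.0.0.1 - - [30/Apr/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [30/Apr/2018:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [30/Apr/2018:10:00:02 +0000] "POST /login HTTP/1.1" 200 64',
]
print(status_counts(sample))  # Counter({'200': 2, '404': 1})
```

The tools below do exactly this kind of extraction and aggregation, but at scale and with far richer querying.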

Centralised log management is a logging solution that consolidates all your logs into one central, accessible and easy-to-use interface. It makes it easy to collect, store and manage your data efficiently, and it lets you access that data in near real time.

Log analysis tools help extract data and find useful trends in computer-generated data. When faced with a difficult situation, it is much easier to use a log management solution than to dig through text files spread across your system environment. With a single query, log analysis tools can help you pinpoint the root cause of any application or software error. The same goes for security-related issues, so your team can prevent attacks even before they happen. Choose the right log analysis tool based on your current business operations.

Some of the log analytics tools available commercially are:

1. Splunk

Splunk is a software platform used to search, analyse and visualise machine-generated data. The data can be gathered from the websites, applications, sensors, etc. that make up your IT infrastructure and business. Splunk allows real-time processing of data, which is its biggest selling point. Splunk can be used to create alerts or event notifications depending on the state of the machine, and to build visualisations for better representation of the data.
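Searches in Splunk are written in its Search Processing Language (SPL). A sketch of what such a query might look like, with the index, sourcetype and field names purely as illustrative assumptions:

```
index=web sourcetype=access_combined status>=500
| stats count by host
| sort -count
```

The pipe-based syntax lets you chain filtering, aggregation and sorting in a single query.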

2. Retrace

Retrace is a simple SaaS-based solution designed to be affordable for companies of all sizes. It combines several tools into one, integrating code profiling, exception tracking, application logs, and key metrics to make it easy to solve problems. Retrace uses lightweight profiling to capture critical details about what your code is doing, giving you deep, code-level insight into your application's health. It's called Retrace because you can literally retrace what your code is doing!

3. Logentries

Logentries is a cloud-based log management platform that makes any type of computer-generated log data accessible to developers, IT engineers, and business analysis groups of any size. Logentries' easy on-boarding process ensures that any business team can quickly and effectively start understanding its log data from day one. It offers real-time searching and monitoring, dynamic scaling for different types and sizes of infrastructure, visual analysis of data, and custom alerts on pre-defined queries.

4. Logmatic

Logmatic is an extensive log management product that integrates seamlessly with any language or stack. It works equally well with front-end and back-end log data and provides a painless online dashboard for tapping into valuable insights about what is happening within your server environment. Custom parsing rules let you weed through tons of complicated data to find patterns, and a powerful algorithm pinpoints logs back to their origin.

5. Sumo Logic

Sumo Logic is a secure, cloud-native machine-data analytics service, delivering real-time, continuous intelligence from structured, semi-structured and unstructured data across the entire application lifecycle and stack. Analysing your data in real time using machine learning, Sumo Logic can quickly pinpoint the root cause of any particular error or event, and it can be set up to constantly watch what is happening to your apps. Sumo Logic's strong point is its ability to work with data at a rapid pace, removing the need for external data analysis and management tools.

Some of the open-source log analysis tools are as follows:

1. Graylog

Graylog is a free and open-source log management platform that supports in-depth log collection and analysis. Used by teams in Network Security, IT Ops and DevOps, Graylog can discern potential security risks, helps you follow compliance rules, and helps you understand the root cause of any particular error or problem your apps are experiencing. Graylog leverages three common technologies to do its magic, two of which are major open-source projects: Java 7, Elasticsearch and MongoDB.

2. Logstash

Logstash from Elastic is one of the most renowned open-source projects for managing, processing and transporting your log data and events. Logstash works as a data processor that can combine and transform data from multiple sources at the same time, then send it over to your favourite log management platform, such as Elasticsearch. It allows real-time data parsing and can create structure from unstructured data.
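A Logstash pipeline is declared as input, filter and output sections. A minimal sketch, with the log path and Elasticsearch host as assumptions, that tails an Apache access log, parses each line with the built-in grok pattern, and ships the result to Elasticsearch:

```
input {
  file { path => "/var/log/apache2/access.log" }
}
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```

The grok filter is what "creates structure from unstructured data": it turns each raw line into named fields such as the client address, request and status code.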

3. GoAccess

GoAccess is a real-time log analyser software intended to be run through the terminal of Unix systems, or through the browser. It provides a rapid logging environment where data can be displayed within milliseconds of it being stored on the server. GoAccess was designed to be a fast, terminal-based log analyser. Its core idea is to quickly analyse and view web server statistics in real time without needing to use your browser. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly.
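Typical invocations look like the sketch below (the log path is an assumption); the first opens the interactive terminal report, the second writes a self-updating HTML report for the browser:

```
# interactive terminal report
goaccess /var/log/nginx/access.log --log-format=COMBINED

# real-time HTML report served to the browser
goaccess /var/log/nginx/access.log --log-format=COMBINED -o report.html --real-time-html
```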

4. Logz.io

Logz.io provides a cloud-based log analysis service built on the ELK Stack (Elasticsearch, Logstash, Kibana), the open-source log analysis platform. This environment provides real-time insight into any log data you're trying to analyse or understand. The features provided by Logz.io include alerting, user control, parsing services, support, integrations, and an audit trail. Logz.io uses machine learning and predictive analytics to simplify the process of finding critical events in the data generated by logs from apps, servers, and network environments.

5. Fluentd

Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data. Fluentd collects events from various data sources and writes them to files, RDBMS, NoSQL, IaaS, SaaS, Hadoop and so on. Fluentd helps you unify your logging infrastructure. Fluentd’s flagship feature is an extensive library of plugins which provide extended support and functionality for anything related to log and data management within a concise developer environment.
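A Fluentd configuration is a set of directives describing where events come from and where they go. A minimal sketch (paths, tag and plugin choices are assumptions) that tails an Apache access log and prints each parsed event to stdout:

```
<source>
  @type tail
  path /var/log/apache2/access.log
  pos_file /var/log/fluentd/access.log.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>

<match apache.access>
  @type stdout
</match>
```

Swapping the `stdout` output for another plugin (files, Elasticsearch, S3, and so on) is how Fluentd routes the same events to different destinations.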

If you are interested in a comparison between two of the popular open-source log analytics tools mentioned above, Logstash and Fluentd, click here. To see a sample configuration for centralising logs using Filebeat and Logstash, click here.

Alternatively, you can use a set of simple, managed AWS services to perform serverless log analytics. Amazon Kinesis makes it easy to collect, process and analyse real-time streaming data so you can get timely insights. Amazon Kinesis allows you to process and analyse data as it arrives, instead of having to wait until all the data has been collected before processing can begin. The Amazon Kinesis platform includes the following managed services:

  • Amazon Kinesis Streams — allows you to collect, store and process data.
  • Amazon Kinesis Firehose — loads the streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift or Amazon Elasticsearch Service.
  • Amazon Kinesis Analytics — helps in real time monitoring of the streamed data by analysing it using SQL queries.
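As an illustration of the producer side, an application might wrap each log line in a small JSON envelope before putting it on a stream. The envelope schema and the stream name below are assumptions for the example, not an AWS requirement:

```python
import json
import time

def make_kinesis_record(log_line, host):
    """Wrap a raw log line in a JSON envelope for a Kinesis put_record call.

    The field names ("host", "ts", "line") are illustrative assumptions.
    """
    return {
        "Data": json.dumps({"host": host, "ts": int(time.time()), "line": log_line}),
        "PartitionKey": host,  # records from one host map to the same shard
    }

# Actually sending the record requires boto3 and AWS credentials, e.g.:
#   import boto3
#   kinesis = boto3.client("kinesis")
#   kinesis.put_record(StreamName="apache-logs",
#                      **make_kinesis_record(line, "web-01"))
```

Using the host as the partition key keeps each node's records ordered within a shard, which matters when the analytics step reasons about per-host sequences.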
[Chart: serverless log analytics with Amazon Kinesis. Source: Amazon Web Services]

The above chart shows how such a solution can work. The application nodes run Apache applications and write the Apache logs locally to disk. The Amazon Kinesis agent on each EC2 instance ingests the logs into the Amazon Kinesis stream, so the log input from the various application nodes flows into a single stream. Metadata about the application or the machine is stored in an S3 bucket. The analytics application joins in the referenced machine metadata from S3 and processes the streaming logs over tumbling windows. A Lambda function publishes the aggregated response from the destination stream, and an Amazon CloudWatch dashboard is used to view trends in the data, with alarms generated when specific conditions are met. To know more about how to send Linux logs to CloudWatch click here, or for Windows logs to CloudWatch click here. If you are interested in capturing user logs, you can take a look at some of the configurations for the same here.
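The tumbling-window aggregation step is written in Kinesis Analytics SQL. A sketch of what it might look like, where the destination stream, pump and column names are assumptions (only `SOURCE_SQL_STREAM_001` is the service's default in-application stream name):

```
-- count requests per status code over one-minute tumbling windows
CREATE OR REPLACE STREAM "DEST_STREAM" ("status" INTEGER, "request_count" INTEGER);

CREATE OR REPLACE PUMP "AGG_PUMP" AS
  INSERT INTO "DEST_STREAM"
  SELECT STREAM "status", COUNT(*) AS "request_count"
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY "status",
           STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

The `STEP(... BY INTERVAL ...)` expression is what turns the continuous stream into the fixed, non-overlapping windows described above.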

Another alternative is Amazon Elastic MapReduce (EMR). Amazon EMR is based on Hadoop, a Java-based programming framework that supports processing large data sets in a distributed computing environment. Amazon EMR processes big data across a Hadoop cluster of virtual servers on Amazon EC2 and Amazon S3. The "elastic" in EMR's name refers to its dynamic resizing ability: it can ramp resource use up or down depending on demand at any point in time.

Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. It has quickly become the most popular search engine, and is commonly used for log analytics, full-text search, and operational intelligence use cases. Kibana is an open source tool used for data visualization and exploration. It is used for log and time series analytics, application monitoring and operational intelligence use cases. It provides integration with Elasticsearch which makes it the default choice for visualizing data present in Elasticsearch.
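Once logs are indexed, a single REST call can answer an analytics question. A sketch using Elasticsearch's `_count` API, where the index name (`apache-logs`) and the `status` field are assumptions about how the logs were indexed:

```
# count 5xx responses in an assumed "apache-logs" index
curl -s "http://localhost:9200/apache-logs/_count" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"range": {"status": {"gte": 500}}}}'
```

Kibana builds its visualisations on top of queries like this one, which is why the two tools are so often deployed together.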
