Log Analytics — How to Log Smart — Elastic Stack

Oct 17, 2020

Every system generates logs whenever an event occurs, and we can control their frequency, format and level of detail. To make use of all this information and turn it into the results you want, you need a supporting system that can:

collect all the data
extract and format the relevant information out of it
insert it into a big data system
fetch and consume the collected information quickly and easily
analyze it in real time
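The five responsibilities above can be sketched as a chain of small stages. This is only an in-memory illustration: the pipe-delimited input format and every function name (`collect`, `extract`, `store`, `query`, `analyze`) are hypothetical, and a real setup would use log shippers and a search engine such as Elasticsearch rather than Python lists.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical raw input per source: "timestamp|level|message", one event per line.
RAW_SOURCES = {
    "api-1": ["2020-10-17T10:00:00Z|ERROR|payment failed",
              "2020-10-17T10:00:05Z|INFO|user logged in"],
    "api-2": ["2020-10-17T10:00:02Z|ERROR|timeout calling bank"],
}

def collect(sources: dict[str, list[str]]) -> Iterator[tuple[str, str]]:
    """Collect: yield (source, raw_line) pairs from every source."""
    for source, lines in sources.items():
        for line in lines:
            yield source, line

def extract(pairs: Iterable[tuple[str, str]]) -> Iterator[dict]:
    """Extract and format: split each raw line into named fields."""
    for source, line in pairs:
        timestamp, level, message = line.split("|", 2)
        yield {"source": source, "timestamp": timestamp,
               "level": level, "message": message}

def store(records: Iterable[dict], db: list[dict]) -> list[dict]:
    """Insert: append records to a stand-in for a big data store."""
    db.extend(records)
    return db

def query(db: list[dict], level: str) -> list[dict]:
    """Fetch: return records at the given level, ordered by timestamp."""
    return sorted((r for r in db if r["level"] == level),
                  key=lambda r: r["timestamp"])

def analyze(records: list[dict]) -> Counter:
    """Analyze: count the matching records per source."""
    return Counter(r["source"] for r in records)

db = store(extract(collect(RAW_SOURCES)), [])
errors = query(db, "ERROR")
print(analyze(errors))  # Counter({'api-1': 1, 'api-2': 1})
```

Using generators for the first two stages means lines are parsed one at a time as they are stored, which mirrors how streaming log pipelines avoid holding raw input in memory.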

What is a Log?

A log is a piece of information, and it can come in any form or format. It mainly consists of two things:

Log = TimeStamp + Data
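The definition above maps naturally onto a small record type. The sketch below is illustrative; the `LogEntry` name and its two fields are assumptions, and it rejects timestamps without a time zone so that entries from different servers stay comparable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass(frozen=True)
class LogEntry:
    """Log = TimeStamp + Data (hypothetical record type)."""
    timestamp: datetime                                  # when the event happened
    data: dict[str, Any] = field(default_factory=dict)  # what happened

    def __post_init__(self) -> None:
        # Naive datetimes are ambiguous across servers, so refuse them.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")

entry = LogEntry(datetime(2020, 10, 17, 10, 0, tzinfo=timezone.utc),
                 {"level": "INFO", "message": "user logged in"})
print(entry.timestamp.isoformat())  # 2020-10-17T10:00:00+00:00
```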

Types of Logs

Each system or application may have multiple sources of information, but almost every system has at least the following:

Application Error Logs — These logs are generated every time a handled or unhandled exception or error condition occurs in the application. They usually give low-level, detailed information on why, when and where the exception happened.
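In Python, for instance, the standard `logging` module can record both kinds of exception with their tracebacks. A minimal sketch: the logger name `bank.payments` and the `parse_amount` function are hypothetical, while `logger.exception` and `sys.excepthook` are standard-library features.

```python
import logging
import sys
from typing import Optional

logging.basicConfig(
    level=logging.INFO,
    # asctime gives the "when"; name and funcName give the "where".
    format="%(asctime)s %(levelname)s %(name)s.%(funcName)s: %(message)s",
)
logger = logging.getLogger("bank.payments")  # hypothetical logger name

def parse_amount(raw: str) -> Optional[float]:
    """Handled exception: log it with its traceback and carry on."""
    try:
        return float(raw)
    except ValueError:
        # logger.exception logs at ERROR level and appends the traceback (the "why").
        logger.exception("could not parse amount %r", raw)
        return None

def log_unhandled(exc_type, exc_value, exc_tb):
    """Unhandled exception: route it to the same logger before the process exits."""
    logger.critical("unhandled exception", exc_info=(exc_type, exc_value, exc_tb))

sys.excepthook = log_unhandled

parse_amount("12.50")  # returns 12.5, nothing is logged
parse_amount("12,50")  # returns None and logs an ERROR with the traceback
```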

Business Error Logs — These logs are also generated by the application, but they are not exceptions. Whenever an expected business behavior is compromised or fails in your application, you can log that event.

Example: A user enters an invalid password while logging in to the system. This is not an exception but a business error. Some developers may argue that it is business information rather than an error; it doesn't matter, as long as you are recording and using this information.
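Because a failed login raises no exception, it has to be logged explicitly, ideally with searchable `key=value` fields. In this sketch the `check_login` function, the in-memory `USERS` store and the field names `event`, `username` and `ip` are all hypothetical.

```python
import hashlib
import hmac
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("auth")

# Hypothetical user store: username -> SHA-256 hex digest of the password.
# (Illustration only; real systems should use a slow, salted password hash.)
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}

def check_login(username: str, password: str, ip: str) -> bool:
    expected = USERS.get(username)
    actual = hashlib.sha256(password.encode()).hexdigest()
    if expected is not None and hmac.compare_digest(expected, actual):
        logger.info("event=login_succeeded username=%s ip=%s", username, ip)
        return True
    # Not an exception: a business rule failed, so record it as an event.
    # Never log the password itself.
    logger.warning("event=login_failed username=%s ip=%s", username, ip)
    return False

check_login("alice", "wrong password", "203.0.113.7")
```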

Application Information Logs — Any type of information related to the application that can be used to improve and optimize the system.

Example: In a banking system, the application owner may log every user event: all pages a user visits after logging in, the order of those page visits, the number of clicks, the IP address or location of the login, and the type of device used to access the system.

Example: We can capture and log how new users found our website, for instance via Google, Facebook or Twitter.
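Information logs like these are easiest to analyze later when each event is written as one JSON object per line, a format that is straightforward to index in a search engine such as Elasticsearch. The `log_user_event` helper and its field names (`user_id`, `page`, `referrer`, `device`) are hypothetical, not a fixed schema.

```python
import json
import sys
from datetime import datetime, timezone
from typing import Optional, TextIO

def log_user_event(user_id: str, event: str, stream: Optional[TextIO] = None,
                   **fields: object) -> str:
    """Write one user event as a single JSON line and return that line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # always UTC
        "user_id": user_id,
        "event": event,
        **fields,  # free-form details, e.g. page, referrer, device, ip
    }
    line = json.dumps(record, sort_keys=True)
    (stream or sys.stdout).write(line + "\n")
    return line

# A visit from a user who arrived via a Google search, on an Android device:
log_user_event("u-42", "page_view", page="/accounts",
               referrer="google", device="android")
```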

Business Information Logs — This information is related to business operations and can be helpful in understanding user behavior. Capturing critical events or milestones supports predictive analysis: you can identify and analyze patterns in historical and transactional data, and use those patterns to identify potential opportunities as well as future risks.

Network Logs — What you capture here depends on the skills of the team and the architecture of the application. It may include:

If you have multiple domains, the count and location of visits to each domain
Logs of all API calls/requests
Web server internal logs (Nginx, Apache, etc.)
Load balancer internal logs
Health check logs of network services
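Web server logs are a typical case of logs that your own code does not generate. As an illustration, this sketch parses one line in Nginx's predefined `combined` access-log format; the regular expression assumes that default format and would need adjusting for a custom `log_format`.

```python
import re
from typing import Optional

# Nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

def parse_access_line(line: str) -> Optional[dict]:
    """Return the fields of one access-log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields

sample = ('203.0.113.7 - - [17/Oct/2020:10:15:32 +0000] "GET /api/accounts HTTP/1.1" '
          '200 512 "https://www.google.com/" "Mozilla/5.0"')
print(parse_access_line(sample)["request"])  # GET /api/accounts HTTP/1.1
```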

System/Server Logs — These are the logs from hardware and core services:

Metrics from all servers that give real-time information about:

Resources used and available
CPU usage
Memory usage

Internal logs of database servers, such as:

SQL Server
MongoDB
PostgreSQL

Health check logs of all servers
Number of requests handled by each server
Linux server logs
Windows application and security logs
AWS CloudWatch logs, or their equivalents on Azure/GCP
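Server metrics are usually gathered by an agent (Elastic's Metricbeat, for example), but the idea of a metrics log record can be sketched with the Python standard library alone. The `metrics_snapshot` function is hypothetical; it reports disk usage and, where the platform provides it, the load average as a rough proxy for CPU demand. Memory usage is left out because the standard library has no portable API for it.

```python
import json
import os
import shutil
import socket
from datetime import datetime, timezone

def metrics_snapshot(path: str = "/") -> dict:
    """Return one metrics record: timestamp + host + resource usage."""
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "host": socket.gethostname(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_free_bytes": disk.free,
    }
    try:
        # Unix-only; averaged number of runnable processes over 1, 5 and 15 minutes.
        load1, load5, load15 = os.getloadavg()
    except (AttributeError, OSError):
        pass  # not available on this platform
    else:
        record.update(load_1m=load1, load_5m=load5, load_15m=load15)
    return record

print(json.dumps(metrics_snapshot()))
```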

Challenges with Log Management

The volume of logs can range from a few entries to several million every day. Logs are data-rich and can serve a wide variety of use cases. However, they come with their own set of challenges:

Format is not consistent — Every system, server and piece of software involved in an application may generate logs in a different format. Understanding those formats and the information hidden in them is difficult and requires expertise, especially for logs that are not generated by your own code/application.

Logs are Decentralized — A typical mid-size web application may consist of a few API servers, a few database servers, web servers and load balancers. Each of them generates logs in its own format and keeps them in a separate physical location or on local storage. To study these logs together and extract meaningful information from them, you need a centralized log management system.
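One basic capability a central log system provides is a single, time-ordered view across all sources. This sketch assumes each source is already sorted by time and uses ISO-8601 timestamps (the sample data is made up); `heapq.merge` then interleaves the sources lazily.

```python
import heapq
from datetime import datetime

# Hypothetical per-server logs: (ISO-8601 timestamp, source, message),
# each list already sorted by time.
api_log = [("2020-10-17T10:00:00+00:00", "api", "GET /accounts"),
           ("2020-10-17T10:00:04+00:00", "api", "POST /transfer")]
db_log = [("2020-10-17T10:00:02+00:00", "db", "slow query: 1200 ms")]
lb_log = [("2020-10-17T10:00:01+00:00", "lb", "upstream api-2 healthy"),
          ("2020-10-17T10:00:05+00:00", "lb", "upstream api-1 timed out")]

def merged_timeline(*sources):
    """Lazily interleave time-sorted sources into one stream ordered by timestamp."""
    return heapq.merge(*sources, key=lambda rec: datetime.fromisoformat(rec[0]))

for ts, source, message in merged_timeline(api_log, db_log, lb_log):
    print(ts, source, message)
```

Seen together, the slow database query at 10:00:02 sits right between the API call and the load balancer timeout, a correlation that is hard to spot when each log lives on its own server.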

Time format and zone are not consistent — With the advent of cloud and distributed computing, the resources supporting an application are spread across multiple physical locations and time zones.

Some systems/servers generate logs in their local time zone and format, while others use UTC and a universal datetime format. Correlating all these logs across multiple systems can be a daunting task.
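A common remedy is to convert every timestamp to UTC as early as possible, ideally at ingestion. The two input formats in this sketch (Nginx-style local time with an offset, and an ISO-8601 string) are examples only; a real pipeline needs one parser per source format.

```python
from datetime import datetime, timezone

# Example source formats; extend this list for each system you ingest.
# %b expects English month abbreviations (the default C locale).
FORMATS = [
    "%d/%b/%Y:%H:%M:%S %z",  # e.g. 17/Oct/2020:15:45:32 +0530
    "%Y-%m-%dT%H:%M:%S%z",   # e.g. 2020-10-17T10:15:32+00:00
]

def to_utc(raw: str) -> datetime:
    """Parse a timestamp in any known format and convert it to UTC."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # try the next known format
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unrecognized timestamp format: {raw!r}")

# The same instant, logged by two servers in different time zones:
a = to_utc("17/Oct/2020:15:45:32 +0530")
b = to_utc("2020-10-17T10:15:32+00:00")
print(a == b, a.isoformat())  # True 2020-10-17T10:15:32+00:00
```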

Logs are Unstructured — Log data is usually unstructured, which makes it even more difficult to analyze directly. It is therefore important to perform low-level processing and transformation to convert raw log data into a structure that is easy to store and fetch, and efficient in terms of the storage space and processing power it consumes.
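As a small illustration of that transformation, the sketch below turns plain-text application log lines into structured records and keeps any line it cannot parse in a separate list instead of dropping it. The line layout (`date time,ms LEVEL [component] message`) is an assumption for the example; in the Elastic Stack this step is typically done with Logstash's grok filter or an Elasticsearch ingest pipeline.

```python
import json
import re

# Assumed layout: "2020-10-17 10:15:32,123 ERROR [payments] Card declined for order 991"
LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}),(?P<ms>\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<message>.*)"
)

def structure(lines):
    """Split raw lines into structured records and unparsed leftovers."""
    records, unparsed = [], []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            unparsed.append(line)  # e.g. stack-trace continuation lines
            continue
        records.append({
            "timestamp": f"{m['date']}T{m['time']}.{m['ms']}",
            "level": m["level"],
            "component": m["component"],
            "message": m["message"],
        })
    return records, unparsed

raw = ["2020-10-17 10:15:32,123 ERROR [payments] Card declined for order 991",
       "    at com.example.Payments.charge(Payments.java:42)"]
records, unparsed = structure(raw)
print(json.dumps(records[0]))
```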

Applications of Logs

Logs contain rich information that can be used in several ways:

State and behavior of a system
Behavior of the users
Ecosystem it runs in
Predictive analytics
Frequency of events
Troubleshooting
Auditing
Health Check, Alerts and Notifications

Practical Application Scenarios

Use-Case — Capture and study user behavior: time spent on the website, the user's physical location and the device type used. Using this information, the business can make decisions such as:

Whether to invest more time and resources in the iOS/Android app or the desktop website
The marketing team can use this information to study trends and run targeted marketing
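Once user events are logged in a structured form, the device question reduces to a simple aggregation. The event records below are made-up sample data with a hypothetical `device` field; in the Elastic Stack the same question would normally be answered with a terms aggregation in Kibana.

```python
from collections import Counter

# Made-up sample of structured session events.
events = [
    {"user_id": "u1", "device": "ios"},
    {"user_id": "u2", "device": "android"},
    {"user_id": "u3", "device": "desktop"},
    {"user_id": "u4", "device": "android"},
]

def device_share(events):
    """Return each device type's share of events as a percentage, largest first."""
    counts = Counter(e["device"] for e in events)
    total = sum(counts.values())
    if not total:
        return {}
    return {device: round(100 * n / total, 1) for device, n in counts.most_common()}

print(device_share(events))  # {'android': 50.0, 'ios': 25.0, 'desktop': 25.0}
```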

Use-Case — Analyze spikes in the CPU and memory usage of the servers. This can help with decisions such as:

Use IaC (Infrastructure as Code) to scale servers in the cloud up or down, saving cost while still providing a smooth user experience.
Review the applications and services running on the server that has the spike; they may need improvements in code or configuration.
The spike may be caused by a high number of online users or by high traffic.
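Before acting on a spike, you first have to find it in the metrics. A minimal sketch, assuming CPU samples arrive as `(timestamp, percent)` pairs in time order: a sample counts as a spike when it exceeds a fixed threshold, and consecutive spike samples are grouped into one episode. The `spike_episodes` name and the 80% threshold are arbitrary example choices.

```python
from typing import Iterable

def spike_episodes(samples: Iterable[tuple[str, float]],
                   threshold: float = 80.0) -> list[tuple[str, str, float]]:
    """Group consecutive samples above threshold into (start, end, peak) episodes."""
    episodes = []
    start = end = None
    peak = 0.0
    for ts, cpu in samples:
        if cpu > threshold:
            if start is None:  # a new episode begins
                start, peak = ts, cpu
            end, peak = ts, max(peak, cpu)
        elif start is not None:  # the current episode just ended
            episodes.append((start, end, peak))
            start = None
    if start is not None:  # a spike still in progress at the end of the data
        episodes.append((start, end, peak))
    return episodes

cpu = [("10:00", 35.0), ("10:01", 91.5), ("10:02", 97.0),
       ("10:03", 40.0), ("10:04", 85.0)]
print(spike_episodes(cpu))  # [('10:01', '10:02', 97.0), ('10:04', '10:04', 85.0)]
```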

Use-Case — Run real-time health checks by integrating a monitoring and notification system.

Send notifications via Slack, cellphone, email, etc., whenever there is a spike in error logs. In my project, we send a notification whenever there are more than 10 error logs in the last 5 minutes.
Send a notification if any service or server goes down.
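The "more than 10 error logs in the last 5 minutes" rule can be sketched as a sliding-window counter. This is an illustrative in-process version: the `ErrorRateAlert` class and its `notify` callback are hypothetical, and in practice such a rule usually lives in the monitoring tool (a Kibana or Grafana alert, for example) rather than in application code.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Callable

class ErrorRateAlert:
    """Call notify(count) when more than `limit` errors occur within `window`."""

    def __init__(self, notify: Callable[[int], None],
                 limit: int = 10, window: timedelta = timedelta(minutes=5)):
        self.notify = notify
        self.limit = limit
        self.window = window
        self.times = deque()  # timestamps of recent errors, oldest first

    def record_error(self, at: datetime) -> None:
        self.times.append(at)
        # Drop errors that have fallen out of the window.
        while self.times and at - self.times[0] > self.window:
            self.times.popleft()
        if len(self.times) > self.limit:
            self.notify(len(self.times))
            self.times.clear()  # reset so one burst sends one notification

sent = []
alert = ErrorRateAlert(notify=sent.append)
start = datetime(2020, 10, 17, 10, 0)
for i in range(11):  # 11 errors, 20 seconds apart
    alert.record_error(start + timedelta(seconds=20 * i))
print(sent)  # [11]
```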

Use-Case — Track login events

Create alerts and notifications when there are too many unsuccessful login attempts; this may indicate an organized attack on the system.
An unusually high number of logins by system users can indicate a problem with their active sessions.
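Detecting too many unsuccessful login attempts is the same kind of windowed count, grouped by a key such as the source IP address. In this sketch the `(timestamp, username, ip)` tuples, the threshold of 5 failures and the 10-minute window are illustrative assumptions, not recommended values.

```python
from collections import Counter
from datetime import datetime, timedelta

def suspicious_ips(failed_logins, now, window=timedelta(minutes=10), threshold=5):
    """Return {ip: count} for IPs with at least `threshold` failures in the last `window`.

    failed_logins: iterable of (timestamp, username, ip) for login_failed events.
    """
    recent = Counter(ip for ts, _user, ip in failed_logins if now - ts <= window)
    return {ip: n for ip, n in recent.items() if n >= threshold}

now = datetime(2020, 10, 17, 10, 30)
# One IP trying six different accounts within six minutes, plus one ordinary typo.
events = [(now - timedelta(minutes=m), f"user{m}", "198.51.100.9") for m in range(6)]
events.append((now - timedelta(minutes=1), "alice", "203.0.113.7"))
print(suspicious_ips(events, now))  # {'198.51.100.9': 6}
```

Running the same count over successful login events, grouped by username instead of IP, gives a rough check for the second point above.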

Next Article in this series

I will explain in detail how I am using the Elastic Stack along with Redis/Kafka/RabbitMQ, Grafana, Nagios, PagerDuty, OpsGenie and Slack to achieve everything discussed above.


Written by Saurabh Singh