Logging: The ‘Root’ of All Debugging Adventures

What is logging? Logging plays a critical role in gaining visibility into system behaviour, aiding in debugging and resolving issues effectively.

Aman Arora
Cloud Native Daily
Jun 21, 2023

I am writing this article with the aim to provide a comprehensive understanding of logging in application services. It is tailored for beginners and freshers who are embarking on their journey into the world of software engineering.

Metrics & Logs

Metrics and logs both help in adding visibility to our system.

Metrics provide aggregated numerical data that can be used for statistical analysis, trend analysis, and performance monitoring. They offer a higher-level overview and are useful for visualisations, dashboards, and automated reporting.

Common examples of metrics that can be collected are response time and error rate.

However, there are scenarios where metrics are not enough. Metrics can tell us that something is wrong with the system, but it is difficult (though not impossible, if the metrics are granular enough) to find out exactly what happened from metrics alone.

Where are metrics not helpful?

  1. Using metrics alone, we cannot trace what exactly happened with a single individual request.
  2. If there is only a slight anomaly (affecting just a few requests), it is difficult for both us and the metric monitoring system to detect it.

This is where logs come in.

Logs offer a straightforward approach to understanding system behaviour. They allow us to debug, trace, and reproduce issues effectively.

What to Log?

To make logs useful for debugging purposes, it is crucial to pay special attention to the content of log entries. Logs should provide a clear picture of what is happening inside the application or system. Here are some guidelines for effective logging:

  • Avoid Excessive or Insufficient Logging: Both excessive and insufficient logging can be problematic. Excessive logging leads to performance overhead, higher serialisation costs, and higher infrastructure requirements. On the other hand, insufficient logging may not provide enough information to diagnose issues effectively.
  • Link Logs to Request Identifiers: Each log entry should be linked to a unique request identifier, such as a transaction ID, order ID, account ID, device ID, or trace ID (UUID). This correlation enables tracing a specific request’s journey through the system.
  • Consistent Log Format: It is essential to maintain consistency in log formats across applications or services within an organization. This consistency facilitates log analysis and correlation across different components.
  • Log Content Clarity: Log content should make sense to both the log producers and consumers. It should contain actionable information that helps in identifying and resolving issues. Avoid logging any sensitive information to ensure data privacy and security.

Some examples of bad logs -

log.info("fetching from db");
log.error("error getting from db");
s.SharedHolder.Logger.Debugf("Processing search request");
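
In contrast, a useful log entry says what is being done, for which entity, and, when something fails, why. Here is a minimal sketch of the same calls done better, assuming an SLF4J-style logger (the class, field, and identifier names are illustrative, not from a real codebase):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

class OrderFetcher {
    private static final Logger log = LoggerFactory.getLogger(OrderFetcher.class);

    void fetchOrder(String traceId, String orderId) {
        // Put the correlation identifier into the logging context (MDC) once per request,
        // so every log line emitted while handling it can be traced back to this request.
        MDC.put("traceId", traceId);
        try {
            log.info("Fetching order from DB, orderId={}", orderId);
            // ... DB call goes here ...
        } catch (RuntimeException e) {
            // Identifier + reason + stack trace, instead of a bare "error getting from db".
            log.error("Failed to fetch order from DB, orderId={}", orderId, e);
        } finally {
            MDC.clear();
        }
    }
}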

It is important to keep the above points in mind, because there are multiple costs associated with logging.

  1. IO Cost
  2. Serialisation Cost
  3. Infrastructure cost to process and persist the logs (both storage and querying)

IO Cost: IO cost refers to the performance overhead associated with reading from or writing to external storage devices, such as hard drives or solid-state drives (SSDs).

Each log entry requires a separate write operation to the storage device. Frequent disk IO operations can lead to performance bottlenecks, especially when dealing with high-volume logging.

To reduce IO costs, various strategies can be employed, such as:

  • Buffering: Collect log entries in memory and write them to disk in batches, reducing the number of individual IO operations.
  • Compression: Compress log data before writing to disk to reduce the amount of data that needs to be written.
  • Asynchronous writing: Perform logging operations asynchronously, allowing the application to continue its execution without waiting for the log write operations to complete (see the sketch after this list).
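
Most mature logging frameworks already implement buffering and asynchronous writing for you (for example, via async appenders), so in practice you configure this rather than build it. Still, to make the idea concrete, here is a minimal, illustrative sketch of a buffered, asynchronous log writer in plain Java (simplified, not production code):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncBatchingLogWriter {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);

    AsyncBatchingLogWriter(Path logFile) throws IOException {
        BufferedWriter out = Files.newBufferedWriter(logFile);
        Thread writer = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            try {
                while (true) {
                    // Wait for the first entry, then drain whatever else has accumulated:
                    // one flush per batch instead of one write per log line.
                    batch.add(buffer.take());
                    buffer.drainTo(batch, 999);
                    for (String line : batch) {
                        out.write(line);
                        out.newLine();
                    }
                    out.flush();
                    batch.clear();
                }
            } catch (InterruptedException | IOException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true); // the background writer should not block application shutdown
        writer.start();
    }

    // Application threads only enqueue; they never wait on disk IO.
    void log(String line) {
        buffer.offer(line); // if the buffer is full, drop the entry rather than blocking the caller
    }
}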

Serialisation Cost: Serialisation cost refers to the overhead associated with converting data structures into a format suitable for storage or transmission. When logging, serialisation is typically required to convert complex data types (such as objects or structures) into a loggable format, such as text or binary.

The serialisation cost can depend on factors such as the complexity of the data structure and the chosen serialisation mechanism.

Some commonly used serialisation formats include JSON, XML, and Protocol Buffers. Each serialisation format has its own trade-offs in terms of readability, size, and processing overhead.

To minimise serialisation cost, you can consider the following (a short illustrative sketch follows this list):

  • Optimise data structures: Design data structures for efficient serialisation. Avoid unnecessary nesting or complex object hierarchies that require extensive serialisation processing.
  • Choose efficient serialisation libraries: Different serialisation libraries may have varying levels of performance. Consider using libraries or frameworks that provide fast and efficient serialisation capabilities.
  • Use binary formats: Binary serialisation formats often result in smaller payloads and faster serialisation/deserialisation compared to text-based formats like JSON or XML.
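
On top of these, make sure you only pay the serialisation cost for log lines that will actually be written. A small sketch assuming SLF4J (the order object and the toJson helper are illustrative placeholders):

// Parameterised messages: the placeholder is only rendered (and order.toString()
// only invoked) if DEBUG is actually enabled for this logger.
log.debug("Processed order: {}", order);

// For genuinely expensive serialisation (e.g. converting a large object to JSON),
// guard it explicitly so the work is skipped entirely when the level is disabled.
if (log.isDebugEnabled()) {
    log.debug("Full order payload: {}", toJson(order));
}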

Log Levels

Log levels add context to each log line and give us a way to categorise and prioritise log messages based on their severity or importance. They are useful for:

  • Granular Control: the logging verbosity can be tuned per environment (or even per package/class) without code changes.
  • Production Monitoring: alerts and dashboards can be driven off the higher-severity levels such as WARN and ERROR.

ERROR: Indicates a critical error that breaks the system and requires manual intervention. It represents unexpected and unhandled situations.

WARN: Indicates a condition that may be autocorrecting or requires attention but is not critical. It signifies unexpected scenarios that are gracefully handled.

INFO: Provides information about the system flow. Generally, it is not enabled in production unless required for audit purposes.

DEBUG: Includes detailed logs required for debugging issues in a production environment. It is enabled selectively for troubleshooting purposes.

When in doubt between two log levels, choose the higher-priority one.
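
To make these definitions concrete, here is a small illustrative sketch, again assuming an SLF4J-style logger (the payment scenario and method names are made up for the example):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PaymentService {
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    void processPayment(String paymentId) {
        // DEBUG: fine-grained detail, enabled selectively when troubleshooting.
        log.debug("Validating payment request, paymentId={}", paymentId);

        // INFO: normal system flow.
        log.info("Payment request accepted, paymentId={}", paymentId);

        try {
            chargeCustomer(paymentId);
        } catch (java.net.SocketTimeoutException e) {
            // WARN: unexpected but handled gracefully (we retry with a fallback provider).
            log.warn("Payment provider timed out, retrying with fallback, paymentId={}", paymentId);
        } catch (Exception e) {
            // ERROR: unexpected and unhandled situation that needs manual intervention.
            log.error("Payment failed and could not be recovered, paymentId={}", paymentId, e);
        }
    }

    private void chargeCustomer(String paymentId) throws Exception {
        // illustrative stub: the real call to the payment provider would go here
    }
}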

Log Levels — QUIZ

Every quiz scenario has 4 possible answers — DEBUG, INFO, WARN, ERROR

  1. An application is starting up and initialising its components. Which log level would you use to indicate the successful initialisation of each component?
  2. An unexpected but recoverable condition occurred during the execution of a function. Which log level would you use to provide information about the condition?
  3. An error occurred that prevents the application from functioning correctly. Which log level would you use to indicate this critical error?
  4. An exceptional situation occurred that requires immediate attention. Which log level would you use to indicate this severe error?
  5. The application encountered an unexpected input but was able to recover without any impact on its functionality. Which log level would you use to provide information about this incident?
  6. An authentication request was received with incorrect credentials. Which log level would you use to indicate this authentication failure?
  7. A database connection could not be established. Which log level would you use to indicate this connection failure?
  8. A user performed an action that is not recommended but doesn’t necessarily result in an error. Which log level would you use to provide information about this action?
  9. An unexpected exception was caught during the execution of a critical function. Which log level would you use to indicate this exceptional condition?
  10. A scheduled task or job is completed successfully. Which log level would you use to indicate the successful completion?

Answers are at the end of the article.

ELK Stack

The ELK Stack (Elasticsearch, Logstash, and Kibana) is a popular logging and log-analysis solution. In the setup described below, logs flow from Filebeat to Kafka to Logstash to Elasticsearch, and are explored in Kibana.

Filebeat

Filebeat is an open-source, lightweight shipper for logs. On each service host, we install a Filebeat agent that monitors the service's log folder for file changes. Whenever a file changes, Filebeat sends the new log lines as messages to Kafka.

We need to configure the Filebeat inputs to monitor the desired log folder(s), and also configure the Kafka topic to which the logs should be sent.

Using a different Kafka topic for each service is recommended, to handle varying log-generation rates and to prevent starvation:

  1. The rate of log generation is different for each service, so with per-service topics we can decide the number of partitions, replication factor, etc. individually.
  2. If, for any reason, one service starts generating a huge amount of logs, the other services would otherwise get starved.

Sample Filebeat config

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /home/ubuntu/projects/**/logs/*.log
    fields_under_root: false
    tail_files: true
    exclude_files: ['newrelic']
    multiline.type: pattern
    multiline.pattern: '(?i)^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^com\.|^net\.|^org\.|^io\.|^id\.'
    multiline.negate: false
    multiline.match: after

output.kafka:
  hosts: [""]
  topic: "logs-%{[fields.servicename]}"

processors:
  - script:
      lang: javascript
      source: >
        function process(ev) {
          var field = ev.Get("log.file.path");
          var serviceName = (field.split("/")[4] + "").toLowerCase();
          ev.Put("fields.servicename", serviceName);
          return ev;
        }
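
Two things are worth noticing in this config. The multiline settings append stack-trace continuation lines (lines starting with "at", "Caused by:", or a package prefix) to the preceding log line, so a full Java stack trace reaches Kafka as a single event instead of hundreds of separate ones. And the script processor derives the service name from the log file path (the directory directly under /home/ubuntu/projects/) and stores it in fields.servicename, which the Kafka output uses to pick the per-service topic.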

Kafka

Kafka is a distributed streaming platform that acts as a message broker between Filebeat and Logstash.

To learn more about Kafka, check out these 2 other articles that I’ve written earlier — Apache Kafka & Replication in Kafka

Logstash

Logstash is responsible for log ingestion, processing, and enrichment.

Logstash has various plugins that allow it to push the processed logs onwards. Generally, as part of the ELK stack, the output is an Elasticsearch index, from where the logs can be read by Kibana.

We run multiple Logstash instances that are part of the same Kafka consumer group, and each instance listens to all the Kafka topics. This ensures that messages are evenly distributed across consumers and no Logstash instance sits idle.

  1. All the logs are parsed in Logstash to extract any usable fields, such as log level, threadId, traceId, className, etc.
  2. Logstash uses grok to check whether a log line matches a known pattern and to extract those fields from it.

Logstash has 3 phases:

  1. Input: Here we configure the inputs for Logstash. In our case the input is Kafka, so we need to specify the Kafka bootstrap servers and the topics.
  2. Filter: In this phase we do the grok match and add/remove fields.
  3. Output: Here we configure the Logstash output, which in our case is Elasticsearch.

input {
  kafka {
    codec => json
    bootstrap_servers => ""
    topics_pattern => ".*"
    decorate_events => true
    consumer_threads => 128
  }
}

filter {
  grok {
    match => [ "message", "\[%{LOGLEVEL:level}\] %{TIMESTAMP_ISO8601:logTime} \[%{DATA:threadId}\] %{JAVACLASS:className} " ]
  }

  mutate {
    add_field => {
      "topic" => "%{[@metadata][kafka][topic]}"
    }
    add_field => {
      "logpath" => "%{[log][file][path]}"
    }
    remove_field => ["input", "agent", "ecs", "log", "event", "uuid", "tags"]
  }

  uuid {
    target => "uuid"
  }
}

output {
  elasticsearch {
    hosts => []
    index => "%{[topic]}-%{+YYYY.MM.dd}"
    pool_max_per_route => 100
  }
}
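
One detail that is easy to miss: decorate_events => true makes the Kafka input expose the source topic name under [@metadata][kafka], which the mutate filter copies into a topic field; the Elasticsearch output then combines that field with the current date, giving one index per topic per day.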

Index name in Elasticsearch: topicName-YYYY.MM.dd

Elasticsearch

Elasticsearch is a distributed search and analytics engine that stores and indexes the logs. With one index per topic per day, individual indices stay small and retention is easy to manage (old daily indices can simply be deleted).

Kibana

Kibana is a web-based user interface for visualising and exploring logs. Some key points about Kibana configuration:

  • Connect Kibana to Elasticsearch to enable log visualisation and querying.
  • Create custom dashboards and visualisations to gain insights into system behavior.
  • Configure index patterns to define how Kibana interprets log data fields (for example, a pattern such as logs-* would match all the daily indices created above).

Takeaways!

Logging is an integral part of application services and provides valuable insights into system behaviour. Metrics and logs complement each other, and a well-designed logging strategy enables effective debugging, issue resolution, and system monitoring.

By following best practices and leveraging log analysis tools like the ELK Stack, developers and operators can gain deep visibility into their systems and deliver more reliable and performant applications.

If you enjoyed this article and found it valuable, please show your support by liking, sharing, and following me on Medium and Linkedin.

Your likes keep me motivated to write more while sharing helps us reach a wider audience. By following, you’ll stay updated with the latest content.

Thank you for your support. Like, Share, and Follow for more insightful articles like this!

Peace out✌️

Quiz Answers

1. INFO, 2. WARN, 3. ERROR, 4. ERROR, 5. WARN, 6. WARN, 7. ERROR, 8. WARN, 9. ERROR, 10. INFO
