Python logs — a json’s journey to Elasticsearch

Logging in json format and visualizing it using Kibana

Yonahdissen

Published in

DevOps Dudes

4 min readJun 22, 2020

What is Logging?

Logging is the output of your system. It let’s you know when something goes wrong with your system and it is not working as expected. It might have a problem connecting to the database or may just be throwing an exception which you haven’t thought of.

In addition the log shows the system is working properly. It shows that the systems functionalities are working and everything is working as expected. Both parts enabling the developers and maintainers to easily debug.

The stack

In this case we’ll be using EFK(Elastic Fluentd and Kibana) Fluentd being the agent forwarding the logs to elastic and Kibana the visualizer.

The application will be a python app. All of the the components can be run in Docker containers but that isn’t a must.

Logging in Python

Python has a built in library called logging, this library is very simple to use. You can choose the level of logging the format of the log and the handler(which is essentially where the logs will go to e.g. a file).

A code snippet like this:

import logging


def main():
    logging.basicConfig(level="INFO")
    logging.info("Hey I'm a logger!")
    
    
if __name__ == '__main__':
    main()

Will produce this:

INFO:root:Hey I'm a logger!

This log is nice and handy, but when you want to query this kind of data, you will have to use regex’s in order to find specific data.

For example a log like this:

INFO:root:Sending Email to username: 'Jack Sparrow' regarding server_ip: '192.168.1.2'

When you want to get all the server_ip’s you need to parse the data, which is not a fun activity.

Json to the rescue!

import logging


def main():
    logging.info("Sending Email to username: 'Jack Sparrow'                                    regarding server_ip: '192.168.1.2'")


if __name__ == '__main__':
    logging.basicConfig(level="INFO")
    logging.info("Creating handler")
    root = logging.getLogger()
    hdlr = root.handlers[0]
    json_format = logging.Formatter('{"time": "%(asctime)s", "level": "%(levelname)s", "message": "%(message)s"}')
    hdlr.setFormatter(json_format)
    main()

We create a logging handler naturally by logging(“creating handler”).
We set a format for the log in json format.

Now the output will be in json format:

{"time": "2020-06-22 15:13:21,300", "level": "INFO", "message": "Sending Email to username: 'Jack Sparrow' regarding server_ip: '192.168.1.2' "}

Now we have separation of fields where the log is in the message field. We can now filter easily based on time or log level. But we can do better!

We want to be able to create additional fields which we’ll use as filters, these fields should also be dynamic so we can have different fields in the logs across our application.

Dynamic formatter

import logging
import json


def logging_override(name: str, extra: dict):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    stream_handler = logging.StreamHandler()
    basic_dict = {"time": "%(asctime)s", "level": "%(levelname)s", "message": "%(message)s"}
    full_dict = {**basic_dict, **extra}
    stream_formatter = logging.Formatter(json.dumps(full_dict))
    stream_handler.setFormatter(stream_formatter)
    if not logger.handlers:
        logger.addHandler(stream_handler)
    logger.handlers[0] = stream_handler
    logger = logging.LoggerAdapter(logger, extra)
    return logger


def main():
    logging.info("Sending Email to username: 'Jack Sparrow' regarding server_ip: '192.168.1.2'")
    extra = {'server_ip': '192.168.1.2', 'username': 'Jack Sparrow'}
    logger = logging_override("json", extra)
    logger.info("Sending Email to username: 'Jack Sparrow' regarding server_ip: '192.168.1.2'")

The logging_override function receives the name of the logger(can be anything) and a dictionary which contains the extra fields with their values.

We take a basic dictionary containing common fields and merge it with the extra dictionary(In python 3.9 this can be done with a dictionary union |).
We use the json library to create a json object of this merged dictionary and use it as the logging formatter.
A logger object is returned and can then be used.

The output would be this:

INFO:root:Creating handler
{"time": "2020-06-22 15:13:21,300", "level": "INFO", "message": "Sending Email to username: 'Jack Sparrow' regarding server_ip: '192.168.1.2' ",}
{"time": "2020-06-22 15:13:21,300", "level": "INFO", "message": "Sending Email to username: 'Jack Sparrow' regarding server_ip: '192.168.1.2'", "server_ip": "192.168.1.2", "username": "Jack Sparrow"}

The Latter containing the additional fields of server_ip and username.

Fluentd — Shipping the logs

Fluentd is simple to configure and configuration can be found here:

Recipe Json To Elasticsearch

fluentd.conf should look like this (just copy and paste this into fluentd.conf): After that, you can start fluentd and…

docs.fluentd.org

Fluent allows sending the logs in json format as shown in the link above which is perfect for our use case.

Kibana — Visualizing the logs

Now that the logs are arriving in your Elasticsearch, Kibana allows to easily filter the logs based on whatever fields you want. If we want to see how many Emails were sent to every user every day, we can filter for logs containing “Email sent” where field username exists, and count logs in a day by username field.

The vertical bar visualization allows us to stack the daily emails and split them(with cool colors) based on username filters.

Kibana allows many different visualizations of your logs including creating dashboards. I found it far easier to get specific data the more dynamic my log fields were.

Conclusion

I hope this provides readers with the push they need to start logging in a more dynamic way and visualizing their logs.