Modern Python Logging

Paul Wolf
Jan 14, 2018


This is an opinionated view of how logging should be done for Python applications. I won’t discuss how to configure the `logging` module in any detail; I assume you already know that. This is more about the bigger picture. The purpose of logging application events should be to better understand the behaviour of your system. But the way most developers use the `logging` module, you end up with fragments of information distributed across files and compute nodes that can’t easily be pieced together. This article should help remedy that situation. I assume you face the challenges of more complex commercial architectures: container technologies such as Docker, and a system composed of multiple services.

In recent years, there has been an explosion of growth in logging services and tools: toolsets like ELK (Elasticsearch, Logstash, Kibana), streaming platforms like Apache Kafka and AWS Kinesis, and hosted services like Loggly and Splunk. But before you get going with those, some general design principles around logging need to be observed. If you are still writing logs to log files on the local storage of your application nodes, this article is for you. You can, with relative ease, transform your logging practices to make your application vastly easier to manage and debug. We’ll show you how to get working with the new paradigm of logging with minimal effort.

I’ll argue here that you should aggregate logs on a single service level where appropriate (event logging), stream these to your logging service (log streaming) and ensure you can associate log events that originate from different services that belong to the same transaction (distributed logging).

We are interested here in application logs, i.e. logging from your code. We are less interested in system logs generated by subsystems of the OS, web logs and other logs over which you as a developer have less control. For these types of logs, you might be best served using available tools that can already parse such logs and ship them to your chosen data storage or analytics front end.

The modern vision of logging is to treat your application logs as events, just a few of the events in your enterprise’s vast universe of events generated every second. There is a lot of talk today about analytics and machine learning. But our purpose here is much more practical. It’s aimed at Python developers who want to gain big advantages with relatively small effort.

What problems are we trying to solve? Logging configurations that write to log files on application nodes don’t really work well:

  • Log entries usually represent some fragment of a larger process. Putting these all together to understand what is happening in a system with multiple processes and high load is hard.
  • Having multiple nodes (for capacity scaling, for instance) means you have lots of places to check your log files.
  • The use of container technologies like Docker mandates that you not treat nodes as stateful entities, i.e. you can’t keep your logs in local storage (with some exceptions that I won’t discuss here).
  • Microservices architectures mean that many events distributed across your network are significantly related. You want to be able to see these associations.

I propose three principles for achieving “logging maturity”:

1. Event-based logging: treat events in code that span multiple lines, and possibly multiple functions or methods, as a single log event.

2. Log streaming: use log streams that avoid writing to local storage; this is a requirement for modern systems design.

3. Distributed logging: logs can be generated by different services spread across different compute nodes, and logs from one node are often part of a transaction logged on another node. It’s necessary to be able to associate these. This is the same idea as #1, but applied across multiple compute nodes.

Let’s say you have a product composed of different services. One service is in charge of inventory availability (A). One service processes orders from affiliates (C). Another service provides the interface to buyers (B). B calls service A for inventory availability; if the item is available, B places the order with C. We want to log each process: “check avails”, “if available, order item”, “confirm order”. It does not matter that some of these processes are synchronous calls (check avails, order item) and others asynchronous (confirm order). As far as reasonably possible, we’d like:

  • A single log entry for each one of these processes
  • A way to associate all three processes as belonging to one
    macro-transaction with some outcome

Below, I’ll provide examples for each of the three levels of logging. I’ll use an open source module for event logging (written by the present author), `LogTrace`. For streaming, we’ll use AWS Kinesis as an example. For storage of logs, we’ll use MongoDB purely by way of example. None of these choices should be understood as prescriptive. The three levels are design concepts that are useful independent of how you implement them.

1. Event logging

Let’s take the example of an authentication procedure for a RESTful API call:

  • Extract token from HTTP header
  • Check token with token service for validity
  • Find a local user associated with the token
  • Check with Central Authentication Service

Plus all the alternative paths that might be taken when the authentication attempt fails for any of a variety of reasons: invalid or expired token, user not found, etc. Logged line by line, a successful authentication might look like this:

[12:12:54] INFO auth header: [b'Token', b'2c59999137******************************']

[12:12:54] INFO authenticate key, model: <class 'tastypie.models.ApiKey'>

[12:12:54] INFO token validation for API call confirmed

[12:12:54] INFO got key from token table: paul

[12:12:54] INFO Successfully authenticated

Let’s ignore the issue of whether you should write short-lived tokens to a logging destination. In order to debug your custom auth code, you will want to emit the header information to make sure the header is present. This per-line approach has several problems:

  • You have to pack full identifying information into every log entry to associate different log lines with one another, continuously repeating information like the token just to determine whether entries are related.
  • The entries are interwoven with other logs not related to the event you are interested in.
  • Logs are larger because identifying information is embedded in each entry.
  • The result is difficult to parse.

`LogTrace` is a tiny manager that helps you concatenate log entries into a single entry. Install it:

pip install logtrace

It works like this:

import logging
from logtrace import LogTrace

logger = logging.getLogger(__name__)
trace = LogTrace(logger=logger)

trace.add("Let's get started")

trace.add("Later, something else happens")

trace.add("And finally...")
trace.emit()

You get a single log entry like this:

[05/Jan/2018 11:11:00] DEBUG [21, .30s] Let's get started; [65, .132s] Later, something else happens; [75, .330s] And finally...

Using our example above for token authentication, your entry will look something like this:

[12:12:54] INFO [132, 0.0006s] auth header: [b'Token', b'2c59999137******************************']; [132, 0.0007s] authenticate key, model: <class 'tastypie.models.ApiKey'>; [132, 0.1057s] token validation for API call confirmed; [132, 0.1078s] got key from token table: paul; [163, 0.1079s] Successfully authenticated

Now we see all the information in one entry. We don’t need to repeat information in each sub-message to relate it to the other messages. In the example above, LogTrace prefixes each sub-message with the line number and the number of seconds elapsed since the LogTrace was created, in square brackets.
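Here is a sketch of how the token-authentication code above might be instrumented this way. The helper functions (`get_token_from_header`, `validate_token`, `find_user_for_token`) are hypothetical placeholders for your own auth logic; only the `LogTrace` calls are the point.

import logging
from logtrace import LogTrace

logger = logging.getLogger(__name__)

def authenticate(request):
    trace = LogTrace(logger=logger)

    token = get_token_from_header(request)   # hypothetical helper
    trace.add("auth header token: {}".format(token))

    if not validate_token(token):            # hypothetical helper
        trace.add("token validation failed")
        trace.emit()
        return None
    trace.add("token validation for API call confirmed")

    user = find_user_for_token(token)        # hypothetical helper
    trace.add("got key from token table: {}".format(user))

    trace.add("Successfully authenticated")
    trace.emit()
    return user

Whichever path is taken, a single `trace.emit()` produces one log entry describing the whole attempt.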

See the module README for more details: https://github.com/paul-wolf/logtrace

So, the first phase is done; you have intelligible logs that express a meaningful outcome for micro-processes.

2. Log Streaming

Now you need to send logs to a logging service. We won’t go into detail here about how to consume these logs once they are aggregated in storage away from your applications. If your target storage is Elasticsearch, you might use Kibana. If it’s a hosted logging service, it will provide a UI to query and view logs. If you send logs to some other storage you manage, you might write your own log viewer or use a standard UI tool for querying that storage technology.

One principle of the greatest importance, and one that many Python developers are hardly aware of: your logs should be structured data documents. Since many developers are used to configuring the `logging` module to write to text files, they pay little attention to the LogRecord class, the object that carries each log event’s data as an extensible set of attributes. It’s this object we want to stream, not just a rendered string.
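To see that structure, you can build a LogRecord by hand and inspect its attribute dictionary; this is just a quick illustration, and the exact attribute set varies slightly between Python versions.

import logging

record = logging.LogRecord(
    name="myproj.apps.orders",
    level=logging.INFO,
    pathname="orders.py",
    lineno=42,
    msg="check availability for 1000 widgets",
    args=(),
    exc_info=None,
)
# levelname, module, funcName, process, created, etc. are all plain attributes
print(record.__dict__)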

As mentioned above, we’ll use AWS Kinesis (https://aws.amazon.com/kinesis/data-streams/) to illustrate streaming log records. We’ll also assume you are configuring logging in a Django application.

This is not the place to explain exactly how to configure and use Kinesis. To use the code below, you will need to have created and configured a Kinesis stream called ‘app_logging’ in your AWS console. You also need to know how to create an AWS resource, like a Kinesis client, using your AWS credentials with the boto or boto3 module.
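For reference, the `get_kinesis_client` helper imported in the handler below might be nothing more than a thin wrapper around boto3; this is a sketch, and the region name is an assumption.

import boto3

def get_kinesis_client():
    # boto3 resolves credentials from the environment, ~/.aws/credentials or an IAM role
    return boto3.client("kinesis", region_name="eu-west-1")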

So, let’s write our custom log handler:

import socket
import logging
import json
import traceback

# you need to write this function yourself to return a valid AWS client
from myproj.utils import get_kinesis_client


class KinesisHandler(logging.Handler):

    def __init__(self, stream_name, handler_logger=None):
        self.stream_name = stream_name
        self.fqdn = socket.getfqdn()
        logging.Handler.__init__(self)

    def emit(self, record):
        # save the exception stacktrace if there is one
        exc_info = ''
        if record.exc_info:
            exc_info = str(traceback.format_tb(record.exc_info[2]))
        data = {
            'fqdn': self.fqdn,
            'pk': self.fqdn,  # this is the Kinesis PartitionKey; there are better ways
            'exc_info': exc_info,
        }
        # dump all LogRecord attributes into the event as strings
        for k, v in record.__dict__.items():
            data[k] = str(v)
        # send off to Kinesis
        get_kinesis_client().put_record(
            StreamName=self.stream_name,
            Data=json.dumps(data),
            PartitionKey=data['pk'],
        )

Now configure logging to use the handler:

LOGGING = {
    'version': 1,  # required by dictConfig
    'handlers': {
        'kinesis_handler': {
            'level': 'DEBUG',
            'class': 'myproj.utils.logging.KinesisHandler',
            'stream_name': 'app_logging',
        },
    },
    'loggers': {
        'myproj.apps': {
            'handlers': ['kinesis_handler'],
            'level': 'DEBUG',
            'propagate': False,
        },
    },
}

You are done: when your logger uses a name in the ‘myproj.apps’ hierarchy, your log messages get streamed to Kinesis. You can check your Kinesis stream in the AWS console to see that this is so. If you call `logger.debug("omg, it's working!")`, the log will end up as a Kinesis event. Likewise, `logger.exception(e)` will send the error and stacktrace to Kinesis.
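For instance, from any module logging under that hierarchy (the logger name here is just an illustration):

import logging

logger = logging.getLogger("myproj.apps.orders")

logger.debug("omg, it's working!")

try:
    1 / 0
except ZeroDivisionError as e:
    # exc_info is set on the LogRecord, so the handler ships the traceback too
    logger.exception(e)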

This is a big step forward and very little code was required to get this far.

Now we can show you how to get your Kinesis log events into MongoDB. To do this, you need to know how to set up an AWS Lambda function (https://aws.amazon.com/lambda/). Set up such a function and configure it to process events from the ‘app_logging’ stream. Write the function like this:

import json
import base64
import datetime

from pymongo import MongoClient


def lambda_handler(event, context):
    host = 'mymongodb.yourdomain.com'
    db = 'logs'
    user = 'log_user'
    pw = 'xxxxxxxxxx'
    uri = "mongodb://{}:{}@{}/{}".format(user, pw, host, db)
    client = MongoClient(uri)
    bplog = client[db]
    for record in event['Records']:
        # Kinesis delivers the record data base64-encoded
        payload = base64.b64decode(record["kinesis"]["data"])
        data = json.loads(payload.decode('utf-8'))
        # use a new collection for each day
        collection_name = 'kinesis_logs_{}'.format(datetime.date.today().isoformat())
        log_collection = bplog[collection_name]
        log_collection.insert_one(data)

Now your logs are not only streamed to Kinesis, they exist in a MongoDB collection, with a new collection for every day. You also have structured data. You can call your logger like this to add any data:

logger.info("Got inventory information for product id: xyz", extra=response_data)

Where `response_data` might be the response from your inventory availability service (which might be another vendor’s API). The `extra` data gets added to the LogRecord and therefore transparently ends up in your MongoDB log event document.
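If you want to convince yourself of how `extra` works, a toy handler makes it visible; in production, the KinesisHandler above does the equivalent by dumping `record.__dict__`.

import logging

class ShowExtra(logging.Handler):
    def emit(self, record):
        # keys passed via `extra` become plain attributes on the LogRecord
        print(record.product_id, record.quantity)

logger = logging.getLogger("myproj.apps.inventory")
logger.addHandler(ShowExtra())
logger.warning("low stock", extra={"product_id": "xyz", "quantity": 3})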

You can’t exactly just cut and paste here. I took some shortcuts to keep the presentation clean of extraneous information. You need to do the following:

  • Have an AWS account
  • Write a small amount of code to connect to AWS resources, like Kinesis, using the `boto` module
  • Set up a new Kinesis stream
  • Set up a Lambda function in AWS to consume the stream events (with appropriate permissions)
  • Have a MongoDB instance accessible by the AWS Lambda function
  • Probably do some error checking that I have not included

None of these is onerous. If you want to use different tools, you can look at ready-made handlers that work with log collectors like Fluentd, Logstash, AWS CloudWatch and many more. For streaming, there are alternatives like Apache Kafka, and there are many hosted Kafka services. Any storage that is schemaless and takes JSON documents as input can be used: Elasticsearch, MongoDB, S3, HDFS (Spark), etc. There are complete hosted services that let you do everything with almost zero code and provide good analytical front ends for log querying.

Just to repeat, the components of the solution:

  • Log collectors: these integrate with your handler to send logs to a streaming service (Logstash, Fluentd, etc.). We wrote our own Python logging handler above to show how simple that is.
  • Streaming service: high-performance event streams, e.g. Apache Kafka or AWS Kinesis.
  • Event processing service: streamed events need processing. We used a very simple AWS Lambda function, which will work for many applications, but there are also more sophisticated options like Apache Flink.
  • Storage service: stores structured but schemaless data; Elasticsearch, MongoDB, HDFS, etc.
  • Analytics: Some UI that lets you look at logs and can be as simple as “show me the last 100 log events with these search criteria” or sophisticated analytics and dashboards that show summary conclusions. Kibana is the best known and works with Elasticsearch but there are many other options, especially if you use a hosted service dedicated to log aggregation.

3. Distributed Logging

We have discussed micro-process logging and used a small module called `LogTrace` to aggregate logs on a single process, single module level. We presented a way to send log events to a streaming service and process them by storing them in a document store for later querying.

With that, we are logging application events to an aggregated store from our entire network. All logs from all nodes exist at a single point of access. There is one more thing we want to do: we want to associate logs from different parts of our infrastructure. To repeat our example from above:

One service is in charge of inventory availability (A). One service processes orders from affiliates (C). Another service provides the interface to buyers (B). B calls service A for inventory availability; if the item is available, B places the order with C.

So:

  • A: Inventory service
  • B: Purchasing interface
  • C: Product ordering service

You have various deployments in your network representing these services. B and C communicate with external vendors through APIs. You now have logs aggregated at a single point of access. Your logs are no longer on local node storage, so you can implement stateless, ephemeral containers (Docker, for example). You are in good shape. One last thing: a buyer wants to buy widgets. They query the purchasing interface B for availability (it calls service A in turn) and reserve 1000 widgets. Then they buy the widgets using B, which calls service C.

This results in many logs from each service. While the logs are aggregated in our logging service, you need something to associate them. You might use some domain variable, like an order number, to get all of the logs that describe the single macro-transaction. But this is not always easy because the order number might not exist yet in the early stages of the process. The best thing is to use some opaque ID. UUIDs are ideal for this. So, when recording the query:

import uuid

order_uuid = uuid.uuid4()
logger.info("check availability for 1000 widgets", extra={"log_uid": order_uuid})
available = check_inventory(widget_id, quantity=1000, log_id=order_uuid)
if available:
    order_widgets(widget_id, log_id=order_uuid)

Now, all services are writing their logs for this one macro-transaction to the log stream with a single unique id. When you go to query your logs, you can bring up all logs related to this distributed transaction in a single query. If you are using a sophisticated event stream processor, like Flink, or your own Lambda function, you could even store all these log events in a single JSON document in your intended log storage. While I don’t show explicitly how the unique id is communicated to remote systems, it might be part of a REST or RPC request, or it can be passed in an HTTP header with a custom but consistent name.
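As a sketch of the payoff, here is how you might pull every log event for that macro-transaction out of the MongoDB store populated by the Lambda function above. The connection details are the placeholder values used earlier, and `order_uuid` is the UUID from the previous snippet.

import datetime
from pymongo import MongoClient

uri = "mongodb://log_user:xxxxxxxxxx@mymongodb.yourdomain.com/logs"
db = MongoClient(uri)['logs']
collection = db['kinesis_logs_{}'.format(datetime.date.today().isoformat())]

# every log event carrying this transaction's id, regardless of which service emitted it
for doc in collection.find({"log_uid": str(order_uuid)}):
    print(doc["created"], doc["name"], doc["msg"])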

Summary

At this stage, you have a framework for truly understanding the behaviour of your systems. Going back to check what happened in case there is a problem or to simply perform analytics on past application performance is an easy task because you are storing structured information in a way that is easy to query.

  • Log events: we aggregate log messages that relate to a single micro-process.
  • Log streaming: we stream log events to a single point of access.
  • Distributed logs: we associate logs from disparate sources in our network that belong to the same macro-transaction.

These principles will serve you well if you are implementing a microservices architecture and using stateless containers. Choose the set of tools that works best for you. Comparable methods and tools exist for Microsoft Azure, Google Cloud Platform, AWS and other services.
