Python JSON Logging and CloudWatch Dashboards

GLAMI Engineering
5 min read · Nov 24, 2021

How to configure python-json-logger, CloudWatch Agent, and setup a CloudWatch dashboard in AWS.

Vanilla Python logging is very readable and requires minimal setup. However, as the project progresses through its life cycle, the system’s observability becomes critical. One of the simplest ways to monitor application events and metrics is to log in JSON format and build AWS CloudWatch dashboards on top of those logs. This post will walk you through the setup process step by step.

A prerequisite of this guide is that you already store your logs in local files. For example, Supervisor daemon can rotate log files for you. Some of the steps also expect that you use Ubuntu, but it should be easy to adapt them for your case.

Why Log in JSON Format to AWS?

Instead of logging human-readable lines with separators like this:

2021-11-23 11:43:29,391 : INFO : MainThread : root : this is the event with metrics val=1

We want to log machine-readable JSON lines:

{"asctime": "2021-11-23 11:49:56,654", "threadName": "MainThread", "name": "root", "levelname": "INFO", "message": "this is the event with metrics", "val": 1}

Once we have these parsable log lines in AWS, we can set up alerts and build dashboards like this:

Example hourly time-bin CloudWatch Dashboard

How to Set Up JSON Logging in Python and Gunicorn

A downside of JSON is that exception stack traces become much harder to read. JSON logging is not strictly necessary for extracting metrics from your log lines, but it saves you manual parsing later. With JSON logging you can also easily log any Python variable via the “extra” dictionary, e.g.:

logging.info("job_end", extra=dict(val=1.0))

Directly within Python you can configure JSON logging like this:

import logging

from pythonjsonlogger import jsonlogger

json_handler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter("%(asctime)s %(threadName)s %(name)s %(levelname)s %(message)s")
json_handler.setFormatter(formatter)
logging.basicConfig(handlers=[json_handler], level=logging.INFO)

If you run your application via Gunicorn, then Gunicorn’s logging configuration takes priority. Point Gunicorn at a logging configuration file:

gunicorn --log-config gunicorn_logging.conf main:app

The Gunicorn log configuration file should look like this:

[loggers]
keys=root, gunicorn.error

[handlers]
keys=console

[formatters]
keys=json

[logger_root]
level=INFO
handlers=console

[logger_gunicorn.error]
level=INFO
handlers=console
propagate=0
qualname=gunicorn.error

[handler_console]
class=StreamHandler
formatter=json
args=(sys.stdout, )

[formatter_json]
class=pythonjsonlogger.jsonlogger.JsonFormatter
format=%(asctime)s %(threadName)s %(name)s %(levelname)s %(message)s

How to Set Up the CloudWatch Agent

Now we have set up JSON logging into a file on the local disk. In this step we configure the CloudWatch Agent to stream the log file into a CloudWatch log group.

Configure Hostname

CloudWatch metrics for disk space and RAM are reported to AWS under the machine’s hostname.

sudo hostnamectl set-hostname my-app-name

Download The Agent Package

Consider double checking the AWS instructions for the latest version.

wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb

(Optional) Verify the Package Signature

Although the download was secured via HTTPS, we can check the package file signature.

# key
wget https://s3.amazonaws.com/amazoncloudwatch-agent/assets/amazon-cloudwatch-agent.gpg
# signature
wget https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb.sig
# verify
gpg --import amazon-cloudwatch-agent.gpg
gpg --fingerprint D58167303B789C72  # should match the fingerprint in the AWS docs
gpg --verify amazon-cloudwatch-agent.deb.sig amazon-cloudwatch-agent.deb

Install the Package

This depends on your OS. The command below is for Ubuntu.

sudo dpkg -i -E ./amazon-cloudwatch-agent.deb

Configure the Agent

You can use your own modification of the minimal configuration below, or alternatively use the CloudWatch Agent wizard. First create the configuration file:

sudo vi /opt/aws/amazon-cloudwatch-agent/bin/config.json

Copy the file contents below and edit the log file location, the log group name, and the stream name.

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/my-app-name.log",
            "log_group_name": "my-app-name-group",
            "log_stream_name": "my-app-name.log"
          }
        ]
      }
    }
  },
  "metrics": {
    "namespace": "my-app-name",
    "metrics_collected": {
      "disk": {
        "measurement": [
          "used_percent"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}

Configure AWS Credentials

The simplest credentials setup is to use the machine’s credentials stored in the local home directory.

sudo tee -a /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml <<EOF
[credentials]
shared_credential_profile = "default"
shared_credential_file = "/home/ubuntu/.aws/credentials"
EOF

Load the Config into the Agent

Let’s load our configuration file into the CloudWatch Agent. The agent will use the credentials to start pumping the logs to AWS.

sudo amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s

Check the Agent Log

Now the CloudWatch Agent should be pushing your logs line by line to the cloud. Check the agent’s log for any issues.

less /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

Then go to CloudWatch and check whether you can see your new log group and whether it contains the lines from the log file.

https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups

AWS Dashboard

We have our JSON logs in CloudWatch, so we can create our dashboard. Access the dashboard page using the left panel in CloudWatch or with the link below.

https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:

Each dashboard can have one or more widgets displaying charts. Each widget declares its own log query. Below is a real-world example of a query for a bar chart that uses time binning.

fields @timestamp, message, job_result
| fields (message = 'job_end' and job_result = 'success') as is_success, (message = 'job_end' and job_result = 'failure') as is_failure
| stats sum(is_failure) as is_failure_count, sum(is_success) as is_success_count by bin(60m) as time_of_match
| sort time_of_match asc
| limit 10000
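To make the binning logic concrete, here is a small offline Python sketch (using hypothetical log lines, stdlib only — not how CloudWatch executes the query) that reproduces the same count-per-60-minute-bin idea:

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical JSON log lines, shaped like the output of the logging setup above.
lines = [
    '{"asctime": "2021-11-23 11:05:00,000", "message": "job_end", "job_result": "success"}',
    '{"asctime": "2021-11-23 11:45:00,000", "message": "job_end", "job_result": "failure"}',
    '{"asctime": "2021-11-23 12:10:00,000", "message": "job_end", "job_result": "success"}',
]

success_by_hour = Counter()
failure_by_hour = Counter()
for line in lines:
    entry = json.loads(line)
    if entry.get("message") != "job_end":
        continue
    ts = datetime.strptime(entry["asctime"], "%Y-%m-%d %H:%M:%S,%f")
    # Truncate the timestamp to the hour, like bin(60m) in the query.
    hour_bin = ts.replace(minute=0, second=0, microsecond=0)
    if entry.get("job_result") == "success":
        success_by_hour[hour_bin] += 1
    else:
        failure_by_hour[hour_bin] += 1
```

With these three lines, the 11:00 bucket gets one success and one failure, and the 12:00 bucket gets one success.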

Since our logs are now in JSON format, we can extract fields out of the box just by declaring them in the “fields” clause, as in the query example above. Otherwise we would need to parse the fields with custom parse expressions.

Python JSON log entries always contain the field “message”. Additionally, if an “extra” dictionary is logged, its keys appear as fields too. There are also metadata fields available, such as “@timestamp”, which is generated automatically by the agent, probably at read time; it is not parsed from the log line, so it lags slightly behind the timestamp native to the log entry.

When your queries encounter log entries in which a field is missing, the field’s value is replaced with the equivalent of a null value in database systems.
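This is analogous to dict.get returning None in Python, as a quick local sketch with hypothetical log lines shows:

```python
import json

# Hypothetical log lines: the second entry has no "val" field.
lines = [
    '{"message": "job_end", "val": 1.0}',
    '{"message": "startup"}',
]
# Missing fields come back as None instead of raising an error.
vals = [json.loads(line).get("val") for line in lines]
print(vals)  # [1.0, None]
```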

If you bin by time, be careful when interpreting the chart. For points in time with no data, no time bucket is displayed on the chart at all, which is confusing. So avoid using filters in the query if you don’t have to; instead create a new field, like is_success in the example above. Also note there is a 10k limit on the result size of each log query.

Click “Save” and “Save Dashboard” often so you don’t lose your progress! Avoid opening multiple windows for the same dashboard, so you don’t overwrite your progress.

Other instructions for writing log queries are available here:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax-examples.html

Conclusion

JSON logging with a CloudWatch dashboard can help you gain more visibility into your Python application.

Subscribe for more blog posts!

Are you interested in working for GLAMI? Check out our job listings!

By Václav Košař
