Databases
Effortlessly analyze Postgres server logs
PostgreSQL logs are a valuable asset for diagnosing issues, monitoring performance, and observing database activity.
In my experience, manually analyzing these logs and finding correlations during an incident is challenging and time-consuming.
To overcome this, I use the Beats stack (Filebeat + Elasticsearch + Kibana) to effortlessly collect, parse, and visualize these logs.
Show me the outcome first
Before we dive into the solution, let me show you the results.
The following graph shows a history of queries run in your production system, broken down by operation (INSERT, UPDATE, etc.).
And the next graph provides a history of executed queries broken down by username.
This not only enables you to find correlations during an incident but also lets you filter queries by operation, username, and so on.
Sounds interesting? :) … continue reading.
Architecture overview
It’s a simple architecture: Filebeat reads and parses the server logs and ships them to Elasticsearch. That’s it.
You can then use Kibana to visualize the information and gain deeper insight into your database system.
Proof of concept
Source code: here
Assuming your Postgres server logs are in CSV format, we can parse and extract the required information via the Filebeat processors decode_csv_fields and extract_array.
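For reference, CSV logging can be enabled on the Postgres side with settings along these lines in postgresql.conf (a sketch; exact values depend on your version and environment):

```ini
# postgresql.conf — enable CSV logging (assumption: adjust to your setup)
logging_collector = on
log_destination = 'csvlog'          ; write logs in CSV format
log_directory = 'log'               ; relative to the data directory
log_min_duration_statement = 0      ; log duration for every statement (0 = all)
```

Restart the server (or reload, depending on the setting) for these to take effect.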
filebeat.inputs:
- type: log
  paths:
    - /path/to/log/*.csv

processors:
  # decode fields containing records in comma-separated format (CSV)
  - decode_csv_fields:
      fields:
        message: decoded_csv
      separator: ","
      ignore_missing: false
      overwrite_keys: true
      trim_leading_space: false
      fail_on_error: true
  # parse the decoded CSV
  - extract_array:
      field: decoded_csv
      mappings:
        psql.timestamp: 0
        psql.username: 1
        psql.database: 2
        psql.operation: 7
        psql.log.level: 11
        psql.message: 13
        psql.details: 14
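To see what those indices refer to, here is a hypothetical Postgres CSV log line parsed with Python's csv module (the line itself is invented for illustration; real lines contain more trailing fields):

```python
import csv
import io

# A hypothetical PostgreSQL CSV log line (abridged, values invented).
sample = ('2024-01-01 12:00:00.000 UTC,"alice","appdb",1234,"[local]",'
          '"65a1b2c3.4d2",1,"SELECT","2024-01-01 12:00:00 UTC","3/42",0,'
          '"LOG","00000","duration: 0.52 ms  statement: SELECT 1",'
          '"","","","","","","","","psql"')

row = next(csv.reader(io.StringIO(sample)))

# The same indices used in the extract_array mappings above:
print(row[0])   # timestamp  -> 2024-01-01 12:00:00.000 UTC
print(row[1])   # username   -> alice
print(row[2])   # database   -> appdb
print(row[7])   # operation  -> SELECT
print(row[11])  # log level  -> LOG
print(row[13])  # message    -> duration: 0.52 ms  statement: SELECT 1
```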
We use the dissect processor to extract the query and its execution duration.
  - dissect:
      tokenizer: "duration: %{duration_ms} ms statement: %{query}"
      field: "psql.message"
      target_prefix: "psql"
      # only parse logs this way when
      # - "psql.log.level" is "LOG"
      # - the field "psql.message" is present
      when:
        and:
          - has_fields: [ 'psql.message' ]
          - equals:
              psql.log.level: LOG
  # treat the query execution duration as numeric rather than string.
  # This way, we can use different aggregation functions in Kibana
  # to visualize logs with respect to this parameter.
  - convert:
      fields:
        - { from: "psql.duration_ms", to: "psql.duration.ms", type: "double" }
      ignore_missing: true
      fail_on_error: false
      when:
        has_fields: [ 'psql.duration_ms' ]
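As a sanity check, the dissect step can be mimicked in Python (an illustrative sketch, not Filebeat's actual implementation; the sample message is invented):

```python
import re

# Rough Python equivalent of the dissect tokenizer above.
# (Filebeat's dissect matches literally; this regex is slightly looser
# to tolerate variable whitespace in the message.)
pattern = re.compile(r"duration: (?P<duration_ms>[\d.]+) ms\s+statement: (?P<query>.*)")

message = "duration: 0.52 ms  statement: SELECT * FROM users"
match = pattern.match(message)
assert match is not None

duration_ms = float(match.group("duration_ms"))  # numeric, as the convert step intends
query = match.group("query")
```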
Only ship logs of interest and drop the rest.
  - drop_fields:
      fields: [
        'message',
        'psql.duration_ms',
        'decoded_csv'
      ]
      ignore_missing: true
  # keep "psql.message" only when the log level is ERROR
  - drop_fields:
      fields: [
        'psql.message'
      ]
      ignore_missing: true
      when:
        not:
          equals:
            psql.log.level: ERROR
  # Only ship logs that have an execution duration or whose log level
  # is ERROR; drop the rest. Some Postgres operations, such as
  # authentication and idle sessions, have no query execution duration.
  - drop_event:
      when:
        not:
          or:
            - equals:
                psql.log.level: ERROR
            - has_fields: [ 'psql.duration.ms' ]
Finally, logs are published to Elasticsearch.
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
After running this setup as described in the Readme, you can open Kibana ( http://localhost:5601/ ) to visualize the logs. A sample document looks as follows:
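A shipped document should contain roughly these fields (values invented for illustration; field names follow the mappings configured above):

```json
{
  "psql": {
    "timestamp": "2024-01-01 12:00:00.000 UTC",
    "username": "alice",
    "database": "appdb",
    "operation": "SELECT",
    "log": { "level": "LOG" },
    "query": "SELECT * FROM users",
    "duration": { "ms": 0.52 }
  }
}
```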
Now you can gain deep insights by visualizing these logs in Kibana.
Thanks for reading.
If you like this article, you might also enjoy the following: