Correlate statement logs in CloudSQL for Postgres with connection sessions
--
This post is a part of the How Cloud Logging works series.
CloudSQL for Postgres features ingesting PostgreSQL logs into Cloud Logging for further processing. It can be enabled using database flags, specifically cloudsql.enable_pgaudit. However, the captured logs differ from what you might be accustom to off-line. This happens because CloudSQL does not support all log related configurations. For example, CloudSQL for Postgres does not support “log_line_prefix” flag (checked on January 2022). In some cases this can be mitigated using other features of CloudSQL. For instance, CloudSQL instance labels can be used as a replacement for prefixing each log entry with the application name. One of the remaining challenges was to associate log entries with executed SQL statements with a particular client application sessions. The on-prem PostgreSQL can be configured to prefix each log entry with the “Remote host name or IP address” of the client. In CloudSQL, this information is available only in the connection related log entries. To have them the instance has to have the log_connections database flag enabled.
There is a way to create a correlation between SQL statement log entries and connection log entries and to link each executed statement to particular client. Each SQL statement log entry ingested by pgAudit contains a field in the request’s payload, called databaseSessionId
:
request: {
@type: "type.googleapis.com/google.cloud.sql.audit.v1.PgAuditEntry"
auditClass: "READ"
auditType: "SESSION"
chunkCount: 1
chunkIndex: 1
command: "SELECT"
database: "postgres"
databaseSessionId: 245
object: ""
objectType: ""
parameter: "<not logged>"
statement: "SELECT * FROM TEST;"
statementId: 2
substatementId: 1
user: "postgres"
}
This identifier correlates with the value that can be found in the prefix of the message payload of the connection log entry:
Using Cloud Logging query language it should be possible to find all related log entries. If the relevant logs are exported to BigQuery, it will be possible to create a view that adds session info to each statement.
As a side note, if the connection to CloudSQL instance is done via Cloud sql-proxy, it may require additional correlation step since the proxied connections hide client details. Let me know if you are interested in how to do it.