Audit RedShift Historical Queries With pgbadger

Mar 9, 2020 · 5 min read

AWS RedShift is one of the most commonly used services in data analytics. In a busy RedShift cluster, tons of queries run every day. As a data warehouse admin, you can do real-time monitoring with the nice graphs provided by AWS, and the historical data is available on the console, so we can go and search the queries at any time. But often we don't need to see every single query; we just need a consolidated report of the queries in a particular time frame, and the console doesn't let us filter queries by user. It's always a good practice to audit RedShift historical queries, which will help you understand who is running what kind of queries.


Those of you with experience running PostgreSQL in production may have heard about pgBadger. It's an open-source tool to analyze PostgreSQL logs, and it gives you a nice overview of the cluster, including query metrics. Since RedShift has PostgreSQL under the hood, we used pgBadger to explore and analyze RedShift logs.

Get the Logs:

In RedShift, we can export all the queries that run in the cluster to an S3 bucket. But it's not in real time: every hour, we get the logs for the past hour. We'll get three different log files:

  • Connection log — logs authentication attempts, and connections and disconnections.
  • User log — logs information about changes to database user definitions.
  • User activity log — logs each query before it is run on the database.

We are only interested in analyzing the SQL queries, so we can parse the user activity log alone and ignore the rest for now. This log is not enabled by default; it needs to be enabled manually.

Enable the activity log:

  • Create a custom parameter group.
  • Set enable_user_activity_logging to true.
  • Then go to your logging S3 bucket and assign the bucket policy below. Make sure you replace the bucket name, and replace region-acc-id with the proper AWS account ID for your region (as listed in the AWS documentation).
"Version": "2012-10-17",
"Statement": [{
"Sid": "Put bucket policy needed for audit logging",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::region-acc-id:user/logs"
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::your-logging-bucket/*"
"Sid": "Get bucket policy needed for audit logging ",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::region-acc-id:user/logs"
"Action": "s3:GetBucketAcl",
"Resource": "arn:aws:s3:::your-logging-bucket"
  • Then go to cluster → maintenance and monitor → Audit logging and enable it with your bucket.
  • Once it's done, within the next hour you can get log files like the one below.

Install PgBadger:

pgbadger is available in the official PostgreSQL repositories.

# Ubuntu
apt-get install pgbadger
# CentOS
yum install pgbadger

Download the log files:

The logs are stored in S3 in a date-partitioned layout (yyyy/mm/dd). In our case, we do this analysis on a daily basis, so we download the previous day's files once a day (UTC time).
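A minimal sketch of the daily download, assuming GNU date and placeholder values for the bucket, account ID, and region (Redshift delivers audit logs under an AWSLogs/account-id/redshift/region/yyyy/mm/dd/ prefix). The sync command is printed here as a dry run; run it against your own bucket.

```shell
# All names below are placeholders for illustration.
BUCKET="your-logging-bucket"
# Build yesterday's date-partitioned prefix (UTC); -d is GNU date.
PREFIX="AWSLogs/123456789012/redshift/us-east-1/$(date -u -d yesterday +%Y/%m/%d)"
# Printed as a dry run; drop the echo to actually download.
echo "aws s3 sync s3://${BUCKET}/${PREFIX}/ ./logs/"
```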

Update: the RedShift log format is now officially supported (see the update at the end of this post).

But the challenge here is that the current format of RedShift logs is not accepted by pgBadger; however, we can make a couple of changes to make it parsable.

'2020-03-07T14:42:14Z UTC [ db=dev user=rdsdb pid=16750 userid=1 xid=5301 ]' LOG: SELECT 1
  • 2020-03-07T14:42:14Z — remove the T and Z from the timestamp.
  • LOG: — append statement: after it.

This line should be formatted as below.

'2020-03-07 14:42:14 UTC [ db=dev user=rdsdb pid=16750 userid=1 xid=5301 ]' LOG: statement: SELECT 1
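The two tweaks above can be scripted. Here is a minimal sed sketch (the fix_log function name is just for illustration) that strips the T and Z from the timestamp and inserts the statement: prefix:

```shell
# Rewrite a RedShift user activity log line into a pgBadger-parsable shape:
# 1) turn '2020-03-07T14:42:14Z into '2020-03-07 14:42:14
# 2) turn "]' LOG: " into "]' LOG: statement: "
fix_log() {
  sed -E "s/^'([0-9-]+)T([0-9:]+)Z /'\1 \2 /; s/]' LOG: /]' LOG: statement: /"
}

# Example: pipe a raw log line through the filter.
echo "'2020-03-07T14:42:14Z UTC [ db=dev user=rdsdb pid=16750 userid=1 xid=5301 ]' LOG: SELECT 1" | fix_log
```

In practice you would run fix_log over the whole downloaded file, e.g. fix_log < raw.log > parsed.log, and feed parsed.log to pgbadger.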

Host the HTML file:

Once the file has been analyzed by pgBadger, it generates the output file in HTML format. To view it, we can host it on a tiny EC2 instance or use S3 static hosting. Here we used S3 static hosting to avoid unnecessary cost.
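A minimal sketch of the S3 static hosting step, with a placeholder bucket name: enable website hosting on the report bucket and upload the pgBadger output as the index document. The commands are echoed as a dry run; drop the echo to execute them for real.

```shell
# Placeholder bucket; the bucket policy must also allow public reads
# (or be restricted, per the caution below).
REPORT_BUCKET="your-report-bucket"

publish() {
  # Echoed here as a dry run; remove the echo to run against your bucket.
  echo aws s3 website "s3://${REPORT_BUCKET}/" --index-document out.html
  echo aws s3 cp out.html "s3://${REPORT_BUCKET}/out.html" --content-type text/html
}

publish
```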

Caution: opening this data to the public is not recommended, so apply proper security hardening.

Script to download and parse the logs:

Update: 2020-03-24

Now the Redshift log format is officially supported by pgBadger. So once you have downloaded the log files, instead of customizing them, you can run the following command to generate the report.

pgbadger -f redshift log_file


Now you can hit the S3 URL to view your report. Since we only parsed the user activity log, the useful data is the query-level metrics, so go directly to the queries tab.


If you want the analysis refreshed every hour, download the new log files hourly (you can use S3 event triggers). We refresh the data on a daily basis, and every day we want to see only the last 24 hours of data. If you want to keep the past N days/months, use the --incremental option. To learn more about pgBadger's options, read their documentation page.
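The daily refresh described above could be wired up with a cron entry like the sketch below; the schedule, output directory, and log path are all placeholder assumptions, and --retention here keeps four weeks of incremental history.

```shell
# Hypothetical crontab entry: at 01:00 daily, rebuild the report in
# pgBadger's incremental mode (-I), writing to a web-served directory (-O)
# and pruning reports older than 4 weeks (--retention).
0 1 * * * pgbadger -f redshift -I -O /var/www/pgbadger --retention 4 /data/redshift-logs/*
```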

Audit with Athena:

If you want to perform a complete audit/analysis on top of these useractivitylog files, then refer to the link below.

Audit with RedShift Spectrum:

Alternatively, you can analyze these useractivitylog queries with RedShift Spectrum as well.

Searce Engineering
