Using PGAudit on Amazon RDS

Amet Umerov
Preply Engineering Blog
6 min read · Feb 14, 2019

Remember when we updated our production database on RDS from 9.4.15 to 9.5.13? Starting with version 9.5.7, we are able to use PGAudit for our DB.

In this article, I’ll describe what PGAudit is, how you can check events on the RDS database, and how to use AWS services to automate it.

PGAudit is watching

My name is Amet Umerov and I’m a DevOps Engineer at Preply.

Contents

  • Why should you use PGAudit?
  • Gmail preparations
  • DB preparations
  • AWS resources: Lambda and CloudWatch
  • Tests and results

Why should you use PGAudit?

PGAudit is the open source audit logging extension for PostgreSQL 9.5+.

It supports different classes of statements that can be logged (READ, WRITE, DDL, etc.). It also supports both session and object logging.
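Session auditing is driven by the pgaudit.log setting (we configure it via the parameter group later in this article), while object auditing works by granting privileges on specific relations to the audit role. As a sketch (the orders table is hypothetical, and the variables are the ones exported in the DB preparations section below):

# Object auditing example: only statements touching relations that
# rds_pgaudit has privileges on get logged ("orders" is hypothetical)
psql -h ${ENDPOINT} -U postgres ${DATABASE} \
-c 'GRANT SELECT, UPDATE ON orders TO rds_pgaudit;'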

In our case, we want to monitor our production and stage databases for some events like:

  • Statements related to roles and privileges: GRANT/REVOKE/CREATE/ALTER/DROP ROLE
  • All DDL that is not included in the ROLE class

A warning message with new events should be sent to security@example.com by a script.

Our goals are:

  • Monitoring table schema changes: we should know about new events related to the DB structure
  • Monitoring roles and their privileges: this is important for security and access control
  • Letting our Data Analysts know immediately about new tables in the DB

Here is a simplified diagram of how it works:

How it works
  • A CloudWatch cron rule triggers a Lambda function every hour
  • The Lambda function downloads the past-hour log and parses it for audit events
  • If new events have happened, the function sends an email to security@example.com from audit@example.com

Let’s implement it!

Gmail preparations

If you want to send messages to a messenger instead of a Gmail mailbox, you can use the corresponding APIs and libraries; for Slack, for example, there is python-slackclient.
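As a rough sketch, the same kind of alert can be pushed to a Slack channel with a single incoming-webhook call (the webhook URL below is a placeholder you get when creating an incoming webhook):

# Post a short alert to Slack via an incoming webhook (URL is a placeholder)
curl -X POST \
-H 'Content-type: application/json' \
--data '{"text": "New PGAudit events on stage DB"}' \
'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXX'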

We use 2FA for all our mailboxes on Gmail, so we need to create an application password for it.

Go to the Google account for audit@example.com and create an application password (type: Other):

Application password for PGAudit

Google generates a 16-character password for your application. For this example, let it be abcdefghijklmnop.

We will use this password in our Lambda function later.
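Before wiring it into Lambda, you can sanity-check the app password, for example with curl over SMTPS (a sketch; msg.txt is a hypothetical file containing a minimal test message):

# Create a minimal message and send it through Gmail's SMTPS endpoint
printf 'Subject: PGAudit test\n\nIt works\n' > msg.txt
curl --url 'smtps://smtp.gmail.com:465' \
--user 'audit@example.com:abcdefghijklmnop' \
--mail-from 'audit@example.com' \
--mail-rcpt 'security@example.com' \
--upload-file msg.txt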

DB preparations

Here’s what we currently have:

  • DB with the endpoint: dbstage.abc1def2.eu-west-1.rds.amazonaws.com
  • DB master password for PostgreSQL user: pgUserPassword
  • Parameter group: parameter-group-stage, which is already applied to the DB instance

First of all, you will need to create a role for PGAudit:

export PGPASSWORD=pgUserPassword
export PARAMETER_GROUP=parameter-group-stage
export ENDPOINT=dbstage.abc1def2.eu-west-1.rds.amazonaws.com
export DATABASE=mydb
export INSTANCE_NAME=$(awk -F. '{print $1}' <<< ${ENDPOINT})
psql -h ${ENDPOINT} \
-U postgres ${DATABASE} \
-c "CREATE ROLE rds_pgaudit;"

Add PGAudit support to the parameter group:

aws rds modify-db-parameter-group \
--db-parameter-group-name ${PARAMETER_GROUP} \
--parameters "ParameterName=shared_preload_libraries,ParameterValue=pgaudit,ApplyMethod=pending-reboot"

Wait a few minutes for the new parameters to be applied to the instance, and reboot it:

aws rds describe-db-instances \
--db-instance-identifier ${INSTANCE_NAME} \
--query 'DBInstances[*].[DBInstanceStatus]' \
--output text
available
aws rds reboot-db-instance --db-instance-identifier ${INSTANCE_NAME}

DB downtime depends on the instance size and load. For example, for our stage DB it is 4 seconds, and for the prod DB it is 32 seconds.

Go to the RDS console and set the other parameters for your parameter group (or apply them from the CLI, as sketched after this list):

  • pgaudit.role: rds_pgaudit
  • pgaudit.log_level: info
  • pgaudit.log: 'ROLE,DDL'
  • pgaudit.log_statement_once: 1
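These pgaudit parameters are dynamic, so a CLI alternative should apply without a reboot (a sketch, reusing the variables exported earlier):

# Set the pgaudit parameters via the CLI instead of the console
aws rds modify-db-parameter-group \
--db-parameter-group-name ${PARAMETER_GROUP} \
--parameters '[
  {"ParameterName": "pgaudit.role", "ParameterValue": "rds_pgaudit", "ApplyMethod": "immediate"},
  {"ParameterName": "pgaudit.log_level", "ParameterValue": "info", "ApplyMethod": "immediate"},
  {"ParameterName": "pgaudit.log", "ParameterValue": "ROLE,DDL", "ApplyMethod": "immediate"},
  {"ParameterName": "pgaudit.log_statement_once", "ParameterValue": "1", "ApplyMethod": "immediate"}
]'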

Check that all new parameters have been applied:

psql -h ${ENDPOINT} \
-U postgres ${DATABASE} -t \
-c 'SHOW shared_preload_libraries;'
rdsutils,pgaudit
aws rds describe-db-parameters \
--db-parameter-group-name ${PARAMETER_GROUP} \
--query 'Parameters[*].[ParameterName,ParameterValue]' \
--output text | grep pgaudit
pgaudit.log                    ROLE,DDL
pgaudit.log_level              info
pgaudit.log_statement_once     1
pgaudit.role                   rds_pgaudit
...

Test it:

psql -h ${ENDPOINT} \
-U postgres ${DATABASE} \
-c 'CREATE ROLE test_role;'
LOGFILE=$(aws rds describe-db-log-files --db-instance-identifier ${INSTANCE_NAME} --query 'DescribeDBLogFiles[-1].[LogFileName]' --output text)
aws rds download-db-log-file-portion \
--db-instance-identifier ${INSTANCE_NAME} \
--starting-token 0 \
--log-file-name "${LOGFILE}" \
--output text | grep AUDIT
2018-12-04 15:38:33 UTC:95.164.49.162(50215):postgres@mydb:[28087]:LOG: AUDIT: SESSION,1,1,ROLE,CREATE ROLE,,,CREATE ROLE test_role;,<not logged>

As you can see, it works: PGAudit stores events in the RDS logs, and we can grab them from there. Let's automate it!

AWS resources: Lambda and CloudWatch

Create a role for Lambda function:

export ROLE_NAME=pgaudit_role_stage
export POLICY_NAME=pgaudit_policy_stage
cat << EOF > ${ROLE_NAME}.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role \
--role-name ${ROLE_NAME} \
--assume-role-policy-document file://${ROLE_NAME}.json

Create the policy:

cat << EOF > ${POLICY_NAME}.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds:DownloadDBLogFilePortion",
        "rds:DescribeDBLogFiles",
        "rds:DownloadCompleteDBLogFile"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "lambda:GetFunction",
        "lambda:InvokeFunction"
      ],
      "Resource": "*"
    }
  ]
}
EOF
aws iam create-policy \
--policy-name ${POLICY_NAME} \
--policy-document file://${POLICY_NAME}.json

Attach the policy to the role:

POLICY_ARN=$(aws iam list-policies --query 'Policies[*].[PolicyName,Arn]' --output text | grep ${POLICY_NAME} | awk '{print $2}')
aws iam attach-role-policy \
--policy-arn ${POLICY_ARN} \
--role-name ${ROLE_NAME}
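To double-check, you can list the policies attached to the role:

aws iam list-attached-role-policies --role-name ${ROLE_NAME}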

And now we are ready to create a Lambda function. Create a script and pack it into a zip archive:

cat << EOF > lambda_function.py
import boto3
import os
import smtplib
from email.mime.text import MIMEText


def lambda_handler(event, context):
    instance_name = 'dbstage'  # replace me
    username = 'audit@example.com'  # replace me
    targets = ['security@example.com']  # replace me
    sender = username
    password = os.environ['GMAIL_PASSWORD']
    smtp_server = 'smtp.gmail.com'
    temp_logfile = '/tmp/rds.log'

    rds = boto3.client('rds')
    get_log_file = rds.describe_db_log_files(
        DBInstanceIdentifier=instance_name
    )

    # Get the second-to-last log file (the past-hour log; the last one
    # is still being written to) and save it to a temp file
    log_file = get_log_file['DescribeDBLogFiles'][-2]['LogFileName']
    with open(temp_logfile, 'w') as f:
        token = '0'
        get_log_data = rds.download_db_log_file_portion(
            DBInstanceIdentifier=instance_name,
            LogFileName=log_file,
            Marker=token
        )
        while get_log_data['AdditionalDataPending']:
            f.write(get_log_data['LogFileData'])
            token = get_log_data['Marker']
            get_log_data = rds.download_db_log_file_portion(
                DBInstanceIdentifier=instance_name,
                LogFileName=log_file,
                Marker=token
            )
        f.write(get_log_data['LogFileData'])

    # Find AUDIT events in the log file
    array = []
    with open(temp_logfile, 'r') as f:
        for line in f:
            if 'AUDIT' in line:
                array.append(line)

    if not array:
        print('No audit events, skipping...')
    else:
        output = "\n".join(str(x) for x in array)
        print('There are some audit events, sending email...')
        msg = MIMEText(output)
        msg['Subject'] = 'Stage DB audit'
        msg['From'] = sender
        msg['To'] = ', '.join(targets)
        server = smtplib.SMTP_SSL(smtp_server, 465)
        server.login(username, password)
        server.sendmail(sender, targets, msg.as_string())
        server.quit()
        print('Email has been sent')

    os.remove(temp_logfile)
EOF
cat << EOF > requirements.txt
boto3
EOF
zip pgaudit_lambda_stage.zip lambda_function.py requirements.txt

Create Lambda function:

export FUNCTION_NAME=pgaudit_stage
export GMAIL_PASSWORD=abcdefghijklmnop
ROLE_ARN=$(aws iam list-roles --query 'Roles[*].[RoleName,Arn]' --output text | grep ${ROLE_NAME} | awk '{print $2}')
aws lambda create-function \
--function-name ${FUNCTION_NAME} \
--zip-file fileb://pgaudit_lambda_stage.zip \
--role ${ROLE_ARN} \
--handler lambda_function.lambda_handler \
--runtime python3.6 \
--timeout 900 \
--environment Variables={GMAIL_PASSWORD=${GMAIL_PASSWORD}}
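Optionally, confirm that the runtime, timeout, and environment variable landed as expected:

aws lambda get-function-configuration \
--function-name ${FUNCTION_NAME} \
--query '[Runtime,Timeout,Environment]'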

Create a CloudWatch event:

export CRON_RULE_NAME=pgaudit_stage_cron
aws events put-rule \
--name ${CRON_RULE_NAME} \
--schedule-expression 'cron(02 */1 * * ? *)'
RULE_ARN=$(aws events describe-rule --name ${CRON_RULE_NAME} --query '[Arn]' --output text)
aws lambda add-permission \
--function-name ${FUNCTION_NAME} \
--statement-id pgaudit_event \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.com \
--source-arn ${RULE_ARN}
FUNCTION_ARN=$(aws lambda get-function --function-name ${FUNCTION_NAME} --query 'Configuration.FunctionArn' --output text)
cat << EOF > targets.json
[
  {
    "Id": "1",
    "Arn": "${FUNCTION_ARN}"
  }
]
EOF
aws events put-targets \
--rule ${CRON_RULE_NAME} \
--targets file://targets.json
aws lambda publish-version --function-name ${FUNCTION_NAME}

This event triggers our function every hour, at 00:02, 01:02, 02:02, etc.

pgaudit_stage function in AWS console

Tests and results

You can test it by generating some events, for example, by creating a role:

psql -h ${ENDPOINT} \
-U postgres ${DATABASE} \
-c 'CREATE ROLE test_role;'

Wait for the next cron run, or go to the Lambda function in the AWS console and run a test:
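You can also invoke the function from the CLI (the output file name is arbitrary):

aws lambda invoke \
--function-name ${FUNCTION_NAME} \
/tmp/pgaudit_test_output.json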

Email delivered to security@example.com

Stay tuned and subscribe to our blog; we will publish new articles soon :)
