Using PGAudit on Amazon RDS
Remember when we upgraded our production database on RDS from 9.4.15 to 9.5.13? PGAudit is available on RDS starting with PostgreSQL 9.5.7, so we can now use it for our DB.
In this article, I’ll describe what PGAudit is, how you can check events on the RDS database, and how to use AWS services to automate it.
My name is Amet Umerov and I’m a DevOps Engineer at Preply.
Contents
- Why should you use PGAudit?
- Gmail preparations
- DB preparations
- AWS resources: Lambda and CloudWatch
- Tests and results
Why should you use PGAudit?
PGAudit is the open source audit logging extension for PostgreSQL 9.5+.
It supports different classes of statements that can be logged (`READ`, `WRITE`, `DDL`, etc.). It also supports logging for sessions and objects.
In our case, we want to monitor our production and stage databases for events like:
- Statements related to roles and privileges: `GRANT`/`REVOKE`/`CREATE`/`ALTER`/`DROP ROLE`
- All `DDL` that is not included in the `ROLE` class
A warning message with the new events should be sent to security@example.com by a script.
Our goals are:
- Monitor table schema changes, so we know about new events related to the DB structure
- Monitor roles and their privileges, which is important for security access control
- Let our Data Analysts know immediately about new tables in the DB
Here is a simplified diagram of how it works:
- A CloudWatch cron rule triggers a Lambda function every hour
- The Lambda function downloads the past hour's log and parses it for audit events
- If new events have happened, the function sends an email from audit@example.com to security@example.com
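Since the rule fires shortly after the top of each hour, the function needs the log file that covers the previous hour. A minimal sketch of picking that file by name, assuming the usual RDS PostgreSQL naming scheme `error/postgresql.log.YYYY-MM-DD-HH` (UTC); note that the Lambda function later in this article sidesteps this by simply taking the second-to-last entry from `describe_db_log_files`:

```python
from datetime import datetime, timedelta, timezone


def previous_hour_log_name(now=None):
    """Build the RDS PostgreSQL log file name for the previous hour.

    Assumes the standard RDS log naming scheme
    'error/postgresql.log.YYYY-MM-DD-HH' in UTC.
    """
    now = now or datetime.now(timezone.utc)
    prev = now - timedelta(hours=1)
    return prev.strftime('error/postgresql.log.%Y-%m-%d-%H')
```

For example, a run at 16:02 UTC on 2018-12-04 would target `error/postgresql.log.2018-12-04-15`.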
Let’s implement it!
Gmail preparations
If you want to send messages to a messenger instead of a Gmail mailbox, you can use its API and client library; for Slack, for example, there is python-slackclient.
We use 2FA for all our mailboxes on Gmail, so we need to create an application password for it.
Go to the Google account for audit@example.com and create a password for the application (type "Other"):
Google generates a 16-letter password for your application. For this example, let it be abcdefghijklmnop.
We will use this password in our Lambda function later.
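Before wiring this into Lambda, it can help to see the email assembly on its own. A small sketch: `build_alert` is our own helper name, not part of the article's final script, and the actual send (commented out) uses the 16-letter app password over SMTP with SSL:

```python
from email.mime.text import MIMEText


def build_alert(events, sender='audit@example.com',
                targets=('security@example.com',)):
    """Assemble the warning email from a list of audit log lines."""
    msg = MIMEText('\n'.join(events))
    msg['Subject'] = 'Stage DB audit'
    msg['From'] = sender
    msg['To'] = ', '.join(targets)
    return msg


# Sending then looks like this (requires the Gmail app password):
# import smtplib
# server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
# server.login('audit@example.com', 'abcdefghijklmnop')
# server.sendmail(msg['From'], ['security@example.com'], msg.as_string())
# server.quit()
```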
DB preparations
Here’s what we currently have:
- DB endpoint: dbstage.abc1def2.eu-west-1.rds.amazonaws.com
- DB master password for the postgres user: pgUserPassword
- Parameter group parameter-group-stage, already applied to the DB instance
First of all, you will need to create a role for PGAudit:
export PGPASSWORD=pgUserPassword
export PARAMETER_GROUP=parameter-group-stage
export ENDPOINT=dbstage.abc1def2.eu-west-1.rds.amazonaws.com
export DATABASE=mydb
export INSTANCE_NAME=$(awk -F. '{print $1}' <<< ${ENDPOINT})

psql -h ${ENDPOINT} \
-U postgres ${DATABASE} \
-c "CREATE ROLE rds_pgaudit;"
Add PGAudit support to the parameter group:
aws rds modify-db-parameter-group \
--db-parameter-group-name ${PARAMETER_GROUP} \
--parameters "ParameterName=shared_preload_libraries,ParameterValue=pgaudit,ApplyMethod=pending-reboot"
Wait a few minutes for the new parameters to be applied to the instance, and reboot it:
aws rds describe-db-instances \
--db-instance-identifier ${INSTANCE_NAME} \
--query 'DBInstances[*].[DBInstanceStatus]' \
--output text
available

aws rds reboot-db-instance --db-instance-identifier ${INSTANCE_NAME}
DB downtime depends on the instance's size and load. For example, our stage DB was down for 4 seconds, and the prod DB for 32 seconds.
Go to RDS console and set other parameters for your parameter group:
- pgaudit.role: rds_pgaudit
- pgaudit.log_level: info
- pgaudit.log: 'ROLE,DDL'
- pgaudit.log_statement_once: 1
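If you prefer not to click through the console, the same four parameters can also be applied programmatically. A sketch with boto3, assuming these pgaudit parameters are dynamic on RDS (so the `immediate` apply method works); `apply_pgaudit_params` is our own helper name:

```python
# The four pgaudit settings from the console, as boto3 parameter entries.
PGAUDIT_PARAMS = [
    {'ParameterName': 'pgaudit.role',
     'ParameterValue': 'rds_pgaudit', 'ApplyMethod': 'immediate'},
    {'ParameterName': 'pgaudit.log_level',
     'ParameterValue': 'info', 'ApplyMethod': 'immediate'},
    {'ParameterName': 'pgaudit.log',
     'ParameterValue': 'ROLE,DDL', 'ApplyMethod': 'immediate'},
    {'ParameterName': 'pgaudit.log_statement_once',
     'ParameterValue': '1', 'ApplyMethod': 'immediate'},
]


def apply_pgaudit_params(group_name):
    # boto3 is imported lazily so the parameter list above can be
    # inspected without AWS credentials or the boto3 dependency.
    import boto3
    rds = boto3.client('rds')
    return rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=PGAUDIT_PARAMS,
    )
```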
Check that all new parameters have been applied:
psql -h ${ENDPOINT} \
-U postgres ${DATABASE} -t \
-c 'SHOW shared_preload_libraries;'
rdsutils,pgaudit

aws rds describe-db-parameters \
--db-parameter-group-name ${PARAMETER_GROUP} \
--query 'Parameters[*].[ParameterName,ParameterValue]' \
--output text | grep pgaudit
pgaudit.log ROLE,DDL
pgaudit.log_level info
pgaudit.log_statement_once 1
pgaudit.role rds_pgaudit
...
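This check can also be scripted. A small sketch that parses the tab-separated `--output text | grep pgaudit` output above and reports any parameters that differ from what we expect (`check_pgaudit_params` and `EXPECTED` are our own names):

```python
EXPECTED = {
    'pgaudit.log': 'ROLE,DDL',
    'pgaudit.log_level': 'info',
    'pgaudit.log_statement_once': '1',
    'pgaudit.role': 'rds_pgaudit',
}


def check_pgaudit_params(describe_output):
    """Return expected pgaudit parameters whose actual value differs."""
    actual = {}
    for line in describe_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].startswith('pgaudit.'):
            actual[fields[0]] = fields[1]
    return {k: v for k, v in EXPECTED.items() if actual.get(k) != v}
```

An empty result means every pgaudit parameter matches.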
Test it:
psql -h ${ENDPOINT} \
-U postgres ${DATABASE} \
  -c 'CREATE ROLE test_role;'

LOGFILE=$(aws rds describe-db-log-files --db-instance-identifier ${INSTANCE_NAME} --query 'DescribeDBLogFiles[-1].[LogFileName]' --output text)

aws rds download-db-log-file-portion \
--db-instance-identifier ${INSTANCE_NAME} \
--starting-token 0 \
--log-file-name "${LOGFILE}" \
--output text | grep AUDIT
2018-12-04 15:38:33 UTC:95.164.49.162(50215):postgres@mydb:[28087]:LOG: AUDIT: SESSION,1,1,ROLE,CREATE ROLE,,,CREATE ROLE test_role;,<not logged>
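The payload after `AUDIT: ` is comma-separated, so individual fields are easy to pull out if you want more than a plain `grep`. A sketch (the field names follow pgaudit's session log format; `parse_audit_entry` is our own helper):

```python
import csv


def parse_audit_entry(line):
    """Split an RDS pgaudit log line into named fields.

    The part after 'AUDIT: ' is CSV: audit type, statement id,
    substatement id, class, command, object type, object name,
    statement, parameter.
    """
    payload = line.split('AUDIT: ', 1)[1]
    fields = next(csv.reader([payload]))
    keys = ('audit_type', 'statement_id', 'substatement_id', 'class',
            'command', 'object_type', 'object_name', 'statement',
            'parameter')
    return dict(zip(keys, fields))
```

Applied to the log line above, this yields `class` `ROLE`, `command` `CREATE ROLE`, and the full statement text.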
It works: PGAudit stores events in the RDS logs, and we can grab them from there. Let's automate it!
AWS resources: Lambda and CloudWatch
Create a role for Lambda function:
export ROLE_NAME=pgaudit_role_stage
export POLICY_NAME=pgaudit_policy_stage

cat << EOF > ${ROLE_NAME}.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"lambda.amazonaws.com"
]
},
"Action": "sts:AssumeRole"
}
]
}
EOF

aws iam create-role \
--role-name ${ROLE_NAME} \
--assume-role-policy-document file://${ROLE_NAME}.json
Create the policy:
cat << EOF > ${POLICY_NAME}.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"rds:DownloadDBLogFilePortion",
"rds:DescribeDBLogFiles",
"rds:DownloadCompleteDBLogFile"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"lambda:GetFunction",
"lambda:InvokeFunction"
],
"Resource": "*"
}
]
}
EOF

aws iam create-policy \
--policy-name ${POLICY_NAME} \
--policy-document file://${POLICY_NAME}.json
Attach the policy to the role:
POLICY_ARN=$(aws iam list-policies --query 'Policies[*].[PolicyName,Arn]' --output text | grep ${POLICY_NAME} | awk '{print $2}')

aws iam attach-role-policy \
  --policy-arn ${POLICY_ARN} \
--role-name ${ROLE_NAME}
And now we are ready to create the Lambda function. Create the script and pack it into a zip archive:
cat << EOF > lambda_function.py
import boto3
import os
import smtplib
from email.mime.text import MIMEText


def lambda_handler(event, context):
    instance_name = 'dbstage'           # replace me
    username = 'audit@example.com'      # replace me
    targets = ['security@example.com']  # replace me
    sender = username
    password = os.environ['GMAIL_PASSWORD']
    smtp_server = 'smtp.gmail.com'
    temp_logfile = '/tmp/rds.log'

    rds = boto3.client('rds')

    # Get the last full-hour log and save it to a file
    get_log_file = rds.describe_db_log_files(
        DBInstanceIdentifier=instance_name
    )
    log_file = get_log_file['DescribeDBLogFiles'][-2]['LogFileName']
    with open(temp_logfile, 'w') as f:
        token = '0'
        get_log_data = rds.download_db_log_file_portion(
            DBInstanceIdentifier=instance_name,
            LogFileName=log_file,
            Marker=token
        )
        while get_log_data['AdditionalDataPending']:
            f.write(get_log_data['LogFileData'])
            token = get_log_data['Marker']
            get_log_data = rds.download_db_log_file_portion(
                DBInstanceIdentifier=instance_name,
                LogFileName=log_file,
                Marker=token
            )
        f.write(get_log_data['LogFileData'])

    # Find AUDIT events in the log file
    array = []
    with open(temp_logfile, 'r') as f:
        for line in f:
            if 'AUDIT' in line:
                array.append(line)

    if not array:
        print('No audit events, skipping...')
    else:
        output = "\n".join(str(x) for x in array)
        print('There are some audit events, sending email...')
        msg = MIMEText(output)
        msg['Subject'] = 'Stage DB audit'
        msg['From'] = sender
        msg['To'] = ', '.join(targets)
        server = smtplib.SMTP_SSL(smtp_server, 465)
        server.login(username, password)
        server.sendmail(sender, targets, msg.as_string())
        server.quit()
        print('Email has been sent')

    os.remove(temp_logfile)
EOF

cat << EOF > requirements.txt
boto3
EOF

zip pgaudit_lambda_stage.zip lambda_function.py requirements.txt
Create Lambda function:
export FUNCTION_NAME=pgaudit_stage
export GMAIL_PASSWORD=abcdefghijklmnop

ROLE_ARN=$(aws iam list-roles --query 'Roles[*].[RoleName,Arn]' --output text | grep ${ROLE_NAME} | awk '{print $2}')

aws lambda create-function \
--function-name ${FUNCTION_NAME} \
--zip-file fileb://pgaudit_lambda_stage.zip \
--role ${ROLE_ARN} \
--handler lambda_function.lambda_handler \
--runtime python3.6 \
--timeout 900 \
--environment Variables={GMAIL_PASSWORD=${GMAIL_PASSWORD}}
Create a CloudWatch event:
export CRON_RULE_NAME=pgaudit_stage_cron

aws events put-rule \
  --name ${CRON_RULE_NAME} \
  --schedule-expression 'cron(02 */1 * * ? *)'

RULE_ARN=$(aws events describe-rule --name ${CRON_RULE_NAME} --query '[Arn]' --output text)

aws lambda add-permission \
--function-name ${FUNCTION_NAME} \
--statement-id pgaudit_event \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.com \
  --source-arn ${RULE_ARN}

FUNCTION_ARN=$(aws lambda get-function --function-name ${FUNCTION_NAME} --query 'Configuration.FunctionArn' --output text)

cat << EOF > targets.json
[
{
"Id": "1",
"Arn": "${FUNCTION_ARN}"
}
]
EOF

aws events put-targets \
  --rule ${CRON_RULE_NAME} \
  --targets file://targets.json

aws lambda publish-version --function-name ${FUNCTION_NAME}
This event triggers our function every hour, at 00:02, 01:02, 02:02, etc.
Tests and results
You can test it by generating an event, for example, by creating a role:
psql -h ${ENDPOINT} \
-U postgres ${DATABASE} \
-c 'CREATE ROLE test_role;'
Wait for cron time or go to Lambda in AWS console and run a test:
Stay tuned and subscribe to our blog, we will publish new articles soon :)