Network monitoring | Use AWS Athena to query VPC Flow Logs

Exequiel Barrirero
Published in binbash
4 min read · Dec 2, 2021

This article is based on our experience with dozens of AWS projects at https://www.binbash.com.ar, specifically with the Binbash Leverage Reference Architecture for AWS. The original article that inspired it was written and shared by Diego Ojeda (DevOps Cloud Solutions & Software Architecture Consultant at Binbash).

Overview

Monitoring network traffic is a critical component of security best practices to meet compliance requirements, investigate security incidents, track key metrics, and configure automated notifications.

AWS VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data can be published to Amazon CloudWatch Logs or Amazon S3. After you create a flow log, you can retrieve and view its data in the chosen destination.

Figure: Reference AWS VPC Flow Logs diagram. Flow log fl-aaa captures accepted traffic for the network interface of EC2 instance A1 and publishes the flow log records to an S3 bucket. A second flow log captures all traffic for subnet B and publishes the flow log records to CloudWatch Logs. (Source: AWS, “Flow logs basics,” AWS Documentation, accessed December 2nd, 2021).

Flow logs can help you with a number of tasks, such as:

  • Diagnosing overly restrictive security group rules
  • Monitoring the traffic that is reaching your instance
  • Determining the direction of the traffic to and from the network interfaces

IMPORTANT: Flow log data is collected outside of the path of your network traffic, and therefore does not affect network throughput or latency. You can create or delete flow logs without any risk of impact to network performance.

In this guide, we show how to use Amazon Athena to query and analyze VPC Flow Logs that are published to S3.

In a nutshell

  1. Configure an S3 bucket for Athena to store query results (you only need to do this once).
  2. Create a table for VPC Flow Logs.
  3. Create a partition for the dates you want to be able to query.
  4. Run your queries now.

Figure: Using Athena to query VPC Flow Logs (Source: “Analyze VPC Flow Logs with point-and-click Amazon Athena integration”, AWS official blog, accessed December 2nd, 2021)

How-to Guide

Configure an S3 bucket for Athena to store query results

This step was introduced relatively recently, so older guides may not document it. It is simple to do:

  1. Go to Athena.
  2. Click on “Settings”.
  3. Select a bucket (either create a bucket before doing this or use an existing one).
  4. Click Save.

Create a table for VPC Flow Logs

Use the following query to create a table that will inform Athena about the schema of your data source — make sure you replace the placeholders surrounded by curly brackets, such as {YOUR_LOGS_BUCKET}:

CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs ( 
version int,
account string,
interfaceid string,
sourceaddress string,
destinationaddress string,
sourceport int,
destinationport int,
protocol int,
numpackets int,
numbytes bigint,
starttime int,
endtime int,
action string,
logstatus string,
vpcid string,
subnetid string,
instanceid string,
tcpflags int,
type string,
pktsrcaddr string,
pktdstaddr string,
region string,
azid string,
sublocationtype string,
sublocationid string,
pktsrcawsservice string,
pktdstawsservice string,
flowdirection string,
trafficpath string
)
PARTITIONED BY (`date` date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://{YOUR_LOGS_BUCKET}/{PREFIX}/AWSLogs/{ACCOUNT_ID}/vpcflowlogs/{REGION_CODE}/'
TBLPROPERTIES ("skip.header.line.count"="1");

OPTIONAL: the query above creates the table in your currently selected database (which will most likely be the default one). Feel free to create a new database first should you need one.
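Because the table is declared with FIELDS TERMINATED BY ' ', Athena maps each space-separated field of a log line to a column by position, so the column order has to match the field order of your flow log format. The following Python sketch illustrates that mapping for the first 14 columns only, assuming they line up with the 14 fields of the default flow log format. The helper name parse_record and the sample values are hypothetical, used purely for illustration.

```python
# Column names copied from the first 14 columns of the table above.
# Assumption: they line up with the 14 fields of the default flow log format.
COLUMNS = [
    "version", "account", "interfaceid", "sourceaddress",
    "destinationaddress", "sourceport", "destinationport", "protocol",
    "numpackets", "numbytes", "starttime", "endtime", "action", "logstatus",
]
INT_COLUMNS = {
    "version", "sourceport", "destinationport", "protocol",
    "numpackets", "numbytes", "starttime", "endtime",
}


def parse_record(line: str) -> dict:
    """Split one space-delimited flow log line into a column -> value dict."""
    fields = line.split(" ")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {}
    for name, raw in zip(COLUMNS, fields):
        if raw == "-":
            # Flow logs write "-" when a field has no data for the record.
            record[name] = None
        elif name in INT_COLUMNS:
            record[name] = int(raw)
        else:
            record[name] = raw
    return record


# Illustrative line in the default format (not real traffic).
sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_record(sample)
print(record["destinationport"], record["action"])
```

If your flow logs use a custom format, the same positional rule applies: the list of columns in the CREATE TABLE statement should follow the field order you configured.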

Create a partition for the dates you want to be able to query

You need to create table partitions to restrict the amount of data scanned by each query. Scanning less data makes queries faster and, since Athena bills by the amount of data scanned, cheaper.

Run the following query to create a partition for the given date (again, make sure you replace all the placeholders accordingly):

ALTER TABLE vpc_flow_logs 
ADD PARTITION (`date`='{YYYY}-{MM}-{dd}')
location 's3://{YOUR_LOGS_BUCKET}/{PREFIX}/AWSLogs/{ACCOUNT_ID}/vpcflowlogs/{REGION_CODE}/{YYYY}/{MM}/{dd}';

IMPORTANT: you need to run the query above for every date you want to be able to query. Consider automating this process so that new dates are always queryable without anyone having to remember to add the partition first.
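As one possible starting point for that automation, here is a minimal Python sketch that only generates the ALTER TABLE statements for a range of dates. The bucket, prefix, account ID and region values are hypothetical placeholders, and so are the function names. Unlike the statement above, the sketch adds IF NOT EXISTS so that re-running it for an existing date does not fail.

```python
from datetime import date, timedelta

# Hypothetical placeholder values; replace them with your own.
BUCKET = "my-logs-bucket"
PREFIX = "my-prefix"
ACCOUNT_ID = "123456789012"
REGION_CODE = "us-east-1"


def partition_statement(day: date) -> str:
    """Build the ALTER TABLE statement that registers one date partition."""
    location = (
        f"s3://{BUCKET}/{PREFIX}/AWSLogs/{ACCOUNT_ID}/vpcflowlogs/"
        f"{REGION_CODE}/{day:%Y}/{day:%m}/{day:%d}"
    )
    return (
        "ALTER TABLE vpc_flow_logs "
        f"ADD IF NOT EXISTS PARTITION (`date`='{day.isoformat()}') "
        f"LOCATION '{location}';"
    )


def partition_statements(first: date, last: date) -> list[str]:
    """Build one statement per day, from first to last inclusive."""
    days = (last - first).days
    return [partition_statement(first + timedelta(days=i)) for i in range(days + 1)]


for statement in partition_statements(date(2021, 12, 1), date(2021, 12, 3)):
    print(statement)
```

The generated statements can be run one at a time in the Athena query editor, or submitted by whatever scheduler or script you already use to talk to Athena.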

Run your queries now

Feel free to use any of the examples below as a reference to get started quickly.

Figure: Querying VPC Flow Logs from the Athena console in the AWS Management Console (Source: “Analyze VPC Flow Logs with point-and-click Amazon Athena integration”, AWS official blog, accessed December 2nd, 2021)

For instance:

1️⃣ Return 100 records for the given date:

SELECT *
FROM vpc_flow_logs
WHERE date = DATE('2021-12-02')
LIMIT 100;

2️⃣ List rejected TCP connections (protocol 6), capped at 100 rows, using the date partition column to extract the day of the week:

SELECT day_of_week(date) AS day,
date,
interfaceid,
sourceaddress,
action,
protocol
FROM vpc_flow_logs
WHERE action = 'REJECT' AND protocol = 6
LIMIT 100;
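The filter protocol = 6 works because flow logs store the IANA protocol number rather than a name, and the tcpflags column of the table is likewise a numeric bitmask. As a reading aid for query results, the following Python sketch translates both into names. The helper names are hypothetical and the protocol lookup covers only three common values.

```python
# IANA protocol numbers for a few common values of the `protocol` column.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

# TCP header flag bits, which are summed into the `tcpflags` column.
TCP_FLAG_BITS = {1: "FIN", 2: "SYN", 4: "RST", 8: "PSH", 16: "ACK", 32: "URG"}


def protocol_name(number: int) -> str:
    """Return the protocol name, or the number as text if it is not listed."""
    return PROTOCOL_NAMES.get(number, str(number))


def tcp_flag_names(bitmask: int) -> list[str]:
    """Return the names of all flag bits that are set in the bitmask."""
    return [name for bit, name in TCP_FLAG_BITS.items() if bitmask & bit]


print(protocol_name(6))    # TCP
print(tcp_flag_names(18))  # ['SYN', 'ACK']
```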

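3️⃣ Check which of your servers received the most HTTPS traffic over the last seven days. The query sums the packets sent to port 443, groups them by destination address, and returns the top 10: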

SELECT SUM(numpackets) AS packetcount,
destinationaddress
FROM vpc_flow_logs
WHERE destinationport = 443 AND date > current_date - interval '7' day
GROUP BY destinationaddress
ORDER BY packetcount DESC
LIMIT 10;

Check out the links under the references section to see more details and query examples.

References

  1. Querying Amazon VPC Flow Logs — Amazon Athena
  2. https://aws.amazon.com/premiumsupport/knowledge-center/athena-analyze-vpc-flow-logs/
  3. Query flow logs using Amazon Athena — Amazon Virtual Private Cloud
  4. https://aws.amazon.com/premiumsupport/knowledge-center/athena-create-use-partitioned-tables/
  5. https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-log-records
  6. https://acloudguru.com/hands-on-labs/working-with-aws-vpc-flow-logs-for-network-monitoring

Exequiel Barrirero
binbash

Co-Founder & Director of Engineering @ binbash | AWS Community Builder 🏗️☁️