
AWS Logs Insights

Amit Singh Rathore
Published in Nerd For Tech
3 min read · Dec 9, 2020


AWS CloudWatch Logs Insights is an SQL-like interactive tool for querying, analysing, and visualising log data from CloudWatch. CloudWatch logs can be VPC flow logs, CloudTrail logs, contact flow logs, RDS logs, service-specific logs, or custom application logs.

Logs Insights has a custom query language that is quite similar to SQL. Its commands map roughly to SQL as follows:

display → select
fields → attributes/columns available for display
filter → where
stats → group by
sort → order by
limit → limit

Unlike SQL, where select … from … where … form a single statement, Logs Insights splits them into individual commands. To combine them, we use the Unix-style pipe (|).

Every log event comes with system fields generated by Logs Insights, including @log, @logStream, @timestamp, @message, and @ingestionTime.

Let us see one basic query.

fields @log, @logStream, @timestamp, @message
| sort @timestamp desc
| limit 100

The above query returns up to 100 records in descending order of timestamp, each with the log group name, log stream name, timestamp of the log event, and the actual message.

Many logs generated by AWS services, such as CloudTrail logs, are in JSON format. Logs Insights discovers their fields automatically, so you can access individual attributes directly.
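Because a query is just a sequence of commands joined by pipes, it can also be assembled programmatically. The following is a minimal Python sketch of that idea; the helper name `build_query` is hypothetical and not part of any AWS SDK.

```python
def build_query(*commands: str) -> str:
    """Join individual Logs Insights commands into one piped query string."""
    # Strip stray whitespace and skip empty entries so optional stages can be passed as "".
    stages = [c.strip() for c in commands if c and c.strip()]
    return "\n| ".join(stages)


query = build_query(
    "fields @log, @logStream, @timestamp, @message",
    "sort @timestamp desc",
    "limit 100",
)
print(query)
```

Keeping each command as its own string makes it easy to add or drop a filter stage without editing one long query string by hand.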

fields eventName, eventSource, errorCode, userIdentity.arn
| filter @message like /(?i)(Exception|error|fail)/
| filter eventSource = 'sagemaker.amazonaws.com'
| limit 200

PARSE

If the log is in JSON format, we can access the individual attributes of the message directly; otherwise we need to use PARSE. PARSE accepts both glob expressions and regular expressions.

Let's say you have a string field in the log:

config_rule_name: AWS-007-EBSEncryption

If you want to extract the individual elements from the name, the query will be:

fields @message
| PARSE Details.config_rule_name "*-*-*" Scope, RuleId, RuleName
| filter Scope = 'AWS'
| DISPLAY @log, @logStream, Scope, RuleId, RuleName
| limit 20
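To make the glob form concrete, here is a rough Python emulation of what "*-*-*" does to the value above. It is only an illustration, not how CloudWatch implements PARSE: the function `parse_glob` is hypothetical, and it assumes each * captures as little text as possible.

```python
import re


def parse_glob(value, pattern, names):
    """Emulate a glob-style parse: each * captures one field, in order."""
    literals = pattern.split("*")
    if len(literals) - 1 != len(names):
        raise ValueError("need exactly one field name per * in the pattern")
    # Escape the literal text and put a non-greedy capture group where each * was.
    regex = "(.*?)".join(re.escape(part) for part in literals)
    match = re.fullmatch(regex, value)
    return dict(zip(names, match.groups())) if match else None


fields = parse_glob("AWS-007-EBSEncryption", "*-*-*", ["Scope", "RuleId", "RuleName"])
print(fields)
```

For the sample value this yields Scope = AWS, RuleId = 007, and RuleName = EBSEncryption, which are the three fields the query then filters on and displays.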

Using the API to query:

The console is a good place to start querying logs, but for automation we should generally make API calls. Let us see how to work with the Logs Insights API.

Querying logs is a two-step process: first start the query, then, once it is complete, get the results. This flow can be implemented in Python.
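One way to sketch the two-step flow is with boto3's CloudWatch Logs client, whose `start_query` and `get_query_results` calls map onto the two steps. The function name `run_insights_query`, the polling interval, the poll limit, and the log group name are all assumptions made for illustration; the client is passed in so the polling logic can be tried without AWS access.

```python
import time


def run_insights_query(client, log_group, query, start_time, end_time,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query, poll until it finishes, return rows as dicts."""
    # Step 1: start the query; startTime and endTime are epoch seconds.
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=int(start_time),
        endTime=int(end_time),
        queryString=query,
    )["queryId"]

    # Step 2: poll until the query leaves the Scheduled/Running states.
    for _ in range(max_polls):
        response = client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [{c["field"]: c["value"] for c in row} for row in response["results"]]
        if status not in ("Scheduled", "Running"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete after {max_polls} polls")


def example():
    """Usage sketch; needs boto3 installed plus AWS credentials and a region."""
    import boto3  # third-party AWS SDK, imported lazily so the helper above has no hard dependency

    now = int(time.time())
    return run_insights_query(
        boto3.client("logs"),
        log_group="/hypothetical/log-group",  # placeholder, replace with your own log group
        query="fields @timestamp, @message | sort @timestamp desc | limit 20",
        start_time=now - 3600,  # the last hour
        end_time=now,
    )
```

Calling `example()` runs the query against the placeholder log group over the last hour. Capping the number of polls keeps an automation job from hanging if a query never reaches a terminal status.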

Note: The query results are not real-time. There is a delay between an event occurring and its log being pushed to CloudWatch. The query itself can also take longer as it scans more data.

Let us see some queries for different types of logs.

RDS Logs:

CloudTrail Logs:

fields eventSource, errorCode
| filter errorCode =~ /^(?i)\w/
| stats count(*) as eventCount by eventSource, errorCode
| sort eventCount desc
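For readers more familiar with Python than with the stats command, the same aggregation can be sketched locally. The records below are invented purely for illustration, and the filter is approximated as "errorCode is present and non-empty" rather than the exact regular expression used in the query.

```python
from collections import Counter

# Hypothetical CloudTrail-like records, invented for illustration only.
events = [
    {"eventSource": "s3.amazonaws.com", "errorCode": "AccessDenied"},
    {"eventSource": "s3.amazonaws.com", "errorCode": "AccessDenied"},
    {"eventSource": "ec2.amazonaws.com", "errorCode": "UnauthorizedOperation"},
    {"eventSource": "s3.amazonaws.com"},  # no errorCode, dropped by the filter
]

# filter: keep only records that carry a non-empty errorCode
failed = [e for e in events if e.get("errorCode")]

# stats count(*) as eventCount by eventSource, errorCode
counts = Counter((e["eventSource"], e["errorCode"]) for e in failed)

# sort eventCount desc
for (source, code), event_count in counts.most_common():
    print(source, code, event_count)
```

Each distinct (eventSource, errorCode) pair becomes one output row, which is the same shape of result the stats … by … command returns.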

By default, data plane events (e.g. Lambda invocations, S3 GETs, CloudWatch PutMetricData calls) are not logged in CloudTrail at all. S3 and Lambda data plane events can be captured by explicitly enabling data event logging.

All API calls that are performed via a VPC endpoint and are Denied are NOT logged to CloudTrail at all.

VPC Flow Logs:

filter interfaceId =~ /<eni-id1>|<eni-id2>|<eni-id3>/
| filter dstPort = 443
| fields @timestamp, dstAddr, srcAddr, dstPort
| stats count(srcAddr) by srcAddr

Happy Searching!!

