Diving into the new CloudTrail Lake

Michael Kandelaars
Mar 12, 2022 · 11 min read


One of the first big announcements from AWS in 2022 was CloudTrail Lake, which launched on January 5. I finally got a chance to have a play around with it and wanted to share my thoughts and experiences here.

But first let’s start with a quick refresher on what CloudTrail does, which is essential to grasping what benefits CloudTrail Lake brings to the table and whether you should consider it for your environment.

CloudTrail in a nutshell

To quote directly from the AWS documentation,

AWS CloudTrail enables auditing, security monitoring, and operational troubleshooting. CloudTrail records user activity and API usage across AWS services as Events. CloudTrail Events help you answer the questions of “who did what, where, and when?”

CloudTrail records two types of events: Management events capturing control plane actions on resources such as creating or deleting Amazon Simple Storage Service (Amazon S3) buckets, and data events capturing data plane actions within a resource, such as reading or writing an Amazon S3 object.

Considering that every interaction you and your users have with AWS is through API calls, being able to track every user action means you have full auditability and traceability of changes to your AWS environment.

To look at a more practical example, let’s say we know that an EC2 instance was terminated in our AWS account and we want to find out who did it. CloudTrail gives us the data to discover this: a search for the event name TerminateInstances shows us that the user michael did this on February 19.

Looks like it was me all along!

Now on the face of it you might be wondering, “This all sounds perfect already, what else could I possibly need?”, and this is where we need to dive into the detail a bit more.

The nice little interface shown above to search for the TerminateInstances event is the Event History feature of CloudTrail, which unfortunately has some big limitations.

Limitation #1 — Only stored for 90 days
Yep, only 90 days.

Limitation #2 — Only management events are captured
CloudTrail management events like the TerminateInstances call mentioned above are captured in the Event History; however, data events, such as every read or write to an Amazon S3 bucket or DynamoDB table, are not.

Limitation #3 — Event History search is limited
Let’s say we had 1,000 TerminateInstances calls in the last day and we wanted to find a very specific one by another field, say userAgent or sourceIPAddress. We’d either have to use the Download events feature and manually sift through the data on our local machine, or start looking at alternatives like Amazon Athena, which I’ll touch on in a minute.

The best way to start solving these problems is to create a CloudTrail trail that sends the CloudTrail logs to either S3 or CloudWatch for archiving purposes. The first CloudTrail trail you create in your account (or AWS Organization) for management events is free (except for storage costs), so unless the limitations of Event History genuinely don’t matter to you, or cost is a very real concern, you should absolutely enable a trail to capture all management events in your AWS Organization across all regions.

Ok, so now that you’ve solved the first problem and have more than 90 days’ worth of CloudTrail logs stored, you have a new problem: every day your S3 bucket is going to accumulate hundreds of gzipped CloudTrail log files, so you need a way to search through them whenever you’re after something beyond 90 days.

Until the launch of CloudTrail Lake, the standard AWS-native solution to this problem has been to use Athena to search through them. This can be set up either by clicking the shiny Create Athena table button on the Event History page, or through your favorite IaC tool.
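To give a feel for what this looks like in practice, here’s a rough sketch of an Athena query against the trail’s logs. I’m assuming a table named cloudtrail_logs (the console-generated name will differ) and an example source IP, so adjust both to your environment:

SELECT eventtime, eventname, useridentity.username, sourceipaddress, useragent
FROM cloudtrail_logs
WHERE
    eventname = 'TerminateInstances' AND
    sourceipaddress = '203.0.113.42'

This solves the search limitation nicely, but note that without partitioning, Athena scans every log object on each query, so searches get slower and more expensive as the trail grows.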

This stack of CloudTrail trail -> S3 and Athena for searching isn’t a bad pattern, but there are quite a few considerations that need to be made to ensure the immutability and security of the data. For example, for the S3 bucket you need to think about the encryption of the data, setting up KMS keys and policies, ensuring that the log files can’t be tampered with or deleted, lifecycle rules for the objects, and more. Similarly for Athena you have other considerations like correctly configuring the workgroup, setting up the S3 bucket to store query results, thinking about the encryption of the query results, etc.

It’s not an insurmountable list of things to consider to make this stack audit-ready, but it is work, especially if you’re doing it for the first time or don’t have automation scripts to configure it all. If only there were an easier way…

AWS CloudTrail Lake

To quote directly from the product announcement:

AWS CloudTrail Lake is designed to be a fully managed solution for capturing, storing, accessing, and analyzing user and API activity on AWS. It is a managed data lake for audit and security information, enabling you to aggregate, immutably store your activity logs (control plane and data plane) for up to 7 years, and query logs within seconds for search and analysis. IT auditors can use CloudTrail Lake as an immutable record of all activities to meet audit requirements.

In summary, it’s designed to solve all the problems listed above with the out-of-the-box Event History, as well as those of the established S3/Athena pattern for storage and query.

So let’s test it out!

Setting it up

The setup is an absolute breeze, as shown in the following three screenshots from the console. The configuration options are quite minimal, but this is simply because so much is taken care of for you. No need to worry about S3 buckets, managing custom keys for encryption, ensuring log file integrity, etc. You just need to configure how long you want to store the logs and which events to capture, and away you go.

[Screenshots: Setting up CloudTrail Lake · Choose your events · You’re done!]

If you’re like me and prefer IaC for provisioning and configuring all your AWS resources, then thanks to the hard work of the Terraform community, CloudTrail Lake already has support. Provisioning it is again quite straightforward, with the simplest example from the Terraform docs being:

resource "aws_cloudtrail_event_data_store" "example" {
name = "example-event-data-store"
retention_period = 7
}

If CloudFormation is your IaC tool of choice, unfortunately you’ll have to wait until it has support — so it’s still clickety clack in the console for you.

Ok, so I’ve set up a CloudTrail Lake, how do I search it?

Fishing in the CloudTrail Lake

The interface for searching is pretty good, and if you’re familiar with SQL then the syntax will be very familiar to you. Unlike with the Event History, you can perform much more complex queries on your data, which can be aggregated across multiple AWS accounts and Regions, making it much easier to find what you are looking for. Plus, being able to search over the entire lifetime of the trail (up to 7 years) is very powerful indeed, although this comes at a cost, which I’ll touch on soon.

First a very simple example, searching for all events within a specific date range.

SELECT eventID, eventName, eventSource, eventTime, userIdentity.principalid
FROM $EDS_ID_HERE
WHERE
    eventTime > '2022-03-08 00:00:00' AND
    eventTime < '2022-03-08 23:59:59'

What really shines here is when you are looking for data with multiple conditions, or for specific values within JSON. For example, let’s say you wanted to find out who changed Termination Protection on an EC2 instance. The eventName for this API call is ModifyInstanceAttribute, but you also need to look for the attribute disableApiTermination, which is found within requestParameters. The following query finds this data.

SELECT eventID, eventName, eventSource, eventTime, userIdentity.principalid
FROM $EDS_ID_HERE
WHERE
    eventTime > '2022-03-08 00:00:00' AND
    eventTime < '2022-03-08 23:59:59' AND
    eventName = 'ModifyInstanceAttribute' AND
    element_at(requestParameters, 'attribute') = 'disableApiTermination'

It was me again!
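To give a taste of the more aggregate-style analysis this opens up, here’s a hypothetical sketch (again using the event data store ID placeholder) that counts TerminateInstances calls per source IP over a full year, which is exactly the kind of question Event History can’t answer:

SELECT sourceIPAddress, count(*) AS terminations
FROM $EDS_ID_HERE
WHERE
    eventTime > '2021-03-08 00:00:00' AND
    eventName = 'TerminateInstances'
GROUP BY sourceIPAddress
ORDER BY terminations DESC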

So what’s the catch?

This is all quite powerful, and while the same level of functionality can be achieved through other means (e.g. Athena), the simplicity of setting it up makes it very attractive. The big downside, though, is the cost.

As mentioned earlier with regular CloudTrail trails, the first trail of management events is free and is therefore an easy decision to turn on in your environment. The same is not true for CloudTrail Lake.

The two key differences are that there is no distinction in pricing between management and data events, and that you pay per GB of data ingested rather than per 100,000 events. S3 storage costs for regular CloudTrail trails are calculated separately (more on that further below).

CloudTrail Trail Pricing
Management events: first copy of management events is delivered free; additional copies are $2.00 per 100,000 management events delivered
Data events: $0.10 per 100,000 data events delivered

CloudTrail Lake Pricing
First 5TB: $2.50 per GB ingested
Next 20TB: $1.00 per GB ingested
Over 25TB: $0.50 per GB ingested

So which is cheaper? The answer actually really surprised me.

Pricing Deep Dive

For management events you can’t beat free, but let’s take a look at data events. To compare, I took a very minimal data event, a DynamoDB Scan call, which in my test came to 1,649 bytes (this will vary depending on many factors, but it’s a good ballpark figure to play with). For 100,000 calls, that’s 164,900,000 bytes, or 164.9MB. At $2.50 per GB, this works out to $0.41 for those 100,000 events, compared to $0.10 for a regular CloudTrail trail. At the cheapest tier, once we’re ingesting over 25TB of data and paying $0.50 per GB, we pay $0.08 and dip below the $0.10 price point. But remember, this is a very minimal data event. Once we start looking at larger events, such as actual writes of data to DynamoDB where each event can go up to 256KB, the cost comparison is really not in CloudTrail Lake’s favor.
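Spelling that arithmetic out (using my measured 1,649-byte event, so treat the figures as indicative rather than exact):

100,000 events × 1,649 bytes = 164,900,000 bytes ≈ 0.1649 GB
0.1649 GB × $2.50 per GB = $0.41 (first 5TB tier)
0.1649 GB × $0.50 per GB = $0.08 (over 25TB tier)
Regular trail: $0.10 flat per 100,000 data events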

But what about S3 storage costs? CloudTrail Lake pricing includes data storage for up to 7 years, so $2.50 per GB over 7 years works out to $0.029 per GB a month, or at the cheapest tier of $0.50 per GB, $0.0059 a month, which incidentally is a bit more than the cost of S3 Glacier Instant Retrieval, but in a similar vein.
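As a quick sanity check on those monthly figures (taking S3 Glacier Instant Retrieval at roughly $0.004 per GB-month, the published price at the time of writing):

$2.50 per GB ÷ 84 months ≈ $0.0298 per GB per month
$0.50 per GB ÷ 84 months ≈ $0.0060 per GB per month
S3 Glacier Instant Retrieval: ~$0.004 per GB per month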

Ok, so factoring in S3 storage costs over 7 years, let’s run those numbers again on those 100,000 data events.

Storage costs for 100,000 × 1,649-byte data events stored for 7 years
CloudTrail Lake: $0.412 for 100,000 events at first-5TB pricing, or $0.082 at over-25TB pricing.
CloudTrail trail on S3 Standard storage: $0.10 for 100,000 events + $0.31 storage ($0.023 per GB-month × 0.1649GB × 84 months) = $0.41

Interesting! The results are pretty comparable at low volume, but CloudTrail Lake comes out on top at very high volume. Let’s try a data event ten times the size.

Storage costs for 100,000 × 16,490-byte data events stored for 7 years
CloudTrail Lake: $4.12 for 100,000 events at first-5TB pricing, or $0.82 at over-25TB pricing.
CloudTrail trail on S3 Standard storage: $0.10 for 100,000 events + $3.18 storage ($0.023 per GB-month × 1.649GB × 84 months) = $3.28

The regular CloudTrail trail edges in front here, but still lags massively behind once we get into over-25TB-of-logs territory.

While these two examples don’t factor in different S3 storage tiers (which also carry per-object transition costs that can actually make transitioning more expensive) or the compression of the log files, it looks like at a certain point CloudTrail Lake really comes out on top, with Glacier-like storage pricing. However, the typical use case doesn’t involve the volume of logs necessary to gain from these price benefits.

Pricing summary

For most use cases you probably only need to record management events, and you likely don’t need to keep them for 7 years, so regular CloudTrail trails -> S3 is likely your cheapest option for longer-term storage (i.e. more than 90 days). However, if you need to retain your logs for multiple years for audit purposes and also need full data events, CloudTrail Lake might actually be considerably cheaper. In that case, take the time to run the math on your environment. The results may surprise you!

Features I’d like to see in the future

The one feature that seems strangely missing is the ability to configure your CloudTrail Lake in an account other than the root account. It is a typical pattern in any modern AWS environment to have a dedicated log archive account, and several of AWS’s other security monitoring services, such as GuardDuty and Detective, support a Delegated Administrator function to shift core infrastructure out of the root account. CloudTrail Lake should offer the same.

Additionally, there is no export feature for the logs in CloudTrail Lake, so if you were to completely move away from storing your CloudTrail logs in S3, you’d lose the ability to run them through an external SIEM tool (in case you were doing that). Of course, you can still do real-time monitoring of the logs using EventBridge, but that’s for a future Medium post ;)

Then there’s CloudFormation support for provisioning CloudTrail Lake. I’m not the biggest fan of CloudFormation and much prefer Terraform, but launching an AWS product without support in AWS’s own native IaC tool is strange to me. Unfortunately it’s not unusual for CloudFormation support to lag behind, but it still shouldn’t be the case.

And of course, most importantly, a reduction in the cost. If price weren’t a consideration, turning on CloudTrail Lake would be an obvious thing to do in any AWS environment, because it really is so easy and solves a number of the classic CloudTrail problems. If the price could compete with storing CloudTrail logs in S3 at low volume, this could become the new standard for most AWS environments.

Conclusion

If you already have an established way of storing and querying your CloudTrail logs, I wouldn’t go rushing to replace it with CloudTrail Lake just yet; the additional costs likely aren’t worth it. However, if you are capturing data events at high volume and need to retain them for multiple years, it’s worth spending the time to evaluate the costs, as it may save you money in the long run.

If you’re starting in a new environment and want something quick and easy that is audit ready, this is a really neat solution and definitely worth checking out. As of writing, new customers can try CloudTrail Lake for 30 days at no additional cost (limited to 5GB of ingestion and 5GB data scanned), so why not turn it on and see if it’s a fit for you!
