Investigating CloudTrail Logs

Ryan McGeehan
Starting Up Security
12 min read · Nov 28, 2016


These nightmares are typical in an AWS breach. Would you know how to investigate them?

  • Your caching layer will be intentionally exposed to the internet.
  • Snapshots of running instances will be shared with unknown AWS accounts.
  • Powerful instances will mine Bitcoin on your bill.

These are just a sample of the horror stories that happen when an AWS credential is stolen from your account. Take the bonsai.io breach for example:

Investigation showed a deliberate API-initiated mass-termination of all instances in One More Cloud’s AWS account.

CloudTrail is your most important resource in an AWS breach. See for yourself, from that same article:

We found CloudTrail logs, in correlation with logs from other systems, to be immensely useful in our post-incident security analysis and pursuit of attribution.

This is an incident response reference to understand an AWS breach through your CloudTrail logs.

Get Logs

Note: This is no longer the easiest method. Consider loading your logs into Athena and searching them there.

The CloudTrail logging location is fairly easy to find or enable in the console. Or, run aws cloudtrail describe-trails and it will reveal the S3 buckets being logged to. If IncludeGlobalServiceEvents is true, the CloudTrail bucket will include logs for all regions. Logs land about every 15 minutes.

To quickly peek at logs, you can aws s3 sync s3://the-cloudtrail-bucket/some/specific/day/. /some/local/directory to manually inspect logs locally. They are gzip’d JSON files; a helpful command to make them plaintext is find . -name "*.gz" | xargs gunzip. The jq tool is invaluable in reading these logs and doing basic filtering on small sets of logs.
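
For example, here’s a rough triage pass over a directory of unzipped logs. The field choices are illustrative, not a complete checklist; adjust to taste.

    # list when, what, who, and from where for every record in the synced logs
    find . -name "*.json" | xargs cat | jq -r '
      .Records[]
      | [.eventTime, .eventName, .sourceIPAddress, .userIdentity.arn // "n/a"]
      | @tsv'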

You’re going to want to have powerful search capability with CloudTrail, like an ELK cluster, Splunk, or a cloud service like Loggly or Papertrail. You can also send logs to CloudWatch, but this doesn’t help with a retroactive incident unless they’ve already been delivered. It doesn’t hurt to send them there, too, especially if you’ve configured a reasonable log retention.

Problem: Anti-Forensics

Dealing with an attack in progress? You don’t want to spend time relying on logs that are about to disappear on you, or that were unreliable to begin with.

Bad guys hate logs. Bad guys delete logs.

Daniel Grzelak has written extensively about some of these techniques.

  • Logs that land in an S3 bucket could be accessible to an attacker to delete or modify individual logs. What permissions are on the bucket? What users and roles can access the bucket?
  • An attacker could also encrypt CloudTrail logs to a key of their own, which would allow logs to continue streaming without alarm.
  • An attacker may also have the ability to disable logging for that region specifically, or kill the trails overall with CloudTrail permissions.

While investigating an incident that will rely heavily on CloudTrail logs, be sure to consider the integrity of the logs through the duration of the incident.

Answer: Protect Your Logs

These steps should make Daniel Grzelak’s article irrelevant. Most of this pertains to the permissions your IAM users and roles have. Production users and roles should not have access to CloudTrail, and vice versa.

  • Enable CloudTrail Log File Integrity. This gives you a cryptographic signature to ensure the logs you’ve received haven’t been tampered with, and it can be verified easily over the AWS CLI (see the example after this list). It also gives you a way to prove logs weren’t deleted altogether, as the digests you receive each chain to the next, which will show if any logs are missing.
  • Cryptographic integrity digests don’t actually protect anything. They only give you certainty that your logs have been ruined; they won’t stop them from being ruined. Minimize access to the S3 bucket that CloudTrail writes to, to prevent destruction of logs.
  • Minimize access to the CloudTrail API. If a key or role is compromised with write access to any CloudTrail API actions (DeleteTrail, StopLogging, UpdateTrail), an attacker can quietly disable or redirect your logging.
  • Pull logs into a centralized store like ELK / Splunk / Loggly / Papertrail. Completely segment it from your production environment. If you’re using a Lambda function to process these logs, make sure that function cannot be tampered with by your production users and roles.
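
Verifying integrity is a one-liner once digests are enabled. The trail ARN, account number, and start time below are placeholders for your own.

    # verify log file integrity for a window of time (trail ARN is a placeholder)
    aws cloudtrail validate-logs \
      --trail-arn arn:aws:cloudtrail:us-east-1:111122223333:trail/my-trail \
      --start-time 2016-11-01T00:00:00Z
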
Someday I hope to investigate something with an actual magnifying glass

Investigating Logs

CloudTrail is very straightforward until you get into assumed roles or see abuse with cross account activity.

If a malicious API event was run by an IAM user with an Access Key / Secret, it will just be a matter of finding out where the key was used or stolen from: an employee’s laptop, a script, an environment variable. This is narrowed down by the sourceIPAddress and userAgent that created the log.
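
If you already know which access key was compromised, a quick filter like the one below (using AWS’s documented example key ID as a stand-in) pulls out every record it touched and where it came from.

    # show everything a suspect access key did, and from where
    cat *.json | jq -r '
      .Records[]
      | select(.userIdentity.accessKeyId == "AKIAIOSFODNN7EXAMPLE")
      | [.eventTime, .eventName, .sourceIPAddress, .userAgent]
      | @tsv'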

Assumed roles are more complicated. For roles assumed by IAM users, you’ll first have to track down the corresponding log from the IAM user that assumed the role (with AssumeRole) and figure out how their credentials were initially stolen. For malicious activity caused by roles assumed by an EC2 instance, you’ll have to investigate the host for a breach. For Lambda functions, there will likely be an application vulnerability associated with the function’s source code allowing someone to behave badly.

Neither a Lambda function nor an EC2 instance will have a corresponding AssumeRole log like an IAM user will have. Instead, they seem to silently acquire their credentials.
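
When an IAM user is involved, hunting for the AssumeRole call that minted the temporary credentials looks something like this. The role name is hypothetical, and the responseElements field names are what I’ve seen in AssumeRole records; treat them as an assumption and check against your own logs.

    # find who assumed a suspect role, and the temporary key that was issued
    cat *.json | jq -r '
      .Records[]
      | select(.eventName == "AssumeRole")
      | select((.requestParameters.roleArn // "") | test("my-suspect-role"))
      | [.eventTime, .userIdentity.arn // "n/a", .sourceIPAddress,
         .responseElements.credentials.accessKeyId // "n/a"]
      | @tsv'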

Reference Notes

These fields are all pulled from official documentation here, and here, with commentary for common incident response scenarios. These are written with the assumption that something may be already wrong about a log you’re looking at. None of these indicate bad behavior without a greater context of your incident.

Please drop me a comment if anything changes in the future or if there’s room for improvement.

eventName

This is the most important part of the record to glance at; it describes the action taken in the API. There are specific naming conventions to get used to, but generally all changes to an account have an obvious prefix like Create*, Put*, Delete*, etc. Most of the read only actions are Describe*, Get*, List*, etc.

One specific exception: EC2’s DescribeInstanceAttribute can read the userData attribute of an instance. This may contain secrets that are meant to be passed to instances and are useful to an attacker. So while it may be a “read only” action, it may reveal data to an attacker that allows them to move laterally.
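
As a rough first pass, you can filter on the mutating prefixes plus that one exception. The prefix list here is illustrative, not exhaustive.

    # surface likely changes, plus the userData read exception
    cat *.json | jq -r '
      .Records[]
      | select((.eventName | test("^(Create|Put|Delete|Update|Modify|Attach|Detach|Authorize|Run|Terminate)"))
          or .eventName == "DescribeInstanceAttribute")
      | [.eventTime, .eventName, .userIdentity.arn // "n/a"]
      | @tsv'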

awsRegion

You’ll want to keep an eye out if bad behavior is happening in a region you don’t normally use. You can baseline your common regions and alert on anything happening outside of them. Reminder: have CloudTrail enabled in all of your regions.
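
A quick inventory of which regions show up in your logs at all is a cheap way to build that baseline.

    # count events per region; anything unexpected deserves a look
    cat *.json | jq -r '.Records[].awsRegion' | sort | uniq -c | sort -rn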

sourceIPAddress

This field can have DNS entries for AWS resources, or forward along the relevant IP address when something like the AWS console is used. For instance, if someone logs into the console and makes a bunch of changes, it won’t provide the IP address of whatever host the console itself runs on, but will instead forward the IP address of the browser accessing the console.

  • If an employee is compromised and their console access is hijacked, this will be an important field to tell you if the attacker is remotely accessing them (credential theft) or if the malicious access is local to their host (malware).
  • If a key has leaked and actions are taken off your network, this IP will likely be from some random ISP.
  • If a key is being used maliciously within your own infrastructure, you’ll see an internal IP and it may indicate another compromised host.

Another great alerting mechanism is to use the AWS Config service to receive an inventory of all ENIs and EIPs in your account. A pretty simple script can generate a list of known IP addresses in your account, and allow you to quickly determine whether an IP address is known, or whether it potentially belongs to a malicious AWS account since it isn’t attached to any resource of yours.
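
If you just want a quick list without setting up Config, something like this (run per region) gets you most of the way. The --query paths are what I’d expect from the EC2 API; double check them against your CLI version.

    # private and public IPs on your network interfaces
    aws ec2 describe-network-interfaces \
      --query 'NetworkInterfaces[].[PrivateIpAddress, Association.PublicIp]' --output text
    # Elastic IPs allocated to the account
    aws ec2 describe-addresses --query 'Addresses[].PublicIp' --output text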

userAgent

The user agent of a malicious client may be a dead giveaway and an IOC in a breach. In my decade of security work, most adversaries fail to mimic the look and feel of the clients they spoof. They misspell them, format them improperly, use old versions or none at all, put their hacker handle in them out of a compulsive need to boost their self-esteem, or mess them up in some other humiliating way.

That said, keep an eye out for a userAgent that directly identifies an attacker. Even so-called “sophisticated” APT groups still mess this up. History shows that they always will.

You’ll find a lot of less sophisticated attackers that exploit leaked keys using tools like S3 Browser, or ElasticWolf. These are clearly identified in the userAgent field and are a simple alert.

If you’re reading this for alerting ideas: you may have a finite set of userAgents in your environment and can try alerting on drastically new ones, while allowing versions to change to reduce noise.
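
Enumerating what’s already in your logs is the starting point for that kind of alert.

    # what user agents are normal here?
    cat *.json | jq -r '.Records[].userAgent' | sort | uniq -c | sort -rn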

errorCode

In an environment that manages its errors well, this is a great sign of bad behavior. An adversary without access to a GetUserPolicy call will bang into the error codes AccessDenied and UnauthorizedOperation* when they’re forced to enumerate their access, like a noisy port scan.

The errorCode is an excellent field to create alerts with and usually the first I will implement in an incident. However, you’ll find a lot of auditing tools that generate these errors, so don’t panic if you see them.
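
To pivot on errors during an incident, group them by who is generating them and while doing what.

    # who is generating access errors, and on which calls?
    cat *.json | jq -r '
      .Records[]
      | select(.errorCode != null)
      | [.errorCode, .eventName, .userIdentity.arn // "n/a", .sourceIPAddress]
      | @tsv' | sort | uniq -c | sort -rn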

requestParameters and responseElements

These vary greatly per service, but they’re basically the parameters submitted to the API and the results received. However, responseElements only appears if something actually changed. Looking only at CloudTrail logs with responseElements can be a workaround for filtering changes if readOnly can’t be relied upon in the future.

Update: There are several “read only” calls, like DescribeTargetGroups, that return responseElements even though they didn’t modify anything.

readOnly

Don’t bother with this field. There should be a better way to filter for read versus write actions in AWS logs with the readOnly value of a CloudTrail log (since eventVersion 1.01), which is true if nothing was modified, and false if something changed.

This is not consistently implemented across AWS services, so it’s not useful for an incident responder. Filter searches for the existence of responseElements and the naming convention of eventName if you’re looking for logs that have changed an account.
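
Here’s what that filter looks like in practice against raw log files.

    # events that returned responseElements, i.e. probably changed something
    cat *.json | jq -r '
      .Records[]
      | select(.responseElements != null)
      | [.eventTime, .eventName, .userIdentity.arn // "n/a"]
      | @tsv'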

sharedEventID

When a role is assumed in your account from another AWS account, a log is fired off in both accounts, and they’re joined by a sharedEventID. This is a very useful forensic artifact. If you own both accounts, you can continue investigating a suspicious role assumption very quickly by tracking down the sharedEventID in both accounts.

In the event of a malicious AWS account, you’ll have forensic proof of the malicious access by finding the sharedEventID in the malicious AWS account’s CloudTrail logs, which you can likely get from the compromised customer (or AWS) with a warrant. It will be hard to disprove involvement (intentional or unintentional) if that log appears.
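
If you do control both accounts, pulling up both halves of the event is a simple filter. The ID below is a made-up placeholder.

    # find both halves of a cross-account event (placeholder ID)
    SHARED_EVENT_ID="ffffffff-1111-2222-3333-444444444444"
    cat *.json | jq -r --arg id "$SHARED_EVENT_ID" '
      .Records[]
      | select(.sharedEventID == $id)
      | [.eventTime, .eventName, .recipientAccountId, .userIdentity.arn // "n/a"]
      | @tsv'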

userIdentity

This section of the log relates to the “who” behind the API activity, and is important in identifying what was compromised. This field varies greatly depending on whether the identity was a role or an IAM user.

userIdentity.Type

The type of identity used will dictate the next steps of your investigation. The official documentation is here.

  • If Root was used in malicious API requests, then your adversary has full access to the account. This is in addition to what an admin IAM user can do, which means they could also modify billing, support, and the Root credentials themselves. Keep Root out of day-to-day use in your account so you can loudly alert on its usage (see the example after this list).
  • IAMUser would mean an IAM user was compromised. This means secret credentials were stolen. You can hopefully track down the systems where this specific user’s credentials lived and narrow down how they were compromised.
  • AssumedRole means that a role was assumed with the Security Token Service, and an investigation would follow up to discover what EC2 instance, Lambda function, or IAM user had permission to assume that role. The accessKeyId would be temporary, and you’ll have to find the corresponding AssumeRole that granted it. Finding that log would be an important next step in an investigation.
  • A FederatedUser investigation is similar to AssumedRole, except the temporary credentials were granted through federation (GetFederationToken).
  • If AWSAccount appears, the request came from a different AWS account altogether, hopefully one that you own. Role assumption is complicated to follow in cross account scenarios. If this role was not assumed in a malicious AWS account, you’ll be able to link the role assumption between both accounts with the sharedEventID field. This field will appear in the CloudTrail logs for both accounts. If you are ultimately investigating a backdoor, the other account may be fraudulent or compromised, and you’ll have to get a warrant or AWS support to continue investigating the malicious account. In your own account, you’ll have to find the backdoor permissions on the assumed role that allowed another account to assume it.
  • SAMLUser appears when the request was made with a SAML assertion. You may need to roll an investigation forward into wherever identity is federated for AWS (like Okta, OneLogin, etc).
  • WebIdentityUser is used when the request is made by a web identity federation provider, and may involve actual customers or users. For instance, temporary credentials might be given to a user so they’ll have permission to access an S3 bucket. These are highly likely to be very low trust users, like customers or users of an application that have authenticated with Facebook or a mobile OAuth flow.
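
Alerting on Root usage is about the simplest CloudTrail rule there is; as a one-off search over raw files it looks like this.

    # any Root activity at all is worth explaining
    cat *.json | jq -r '
      .Records[]
      | select(.userIdentity.type == "Root")
      | [.eventTime, .eventName, .sourceIPAddress, .userAgent]
      | @tsv'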

userIdentity.PrincipalID

This is not as well documented by AWS as it should be, though it’s very important. These are unique identifiers that are affiliated with objects (similar to an ARN). Their prefixes give a hint as to what the object is: for instance, AIDA followed by an identifier is an IAM user, and AROA followed by an identifier is a role. I haven’t been able to find public documentation on all of these identifiers.

In the event that temporary credentials were used (from AssumeRole) you’ll see an ID for the session name of the role assumption after this identifier. The session name is chosen by the client assuming the role, so an attacker can choose something confusing or misleading if they want to.

When an EC2 Instance assumes the role, the instance id will be appended after this as well (like: AROA[ROLE IDENTITY]:i-[INSTANCEID]), and there will not be an AssumeRole log associated with it. With Lambda, the name of the function assuming the role will be appended.
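
Listing the principalIds that show up in assumed-role activity makes those suffixes easy to eyeball: everything after the colon is the session name, instance ID, or function name.

    # unique assumed-role principalIds: AROA...:<session / instance / function>
    cat *.json | jq -r '
      .Records[]
      | select(.userIdentity.type == "AssumedRole")
      | .userIdentity.principalId' | sort -u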

userIdentity.ARN

These are Amazon Resource Names and are well documented.

userIdentity.accountID

For a company operating out of a single AWS account, this should always be the same in your logs, unless you’ve been backdoored and a resource from a malicious AWS account is assuming a role in your own account. So, it may make sense to alert when new AWS accounts appear in your logs. You may see innocent behavior from external services like Evident.io or Cloudsploit that you’ve set up, but you can easily whitelist these after an initial pass.

If you’re in a massive environment with tens or hundreds of accounts, this will obviously make alerting much tougher, and you’ll have to spend time documenting which account IDs are your own to surface any new, potentially malicious ones.
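
Counting the account IDs that appear in your logs is the fastest way to spot a stranger.

    # which AWS accounts show up in these logs?
    cat *.json | jq -r '.Records[].userIdentity.accountId // empty' | sort | uniq -c | sort -rn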

userIdentity.accessKeyId

These are either permanent credentials for root or an IAM user, or temporary STS credentials being used to assume a role. Permanent keys seem to have a prefix of AKIA, and temporary credentials seem to have the ASIA prefix, but this is not documented and may see changes in the future.

userIdentity.sessionContext

This element should only exist with assumed roles. It has details about the session that will be important in putting together a timeline that continues an investigation, since it will tell you exactly when the session started (and where STS activity should appear in CloudTrail to grant the session).

Additionally, MFA is noted here. If MFA was enabled and abuse still occurred, you may have malware on the endpoint or system hosting the second factor, a malicious or incompetent insider, or an AWS MFA exploit (least likely).
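
Whether a session was MFA-backed is buried under sessionContext; the attribute names below match what I’ve seen in assumed-role records, but verify them against your own logs.

    # list sessions, whether MFA was used, and who issued them
    cat *.json | jq -r '
      .Records[]
      | select(.userIdentity.sessionContext != null)
      | [.eventTime, .eventName,
         .userIdentity.sessionContext.attributes.mfaAuthenticated // "unknown",
         .userIdentity.sessionContext.sessionIssuer.arn // "n/a"]
      | @tsv'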

userIdentity.sessionContext.sessionIssuer

This is more useful data to understand the specific object that gave credentials to assume the role. This is most likely a role, but could also be Root or an IAM user when GetFederationToken was used. The accountID will be important to note here.

userIdentity.invokedBy

This is the name of an AWS backend service that may have triggered the API call. These are rarely malicious and are generally just noisy services like AWS Config, Auto Scaling, or Elastic Beanstalk showing up in your logs. It may be interesting to consider what an attacker could accomplish while hiding behind an invoked service, but I haven’t seen this in any incidents I’ve worked.

Conclusion

CloudTrail logs are a useful tool before, during, and after an incident. Turn them on, secure them, and make them accessible for investigations and troubleshooting.

@magoo

I’m a security guy, former Facebook, Coinbase, and currently an advisor and consultant for a handful of startups. Incident Response and security team building is generally my thing, but I’m mostly all over the place.
