Netflix Cloud Security: Detecting Credential Compromise in AWS

Will Bengtson, Netflix Security Tools and Operations

Credential compromise is an important concern for anyone operating in the cloud. The problem becomes more obvious over time, as organizations continue to adopt cloud resources as part of their infrastructure without maintaining an accompanying capability to observe and react to these compromises. The associated impacts to these compromises vary widely as well. Attackers might use this for something as straightforward as stealing time and CPU on your instances to mine Bitcoin, but it could be a lot worse; credential compromise could lead to deletion of infrastructure and stolen data.

We on the Netflix Security Tools and Operations team want to share a new methodology for detecting temporary security credential use outside of your AWS environment. Consider your AWS environment to be “all those AWS resources that are associated with your AWS accounts.”

Advantages

  • You’ll be able to detect API Calls with AWS EC2 temporary security credentials outside of your environment without any prior knowledge of your IP allocations in AWS.
  • You’ll go from zero to full coverage in six hours or less.
  • The methodology can be applied in real time, as well as on historical AWS CloudTrail data, to determine potential compromise.

Scope

In this post, we’ll show you how to detect compromised AWS instance credentials (STS credentials) outside of your environment. Keep in mind, however, that you could do this with other temporary security credentials, such as ECS, EKS, etc.

Why is this useful?

Attackers understand where your applications run, as well as common methods of detecting credential compromise. When attacking AWS, attackers will often try to use your captured AWS credentials from within their AWS account. Perhaps you’re already paying attention to invocations of the “dangerous” AWS API calls in your environment — which is a great first step — but attackers know what will get your attention, and are likely to try innocuous API calls first. The obvious next step is to determine if API calls are happening from outside of your environment. Right now, because the more general AWS IP space is well-known, it’s easy to detect if API calls originate from outside of AWS. If they originate from AWS IPs other than your own, however, you’ll need some extra magic. That’s the methodology we’re publicizing here.

How does it work?

To understand our method, you first need to know how AWS passes credentials to your EC2 instance, and how to analyze CloudTrail entries record by record.

We’ll first build a data table of all EC2 assumed role records, culled from CloudTrail. Each table entry shows the instance ID, assumed role, IP address of the API call, and a TTL entry (the TTL helps keep the table lean). We can quickly determine if the caller is within our AWS environment by examining the source IP address of the API call from an instance.

Assume Role

When you launch an EC2 instance with an IAM Role, the AWS EC2 service assumes the role specified for the instance and passes those temporary credentials to the EC2 metadata service. This AssumeRole action appears in CloudTrail with the following key fields:

eventSource: sts.amazonaws.com
eventName: AssumeRole
requestParameters.roleArn: arn:aws:iam::12345678901:role/rolename
requestParameters.roleSessionName: i-12345678901234
sourceIPAddress: ec2.amazonaws.com

We can determine the Amazon Resource Name (ARN) for these temporary instance credentials from this Cloud Trail log. Note that AWS refreshes credentials in the EC2 metadata service every 1–6 hours.

When we see an AssumeRole action by the EC2 service, let’s store it in a table with the following columns:

Instance-Id
AssumedRole-Arn
IPs
TTL

We can get the Instance-Id from the requestParameters.roleSessionName field. For each AssumeRole action, let’s check to see if a row already exists in our table. If not, we will create one.

If the row exists, let’s update the TTL to keep it around. At this point we update the TTL; since the instance is still up and running, we don’t want this entry to expire. A safe TTL in this case is 6 hours, due to the periodic refreshing of instance credentials within AWS, but you may decide to make it longer. You can construct the AssumedRole-Arn by taking the requestParameters.roleArn and requestParameters.roleSessionName from the AssumeRole CloudTrail record.

For example, the resulting AssumedRole-Arn for the above entry is:

arn:aws:iam::12345678901:assumed-role/rolename/i-12345678901234

This AssumeRole-Arn becomes your userIdentity.arn entry in CloudTrail for all calls that use these temporary credentials.

AssumedRole Calls

Now that we have a table of Instance-IDs and AssumeRole-ARNs, we can start analyzing each CloudTrail record using these temporary credentials. Each instance-id/session row starts without an IP address to lock to (remember, we claimed that with this method, you won’t need to know all your IP addresses in advance).

For each CloudTrail event, we will analyze the type of record to make sure it came from an assumed role. You can do this by checking the value of userIdentity.type and making sure it equals AssumedRole. If it is AssumedRole, we will grab the userIdentity.arn field which is equivalent to the AssumeRole-Arn column in the table. Since the userIdentity.arn has the requestParameters.roleSessionName in the value, we can extract the instance-id and do a lookup in the table to see if a row exists. If the row exists, we then check to see if there are any IPs that this AssumeRole-Arn is locked to. If there aren’t any, then we update the table with the sourceIPAddress from the record and this becomes our IP address that all calls should come from. And here’s the key to the whole method: If we see a call with a sourceIPAddress that doesn’t match the previously observed IP, then

we have detected a credential being used on an instance other than the one to which it was assigned, and we can assume that credential has been compromised.

For CloudTrail events that do not have a corresponding row in the table, we’ll just have to discard these, because we can’t make a decision without a corresponding entry in the table. However, we’ll only face this shortcoming for up to six hours, due to the way AWS handles temporary instance credentials within EC2. After that point we’ll have all of the AssumeRole entries for our environment, and we won’t need to discard any events.

Edge Cases

To prevent false positives, you’ll want to consider a few edge cases that impact this approach:

  • For certain API calls, AWS will use your credentials and make calls on your behalf.
  • If you have an AWS VPC Endpoint for your service, calls to these will show up in the logs as associated with a private IP address.
  • If you attach a new ENI or associate a new address to your instance, you’ll see additional IPs for that AssumedRole-Arn show up in CloudTrail.

AWS Makes Calls on Your Behalf

If you look in your CloudTrail records, you may find that you see a sourceIPAddress that shows up as <servicename>.amazonaws.com outside of the AssumeRole action mentioned earlier. You will want to account for these appearing and trust AWS in these calls. You might still want to keep track of these and provide informational alerting.

AWS VPC Endpoints

When you make an API call in a VPC that has a VPC endpoint for your service, the sourceIPAddress will show up as a private IP address instead of the public IP address assigned to your instance or your VPC NAT Gateway. You will most likely need to account for having a [public IP, private IP] list in your table for a given instance-id/AssumeRole-Arn row.

Attaching a New ENI or Associating a New Address to Your Instance

You might have a use case where you attach additional ENI(s) to your EC2 instance or associate a new address through use of an Elastic IP (EIP). In these cases, you will see additional IP(s) show up in CloudTrail records for your AssumedRole-Arn. You will need to account for these actions in order to prevent false positives. One way to address this edge case is to inspect the CloudTrail records which associate new IPs to instances and create a table that has a row for each time a new IP was associated with the instance. This will account for the number of potential IP changes that you come across in CloudTrail. If you see a sourceIPAddress that does not match your lock IP, check to see if there was a call that resulted in a new IP for your instance. If so, add this IP to your IP column in your AssumeRole-Arn table entry, remove the entry in the additional table where you track associations, and do not alert.

Attacker Gets to it First

You might be asking the question: “Since we set the lock IP to the first API call seen with the credentials, isn’t there a case where an attacker’s IP is set to the lock IP?” Yes, there is a slight chance that due to this approach you add an attacker’s IP to the lock table because of a compromised credential. In this rare case, you will detect a “compromise” when your EC2 instance makes its first API call after the lock of the attacker’s IP. To minimize this rare case, you might add a script that executes the first time your AWS instance boots and makes an AWS API call that is known to be logged in CloudTrail.

Summary

The methodology we’ve shared here requires a high level of familiarity with CloudTrail, and how AssumeRole calls are logged. However, there are several advantages, including scalability, as your AWS environment grows and your number of accounts increases, and simplicity, since with this method you needn’t maintain a full list of IP addresses allocated to your account. Do bear in mind the “defense in depth” truism: this should only constitute one “layer” of your security tactics in AWS.

Be sure to let us know if you implement this, or something better, in your own environment.

Will Bengtson, for Netflix Security Tools and Operations

Like what you read? Give Netflix Technology Blog a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.