Troubleshoot AWS KMS permissions for encrypted EBS-backed AMIs launched in secondary ASG

Tony Tannous
Engineering the Skies: Qantas Tech Blog
7 min read · Jul 27, 2023

In most cases, troubleshooting EC2 issues starts with reviewing system logs through the AWS console, or by connecting to an instance via SSH/Session Manager, but what happens when an instance fails to launch?

The log messages below were observed while attempting to launch an instance in an AWS “cross-account” scenario:

State transition reason: Client.InternalError
State transition message: Client.InternalError: Client error on launch

Some further background on the activities leading up to the error:

  • An encrypted EBS-backed AMI, with the following ID, had been created in a primary AWS account
"ImageId": "ami-0ad79b02b5570cdcc"
  • EBS snapshot for the AMI, in the primary account, was encrypted using a KMS Customer managed key (CMK) with ARN:
arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337
  • The AMI was shared with a secondary AWS account
  • The key policy attached to the KMS CMK in the primary account was correctly configured to grant the secondary account access to the key
  • Attempting to launch the AMI in the secondary account’s Auto Scaling group (ASG) led to the error message previously described, i.e.,
State transition reason: Client.InternalError
State transition message: Client.InternalError: Client error on launch

Note: references to resource names have been altered for obvious reasons.

It turned out the error was a result of missing KMS permissions.

OK, so what does “missing KMS permissions” mean?

This article drills down to the level of detail needed to zero in on the root cause and, finally, a resolution.

The following fictitious AWS primary/secondary accounts and resource IDs are referenced throughout the troubleshooting procedure:

╔══════════════════════════════════════════════════════════════════════════╗
║ PRIMARY ACCT                                                             ║
╠═════════════════╦════════════════════════════════════════════════════════╣
║ Account ID      ║ 444455556666                                           ║
╠═════════════════╬════════════════════════════════════════════════════════╣
║ KMS CMK ARN     ║ arn:aws:kms:us-east-1:444455556666:key                 ║
║                 ║ /12ee9c11-3476-492c-b5cc-ed4a4d636337                  ║
╠═════════════════╬════════════════════════════════════════════════════════╣
║ KMS key ID      ║ 12ee9c11-3476-492c-b5cc-ed4a4d636337                   ║
╠═════════════════╬════════════════════════════════════════════════════════╣
║ KMS alias       ║ my-cmk-key                                             ║
╠═════════════════╬════════════════════════════════════════════════════════╣
║ AMI ID          ║ ami-0ad79b02b5570cdcc                                  ║
╠═════════════════╬════════════════════════════════════════════════════════╣
║ EBS Snapshot ID ║ snap-0cbabc774a6d47618                                 ║
╚═════════════════╩════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════════╗
║ SECONDARY ACCT                                                           ║
╠═════════════════╦════════════════════════════════════════════════════════╣
║ Account ID      ║ 111122223333                                           ║
╠═════════════════╬════════════════════════════════════════════════════════╣
║ Auto scaling    ║ arn:aws:iam::111122223333:role/aws-service-role/       ║
║ service-linked  ║ autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling ║
║ role ARN        ║                                                        ║
╚═════════════════╩════════════════════════════════════════════════════════╝

Understanding the architecture components

The following diagram is a visual representation of the architecture/services we’ll touch on. It should prompt us to make a mental note: solving problems of this nature calls for “hybrid” thinking.

Green circles, numbered 1 to 6, are assigned to the various components to assist with the explanations and considerations in the sections that follow.

Each section heading includes the respective number(s), as depicted in the diagram.

(1) KMS

KMS Key

Starting with circle numbered (1):

Check that the key with ID 12ee9c11-3476-492c-b5cc-ed4a4d636337, used to encrypt the EBS-backed AMI, is a CMK.

  • Locate the key within the KMS service, and ensure it’s listed as a Customer managed key
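The same check can be scripted instead of using the console. The following is a minimal sketch using the AWS CLI `describe-key` command, which reports `KeyManager` as `CUSTOMER` for a customer managed key and `AWS` for an AWS managed key. The function name `kms_key_manager` is made up for illustration, and the sketch assumes the active credentials belong to the primary account with us-east-1 as the configured region.

```shell
# Hypothetical helper (not part of the AWS CLI): print who manages a KMS key.
# Prints "CUSTOMER" for a customer managed key, "AWS" for an AWS managed key.
# $1 = key ID or key ARN.
# Assumption: active credentials belong to the primary account.
kms_key_manager() {
  aws kms describe-key \
    --key-id "$1" \
    --query 'KeyMetadata.KeyManager' \
    --output text
}
```

For the key in this article, `kms_key_manager 12ee9c11-3476-492c-b5cc-ed4a4d636337` would be expected to print CUSTOMER.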

After confirming a CMK was used, we note down the ARN for the key:

arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337

KMS key policy

The key policy attached to the CMK should include statements that allow the secondary account access to the key, i.e.:

...
{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::111122223333:root"
    ]
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
},
{
  "Sid": "Allow attachment of persistent resources",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::111122223333:root"
    ]
  },
  "Action": [
    "kms:CreateGrant",
    "kms:ListGrants",
    "kms:RevokeGrant"
  ],
  "Resource": "*"
}
...
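To confirm what is actually attached to the key, the policy can be pulled with `aws kms get-key-policy`. The sketch below is a rough first-pass check only: it text-matches the secondary account's root principal and does not evaluate Effect, Action or Condition. The function name is illustrative, and primary-account credentials are assumed.

```shell
# Hypothetical helper: succeed if the key policy names the given account's
# root principal. $1 = key ID or ARN, $2 = 12-digit account ID.
# Crude text match only; it does not evaluate Effect, Action or Condition,
# and it misses principals written as a bare account ID.
key_policy_names_account() {
  aws kms get-key-policy \
    --key-id "$1" \
    --policy-name default \
    --query 'Policy' \
    --output text \
    | grep -q "arn:aws:iam::$2:root"
}
```

A call such as `key_policy_names_account 12ee9c11-3476-492c-b5cc-ed4a4d636337 111122223333 && echo "principal found"` only reports that the principal appears; the statements themselves still need to be read.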

(2 & 3) Check baked AMI was created using the correct KMS CMK

Numbered circles 2 and 3 in the diagram serve as a reminder to ensure the correct KMS CMK was used during AMI creation.

For our sample AMI ami-0ad79b02b5570cdcc, the KMS key ID of the backing snapshot should match the key noted in the previous section.
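One way to look up the key from the command line is to resolve the AMI's snapshot IDs and then read each snapshot's `KmsKeyId`. This is a sketch under assumptions: the function name is hypothetical, the credentials are assumed to belong to the account that owns the snapshots (the primary account), and the AMI is assumed to have at least one EBS-backed volume.

```shell
# Hypothetical helper: print the KMS key ARN(s) behind an AMI's EBS snapshots.
# $1 = AMI ID.
# Assumption: credentials for the account that owns the snapshots.
ami_kms_key_ids() {
  snapshot_ids=$(aws ec2 describe-images \
    --image-ids "$1" \
    --query 'Images[].BlockDeviceMappings[].Ebs.SnapshotId' \
    --output text)
  # Unquoted on purpose: text output separates multiple IDs with whitespace.
  aws ec2 describe-snapshots \
    --snapshot-ids $snapshot_ids \
    --query 'Snapshots[].KmsKeyId' \
    --output text
}
```

For ami-0ad79b02b5570cdcc, the value printed should be the CMK ARN listed in the primary account table.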

(4) Ensure the AMI has been shared

Check the AMI has been shared with the secondary account.

Explicitly sharing the snapshot is not required.

If the AMI is not already shared, run the following to share the AMI with the secondary account:

$ aws ec2 modify-image-attribute \
--image-id ami-0ad79b02b5570cdcc \
--launch-permission "Add=[{UserId=111122223333}]"
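Whether the share is in place can be verified by reading the AMI's launch permissions. The following is a small sketch using `aws ec2 describe-image-attribute`; the function name is illustrative and the call is assumed to run with the AMI owner's (primary account) credentials.

```shell
# Hypothetical helper: succeed if the AMI's launch permissions list the account.
# $1 = AMI ID, $2 = 12-digit account ID.
# Assumption: run as the AMI owner (primary account).
ami_shared_with() {
  aws ec2 describe-image-attribute \
    --image-id "$1" \
    --attribute launchPermission \
    --query 'LaunchPermissions[].UserId' \
    --output text \
    | tr '\t' '\n' \
    | grep -qx "$2"
}
```

For example, `ami_shared_with ami-0ad79b02b5570cdcc 111122223333 || echo "not shared"` prints a message only when the secondary account is missing from the list.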

Checkpoint

At this stage, it would seem that almost everything is covered. However, attempting to launch the AMI in the secondary account’s ASG still leads to the Client.InternalError: Client error on launch message described earlier.

Following the suggested troubleshooting tip does not lead to any further clues:

$ aws --profile sandbox_admin ec2 describe-instances \
--instance-ids i-08eac0xxxxxxxxxxx \
--region us-east-1

Output:

{
  "Reservations": [
    {
      "Groups": [],
      "Instances": [
        {
          "AmiLaunchIndex": 0,
          "ImageId": "ami-0ad79b02b5570cdcc",
          "InstanceId": "i-08eac0xxxxxxxxxxx",
          ...
          ...
          "State": {
            "Code": 48,
            "Name": "terminated"
          },
          "StateTransitionReason": "Client.InternalError",
          "Architecture": "x86_64",
          "BlockDeviceMappings": [],
          ...
          ...
          "StateReason": {
            "Code": "Client.InternalError",
            "Message": "Client.InternalError: Client error on launch"
          },
          ...
}

Create a trail (AWS CloudTrail)

Searching through the secondary account’s standard CloudTrail logs did not reveal anything obvious.

However, I decided to create a trail, hoping this would provide further diagnostic messages/clues.

The trail would send logs to a nominated S3 bucket (trail log location), and was configured to capture all API activity (including AWS KMS events) along with EC2 Instance Connect endpoint data events.

After enabling the trail, I attempted launching the AMI once again (through the ASG, which was configured to use a launch template).

Not knowing the exact intervals at which CloudTrail “flushes” data to S3, I waited around 5 minutes before downloading the logs.

Running a quick grep -i against the logs for any of the strings kms, access and error narrowed the output down to:

{
  ...
  "eventVersion": "1.08",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "XXXXXXXXXXXXXXX:AutoScaling",
    "arn": "arn:aws:sts::111122223333:assumed-role/AWSServiceRoleForAutoScaling/AutoScaling",
    "accountId": "111122223333",
    ...
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        ...
        "arn": "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
        "accountId": "111122223333",
        "userName": "AWSServiceRoleForAutoScaling"
        ...
    ...
    "invokedBy": "autoscaling.amazonaws.com"
  ...
  ...
  "userAgent": "autoscaling.amazonaws.com",
  "errorCode": "AccessDenied",
  "errorMessage": "The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.",
  ...
  "eventType": "AwsApiCall",
  "managementEvent": true,
  "recipientAccountId": "111122223333",
  "eventCategory": "Management"
  ...
}

From the errorMessage above, we can see that the Auto Scaling service-linked role was denied access when attempting to use the KMS key, i.e.:

...
"arn": "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
"accountId": "111122223333",
"userName": "AWSServiceRoleForAutoScaling"
...
...
"invokedBy": "autoscaling.amazonaws.com"
...
"errorMessage": "The ciphertext refers to a customer master key
that does not exist, does not exist in this region, or
you are not allowed to access.",
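
CloudTrail delivers log files to S3 as gzip-compressed JSON, so the search described earlier has to decompress them first. The following is a minimal sketch, not the exact commands used here: it assumes the logs were already copied to a local directory (for example with `aws s3 sync`), and the function name is illustrative.

```shell
# Hypothetical helper: print every downloaded CloudTrail log file that
# contains at least one AccessDenied event.
# $1 = local directory holding the *.json.gz files copied from the trail bucket.
files_with_access_denied() {
  find "$1" -type f -name '*.json.gz' | while IFS= read -r log_file; do
    # Decompress to stdout and look for the errorCode field.
    if gzip -dc "$log_file" | grep -Eq '"errorCode" *: *"AccessDenied"'; then
      printf '%s\n' "$log_file"
    fi
  done
}
```

Each file it prints can then be decompressed and inspected for the full event, including the userIdentity and errorMessage fields shown above.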

After revisiting the documentation, it became clear that an additional KMS grant was required to allow the secondary account’s ASG service-linked role access to the CMK in the primary account.

Getting back to the green numbered circles in the architecture diagram, this is number (5).

(5) KMS key grant for autoscaling service

The service-linked role ARN for the autoscaling service was listed in the previous trail log:

arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling

The official documentation describes the additional access requirement, along with the associated AWS CLI command required to apply the grant:

“If you create a customer managed key in a different account than the Auto Scaling group, you must use a grant in combination with the key policy to allow cross-account access to the key.”

We substitute our infra resource values and run the command:

aws --profile sandbox_admin kms create-grant \
--key-id arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337 \
--grantee-principal \
arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling \
--operations Decrypt \
GenerateDataKeyWithoutPlaintext \
ReEncryptFrom \
ReEncryptTo \
CreateGrant

Output:

{
  "GrantToken": "AQpAZTZm....",
  "GrantId": "cff2...."
}

List the grant to ensure permissions have been applied:

{
  "Grants": [
    {
      "KeyId": "arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337",
      ...
    {
      "KeyId": "arn:aws:kms:us-east-1:444455556666:key/12ee9c11-3476-492c-b5cc-ed4a4d636337",
      "GrantId": "cff2....",
      ...
      "GranteePrincipal": "arn:aws:iam::111122223333:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
      "IssuingAccount": "arn:aws:iam::111122223333:root",
      "Operations": [
        "Decrypt",
        "GenerateDataKeyWithoutPlaintext",
        "ReEncryptFrom",
        "ReEncryptTo",
        "CreateGrant"
      ....
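
A listing like the one above can be retrieved with `aws kms list-grants`. The sketch below narrows the result to grants issued to a single grantee using a client-side `--query` filter; the function name is made up, and the key is passed as a full ARN because it lives in a different account from the caller.

```shell
# Hypothetical helper: list grants on a key whose grantee is the given principal.
# $1 = key ARN (a full ARN is needed for a key in another account),
# $2 = grantee principal ARN.
grants_for_principal() {
  aws kms list-grants \
    --key-id "$1" \
    --query "Grants[?GranteePrincipal=='$2']" \
    --output json
}
```

Passing the CMK ARN and the AWSServiceRoleForAutoScaling role ARN from the tables earlier should return the grant created in the previous step, and an empty list if the grant is missing.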

(6) Launch instance using AMI

Before testing, ensure the ASG is set to Desired capacity = 1 and Maximum capacity >= 1.
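
The current capacity settings can be read back without opening the console. This is a small sketch using `aws autoscaling describe-auto-scaling-groups`; the function name is illustrative, the ASG name is whatever the group is called in the secondary account, and the name is assumed to match an existing group.

```shell
# Hypothetical helper: print an ASG's desired capacity and maximum size,
# tab-separated, in that order.
# $1 = Auto Scaling group name (assumed to match an existing group).
# Assumption: credentials for the secondary account.
asg_capacity() {
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --query 'AutoScalingGroups[0].[DesiredCapacity,MaxSize]' \
    --output text
}
```

If the first value is 0, the group will not launch anything and the test will appear to do nothing.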

Finally, launching an instance from the AMI leads to a successful boot.

Originally published at https://anthony-f-tannous.medium.com on July 27, 2023.

Information has been prepared for information purposes only and does not constitute advice.
