Providing On-Demand, Time-based, Attribute-based, and Least Privilege Access to AWS accounts

Using AWS IAM and Lambda URLs to gain access securely

Pierin Sako

Published in Engineers @ The LEGO Group

17 min read, Jul 12, 2022

Introduction

This article describes an approach that allows a central team (e.g., a security or operations team) that owns an AWS account (referred to as the trusted account) to get access to a set of other AWS accounts (referred to as trusting accounts) that may be owned by one or more other teams within a Digital organization.

Within our Digital organization, we are part of the Cloud Enablement team that is responsible for setting up and operating the AWS landing zone. As part of the landing zone, we have the ability to get access to other AWS accounts that are members of our AWS Organizations, whenever that is needed.

Problem Statement

Within the same Digital organization, we have another central team that is responsible for providing our internal product teams with a centralized container platform. For simplicity, we will call this the “Container Platform” team.

In a nutshell, the Container Platform team provides the ability for other product teams to easily deploy containers through this platform and not have to worry about the operation, monitoring, and security aspects that come with containers. The product teams are simply focused on the applications that run in the containers.

These containers are deployed in AWS accounts that are owned by the different product teams, and the Container Platform team has no access to those accounts other than at the time the containers are deployed. This posed a problem for the Container Platform team, as they depended on the product teams to give them access to the AWS accounts whenever there was an operational issue with the containers. Precious time was lost because the product teams were not always available, especially when an issue occurred during the night or on weekends.

Our team, Cloud Enablement, set out to build a solution that would allow the Container Platform team to get access to different AWS accounts without depending on the product teams, under certain conditions. For this, we were going to utilize the power of the AWS IAM service and specifically IAM Roles as seen further below.

One of the prerequisites was that the product teams and the Container Platform team had agreed between them on this access. This agreement is established as part of the process of using the container platform by the product teams and is not part of the scope of this article.

Solution Overview

We built a serverless solution to allow the Container Platform team to gain cross-account access in the trusting accounts by assuming a specific IAM role. This role is created on-demand in the trusting accounts by the Container Platform team itself when it is needed and it has a trust policy that allows the account of that team to assume it and take specific actions under certain conditions and for a specific amount of time. In addition, this role does not exist permanently, as we describe below.

Solution Requirements

As we take security and confidentiality seriously, before building anything we defined several requirements that the solution had to meet in order to provide access securely:

Authentication and authorization of the Container Platform team must be based on AWS IAM.
The IAM role that provides the Container Platform team with access can be created only from/by another authorized AWS account — referred to as the trusted account, owned by the Container Platform team.
The above role can be created only inside preapproved accounts — referred to as the trusting accounts, owned by the product teams using the container platform.
The role can be created only by an authorized principal from within the trusted account.
The role should be created on-demand and without depending on the product teams.
The role must provide the trusted account time-based, attribute-based, and least-privilege access in the trusting accounts.
The role cannot exist permanently in the trusting accounts.
The role must allow only authorized users from the Container Platform team to get access (assume the role).
Inside the trusting account, the role must allow access only to resources that are owned by the trusted account (those that the Container Platform team creates).
The role must require that all API actions are performed over HTTPS.

With the above requirements, we are able to provide secure access as the trusted account can assume the role securely. This role allows only time-based, attribute-based (ABAC), and least privilege access for the duration that is specified in the role creation request. The policy of the role is pre-determined and based on the actions that the trusted account will need to perform in the trusting accounts. For more implementation details, please read below in the architecture section.

Solution Architecture

The serverless architecture allows on-demand, time-based, temporary, and least-privilege access. Source: Author

In the above architecture, the green box represents the trusted account (Container Platform team, account 123456789012), and the blue box represents the trusting accounts (product teams, account 112233445566).

To provide on-demand and temporary access as explained above, we make use of the new Lambda URL feature. This feature allows a Lambda function to be invoked via an HTTPS endpoint. The endpoint is invoked just like any other HTTP(S) endpoint, and we can pass a request body and request headers. An example of a Lambda URL looks like this: https://pjqpro68hkual6idxnawkhytoa0uc8go.lambda-url.eu-west-1.on.aws/

The red box represents the account where the Lambda HTTPS Endpoint resides. This account is a core Landing Zone account and has the ability, as part of the Landing Zone setup, to get access to the preapproved accounts to create the role. We use this account as the accounts of the Container Platform team and those of the product teams are totally isolated from each other in terms of IAM role access — there is no prior or existing trust relationship between them. The Landing Zone account is the location where all automated actions for the role creation process take place.

Instead of Lambda URL, we could have used an API Gateway but in that case more resources and configuration are needed. Lambda URLs simplify things a lot.

Architecture Breakdown

Lambda URL Authentication & Authorization

In order to control who can invoke the URL of the Lambda function, we have attached to the function a resource-based policy that designates the principal that is allowed to invoke (lambda:InvokeFunctionUrl action) the URL of the function. In this case, the principal is an IAM role from the trusted account. The policy attached to the Lambda function looks like this:

Lambda function resource-based policy. Source: Author

The example policy above states that only the IAM role “TrustedAccountExecutionRole” from the trusted account “123456789012” (and no other role from the same account) is allowed to invoke the function. Users must therefore have signed in to the AWS account with that specific IAM role (or use access keys generated from that role, as described further below) in order to successfully invoke the URL. If any other role, from any account, attempts to invoke the function URL, the IAM service will deny the request.

The corresponding CloudFormation code of the above function resource-based policy is as below:

TrustedAccountAccessFunctionPermission:
  Type: "AWS::Lambda::Permission"
  Properties:
    FunctionName: !Ref TrustedAccountAccessFunction
    Action: "lambda:InvokeFunctionUrl"
    FunctionUrlAuthType: "AWS_IAM"
    Principal: "arn:aws:iam::123456789012:role/TrustedAccountExecutionRole"

The Container Platform team that owns the trusted account only needs to send an HTTP request using the URL of the function together with the required request payload (seen also below). The IAM service will verify the credentials from the request and will either allow or reject it. For more information on how the HTTPS request is authenticated and authorized by the IAM service, please see below.

Role Creation, Update & Deletion Process Logic

Whenever a request is successfully sent to the HTTPS endpoint of the function, the function will immediately start the process of creating the role. As part of this process, the function will first validate the data that is sent with the request body. This data that is sent by the trusted account must contain the target AWS account ID where the role will be created and the access duration (in minutes) for the role.

request_body = {
 "accountId": "112233445566",
 "accessDurationMinutes": 5
}

As we are not using an API Gateway in this scenario, but using only Lambda URL, we have to implement the validation of the request in the function code. For this, we are using JSON Schema. For example, to validate the request body, the below schema is used:

REQUEST_BODY_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "http://example.com/example.json",
    "type": "object",
    "title": "Request body schema",
    "description": "This schema describes the required structure of the request body",
    "properties": {
        "body": {
            "type": "object",
            "description": "The allowed and required fields of the request body",
            "properties": {
                "accountId": {
                    "type": "string",
                    "description": "The AWS account ID",
                    "pattern": r"^\d{12}$"
                },
                "accessDurationMinutes": {
                    "type": "integer",
                    "description": "The duration of the access in minutes",
                    "minimum": 5,
                    "maximum": 60
                }
            },
            "required": ["accountId", "accessDurationMinutes"]
        }
    },
    "required": ["body"]
}

Then we use the validate() method from the aws_lambda_powertools Python package to validate the request body against the above schema:

from aws_lambda_powertools.utilities.validation import validate
from aws_lambda_powertools.utilities.validation.exceptions import SchemaValidationError

# Validate the request against the schema; on failure, return the validation message to the caller.
try:
    validate(event=request_body, schema=REQUEST_BODY_SCHEMA)
except SchemaValidationError as e:
    return {"errorMessage": e.validation_message}

If the request body cannot be validated successfully, a SchemaValidationError exception is raised and a proper HTTP error response is returned to the caller.
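As a rough illustration of what such an error response could look like, the sketch below decodes the body of a Lambda URL event and builds a response in the shape that function URLs accept (statusCode, headers, body). The helper names parse_request_body and http_response are hypothetical and are not taken from our function code.

```python
import base64
import json


def http_response(status_code, payload):
    """Build a response dict in the format that Lambda function URLs accept."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def parse_request_body(event):
    """Return the decoded JSON body of a Lambda URL event, or None if it is not valid JSON."""
    raw = event.get("body") or ""
    try:
        if event.get("isBase64Encoded"):
            raw = base64.b64decode(raw).decode("utf-8")
        return json.loads(raw)
    except ValueError:
        # Covers malformed base64, invalid UTF-8, and invalid JSON.
        return None


# Hypothetical event, trimmed to the fields used in this sketch.
event = {
    "body": '{"accountId": "112233445566", "accessDurationMinutes": 5}',
    "isBase64Encoded": False,
}
request_body = parse_request_body(event)
if request_body is None:
    response = http_response(400, {"errorMessage": "Request body is not valid JSON"})
else:
    response = http_response(200, {"message": "Request body parsed"})
```

The parsed body would then be handed to the schema validation shown earlier, and any validation message could be wrapped with http_response(400, ...) in the same way.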

To control which accounts this role can be created in, we use a DynamoDB table that contains records of preapproved account IDs. This table ensures that the role is not created in any account other than those that have been agreed upon between the Container Platform team and the product teams. Adding a new account ID to this table is a one-time action, performed by the Cloud Enablement team, and is needed only when the role has to be created in a new account. Initially, the records in the table look like those in the screenshot below:

Records of preapproved AWS accounts. Source: Author

If the data in the payload is as expected, the function will then determine if the target account ID from the request exists in the table with the whitelisted accounts. If the account ID is not whitelisted, the function will return an HTTP error response to the caller:

HTTP Error response — Denied. Source: Author
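The lookup against the table of preapproved accounts can be sketched roughly as follows. This assumes the table's partition key is called accountId (an assumption, since the actual attribute name is only visible in the screenshot) and that table behaves like a boto3 DynamoDB Table resource. An in-memory stand-in is used here so the sketch runs without AWS access.

```python
def is_account_preapproved(table, account_id):
    """Return True if the account ID has a record in the preapproved accounts table."""
    result = table.get_item(Key={"accountId": account_id})
    # DynamoDB omits the "Item" key from the result when no record matches.
    return "Item" in result


class InMemoryTable:
    """Stand-in that mimics the get_item call of a boto3 DynamoDB Table resource."""

    def __init__(self, items):
        self._items = {item["accountId"]: item for item in items}

    def get_item(self, Key):
        item = self._items.get(Key["accountId"])
        return {"Item": item} if item else {}


table = InMemoryTable([{"accountId": "112233445566"}])
approved = is_account_preapproved(table, "112233445566")
denied = not is_account_preapproved(table, "999999999999")
```

With a real table, the same function could be called with boto3.resource("dynamodb").Table(...) in place of the stand-in.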

If the account is whitelisted, depending on the HTTP request type (POST or PATCH), the function will then proceed with the creation, or the update of the role in the trusting account(s).

Role Creation
In order to create the IAM role in the trusting account(s), the Container Platform team must send a POST HTTP request. Once the payload from the request is validated and confirmed, the function will then create the role in the target/trusting account. This role is created with the least privilege principle in mind and it contains several conditions that restrict who can access the role, for how long, and what actions they can take. If the role already exists, the function will reply to the caller with an HTTP error response:

HTTP Error response — Conflict. Source: Author

The payload attribute accessDurationMinutes that is sent with the HTTPS request is used by the function for two purposes:

Converted to a DateTime object and used in the role's trust policy to restrict access to the role based on a specific date and time. For more information about the role's policy and condition, see below.
Converted to seconds and used as a pausing period during which the solution “sleeps” and waits for the access duration to expire before deleting the role.
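A minimal sketch of these two conversions, using only the Python standard library, could look like this. The function name access_window is hypothetical. The timestamps follow the ISO 8601 UTC format that the aws:CurrentTime condition key accepts.

```python
from datetime import datetime, timedelta, timezone


def access_window(access_duration_minutes, now=None):
    """Return the date conditions for the trust policy and the wait time in seconds."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(minutes=access_duration_minutes)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    conditions = {
        "DateGreaterThan": {"aws:CurrentTime": now.strftime(fmt)},
        "DateLessThan": {"aws:CurrentTime": expires.strftime(fmt)},
    }
    wait_seconds = access_duration_minutes * 60
    return conditions, wait_seconds


# Example: a 5-minute access window starting at a fixed point in time.
start = datetime(2022, 7, 10, 20, 26, 16, tzinfo=timezone.utc)
conditions, wait_seconds = access_window(5, now=start)
```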

Once the role has been created, the function will then start a Step Functions state machine execution. The execution will immediately “sleep” for as long as the specified access duration that was sent with the request. The function will then update the record in the DynamoDB table and add the execution ARN for the given account ID received with the request. This time the records in the table look as below:

Records of preapproved AWS accounts with state machine execution information. Source: Author

Storing the execution ARN is needed in order to ensure that only one request at a time per account can update the deletion time of the role when the access duration has expired (since multiple requests for the same account can be received).

Specifically, before starting the new execution and writing its ARN to the table, the function performs an intermediate step and checks whether any previous executions are still in a “sleeping” state. If any are found, there is a previous request (POST or PATCH) whose duration hasn’t expired yet. This can happen when two or more users send requests shortly after one another, or when a state machine execution failed to complete for some reason. In either case, any old executions that are found are aborted and replaced with the new execution. If all actions succeed, the function sends a successful HTTP response back to the caller.
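The replacement of an old execution could be sketched as below. It assumes sfn behaves like a boto3 Step Functions client (describe_execution, stop_execution, and start_execution are real client methods), that the previous execution ARN has been read from the DynamoDB record, and that the state machine input carries a waitSeconds field. The input shape, the function name, and the ARNs are illustrative assumptions. A small fake client is included so the sketch runs locally.

```python
import json


def replace_execution(sfn, state_machine_arn, previous_execution_arn, name, wait_seconds):
    """Stop a still-running previous execution (if any) and start a new one."""
    if previous_execution_arn:
        status = sfn.describe_execution(executionArn=previous_execution_arn)["status"]
        if status == "RUNNING":
            sfn.stop_execution(
                executionArn=previous_execution_arn,
                cause="Superseded by a newer access request",
            )
    started = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=name,
        input=json.dumps({"waitSeconds": wait_seconds}),
    )
    return started["executionArn"]


class FakeStepFunctions:
    """Stand-in for the Step Functions client that tracks execution status in memory."""

    def __init__(self):
        self.status = {}  # execution ARN -> status

    def describe_execution(self, executionArn):
        return {"status": self.status[executionArn]}

    def stop_execution(self, executionArn, cause):
        self.status[executionArn] = "ABORTED"
        return {}

    def start_execution(self, stateMachineArn, name, input):
        arn = stateMachineArn.replace(":stateMachine:", ":execution:") + ":" + name
        self.status[arn] = "RUNNING"
        return {"executionArn": arn}


# Hypothetical ARNs, used for illustration only.
machine_arn = "arn:aws:states:eu-west-1:999999999999:stateMachine:AccessTimer"
old_arn = "arn:aws:states:eu-west-1:999999999999:execution:AccessTimer:old-request"
sfn = FakeStepFunctions()
sfn.status[old_arn] = "RUNNING"
new_arn = replace_execution(sfn, machine_arn, old_arn, "new-request", 300)
```

The returned ARN is what would be written back to the DynamoDB record for the account.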

Role Update
When it comes to updating the role, the caller can update only its access duration in the role’s trust policy. In order to update the IAM role in the trusting account(s), the callers must send a PATCH HTTP request. Once the payload from the request is validated and confirmed, the function will then proceed with updating the role:

If the role doesn’t exist, the function will reply to the caller with an HTTP error response.

Update response when role does not exist. Source: Author

Similar to the steps that are taken after the role creation as described above, the same steps are followed after the role has been updated.

Role Deletion
As described above, after the function has created or updated the role, it starts an execution of a Step Functions state machine. The definition of the state machine looks like this:

Step Functions state machine. Source: Author
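For readers who cannot see the diagram, a wait-then-delete state machine of this kind could be defined roughly as follows in Amazon States Language, expressed here as a Python dictionary. This is a guess at a minimal equivalent, not our actual definition: the state names, the waitSeconds input field, and the function name are all assumptions.

```python
import json

STATE_MACHINE_DEFINITION = {
    "Comment": "Wait for the access duration to expire, then trigger the role deletion",
    "StartAt": "WaitForAccessDuration",
    "States": {
        "WaitForAccessDuration": {
            "Type": "Wait",
            # Number of seconds to sleep, read from the execution input.
            "SecondsPath": "$.waitSeconds",
            "Next": "DeleteRole",
        },
        "DeleteRole": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "FunctionName": "TrustedAccountAccessFunction",  # hypothetical function name
                "Payload.$": "$",  # pass the execution input on to the function
            },
            "End": True,
        },
    },
}

definition_json = json.dumps(STATE_MACHINE_DEFINITION, indent=2)
```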

Each execution is named after the target account ID (hidden in green color), the user from the trusted account that sent the request for the role creation (hidden in blue color), the HTTP method that was used, and the request ID. All this information is retrieved from the Lambda URL invocation event:

Step Functions state machine executions. Source: Author

This way, we can easily identify what each execution was for and who requested it.
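One way to derive such a name from the invocation event is sketched below. It relies on the requestContext fields that Lambda URLs provide for IAM-authenticated requests (authorizer.iam.userId, http.method, and requestId). The separator and the function name are assumptions. Step Functions limits execution names to 80 characters and disallows a number of special characters, so the sketch is deliberately conservative and keeps only letters, digits, hyphens, and underscores.

```python
import re


def execution_name(event, account_id):
    """Build a Step Functions execution name from a Lambda URL invocation event."""
    ctx = event["requestContext"]
    # For assumed-role sessions the user ID has the form "<role-id>:<session-name>".
    user = ctx["authorizer"]["iam"]["userId"].split(":")[-1]
    parts = [account_id, user, ctx["http"]["method"], ctx["requestId"]]
    name = "_".join(parts)
    # Replace anything outside the conservative character set and enforce the length limit.
    return re.sub(r"[^A-Za-z0-9_-]", "-", name)[:80]


# Hypothetical event, trimmed to the fields used in this sketch.
event = {
    "requestContext": {
        "requestId": "id-123",
        "http": {"method": "POST"},
        "authorizer": {"iam": {"userId": "AROAEXAMPLEID:JoeDoe"}},
    }
}
name = execution_name(event, "112233445566")
```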

This execution will “sleep” for as long as the duration specified in the request. Once that duration has expired, the execution resumes and invokes the same Lambda function. This time, the function deletes the role and all its attached policies, which fully removes the access.
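The deletion step could be sketched as follows, assuming iam behaves like a boto3 IAM client that already holds credentials for the trusting account. IAM refuses to delete a role that still has policies attached, so the policies are removed first. The role name and policy ARN are hypothetical, pagination is ignored, and a policy with several versions would need its non-default versions deleted before delete_policy succeeds. A recording stand-in replaces the real client so the sketch runs locally.

```python
def delete_role_with_policies(iam, role_name):
    """Detach and delete the role's policies, then delete the role itself."""
    attached = iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]
    for policy in attached:
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])
        # The customer-managed policies created for this role are removed as well.
        iam.delete_policy(PolicyArn=policy["PolicyArn"])
    for policy_name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
        iam.delete_role_policy(RoleName=role_name, PolicyName=policy_name)
    iam.delete_role(RoleName=role_name)


class RecordingIam:
    """Stand-in that records mutating calls instead of talking to AWS."""

    def __init__(self):
        self.calls = []

    def list_attached_role_policies(self, RoleName):
        arn = "arn:aws:iam::112233445566:policy/ExamplePolicy"  # hypothetical policy
        return {"AttachedPolicies": [{"PolicyName": "ExamplePolicy", "PolicyArn": arn}]}

    def list_role_policies(self, RoleName):
        return {"PolicyNames": []}

    def __getattr__(self, name):
        # Any other client method is recorded by name and returns an empty response.
        def record(**kwargs):
            self.calls.append(name)
            return {}
        return record


iam = RecordingIam()
delete_role_with_policies(iam, "ContainerPlatformAccessRole")  # hypothetical role name
```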

Role Policy
The role that is created in the trusting account(s) follows the least privilege principle and has policies with several conditions that restrict access in multiple ways.

a) Firstly, the role’s trust policy has several conditions with the following restrictions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowContainerPlatformTeamRoleAccess",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:role/TrustedAccountExecutionRole"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalType": "AssumedRole",
                    "sts:RoleSessionName": [
                        "*:JoeDoe",
                        "*:MikeMikey"
                    ],
                    "aws:ResourceAccount": "112233445566",
                    "aws:PrincipalAccount": "123456789012",
                    "aws:PrincipalArn": "arn:aws:iam::123456789012:role/TrustedAccountExecutionRole"
                },
                "StringLike": {
                    "aws:userid": [
                        "*:JoeDoe",
                        "*:MikeMikey"
                    ]
                },
                "Bool": {
                    "aws:SecureTransport": "true"
                },
                "DateGreaterThan": {
                    "aws:CurrentTime": "2022-07-10T20:26:16Z"
                },
                "DateLessThan": {
                    "aws:CurrentTime": "2022-07-10T20:31:16Z"
                }
            }
        }
    ]
}

Only a specific principal can assume the role in the trusting account (aws:PrincipalArn condition key). The principal in this case can only be a designated IAM role from the trusted account. This is the same as the Principal element in the statement of this policy.
Only principals from a specific account can assume the role (aws:PrincipalAccount condition key). The account in this case can only be the designated trusted account.
Only principals from assumed role sessions can assume the role (aws:PrincipalType condition key). For more information about this type of principal, see the linked AWS documentation.
The role can be assumed only if the requested resource belongs to the trusting account (aws:ResourceAccount condition key).
The role can be assumed only if the assumed role session name has a specific value (sts:RoleSessionName condition key). This is used for traceability reasons.
The role can be assumed only through HTTPS (aws:SecureTransport condition key).
Only specific users with access to the allowed IAM role of the trusted account can assume the role in the trusting account (aws:userid condition key). Access to the IAM role in the trusted account is controlled through AD groups (ADFS) that are not managed by our team, so we have no way of controlling which users are added to these groups. With this condition, we restrict access to the role in the trusting account to specific users from the Container Platform team, regardless of AD group membership. Being a member of the AD group does not necessarily mean that a user should also be able to get cross-account access.
The role can be assumed only during the specified date and time (DateLessThan and DateGreaterThan conditions with the aws:CurrentTime condition key). The duration that is specified in the HTTP request, is converted to a DateTime object that is used in this condition. If the deletion of the role fails for some reason, then access to the trusting account will still be denied because of the time-based condition. This condition serves as an extra mechanism to ensure time-based access.

Note: In the above trust policy, the aws:PrincipalAccount and aws:PrincipalArn condition keys could be omitted because we already specify the allowed principal in the Principal element of the policy. In our case, we included these two condition keys in order to harden the trust policy as much as possible.

b) Besides the trust policy, which determines which principal (user, role) can assume the role and under what circumstances, the role also has customer-managed IAM policies attached to it. These IAM policies are the second layer of protection and make sure that the trusted account cannot take any actions other than those it is allowed to take. To achieve this, the managed policies use the Attribute-Based Access Control (ABAC) mechanism, based on request and resource tags. Specifically, the policies ensure that the allowed modification actions can be taken only on resources that already have a specific tag attached to them. In addition, any action that creates a resource requires that a specific tag is sent with the API call that creates the resource. This way, we restrict the Container Platform team to modifying only the resources that they own (based on the tag). Take as an example the IAM policy statements below:

{
    "Sid": "AllowEKSReadAndModifyActionsBasedOnTags",
    "Effect": "Allow",
    "Action": [
        "eks:UntagResource",
        "eks:TagResource",
        "eks:UpdateClusterConfig",
        "eks:DeleteCluster",
        "eks:DeleteNodegroup"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "aws:ResourceTag/Owner": "ContainerPlatform"
        }
    }
},
{
    "Sid": "AllowEKSCreateActionsBasedOnTags",
    "Effect": "Allow",
    "Action": [
        "eks:CreateCluster",
        "eks:CreateNodegroup",
        "eks:TagResource"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "aws:RequestTag/Owner": "ContainerPlatform"
        },
        "ForAllValues:StringEquals": {
            "aws:TagKeys": [
                "Owner",
                "Name"
            ]
        }
    }
},
{
    "Sid": "DenyNonHTTPSAccess",
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
        "Bool": {
            "aws:SecureTransport": "false"
        }
    }
}

The statement with Sid AllowEKSReadAndModifyActionsBasedOnTags dictates that the trusted account can take the UntagResource, TagResource, UpdateClusterConfig, DeleteCluster, and DeleteNodegroup EKS actions only on resources that contain a tag with key Owner and value ContainerPlatform. This is achieved by using the aws:ResourceTag/Owner condition key. This condition ensures that the Container Platform team cannot take any other actions in the trusting accounts on resources that are not owned by them.

Note that the TagResource action is also included in the first statement, which makes it subject to the same aws:ResourceTag/Owner condition. This is needed to protect against privilege escalation. The Container Platform team can tag existing EKS resources only if they are already tagged with the specified key-value pair. Without this condition, one could escalate their privileges by tagging EKS resources that they do not own and thus taking them under their management.

Similarly, the statement with Sid AllowEKSCreateActionsBasedOnTags requires that when the trusted account takes the CreateCluster, CreateNodegroup, and TagResource actions, it must provide with the request a tag with key Owner and value ContainerPlatform. This is achieved by using the aws:RequestTag/Owner condition key. This condition ensures that the Container Platform team marks the resources that they own in the trusting accounts. If this condition were not in the statement, the Container Platform team could create a cluster without the tag and would then be blocked by the first statement the next time they wanted to modify that EKS cluster.

In addition, the second statement contains the ForAllValues:StringEquals condition with the aws:TagKeys condition key. This restricts the tags that can be sent with those specific EKS actions to only the Owner and Name tag keys.

Lastly, the third statement with Sid DenyNonHTTPSAccess requires that all API actions are taken over HTTPS. If not, the action is denied.
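To make the tag conditions more tangible, the following sketch models the checks of the first two statements in plain Python. It is a deliberately simplified illustration of the logic described above, not a reimplementation of the IAM policy evaluation engine, and the function names are hypothetical.

```python
ALLOWED_TAG_KEYS = {"Owner", "Name"}


def may_modify(resource_tags):
    """First statement: the existing resource must carry Owner=ContainerPlatform."""
    return resource_tags.get("Owner") == "ContainerPlatform"


def may_create(request_tags):
    """Second statement: the request must carry Owner=ContainerPlatform and use only allowed tag keys."""
    has_owner = request_tags.get("Owner") == "ContainerPlatform"
    # ForAllValues: every tag key in the request must be in the allowed set.
    only_allowed_keys = set(request_tags) <= ALLOWED_TAG_KEYS
    return has_owner and only_allowed_keys


# A cluster owned by the Container Platform team can be modified ...
own_cluster = may_modify({"Owner": "ContainerPlatform", "Name": "platform-cluster"})
# ... but a cluster tagged by a product team cannot.
foreign_cluster = may_modify({"Owner": "ProductTeamA"})
# Creating a resource requires the Owner tag and nothing beyond Owner and Name.
tagged_create = may_create({"Owner": "ContainerPlatform", "Name": "new-cluster"})
untagged_create = may_create({})
extra_tag_create = may_create({"Owner": "ContainerPlatform", "CostCenter": "1234"})
```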

Therefore, the customer-managed policies that are attached to the role follow the least privilege principle and are tailored to the specific needs of the Container Platform team. They dictate which actions the trusted account can take once it has assumed the role in a trusting account, and on which resources. These managed policies will differ from case to case, depending on the actions that the trusted account needs to perform.

Note: The actual role policy is much larger in size as it contains several AWS actions that are needed by the trusted account. The above are just example statements that show how we are achieving the least privilege part of the access and ensure access is given only for resources owned by the trusted account.

Sending an HTTP Request to Invoke the Lambda URL

In order to send an HTTP request to any AWS API, that request must be signed using the AWS Signature Version 4 (SigV4) signing process.

From AWS documentation:
SigV4 is the process to add authentication information to AWS API requests sent over HTTP(S). For security, most requests to AWS must be signed with an access key. The access key consists of an access key ID and secret access key, which are commonly referred to as your security credentials.

For more information, see this page.

In order to invoke the URL of the Lambda function, the users of the trusted account must sign their HTTP request using access keys which they must generate from the trusted IAM role (in the trusted account) that is allowed (in the function resource policy) to invoke the URL. Without doing this, the AWS IAM service will deny the request as it will not be authenticated successfully.

Alternatively, one can use an HTTP client that supports AWS Signature Version 4 to send signed HTTPS requests to AWS. An example of such an HTTP client is Thunder Client in VSCode.

Note: The AWS SDKs and AWS CLI already sign API requests on your behalf using the access key that you specify. Therefore there is no need to sign the requests when using these tools. However, if you’re writing your own code to send and sign AWS API requests, you need to follow the instructions that are seen on the above-linked page. The recommendation is that you use the AWS SDKs, CLI, or other AWS tools to send signed API requests, instead of writing your own code.
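Purely to illustrate what the signing process computes, the sketch below derives SigV4 headers for a Lambda URL request using only the Python standard library. It is a simplified illustration under several assumptions: only the host, x-amz-date, and x-amz-security-token headers are signed, the URL has no query string, and the credentials are dummy values. In practice, an SDK or a SigV4-capable HTTP client remains the better choice.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import urlparse


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method, url, body, access_key, secret_key, session_token, region, now=None):
    """Return the headers that sign a request to the 'lambda' service with SigV4."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    parsed = urlparse(url)

    # Headers that take part in the signature, sorted by lowercase name.
    headers = {
        "host": parsed.netloc,
        "x-amz-date": amz_date,
        "x-amz-security-token": session_token,
    }
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))

    canonical_request = "\n".join([
        method,
        parsed.path or "/",
        "",  # canonical query string, assumed empty in this sketch
        canonical_headers,
        signed_headers,
        hashlib.sha256(body.encode("utf-8")).hexdigest(),
    ])

    scope = f"{date_stamp}/{region}/lambda/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key from the secret key, date, region, and service.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "lambda", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "X-Amz-Date": amz_date,
        "X-Amz-Security-Token": session_token,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


# Dummy temporary credentials and a fixed timestamp, for illustration only.
signed = sigv4_headers(
    "POST",
    "https://pjqpro68hkual6idxnawkhytoa0uc8go.lambda-url.eu-west-1.on.aws/",
    '{"accountId": "112233445566", "accessDurationMinutes": 5}',
    access_key="AKIDEXAMPLE",
    secret_key="dummy-secret-key",
    session_token="dummy-session-token",
    region="eu-west-1",
    now=datetime(2022, 7, 10, 20, 26, 16, tzinfo=timezone.utc),
)
```

The returned headers would be sent together with the request body. Because the credentials come from an assumed role, they are temporary and include a session token, which is why the x-amz-security-token header is part of the signature.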

Conclusion

As you can see, we have created a solution that allows a central team (in our case, the Container Platform team) to create, inside pre-approved AWS accounts and on demand, an IAM role that exists only temporarily, provides time-based, attribute-based, and least privilege access, and allows access only to a specific principal.

In a nutshell, all that the users of the trusted account have to do is:

Generate credentials using the designated allowed IAM role from the trusted account.
Send a signed HTTP request with the right payload to the Lambda URL to create the role.
Assume the new role to get access to the trusting account.

Our solution is agnostic to the organization structure and can very easily be used by more teams. The only prerequisite is that the involved teams have agreed between them on this access.