IAM Policy Management Unleashed: A Case Study on AWS

Ofri Sherf
Melio’s R&D blog
7 min read · Feb 15, 2024


In the ever-evolving landscape of cloud infrastructure, effective and secure management of access permissions is crucial.

In this article, I will delve into our journey to enhance access control using IAM and to optimize our Terraform codebase to streamline third-party access to resources. I will explain how I implemented the management of these policies while overcoming limitations in Terraform and restrictive service quotas in AWS.

At Melio, we leverage AWS Identity and Access Management (IAM) to oversee third-party access to our resources. Recognizing the need for robust permission management, I embarked on integrating a better solution into our Terraform codebase.

Upon a thorough examination of the existing policies, it became apparent that our setup was less than optimal. The configuration included a mix of inline policies attached directly to the user, additional inline policies tied to the various groups the user belonged to, and multiple managed policies associated with both the user and several groups. Beyond the challenge of editability and control, the configuration contained redundant, repetitive permissions scattered across numerous policies, adding a layer of complexity to our access management framework.
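For context, this kind of sprawl can be enumerated with the AWS CLI. A quick sketch (the user and group names here are placeholders):

# Managed policies attached directly to the user
aws iam list-attached-user-policies --user-name third-party-user

# Inline policies embedded in the user
aws iam list-user-policies --user-name third-party-user

# Groups the user belongs to...
aws iam list-groups-for-user --user-name third-party-user

# ...and, for each group, its managed and inline policies
aws iam list-attached-group-policies --group-name some-group
aws iam list-group-policies --group-name some-group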

Quota Limitations

It’s crucial to be aware of quota limitations when working with IAM in AWS. These limitations vary based on the type of resource and can impact the number of policies, users, roles, and other IAM entities you can create.

IAM policies in AWS define permissions for actions within your AWS account. These policies are attached to IAM identities (users, groups, or roles) and determine what actions these identities can perform on specified resources. The challenge often lies in creating and managing these policies at scale, especially in complex infrastructures with numerous users and resources.

IAM policies in AWS can be attached to resources through various mechanisms:

  • User Policies: Policies attached directly to IAM users define the permissions for individual users.
  • Group Policies: Policies attached to IAM groups are inherited by all users within the group, streamlining the management of permissions for multiple users.
  • Role Policies: IAM roles are a powerful construct for delegating permissions to entities such as AWS services, applications, or users from another AWS account.
  • Resource Policies: Certain AWS resources, such as S3 buckets and Lambda functions, allow you to attach policies directly to the resource. These resource policies define who can access the resource and what actions they can perform.
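For the first three mechanisms, a minimal Terraform sketch might look like this (the names and the chosen policy ARN are illustrative only):

# Attaching the same managed policy at the user, group, and role level
resource "aws_iam_user_policy_attachment" "example_user" {
  user       = "some-user"
  policy_arn = "arn:aws:iam::aws:policy/AmazonSQSFullAccess"
}

resource "aws_iam_group_policy_attachment" "example_group" {
  group      = "some-group"
  policy_arn = "arn:aws:iam::aws:policy/AmazonSQSFullAccess"
}

resource "aws_iam_role_policy_attachment" "example_role" {
  role       = "some-role"
  policy_arn = "arn:aws:iam::aws:policy/AmazonSQSFullAccess"
}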

When combining all of these scattered policies, I had to keep in mind that I might hit some non-adjustable quotas.

Read more about AWS IAM quotas and limitations: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html

Solution overview

After taking everything into consideration, I created a user to be used by the third party, created a dedicated group, and added the user as a member of that group. I planned to attach the policies to the group, so the user would inherit the permissions from it.

This is where I encountered the first quota: the number of managed policies attached to a group. The default value is 10, and it is non-adjustable. Not a big issue; I decided to do a group rotation. For every 10 policies, I create a new group and attach the next batch of policies to that group. With 23 policies, for example, that means ceil(23 / 10) = 3 groups, holding 10, 10, and 3 policies respectively. The user is a member of all of those groups and therefore inherits the policies from all of them.

To ensure that the policies from the previous paragraph are reusable, I opted to use only managed policies: preferably AWS-managed policies, and where that wasn't possible, customer-managed policies.

The AWS-managed policies were the easy part: I created a list of the policies' ARNs, iterated over it, and attached each policy to its group.

It's worth mentioning that Terraform (v1.6.4 at the time of writing) doesn't support nested loops when creating resources: a resource's for_each accepts a single flat collection.

If it were any other programming language, I would use a map object of the following structure:

mapping = {
  group_1 = [
    policy_1,
    policy_2,
    policy_3,
    policy_4,
    policy_5,
    policy_6,
    policy_7,
    policy_8,
    policy_9,
    policy_10
  ],
  group_2 = [
    policy_11,
    policy_12,
    policy_13,
    policy_14,
    policy_15,
    policy_16,
    policy_17,
    policy_18,
    policy_19,
    policy_20
  ]
}

Here I had to flatten the map so it could be iterated over with a single loop. I used this great article by Dave Perrett to achieve that: first generate the normal map, then manipulate it using the flatten() function.
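As a quick reminder of the mechanics, flatten() collapses a list of lists into a single flat list, which is what lets a two-level for expression feed a single for_each. A minimal sketch, unrelated to the real policy names:

locals {
  # Two nested for expressions produce a list of lists...
  nested = [
    for group, policies in { g1 = ["p1", "p2"] } : [
      for policy in policies : { group_id = group, policy_arn = policy }
    ]
  ]

  # ...and flatten() collapses it into a single list of objects:
  # [{ group_id = "g1", policy_arn = "p1" }, { group_id = "g1", policy_arn = "p2" }]
  flat = flatten(local.nested)
}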

After running the code below, the reversed (flattened) mapping looks like this:

reversed_mapping = [
  {
    group_id   = "group_1"
    policy_arn = "policy_1"
  },
  {
    group_id   = "group_1"
    policy_arn = "policy_2"
  },
  {
    group_id   = "group_1"
    policy_arn = "policy_3"
  },
  {
    group_id   = "group_1"
    policy_arn = "policy_4"
  },
  {
    group_id   = "group_1"
    policy_arn = "policy_5"
  },
  {
    group_id   = "group_1"
    policy_arn = "policy_6"
  },
  ...
  {
    group_id   = "group_N"
    policy_arn = "policy_M"
  },
]



locals {
  ###############################################################################
  # Global
  ###############################################################################

  batch_size    = 10 # Do not change this value because of the non-adjustable AWS quota
  policy_prefix = "my_group"

  ###############################################################################
  # AWS Managed Policies
  ###############################################################################

  # List of ARNs of AWS-managed policies already available to use
  aws_managed_policies = [
    "arn:aws:iam::aws:policy/CloudFrontFullAccess",
    "arn:aws:iam::aws:policy/AmazonEC2FullAccess",
    "arn:aws:iam::aws:policy/AmazonRoute53ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonSNSFullAccess",
    "arn:aws:iam::aws:policy/AmazonSQSFullAccess",
    "arn:aws:iam::aws:policy/AmazonSSMFullAccess",
    "arn:aws:iam::aws:policy/AmazonAPIGatewayInvokeFullAccess",
    "arn:aws:iam::aws:policy/AmazonECS_FullAccess",
    "arn:aws:iam::aws:policy/AWSGlueSchemaRegistryFullAccess",
    "arn:aws:iam::aws:policy/AWSLambda_FullAccess",
    "arn:aws:iam::aws:policy/AmazonAPIGatewayAdministrator"
  ]

  # Number of batches (how many groups to create)
  num_managed_batches = ceil(length(local.aws_managed_policies) / local.batch_size)

  # The group's base name, which will be followed by the group's serial number
  managed_group_name = "my_group_managed"

  # This map assigns up to batch_size policies to each group
  # The key-value format is: groupX = [policy1, policy2, ..., policyN]
  managed_batch_map = {
    for i in range(local.num_managed_batches) : "${local.managed_group_name}_${i + 1}" =>
    slice(local.aws_managed_policies, i * local.batch_size, min((i + 1) * local.batch_size, length(local.aws_managed_policies)))
  }

  # The flattened (reversed) version of the batch map, because nested for_each loops are not available in TF
  # The result is a list of objects: [{policy_arn = policy1, group_id = groupX}, ...]
  managed_reversed_batch_map = flatten([
    for id, policy_arns in local.managed_batch_map : [
      for policy_arn in policy_arns : {
        policy_arn = policy_arn
        group_id   = id
      }
    ]
  ])
}

For the sake of simplicity when modifying policies, I decided to create a separate policy for each AWS service. My custom policies are stored under the policies directory.
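For illustration, the layout might look like this (the filenames are placeholders; each file is named after the service its policy covers):

policies/
├── cloudwatch.json
├── dynamodb.json
├── s3.json
└── sqs.json

Each file holds a regular IAM policy document, e.g. a hypothetical policies/sqs.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sqs:SendMessage", "sqs:GetQueueUrl"],
      "Resource": "arn:aws:sqs:*:*:some-queue"
    }
  ]
}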

The challenge arose when I wanted to apply the same logic to the customer-managed policies. At first glance, the only difference appears to be that I have to create each policy before using its ARN, and then continue as I did with the AWS-managed policies.

But the way I implemented this, using the locals block to generate the policies-to-groups mapping, meant that I had a chicken-and-egg scenario: I would have to create the policies first and then generate the map object based on the output of the aws_iam_policy resource, but for_each can't iterate over a collection whose size isn't known until apply time.

This means that I had to know in advance not just how many policies I was creating, but their ARNs. Fortunately, an IAM policy's name is the unique suffix of its ARN, and since I'm the one creating the policies, I decide on the names. The file policies/sqs.json, for example, maps to the ARN arn:aws:iam::<ACCOUNT_ID>:policy/my_group_sqs. Here is the rest of the locals block:

locals {
  ###############################################################################
  # Custom Policies
  ###############################################################################

  # The policies don't exist yet, but their ARNs are needed for the mapping in the locals.
  # The ARN can be calculated here because its suffix is the name of the policy.
  # The policies themselves are created at apply time by the resource aws_iam_policy.custom_policy.

  # The template format of the policy ARN
  custom_policy_arn_template = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:policy/${local.policy_prefix}_<POLICY_NAME>"

  # The files under the policies directory (where the custom policy files are)
  custom_policies_files = fileset("${path.module}/policies", "*.json")

  # Create a list of policy ARNs (similar to local.aws_managed_policies):
  # trim the .json suffix off each filename and substitute it for the
  # placeholder in the ARN template (the filename is the policy name)
  custom_policies = [
    for file in local.custom_policies_files : replace(local.custom_policy_arn_template, "<POLICY_NAME>", trimsuffix(file, ".json"))
  ]

  # Number of batches (how many groups to create)
  num_custom_batches = ceil(length(local.custom_policies_files) / local.batch_size)

  # The group's base name, which will be followed by the group's serial number
  custom_group_name = "my_group_custom"

  # This map assigns up to batch_size policies to each group
  # The key-value format is: groupX = [policy1, policy2, ..., policyN]
  custom_policies_batch_map = {
    for i in range(local.num_custom_batches) : "${local.custom_group_name}_${i + 1}" =>
    slice(local.custom_policies, i * local.batch_size, min((i + 1) * local.batch_size, length(local.custom_policies)))
  }

  # The flattened (reversed) version of the batch map, because nested for_each loops are not available in TF
  # The result is a list of objects: [{policy_arn = policy1, group_id = groupX}, ...]
  custom_reversed_batch_map = flatten([
    for id, policy_arns in local.custom_policies_batch_map : [
      for policy_arn in policy_arns : {
        policy_arn = policy_arn
        group_id   = id
      }
    ]
  ])
}
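Two references in the code are declared elsewhere in the module: the caller identity data source used to resolve the account ID in the ARN template, and the third-party user itself. For completeness, a minimal version (the user name is a placeholder):

# Resolves the current AWS account ID for the custom policy ARNs
data "aws_caller_identity" "current" {}

# The third-party user that will inherit all the group policies
resource "aws_iam_user" "my_user" {
  name = "third-party-user"
}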

Most of the magic happens in the locals block. After that, the usage of the generated mappings of policies-to-groups in the resources is fairly straightforward:

################################################################################
# Groups
################################################################################

resource "aws_iam_group" "my_group_managed" {
  for_each = local.managed_batch_map

  name = each.key
}

resource "aws_iam_group" "my_group_custom" {
  for_each = local.custom_policies_batch_map

  name = each.key
}

resource "aws_iam_user_group_membership" "my_user_my_group_managed" {
  depends_on = [aws_iam_group.my_group_managed, aws_iam_user.my_user]

  user   = aws_iam_user.my_user.name
  groups = [for group in aws_iam_group.my_group_managed : group.name]
}

resource "aws_iam_user_group_membership" "my_user_my_group_custom" {
  depends_on = [aws_iam_group.my_group_custom, aws_iam_user.my_user]

  user   = aws_iam_user.my_user.name
  groups = [for group in aws_iam_group.my_group_custom : group.name]
}


################################################################################
# Policies
################################################################################

resource "aws_iam_group_policy_attachment" "aws_managed_policies" {
  depends_on = [aws_iam_group.my_group_managed]

  # Key each attachment by its policy ARN, which is unique across the flattened list
  for_each = {
    for entry in local.managed_reversed_batch_map :
    entry.policy_arn => entry
  }

  group      = each.value.group_id
  policy_arn = each.value.policy_arn
}

resource "aws_iam_policy" "custom_policy" {
  # One policy per JSON file under the policies directory
  for_each = local.custom_policies_files

  name        = "${local.policy_prefix}_${trimsuffix(each.key, ".json")}"
  description = "A policy for ${each.key}"
  policy      = file("${path.module}/policies/${each.key}")
}

resource "aws_iam_group_policy_attachment" "my_group_custom_policies" {
  depends_on = [aws_iam_group.my_group_custom, aws_iam_policy.custom_policy]

  for_each = {
    for entry in local.custom_reversed_batch_map :
    entry.policy_arn => entry
  }

  group      = each.value.group_id
  policy_arn = each.value.policy_arn
}
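One more guard worth considering: an IAM user can belong to a limited number of groups (10 by default), so this pattern caps out at roughly batch_size x 10 policies per user. Assuming that quota, a Terraform check block (available since v1.5) can surface the problem during plan. A sketch:

# Warn during plan if group rotation would exceed the groups-per-user quota
# (assumed to be 10 here; verify against the current AWS IAM quotas)
check "groups_per_user_quota" {
  assert {
    condition     = local.num_managed_batches + local.num_custom_batches <= 10
    error_message = "The third-party user would need more than 10 groups, exceeding the IAM groups-per-user quota."
  }
}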

Closing Remarks

My goal in this post was to shed light on an effective solution and the reasoning behind my approach to this seemingly straightforward problem. While piecing all the elements together posed a challenge, Terraform played a crucial role in simplifying the process and keeping the solution adaptable for a robust, dynamic production environment.

Visit our career website
