Box Cloud Management Framework: Multi-Cloud IAM Implementation (Part 3 of 3)
This is the final blog post in our mini-series on Multi-Cloud IAM Governance. If you have not read Part One and Part Two of this series, I encourage you to do so. This post leverages the key points articulated in those first two blogs to illustrate how to actually apply those concepts and implement a consistent Multi-Cloud IAM Governance model across AWS, GCP, and Azure. It is important to remember that this model will take time to implement and should be guided by a well-thought-out, prioritized roadmap. Although some of the details seem rather simple to implement, they will require diligence to ensure they remain conceptually consistent across all providers; otherwise, most of the value will be lost.
Each of the implementations below illustrates what the model looks like on each of the selected cloud providers. We will primarily focus on the meta-model that was defined in the initial blog and how to apply it. This includes the following areas:
- Organization Structure
In general, the organization structure is a critical pattern that will enable you to logically organize your cloud hierarchy so that you can manage and govern your cloud environment(s) in a consistent manner. The organization structure enables you to reason about what guardrails to set in a consistent way across different cloud providers. Establishing a conceptual model for what guardrails to set between different types of organization groups, resource groups, and resources makes it much easier to implement your governance model. Although each cloud provider defines different nomenclature around how to implement this construct, ultimately they are all conceptually trying to achieve the same end result (i.e. define a hierarchical implementation of how to manage your cloud resources). There are many ways to categorize your cloud organization and resources; at Box, we chose to organize by environments (i.e. production, stage, development, sandbox). In the sections below, we’ll illustrate how we implemented the organization layout for each provider.
- Organization Policies
One of the most important aspects of managing one or more clouds is to establish appropriate guardrails to ensure developers can move fast, but also prevent accidental or malicious mistakes that can compromise the security and compliance of your cloud environment. Organization Policies provide the mechanism to establish your cloud guardrails. These policies vary between cloud providers, but the end goal is to implement guardrails that will secure your cloud against threats and limit the blast radius of errors that will inevitably be introduced into your environment. In each of the implementations below, we will show some of the policies we used to setup our cloud guardrails.
- Custom Roles
Although all cloud providers support basic roles (such as owner, editor, viewer in GCP), it is not a best practice to apply these roles liberally, as they do not follow the principle of least privilege. Based on your requirements, some of the pre-defined cloud roles may be sufficient to limit the permissions to only what is required by the service owner. At Box, we leveraged custom roles across cloud providers to ensure we set least-privilege permissions for the different personas managing cloud resources.
In order to keep this blog relatively short, we will not go into additional details on Federated Identities, Decommission Organization Groups, Test Organizations, tagging/labeling, naming conventions, or cost control constructs. Let us know if there is interest in these areas in the comment section of the blog post and we’ll explore writing a follow-on blog that goes into more details in those areas.
You will notice in the implementations below that in some cases we are further along in applying the Meta-Model, and therefore some of the learnings and implementation details will be slightly different between providers. Those learnings will ultimately be applied to each provider in a consistent manner. In a perfect world, we would be in lock-step with each provider implementation, but the reality is that is not feasible or even practical in most cases. The primary reason is that most companies (Box being no different) tend to develop more in one provider than another, and therefore those learnings and implementations will proceed at different paces.
AWS Implementation
The following figure illustrates the overall organization structure for AWS. We map our IAM meta-model to AWS-specific constructs.
AWS released the Organizations capability for general availability in February 2017. AWS Organizations allows you to put accounts into logical groups and then apply policies to these groups using organizational units (OUs). As you can see in the illustration above, we have set up a Production OU, Stage OU, Development OU, and Sandbox OU. Each of these environments will then contain one or more AWS accounts.
AWS actually supports six different policy types; one of these is Service Control Policies. Service Control Policies (SCPs) provide mechanisms for setting up guardrails in your AWS Organization by setting policies on OUs. AWS doesn’t recommend setting SCPs at the organization root without thorough testing. This is where one of the constructs we mentioned in our initial blog, test organizations, provides a critical method to validate these types of changes before applying them to your primary organization. The custom roles section below addresses two of the other policy types: identity-based and resource-based policies. Refer to the policy types link above for more details on all of the supported AWS policies.
Two of the SCPs we’ll use to describe some of the guardrails we have established in AWS are DenySSLRequests and DoNotLeaveTheOrg. The first SCP blocks accounts from requesting SSL certificates from Amazon. The second SCP prevents accounts from leaving the AWS Organization. If an account needs to be removed from our AWS Organization, it has to be manually removed by an administrator of the master account. These represent only a few of the SCPs you can put in place to set up the appropriate guardrails in your organization.
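To make the shape of an SCP concrete, here is a minimal sketch of a DoNotLeaveTheOrg-style policy. The JSON structure is the standard AWS policy document format and `organizations:LeaveOrganization` is a real IAM action, but the statement ID and exact contents are our own illustration, not Box's actual SCP.

```python
import json

# Hypothetical sketch of a "DoNotLeaveTheOrg"-style SCP. Denying the
# organizations:LeaveOrganization action stops member accounts from
# removing themselves from the AWS Organization.
do_not_leave_the_org = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DoNotLeaveTheOrg",   # illustrative statement ID
            "Effect": "Deny",
            "Action": "organizations:LeaveOrganization",
            "Resource": "*",
        }
    ],
}

print(json.dumps(do_not_leave_the_org, indent=2))
```

Attached to the organization root (after validating in a test organization), this deny applies to every account in every OU below it.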
As with most providers, AWS recommends that you start with the default roles as the base for your custom roles and use them to create new policy statements, modify the permissions as appropriate, and then attach the new permission policies to the roles you want to create. We have used this construct to create custom roles that separate the control and data planes for data analytics operations we run in AWS. It is important to note the difference between identity-based and resource-based policies. Identity-based policies are attached to IAM users, groups, or roles. Resource-based policies are attached to resources. When to use custom roles (using identity-based policies) versus resource policies is left as an exercise for the reader; there are situations that are appropriate for each type of policy.
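The structural difference between the two policy types is easy to see side by side: a resource-based policy must name a Principal (who may act), while an identity-based policy omits it because the attached identity is implicit. The account ID, table, bucket, and role names below are placeholders, not real resources.

```python
# Identity-based policy: attached to an IAM user, group, or role.
# No Principal element -- the attached identity is the principal.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/example-table",
    }],
}

# Resource-based policy: attached to the resource itself, so it must
# say which principal is being granted access.
resource_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/analytics-reader"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
```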
On AWS, we use custom roles to set up and manage access to AWS EMR clusters and DynamoDB resources across multiple AWS accounts. These custom roles enable fairly sophisticated data management constructs and let us build a platform that delivers a service to the many internal services requiring access to these resources for various types of data analysis.
Google Cloud Implementation
The following figure illustrates the overall organization structure for GCP. We map our IAM meta-model to GCP-specific constructs.
Google Cloud uses Folders to organize GCP projects into logical groups, and organization policies can then be applied to those Folders. Just like AWS, we organize our Google Cloud Folders by environment. As illustrated above, we have Production, Stage, Development, and Sandbox Folders. Each of these Folders will contain one or more GCP projects.
IMPORTANT NOTE: There are a few constructs that are not depicted in the above diagram: Organization Level Folder, Decommission Folder, and Test organizations. As part of our implementation, we have found that there are some GCP projects that are used to manage resources across the entire organization and therefore do not belong to a single environment. So, we have recently added the Organization Level Folder construct to contain those types of global projects. We are also validating the use of a decommission folder which will serve as a staging area prior to permanently removing resources from our cloud environment. And finally, we leverage the use of multiple Test Organizations to support validation of high risk changes prior to rolling them out into our primary production organization.
GCP provides a number of policy and security constructs: organization policies, IAM policies, and VPC Service Controls are a few of the key ones. We will discuss IAM policies in the context of the custom roles section below. GCP organization policies give us a way to establish guardrails globally as well as across each environment we support. VPC Service Controls provide a key capability for setting up virtual security layers that protect GCP API access in your GCP environment. Here are examples of an organization policy and a VPC Service Control we implement on GCP: restrict external IPs and IP access restrictions. We make use of shared VPCs and use only internal IPs to protect our workloads from access via the internet. To ensure we don’t have workloads violating this constraint without explicit approval, we set a global organization policy that prevents the configuration of public IPs on Compute or Cloud SQL instances in our environment.
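As a sketch, the external-IP guardrail can be expressed with the `constraints/compute.vmExternalIpAccess` list constraint, which GCP provides for exactly this purpose. The policy shape below mirrors the Organization Policy API's list-policy form; treat it as an illustration rather than Box's exact configuration.

```python
# Sketch of an organization policy that denies external IPs on all
# Compute Engine instances. compute.vmExternalIpAccess is a real GCP
# list constraint; DENY on allValues blocks every instance unless it
# is explicitly allow-listed.
restrict_external_ips = {
    "constraint": "constraints/compute.vmExternalIpAccess",
    "listPolicy": {"allValues": "DENY"},
}

print(restrict_external_ips["constraint"])
```

Set at the organization node, this policy is inherited by every Folder and project beneath it, which is what makes the environment-based Folder hierarchy described above so useful for scoping exceptions.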
We also use VPC Service Perimeters to implement IP access restrictions that limit access to GCP APIs to only the IPs in our internal network CIDR block ranges. This adds an additional layer of protection, allowing us to keep our sensitive data within our network and block bad actors outside of our defined network boundaries from gaining access to it.
GCP provides mechanisms similar to the other cloud providers for managing custom roles. You essentially assign a set of IAM permissions to a role, which can then be used to grant privileges to an IAM identity (user, group, or service account). Google recommends you start with a pre-defined role and add to or modify it as appropriate to create your custom role.
At Box, we have defined a number of custom roles to help better manage our GCP environment. For example, in order to improve developer velocity, we created a custom role with permissions to manage all approved services in GCP. This role is only granted in dev and allows developers to more quickly develop their service without needing to request new permissions for their service account every time they want to experiment with a new service. Once those permissions are fully understood in the development environment, they will then request custom roles for their Stage and Production environments. Ultimately, we will leverage this same model across our other providers as well.
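A dev-only custom role like the one described above can be defined with a title, launch stage, and flat list of permissions, in the shape GCP uses for custom role definitions. The role title and permission list below are illustrative, not Box's actual role.

```python
# Hypothetical dev-environment custom role covering a few "approved
# services" (Pub/Sub and Cloud Storage here, purely as examples).
# The listed permissions are real GCP IAM permission strings.
dev_service_builder_role = {
    "title": "devServiceBuilder",          # illustrative role name
    "description": "Broad permissions over approved services; granted only in dev.",
    "stage": "GA",
    "includedPermissions": [
        "pubsub.topics.create",
        "pubsub.topics.publish",
        "storage.buckets.create",
        "storage.objects.create",
    ],
}

print(len(dev_service_builder_role["includedPermissions"]), "permissions")
```

Once a team sees which of these permissions their service actually exercises in dev, the stage and production roles can be cut down to exactly that subset.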
Azure Implementation
The following figure illustrates the overall organization structure for Azure. We map our IAM meta-model to Azure-specific constructs.
By now, I’m sure you see a common theme of how we chose to organize our cloud providers based on the meta-model we defined in the initial blog. Although each provider uses different terminology, the constructs are very similar. In the case of Azure, it uses the subscription concept to help organize Azure resource groups into a logical grouping. At Box, we organize Azure into Production, Stage, Development, and Sandbox Subscriptions. All resource groups are contained within one of these Subscriptions.
Azure Policies, Initiatives, and other access restriction capabilities provide the mechanisms to set organization-wide guardrails that ensure developers adhere to the standards defined to maintain a secure and compliant operating environment. Although we are not as far along in our Azure Multi-Cloud IAM journey, there are a number of these policies we use or plan to use to define appropriate guardrails. Two of the key ones are Restrict Resource Location and IP Access Restrictions. Restrict Resource Location is a built-in policy that lets you restrict which regions Azure resources can be provisioned in. IP Access Restrictions are managed using Azure App Service access restrictions and essentially serve as network ACLs; this guardrail limits access to your applications to only specific CIDR blocks. Azure also supports the concept of Initiatives, which are essentially collections of policy definitions that simplify the management of policy definitions.
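The core of a location guardrail is a policy rule in the style of Azure's built-in "Allowed locations" definition: deny any resource whose `location` field is not in the approved list. The built-in version takes the list as a parameter; this sketch hardcodes placeholder regions for readability.

```python
# Sketch of an Azure Policy rule modeled on the built-in
# "Allowed locations" definition. Any resource created outside the
# listed regions is denied. The region list is a placeholder.
allowed_locations_rule = {
    "if": {
        "not": {
            "field": "location",
            "in": ["eastus", "westus2"],   # placeholder approved regions
        }
    },
    "then": {"effect": "deny"},
}

print(allowed_locations_rule["then"]["effect"])
```

Rules like this can be grouped into an Initiative alongside the IP access restrictions, so one assignment applies the whole guardrail set to a Subscription.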
Azure follows the same general concepts as the other providers: you create custom roles by defining a role with the specific permissions your organization requires. Azure likewise recommends starting with the pre-defined roles and then adding or modifying permissions as required to define the custom role.
In general, the custom role concept is a critical capability you will need to develop in order to ensure you are following the principle of least privilege. This can definitely be a tedious process, but will ultimately ensure you maintain a secure and compliant environment by only allowing developers the minimum permissions they require in order to develop and deliver their application(s).
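One way to make that tedious process less error-prone is to trim a candidate role down to only the permissions a service was actually observed using (e.g. from audit logs). This toy helper illustrates the idea; the permission strings and the notion of an "observed usage" feed are illustrative assumptions, not a real audit pipeline.

```python
def least_privilege(candidate_permissions, observed_usage):
    """Return the subset of candidate permissions that were actually used.

    The result is the starting point for a tighter custom role; anything
    granted but never observed in use is a candidate for removal.
    """
    return sorted(set(candidate_permissions) & set(observed_usage))

# Illustrative inputs: a broad dev grant vs. what audit logs showed in use.
granted = ["storage.objects.create", "storage.objects.delete", "storage.buckets.create"]
used = ["storage.objects.create"]

print(least_privilege(granted, used))  # → ['storage.objects.create']
```

In practice you would review the trimmed list with the service owner before cutting the role, since rarely-exercised permissions (disaster recovery, yearly jobs) may not appear in any observation window.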
This is the final blog in our mini-series on Multi-Cloud IAM Governance. We hope you gleaned some valuable insights from our Multi-Cloud IAM Meta-Model, from the Multi-Cloud IAM challenges and the methods we used to resolve them, and from the actual IAM implementation examples we presented in this post.
Special thanks to the following people for their detailed reviews and comments:
- Luis Hernanz, Principal Architect, Box
- Matt Bowes, Staff Security Engineer, Box
- Xaviea Bell, Senior Site Reliability Engineer, Box