Preventing prod access with IAM Conditions
I was posed the following puzzle.
In a Google Cloud organization, there is a group called developers whose members are the developers at the company. All developers should be allowed to create and destroy Compute Engine instances. At the organization level, an IAM Allow Policy was defined that bound the developers group to the Compute Admin role. Everything was great. Folders and projects within the organization worked exactly as desired. Many projects and folders were created and the permissions felt appropriate … until … it was time to go into production. A folder called prod was created, with projects beneath it to host the production environment. It was here that the “uh-oh” moment came. We don’t want any of our developers to be able to accidentally damage our production environment. With the current security setup, the permissions granted would allow them to delete Compute Engine instances in production. What we wanted was for our permissions to apply to the organization as a whole, except the production environment.
This is illustrated in the following diagram:
Our first attempt to solve the problem was to create an IAM Deny Policy at the prod folder. What IAM Deny policies do is define a set of restrictions. Our thinking was that we could define the equivalent of:
DENY: Group Developers: Compute Admin Role -> On Folder Prod
This is where we hit our first challenge. While an IAM Allow Policy takes roles (collections of permissions) to allow, an IAM Deny Policy takes raw permissions for its denial specification. This implies that if we want to “cancel” the Compute Admin role for the group at prod and below, we would have to unpack the role into its constituent permissions and then create a Deny policy for that set of permissions. This becomes burdensome, and there is no assurance that every permission granted by a role higher in the hierarchy can be expressed in an IAM Deny Policy, because only a subset of permissions is supported by IAM Deny.
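To get a feel for the unpacking burden, here is a minimal Python sketch that turns a list of role permissions into the body of a deny rule. It assumes the deny policy permission format `SERVICE_FQDN/RESOURCE.ACTION` and the `principalSet://goog/group/...` principal syntax. The three permissions are an illustrative subset, not the full contents of the Compute Admin role, and the group address `developers@example.com` is made up. The simple `<service>.googleapis.com` mapping is an assumption that does not hold for every service, and whether each permission is supported by IAM Deny would still need to be checked against Google's documentation.

```python
import json

# Illustrative subset only; the real Compute Admin role holds many more permissions.
ROLE_PERMISSIONS = [
    "compute.instances.create",
    "compute.instances.delete",
    "compute.instances.stop",
]


def to_deny_permission(permission: str) -> str:
    """Convert 'compute.instances.delete' to 'compute.googleapis.com/instances.delete'.

    Assumes the service host is '<service>.googleapis.com', which is a simplification.
    """
    service, rest = permission.split(".", 1)
    return f"{service}.googleapis.com/{rest}"


def build_deny_policy(group_email: str, permissions: list) -> dict:
    """Assemble a deny policy body with one rule covering the given permissions."""
    return {
        "displayName": "Deny developers in prod",
        "rules": [
            {
                "denyRule": {
                    "deniedPrincipals": [f"principalSet://goog/group/{group_email}"],
                    "deniedPermissions": [to_deny_permission(p) for p in permissions],
                }
            }
        ],
    }


# Hypothetical group address used purely for illustration.
deny_policy = build_deny_policy("developers@example.com", ROLE_PERMISSIONS)
print(json.dumps(deny_policy, indent=2))
```

Even with a helper like this, someone has to keep the permission list in step with the role as Google adds permissions to it, which is the maintenance cost described above.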
However, we then considered a more demanding requirement. Our goal was to prevent all our developers from working on prod by default, but we wanted a select few of our developers to be able to come in and do work. Let’s see why the use of an IAM Deny Policy would be a problem.
Consider user joe@example.com. Joe is a member of the developers group and hence has permissions across the board. Now consider that we applied an IAM Deny Policy restriction on the developers group at the prod folder. If Joe then wished to work in prod, he would be denied. Our intuitive answer is to explicitly grant Joe Allow at the prod folder. This feels like it should work. We have explicitly declared that Joe is allowed. Unfortunately, it won’t work.
Let us look at the following permissions flow chart:
What we find is that when a Deny Policy is matched, the request is denied then and there and no further attempt at Allow Policy matching is performed. If we overlay this flow chart on our story, when Joe attempts to access a prod project, the Deny Policy will detect that he is a member of the developers group and deny him access. Even though he has been explicitly allowed through IAM Allow, the Deny Policy takes precedence and he will be forever prevented from prod access.
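The evaluation order can be mimicked with a toy model in Python. This is only a sketch of the "deny first, then allow" ordering described above, not Google's actual implementation; the `Policies` class, the `is_allowed` function and the principal strings are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policies:
    # Each entry is a (principal, permission) pair; group membership is
    # expanded by the caller before evaluation.
    deny: set = field(default_factory=set)
    allow: set = field(default_factory=set)


def is_allowed(principals: set, permission: str, policies: Policies) -> bool:
    """Deny is checked first; a match ends evaluation before any allow is read."""
    for principal in principals:
        if (principal, permission) in policies.deny:
            return False
    return any((p, permission) in policies.allow for p in principals)


policies = Policies(
    deny={("group:developers", "compute.instances.delete")},
    allow={
        ("group:developers", "compute.instances.delete"),      # inherited from the organization
        ("user:joe@example.com", "compute.instances.delete"),  # explicit grant at prod
    },
)

# Joe acts both as himself and as a member of the developers group.
joe = {"user:joe@example.com", "group:developers"}
print(is_allowed(joe, "compute.instances.delete", policies))  # False
```

Joe's explicit allow is never consulted, because his group membership already matched the deny entry.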
Now let us step back and look at an elegant solution that does work.
Our original assumption, that the way to nullify an IAM Allow was an IAM Deny, was where our thinking went wrong. Loosely, we had thought that the opposite of IAM Allow was IAM Deny. Instead, we should have considered that the opposite of IAM Allow is NOT IAM Allow. Putting it another way, we don’t want to think in terms of IAM Deny but instead in terms of NOT having the IAM Allow present at the prod level. This seems to contradict a notion that Google has given us in the past, which is that if you bind an IAM Policy at a higher level in the resource hierarchy, that IAM Policy cascades downwards (is inherited by all subordinates). If that is the case, then we apparently can never remove a higher-level grant. This is where conditional IAM Allow Policies come into play.
When we define an IAM Allow Policy we can attach a boolean valued expression to the binding. The Allow policy is only honored if it applies to the principal making the request and the conditional expression evaluates to true. What we want then is some expression which would be true for the green resources but false for the red resources as shown in the following:
Fortunately, Google has created a mechanism that is perfect for this capability. It is called Tags.
We can think of a tag as a named key that can have predefined values associated with it. The set of tags available for use is defined at the organization level, along with their permitted values. When we create a resource (e.g. a folder or a project), we can associate one or more tags with that resource. In a hierarchical resource structure, if we associate a tag with a higher-level resource, then the tag is automatically inherited by lower-level resources.
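Tag inheritance can be pictured with a small Python sketch. The hierarchy, the resource names and the `effective_tags` helper are hypothetical; the sketch also assumes that a tag attached closer to the resource wins over an inherited tag with the same key.

```python
# Hypothetical hierarchy expressed as child -> parent. Names are illustrative.
PARENT = {
    "folder:prod": "organization",
    "folder:dev": "organization",
    "project:prod-web": "folder:prod",
    "project:dev-sandbox": "folder:dev",
}

# Tags attached directly to a resource (key -> value).
DIRECT_TAGS = {
    "folder:prod": {"production": "yes"},
}


def effective_tags(resource: str) -> dict:
    """Walk from the resource up to the root, collecting tags along the way.

    setdefault keeps the first value seen for a key, so a tag attached
    closer to the resource takes precedence over an inherited one.
    """
    tags = {}
    node = resource
    while node is not None:
        for key, value in DIRECT_TAGS.get(node, {}).items():
            tags.setdefault(key, value)
        node = PARENT.get(node)  # None once we step past the organization
    return tags


print(effective_tags("project:prod-web"))     # {'production': 'yes'}
print(effective_tags("project:dev-sandbox"))  # {}
```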
In our story, imagine we now create a tag called production which has allowed values of yes and no. We can now attach the tag called production to our folder called prod, and it will be inherited by all the projects within the prod folder. Armed with this, we can now alter our organization-level IAM Allow Policy to:
ALLOW: Group developers: Compute Admin Role WHEN the tag called production is not present or does not have the value yes.
In English, this will grant permissions to the group members on resources where a tag that flags the resource as production is not present.
Perfect!!!
Now let’s see how we can go about setting this up.
We see that at our organization level, we have an IAM Allow policy that grants Compute Admin to the developers group.
If we go to an arbitrary project, we find that members of this group can work with Compute Engines.
At the organization level, we next create a tag called production:
A value associated with the tag key is yes:
Notice the identities generated for the key and value. We will need these shortly.
Now we add the tag to our folder called prod, which is the root of our production resources.
When we now look at our resource hierarchy, we see:
The thing to note is that the prod folder and its descendants now include the tag.
Finally, we update our IAM Allow Policy.
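For readers without the console in front of them, the updated binding might look roughly like the structure this Python sketch prints. It assumes the condition is written with the CEL function `resource.matchTagId`, which takes the tag key and tag value identifiers noted earlier. The two identifiers are left as placeholders, and the group address and condition title are invented. As far as I know, a policy that carries conditions also has to be written with policy version 3.

```python
import json

# Placeholders: substitute the identifiers generated for your tag key and value.
TAG_KEY_ID = "tagKeys/TAG_KEY_NUMERIC_ID"
TAG_VALUE_ID = "tagValues/TAG_VALUE_NUMERIC_ID"


def conditional_binding(role: str, member: str, key_id: str, value_id: str) -> dict:
    """Build an allow-policy binding that applies only where the tag does NOT match."""
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": "not-production",  # hypothetical title
            "description": "Applies only where the production tag is not set to yes",
            "expression": f"!resource.matchTagId('{key_id}', '{value_id}')",
        },
    }


binding = conditional_binding(
    "roles/compute.admin",
    "group:developers@example.com",  # hypothetical group address
    TAG_KEY_ID,
    TAG_VALUE_ID,
)
print(json.dumps(binding, indent=2))
```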
The expression starts with “!”, which negates the expression that follows; that inner expression is true when the resource is tagged as production. The complete expression is therefore true ONLY when the resource is NOT production, which is what we want. We now see that the condition has been applied:
And … that’s it. Members of the developers group can work with Compute Engine instances across the organization, except in the prod folder and the resources beneath it. However, if selected users are explicitly granted IAM Allow permissions on the prod folder, then they will be allowed access there.
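The end state can be checked with a toy model in Python. It is a sketch of the reasoning only: the `Binding` type, the `has_role` function, the resource names and the second user `ann@example.com` are all invented, and tags are passed in as an already-computed dictionary.

```python
from typing import Callable, NamedTuple


class Binding(NamedTuple):
    member: str
    role: str
    attached_to: str                   # resource where the binding is defined
    condition: Callable[[dict], bool]  # receives the target's effective tags


def always(_tags: dict) -> bool:
    return True


def not_production(tags: dict) -> bool:
    # True when the production tag is absent or has a value other than yes.
    return tags.get("production") != "yes"


BINDINGS = [
    Binding("group:developers", "roles/compute.admin", "organization", not_production),
    Binding("user:joe@example.com", "roles/compute.admin", "folder:prod", always),
]


def has_role(principals: set, role: str, ancestry: list, tags: dict) -> bool:
    """ancestry lists the target resource followed by all of its ancestors."""
    return any(
        b.member in principals
        and b.role == role
        and b.attached_to in ancestry
        and b.condition(tags)
        for b in BINDINGS
    )


prod_project = (["project:prod-web", "folder:prod", "organization"], {"production": "yes"})
dev_project = (["project:dev-sandbox", "folder:dev", "organization"], {})

joe = {"user:joe@example.com", "group:developers"}
ann = {"user:ann@example.com", "group:developers"}  # hypothetical second developer

print(has_role(ann, "roles/compute.admin", *dev_project))   # True
print(has_role(ann, "roles/compute.admin", *prod_project))  # False
print(has_role(joe, "roles/compute.admin", *prod_project))  # True
```

Because nothing is denied, Joe's explicit grant at the prod folder simply adds to what he has, while the organization-level grant drops away wherever the production tag is in effect.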