Running and Writing Gatekeeper Policies in Kubernetes — Part 3

Sebastian Radloff
Published in Namely Labs
Oct 27, 2021

Series Overview

This series examines an opinionated Kubernetes Gatekeeper policy ecosystem: what Gatekeeper is, why you would use it, how to write and test policies, how to deploy them to a local Kubernetes cluster, and how to make them visible to users. All code for this series resides in this GitHub repository:

Part 1: Gatekeeper Components and Architecture Overview

Part 2: Install Gatekeeper to a Local Cluster and Iterate on Policies

In this post, we’ll continue writing policies to demonstrate a fuller range of possibilities, such as how to use the konstraint core library, why you might not want to apply policies directly to pods, and how to write policies that require knowledge of existing resources in the cluster.

Using the konstraint core library and default keyword

As we’ve seen before, we can import existing rego and use it in other policies. The official OPA docs do a great job explaining imports if you need a refresher. In this section, we’ll create a few policies that leverage the imported konstraint core library.
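The full namespace_team_label_02 policy lives in the series repository; a minimal sketch of its shape, with an illustrative policy ID and message text, looks roughly like this:

# @title Namespaces must have a team label
# @kinds core/Namespace
package namespace_team_label_02

import data.lib.core

policyID := "P0002"  # illustrative ID for this sketch

# default to false so a missing labels key doesn't leave the rule undefined
default has_team_label = false

has_team_label {
    core.has_field(core.labels, "team")
}

violation[msg] {
    not has_team_label
    msg := core.format_with_id("all namespaces must have a team label", policyID)
}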

The above policy (namespace_team_label_02) is functionally the same as the first one we reviewed (namespace_team_label_01), ensuring all namespaces have the ‘team’ label. We kept the same tests from our first policy since the logic should be the same. Let’s take a look at the policies side by side.

The significant differences are the core library usage, the helper function has_team_label, and the violation rule block signature.

The core library document has predefined queries and functions that we leverage. Below are the relevant snippets of source code from the core library, which we depend on in our policy:

package lib.core

default is_gatekeeper = false

is_gatekeeper {
    has_field(input, "review")
    has_field(input.review, "object")
}

resource = input.review.object {
    is_gatekeeper
}

resource = input {
    not is_gatekeeper
}

name = resource.metadata.name

labels = resource.metadata.labels

has_field(obj, field) {
    not object.get(obj, field, "N_DEFINED") == "N_DEFINED"
}

format(msg) = {"msg": msg}

format_with_id(msg, id) = msg_fmt {
    msg_fmt := {
        "msg": sprintf("%s: %s", [id, msg]),
        "details": {"policyID": id},
    }
}

As we can see, this code is similar to what we had in the first policy (namespace_team_label_01). The resource variable is defined as input.review.object when the is_gatekeeper rule evaluates to true; the following rule block defines resource as input when is_gatekeeper evaluates to false. Defining the same rule with multiple bodies like this is a typical OPA pattern and acts as a form of control flow. Below is an example of how the logic could be written in Python:

# imagine as python code
def is_gatekeeper(input):
    return has_field(input, "review") and has_field(input["review"], "object")

def has_field(obj, field):
    val = obj.get(field, "N_DEFINED")
    return val != "N_DEFINED"

if __name__ == "__main__":
    input = {
        "review": {
            "object": {
                "is_obj": "nice"
            }
        }
    }
    if is_gatekeeper(input):
        resource = input["review"]["object"]
    else:
        resource = input

The has_team_label helper function in our new policy leverages the default keyword, which lets us define a default value for rules that might otherwise evaluate to undefined. In our case, core.labels is undefined if there is no labels key on the resource under review. The default keyword "catches" that case and ensures has_team_label evaluates to false instead of undefined.

The violation rule block signature hasn't changed: the msg variable still resolves to the required {"msg": msg, "details": {}} shape, but the core library now builds it for us via the core.format_with_id function. It formats the output with a policyID, which is useful in error messages because it lets users easily look up the policy in our generated documentation.

Using the konstraint pods library and the unforeseen consequences of applying policies directly on pods

Let’s create a policy requiring all containers to have resource requests, so the scheduler can bin-pack our nodes and we avoid the noisy neighbor problem.

In this policy, we use the konstraint pods library to iterate through all containers. The pods.containers document returns a list of container objects, which we then iterate through using the [_] notation, and assign each container to the container variable. The OPA official documentation on universal quantification does a great job explaining the iteration strategies.

There are three violation rules in this policy and three helper functions. The helper functions are simple but make the policy code more expressive. container_requests_provided confirms that the container defines both the CPU and memory resource request keys. The container_requests_zero_memory and container_requests_zero_cpu functions run a regex match against the memory and CPU requests to ensure a non-zero value was provided. The output message identifies the resource kind, resource name, and offending container name, providing clear and actionable feedback to our users.
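The full policy lives in the repository; a sketch of its overall shape, with illustrative policy ID, message text, and regex patterns, might look like the following:

# @title Containers must have non-zero resource requests
# @kinds core/Pod
package container_deny_without_resource_request_01

import data.lib.core
import data.lib.pods

policyID := "P0003"  # illustrative ID for this sketch

violation[msg] {
    container := pods.containers[_]
    not container_requests_provided(container)
    msg := core.format_with_id(sprintf("%s/%s: container %s must define cpu and memory requests", [core.resource.kind, core.name, container.name]), policyID)
}

violation[msg] {
    container := pods.containers[_]
    container_requests_zero_cpu(container)
    msg := core.format_with_id(sprintf("%s/%s: container %s must request a non-zero amount of cpu", [core.resource.kind, core.name, container.name]), policyID)
}

violation[msg] {
    container := pods.containers[_]
    container_requests_zero_memory(container)
    msg := core.format_with_id(sprintf("%s/%s: container %s must request a non-zero amount of memory", [core.resource.kind, core.name, container.name]), policyID)
}

container_requests_provided(container) {
    core.has_field(container.resources.requests, "cpu")
    core.has_field(container.resources.requests, "memory")
}

# illustrative pattern: treat "0", "0m", "0Mi", and similar as zero requests
container_requests_zero_cpu(container) {
    regex.match(`^0[A-Za-z]*$`, container.resources.requests.cpu)
}

container_requests_zero_memory(container) {
    regex.match(`^0[A-Za-z]*$`, container.resources.requests.memory)
}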

While this policy does what we expect in theory, most of our users are not creating Pod resources directly; they create Deployments or Jobs that eventually create the Pods. If they apply a Deployment with containers violating this policy, the Deployment will be created successfully, spawning a new ReplicaSet that then fails to create a Pod. That's why it's essential to consider how your developers interact with the cluster when writing these policies. We can easily make this policy more user-friendly by adding the workload kinds to the @kinds annotation at the top of the src.rego file, as demonstrated below in container_deny_without_resource_request_02.
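For reference, the konstraint @kinds comment is a space-separated list of group/kind pairs, so the header of the second policy would look something like the sketch below (the exact kind list in the repository may differ). Because the konstraint pods library already resolves the pod template for these workload kinds, the violation rules themselves shouldn't need to change.

# @title Containers must have non-zero resource requests
# @kinds apps/DaemonSet apps/Deployment apps/StatefulSet batch/Job core/Pod
package container_deny_without_resource_request_02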

Writing policies that make determinations based on other resources existing in the cluster

There are situations in which you want to block the creation or update of a resource based on the existence of another resource in the cluster. For example, let's say you've established an API gateway pattern in which each service is exposed at "/api/${resource}/${subpath}" and the path is rewritten to "/${subpath}". Consider the following ingresses:

The foo ingress will route all calls to /api/foozles/${subpath} to the foo service at /${subpath}. The bar ingress will route calls to /api/foozles/barzles/${subpath} to the bar service at /${subpath}, which goes against our standard pattern.

We can prevent this from happening by writing a policy and using a Gatekeeper Config resource. In the Config spec, we list the Kubernetes resource types we'd like to query as part of our policies. It's important to note that the Config must be named config for Gatekeeper to reconcile it. Below is the config we have written to query namespaces and ingresses:

Now we can query both object types through data.inventory in our policy. For cluster-scoped objects like namespaces, the query format is data.inventory.cluster[<groupVersion>][<kind>][<name>]. For namespace-scoped objects like ingresses, the query format is data.inventory.namespace[<namespace>][<groupVersion>][<kind>][<name>]. Below is the rego source defining our policy:

At the top of the violation rule, we iterate through all namespaces existing in the cluster and each namespace's ingresses. Then we check whether the ingress we've pulled from inventory is the same as the one currently under review. This check is necessary whenever you apply a policy against the same resource type you're querying from the cluster. For example, if we want to apply changes to the foo ingress in the foo namespace, the policy would flag those changes as a violation without the not identical(other_ing, curr_ing) check, because the existing foo ingress in the foo namespace is already in the inventory. The rest of the policy isn't necessarily the ideal way to determine whether there's an issue, but it works for this example.
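The full policy lives in the repository; the sketch below captures the shape described above, where the group versions, policy ID, and the prefix comparison are illustrative and the repository's actual logic may differ:

# @title Ingress paths must follow the /api/${resource}/${subpath} pattern
# @kinds networking.k8s.io/Ingress
package ingress_path_conflicts

import data.lib.core

policyID := "P0005"  # illustrative ID for this sketch

violation[msg] {
    # iterate every namespace replicated into data.inventory...
    ns := data.inventory.cluster["v1"].Namespace[_]
    # ...and every ingress inside that namespace
    other_ing := data.inventory.namespace[ns.metadata.name]["networking.k8s.io/v1"].Ingress[_]
    curr_ing := core.resource

    # skip the ingress currently under review so updates to it
    # don't conflict with their own cached copy
    not identical(other_ing, curr_ing)

    curr_path := curr_ing.spec.rules[_].http.paths[_].path
    other_path := other_ing.spec.rules[_].http.paths[_].path

    # crude check: the path under review nests beneath another ingress's path
    other_path != curr_path
    startswith(curr_path, other_path)

    msg := core.format_with_id(sprintf("path %s overlaps with ingress %s/%s and breaks the /api/${resource}/${subpath} pattern", [curr_path, ns.metadata.name, other_ing.metadata.name]), policyID)
}

identical(a, b) {
    a.metadata.namespace == b.metadata.namespace
    a.metadata.name == b.metadata.name
}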

When writing tests, you’ll want to mock data.inventory. Creating helper functions to build your Kubernetes objects makes it easier to create test cases and enhances test legibility. You can see this in the test file below:
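In that spirit, a test for the policy sketch above might mock the inventory and input like this (the builder helpers and data here are illustrative, not the repository's actual test file):

package ingress_path_conflicts

# illustrative builders; the real test file defines its own helpers
mock_namespace(name) = obj {
    obj := {"metadata": {"name": name}}
}

mock_ingress(ns, name, path) = obj {
    obj := {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": ns},
        "spec": {"rules": [{"http": {"paths": [{"path": path}]}}]}
    }
}

test_nested_path_is_a_violation {
    inventory := {
        "cluster": {"v1": {"Namespace": {"foo": mock_namespace("foo")}}},
        "namespace": {"foo": {"networking.k8s.io/v1": {"Ingress": {"foo": mock_ingress("foo", "foo", "/api/foozles")}}}}
    }
    review := {"review": {"object": mock_ingress("bar", "bar", "/api/foozles/barzles")}}

    count(violation) > 0 with input as review with data.inventory as inventory
}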

Feel free to test the policy by applying the manifests in example-k8s-resources/ingress-policy-testing.

In the next post, we'll take an in-depth look at structuring CI for your policies repository and recommend strategies for rolling out policies when existing resources already violate them.
