Enforce your policies as code on Kubernetes using Gatekeeper (OPA)

Tanat Lokejaroenlarb
Published in NonTechCompany · 8 min read · Jul 21, 2021

Open Policy Agent, or OPA: this name first came to my attention more than a year ago.

Even though I was working on a Platform Engineering team, all I knew back then was that it is an engine for declaring policies in a central place, with a domain-specific language to define those policies instead of having them scattered all over the place.

And that was it; that was all I knew about it, since the work I was involved in never came near OPA.

Now that I have had a chance to play with OPA a little bit, I am writing this to share what I have discovered so far, and what I think you should know.

First, what is the purpose of OPA? Seeing concrete examples will help us understand it better.

As a Platform Engineering team, one of our jobs is to provide a solid Kubernetes platform for the development teams to deploy their workloads on. There are plenty of opportunities to use OPA to fortify the platform. For example:

  • Resource limits must be set, to prevent a single Pod from overusing resources and tearing down the whole worker node
  • Allow only the internal image registry to be used, to ensure only vetted images run in the cluster
  • Prevent the creation of duplicated hostnames in Ingresses

The ideas can go on and on; the only limitation is your imagination.

So, writing those policies into documentation and hoping that the developers will read them and behave is utterly crazy. How about we enforce them directly in the platform? (I consider this one of the great examples of “architectural fitness functions” from the book Building Evolutionary Architectures.)

As for how to do it, OPA comes to the rescue.


Before we get to the how, let me clarify two terms here.

Open Policy Agent and Gatekeeper: what is the difference between them?

OPA (oh-pa) itself is the “policy engine”. It allows you to express your policies in a domain-specific language called Rego (ray-go). It also offers an API for your services to query, to validate requests against the defined policies.

Gatekeeper is a controller/operator deployed into Kubernetes which allows you to define your policies natively in Kubernetes (using CRDs, as you might already have guessed) and integrates itself with the Kubernetes api-server. Hence, every time you make a request to the Kubernetes api-server (kubectl apply / create / update), the request will be sent to the Gatekeeper controller to determine whether to accept or reject it based on the defined policies.

Lastly, before we get into the implementation (I promise this is the last “before the actual code” detour), I really want you to understand how Gatekeeper works.

The concept that you need to understand is “Kubernetes admission controller”.

The Kubernetes admission controller is what Gatekeeper uses to intercept your requests and validate them.

In Kubernetes, before your request to the api-server is completed and saved to the etcd database, several things happen in the background, as shown in the diagram below.

from: https://kubernetes.io/blog/2019/03/21/a-guide-to-kubernetes-admission-controllers/

After your request is authenticated and authorized, there are two main admission controllers in the process that concern our scope: the mutating and the validating admission controllers.

Mutating Admission: once Kubernetes has authorized your request, it sends a request containing the object being created or updated to the mutating admission process, which calls the defined webhook service.

This webhook service can do two things: it either rejects your request or “mutates” the request object. To give you an example of what mutation does, let’s say you create a Pod without specifying a serviceAccount. You can see that once it is created, it has serviceAccount: default attached to it. This is actually done by a different controller, but it should give you an idea of how mutation works.

Validating Admission: this controller’s name is quite self-explanatory. It either allows or rejects the request by sending it to the registered webhook for a decision.

Now that you know this concept exists in the Kubernetes api-server, you might already have a good idea of how Gatekeeper is implemented.

Gatekeeper currently uses only the validating admission controller, registering itself with the Kubernetes api-server as a webhook for validation.

Utilizing the mutating controller is on the roadmap; if you want to hear more about it, you can listen to this podcast featuring Max Smythe, the maintainer of the Gatekeeper project.

code from Gatekeeper official Helm chart to register itself as a validation webhook
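The snippet was embedded in the original post; below is a trimmed sketch of what that registration looks like (exact field values vary between chart versions):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
webhooks:
  - name: validation.gatekeeper.sh
    clientConfig:
      service:
        name: gatekeeper-webhook-service
        namespace: gatekeeper-system
        path: /v1/admit
    rules:
      - apiGroups: ["*"]          # watch every API group
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
        resources: ["*"]
    failurePolicy: Ignore
    sideEffects: None
    admissionReviewVersions: ["v1", "v1beta1"]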

The original YAML is captured from the official Gatekeeper Helm chart. You do not need to understand it in detail, but what it does is basically register Gatekeeper as a validating webhook to be called whenever a CREATE or UPDATE request is sent to the api-server.

— — — — —

OK, you are all set with the fundamentals. Let’s start coding.

What we will do is prevent a Service of type LoadBalancer from being created.

In this article, I will use Kind as a local Kubernetes cluster to implement a very classic, easy example.

1. Install kind easily by
brew install kind

2. Create this configuration file to specify the cluster details. In this case we will have a single master and 3 workers, because the default number of replicas for the Gatekeeper controller instances is 3.
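The file itself was embedded in the original post; a minimal sketch of a Kind config matching that description (one control-plane node, three workers) would be:

# multi-workers.yaml: one control-plane node and three workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker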

and run

kind create cluster --config multi-workers.yaml

You should see output indicating the successful creation of the cluster.

We can also check that the setup is done properly (I alias my kubectl as k, for fancy reasons).
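For example, a node listing (using the alias) should show one control-plane node and three workers:

k get nodes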

3. Install Gatekeeper using Helm into our local cluster

helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm install -n gatekeeper-system gatekeeper gatekeeper/gatekeeper --create-namespace

Try getting the pods, and we should see all the controllers running, which means we are good to go.
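For example, listing the pods in the gatekeeper-system namespace we installed into:

k get pods -n gatekeeper-system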

controllers are deployed into the cluster; we will talk about audit in the next episode

4. Now, we need to define the template of our Policy and its logic.

Gatekeeper allows us to define a constraint template natively using a CustomResourceDefinition (CRD), like this:
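The template was embedded in the original post; a sketch consistent with the description below (the exact Rego and message text are assumptions) might look like:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: loadbalancerconstraint
spec:
  crd:
    spec:
      names:
        kind: LoadBalancerConstraint
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package loadbalancerconstraint

        # fires (denies the request) only when every statement in the body holds
        violation[{"msg": msg}] {
          input.review.kind.kind == "Service"
          input.review.object.spec.type == "LoadBalancer"
          msg := "Service of type LoadBalancer is not allowed"
        }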

This template basically says that we will have a constraint type called LoadBalancerConstraint, with logic that reviews the Service object being created; if its type is LoadBalancer, the rule evaluates to true and the creation request will be denied.

The way OPA rules work, each statement in the rule body is combined with an “and” operator. If one statement evaluates to false, the evaluation is short-circuited and the rule does not fire (the Service can be created, in this example).

If you want an “or” operation, you can define multiple rules in the package, as sketched below. See more detail in this document.
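As a hypothetical sketch, two violation rules with the same name act as a logical OR; the request is denied if either body holds:

# rule 1: deny LoadBalancer Services
violation[{"msg": msg}] {
  input.review.object.spec.type == "LoadBalancer"
  msg := "LoadBalancer Services are not allowed"
}

# rule 2 (hypothetical): also deny NodePort Services
violation[{"msg": msg}] {
  input.review.object.spec.type == "NodePort"
  msg := "NodePort Services are not allowed"
}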

Next up, we need to create an actual constraint resource from the constraint template we created in the earlier step.

The created “resource” will look like this:
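The resource was embedded in the original post; a sketch matching the scope described below (the metadata name is an assumption) might be:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: LoadBalancerConstraint
metadata:
  name: deny-loadbalancer-services   # hypothetical name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Service"]
    namespaces:
      - opa-test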

This basically specifies the scope to which this constraint applies.

The main idea behind the Gatekeeper team’s separation of the template from the actual constraint is shareability: you can apply a policy that someone else has developed by just changing the scope or parameters to your needs, without having to write your own template from scratch.

When the Kubernetes api-server receives a request for a “Service” resource within the “opa-test” namespace, it will send a request to the Gatekeeper controller’s webhook and execute the logic we wrote in the template for validation.

5. Let’s apply both the template and the actual constraint.
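Assuming the two manifests sketched above are saved as template.yaml and constraint.yaml (hypothetical filenames):

k apply -f template.yaml
k apply -f constraint.yaml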

You should see the template we created as a CRD,

and a loadbalancerconstraint resource created with the given details.
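A quick check, assuming the names sketched above:

k get crd | grep loadbalancerconstraint
k get loadbalancerconstraint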

loadbalancerconstraint resource; we can have multiple instances of it, applying to different scopes and objects

6. Test our policy by trying to create a Service of type LoadBalancer in the opa-test namespace.
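The manifest was shown as a screenshot in the original; a sketch matching the captions below (the Service name is an assumption, and the opa-test namespace must already exist) might be:

apiVersion: v1
kind: Service
metadata:
  name: test-lb          # hypothetical name
  namespace: opa-test
spec:
  type: LoadBalancer     # the type our constraint denies
  selector:
    app: test
  ports:
    - port: 80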

service type: LoadBalancer
rejected by webhook with defined error message

Voilà, we got an error message back from the api-server, as expected.

If I apply this to the default namespace, which is not in the scope, it will not be impacted by what we have done.

not included in our policy scope

That’s it; now you have an idea of how to use Gatekeeper in your platform.

Note: if you want to see what the review object looks like, you can use this statement in the policy:

msg := sprintf("review object: %v", [input.review])

and let the rule fire so the message is printed in the rejection; this way you will see the review object returned when you try to apply the resource.
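For instance, a hypothetical debug-only rule whose body always holds, so every request is denied with the review object echoed back:

violation[{"msg": msg}] {
  # the assignment always succeeds, so this rule always fires
  msg := sprintf("review object: %v", [input.review])
}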

— — — — —

In the next episode, I will tell you more about how we can use data within the cluster to aid the decisions in our policies, and also how to apply policies to objects created before the policies were applied (audit).

Until then, be happy, be safe, and see you later.

Part 2 is now live here:

Enjoy!
