Validating Kubernetes Deployment YAMLs

It doesn’t need an Introduction

One of the challenges of working with K8s is pre-deployment validation of your descriptor YAMLs. A slight misalignment of the - env values and the server returns an ugly error that makes you wonder if your life is worth living.

It is more annoying when you see this error while deploying through an automated tool/system.

Yes, CI/CD! Not good, right?

I was bestowed with this opportunity recently. Like every other engineer, I started with an internet search for "How to validate Kubernetes YAMLs", and the results were not really welcoming. I thought there would be a multitude of tools and systems offering this, but to my dismay there were only a handful of them, some created by the same person: Gareth.

Well, what to test?

With more research and introspection I realized that there are primarily two things that need to be validated:
1. The schema: So the server does not complain about an invalid kind, or about an array where it was expecting a map.
2. The state: So that when you deploy your workload it does not land in a dangling state, e.g. the service name is incorrect and the dependent services are calling non-existent URLs.

Prince, cut the crap and tell me where I should test?

Now, there are several stages of the application life cycle where you want to have these two checks. Validating YAMLs while developing application logic might not make sense, but having a fail-fast mechanism would result in less error-prone deployments.

Possible stages where you can think of integrating these checks are:
1. Development and local test: Developer's laptop
2. Continuous Integration: Full-fledged CI system where you have a dedicated Validate or Test stage
3. Pre-deployment: The deployment tool validates the YAMLs before deploying
4. Post-deployment: Some tool which pulls the created objects and performs validation on them

I would assume that your project uses containers extensively and your tests are also run inside a container (If that is not too much to ask for).

There are two ways you can validate your schema: you either use kubectl with --validate and --dry-run, or you use a tool that does not require a K8s server as a dependency.
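For the first option, here is a minimal sketch of a wrapper. It assumes a kubectl recent enough to accept --dry-run=client (older releases took a bare --dry-run) and a cluster that kubectl can reach. The function name validate_with_kubectl and the KUBECTL override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: check a manifest with kubectl without creating anything.
# KUBECTL is a hypothetical override so the binary can be swapped out.
validate_with_kubectl() {
  manifest="$1"
  # --dry-run=client only prints the object that would be sent;
  # --validate=true checks the input against a schema first.
  "${KUBECTL:-kubectl}" apply --dry-run=client --validate=true -f "$manifest"
}
```

You would call it as validate_with_kubectl deploy.yaml and look at the exit code.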

Validating the schema

I will talk about the latter here.

kubeval is a tool which uses the Kubernetes OpenAPI spec to validate your config. It ships a modified (or cleaned up) version of that spec which is suitable to build a tool around.

Before we proceed, a shoutout to Instrumenta for the awesome work!

kubeval also allows you to use your own schema for validation (I have never tried it though, and there is a bug which restricts it to an extent).

Let us see a quick example of how kubeval works. I have a deployment file which looks like:

apiVersion: apps/v1
kind: Deploy
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - image: nginx
        name: nginx

For some unfortunate reason, I have declared the kind as Deploy which should have been Deployment. Now I can run the following command to validate my YAML:

bash-5.0$ kubeval deploy.yaml
1 error occurred:
* Problem loading schema from the network at https://kubernetesjsonschema.dev/master-standalone/deploy-apps-v1.json: Could not read schema from HTTP, response status is 404 Not Found

You can see that it errors out because no schema exists for the kind Deploy (the schema lookup returns a 404). After fixing the kind:

bash-5.0$ kubeval deploy.yaml
The document deploy.yaml contains a valid Deployment

Next, I changed replicas to an array, whereas it should be an integer:

..
spec:
  replicas: [1]
  selector:
..

kubeval reports the error:

bash-5.0$ kubeval deploy.yaml
The document deploy.yaml contains an invalid Deployment
---> spec.replicas: Invalid type. Expected: integer, given: array

Useful, isn’t it?

However, if you introduce an unknown field in the manifest, the above command would not catch it (notice the field non-metadata):

bash-5.0$ cat deploy.yaml
apiVersion: apps/v1
kind: Deployment
non-metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
..
..
bash-5.0$ kubeval deploy.yaml
The document deploy.yaml contains a valid Deployment

To make sure it catches these errors, use the --strict flag:

bash-5.0$ cat deploy.yaml
apiVersion: apps/v1
kind: Deployment
non-metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
..
..
bash-5.0$ kubeval deploy.yaml --strict
The document deploy.yaml contains an invalid Deployment
---> non-metadata: Additional property non-metadata is not allowed
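If you have more than one manifest, the same check can be looped. This is only a sketch: it assumes kubeval exits non-zero for an invalid document, and the function name validate_schemas and the KUBEVAL override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: run kubeval --strict on every manifest passed in and
# remember whether any of them failed.
# KUBEVAL is a hypothetical override so the binary can be swapped out.
validate_schemas() {
  rc=0
  for manifest in "$@"; do
    if ! "${KUBEVAL:-kubeval}" --strict "$manifest"; then
      echo "schema check failed: $manifest" >&2
      rc=1
    fi
  done
  # Non-zero if at least one manifest was rejected.
  return "$rc"
}
```

Checking every file instead of stopping at the first failure gives you the full list of broken manifests in one run.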

Hey, what about validating the state?

There are two tools, both created by a gentleman you may have guessed by now: Gareth.

  1. kubetest: It's a nice tool, but I could not successfully use it due to this bug.
  2. conftest: This tool is gold. Why? Because it uses Rego for queries. Rego is the same query language that Open Policy Agent uses. As there are several enterprises already using OPA, there is better support and documentation.

Let me know if you are interested in examples of kubetest, I will create another post for it.

Let us see how you can use conftest with an example. But wait, some intro first. conftest accepts a set of policies which are a collection of rules. A rule can either evaluate to true or false.

By default, the policies are assumed to be in the ./policy directory. Obviously, you can override the location using the --policy flag. Why would you even think that the support to override the location would be missing?! Dumb you.

I won’t go into much detail, but here is a sample:

.
├── deploy.yaml
└── policy
    └── base.rego

The content of base.rego :

package main

deny[msg] {
  not input.kind = "Deployment"
  msg = "not a deployment"
}

I have defined one deny rule, which checks the value of input.kind. If

  not input.kind = "Deployment"

evaluates to true, then msg is returned. In other words, the rule body succeeds when the input kind is not Deployment, and a deny rule that succeeds is reported as a failure. You might want to read the above lines a few times to make sense of them. BTW, this is the input we are using:

apiVersion: apps/v1
kind: Deploy
metadata:
..
..

Here is the result:

bash-5.0$ conftest test deploy.yaml
deploy.yaml
not a deployment
bash-5.0$ echo $?
1

Now let us correct the deploy.yaml kind:

apiVersion: apps/v1
kind: Deployment
..
..

and here is the magic:

bash-5.0$ conftest test deploy.yaml
deploy.yaml
bash-5.0$ echo $?
0
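Since conftest signals the result through its exit code, a script can gate on it directly. Here is a minimal sketch that also passes the policy location explicitly with --policy. The function name check_policy and the CONFTEST override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: run conftest with an explicit policy directory and turn its
# exit code into a pass/fail message.
# CONFTEST is a hypothetical override so the binary can be swapped out.
check_policy() {
  policy_dir="$1"
  shift  # the remaining arguments are the manifests to test
  if "${CONFTEST:-conftest}" test --policy "$policy_dir" "$@"; then
    echo "policy check passed"
  else
    echo "policy check failed" >&2
    return 1
  fi
}
```

For the layout above you would call it as check_policy policy deploy.yaml.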

How do I integrate it into the pipeline?

Well, containers! Run your test in containers and have these tools available in them. You would be good to go with a few lines of code in your Jenkinsfile or some test script.
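To give an idea of what those few lines could look like, here is a sketch of a test script that runs both checks on every .yaml file in a directory. It assumes both tools are on the PATH inside the container and that both exit non-zero on failure. The function name validate_manifests and the KUBEVAL and CONFTEST override variables are made up for illustration.

```shell
#!/bin/sh
# Sketch: schema check plus policy check for every manifest in a directory.
# Usage: validate_manifests MANIFEST_DIR [POLICY_DIR]
# KUBEVAL and CONFTEST are hypothetical overrides for the two binaries.
validate_manifests() {
  manifest_dir="$1"
  policy_dir="${2:-policy}"  # same default location that conftest uses
  rc=0
  for manifest in "$manifest_dir"/*.yaml; do
    [ -e "$manifest" ] || continue  # the glob matched nothing
    # Stage 1: schema validation
    "${KUBEVAL:-kubeval}" --strict "$manifest" || rc=1
    # Stage 2: policy validation
    "${CONFTEST:-conftest}" test --policy "$policy_dir" "$manifest" || rc=1
  done
  # Non-zero if any check failed, so the pipeline stage fails too.
  return "$rc"
}
```

A Validate stage could then source this file and call validate_manifests on the directory that holds your manifests.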

Really important stuff!

  1. Clap for the story
  2. Follow me on Medium
  3. Send me a LinkedIn invite with the message “from medium blog”
  4. And yeah, you can follow me on GitHub too: https://github.com/princerachit