
Example of Yaml Generator and Validator in Python

Whether or not you work with Yaml regularly, the thing most people know about it is that it _definitely_ cares about whitespace. Even careful practitioners can sometimes automate a bad process, and with Yaml that makes for a bad time, so validating it (particularly when generating Yaml, to say nothing of writing it by hand) is a must.
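As a quick illustration of why that matters, here is a small sketch using `pyyaml` (the tiny documents are made up for the example). It shows two ways whitespace bites: dedenting one line silently changes the structure without any error, while one stray extra space makes the document unparseable.

```python
import yaml

intended = "a:\n  b: 1\n  c: 2\n"
# Dedenting the last line moves `c` out from under `a`; this is still valid Yaml
shifted = "a:\n  b: 1\nc: 2\n"
# One extra space turns `c: 2` into an illegal continuation of b's value
broken = "a:\n  b: 1\n   c: 2\n"

print(yaml.safe_load(intended))  # {'a': {'b': 1, 'c': 2}}
print(yaml.safe_load(shifted))   # {'a': {'b': 1}, 'c': 2}

try:
    yaml.safe_load(broken)
except yaml.YAMLError as err:
    # Only this third case is caught by the parser; the second one is not an error at all
    print("parse error:", type(err).__name__)
```

The second case is the nastier one: it parses cleanly, so only a check on the resulting structure would notice it.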

Let’s take a common Yaml use case: Kubernetes manifests. In my case, I wanted to create different configurations, populate information on the fly (things like tokens of a known length, for example), and then dump the result to a Yaml file used elsewhere. I did this in Python using `pyyaml`.

To use encryption at rest in your cluster for resources like secrets, Kubernetes requires an EncryptionConfig file. It’s a fairly short piece of Yaml to generate: it just needs the provider, a key, and which resources to encrypt at rest in etcd. To generate it as Yaml, I’m first going to represent it as JSON (a Python dictionary):

```python
configIn = {
    "kind": "EncryptionConfig",
    "apiVersion": "v1",
    "resources": [
        {
            "resources": [
                "secrets"
            ],
            "providers": [
                {
                    "aescbc": {
                        "keys": [
                            {
                                "name": "key1",
                                "secret": generateSecret(32)
                            }
                        ]
                    }
                }
            ]
        }
    ]
}
```

and then we’re going to populate the `secret` value with the result of `generateSecret`, a [lambda](http://www.diveintopython.net/power_of_introspection/lambda_functions.html) that takes a length and returns a base64-encoded random value of that length:

```python
import base64
import os
import sys

import yaml

# Base64-encode `length` bytes from os.urandom, a cryptographically secure source
# (the random module is not suitable for generating encryption keys)
generateSecret = lambda length: base64.b64encode(os.urandom(length)).decode("ascii")


def populateConfig():
    configIn = {
        "kind": "EncryptionConfig",
        "apiVersion": "v1",
        "resources": [
            {
                "resources": [
                    "secrets"
                ],
                "providers": [
                    {
                        "aescbc": {
                            "keys": [
                                {
                                    "name": "key1",
                                    # 32 random bytes, i.e. an AES-256 key
                                    "secret": generateSecret(32)
                                }
                            ]
                        }
                    }
                ]
            }
        ]
    }

    configOut = yaml.dump(configIn)
    return configOut
```

and then have `yaml.dump` return that object to us as Yaml:

```yaml
apiVersion: v1
kind: EncryptionConfig
resources:
- providers:
  - aescbc:
      keys:
      - {name: key1, secret: BASE64_STRING}
  resources: [secrets]
```

which is valid Yaml, but to make it idiomatic with the Kubernetes style (and because the encryption-at-rest feature, still experimental as of Kubernetes 1.11, won’t accept this form), we’ll change the `configOut` line’s `yaml.dump` options to look like this:

```python
configOut = yaml.dump(configIn, default_flow_style=False)
```

to return:

```yaml
apiVersion: v1
kind: EncryptionConfig
resources:
- providers:
  - aescbc:
      keys:
      - name: key1
        secret: BASE64_STRING
  resources:
  - secrets
```
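To see the effect of `default_flow_style` on its own, here is a small sketch using just the `resources` list from the config above. It passes both values explicitly because the default has changed over time: PyYAML 5.1 and later already default to `default_flow_style=False`, so on a recent version a plain `yaml.dump(configIn)` may already produce block style. Either way, both forms load back to the same data.

```python
import yaml

data = {"resources": ["secrets"]}

# None: collections that contain only scalars are written inline ("flow" style)
flow = yaml.dump(data, default_flow_style=None)
# False: everything is written as indented "block" style, the Kubernetes convention
block = yaml.dump(data, default_flow_style=False)

print(repr(flow))   # 'resources: [secrets]\n'
print(repr(block))  # 'resources:\n- secrets\n'

# The two strings differ only in presentation; they parse to the same structure
assert yaml.safe_load(flow) == yaml.safe_load(block) == data
```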

Okay, great: we’ve got our config, and it _looks_ reasonably correct, but since it’s automatically created, we probably want to double-check.

There are a few ways to do this, but because my input was relatively simple, the schema wasn’t being modified in any meaningful way (I was just populating data), and I’d prefer to do this with the libraries already imported, we can use the `yaml` package’s built-in `safe_load` function to check that an incoming config (like the one returned by the function above) at least parses:

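```python
def validateYaml(config):
    try:
        yaml.safe_load(config)
        return config
    except yaml.YAMLError as err:
        # Stop before anything is written if the Yaml cannot be parsed
        sys.exit("Failed to validate config: %s" % err)
```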

This function will bail if the config cannot be parsed (which becomes important in a moment), but returns the config unchanged if it can. With that, we can move on to our program’s entrypoint and stitch it all together, writing the config to a file only if it is valid:

```python
if __name__ == "__main__":
    config = validateYaml(populateConfig())
    # The with-block closes the file even if the write fails
    with open("secrets.conf", "w") as encryptionConfig:
        encryptionConfig.write(config)
    print("OK")
```

If `validateYaml` fails, it prevents us from writing a bad config, or at least one that is certain **not** to work. Keep in mind that `safe_load` only checks that the input is well-formed Yaml; in a more complicated Yaml input, other problems (a misspelled key, or a value nested at the wrong level) can still parse cleanly and slip past it.
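If you want to catch some of those structural problems too, one option is to inspect the parsed data after `safe_load` succeeds. The sketch below uses a hypothetical `checkEncryptionConfig` helper (not part of `pyyaml` or Kubernetes). It only checks the handful of fields this post generates, and it assumes you want to enforce the 32-byte key that `generateSecret(32)` produces.

```python
import base64

import yaml


def _asList(value):
    # Treat anything that is not a Yaml sequence as empty so iteration never crashes
    return value if isinstance(value, list) else []


def checkEncryptionConfig(text):
    """Hypothetical helper: return a list of problems found in an EncryptionConfig document."""
    try:
        doc = yaml.safe_load(text)
    except yaml.YAMLError as err:
        return ["not valid Yaml: %s" % err]
    if not isinstance(doc, dict):
        return ["top level must be a mapping"]

    problems = []
    if doc.get("kind") != "EncryptionConfig":
        problems.append("kind must be EncryptionConfig")
    if doc.get("apiVersion") != "v1":
        problems.append("apiVersion must be v1")
    if not _asList(doc.get("resources")):
        problems.append("resources must be a non-empty list")

    for entry in _asList(doc.get("resources")):
        if not isinstance(entry, dict) or not _asList(entry.get("resources")):
            problems.append("each resources entry needs a non-empty resources list")
            continue
        for provider in _asList(entry.get("providers")):
            aescbc = provider.get("aescbc") if isinstance(provider, dict) else None
            if not isinstance(aescbc, dict):
                continue  # only the aescbc provider is checked in this sketch
            for key in _asList(aescbc.get("keys")):
                name = key.get("name") if isinstance(key, dict) else None
                try:
                    raw = base64.b64decode(key["secret"], validate=True)
                except (KeyError, TypeError, ValueError):
                    # missing secret, non-string value, or malformed base64
                    problems.append("key %r has a missing or invalid secret" % name)
                    continue
                if len(raw) != 32:
                    problems.append("key %r decodes to %d bytes, expected 32" % (name, len(raw)))
    return problems


TEMPLATE = """\
apiVersion: v1
kind: EncryptionConfig
resources:
- providers:
  - aescbc:
      keys:
      - name: key1
        secret: {secret}
  resources:
  - secrets
"""

good = TEMPLATE.format(secret=base64.b64encode(bytes(32)).decode("ascii"))
short = TEMPLATE.format(secret=base64.b64encode(bytes(16)).decode("ascii"))
print(checkEncryptionConfig(good))   # []
print(checkEncryptionConfig(short))  # ["key 'key1' decodes to 16 bytes, expected 32"]
```

Returning a list of problems rather than exiting on the first one makes it easy to report everything at once; in the entrypoint above you could call `sys.exit` whenever the list is non-empty.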