Trying to fix messy Concourse YAMLs with Dhall
TLDR: Concourse YAMLs grow, get infested with confusing ERB (or any other templating language) and become monsters. I propose that we use dhall as the default configuration language for concourse.
Let’s start with the problems
Problem 1: Repetition
In my experience, we need to run the same tests in different situations and install our product(s) on different platforms and environments. For example, when I was working in the CFCR team, we had to install CFCR in 5 environments and run 3 different types of tests. Given this kind of challenge, teams usually look towards string-based templating. To be fair, it may sometimes solve the problem. But in my experience on the two teams where I have used concourse, it only makes life more difficult.
Even after using ERB, we had to copy-paste the pipeline to create PR pipelines, simply because we decided more ERB would make it too complicated to make any sense.
Problem 2: Size of the files
As files become bigger, the cognitive load of keeping track of the various jobs grows with them, and discovering jobs/steps gets harder. In programs, the way to deal with this has been the ability to namespace code and to put namespaces in separate files. The lack of imports in YAML becomes a problem as the files grow. This has provoked people to use rulers to write YAML.
What’s my beef with templating?
Templating languages have been the favourites to generate markup since forever. But I have always felt uncomfortable with them. Let’s dive into that.
Back when ERB was the coolest thing to template HTML, I was always annoyed that the generated HTML was never indented properly. But this annoyance became a real problem when we started using it to generate markup with significant whitespace (yes, that’s YAML). It is just too easy to generate invalid YAML. Not to mention that any sense of indentation in ERB-infested YAML is non-existent.
And as if it wasn’t bad enough to be able to generate invalid YAML, the errors from any YAML parser are almost never good. The line numbers never match, because they refer to the generated YAML and not to the template. The errors usually look like:
yaml: line 7: could not find expected ':'
But my file only had 6 lines. Yes, it is easy to spot in a 6-line file, but most pipelines I worked with were never less than 500 lines.
Sometimes the YAML is valid but not a valid pipeline. And now I have only my eyes to tell me if I mis-indented something or I mis-spelt a key name.
Substituting multi-line strings, like putting ssh-keys in YAML, is also very difficult. The first thing that you write is definitely going to be wrong. And in the end there will be a weird gsub to replace newlines with empty spaces, or you’d end up defining a function which indents strings.
Also, if you ever forget to quote a version number, good luck getting version 1.10: an unquoted 1.10 is read as the float 1.1. Or if you generate a number beginning with 0, which a YAML 1.1 parser may read as octal, it is going to be an awesome debugging session.
Proposal: Use Dhall as the configuration language
Introduction to dhall
From the README of the project:
Dhall is a programmable configuration language that is not Turing-complete
You can think of Dhall as: JSON + functions + types + imports
A simple program in dhall with a function, and the corresponding result of dhall-to-yaml, looks like this:
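A minimal sketch of such a program, written for illustration (the record field name `count` is an assumption, not something taken from the project):

```dhall
-- A function that adds one to a natural number.
let increment = \(n : Natural) -> n + 1

-- The expression that gets rendered: a record with one computed field.
-- dhall-to-yaml renders this as:  count: 2
in  { count = increment 1 }
```

The function is evaluated away before rendering, so the YAML contains only plain data.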
For the sake of explanation if increment function was to be shared across multiple files, it could be written like this:
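One way this could look, assuming a hypothetical file named `increment.dhall` that sits next to the expression importing it. The shared file holds nothing but the function:

```dhall
-- increment.dhall (hypothetical file name)
\(n : Natural) -> n + 1
```

Any other file can then import it by relative path and use it like a local definition:

```dhall
-- Import the function from the neighbouring file.
let increment = ./increment.dhall

in  { count = increment 1 }
```

The import is resolved and type-checked before evaluation, so a missing or ill-typed file fails early rather than producing broken YAML.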
So, how can dhall help us with pipelines?
- Solve the repetition problem with imports and functions.
- Solve the need for templating with types which can be neatly converted to YAMLs and the dhall-to-yaml binary.
- Avoid huge files with imports.
- Additional benefit: Resource authors could publish functions to generate different steps, resource and resource type definitions, so they can be easily consumed.
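The points above could be sketched roughly as follows. This is a simplified illustration, not the dhall-concourse API: the `Job` type, the `testJob` function, the environment names and the task file paths are all hypothetical, and a real Concourse job has many more fields.

```dhall
-- Hypothetical, heavily simplified job type.
let Job = { name : Text, plan : List { task : Text, file : Text } }

-- One function replaces the copy-pasted job definitions:
-- it builds a test job for a given environment and test suite.
let testJob =
      \(environment : Text) ->
      \(suite : Text) ->
          { name = "${suite}-${environment}"
          , plan =
            [ { task = "run-${suite}", file = "ci/tasks/${suite}.yml" } ]
          }
        : Job

-- The pipeline is plain data built by calling the function.
in  { jobs =
      [ testJob "gcp" "integration"
      , testJob "vsphere" "integration"
      , testJob "gcp" "conformance"
      ]
    }
```

Because the result of `testJob` is annotated with `Job`, a misspelt or missing field is rejected by the type checker before any YAML is produced, and `testJob` could live in its own file and be imported wherever it is needed.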
Here is my attempt at doing it
- dhall-concourse: types, defaults, renderers and helpers for concourse
- dhall-concourse/examples: example pipelines. Contains a WIP translation of ~2000 lines of YAML+ERB for CFCR to dhall. It also has a couple of simpler examples.
What do I think we should do next?
Write a tool to convert pipelines from YAML to dhall expressions
I haven’t actually tried to do this, so I am not sure if this would be easy to do. But I think one of the biggest challenges would be to have a migration story for concourse users.
An even better proposal
We should consider supporting dhall configuration as first class in concourse. One should be able to run:
fly -t some-target set-pipeline -p some-pipeline -c pipeline.dhall
fly could internally convert dhall expressions to json/yaml and send it to the server.
This would obviously be hindered by the lack of golang bindings for dhall expressions. But it could be solved by rewriting fly in Haskell, Python or Clojure.
PS: These reasons also apply to everything that is wrong with helm, but more on that later.