Origin Routing Machine: Making our own rules
The Webcore-infra team at SVT develops and manages infrastructure for the teams that build all of SVT's user-facing online services. This includes databases, container platforms and much more, which we manage from the hardware level up.
Historically, the Webcore-infra team at SVT have also provided load-balancing and proxy services for online services developed at SVT. These services served as origin for our CDN and were managed by the Webcore-infra team only.
As is common in agile organisations, development teams saw a need to sometimes make quick and independent changes to request routing and filtering, and the CDN we were using provided this functionality in its UI. Although neither the UI nor the functionality itself was always appropriate for this, at least the teams had access to it and could make quick changes without waiting for the infra team to register their request. Great, right?
So time passed. In 2018, we found ourselves using our CDN to reverse-proxy the majority of requests based on host headers, paths and parameter values. However, the rulesets were becoming increasingly large and complex, and new rules commonly interacted in unexpected ways with pre-existing rules, sometimes leading to incidents. Some of these incidents were easy to diagnose because they occurred in direct connection with a ruleset update, but not all. As always when caching is involved, the effects sometimes showed up several hours, or even days, later, which made the problem much harder to diagnose.
We also had no guarantees that the CDN solution we were currently using would remain our primary partner in a year's time, or that the next CDN solution would even support the type of rules we were currently depending on. We needed a new solution, and this new solution had to be:
- Unaffected by CDN switches.
- Easily managed by the developer teams, free of dependencies on the Webcore-infra team.
The seed that inspired the creation of ORM was the idea of being in control of our own routing rules at SVT. Having a tool like ORM enables one to be CDN agnostic, because there is no need to use proprietary routing and rewriting features.
We did not want to build our own CDN. We also did not want to construct a proxying load balancer from scratch. We wanted, as the economical sysadmins that we are, to continue using the software that we trust and know well, but we also wanted to let the developers control the configuration of that software to some extent, without generating an incident (apologies to all Varnish-savvy developers out there).
After some discussion we decided on a strategy. We would define a YAML syntax (later dubbed ORM rules) for describing the proxy rules independently of the software we chose, and then write code that would parse the rules and ideally let us choose between different output formats. To start with, that output format would be "working config files for HAProxy and Varnish cache". HAProxy would take care of the routing, and Varnish would perform rewrites, redirects and any other custom responses we required.
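To make the "rules in, config out" idea concrete, here is a minimal sketch in Python. It is not ORM's actual rule schema or generator: the rule fields (`name`, `domain`, `path_prefix`, `backend`), the hostnames and the function `render_haproxy` are all hypothetical, and the rules are written as plain dictionaries standing in for parsed YAML. It only shows how software-independent rules could be rendered into one possible output format, here a small HAProxy-style fragment.

```python
# Illustrative only: a made-up rule format, NOT the real ORM rules syntax.
# Each dict stands in for one rule as it might look after parsing YAML.
RULES = [
    {"name": "news", "domain": "example.org",
     "path_prefix": "/news", "backend": "news-app:8080"},
    {"name": "play", "domain": "example.org",
     "path_prefix": "/play", "backend": "play-app:8080"},
]


def render_haproxy(rules):
    """Render the hypothetical rules as a HAProxy-style config fragment."""
    lines = ["frontend origin", "    bind *:80"]
    for rule in rules:
        acl = f"is_{rule['name']}"
        # One ACL for the Host header, one for the start of the path.
        lines.append(f"    acl {acl}_host hdr(host) -i {rule['domain']}")
        lines.append(f"    acl {acl}_path path_beg {rule['path_prefix']}")
        # Route to the rule's backend only when both ACLs match.
        lines.append(
            f"    use_backend {rule['name']} if {acl}_host {acl}_path"
        )
    for rule in rules:
        lines.append("")
        lines.append(f"backend {rule['name']}")
        lines.append(f"    server {rule['name']}_1 {rule['backend']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_haproxy(RULES))
```

Because the rules themselves say nothing about HAProxy, a second renderer for another output format could consume the same list unchanged, which is the point of keeping the rule syntax separate from the software behind it.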
Another problem we wanted to solve was having multiple teams editing the same ruleset without stepping on each other's toes, each being confident that "if I make a change, it will not conflict with or break any other rule in the whole ruleset". The solution to this problem became the ORM rules collision checking algorithm, which detects whether more than one rule can apply logic to the same request.
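The idea behind collision checking can be sketched in a few lines of Python. This is a deliberately simplified illustration, not ORM's actual algorithm: it assumes a hypothetical rule format where each rule matches one exact domain and one path prefix, and the names `rules_collide` and `find_collisions` are made up for the example. Under those assumptions, two rules can match the same request exactly when they share a domain and one path prefix is a prefix of the other.

```python
from itertools import combinations

# Illustrative only: a made-up rule format, NOT the real ORM rules syntax.
RULES = [
    {"name": "news", "domain": "example.org", "path_prefix": "/news"},
    {"name": "sport", "domain": "example.org", "path_prefix": "/news/sport"},
    {"name": "play", "domain": "example.org", "path_prefix": "/play"},
    {"name": "kids", "domain": "kids.example.org", "path_prefix": "/news"},
]


def rules_collide(a, b):
    """Return True if some request could be matched by both rules."""
    if a["domain"] != b["domain"]:
        return False
    pa, pb = a["path_prefix"], b["path_prefix"]
    # Any path starting with the longer prefix also starts with the
    # shorter one, so overlapping prefixes mean overlapping requests.
    return pa.startswith(pb) or pb.startswith(pa)


def find_collisions(rules):
    """Check every pair of rules and return the names of colliding pairs."""
    return [
        (a["name"], b["name"])
        for a, b in combinations(rules, 2)
        if rules_collide(a, b)
    ]


if __name__ == "__main__":
    print(find_collisions(RULES))  # [('news', 'sport')]
```

Here a request for `/news/sport/today` on `example.org` would be claimed by both `news` and `sport`, so the pair is reported, while `kids` is left alone because it lives on another domain. Richer matching, such as regular expressions or query parameters, needs a correspondingly richer overlap test than this prefix comparison.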
ORM, the Origin Routing Machine (or ORM Routing Machine, depending on who you ask), was first developed in the spring of 2018, in close collaboration with our users, the development teams at SVT. The ORM array went into full production around the time of the 2018 football World Cup, during which we were able to tune a lot of the capacity settings in the array. The developer teams gradually migrated their routing rules from the CDN to the ORM array, in the beginning with a lot of hands-on assistance from us, but by the end of the process wholly independently.
Today, our ORM rules live in a shared Git repo with a CI pipeline that, on every commit, validates and builds all rules, deploys them to stage and runs any defined tests. Developer teams work in branches. A Merge Request is merged automatically if its pipeline passes, and the rules are then deployed to production. The process requires no waiting for resources from us (only the occasional rebase), and rules no longer collide or have unintended effects. The automation is not a part of ORM, but ORM is what makes it safe to deploy routing rules automatically. In the future, we hope to see ORM expanded to support more output formats, more filtering features, advanced request rewriting, possibly cache settings, and the kitchen sink, which is why the project is now open source. Regardless of your choice of software, we want you (and us) to make the rules.
If you want to use, contribute to or find out more about ORM, the project is available on GitHub: https://github.com/SVT/orm. We hope you enjoy it!
Frida Hjelm, Christian Hernvall & the rest of the Webcore-infra team