Avoiding the Configuration Spiderweb
Why ‘Convention over Configuration’ should always be your watchword
I’ll be honest with you. I have as much culpability in our configuration hellhole as any other dev and engineer at our company.
You see, that’s how the configuration spiderweb gets you. Everyone starts out with the best of intentions. A database connection string here, a cache server URL there. One at a time. One dev, one engineer, one new configuration parameter on the old build / deployment server.
Time passes, and those couple of configuration parameters grow. And grow. And grow.
Before you know it, a couple of years have ticked by, and what was a few configuration values in the deployment server (we use Octopus Deploy, but whatever deployment system you use, it’s all the same) is now the configuration spiderweb, and you and the whole DevOps team are trapped in it.
The Configuration Spiderweb Defined
If you’ve never encountered the Configuration Spiderweb, let me define it as best I can:
Configuration Spiderweb: A convoluted series of configuration variables and values in your application, your deployment server, or even your database, which has become a form of technical debt.
Now, most programming languages, deployment servers, even build servers use some form of configuration. Configuration usually looks pretty much the same in any tool.
"ExternalApiUrl" : "https://some.url/API",
"LogLevel" : "Info",
And in any number of circumstances, I have no problem with configuration. Some configuration is necessary, particularly when there is no other option. Examples of no other option? Let’s put a pin in that.
What is configuration, and why to use it as sparingly as possible
As I said, everyone always starts off with a few simple configuration variables. No one intends to end up down some rabbit hole surrounded by hundreds of configurations, each with any number of possible deployment values (depending on which server you’re deploying to, what you’re building, your compiler options, and so on), and all of them taunting you. Yet this is where we find ourselves all too often.
Configuration is, for all intents and purposes, a series of name-value pairs. Name-value pairs represent data, data which must be managed, cared for, watered, and fed. All of this somewhat related data is stored in any number of systems across your organisation, none of which have much of anything to do with one another, and none of which are designed by default to interact in a neat and meaningful way with one another. Actually, one of the few things they do have in common is the fact that they all have some of your configuration data!
Configuration is, in fact, a valid way of extracting parameters as variables out of your code, and passing them in from an external source. It means you can vary any number of things based on your configuration, and this is a good thing.
As long as you stay in control.
Configuration is great, until it’s not. This is why you need to understand when you should and shouldn’t use it.
When to use configuration
- Values you know you will likely change: There is a strong chance that at some point, I will want to change the log level in my application. There is a strong chance I will want to disable or enable some feature for a given rollout. These are good reasons to use configuration.
- Values outside of your control: One of your integration partners has decided to update the URL to their API. You don’t want to bake this into the code. You want to be able to configure this. URLs of any sort are good candidates for this kind of thing, as are tokens or credential information.
- Values which represent part of your security model: I’m clearly not talking about usernames and passwords in plain text, but having a central configuration repository for tokens and securely encrypted passwords is far better than embedding them right into your app.
- Values which MUST vary between environments: We all know there are those unique values which represent a given environment in our platform. If an environment must have values unique to it, then configuration is a convenient and powerful way of dealing with this.
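As a rough sketch of what a deliberately small configuration surface could look like, here is one value for each of the four cases above, read from a set of name-value pairs. The variable names (`LOG_LEVEL`, `PARTNER_API_URL`, `PARTNER_API_TOKEN`, `ENVIRONMENT_NAME`) and the `load_settings` helper are made up for illustration; they don’t come from any particular framework or deployment tool.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    log_level: str          # a value we know we will likely change
    partner_api_url: str    # a value outside of our control
    partner_api_token: str  # part of the security model; injected, never hard-coded
    environment_name: str   # a value which MUST vary between environments


def load_settings(env=None):
    """Build Settings from name-value pairs (defaults to the process environment)."""
    env = os.environ if env is None else env
    required = ("PARTNER_API_URL", "PARTNER_API_TOKEN", "ENVIRONMENT_NAME")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail loudly at start-up instead of limping along half-configured.
        raise KeyError("Missing configuration: " + ", ".join(missing))
    return Settings(
        log_level=env.get("LOG_LEVEL", "Info"),  # hypothetical default
        partner_api_url=env["PARTNER_API_URL"],
        partner_api_token=env["PARTNER_API_TOKEN"],
        environment_name=env["ENVIRONMENT_NAME"],
    )
```

The point of the sketch is the size of the `Settings` class: every field has to justify itself against one of the four reasons above, and anything that can’t doesn’t get a field.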
When NOT to use configuration
- When you want to prop up differing infrastructure: Every seasoned engineer knows this one. Prod doesn’t look like Test. Test has a different drive for web files from QA. QA is pointed at a database named “MyApp-Db”, but in my dev environment, it’s called “MyApp-Dev”. This is one of the most common use cases for configuration, and quite frankly, the worst reason in the world to use it. Configuration should not be propping up your crappy infrastructure implementation. Get your head in the game, standardise as many of your environments as humanly possible, and ditch those superfluous configuration parameters.
- When you ‘might’ want to change a value: Devs are particularly guilty of this mojo. “Well, we might want to change the scopes our REST API accepts,” says the overzealous developer. I used to nod along like a simpleton. Now I roll my eyes and groan. Be honest with yourself: If you are saying you ‘might’ want to change a configuration value, the chances are you won’t change it in any environment in real-time. Why add this superfluous overhead for something which might eventuate, but in all likelihood never will? If, later on, you find you DO need to change this value, add the configuration when you KNOW you need it, not because you think it might be handy.
- When you are addicted to generics: I used to do SAP development. If you aren’t familiar, SAP is a software platform for business and customer relationship management. At its heart, SAP is nothing more than a LOT of configuration to glue together some pretty generic process flows. In theory, this sounds amazing. A platform which can do all kinds of things, just by configuring it? Awesome! Yeah, a lot of corporate chumps got played on that front, and a lot of devs suffered configuration spiderweb hell for decades afterwards.
Architects love generics. Take something and keep refining it down until it is a generic process, with lots of generic calls and configuration injection to make it work for near-on anything. This is the ultimate in over-architecture, and one to avoid like the plague. Generics have a place, but there is a limit. If you’ve gotten so generic you’re using configuration left/right/center, then stop and rethink. You’ve gone too far. A little bespoke is a good thing.
Convention-based programming to the rescue
Now, if we’re not going to use configuration, what’s the alternative? What does one use instead of configuration?
Convention-based programming is a model I have been using for some time, and encourage architects, devs, engineers, and testers to adopt whenever and wherever they can.
Convention-based programming is a simple idea. Build your applications to ‘assume’ that values in their infrastructure follow a convention. For example, my log file output directory is always ‘/mapped-drive/logs/websites/site.name/’, my database connection string is always mapped to the same named instance, no matter which environment it’s running in. My cache server URI is always mapped to an internal reference of ‘cacheserver:6379’.
Convention-based programming starts from the idea that your application will run atop a standardised environment. This environment is self-contained and looks as much like every other environment hosting my app as humanly possible.
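To make that concrete, here is a minimal sketch of an application ‘assuming’ its infrastructure by convention. The log directory pattern and the cache server reference are the ones from my examples above; the `Conventions` class and the `appdb` instance name are hypothetical, picked purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conventions:
    """Infrastructure values derived from the site name alone.

    Nothing here is looked up in a deployment server; the values are the
    same in every environment because every environment follows the convention.
    """
    site_name: str

    @property
    def log_directory(self) -> str:
        # Same mapped drive and folder layout everywhere.
        return f"/mapped-drive/logs/websites/{self.site_name}/"

    @property
    def cache_server(self) -> str:
        # Internal reference resolved by the environment itself.
        return "cacheserver:6379"

    @property
    def database_instance(self) -> str:
        # Hypothetical instance name; the point is that it never varies.
        return "appdb"


conventions = Conventions(site_name="my.site")
print(conventions.log_directory)
```

Notice there is nothing to configure. The only input is the site’s own name, and the environment is responsible for making the assumed names resolve.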
Tools such as Docker containers, or Docker Compose / Kubernetes clusters, are excellent examples of the kind of environments I’m talking about. Such environments are based on a default image, with all infrastructural needs either automatically available, or scripted into the container’s image (drives, networking, exposed ports, what have you). My code is then bundled on to this infrastructure image, and the image itself is the unit of deployment.
This kind of coupling of your infrastructure and your code into a deployable unit is massively powerful. I am no longer held hostage by the whims of infrastructural changes which can (and more often than not DO) occur without my knowledge.
Convention-based programming is a simpler way of looking at software development. You’ve removed those niggling ‘what ifs’ which crop up in bespoke deployment environments. The developer is no longer forced into using configuration to mask disparate environments, for that issue is no longer a concern.
Convention-based programming often rides on the back of Convention over Configuration, a concept one finds in many application frameworks, such as Ruby on Rails, Java Enterprise, and ASP.NET Core. These frameworks often use ‘sensible defaults’ to even further reduce Configuration Spiderwebs.
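The ‘sensible defaults’ idea can be sketched in a few lines. This is not how any of those frameworks implement it; `DEFAULTS` and `resolve` are hypothetical names, and refusing unknown keys is my own assumption about how to keep the list from quietly growing.

```python
# The conventions: values every environment gets unless it explicitly says otherwise.
DEFAULTS = {
    "LogLevel": "Info",
    "CacheServer": "cacheserver:6379",
}


def resolve(overrides=None):
    """Return the effective settings: defaults first, explicit overrides on top."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        # A new name has to be added to DEFAULTS deliberately, not slipped in
        # through a deployment server.
        raise KeyError("Unknown configuration: " + ", ".join(unknown))
    return {**DEFAULTS, **overrides}


# Most environments pass nothing at all; one that needs more logging overrides one value.
print(resolve())
print(resolve({"LogLevel": "Debug"}))
```

An environment that follows the convention supplies nothing, so the number of values anyone has to manage stays close to zero.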
I truly hope you are not one of the suffering masses of IT staff who find themselves trapped in the Configuration Spiderweb. If you’re not one, then do yourself a real solid, and don’t become one by doing something stupid. If you are one of the suffering masses, the best I can say is that I feel your pain, and hope you can find your way out without too much struggling.
That’s it for now. If y’all will excuse me, I have a deployment server to de-lice of nearly 700 configuration variables. I’ll let you know how it went in 2–6 months.
Thanks for reading, and I hope this helped.