Better Applications Using Dynamic Configuration Management

Published in Walmart Global Tech Blog, Sep 30, 2021

Image credits: https://pixy.org/4864454/

Introduction

When an application is developed, it typically contains a configuration file. In the race to ship the application, these configuration files become a part of the application build packages. The most common problem teams then face is that a simple change in one configuration property requires a new build. The developer says, “It is a simple property change, we don’t have to run the entire regression suite for this.” QA disagrees, the engineering manager sweats, the product manager rolls their eyes. Finally, the team ends up running the entire regression suite again, followed by promotion of the build across environments. All for a simple property change.

Let us imagine a world where a change in application configuration does not require a new build and does not require an application restart. In such a world, there exists a component called a ‘Configuration Server’.

What is a Configuration Management System?

A configuration management system is a software component responsible for managing application configurations. These configurations are used by applications in the ecosystem to carry out their responsibilities. Simply put, the raison d’etre for any configuration management system is to store configurations and serve them to other services on request.

At a high level, the following diagram represents a configuration management system.

Role of a configuration management system

Multiple applications in the software ecosystem communicate with the configuration system to retrieve configurations. The mode of communication varies from system to system. Some systems expose RESTful APIs, others use specific protocols for full-duplex communication, while a few even use a messaging system to broadcast configuration change events. It is fairly common to find configuration management systems developed in-house in organizations. So before you go hunting for an open source solution, do find out if your organization has one in place already.

Configuration systems should be singular across environments.

All environments (such as DEV, QA, Stage, UAT, Prod) should use the same deployment of the configuration management system to store their configurations. Teams should stay away from environment-specific deployments of the configuration management system, to avoid the overhead of promoting configurations from one environment to another. A singular deployment also brings down the infrastructure cost.

How to store configurations?

There is no industry standard for storing configurations. Typically, configurations are stored as key-value pairs, where the key is a string and the value can range from a simple string to a structured document in a format like JSON or YAML. It is a good idea to leave the interpretation of values to the consuming applications. The configuration system should be flexible enough to accept any key-value pair given to it.
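As a rough illustration of leaving interpretation to the consumer, here is a minimal in-memory sketch. The ConfigStore class and the key myapp.api.retry.policy are hypothetical names made up for this example; a real system would persist and replicate the values.

```python
import json


class ConfigStore:
    """Toy in-memory store (hypothetical): string keys, opaque string values."""

    def __init__(self):
        self._values = {}

    def put(self, key: str, value: str) -> None:
        # The store never inspects or validates the value.
        self._values[key] = value

    def get(self, key: str) -> str:
        return self._values[key]


store = ConfigStore()
store.put("myapp.api.retry.maxcount", "3")
# Hypothetical key holding a JSON document as its value.
store.put("myapp.api.retry.policy", json.dumps({"fallback": "linear", "enabled": True}))

# Each consuming application decides how to interpret the raw value.
max_count = int(store.get("myapp.api.retry.maxcount"))
policy = json.loads(store.get("myapp.api.retry.policy"))
print(max_count, policy["fallback"])
```

The store stays simple because parsing and validation live with the application that understands the value.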

How to Store Secrets?

There are times when teams may have to store secret information such as passwords, API keys and tokens. Configuration management systems are not the right choice for storing secrets. Secrets should instead be stored in a ‘Secrets Management Tool’. A few notable examples of good secrets management tools are HashiCorp Vault, Azure Key Vault, AWS Secrets Manager and Google Cloud’s Secret Manager.

In rare circumstances, where teams do not have a Secrets Management Tool in their ecosystem, secrets can be strongly encrypted and stored in the configuration management system. In such cases, one should be careful not to store the encryption keys in the configuration management system as well. However, this approach is highly discouraged.

Features to look for

Dynamic updates of configurations

The configuration management system should be able to perform CRUD operations on configurations without any downtime or interruption of service.

Realtime propagation of changes

The configuration system should ensure that any update to a configuration is readily available for application consumption. This can be achieved in many ways.

One approach is to develop a client library which can be bundled in each client application. This client library should expose an interface for getting the latest value of a given configuration from the configuration system. However, since every read (or every cache refresh) is a call to the configuration system, this approach comes at the cost of higher load on the configuration system servers.

Another approach is to use a messaging system such as Kafka or RabbitMQ. In this case, the server takes up the responsibility of broadcasting an update on the messaging system as soon as a configuration is updated. The clients consume the broadcast messages and update their locally cached values to the latest ones. The downsides of this approach are the additional infrastructure cost of the messaging system and more code in client applications for consuming updates.
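A minimal sketch of such a client library could look like the following. PollingConfigClient, its fetch callback and the TTL value are all assumptions made for illustration; the short-lived cache is one common way to soften the extra load on the servers, at the price of slightly stale values.

```python
import time
from typing import Callable, Dict, Tuple


class PollingConfigClient:
    """Hypothetical client library: pulls values and caches them for a short TTL."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # function that calls the configuration system
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # still fresh: no call to the server
        value = self._fetch(key)     # expired or missing: one call to the server
        self._cache[key] = (now, value)
        return value


# Usage with a dict standing in for the configuration system.
server_values = {"myapp.api.retry.maxcount": "3"}
client = PollingConfigClient(server_values.__getitem__, ttl_seconds=5.0)
print(client.get("myapp.api.retry.maxcount"))
```

With a TTL of zero, every read goes to the server, which is the "always latest" end of the trade-off described above.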
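The push model can be sketched in-process as follows. The Broadcaster class is only a stand-in for a topic on a real messaging system, and all class names here are hypothetical; a real setup would publish to and consume from the broker through its own client library.

```python
from typing import Callable, Dict, List

ChangeEvent = Dict[str, str]


class Broadcaster:
    """In-process stand-in for a topic on a messaging system."""

    def __init__(self):
        self._subscribers: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, handler: Callable[[ChangeEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: ChangeEvent) -> None:
        for handler in self._subscribers:
            handler(event)


class ConfigServer:
    def __init__(self, broadcaster: Broadcaster):
        self._values: Dict[str, str] = {}
        self._broadcaster = broadcaster

    def update(self, key: str, value: str) -> None:
        self._values[key] = value
        # Broadcast the change as soon as it has been stored.
        self._broadcaster.publish({"key": key, "value": value})


class CachingClient:
    def __init__(self, broadcaster: Broadcaster):
        self.cache: Dict[str, str] = {}
        broadcaster.subscribe(self._on_change)

    def _on_change(self, event: ChangeEvent) -> None:
        # Keep the locally cached value in step with the server.
        self.cache[event["key"]] = event["value"]


topic = Broadcaster()
server = ConfigServer(topic)
app_a, app_b = CachingClient(topic), CachingClient(topic)
server.update("myapp.api.retry.maxcount", "5")
print(app_a.cache, app_b.cache)
```

Reads are served entirely from the local cache, so the servers see traffic only when a value changes.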

Environment based definitions

The configuration system should be able to intuitively represent values based on environment. This allows better partitioning of values by environment and reduces the chance of human error in updates. Example: a feature flag being turned on in Production when it was supposed to be turned on in Stage (this happens!).

Leader election and failure recovery

The configuration system should be resilient to failure. Hence, choose a distributed configuration system instead of one which can easily become a SPOF (single point of failure). Example: systems such as ZooKeeper and Redis, which run as clusters of multiple nodes and are built to handle tough problems like leader election and split-brain networks.

Versioning

Keeping track of configuration values by versioning them helps in reverting configuration values. If a configuration change causes existing functionality to break, the DevOps team can quickly revert the configuration to the earlier version. In the absence of this feature, it is fairly common for teams to run around looking for the previous value with which the feature worked.
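One simple way to picture versioning is an append-only history per key, as in the sketch below. VersionedConfig is a hypothetical class for illustration; recording a revert as a new version (rather than deleting history) is a design choice assumed here, not something prescribed by any particular system.

```python
from typing import Dict, List


class VersionedConfig:
    """Hypothetical store that keeps every value ever written for a key."""

    def __init__(self):
        self._history: Dict[str, List[str]] = {}

    def set(self, key: str, value: str) -> int:
        versions = self._history.setdefault(key, [])
        versions.append(value)
        return len(versions)              # versions are numbered from 1

    def get(self, key: str) -> str:
        return self._history[key][-1]     # latest version wins

    def revert(self, key: str, version: int) -> int:
        # A revert is written as a new version, so history is never rewritten.
        old_value = self._history[key][version - 1]
        return self.set(key, old_value)


config = VersionedConfig()
config.set("myapp.api.retry.maxcount", "3")    # version 1
config.set("myapp.api.retry.maxcount", "10")   # version 2 breaks something
config.revert("myapp.api.retry.maxcount", 1)   # version 3 restores "3"
print(config.get("myapp.api.retry.maxcount"))
```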

Audit

The capability to track the four W’s (What, When, Who, Why) comes in handy when things break due to configuration changes. It also helps in systems where compliance is of utmost importance.
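An audit record for the four W’s might be modelled roughly as below. The AuditEntry fields, the record_change helper, the user name and the ticket reference are all hypothetical and shown only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEntry:
    what: str        # which key changed, and from which value to which
    when: datetime   # timestamp of the change, in UTC
    who: str         # user or service that made the change
    why: str         # reason, for example a ticket reference


audit_log: List[AuditEntry] = []


def record_change(key: str, old: str, new: str, who: str, why: str) -> AuditEntry:
    entry = AuditEntry(
        what=f"{key}: {old!r} -> {new!r}",
        when=datetime.now(timezone.utc),
        who=who,
        why=why,
    )
    audit_log.append(entry)   # append-only: entries are never edited
    return entry


entry = record_change("myapp.api.retry.maxcount", "3", "5",
                      who="alice", why="hypothetical ticket OPS-1")
print(entry.what, "by", entry.who)
```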

Options

The following solutions are some of the popular and battle-tested options available if one needs to set up a configuration management system.

Reference Architecture: ZooKeeper

Let us look at Apache ZooKeeper as a reference system. We shall first go through the architecture and core concepts of ZooKeeper, then move on to see its application as a configuration management system.

ZooKeeper Architecture

ZooKeeper runs on a cluster of servers called an ensemble, which share the state of your data. A change is not considered successful until it has been written to a quorum (a majority, that is, more than half) of the servers in the ensemble. The quorum is also the minimum number of servers that must be up and running for the ZooKeeper cluster to work. It is recommended that the ensemble consist of an odd number of servers: an even-sized ensemble needs a larger quorum but tolerates no more failures than the odd-sized ensemble that is one server smaller.
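The arithmetic behind the odd-number recommendation can be checked with a few lines, taking a quorum to be a strict majority of the ensemble. The function names are made up for this sketch.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a quorum is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for size in (3, 4, 5, 6):
    print(f"ensemble={size} quorum={quorum_size(size)} "
          f"tolerated_failures={tolerated_failures(size)}")
```

An ensemble of four tolerates one failure, exactly like an ensemble of three, so the extra server adds cost without adding resilience.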

A more realistic ensemble which avoids the ‘split brain’ issue

A leader is elected within the ensemble. All write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers are called followers.

The leader processes each write request, forwards it to all the followers and waits for ack responses from them. Once a quorum of the ensemble (a majority of the servers, counting the leader itself) has acknowledged the write, it is considered successful. ZooKeeper also guarantees that writes from the same client will be processed in the order they were sent by that client.

Data model and the hierarchical namespace

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similar to a standard file system. The namespace consists of data registers — called znodes, in ZooKeeper parlance — and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can achieve high throughput and low latency numbers. The ‘znodes’ are identified by unique absolute paths which are ‘/’ delimited Unicode strings.

Unlike standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file-system that allows a file to also be a directory. Znodes maintain a data structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode’s data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data. The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.
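To make the data model concrete, here is a toy in-memory tree in which every node carries both data and children, plus a version that increases on each write. This is only a model of the idea, not ZooKeeper’s implementation; the class names are hypothetical, and the use of -1 to mean "any version" is borrowed from the convention of ZooKeeper’s set-data call.

```python
from typing import Dict, Tuple


class BadVersionError(Exception):
    """Raised when a write expects a version that is no longer current."""


class Znode:
    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0
        self.children: Dict[str, "Znode"] = {}   # a node can hold data AND children


class ZnodeTree:
    def __init__(self):
        self._root = Znode()

    def _find(self, path: str) -> Znode:
        node = self._root
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._find(parent_path)
        if name in parent.children:
            raise ValueError(f"znode already exists: {path}")
        parent.children[name] = Znode(data)

    def get(self, path: str) -> Tuple[bytes, int]:
        node = self._find(path)
        return node.data, node.version            # data always comes with its version

    def set(self, path: str, data: bytes, expected_version: int = -1) -> int:
        node = self._find(path)
        if expected_version != -1 and expected_version != node.version:
            raise BadVersionError(f"{path} is at version {node.version}")
        node.data = data                          # a write replaces all the data
        node.version += 1
        return node.version


tree = ZnodeTree()
tree.create("/myapp", b"owned by team A")         # parent with its own data
tree.create("/myapp/api", b"")
data, version = tree.get("/myapp")
tree.set("/myapp", b"owned by team B", expected_version=version)
```

Passing the version read earlier back into the write is what allows coordinated updates: a client whose view is stale gets an error instead of silently overwriting someone else’s change.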

Guarantees

ZooKeeper is very fast and very simple. Since its goal, though, is to be a basis for the construction of more complicated services, such as synchronization, it provides a set of guarantees. These are:

Sequential Consistency: Updates from a client will be applied in the order that they were sent.
Atomicity: Updates either succeed or fail. No partial results.
Single System Image: A client will see the same view of the service regardless of the server that it connects to.
Reliability: Once an update has been applied, it will persist from that time forward until a client overwrites the update.
Timeliness: The client’s view of the system is guaranteed to be up-to-date within a certain time bound.

ZooKeeper as Configuration Management System

Structuring Data

Let us say we have an app called ‘myapp’, and its configuration is defined in a properties file as follows.

myapp.api.retry.enabled=true
myapp.api.retry.maxcount=3
myapp.api.retry.fallback=linear

To move this configuration to ZooKeeper, we need to represent the properties as znodes, as shown in the diagram below. These properties can be added and updated using the ZooKeeper command line interface.

Once the znodes are created in ZooKeeper, all that interested clients need to do is query the following properties (aka znodes).

/myapp/api/retry/enabled
/myapp/api/retry/maxcount
/myapp/api/retry/fallback
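The mapping from dotted property keys to znode paths is mechanical, and a small helper can do it. This is a sketch with hypothetical function names; it only computes the paths and values, and does not talk to ZooKeeper.

```python
from typing import Dict


def property_to_znode_path(key: str) -> str:
    """Map a dotted property key to an absolute, '/'-delimited znode path."""
    return "/" + key.replace(".", "/")


def properties_to_znodes(text: str) -> Dict[str, str]:
    """Parse simple key=value lines into a {znode path: value} mapping."""
    znodes: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        key, _, value = line.partition("=")
        znodes[property_to_znode_path(key.strip())] = value.strip()
    return znodes


PROPERTIES = """
myapp.api.retry.enabled=true
myapp.api.retry.maxcount=3
myapp.api.retry.fallback=linear
"""

for path, value in properties_to_znodes(PROPERTIES).items():
    print(path, "=", value)
```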

Environment Specific Configurations

Extending this to become environment aware, we can add the environment name as a leaf node in the znode tree (as depicted in the diagram below). This allows us to reuse the whole configuration tree and add environment-specific values at the end.

The znodes shown above translate to the following properties.

/myapp/api/retry/enabled/dev
/myapp/api/retry/enabled/qa
/myapp/api/retry/enabled/prd
/myapp/api/retry/maxcount/dev
/myapp/api/retry/maxcount/qa
/myapp/api/retry/maxcount/prd
/myapp/api/retry/fallback/dev
/myapp/api/retry/fallback/qa
/myapp/api/retry/fallback/prd
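An application can then resolve the right znode for the environment it runs in. The sketch below assumes the environment names dev, qa and prd used above; the helper names and the sample values in the dictionary are hypothetical, and the dictionary merely stands in for the znode tree.

```python
from typing import Dict

ENVIRONMENTS = ("dev", "qa", "prd")


def env_znode_path(key: str, env: str) -> str:
    """Build the path of the environment leaf for a dotted property key."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return "/" + key.replace(".", "/") + "/" + env


def lookup(znodes: Dict[str, str], key: str, env: str) -> str:
    return znodes[env_znode_path(key, env)]


# Sample values, made up for illustration.
znodes = {
    "/myapp/api/retry/maxcount/dev": "1",
    "/myapp/api/retry/maxcount/qa": "3",
    "/myapp/api/retry/maxcount/prd": "5",
}

print(lookup(znodes, "myapp.api.retry.maxcount", "prd"))
```

Rejecting unknown environment names early guards against a typo silently creating or reading a stray leaf.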

Querying Configurations in Applications

ZooKeeper has integration libraries (aka client bindings) for most programming languages. ZooKeeper ships with C, Java, Perl and Python client bindings. Support for C#, Node.js, Scala, Go, etc. is available from the community.

Clients communicate with ZooKeeper using language specific bindings.

Application — ZooKeeper integration using supported client libraries

Each application should include the binding library and be aware of the list of ZooKeeper servers.
When an application needs to fetch a configuration, it should invoke the read (get data) method of the ZooKeeper library.
The ZooKeeper library then makes a call to one of the ZooKeeper servers. If the target server is unavailable, the library tries the next one.
Once a response is received, the application can use the response value.
Applications can also leverage ZooKeeper watches to receive updates when a particular configuration value is updated or deleted in ZooKeeper. This reduces the load that continuous polling by clients would otherwise put on the ZooKeeper servers.

Application — ZooKeeper integration leveraging ‘znode watch’ feature
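The steps above, including the fallback to the next server and the use of watches, can be simulated without a running ensemble. Everything in this sketch is a stand-in: FakeZooKeeperServer and WatchingConfigClient are hypothetical classes, not the API of any real binding, and the shared dictionary models state that is already replicated across servers. Watches are modelled as one-time triggers, so the client registers a new one each time it re-reads a value; deletions are left out for brevity.

```python
from typing import Callable, Dict, List


class FakeZooKeeperServer:
    """In-memory stand-in for one server of the ensemble."""

    def __init__(self, data: Dict[str, str], available: bool = True):
        self.data = data                  # shared dict models replicated state
        self.available = available
        self._watches: Dict[str, List[Callable[[str], None]]] = {}

    def get(self, path: str, watch: Callable[[str], None]) -> str:
        if not self.available:
            raise ConnectionError("server unavailable")
        value = self.data[path]
        self._watches.setdefault(path, []).append(watch)
        return value

    def set(self, path: str, value: str) -> None:
        self.data[path] = value
        # One-time trigger: watches are removed before they fire.
        for callback in self._watches.pop(path, []):
            callback(path)


class WatchingConfigClient:
    def __init__(self, servers: List[FakeZooKeeperServer]):
        self._servers = servers
        self._cache: Dict[str, str] = {}

    def read(self, path: str) -> str:
        if path not in self._cache:
            self._refresh(path)           # first read goes to a server
        return self._cache[path]          # later reads are served locally

    def _refresh(self, path: str) -> None:
        for server in self._servers:      # try each server in turn
            try:
                # Re-reading also registers a fresh watch on the znode.
                self._cache[path] = server.get(path, watch=self._refresh)
                return
            except ConnectionError:
                continue
        raise ConnectionError("no ZooKeeper server reachable")


shared_state = {"/myapp/api/retry/maxcount": "3"}
servers = [FakeZooKeeperServer(shared_state, available=False),
           FakeZooKeeperServer(shared_state)]
client = WatchingConfigClient(servers)
print(client.read("/myapp/api/retry/maxcount"))   # served by the second server
servers[1].set("/myapp/api/retry/maxcount", "5")  # watch fires, cache refreshes
print(client.read("/myapp/api/retry/maxcount"))
```

After the first read, the client contacts a server only when a watch fires, which is the load reduction that watches offer over continuous polling.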

Conclusion

Configuration management is an important aspect of managing applications, especially modern, fast-paced, customer-facing ones. It is important to decouple configuration from code, so that configuration can be used to control code behaviour in production. This decoupling can even minimise the risk of rolling out new features in production, by controlling behaviour through configuration (such as feature flags and the choice of engines used to serve requests). Choosing the right configuration management system can add significant value to the release cycle of applications.