Introducing Schematic
Our team at Walmart runs a lot of Clojure code. We operate a fair number of continuously running services, alongside a smaller number of batch jobs. We have QA environments and multiple production clusters. We connect to multiple databases, JMS and Kafka queues, and a growing number of internal web services.
A common feature of all of that is configuration. We need to know all those URLs and user names and passwords. All those buffer sizes, timeouts, notification emails, and feature switches. All that stuff that applications run on, that varies between local testing, QA, and production, or even varies based on production cluster and data center. It’s a lot to keep track of.
Luckily, we get to do our configuration in EDN, and the configuration is applied to components (via Stuart Sierra’s Component library). Lots of components, lots of configuration.
We’ve found, from hard experience, that the best approach is to keep the configuration of all the servers and tools in one place, and apply an onion-skinning approach. That is, we have a master configuration file that defines all the configuration (all those URLs, passwords, queue names, and whatnot) in a form that is suitable for local testing.
In each environment, there’s an overlay file that is deep merged on top of the master file.
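As a rough illustration of the overlay idea, here is a minimal deep-merge sketch in plain Clojure. The `deep-merge` function, the two configuration maps, and their values are hypothetical stand-ins, not our actual code or data; the only point is that an overlay needs to state just the values that differ from the master file.

```clojure
(defn deep-merge
  "Recursively merges maps. When both values at a key are maps they are
  merged in turn; otherwise the value from the later map wins."
  [& maps]
  (apply merge-with
         (fn [a b]
           (if (and (map? a) (map? b))
             (deep-merge a b)
             b))
         maps))

;; Hypothetical master configuration, suitable for local testing.
(def master-config
  {:zookeeper {:root "/clockwork"
               :worker-count 5
               :connections ["localhost:2181"]}})

;; Hypothetical production overlay: only the values that differ.
(def production-overlay
  {:zookeeper {:connections ["zoo1.mydomain.org:2181"
                             "zoo2.mydomain.org:2181"]}})

(def production-config
  (deep-merge master-config production-overlay))

;; production-config keeps :root and :worker-count from the master file
;; and takes :connections from the overlay (vectors are replaced, not joined).
```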
Going back well over a year, we were experiencing some growing pains with this approach: we just had so much configuration and so many components … some shared between applications, some specific to just one. We had tons of boilerplate code to build the system maps that could be used with Component. Lots of little factory functions that would pull just the right bits of data out of the big ball of configuration mud, and convert that into a record or map, and add it to a system map with a well known name (so that it could be used as a dependency elsewhere).
Of course, we kept our eyes open for a solution to these annoyances. We’d seen other libraries, such as Integrant or Mount, but none of them offered us exactly what we needed.
For most of the team, the current state of affairs was just the cost of doing business and we ground our teeth, swallowed our pride, and got on with it … fortunately, Steve Ashton had other ideas.
Steve’s analysis was that:
- We had conflated the structure of our configuration with the structure of our component graph. These needed to be more distinct.
- Dependencies between components were a special type of configuration, and did not need to be in code (this part is clearly influenced by Integrant).
- Components should be instantiated with only their specific configuration map, rather than have the full system configuration passed to them.
- To keep things DRY, we needed a mechanism to centralize key bits of configuration for sharing between components.
- Different applications needed a different subset of the overall set of configuration and components.
Out of this was born Schematic.
Schematic’s job is to configure components, instantiate them, and set up dependencies between them. The input to Schematic is the configuration map, which includes not only all those URLs and passwords, but also the components themselves. The output is a system map ready to be started by Component.
Schematic components come in two flavors: configuration and code.
Configuration components are simply EDN data. They can contain any data whatever. For example, you might want to have a configuration component, :zookeeper, that centralizes configuration related to accessing ZooKeeper servers.
{:zookeeper
 {:root "/clockwork"
  :master false
  :worker-count 5
  :max-task-attempts 5
  :task-retry-delay 3000
  :connections ["zoo1.mydomain.org:2181"
                "zoo2.mydomain.org:2181"
                "zoo3.mydomain.org:2181"]} ...}
By itself, this configuration does nothing … but we can combine it with some code components:
:comp/zookeeper
{:sc/create-fn com.walmartlabs.clockwork.zk/map->ZooKeeper
 :sc/merge [{:from :zookeeper
             :select [:root :connections :zk-id]}]}

:comp/task-master
{:sc/create-fn com.walmartlabs.clockwork.task-master/map->TaskMaster
 :sc/merge [{:from :zookeeper
             :select [:master :max-task-attempts
                      :task-retry-delay]}]
 :sc/refs {:zookeeper :comp/zookeeper}}

:comp/task-worker
{:sc/create-fn com.walmartlabs.clockwork.task-worker/map->TaskWorker
 :sc/merge [{:from :zookeeper
             :select [:worker-count]}]
 :sc/refs {:zookeeper :comp/zookeeper
           :task-executor :comp/task-dispatcher}}
This defines three related components, :comp/zookeeper, :comp/task-master, and :comp/task-worker. These are code components because they have the :sc/create-fn key. We’ve used the :comp/ prefix as a convention, to distinguish configuration components from code components.
The :sc/merge key is a sequence of rules for extending a component’s configuration with other parts of the configuration. Here, the rules define which bits of the :zookeeper configuration component are copied into each component’s configuration map.
The function named by :sc/create-fn is resolved (its namespace is loaded as necessary) and then invoked with the merged configuration map. For example, com.walmartlabs.clockwork.task-master/map->TaskMaster will be invoked with the map {:master false, :max-task-attempts 5, :task-retry-delay 3000} (the :sc/ keys are removed during the process of assembling the system).
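To make the merge step concrete, here is a small plain-Clojure sketch of how a {:from ... :select ...} rule could be applied by hand. It is not Schematic’s implementation: `apply-merge-rule` and `merged-component-config` are hypothetical helpers, and the choice that selected values overwrite a component’s own keys is an assumption of this sketch.

```clojure
;; A trimmed copy of the configuration shown above.
(def merge-example-config
  {:zookeeper {:root "/clockwork"
               :master false
               :worker-count 5
               :max-task-attempts 5
               :task-retry-delay 3000
               :connections ["zoo1.mydomain.org:2181"
                             "zoo2.mydomain.org:2181"
                             "zoo3.mydomain.org:2181"]}
   :comp/task-master
   {:sc/create-fn 'com.walmartlabs.clockwork.task-master/map->TaskMaster
    :sc/merge [{:from :zookeeper
                :select [:master :max-task-attempts :task-retry-delay]}]
    :sc/refs {:zookeeper :comp/zookeeper}}})

(defn apply-merge-rule
  "Hypothetical helper: copies the :select keys of the :from component into m.
  Assumption of this sketch: selected values overwrite keys already in m."
  [config m {:keys [from select]}]
  (merge m (select-keys (get config from) select)))

(defn merged-component-config
  "Hypothetical helper: drops the :sc/ keys from the component, then applies
  each :sc/merge rule in order."
  [config component-id]
  (let [component  (get config component-id)
        own-config (dissoc component :sc/create-fn :sc/merge :sc/refs)]
    (reduce (partial apply-merge-rule config)
            own-config
            (:sc/merge component))))

(merged-component-config merge-example-config :comp/task-master)
;; => {:master false, :max-task-attempts 5, :task-retry-delay 3000}
```

Keys named in :select but absent from the source component are simply skipped by select-keys in this sketch.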
As is commonly the case, map->TaskMaster is a factory function created by defrecord. The factory function is passed the configuration map as the basis for the record, which is returned to Schematic.
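For readers less familiar with records, this short sketch shows the factory function that defrecord generates. The field list is a guess made for illustration; the real TaskMaster record is not shown in this post.

```clojure
;; Hypothetical record definition; defrecord also generates map->TaskMaster.
(defrecord TaskMaster [master max-task-attempts task-retry-delay])

;; The factory function takes a plain map and returns a record instance.
(def task-master
  (map->TaskMaster {:master false
                    :max-task-attempts 5
                    :task-retry-delay 3000}))

(:max-task-attempts task-master)
;; => 5

;; Keys that are not declared fields are kept on the record as well.
(:extra (map->TaskMaster {:extra "kept"}))
;; => "kept"
```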
Finally, :sc/refs is used to set up dependencies between the components.
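Because the references are plain data, the dependency graph can be read straight out of the configuration. The sketch below is illustrative only: `dependency-graph` is a hypothetical helper, not part of Schematic, and it assumes each :sc/refs entry maps a local key to the id of the component it depends on.

```clojure
;; A trimmed copy of the code components shown above.
(def refs-example-config
  {:comp/zookeeper
   {:sc/create-fn 'com.walmartlabs.clockwork.zk/map->ZooKeeper}
   :comp/task-master
   {:sc/create-fn 'com.walmartlabs.clockwork.task-master/map->TaskMaster
    :sc/refs {:zookeeper :comp/zookeeper}}
   :comp/task-worker
   {:sc/create-fn 'com.walmartlabs.clockwork.task-worker/map->TaskWorker
    :sc/refs {:zookeeper :comp/zookeeper
              :task-executor :comp/task-dispatcher}}})

(defn dependency-graph
  "Hypothetical helper: maps each component id to the set of component ids
  named in its :sc/refs. Components without :sc/refs are omitted."
  [config]
  (into {}
        (for [[id component] config
              :let [refs (:sc/refs component)]
              :when (map? refs)]
          [id (set (vals refs))])))

(dependency-graph refs-example-config)
;; => {:comp/task-master #{:comp/zookeeper}
;;     :comp/task-worker #{:comp/zookeeper :comp/task-dispatcher}}
```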
The full documentation of Schematic covers other :sc/merge rules; the upshot is that we’ve replaced ad-hoc code that pulled data out of the master configuration with these rules, so that each component sees just what it needs: its own personalized configuration map.
The effect on our code base has been dramatic. Large swaths of confusing and difficult-to-maintain code have been entirely eliminated. It is also easier for us to inject testing mocks and stubs into the system (as a step between Schematic and component/start-system).
A hidden side benefit is that REPL startup is faster. Because Schematic will load namespaces as needed, far less code must be loaded to get to a REPL prompt; the loading of the majority of the namespaces in the application is deferred to inside the call to schematic/assemble-system. The total time may be the same, but getting to the initial prompt faster keeps developers more focused and happy.
What’s especially gratifying about working in Clojure is how well these kinds of small, focused, useful libraries can be combined. Schematic doesn’t need to know where its input configuration map comes from … it’s up to the application to decide where the file, or files, are stored and even what format they are in. We can easily inject other focused libraries, such as dyn-edn, into the mix. We can also add our own special processing before or after Schematic does its job.
Ultimately, every Clojure developer ends up with a tool chest full of useful tools that fit together quite seamlessly. We hope Schematic will feature prominently in yours.