Configuration Considerations

Darius
Published in Sempiler
7 min read · May 20, 2019

Configuration design is a subset of general API design, and a principal driver in the pursuit of tools that are low friction to use and deliver high value from use.

Whether through auxiliary file formats (JSON, YAML, etc.), command line arguments, or scripts, elegant configuration design is a necessary and systematic process in software development.

Motivation In Context

Oftentimes the scope of a project grows or changes because, in solving the original mission statement, you discover new, quantifiably better approaches, and/or consequently reframe previous ‘wouldn’t it be cool if…’ features as ‘these are definitely core MVP’ features.

In the case of Sempiler, the underlying intent is still to tackle the frustrations I personally have with the modern developer’s toolkit for x-platform development: namely its reliance on black boxes of framework glue and virtual machines that degrade app performance, which in turn degrades user experience.

However, when it came to developing an API for subjective optimisations (allowing the user to execute custom code at compile time, and/or generate custom code) it dawned on me that not only had the scope of the original goal crept, but I was passing that overhead onto the developer by asking them to verbosely tell the compiler what to build and how to build it.

Scope creep isn’t always a bad thing. It could be a sign of you learning more about your problem space, or finding product-market fit. But passing the complexity on to the end user is definitely a bad thing, and can occur through poor, rigid or uninformed API design.

In an attempt to mitigate such problems let me break down how I think about configuration design.

An example of Sempiler input, using compile time directives

Zero Configuration

Zero configuration (ZC) is as it sounds.

It describes a tool that may be suitably flexible in nature, but makes opinionated assumptions to provide a default configuration in lieu of parameters explicitly provided by the user.

When assessing whether your software can be zero configuration consider:

  • Is it possible? Can the tool infer or assume parameters given no input? Not in terms of whether you have built support for it yet, but in terms of it being theoretically and rationally feasible.
  • Is it useful/informed? In supporting ZC by making opinionated assumptions, is the tool actually supporting a valuable, non-trivial use case?
  • Is it configurable? Does the tool provide an interface to allow explicit configuration, to support scenarios that deviate from the default setup?
  • Is it obvious/expected? Does the user know what assumptions the tool is making on their behalf and, particularly in the case of compilation, are those assumptions faithful to the intent of the source code?
  • Is it reliable? Are the assumptions deterministic, and reproducible on everyone’s machine at any time?
  • Is it expensive? If the assumptions the tool makes mean it does double the work and/or takes longer to do the work, then maybe the user would rather give the tool some hints. If the compiler has to parse a file and fully explore its AST to perform some inference, then actually just passing an argument to the tool works better for the user!
  • Is it justifiable? The more flexible you make a tool, the more code you need to write to validate configuration parameters, and ingest them. If only 1% of your users need that flexibility, it can probably be deferred or you can provide a workaround, …or those users should just conform!

…And maybe even more factors besides those.
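To make the "possible" and "configurable" points concrete, here is a minimal sketch of defaults that apply when the user provides nothing, with any explicit parameter taking precedence. The option names (`sourceDir`, `outputDir`, `entryFile`) and the `index.ts` default are assumptions for illustration only, not Sempiler's actual API.

```typescript
// Hypothetical option names, for illustration only.
interface ToolConfig {
  sourceDir: string;
  outputDir: string;
  entryFile: string;
}

// Opinionated defaults that make the tool usable with no input at all.
const DEFAULTS: ToolConfig = {
  sourceDir: "src",
  outputDir: "out",
  entryFile: "index.ts",
};

// Explicit parameters win; anything the user omits falls back to the default.
function resolveConfig(overrides: Partial<ToolConfig> = {}): ToolConfig {
  return { ...DEFAULTS, ...overrides };
}
```

Calling `resolveConfig()` with no arguments is the zero configuration path, while `resolveConfig({ outputDir: "dist" })` deviates from the default in exactly one place.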

Close-To-Zero Configuration

In my experience it is rare that a truly zero configuration tool is completely useful, and in Sempiler I’ve opted for a close-to-zero configuration paradigm.

For example, the compiler supports generating multiple artifacts (clients, server, etc.) from one codebase, so tired am I of sharing common code between myriad repositories.

That pretty much rules out ZC. Indeed, the compiler needs to know something about the artifacts you want it to generate, and cannot necessarily grok this from the source code alone.

Accordingly, below are three design sensibilities I am employing in search of a low-friction, high-value interface.

Script-ish

Generally scripts by design feel lightweight to write and run. TypeScript is a great input language for Sempiler because it feels light to write like a typical script, but beyond that it gives us useful concepts that help tremendously with accurate, x-platform native code generation — such as explicit type annotations.

Scripts can articulate the same level of complexity as a full solution or project, but in a form that (correctly or otherwise) feels compact and concise. The friction in transferring your intent to the machine is simply:

  • Create file
  • Put code in file
  • Execute file

Moreover, a script is an abstract enough idea that we can use it to contain:

  • Input source code (for execution at run time)
  • Configuration code (for execution at compile time)
  • Plugin code (for execution at compile time and/or run time).

For example, if the user wants to write a consumer that ingests a compilation result, that’s just a case of authoring a script… NOT a full-blown DLL (as previously proposed by yours truly. *shudder*)

Indeed, the manifests and boilerplate code in solutions and projects are friction between our intent and the tool. Friction we want to avoid.

It’s not axiomatic that we need such things, nor that their absence limits capabilities.

And in writing x-platform code we need to further consider that different target platforms have different entry point conventions (a static main method, an index file etc.).

As a default assumption Sempiler will just make the file scope level code in your index file run as the entry point for the platform you are targeting. If that involves wrapping your code in a class and/or main method, so be it.

Though this sounds contrary to my earlier mantra of adherence to target platform conventions, think of it instead as an extension of ZC: a default that can be overridden if the user wants to do that heavy lifting themselves.
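As a rough illustration of that entry point wrapping, the sketch below leaves file scope code untouched for targets that run an index file directly, and wraps it in a class with a static main method otherwise. The convention names, the `Main` class name, and the Java-style wrapper text are all assumptions; the real emitter will differ.

```typescript
// Hypothetical sketch: convention names and the emitted wrapper are assumptions.
type EntryConvention = "file-scope" | "static-main";

function wrapEntryPoint(body: string, convention: EntryConvention): string {
  if (convention === "file-scope") {
    // The target already runs top-level code, so nothing needs wrapping.
    return body;
  }
  // Indent the user's code so it sits inside the generated main method.
  const indented = body
    .split("\n")
    .map((line) => "        " + line)
    .join("\n");
  return [
    "public class Main {",
    "    public static void main(String[] args) {",
    indented,
    "    }",
    "}",
  ].join("\n");
}
```

The user writes the same file scope code either way; only the generated wrapper changes per target.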

Maximal Inference

This is about making the most from as few data points as possible, namely squeezing every last drop from the data you make the user provide.

In terms of configuration design, we consider the least amount of information we need the user to provide. That process is guided by choosing parameters from which we can infer several things at once.

For example, in order to signal to the compiler what artifacts to create you use a directive in your code of the form:

#compiler build(name : string, targetLanguage : string, targetPlatform : string)

Now consider an earlier version of the same API, where multiple functions performed similar tasks:

#compiler create_client(name : string, targetLanguage : string, targetPlatform : string)
#compiler create_server(name : string, targetLanguage : string, targetPlatform : string)

The separate functions told the compiler the specific kind of artifact you needed (client, server etc.).

But I latterly realised you can derive this information using the targetPlatform (and in some cases, the targetLanguage too):

#compiler build("foo", "typescript", "node");

From this we can infer the user wants to build a server artifact, because node is a server environment that is often configured to run typescript.

But here:

#compiler build("bar", "java", "android");

The user wants to build a mobile client, because android is a mobile OS that supports native code written in java.

So from a combination of targetLanguage and targetPlatform we can deduce:

  • What language to emit
  • The semantics we have to adhere to when generating code in that language
  • The APIs we can leverage for recreating the source intent in the target platform (e.g. network I/O, file I/O, drawing pixels)
  • The environment we have to use for performing semantically accurate compile time code execution/evaluation
  • The supplementary files/boilerplate we have to emit for that target platform (e.g. manifests, config) to constitute a valid artifact (each of which may also be in a different target language!)

…And maybe even more factors besides those.

The point is we have a cleaner API, which is less friction for the end user, yet we are still able to infer all of the above information.
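The inference from platform to artifact kind could be sketched roughly as follows. It only covers the two platforms used as examples above; the `BuildRequest` shape, the kind labels, and the function name are assumptions made for illustration rather than Sempiler's internals.

```typescript
// Hypothetical types mirroring the build(name, targetLanguage, targetPlatform) directive.
interface BuildRequest {
  name: string;
  targetLanguage: string;
  targetPlatform: string;
}

type ArtifactKind = "server" | "mobile-client";

// Derive the artifact kind from the platform instead of asking the user for it.
function inferArtifactKind(request: BuildRequest): ArtifactKind {
  switch (request.targetPlatform) {
    case "node":
      return "server"; // node is a server environment
    case "android":
      return "mobile-client"; // android is a mobile OS
    default:
      // Failing loudly keeps the inference obvious rather than silently guessing.
      throw new Error(
        'Cannot infer artifact kind for platform "' + request.targetPlatform + '"'
      );
  }
}
```

One `build` call therefore carries the same information the separate `create_client` and `create_server` calls used to, and an unknown platform fails explicitly rather than producing a surprising artifact.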

Likewise we can extract multifaceted value from the artifact name alone.

First and foremost we can allow the user to toggle pieces of code based on the artifact being generated:

#compiler if(artifact_name("foo"))
{
    System.out.println("Hello from foo");
}

We can use this mechanism to conditionally execute some particular code at compile time, run time, or both.

Moreover, the cost of deciding this is entirely at compile time. Emissions for any artifact not called foo will not evaluate or include the conditional block.
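A simplified way to picture that compile time decision: each block of source optionally names the artifact it is restricted to, and emission for a given artifact simply drops the blocks that do not match. The `SourceBlock` shape and `emitFor` function below are hypothetical and stand in for whatever the compiler actually does with its AST.

```typescript
// Hypothetical model of guarded source blocks; not Sempiler's real representation.
interface SourceBlock {
  code: string;
  // Mirrors an artifact_name("...") predicate; undefined means "always include".
  onlyForArtifact?: string;
}

// Decide inclusion once, at compile time, so excluded code never reaches the output.
function emitFor(artifactName: string, blocks: SourceBlock[]): string {
  return blocks
    .filter(
      (block) =>
        block.onlyForArtifact === undefined ||
        block.onlyForArtifact === artifactName
    )
    .map((block) => block.code)
    .join("\n");
}
```

Because the filter runs before emission, an artifact that does not match pays no run time cost for the guarded block.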

But in truth, you can also use the targetLanguage and targetPlatform in similar predicates…!

Beyond this we can also use the artifact name to:

  • Assume that all input scripts for the artifact are named either src/<name>.*, or reside inside src/<name>/**/*
  • Assume that all output files from compiling the artifact will be written to out/<name>/**/*
  • Assume that all post build (consumer) scripts for the artifact are named either post/<name>.*, or reside inside post/<name>/**/*

This makes concrete sense if you replace <name> with the folder names you see in repositories all the time, like mobile, desktop, android, server etc.

Fundamentally, it’s an example of configuration inference that makes rational sense because it is informed by convention.
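The three naming assumptions above reduce to a small function of the artifact name. This is a sketch only: the `ArtifactPaths` shape and function name are assumptions, and the glob strings simply restate the patterns listed above.

```typescript
// Hypothetical shape for the paths derived from an artifact name.
interface ArtifactPaths {
  inputs: string[];
  outputs: string[];
  postBuild: string[];
}

// Everything here is inferred from the single `name` argument.
function conventionalPaths(name: string): ArtifactPaths {
  return {
    inputs: [`src/${name}.*`, `src/${name}/**/*`],
    outputs: [`out/${name}/**/*`],
    postBuild: [`post/${name}.*`, `post/${name}/**/*`],
  };
}
```

For an artifact named `server`, this yields `src/server.*` and `src/server/**/*` as inputs, without the user spelling out a single path.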

Conventional

Following convention reduces the need for explicit configuration.

By assuming that…

  • Source scripts live in the src/ folder
  • Output files live in the out/ folder
  • Post build scripts live in the post/ folder

…the compiler can parse and compile the relevant files for each artifact without being told these paths explicitly.

Note also that the scripts inside said folders can still link to common, shared code that may not live in the exact same folder. The compiler just needs to deduce where to look for the seed/initial files, a problem it solves through said use of convention.
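Finding the seed files for an artifact from a plain list of repository paths might look like the sketch below. The function name is hypothetical, and it deliberately uses simple prefix matching rather than a real glob library, so treat it as an approximation of the convention rather than the compiler's implementation.

```typescript
// Hypothetical helper: pick the seed files for one artifact by convention alone.
function seedFilesFor(name: string, filePaths: string[]): string[] {
  const singleFilePrefix = "src/" + name + "."; // matches src/<name>.*
  const folderPrefix = "src/" + name + "/"; // matches src/<name>/**/*
  return filePaths.filter(
    (path) =>
      path.indexOf(singleFilePrefix) === 0 || path.indexOf(folderPrefix) === 0
  );
}
```

Shared code such as `src/common/util.ts` is not a seed file for `server` under this rule, but it can still be reached when a seed file links to it.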

This is informed convention too. Namely, many code repositories already follow the same (or very similar) file hierarchies so again, the friction for tool adoption is lower.

The file hierarchy serves as the configuration!

Lastly, as with any of the opinionated assumptions the compiler makes on your behalf, this reliance on convention can be overridden or supplemented as necessary for your use case.

Check out the Sempiler website for more info, or follow this blog to stay up to date on the forthcoming release.

Darius
Sempiler

Software Eng // prev @Microsoft // passionate about compilers & tooling 🛠️