Validate Service Configuration in Scala

A mistake in a configuration file can cost several hours or even days of work to fix an application in any environment. This gets painful and costly, especially in a production environment.

Configuration is often stored in a JSON, YAML, INI, or similar file format. Configuration is dynamic by nature, which means the first time we can read and validate it is when the application is already running. Of course, it would be nice to reject invalid configuration at compile time. However, this is not realistic, because application code abstracts away from any concrete value of a configuration parameter: the same code must work with different sets of configuration values, whether it is a test or a production configuration. That leaves runtime validation, and validating configuration at startup (eagerly) is better than doing so upon first use (lazily). Validation at application startup is thus a middle ground between compile-time checks and lazy validation, and it can greatly improve the life of DevOps or SRE engineers.

Application configuration in HOCON format

Lots of Scala applications use the Typesafe/Lightbend Config library, which offers its own format called HOCON (Human-Optimized Config Object Notation). HOCON does not require commas or quotes, and keys and values can be separated by either an equals sign or a colon. It also allows references to existing keys.
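As a minimal sketch of these syntax rules, the snippet below parses a small HOCON string with Typesafe Config. The `demo` block and its keys are made up for illustration and are not part of the article's configuration.

```scala
import com.typesafe.config.ConfigFactory

object HoconSyntaxDemo {
  // Hypothetical config: unquoted values, no commas, "=" and ":" used interchangeably
  val hocon: String =
    """
      |demo {
      |  host = localhost
      |  port : 8080
      |  address = ${demo.host}":"${demo.port}
      |}
      |""".stripMargin

  // resolve() substitutes the ${...} references to other keys
  val config = ConfigFactory.parseString(hocon).resolve()

  def main(args: Array[String]): Unit = {
    println(config.getString("demo.host"))
    println(config.getInt("demo.port"))
    println(config.getString("demo.address"))
  }
}
```

The `address` key is built the same way as the `url` key in the storage example: substitutions and quoted fragments are concatenated into one string.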

Example of HOCON configuration for an HTTP server and JDBC connection:

server {
  host = localhost
  port = 8080
}
storage {
  host = localhost
  port = 5432
  dbName = trips
  url = "jdbc:postgresql://"${storage.host}":"${storage.port}"/"${storage.dbName}
  driver = "org.postgresql.Driver"
  user = "trips"
  password = "trips"
  connectionTimeout = 3000
  maximumPoolSize = 100
}

There are 11 values, and therefore 11 opportunities to make a mistake and cause a mess. The password should arguably not be kept in clear text in this kind of config file, but be injected from outside via an environment variable or a file.
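One way to inject the password from outside is HOCON's optional substitution. The sketch below assumes a hypothetical environment variable named `DB_PASSWORD` and a placeholder default value. Neither comes from the original configuration.

```scala
import com.typesafe.config.ConfigFactory

object PasswordFromEnv {
  // DB_PASSWORD is a hypothetical variable name. The optional substitution
  // ${?DB_PASSWORD} overrides the default only when the variable is set.
  val hocon: String =
    """
      |storage {
      |  user = "trips"
      |  password = "change-me"
      |  password = ${?DB_PASSWORD}
      |}
      |""".stripMargin

  // resolve() falls back to environment variables for keys it cannot find in the config
  val config = ConfigFactory.parseString(hocon).resolve()

  def main(args: Array[String]): Unit =
    // Print only the user, never the password itself
    println("user = " + config.getString("storage.user"))
}
```

If `DB_PASSWORD` is not set, the second `password` line is ignored and the default stays in place.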

In any case, similar configuration can be found in many service applications today.

It would be great if HOCON were chosen instead of YAML for modern DevOps. However, YAML is much more widely adopted.

Scala Refined library

Refined is based on a Scala feature called literal-based singleton types (SIP-23). Refined uses this feature via the Shapeless library. This makes it possible to validate literal values at compile time. That means any configuration hardcoded in the code can be validated immediately at compile time. If a value is invalid, Refined produces a compile-time error.
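To give a feel for literal-based singleton types, here is a minimal sketch that uses only the standard library. It assumes Scala 2.13 or later, where the feature is built into the language. Refined reaches the same capability through Shapeless, so this is an illustration of the concept and not of Refined's internals.

```scala
object LiteralTypes {
  // The literal 8080 is used as a type: the only value of this type is 8080
  val port: 8080 = 8080

  // valueOf recovers the value from the singleton type
  val fromType: Int = valueOf[8080]

  def main(args: Array[String]): Unit =
    println(port == fromType)
}
```

Because the value is part of the type, the compiler knows it and a library can check it against a predicate before the program runs.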

What is invalid configuration?

Refined provides Scala types that can be used to define the fields of a user class. If a value does not comply with the type of a field or variable, an error is produced. Example:

final case class Server(host: NonEmptyString = "localhost", port: UserPortNumber = 8080)

The Server class is defined using two standard Refined types: NonEmptyString and UserPortNumber. In fact, both are type aliases for slightly more complex type expressions. There are more standard types available. Assigning plain literals such as "localhost" to these fields relies on the implicit conversions imported from eu.timepit.refined.auto._.

  • NonEmptyString checks that a string is not empty.
  • UserPortNumber checks that a number is within the range of user (registered) ports, from 1024 to 49151.

Validation in action:

@ val s = Server("", 9)
cmd9.sc:1: Predicate isEmpty() did not fail.
val s = Server("", 9)
^
cmd9.sc:1: Left predicate of (!(9 < 1024) && !(9 > 49151)) failed: Predicate (9 < 1024) did not fail.
val s = Server("", 9)
^
Compilation Failed

Similar validation can be triggered from code, i.e. at runtime, when the configuration is read from a file into a case class such as the Server class above. Another library, PureConfig, integrates with Refined for this purpose.
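Runtime validation with Refined alone can be sketched as follows. The example assumes a Refined version in which the companion objects of the standard types offer a `from` method. The object name `RuntimeValidation` is made up for illustration.

```scala
import eu.timepit.refined.types.net.UserPortNumber
import eu.timepit.refined.types.string.NonEmptyString

object RuntimeValidation {
  // from() checks a runtime value and returns Either[String, RefinedType]
  val goodPort: Either[String, UserPortNumber] = UserPortNumber.from(8080)
  val badPort: Either[String, UserPortNumber]  = UserPortNumber.from(9)
  val badHost: Either[String, NonEmptyString]  = NonEmptyString.from("")

  def main(args: Array[String]): Unit = {
    println(goodPort) // a Right wrapping the refined value
    println(badPort)  // a Left with the predicate failure message
    println(badHost)  // a Left with the predicate failure message
  }
}
```

The Left side carries the same kind of message as the compiler errors shown above, only produced while the program runs.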

PureConfig

PureConfig helps to load and validate different configuration sources. One of the sources is Typesafe Config. PureConfig can also trigger Refined-based validation.

SBT dependencies for PureConfig and its Refined integration:

"com.github.pureconfig" %% "pureconfig" % "x.y.z",
"eu.timepit" %% "refined-pureconfig" % "x.y.z"

An example that loads a Config and triggers Refined validation via PureConfig:

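import java.io.File
import com.typesafe.config.{ConfigFactory, ConfigParseOptions}
import eu.timepit.refined.pureconfig._ // ConfigReader instances for Refined types
import pureconfig.error.ConfigReaderFailures
import pureconfig.generic.auto._ // case class derivation (separate import in newer PureConfig versions)
import pureconfig.loadConfig

// The config file path can be overridden via an environment variable
val path = sys.env.getOrElse("APP_CONFIG_PATH", "src/main/resources/application.conf")
// Fail if the file is missing instead of silently producing an empty config
val parseOptions = ConfigParseOptions.defaults().setAllowMissing(false)
val config = ConfigFactory.parseFile(new File(path), parseOptions).resolve()
// Left holds the list of failures, Right holds the validated Server
val c: Either[ConfigReaderFailures, Server] = loadConfig[Server](config, "server")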

If the configuration is valid according to the types of the Server case class, c will be a Right containing the Server instance. Otherwise, it will be a Left containing a list of errors that explain what is actually wrong.
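What to do with that Either is up to the application. A common choice is to fail fast at startup, sketched below under a few assumptions: the PureConfig version still offers `loadConfig`, the configuration is passed in as a string to keep the example self-contained, and `StartupCheck` and `load` are hypothetical names.

```scala
import com.typesafe.config.ConfigFactory
import eu.timepit.refined.pureconfig._
import eu.timepit.refined.types.net.UserPortNumber
import eu.timepit.refined.types.string.NonEmptyString
import pureconfig.generic.auto._
import pureconfig.loadConfig

object StartupCheck {
  final case class Server(host: NonEmptyString, port: UserPortNumber)

  // Returns either a list of human-readable problems or the validated Server
  def load(hocon: String): Either[List[String], Server] = {
    val config = ConfigFactory.parseString(hocon).resolve()
    loadConfig[Server](config, "server").left.map(_.toList.map(_.description))
  }

  def main(args: Array[String]): Unit =
    load("""server { host = "", port = 9 }""") match {
      case Right(server) =>
        println("Starting with " + server)
      case Left(errors) =>
        // Report every problem, then stop before the application serves traffic
        errors.foreach(e => System.err.println("Invalid configuration: " + e))
        sys.exit(1)
    }
}
```

Exiting with a non-zero code makes the failure visible to whatever launches the service, instead of surfacing later on the first use of a bad value.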

More complex example

storage {
  host = localhost
  port = 5432
  dbName = trips
  url = "jdbc:postgresql://"${storage.host}":"${storage.port}"/"${storage.dbName}
  driver = "org.postgresql.Driver"
  user = "trips"
  password = "trips"
  connectionTimeout = 3000
  maximumPoolSize = 100
}

The database configuration has many more places to make a mistake than the two-field Server class. Besides checking for empty strings, Refined can help to define a type based on a regular expression, which is handy for the url field. Numeric fields can be checked against number ranges.

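import eu.timepit.refined.W
import eu.timepit.refined.api.Refined
import eu.timepit.refined.numeric.Interval
import eu.timepit.refined.string.MatchesRegex
import eu.timepit.refined.types.net.UserPortNumber
import eu.timepit.refined.types.string.NonEmptyString
import pureconfig.{CamelCase, ConfigFieldMapping}
import pureconfig.generic.ProductHint // package location in newer PureConfig versions

object refined {
  type ConnectionTimeout = Int Refined Interval.OpenClosed[W.`0`.T, W.`100000`.T]
  type MaxPoolSize = Int Refined Interval.OpenClosed[W.`0`.T, W.`100`.T]
  type JdbcUrl = String Refined MatchesRegex[W.`"""jdbc:\\w+://\\w+:[0-9]{4,5}/\\w+"""`.T]
}
import refined._

// PureConfig expects kebab-case keys by default; this hint keeps camelCase keys such as dbName
implicit def hint[T]: ProductHint[T] = ProductHint[T](ConfigFieldMapping(CamelCase, CamelCase))

final case class JdbcConfig(
  host: NonEmptyString,
  port: UserPortNumber,
  dbName: NonEmptyString,
  url: JdbcUrl,
  driver: NonEmptyString,
  user: NonEmptyString,
  password: NonEmptyString,
  connectionTimeout: ConnectionTimeout,
  maximumPoolSize: MaxPoolSize
)
val jdbc = loadConfig[JdbcConfig](config, "storage")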

  • The ConnectionTimeout type forces a value to be within the (0, 100000] range, i.e. greater than 0 and at most 100000.
  • The MaxPoolSize type forces a value to be within the (0, 100] range, i.e. greater than 0 and at most 100.
  • The JdbcUrl type forces a value to comply with a template like "jdbc:some text here://some text here:some number here/some text here".
  • Host, port and dbName are checked independently, so these separate fields help the url field to be more "refined".
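The rules in the list above can be tried out without any library. The sketch below mirrors the two kinds of predicates in plain Scala. The helper names are hypothetical, and the code imitates the checks and does not call Refined.

```scala
object CustomTypeRules {
  // Plain-Scala mirror of an open-closed interval: lower bound excluded, upper bound included
  def inOpenClosed(value: Int, low: Int, high: Int): Boolean =
    value > low && value <= high

  // The JDBC URL pattern written as an ordinary triple-quoted string
  val jdbcUrlPattern: String = """jdbc:\w+://\w+:[0-9]{4,5}/\w+"""

  // String.matches succeeds only if the whole string matches the pattern
  def isJdbcUrl(url: String): Boolean = url.matches(jdbcUrlPattern)

  def main(args: Array[String]): Unit = {
    println(inOpenClosed(100, 0, 100)) // true: maximumPoolSize = 100 is allowed
    println(inOpenClosed(0, 0, 100))   // false: zero is excluded
    println(isJdbcUrl("jdbc:postgresql://localhost:5432/trips"))      // true
    println(isJdbcUrl("jdbc:postgresql://localhost:80/trips"))        // false: port needs 4 or 5 digits
    println(isJdbcUrl("jdbc:postgresql://db.example.com:5432/trips")) // false: \w does not match dots
  }
}
```

The last case shows a limit of this particular pattern: host names containing dots or hyphens are rejected, so a real deployment may need a broader expression.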

Conclusion

Validating configuration is better than not validating it at all. This may seem like redundant work, but it can save a lot of time when running an app in any mission-critical environment. PureConfig provides an elegant way to work with Typesafe Config. Refined makes it easy to define custom types via type aliases, and it helps to catch invalid configuration values both at compile time and at runtime. One just needs to trigger the validation mechanism, either via Refined itself or through the PureConfig-Refined bridge.

Links

  1. Complete source code example: https://github.com/novakov-alexey/akka-slick-vs-http4s-doobie-service/blob/master/src/main/scala/org/alexeyn/configs.scala
  2. Refined: https://github.com/fthomas/refined
  3. PureConfig: https://github.com/pureconfig/pureconfig
  4. Typesafe Config: https://github.com/lightbend/config
  5. SIP-23: https://docs.scala-lang.org/sips/42.type.html