To tag a type
A common need in type-safe code is the ability to tell values apart, so that you can be highly certain a value is used only in the context it was meant for. At the most basic level, types serve this exact purpose: val x: A and val y: B may be the most obvious example of separating values by labeling them with a type. But once programs get bigger, the inevitable happens: types get reused. This is especially a problem with common types like Int. The textbook example of this problem is a database id. Usually, the internal representation of such a value is, in fact, a number or a UUID, so at some level it is correct to write case class User(id: Long, …). But the internal representation quickly stops reflecting the meaning attached to the value. Once you have case class User(id: Long, …) co-existing with case class Transaction(id: Long, …), the two ids become interchangeable simply because they have the same type, while clearly they should not be: logically, an id-of-transaction has no meaning as an id-of-user and vice versa. The bigger the domain, the more values share the same representation (after all, almost anything can be thought of as a string) without sharing the same meaning.
There are two widespread techniques to overcome this problem. One is to simply create a new type for every context. Translated to Scala, you get case class UserId(value: Long), case class TransactionId(value: Long) etc. The other is called type tagging. It is, in essence, a generic type constructor (usually called @@) that combines the base type with another type carrying context information (called a tag). Translated to Scala, this amounts to defining @@ together with a way to tag a value. Then you can tag your ids and, as with case classes, enjoy the benefit of being able to tell the two apart, e.g. you can’t assign a transaction id to a user id.
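A minimal sketch of both steps, assuming Scala 2.13 and the simplest possible encoding (a plain intersection type). The names TaggingBasics and Tagger are ours for illustration, and this is not necessarily the exact code a library would ship:

```scala
object TaggingBasics {
  // `A @@ T` is the base type A intersected with the marker type T.
  type @@[+A, +T] = A with T

  // Adds `.@@[T]` to every value. The cast is only acceptable because
  // tags are empty traits that nobody can call anything on.
  implicit class Tagger[A](val self: A) extends AnyVal {
    def @@[T]: A @@ T = self.asInstanceOf[A @@ T]
  }

  // Tags carry meaning at compile time only.
  trait UserId
  trait TransactionId

  val userId: Long @@ UserId = 1L.@@[UserId]
  val transactionId: Long @@ TransactionId = 2L.@@[TransactionId]

  // A tagged value can still be used as its base type.
  val raw: Long = userId

  // Does not compile: a transaction id is not a user id.
  // val wrong: Long @@ UserId = transactionId
}
```

Uncommenting the last line should produce a type mismatch at compile time, which is the whole point of the exercise.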
While the basics of type tagging are as outlined above, there is more to it than meets the eye. One thing that may look suspicious is the use of asInstanceOf. Isn’t casting like that unsafe and doomed to fail at runtime? Well, it is safe as long as the tag does not change the runtime representation. If the tag has no fields or methods that could be called, everything is fine, because nothing you can access would reveal that the type of the value is faked. That’s why we have used empty traits as tags. However, due to various Scala compiler idiosyncrasies, this representation is not enough. For instance:
Taking all this into account, the actual representation we have been using in our library kebs is a bit more involved:
What to use?
Given the two techniques, which one should you choose? Creating case classes is considerably simpler to understand than the somewhat hairy tagging. But tagging has some big advantages.
A tagged type is a subtype of its base type
String @@ Name is simply an elaborate alias for String with Name, meaning you can use it anywhere you could use String. One effect of this is that all String methods are readily available, e.g. you can call trim on it. Be warned though: what you get back is a plain String, so you’ll have to tag it again manually. Alternatively, and this is something you cannot do with case classes, you can treat tagged types as a functor and map over the underlying value while keeping the tag.
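A rough sketch of both options, assuming Scala 2.13 and the plain intersection encoding. The helper mapTagged is hypothetical and takes the tag explicitly, so that type inference does not have to split A with T on its own:

```scala
object TaggedFunctor {
  type @@[+A, +T] = A with T

  implicit class Tagger[A](val self: A) extends AnyVal {
    def @@[T]: A @@ T = self.asInstanceOf[A @@ T]
  }

  trait Name

  // Hypothetical helper: applies f to the underlying value and keeps tag T.
  def mapTagged[T]: MapTagged[T] = new MapTagged[T]

  final class MapTagged[T] {
    def apply[A, B](value: A @@ T)(f: A => B): B @@ T =
      f(value).asInstanceOf[B @@ T]
  }

  val name: String @@ Name = "  Ada  ".@@[Name]

  // String methods are available, but the result is a plain String.
  val plain: String = name.trim

  // Manual re-tagging.
  val retagged: String @@ Name = name.trim.@@[Name]

  // Functor-style: the tag survives the transformation.
  val mapped: String @@ Name = mapTagged[Name](name)(_.trim)
}
```

A real library would more likely expose this as a map extension method. That typically requires a slightly richer encoding of @@ so the compiler can infer the tag.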
The other effect is that "email@example.com".@@[Email] == "email@example.com" holds, unlike with case classes where, obviously, Email("email@example.com") == "email@example.com" is false.
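To make the difference concrete, here is a small sketch assuming Scala 2.13. The tag is called EmailTag here only so that it can live next to a case class named Email in the same object:

```scala
object TaggedEquality {
  type @@[+A, +T] = A with T

  implicit class Tagger[A](val self: A) extends AnyVal {
    def @@[T]: A @@ T = self.asInstanceOf[A @@ T]
  }

  trait EmailTag
  final case class Email(value: String)

  val tagged: String @@ EmailTag = "email@example.com".@@[EmailTag]

  // The tagged value is the very same String at runtime, so equality holds.
  val taggedEqualsPlain: Boolean = tagged == "email@example.com"

  // The wrapper is a different object. Widening to Any only avoids the
  // compiler warning about a comparison that is always false.
  val wrappedEqualsPlain: Boolean =
    (Email("email@example.com"): Any) == "email@example.com"
}
```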
That may not seem like much at first glance, but it plays a bigger role in connection with variance in Scala. Consider a scenario where you want to serialize a tagged value to JSON using play-json. play-json has a contravariant typeclass for this, Writes[-A]. Since A @@ B <: A, it follows that Writes[A] <: Writes[A @@ B]. This means you do not have to write a single line of code to implement JSON serialization. For instance, Writes[Int] <: Writes[Int @@ UserId], which implies that the default implementation will be picked up. This is a much bigger problem with the case-class approach, where you need to re-implement everything and, as explained in the next section, you cannot even do this as generically!
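The mechanism can be sketched without pulling in play-json. The Writes trait below is a local stand-in that renders to a String instead of a JsValue, and toJson is a hypothetical helper. Only the contravariance mirrors the real library:

```scala
object ContravariantWrites {
  type @@[+A, +T] = A with T

  implicit class Tagger[A](val self: A) extends AnyVal {
    def @@[T]: A @@ T = self.asInstanceOf[A @@ T]
  }

  trait UserId

  // Stand-in for play-json's Writes[-A]; NOT the real API.
  trait Writes[-A] {
    def writes(a: A): String
  }

  object Writes {
    implicit val intWrites: Writes[Int] = new Writes[Int] {
      def writes(a: Int): String = a.toString
    }
  }

  def toJson[A](a: A)(implicit w: Writes[A]): String = w.writes(a)

  val rawId: Int = 42
  val id: Int @@ UserId = rawId.@@[UserId]

  // No Writes[Int @@ UserId] is defined anywhere:
  // Writes[Int] conforms to it by contravariance.
  val json: String = toJson(id)
}
```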
Tagged-types are generic
Even when you cannot benefit from variance, you can add the required implementations in a generic manner, that is, for all types A @@ B. Consider, for instance, that you need to save a tagged value to a database with Slick. It does not work out of the box, since the required BaseColumnType[A] is invariant, meaning there is no relation between BaseColumnType[A @@ B] and BaseColumnType[A]. But you can easily state that BaseColumnType[A @@ B] exists if BaseColumnType[A] exists. You can do even better than that by using the casting trick (asInstanceOf) discussed earlier.
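Both variants can be sketched with a local, invariant stand-in for the typeclass. ColumnType and its render method are hypothetical and only mimic the invariance of Slick's BaseColumnType; they are not Slick's API:

```scala
object InvariantInstances {
  type @@[+A, +T] = A with T

  implicit class Tagger[A](val self: A) extends AnyVal {
    def @@[T]: A @@ T = self.asInstanceOf[A @@ T]
  }

  trait UserId

  // Invariant stand-in for an invariant typeclass such as BaseColumnType[A].
  trait ColumnType[A] {
    def render(a: A): String
  }

  object ColumnType {
    implicit val longColumn: ColumnType[Long] = new ColumnType[Long] {
      def render(a: Long): String = s"BIGINT $a"
    }
  }

  // One definition covers every A @@ T by delegating to the instance for A.
  implicit def taggedColumn[A, T](implicit base: ColumnType[A]): ColumnType[A @@ T] =
    new ColumnType[A @@ T] {
      def render(a: A @@ T): String = base.render(a)
    }

  // The casting variant: tagging does not change the runtime value,
  // so the base instance can be reused as-is without a new object.
  def taggedColumnByCast[A, T](implicit base: ColumnType[A]): ColumnType[A @@ T] =
    base.asInstanceOf[ColumnType[A @@ T]]

  val id: Long @@ UserId = 7L.@@[UserId]

  val rendered: String = implicitly[ColumnType[Long @@ UserId]].render(id)
  val renderedByCast: String = taggedColumnByCast[Long, UserId].render(id)
}
```

With the real library, the first variant would typically be expressed through Slick's own mapping facilities rather than a hand-written instance.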
Lack of genericity is the biggest problem when it comes to case-classes. We’ve even built the whole library around doing the same thing with them :-(
With case classes you pay for an object allocation whenever you wrap a value. Of course, you can mitigate the effect to some extent by extending AnyVal. But there can still be a price for extracting the value back to perform an operation, unless the JIT decides to inline it. Compare:
Everything has its ups and downs. There are two particular problems we have identified while using tagged types extensively.
Case-classes are easier to enhance
Let’s say you need a string-like type representing plain-text passwords. Clearly, things marked as plain-text passwords will end up encrypted at some point, so you’d probably want an encrypt operation on them. With case classes, this is the easiest thing in the world for any programmer: just add a method to the class.
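A minimal sketch, where reversing the string is a deliberately silly placeholder for real encryption and all names are hypothetical:

```scala
object PasswordAsCaseClass {
  final case class EncryptedPassword(value: String)

  final case class PlainPassword(value: String) {
    // Placeholder transformation, NOT real cryptography.
    def encrypt: EncryptedPassword = EncryptedPassword(value.reverse)
  }

  val encrypted: EncryptedPassword = PlainPassword("secret").encrypt
}
```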
This approach guarantees that all extra operations are easy to find and immediately available.
In the case of tagged types this gets more convoluted. You’re going to need an implicit class, since a tagged type is just a type alias.
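The same operation for a tagged type might look roughly like this, assuming Scala 2.13. PlainPasswordOps is a hypothetical name, and the reversal is again only a placeholder for encryption:

```scala
object PasswordAsTag {
  type @@[+A, +T] = A with T

  implicit class Tagger[A](val self: A) extends AnyVal {
    def @@[T]: A @@ T = self.asInstanceOf[A @@ T]
  }

  trait PlainPassword
  trait EncryptedPassword

  // The operation lives in an implicit class keyed on the tag,
  // because there is no class body to put it in.
  implicit class PlainPasswordOps(self: String @@ PlainPassword) {
    // Placeholder transformation, NOT real cryptography.
    def encrypt: String @@ EncryptedPassword =
      self.reverse.@@[EncryptedPassword]
  }

  val encrypted: String @@ EncryptedPassword =
    "secret".@@[PlainPassword].encrypt
}
```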
It needs to be visible wherever you want to apply conversions, so code organization must be well thought out (package objects seem to be the best option). Additionally, you will have to rely on a tool to give you a hint that such an operation is defined for some particular tag.
Case-classes are easier to validate
More often than not, by tagging a value you assert that it possesses some properties. After all, not every string is valid as a user name (say, it must be non-empty and cannot contain certain characters), and not every long is suitable as an id (say, it must be positive). This validation is essential if you tag at your system boundaries, e.g. when you deserialize data from a JSON request. The tag then serves as a proof of data validity that is carried throughout the system. And again, this is natural with case classes: just add require to the case-class body or a bit of code to its constructor. It is a bit awkward for tagged types, since they lack type-specific constructors (the generic @@ constructor is too generic for this :-) ). Inevitably, you’ll end up simulating constructors.
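A simulated constructor could be sketched as follows, assuming Scala 2.13. The names UserNameTag, UserName, from and apply, as well as the non-empty rule, are illustrative and not taken from any library:

```scala
object ValidatedTags {
  type @@[+A, +T] = A with T

  implicit class Tagger[A](val self: A) extends AnyVal {
    def @@[T]: A @@ T = self.asInstanceOf[A @@ T]
  }

  trait UserNameTag
  type UserName = String @@ UserNameTag

  // An object sharing the alias's name plays the role of a companion.
  object UserName {
    // Validating constructor that reports failure as a value.
    def from(raw: String): Either[String, UserName] =
      if (raw.trim.nonEmpty) Right(raw.@@[UserNameTag])
      else Left("user name must be non-empty")

    // Throwing variant, mirroring `require` in a case-class body.
    def apply(raw: String): UserName = {
      require(raw.trim.nonEmpty, "user name must be non-empty")
      raw.@@[UserNameTag]
    }
  }

  val accepted: Either[String, UserName] = UserName.from("ada")
  val rejected: Either[String, UserName] = UserName.from("   ")

  // The generic constructor still exists and skips validation entirely.
  val bypassed: UserName = "".@@[UserNameTag]
}
```

Every tag that needs validation requires an object of this shape.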
This is nauseatingly repetitive (we ended up using scalameta to generate all this boilerplate, see kebs-tagged). You also have to keep in mind that you’ll end up with two almost-the-same-but-different ways of constructing a tagged value, the other being the generic @@, which bypasses validation (counterargument: this might come in handy, e.g. in tests, where you might not care about the value being properly validated).
The need for such a solution is apparently strong in the Scala community, because the so-called opaque types proposal is being implemented for future Scala versions. The details of the concept are described in the proposal. In a nutshell, tagged types can be implemented using the new construct, with a slightly adapted version of the code found on the proposal’s website.
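As a rough sketch of the idea, and not the proposal's own code, this is what it can look like with Scala 3 opaque type aliases. The snippet requires Scala 3, and the names OpaqueTagging, tag and untag are our assumptions:

```scala
// Requires Scala 3 (opaque type aliases).
object OpaqueTagging {
  // Outside this object, Tagged[A, T] is an abstract type.
  // Inside it, Tagged[A, T] is known to be exactly A.
  opaque type Tagged[A, T] = A
  type @@[A, T] = Tagged[A, T]

  object Tagged {
    // No casts needed: the alias is transparent in this scope.
    def tag[A, T](a: A): Tagged[A, T] = a
    def untag[A, T](t: Tagged[A, T]): A = t
  }
}

object OpaqueTaggingDemo {
  import OpaqueTagging._

  trait UserId

  val id: Long @@ UserId = Tagged.tag(1L)
  val raw: Long = Tagged.untag(id)
}
```

Note that in this particular sketch the tagged type is no longer a subtype of its base type, so getting the raw value back takes an explicit untag.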
As you can see, there is a promise of letting go of awkward type definitions and the suspicious asInstanceOf. Even though I could not find any details in this proposal that would hint at how to alleviate the aforementioned pain points of tagged types, this is definitely a step in the right direction.
To tag, or not to tag?
By all means — to tag! Compared to case-classes, tags have very good properties, are generic and do not incur performance costs. If you want to see the actual implementation we’ve been using at the Iterators, please take a look at kebs-tagged.