Overcoming type erasure in Scala

This article aims to show a couple of techniques to tackle some common problems caused by type erasure in Scala.

Introduction

Scala has a really strong type system. Existential types, structural types, nested types, path-dependent types, abstract and concrete type members, type bounds (upper, lower, view, context), use-site and declaration-site type variance, support for type polymorphism (subtype, parametric, F-bounded, ad-hoc), higher-kinded types, generalized type constraints… And the list goes on.

But even though Scala’s type system is theoretically very strong, in practice some type-related features are weakened by the restrictions and limitations of its runtime environment — that’s right, I’m looking at you, type erasure.

What is type erasure? Simply put, it’s a procedure performed by the Java and Scala compilers that removes generic type information during compilation, so it is no longer available at runtime. This means that we are not able to differentiate between, say, List[Int] and List[String] at runtime. Why does the compiler do this? Because the Java Virtual Machine (the underlying runtime environment that runs both Java and Scala) doesn’t know anything about generics.

Type erasure exists for historical reasons. Java didn’t support generics from the beginning, so when they were finally added in Java 5, backward compatibility had to be kept in mind. The designers wanted to allow seamless interfacing with old, non-generic legacy code (that’s why we have raw types in Java). What happens under the hood is that each type parameter in a generic class is replaced either with its upper bound or, if it has none, with Object. For example:

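// Something stands for any type used as an upper bound
abstract class Foo[T] {
  val foo: T
}
abstract class Bar[T <: Something] {
  val bar: T
}

becomes

// conceptual view after erasure, not literal compiler output
abstract class Foo {
  val foo: Object
}
abstract class Bar {
  val bar: Something
}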

So you see, the runtime has no idea about the actual type that a generic class was parameterized with. In our example, it only sees raw Foo and Bar.
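Here is a minimal sketch of what this means in practice. The object name ErasureDemo is only illustrative. Two lists with different element types end up with exactly the same runtime class:

```scala
object ErasureDemo {
  val ints: List[Int] = List(1, 2)
  val strings: List[String] = List("a", "b")

  // Both values are plain list cells at runtime; the element type is gone.
  val sameRuntimeClass: Boolean = ints.getClass == strings.getClass

  def main(args: Array[String]): Unit =
    println(sameRuntimeClass) // true
}
```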

Don’t think that type erasure is a product of someone’s incompetence or ignorance. It’s not bad design; it’s a deliberate trade-off. There is a lot to consider when dealing with source, binary and behavioral compatibility, and the Java designers took a lot of time to think this through and do what they thought was best. Personally, I feel that a better long-term decision would have been to simply break backwards compatibility and force people to work with generics in the subsequent releases of Java. But from a business perspective, their move is completely understandable. Making things complicated for a significant portion of your clients (and thereby potentially pissing them off) is not something you would easily choose to do.

Anyways, I really don’t want to digress that far from my intended topic. What I do want to talk about is how we can deal with type erasure in Scala. Unfortunately, there’s no way to prevent type erasure itself, but we will see a couple of ways to work around it.

How it works (or doesn’t work)

Here’s one simple example of type erasure:

object Extractor {
  def extract[T](list: List[Any]) = list.flatMap {
    // unchecked: T is erased, so this case matches every element
    case element: T => Some(element)
    case _ => None
  }
}

val list = List(1, "string1", List(), "string2")
val result = Extractor.extract[String](list)
println(result) // List(1, string1, List(), string2)

The extract() method takes a list of all kinds of objects; since it holds objects of type Any, we can put in numbers, booleans, strings, bananas, oranges, whatever. By the way, seeing List[Any] in a piece of code should be an instant “code smell”, but let’s forget about best practices for a second and focus on the problem with type erasure.

So, our desire is to have a method that takes a list of mixed objects and extracts only objects of a certain type. We can choose this type by parameterizing the method extract() with it. In the given example the chosen type is String, which means that we will try to extract all strings from a given list.

From a strictly language point of view (without going into runtime details), this code is reasonable. We know that pattern matching is able to figure out the runtime type of a given object without problems. However, because the program is executed on the JVM, all generic types are erased during compilation. Therefore pattern matching cannot really get far; everything beyond the “first level” of the type is erased. Matching our variable directly on Int or String (or any non-generic type, such as MyNonGenericClass) would work fine, but matching it on T, where T is a type parameter, cannot work. The compiler will give us a warning saying “abstract type pattern T is unchecked since it is eliminated by erasure”.
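For contrast, here is a small sketch of the variant that does work: the same extraction with the type test written against the concrete class String instead of the type parameter T. The names ConcreteExtractor and extractStrings are made up for this illustration.

```scala
object ConcreteExtractor {
  // String is a concrete, non-generic class, so this type test survives erasure.
  def extractStrings(list: List[Any]): List[String] = list.flatMap {
    case element: String => Some(element)
    case _               => None
  }

  def main(args: Array[String]): Unit = {
    val list = List(1, "string1", List(), "string2")
    println(extractStrings(list)) // List(string1, string2)
  }
}
```

The obvious drawback is that we would need one such method per type, which is exactly what the generic version was supposed to avoid.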

To provide some assistance with these situations, Scala introduced Manifests somewhere around version 2.7. However, they were unable to represent certain types, so Scala 2.10 deprecated them in favour of the more powerful TypeTags.

Type tags are divided into three separate types:

  • TypeTag
  • ClassTag
  • WeakTypeTag

Even though this is the official classification from the documentation, a better division in my opinion would be something like:

  • TypeTag:
    - “classic”
    - WeakTypeTag
  • ClassTag

I’m trying to make a point that TypeTag and WeakTypeTag are actually two flavours of the same thing with only one significant difference (as we’ll show later), while ClassTag is a quite different construct.

ClassTag

Let’s get back to our extractor example and see how we can fix the type erasure problem. All we’re going to do now is add a single implicit parameter to the extract() method:

import scala.reflect.ClassTag

object Extractor {
  def extract[T](list: List[Any])(implicit tag: ClassTag[T]) =
    list.flatMap {
      case element: T => Some(element)
      case _ => None
    }
}

val list: List[Any] = List(1, "string1", List(), "string2")
val result = Extractor.extract[String](list)
println(result) // List(string1, string2)

And voila! Suddenly the print statement displays “List(string1, string2)”. In your face, type erasure. Note that we can also use context bound syntax here:

// def extract[T](list: List[Any])(implicit tag: ClassTag[T]) =
def extract[T : ClassTag](list: List[Any]) =

I will use the standard syntax simply to make the code as clear as possible, without any extra syntax sugar.

So, how does it work? Well, when we require an implicit value of type ClassTag, the compiler will create this value for us. The documentation says:

If an implicit value of type u.ClassTag[T] is required, the compiler will make one up on demand.

So, the compiler is happy to provide us with an implicit instance of the needed ClassTag; we just need to ask. This mechanism will also be used with TypeTag and WeakTypeTag.
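To see that the compiler really hands us a usable value, here is a minimal sketch that asks for a ClassTag through a context bound and reads the runtime class it carries. The names ClassTagDemo and runtimeClassOf are hypothetical helpers for this illustration; classTag and runtimeClass come from scala.reflect.

```scala
import scala.reflect.{ClassTag, classTag}

object ClassTagDemo {
  // The context bound asks the compiler for an implicit ClassTag[T];
  // classTag[T] simply retrieves that instance.
  def runtimeClassOf[T: ClassTag]: Class[_] = classTag[T].runtimeClass

  def main(args: Array[String]): Unit =
    println(runtimeClassOf[String] == classOf[String]) // true
}
```

Nobody ever constructs a ClassTag[String] by hand here; the call site runtimeClassOf[String] is enough for the compiler to materialize one.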

OK, we have our implicit ClassTag value available in extract() method (thanks, compiler). What happens once we’re inside the method body? Look at the example once again — not only did the compiler automatically provide us with the value for our implicit parameter tag, which is nice enough, but we never needed to use the parameter itself. We never had to do anything with the “tag” value. It’s the mere existence of it that allowed our pattern matching to successfully match the String elements in our list. OK, that’s pretty nice of the compiler, but it feels like there’s too much “magical stuff” going on. Let’s see that in more detail.

We can check the docs in search of an explanation. Indeed, it’s hidden here:

Compiler tries to turn unchecked type tests in pattern matches into checked ones by wrapping a (_: T) type pattern as ct(_: T), where ct is the ClassTag[T] instance.

Basically what happens is that if we provide the compiler with an implicit ClassTag, it will rewrite the condition(s) in pattern matching to use the given tag as an extractor. Our condition:

case element: T => Some(element)

gets translated by the compiler (if there is an implicit tag in scope) into this:

case (element @ tag(_: T)) => Some(element)

In case you have never seen the “@” construct before, it’s just a way of giving a name to the whole value you’re matching, for example:

case Foo(p, q) => // we can only reference parameters via p and q
case f @ Foo(p, q) => // we can reference the whole object via f

If there is no implicit ClassTag for type T available, the compiler is crippled (due to the lack of type information) and will issue a warning that our pattern matching will suffer from type erasure on type T. Compilation won’t break, but don’t expect the type test on T to do anything useful at runtime (since T has been erased by then). If we do declare an implicit ClassTag parameter for type T, the compiler will happily supply a proper ClassTag at compile time, as we have seen in our example. The tag carries the information that T is String, and type erasure cannot touch it.
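To take some of the magic out of it, the rewrite described above can be imitated by hand. The sketch below calls the tag’s unapply method directly instead of relying on the pattern-match rewrite; the object name ExplicitExtractor is made up, and this is my own illustration rather than what the compiler literally generates.

```scala
import scala.reflect.ClassTag

object ExplicitExtractor {
  // tag.unapply returns Some(element) when the element is an instance of
  // T's runtime class and None otherwise.
  def extract[T](list: List[Any])(implicit tag: ClassTag[T]): List[T] =
    list.flatMap(element => tag.unapply(element).toList)

  def main(args: Array[String]): Unit = {
    val list: List[Any] = List(1, "string1", List(), "string2")
    println(extract[String](list)) // List(string1, string2)
  }
}
```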

Looks good, doesn’t it? But there’s one important weakness. If we wanted to differentiate our types on a higher level and get values of List[Int] from our initial list while ignoring e.g. List[String], we would not be able to do so:

val list: List[List[Any]] = List(List(1, 2), List("a", "b"))
val result = Extractor.extract[List[Int]](list)
println(result) // List(List(1, 2), List(a, b))

Whoops! We wanted to extract only List[Int], but we got a List[String] too. Class tags cannot differentiate on a higher level. Only on the first one. This means that our extractor can differentiate between e.g. sets and lists, but it cannot tell apart one list from another (e.g. List[Int] vs List[String]). Of course, it’s not just the lists — this goes for all generic traits/classes.
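The reason is visible if we look at what the tags actually carry. In this small sketch (the object and value names are only illustrative), the tags for List[Int] and List[String] hold the same runtime class, while the tag for Set[Int] holds a different one:

```scala
import scala.reflect.classTag

object FirstLevelOnly {
  val listOfInt: Class[_]    = classTag[List[Int]].runtimeClass
  val listOfString: Class[_] = classTag[List[String]].runtimeClass
  val setOfInt: Class[_]     = classTag[Set[Int]].runtimeClass

  def main(args: Array[String]): Unit = {
    println(listOfInt == listOfString) // true: the element type is not recorded
    println(listOfInt == setOfInt)     // false: List and Set differ on the first level
  }
}
```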

TypeTag

Where ClassTag fails, TypeTag succeeds gloriously. It can differentiate a List[String] from a List[Int]. It can go deeper too, such as differentiating List[Set[Int]] from List[Set[String]]. This is possible because TypeTag has richer information about the generic type at runtime. We can easily get the full path of the type in question, as well as all the nested types (if there are any). To get this information, you just need to invoke tpe on a given tag.

Here’s an example. The implicit tag parameter is provided by the compiler, just like with ClassTag. Pay attention to the “args” argument — it’s the one that contains additional type information which ClassTag doesn’t have (information about List being parameterized by Int).

import scala.reflect.runtime.universe._

object Recognizer {
  def recognize[T](x: T)(implicit tag: TypeTag[T]): String =
    tag.tpe match {
      case TypeRef(utype, usymbol, args) =>
        List(utype, usymbol, args).mkString("\n")
    }
}

val list: List[Int] = List(1, 2)
val result = Recognizer.recognize(list)
println(result)
// prints:
// scala.type
// type List
// List(Int)

(The scala.reflect.runtime.universe package lives in the scala-reflect library, so you may need to add it as a dependency.)
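If all we want is to tell two types apart, we don’t have to deconstruct the type by hand. Here is a minimal sketch, assuming Scala 2 with scala-reflect on the classpath; sameType is a hypothetical helper, while typeOf and =:= are part of the runtime universe.

```scala
import scala.reflect.runtime.universe._

object TypeComparison {
  // =:= checks type equality, including all type arguments.
  def sameType[A: TypeTag, B: TypeTag]: Boolean = typeOf[A] =:= typeOf[B]

  def main(args: Array[String]): Unit = {
    println(sameType[List[Int], List[Int]])              // true
    println(sameType[List[Int], List[String]])           // false
    println(sameType[List[Set[Int]], List[Set[String]]]) // false
  }
}
```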

I introduced a new object here: a Recognizer. What happened to the good old Extractor? Well, sad news. We cannot implement an Extractor using TypeTags. The good thing about them is that they carry more information about the type, such as the higher types (that is, they can differentiate List[X] from List[Y]), but their downside is that they cannot be used on objects at runtime. We can use a TypeTag to get information about a certain type at runtime, but we cannot use it to find out the type of some object at runtime. Do you see the difference? What we passed to recognize() was a straightforward List[Int]; it was the declared type of our List(1, 2) value. But if we declared our List(1, 2) as a List[Any], TypeTag would tell us that we passed a List[Any] to it.
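A short sketch makes the difference concrete. It assumes Scala 2 with scala-reflect available; StaticTypeDemo and staticTypeOf are names invented for this example. The same runtime value is reported differently depending on its declared type:

```scala
import scala.reflect.runtime.universe._

object StaticTypeDemo {
  // Returns the compile-time type of the argument expression, not the
  // runtime class of the value.
  def staticTypeOf[T](x: T)(implicit tag: TypeTag[T]): Type = tag.tpe

  val asInts: List[Int] = List(1, 2)
  val asAny: List[Any]  = List(1, 2)

  def main(args: Array[String]): Unit = {
    println(staticTypeOf(asInts) =:= typeOf[List[Int]]) // true
    println(staticTypeOf(asAny) =:= typeOf[List[Int]])  // false
    println(staticTypeOf(asAny) =:= typeOf[List[Any]])  // true
  }
}
```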

OK, here are the two main differences between ClassTags and TypeTags in one place:

  1. ClassTag doesn’t know about “higher type”; given a List[T], a ClassTag only knows that the value is a List and knows nothing about T.
  2. TypeTag knows about “higher type” and has much richer type information, but cannot be used for getting type information about values at runtime. In other words, TypeTag provides runtime information about the type while ClassTag provides runtime information about the value (more specifically, information that tells us what the actual type of the value in question is at runtime).

There’s one more thing worth mentioning regarding the difference between ClassTag and (Weak)TypeTag: ClassTag is a classical good old type class. It comes bundled with a separate implementation for each type, which makes it a standard type class pattern. On the other hand, (Weak)TypeTag is a bit more sophisticated and to use it we need to have a special import in our code, as you may have noticed in the snippet given earlier. We need to import the universe:

Universe provides a complete set of reflection operations which make it possible for one to reflectively inspect Scala type relations, such as membership or subtyping.

Don’t worry, all you need to do is import the correct universe, and in the case of (Weak)TypeTag that is scala.reflect.runtime.universe._, as described in the reflection documentation.

WeakTypeTag

You are probably under the impression that TypeTag and WeakTypeTag are quite similar as all the differences so far were explained in respect to the ClassTag. And that is correct; they are indeed two variants of the same tool. But, there is an important difference.

We saw that TypeTag is smart enough to examine a type as well as its type parameters, then their type parameters etc. However, all those types were concrete. If a type is abstract, TypeTag will not be able to resolve it. This is where WeakTypeTag comes into play. Let’s revisit the TypeTag example for a second:

val list: List[Int] = List(1, 2)
val result = Recognizer.recognize(list)

See that Int over there? It could have been any other concrete type, such as String, Set[Double] or MyCustomClass. But if you have an abstract type, you need a WeakTypeTag.

Here’s an example. Note that we need a reference to an abstract type so we will simply wrap everything in an abstract class.

import scala.reflect.runtime.universe._

abstract class SomeClass[T] {

  object Recognizer {
    def recognize[T](x: T)(implicit tag: WeakTypeTag[T]): String =
      tag.tpe match {
        case TypeRef(utype, usymbol, args) =>
          List(utype, usymbol, args).mkString("\n")
      }
  }

  val list: List[T]
  val result = Recognizer.recognize(list)
  println(result)
}

new SomeClass[Int] { val list = List(1) }
// prints:
// scala.type
// type List
// List(T)

The resulting type is List[T]. If we had used a TypeTag instead of a WeakTypeTag, the compiler would have complained that there is “no TypeTag available for List[T]”. So, you can look at WeakTypeTag as a sort of superset of TypeTag.

Note that WeakTypeTag tries to be as concrete as possible, so if there is a type tag available for some abstract type, WeakTypeTag will use that type tag and thus make the type concrete instead of leaving it abstract.
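This behaviour can be sketched with two nearly identical methods, assuming Scala 2 with scala-reflect available. The names WeakTagDemo, withoutTag and withTag are hypothetical; weakTypeOf is the universe helper that summons a WeakTypeTag and returns its type.

```scala
import scala.reflect.runtime.universe._

object WeakTagDemo {
  // No TypeTag[T] in scope: T stays abstract inside the resulting type.
  def withoutTag[T]: Type = weakTypeOf[List[T]]

  // A TypeTag[T] is in scope: the compiler uses it to make the type concrete.
  def withTag[T: TypeTag]: Type = weakTypeOf[List[T]]

  def main(args: Array[String]): Unit = {
    println(withoutTag[Int] =:= typeOf[List[Int]]) // false
    println(withTag[Int] =:= typeOf[List[Int]])    // true
  }
}
```

The only difference between the two methods is the context bound, which is what lets the second one resolve T to Int.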

Conclusion

Before we finish, let me mention that each type tag can also be instantiated explicitly using available helpers:

import scala.reflect.classTag
import scala.reflect.runtime.universe._

val ct = classTag[String]
val tt = typeTag[List[Int]]
val wtt = weakTypeTag[List[Int]]

val array = ct.newArray(3)
array.update(2, "Third")

println(array.mkString(","))
println(tt.tpe)
println(wtt.equals(tt))

// prints:
// null,null,Third
// List[Int]
// true

That’s all. We saw three constructs, ClassTag, TypeTag and WeakTypeTag, that will get you through most of your type erasure troubles in your everyday Scala life. Note that using tags (which is basically reflection under the hood) can slow things down and make the generated code significantly bigger, so don’t go around adding implicit type tags all over your library to make the compiler smarter “just in case” and for no practical reason. Save them for when you really need them. And when you do need them, they will provide a powerful weapon against JVM’s type erasure.

As usual, feel free to contact me on sinisalouc@gmail.com or find me on Twitter.
