Static and non-static interface member calls in .NET: they are not the same
In an article from Yoshifumi Kawai, I was surprised to read that he decided to use static abstract interface methods because “this avoids the cost of invoking via virtual methods”.
First, an interface call is not really the same as a virtual method call. Interface calls are implemented using virtual stub dispatch (VSD, described in the Book of the Runtime, or BOTR), while virtual method calls are implemented using virtual method tables. But I suspect that Yoshifumi Kawai simply meant that static abstract interface members allow the compiler to apply some kind of devirtualization. The devirtualization concept applies to both interface calls and virtual method calls: it is an optimization that lets the JIT compiler determine at compile time which method will be called. Replacing an indirect call with a direct call is really valuable, but the best aspect of devirtualization is that it allows the compiler to inline small methods, thus removing the cost of method calls entirely.
Then, static abstract interface members can only be invoked in a generic context, through a constrained type parameter. When the type arguments are reference types, the JIT compiles a single version of the generic code that is shared by all of them, so it does not know the actual type arguments. Only value type instantiations get dedicated code, which is not the case here. So, my initial thought was that static abstract interface members would not lead to devirtualization when used with reference types. Yet, Yoshifumi Kawai seems to know what he is talking about. I clearly needed to experiment a bit to understand when a static abstract interface member can be devirtualized.
This is only the introduction, and I am already tired of writing “static abstract interface member”. This feature is awesome, but its name is too long. I will use SAIM for the rest of the article.
A short review of method calls
Let’s start by reviewing how method calls are handled by the .NET JIT. Please remember that the assembly code might change depending on the runtime version, the target platform, or the runtime settings (e.g., tiered compilation, tiered PGO, …). We will look at the assembly generated by the following methods:
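As an illustration, the three kinds of call sites reviewed below might look like the following C# sketch. All names (IValueSource, ValueSource, CallKinds) are hypothetical; each call site is kept in its own non-inlined method so that its assembly can be inspected on its own.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical interface and type, used only to illustrate the three call kinds.
public interface IValueSource
{
    int GetInterfaceValue();
}

public class ValueSource : IValueSource
{
    // Non-virtual instance method: invoked with a direct call.
    public int GetValue() => 1;

    // Virtual method: invoked through the method table.
    public virtual int GetVirtualValue() => 2;

    // Interface implementation: invoked through virtual stub dispatch
    // when called via an IValueSource reference.
    public int GetInterfaceValue() => 3;
}

public static class CallKinds
{
    // NoInlining keeps each call site in its own method, so its
    // generated assembly can be inspected separately.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int DirectCall(ValueSource source) => source.GetValue();

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int VirtualCall(ValueSource source) => source.GetVirtualValue();

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int InterfaceCall(IValueSource source) => source.GetInterfaceValue();
}
```

On .NET 7 or later, setting the DOTNET_JitDisasm environment variable to a method name (for example DOTNET_JitDisasm=InterfaceCall) should print the assembly the JIT generates for that method, possibly once per tier.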
Direct method calls
Here is the assembly you might get for a standard instance method call.
There are a few details to notice:
- The call target operand is a constant address.
- The first line is a null check. It might be omitted in some situations, for example when the caller is an instance method of the same type and the call targets this, or of course when the target method is static.
- The JIT might replace the call instruction with a jmp instruction, for example when applying a tail call optimization.
- The call might include additional ceremony depending on the method parameters, the return type and the application binary interface (ABI) of the CPU/OS.
But most of all, a direct call is inlinable: because the target method is known at compile time, the call might be removed entirely.
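A minimal sketch of the difference, using a hypothetical Counter type: the small non-virtual GetValue method is a good inlining candidate, while MethodImplOptions.NoInlining forces the JIT to keep an actual call instruction. Whether GetValue is really inlined is a JIT decision and is not guaranteed.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical type used to compare an inlinable call with a forced call.
public class Counter
{
    private int _value;

    // Small body and direct call: the JIT may inline it at the call site,
    // leaving only the field load.
    public int GetValue() => _value;

    // NoInlining forces the JIT to keep a real call to this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int GetValueNoInlining() => _value;

    public void Increment() => _value++;
}

public static class InliningDemo
{
    public static int ReadBoth(Counter counter)
    {
        // Likely compiled as a plain field load (not guaranteed).
        int inlined = counter.GetValue();
        // Always compiled as a call instruction.
        int called = counter.GetValueNoInlining();
        return inlined + called;
    }
}
```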
Virtual method calls
Virtual method calls are also pretty straightforward. They simply require additional dereferencing steps to load the target address from the type's method table.
- The first mov loads the method table (an object reference directly points to the method table pointer).
- The second mov loads the virtual method list, stored at a specific offset from the start of the method table (0x40, but see the note at the end of the article).
- The call reads the method address stored at a specific offset within the method list (0x20) and calls it.
Of course, virtual method calls are not inlinable, unless the compiler can identify the target type. For example, if you instantiate a type, store the instance in a local variable, and call a virtual method on it right away, the compiler might be able to devirtualize the call.
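Here is a small sketch of situations where the JIT can or cannot identify the target type, using hypothetical Shape and Square types. The sealed modifier and the locally created object give the JIT enough information to devirtualize; the base-typed parameter does not, although dynamic PGO might still add a guarded devirtualization at runtime.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical types used to illustrate devirtualization opportunities.
public class Shape
{
    public virtual int Sides() => 0;
}

// sealed: no derived type can override Sides() again.
public sealed class Square : Shape
{
    public override int Sides() => 4;
}

public static class DevirtualizationDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int FromLocal()
    {
        // The JIT knows the exact type of the new object, so it can turn
        // the virtual call into a direct (and inlinable) call.
        Shape shape = new Square();
        return shape.Sides();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int FromSealedParameter(Square square)
    {
        // The static type is sealed, so Square.Sides is the only possible target.
        return square.Sides();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int FromBaseParameter(Shape shape)
    {
        // Any Shape subclass can be passed here: this remains a virtual call.
        return shape.Sides();
    }
}
```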
Interface method calls
Here is the assembly you might get for an interface method call:
The assembly looks very similar to the direct method call because it uses a constant address. But there is a twist: the square bracket syntax loads the value stored at the specified constant address. Here, the constant is the address of a memory location, not the address of the function, so this is effectively an indirect call. The constant address points to the dispatch cell (or “indirect cell” in the BOTR schema), which contains the address of the stub responsible for resolving the target method.
This article is already too long to include an in-depth description of the VSD, but the dispatch cell probably points to a monomorphic dispatch stub at this point. Here is the possible code for the dispatch stub:
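It can be roughly translated to: if the method table of the target object is the expected one, jump to the cached target method; otherwise, jump to the resolve stub, which performs a slower lookup.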
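The stub logic can be modeled in C# as below. This is only a conceptual sketch: the real stub is a handful of machine instructions generated by the runtime, and here a System.Type stands in for the method table pointer while delegates stand in for the target method and the resolve stub. The MonomorphicDispatchStub type and its hit counters are hypothetical.

```csharp
using System;

// Conceptual model of a monomorphic dispatch stub, not runtime code.
public sealed class MonomorphicDispatchStub
{
    private readonly Type _expectedType;              // type seen when the stub was created
    private readonly Func<object, int> _target;       // cached implementation for that type
    private readonly Func<object, int> _resolveStub;  // slow, general lookup path

    public int FastPathHits { get; private set; }
    public int SlowPathHits { get; private set; }

    public MonomorphicDispatchStub(Type expectedType, Func<object, int> target, Func<object, int> resolveStub)
    {
        _expectedType = expectedType;
        _target = target;
        _resolveStub = resolveStub;
    }

    public int Invoke(object instance)
    {
        // Models comparing the object's method table with the expected one.
        // A null instance throws here, much like the null reference fault of the real stub.
        if (instance.GetType() == _expectedType)
        {
            FastPathHits++;
            return _target(instance);       // models "jmp target"
        }

        SlowPathHits++;
        return _resolveStub(instance);      // models "jmp resolve stub"
    }
}
```

Usage: a stub created for string takes the fast path for a string argument and falls back to the resolve delegate for any other type.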
The case of SAIM
The previous section introduced the JIT output for common method calls. Or maybe it was just an excuse to look at assembly code blocks. But at least we now have a baseline to compare SAIM against.
Let’s define two similar interfaces, one with static members and one with regular members:
The interfaces can be implemented by the same type:
We will analyze the assembly of the following call sites:
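A possible shape for these declarations is sketched below. Accessor and CallSite_Standard_Generic appear in the text; IAccessor, IStaticAccessor, GetValue, GetStaticValue, CallSite_Standard, CallSite_Static_Generic and CallSite_Static are hypothetical names. The instance and static members need different names because a class cannot declare an instance method and a static method with the same signature. It requires C# 11 and .NET 7 or later.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical interfaces: one with a regular member, one with a SAIM.
public interface IAccessor
{
    int GetValue();
}

public interface IStaticAccessor
{
    static abstract int GetStaticValue();
}

// The same reference type implements both interfaces.
public class Accessor : IAccessor, IStaticAccessor
{
    public int GetValue() => 42;
    public static int GetStaticValue() => 42;
}

public static class SaimCallSites
{
    // Standard interface call: virtual stub dispatch.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CallSite_Standard(IAccessor accessor) => accessor.GetValue();

    // Same call in shared generic code (T is a reference type here).
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CallSite_Standard_Generic<T>(T accessor) where T : IAccessor
        => accessor.GetValue();

    // SAIM call in an open generic context. It is not marked NoInlining,
    // so it can still be inlined into a closed call site.
    public static int CallSite_Static_Generic<T>() where T : IStaticAccessor
        => T.GetStaticValue();

    // Closed generic context: the type argument is known at the call site.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CallSite_Static() => CallSite_Static_Generic<Accessor>();
}
```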
We already know the assembly generated for the standard interface call site: call [DispatchCell]. There is nothing special here. The exact same assembly is also generated for CallSite_Standard_Generic. I added it as a reminder that interface calls are not optimized in shared generic code. It would of course be totally different if Accessor were a value type.
Open generic context
Now let’s look at the assembly for a SAIM call in a shared generic method:
Well, that’s clearly something new. There is no dispatch cell, and the call target code is much more complex than a monomorphic dispatch stub. It turns out that because SAIM are static, they can be handled by the shared generic infrastructure. The target method is found using a JIT helper, JIT_GenericHandleMethod, and then saved in the instantiated method metadata. This method lookup assembly is not specific to SAIM; it is very similar to the one you would find for loading typeof(List<T>) in a generic method. However, the JIT helper used here is dedicated to generic methods. Another helper is used for instance methods of generic types: JIT_GenericHandleClass.
The SAIM method call might be faster than a VSD call, but its lookup is generic, while the VSD lookup is often specialized for a single target type.
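The "look up once, then cache in the instantiation" behavior can be modeled as follows. This is a conceptual sketch, not runtime code: InstantiationDictionary is a hypothetical stand-in for the generic dictionary attached to an instantiated method, and the reflection-based ResolveHelper stands in for the JIT helper. Each instantiation gets its own dictionary, and the slow path runs only on the first use of a slot.

```csharp
using System;
using System.Reflection;

// Conceptual model of the shared-generics method lookup, not runtime code.
public sealed class InstantiationDictionary
{
    private readonly Type _typeArgument;   // the T of the instantiated method
    private readonly Func<int>[] _slots;   // null until first use

    public int HelperCalls { get; private set; }

    public InstantiationDictionary(Type typeArgument, int slotCount)
    {
        _typeArgument = typeArgument;
        _slots = new Func<int>[slotCount];
    }

    public int CallSlot(int index, string methodName)
    {
        Func<int> target = _slots[index];
        if (target == null)
        {
            // Slow path, standing in for the JIT helper: resolve and cache.
            HelperCalls++;
            target = ResolveHelper(_typeArgument, methodName);
            _slots[index] = target;
        }

        // Fast path: an indirect call through the cached slot.
        return target();
    }

    // Hypothetical resolver: finds the public static implementation by name.
    private static Func<int> ResolveHelper(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName, BindingFlags.Public | BindingFlags.Static)
            ?? throw new MissingMethodException(type.Name, methodName);
        return (Func<int>)Delegate.CreateDelegate(typeof(Func<int>), method);
    }
}

// Hypothetical implementations of a static member.
public class FirstAccessor { public static int GetStaticValue() => 1; }
public class SecondAccessor { public static int GetStaticValue() => 2; }
```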
Closed generic context
Our last case is simply a non-generic method that invokes the previous method with a specific type. So, the code now invokes one additional method. It should be slower, right? Let’s look at the assembly code again:
Finally! The JIT was smart enough to identify and inline the target method. This is generally not possible with non-static interface calls, because the JIT usually cannot identify the concrete type of the target object at compile time. It is probably the devirtualization Yoshifumi Kawai was talking about.
To be honest, I did not plan to include benchmarks in this article. But looking at the assembly for SAIM calls made me wonder whether their performance was equivalent to VSD calls. I created two benchmarks: one to measure the raw cost of the method calls, and one to measure the effect of calling methods with multiple types. The code is available here.
The goal of the first benchmark is to measure the raw cost of invoking interface methods. Here are the results:
The SAIM call is very slightly slower than a monomorphic VSD call. This is surprising, because the SAIM method lookup is executed only once and then cached. However, SAIM calls are not slow; monomorphic VSD calls are simply very efficient.
However, when the VSD calls are invoked with multiple types, they become polymorphic, and the lookup cost increases significantly.
Finally, inlined SAIM calls completely outperform VSD calls.
The goal of the second benchmark is to confirm that the SAIM method call duration is the same whether it is invoked with a consistent type or with varying types. Unlike VSD calls, SAIM calls cannot become monomorphic or polymorphic. The results confirm this:
It is interesting to note that the “single type” method is still marginally faster than the “multi types” method for SAIM. This might simply be explained by memory effects (the code looks up and calls different methods instead of a single one).
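The two scenarios compared in the second benchmark can be sketched as follows, without any timing code. All names (IValue, IStaticValue, A, B, C, MultiTypeCalls) are hypothetical. In SumInterface, a single interface call site sees several receiver types, so the runtime may switch it from a monomorphic dispatch stub to a slower, cache-based resolve stub. In SumStatic, each CallStatic<T> instantiation resolves its target through its own instantiation metadata, so varying T does not turn one lookup into a polymorphic one.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical interfaces and implementations for the multi-type scenario.
public interface IValue { int GetValue(); }
public interface IStaticValue { static abstract int GetStaticValue(); }

public class A : IValue, IStaticValue { public int GetValue() => 1; public static int GetStaticValue() => 1; }
public class B : IValue, IStaticValue { public int GetValue() => 2; public static int GetStaticValue() => 2; }
public class C : IValue, IStaticValue { public int GetValue() => 3; public static int GetStaticValue() => 3; }

public static class MultiTypeCalls
{
    // One interface call site invoked with several receiver types.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumInterface(IValue[] values)
    {
        int sum = 0;
        foreach (IValue value in values)
        {
            sum += value.GetValue();
        }
        return sum;
    }

    // SAIM call in shared generic code; NoInlining keeps the shared version in use.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CallStatic<T>() where T : IStaticValue => T.GetStaticValue();

    public static int SumStatic(int iterations)
    {
        int sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            sum += CallStatic<A>();
            sum += CallStatic<B>();
            sum += CallStatic<C>();
        }
        return sum;
    }
}
```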
The .NET runtime always generates dedicated code for generic methods invoked with value type arguments. Therefore, SAIM calls are always direct calls when used with value types, which often results in method inlining. This behavior is important, because the generic math use case was one of the main objectives of SAIM. Of course, this devirtualization was already possible for standard interface method calls.
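Generic math is a typical example. In the sketch below (requires .NET 7 or later), T.Zero and the + operator are static abstract members inherited by INumber<T>. For value type arguments such as int or double, the runtime compiles a dedicated version of Sum, so these SAIM calls become direct calls that the JIT can usually inline. The GenericMath class name is hypothetical.

```csharp
using System.Numerics;

public static class GenericMath
{
    // T.Zero and operator + are static abstract interface members
    // (from INumberBase<T> and IAdditionOperators<T, T, T>).
    public static T Sum<T>(T[] values) where T : INumber<T>
    {
        T sum = T.Zero;
        foreach (T value in values)
        {
            sum += value;
        }
        return sum;
    }
}
```

Usage: GenericMath.Sum(new[] { 1, 2, 3 }) returns 6 and GenericMath.Sum(new[] { 1.5, 2.5 }) returns 4.0, each through its own specialized instantiation.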
The case of generic methods invoked with reference types is more interesting. The target method lookup is handled by the shared generic infrastructure instead of the VSD. The target method is found using a JIT helper (JIT_GenericHandleClass or JIT_GenericHandleMethod) and then saved in the instantiated method metadata. This mechanism is very slightly slower than a monomorphic VSD, but the difference is probably insignificant for most use cases. Moreover, the case of shared generic code calling static interface members should be rare, and the method lookup is probably not expected to be monomorphic. However, even with reference types, the SAIM target can often be resolved at compile time thanks to inlining, which is not possible for VSD.
SAIM are an awesome addition to the .NET ecosystem. They might be seen only as an elegant new design option for creating APIs, but they are also a new gear in the performance toolbox for writing efficient code. I will definitely try to see if I can use them to make the .NET Disruptor even faster!
Note on the 0x40 offset: the pointer offset of the virtual method list is not always 0x40, because method tables can contain multiple virtual method lists. Virtual methods are actually stored in chunks of fixed size. This can easily be verified by looking at the assembly code of types with many virtual methods.
Many thanks to @Lucas_Trz for the nerd review.