Improving performance of LINQ operations.
LINQ is a great tool for writing easy to read and easy to maintain data processing code. I’ve been writing articles on how using it incorrectly will seriously affect the performance of the application but, even when used correctly, it usually does not result in the best performant solution.
For this reason, many avoid using it. Some use libraries that try to solve this issue. Some of these libraries try to stay true to the LINQ API, some have very different solutions.
Recently, I’ve started wondering what if LINQ used some of the techniques described below? I’m sure I’m not the first one but I want to explore this. I may get stuck in a dead end, still I’m sure I’ll learn a lot along the way.
foreach in C# is actually syntactic sugar. The
foreach (top) is converted by the compiler into a
while loop inside a
finally block (bottom):
Lets make a small change. Lets convert the result of
Range() into a
List<int> by adding a
Notice that the
enumerator variable type changes from
Note: This is one great reason why you should always use
var. When writing similar code, if left as
IEnumerator<int>, it would inadvertently affect performance.
varuse results in the best case. This is why I use
Let's dive into the .NET Core source code to better understand what’s going on.
IList<T>, making it also implement
IEnumerable<T>. The implementations of the two
GetEnumerator() methods, that are required by the interfaces, are explicitly implemented. This means that these methods are only called when the
List<T> is cast to one of these interfaces. Otherwise, the public
GetEnumerator() implementation is called. This version of
Enumerator, which is a public inner struct of
When the compiler finds a foreach, it doesn’t simply look for the
IEnumerator<T> implementations. It looks for a public method
GetEnumerator() that returns a type that implements a public property
Current and a public method
MoveNext(). It uses the types returned and not the interfaces they implement. This is duck typing! But, why?
List<int>.Enumerator is a struct, a value type. If
List<T> is stored in an
IEnumerator<T> variable, the
IEnumerator<T>.GetEnumerator() explicit implementation will be used and the
List<int>.Enumerator will be boxed. This means that it will be copied into the heap and that all method calls will be virtual. Not a major issue when these methods are called a few times but these are usually called millions of times in the lifetime of most applications. It’s a huge issue when dealing with large sets of data.
foreach keyword, together with value-type enumerables which are returned by all the .NET framework collections, guarantees the best possible results.
LINQ is a library of extension methods to
IEnumerable<T>. Everything is cast to
IEnumerator<T>. This means that, although it includes many internal optimizations, ends up as the worst scenario described above.
It’s possible to use interfaces in .NET without causing boxing of value types.
The following code demonstrates it using a simple
IInterface with one method.
MyStruct is a value type that implements
MyStaticClass implements three methods that show different ways of invoking the method defined by
IInterface. After the C# code, you’ll find the equivalent code in IL and JIT Asm.
NOTE: The code was converted using SharpLab. The attribute on the third method is used so that SharpLab generates the JIT for when
Things to notice in the IL:
- When the parameter is a value type, the method is called using the
- When the parameter is an interface (reference type), the method is called using the
- When the parameter is a constrained interface, the method is called using the
callvirtinstruction but preceded by
Things to notice in the JIT Asm:
callvirtresults in more expensive code than
- Although the constrained interface uses a
callvirtinstruction, it results in the same code as
This means the compiler will generate a new method for each
TInterface found, avoiding boxing and virtual calls for value types. It’s one more duck typing done by the compiler.
LINQ still does not take advantage of this feature when passing in the
Since LINQ was created, the following interfaces were added to the
IEnumerable<T>, these give a read-only view of the collections but exposing functionality that can be used to avoid some of the
All .NET framework collections implement these but LINQ is still not aware of them…
Read-only structs and read-only refs
This is the repository where I keep the code for this brainstorm. Feel free to explore and contribute.
I’ll be writing followup articles on the successes and failures along the way…