I choose to start from one of the most common and basic things, Iteration.
We will explore how .NET iterates over List<T> and over an Array<T> and see what the differences are between iteration with a for and a foreach loop .
But before we will start, let’s go over List<T> and foreach and understand how each of them works
The List class stores our items in an internal array named _items, when we are creating a new list without any parameter, _items is assigned to an empty array of T.
By adding an item into the array
.NET starts by updating the version of the array (the _version variable keeps changing by an increment of 1 every time the array is modified so it will be possible to keep track when the collection has been modified).
The _size variable acts as a virtual size for our list, while the physical size is actually the length of the _items array, called Capacity.
So every time that we are adding an item into the list it ensures that the capacity is big enough to store another item and if it is , then the new item will be assigned.
Every time the virtual size is bigger than the capacity, AddWithResize method is being called:
This ensures that the next minimum size will be bigger then the current one plus 1 since we wish to add a new item into the list.
Then the function calls to EnsureCapacity, which sets the new capacity into the default one (DefaultCapacity=4) in case _items is an empty array, or doubles _items size in case that _items is not empty.
That’s the most basic explanation to how List<T> was implemented and the goal is to cover 3 basic points
1. List<T> uses an internal array
2. The Capacity(_items.Length) of the List is different then the Count(_size) of the list
3. The list keeps an updated version variable that keeps track of changes.
The foreach loop
Many different collections such as List<T> or Dictionary<T> can be iterated by a foreach loop, because their class implements one of the following interfaces: System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> and satisfies the following conditions:
· has the public parameterless
GetEnumerator method whose return type is either class, struct, or interface type,
· the return type of the
GetEnumerator method has the public
Current property and the public parameterless
MoveNext method whose return type is Boolean.
Foreach loop works as the following example: (the following example is not the real .NET implantation):
So every initialization starts by calling to the GetEnumurator()
Which creates a new Enumerator struct and passes the current list as an argument:
Important to mention that struct is not a reference type, and it’s important to understand the difference between the two - you can read more about that here.
After initializing the Enumerator and assigning the value, the MoveNext() method is being called repeatedly n times:
The MoveNext method starts by assigning the list instance into a local variable, and then validates that the enumerable struct version matches the version of the list (to verify that the list was not modified)
((uint)_index < (uint)localList._size))
is equivalent to:
if (_index>0 || _index<localList._size)
worth to read about the sign bit and Two’s complement
_current = localList._items[_index];
it assigns the next value into _current and increments the index and return true.
Warm up — Simple for loop
Just for warm up, let’s start from a basic for loop, and have a look at following C# program:
“some code” will be executed 333 times as long there is no break, return or throw involved. However, since we are talking about iteration , we will focus on the iteration itself which is highlighted in blue and not the first initialization part.
As we can see, in any iteration, the CPU executes 11 instructions to complete each iteration that is distributed in the following way:
Array iteration with for loop
Now, since we are all warmed up, let’s examine how exactly .NET iterates over an array with a for loop. We will examine the next example:
And the disassembly:
In that iteration, the CPU will have to execute 20 instructions- each iteration is distributed in the following way:
List Iteration With a foreach loop
Above, we saw how foreach iteration is implemented in .NET.
Now, let’s examine the next function:
And the disassembly: (as before I highlighted only the part that occur n times, and ignored the initialization)
It’s easy to see that the current iteration was much longer. The CPU executes 34 instructions, with each iteration distributed in the following way:
List Iteration With a for loop
Before we dig in into the assembly, there is another thing that is worth mentioning. Any time we use on a list instance, we are invoking the following extension:
Now, let’s take the following C# function:
And the disassembly:
In that iteration, the CPU executes 39 instructions n times just for the iteration which is distributed in the following way:
Array Iteration with a foreach loop
And for our last example let’s have a look at the following C# code:
And let’s look even deeper:
In that iteration, the CPU executes 16 instructions n times just for the iteration which is distributed in the following way:
The results are displayed in the following tables: as seen in the first table above, the “total” column represents the number of assembly instructions. The“Estimated CPU cycles score” represents an estimated score that was calculated by the type of the instruction, the latency and the throughput.
To test our conclusions, I created the following TimeMesurment class, And initialized it with the value of 200000000
In conclusion, it is nearly twice as efficient to iterate over an array then a list.
Of course, that List<T> creates a lot of necessary functionality such as adding and removing items from the collection, but sometimes developers use a List when those extra functionalities are unnecessary, as a form of a bad habit.
Thank you for reading.
- Or Yaacov