Nakib
.NET Under the hood
13 min read · Jul 8, 2023


Internal Mechanisms of Tasks in .NET


This is the first blog post of the Unveiling Asynchronous & Parallel Programming in .NET Blog Series.

What do we mean by the word Task in .NET:

By the word Task, we sometimes refer to .NET’s Task class or an instance of the Task class.

Sometimes we also refer to a CPU-intensive task (a few lines of code defined by a delegate passed to Task.Run()) that needs to be executed, or an I/O task (defined by .NET’s class library).

If the reader isn’t careful enough, this might create confusion.

Let’s see some examples,

XYZ is an instance of Task (here, Task refers to the .NET Task class).

Abc method returns a Task (here, Task refers to an instance of the .NET Task class).

The task is getting executed (here, task refers to a few lines of code, say a delegate passed to Task.Run(delegate)).

The ReadAsync Task is finished (an I/O task defined by .NET’s class library for reading a sequence).

Task-Based Programming Model (here, task refers to the concept/idea).

The .NET docs use the word Task in these ways, and we will follow the same convention.

Why Tasks? (Identifying the problem):

Well, maybe a few questions have already come to your mind.

Why Tasks? Why is it even needed? Why can’t we just stick to the good old thread-per-request approach?

In the thread-per-request model, whenever a request comes to the server, the server assigns a dedicated thread to serve it.

But this thread-per-request approach has some problems.

Problems of the thread-per-request approach:

Most requests in a real-world application server perform several I/O operations: reading from or writing to a database, making HTTP calls to other servers, etc.

So, under the thread-per-request approach, if we assign each request a separate thread, the threads will be blocked on I/O most of the time. That is a waste of resources.

Imagine handling the C10K problem (10K concurrent requests) with the thread-per-request approach.

Here you need at least 10K threads to serve the requests, yet most of them will be idle/blocked, waiting for some I/O to complete.

Managing so many threads takes vast resources, and most of those resources are wasted by threads sitting idle, waiting for I/O.

Sometimes you may even run into thread pool starvation: a new request arrives, but no thread is free to serve it because all the threads are already in use by other requests. That degrades performance significantly.

How do tasks solve this problem? (The idea):

The solution can be defined with two simple rules. They work together,

Rule 1: We won’t assign a dedicated thread to each request.

Rule 2: We will prevent thread-blocking on I/O.

Now let me explain how these two rules are implemented…

Implementation of the Solution:

In the thread-per-request model, a request is an unbreakable unit of work.

A request may contain CPU tasks, I/O tasks, etc., and everything executes on the same assigned thread. That thread is responsible for the request’s CPU tasks (by being scheduled on the CPU) and its I/O tasks (by sitting idle/blocked).

But in the Task-Based Programming model, each request is broken into several tasks (CPU tasks and I/O tasks).

And then, the CPU tasks are eventually executed by a thread.

I/O tasks don’t need CPU time and aren’t executed like CPU tasks on threads. The .NET runtime has an internal mechanism that handles I/O based on the underlying operating system.

By default, thread pool worker threads execute the CPU tasks of the requests, and it is not guaranteed that the CPU tasks of the same request execute on the same thread.

Read the lines above again… Yes, it is not guaranteed that the CPU tasks of the same request get executed on the same thread.

In this implementation, both of the rules described above are followed. Let’s see how…

Rule 1: We aren’t dedicating a thread to a request; rather, we are breaking requests into tasks, and the CPU tasks of a request can execute on different threads.

Rule 2: We aren’t blocking threads for I/O tasks either. Instead, we divide tasks into two categories, CPU tasks and I/O tasks. CPU tasks execute only CPU operations, so they never need to block for I/O, and I/O tasks are handled separately by the .NET runtime’s internal mechanism.

The next section will describe how CPU tasks are scheduled and executed.

CPU Task Execution:

Task Scheduler:

Generally, tasks are scheduled on a Task Scheduler for execution. A Task Scheduler decides ‘when & where’ a task should be executed.

A Task Scheduler may,

  1. queue the task to the default .NET thread pool to be executed
  2. create a new thread to execute the task
  3. queue it on a Custom Thread Pool for execution
  4. schedule it on another Task Scheduler
  5. execute it in the current thread

It actually depends on the Task Scheduler’s implementation. We will discuss several task schedulers in this post.

.NET Default Task Scheduler:

.NET has a default task scheduler that queues the tasks to the default thread pool for execution. All created tasks will be scheduled to the default task scheduler by default.

For example, when we create tasks using Task.Run(), the task is scheduled to the default Task Scheduler.

But with Task.Factory.StartNew(), we can explicitly specify the scheduler on which we want our task to execute.
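As a minimal sketch of the difference (the delegates here are placeholders, not from the original post): Task.Run() always targets the default scheduler, while the longer Task.Factory.StartNew() overload takes a TaskScheduler argument.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SchedulerDemo
{
    static void Main()
    {
        // Task.Run always queues the work to TaskScheduler.Default (the thread pool).
        Task<int> t1 = Task.Run(() => 21 + 21);

        // Task.Factory.StartNew lets us pass the scheduler explicitly.
        // Here we pass TaskScheduler.Default, but any TaskScheduler would work.
        Task<int> t2 = Task.Factory.StartNew(
            () => 21 + 21,
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default);

        Console.WriteLine(t1.Result); // 42
        Console.WriteLine(t2.Result); // 42
    }
}
```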

When a task is scheduled to the Default Task Scheduler, the answer to When and Where to execute the task is,

When: Now/Immediately.

Where: isLongRunning ? a new thread : the default thread pool

That means when a task is scheduled to the default Task Scheduler and is not long-running, it is immediately queued to the thread pool for execution.

But if it is long-running, a new thread is created to execute it.

If you aren’t familiar with long-running tasks: they are tasks created by Task.Factory.StartNew() with TaskCreationOptions.LongRunning. Generally, this option is used for tasks that need a long CPU time.

The reason the default Task Scheduler creates a new thread for long-running tasks is this:

a long-running CPU-intensive task may occupy a thread pool thread for too long, making the pool’s other queued work items wait a long time.

So when we create a CPU-intensive task that may run for a long time, we should run it as a long-running task. Otherwise it will keep other queued work items waiting.

And even when creating a custom task scheduler, executing long-running CPU-intensive tasks on a separate thread rather than a thread pool thread is recommended.
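A short sketch of creating a long-running task (the summation loop is just an illustrative stand-in for CPU-intensive work):

```csharp
using System;
using System.Threading.Tasks;

class LongRunningDemo
{
    static void Main()
    {
        // TaskCreationOptions.LongRunning hints the default scheduler to run
        // this work on a dedicated thread instead of occupying a pool thread.
        Task<long> longTask = Task.Factory.StartNew(() =>
        {
            long sum = 0;
            for (long i = 1; i <= 1_000_000; i++) sum += i;
            return sum;
        }, TaskCreationOptions.LongRunning);

        Console.WriteLine(longTask.Result); // 500000500000
    }
}
```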

Other than the Default TaskScheduler, .NET provides some other Task Schedulers. We can also create a Custom Task Scheduler. We have described more about this at the end of this post.

The .NET Default Thread Pool:

The .NET default Task Scheduler queues tasks on the .NET default thread pool. The thread pool keeps each task on one of its queues, and then a thread pool thread executes it.

Task Queues on Thread Pool:

The .NET Default thread pool keeps a single global FIFO queue and local queues(one for each thread pool thread).

Where are the tasks queued?:

Top-level tasks, which are tasks that are not created in the context/inside of another task, are put on the global queue just like any other work item.

However, nested or child tasks, created in the context/inside of another task, are handled quite differently.

A child or nested task is put on a local queue specific to the thread the parent task executes.

In more detail, these tasks go to the global queue:

  1. tasks that are not created on a thread pool thread
  2. work items queued with the ThreadPool.QueueUserWorkItem() or ThreadPool.UnsafeQueueUserWorkItem() method
  3. tasks created with TaskCreationOptions.PreferFairness on Task.Factory.StartNew()
  4. tasks created by Task.Yield(), which uses TaskCreationOptions.PreferFairness, so they are also queued to the global queue

Generally, in other cases, a task is put on the local queue specific to the thread where it was created.
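The cases above can be sketched in code. Note that queue placement is a runtime implementation detail and isn’t directly observable; the comments state where each task should land according to the rules just described:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class QueuePlacementDemo
{
    static void Main()
    {
        // Created outside a thread pool thread → global queue.
        Task top = Task.Run(() => Console.WriteLine("top-level work"));

        // PreferFairness forces the global queue even from a pool thread.
        Task fair = Task.Factory.StartNew(
            () => Console.WriteLine("fair work"),
            CancellationToken.None,
            TaskCreationOptions.PreferFairness,
            TaskScheduler.Default);

        // A task created inside another task (on a pool thread) goes to
        // that worker thread's local queue.
        Task parent = Task.Run(() =>
        {
            Task child = Task.Run(() => Console.WriteLine("nested work"));
            child.Wait();
        });

        Task.WaitAll(top, fair, parent);
    }
}
```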

The reason behind this queuing Architecture:

In versions prior to CLR 4.0/.NET Framework 4, the thread pool had a single queue.

The new queuing model and Task-Based Asynchronous Programming were introduced in CLR 4.0/.NET Framework 4.

So Why the local queues? Why can’t we stick to the single global queue?

Well, local queues are introduced for efficiency. With local queues, we get benefits from the cache locality, and these also reduce contention.

Let me explain how…

Reduction of Contention:

Contention means the conflict over access to a shared resource.

In our case, if there were only a single task queue, all the threads trying to safely dequeue tasks from it would create huge contention.

But with the ‘a local queue per thread’ architecture, each thread will dequeue and execute tasks from its own local queue most of the time, so there is no contention here.

By the way, what if a thread’s local queue doesn’t have any task? Will it sit idle?

The answer is ‘No.’

The thread pool features a work-stealing algorithm to help make sure that no threads are sitting idle while others still have work in their queues.

When a thread-pool thread is ready for more work, it first looks at the head of its local queue and executes tasks from there.

If the local queue is empty, it goes to the global queue and executes tasks from there.

If the global queue is also empty, then it goes to the local queues of other threads and executes tasks from there.

So there is still some contention when a thread dequeues tasks from the global queue or from another thread’s queue, but it is far less than in the single-global-queue model.

Cache Locality:

Threads access their own local queue in last-in, first-out order (LIFO) to preserve Cache Locality. It’s because, most of the time, data required for the last task is hot in the cache.

How?

Well, the most recently executed code’s data is likely to still be hot in the CPU’s L1/L2 cache.

And here, the most recently added task is the last task in the queue, so the data it needs has a better chance of being hot in the cache than that of a task queued long ago.

But if a thread finds a work item in the local queue of another thread (i.e., work-stealing), it first applies heuristics to ensure it can run the work efficiently, and if it can, it dequeues the work item from the tail (in FIFO order).


Completing Task Manually:

An I/O task is complete when the I/O is complete, and a CPU task is complete after the task is executed. The runtime handles their completion.

But what if we want to create a task that will be completed by us manually?

Well, we can create such a task using the TaskCompletionSource class.

TaskCompletionSource has a Task property and a SetResult() method.

After instantiating a TaskCompletionSource, we get its task, and we can complete that task manually just by calling SetResult().

This gives us better control, and it has several use cases.
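A minimal sketch of completing a task manually with TaskCompletionSource:

```csharp
using System;
using System.Threading.Tasks;

class TcsDemo
{
    static void Main()
    {
        var tcs = new TaskCompletionSource<int>();
        Task<int> task = tcs.Task;           // the task we hand out to callers

        Console.WriteLine(task.IsCompleted); // False — nothing has completed it yet

        tcs.SetResult(42);                   // complete the task manually

        Console.WriteLine(task.IsCompleted); // True
        Console.WriteLine(task.Result);      // 42
    }
}
```

TaskCompletionSource also exposes SetException() and SetCanceled() for faulting or canceling the task instead.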

One of those use cases is converting old callback-based asynchronous programming patterns to Task-Based Asynchronous Programming. Let’s see how we can do this.

Converting Callback-Based Asynchronous Pattern to Task-Based Asynchronous Pattern:

In the callback-based asynchronous pattern, we pass a callback that will be executed later, when some event completes or something happens. It gives us less control: we can’t ‘await’ the event.

But if we convert this to the Task-Based Asynchronous Pattern, we can ‘await’ the event with more control.
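Here is one way the conversion can look. DelayedCallback is a hypothetical callback-based API invented for illustration; the wrapper completes a TaskCompletionSource from inside the callback so callers can await instead:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CallbackToTask
{
    // Stand-in for an old callback-based API (hypothetical):
    // invokes the callback after the given delay.
    static void DelayedCallback(int ms, Action<string> callback)
    {
        Timer timer = null;
        timer = new Timer(_ => { callback("done"); timer.Dispose(); },
                          null, ms, Timeout.Infinite);
    }

    // The wrapper: the TaskCompletionSource completes the task inside the
    // callback, so callers can simply 'await' the returned task.
    public static Task<string> DelayedAsync(int ms)
    {
        var tcs = new TaskCompletionSource<string>();
        DelayedCallback(ms, result => tcs.SetResult(result));
        return tcs.Task;
    }

    static async Task Main()
    {
        string result = await DelayedAsync(100); // awaitable, no callback needed
        Console.WriteLine(result); // done
    }
}
```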

Understanding Async/Await with tasks:

To better understand how async/await works under the hood, we will try to convert an await point using Task.ContinueWith().

In the image above, we converted the async method AbcMethodAsync to AbcMethod using tasks.

We can see that the code before the first await point executes synchronously; a task gets created at the first await point.
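Since the original image is not reproduced here, the following is a hedged reconstruction of that conversion; the method names follow the text, but ReadAsync and its body are assumptions:

```csharp
using System;
using System.Threading.Tasks;

static class AwaitDemo
{
    // Stand-in for some I/O task (assumed for illustration).
    static Task<string> ReadAsync() =>
        Task.Delay(10).ContinueWith(_ => "file contents");

    // The async/await version:
    public static async Task AbcMethodAsync()
    {
        Console.WriteLine("before await");  // runs synchronously on the caller
        string text = await ReadAsync();    // first await point
        Console.WriteLine(text);            // continuation: may run on another thread
    }

    // Roughly the same thing written by hand with ContinueWith:
    public static Task AbcMethod()
    {
        Console.WriteLine("before await");  // still synchronous
        return ReadAsync().ContinueWith(t =>
        {
            Console.WriteLine(t.Result);    // scheduled as a continuation task
        });
    }

    static async Task Main()
    {
        await AbcMethodAsync();
        await AbcMethod();
    }
}
```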

By the way, we have made this so simple that it should be illegal. If you want to know more details, Read this blog post.

One important thing to note is that,

If the task being awaited is already completed, the code executes synchronously.

That is, in that case no task is created and no scheduling is done; the code runs as if the method call were synchronous.

Dos and Don’ts in Task-Based Asynchronous Programming:

Task-Based Asynchronous Programming has some gotchas, especially for developers coming from a thread-per-request background.

The trick to avoiding these gotchas is to remember that two tasks of the same request may execute on separate threads; that is the root cause of all the pitfalls.

I have discussed common gotchas and what you should and shouldn’t do.

Don’t Use Synchronous I/O:

We have discussed how .NET divides tasks into two categories, CPU tasks and I/O tasks. But always keep in mind that this only works when you use asynchronous I/O APIs.

If you use synchronous I/O, the thread will be blocked, causing significant performance issues.

.NET provides asynchronous APIs for almost all I/O; we should always use them.
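A small sketch of the difference, using the File APIs (the file name is arbitrary):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class IoDemo
{
    static async Task Main()
    {
        await File.WriteAllTextAsync("data.txt", "hello");

        // Synchronous: the calling thread blocks until the disk read finishes.
        string blocking = File.ReadAllText("data.txt");

        // Asynchronous: the thread is released while the OS performs the I/O,
        // and a continuation runs when the read completes.
        string nonBlocking = await File.ReadAllTextAsync("data.txt");

        Console.WriteLine(blocking == nonBlocking); // True
    }
}
```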

Don’t Use ‘await’ inside locks that have thread affinity:

If a lock has thread affinity, only the thread that acquired the lock can release the lock.

If a thread acquires a lock and there is an await inside the lock, then there is a chance that the task created after the await will be executed on another thread, so the lock may never be released.

For example, lock and Monitor have thread affinity.

The C# compiler doesn’t allow ‘await’ inside a lock() block.

Using await between Monitor.Enter and Monitor.Exit, by the way, won’t cause a compilation error. But a runtime exception will be thrown if the code before and after the await point executes on different threads.

In real life, the ‘await’ inside Monitor will work fine most of the time, because child tasks created from the same parent task are put on the same thread pool queue. However, work-stealing may happen, which will cause an exception at runtime.

So the developer needs to be careful here because getting some exceptions randomly on the production can be frustrating.

So if we need to await inside a lock, what can we do?

Solution:

Using only synchronous I/O instead of asynchronous I/O inside the lock would work, but it is not a solution; we should always use asynchronous I/O APIs.

The solution is to use locks that don’t have thread affinity.

For example, instead of lock/Monitor, we can use Semaphore/SemaphoreSlim with initialCount = 1 and maxCount = 1.
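A minimal sketch of that pattern (the Counter class and Task.Delay stand in for real work):

```csharp
using System.Threading;
using System.Threading.Tasks;

class Counter
{
    // initialCount: 1, maxCount: 1 → an async-friendly mutual-exclusion lock.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
    private static int _value;

    public static int Value => _value;

    public static async Task IncrementAsync()
    {
        await Gate.WaitAsync();      // acquire; no thread affinity
        try
        {
            int v = _value;
            await Task.Delay(1);     // 'await' inside the critical section is safe
            _value = v + 1;          // may resume on a different thread
        }
        finally
        {
            Gate.Release();          // any thread may release the semaphore
        }
    }
}
```

Because the semaphore has no thread affinity, it does not matter which thread resumes after the await: Release() is legal from any thread.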

ThreadLocal or AsyncLocal? Choose Carefully:

ThreadLocal stores data in a thread’s thread-local storage.

Since tasks of the same request can execute on separate threads, a task may store some data in thread-local storage that a later task of the same request can’t access, because the later task is executing on another thread.

Only store data in thread-local storage if it is specific to the current thread. Request-specific data should not be stored there.

So where can we store request-specific data so that any task of the same request can access it?

The answer is AsyncLocal.

AsyncLocal stores data on the ExecutionContext. All tasks of the same request share the same ExecutionContext.

So even if two tasks of a request run on separate threads, the later task can access AsyncLocal data stored by the earlier one, since they share the same ExecutionContext.
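A short sketch (the "req-42" value is an arbitrary example of request-specific data):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LocalsDemo
{
    // The value flows with the ExecutionContext across await points.
    static readonly AsyncLocal<string> RequestId = new AsyncLocal<string>();

    static async Task Main()
    {
        RequestId.Value = "req-42";         // stored on the ExecutionContext

        await Task.Delay(10);               // continuation may resume on another thread

        // Still visible even if this line executes on a different thread,
        // because the ExecutionContext flowed across the await.
        Console.WriteLine(RequestId.Value); // req-42
    }
}
```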


More on Task Scheduler:

Decorator Task Scheduler (a Task Scheduler that schedules tasks to another Scheduler):

We can implement a decorator pattern on Task Scheduler.

That is, such a Task Scheduler contains its own logic for when a task should be executed, and when the time comes, it schedules the task on another scheduler for execution.

Concurrent & Exclusive Scheduler:

The Concurrent & Exclusive Schedulers provided by .NET are examples of decorator Task Schedulers.

The two schedulers, a Concurrent and an Exclusive Scheduler, are exposed as properties when we create an instance of the ConcurrentExclusiveSchedulerPair class.

Tasks scheduled on the Concurrent Scheduler may execute concurrently, but tasks scheduled on the Exclusive Scheduler execute exclusively (no two of them run simultaneously).

Internally, when tasks are scheduled, the Concurrent/Exclusive Scheduler keeps them in an internal data structure (IProducerConsumerQueue&lt;Task&gt;).

Deciding when tasks may execute concurrently or must execute exclusively is the responsibility of the Concurrent Scheduler and the Exclusive Scheduler, respectively, and they contain the logic to do that.

When they want a task to be executed, they simply schedule it on their underlying Task Scheduler, which then takes over.

By default, the underlying scheduler is the default .NET scheduler. If we want the underlying scheduler to be any other scheduler, we pass that scheduler to the constructor of the ConcurrentExclusiveSchedulerPair class.

So we see that Concurrent/Exclusive Schedulers are just decorators that decorate the underlying scheduler.

One use case of the Concurrent & Exclusive Schedulers is reader and writer tasks. Reader tasks can run concurrently, so they are scheduled on the Concurrent Scheduler; writer tasks must execute exclusively, so they are scheduled on the Exclusive Scheduler.
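The reader/writer use case can be sketched like this (the Console writes stand in for real read/write work):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ReaderWriterDemo
{
    static void Main()
    {
        var pair = new ConcurrentExclusiveSchedulerPair();

        // Readers: may run concurrently on the ConcurrentScheduler.
        Task r1 = Task.Factory.StartNew(() => Console.WriteLine("read 1"),
            CancellationToken.None, TaskCreationOptions.None, pair.ConcurrentScheduler);
        Task r2 = Task.Factory.StartNew(() => Console.WriteLine("read 2"),
            CancellationToken.None, TaskCreationOptions.None, pair.ConcurrentScheduler);

        // Writer: runs exclusively, never overlapping readers or other writers.
        Task w = Task.Factory.StartNew(() => Console.WriteLine("write"),
            CancellationToken.None, TaskCreationOptions.None, pair.ExclusiveScheduler);

        Task.WaitAll(r1, r2, w);
    }
}
```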

Custom Task Scheduler:

We can also create custom task schedulers that may use the default thread pool, a custom thread pool, other thread(s), or another Task Scheduler.
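As a minimal sketch of what a custom scheduler can look like (this single-thread design is one possible choice, not prescribed by the post), we override the three abstract members of TaskScheduler and run every queued task on one dedicated thread:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// A custom scheduler that executes every queued task on one dedicated thread.
class SingleThreadScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();

    public SingleThreadScheduler()
    {
        var thread = new Thread(() =>
        {
            // Consume tasks forever; TryExecuteTask actually runs each task.
            foreach (Task task in _queue.GetConsumingEnumerable())
                TryExecuteTask(task);
        }) { IsBackground = true };
        thread.Start();
    }

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Refuse inlining so the single-thread guarantee always holds.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => false;

    // For debugger support only: a snapshot of the pending tasks.
    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();
}
```

To use it, pass an instance to Task.Factory.StartNew(work, CancellationToken.None, TaskCreationOptions.None, new SingleThreadScheduler()).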
