Demystifying Windows Internals — Part 1 of 2: Windows Threads

Published in

SyntheticSecurity

13 min read, Jun 30, 2023

What is a Windows thread, how do threads work, and how can threat actors use them maliciously?

I’m creating this series to demystify some important topics in the seminal work by Mark Russinovich, “Windows Internals”. I want to break things down and explain them the way I would have needed them explained five years ago, before I got into InfoSec. That said, I’m still learning myself, and I’m in no way implying that I’ve mastered the book or understand every concept in it. Far from it. As a mentor in my field once said to me: “Your job now is to be a student 24/7.”

This book goes into galactically intricate and microscopic detail about the inner workings of the Windows OS. It’s extremely important to understand the very basics of the OS if you want to truly excel at blue teaming and defending systems from threat actors. I will review other topics from the book later, but for now we will start here.

NOTE: At the bottom of the article, I have a little glossary for some of the words and terms I’ve used which might be new to some. Scroll down and check it out if there’s a word you’re unsure of. If a word is bolded, there’s a high chance I have a definition for it. However, because I am a daughter of chaos, I sometimes define words within the paragraphs as well. There’s no reason for this.

In this article we’re going to go over the what and the how. We will cover security (how Windows secures threads and processes) in the follow-up article, as it’s far too vast a topic to cover in passing.

To understand a thread, we must first define a process. A process, in simplest terms, is an executing program (for example, teams.exe, or msiexec.exe while it installs agent.msi). One or more threads run in the context of the process.

Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process id, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process starts with a single thread, often called the primary thread, but can create additional threads from any of its threads.

All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled.
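To make the shared-versus-private split concrete, here is a minimal sketch. It is written in Python rather than against the Windows API (my choice for readability, not something from the book). The dictionary stands in for process memory that every thread can see, and threading.local stands in for thread-local storage. All names are made up for illustration.

```python
import threading

shared_counter = {"value": 0}      # one object, visible to every thread in the process
counter_lock = threading.Lock()    # guards the shared data
tls = threading.local()            # each thread sees its own attributes on this object
seen = {}                          # thread name -> (thread id, that thread's private value)


def worker(private_value):
    tls.value = private_value      # stored per thread, invisible to the other threads
    with counter_lock:
        shared_counter["value"] += 1   # every thread updates the same shared object
        seen[threading.current_thread().name] = (threading.get_ident(), tls.value)


workers = [threading.Thread(target=worker, args=(i,), name=f"t{i}") for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(shared_counter["value"])                     # all four increments landed in one place
print(sorted(value for _, value in seen.values())) # each thread kept its own private value
print(hasattr(tls, "value"))                       # the main thread never set one
```

The main thread never assigned tls.value, so it has no such attribute even though four other threads set theirs. That is the whole point of thread-local storage.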

A thread is the basic unit to which the operating system allocates processor time. A thread can execute any part of the process code, including parts currently being executed by another thread. To put it another way, a thread is an entity within a process that Windows schedules for execution. Without it, the process’s program can’t run. A thread includes the following essential components:

· The contents of a set of CPU registers representing the state of the processor

· Two stacks — one for the thread to use while executing in kernel mode and one for executing in user mode (“living between two worlds”)

· A private storage area called thread-local storage (TLS) for use by subsystems, run-time libraries and DLLs

· A unique identifier called a thread ID (part of an internal structure called a client ID; process IDs and thread IDs are generated out of the same namespace, so they never overlap)

Looking at threads from an API (Application Programming Interface) perspective, the simplest creation function in user mode is CreateThread. This function creates a thread in the current process, accepting the following arguments:

· An optional security attributes structure

· An optional stack size

· A function pointer

· An optional argument

· Optional flags

On successful completion, a non-zero handle is returned for the new thread and, if requested by the caller, the unique thread ID.
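As a rough analogue of those arguments, here is a short sketch using Python's threading module. This is not CreateThread itself, and the mapping in the comments is my own loose comparison: target plays the function pointer, args the optional argument, and the gap between creating and starting the object behaves a little like the CREATE_SUSPENDED flag. The function name start_routine is hypothetical.

```python
import threading

results = []


def start_routine(argument):
    # Plays the role of the function pointer handed to CreateThread
    results.append(argument * 2)


# Creating the object does not run anything yet (loosely like CREATE_SUSPENDED)
thread = threading.Thread(target=start_routine, args=(21,))
assert thread.ident is None      # no thread ID exists before the thread starts

thread.start()                   # loosely like ResumeThread on a suspended thread
thread_id = thread.native_id     # thread ID assigned by the OS (Python 3.8+)
thread.join()                    # loosely like waiting on the thread handle

print(results, thread_id)
```

The thread object is what you hold on to afterwards, much as you would hold the handle, while the numeric ID is only a label for the thread.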

In Windows, a handle is an opaque value that refers to an object managed by the kernel, such as a thread, a process, or a file. CreateThread returns NULL (zero) when it fails, so a non-zero return value means the call succeeded and the handle now identifies the new thread object. The handle is what you pass to other thread functions, and it should be released with CloseHandle when it is no longer needed. A handle is not an index into a list of threads. To count or enumerate the threads of a process, take a snapshot with CreateToolhelp32Snapshot (using the TH32CS_SNAPTHREAD flag) and walk it with Thread32First and Thread32Next, keeping the entries whose owning process ID matches.

A caller refers to the code or program that initiates a function or system call. When a thread makes a function call or requests a specific operation, the calling thread is known as the “caller.”

In Windows, threads interact with the operating system and other components by invoking system functions or API calls. These functions perform various tasks such as creating files, manipulating resources, allocating memory, or managing synchronization objects. When a thread invokes a function or system call, it becomes the caller of that particular operation.

To make CreateRemoteThread (a variant of CreateThread that creates the thread in another process, identified by a process handle) operate properly, the process handle must have been obtained with enough access rights to allow the operation. Protected processes cannot be injected in this way because handles to such processes can be obtained with very limited rights only (we will learn about these rights in the next article in this series).

For any of these functions to reach the kernel, CreateRemoteThreadEx is used (it is a superset of CreateThread and CreateRemoteThread, and both end up calling it). It in turn calls NtCreateThreadEx in Ntdll.dll. (Ntdll.dll is a system library that exposes the native API to user-mode code. It contains the system call stubs that form the interface between user-mode and kernel-mode components of the operating system.)

That stub makes the usual transition from user mode to kernel mode, where execution continues in the executive function of the same name, NtCreateThreadEx. That is where the kernel-mode part of thread creation occurs.

This part of the operating system is fascinating to me: at all times there is a seamless, constant symphony of communication taking place between user mode and the kernel. It goes on behind the scenes without the end user ever knowing, and it’s one of the many things I think we take for granted. I won’t go into kernel and user modes here, but if you want to understand them, read this: https://learn.microsoft.com/en-us/windows-hardware/drivers/gettingstarted/user-mode-and-kernel-mode

Next, to keep this article manageable, we’re going to skip the more architectural discussion of the actual thread data structures and get some hands-on visuals of what threads look like in the GUI. Download Process Explorer, grab a drink, and let’s dive in.

In this example we’re going to look at the threads associated with everyone’s favorite file storage application, OneDrive, which definitely doesn’t randomly delete files, refuse to delete documents you no longer need, or fail to sync important documents you’ve been working on for days :]

Very cool. Here at the top of the list, the thread’s start address is in ntdll.dll, the library I described earlier (it is a module loaded into the process, not a process itself). If we right click and choose Stack, we can see the thread’s call stack, that is, the chain of functions it is currently executing:

So what are all these different columns? Let’s take a look at Wait Reason. The Wait Reason column in Process Explorer displays the reason why a thread is waiting. The “WrQueue” wait reason means that the thread is waiting on a KQUEUE object in the kernel. (KQUEUE is the kernel’s queue object. It is the object behind I/O completion ports (IOCP), which hand I/O completion packets to waiting threads.) The wait can come from a call to ZwRemoveIoCompletion or its Win32 wrapper, GetQueuedCompletionStatus.

Among the wait reasons, we also see a bunch of user-request entries. The WrUserRequest variant is used by the win32k.sys subsystem (which provides the interface between the user-mode graphics device interface (GDI) and the display driver, and is responsible for managing windows, drawing on screen, and other visual functions). Usually, it appears when a thread calls GetMessage. So if we see WrUserRequest, we can be fairly sure that the thread is waiting for window messages.

Most of the other columns are pretty self-explanatory; however, there’s one that’s particularly interesting: “Cycles”. Check out the thread at the top of this list, the one whose wait reason is “WrAlertByThreadId”, at a whopping 103 BILLION (and you wonder why your laptop gets so hot):

The “Cycles” column shows the number of CPU cycles that the thread has consumed since it was created. “WrAlertByThreadId” is a wait reason, not a thread: it means the thread has put itself to sleep until another thread wakes it by naming its thread ID. In the native API, the sleeper calls NtWaitForAlertByThreadId and the waker calls NtAlertThreadByThreadId. User-mode synchronization primitives such as WaitOnAddress and slim reader/writer (SRW) locks are built on this mechanism, so it shows up a lot.

Despite the similar name, this is different from an alertable wait. A thread is in an alertable state when it waits with the alertable flag set (for example, through SleepEx or WaitForSingleObjectEx), which allows a queued asynchronous procedure call (APC) or an I/O completion routine to run on it and end the wait. Lastly, the TID is the unique thread identifier that we briefly covered before.

There are many more elements to cover, but I’m going to leave it there today. Examining thread activity is especially important if you are trying to determine why a process that hosts multiple services (such as Dllhost.exe, Svchost.exe, or Lsass.exe) is running, or why a process has stopped responding. The best tools for reviewing threads on a system are WinDbg, Performance Monitor, and the one I just demonstrated, Process Explorer (Sysinternals).

The two other main topics I wish to review today are thread scheduling and thread pools. We’re going to cover them only briefly, which will not do them justice.

Thread scheduling in Windows is the process of assigning processor time slices to threads. Threads are scheduled for execution based on their priority, and the operating system assigns every thread its time slices. Windows implements a priority-driven, preemptive scheduling system: at least one of the highest-priority ready threads always runs, with the caveat that certain high-priority threads that are ready to run might be limited by the processors on which they are allowed or preferred to run, a phenomenon called processor affinity.

After a thread is selected to run, it runs for an amount of time called a quantum. A quantum is the length of time a thread is allowed to run before another thread at the same priority level is given a turn to run. Key to understanding thread scheduling algorithms is understanding priority levels:

Thread priority levels are assigned from two different perspectives: those of the Windows API and those of the Windows kernel. The Windows API first organizes processes by the priority class to which they are assigned at creation: Real-Time, High, Above Normal, Normal, Below Normal, and Idle.
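Here is how the two perspectives fit together, as a small lookup sketch in Python. The numbers are the base priorities as I understand them from Microsoft's scheduling priorities documentation (the kernel uses levels 0 to 31, with 16 to 31 being the real-time range). The dictionary and function names are my own, so treat this as an illustration rather than an API.

```python
# Base priority of a thread at "normal" relative priority, per process priority class
CLASS_BASE = {
    "realtime": 24,
    "high": 13,
    "above_normal": 10,
    "normal": 8,
    "below_normal": 6,
    "idle": 4,
}

# Offsets applied by the thread's relative priority within its process
RELATIVE_OFFSET = {
    "highest": 2,
    "above_normal": 1,
    "normal": 0,
    "below_normal": -1,
    "lowest": -2,
}


def base_priority(priority_class, relative="normal"):
    """Combine a process priority class and a relative thread priority into a kernel level."""
    realtime = priority_class == "realtime"
    # The two extreme relative priorities saturate instead of adding an offset
    if relative == "time_critical":
        return 31 if realtime else 15
    if relative == "idle":
        return 16 if realtime else 1
    return CLASS_BASE[priority_class] + RELATIVE_OFFSET[relative]


print(base_priority("normal"))                     # 8
print(base_priority("normal", "highest"))          # 10
print(base_priority("realtime", "time_critical"))  # 31
```

So a thread in a Normal process usually sits at level 8, and nothing outside the Real-Time class can climb above 15.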

A thread pool is a collection of worker threads created and managed by the system that efficiently execute asynchronous callbacks on behalf of the application. Additionally, there are waiter threads that wait on multiple wait handles (callback to us looking at WaitReason), a work queue, a default thread pool for each process, and a worker factory that manages the worker threads.

Worker factories refer to the internal mechanism used to implement user-mode thread pools. By default, each thread pool has a maximum of 500 worker threads. The thread pool attempts to create more worker threads when the number of worker threads in the ready/running state is less than the number of processors.
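The idea of handing callbacks to a managed set of workers can be sketched with Python's own thread pool. To be clear, this is Python's ThreadPoolExecutor and not the Windows thread pool or its worker factory. I'm only using it to show the pattern: a capped number of worker threads pulling work items from a queue. The function name callback is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def callback(item):
    # A work item: runs on whichever worker thread happens to be free
    return item * item, threading.current_thread().name


# max_workers caps the pool, in the same spirit as the worker-thread maximum above
with ThreadPoolExecutor(max_workers=3, thread_name_prefix="worker") as pool:
    outcomes = list(pool.map(callback, range(8)))

squares = [value for value, _ in outcomes]
workers_used = {name for _, name in outcomes}

print(squares)            # results come back in submission order
print(len(workers_used))  # never more than the cap of 3
```

Eight work items were handled by at most three threads, and the application never created or destroyed a thread itself. That is the efficiency a thread pool buys you.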

Windows supports preemptive multitasking, which creates the effect of simultaneous execution of multiple threads from multiple processes. On a multiprocessor computer, the system can simultaneously execute as many threads as there are processors on the computer.
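To see the scheduling rules from this section in motion, here is a toy single-processor simulation in Python. It is a heavy simplification and makes assumptions of my own: all threads are ready at time zero, and there are no priority boosts, no affinity, and no new arrivals. It only shows two rules: the highest-priority ready thread runs first, and threads at the same priority take turns, one quantum each.

```python
from collections import deque


def simulate(threads, quantum=2):
    """threads: list of (name, priority, work_units). Returns the run order as (name, units)."""
    # One FIFO ready queue per priority level
    queues = {}
    for name, priority, work in threads:
        queues.setdefault(priority, deque()).append([name, work])

    timeline = []
    while queues:
        top = max(queues)                    # highest priority that has a ready thread
        entry = queues[top].popleft()
        ran = min(quantum, entry[1])         # run for one quantum or until the work is done
        entry[1] -= ran
        timeline.append((entry[0], ran))
        if entry[1] > 0:
            queues[top].append(entry)        # quantum expired: back of the same-priority queue
        if not queues[top]:
            del queues[top]                  # nothing left at this priority level
    return timeline


timeline = simulate([("A", 8, 3), ("B", 8, 4), ("C", 4, 2)])
print(timeline)
# [('A', 2), ('B', 2), ('A', 1), ('B', 2), ('C', 2)]
```

A and B share priority 8 and alternate by quantum, while C at priority 4 does not run at all until both are finished. Keep that last part in mind for the thread starvation attack described further down.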

Attacks using Threads:

Now on to some ways threat actors can attack and manipulate threads. This will be brief, as my next article on threads will cover the security around them and the various Windows security mechanisms in place behind the scenes!

The function CreateRemoteThread accepts an extra argument: a handle to the target process where the thread is to be created. You can use this function to inject a thread into another process. One common use of this technique is for a debugger to force a break into a debugged process: the debugger injects the thread, which immediately causes a breakpoint by calling the DebugBreak function. Another common use is for one process to obtain internal information about another, which is easier when running within the target process context (for example, the entire address space is visible). This can be done for legitimate or malicious purposes.

Remote Threads: Another defense evasion technique commonly used by malware. It works by injecting shellcode (the payload) into the context of another eligible process and creating a thread in that process to run the payload. Remote threads are created using the Windows API function CreateRemoteThread and can then be accessed using OpenThread and ResumeThread.

This is used in multiple evasion techniques, including DLL injection, thread hijacking, and process hollowing (2).

Thread Synchronization Attacks — Attackers may exploit vulnerabilities in thread synchronization mechanisms to disrupt or manipulate the intended behavior of threads. For example, they can use techniques like race conditions or deadlocks to cause unintended consequences, such as resource conflicts or denial of service situations.
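The classic race here is check-then-act: one thread checks a condition, a second thread changes the state, and the first thread acts on a check that is no longer true. Below is a benign sketch in Python of the defended version, using a made-up Account class. The lock makes the check and the update one indivisible step. It is an illustration of the concept, not code from any real product.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Check and update happen under one lock, so no other thread can slip in between.
        # Without the lock, two threads could both pass the check before either subtracts.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


account = Account(100)
outcomes = []
outcomes_lock = threading.Lock()


def spend():
    ok = account.withdraw(60)
    with outcomes_lock:
        outcomes.append(ok)


spenders = [threading.Thread(target=spend) for _ in range(5)]
for t in spenders:
    t.start()
for t in spenders:
    t.join()

print(outcomes.count(True), account.balance)
```

Five threads each try to withdraw 60 from a balance of 100, and exactly one of them can succeed. An attacker hunting for a synchronization bug is looking for the places where that guarantee is missing.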

Reflective PE injection: This technique injects and runs a complete executable module inside another process’s memory. It is similar to reflective DLL injection in that neither drops any files to disk. Reflective DLL injection works by creating a DLL that maps itself into memory when executed, instead of relying on the Windows loader.

Like reflective DLL injection, this technique does not rely on the LoadLibrary function. Instead, it copies its malicious code into an existing open process and causes it to execute (using shellcode, or by calling CreateRemoteThread). The malware allocates memory in a host process and, instead of writing a “DLL path”, writes its malicious code there by calling WriteProcessMemory.

(What I find particularly interesting are these techniques which are used to bypass security measures that are in place to prevent unauthorized code execution. We will review these in depth in the part 2 article on Threads.)

Thread starvation: This is a technique where an attacker causes a thread to starve for resources such as CPU time or memory. The attacker can use it to cause denial of service (DoS) or to slow down the system.

One way I’ve found of identifying suspicious thread creations:

One way to monitor for them is with the Sysinternals tool Sysmon on Windows (here is a highly recommended Sysmon preconfigured file developed by SwiftOnSecurity that you should configure on your systems): https://github.com/SwiftOnSecurity/sysmon-config. This example is taken from it.

Event ID 8: CreateRemoteThread

The CreateRemoteThread Event ID will monitor for processes injecting code into other processes. The CreateRemoteThread function is used for legitimate tasks and applications. However, it could be used by malware to hide malicious activity. This event will use the SourceImage, TargetImage, StartAddress, and StartFunction XML tags.

<RuleGroup name="" groupRelation="or">
<CreateRemoteThread onmatch="include">
<StartAddress name="Alert,Cobalt Strike" condition="end with">0B80</StartAddress>
<SourceImage condition="contains">\</SourceImage>
</CreateRemoteThread>
</RuleGroup>

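The above snippet shows two ways of monitoring for CreateRemoteThread. The first rule looks at the thread’s start address for a specific ending (0B80), which could be an indicator of a Cobalt Strike beacon. The second rule matches on SourceImage. Because its condition is contains “\”, it is true for any full image path, so in practice it logs remote thread creation from every source process rather than only from processes without a parent. Treat those events as candidates for triage, and investigate further any whose source or target looks anomalous.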
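To check how a rule group like this behaves, here is a small Python sketch that parses the same XML and applies it to sample events. This is not how Sysmon is implemented. It is a toy evaluator that supports only the two conditions used above, and I am assuming the comparisons are case-insensitive. The sample events and their paths and addresses are invented for illustration.

```python
import xml.etree.ElementTree as ET

RULE_XML = r"""
<RuleGroup name="" groupRelation="or">
<CreateRemoteThread onmatch="include">
<StartAddress name="Alert,Cobalt Strike" condition="end with">0B80</StartAddress>
<SourceImage condition="contains">\</SourceImage>
</CreateRemoteThread>
</RuleGroup>
"""


def field_matches(condition, expected, actual):
    # Only the two conditions used in the rule above are handled in this sketch
    actual, expected = actual.lower(), expected.lower()
    if condition == "end with":
        return actual.endswith(expected)
    if condition == "contains":
        return expected in actual
    raise ValueError(f"unsupported condition: {condition}")


def matching_rules(rule_xml, event):
    """Return the field names of the rules that match an event (dict of field -> value)."""
    group = ET.fromstring(rule_xml.strip())
    hits = []
    for rule in group.find("CreateRemoteThread"):
        value = event.get(rule.tag, "")
        if field_matches(rule.get("condition"), rule.text, value):
            hits.append(rule.tag)
    return hits


# Hypothetical events, not real telemetry
beacon_like = {"SourceImage": r"C:\Windows\System32\svchost.exe",
               "StartAddress": "0x00007FF8A1B20B80"}
ordinary = {"SourceImage": r"C:\Tools\loader.exe",
            "StartAddress": "0x00007FF8A1B21000"}
no_path = {"SourceImage": "", "StartAddress": "0x00007FF8A1B21000"}

for event in (beacon_like, ordinary, no_path):
    hits = matching_rules(RULE_XML, event)
    # groupRelation="or": one matching rule is enough for the event to be included
    print(hits, bool(hits))
```

In this sketch the SourceImage rule matches any event whose source path contains a backslash, so the StartAddress rule is the one that actually narrows things down.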

That’s all for now. As stated previously, this is only a general overview. Next, in Threads Part 2, we will take a deep dive into process and thread security, and go deeper into thread scheduling, priority levels, and how thread pools work.

Below is a little glossary for some of the words and terms I’ve used which might be new to some.

Definitions:

PE (Portable Executable): The standard file format for executables, object code and dynamic link libraries (DLLs) used in 32-bit and 64-bit versions of Windows. The PE format is a data structure that encapsulates the information necessary for the Windows loader to manage the wrapped executable code. This includes DLL references for linking, API export and import tables, resource management data and thread-local storage data.

Fiber — A unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them. Each thread can schedule multiple fibers. In general, fibers do not provide advantages over a well designed multithreaded application. However, using fibers can make it easier to port applications that were designed to schedule their own threads.

Asynchronous procedure call (APC): A function that executes asynchronously in the context of a particular thread. When an APC is queued to a thread, the system issues a software interrupt. The next time the thread is scheduled, it will run the APC function.

Win32k.sys: A kernel-mode system driver that is responsible for graphics management and what you see on the screen. It is separated into two major components: GDI (Graphics Device Interface) and USER (the window manager of Windows). It sits between user-mode applications and the display and input drivers.

Exception Handler: An exception handler is code that stipulates what a program will do when an anomalous event disrupts the normal flow of that program’s instructions. An exception, in a computer context, is an unplanned event that occurs while a program is executing and disrupts the flow of its instructions.

User-mode scheduling (UMS) — a lightweight mechanism that applications can use to schedule their own threads. An application can switch between UMS threads in user mode without involving the system scheduler and regain control of the processor if a UMS thread blocks in the kernel. Each UMS thread has its own thread context instead of sharing the thread context of a single thread. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls.

Ref:

1. Windows Internals 7th Edition - Chapter 4 Threads pg 202

2. About Processes and Threads : https://learn.microsoft.com/en-us/windows/win32/procthread/about-processes-and-threads

3. Process Injection: Remote Thread Injection or CreateRemoteThread : https://0x00sec.org/t/process-injection-remote-thread-injection-or-createremotethread/24399

4. Processes, Threads, and Windows : https://scorpiosoftware.net/2021/07/03/processes-threads-and-windows/

5. Windows System Processes — An Overview For Blue Teams : https://nasbench.medium.com/windows-system-processes-an-overview-for-blue-teams-42fa7a617920

Written by SyntheticSecurity