How bad design decisions created the least secure driver on Windows

This driver is called win32k, and it manages the user interface of Windows. This post discusses the many bad ideas that are built into this driver.

How bad is it?

It is hard to get a bug count estimate. Each bulletin page on the Microsoft website is unique, so it can be hard to infer the affected modules. You can download the patch and look at the files, when the links still work.

I used the published bulletin data and correlated it with the CVE entries. The CVE descriptions helped identify the affected drivers. The stats below are not meant to be exact, but they reflect the overall trend.

MSRC cases versus CVE count for main modules between 2001 and 2016

If you look at the number of MSRC cases, win32k had a couple more than the main kernel module (ntoskrnl). The CVE count shows that each case bundled more bugs in win32k than in ntoskrnl. At least twice as many bugs were fixed in win32k as in ntoskrnl. This also doesn’t count any silent fixes included in the patches.

Designed with trust in mind

In Windows NT 3.5, the UI was managed by a user-mode module in the CSRSS system process. The original design was overhauled in Windows NT 4 because of poor performance. The win32k driver was created almost as an extension of its user-mode counterpart, with an obvious trust between the two of them. This new design was much faster and more flexible, though hard to make reliable and secure.

Security at Microsoft was not much of a concern before Windows XP and the Trustworthy Computing memo from Bill Gates. Between Windows NT 4 and Windows XP many applications were built to rely on the USER & GDI APIs linked to the win32k driver.

User-mode callbacks

User-mode callbacks are a way for the win32k driver to call user-mode code synchronously. Returning to user-mode is not necessarily a bad thing, but it has to be done correctly (APCs are one example). Doing it synchronously in countless places is dangerous. The driver also lets applications intercept messages and events, resulting in user-mode callbacks in unexpected places.

This created an endless number of bugs, mainly use-after-frees. In the most common case, the kernel thread keeps a reference to an unlocked object, and the callback destroys that object before the kernel uses it again. Sometimes the driver assumes the object state didn’t change during the callback, leading to type confusions.
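The pattern can be sketched with a toy model. This is not win32k code: every name below is hypothetical, and releasing memory is modeled as a flag so the sketch never touches freed memory. It only shows why a reference taken before the callback has to be locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a USER object. All names are hypothetical. */
typedef struct toy_object {
    int  lock_count;  /* references held by in-flight kernel code */
    bool destroyed;   /* destruction was requested */
    bool freed;       /* memory would have been released (modeled as a flag) */
} toy_object;

typedef void (*toy_callback)(toy_object *obj);

static void toy_lock(toy_object *obj) { obj->lock_count++; }

/* Dropping the last lock on a destroyed object releases it. */
static void toy_unlock(toy_object *obj) {
    assert(obj->lock_count > 0);
    if (--obj->lock_count == 0 && obj->destroyed)
        obj->freed = true;
}

/* Destruction is deferred while any lock is held. */
static void toy_destroy(toy_object *obj) {
    obj->destroyed = true;
    if (obj->lock_count == 0)
        obj->freed = true;
}

/* What a hostile user-mode callback can do: destroy the object. */
static void toy_hostile_callback(toy_object *obj) { toy_destroy(obj); }

/* Buggy pattern: the pointer is kept across the callback without a lock.
 * Returns true if the code after the callback would touch freed memory. */
static bool toy_syscall_unlocked(toy_object *obj, toy_callback cb) {
    cb(obj);
    return obj->freed;
}

/* Safer pattern: lock the object for the duration of the callback. */
static bool toy_syscall_locked(toy_object *obj, toy_callback cb) {
    toy_lock(obj);
    cb(obj);
    bool stale = obj->freed;  /* still false: the lock deferred the free */
    toy_unlock(obj);          /* the object is released here instead */
    return stale;
}
```

Locking only keeps the memory alive. The code after the callback still has to re-check the destroyed flag and any other state it relies on, otherwise it runs into the type confusion case.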

If you want to know more about user-mode callbacks and related security issues, you can read Tarjei Mandt's (kernelpool) Black Hat 2011 paper. I also wrote a paper in Uninformed volume 10 back in 2008.

An exclusive lock for more than 600 syscalls

The USER kernel component takes an exclusive lock at the beginning of each syscall. It means only one thread can call this component per terminal session. This component was never designed for concurrent usage. All objects and critical structures expect only one thread to use them.
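As a rough sketch of what this looks like, here is a dispatcher that wraps every handler in one global lock. The names are hypothetical, and a pthread mutex stands in for the kernel lock, which is an assumption made only to keep the example portable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One lock for the whole component (hypothetical name). */
static pthread_mutex_t toy_user_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shared state with no locking of its own: it relies on the global lock. */
static int toy_window_count = 0;

typedef int (*toy_handler)(int arg);

/* Every "syscall" goes through here, so handlers never run concurrently. */
static int toy_dispatch(toy_handler handler, int arg) {
    assert(handler != NULL);
    pthread_mutex_lock(&toy_user_lock);
    int result = handler(arg);
    pthread_mutex_unlock(&toy_user_lock);
    return result;
}

static int toy_create_window(int arg) {
    (void)arg;
    return ++toy_window_count;
}

static int toy_get_window_count(int arg) {
    (void)arg;
    return toy_window_count;
}
```

Even the read-only toy_get_window_count call has to wait for the lock, which is why pushing reads to user-mode becomes so attractive with this design.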

How can the UI work with such a constraint? The user-mode part does most of the work using the mapped heap and object table.

A kernel heap mapped in user-mode

The USER and GDI components each have their own object table. The object table section is mapped in user-mode as part of a kernel heap. Most USER objects are also in the same section. That is how the design compensates for the exclusive lock. The USER user-mode component tries to do everything through the mapped sections and uses syscalls only when needed.

Each USER object table is an array of this structure:

typedef struct _HANDLEENTRY {
    // Points to the kernel object
    struct _HEAD* ptr;

    // Thread/process owner
    VOID* pOwner;

    // Object type
    UINT8 bType;

    // Flags related to state
    UINT8 bFlags;

    // Unique part of the handle id
    UINT16 wUniq;
} HANDLEENTRY;
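To give an idea of how user-mode can validate a handle against this table without a syscall, here is a minimal sketch. It assumes the commonly described handle layout, where the low 16 bits index the table and the high 16 bits must match wUniq. The toy_ names and fixed-width types are mine, and the real validation code handles more cases than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of the entry above, with fixed-width types. */
typedef struct toy_handle_entry {
    uint64_t ptr;    /* kernel address of the object */
    uint64_t owner;  /* thread/process owner */
    uint8_t  type;   /* object type */
    uint8_t  flags;  /* flags related to state */
    uint16_t uniq;   /* unique part of the handle id */
} toy_handle_entry;

/* Assumed layout: index in the low 16 bits, uniq in the high 16 bits.
 * Returns the entry if the handle is current and of the expected type. */
static const toy_handle_entry *toy_lookup(const toy_handle_entry *table,
                                          size_t count,
                                          uint32_t handle,
                                          uint8_t expected_type) {
    assert(table != NULL);
    size_t   index = handle & 0xFFFFu;
    uint16_t uniq  = (uint16_t)(handle >> 16);

    if (index >= count)
        return NULL;                 /* outside the table */

    const toy_handle_entry *entry = &table[index];
    if (entry->uniq != uniq)
        return NULL;                 /* stale handle: the slot was reused */
    if (entry->type != expected_type)
        return NULL;                 /* wrong object type */
    return entry;
}
```

The uniq check is what stops an old handle from resolving to a new object that reuses the same slot.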

The ptr field discloses the address of the object in kernel memory. If you debug the handle validation code in user32.dll, you can see the kernel pointer and the arithmetic that translates it to the user-mode mapping:

// Before: rbx holds the kernel object pointer fetched from the mapped table
sub rbx,qword ptr [rdi+28h]
// After: rbx points to the same object in the user-mode mapping (rbx=000001fa51b1ad10)

The value at rdi+28h is fffff706eee30000, which is the difference between the kernel-mode and user-mode mappings.

> !vprot 000001fa51b1ad10
BaseAddress: 000001fa51b1a000
AllocationBase: 000001fa519d0000
AllocationProtect: 00000002 PAGE_READONLY
RegionSize: 000000000006d000
State: 00001000 MEM_COMMIT
Protect: 00000002 PAGE_READONLY
Type: 00040000 MEM_MAPPED

000001fa519d0000 + fffff706eee30000 = fffff90140800000

This shared Rtl heap is based at fffff90140800000 in the kernel.

A syscall is not needed if you want to gather information about an object. The user-mode component can read the object directly through the mapping. Because the table exposes kernel addresses to any process that maps it, this design completely breaks KASLR.
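The arithmetic from the debugging session above fits in a few lines of C. The constants are the values shown in that session, while the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the debugger output above. */
#define TOY_DELTA       0xfffff706eee30000ULL  /* kernel mapping minus user mapping */
#define TOY_USER_BASE   0x000001fa519d0000ULL  /* AllocationBase of the user-mode view */
#define TOY_USER_OBJECT 0x000001fa51b1ad10ULL  /* rbx after the sub instruction */

/* What user32.dll does: turn a kernel pointer into a user-mode pointer. */
static uint64_t toy_kernel_to_user(uint64_t kernel_addr, uint64_t delta) {
    return kernel_addr - delta;
}

/* What an attacker does: recover the kernel address from the user-mode view. */
static uint64_t toy_user_to_kernel(uint64_t user_addr, uint64_t delta) {
    assert(user_addr != 0);
    return user_addr + delta;
}
```

With these values, toy_user_to_kernel(TOY_USER_BASE, TOY_DELTA) gives fffff90140800000, the kernel base of the shared heap, and the object at 000001fa51b1ad10 maps back to fffff9014094ad10.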

A bug reported as MS07-017 allowed this shared section to be remapped as writable. At the time, Ivanlef0u wrote an interesting blog post (in French) on how to exploit this vulnerability.

Parsing fonts in kernel-mode

Font parsing is an extension of win32k through atmfd.dll (a driver with a dll extension). The GDI component handles font management and atmfd is used to parse the font files.

Fonts should not be parsed at such a high privilege level. The TrueType Wikipedia page gives an example of the complexity of font parsing:

TrueType systems include a virtual machine that executes programs inside the font, processing the “hints” of the glyphs.

TrueType has its own instruction set running in a virtual machine. What would happen if you could corrupt the state of this virtual machine?
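To make the question concrete, here is a toy stack machine. It is not the TrueType instruction set, and the opcodes and names are invented for illustration. The point is that every instruction coming from the untrusted file must be checked against the interpreter state. Drop a single check and the bytecode can read or write outside the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented opcodes for a toy interpreter (not TrueType). */
enum { TOY_PUSH = 0, TOY_ADD = 1, TOY_POP = 2, TOY_HALT = 3 };
enum { TOY_STACK_SIZE = 8 };

typedef enum { TOY_OK, TOY_ERR_STACK, TOY_ERR_CODE } toy_status;

/* Runs untrusted bytecode; every stack access is bounds-checked. */
static toy_status toy_run(const uint8_t *code, size_t len, int32_t *result) {
    assert(code != NULL && result != NULL);
    int32_t stack[TOY_STACK_SIZE];
    size_t sp = 0;  /* number of values on the stack */
    size_t pc = 0;  /* offset of the next instruction */

    while (pc < len) {
        switch (code[pc++]) {
        case TOY_PUSH:
            if (pc >= len) return TOY_ERR_CODE;               /* missing operand */
            if (sp == TOY_STACK_SIZE) return TOY_ERR_STACK;   /* overflow */
            stack[sp++] = code[pc++];
            break;
        case TOY_ADD:
            if (sp < 2) return TOY_ERR_STACK;                 /* underflow */
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case TOY_POP:
            if (sp == 0) return TOY_ERR_STACK;                /* underflow */
            sp--;
            break;
        case TOY_HALT:
            if (sp == 0) return TOY_ERR_STACK;
            *result = stack[sp - 1];
            return TOY_OK;
        default:
            return TOY_ERR_CODE;                              /* unknown opcode */
        }
    }
    return TOY_ERR_CODE;  /* ran off the end without a HALT */
}
```

In a user-mode process a missed check is a crash or a compromised process. In the kernel, the same mistake hands the attacker the machine.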

The Duqu malware used a bug in this driver to get a foothold on targets’ machines. The target user would open a Word document with a font embedded in it. The kernel parsed the font, resulting in memory corruption in the virtual machine and the installation of the rootkit. The Word document rendered perfectly.

Windows 10 by default parses fonts in user-mode. A mitigation policy can prevent loading non-system fonts for specific processes.
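For reference, a process can opt in to the font policy for itself with SetProcessMitigationPolicy. The sketch below is a minimal example, assuming a recent Windows SDK; the wrapper name is hypothetical and the non-Windows branch is only a stub so the snippet compiles elsewhere.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>

/* Asks Windows to block non-system fonts in the current process.
 * Returns 1 on success and 0 if the call fails. */
static int toy_disable_non_system_fonts(void) {
    PROCESS_MITIGATION_FONT_DISABLE_POLICY policy;
    ZeroMemory(&policy, sizeof(policy));
    policy.DisableNonSystemFonts = 1;
    return SetProcessMitigationPolicy(ProcessFontDisablePolicy,
                                      &policy, sizeof(policy)) ? 1 : 0;
}
#else
/* Stub for other platforms: the policy does not exist there. */
static int toy_disable_non_system_fonts(void) {
    return -1;
}
#endif
```

A sandboxed process would typically call this early, before it touches any untrusted content.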

A good read on font bugs is Mateusz Jurczyk's (j00ru) great work on font parsers, including atmfd, as well as his posts on Google Project Zero.

A hard problem to solve

I know firsthand that it is a difficult problem to solve. Changing something that is fundamentally flawed and widely used is almost impossible. Finding bugs might be easier, but when are you done?

Sandboxing is the only approach that has made a significant difference so far. Chrome blocks access to the driver by preventing switching to UI threads. Edge talked about a syscall filter a few months ago. Any previous sandboxing technology made little sense, as the weakest part of Windows was still exposed.

Just as Windows NT 4 needed a design change for better performance, it is now time for a significant change for both security and performance.

Written by Thomas Garnier.