Removing Process Creation Kernel Callbacks

VL
9 min read · Sep 22, 2021

Introduction

Kernel callbacks are a popular mechanism among AV/EDR products, giving them a way to monitor process activity on the system. Windows can notify security vendors of events such as:

  • Process creation.
  • Thread creation.
  • Handle request.
  • Image loading.

From an evasion standpoint, if we can remove the notification of process creation, for example, then maybe we can run a malicious process without being detected. Kernel callbacks have been discussed in detail from the standpoint of malware and evasion. The idea is far from new. I mean just consider this awesome FireEye article:

and it’s from 2012!!

This article does not provide new insights on the subject, nor does it explain the mechanics in more detail than the discussion requires. For detailed explanations and discussions on the subject, refer to the articles at the end.

Rather, the aim here is to present an approach to attacking this mechanism and to mess around with some assembly and reversing along the way (which I am always up for :)).

Brief overview

The focus here is on process creation notification but the case of thread creation and image load is the same.

Suppose we have an arbitrary read/write primitive in the kernel, obtained through one of the many vulnerable drivers such as MSI Afterburner's RTCore64 (CVE-2019-16098), which is used for the PoC in this article. Really, any driver which gives the ability to read and write kernel memory will do just fine.

So…we aim to remove the callback (or callbacks) which the EDR registered in the system to be notified when a new process is created.

Those callbacks are stored in an array named PspCreateProcessNotifyRoutine, whose address we don’t know.

But we do know that the callback is registered using a function by the name PsSetCreateProcessNotifyRoutine because…MSDN.

Taking a look at the PsSetCreateProcessNotifyRoutine:

We can see it invokes a function by the name PspSetCreateProcessNotifyRoutine, which then loads the address of the callback array:

Again, external sources mentioned at the end can explain this little process in greater detail.

So to remove the callback we need first to find the array.

How to find the callback array?

General steps:

  1. Find the address of PsSetCreateProcessNotifyRoutine function.
  2. Find the address of PspSetCreateProcessNotifyRoutine function.
  3. Find the address of PspCreateProcessNotifyRoutine array.

1. Find the address of PsSetCreateProcessNotifyRoutine function:

This one is relatively simple and includes the following steps:

  1. Load the Kernel module into process memory.
  2. Get the offset to PsSetCreateProcessNotifyRoutine using the loaded module.
  3. Get the Kernel base address.
  4. Get the address of PsSetCreateProcessNotifyRoutine using Kernel base address and the offset.

(Code snippet: getting the PsSetCreateProcessNotifyRoutine address.)
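
The offset arithmetic in the steps above can be sketched as follows. This is a minimal illustration and not the code from the PoC: the function name and the sample addresses are hypothetical, and it assumes the kernel image mapped into the process has the same layout as the one loaded in the kernel, so an export sits at the same offset from the image base in both.

```python
def kernel_address_of_export(user_base: int, user_export: int, kernel_base: int) -> int:
    """Translate an export address in a user-mode mapping to its kernel address."""
    # Step 2: offset of the export from the base of the user-mode mapping.
    offset = user_export - user_base
    if offset < 0:
        raise ValueError("export address lies below the module base")
    # Step 4: the same offset applied to the real kernel base.
    return kernel_base + offset


# Hypothetical values, for illustration only.
user_base = 0x00007FF61A2B0000
user_export = user_base + 0x781D40
kernel_base = 0xFFFFF8001C000000

print(hex(kernel_address_of_export(user_base, user_export, kernel_base)))
```

Only the difference between the two user-mode addresses matters, so where the image happens to be mapped in the process has no effect on the result.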

The next steps are a bit more tricky. The technique used to find PspSetCreateProcessNotifyRoutine and the callback array is basically signature scanning.

But which bytes?

The issue with looking for signatures in memory is that the bytes can vary between OS versions, which makes this method not very reliable.

So the idea is to look for bytes which can be considered invariant, meaning they stay consistent across multiple OS versions, as much as possible anyway. The term “consistent” is used loosely here but still, we can try…

NOTE: The OS versions and the code in question deal with 64-bit only.

Let’s take several kernel images as test cases and see what the code looks like:

The cases which I reviewed:

  1. Windows Server 2012 9600.
  2. Windows Server 2016 1607.
  3. Windows 10 2004.
  4. Windows 7 7601.
  5. Windows Server 2019 1809.

Should be enough to give a fair indication of which bytes to target.

(Disassembly screenshots, one per kernel image: Windows Server 2012 9600, Windows Server 2016 1607, Windows 10 2004, Windows 7 7601, Windows Server 2019 1809.)

From the disassembled samples above it is a fair assumption that two main things can be targeted:

  1. The JMP/CALL instruction which transfers execution from PsSetCreateProcessNotifyRoutine to PspSetCreateProcessNotifyRoutine.
  2. The LEA instruction which loads the address of the PspCreateProcessNotifyRoutine array.

NOTE: The first might not be necessary, since targeting the LEA instruction alone could suffice, but for consistency, and because it feels right for some reason, I will target both.

2. Find the address of PspSetCreateProcessNotifyRoutine function:

In this case we have either a JMP or a CALL instruction, depending on the OS in question.

Looking at Intel’s Developer’s manual for the CALL instruction:

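The opcode for the instruction is 0xE8, followed by a 4-byte displacement relative to the next instruction. The instruction is therefore 5 bytes long, and the target is the address of the instruction + 5 + the displacement.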

So for example in case of Windows 10 2004:

Address of PspSetCreateProcessNotifyRoutine = 0x0000000140781D5D + 5 + 0x000001B6 = 0x0000000140781F18.

Performing the same steps for the JMP instruction:

The opcode for the JMP instruction is 0xE9 with a 4 byte displacement relative to the next instruction.

For example in the case of Windows Server 2012 9600:

Address of PspSetCreateProcessNotifyRoutine = 0x00000001404E54E3 + 5 + 0x000000BC = 0x00000001404E55A4.
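
The two calculations above follow the same rule, which can be written as a small helper. This is only a sketch for checking the arithmetic: the function name is made up for illustration, and the addresses and displacements are the ones quoted in the two examples.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rel32_target(instruction_address: int, displacement: int, length: int = 5) -> int:
    """Target of a near CALL (0xE8) or JMP (0xE9) with a 32-bit relative displacement.

    The displacement is relative to the next instruction, i.e. to the address
    of the CALL/JMP plus its own length (1 opcode byte + 4 displacement bytes).
    """
    return (instruction_address + length + displacement) & MASK64


# Windows 10 2004 (CALL) and Windows Server 2012 9600 (JMP) examples from the text.
print(hex(rel32_target(0x0000000140781D5D, 0x000001B6)))
print(hex(rel32_target(0x00000001404E54E3, 0x000000BC)))
```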

The following code snippet will search for the JMP/CALL instruction from the start of PsSetCreateProcessNotifyRoutine.

NOTE: All the displacements are signed. The base address is an 8-byte value, so a 4-byte displacement which encodes a negative value has to be sign-extended to 64 bits before it is added.
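
As an illustration of the search and the sign extension, here is a minimal sketch that works on a plain byte buffer. It is not the snippet from the PoC: the function name and the test bytes are hypothetical, and it assumes the function body has already been read into `code` with whatever read primitive is available. Reading the displacement with the `"<i"` format yields a signed 32-bit integer, which takes care of the sign extension.

```python
import struct
from typing import Optional

BRANCH_OPCODES = (0xE8, 0xE9)  # near CALL rel32, near JMP rel32
MASK64 = 0xFFFFFFFFFFFFFFFF


def find_first_branch_target(code: bytes, base_address: int) -> Optional[int]:
    """Return the target of the first E8/E9 byte found in code, or None.

    base_address is the address that code[0] was read from.
    """
    for i in range(len(code) - 4):
        if code[i] in BRANCH_OPCODES:
            # "<i" decodes a little-endian *signed* 32-bit value.
            (disp,) = struct.unpack_from("<i", code, i + 1)
            return (base_address + i + 5 + disp) & MASK64
    return None


# Hypothetical bytes: sub rsp, 0x28 ; call <rel32 = -16> ; ret
sample = bytes.fromhex("4883EC28" "E8F0FFFFFF" "C3")
print(hex(find_first_branch_target(sample, 0x1000)))
```

Note that this is a naive byte scan: it matches the first 0xE8 or 0xE9 byte even if that byte is actually an operand of an earlier instruction. Decoding instruction lengths while walking the function would be more robust.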

3. Find the address of PspCreateProcessNotifyRoutine array:

To find the address of the array we are looking for a LEA instruction.

Let’s consider the following to decode the LEA instruction:

A 64-bit operand is used, so the instruction starts with a REX prefix, which is the first byte.

The second byte is the opcode of the LEA instruction. Looking at the following table:

We can see the opcode is 0x8D.

The instruction operand encoding uses ModR/M according to the following table:

The ModR/M byte has the following fields, from the most significant bits to the least: Mod (2 bits), REG (3 bits) and R/M (3 bits).

REG specifies the register, and R/M combined with the Mod bits specifies the addressing mode.

Let’s decode the LEA instruction used in Windows 10 2004:

4C 8D 2D D5 F6 49 00

So we know that 0x4C is the prefix and 0x8D is the LEA opcode. Next we have the ModR/M byte 0x2D.

Looking at the bits: 0x2D = 00 101 101

Mod = 00

REG = 101

R/M = 101
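
The bit slicing above can be checked with a few lines of code. This is just a sketch with made-up helper names; the field positions are the standard x86-64 encoding (the REX prefix is `0100WRXB`, and REX.R supplies the fourth bit of the REG field), and the bytes are the ones from the Windows 10 2004 instruction.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix (0100WRXB) into its four flag bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError("not a REX prefix")
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1, "X": (byte >> 1) & 1, "B": byte & 1}


def decode_modrm(byte: int) -> tuple:
    """Split a ModR/M byte into (mod, reg, rm)."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


rex = decode_rex(0x4C)
mod, reg, rm = decode_modrm(0x2D)
# REX.R extends the 3-bit REG field to 4 bits, which selects r8..r15.
register_number = (rex["R"] << 3) | reg
print(rex, f"mod={mod:02b} reg={reg:03b} rm={rm:03b}", f"r{register_number}")
```

For 0x4C both W and R are set, so the operand is 64 bits wide and REG = 101 ends up selecting r13 rather than one of the legacy registers.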

From the following table:

We can see that Mod = 00 and R/M = 101 select a 4-byte displacement with no base register. In long mode this comes down to a 4-byte displacement from the instruction pointer (RIP-relative addressing).

We can also see from the table that the value of the third byte in the instruction changes only in the 3 bits of REG, which specify the register used. With the 0x4C prefix these map to r8 through r15:

05 = 000 = r8

0D = 001 = r9

15 = 010 = r10

1D = 011 = r11

25 = 100 = r12

2D = 101 = r13

35 = 110 = r14

3D = 111 = r15

32/64-bit address table from https://wiki.osdev.org/X86-64_Instruction_Encoding.

We can see that the result fits with what the disassembler shows:

The instruction is 7 bytes long (prefix, opcode, ModR/M and a 4-byte displacement), so the address of PspCreateProcessNotifyRoutine = 0x000000014084CC44 + 7 + 0x0049F6D5 = 0x0000000140CEC320.

From these conditions the target bytes in this case can be:

  1. 0x4C and 0x8D for the first two bytes.
  2. For the third byte the set of bytes that specify the register can be used:

0x05, 0x0D, 0x15, 0x1D, 0x25, 0x2D, 0x35, 0x3D (even if I only saw uses of r12 to r15).

The following code snippet will search for the LEA opcode from the start of PspSetCreateProcessNotifyRoutine:
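
For illustration, a minimal sketch of such a search over a byte buffer might look like this. It is not the snippet from the PoC: the names are hypothetical, and it assumes the body of PspSetCreateProcessNotifyRoutine has already been read into `code`. The candidate ModR/M bytes are generated from the rule described above (Mod = 00, R/M = 101, any REG).

```python
import struct
from typing import Optional

REX_WR = 0x4C      # REX.W + REX.R: 64-bit operand, destination in r8..r15
LEA_OPCODE = 0x8D
# Mod = 00 and R/M = 101 (RIP-relative), one value per REG field:
# 0x05, 0x0D, 0x15, 0x1D, 0x25, 0x2D, 0x35, 0x3D
MODRM_CANDIDATES = frozenset((reg << 3) | 0b101 for reg in range(8))
LEA_LENGTH = 7     # prefix + opcode + ModR/M + 4-byte displacement
MASK64 = 0xFFFFFFFFFFFFFFFF


def find_lea_target(code: bytes, base_address: int) -> Optional[int]:
    """Return the address loaded by the first matching RIP-relative LEA, or None."""
    for i in range(len(code) - LEA_LENGTH + 1):
        if (code[i] == REX_WR and code[i + 1] == LEA_OPCODE
                and code[i + 2] in MODRM_CANDIDATES):
            # Signed 32-bit displacement, relative to the next instruction.
            (disp,) = struct.unpack_from("<i", code, i + 3)
            return (base_address + i + LEA_LENGTH + disp) & MASK64
    return None


# The LEA bytes and address from the Windows 10 2004 example.
lea = bytes.fromhex("4C8D2DD5F64900")
print(hex(find_lea_target(lea, 0x000000014084CC44)))
```

Matching three bytes instead of one makes a false positive less likely than in the JMP/CALL case, but the function still returns the first match, so the result is only as good as the assumption that the first such LEA in the function is the one loading the array.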

What’s next?

Once the address of the PspCreateProcessNotifyRoutine array has been found, it’s a matter of looping through the array in search of the target callback and replacing it with 0.

This makes the EDR blind to new processes which are created in the system.

OS versions tested:

  • Windows 10 20H2
  • Windows 10 1909
  • Windows 10 1903
  • Windows 10 21H1
  • Windows Server 2019 1809
  • Windows 7 7601
  • Windows Server 2012 9600
  • Windows Server 2016 1607

The full code implementation can be found in the following GitHub page: https://github.com/JustaT3ch/Kernel-Snooping.

NOTE: The code considers a single callback target, so once the callback is found it stops searching the rest of the array. To handle multiple callbacks, some minor changes have to be made.

What about thread creation and image loading notifications?

The same approach with some modifications can be used for thread creation and image loading callbacks and will be discussed in the next article. As for handle creation, it’s an interesting case and definitely requires a discussion all by itself.

Final Thoughts

Hopefully this gives some insight into an approach other than collecting byte signatures which are specific to each OS build, checking OS versions and so on, all while hoping not to get a BSOD.

Although I cannot promise this will be BSOD free, I did try to make it as smooth as possible.

As stated before, the MSI driver was used in this case, and thus its read/write primitives were implemented.

But there are many vulnerable drivers out there, and the major change which has to be made for this to work with a new one is to implement the corresponding read/write primitives, i.e. how the data is passed to the driver, what the sizes of the values being read are, etc. The logic of finding the callback array remains the same.

If you did not fall asleep until this point then I guess I did something right.

Thanks for reading :)

References:

https://itw01.com/8SRQMEH.html
