Hooking Heaven’s Gate — a WOW64 hooking technique

Hoang Bui
Hoang Bui
May 14, 2019 · 10 min read

This is not new, this is not novel, and definitely not my research — but I used it recently so here is my attempt at explaining some cool WOW64 concept. I also want to take a break from reading AMD/Intel manual to write this hypervisor. I also think the term “Heaven’s Gate” is quite appropriate and is the coolest thing ever, so here we have it.


Introduction

I usually add some pictures here to show how I started my journey but because it was 2 months ago on a free slack (shoutout to GuidedHacking), I don’t have the log anymore. Either way, it went something like this…

Me: Yoooooooo any good technique to catch a manual syscall?!?!?
GH: That is going to be tough.
GH: Wait, is it Wow64?
Me: Yes
GH: You can’t manual syscall on Wow64, you coconut.
Me: ????

So there you have it, no such thing as a manual syscall on WOW64. Well, there is one way but I will covert that topic at a later time. (Hint: Heaven’s Gate)

First, we need to understand a bit about WOW64.

WoW64 (Windows 32-bit on Windows 64-bit)

I will covert a very brief part simply due to the fact of how complicated the subsystem is and prone for possible mistakes that I might make.

WOW64 applies to 32 bit applications running on a 64 bit machine. This mean that while there is very small different in how the 32 bit and the 64 bit kernel work, there is no doubt incompatibilities. This subsystem tries to mitigate those incompatibilities through various interfaces such as wow64.dll, wow64win.dll, and wow64cpu.dll. There is also a different registry environment for wow64 applications vs native 64-bit applications but let’s not get into that mess.

An interesting behavior to notice while executing a WOW64 application is that all kernel-mode components on a 64-bit machine will always execute in 64-bit mode, regardless whether the application’s instructions are 64-bit or not.

This in conclusion means that WOW64 applications run a bit differently than a native 64 bit application. We are going to take advantage of that. Let’s look at the difference when it comes to calling a WINAPI.

NTDLL.dll vs NTDLL.dll

Ntdll.dll on a Windows machine is widely covered and I won’t go too deep into that. We are only interested in the feature of ntdll.dll when performing a WINAPI call that requires a syscall. Let’s pick Cheat Engine as our debugger (because it can see both DLLs) and Teamviewer as our WOW64 application.

Image for post
Image for post
If you can’t find the functionality
Image for post
Image for post
Ara ara? What is so strange about this

If this was a live conversation, I would torment you with this question but this is not a live session. Noticed, there are those 3 wow64 interface dlls that I mentioned earlier, but the particular thing you want to notice is the two ntdll.dll. What even more bizarre is that one of the ntdll.dll is currently residing in a 64 bit address space. Wtf? How? This is a 32 bit application!

The answer: WOW64.

The Differences

I am sure there are a ton more differences between the two dlls but let’s cover the very first obvious difference, the syscalls.

We all know (if not, now you do) that ntdll.dll in a normal native application is the one responsible for performing the syscall/sysenter, handing the execution over to the kernel. But I also mentioned earlier that you cannot perform a syscall on a WOW64 application. So how does WOW64 application do… anything?

By going into an example function such as NtReadVirtualMemory, we should be expecting a service id to be placed on the eax register and follow by a syscall/sysenter instruction.

Image for post
Image for post
No syscall, at all

Okay, now that’s weird. There is no syscall. Instead, there is a call and I know for sure you can’t just enter kernel land with just a call. Let’s follow the call!

Image for post
Image for post
A jump to wow64transition inside wow64cpu.dll
Another jump, into another jump…hold up, is that “RAX” I see?.. isn’t RAX a 64-bit register ?

We are now at some place inside wow64cpu.dll called Wow64Transition that is now executing with 64 bits instruction set. We also see that it is referencing CS:0x33 segment. What is going on?

In Alex Lonescu’ blog, he said:

In fact, on 64-bit Windows, the first piece of code to execute in *any* process, is always the 64-bit NTDLL, which takes care of initializing the process in user-mode (as a 64-bit process!). It’s only later that the Windows-on-Windows (WoW64) interface takes over, loads a 32-bit NTDLL, and execution begins in 32-bit mode through a far jump to a compatibility code segment. The 64-bit world is never entered again, except whenever the 32-bit code attempts to issue a system call. The 32-bit NTDLL that was loaded, instead of containing the expected SYSENTER instruction, actually contains a series of instructions to jump back into 64-bit mode, so that the system call can be issued with the SYSCALL instruction, and so that parameters can be sent using the x64 ABI, sign-extending as needed.

So what this mean is that when the 32-bit code is trying to perform a syscall, it would go through the 32-bit ntdll.dll, and then to this particular transition gate (Heaven’s Gate) and performs a far jump instruction which switches into long-mode (64-bit) enabled code segment. That is the 0033:wow64cpu.dll+0x7009 you see in the latest screenshot. Now that we are in 64-bit context, we can finally go to the 64-bit ntdll.dll which is where the real syscall is performed.

Image for post
Image for post
You can specify in Cheat Engine 64bit WINAPI version with _ before the API’s name
Image for post
Image for post
Finally the expected syscall

There you have it, the full WOW64 syscall chain. Let’s summarize.

32-bit ntdll.dll -> wow64cpu.dll’s Heaven’s Gate -> 64-bit ntdll.dll -> syscall into the kernel

Now that we understand the full execution chain, let’s get hooking!

Hooking Heaven’s Gate

So as hackers, we are always looking for a stealthy way to hook stuff. While hooking heaven’s gate is in no way stealthy, it is a lot stealthier (and more useful) than hooking the single Winapi functions. That is because ALL syscall go through ONE gate, meaning by hooking this ONE gate — you are hooking ALL syscalls.

The Plan

Our plan is quite simple. We will do what we usually do with a normal detour hook.

  1. We will place a jmp of some sort on the transition gate/Heaven’s Gate, which will then jump to our shellcode
  2. Our shellcode will select what service id to hook and jump to the appropriate hook.
  3. Our hook once finished execution, will jump to the transition gate/Heaven’s Gate.
  4. Transition gate/Heaven’s Gate will continue on with the context switch into 64-bit and execute as normal

But first, how does the application knows where is heaven’s gate located?

Answer: FS:0xC0 aka TIB + 0xC0

Image for post
Image for post
FastSysCall is the another name for the Transition Gate aka Heaven’s Gate

So, in theory — we could determine where Heaven’s Gate is by using this code snippet.

const DWORD_PTR __declspec(naked) GetGateAddress()
{
__asm
{
mov eax, dword ptr fs : [0xC0]
ret
}
}

Now that we know where the current Heaven’s Gate is at, and we are going to hook it — let’s create a “backup” of the code we are about to modify.

const LPVOID CreateNewJump()
{
lpJmpRealloc = VirtualAlloc(nullptr, 4096, MEM_RESERVE | MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
memcpy(lpJmpRealloc, (void *)GetGateAddress(), 9);

return lpJmpRealloc;
}

This will effectively allocate a new page and copy 9 bytes far jmp from heaven’s gate over. Why we do this will not be covered but if you want to know the specific term, we are creating a trampoline for our detour hook. This will allow us to preserve the far jmp instructions that we are about to overwrite in the next step.

The 9 bytes is the instruction we are backing up: jmp 0033:wow64cpu.dll + 7009

Next, we are going to replace that far jmp with a PUSH Addr, RETeffectively acting as an absolute address jump. (Push the address you want to jump onto the stack, Ret will pop it from the stack and jmp there)

void __declspec(naked) hk_Wow64Trampoline()
{
__asm
{
cmp eax, 0x3f //64bit Syscall id of NtRVM
je hk_NtReadVirtualMemory
cmp eax, 0x50 //64bit Syscall id of NtPVM
je hk_NtProtectVirtualMemory
jmp lpJmpRealloc
}
}

const LPVOID CreateNewJump()
{
DWORD_PTR Gate = GetGateAddress();
lpJmpRealloc = VirtualAlloc(nullptr, 0x1000, MEM_RESERVE | MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
memcpy(lpJmpRealloc, (void *)Gate, 9);

return lpJmpRealloc;
}

const void WriteJump(const DWORD_PTR dwWow64Address, const void *pBuffer, size_t ulSize)
{
DWORD dwOldProtect = 0;
VirtualProtect((LPVOID)dwWow64Address, 0x1000, PAGE_EXECUTE_READWRITE, &dwOldProtect);
(void)memcpy((void *)dwWow64Address, pBuffer, ulSize);
VirtualProtect((LPVOID)dwWow64Address, 0x1000, dwOldProtect, &dwOldProtect);
}


const void EnableWow64Redirect()
{
LPVOID Hook_Gate = &hk_Wow64Trampoline;

char trampolineBytes[] =
{
0x68, 0xDD, 0xCC, 0xBB, 0xAA, /*push 0xAABBCCDD*/
0xC3, /*ret*/
0xCC, 0xCC, 0xCC /*padding*/
};
memcpy(&trampolineBytes[1], &Hook_Gate, 4);
WriteJump(GetGateAddress(), trampolineBytes, sizeof(trampolineBytes));
}

This code will overwrite the 9 bytes FAR JMP along with all the VirtualProtect you need.

Let’s dissect hk_Wow64Trampoline.

So we know that before any syscall happen, the service id is ALWAYS in the EAX register. Therefore, we can use a cmp instruction to determine what is being called and jmp to the appropriate hook function. In our case we are doing 2 cmp (but you can do as many as you want), one with 0x3f and one with 0x50 — NtRVM and NtPVM. If the EAX register holds the correct syscall, je or jump-equal will execute, effectively jumping to our hook function. If it is not the syscall we want, it will take a jmp to lpJmpRealloc (which we created in our CreateNewJump function. This is the 9 original bytes that we copied over before overwriting it).

void __declspec(naked) hk_NtProtectVirtualMemory()
{
__asm {
mov Backup_Eax, eax
mov eax, [esp + 0x8]
mov Handle, eax
mov eax, [esp + 0xC]
mov Address_1, eax
mov eax, [esp + 0x10]
mov DwSizee, eax
mov eax, [esp + 0x14]
mov New, eax
mov eax, [esp + 0x18]
mov Old, eax
mov eax, Backup_Eax
pushad
}

printf("NtPVM Handle: [%x] Address: [0x%x] Size: [%d] NewProtect: [0x%x]\n", Handle, Address_1, *DwSizee, New);

__asm popad
__asm jmp lpJmpRealloc
}


void __declspec(naked) hk_NtReadVirtualMemory()
{
__asm pushad

printf("Calling NtReadVirtualMemory.\n");

__asm popad
__asm jmp lpJmpRealloc
}

Note that before you are doing any sort of stuff within the hook function, you must pushad/pushfd and then later popfd/popad to preserve the registers and the flags. If you do not do this, expect the program to crash in no time.

Similarly, I’ve tried very hard to get the values from the declspec(naked) function through arguments but it just can’t do because you will end up usign ECX as a register and ECX just happens to hold a 64bit value in my experience.

Image for post
Image for post
PUSHAD will lose the first 4 bytes of ECX

Please let’s me know if you know of a way to get something like this to work.

DWORD __declspec(naked) hk_NtProtectVirtualMemory(
IN HANDLE ProcessHandle,
IN OUT PVOID *BaseAddress,
IN OUT PULONG NumberOfBytesToProtect,
IN ULONG NewAccessProtection,
OUT PULONG OldAccessProtection
)

Summary

In summary, when you are running as a Wow64 process — you cannot access the kernel directly. You have to go through a transition gate aka Heaven’s Gate to transition into 64bit mode before entering Kernel Land. This transition can be hook with a traditional detour which this post covers.

The technique detour the transition gate into a fake gate that does conditional jump based on the service number to the correct hook function. Once the hook function finished execution, it is then jump to a transition gate that we backed up. This will change our 32bit mode into 64bit mode, in which we will then continue with the execution by going into the 64bit Ntdll. 64bit Ntdll will then perform a syscall/sysenter and enter Kernel land.

32bit Ntdll-> Heaven’s Gate (hooked) -> Fake Gate -> hook_function -> Heaven’s Gate Trampoline -> 64bit Ntdll -> Kernel land

Result

Take a look at the example code here.

Image for post
Image for post
10/10 paint job

Another thing to notice is that you cannot just printf the syscall Id within the Wow64 hook, and that is because printf requires a syscall (I believe so) and if you hook the printf syscall while calling printf inside the hook, you are going to have a bad time (Infinite loop).

Conclusion

Hooking is a technique consists of multiple methods. How you hook depends on your creativitiy and your understanding of the system. So far, we have prove that we can hook any function at almost all stages. Maybe next we will go into SSDT hook or some sort. However, my OSCE exam is tomorrow so wish me the best of luck. It took me over a month to finish this because I got so side-tracked. Please forgive me if there are more mistakes toward the 2nd half!

-Fs0x30

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store