This post is a continuation of Part 1.
A quick recap: Part 1 was an introduction to using Ghidra as a tool to statically reverse engineer Emotet. Emotet encodes its strings with a simple xor. Since the xor routine is used throughout the file, I showed how I used Ghidra’s Script Manager to write a Python script that automates decoding the strings.
For Part 2, the challenge was to see if I could do something similar to recover the API names that Emotet stores as hash values.
How does Emotet resolve the API addresses?
The malware doesn’t keep encrypted strings of the API names in its binary. Instead, it stores each API name as a hash value.
This array of hash values is pushed onto the stack and referenced by a pointer used in the function at 0x401230, which I’ve labeled decodeAPINames.
Before the call to decodeAPINames is made, the malware first obtains a handle to the DLL. It finds the handles for kernel32.dll and ntdll.dll by enumerating the modules listed in the PEB. The rest of the DLLs are loaded with LoadLibraryW (their names were encoded by the xor routine).
The function uses this handle to read the Export Address Table of the DLL. It hashes each API name in the table and compares the result to the hash values pushed onto the stack. If there is a match, the API address is saved to an offset in the file.
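The matching step can be sketched in Python roughly as follows. This is only an illustration of the idea: placeholder_hash is a made-up stand-in, not Emotet’s hashing routine (the real one has to be ported from the disassembly of decodeAPINames), and the names resolve_hashes, exports and targets are hypothetical.

```python
def placeholder_hash(name):
    """Hypothetical stand-in hash; NOT Emotet's algorithm.

    Replace the body with the routine recovered from decodeAPINames.
    """
    h = 0
    for ch in bytearray(name.encode("ascii")):
        h = (h * 31 + ch) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def resolve_hashes(export_names, target_hashes, hash_func=placeholder_hash):
    """Return {hash: name} for every export whose hash is in target_hashes."""
    wanted = set(target_hashes)
    resolved = {}
    for name in export_names:
        h = hash_func(name)
        if h in wanted:
            resolved[h] = name
    return resolved


# A few export names, as they would appear in a DLL's export table.
exports = ["CryptAcquireContextW", "CryptDecodeObjectEx", "CertOpenStore"]
# One hash of a real export plus one arbitrary value standing in for a dummy.
targets = [placeholder_hash("CryptDecodeObjectEx"), 0xDEADBEEF]
matches = resolve_hashes(exports, targets)
print(sorted(matches.values()))
```

Because the hash function is passed in as a parameter, the same loop can be reused once the real algorithm has been ported.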
Side note: not all of the hashes are useful. Some are dummy hashes, included so that the file appears to resolve a gargantuan list of API addresses, presumably to confuse the analyst.
Getting the names from the hashes with Python
Here is the script I wrote, so you can follow along: https://github.com/0xd0cf11e/ghidra/blob/master/ghidra_emotet_decode_hash.py
There are different ways to tackle this. The method I chose is interactive: the user has to answer a couple of prompts and must have a file listing the API names ready beforehand. This kept the script simple to write, though it requires some preparation.
First, I wrote a quick script to dump all the exported function names of a DLL to a file.
I ran the script over ntdll.dll, kernel32.dll, advapi32.dll, shell32.dll, crypt32.dll, urlmon.dll, userenv.dll, wininet.dll and wtsapi32.dll.
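A minimal sketch of such an export dumper is shown below, assuming the third-party pefile library is installed. The function names, the command-line usage and the one-name-per-line output format are illustrative choices.

```python
import sys


def get_export_names(dll_path):
    """Return the names of all named exports in a DLL (requires pefile)."""
    import pefile  # third-party: pip install pefile

    pe = pefile.PE(dll_path)
    names = []
    # DIRECTORY_ENTRY_EXPORT is absent when the file has no export table.
    export_dir = getattr(pe, "DIRECTORY_ENTRY_EXPORT", None)
    if export_dir is not None:
        for symbol in export_dir.symbols:
            if symbol.name:  # skip exports that only have an ordinal
                names.append(symbol.name.decode("ascii"))
    return names


def write_names(names, out_path):
    """Write one export name per line."""
    with open(out_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python dump_exports.py <path to DLL> <output text file>
    write_names(get_export_names(sys.argv[1]), sys.argv[2])
```

Running it once per DLL gives one text file of names for each library, which is the input the Ghidra script asks for later.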
Next, I needed to find all the offsets where decodeAPINames is referenced. For this I came across ShowCCallsScript.java, one of Ghidra’s Java scripts in the Script Manager. As the script’s description instructs, I placed the cursor in the decodeAPINames function and ran it.
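The same list of call sites can also be obtained from a Ghidra Python script. The sketch below uses the toAddr and getReferencesTo helpers that Ghidra provides to scripts; the call_sites function is a hypothetical helper, and the address is the one decodeAPINames has in this sample.

```python
def call_sites(references):
    """Keep only the source addresses of references that are calls."""
    return [ref.getFromAddress() for ref in references
            if ref.getReferenceType().isCall()]


# currentProgram, toAddr and getReferencesTo exist only inside Ghidra's
# script environment, so the lookup is skipped when run anywhere else.
try:
    program = currentProgram
except NameError:
    program = None

if program is not None:
    target = toAddr(0x401230)  # decodeAPINames in this sample
    for site in call_sites(getReferencesTo(target)):
        print(site)
```

Each printed address is a candidate offset to enter when the decoding script prompts for one.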
To run my script, all I had to do was note which call site resolves API names for which DLL.
For example, the function at 0x4079c0 resolves API names from crypt32.dll. I know this because I can see that name being decoded at 0x407afa by the call to decode_strings (see Part 1). The function then calls decodeAPINames at offset 0x407b3a.
When the script is run, it prompts for the offset. Continuing with the example, I enter 407b3a.
Next it prompts for a file, and I select the file I created earlier containing the exported function names from crypt32.dll.
In the console we see which API names were resolved. In the case of crypt32.dll, there is only one API name, CryptDecodeObjectEx, despite 39 (0x27 in hex) hashes being pushed onto the stack.
The script then labels the corresponding offset with the API name.
Below is one run for kernel32.dll:
Hope this has inspired you. If you have found a better way to decode the hashes or some other cool techniques with Ghidra, please feel free to share :).