BASICS OF SHELLCODE ANALYSIS
Shellcode is a binary chunk of data hidden inside malware that carries out a malicious task, often through techniques such as process injection. It is PIC (position-independent code): it uses no hard-coded addresses, so it can execute at any memory location. Shellcode cannot rely on the Windows loader because it has no file format; there is no PE header that the loader could use to map and run it. It is just raw binary data. To run properly, it still has to load the libraries and resolve the dependencies it needs. But how is this achieved?
HOW IS IT ACHIEVED?
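- To reverse engineer shellcode, we first need a host process to run the binary chunk in, because it has no PE format of its own. “shellcode_launcher.exe” provides one: it loads the raw bytes into memory and transfers execution to them so they can be examined in a debugger.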
- First, the shellcode needs to work out where in memory it is running, and for that it needs to obtain a base pointer.
- x86 does not support EIP-relative data addressing (we cannot simply write “mov eax, eip”); only control-flow instructions address relative to EIP. To access its data in a position-independent manner, the shellcode therefore loads a base pointer into a register and dereferences it, using techniques such as “call/pop” and the “fnstenv” instruction.
- Call/Pop technique: a CALL pushes the address of the instruction that follows it onto the stack. The shellcode executes a POP as soon as the CALL has transferred control, which loads that address into a register. See the image below for reference. (0x00250009)
- Using fnstenv: the “fpu_instruction_pointer” field of the “FpuSaveState” structure contains the address of the last FPU instruction that was executed. When fnstenv is executed, it stores the “FpuSaveState” structure on the stack, and the shellcode then executes a POP to retrieve the value of “fpu_instruction_pointer”.
- Once it has a base pointer, the shellcode needs to call system APIs, but without the Windows loader it has to resolve symbols itself. It typically needs “LoadLibraryA” and “GetProcAddress”: once it has these two functions, the shellcode can reach the full API.
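When triaging a dump, it can help to look for these two base-pointer idioms before stepping through the code. The following is a minimal sketch of such a scan, not a complete detector: it assumes 32-bit code, matches only the simplest “call $+5” form of call/pop and the common “fnstenv [esp-0x0c]” encoding, and the function name find_getpc and the test buffer are made up for illustration.

```python
# Byte patterns for two common 32-bit x86 "get EIP" idioms (assumed encodings).
CALL_NEXT = b"\xe8\x00\x00\x00\x00"    # call $+5: the target is the next instruction
FNSTENV_ESP_M0C = b"\xd9\x74\x24\xf4"  # fnstenv [esp-0x0c]
POP_REG32 = range(0x58, 0x60)          # one-byte opcodes pop eax .. pop edi


def find_getpc(buf):
    """Return a sorted list of (offset, technique) hits found in buf."""
    hits = []

    # call $+5 immediately followed by a pop of any 32-bit register
    start = buf.find(CALL_NEXT)
    while start != -1:
        after = start + len(CALL_NEXT)
        if after < len(buf) and buf[after] in POP_REG32:
            hits.append((start, "call/pop"))
        start = buf.find(CALL_NEXT, start + 1)

    # fnstenv storing the FPU environment just below the stack pointer
    start = buf.find(FNSTENV_ESP_M0C)
    while start != -1:
        hits.append((start, "fnstenv"))
        start = buf.find(FNSTENV_ESP_M0C, start + 1)

    return sorted(hits)


# Hypothetical test buffer:
# nop; call $+5; pop ebx; fldz; fnstenv [esp-0x0c]; pop eax
sample = b"\x90" + CALL_NEXT + b"\x5b" + b"\xd9\xee" + FNSTENV_ESP_M0C + b"\x58"

for offset, technique in find_getpc(sample):
    print(f"0x{offset:04x}: {technique}")
```

A hit is only a hint about where to set a breakpoint. Shellcode that calls forward over its data, or that stores the FPU state elsewhere, will not match these exact bytes.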
How does it do this? — Manual Symbol Resolution
- Finding Kernel32.dll in memory
- Parsing PE Export data
FINDING KERNEL32.DLL IN MEMORY
- Access the FS segment register to reach the TEB (Thread Environment Block); at offset 0x30 in the TEB is the pointer to the PEB (Process Environment Block) structure. (a)
- At offset 0x0c within the PEB is a pointer to the PEB_LDR_DATA structure, which holds the heads of three doubly linked lists of LDR_DATA_TABLE_ENTRY structures, one entry for each loaded module. (b)
- [eax+0x14]: access “InMemoryOrderLinks”/“InMemoryOrderModuleList”. (c)
- Traverse the InInitializationOrderLinks to reach kernel32.dll’s LDR_DATA_TABLE_ENTRY and finally read the pointer to its DllBase. (d)
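The offsets in the steps above are easier to check against a disassembly when the structures are written out. The sketch below models the commonly published 32-bit layouts of PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY with ctypes. These layouts are only partly documented by Microsoft, so treat the field lists as an assumption. The Python class names and the helper dllbase_delta are hypothetical. Pointers are declared as c_uint32 so that the offsets come out the same on any host.

```python
import ctypes


class ListEntry32(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]


class UnicodeString32(ctypes.Structure):
    _fields_ = [
        ("Length", ctypes.c_uint16),
        ("MaximumLength", ctypes.c_uint16),
        ("Buffer", ctypes.c_uint32),
    ]


class PebLdrData32(ctypes.Structure):
    _fields_ = [
        ("Length", ctypes.c_uint32),
        ("Initialized", ctypes.c_uint8),   # followed by 3 bytes of padding
        ("SsHandle", ctypes.c_uint32),
        ("InLoadOrderModuleList", ListEntry32),
        ("InMemoryOrderModuleList", ListEntry32),
        ("InInitializationOrderModuleList", ListEntry32),
    ]


class LdrDataTableEntry32(ctypes.Structure):
    _fields_ = [
        ("InLoadOrderLinks", ListEntry32),
        ("InMemoryOrderLinks", ListEntry32),
        ("InInitializationOrderLinks", ListEntry32),
        ("DllBase", ctypes.c_uint32),
        ("EntryPoint", ctypes.c_uint32),
        ("SizeOfImage", ctypes.c_uint32),
        ("FullDllName", UnicodeString32),
        ("BaseDllName", UnicodeString32),
    ]


def dllbase_delta(link_field):
    """Distance from a list link inside an entry to that entry's DllBase.

    A Flink points at the link field of the next entry, not at the start
    of the entry, so this is the displacement code uses after following it.
    """
    entry = LdrDataTableEntry32
    return entry.DllBase.offset - getattr(entry, link_field).offset


for name in ("InLoadOrderModuleList", "InMemoryOrderModuleList",
             "InInitializationOrderModuleList"):
    print(f"PEB_LDR_DATA+0x{getattr(PebLdrData32, name).offset:02x}  {name}")

for name in ("InMemoryOrderLinks", "InInitializationOrderLinks"):
    print(f"DllBase is at +0x{dllbase_delta(name):02x} from {name}")
```

The second loop shows why the displacement used to read DllBase depends on which of the three lists the shellcode walks.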
PARSING PE EXPORT DATA
- First, the shellcode needs to reach the IMAGE_EXPORT_DIRECTORY, because it holds the export data.
- The IMAGE_DATA_DIRECTORY array at the end of the IMAGE_OPTIONAL_HEADER stores the RVA (Relative Virtual Address) of the IMAGE_EXPORT_DIRECTORY in its first entry.
- IMAGE_EXPORT_DIRECTORY has the following members, among others:
NumberOfNames;
AddressOfFunctions;
AddressOfNames;
AddressOfNameOrdinals;
- The “AddressOfFunctions” member of IMAGE_EXPORT_DIRECTORY points to an array of RVAs of the actual exported functions.
- To find the export address of a symbol, three arrays have to be considered together: “AddressOfNames”, “AddressOfNameOrdinals” and “AddressOfFunctions”.
- Do a string comparison for the required symbol while traversing the AddressOfNames array; the position of the match gives the index iName.
iName is then used as an index into the AddressOfNameOrdinals array, which gives the value iOrdinal.
iOrdinal is then used as an index into the AddressOfFunctions array, which gives the RVA (Relative Virtual Address) of the exported symbol. Adding DllBase to this RVA yields the symbol’s address in memory.
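The three-array lookup can be sketched in a few lines. This is a simplified model under stated assumptions, not a PE parser: the arrays are plain Python lists, the names are already resolved to strings (in a real image AddressOfNames holds RVAs of strings), forwarded exports are ignored, and the function resolve_export and all values below are invented for illustration.

```python
def resolve_export(dll_base, names, name_ordinals, functions, symbol):
    """Return the in-memory address of symbol, or None if it is not exported by name.

    names         -> models AddressOfNames (already resolved to strings)
    name_ordinals -> models AddressOfNameOrdinals
    functions     -> models AddressOfFunctions (RVAs)
    """
    for i_name, name in enumerate(names):
        if name == symbol:                        # string comparison
            i_ordinal = name_ordinals[i_name]     # index into AddressOfFunctions
            return dll_base + functions[i_ordinal]  # RVA -> virtual address
    return None


# Hypothetical export data for an imaginary module.
DLL_BASE = 0x10000000
names = ["GetProcAddress", "LoadLibraryA", "Sleep"]
name_ordinals = [2, 0, 3]
functions = [0x1100, 0x1200, 0x1300, 0x1400]  # functions[1] is exported by ordinal only

for symbol in ("LoadLibraryA", "GetProcAddress", "NoSuchExport"):
    address = resolve_export(DLL_BASE, names, name_ordinals, functions, symbol)
    print(symbol, hex(address) if address is not None else None)
```

Note that the names and ordinals arrays run in parallel, while the functions array can be longer, because some functions are exported by ordinal only and have no name entry.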
Once the addresses of LoadLibraryA and GetProcAddress are found, the shellcode can load further libraries, resolve any other Windows API function, and proceed with its malicious task.
Reference: “Practical Malware Analysis”.
Sample Used: “bfa5dba46db1253587058b0392c04c8403846fa55d7dcf1044e94e6a654d4715”