Hostile Code: Dealing with stack strings in IDAPython

DCSO CyTec Blog
Aug 15, 2023

A function full of stack strings with its function graph on the left

For the first post in our new “Hostile Code” series, in which we aim to showcase the various challenges (and, typically, solutions!) you encounter when analyzing malware, we decided to look at stack strings, a type of code obfuscation rather common in malware nowadays.

After we find out what it is and how it looks, we tackle a specific flavor of stack strings and partially automate the deobfuscation process in IDAPython, showing how simple, semi-automated scripts can be a big aid in the analysis process.

Blog post authored by Johann Aydinbas

Code obfuscation and stack strings

In contrast to classic anti-analysis approaches like debugger or VM detection, which try to prevent the dynamic analysis of malware, code obfuscation tries to make reading and understanding the actual malicious code hard. As such, it’s a defense mechanism mostly (but not exclusively) targeting human analysts — the CPU does not care if it has to execute 1 or 1000 instructions with the same result, it just does.

We on the other hand have limited time, patience and focus, but the malware doesn’t even have to make us quit in frustration — just buying time might be all it needs to do damage. If it takes us a week to decide if a sample is malicious at all, then that’s plenty of time for malware to wreak havoc.

A very straightforward example of code obfuscation is junk code insertion. Just drown every meaningful instruction in 10 or 100 that ultimately have no effect, and all of a sudden a small function becomes a nightmare to analyze and reason about. For the execution it doesn’t matter: if the 100 junk instructions essentially are nops (no-operations), the code still performs the same task but looks vastly worse to a human.

Example of obfuscation via junk code

But code that does meaningful stuff typically also uses some data. If the malware wants to add itself to autorun via the registry for example, it needs to have the respective registry keys as text strings. So even if the code is essentially unreadable, if we see such text strings in the binary, we can make an educated guess and possibly confirm it by looking specifically for registry access during a sandbox run.

In general, text strings provide excellent entry points for code analysis. If we want to find out how a malware sample communicates, or where to, spotting strings like “http”, “POST” or perhaps a user agent is like a blinking red arrow to an analyst. And it is not only humans who benefit from plain strings: they also make great targets for signatures, for example.

Thus, it makes a lot of sense for malware to try and hide its strings.

Hiding strings can be done in a multitude of ways, but today we’re looking at one specific method that seems to be trending recently. It is known as “stack strings”. The “stack strings” obfuscation essentially moves text strings from the static data realm to the dynamic realm. Instead of storing and referencing a string like “http://” as data, it generates code that builds the string on the stack at runtime where needed. This means the string is only ever present in full and in plain text when needed, and gone right after.

Therefore, if you inspect the binary for text strings, there won’t be any (well, there will be, but not the ones that matter)!
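
To make the difference concrete, here is a minimal Python sketch that mimics what a strings-style scan sees. It is not taken from the sample: the byte layouts and the find_strings helper are made up for illustration. The second variant encodes the same text as one x86 `mov byte ptr [ebp+disp8], imm8` instruction (bytes C6 45 disp8 imm8) per character, and a tiny loop then "executes" those instructions to show the string only exists at runtime.

```python
import re

def find_strings(blob: bytes, min_len: int = 4):
    """Return printable ASCII runs of at least min_len bytes, like `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]

secret = b"http://"

# Variant 1: the string sits in the data section as-is.
plain_binary = b"\x00\x01" + secret + b"\x00\x90\x90"

# Variant 2: one `mov byte ptr [ebp+disp8], imm8` (C6 45 disp8 imm8) per character.
stack_binary = b"".join(
    bytes([0xC6, 0x45, 0xF0 + i, ch]) for i, ch in enumerate(secret)
)

print(find_strings(plain_binary))   # the static string is found
print(find_strings(stack_binary))   # nothing long enough survives

# "Executing" the movs rebuilds the string in a (simulated) stack buffer.
stack = bytearray(16)
for i in range(0, len(stack_binary), 4):
    disp, imm = stack_binary[i + 2], stack_binary[i + 3]
    stack[disp - 0xF0] = imm
runtime = bytes(stack[:len(secret)])
print(runtime)
```

The characters are still in the file, but each one is separated from the next by opcode and displacement bytes, so no contiguous printable run remains for a scanner or a signature to match.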

A very simple example is shown below, a function that builds the string InternetConnectA on the stack for use in import resolution:

It is rarely this nice

As we can see, the string is rebuilt on the stack character by character and as the stack is ephemeral, the string will be gone from memory once execution leaves the current function. In general, rebuilding is usually done with multiple mov instructions (or similar), often with varying operand sizes, too:

Different operand sizes in use, followed by xor decryption

Also very common is using encryption on top, so not only are there no static strings, even the reassembled strings on the stack need decryption first in order to make sense.
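
The following sketch replays such a sequence in plain Python. The XOR key 0x5A, the split into store widths and the buffer layout are invented for illustration; only the general pattern (little-endian immediates of mixed sizes, followed by a decryption loop) reflects what is described above.

```python
KEY = 0x5A  # hypothetical single-byte XOR key
plain = b"InternetConnectA"
enc = bytes(b ^ KEY for b in plain)

# Split the encrypted bytes into stores of different widths, as a compiler might.
# Each entry is (offset into the buffer, operand size in bytes).
layout = [(0, 4), (4, 2), (6, 1), (7, 1), (8, 8)]
stores = [(off, size, int.from_bytes(enc[off:off + size], "little"))
          for off, size in layout]

def run_stores(stores, buf_len):
    """Replay the mov instructions into a zeroed stack buffer."""
    stack = bytearray(buf_len)
    for off, size, imm in stores:
        # x86 is little-endian: the lowest byte of the immediate lands first.
        stack[off:off + size] = imm.to_bytes(size, "little")
    return stack

for off, size, imm in stores:
    width = 2 + 2 * size
    print(f"mov  [buf+{off:#x}], {imm:#0{width}x}   ; {size} byte(s)")

stack = run_stores(stores, len(plain))
decoded = bytes(b ^ KEY for b in stack)   # the "xor decryption" loop
print(decoded)
```

Note that the immediates printed by the loop look nothing like text even before encryption is considered, because the byte order within each immediate is reversed relative to the string.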

And one more devious detail about stack strings: they are typically implemented as a compile-time obfuscation. That means the obfuscation of text strings is applied before compilation, which has the nasty side effect that the string-building code goes through the compilation phase, including any active code optimizations!

For bigger functions this may result in all kinds of seemingly odd data transfers across the stack and/or various registers. This loss of data locality can be a major source of issues when trying to tackle the problem with tooling automatically.

Stack strings after compiler optimizations can have odd data transfers

While there are decent approaches for the general problem, today we will look at a nice edge case instead, and then write a nifty script in IDAPython (the Python plugin for IDA Pro) in order to automate most of the deobfuscation process.

Stack strings with SSE registers

Recently we came across an interesting sample that used stack strings to make analysis harder, but in a special flavor. Strings are rebuilt via movaps/movups instructions, which are part of the SSE instruction set and move 16-byte chunks around, here from memory to xmmN registers and then to the stack.

Example of a string being rebuilt using SSE instructions

For added difficulty, the string is reassembled out-of-order and it is also encrypted with AES, because why not.

All in all, it’s very effective in hiding strings that might attract our attention. Thus, before diving deeper into the sample analysis, we need to deal with the hidden strings.

Breaking down the problem

The best approach to solving such tasks is breaking them down into subtasks, investigating which of those can be solved (easily), and then combining the solutions into a finished tool. Divide and conquer:

  1. We need to find all stack string locations
  2. Once we have a location, we need to figure out where the reassembly code starts and ends
  3. Once we know the boundaries, we can extract the data used in the instructions
  4. Once we have the data, we have to make sure it is in the right order

1 — Finding all stack string locations

After reading the code a bit, this was easy. It seems that at every stack string site, the code calls a function I’ve named store_qword. It provided a clear end of the reassembly code for all locations we inspected, so that’s good. Checking xrefs to aes_decrypt is also a possible approach, but due to the C++ nature of the sample, the actual decryption call turned out to be farther away from the stack string reassembly than the call to store_qword.
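
Listing those call sites can be sketched as follows. This is only an illustration, not part of the original script: stack_string_sites is a made-up helper, and the name store_qword only exists because it was assigned by hand in the IDB. The lookup is passed in as a parameter so the function can be tried outside IDA as well; inside IDA, idautils.CodeRefsTo fills that role.

```python
def stack_string_sites(target_ea, code_refs_to):
    """Return the sorted, de-duplicated addresses that reference target_ea.

    code_refs_to is expected to behave like idautils.CodeRefsTo(ea, flow).
    """
    return sorted(set(code_refs_to(target_ea, False)))

try:
    import idautils
    import idc
except ImportError:          # not running inside IDA
    idautils = idc = None

if idc is not None:
    ea = idc.get_name_ea_simple("store_qword")   # name assigned during analysis
    if ea != idc.BADADDR:
        for site in stack_string_sites(ea, idautils.CodeRefsTo):
            print(f"{site:#x}  {idc.get_func_name(site)}")
```

Each printed address marks the end of one reassembly sequence, which gives a work list to step through by hand.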

This solves problem 1.

2 — Determine the boundaries

This one turned out to be difficult. store_qword actually writes the start and end pointers of the stack string somewhere, so in theory we have the boundaries on the stack, but knowing the location on the stack is not the same as finding the instructions that write to it.

In fact, pivoting from the destination boundaries to the source boundaries (so we know which instructions to consider for reassembly) didn’t seem easily solvable. We can find where the arguments for store_qword are set up, but the reassembly can happen anywhere before the call: before the arguments are written, after, or even in between.

Looking over the code snippets, it seems reasonably easy for us human analysts to spot the boundaries, but pouring our intuition and pattern recognition into code is often far more difficult than it would seem. So why not just leave this step to ourselves? After all, speed is king in malware analysis.

You can select ranges in IDA and also query the selection range via the API. Even better, there is an API to iterate over the items in any given address range, so we end up with this simple code for starters (Heads yields the address of the first byte of each instruction or data item, hence ‘head’):

for head in Heads(read_selection_start(), read_selection_end()):
    ...

3 — Extract the data

Now that we have delegated the boundary detection task to ourselves, we just need to extract the data. This is where this edge case made it easy for us. The sample exclusively uses 16-byte chunks, which are always copied from memory to xmm0 and then written to the stack, before finally being decrypted.

So while there is a data transfer involved (memory to xmm0 to stack), it’s still very straightforward to solve. We need a mini tracer that just looks out for when xmm0 is written, and when it is used to write. At the site of writing xmm0 we remember the source address, and when we then find a mov that uses xmm0 as its source to write to the stack, we know what value will end up there.

“Mini tracer” makes it sound complicated, but it’s not more than this:

addr = None
for head in Heads(read_selection_start(), read_selection_end()):
    if print_insn_mnem(head).startswith("mov"):   # skip any other instruction
        if print_operand(head, 0) == "xmm0":      # Is this writing to xmm0?
            addr = get_operand_value(head, 1)     # -> remember the source address
        elif print_operand(head, 1) == "xmm0":    # Is this reading from xmm0 instead?
            ...

The code is very straightforward. If the instruction mnemonic starts with mov and the 1st operand is xmm0, we know we encountered an instruction of the form mov xmm0, …; if instead the 2nd operand is xmm0, we encountered an instruction like mov …, xmm0.

This code isn’t bulletproof. There may be cases where we find an instruction writing to xmm0 but the source is another register, so get_operand_value returns a register number instead of a memory address, etc. But starting with the naive solution is always the best approach: once we actually stumble upon edge cases, we can decide which to cover and which to dismiss as “acceptable losses”. Even if we only cover 70% of cases, that’s still a big improvement over having to piece every single string together by hand!
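
If such edge cases do show up, one way to make them visible rather than silently wrong is to look at operand types. The sketch below is a variation on the tracer, not the script used in this post. It runs on plain Python records instead of a live IDB, so the Operand and Insn tuples and the trace_xmm0 function are made up for illustration. The three constants carry the same values as IDA's o_reg, o_mem and o_displ.

```python
from collections import namedtuple

# Operand type constants with the same values as IDA's o_reg, o_mem, o_displ.
O_REG, O_MEM, O_DISPL = 1, 2, 4

Operand = namedtuple("Operand", "text type value")
Insn = namedtuple("Insn", "ea mnem ops")

def trace_xmm0(insns):
    """Pair every stack write from xmm0 with the address last loaded into xmm0.

    Returns (chunks, skipped): chunks is a list of (stack offset, source address),
    skipped lists the addresses of instructions the tracer could not resolve.
    """
    src, chunks, skipped = None, [], []
    for insn in insns:
        if not insn.mnem.startswith("mov") or len(insn.ops) < 2:
            continue
        dst, source = insn.ops[0], insn.ops[1]
        if dst.text == "xmm0":
            # Only a direct memory reference gives us a usable source address.
            src = source.value if source.type == O_MEM else None
            if src is None:
                skipped.append(insn.ea)
        elif source.text == "xmm0" and dst.type == O_DISPL:
            if src is None:
                skipped.append(insn.ea)
            else:
                chunks.append((dst.value, src))
    return chunks, skipped

xmm0 = Operand("xmm0", O_REG, 0)
demo = [
    Insn(0x1000, "movaps", (xmm0, Operand("xmmword_404010", O_MEM, 0x404010))),
    Insn(0x1007, "movups", (Operand("[ebp-28D8h]", O_DISPL, -0x28D8), xmm0)),
    Insn(0x100E, "movaps", (xmm0, Operand("xmm1", O_REG, 1))),  # register source
    Insn(0x1011, "movups", (Operand("[ebp-28C8h]", O_DISPL, -0x28C8), xmm0)),
]
chunks, skipped = trace_xmm0(demo)
print(chunks, [hex(ea) for ea in skipped])
```

Inside IDA, the records could be filled from print_insn_mnem, print_operand, get_operand_type and get_operand_value. The skipped list then tells us which instructions to inspect by hand.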

4 — Reassemble in the correct order

We can now follow the code in a given range and catch stack writes and we also know the source for that stack write. But the problem is that the writes happen out of order so if we just write them down as we encounter them, the reassembly will be wrong.

Solving this is easy though. While a stack writing mov instruction likely looks like this in IDA:

movups  [ebp+var_28D8], xmm0

there is still just a number in the background. You can press ‘k’ to toggle the stack variable display and it will turn to this (this is just a visual toggle, the script is not affected by it at all):

movups  xmmword ptr [ebp-28D8h], xmm0

Now that we see the write is just an offset, a number, we can simply record the offset for all writes that use xmm0 as a source, order the data used in those writes based on this number and get back the data in perfect order.

With get_operand_value(addr,op_idx) we can query the “number” used in an operand; for the operand type used here (register plus offset), it returns the offset.
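
One detail to watch when sorting: a negative displacement such as ebp-28D8h may come back as an unsigned two's-complement number (for example 0xFFFFD728 in 32-bit code), depending on the IDA version and database bitness. That is an assumption worth checking in your own setup. A small helper like the one below normalizes the value before sorting; to_signed is a made-up name, and it is only a guess that the num helper used in the script further down does something similar.

```python
def to_signed(value: int, bits: int = 32) -> int:
    """Interpret value as a two's-complement number of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value

raw = [0x10, 0xFFFFD738, 0xFFFFD728]      # hypothetical raw operand values
offsets = sorted(to_signed(v) for v in raw)
print([hex(o) for o in offsets])
```

As long as all offsets are negative, sorting the raw unsigned values gives the same order. Normalizing only matters once positive and negative displacements are mixed, where the raw values would put the positive ones first.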

Putting it together

We extend our code with an array of pairs that records all writes together with their offset, then we sort that array by offset and reassemble the chunks in order:

addr = None
chunks = []
for head in Heads(read_selection_start(), read_selection_end()):
    if print_insn_mnem(head).startswith("mov"):
        if print_operand(head, 0) == "xmm0":
            addr = get_operand_value(head, 1)
        elif print_operand(head, 1) == "xmm0":
            # num() is a helper not shown in this post
            off = num(get_operand_value(head, 0))
            chunks.append((off, addr))  # add the pair to our chunk list

full = b""
for (off, addr) in sorted(chunks, key=lambda p: p[0]):  # sort array by off
    chunk = get_bytes(addr, 16)  # fetch 16 bytes from the stored addr
    full += chunk                # and add it to the final blob
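
The sort-and-join step can be tried outside IDA with a stand-in for get_bytes. The sketch below is illustrative only: the reassemble function, the fake memory contents and the addresses are invented. It also adds a small sanity check that is not part of the original script: if two neighbouring offsets are not exactly 16 bytes apart, a chunk was probably missed in the selection.

```python
CHUNK = 16

def reassemble(chunks, read_bytes):
    """Sort (offset, source address) pairs by offset and join the 16-byte chunks.

    read_bytes(addr, size) stands in for IDA's get_bytes.
    Returns (blob, gaps); gaps holds the offsets of chunks that do not
    directly follow their predecessor.
    """
    ordered = sorted(chunks, key=lambda p: p[0])
    gaps = [b[0] for a, b in zip(ordered, ordered[1:]) if b[0] - a[0] != CHUNK]
    blob = b"".join(read_bytes(addr, CHUNK) for _, addr in ordered)
    return blob, gaps

# Fake data section: three 16-byte constants at made-up addresses.
memory = {
    0x404010: b"second 16 bytes!",
    0x404020: b"first 16 bytes!!",
    0x404030: b"third 16 bytes!!",
}
read = lambda addr, size: memory[addr][:size]

# Writes in the order they appear in the code, i.e. out of order.
writes = [(-0x28C8, 0x404010), (-0x28D8, 0x404020), (-0x28B8, 0x404030)]

blob, gaps = reassemble(writes, read)
print(blob, gaps)
```

A non-empty gaps list is a hint to widen the selection before trusting the decrypted output.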

Now we just need to add a little bit of code to make it usable. During analysis we noticed the AES key is hardcoded, so it’s easy to add AES decryption on full, but we still need to make calling the code convenient.

To do so, we put everything into a function and bind that function to a hotkey. A personal preference is binding scripts to the number ‘2’ key, so the final script looks like this:

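import idaapi
from idautils import Heads
from idc import (get_bytes, get_operand_value, print_insn_mnem,
                 print_operand, read_selection_start, read_selection_end)

# num(), decrypt_blob() and key are not shown in this post: num() presumably
# normalizes the operand value to an integer offset, decrypt_blob() applies
# the sample's AES decryption with the hardcoded key.

def follow_flow():
    addr = None
    chunks = []
    for head in Heads(read_selection_start(), read_selection_end()):
        if print_insn_mnem(head).startswith("mov"):
            if print_operand(head, 0) == "xmm0":      # load: remember the source address
                addr = get_operand_value(head, 1)
            elif print_operand(head, 1) == "xmm0":    # store: record offset and source
                off = num(get_operand_value(head, 0))
                chunks.append((off, addr))

    full = b""
    for (off, addr) in sorted(chunks, key=lambda p: p[0]):
        chunk = get_bytes(addr, 16)
        full += chunk

    print(decrypt_blob(full, key))

idaapi.add_hotkey("2", follow_flow)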

Now we run this script in IDA once so the hotkey is assigned, and from then on we can simply mark any location we identify as a stack string reassembly, press ‘2’ and our script automagically prints the decrypted string:

Decrypting two stack strings with two key presses

Together with the function we identified in step 1) above, we can quickly jump between all stack string locations by hand, mark the area, press ‘2’, find out what string is being reassembled and use this as an entry point for further analysis.

Summary

Malware often tries to make itself hard to analyze, in various ways. One such way is “stack strings”, a code obfuscation technique where static strings are replaced by code that creates strings at run time on the stack.

In our sample a special flavor of stack strings was used, which allowed us to write a reasonably simple script to semi-automate the task of deobfuscation and thus mitigate the impact of the obfuscation.

We are DCSO, the Berlin-based German cybersecurity company. On this blog, we share our technical research.