Decode shikata ga nai with binary ninja — part 2

Exploring the Unicorn engine and using its emulation ability to decode shikata ga nai automatically

kishou yusa
Feb 9, 2022

In the last part, we explored shikata ga nai and wrote a simple Binary Ninja script that decodes it, provided you already know the key, position and length from the initialising phase. In this part, we will learn about the Unicorn engine and how to use its emulation ability to find the key, position and length in phase 1.

The positions of the random FPU instruction and of fnstenv are random (and sometimes an add 0x4 is placed before the first xor, which makes the code really hard to parse). The registers that store the position and the key are also chosen arbitrarily from the set (eax, ebx, edx, esi, edi, ebp). So we need to determine which register stores the key and which stores the position. We also need to extract the bytecode we want to emulate from the binary.

Get the register and bytecode to emulate

However, the final xor that decodes the next part always has the form:

xor dword [<register_store_pos> + <offset_to_decode_position>], <register_store_key>

So the idea is to let Binary Ninja disassemble from the start of the shikata ga nai round and find the xor instruction that matches the format above. Its operands tell us which register holds the key and which holds the position.

To get the disassembly starting at an address, we use bv.disassembly_tokens(addr). It returns a generator, and each item it yields is a pair: a list of tokens and the length in bytes of the disassembled instruction.
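The pairs do not carry an address, so when you walk the generator you have to track the address yourself by adding each instruction's length. The sketch below shows this. Token and fake_disassembly_tokens are hypothetical stand-ins that let it run without Binary Ninja. In a real session you would iterate bv.disassembly_tokens(addr) instead.

```python
from collections import namedtuple

# Hypothetical stand-in for Binary Ninja's InstructionTextToken (only .text and .value)
Token = namedtuple("Token", ["text", "value"])

def fake_disassembly_tokens(addr):
    """Hypothetical stand-in for bv.disassembly_tokens(addr); addr is ignored here."""
    yield [Token("fcmovnu", 0), Token(" ", 0), Token("st0", 0), Token(", ", 0), Token("st6", 0)], 2
    yield [Token("fnstenv", 0), Token(" ", 0), Token("[esp-0xc]", 0)], 4
    yield [Token("pop", 0), Token(" ", 0), Token("eax", 0)], 1

def walk(gen, start):
    """Return (address, instruction text) for every (tokens, length) pair yielded."""
    listing = []
    addr = start
    for tokens, length in gen:
        listing.append((addr, "".join(t.text for t in tokens)))
        addr += length  # second value of the pair: instruction size in bytes
    return listing

for addr, text in walk(fake_disassembly_tokens(0x1001e0ac), 0x1001e0ac):
    print(hex(addr), text)
```

The real generator keeps disassembling linearly, so loops over it need their own stop condition. That is why the xor search described next breaks out as soon as it finds a match.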

For example, in the binary I provided in part 1, go to address 0x1001e0c1. We will try to parse the xor there.

The key is stored in ebx and the position is stored in eax

Run bv.disassembly_tokens(0x1001e0c1) in the Python console and you will see that it returns a generator object. If you don't know what an iterator is, check out Iterators in Python (GeeksforGeeks). We only want the first instruction, so run next(bv.disassembly_tokens(0x1001e0c1)). It returns the two values described above. (Tip: press the up arrow key to recall the previous command instead of typing it again.)

The result of disassembly_tokens

The right xor instruction must satisfy two conditions: the first token in the list is 'xor', and the list contains at least 10 tokens. Checking the length of the token list is faster and easier to code than running a regex or parsing the instruction text. Below is the code I use to parse the xor tokens. Note that you need to convert each token to its text or value before you can use it as a string or integer:

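def _parse_xor_token(_token):
    # _token is the token list of an instruction such as "xor dword [eax+0x13], ebx":
    # index 4 is the position register, index 6 the offset, index -1 the key register
    offset = _token[6].value
    eip = _token[4].text   # register that holds the FPU instruction address
    key = _token[-1].text
    return key, eip, offset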
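Here is a minimal, self-contained sketch of the two-condition check described above. Token is a hypothetical stand-in for Binary Ninja's InstructionTextToken. The sample token list assumes the 10-token layout that _parse_xor_token indexes into: mnemonic, padding, size, bracket, register, plus, offset, bracket, comma, key register. Compare it with your own console output before relying on it.

```python
from collections import namedtuple

# Hypothetical stand-in for InstructionTextToken: only .text and .value are used
Token = namedtuple("Token", ["text", "value"])

def is_decoder_xor(tokens):
    """True for 'xor dword [reg+off], reg'-shaped token lists:
    at least 10 tokens and 'xor' as the first token."""
    return len(tokens) >= 10 and tokens[0].text == "xor"

# Assumed token layout for "xor dword [eax+0x13], ebx"
decoder = [Token("xor", 0), Token("     ", 0), Token("dword ", 0), Token("[", 0),
           Token("eax", 0), Token("+", 0), Token("0x13", 0x13), Token("]", 0),
           Token(", ", 0), Token("ebx", 0)]
# A register-to-register xor such as "xor ecx, ecx" has far fewer tokens and is rejected
plain = [Token("xor", 0), Token("     ", 0), Token("ecx", 0), Token(", ", 0), Token("ecx", 0)]

print(is_decoder_xor(decoder), is_decoder_xor(plain))  # True False
```

The length test runs before the index so that an empty token list cannot raise an IndexError.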

Next, we need the bytes to feed into the emulator. The second value of each pair returned by bv.disassembly_tokens is the instruction length in bytes, so we can pass it as the length argument of bv.read.

To collect the bytecode, loop over the generator from bv.disassembly_tokens and stop as soon as an instruction matches the xor pattern. For every instruction that doesn't match, use bv.read to read its bytes and append them to the code to emulate. We don't want to emulate the xor itself. The final code returns the bytecode, the initial registers and the offset from the FPU instruction to the decode position.
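Since the author's final version isn't reproduced here, below is a sketch of the same loop under stated assumptions. It is not the author's code. It returns (code, (key_reg, pos_reg, offset)) so that result[1][0..2] lines up with the log_info snippet further down. FakeView and Token are hypothetical stand-ins for a BinaryView and its tokens. max_insns is a safety cap I added, because the real generator never stops on its own.

```python
from collections import namedtuple

# Hypothetical stand-in for Binary Ninja's InstructionTextToken (.text and .value only)
Token = namedtuple("Token", ["text", "value"])

def get_code_to_emulate(bv, start, max_insns=64):
    """Return (code, (key_reg, pos_reg, offset)) for the round starting at `start`,
    or None if no decoder xor is found within max_insns instructions."""
    code = b""
    addr = start
    for count, (tokens, length) in enumerate(bv.disassembly_tokens(start)):
        if count >= max_insns:  # the generator keeps disassembling, so cap the search
            break
        if len(tokens) >= 10 and tokens[0].text == "xor":
            # same indices as _parse_xor_token: key register, position register, offset
            return code, (tokens[-1].text, tokens[4].text, tokens[6].value)
        code += bv.read(addr, length)  # everything before the xor gets emulated
        addr += length
    return None

class FakeView:
    """Hypothetical BinaryView stand-in: raw bytes plus a canned token listing."""
    def __init__(self, base, data, listing):
        self.base, self.data, self.listing = base, data, listing

    def read(self, addr, length):
        offset = addr - self.base
        return self.data[offset:offset + length]

    def disassembly_tokens(self, addr):
        # Simplification: always replays the listing from self.base, ignoring addr
        yield from self.listing

def toks(*texts):
    """Build tokens that carry no numeric value."""
    return [Token(t, 0) for t in texts]

listing = [
    (toks("mov", " ", "ebx", ", ", "0x11223344"), 5),  # bb 44 33 22 11
    (toks("pop", " ", "eax"), 1),                       # 58
    (toks("xor", " ", "dword ", "[", "eax", "+") + [Token("0x13", 0x13)]
     + toks("]", ", ", "ebx"), 3),                      # 31 58 13
]
view = FakeView(0x1001e0ac, bytes.fromhex("bb4433221158" "315813"), listing)
code, (key, pos, offset) = get_code_to_emulate(view, 0x1001e0ac)
print(code.hex(), key, pos, hex(offset))  # bb4433221158 ebx eax 0x13
```

With a real view you would call get_code_to_emulate(bv, 0x1001e0ac) directly, as in the snippet below.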

Here I run the code above against the given malware sample, at the beginning of the shikata ga nai round (0x1001e0ac), in the Snippets editor plugin. The snippet in the image may look different from the code above because it wasn't the final version. In the snippet, I also use log_info to print the results to the log panel. Below is the code I appended to the snippet:

result = get_code_to_emulate(bv, 0x1001e0ac)
log_info(f"The key: {result[1][0]}")
log_info(f"The pos: {result[1][1]}")
log_info(f"The offset: {hex(result[1][2])}")

The result is exactly the information we want.

Now that we know which registers hold the key and the FPU instruction position, we can use Unicorn to emulate the bytecode we just collected and read the values of those registers.

Using the Unicorn engine to get the initial components

The Unicorn engine is a CPU emulator that supports multiple architectures (x86, ARM, MIPS, …). It is easy to use, and powerful enough to emulate a piece of code from a binary without running the binary. The Unicorn engine documentation has a simple Python example.

As the example shows, the workflow is straightforward: first you initialise the emulator with registers and a stack. For the initialisation phase, I use the code from this plugin: nao/eliminate.py at master · tkmru/nao · GitHub.

After the emulator has been initialised, we emulate the code up to, but not including, the xor instruction. This gives us the values of the key and the FPU instruction position.

The str_to_unicorn_reg function converts a register name string into the corresponding Unicorn register constant.

Inside the emulate function, the call to emu.mem_write at line 71 writes the bytecode we want to emulate into memory. The next important call, emu.emu_start, starts emulation at the address where we wrote the code and runs until the end of the bytecode. After that, we call emu.reg_read to read the registers, and add the offset to the FPU instruction position to get the desired decode location. The rest of the emulate function only handles initialisation.
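For readers without the full script, here is a minimal sketch of that emulate step. It assumes the unicorn Python package and is not the author's code. page_range, emulate and the stack address 0x7FF00000 are my own illustrative choices. Register names are resolved with getattr on unicorn.x86_const, standing in for str_to_unicorn_reg. The code is mapped at its original address so that the FPU instruction address stored by fnstenv is the real one. This assumes your Unicorn build records that address on fnstenv, which the whole technique depends on. The code range must also not overlap the chosen stack range.

```python
def page_range(addr, length, page=0x1000):
    """Page-aligned (base, size) covering [addr, addr + length); Unicorn maps whole pages."""
    base = addr & ~(page - 1)
    size = (addr + length - base + page - 1) & ~(page - 1)
    return base, size

def emulate(code, addr, key_reg, pos_reg, offset, stack_base=0x7FF00000, stack_size=0x10000):
    """Run `code` at its real address `addr`; return (key, decode_address)."""
    # Imported lazily so page_range stays usable without unicorn installed
    from unicorn import Uc, UC_ARCH_X86, UC_MODE_32
    from unicorn import x86_const

    emu = Uc(UC_ARCH_X86, UC_MODE_32)
    base, size = page_range(addr, len(code))
    emu.mem_map(base, size)                # code lives at its original address
    emu.mem_write(addr, code)
    emu.mem_map(stack_base, stack_size)    # scratch stack for fnstenv [esp-0xc] and pop
    emu.reg_write(x86_const.UC_X86_REG_ESP, stack_base + stack_size // 2)
    emu.emu_start(addr, addr + len(code))  # stop right before the decoder xor

    def reg(name):  # e.g. "ebx" -> x86_const.UC_X86_REG_EBX
        return getattr(x86_const, "UC_X86_REG_" + name.upper())

    key = emu.reg_read(reg(key_reg))
    pos = emu.reg_read(reg(pos_reg))
    return key, (pos + offset) & 0xFFFFFFFF
```

Combined with the collection step, a call would look like emulate(code, 0x1001e0ac, key_reg, pos_reg, offset), with the three register and offset values taken from the parsed xor.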

Run Emulate.py and you should get the following output:

In this example I haven't added the counter, but you can try it yourself: the counter is always in register ecx.
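If you do read ecx, the sketch below shows one way to turn it into the decode length that the part 1 script needs. It assumes the usual shikata ga nai loop, where each iteration xors one 4-byte dword, so ecx counts dwords rather than bytes. decode_range is a hypothetical helper name, and the example address and counter are made-up values.

```python
def decode_range(decode_addr, counter, block_size=4):
    """Return (start, end, length) of the bytes the decoder loop covers,
    assuming one block_size-byte xor per iteration (ecx = iteration count)."""
    length = counter * block_size
    return decode_addr, decode_addr + length, length

start, end, length = decode_range(0x1001e0d4, 0x12)
print(hex(start), hex(end), length)  # 0x1001e0d4 0x1001e11c 72
```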

With that, you have all the components needed to decode shikata ga nai in phase 2.

But how can we automate the process for the next shikata ga nai round? What we are dealing with is like a Russian doll, layer after layer.

A Russian doll has many layers that keep the secret inside safe, much like the shikata ga nai scheme

Automatically decoding multiple rounds of shikata ga nai

The idea is that each shikata ga nai round consists of two phases, and each phase is represented as a basic block.

So after we have decoded one layer, we can find where the next one begins by going to the basic block that follows the loop instruction. Binary Ninja lets you do that through bv.get_next_basic_block_start_after, which returns the start address of the next basic block after the address you pass in.
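The Russian-doll loop can be sketched independently of Binary Ninja by passing the three operations in as callables. The names here are hypothetical. In the plugin, next_block_start would be bv.get_next_basic_block_start_after and is_shikata_round would be the fnstenv check described below. max_rounds is a safety cap I added, not part of the original description.

```python
def deobfuscate_all(start, decode_round, next_block_start, is_shikata_round, max_rounds=1000):
    """Peel shikata ga nai layers one after another.

    decode_round(addr)     -> address of the round's loop instruction, after decoding it
    next_block_start(addr) -> start of the basic block following addr
    is_shikata_round(addr) -> True if another round begins at addr
    Returns (address where peeling stopped, number of rounds decoded).
    """
    addr, rounds = start, 0
    while rounds < max_rounds and is_shikata_round(addr):
        loop_addr = decode_round(addr)      # emulate + xor-decode this layer
        addr = next_block_start(loop_addr)  # the next layer starts after the loop
        rounds += 1
    return addr, rounds

# Toy example: three nested rounds, each loop instruction 0x10 bytes before the next round
loops = {0x100: 0x1F0, 0x200: 0x2F0, 0x300: 0x3F0}  # round start -> loop address
final, n = deobfuscate_all(0x100, loops.__getitem__, lambda a: a + 0x10, lambda a: a in loops)
print(hex(final), n)  # 0x400 3
```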

But how do we know when to stop? The answer is simple: check the next few instructions after the loop instruction for an fnstenv instruction. If there is none, there is a high chance that shikata ga nai has been fully deobfuscated.

A more reliable way is to check the characteristics of the initialising block, for example whether it contains a random FPU instruction and an fnstenv with a pop instruction near it. But that check is complicated, and just looking at the next 8 instructions from the beginning of the scheme already tells us, with more than 99% certainty, whether the code is deobfuscated. (If you don't believe me, gather some shikata ga nai samples and plot the distribution of the distance from the beginning of the scheme to the fnstenv instruction.)
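The 8-instruction heuristic fits in a few lines. first_mnemonics takes any stream of (tokens, length) pairs, such as bv.disassembly_tokens(addr), and keeps the first token of each instruction. Token is a hypothetical stand-in for Binary Ninja's token class so the sketch runs on its own.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for Binary Ninja's InstructionTextToken
Token = namedtuple("Token", ["text", "value"])

def first_mnemonics(token_stream, count=8):
    """Mnemonics (first token text) of the first `count` instructions in a stream
    of (tokens, length) pairs."""
    return [tokens[0].text for tokens, _length in islice(token_stream, count) if tokens]

def looks_like_shikata_round(mnemonics, window=8):
    """Heuristic from the text: another round starts here if fnstenv appears
    within the first `window` instructions."""
    return "fnstenv" in mnemonics[:window]

stream = iter([([Token("fldz", 0)], 2), ([Token("mov", 0)], 5),
               ([Token("fnstenv", 0), Token(" ", 0), Token("[esp-0xc]", 0)], 4)])
print(looks_like_shikata_round(first_mnemonics(stream)))  # True
```

islice matters here because the real generator never ends on its own.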

Writing a plugin

Let's combine everything we have learned to finally write a plugin that deobfuscates shikata ga nai.

First we need to register a plugin. Here I use PluginCommand.register_for_address(<plugin_name>, <description>, main_function). This registers main_function as a Binary Ninja plugin command. When I right-click an address, I can run the plugin, and Binary Ninja calls main_function with the BinaryView and the address I right-clicked.

I have registered my plugin DeShikata to start decoding shikata ga nai at the address I right-click

We want the plugin to run in the background, so we create a class that inherits from Binary Ninja's BackgroundTaskThread class. The class has to contain an __init__ method and a run method. Here is an example:

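from binaryninja import BackgroundTaskThread

class RunInBackground(BackgroundTaskThread):
    def __init__(self, bv, addr, msg):
        # msg is the progress text shown at the bottom left; True makes the task cancellable
        BackgroundTaskThread.__init__(self, msg, True)
        self.bv = bv
        self.addr = addr

    def run(self):
        # DeobfuscateShikataGaNai is the plugin's own class (see the GitHub repository)
        deshikata = DeobfuscateShikataGaNai(self.bv, self.addr)
        deshikata.run_deobfuscate()

def main(bv, address):
    s = RunInBackground(bv, address, "Deobfuscate shikata ga nai")
    s.start()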

Inside __init__, we have to call the parent class's __init__. Besides self, it takes two arguments. The first is the initial progress text (our msg), which is displayed at the bottom left corner while the plugin is running. The second says whether the task can be cancelled while it is running. For more detail, see the Binary Ninja API documentation: binaryninja.plugin.BackgroundTaskThread (Binary Ninja API Documentation v2.4).

The run method is responsible for running our deobfuscation functions in the background. Calling s.start() invokes run as a background task.

Each time we modify the binary, we need to tell Binary Ninja to reanalyse it and then wait until the analysis is done. Binary Ninja provides bv.update_analysis_and_wait for exactly this: whenever we change something in the binary, we reanalyse and wait for the analysis to finish before reading the next instruction.
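A minimal sketch of that write-then-reanalyse pattern follows, assuming it runs inside the background task's run method. patch_and_reanalyse is a hypothetical helper name, and RecordingView is a stand-in that records the calls so the sketch runs without Binary Ninja. BinaryView.write returns the number of bytes written, which the helper checks.

```python
def patch_and_reanalyse(bv, addr, data):
    """Write decoded bytes, then block until Binary Ninja has re-run analysis."""
    written = bv.write(addr, data)  # returns the number of bytes actually written
    if written != len(data):
        raise RuntimeError(f"only wrote {written} of {len(data)} bytes at {addr:#x}")
    bv.update_analysis_and_wait()   # analysis now reflects the patch
    return written

class RecordingView:
    """Hypothetical BinaryView stand-in that records the calls made on it."""
    def __init__(self):
        self.calls = []

    def write(self, addr, data):
        self.calls.append(("write", addr, data))
        return len(data)

    def update_analysis_and_wait(self):
        self.calls.append(("update",))

view = RecordingView()
patch_and_reanalyse(view, 0x1001e0d4, b"\x90\x90")
print(view.calls)
```

Reanalysing after every patch keeps the next bv.disassembly_tokens call consistent with the bytes just written.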

Don't ask me why the plugin has to run as a background task, because I don't know either. When I tried to run the plugin outside a background task, I couldn't even call bv.update_analysis_and_wait. And if I don't call update_analysis_and_wait, weird things happen. My guess is that when the plugin wasn't running as a background task, Binary Ninja also started a thread that automatically reanalysed after each change I made, and because the analysis couldn't keep up with the changes, all hell broke loose. But I still don't know why I can't call bv.update_analysis_and_wait. Maybe some readers can give a proper answer.

The final plugin can be found on my GitHub: acheron2302/Binary-ninja-plugin-collection (A binary ninja plugin to decode shikata ga nai)

The result from my plugin:

Conclusion

Across these two parts, I have shown you how to go from a simple script for quick use to a full-fledged plugin for general use. Along the way, we learned some basic concepts of the Binary Ninja API and the Unicorn engine.

Here I only used simple functions from Binary Ninja that you will use daily, like bv.read, bv.write, Transform or bv.disassembly_tokens, but there are many more functions and classes to play with, such as the BasicBlock and Function classes.

What I have covered is just the tip of the iceberg. There is much more to discover in Binary Ninja, like its intermediate languages, SSA form, workflows, architecture hooks and so on.

Try messing around with Binary Ninja: it is really fun and easy to use, and the Slack channel offers great support. I recommend the series F’ing Around with Binary Ninja (YouTube) for some advanced Binary Ninja techniques.

Reference

tkmru/nao: Simple No-meaning Assembly Omitter for IDA Pro (This is just a prototype) (github.com)

Unicorn: The Ultimate CPU emulator (unicorn-engine.org)

Binary Ninja Python API Documentation (Binary Ninja API Documentation v3.0)

Vector35/snippets: plugin for storing and using snippets of useful Binja script (github.com)

Vector35/OpaquePredicatePatcher (github.com)
