Basic Introduction to Unidbg
Introduction
How can we analyze binary files better and faster? This is a question that plagues every reverse engineer, and over the past two or three decades countless projects have emerged to answer it. If one remains merely a script kiddie, exhausted from learning a variety of tools without understanding their principles and the direction in which they are developing, one will ultimately achieve nothing.
We need to understand the reasons these tools were created, why they became popular, and what the next form or even better tools will look like.
Overview
Unidbg is a lightweight emulator open-sourced by zhkl0228 in early 2019. It supports the emulated execution of Android Native functions. A few months after its release, it was extended in an attempt to also support emulating iOS Native functions. To date, Unidbg is more complete and usable for Android Native than for iOS, so our discussion will focus primarily on Android.
It is a Java project built with Maven. Readers are advised to download the source code from GitHub and open it in an IDE such as IDEA or VS Code. Once the dependencies have been downloaded and the project is set up, run the test file unidbg-android/src/test/java/com/sun/jna/JniDispatch32.java. If the environment is configured correctly, you should see results like the following.
……………………………………………………………………………………(omission)…………………………………………………………………………………………………………………………
JNIEnv->SetByteArrayRegion([B@fdefd3f, 0, 32, RX@0x40010af0[libjnidispatch.so]0x10af0) was called from RX@0x400044d4[libjnidispatch.so]0x44d4
JNIEnv->NewByteArray(4) was called from RX@0x400043e0[libjnidispatch.so]0x43e0
JNIEnv->SetByteArrayRegion([B@a4102b8, 0, 4, RX@0x4000fa80[libjnidispatch.so]0xfa80) was called from RX@0x40004408[libjnidispatch.so]0x4408
JNIEnv->NewObject(class java/lang/String, <init>([B@a4102b8) => "utf8") was called from RX@0x4000442c[libjnidispatch.so]0x442c
JNIEnv->NewObject(class java/lang/String, <init>([B@fdefd3f, "utf8") => "1a6047467b59e8748f975e03016ce3d9") was called from RX@0x40004514[libjnidispatch.so]0x4514
getAPIChecksum checksum=1a6047467b59e8748f975e03016ce3d9, offset=106ms
Find native function Java_com_sun_jna_Native_sizeof => RX@0x40007b50[libjnidispatch.so]0x7b50
sizeof POINTER_SIZE=8, offset=107ms
destroy
History of Tool Development
Let’s recall the most handy and popular tools in the Android Native scenario. Among conventional tools, the most commonly used is IDA, which we rely on for disassembly and decompilation.
Disassembly is the process of converting the machine code at the top of the diagram into the assembly code on the lower left, while decompilation is transforming the machine code or assembly code into the C pseudocode on the lower right. The decompilation quality of IDA is very good, and many reverse engineers rely on its F5 function for their work.
Some friends prefer Binary Ninja or Ghidra, and with good reason. Price-wise, a genuine IDA license is very expensive, while Binary Ninja is much cheaper and Ghidra is free. Beyond price, the user experience and capabilities of Ghidra and Binary Ninja are also commendable. Ghidra, open-sourced by the NSA, is highly polished and has received broad support from the open-source community; its decompilation quality nearly rivals IDA’s. Binary Ninja, with its attractive UI and excellent control-flow and data-flow analysis, handles complex instructions that IDA may struggle with.
Disassembly and decompilation tools provide static analysis capabilities. In addition to static analysis, we also need dynamic analysis, that is, observing the real execution of the program. The most classic and universal dynamic analysis tool is the debugger, such as GDB, LLDB, IDA Debugger, etc. Debuggers offer classic functionalities like breakpoints, step-by-step debugging, and viewing and modifying registers and memory. The implementation of debuggers is based on the interrupt and exception mechanisms provided by CPUs, such as breakpoint instructions, step mode, etc. As the most classic dynamic analysis tool, the debugger has two pain points. First, because it is classic and commonly used, the means to detect it are particularly mature, and almost all applications have implemented logic to detect and prevent the use of debuggers. Second, as debuggers are based on the exception mechanism, they are resource-intensive; anyone who has used a debugger to trace knows it can be incredibly slow.
Apart from debuggers, the other major category of dynamic analysis tools is hooking. In the Android Native analysis scenario, the most commonly used techniques are inline hooking and PLT hooking: the former is represented by Frida, the latter by xHook. Many people like using Frida to hook native code because resources are plentiful and Frida’s JS bindings are genuinely convenient. Note, however, that more and more applications are beginning to detect Frida, just as they routinely detected debuggers in the past.
Debuggers are mainly used for algorithm analysis, while hooks serve both algorithm analysis and observation/monitoring. How do we distinguish the two? Algorithm analysis is a “fine-grained” concern: it cares about the entire process by which a specific piece of data is generated. Monitoring and observation are “broad” concerns: they focus on the program’s access to critical system APIs, such as JNI, file access, system calls, library functions, encryption and decryption functions, and so on.
Sometimes we modify and customize the Android system for better monitoring and observation, which we refer to as a sandbox environment. The most common approaches include modifying the kernel’s system call table to monitor and intercept system calls, and altering Android Framework code to monitor the invocation of JNI functions, the loading of SOs, and the use of standard encryption and decryption algorithms, among others.
In recent years, eBPF has become very popular. It is a backdoor and observation interface provided by the Linux kernel, allowing us to observe the Android system in a non-intrusive, elegant, and flexible manner. This enables the monitoring and interception of critical APIs such as JNI, system calls, and network flows.
Let’s summarize. Tools like IDA, which offer disassembly and decompilation, enable static analysis for algorithm analysis and monitoring observation; debuggers are traditional, universal tools for dynamic analysis; Hook is the most flexible and widely used dynamic analysis technique, extensively applied in both algorithm analysis and monitoring observation; modifying/customizing the system, as well as using eBPF mechanisms, etc., achieve comprehensive and thorough monitoring and observation from a system perspective, and are among the hottest topics in Android reverse engineering in 2022.
Each technological direction and its specific tools, whether static or dynamic, help us better understand the behaviors or algorithms in binary programs. So, where does Unidbg fit in, and what are its advantages?
Overview of the Emulator
Unidbg falls under the category of emulators. To discuss Unidbg’s positioning and advantages, we need a two-step approach. First is to discuss the positioning and advantages of emulators in general, and second is to discuss what advantages Unidbg has over other emulator projects. Clarifying these issues is not easy, so I’ll divide the content into three subsections. This section discusses emulators.
Emulators can be simply divided into CPU emulators and operating system emulators. Let’s first talk about CPU emulators.
The CPU is the core computing module of a computer, with a straightforward responsibility: executing instructions. In specific terms, this process is divided into three steps: fetch, decode, and execute.
Below is an assembly snippet of an ARM program in IDA:
.text:0000D506 8A 42 CMP R2, R1
.text:0000D508 E8 D1 BNE loc_D4DC
.text:0000D50A 4C F2 E3 20 MOVW R0, #0xC2E3
.text:0000D50E 4E F6 37 25 MOVW R5, #0xEA37
.text:0000D512 C3 F2 40 60 MOVT R0, #0x3640
.text:0000D516 19 1A SUBS R1, R3, R0
.text:0000D518 01 39 SUBS R1, #1
.text:0000D51A 48 F2 41 02 MOVW R2, #0x8041
.text:0000D51E 08 44 ADD R0, R1
.text:0000D520 4E F6 36 21 MOVW R1, #0xEA36
.text:0000D524 44 F2 44 13 MOVW R3, #0x4144
.text:0000D528 4D F6 B2 76 MOVW R6, #0xDFB2
.text:0000D52C C6 F2 5F 25 MOVT R5, #0x625F
.text:0000D530 0C 90 STR R0, [SP,#0x58+var_28]
.text:0000D532 C6 F2 5F 21 MOVT R1, #0x625F
.text:0000D536 CE F6 4B 22 MOVT R2, #0xEA4B
.text:0000D53A C4 F6 82 73 MOVT R3, #0x4F82
.text:0000D53E C7 F2 59 56 MOVT R6, #0x7559
.text:0000D542 6F F0 01 0C MOV R12, #0xFFFFFFFE
.text:0000D546 28 46 MOV R0, R5
.text:0000D548 1B E0 B loc_D582
The CPU receives a byte stream similar to what is described below:
00000000 fc 6f ba a9 fa 67 01 a9 f8 5f 02 a9 f6 57 03 a9 |üoº©úg.©ø_.©öW.©|
00000010 f4 4f 04 a9 fd 7b 05 a9 fd 43 01 91 ff c3 06 d1 |ôO.©ý{.©ýC..ÿÃ.Ñ|
00000020 48 d0 3b d5 e1 23 01 a9 08 15 40 f9 e9 e3 01 91 |HÐ;Õá#.©..@ùéã..|
00000030 ea c3 01 91 f4 03 00 aa a8 83 1a f8 e9 2b 03 a9 |êÃ..ô..ª¨..øé+.©|
00000040 e8 1b 40 f9 ff 3f 00 f9 e8 1f 40 f9 55 fe ff 97 |è.@ùÿ?.ùè.@ùUþÿ.|
00000050 e1 03 00 aa e2 c3 01 91 e3 e3 01 91 e0 03 14 aa |á..ªâÃ..ãã..à..ª|
00000060 ad 0b 00 94 59 f5 93 52 9a 17 8c 52 3b 1e 81 52 |....Yõ.R...R;..R|
00000070 8a 11 98 52 34 c2 80 52 2b 40 96 52 b3 17 8c 52 |...R4Â.R+@.R³..R|
00000080 18 40 96 52 36 17 88 52 f7 43 84 52 dc e6 88 52 |.@.R6..R÷C.RÜæ.R|
00000090 35 3c 85 52 48 f5 93 52 b9 86 b0 72 ba 62 b3 72 |5<.RHõ.R¹.°rºb³r|
000000a0 5b c4 b2 72 ea c5 a2 72 f4 73 b1 72 8b 5d a2 72 |[IJrêÅ¢rôs±r.]¢r|
000000b0 b3 62 b3 72 98 5d a2 72 96 06 ae 72 77 09 bc 72 |³b³r.]¢r..®rw.¼r|
The CPU first fetches a machine code from the memory address pointed to by the PC (Program Counter). In the IDA listing above, for instance, the first instruction is 8A 42. If the instruction set is fixed-length, like ARM64, where each instruction is 4 bytes, fetching is straightforward: always read the four bytes from PC to PC+4. If the instruction set is variable-length, like X86, where an instruction may be 1 byte or up to a dozen bytes, or Thumb2, where an instruction may be 2 or 4 bytes, fetching becomes more complex.
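For Thumb2 in particular, the fetch stage must first examine the leading halfword to learn how many bytes the instruction occupies: per the Thumb2 encoding rules, a halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit instruction, and anything else is a complete 16-bit instruction. A minimal Python sketch (not from any real tool), using two instructions from the listing above:

```python
# Thumb2 fetch sketch: the instruction length must be determined from
# the first (little-endian) halfword before the full instruction can
# be fetched. Top five bits 0b11101/0b11110/0b11111 => 32-bit.

def thumb2_length(code: bytes, pc: int) -> int:
    halfword = int.from_bytes(code[pc:pc + 2], "little")
    return 4 if (halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2

code = bytes([0x8A, 0x42,               # CMP R2, R1       (16-bit)
              0x4C, 0xF2, 0xE3, 0x20])  # MOVW R0, #0xC2E3 (32-bit)
print(thumb2_length(code, 0))  # 2
print(thumb2_length(code, 2))  # 4
```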
After fetching comes decoding. Decoding means translating the machine code and understanding the semantics of those bytes, including the type of instruction and the number and types of its operands. You can think of decoding as disassembly: translating 8A 42 into CMP R2, R1.
Taking ARM64 as an example, each instruction is 4 bytes long, or 32 bits. Some of these bits describe which type of operation the instruction performs, including loading from memory, storing to memory, arithmetic operations like addition, subtraction, multiplication, division, XOR, and AND operations, etc. These bits that define the type of operation are known as the OpCode (Operation Code). The remaining bits describe the parameters needed for this operation, known as operands, which could be registers, immediate values, or memory addresses.
Decoding, or disassembly, involves figuring out what operation and operands are signified by machine code like 4D F6 B2 76 or 8A 42. For example, IDA’s disassembly of our first instruction, 8A 42, shows that its operation code (OpCode) is CMP, with R2 as the first operand and R1 as the second.
The difficulty of decoding depends on the encoding rules of the instruction set. Suppose in an instruction set, the OpCode is always the first byte of the instruction. In that case, decoding becomes relatively simple. For example, if the first byte 0x00 indicates Nop, 0x01 indicates ADD, 0xFE indicates LDR, and 0xFF indicates STR, then during decoding, one only needs to parse the first byte and then interpret the operands according to the corresponding instruction category rules. However, if another instruction set uses several scattered bits in a 32-bit format to represent the OpCode, perhaps for compact arrangement or other reasons, the process becomes more complex, requiring mask and shift operations to determine the OpCode.
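To make the mask-and-shift idea concrete, here is a small Python sketch (not taken from any real tool) that decodes one real ARM64 instruction class, ADD/SUB (immediate): bits 28..23 hold the fixed pattern 100010, bit 31 selects 32-bit vs 64-bit registers, bit 30 selects ADD vs SUB, bits 21..10 carry the 12-bit immediate, bits 9..5 the source register, and bits 4..0 the destination register (for brevity, the flag-setting S bit and the optional immediate shift are ignored).

```python
def decode_add_sub_imm(insn: int) -> str:
    """Decode a 32-bit ARM64 ADD/SUB (immediate) instruction via masks and shifts."""
    if (insn >> 23) & 0x3F != 0b100010:  # bits 28..23: class pattern
        raise ValueError("not an ADD/SUB (immediate) encoding")
    sf    = (insn >> 31) & 1             # 0 = W (32-bit) regs, 1 = X (64-bit) regs
    op    = (insn >> 30) & 1             # 0 = ADD, 1 = SUB
    imm12 = (insn >> 10) & 0xFFF         # 12-bit unsigned immediate
    rn    = (insn >> 5)  & 0x1F          # first operand register
    rd    = insn         & 0x1F          # destination register
    reg = "X" if sf else "W"
    return f"{'SUB' if op else 'ADD'} {reg}{rd}, {reg}{rn}, #{imm12:#x}"

# 0x91000400 is the encoding of ADD X0, X0, #1
print(decode_add_sub_imm(0x91000400))  # ADD X0, X0, #0x1
```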
After decoding, the CPU executes the instruction, a process that may involve accessing and updating registers, memory, immediate values, etc.
The fetch-decode-execute cycle performed by the CPU is a form of hardware support. If we implement these three steps in software, that’s essentially a CPU emulator, which can also be called a virtual machine.
Below is a simple pseudocode structure, continuously simulating and executing instructions.
while (true) {
    instruction = fetch(PC);        // read the machine code at PC
    PC += length(instruction);      // advance the program counter
    opcode = decode(instruction);
    switch (opcode) {
        case ADD:
            doAdd();
            break;
        case LDR:
            doLDR();
            break;
        // ……
    }
}
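To make the loop concrete, here is a tiny runnable emulator for an invented fixed-length ISA (4 bytes per instruction; the opcodes and register file are made up purely for illustration):

```python
# A toy CPU emulator: fetch / decode / execute over an invented
# fixed-length ISA (4 bytes per instruction: opcode, rd, rs, imm).

MOVI, ADD, HALT = 0x01, 0x02, 0x03

def run(code: bytes) -> list:
    regs = [0] * 4   # four general-purpose registers
    pc = 0           # program counter
    while True:
        # Fetch: fixed-length ISA, so always read 4 bytes at PC.
        opcode, rd, rs, imm = code[pc:pc + 4]
        pc += 4
        # Decode + execute: dispatch on the opcode byte.
        if opcode == MOVI:      # rd = imm
            regs[rd] = imm
        elif opcode == ADD:     # rd = rd + rs
            regs[rd] += regs[rs]
        elif opcode == HALT:
            return regs
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")

program = bytes([
    MOVI, 0, 0, 5,   # r0 = 5
    MOVI, 1, 0, 7,   # r1 = 7
    ADD,  0, 1, 0,   # r0 = r0 + r1
    HALT, 0, 0, 0,
])
print(run(program))  # [12, 7, 0, 0]
```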
CPU emulators are widely used in many fields of computer science, such as education and development testing.
In computer science education, teachers use CPU emulators to help students more intuitively understand various instructions. In assignments, students might be tasked with writing a CPU emulator to deepen their understanding of instruction systems. For example, Visual2, an interesting project, was a student assignment at Imperial College London.
Visual2 is written in F# and simulates the ARM instruction set under the ARM32 architecture (excluding Thumb2). It features an attractive front-end interface, and the students even designed floating diagrams for some instructions, as shown below. It is an excellent teaching tool for ARM32 instructions; you can download its installation package, V2releases, for a hands-on experience.
Development and testing are also significant needs. When doing cross-platform assembly development, instruction simulators can assist developers in testing, verifying functionality, and uncovering bugs, such as the simulation debuggers used in embedded development.
If we consider CPU emulators from the perspective of reverse engineering, it must be said that past open-source CPU emulators had many issues. For one, they often supported only a single architecture: popular projects like X86Emu and PyEmu were limited to X86, which is inconvenient in scenarios involving multiple architectures. The other major issue was incomplete simulation. These emulators typically implemented only the few dozen or couple of hundred most common instructions, neglecting the rarer parts of the instruction set, so they would fail on certain instructions when faced with complex samples.
Speed is another significant issue. Real CPUs, over decades, have reached a frighteningly fast level of instruction execution speed thanks to hardware development and technological advancements. Simulating a CPU via software for instruction emulation is significantly slower, sometimes running only a few hundred instructions per second. With complex samples that involve millions of lines of execution, such simulators are practically unusable.
To speed up simulators, the mainstream solution is to shift from simulation to translation. Specifically, this means translating the instructions to be simulated into native instructions of the host machine, then letting the CPU execute them directly. The most widely used technology for this is JIT (Just-In-Time compilation). For example, to execute ARM instructions on an X86 processor, ARM instructions can be translated into X86 instructions for direct execution by the processor.
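The core of the idea can be sketched in a few lines: instead of re-decoding a guest instruction on every execution, decode it once into something the host can run directly and cache the result keyed by PC. In the toy sketch below (an invented two-byte mini-ISA, with Python closures standing in for generated native code), later iterations of the loop skip decoding entirely:

```python
# Sketch of translation + caching, the core idea behind JIT-based
# emulators: decode each guest instruction once into a host-side
# callable, then reuse the cached callable on every later execution.

def translate(opcode: int, operand: int):
    """Decode once; return a host-side handler for this instruction."""
    if opcode == 0x01:                 # ADDI: acc += operand
        return lambda acc: acc + operand
    if opcode == 0x02:                 # MULI: acc *= operand
        return lambda acc: acc * operand
    raise ValueError(f"unknown opcode {opcode:#x}")

def run(code: bytes, steps: int) -> int:
    cache = {}                         # pc -> translated handler
    acc, pc = 0, 0
    for _ in range(steps):
        if pc not in cache:            # translate on first visit only
            cache[pc] = translate(code[pc], code[pc + 1])
        acc = cache[pc](acc)           # later visits skip decoding
        pc = (pc + 2) % len(code)      # wrap around: a simple loop
    return acc

# Program: acc += 3; acc *= 2 — executed twice, mostly from the cache.
print(run(bytes([0x01, 3, 0x02, 2]), steps=4))  # 18
```

Real JIT-based emulators like Qemu go much further (translating whole basic blocks into actual native machine code), but the translate-once-then-cache structure is the same.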
Some readers might wonder, given the many issues with simulators, why do we still use them? The reason is that a normally designed CPU is an unemotional instruction-executing machine; it does not offer any assistance or backdoors for our algorithm analysis. Simulators, however, are different. We design simulators specifically to better observe the process of instruction execution. Therefore, various callbacks, interceptions, and printing mechanisms are provided in simulators to enhance self-observation.
For example, the PyEmu project from a decade ago supported these kinds of self-observation mechanisms.
- Entering custom processing logic after executing each instruction.
- Entering custom processing logic when executing a specific category of instructions (such as jump instructions or memory load instructions).
- Entering custom processing logic for a specific type of instruction (like CMP).
- Entering custom processing logic for a subclass of a certain instruction (such as CMP.W).
- Entering custom processing logic when a particular register is assigned a value.
- Entering custom processing logic when a specific area of memory is accessed.
For instance, here are a few concrete API examples. First, enter the custom my_eax_handler processing logic whenever the eax register is accessed (read or written).
def my_eax_handler(emu, value, type):
    print("[*] Hit my_eax_handler %x: %s (EAX, %x, %s)" % (emu.get_register("EIP"), emu.get_disasm(), value, type))
    return True

emu.set_register_handler("eax", my_eax_handler)
For example, intercept and print when the instruction is CMP.
def my_cmp_handler(emu, op1, op2, op3):
    print("[*] Hit my_cmp_handler %x: %s (%x, %x)" % (emu.get_register("EIP"), emu.get_disasm(), op1, op2))
    return True

emu.set_mnemonic_handler("cmp", my_cmp_handler)
For instance, when a read or write is issued at address 0x44444444, enter the custom my_memory_handler to print the contents.
def my_memory_handler(emu, address, value, size, type):
    print("[*] my_memory_handler(0x%08x, %x, %x, %s)" % (address, value, size, type))
    return True

emu.set_memory_handler(0x44444444, my_memory_handler)
Having briefly discussed CPU simulators, let’s now turn to operating system simulators.
The sole function of a CPU emulator is to execute instructions; it does not comprehend higher-level concepts such as binary file formats like PE, ELF, or MachO. Parsing and loading those formats are the operating system’s job. In addition, a typical operating system provides hundreds of system calls, with functionality ranging from retrieving system information to process/thread management, memory management, file management, and so on. You may be more familiar with the concept of “library functions,” but these, too, ultimately rely on system calls. For example, below is the implementation of openat.
.text:00000000000C1920 __openat ; CODE XREF: open64+B8↑p
.text:00000000000C1920 ; __open_2+28↑j ...
.text:00000000000C1920 ; __unwind {
.text:00000000000C1920 MOV X8, #0x38 ; '8'
.text:00000000000C1924 SVC 0
.text:00000000000C1928 CMN X0, #1,LSL#12
.text:00000000000C192C CINV X0, X0, HI
.text:00000000000C1930 B.HI __set_errno_internal
.text:00000000000C1934 RET
.text:00000000000C1934 ; } // starts at C1920
The SVC (Supervisor Call) instruction is used to initiate a system call. It causes a trap into the kernel, and then the operating system takes over for resource management and scheduling. Typically, a CPU simulator encountering an SVC instruction will either report an error or do nothing and continue executing (just like encountering a NOP instruction).
An operating system emulator needs to be built on top of a CPU emulator. It requires a binary file loader, commonly referred to as a linker, and it must then implement the various system calls that the operating system would normally provide on the corresponding architecture.
To implement these system calls, you need to at least understand their semantics. Some of them can be roughly handled with hard-coded methods, while others require the use of the host machine’s APIs, as well as a considerable amount of logic built to simulate these system calls, which can be a significant workload.
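As a sketch of what that work looks like, an OS emulator typically keeps a table mapping syscall numbers to host-side handlers; when the CPU emulator traps on SVC, the OS layer reads the syscall number from the register the ABI designates (X8 on ARM64 Linux) and dispatches. Note that openat is syscall 56, i.e. 0x38, matching the MOV X8, #0x38 in the listing above. The handler bodies and the fake register file below are invented purely for illustration:

```python
# Sketch of an OS-emulator syscall layer: when the CPU emulator traps
# on SVC, read the syscall number from X8 (ARM64 Linux convention) and
# dispatch to a host-side handler. Real implementations (e.g. Unidbg's)
# must also model argument registers, errno, file descriptors, etc.

def sys_getpid(regs):
    regs["X0"] = 1234                 # hard-coded: a fake, stable PID

def sys_openat(regs):
    # A real implementation would read the path from guest memory,
    # consult a virtual file system, and return a file descriptor.
    regs["X0"] = -1                   # here: pretend the file is absent

SYSCALL_TABLE = {
    56:  sys_openat,                  # __NR_openat on ARM64 is 56 (0x38)
    172: sys_getpid,                  # __NR_getpid on ARM64 is 172
}

def on_svc(regs):
    """Called by the CPU emulator whenever it executes SVC #0."""
    nr = regs["X8"]
    handler = SYSCALL_TABLE.get(nr)
    if handler is None:
        raise NotImplementedError(f"syscall {nr} not simulated")
    handler(regs)

regs = {"X8": 172, "X0": 0}
on_svc(regs)                          # simulate 'SVC 0' with X8 = getpid
print(regs["X0"])                     # 1234
```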
Unicorn
We have discussed CPU emulators and operating system emulators above. Now let’s talk about the Unicorn project, on which operating system emulators such as Unidbg and Qiling are built. Unicorn is a CPU emulator open-sourced in 2015 by a team from Nanyang Technological University in Singapore, and it has since been updated to Unicorn2. Keystone and Capstone are also works of the same team; the three are collectively referred to as the “Three Musketeers,” used respectively for assembly, disassembly, and emulated execution. True to its name, Unicorn’s logo is a unicorn.
Unicorn prides itself as a new generation CPU emulator, considered superior to its predecessors. Its confidence stems from three main aspects:
1. Leveraging Qemu’s Strengths: Qemu is the most famous open-source operating system emulator, and many popular mobile game emulators, like Nox and BlueStacks, depend on it. Unicorn carved the CPU emulation part out of Qemu as a separate project, so it inherits Qemu’s excellent features, such as support for numerous architectures (X86/X64/ARM/ARM64/MIPS, etc.) and JIT compilation that translates guest instructions into native instructions of the host machine for acceleration.
2. Unicorn’s Own Efforts: Simply leveraging Qemu isn’t enough; Unicorn has also made significant contributions. It provides various levels of hooks and interceptions, similar to what was described with PyEmu, which requires considerable effort. Additionally, Unicorn offers bindings in multiple programming languages like Python, Java, and C#, enabling convenient integration into projects in these languages. In contrast, many past projects had limitations, such as X86Emu being an IDA plugin with poor portability, or PyEmu being written in Python 2, making it inconvenient to use.
3. Sound Design Philosophy: Many past CPU simulators, in an attempt to be more practical, also took on the tasks of operating system simulation. This strong coupling of CPU and operating system simulation presented numerous problems. By trying to manage both, these simulators often ended up not doing particularly well in either area. Unicorn, however, focuses solely on CPU emulation. It offers simple interfaces for hooks, memory operations, and instruction execution, without concerning itself with higher-level operating system tasks like binary formats, system calls, or library functions. This approach provides a pure, multi-architecture, and efficient CPU emulation capability.
This focus allows researchers to build upper-layer operating system emulators on top of Unicorn, knowing they have a robust and reliable foundation underneath. Numerous Unicorn-based operating system emulators have emerged in recent years, with Unidbg and Qiling being notable examples.
In summary, Unicorn is an exceptional instruction emulator, more focused and powerful than previous CPU emulators.
Projects Based on Unicorn
There are dozens to hundreds of projects based on Unicorn, so why choose Unidbg? The answer is quite straightforward, boiling down to two main reasons:
1. Focus on Android Native Scenarios: Unidbg is an operating system emulator specifically focused on Android Native scenarios, whereas many other emulators focus on the Windows operating system, like Binee and speakeasy. In fact, the majority of emulators deal with Windows, with a smaller portion focusing on Linux. But what about Android Native? It’s almost neglected. Qiling, although a multi-operating system emulator, does not offer a high degree of simulation in Android Native scenarios, with almost no handling of JNI logic. Unidbg’s real competitors in this area are mainly AndroidNativeEmu and its successor, ExAndroidNativeEmu.
2. Higher Completeness Compared to Competitors: Compared to its competitor, ExAndroidNativeEmu, Unidbg offers better completeness. It simulates more system calls and JNI, indicating a higher level of project completion.
The Benefits of Simulators
We need to discuss two aspects: 1) the advantages of simulators compared to previous solutions, and 2) the strengths and weaknesses in the current development of simulators. Simulators have long been considered more competitive compared to traditional debugging or hooking methods.
First, let’s consider debugging. Traditional debuggers like GDB/LLDB/IDA Debugger had two major issues. Firstly, they were easily detected: the primitives they rely on (software breakpoints, ptrace, debug servers, and so on) have distinctive features, and because debuggers were the conventional solution, detecting those features became standard practice. Secondly, debugger-based solutions were limited in functionality. For instance, monitoring memory reads/writes with watchpoints was challenging, especially on the ARM architecture. Instruction tracing, meanwhile, was extremely costly, because it relied on exceptions and single-stepping; the overhead was so large that tracing a sizable sample was practically infeasible.
The advantages of simulators are evident as they often come with built-in codeHook and memHook, offering powerful, stable, and flexible functionality. Mature tools like Unidbg, for example, provide both a headless debugger built on Unicorn and a well-implemented GDBStub, allowing the use of IDA’s debugging interface as a frontend. Moreover, Unidbg’s traceCode feature can trace millions of instructions per hour, providing an excellent user experience.
Next, let’s discuss hooking, such as Frida-like hooking solutions, which are also easily detectable. In Unidbg, there are third-party hook frameworks like Dobby, similar to Frida, and you can also perform hooking based on Unicorn’s codeHook. This provides a good hooking experience in Unidbg and is less prone to detection. However, overall, the hooking experience in Unidbg is not necessarily better or more powerful than previous tools.
Finally, compared to modifying or customizing systems, or using monitoring mechanisms provided by the system itself (e.g., eBPF), what are the unique features of Unidbg? Let’s delve a bit more into this topic. In the field of dynamic analysis, there are two main schools of thought on how to observe a sample’s behavior.
- The first approach involves making modifications or utilizing internal mechanisms of a complete system or performing full simulation (e.g., with QEMU) to give the system an “observer” capability from top to bottom. Examples of simple modifications include altering ROM to print JNI calls or loading shared objects (SO). More complex modifications could involve altering the system call table to monitor system calls, or using the non-intrusive and elegant eBPF mechanism provided by the kernel to intercept and monitor system calls.
- The second approach focuses on building a locally usable micro-operating system from the ground up, achieving complete control over the runtime environment. Examples of tools in this category include Qiling and Unidbg, which use CPU emulation as their foundation and add loaders, simulate major system calls, implement critical memory management, file management, and thread management. Specific samples are then patched to create an environment with full control. In this approach, system calls, JNI functions, SO loading logic, etc., are all self-implemented or simulated, leaving no areas that cannot be observed.
How to evaluate these two trends? Both approaches have their uses and limitations.
For the first approach, its greatest advantage lies in its generality and adaptability. It can handle any samples and scenarios that the operating system can handle. On the other hand, the second approach is limited in this regard because it simulates an operating system, and emulation is not as comprehensive as the real thing. If the operating system can handle a wide range of situations, various simulators can only handle a few specific scenarios. For example, Unidbg is not suitable for handling game shared objects (SO) and cannot handle entire applications.
The second approach’s main strength is its control over the environment. Think of the two approaches as ships: the first gives you an aircraft carrier, and no matter how many mechanisms you bolt onto it, you never gain complete control; you are merely using its mechanisms and accepting certain agreements and compromises. The small boat you build yourself is different: every plank, every detail is under your control, giving you complete and ultimate control. For example, in Unidbg we can easily specify the base address of a shared object (SO), the location of the stack, how a specific system call is implemented, what operations a particular file access performs, and more.
Currently, the first approach is more popular in research, because eBPF is incredibly powerful, giving us finer and more flexible control over the “aircraft carrier.” It is also superior to intrusive methods, as it uses mechanisms and “backdoors” provided by the system itself.
Many people are using simulators for monitoring and observation, such as using Unidbg to simulate the execution of samples and then observing JNITrace, Syscall Trace, file access, and more. While simulators are valuable for algorithm analysis due to their capabilities based on low-level CPU emulation, their assistance in monitoring and observation comes from simulating the operating system. In the process of simulating the operating system, printing log outputs for internal JNI and Syscall activities is not a difficult task.
However, it’s important to note that simulators are not inherently well-suited for monitoring and observation. Researchers should focus more on observation mechanisms supported by real systems like eBPF. The reason is straightforward: simulators cannot faithfully replicate all system calls, so observations made with simulators may not completely match the actual behavior of samples on real devices, especially for complex samples.
The primary and core function of simulators should always be to assist in algorithm reconstruction. It’s also essential to distinguish between heavyweight operating system simulators and lightweight micro-operating system simulators. Strictly speaking, simulators like Unidbg fall into the category of lightweight, limited operating system simulators. You cannot expect them to run a complete APK, let alone game applications. On the other hand, simulators like Bochs or Qemu are heavyweight or complete operating system simulators, capable of performing almost any task similar to a real operating system.
Heavyweight operating system simulators are more powerful but relatively less flexible, offering fewer analysis and debugging capabilities. Lightweight operating system simulators have more limited functionality but are more flexible and can perform fine-grained and comprehensive analyses. This is a theoretical statement, but in reality, we observe that lightweight operating system simulators often underperform, as they are weaker in terms of capabilities compared to heavyweight counterparts and do not provide extensive functionality.
To clarify further, all projects based on CPU emulation can theoretically offer similar features. The difference lies in which CPU emulator supports more architectures, is more user-friendly, faster, and has a larger ecosystem, with Unicorn being a standout.
Similarly, all projects based on Unicorn can theoretically provide similar features, since they share the same foundation; the key difference is which one simulates more system calls, i.e., which has higher completeness. In the Android Native context, Unidbg excels.
When comparing heavyweight and lightweight operating system simulators, the latter’s characteristic is that they only simulate a limited subset of functions. For example, Unidbg only simulates shared objects (SO) and cannot simulate complete APKs. Theoretically, this means that because they do fewer things, they can do those limited tasks more elegantly and have stronger control. For example, they can perform checks within their limited memory for sensitive content after executing a basic block or a function. Heavyweight operating system simulators cannot do these things because they have to simulate a lot more, leading to larger memory spaces and an inability to frequently “introspect” themselves. Theoretically, lightweight operating system simulators can achieve this, which is a significant advantage, but in practice, we observe that projects like qiling, Unidbg, and others haven’t fully utilized this advantage.
Conclusion
Unidbg is one of the most powerful auxiliary analysis tools in recent years for Android native reverse engineering scenarios. To effectively utilize Unidbg, you should have a certain foundation in the following areas, and the more you master, the better:
1. Basic Analysis Tools:
— Use of IDA Pro for disassembly and analysis.
— Familiarity with Frida for native hooking and function calls.
2. Basic Analysis Strategies:
— Top-down sequential analysis.
— Tracing results back to their origins.
— Breaking through key functions.
— JNI tracing.
— Function tracing, among others.
3. Common Encoding and Algorithms:
— Hash algorithms and schemes: MD5, SHA1, SHA256, CRC32, HMAC, etc.
— Encryption algorithms: AES, RC4, SM4, RSA.
— Encoding and compression algorithms: Base64, Zlib, Protobuf.
4. Programming Languages and APIs:
— Basic knowledge of C/C++ programming.
— Fundamentals of JNI (Java Native Interface) programming.
— Familiarity with C standard library functions.
Having a strong foundation in these areas will be invaluable when using Unidbg for Android native reverse engineering. It will enable you to effectively analyze and understand the inner workings of Android applications and libraries.