After writing a similar payload for Linux/Intel x64, I was curious about how to apply this knowledge to other architectures, so I decided to go with ARM since it’s an interesting and widespread one.
ARM is a Reduced Instruction Set Computing (RISC) processor architecture that is used everywhere these days: mobile phones, smart thermostats, TVs, Wi-Fi dongles, cars, credit cards, you name it.
How does ARM compare against Intel x86?
Here are some key takeaways:
- Because ARM is a RISC architecture, it has a simplified instruction set that is just a fraction of the size of its 32-bit Intel counterpart, the x86.
- While on x86 most instructions are allowed to operate directly on memory, on ARM the data must be moved from memory into registers before being operated on. Most ARM instructions operate only on registers. Only Load/Store instructions can access memory.
- ARM has two main instruction set states: ARM and Thumb. Thumb instructions are 2 bytes long most of the time, while in ARM state instructions are always 4 bytes long. For shellcode writing, Thumb state is the de facto choice as it saves space and avoids a lot of null bytes.
- ARM instructions accept only a limited range of immediate values that can be used directly with a mov instruction. If a number is out of this range, it can’t be used directly and must, therefore, be split into parts and loaded using several operations/values.
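To make the immediate-value restriction concrete, here is a minimal Python sketch (not part of the payload). It assumes the usual encodings: in ARM state an immediate is an 8-bit constant rotated right by an even number of bits, and the 16-bit Thumb mov takes a plain 8-bit constant (0 to 255). The helper names `fits_arm_immediate`, `fits_thumb_mov` and `split_for_thumb` are made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, bits):
    """Rotate a 32-bit value left by `bits` positions."""
    value &= MASK32
    return ((value << bits) | (value >> (32 - bits))) & MASK32


def fits_arm_immediate(value):
    """ARM state: an 8-bit constant rotated right by an even amount.

    Undoing the rotation means rotating left by the same even amount,
    so we try every even rotation and look for a result that fits in 8 bits.
    """
    return any(rol32(value, rot) <= 0xFF for rot in range(0, 32, 2))


def fits_thumb_mov(value):
    """16-bit Thumb mov: a plain 8-bit constant."""
    return 0 <= value <= 0xFF


def split_for_thumb(value):
    """Split a positive value into chunks of at most 255.

    The first chunk would go into a mov, the remaining ones into adds.
    This greedy split is only one of many valid ones.
    """
    if value <= 0:
        raise ValueError("expected a positive value")
    parts = []
    while value > 0:
        chunk = min(value, 0xFF)
        parts.append(chunk)
        value -= chunk
    return parts


if __name__ == "__main__":
    for number in (200, 281, 0xFF000000):
        print(number, fits_arm_immediate(number), fits_thumb_mov(number))
    print(split_for_thumb(281))
```

The greedy split of 281 gives 255 + 26. Any other pair of chunks that sums to 281 works just as well, as long as neither chunk introduces a null byte into the encoded instruction.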
How do I switch to Thumb state?
To switch to Thumb state, we can use the Branch and Exchange instruction (bx) after setting the least significant bit of the destination register to 1. This can be achieved by adding 1 to the Program Counter (pc register) while in ARM state and storing the result in a register.
<some arm code>
// Here we are running in ARM state
add r0, pc, #1
// Increase value of PC by 1 and place the result into r0
bx r0
// Branch & Exchange to the address in r0
// This will make the switch to Thumb state because the LSB of r0 = 1
// From here on we can execute Thumb state instructions!
<some thumb code>
From now on, all the coding will be done in Thumb state since this is the relevant state for writing shellcode.
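The address arithmetic behind the switch can be modelled in a few lines of Python. This is only a sketch: it assumes the standard ARM-state behaviour where reading pc yields the address of the current instruction plus 8, and the load address 0x10000 and the function names are hypothetical.

```python
ARM_PC_READ_OFFSET = 8  # in ARM state, pc reads as current instruction + 8


def r0_after_add(add_address):
    """Value left in r0 by `add r0, pc, #1` located at `add_address`."""
    return add_address + ARM_PC_READ_OFFSET + 1


def bx(target):
    """Model of bx: bit 0 selects the state, the rest is the branch address."""
    state = "Thumb" if target & 1 else "ARM"
    return state, target & ~1


if __name__ == "__main__":
    add_address = 0x10000          # hypothetical address of the add instruction
    bx_address = add_address + 4   # bx r0 is the next 4-byte ARM instruction
    r0 = r0_after_add(add_address)
    state, destination = bx(r0)
    print(hex(r0), state, hex(destination))
    # The destination is the first instruction right after bx r0
    print(destination == bx_address + 4)
```

Because of the +8 offset, the branch lands exactly on the instruction that follows bx r0, which is why the Thumb code can simply be placed right after it.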
Setting up the lab
First of all, we’ll need a lab to run our tests on. Here are some options for it:
1- You can go for the real deal and test the payload on a real Raspberry Pi 1.
2- You can build/run an emulated environment using Qemu. Since I used the Qemu armv6_stretch image from this repo I’d recommend you use the same setup. It pretty much works off-the-shelf, don’t worry.
3- You could download and use the VM provided by Azeria Labs.
Writing the payload
Stage One: General Overview
First of all, what are we trying to achieve here? Our goal is to write shellcode for the Linux 32-bit ARMv6 architecture that will connect back to a remote location over TCP/IPv4 and provide a shell only after the remote client provides a valid password. In order to write the payload, we need to chain several syscalls. The exact order is the following:
1- We create a new socket to manage the new connection.
2- We connect to the target address.
3- We read from the socket and check if the provided password is correct.
4- We redirect each standard stream (stdin, stdout and stderr) to the socket using the dup2 syscall, so the shell’s input and output travel over the connection.
5- We start a shell by using the execve syscall.
Each of these syscalls has a signature we need to address. Certain registers must contain specific values. For example, the r7 register is used to identify the syscall that is executed so it should always contain the syscall number. A whole document containing a full syscall table can be found here.
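As a rough illustration of that register convention, the Python sketch below maps each step of the chain to its syscall number and builds the register contents for one call. Only 281 for socket appears in this article. The other numbers are the ones I know from the ARM EABI syscall table and should be double-checked against the table for your kernel. `prepare_syscall` is a hypothetical helper, not part of the payload.

```python
# ARM EABI syscall numbers (assumption: taken from the standard table;
# only socket = 281 is confirmed by the article itself)
SYSCALL_NUMBERS = {
    "read": 3,
    "execve": 11,
    "dup2": 63,
    "socket": 281,
    "connect": 283,
}

# Order in which the payload chains the calls
PAYLOAD_STEPS = ["socket", "connect", "read", "dup2", "execve"]


def prepare_syscall(name, *args):
    """Return the register contents expected right before `svc`.

    Arguments go into r0, r1, r2, ... and the syscall number into r7.
    """
    if len(args) > 7:
        raise ValueError("at most 7 arguments (r0-r6) can be passed")
    registers = {f"r{index}": arg for index, arg in enumerate(args)}
    registers["r7"] = SYSCALL_NUMBERS[name]
    return registers


if __name__ == "__main__":
    # socket(2, 1, 0), the call used as the example in this article
    print(prepare_syscall("socket", 2, 1, 0))
    # Numbers above 255 do not fit a single Thumb mov and must be split
    print([step for step in PAYLOAD_STEPS if SYSCALL_NUMBERS[step] > 0xFF])
```

Under these assumptions, socket and connect are the two steps whose numbers are too large for a single Thumb mov.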
Stage Two: Writing a Syscall
Let’s see an example of how to write a syscall in ARM Thumb state. We’ll use the socket syscall:
//  socket(2, 1, 0)
02 20 mov r0, #2 // loads immediate value 2 into r0
01 21 mov r1, #1 // loads immediate value 1 into r1
52 40 eor r2, r2 // zeroes out r2 by XORing it with itself
// 281 is out of range for immediate values
// It must be loaded in parts
c8 27 mov r7, #200 // part1: loads immediate value 200 into r7
51 37 add r7, #81 // part2: adds 81 to r7, giving 281 (the socket syscall number)
01 df svc #1 // issues the syscall
Here you can see:
- The r7 register being used to identify the syscall
- Registers r0, r1, and r2 used as parameters for the syscall
- An example of how to deal with immediate values that are out of range
- The use of svc instruction to perform a system call
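To double-check the byte column of the listing, the following Python sketch decodes those twelve bytes back into instructions and verifies that none of them is a null byte. It is a toy decoder that only handles the four 16-bit Thumb encodings used in this snippet (mov and add with an 8-bit immediate, eor between low registers, and svc). The function names are made up for the illustration.

```python
SOCKET_CODE = "02 20 01 21 52 40 c8 27 51 37 01 df"


def halfwords(hex_bytes):
    """Group the byte stream into 16-bit little-endian Thumb halfwords."""
    raw = bytes.fromhex(hex_bytes)
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def decode(halfword):
    """Decode the few Thumb encodings that appear in the socket example."""
    imm8 = halfword & 0xFF
    rd = (halfword >> 8) & 0x7
    if halfword >> 11 == 0b00100:           # 00100 Rd imm8
        return f"mov r{rd}, #{imm8}"
    if halfword >> 11 == 0b00110:           # 00110 Rd imm8
        return f"add r{rd}, #{imm8}"
    if halfword >> 6 == 0b0100000001:       # 010000 0001 Rm Rd
        return f"eor r{halfword & 0x7}, r{(halfword >> 3) & 0x7}"
    if halfword >> 8 == 0xDF:               # 11011111 imm8
        return f"svc #{imm8}"
    return f".hword 0x{halfword:04x}"       # anything else is left undecoded


def has_null_bytes(hex_bytes):
    """Null bytes tend to terminate string-based copies of the shellcode."""
    return 0 in bytes.fromhex(hex_bytes)


if __name__ == "__main__":
    for hw in halfwords(SOCKET_CODE):
        print(f"{hw:04x}  {decode(hw)}")
    print("null bytes:", has_null_bytes(SOCKET_CODE))
```

Note how each pair of bytes appears swapped with respect to the halfword value: Thumb instructions are stored in little-endian order here, so `02 20` is the halfword 0x2002.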
Stage Three: Writing the full payload
Armed with all our knowledge we are now prepared to chain every syscall and put together our payload. The following Gist was extracted from the source code on my main repository:
When testing the payload, take into consideration that it was crafted for Linux 32-bit ARMv6 (the same chip the Raspberry Pi 1 has). Some tweaks may be needed for it to work on other platforms/architectures. In the following video, you can see the whole process of booting up the Qemu armv6 image, assembling the payload and, finally, running the test:
That’s all! Hope you enjoyed this one. The source code can be found on my GitHub repo and Exploit-DB: