How I wasted a whole day trolling with assembly
So Sindre Sorhus and I were exchanging troll-ish code back and forth the other day in competitive fashion. He mentioned I should make a post about it. Here we are.
At first, it was my seemingly un-compilable C++ snippet.
The secret? C++11 and onward apparently allows non-ASCII characters in identifiers, as long as they aren’t the first or last.
So I (ab)used the UNICODE ZERO WIDTH SPACE (U+200B) character and sprinkled a few of them in there. The different lol's are really different identifiers.
Neat! And confusing.
Sindre retaliated. Things were heating up.
What’s going on here? This nonsense:
- +[] is the equivalent of 0
- Prefixing the above with ! is the equivalent of !0, or simply true
- The first array that is created is simply !+[] repeated three times and summed, giving 3. 3 is then the first element of an array, yielding [3]
- That array is added to another array with the value of +[], which we determined yields 0. The second array is thus [0]
- When you add two arrays, they are first coerced to strings ("3" and "0", respectively) and the resulting strings are concatenated.
Thus finally yielding "30".
He then went on to show me up even more.
For those of you that don’t want to wait for your terminal to paste all 24206 characters into Node, it simply outputs
I had to pull out all the stops. I went deep into my bag of tricks and recalled Dr. Tom Murphy VII’s SIGBOVIK entry: a human-readable, fully printable text file that was also a fully functioning DOS executable (no compilation needed).
That’s to say, his human readable text file consisted of just the right selection of bytes that corresponded to just the right machine code instructions to run a program. All of those bytes are considered printable (
I decided to do the same.
I hadn’t fully read Murphy’s paper, and I considered that half of the fun.
JSON.stringify(.slice.call(document.querySelectorAll('table.ref_table tbody tr')).map(row => [row.children.innerHTML, row.children?row.children.innerHTML:null, row.children?row.children.innerHTML:null]))
and produced a JSON string of just the opcodes and their descriptions, pulled that into a local Node REPL and used a quick filter based on whether the opcode’s corresponding byte was printable.
I realized two things: opcodes aren’t necessarily the only bytes in the instruction (the bad news), but there are a lot of things you can do with printable opcodes (the good news).
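The filtering step itself is tiny. The post did it in a Node REPL; the sketch below does the same thing in C, assuming "printable" means the ASCII range 0x20 through 0x7e. The `sample_opcodes` table is a hypothetical hand-picked subset built from instructions that show up in this post, not the scraped reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a tiny, hand-picked opcode table (illustrative only). */
struct opcode {
    uint8_t byte;
    const char *mnemonic;
};

/* Assumption: printable means ASCII 0x20 (space) through 0x7e (~). */
static bool is_printable(uint8_t b) {
    return b >= 0x20 && b <= 0x7e;
}

/* Copies the printable entries of `in` into `out`; returns how many were kept. */
static size_t filter_printable(const struct opcode *in, size_t n, struct opcode *out) {
    assert(in != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_printable(in[i].byte)) {
            out[kept++] = in[i];
        }
    }
    return kept;
}

/* Opcode bytes taken from instructions mentioned in this post. */
static const struct opcode sample_opcodes[] = {
    {0x50, "push rax"},       {0x58, "pop rax"},  {0x2d, "sub eax, imm32"},
    {0x31, "xor r/m32, r32"}, {0x79, "jns rel8"}, {0xc3, "ret"},
    {0xff, "jmp r/m64"},
};
```

Only the opcode byte is checked here, which is exactly the "bad news" above: operand bytes have to be printable too.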
Needing to test these different instructions, I set up a quick little command line tool that would show me the resulting machine code bytes along with their printable text, just so I could experiment.
$ (fasm /dev/stdin /dev/stderr > /dev/null) 2>&1 <<< "use64^Mpush rax" | xxd
00000000: 50 P
Beautiful. The first test was promising; pushing the value in the RAX register onto the stack was a single printable byte, P. This was going better than I expected.
I experimented with a few more instructions (use64^M was removed from fasm's input string for brevity):
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "push rdi" | xxd
00000000: 57 W
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "push rcx" | xxd
00000000: 51 Q
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "pop rax" | xxd
00000000: 58 X
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "pop rcx" | xxd
00000000: 59 Y
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "pop rdi" | xxd
00000000: 5f _
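The pattern behind those results is regular: for the eight legacy registers, the one-byte push encoding is 0x50 plus the register number and the pop encoding is 0x58 plus the register number. A minimal sketch (the enum and function names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as they appear in x86 instruction encodings. */
enum reg { RAX = 0, RCX = 1, RDX = 2, RBX = 3, RSP = 4, RBP = 5, RSI = 6, RDI = 7 };

/* One-byte `push r64`: 0x50 + register number. */
static uint8_t encode_push(enum reg r) {
    assert((int)r >= 0 && (int)r <= 7);
    return (uint8_t)(0x50 + (int)r);
}

/* One-byte `pop r64`: 0x58 + register number. */
static uint8_t encode_pop(enum reg r) {
    assert((int)r >= 0 && (int)r <= 7);
    return (uint8_t)(0x58 + (int)r);
}
```

That puts every push in the range P through W and every pop in X through _, all printable.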
Too easy. After about 3 hours of toying around with different instructions, I ended up finding a few that were quite useful:
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "sub eax, dword 0x55555555" | xxd
00000000: 2d55 5555 55 -UUUU
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "xor [eax], edi" | xxd
00000000: 6731 38 g18
$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "jns 113" | xxd
00000000: 796f yo
Yes, that’s right. A conditional short jump (relative to the current position) whose immediate (aka “literal”) operand fits within the bounds of a signed 8-bit integer (in this case, a target of 113) yields the two printable bytes yo.
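One detail worth spelling out: the byte that follows the 0x79 opcode is a displacement counted from the end of the two-byte instruction, which is why a target of 113 is stored as 0x6f (111, the letter o). A small sketch of that arithmetic, with hypothetical helper names and assuming the instruction sits at a known offset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset execution continues at when a `jns rel8` at `insn_offset` is taken. */
static long jns_target(long insn_offset, uint8_t disp_byte) {
    int8_t disp = (int8_t)disp_byte;   /* the displacement is signed */
    return insn_offset + 2 + disp;     /* relative to the next instruction */
}

/* Encodes `jns target` at `insn_offset`; returns -1 if rel8 cannot reach it. */
static int encode_jns(long insn_offset, long target, uint8_t out[2]) {
    assert(out != NULL);
    long disp = target - (insn_offset + 2);
    if (disp < -128 || disp > 127) {
        return -1;
    }
    out[0] = 0x79;            /* jns rel8 opcode, 'y' */
    out[1] = (uint8_t)disp;   /* displacement byte */
    return 0;
}
```

With the instruction at offset 0, `encode_jns(0, 113, buf)` fills `buf` with the bytes for y and o.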
I also found out, through farting around enough, that <.< (with the space) is an innocuous series of AL register comparisons (I wasn’t using the zero flag at all, so it didn’t apply to my code). Of course, I was going to throw that in there as much as I could.
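Why the face is harmless: 0x3c (the < character) is the opcode for `cmp al, imm8`, so each < swallows the byte after it as its operand. The sketch below assumes the space is the trailing byte, which makes the four bytes decode as two complete comparisons; `count_cmp_al` is a made-up helper, not a real disassembler.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many `cmp al, imm8` instructions `code` consists of,
   or -1 if anything else (or a dangling opcode) is found. */
static int count_cmp_al(const char *code, size_t len) {
    assert(code != NULL);
    int count = 0;
    size_t i = 0;
    while (i < len) {
        if ((unsigned char)code[i] != 0x3c) {
            return -1;   /* not `cmp al, imm8` */
        }
        if (i + 1 >= len) {
            return -1;   /* opcode without its immediate byte */
        }
        i += 2;          /* opcode + one immediate byte */
        count++;
    }
    return count;
}
```

Without the space, the last < would take whatever byte comes next as its operand, which is why the space matters.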
My toolbelt was defined, and I couldn’t have been happier.
Well I could have, if ret or any of its variants (e.g. pop rax; jmp rax, which yields the bytes 58 ff e0) were printable. The closest I could do was simply ret itself, which was \xC3. Naturally, I went with
After a few hours of pacing and thinking about exactly what I wanted to do, I decided to use a string as a function and output something simple, like “Hi!”.
Hi!\0 fits into four bytes. An integer. Way too perfect, especially since a few 64-bit instruction variants yielded unprintable characters. Since we were only dealing with 32 bits overall (the four bytes in our Hi!\0 string), we only needed the low 32 bits of our registers to work with the data, so eax and the equivalent extended register instructions would suffice (especially since our machines are usually little endian).
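To make the "four bytes, one integer" point concrete, here is a small sketch that reads four bytes the way a little-endian machine loads them into a 32-bit register. `load_le32` is a hypothetical helper; it spells out the byte order explicitly so the result does not depend on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads four bytes as a little-endian 32-bit integer (lowest address = lowest byte). */
static uint32_t load_le32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The string literal "Hi!" already carries its terminating zero as the fourth byte, so `load_le32((const unsigned char *)"Hi!")` gives 0x00216948.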
It was settled. I was going to overwrite the first four bytes of my string “function” to the bytes "Hi!\0" and then return the string itself back to something like
I drafted up some harness code to cast a string to a function pointer and ended up with the following:
- W pushes rdi, which is the register the first argument of x86_64 function calls is passed into on System-V-type ABIs (hint: this is the cheat-sheet I’ve come to rely on).
- X pops that value into rax, the register used to pass return values back to the caller (also according to the System V AMD64 ABI).
- \xC3 returns execution to the caller. For the less machine-code inclined, this is the equivalent of popping the return address off of the stack and jmp-ing to it.
LLDB confirmed it as well:
Process 21727 stopped
* thread #1: tid = 0x3ad74, 0x0000000100000fa6 testlol, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100000fa6 testlol
-> 0x100000fa6: pushq %rdi
0x100000fa7: popq %rax
I cast that string to a function pointer that takes a string and returns a string, and called it, using the same string. It is the equivalent of:
The above code transfers execution to the address of the string’s content (its bytes) and begins execution.
The stage was set. I needed to somehow transform the first four bytes of the string into Hi!\0. I determined I needed a string to begin with, and wanted to make it flashy, because what would be the point otherwise?
I remembered a few of my branch instructions that were printable and the one that stood out to me was yo, which equates to: if the ‘sign’ flag is not set, jump relative to the signed 8-bit integer ‘o’ that follows, or 113 bytes forward.
Perfect. I had 113 bytes of runway after the initial yo.
you should have expected this to get out of hand, Sindre. 💃😈this is almost entirely your fault!! 🙈💩
Yep. I went there. I used UTF-8 encoded emojis (anywhere from 3 to 5 bytes). The above string is perfectly 115 bytes (minus the yo and you get 113) of Grade-A Unicode cheese.
The next instructions were crucial. I determined I could xor the first four bytes with a constant to get my desired result. This yielded 0x20540631, which is so close to being printable (but the 0x06 messed up my day).
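That constant can be reproduced from the first four bytes of the string ("you ") and the target ("Hi!" plus its terminating zero). A minimal sketch with hypothetical helper names, assuming little-endian byte order and "printable" meaning 0x20 through 0x7e:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reads four bytes as a little-endian 32-bit integer. */
static uint32_t load_le32(const char *p) {
    assert(p != NULL);
    const unsigned char *u = (const unsigned char *)p;
    return (uint32_t)u[0] | ((uint32_t)u[1] << 8)
         | ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

/* The constant that turns the first four bytes of `current` into `wanted`. */
static uint32_t xor_product(const char *current, const char *wanted) {
    return load_le32(current) ^ load_le32(wanted);
}

/* True if all four bytes of `v` fall in the printable ASCII range. */
static bool all_printable(uint32_t v) {
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(v >> (8 * i));
        if (b < 0x20 || b > 0x7e) {
            return false;
        }
    }
    return true;
}
```

`xor_product("you ", "Hi!")` gives 0x20540631, and `all_printable` rejects it because of the 0x06 byte (o XOR i).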
Alright, so what could I do? I remembered subtraction (sub) was printable with an immediate (literal) value. What if I started with a constant integer literal, subtracted from it another constant integer literal to arrive at the xor product, and then xor'd that with the first four bytes of the string to produce "Hi!\0"? Both the initial value and the subtracted value needed to be printable in binary form. Could a pair of integers satisfying such a constraint really exist?
/tmp directory has since been cleared out. Rest assured it was useless.
So off to C land it was. Since I knew I was going to have to subtract, and I wanted to keep runtimes to a minimum, I decided to start at the xor product and iterate upward. If I didn’t find any matches, I’d then re-write the loop to start at
Here was the test case:
I immediately proofed a few of the results to make sure the math was solid (it was), picked a pair at random, and went to work on my final product — 7 and a half hours later (assembly is tricky, okay?).
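A search along these lines can be sketched as follows. This is not the original test case: instead of counting upward from the xor product, it enumerates only candidates whose four bytes are already printable (assumed to mean 0x20 through 0x7e) and checks whether the other half of the pair is printable too. The function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if all four bytes of `v` fall in the printable ASCII range. */
static bool all_printable(uint32_t v) {
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(v >> (8 * i));
        if (b < 0x20 || b > 0x7e) {
            return false;
        }
    }
    return true;
}

/* Looks for printable a and b with a - b == target (mod 2^32).
   Returns true and stores the first pair found, false if none exists. */
static bool find_printable_pair(uint32_t target, uint32_t *a_out, uint32_t *b_out) {
    assert(a_out != NULL && b_out != NULL);
    for (uint32_t b3 = 0x20; b3 <= 0x7e; b3++)
    for (uint32_t b2 = 0x20; b2 <= 0x7e; b2++)
    for (uint32_t b1 = 0x20; b1 <= 0x7e; b1++)
    for (uint32_t b0 = 0x20; b0 <= 0x7e; b0++) {
        uint32_t a = (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;  /* printable by construction */
        uint32_t b = a - target;                                /* what must be subtracted */
        if (all_printable(b)) {
            *a_out = a;
            *b_out = b;
            return true;
        }
    }
    return false;
}
```

Restricting the loop to printable candidates keeps the search at roughly 81 million iterations rather than the full 32-bit range.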
30 minutes later I had a working function string. Two things I remembered (and consequently experienced slight pain reflexes from my time of reverse engineering Combat Arms all those years ago) were that:
- In most situations, you cannot write to statically allocated memory (causing my xor instruction to crash with an access violation)
- In most situations, you cannot execute anything but statically allocated memory (causing another access violation when casting a strdup()'d copy of the string to the function pointer).
Of course, you can change permissions on the memory, but that was too over the top for le simple troll.
So I settled on the happy medium of passing a strdup()'d copy of the string into the statically-cast function pointer, et voilà!
Let’s break it down:
- yo: jump 113 bytes forward if the sign flag is not set (it never is). Lands on the W after the shit emoji 💩.
- I transfer rdi (the first argument) to rcx to store it (push rdi, then pop rcx, or in ASCII, WY). I needed to do this since the later instructions needed eax to be printable; xor-ing ecx yielded a non-printable byte.
- <.<, just for fun.
- We push the first literal of the pair I chose onto the stack. This gives us h (push literal dword) and [+}Q (the initial value).
- I added a few more <.<'s, for good measure.
- X: pop the value into rax.
- Then it was time for our subtraction. First we have -, or subtract literal dword from eax, and then our 32-bit literal *%)1. At this point, eax had the value I needed to xor into the first four bytes of our string to yield Hi!\0.
- It was then time to set up the very specific xor. I needed the destination address in rax and the value to XOR in rdi. Subsequently, this would set me up nicely to just return the string’s address itself as the return value (in rax).
- First, the juggling act: push rax (P), another face, push rcx (Q), and then pop rax (X), which left our to-be XOR’d value in rdi and the address of the beginning of the string in rax.
- Then, the XOR itself: xor [rax], rdi, which means XOR the value in rdi with the contents of the memory at the address in rax (which works because it’s a little endian system, so our XOR value is in the first 4 bytes of the 8 byte registers) and store the result at the address in rax. Its ASCII equivalent is
- Finally, our slightly annoying return: \xC3P0 (of course, the P0 here being superfluous text that is never executed).
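The arithmetic in that breakdown can be checked without running any machine code. The sketch below only mimics the data flow (push the first literal, pop it, subtract the second, XOR into the string) in plain C; it does not execute the string, and `patch_first_four` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads four bytes as a little-endian 32-bit integer. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes a 32-bit integer as four little-endian bytes. */
static void store_le32(unsigned char *p, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        p[i] = (unsigned char)(v >> (8 * i));
    }
}

/* Mimics the walkthrough: eax = pushed literal, eax -= subtracted literal,
   then the first four bytes of `buf` are XORed with eax. */
static void patch_first_four(unsigned char *buf, const char *pushed, const char *subtracted) {
    assert(buf != NULL && pushed != NULL && subtracted != NULL);
    uint32_t eax = load_le32((const unsigned char *)pushed);   /* h + literal, then X */
    eax -= load_le32((const unsigned char *)subtracted);       /* - + literal */
    store_le32(buf, load_le32(buf) ^ eax);                     /* the xor into the string */
}
```

Fed the two literals from the post, [+}Q and *%)1, the difference is 0x20540631 and a buffer starting with "you " ends up starting with "Hi!" followed by a zero byte.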
Then it was some more nonsense at the end to mask the gibberish-like machine code instructions, a few more faces to round everything out, and I had myself an executable string.
Needless to say, I won.
Okay, so it wasn’t the most exhilarating battle. However, 8 hours later and with a new arsenal of machine code knowledge, I emerged glorious and victorious.