How I wasted a whole day trolling with assembly

May 11, 2017

So Sindre Sorhus and I were exchanging troll-ish code back and forth the other day in competitive fashion. He mentioned I should make a post about it. Here we are.

At first, it was my seemingly un-compilable C++ snippet.

Outputs “30”.

The secret? C++11 and onward apparently allow non-ASCII characters in identifiers, as long as they aren’t the first or last.

This will definitely not get you fired from your job. 🦄

So I (ab)used the UNICODE ZERO WIDTH SPACE U+200B character and sprinkled a few of them in there. The different lol's are really different identifiers.

Neat! And confusing.

Sindre retaliated. Things were heating up.

30. That outputs 30. I’d been had. Leave it to JavaScript to act completely irrationally.

What’s going on here? This nonsense:

+[] is the equivalent of Number([]), which in JavaScript yields 0.
Prefixing the above with ! is the equivalent of !0, which is true (and counts as 1 once you start adding).
The first array that is created is simply !+[] repeated three times and summed (!+[] + !+[] + !+[]), yielding 3.
That 3 is then the first element of an array, yielding [3].
That array is added to another array with the value of +[], which we determined yields 0. The second array is thus [0].
When you add two arrays, they are first coerced to strings ("3" and "0", respectively) and the resulting strings are concatenated.

Thus finally yielding "30".
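The snippet itself isn't reproduced above, so here is only a reconstruction that is consistent with that breakdown. The variable names are mine, and the one-liner at the end is an assumption about what the original looked like.

```javascript
// Step-by-step version of the coercion chain described above.
// All names here are illustrative, not taken from the original snippet.
const zero = +[];                        // unary plus on an empty array -> 0
const yes = !zero;                       // !0 -> true
const three = yes + yes + yes;           // booleans add as numbers -> 3
const thirty = [three] + [zero];         // arrays stringify ("3", "0"), then concatenate

// Assumed shape of the original one-liner.
const oneLiner = [!+[] + !+[] + !+[]] + [+[]];

console.log(thirty, typeof thirty);      // 30 string
console.log(oneLiner === thirty);        // true
```

Note that the result is the string "30", not the number 30.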

He then went on to show me up even more.

For those of you that don’t want to wait for your terminal to paste all 24206 characters into Node, it simply outputs "Qix".

Touché.

I had to pull out all the stops. I went deep into my bag of tricks and recalled Dr. Tom Murphy VII’s SIGBOVIK entry: a human readable, fully printable text file that was also a fully functioning DOS executable (no compilation needed).

That’s to say, his human readable text file consisted of just the right selection of bytes that corresponded to just the right machine code instructions to run a program. All of those bytes are considered printable (\r, \n, and 0x20-0x7E).

I decided to do the same.

I hadn’t fully read Murphy’s paper, and I considered that half of the fun.

Needing to know exactly what kinds of instructions I could use (can I return? push? pop? add? XOR?), I pulled up my favorite x86_64 reference, used some handy JavaScript:

JSON.stringify([].slice.call(document.querySelectorAll('table.ref_table tbody tr')).map(row => [row.children[0].innerHTML, row.children[8]?row.children[8].innerHTML:null, row.children[21]?row.children[21].innerHTML:null]))

and produced a JSON string of just the opcodes and their descriptions, pulled that into a local Node REPL and ran a quick filter on whether each opcode’s corresponding byte was printable.

I realized two things: opcodes aren’t necessarily the only bytes in the instruction (the bad news), but there are a lot of things you can do with printable opcodes (the good news).
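A minimal sketch of that kind of filter, for illustration only. It is not the REPL session I actually ran; the sample table is hand-picked from byte values quoted elsewhere in this post, and the helper names are made up.

```javascript
// Printable here means \r, \n, or 0x20-0x7E, as defined earlier in the post.
function isPrintable(byte) {
  return byte === 0x0a || byte === 0x0d || (byte >= 0x20 && byte <= 0x7e);
}

// Hand-picked sample; the byte values are the ones quoted in this post.
const sample = [
  { asm: "push rax", bytes: [0x50] },
  { asm: "pop rdi", bytes: [0x5f] },
  { asm: "sub eax, 0x55555555", bytes: [0x2d, 0x55, 0x55, 0x55, 0x55] },
  { asm: "ret", bytes: [0xc3] },
  { asm: "pop rax; jmp rax", bytes: [0x58, 0xff, 0xe0] },
];

// An instruction is usable only if every byte is printable, not just the opcode.
const usable = sample.filter(({ bytes }) => bytes.every(isPrintable));

for (const { asm, bytes } of usable) {
  console.log(asm, "->", JSON.stringify(String.fromCharCode(...bytes)));
}
```

Checking the whole byte sequence rather than just the first byte is what catches cases like pop rax; jmp rax, where the opcode X is fine but the bytes that follow are not.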

Needing to test these different instructions, I set up a quick little command line tool that would show me the resulting machine code bytes along with their printable text, just so I could experiment.

$ (fasm /dev/stdin /dev/stderr > /dev/null) 2>&1 <<< "use64^Mpush rax" | xxd
00000000: 50                                       P

Beautiful. The first test was promising; pushing the value in the RAX register onto the stack was a single printable byte P. This was going better than I expected.

I experimented with a few more instructions (use64^M was removed from fasm's input string for brevity):

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "push rdi" | xxd
00000000: 57                                       W

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "push rcx" | xxd
00000000: 51                                       Q

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "pop rax" | xxd
00000000: 58                                       X

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "pop rcx" | xxd
00000000: 59                                       Y

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "pop rdi" | xxd
00000000: 5f                                       _

Too easy. After about 3 hours of toying around with different instructions, I ended up finding a few that were quite useful:

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "sub eax, dword 0x55555555" | xxd
00000000: 2d55 5555 55                             -UUUU

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "xor [eax], edi" | xxd
00000000: 6731 38                                  g18

$ (fasm /dev/stdin /dev/stderr >/dev/null) 2>&1 <<< "jns 113" | xxd
00000000: 796f                                     yo

Yes, that’s right. A short jump taken when the sign flag is clear (jns), with an immediate (aka “literal”) operand that fits within the bounds of a signed 8-bit integer, assembles to the two printable bytes yo when it targets offset 113.
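For anyone wondering where the o comes from, here is a small sketch of the encoding arithmetic. The function name is made up for illustration; the opcode and bytes match the xxd output above.

```javascript
// Encode a short `jns` (opcode 0x79) located at `addr` that jumps to `target`.
// The 8-bit displacement is measured from the end of the two-byte instruction.
function encodeJnsShort(addr, target) {
  const rel = target - (addr + 2);
  if (rel < -128 || rel > 127) throw new RangeError("target out of rel8 range");
  return [0x79, rel & 0xff];
}

const jnsBytes = encodeJnsShort(0, 113);
console.log(jnsBytes.map((b) => b.toString(16)).join(" ")); // 79 6f
console.log(String.fromCharCode(...jnsBytes));              // yo
```

So the target is offset 113 from the start of the code, while the displacement byte itself is 0x6F (111).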

I also found out, through farting around enough, that <.< (with the space) is an innocuous series of AL register comparisons (I wasn’t using the zero flag at all, so it didn’t apply to my code). Of course, I was going to throw that in there as much as I could.
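A tiny sketch of why the face is harmless: 0x3C is the opcode for cmp al, imm8, so the text splits into two-byte compare instructions. The decoder below is a toy written for this one pattern, not a real disassembler.

```javascript
// 0x3C is `cmp al, imm8`: one opcode byte followed by one immediate byte.
function decodeCmpAl(text) {
  const out = [];
  for (let i = 0; i + 1 < text.length; i += 2) {
    if (text.charCodeAt(i) !== 0x3c) throw new Error("not a cmp al, imm8 opcode");
    out.push("cmp al, 0x" + text.charCodeAt(i + 1).toString(16));
  }
  return out;
}

// Note the trailing space: it is the immediate of the second compare.
const face = decodeCmpAl("<.< ");
face.forEach((line) => console.log(line));
```

Each compare only touches the flags, which is why the face can be dropped in wherever the flags don't matter.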

My toolbelt was defined, and I couldn’t have been happier.

Well I could have, if ret or any of its variants (e.g. pop rax; jmp rax, which yields the bytes 58 ff e0) were printable. The closest I could do was simply ret itself, which was \xC3. Naturally, I went with \xC3P0.

After a few hours of pacing, thinking and determining just exactly what I wanted to do, I determined I wanted to use a string as a function and output something simple, like “Hi!”.

Coincidentally, Hi!\0 fits into four bytes. An integer. Way too perfect, especially since a few 64-bit instruction variants yielded unprintable characters. Since we were only dealing with 32 bits overall (four bytes in our Hi!\0 string), we only needed the low 32 bits of our registers to work with the data, so eax and the equivalent 32-bit register instructions would suffice (especially since our machines are usually little endian).

It was settled. I was going to overwrite the first four bytes of my string “function” to the bytes "Hi!\0" and then return the string itself back to something like puts().

I drafted up some harness code to cast a string to a function pointer and ended up with the following:

It worked:

W pushes rdi, which is the register the first argument of x86_64 function calls is passed into on System-V-type ABIs. (hint: this is the cheat-sheet I’ve come to rely on).
X pops that value into rax, the register used to pass return values back to the caller (also according to the System V AMD64 ABI).
\xC3 returns execution to the caller. For the less machine-code inclined, this is the equivalent of popping the return address off of the stack and jmp-ing to it.

LLDB confirmed it as well:

Process 21727 stopped
* thread #1: tid = 0x3ad74, 0x0000000100000fa6 testlol, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100000fa6 testlol
->  0x100000fa6: pushq  %rdi
    0x100000fa7: popq   %rax
    0x100000fa8: retq

I cast that string to a function pointer that takes a string and returns a string, and called it, using the same string. It is the equivalent of:

The above code transfers execution to the address of the string’s content (its bytes) and begins execution.

The stage was set. I needed to somehow transform the first four bytes of the string into Hi!\0. I determined I needed a string to begin with, and wanted to make it flashy, because what would be the point otherwise?

I remembered a few of my branch instructions that were printable, and the one that stood out to me was yo. It equates to: if the ‘sign’ flag is not set, jump by the signed 8-bit displacement in the next byte, ‘o’, which here means landing at byte 113 of the string.

Perfect. I had 113 bytes of runway after the initial yo.

you should have expected this to get out of hand, Sindre. 💃😈this is almost entirely your fault!! 🙈💩

Yep. I went there. I used UTF-8 encoded emojis (anywhere from 3 to 5 bytes). The above string is perfectly 115 bytes (minus the yo and you get 113) of Grade-A unicode cheese.

The next instructions were crucial. I determined I could xor the first four bytes with a constant to get my desired result. This yielded 0x20540631 — which is so close to being printable (but the 0x06 messed up my day).
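To make the arithmetic concrete, here is a small sketch that derives that constant. It assumes the string starts with the four bytes "you " (as in the string above) and packs them little-endian; the helper names are illustrative.

```javascript
// Pack the first four characters of an ASCII string into a little-endian u32.
function leWord(s) {
  return (
    (s.charCodeAt(0) |
      (s.charCodeAt(1) << 8) |
      (s.charCodeAt(2) << 16) |
      (s.charCodeAt(3) << 24)) >>> 0
  );
}

function isPrintable(byte) {
  return byte === 0x0a || byte === 0x0d || (byte >= 0x20 && byte <= 0x7e);
}

const before = leWord("you ");   // what the string starts with
const after = leWord("Hi!\0");   // what it should start with
const key = (before ^ after) >>> 0;

// Split the key back into bytes, lowest first, and flag the unprintable ones.
const keyBytes = [0, 8, 16, 24].map((shift) => (key >>> shift) & 0xff);
const badBytes = keyBytes.filter((b) => !isPrintable(b));

console.log("0x" + key.toString(16)); // 0x20540631
console.log(badBytes.length);         // 1
```

The single offender is the 0x06 byte, which comes from 'o' xor 'i'.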

Alright, so what could I do? I remembered subtraction (sub) was printable with an immediate (literal) value. What if I started with a constant integer literal, subtracted from it another constant integer literal to arrive at the xor product, and then xor'd that with the first four bytes of the string to produce "Hi!\0"? Both the initial value and the subtracted value needed to be printable in binary form. Could there really be a pair of integers that existed to conform to such a constraint?

I wrote a dirty script in JavaScript to start, using an elaborate chain of buffers, strings, etc. I won’t recreate it here since it was slow and my /tmp directory has since been cleared out. Rest assured it was useless.

So off to C land it was. Since I knew I was going to have to subtract, and I wanted to keep runtimes to a minimum, I decided to start at the xor product and iterate upward. If I didn’t find any matches, I’d then re-write the loop to start at 0.

Here was the test case:

Don’t you just love C?

I was instantly inundated with more choices than I could handle. Take that, JavaScript!
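The C test case isn't reproduced here, so as a rough stand-in, this is a JavaScript sketch of the same idea. It is not my original loop: instead of counting upward from the xor product, it walks printable subtrahends and checks whether the matching minuend is printable too. The names and the limit parameter are made up, and it only considers 0x20-0x7E to keep things short.

```javascript
const TARGET = 0x20540631; // the xor product we need to end up with

function wordIsPrintable(word) {
  for (let shift = 0; shift < 32; shift += 8) {
    const byte = (word >>> shift) & 0xff;
    if (byte < 0x20 || byte > 0x7e) return false;
  }
  return true;
}

// Find pairs where minuend - subtrahend === target (mod 2^32) and all
// eight bytes are printable. Stops after `limit` hits.
function findPairs(target, limit) {
  const pairs = [];
  for (let b3 = 0x20; b3 <= 0x7e; b3++)
    for (let b2 = 0x20; b2 <= 0x7e; b2++)
      for (let b1 = 0x20; b1 <= 0x7e; b1++)
        for (let b0 = 0x20; b0 <= 0x7e; b0++) {
          const subtrahend = ((b3 << 24) | (b2 << 16) | (b1 << 8) | b0) >>> 0;
          const minuend = (subtrahend + target) >>> 0;
          if (wordIsPrintable(minuend)) {
            pairs.push({ minuend, subtrahend });
            if (pairs.length === limit) return pairs;
          }
        }
  return pairs;
}

const pairs = findPairs(TARGET, 5);
for (const { minuend, subtrahend } of pairs) {
  console.log("0x" + minuend.toString(16), "- 0x" + subtrahend.toString(16));
}
```

Even this naive version finds hits right away, which matches the flood of results I got.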

I immediately proofed a few of the results to make sure the math was solid (it was), picked a pair at random, and went to work on my final product — 7 and a half hours later (assembly is tricky, okay?).

30 minutes later I had a working function string. Two things I remembered (and consequently experienced slight pain reflexes from my time of reverse engineering Combat Arms all those years ago) were that:

In most situations, you cannot write to statically allocated memory (causing my xor instruction to crash with an access violation)
In most situations, you cannot execute anything but statically allocated memory (causing another access violation when casting a strdup()'d copy of the string to the function pointer).

Of course, you can change permissions on the memory, but that was too over the top for le simple troll.

So I settled on the happy medium of passing in a strdup()'d copy of itself to the statically-casted function pointer, et voilà!

Let’s break it down:

yo- jump 113 bytes forward if the sign flag is not set (it never is). Lands on the W after the shit emoji 💩.
I transfer rdi (the first argument) to rcx to store it (push rdi and pop rcx, or in ASCII, WY). I needed to do this since the xor operation required edi and eax to be printable; xor-ing eax and ecx yielded a non-printable byte.
<.< , just for fun.
We push the first literal of the pair I chose onto the stack. This gives us h (push literal dword) [+}Q (the initial value).
I added a few more <.< ‘s, for good measure.
X- pop the value into rax.
Then it was time for our subtraction. First we have -, or subtract literal dword from rax, and then our 32-bit literal *%)1. At this point, eax had the value I needed to xor into the first four bytes of our string to yield "Hi!\0".
It was then time to set up the very specific xor. I needed the destination address in rax and the value to XOR in rdi. Subsequently, this would set me up nicely to just return the string’s address itself as the return value (in rax).
First, the juggling act: push rax (P), another face, pop rdi (_), push rcx (Q), and then pop rax (X) — which left our to-be XOR’d value in rdi and the address of the beginning of the string in rax.
Then, the XOR itself: xor [rax], rdi, which means XOR the value in rdi into the memory that rax points to and store the result there. This works because it’s a little endian system, so our XOR value sits in the low 4 bytes of the 8 byte register and lines up with the first 4 bytes of the string. Its ASCII equivalent is H18.
Finally, our slightly annoying return (ret) instruction: \xC3P0 (of course, the P0 here being superfluous text that is never executed).
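The arithmetic in those steps can be double-checked with a short sketch. It models only the low 32 bits of the registers and assumes the string begins with "you "; the helper names are illustrative.

```javascript
// Pack four ASCII characters into a little-endian u32, and back again.
function leWord(s) {
  return (
    (s.charCodeAt(0) |
      (s.charCodeAt(1) << 8) |
      (s.charCodeAt(2) << 16) |
      (s.charCodeAt(3) << 24)) >>> 0
  );
}

function wordToText(w) {
  return String.fromCharCode(
    w & 0xff, (w >>> 8) & 0xff, (w >>> 16) & 0xff, (w >>> 24) & 0xff
  );
}

let eax = leWord("[+}Q");                      // h[+}Q : push imm32, later popped into rax
eax = (eax - leWord("*%)1")) >>> 0;            // -*%)1 : subtract the second literal
const patched = (leWord("you ") ^ eax) >>> 0;  // H18   : xor into the start of the string

console.log("0x" + eax.toString(16));             // 0x20540631
console.log(JSON.stringify(wordToText(patched))); // "Hi!\u0000"
```

Both literals are plain printable text, yet their difference is exactly the unprintable xor constant.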

Then it was some more nonsense at the end to mask the gibberish-like machine code instructions, a few more faces to round everything out, and I had myself an executable string.

It appears my powerline is messed up. Oh bother.

Needless to say, I won.

Okay, so it wasn’t the most exhilarating battle. However, 8 hours later and a new arsenal of machine code knowledge, I emerged glorious and victorious.

Cheers!

Written by Qix