Coding a WebAssembly CTF Challenge

Jacob Baines
May 30 · 10 min read

I recently wrote a CTF challenge for my coworkers. The challenge was written using WebAssembly (WASM), a language I initially knew nothing about. I found the language specification and various API descriptions to be quite good, but, while coding, I still found myself googling nearly everything.

Besides NCCGroup’s excellent Security Chasms of WASM, I haven’t found a writeup describing oddities in WASM that might interest someone like myself. And while I’m no WASM expert, I hope by sharing my experience with the language and toolchain, I can help others explore the weird and wonderful world of WASM.

But before you give this a read, perhaps you want to attempt the challenge yourself? The challenge is browser based, so I’ve hosted it on GitHub here.

About the Source Code

The source is also available on GitHub here. The challenge is just two files: main.c and challenge_shell.html. The C code compiles down to WASM (index.wasm) and Javascript (index.js). The compiler also transforms challenge_shell.html into index.html.

Compilation requires the Emscripten toolchain. Fortunately, the Emscripten project maintains great installation instructions. Once the toolchain is installed, the challenge can be compiled using the provided makefile.

Let’s dive into the code and discuss a few things I found interesting.

Executing Before main()

WASM, similar to ELF, supports execution of programmer defined functions before main(). Initially, I assumed WASM had an init_array so I could just use the traditional constructor function attribute.

While this technically works (hello() does get called before main()), it turns out it isn’t exactly what I wanted. When using the constructor attribute, the generated Javascript creates an array of functions and loops over them.
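That pattern can be sketched in plain Javascript. The names below (wasmExports, initCallbacks, runInitCallbacks, run) are hypothetical and are not Emscripten’s actual identifiers. The sketch only illustrates that each constructor ends up as a named entry that Javascript calls directly.

```javascript
// Hypothetical stand-ins for functions exported from the compiled WASM module.
const callOrder = [];
const wasmExports = {
  hello: () => callOrder.push("hello"), // the constructor
  main: () => callOrder.push("main"),
};

// Constructors are collected into an array...
const initCallbacks = [wasmExports.hello];

// ...and invoked one by one before main() runs.
function runInitCallbacks(callbacks) {
  for (const callback of callbacks) {
    callback();
  }
}

function run() {
  runInitCallbacks(initCallbacks);
  wasmExports.main();
}

run();
console.log(callOrder.join(" -> ")); // hello -> main
```

Because hello() has to be reachable from Javascript for this loop to work, it must be exported by name, which is exactly what makes it easy to find.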

Under normal circumstances, calling hello() from Javascript would be fine. But hello() contains the bulk of anti-debugging logic, and I’d prefer the function not be obviously locatable simply by reading index.js.

Also, I execute the Javascript keyword debugger from hello() to catch use of a developer console. This is the stack trace Emscripten’s constructor implementation will generate when the Firefox web console handles the debugger statement.

That stack trace gives the attacker context that I’d prefer they not have.

Switching hello() from a constructor to a start node should fix this. The only problem is that I’ve no idea how to make Emscripten generate a start node. However, the WebAssembly Text (WAT) format of the start node is simple:

Having no clue if Emscripten could even generate a start node, I resolved to add one myself. To start I redefined hello() so that it no longer used the constructor attribute and wasn’t exportable.

The function name no longer appeared in index.wasm or index.js. But now, in order to create the start node, I need to know hello()’s function index. To figure that out, I converted index.wasm to the more human friendly WAT format using wasm2wat.

Then I opened index.wat and tracked down the exported functions that I’d written in C.

Each of these functions calls hello() towards the end of the function. For example, __syscall72() looks like this in C:

When you look at __syscall72(), or function index 25, you should find a call to hello().

The call 33 at the very end should be the call to hello(), which means the start node should be written as (start 33). You just need to insert that into index.wat and convert it to index.wasm.
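To see a start node in action outside of the challenge, here is a minimal sketch using a tiny hand-assembled module. It is not taken from the challenge: the exported global "flag" and the variable names are made up for illustration. Function 0 is neither exported nor called from Javascript, yet it runs during instantiation because the start section points at it.

```javascript
// Hand-assembled module, roughly equivalent to this WAT:
//   (module
//     (func (global.set 0 (i32.const 1)))   ;; function index 0, not exported
//     (global (mut i32) (i32.const 0))
//     (export "flag" (global 0))
//     (start 0))
const startDemoBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type 0: () -> ()
  0x03, 0x02, 0x01, 0x00,                         // function 0 has type 0
  0x06, 0x06, 0x01, 0x7f, 0x01, 0x41, 0x00, 0x0b, // global 0: mutable i32 = 0
  0x07, 0x08, 0x01, 0x04, 0x66, 0x6c, 0x61, 0x67, 0x03, 0x00, // export "flag"
  0x08, 0x01, 0x00,                               // start section: (start 0)
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x01, 0x24, 0x00, 0x0b, // body of function 0
]);

// No Javascript code calls function 0; instantiation alone triggers it.
const startDemo = new WebAssembly.Instance(new WebAssembly.Module(startDemoBytes));
console.log(startDemo.exports.flag.value); // 1
```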

Now, this isn’t exactly production ready or, quite frankly, even remotely stable but I added the following to the makefile to automatically set hello() as the start function.

Now the developer console generates the following stack trace.

This approach gives the attacker less context and keeps the logic out of Javascript, which humans, presumably, find easier to read.

Not Very Indirect Function Call

Sometimes there is a large gap between how you think something should work and how it actually works. For me, how C function pointers are translated to WASM was one of those things.

Consider the global function pointer defined in main.c.

I was sure attackers relying on static analysis would have to track down where that pointer gets set. I ended up setting the pointer in a conditional in main().

Where g_func_ptr = 1; points to the index or “address” of a function named call_me_indirectly(). At least, according to the output I received from printf("%p\n", call_me_indirectly);

The function pointer, g_func_ptr(), eventually gets invoked in __syscall80().

But, when I looked at the WAT for __syscall80(), I was pretty mystified.

Not only was my “indirect call” using a hard coded value, but it wasn’t even a value that I recognized. What happened to g_func_ptr = 1;?!

After a bit of head scratching, I came across this table in WAT.

This is the global function table. The function that g_func_ptr() is supposed to point to, call_me_indirectly(), is located at index 58. The compiler did all of the static analysis work! No one has to hunt down where g_func_ptr() gets set because it was already resolved! Just index into the global table and keep on going.

Although, I think it’s kind of sketchy since I set g_func_ptr() in a conditional branch.

Either way. The case still isn’t closed! Why does g_func_ptr = 1; generate a call_indirect to the 58th entry in the global function table!? Well, it turns out that all the functions in the table are grouped by their type. Remember how g_func_ptr() is used:

WASM is very conscious about function types. The following types, expressed in WAT, are defined in the challenge:

In the global function table, functions 57–63 are all of type 0 (one i32 parameter and no return value). call_me_indirectly() is found at index 58, which happens to be the second entry in the type 0 group. Or, in other words, call_me_indirectly() is index 1 of the type 0 group. Which is why g_func_ptr = 1; works in the C code.
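The arithmetic can be written out as a small sketch. The 57–63 range comes from the table above; the "first entry plus pointer value" model and the names typeGroups and toGlobalTableIndex are my own simplification for illustration, not something the toolchain exposes.

```javascript
// Layout taken from the text: entries 57-63 of the global table hold the
// type 0 functions. The "first entry + pointer value" model is an assumption.
const typeGroups = [
  { typeIndex: 0, firstEntry: 57, lastEntry: 63 },
];

// Map a pointer value from the C source (an index within its type group)
// to an index in the merged global function table.
function toGlobalTableIndex(typeIndex, pointerValue) {
  const group = typeGroups.find((g) => g.typeIndex === typeIndex);
  if (group === undefined) {
    throw new RangeError(`unknown type index ${typeIndex}`);
  }
  const index = group.firstEntry + pointerValue;
  if (pointerValue < 0 || index > group.lastEntry) {
    throw new RangeError(`pointer ${pointerValue} is outside type group ${typeIndex}`);
  }
  return index;
}

console.log(toGlobalTableIndex(0, 1)); // 58
```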

While I was surprised to learn how function pointers are translated from C to WASM, I ended up not changing this part of the challenge. Anyone doing static analysis will also have to learn or know about the global function table which seems good enough to me.

Runtime Compilation and Execution

Javascript can compile and execute byte arrays of WASM at runtime. Here is a very simple example from __syscall72().

In this example, the WASM in the byte array compares a passed in value against the number 9.
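For readers who want to try this outside of the challenge, here is a minimal standalone sketch of the same idea. The byte array is hand-assembled for illustration and is not the array from __syscall72(); the export name "check" is also made up. It uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors, which are fine for a module this small.

```javascript
// Hand-assembled module, roughly equivalent to this WAT:
//   (module
//     (func (export "check") (param i32) (result i32)
//       (i32.eq (local.get 0) (i32.const 9))))
const checkBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type 0: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // function 0 has type 0
  0x07, 0x09, 0x01, 0x05, 0x63, 0x68, 0x65, 0x63, 0x6b, 0x00, 0x00, // export "check"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x41, 0x09, 0x46, 0x0b, // local.get 0; i32.const 9; i32.eq
]);

// Compile the bytes at runtime, then call the exported function.
const checkModule = new WebAssembly.Module(checkBytes);
const { check } = new WebAssembly.Instance(checkModule).exports;
console.log(check(9), check(8)); // 1 0
```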

When I learned about this, my question was, “Alright, but how do I actually generate the byte array?” I’ll explain two ways. The first way is to use WASM Fiddle.

WASM Fiddle, like many of the other fiddles, is a nice platform for sharing code snippets. It also happens to provide different output options for WASM. One option is a Javascript Uint8Array you can quickly copy and paste into your code.

The other way, of course, is to use your compiler. In my case, that’s Emscripten’s emcc.

In the above, you can see that I used the following to compile main.c:

I then converted the compiled WASM to a C array using xxd so I could easily copy and paste it into C code.
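If xxd isn’t at hand, the same conversion is only a few lines of Javascript. This is a minimal sketch; toByteList is a hypothetical helper, and in practice the input would be the contents of the compiled .wasm file rather than the eight-byte header used here.

```javascript
// Format raw bytes as a comma-separated list of hex literals, suitable for
// pasting into a C array initializer or a Javascript Uint8Array.
function toByteList(bytes) {
  return Array.from(bytes, (b) => "0x" + b.toString(16).padStart(2, "0")).join(", ");
}

// The first eight bytes of every WASM binary: the "\0asm" magic and version 1.
const header = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(toByteList(header));
// 0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00
```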

One interesting thing about the two examples I just provided is that it does matter where the WASM array is defined (e.g. C vs. Javascript). Anything defined in EM_ASM, the Javascript interface in C, will appear in index.js. However, an array defined in C is stored in index.wasm which makes it more difficult to find and read.

The fun thing about these arrays is that you can modify them any way you want before passing them to the compiler. For example, in __syscall18() I stored the WASM bytecode as XOR-encoded data.
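A minimal sketch of the idea, assuming a single-byte key: the key value, the tiny "answer" module, and the helper name xorBytes are all made up for illustration and say nothing about the encoding actually used in __syscall18(). In a real build, only the encoded array would ship.

```javascript
const XOR_KEY = 0x5a; // hypothetical single-byte key

// XOR is its own inverse, so one helper both encodes and decodes.
function xorBytes(bytes, key) {
  return Uint8Array.from(bytes, (b) => b ^ key);
}

// Hand-assembled module, roughly equivalent to this WAT:
//   (module (func (export "answer") (result i32) (i32.const 42)))
const plainBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type 0: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // function 0 has type 0
  0x07, 0x0a, 0x01, 0x06, 0x61, 0x6e, 0x73, 0x77, 0x65, 0x72, 0x00, 0x00, // export "answer"
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b, // i32.const 42
]);

// Build-time step: this is what would be stored in the binary.
const encodedBytes = xorBytes(plainBytes, XOR_KEY);
console.log(WebAssembly.validate(encodedBytes)); // false

// Runtime step: decode, then compile and run as usual.
const decodedBytes = xorBytes(encodedBytes, XOR_KEY);
const { answer } = new WebAssembly.Instance(new WebAssembly.Module(decodedBytes)).exports;
console.log(answer()); // 42
```

The encoded array no longer starts with the "\0asm" magic bytes, so a simple scan for embedded WASM modules will miss it.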

For many years now, we’ve seen obfuscated Javascript. It should be interesting to see what the world does with obfuscated WASM.

Conclusion

WebAssembly is a neat new technology that is going to increase in popularity. As it sees wider acceptance, developers will find new and innovative ways to make it do terrible things. As security experts, it’s important that we keep up with these developments. Writing, breaking, and sharing CTF challenges is the perfect way to expose people in our field to new and weird technologies.

Tenable TechBlog

Learn how Tenable organizes itself, architects, builds and operates the systems that help keep you safe
