Coding a WebAssembly CTF Challenge

Published in Tenable TechBlog · 10 min read · May 30, 2019

I recently wrote a CTF challenge for my coworkers. The challenge was written using WebAssembly (WASM), a language I initially knew nothing about. I found the language specification and various API descriptions to be quite good, but, while coding, I still found myself googling nearly everything.

Actual video of me writing the challenge

Besides NCC Group’s excellent Security Chasms of WASM, I haven’t found a writeup describing oddities in WASM that might interest someone like myself. And while I’m no WASM expert, I hope that by sharing my experience with the language and toolchain I can help others explore the weird and wonderful world of WASM.

But before you give this a read, perhaps you want to attempt the challenge yourself? The challenge is browser based, so I’ve hosted it on GitHub here.

About the Source Code

The source is also available on GitHub here. The challenge is just two files: main.c and challenge_shell.html. The C code compiles down to WASM (index.wasm) and Javascript (index.js). The compiler also transforms challenge_shell.html into index.html.

Win the satisfaction of knowing you did a thing

Compilation requires the Emscripten toolchain. Fortunately, the Emscripten project maintains great installation instructions. Once the toolchain is installed, the challenge can be compiled using the provided makefile.

Let’s dive into the code and discuss a few things I found interesting.

Executing Before main()

WASM, similar to ELF, supports execution of programmer defined functions before main(). Initially, I assumed WASM had an init_array so I could just use the traditional constructor function attribute.

void __attribute__((constructor)) hello()

This technically works, in that hello() does get called before main(), but it turns out it isn’t exactly what I wanted. When using the constructor attribute, the generated Javascript creates an array of functions and loops over them.

Javascript generated to call constructor functions before main()
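To make the problem concrete, here is a toy model of that pattern. It is not Emscripten's actual glue code; the names constructors, run, and callOrder are all made up for illustration. The point is only that a list of constructors living in the script is easy to spot and shows up in stack traces.

```javascript
// Toy model only: these names are hypothetical, not Emscripten's real glue code.
const callOrder = [];

// Stand-ins for C functions marked with __attribute__((constructor)).
const constructors = [
  function hello() { callOrder.push("hello"); },
];

// Stand-in for the C main().
function main() { callOrder.push("main"); }

// The glue walks the list before handing control to main(), so every
// constructor is reachable just by reading the script.
function run() {
  for (const ctor of constructors) {
    ctor();
  }
  main();
}

run();
console.log(callOrder.join(" -> ")); // prints "hello -> main"
```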

Under normal circumstances, calling hello() from Javascript would be fine. But hello() contains the bulk of anti-debugging logic, and I’d prefer the function not be obviously locatable simply by reading index.js.

Also, I execute the Javascript keyword debugger from hello() to catch use of a developer console. This is the stack trace Emscripten’s constructor implementation will generate when the Firefox web console handles the debugger statement.

HEY EVERYONE, HELLO HAS ANTI-DEBUGGING LOGIC

That stack trace gives the attacker context that I’d prefer they not have.

Switching hello() from a constructor to a start node should fix this. The only problem is that I’ve no idea how to make Emscripten generate a start node. However, the WebAssembly Text (WAT) format of the start node is simple:

(start $start_function)
 - or -
(start 42)
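Here is a minimal sketch of what a start node does at runtime, using a tiny hand-assembled module rather than anything from the challenge. The import name env.trace and the variable names are my own assumptions. Note that imported functions come first in the function index space, so the one function defined in the module is index 1.

```javascript
// Hand-assembled module, equivalent to this WAT (all names are hypothetical):
//   (module
//     (import "env" "trace" (func (;0;)))
//     (func (;1;) call 0)
//     (start 1))
const startBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type 0: () -> ()
  0x02, 0x0d, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import module "env"
  0x05, 0x74, 0x72, 0x61, 0x63, 0x65, 0x00, 0x00, // field "trace", func, type 0
  0x03, 0x02, 0x01, 0x00,                         // one defined function, type 0
  0x08, 0x01, 0x01,                               // start section: function 1
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x10, 0x00, 0x0b, // body: call 0; end
]);

let traceCalls = 0;
const startImports = { env: { trace: () => { traceCalls += 1; } } };

// Instantiation alone runs the start function; nothing is exported or called.
const startInstance = new WebAssembly.Instance(
  new WebAssembly.Module(startBytes), startImports);

console.log("trace calls after instantiation:", traceCalls);
console.log("exported names:", Object.keys(startInstance.exports));
```

The start function ran even though the module exports nothing, which is exactly the property I wanted for hello().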

Having no clue if Emscripten could even generate a start node, I resolved to add one myself. To start, I redefined hello() so that it no longer used the constructor attribute and was no longer exported.

static void hello()

The function name no longer appeared in index.wasm or index.js. But now, in order to create the start node, I need to know hello()’s function index. To figure that out, I converted index.wasm to the more human friendly WAT format using wasm2wat.

albinolobster@ubuntu:~/wasm_challenge$ wasm2wat ./build/index.wasm -o ./build/index.wat

Then I opened index.wat and tracked down the exported functions that I’d written in C.

(export "___syscall12" (func 30))
(export "___syscall18" (func 27))
(export "___syscall188" (func 31))
(export "___syscall42" (func 26))
(export "___syscall72" (func 25))
(export "___syscall80" (func 24))

Each of these functions calls hello() towards the end of the function. For example, __syscall72() looks like this in C:

void EMSCRIPTEN_KEEPALIVE __syscall72(int p_value)
{
    ... compute result ...
    if (result == 1)
    {
        EM_ASM(
        {
            window['console']['log'] = function(param)
            {
                var result = Module.ccall(
                 '__syscall42', 'void', ['number'], [param]);
            }
        });
    }
    else
    {
        hello();
    }
}

When you look at __syscall72(), or function index 25, you should find a call to hello().

(func (;25;) (type 0) (param i32)
  i32.const 2
  local.get 0
  call 14
  i32.const 1
  i32.eq
  if  ;; label = @1
    i32.const 3
    call 13
    drop
  else
    call 33
  end)

The call 33 at the very end should be the call to hello(). Which means the start node should be written as: (start 33). You just need to insert that into index.wat and convert it to index.wasm.

Now, this isn’t exactly production ready or, quite frankly, even remotely stable, but I added the following to the makefile to automatically set hello() as the start function.

wasm2wat $(OUTPUT_FOLDER)/index.wasm -o $(OUTPUT_FOLDER)/index.wat
truncate -s -2 $(OUTPUT_FOLDER)/index.wat
printf '\n(start 33))\n' >> $(OUTPUT_FOLDER)/index.wat
wat2wasm $(OUTPUT_FOLDER)/index.wat -o $(OUTPUT_FOLDER)/index.wasm
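The truncate step assumes the file ends with exactly a closing parenthesis and a newline. A slightly less brittle version of the same idea can be sketched as a small Node helper. addStartNode is a hypothetical function of mine, not part of the challenge's makefile, and it still treats the WAT as plain text rather than parsing it.

```javascript
// addStartNode is a hypothetical helper, not part of the challenge's makefile.
function addStartNode(watText, funcIndex) {
  if (!Number.isInteger(funcIndex) || funcIndex < 0) {
    throw new Error("function index must be a non-negative integer");
  }
  // Ignore however much trailing whitespace the WAT printer emitted.
  const trimmed = watText.trimEnd();
  if (!trimmed.endsWith(")")) {
    throw new Error("expected the module to end with a closing parenthesis");
  }
  // Drop the final ")" that closes (module ...), append the start node,
  // then close the module again.
  return trimmed.slice(0, -1) + "\n  (start " + funcIndex + "))\n";
}

const patched = addStartNode("(module\n  (func))\n", 33);
console.log(patched);
```

In a build, the input would be the contents of index.wat and the result would be written back before running wat2wasm. The function index still has to be looked up by hand, so this does nothing about the real fragility: 33 changes whenever the compiler reorders functions.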

Now the developer console generates the following stack trace.

This approach gives the attacker less context and keeps the logic out of Javascript, which humans, presumably, find easier to read.

Not Very Indirect Function Call

Sometimes there is a large gap between how you think something should work and how it actually works. For me, how C function pointers are translated to WASM was one of those things.

Consider the global function pointer defined in main.c.

// Stored indirect call here to be annoying
static void (*g_func_ptr)() = 0;

I was sure attackers relying on static analysis would have to track down where that pointer gets set. I ended up setting the pointer in a conditional in main().

int main(int p_argc, char** p_argv)
{
    // the javascript glue invokes the script this way.
    if (p_argc != 1 || strcmp(p_argv[0], "./this.program") != 0)
    {
        EM_ASM(
        {
            delete window['console']['log'];
            window['console']['log'] = window['console']['assert'];
            exit(0);
        });
    }
    else
    {
        // "call_me_indirectly" should be at address 1 currently.
        g_func_ptr = 1;
    }

    return EXIT_SUCCESS;
}

Where g_func_ptr = 1; points to the index or “address” of a function named call_me_indirectly(). At least, according to the output I received from printf("%p\n", call_me_indirectly);

The function pointer, g_func_ptr(), eventually gets invoked in __syscall80().

/*
 * This is the first digit handler. Indirectly call 
 * call_me_indirectly. Just to be annoying. p_value is the value
 * passed into "console.log"
 */
void EMSCRIPTEN_KEEPALIVE __syscall80(int p_value)
{
    ... some anti debug stuff ...

    // call call_me_indirectly based on index in the lookup table
    g_func_ptr(p_value);
}

But, when I looked at the WAT for __syscall80(), I was pretty mystified.

local.get 0
i32.const 58
call_indirect (type 0)
return

Not only was my “indirect call” using a hard coded value, but it wasn’t even a value that I recognized. What happened to g_func_ptr = 1;?!

After a bit of head scratching, I came across this table in WAT.

(elem (;0;) (global.get 0) 115 40 44 56 51 51 51 115 116 69 67 70 74 78 79 116 117 41 42 45 46 55 61 63 68 68 91 92 93 94 95 96 97 98 99 117 117 117 117 117 117 117 117 117 117 117 117 117 118 119 80 81 119 120 72 89 120 121 29 59 73 85 88 121 121)

This is the global function table. The function that g_func_ptr() is supposed to point to, call_me_indirectly(), is located at index 58. The compiler did all of the static analysis work! No one has to hunt down where g_func_ptr() gets set because it was already resolved! Just index into the global table and keep on going.

Although, I think it’s kind of sketchy since I set g_func_ptr() in a conditional branch.

Either way. The case still isn’t closed! Why does g_func_ptr = 1; generate a call_indirect to the 58th entry in the global function table!? Well, it turns out that all the functions in the table are grouped by their type. Remember how g_func_ptr() is used:

// call call_me_indirectly based on index in the lookup table
g_func_ptr(p_value);

WASM is very conscious about function types. The following types, expressed in WAT, are defined in the challenge:

(type (;0;) (func (param i32)))
(type (;1;) (func (param i32) (result i32)))
(type (;2;) (func (param i32 i32 i32 i32) (result i32)))
(type (;3;) (func (param i32 i32 i32) (result i32)))
(type (;4;) (func (param i32 i32) (result i32)))
(type (;5;) (func (param i32 i32 i32 i32 i32) (result i32)))
(type (;6;) (func))
(type (;7;) (func (result i32)))
(type (;8;) (func (param i32 i32)))
(type (;9;) (func (param i32 i32 i32 i32 i32 i32)))
(type (;10;) (func (param i32 i32 i32 i32)))
(type (;11;) (func (param f64 f64) (result f64)))
(type (;12;) (func (param f64) (result f64)))
(type (;13;) (func (param i32 i32 i32 i32 i32 i32) (result i32)))

In the global function table, functions 57–63 are all of type 0 (one i32 parameter and no return value). call_me_indirectly() is found at index 58, which happens to be the second entry in the type 0 group. In other words, call_me_indirectly() is index 1 of the type 0 group, which is why g_func_ptr = 1; works in the C code.
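The grouping can be modeled with a few lines of Javascript. This is a toy layout based on my understanding of what the compiler did: the signatures, group sizes, and every name except call_me_indirectly are hypothetical and do not match the challenge's real table.

```javascript
// Hypothetical layout: signatures, sizes, and most names are illustrative only.
const groups = [
  { signature: "ii", entries: ["pad", "some_getter", "pad", "pad"] },
  { signature: "vi", entries: ["pad", "call_me_indirectly", "other_handler", "pad"] },
];

// Concatenate the groups into one flat table and remember where each starts.
const table = [];
const groupBase = {};
for (const group of groups) {
  groupBase[group.signature] = table.length;
  table.push(...group.entries);
}

// The C function pointer holds the index inside its signature's group;
// the compiled call adds the group's base to get the flat table index.
function resolve(signature, pointerValue) {
  const index = groupBase[signature] + pointerValue;
  return { index: index, target: table[index] };
}

console.log(resolve("vi", 1));
```

In this toy table the "vi" group starts at 4, so pointer value 1 resolves to index 5. In the challenge the type 0 group starts at 57, so the same arithmetic gives 57 + 1 = 58.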

While I was surprised to learn how function pointers are translated from C to WASM, I ended up not changing this part of the challenge. Anyone doing static analysis will also have to learn or know about the global function table which seems good enough to me.

Runtime Compilation and Execution

Javascript can compile and execute byte arrays of WASM at runtime. Here is a very simple example from __syscall72().

int result = EM_ASM_INT(
{
    /**
     * int oh_no(int p_pressed_key) {
     *     if (p_pressed_key == 9) {
     *       return 1;
     *     }
     *     return 0;
     * }
     */
    var wasm = new Uint8Array([
        0,97,115,109,1,0,0,0,1,134,128,128,128,0,1,96,1,127,1,127,
        3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,
        128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,146,128,128,128,
        0,2,6,109,101,109,111,114,121,2,0,5,111,104,95,110,111,0,0,
        10,141,128,128,128,0,1,135,128,128,128,0,0,32,0,65,9,70,11
    ]);

    var module = new WebAssembly.Module(wasm);
    var module_instance = new WebAssembly.Instance(module);
    var result = module_instance.exports.oh_no($0);
    return result;
}, p_value);

In this example, the WASM in the byte array compares a passed in value against the number 9.
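The same byte array can be tried outside the challenge. The sketch below runs under Node without Emscripten; checkKey is a helper name I made up, and the WebAssembly.validate() call is an extra sanity check that the challenge itself doesn't perform.

```javascript
// Same bytes as the array in __syscall72() above.
const ohNoBytes = new Uint8Array([
  0,97,115,109,1,0,0,0,1,134,128,128,128,0,1,96,1,127,1,127,
  3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,
  128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,146,128,128,128,
  0,2,6,109,101,109,111,114,121,2,0,5,111,104,95,110,111,0,0,
  10,141,128,128,128,0,1,135,128,128,128,0,0,32,0,65,9,70,11
]);

// checkKey is a hypothetical helper: validate, compile, instantiate, call.
function checkKey(bytes, key) {
  if (!WebAssembly.validate(bytes)) {
    throw new Error("not a valid WASM module");
  }
  const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
  return instance.exports.oh_no(key);
}

console.log(checkKey(ohNoBytes, 9)); // prints 1
console.log(checkKey(ohNoBytes, 3)); // prints 0
```

The synchronous constructors are fine for a module this small. For larger modules, browsers may refuse to compile synchronously on the main thread, and the promise-based WebAssembly.instantiate() is the usual alternative.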

When I learned about this, my question was, “Alright, but how do I actually generate the byte array?” I’ll explain two ways. The first way is to use WASM Fiddle.

WASM Fiddle, like many of the other fiddles, is a nice platform for sharing code snippets. It also happens to provide different output options for WASM. One option is a Javascript Uint8Array you can quickly copy and paste into your code.

The other way, of course, is to use your compiler. In my case, that’s Emscripten’s emcc.

I used the following to compile main.c:

emcc -O1 -s ONLY_MY_CODE=1 -s WASM=1 main.c -o main.html

I then converted the compiled WASM to a C array using xxd so I could easily copy and paste it into C code.

/*
 * int oh_no(int p_pressed_key) { 
 *  if (p_pressed_key == 4) {
 *      return 1;
 *  }
 *  return 0;
 * }
 */
const char wasm[43] =
{
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x06,
    0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00,
    0x07, 0x0a, 0x01, 0x06, 0x5f, 0x6f, 0x68, 0x5f, 0x6e, 0x6f,
    0x00, 0x00, 0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x41, 
    0x04, 0x46, 0x0b
};

int result = EM_ASM_INT(
{
    // read the C array into a uint8array
    var wasm_array = new Uint8Array($2);
    for (var i = 0; i < $2; i++)
    {
        wasm_array[i] = getValue($1 + i);
    }

    // compile and execute
    var module = new WebAssembly.Module(wasm_array);
    var module_instance = new WebAssembly.Instance(module);
    var result = module_instance.exports._oh_no($0);
    return result;
}, p_value, wasm, 43);
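If xxd isn't handy, the conversion to a C array is easy to script. The sketch below is a rough stand-in written for Node, not a reimplementation of xxd; toCArray and its output layout are my own choices, and I declare the array as unsigned char so bytes above 0x7f fit without a conversion warning.

```javascript
// toCArray is a hypothetical helper that formats bytes as a C initializer.
function toCArray(bytes, name) {
  // Two-digit lowercase hex for every byte, e.g. 0x0b.
  const hex = Array.from(bytes, (b) => "0x" + b.toString(16).padStart(2, "0"));
  // Ten values per row, matching the layout used in this post.
  const rows = [];
  for (let i = 0; i < hex.length; i += 10) {
    rows.push("    " + hex.slice(i, i + 10).join(", "));
  }
  return "const unsigned char " + name + "[" + bytes.length + "] =\n{\n" +
    rows.join(",\n") + "\n};";
}

const header = new Uint8Array([0x00, 0x61, 0x73, 0x6d]);
console.log(toCArray(header, "wasm"));
```

In a real build the input would be the bytes of the compiled .wasm file, for example the Buffer returned by Node's fs.readFileSync().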

One interesting thing about the two examples I just provided is that it does matter where the WASM array is defined (e.g. C vs. Javascript). Anything defined in EM_ASM, the Javascript interface in C, will appear in index.js. However, an array defined in C is stored in index.wasm which makes it more difficult to find and read.

The fun thing about these arrays is that you can modify them any way you want before passing them to the compiler. For example, in __syscall18() I stored the WASM bytecode as xor encoded data.

/*
 * int oh_no(int p_pressed_key) { 
 *  if (p_pressed_key == 7) {
 *      return 1;
 *  }
 *  return 0;
 * }
 */
char wasm[97] =
{
    170,203,217,199,171,170,170,170,171,44,42,42,42,170,171,202,
    171,213,171,213,169,40,42,42,42,170,171,170,174,46,42,42,
    42,170,171,218,170,170,175,41,42,42,42,170,171,170,171,172,
    43,42,42,42,170,170,173,56,42,42,42,170,168,172,199,207,
    199,197,216,211,168,170,175,197,194,245,196,197,170,170,160,39,
    42,42,42,170,171,45,42,42,42,170,170,138,170,235,173,236,
    161,
};

for (int i = 0; i < 97; i++)
{
    wasm[i] = (wasm[i] ^ 0xaa) & 0xff;
}

int result = EM_ASM_INT(
{
    // read the C array into a uint8array
    var wasm_array = new Uint8Array($2);
    for (var i = 0; i < $2; i++)
    {
        wasm_array[i] = getValue($1 + i);
    }

    // compile and execute
    var module = new WebAssembly.Module(wasm_array);
    var module_instance = new WebAssembly.Instance(module);
    var result = module_instance.exports.oh_no($0);
    return result;
}, p_value, wasm, 97);
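For anyone who wants to produce an encoded array like this, here is a round-trip sketch that runs under Node. It reuses the 43-byte module from the xxd example above (the one that exports _oh_no and checks for 4); xorBytes and the variable names are my own, and the key 0xaa matches the one used in __syscall18().

```javascript
const XOR_KEY = 0xaa;

// XOR is its own inverse, so the same helper encodes and decodes.
function xorBytes(bytes, key) {
  return Uint8Array.from(bytes, (b) => b ^ key);
}

// The unencoded 43-byte module from the xxd example.
const plainModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x06,
  0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00,
  0x07, 0x0a, 0x01, 0x06, 0x5f, 0x6f, 0x68, 0x5f, 0x6e, 0x6f,
  0x00, 0x00, 0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x41,
  0x04, 0x46, 0x0b
]);

const encodedModule = xorBytes(plainModule, XOR_KEY);   // what gets stored
const decodedModule = xorBytes(encodedModule, XOR_KEY); // what gets compiled

// The encoded form no longer starts with "\0asm", so it fails validation.
console.log(WebAssembly.validate(encodedModule)); // prints false

const xorInstance = new WebAssembly.Instance(
  new WebAssembly.Module(decodedModule));
console.log(xorInstance.exports._oh_no(4)); // prints 1
```

The first encoded bytes come out as 170, 203, 217, 199, the same prefix as the array in __syscall18(), since both are the WASM magic number xor'd with 0xaa.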

For many years now, we’ve seen obfuscated Javascript. It should be interesting to see what the world does with obfuscated WASM.

Conclusion

WebAssembly is a neat new technology that is going to increase in popularity. As it sees wider acceptance, developers will find new and innovative ways to make it do terrible things. As security experts, it’s important that we keep up with these developments. Writing, breaking, and sharing CTF challenges is the perfect way to expose people in our field to new and weird technologies.