Learning and trying to reverse WebAssembly for fun!

Published in RIXED LABS · 11 min read · Dec 31, 2021

0x00: Pretext

A few days ago, Kumar from AX1AL and I decided to try WebAssembly: learn a bit about it, experiment with compiling C to WebAssembly, and then disassemble the result to understand the raw bytecode that gets generated. As curious nerds who didn’t know much about it, we googled around and stumbled upon two really great videos that gave us an understanding of what WebAssembly actually is. The next step was to document this journey as a small blog, so we quickly sorted out a few topics to write about. We are complete novices at this WebAssembly stuff, so if you find any issues, please let us know in the comments; we would be more than happy to fix them and learn.

0x01: Contents

  • Getting started with Web assembly.
  • Understanding some basic terminologies.
  • Compiling WASM from C and analyzing it.
  • Ways to compile from Rust, Go, TypeScript to WASM.
  • WASM to Disassembly (using a simple program)
  • Resources & Credits.
  • Author’s two cents.

0x02: Getting Started with Web Assembly

As novices, our first step was figuring out what WebAssembly actually is. After going through these really awesome videos by Guy Royse & Lin Clark, we decided to draw a small mind map of what we understood.

After reading a few more blogs, we made this small mind map to show how WebAssembly works alongside JavaScript, which makes things faster and lets programmers who enjoy writing C or Rust target the web. Along the way we came across a few myths that we would like to drop our two cents on:

[1] Is Web assembly really a type of assembly language which is architecture dependent?

  • No. Unlike assembly language, where the mnemonics and the instruction set architecture vary from one processor to another, WebAssembly is designed to run the same way in browsers on any processor. When we compile a high-level language with WASM as the target, what gets generated is bytecode, not assembly code. If you are curious about the difference between bytecode and assembly code, check this out.

[2] Do really all browsers support web assembly?

  • Not really. All major browsers support running WASM files, but older browser versions do not support it.

[3] Will Web-Assembly replace JavaScript or is WASM better than JS?

  • As novices with very little hands-on experience in either, we think this myth is better phrased as “WASM is not really here to replace JavaScript, but to work along with it to make things faster”. Some folks from Mozilla Research have stated, with proof, that WASM is way faster than JS, but since this blog is also focused on reversing, we would rather not turn this into a holy war over whether JS is bad or WASM is good; each of them has its limitations and specifications.

After this basic introduction to web assembly, we will move forward to understanding some basic terminologies of web assembly and how to set things up for doing further tinkering with WASM.

0x03: Understanding some basic terminologies

Before analyzing a program, let us get through some basic terminology, which should speed things up when we run into weird terms. When we compile a high-level language to WebAssembly using Emscripten, we are left with three files: an HTML file, a JS file, and one weird file with the extension .wasm. We planned on understanding that last one, but wait, running the file command on it gives us:

test.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)

😕How do we read this now?

  • The WebAssembly Binary Toolkit to the rescue: its wasm2wat tool parses the binary and generates the equivalent human-readable text file. We have not mentioned this so far, but:
    WAT -> WebAssembly Text format

Emscripten generates the .wasm file, which is a binary, but we can also retrieve the .wat, which is a bit easier to read. We will not leave readers confused: the entire process is demonstrated step by step in the upcoming sections.

Binary format or (.wasm)

00000000: 0061 736d 0100 0000 0192 8180 8000 1660  .asm...........`
00000010: 037f 7f7f 017f 6001 7f01 7f60 0001 7f60 ......`....`...`
00000020: 017f 0060 0000 6002 7f7f 017f 6003 7f7e ...`..`.....`..~
00000030: 7f01 7e60 047f 7e7e 7f00 6005 7f7f 7f7f ..~`..~~..`.....
00000040: 7f01 7f60 067f 7c7f 7f7f 7f01 7f60 027f ...`..|......`..
00000050: 7f00 6002 7e7f 017f 6004 7f7f 7f7f 017f ..`.~...`.......
00000060: 6002 7c7f 017c 6002 7e7e 017c 6007 7f7f `.|..|`.~~.|`...
..........
....

Textual format or (.wat)

(module
(type $FUNCSIG$ii (func (param i32) (result i32)))
(import "env" "puts" (func $puts (param i32) (result i32)))
(table 0 anyfunc)
(memory $0 1)
(data (i32.const 16) "Hello World\00")
..........further text....
..........
....

Now that we have an equivalent, readable text format, let us go through what words like “module”, “type”, “import” and the others actually mean, one by one:

Module : This component of the textual format signifies that WebAssembly programs are organized into modules, which consist of fields such as type , import , table , memory and data , and sometimes a few more.

type : This component of the textual format defines a vector of function types.

func : This component of the textual format defines functions like other programming languages.

param i32 : This component of the textual format defines a parameter and its type, a 32-bit integer.

result i32 : This component of the textual format defines the return type, a 32-bit integer.

import : This component of the textual format defines a set of imports that are required for instantiation.

(func $puts (param i32) (result i32))

This mirrors the “puts” function we used in our C program to print “Hello World”: func declares a function, $puts is the name of the function, param i32 says that it takes one 32-bit integer parameter (the offset of the string in linear memory), and result i32 says that it returns a 32-bit integer.

table 0 anyfunc : This component of the textual format defines a table. A table is a bit like linear memory, except that it holds references (here anyfunc, i.e. function references) instead of raw bytes; the 0 is its initial size, meaning that we have nothing in our table.

memory $0 1 : This component of the textual format declares the linear memory used by the module. $0 is the name of the memory and 1 is its initial size, measured in pages of 64 KiB.

data : This component of the textual format is used to initialize a range of memory from a static vector of bytes.

Now that we are a bit familiar with these terms, the next section focuses on setting up the environment, compiling our first C program and checking out the WAT generated from it.

0x04: Compiling WASM from C and analyzing it

Before we move ahead and generate WASM from C, we need to set up our environment. I made a small installer script, which you can get from the link above: save it to your local machine and give it execute permission. Then run it, and once it finishes without any errors, run this command:

emcc --version

If you end up with:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 2.0.26 (5af6a110ee58ca4a6885fbcad830c92207fefea2)
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Good job! We are done with the setup for compiling C programs to WASM. Now open your favorite text editor and type in this small C program:

#include <stdio.h>

int main(void) {
    printf("My first program in WASM\n");
    return 0;
}

Save this program in a file named first.c and run this command:

emcc first.c -s WASM=1 -o first.html

Here the WASM=1 flag says that we want WASM output (recent Emscripten releases emit WASM by default). Along with first.html, this produces first.js and first.wasm. Reading the binary file directly is a tedious task, though not an impossible one, so we will use the WebAssembly Binary Toolkit, or WABT (pronounced “wabbit”), to convert the wasm binary into WAT, the readable form.

You can use the script above: give it execute permission, run it, and your setup is ready. Next, copy the .wasm file into the build folder where you land after running the script. Then run the following command:

./wasm2wat first.wasm -o first.wat

After running this command you end up with a really huge output if you open it in a text editor; I pasted it in a gist, which you can check out here. Honestly, after looking at it we nearly had a stroke, so I immediately asked in the WebAssembly Discord why there are so many weird functions, and a generous soul broke it down for me. Here are the snippets of the chat ⏬

So I went looking and came across this site, which generated a much simpler output for our program first.wasm :

This looks quite simple, sweet and easy to explain compared to 5k lines of output! Let us go through those 17 lines:

  • Line 1
(module

This is just the start of the WebAssembly module. In the binary, the module starts with the magic number \0asm, which you can see at the very beginning of the hex dump of the .wasm file.

  • Line 2
(type $FUNCSIG$ii (func (param i32) (result i32)))

This line is the type section, which contains the function signatures used in this module. Here func (param i32) (result i32) is the function signature.

  • Line 3
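(import "env" "puts" (func $puts (param i32) (result i32)))

This line declares an import that the module needs: the “puts” function from the “env” namespace, followed by its function signature. Our C source called printf, but compilers commonly turn a printf of a constant string ending in a newline into a puts call, which is why puts shows up here.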

  • Line 4
(table 0 anyfunc)

This line declares a table. A table is quite similar to linear memory, except that it is a resizable array of references rather than bytes. Here 0 is the initial size, which says we have nothing in our table, and anyfunc is the element type, the only one available in the MVP, which refers to a function.

  • Line 5
(memory $0 1)

This line is pretty simple: memory refers to the linear memory used by our module, which has a size of one page, or 64 KiB.

  • Line 6
(data (i32.const 16) "My first program in WASM\00")

This line is the data section. It declares initialized data that is to be loaded into memory: first an i32 initializer expression that computes the offset at which the data is placed (here 16), and then the data itself.

  • Line 7 & 8
(export "memory" (memory $0))
(export "main" (func $main))

These lines declare the exports, which are handed to the host environment at instantiation. An export has a name and a description saying whether it is a function, table, memory or global; here the memory and the function main are exported.

  • Line 9–17
 (func $main (; 1 ;) (result i32)
  (drop
   (call $puts
    (i32.const 16)
   )
  )
  (i32.const 0)
 )
)

Now that we understand exports, imports and the other parts of the module, let us look at the function. On the 9th line the function is declared with its name $main, a comment (; 1 ;) giving its index (index 0 is the imported puts), and the result type i32. On the 10th line the drop instruction is used; it is a parametric instruction that pops a value from the stack and discards it. On the 11th line the puts function, which was previously imported into the module, is called using the call instruction. On the 12th line i32.const 16 pushes the value 16 onto the stack; this is not the string itself but its address, the offset in linear memory where the data section placed “My first program in WASM”. Because the text format here is nested, the innermost expression runs first: the constant is pushed, puts is called with it, and drop throws away the value puts returned. Then on the 15th line i32.const 0 pushes the value 0 onto the stack in the same way, and that value becomes the return value of main. So here we tried to understand a WAT file; in the next sections we will look at converting other programming languages to WASM.

0x05: Ways to compile from Rust, Go, TypeScript to WASM

In this section we will not go through how to convert each of these high-level languages to WASM; instead we leave this up to the readers. If you want to try them out, check out the resources!

0x06: WASM to Disassembly (using a simple program)

In this section, we will explore how to read an extremely simple disassembly generated from WASM using WASMFiddle. Here is the simple program:

int main() { 
return 42;
}

wasm-function[0]:
sub rsp, 8 ; 0x000000 48 83 ec 08
mov eax, 0x2a ; 0x000004 b8 2a 00 00 00
nop ; 0x000009 66 90
add rsp, 8 ; 0x00000b 48 83 c4 08
ret ; 0x00000f c3

This is the disassembly generated by the above site. wasm-function[0] refers to the main function; sub rsp, 8 is the prologue; mov eax, 0x2a is the return 42, or in simple words 42 (0x2a) is moved into the eax register; nop stands for no-operation; add rsp, 8 is the epilogue; and finally ret passes control to the return address. This was an extremely basic example, so let us add a function to the above program.

C program:

#include <stdio.h>

int somenumber() {
    int a = 78;
    printf("%d\n", a);
    return 0;
}

int main() {
    puts("Hello World");
    somenumber();
    return 42;
}

This time the x86-64 assembly generated is extremely hard to read and time-consuming to follow, so we turned to an alternative: a binary analysis tool known as JEB Decompiler. We downloaded the community version, which was enough for us to understand the binary and save time.

Now we load the WASM binary into this analysis tool and find two functions, somenumber & main , as expected. We will go through both functions and their instructions one by one.

Resources:

  1. https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format
