C strings and javascript

David Konsumer
Aug 20, 2023

Pokemons chillin’ on a web.

Often, when working in JavaScript, it can be helpful to interact with C strings. Even if you are not using C directly, several interfaces use a pretty universal C ABI to pass these simple byte-structures in and out.

If you want to skip to a lib to use in your project, see cmem_helpers. It can also do C structs in a nice & easy way.

With WebAssembly, Emscripten has some stuff built-in that generates JS utils to interact with these, but if you want to keep the host simple & light, and the WASM minimal, then it’s not much of an option.

I noticed that I was writing roughly the same code for several different types of C-memory access, across several JS projects recently:

  • FFI (bun, node, deno, etc): linking to a native DLL someone else (or you) made
  • Native Node (NAPI): your code will be in C/C++/Rust
  • WASM: your code can be in anything that compiles to WASM, but it can only pass basic number-types across the WASM host-barrier: i32, i64, f32, f64. Other things can be passed as pointers (an i32 used as an unsigned address), and anything that fits in fewer bytes (like a u8, for example) just uses part of a 32-bit type.
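To make that last bullet a bit more concrete: on the JS side, a 32-bit integer that crosses the barrier shows up as a plain signed number, so I usually end up doing small conversions like these. This is only a sketch, and the helper names are mine, not from any library:

```javascript
// Hypothetical helpers (names are illustrative, not from a library).

// An i32 coming out of WASM arrives in JS as a signed number.
// Reinterpret it as unsigned before using it as an address.
function asU32 (value) {
  return value >>> 0
}

// A u8 result rides in the low byte of an i32; mask off the rest.
function asU8 (value) {
  return value & 0xff
}

// A signed 8-bit value needs sign-extension from the low byte.
function asI8 (value) {
  return (value << 24) >> 24
}

console.log(asU32(-1)) // 4294967295
console.log(asU8(0x1234)) // 52
console.log(asI8(0xff)) // -1
```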

A Simple Example

Let’s imagine you have a simple hello function:

// hello.c

#include <string.h>
#include <stdlib.h>

// given a name and a pointer to a result buffer, write a greeting string into the buffer
void hello(char* name, char* result) {
  char* h = "Hello ";
  strcpy(result, h);
  strcat(result, name);
}

Essentially, this takes a name and writes a greeting into the result buffer, so hello("David", result) leaves “Hello David” in result. The other C-stuff is just “take the string at this address, append it to another string, then put the bytes at this other memory address.”
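If the C is unfamiliar, here is roughly the same thing re-enacted in JS, with a plain byte array standing in for C memory. It is a sketch for illustration only: memory, writeCString, cStringLength and helloJs are names I made up, and the addresses 0 and 100 are arbitrary.

```javascript
// A JS re-enactment of hello() on a byte array that stands in for C memory.
const memory = new Uint8Array(256)
const encoder = new TextEncoder()
const decoder = new TextDecoder()

// Copy a JS string into memory at `pointer`, followed by a 0 terminator.
function writeCString (pointer, text) {
  const bytes = encoder.encode(text)
  memory.set(bytes, pointer)
  memory[pointer + bytes.length] = 0
}

// Count bytes up to the 0 terminator, like strlen does.
function cStringLength (pointer) {
  let length = 0
  while (memory[pointer + length] !== 0) length++
  return length
}

// Same steps as the C version: strcpy "Hello ", then strcat the name.
function helloJs (namePointer, resultPointer) {
  writeCString(resultPointer, 'Hello ')
  const nameLength = cStringLength(namePointer)
  const end = resultPointer + cStringLength(resultPointer)
  // copy the name bytes plus its terminator onto the end of the result
  memory.copyWithin(end, namePointer, namePointer + nameLength + 1)
}

writeCString(0, 'David') // the name lives at address 0
helloJs(0, 100) // the result is written at address 100
const greeting = decoder.decode(memory.subarray(100, 100 + cStringLength(100)))
console.log(greeting) // Hello David
```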

If you were going to use it in a normal C program (this is not needed for WASM or a DLL, just a standalone program), it would look like this:

// for printf
#include <stdio.h>

int main() {
  char* ret = malloc(100);
  hello("World", ret);
  printf("%s\n", ret);
  free(ret);
  return 0;
}

This is silly, because I could just printf directly and not use hello, but it’s meant to be a trivial example.

The malloc allocates 100 bytes for the return-value, and later I free it. There are problems here, like what if the return value is greater than 100 bytes? Short answer: buffer-overflow! I am trying to not go off on too many tangents, so we will ignore all that, for now. This will run in WASM, where the memory is self-contained (the WASM sandbox), so a stray overflow or a missing free won’t crash your computer if it does something stupid/terrible, although it can still corrupt the module’s own data. They are still things to consider, especially if you keep your WASM instance around for a while and call functions in it more than once before destroying it.
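If you do want to guard against the overflow from the JS side, one option is to work out how many bytes hello will write before calling it. A minimal sketch, assuming the greeting prefix stays “Hello ” (greetingByteLength and fitsInBuffer are hypothetical names):

```javascript
// Hypothetical helper: how many bytes hello() will write for a given name.
// "Hello " is 6 bytes, plus the UTF-8 bytes of the name, plus the 0 terminator.
const encoder = new TextEncoder()

function greetingByteLength (name) {
  const prefixLength = encoder.encode('Hello ').length
  const nameLength = encoder.encode(name).length
  return prefixLength + nameLength + 1
}

// Check before calling into C whether the result fits the buffer.
function fitsInBuffer (name, bufferSize) {
  return greetingByteLength(name) <= bufferSize
}

console.log(greetingByteLength('World')) // 12
console.log(fitsInBuffer('World', 100)) // true
console.log(fitsInBuffer('x'.repeat(100), 100)) // false
```

You could also malloc exactly greetingByteLength(name) bytes instead of a fixed 100.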

What is a pointer?

A pointer is an integer that represents an address of some memory. On 64-bit systems (like most modern native targets) it’s a u64, but on 32-bit systems (like WASM) it’s a u32.
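From the JS host, that means a pointer is just an index into the module’s memory. A small sketch using a standalone WebAssembly.Memory (no compiled module needed; the address 1024 is an arbitrary pick for the demo):

```javascript
// A WASM memory is one big byte array; a pointer is an index into it.
const memory = new WebAssembly.Memory({ initial: 1 }) // 1 page = 64 KiB
const bytes = new Uint8Array(memory.buffer)

const pointer = 1024 // an arbitrary address chosen for this demo
bytes[pointer] = 72 // store the byte for 'H' at that address

console.log(bytes.length) // 65536
console.log(bytes[pointer]) // 72
console.log(String.fromCharCode(bytes[pointer])) // H

// Pointer arithmetic is plain addition: the next byte lives at pointer + 1.
bytes[pointer + 1] = 105 // 'i'
console.log(String.fromCharCode(bytes[pointer], bytes[pointer + 1])) // Hi
```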

What are C-Strings?

In a general/practical sense, they are simply a pointer to some bytes (usually UTF-8 text) that end with a \0 byte (called “null-terminated”). With this convention, we do not need to store the byte-length in another place, we can just look for \0 (the end) for a plain string. In C, this type is written char*, meaning “a pointer to the start of some bytes” (a char, like a uint8, is a single byte). It can cause problems if you ever actually need \0 inside your string, in which case you will need to keep track of the length yourself.

"Hello World" = ['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', 0]

In this case the actual pointer would be to H, since it’s the first byte, and the 0 tells the reader that is the end.
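Here is the same idea as a sketch in JS, using TextEncoder and TextDecoder. toCStringBytes and fromCStringBytes are names I picked for the example, not part of any API:

```javascript
const encoder = new TextEncoder()
const decoder = new TextDecoder()

// Encode a JS string as C-string bytes: UTF-8 plus a trailing 0.
function toCStringBytes (text) {
  const utf8 = encoder.encode(text)
  const bytes = new Uint8Array(utf8.length + 1) // zero-filled, so the last byte is already 0
  bytes.set(utf8)
  return bytes
}

// Decode C-string bytes: read up to (not including) the first 0.
function fromCStringBytes (bytes) {
  const end = bytes.indexOf(0)
  return decoder.decode(bytes.subarray(0, end === -1 ? bytes.length : end))
}

const hello = toCStringBytes('Hello World')
console.log(hello.length) // 12
console.log(fromCStringBytes(hello)) // Hello World

// An embedded 0 cuts the string short, which is why you would track the length yourself.
console.log(fromCStringBytes(Uint8Array.from([72, 105, 0, 33, 0]))) // Hi
```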

How do I compile this to WASM?

I won’t go too deep into tools & setup, but you can install docker and use a little container I made, that has all the tools I like, and is fairly quick to get started with:

# mount the current directory inside the docker and give me a bash-prompt
docker run -it --rm -v $(pwd):/cart konsumer/null0:latest

# now you are inside the container

# compile hello.c, using wasi-sdk
clang --sysroot=$WASI_SYSROOT -Wl,--export=hello -Wl,--export=free -Wl,--export=malloc -Wl,--no-entry -nostartfiles -o hello.wasm hello.c

# inspect the wasm
wasm-objdump -x hello.wasm

There is a bunch of stuff in the output, but the key things we care about are in the Export section: hello, free, malloc and memory.
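You can do the same check from JS with WebAssembly.Module.exports. This is a rough sketch: missingExports is a hypothetical helper, and the tiny hand-assembled module (which only exports memory) is a stand-in for hello.wasm so the example runs without any files.

```javascript
// Hypothetical helper: list which required exports a compiled module lacks.
function missingExports (module, required) {
  const names = WebAssembly.Module.exports(module).map((entry) => entry.name)
  return required.filter((name) => !names.includes(name))
}

// A tiny hand-assembled module that only exports "memory", standing in for
// hello.wasm so this sketch runs without any files.
const tinyModule = new WebAssembly.Module(new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x05, 0x03, 0x01, 0x00, 0x01, // memory section: one memory, 1 page minimum
  0x07, 0x0a, 0x01, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00 // export "memory"
]))

const missing = missingExports(tinyModule, ['hello', 'free', 'malloc', 'memory'])
console.log(missing) // the three names this stand-in lacks: hello, free, malloc
```

With the real file, you would pass in the module you get from compiling hello.wasm, and expect an empty list.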

How do I use this in javascript?

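Put this in an .html file next to hello.wasm, and serve the folder over HTTP (fetch won’t load the file from file://, and instantiateStreaming expects the application/wasm MIME type):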

<script type="module">

const encoder = new TextEncoder()
const decoder = new TextDecoder()

// copy a JS string into WASM memory as a null-terminated C string
// len is the number of bytes to write (including the terminator)
function setString (value, len = 0, pointer) {
  const buffer = encoder.encode(value)
  if (!len) {
    // use the UTF-8 byte-length, not value.length, so non-ASCII text fits
    len = buffer.length + 1
  }
  if (!pointer) {
    pointer = malloc(len)
  }
  for (let b = 0; b < len; b++) {
    mem.setUint8(pointer + b, buffer[b] || 0)
  }
  return pointer
}

// read a C string out of WASM memory: len bytes, or up to the first 0
function getString (pointer, len = 0) {
  let end = pointer + len
  if (!len) {
    while (mem.getUint8(end) !== 0) {
      end++
    }
  }
  return decoder.decode(mem.buffer.slice(pointer, end))
}

const { instance } = await WebAssembly.instantiateStreaming(fetch("hello.wasm"), {env: {}})
const { malloc, hello, memory, free } = instance.exports
const mem = new DataView(memory.buffer)

const ptrName = setString('World')
const ptrResponse = malloc(100)
hello(ptrName, ptrResponse)
console.log(getString(ptrResponse))

// not strictly required, but also try not to malloc without free
free(ptrName)
free(ptrResponse)

</script>

That’s all there is to it, really.
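One caveat with caching a DataView over memory.buffer, like the mem view in the page above: if the WASM memory grows, the old buffer is detached and the cached view goes stale. A small sketch with a standalone WebAssembly.Memory (freshView is a hypothetical helper, and the sizes are arbitrary):

```javascript
// Growing a WASM memory detaches the old ArrayBuffer, so views made earlier go stale.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 })
const staleView = new DataView(memory.buffer)

memory.grow(1) // a large malloc inside the module can trigger the same growth

console.log(staleView.buffer.byteLength) // 0, the old buffer is detached
console.log(memory.buffer.byteLength) // 131072

// One way around it: build the view when you need it instead of caching it.
function freshView () {
  return new DataView(memory.buffer)
}
freshView().setUint8(70000, 42)
console.log(freshView().getUint8(70000)) // 42
```

For a tiny example like hello this probably never comes up, but it is worth keeping in mind once your allocations get bigger.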
