C strings and JavaScript
When doing JavaScript work, it's often helpful to interact with C strings. Even if you are not using C directly, several interfaces use the fairly universal C ABI to pass these simple byte structures in and out.
If you want to skip to a lib to use in your project, see cmem_helpers. It can also do C structs in a nice & easy way.
With WebAssembly, emscripten can generate JS helpers for this. But if you want to keep the host simple and light, and the WASM minimal, that's not really an option.
Recently, I noticed that I was writing roughly the same code for several different kinds of C-memory access, across several JS projects:
- FFI (Bun, Node, Deno, etc.): linking to a native DLL that someone else (or you) made
- Native Node (NAPI): your code will be in C/C++/Rust
- WASM: your code can be in anything that compiles to WASM, but only basic number types can cross the WASM host barrier: i32, i64, f32 and f64. Unsigned u32 and u64 values travel in the same bits as i32 and i64. Other things are passed as pointers (u32). A value that fits in fewer bytes (like a u8, for example) just uses part of a 32-bit type.
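Here is a minimal sketch of that boundary in action. So that it runs without a compiler, it hand-assembles a tiny module that exports one function, add(i32, i32) -> i32. The module is for illustration only and is not part of this project. A string passed in gets coerced to a number on the way in. A u32 comes back as a signed i32 unless you reinterpret it on the JS side with >>> 0:

```javascript
// Hand-assembled module, equivalent to:
// (func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b // code section: the body
])

const { add } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports

console.log(add(2, 3)) // 5
// a string can't cross the boundary: it is converted with ToInt32 ("hi" -> NaN -> 0)
console.log(add('hi', 1)) // 1
// u32 max arrives as the i32 -1, and the i32 result comes back signed
console.log(add(0xFFFFFFFF, 0)) // -1
// reinterpret the same 32 bits as unsigned on the JS side
console.log(add(0xFFFFFFFF, 0) >>> 0) // 4294967295
```

Strings, structs and everything else have to be written into the module's memory and passed as a pointer instead.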
A Simple Example
Let’s imagine you have a simple hello function:
// hello.c
#include <string.h>
#include <stdlib.h>
// given a name and a result pointer, write the greeting string into result
void hello(char* name, char* result) {
  const char* h = "Hello ";
  strcpy(result, h);
  strcat(result, name);
}
Essentially, this takes a name and writes a greeting into the memory that result points to. So hello("David", result) leaves "Hello David" in result. The rest is just C plumbing: copy "Hello " to the address in result, then append name to the end of it.
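C strings are just bytes ending in \0, so strcpy and strcat are plain byte copies. This sketch mimics hello in JS, using a Uint8Array to stand in for memory. The helpers strcpyJs, strcatJs, writeCString and readCString, and the addresses used, are hypothetical and for illustration only. Like the C versions, they don't check that the destination is big enough:

```javascript
// A Uint8Array stands in for C memory; "pointers" are just indexes into it.
const memory = new Uint8Array(64)
const encoder = new TextEncoder()
const decoder = new TextDecoder()

// hypothetical: copy bytes from src to dest, including the \0 (like strcpy)
function strcpyJs (mem, dest, src) {
  let i = 0
  while (true) {
    const byte = mem[src + i]
    mem[dest + i] = byte
    if (byte === 0) return dest
    i++
  }
}

// hypothetical: find the \0 at the end of dest, then copy src over it (like strcat)
function strcatJs (mem, dest, src) {
  let end = dest
  while (mem[end] !== 0) end++
  strcpyJs(mem, end, src)
  return dest
}

// hypothetical: write a JS string as null-terminated UTF-8 at ptr
function writeCString (mem, ptr, str) {
  const bytes = encoder.encode(str)
  mem.set(bytes, ptr)
  mem[ptr + bytes.length] = 0
  return ptr
}

// hypothetical: read bytes from ptr up to (not including) the \0
function readCString (mem, ptr) {
  let end = ptr
  while (mem[end] !== 0) end++
  return decoder.decode(mem.subarray(ptr, end))
}

// mirror hello(): name at 0, "Hello " at 16, result at 32
const namePtr = writeCString(memory, 0, 'David')
const prefixPtr = writeCString(memory, 16, 'Hello ')
strcpyJs(memory, 32, prefixPtr)
strcatJs(memory, 32, namePtr)
console.log(readCString(memory, 32)) // Hello David
```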
If you were going to use it in a normal C program (this is not needed for WASM or a DLL, only for a standalone program), it would look like this:
// for printf
#include <stdio.h>
int main() {
  char* ret = malloc(100);
  hello("World", ret);
  printf("%s\n", ret);
  free(ret);
  return 0;
}
This is silly, because I could just printf directly and skip hello, but it’s meant to be a trivial example.
The malloc allocates 100 bytes for the return value, and later I free it. There are problems here: what if the result is longer than 100 bytes? Short answer: buffer overflow! I'm trying not to go off on too many tangents, so we will ignore all that for now. This will run in WASM, where some important C concerns (like calling free, or this very real overflow potential) matter less. The memory is self-contained in the WASM sandbox, so doing something stupid or terrible won't crash your computer. These are still worth considering, though, especially if you keep your WASM instance running for a while and call its functions more than once before destroying it.
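One way to avoid guessing 100 bytes is to compute the exact size on the JS side before calling malloc. This is a minimal sketch, assuming hello keeps the "Hello " prefix from hello.c. greetingSize is a hypothetical helper name:

```javascript
const encoder = new TextEncoder()

// hypothetical helper: how many bytes hello() will write for a given name.
// Counted in UTF-8 (what the C side sees), plus 1 for the trailing \0.
function greetingSize (name) {
  const prefix = encoder.encode('Hello ').length
  return prefix + encoder.encode(name).length + 1
}

console.log(greetingSize('World')) // 12
console.log(greetingSize('Zo\u00eb')) // 11 (the accented letter takes 2 bytes)
```

With that, you could allocate malloc(greetingSize('World')) bytes instead of a guessed 100.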
What is a pointer?
A pointer is an integer that holds a memory address. On 64-bit systems (like most modern native targets) it’s a u64, but on 32-bit systems (like WASM) it’s a u32.
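In WASM, a pointer is just a byte offset into the instance's linear memory, which JS sees as memory.buffer. This minimal sketch uses a standalone WebAssembly.Memory to stand in for instance.exports.memory. The address 1024 is an arbitrary choice for illustration:

```javascript
// one 64 KiB page of linear memory, like an instance's exported memory
const memory = new WebAssembly.Memory({ initial: 1 })

// a "pointer" is only a number: an offset into memory.buffer
const ptr = 1024

// write 3 bytes ("Hi" plus a \0) at that address...
new Uint8Array(memory.buffer).set([72, 105, 0], ptr)

// ...and any other view of the same buffer sees them at the same offset
const view = new DataView(memory.buffer)
console.log(view.getUint8(ptr)) // 72 ('H')
console.log(view.getUint8(ptr + 1)) // 105 ('i')
console.log(view.getUint8(ptr + 2)) // 0
```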
What are C-Strings?
In a general/practical sense, they are simply a pointer to the bytes of a UTF-8 string that ends with a \0 character (called “null-terminated”). With this encoding we don't need to store the length anywhere else: for a plain string, we can just scan for the \0 that marks the end. In C, this type is written char*, meaning “a pointer to the start of some bytes” (a char, or uint8_t, is a single byte). This causes problems if you ever actually need a \0 inside your string. In that case, you will need to keep track of the length yourself.
"Hello World" = ['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', 0]
In this case the pointer would point at H, since it’s the first byte, and the 0 tells the reader where the string ends.
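Two details matter when building these bytes from JS. First, the \0 is not added for you. Second, the byte count is the UTF-8 length, which can differ from the JS string's .length (which counts UTF-16 code units). This is a minimal sketch; toCString is a hypothetical helper:

```javascript
const encoder = new TextEncoder()

// hypothetical helper: JS string -> null-terminated UTF-8 bytes
function toCString (str) {
  const utf8 = encoder.encode(str)
  // new Uint8Array is zero-filled, so the extra last byte is already \0
  const bytes = new Uint8Array(utf8.length + 1)
  bytes.set(utf8)
  return bytes
}

const ascii = toCString('Hello World')
console.log(ascii.length) // 12 (11 characters + \0)
console.log(ascii[ascii.length - 1]) // 0

// non-ASCII: the last character is 1 JS unit but 2 UTF-8 bytes
const accented = 'caf\u00e9'
console.log(accented.length) // 4
console.log(toCString(accented).length) // 6 (5 bytes + \0)
```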
How do I compile this to WASM?
I won’t go too deep into tools and setup, but you can install Docker and use a little container I made. It has all the tools I like and is fairly quick to get started with:
# mount the current directory inside the docker and give me a bash-prompt
docker run -it --rm -v $(pwd):/cart konsumer/null0:latest
# now you are inside the container
# compile hello.c, using wasi-sdk
clang --sysroot=$WASI_SYSROOT -Wl,--export=hello -Wl,--export=free -Wl,--export=malloc -Wl,--no-entry -nostartfiles -o hello.wasm hello.c
# inspect the wasm
wasm-objdump -x hello.wasm
There is a bunch of output, but the key things we care about are in the Export section: hello, free, malloc and memory.
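You can also run the same check from JS with WebAssembly.Module.exports. This is a minimal sketch; missingExports is a hypothetical helper. So that it runs without hello.wasm, the example hand-assembles a tiny module that exports only memory:

```javascript
// hypothetical helper: which of the names we need does the module NOT export?
function missingExports (wasmModule, needed) {
  const have = new Set(WebAssembly.Module.exports(wasmModule).map(e => e.name))
  return needed.filter(name => !have.has(name))
}

// stand-in module that only declares and exports one page of memory
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x05, 0x03, 0x01, 0x00, 0x01, // memory section: 1 memory, min 1 page
  0x07, 0x0a, 0x01, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00 // export "memory"
])
const wasmModule = new WebAssembly.Module(bytes)

console.log(WebAssembly.Module.exports(wasmModule)) // [ { name: 'memory', kind: 'memory' } ]
console.log(missingExports(wasmModule, ['hello', 'malloc', 'free', 'memory']))
// [ 'hello', 'malloc', 'free' ]
```

In the browser, you could instead pass it the result of await WebAssembly.compileStreaming(fetch("hello.wasm")). For the build above, the list should come back empty.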
How do I use this in javascript?
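Put this in an .html file next to hello.wasm, and serve the folder over HTTP. Browsers generally won't fetch file:// URLs, and instantiateStreaming expects the server to send the application/wasm MIME type: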
<script type="module">
const encoder = new TextEncoder()
const decoder = new TextDecoder()
// copy a JS string into WASM memory as a null-terminated UTF-8 C string
// len defaults to the UTF-8 byte length + 1 (for the \0), pointer defaults to a fresh malloc
function setString (value, len = 0, pointer) {
  const buffer = encoder.encode(value)
  if (!len) {
    // use the byte length, not value.length (UTF-16 units), so non-ASCII text fits
    len = buffer.length + 1
  }
  if (!pointer) {
    pointer = malloc(len)
  }
  // make the view after malloc: if memory grew, the old buffer is detached
  const mem = new DataView(memory.buffer)
  for (let b = 0; b < len; b++) {
    mem.setUint8(pointer + b, buffer[b] || 0)
  }
  return pointer
}
// read a C string from WASM memory: len bytes, or up to the \0 if no len is given
function getString (pointer, len = 0) {
  const mem = new DataView(memory.buffer)
  let end = pointer + len
  if (!len) {
    while (mem.getUint8(end) !== 0) {
      end++
    }
  }
  return decoder.decode(memory.buffer.slice(pointer, end))
}
const { instance } = await WebAssembly.instantiateStreaming(fetch("hello.wasm"), { env: {} })
const { malloc, hello, memory, free } = instance.exports
const ptrName = setString('World')
const ptrResponse = malloc(100)
hello(ptrName, ptrResponse)
console.log(getString(ptrResponse))
// not strictly required, but also try not to malloc without free
free(ptrName)
free(ptrResponse)
</script>
That’s all there is to it, really.
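There's one caveat if you keep an instance around and call into it repeatedly. When malloc runs out of room, it can grow the linear memory, and growing replaces memory.buffer and detaches the old ArrayBuffer. Any DataView or Uint8Array made earlier stops working, so it's safest to create views right before each read or write. This minimal sketch uses a standalone WebAssembly.Memory to stand in for the instance's exported memory:

```javascript
const memory = new WebAssembly.Memory({ initial: 1 }) // 1 page = 64 KiB

const oldBuffer = memory.buffer
const oldView = new DataView(oldBuffer)

// what malloc may do internally when it needs more space
memory.grow(1)

console.log(oldBuffer.byteLength) // 0: the old buffer is detached
console.log(memory.buffer.byteLength) // 131072: a new, bigger buffer

// the stale view throws instead of reading memory
let staleViewThrew = false
try {
  oldView.getUint8(0)
} catch (e) {
  staleViewThrew = e instanceof TypeError
}
console.log(staleViewThrew) // true

// a view made after the grow works fine
console.log(new DataView(memory.buffer).getUint8(0)) // 0
```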