Efficient and secure module hot-loading on native platforms

Corentin Godeau
9 min read · Dec 17, 2023


In my previous company, we assisted video streaming platforms in enhancing user experience and resource utilization through our client-side P2P products. One of these products was provided as a native SDK written in C++ and wrapped in platform-specific libraries.

When we switched from a JavaScript-based product running in the browser to a native SDK, we lost the ability to iterate quickly. Although it’s technically possible to download and interpret code over-the-air, native platforms often outright forbid it. These quick iterations are a big part of the way we work.

Fortunately, we came up with a way to overcome these limitations while still complying with the rules set by the platforms. If you are also working on a project that would benefit from quick iteration on the core of your software, you may want to read on!

(There is even a small surprise just for you at the end)

How we came to choose Wasm

Working on a P2P application means that its behavior in a production environment is difficult to predict and quite hard to reproduce in staging. Some might even say impossible given the scale of real-world traffic, with hundreds of thousands of people watching content.

Our work relied on feature flags, A/B tests, and the regular deployment of new P2P logic to fulfill our customers’ needs (If you haven’t already read it, Sergey Arsenyev wrote an article on this topic). That is why we tweaked our product on a regular basis.

This was easy to do as our JS product was downloaded (when not already cached) every time a viewer would play a stream from our customers’ web pages. We simply had to increase the patch number, deploy it to our CDN, and wait for new viewers to start a stream.

With native SDKs, this workflow became a bit more complex. Development cycles were slower and, most importantly, followed our customers’ release cycles. If they chose to release every six months, we had to wait six months before seeing how our code behaved in production.

The obvious solution was to plan for all the features we wanted to test, implement them, and disable them by default through feature flags. But this solution only allowed us to run A/B tests and get feedback every six months, when our customers released their app with the updated version.

This is when we started to think about solutions to help avoid a six-month long experiment / development loop.

First, we had to take a step back and look at what we were trying to achieve. Our goal was to be able to deploy code on-the-fly on viewers’ devices, without disturbing our customers’ apps. Concretely, this is easier than you might think. Code is just binary data that has been annotated as executable. Technically speaking, nothing prevents us from having the end user devices download a shared library and dynamically call into it. With proper security and caching, this would have solved our issue and we could have kept improving the product.
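To make that idea concrete, here is a minimal sketch of what such dynamic loading could look like on a POSIX platform. The `p2p_entry_point` symbol is purely hypothetical, not part of our actual product:

```cpp
// Hypothetical illustration of the "download a shared library and call into
// it" approach. Assumes the library was already downloaded to `path` and
// exposes a (hypothetical) `p2p_entry_point` symbol.
#include <dlfcn.h>
#include <stdexcept>

using EntryPoint = int (*)();

int run_downloaded_code(const char* path) {
    // Load the shared library into the current process.
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) throw std::runtime_error(dlerror());

    // Resolve the entry point symbol and call it.
    auto entry = reinterpret_cast<EntryPoint>(dlsym(handle, "p2p_entry_point"));
    if (entry == nullptr) {
        dlclose(handle);
        throw std::runtime_error("missing symbol");
    }
    int result = entry();
    dlclose(handle);
    return result;
}
```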

But this is where platform limitations and rules come into play. On iOS devices, for instance, the Apple Developer Program License Agreement explicitly forbids the download of executable code by applications (section 3.3.2). Non-compliance can result in the removal of your app from the App Store. This ruled out the possibility of deploying code over-the-air and put us back to square one.

That’s when we remembered something.

The first native-oriented implementation of our product relied on using our JavaScript codebase through a WebView running in the background, communicating through a native adaptation layer. The fact that this approach was acceptable shows that downloading a JS script from a webpage and running it within the confines of the WebView sandbox complies with the aforementioned rules of the platform’s agreement.

In other words, it is not the act of running arbitrary code that is forbidden, but rather trying to bypass the tools that are offered by the platform. Within the approved and safe environment that is provided, developers can download and interpret code on Apple devices.

Our objective, then, was to find a technology offering acceptable performance compared to native code, compliant with the platform security rules, and easy to integrate into a C++ codebase.

That is where it clicked for us.

Why Wasm is a good fit

According to its website, WebAssembly (or Wasm) is a “binary instruction format for a stack-based virtual machine” and is advertised as being “Efficient and Fast” and “Safe.” Furthermore, since Wasm is an open standard, a multitude of runtimes exist, including some implemented in C++.


On paper, this solution checked all of our requirements:

  • Being a bytecode format (similar to Java bytecode) makes it easy to reach acceptable performance.
  • It’s naturally designed for being sandboxed.
  • C++ implementations exist, so integration in an existing codebase should be smooth.
  • Many languages support Wasm as a target output, allowing us to keep all the code in C++.

And in the case of the iOS ecosystem, WebKit provides a WebAssembly runtime implementation, which set a precedent for us: Apple already allows Wasm to run on its devices.

For all these reasons, we started to work on a prototype that would allow us to hot-load Wasm modules inside our SDK.

Our previous experience with Wasm

Before diving into the details, I want to give a bit more context about our previous experience with Wasm since this was not the first time we had to work with it.

During the development of our C++ product, one of the first milestones in the roadmap was to replace existing JavaScript deployment with its Wasm equivalent (See my lightning talk on the subject during the CNCF Wasm Day 2021). This decision was motivated by the idea that it would give us a good comparison point to see if we were headed in the right direction.

Unfortunately, after some trials, we changed course and fell back to focusing on mobile platforms instead (iOS and Android mainly). Wasm was still at an early stage, and we faced many roadblocks as a result. This is not the place for an exhaustive list of technical challenges, but here are two important ones.

Wasm is single-threaded

Being originally targeted at the web, Wasm is deeply single-threaded. There are plans to allow seamless multi-threading in the browser through the use of Web Workers and SharedArrayBuffer, but this was not available at the time.

This had consequences for the way we had to architect our codebase, as it needed to transparently work in single-threaded environments (web browsers) as well as in multi-threaded environments (mobile, desktop, etc.).

The Wasm memory model is very simplistic

In terms of memory, Wasm offers a single linear memory that grows when more space is required but never shrinks. By default, C++ does not force you to be watchful about dynamic memory allocations, and this led to a lot of fragmentation and higher memory consumption compared to the same codebase running in native environments.

Our conclusion was simply that it was too soon for us to create a full application in Wasm.

What has changed since then?

Maybe Wasm is still not entirely ready for complex applications, but our situation has changed since then. Most of our efforts over the past years had gone into decoupling parts of our application and organizing them as modules. The gains were multiple:

  • Clear responsibilities of each module
  • No more spaghetti code spread across the codebase, which had made the code hard to understand and maintain
  • Faster parallel iterations on different modules

But another cool side effect is that with this architecture, the only constants are the interfaces between the modules. As long as modules know how to talk to each other to perform specific tasks, the internal logic of these modules can completely change without introducing unexpected interactions or bugs. These interfaces take the form of communication channels with messages transiting through them. This means that if we know how to interface with these communication channels, the logic of each module can be driven by anything, for instance a Wasm program running in a virtual machine.
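As an illustration, such a module interface could look something like the following simplified sketch. The `Message` and `Module` types are stand-ins for the idea, not our actual API:

```cpp
#include <functional>
#include <string>

// Simplified stand-in for a message transiting between modules.
struct Message {
    std::string topic;
    std::string payload;
};

// A module only sees two communication channels: one to receive messages
// from the rest of the application, one to send messages back. The logic
// behind `on_message` can be native C++ or, just as well, a Wasm program.
class Module {
public:
    virtual ~Module() = default;
    virtual void on_message(const Message& message) = 0;
    void set_sender(std::function<void(const Message&)> sender) {
        sender_ = std::move(sender);
    }

protected:
    void send(const Message& message) {
        if (sender_) sender_(message);
    }

private:
    std::function<void(const Message&)> sender_;
};
```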

P2P product simplified architecture

Among the modules, the P2P Scheduler module was an outlier. It was at the heart of our technology and was entirely responsible for P2P performance. This is where all our peer logic happened, and therefore where the fast iterations needed to happen. Among all the modules composing our codebase, this was the one that we aimed to update at a faster rate than our customers’ release cycles.

This is where Wasm can really shine. The Scheduler’s scope was narrow, its memory footprint light, and its logic fundamentally single-threaded. This made Wasm a valid fit.

Consequently, we developed a proof of concept exposing the communication channels to the Wasm module. On top of that, and just to validate the concept, we also decided to use a language other than C++ to target Wasm and chose Rust (keeping C++ would have worked just as well).

As a Wasm virtual machine, we used WAMR (WebAssembly Micro Runtime), but other choices are available depending on your needs. This choice was driven by the fact that WAMR is written in C and C++. It also provides a CMake file that we could directly integrate into our existing CMake build scripts.
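To give an idea of what the integration involves, here is a rough sketch of loading a module and calling into it with WAMR’s C API. Exact signatures vary slightly between WAMR releases, and the exported `tick` function is a hypothetical example, not our real interface:

```cpp
#include <cstdint>
#include <vector>
#include "wasm_export.h"  // WAMR's C API header

// Sketch: load a Wasm module from raw bytes and call an exported function.
bool run_module(std::vector<uint8_t>& bytes) {
    char error_buf[128];
    if (!wasm_runtime_init()) return false;

    wasm_module_t module = wasm_runtime_load(
        bytes.data(), static_cast<uint32_t>(bytes.size()),
        error_buf, sizeof(error_buf));
    if (!module) return false;

    // 16 KiB stack / 64 KiB heap are arbitrary illustration values.
    wasm_module_inst_t instance = wasm_runtime_instantiate(
        module, 16 * 1024, 64 * 1024, error_buf, sizeof(error_buf));
    if (!instance) return false;

    wasm_exec_env_t env = wasm_runtime_create_exec_env(instance, 16 * 1024);
    // Note: older WAMR releases take an extra `signature` argument here.
    wasm_function_inst_t func = wasm_runtime_lookup_function(instance, "tick");

    uint32_t args[1] = {42};  // one i32 argument; also receives the result
    bool ok = func && wasm_runtime_call_wasm(env, func, 1, args);

    wasm_runtime_destroy_exec_env(env);
    wasm_runtime_deinstantiate(instance);
    wasm_runtime_unload(module);
    wasm_runtime_destroy();
    return ok;
}
```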

The way it worked was quite simple. By default, our SDK came with an embedded version of the Scheduler module, compiled to Wasm. When fetching the configuration (a necessary step to fit each customer’s use case), we would check for an override of the Scheduler version to use. If absent, we used the Scheduler version embedded in the application. If present, we fetched the new Scheduler from our CDN, cached it for future use, and instantiated it. We also added a basic cryptographic check as an additional verification on top of the HTTPS download guarantees.
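In C++, the selection logic looked roughly like this. All the helper functions here are hypothetical stand-ins for our real configuration, CDN, cache, and signature-checking code:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helpers standing in for the real implementation.
struct Config { std::optional<std::string> scheduler_override_url; };
std::vector<uint8_t> embedded_scheduler_wasm();
std::optional<std::vector<uint8_t>> cache_lookup(const std::string& url);
std::vector<uint8_t> fetch_from_cdn(const std::string& url);
void cache_store(const std::string& url, const std::vector<uint8_t>& bytes);
bool signature_is_valid(const std::vector<uint8_t>& bytes);

// Pick the Scheduler module bytes to instantiate: the embedded version by
// default, or an override fetched from the CDN and verified before use.
std::vector<uint8_t> select_scheduler(const Config& config) {
    if (!config.scheduler_override_url) return embedded_scheduler_wasm();

    const std::string& url = *config.scheduler_override_url;
    if (auto cached = cache_lookup(url)) return *cached;

    std::vector<uint8_t> bytes = fetch_from_cdn(url);
    // Cryptographic check on top of the HTTPS transport guarantees.
    if (!signature_is_valid(bytes)) return embedded_scheduler_wasm();
    cache_store(url, bytes);
    return bytes;
}
```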

And that is how we managed to inject custom behavior on-the-fly, with acceptable performance, while remaining compliant with the various platform rules of the native ecosystem.

Limitations and feedback

The story wouldn’t be complete without mentioning the pain points we encountered.

Although Rust is able to output Wasm modules, in its current state, it might not be the best choice. During preliminary work on the POC, we were unable to tell Rust not to output WASI-dependent code, and this required additional features from the Wasm runtime we used. For context, WASI gives access to low-level primitives such as system calls and makes it easier for a module to run outside of the browser. That way, you can keep the same codebase using, for example, read() and write() syscalls (on Linux) and still target Wasm without requiring any specific adaptations.

For us, this was not something we needed, as the core logic has no dependencies on operating system features. It is purely computational, and we can manually handle the small set of functions we want to expose to the Wasm module ourselves (random number generation is a good example).
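For example, exposing a host-side random number generator to the module with WAMR could look like the following sketch. The `random_u32` import name is an assumption for illustration; “env” is simply the import module name the Wasm module would be compiled against:

```cpp
#include <cstdint>
#include <random>
#include "wasm_export.h"  // WAMR's C API header

// Host-side implementation of a random number generator exposed to the
// module. The first parameter is always the execution environment.
static uint32_t host_random_u32(wasm_exec_env_t /*exec_env*/) {
    static std::mt19937 rng{std::random_device{}()};
    return rng();
}

// "()i" means: no Wasm-visible parameters, returns an i32.
static NativeSymbol kNativeSymbols[] = {
    {"random_u32", reinterpret_cast<void*>(host_random_u32), "()i", nullptr},
};

// Must be called after wasm_runtime_init() and before loading modules
// that import these symbols.
bool register_host_functions() {
    return wasm_runtime_register_natives(
        "env", kNativeSymbols, sizeof(kNativeSymbols) / sizeof(NativeSymbol));
}
```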

A better fit as a language could be Zig. It is still a young language, but it treats Wasm as a target that will only become more popular, and it can output WASI-free modules with ease.

Conclusion

We consider this project an undeniable success. We managed to meet our requirements with a solution that is viable and that provides an acceptable trade-off between the additional effort required and the possibilities it opens for us. Now that we have proven that this solution works, the next steps would probably focus on implementing additional security mechanisms, as security is a crucial concern for modern software.

If you are in a similar situation, try it out. Just be aware that the language you target Wasm from may lead to unexpected hassle.

As for the surprise I mentioned at the beginning of this post, here it is: https://github.com/Corendos/wasm-hot-loading. It is a small repository containing a simplified version of what is described in this article. It can serve as a good starting point if you are interested in trying this solution.
