Debugging floating-point generation in Rust Wasm smart contract

Bartłomiej Kuras
Published in CosmWasm
9 min read · May 13, 2022

Most of the time, working with Rust is a pleasure: great ecosystem, excellent tooling. However, there are dark corners that can drive a developer mad. Recently I went on such a journey, and I would like to share it with you.

Context

To talk about the problem, I have to start with setting up a common understanding of the basics. Let’s start with the binary I am trying to build.

CosmWasm smart contracts are Wasm binaries built by the Rust compiler for the wasm32-unknown-unknown target. However, there is a significant problem: the CosmWasm (CW) virtual machine does not support floating-point operations (neither f32 nor f64).

The crucial point is that Wasm itself generally supports floating-point operations; it is CW that does not. This leads to a problem: the Rust compiler will happily generate floating-point Wasm instructions, and there are not many ways to prevent it. You can avoid floating-point operations in your own code, but it is not easy to have complete control over the whole dependency tree.

Now for the code I am trying to build: the cw1-whitelist-ng contract as of this commit. The contract is a port of the cw1-whitelist contract to the new framework I am building.

Note: this write-up does not describe precisely what I did, step by step. It was a whole day of struggle, jumping from idea to idea and trying things. Some of them failed so early that I do not even mention them. It is a rough walk-through of my most successful attempts.

Let’s upload a contract

So now it is time to go through my Friday. I have just finished all the tests for the cw1-whitelist-ng contract, and I am happy: I can finally merge the first version of the ng framework, which would be a really big step for me. There is one last thing: I need to test it online, on a blockchain. I start by building my contract with the CosmWasm rust optimizer. I have to build the whole workspace because my key cw-derive crate is not yet published; I use it as a path dependency, and the single-contract optimizer would not see it. Therefore, in the root cw-plus directory, I run:

and after a couple of minutes, I can see that my new contract is built, and it is not much bigger than the original one (I expect some overhead from the framework, but I am happy that it is low):

Now just upload it, and celebrate:

Floating-point vs. CosmWasm

You need to know one thing: you do not want to struggle with the “The use of floats is not supported.” error. But if you have to, here is a hint: there are two main causes. The first is obvious: using f32 or f64 in code (you can tell which one by reading the message; here it is an F64Load operation, so we are probably dealing with f64). The other one is tricky: using usize in #[derive(Deserialize)] types. The reason is that the serde code for usize deserialization contains floating-point operations, though I have not investigated why yet. Still, it turns out not to matter in this particular case.

Anyway, my first instinct was to go through my whole code to figure out whether I read any fp or usize values (we are dealing with F64Load, so it does not even need to be arithmetic; just assigning a value somewhere may be enough to cause this). The problem is that the ng framework is mostly a code generator (using macros), so I assumed that the problem had to be in something I generate. As a result, just eye-scanning the code won't help.

Brief NG introduction

Before going forward, I would like to introduce the idea of ng briefly, so we are on the same page with that.

The idea of the framework is to make contract development more abstraction-focused instead of data-driven. In the traditional approach, you define JSON messages for the contract (as serde-serializable structures), and then you create raw entry points in which you can do whatever you want. There is nothing wrong with that, but it creates some design challenges.

The approach I propose is to define contract messages as a set of Rust traits, which we use to determine the capabilities of the contract. The contract itself becomes the Rust type on which those traits are implemented. All the infrastructure is generated from those traits: messages, entry points, and additional tooling such as helpers to smart-query the contract, utilities to build multi-tests with them, and possibly more in the future. To define interfaces for contracts, we now just use the #[cw_derive::interface] macro (naming is subject to change) like this:

Then just create a contract type with a state description, and implement the trait on it:

In the current state, not everything is generated yet. Some parts still have to be written manually, but that is the direction I am heading in.
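To make the idea more concrete, here is a minimal sketch in plain Rust, without any macros. It is not the code from cw1-whitelist-ng and not what cw-derive actually generates: the trait, the message enum, the dispatch method, and the String error type are all hypothetical simplifications (a real contract would also receive dependencies, environment, and message info). It only illustrates how a trait can describe the capabilities of a contract while a matching message enum is derived mechanically from its methods.

```rust
/// Hypothetical interface: each method corresponds to one execute message.
pub trait Whitelist {
    fn freeze(&mut self) -> Result<(), String>;
    fn update_admins(&mut self, admins: Vec<String>) -> Result<(), String>;
}

/// The kind of message enum a macro could derive from the trait above:
/// one variant per method, one field per method argument.
pub enum WhitelistExecMsg {
    Freeze {},
    UpdateAdmins { admins: Vec<String> },
}

impl WhitelistExecMsg {
    /// Forwards the message to the matching trait method on the contract.
    pub fn dispatch<C: Whitelist>(self, contract: &mut C) -> Result<(), String> {
        match self {
            WhitelistExecMsg::Freeze {} => contract.freeze(),
            WhitelistExecMsg::UpdateAdmins { admins } => contract.update_admins(admins),
        }
    }
}

/// The contract is just a type holding (a description of) its state.
#[derive(Default)]
pub struct Cw1WhitelistContract {
    pub admins: Vec<String>,
    pub frozen: bool,
}

impl Whitelist for Cw1WhitelistContract {
    fn freeze(&mut self) -> Result<(), String> {
        self.frozen = true;
        Ok(())
    }

    fn update_admins(&mut self, admins: Vec<String>) -> Result<(), String> {
        if self.frozen {
            return Err("contract is frozen".to_string());
        }
        self.admins = admins;
        Ok(())
    }
}
```

In this sketch the enum and the dispatch method are written by hand; in the framework, that is the part a macro would be expected to produce.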

Debugging procedural macros

I need to expand macros. Fortunately, I know two ways of doing it:

  1. rust-analyzer supports macro expansion, which is very helpful for quickly checking what macros generate
  2. The cargo expand tool does the job more persistently

Therefore my next step was to expand the whole contract and look for some issues:

Checking the code, I have three findings: good, bad, and terrible.

  1. Good: no f32 or f64 in the generated contract
  2. Bad: some usize in the generated contract
  3. Terrible: all usize are in serde-generated functions

I was hoping to find something I had messed up in cw-derive so I could fix it and call it a day, but not this time. I just used a dependency, and it is now giving me some headaches.

Hacking on serde-json-wasm

Why is there a problem if I do not deserialize usize and, in general, I don’t use any floating-point arithmetic? To understand it, we need to look at the difference between the messages used in the original cw1-whitelist contract and the ones generated for cw1-whitelist-ng (not even their serde-generated code). Let's take a look at the execute message only.

By design, ng makes it possible to split execute messages into different “interfaces,” which are then merged into one message with #[serde(untagged)]. But it turns out this is what causes the problem. At this point, I was not yet sure I was right, so I removed the "glue" message. Unfortunately, the entry points would then not compile because they were using the top-level ExecMsg. For now, I changed their signatures to take Empty and implemented them as todo!() to make the compiler happy. Then I did the same with the Query message, and I was good to go. Now I checked it with the check_contract utility example from the CosmWasm ecosystem (I have it built and available in $PATH, as it is a useful tool while working on contracts):

I am not happy: one of the core features of ng is broken. But at least I have found the reason, and I can try to work around it. The first thing I think of is the serde_json_wasm crate, which is designed for CosmWasm smart contracts. If a change could improve serialization for them, it would be easy to justify merging it there.

So first things first: I need to clone the crate and force the whole contract dependency tree to use my patched version instead of the released one. Fortunately, there is a nice way to do it. I have to add a section to the top-level Cargo.toml:

The [patch] section is the Cargo functionality designed for this task. So now I can experiment with my local dependency, and after removing and commenting out parts of the code, I found an interesting thing. In serde_json_wasm, there is a function, deserialize_any(), which is responsible for the deserialization of untagged enums by serde. If I comment out the whole match statement and return an error, then the contract magically works (not as intended, but no fp code is generated). But if I comment out all the match arms, leaving only the _ => Err(...), then the build is wrong again! Extraordinary, especially considering that the only function still called there is parse_whitespace(), in the match expression itself, and it is used all over the crate. I assumed that there was some optimizer problem: probably there is some fp code that, in every other use of parse_whitespace(), is easily inlined and then removed as unused, but for some reason is trickier for rustc here. I was wrong, but fortunately, this thought led me to the next step.

Hacking on serde

I realized that I could not find out much more by blindly commenting stuff out. I needed some hints about where those fp instructions were coming from, so I took a look at the generated Wasm itself. Having my Wasm binary, I can convert it to the textual Wat format using the wasm2wat tool:

I can easily find instructions causing a problem, and I go up to the first function signature above it:
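The manual search can also be sketched as a few lines of Rust. This is only an illustration, not a tool used during the investigation: the function name find_float_functions is hypothetical, and the check is a plain text scan for tokens starting with f32. or f64., not a real Wat parser. It assumes that every function starts on a line beginning with "(func ", which is how wasm2wat output typically looks.

```rust
/// Returns the names of functions whose bodies contain an `f32.*` or `f64.*`
/// instruction, based on a plain line-by-line scan of Wat text.
/// Assumption: each function starts on a line beginning with `(func `.
pub fn find_float_functions(wat: &str) -> Vec<String> {
    let mut current = String::from("<module level>");
    let mut found: Vec<String> = Vec::new();

    for line in wat.lines() {
        let trimmed = line.trim_start();

        // Remember the most recent function header, so that a float
        // instruction can be attributed to the function containing it.
        if let Some(rest) = trimmed.strip_prefix("(func ") {
            current = rest
                .split_whitespace()
                .next()
                .unwrap_or("<anonymous>")
                .to_string();
        }

        // Instructions look like `f64.load`; bare types such as `f64` in a
        // signature do not contain the dot and are not reported.
        let has_float_op = trimmed
            .split(|c: char| c.is_whitespace() || c == '(' || c == ')')
            .any(|token| token.starts_with("f32.") || token.starts_with("f64."));

        if has_float_op && !found.contains(&current) {
            found.push(current.clone());
        }
    }
    found
}
```

Feeding it the text produced by wasm2wat would give a list of mangled function names to investigate further.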

So it seems like my problematic instruction is somewhere in the $_ZN5serde9__private2de7content7Content10unexpected17h733ab7aaf0483000E function. A funny name that does not tell me much, except that it is some private serde thing, which doesn't make me happy.

However, it is good to know that the mangling of Rust function names is independent of the target platform: it is the same for wasm32 and for a normal binary build. So all I need is a Rust demangler, and I am happy. There is a tool called rustfilt that helps with this task:
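To see why the readable path can be recovered from such a symbol, here is a minimal sketch of decoding the legacy Rust mangling scheme: the prefix _ZN, a sequence of length-prefixed identifiers, and a closing E, with the last identifier being a hash of the form h followed by 16 hex digits. This is not how rustfilt is implemented; the function name demangle_legacy is hypothetical, and the sketch ignores escape sequences such as $LT$ as well as the newer v0 mangling.

```rust
/// Decodes a legacy-mangled Rust symbol (`_ZN<len><ident>...E`) into a
/// `::`-separated path, dropping the trailing hash component.
/// Simplified sketch: escape sequences and v0 mangling are not handled.
pub fn demangle_legacy(symbol: &str) -> Option<String> {
    // Names in Wat carry a leading `$`; accept it for convenience.
    let symbol = symbol.strip_prefix('$').unwrap_or(symbol);
    let mut rest = symbol.strip_prefix("_ZN")?.strip_suffix('E')?;
    let mut parts: Vec<&str> = Vec::new();

    while !rest.is_empty() {
        // Each component is a decimal length followed by that many bytes.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        parts.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }

    // The last component is usually a hash like `h733ab7aaf0483000`.
    if parts.last().map_or(false, |part| is_hash(part)) {
        parts.pop();
    }
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("::"))
    }
}

/// Checks for `h` followed by exactly 16 hexadecimal digits.
fn is_hash(part: &str) -> bool {
    part.len() == 17
        && part.starts_with('h')
        && part[1..].bytes().all(|b| b.is_ascii_hexdigit())
}
```

Applied to the symbol found in the Wat file, the sketch yields serde::__private::de::content::Content::unexpected.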

Nice, now I need to figure out what calls this function. I cloned serde, patched my contracts to use my local copy, and went for a trip. Here is my function causing problems: Content::unexpected(&self). What is very interesting are those lines:

So let’s take a step back. What is the Content::unexpected function doing in our code, given that it was unnecessary without an untagged enum? The first question to ask should be: how does untagged even work? Let's check the serde documentation:

There is no explicit tag identifying which variant the data contains. Serde will try to match the data against each variant in order, and the first one that deserializes successfully is the one returned.

Serde generates code that tries to parse the input as each of our structures in turn, and if nothing matches, it calls unexpected to convert the Content object to an Unexpected object! In practice, it would probably always convert Content::Map to Unexpected::Map, as maps are the only values sent in the Wasm ecosystem (raw non-object values are not sent directly to contracts). But technically, rustc has no way to ensure that the Content never holds an f32 or f64 value, so it needs to include this conversion in the generated code. Why is it not required in other cases, such as deserializing an object into a regular enum? Because the Content type is used strictly for deserializing untagged and internally tagged enums.
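The mechanism can be modelled with a small std-only sketch. The Content and Unexpected types below are heavily simplified stand-ins for serde's types of the same names, and the message enums and parsing helpers are hypothetical; none of this is serde's actual code. What it shows is the shape of the problem: the fallback path calls unexpected(), and that function must handle the float variant even if no float ever arrives at runtime.

```rust
/// Simplified stand-in for serde's private buffered value type.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    U64(u64),
    F64(f64),
    Str(String),
    Map(Vec<(String, Content)>),
}

/// Simplified stand-in for `serde::de::Unexpected`.
#[derive(Debug, Clone, PartialEq)]
pub enum Unexpected {
    Unsigned(u64),
    Float(f64),
    Str(String),
    Map,
}

impl Content {
    /// Describes the buffered value for an error message. The `F64` arm has
    /// to load a float, so float instructions end up in the binary.
    pub fn unexpected(&self) -> Unexpected {
        match self {
            Content::U64(n) => Unexpected::Unsigned(*n),
            Content::F64(f) => Unexpected::Float(*f),
            Content::Str(s) => Unexpected::Str(s.clone()),
            Content::Map(_) => Unexpected::Map,
        }
    }
}

// Hypothetical per-interface messages and the untagged "glue" enum.
#[derive(Debug, PartialEq)]
pub enum Cw1Msg { Execute }
#[derive(Debug, PartialEq)]
pub enum WhitelistMsg { Freeze }
#[derive(Debug, PartialEq)]
pub enum ExecMsg { Cw1(Cw1Msg), Whitelist(WhitelistMsg) }

/// Returns the only key of a single-entry map, if the content is one.
fn sole_key(content: &Content) -> Option<&str> {
    match content {
        Content::Map(entries) if entries.len() == 1 => Some(entries[0].0.as_str()),
        _ => None,
    }
}

fn parse_cw1(content: &Content) -> Option<Cw1Msg> {
    match sole_key(content)? {
        "execute" => Some(Cw1Msg::Execute),
        _ => None,
    }
}

fn parse_whitelist(content: &Content) -> Option<WhitelistMsg> {
    match sole_key(content)? {
        "freeze" => Some(WhitelistMsg::Freeze),
        _ => None,
    }
}

/// Untagged-style deserialization: try each variant in order and fall back
/// to `unexpected()` when none of them matches.
pub fn deserialize_untagged(content: &Content) -> Result<ExecMsg, Unexpected> {
    if let Some(msg) = parse_cw1(content) {
        return Ok(ExecMsg::Cw1(msg));
    }
    if let Some(msg) = parse_whitelist(content) {
        return Ok(ExecMsg::Whitelist(msg));
    }
    Err(content.unexpected())
}
```

Even if every real input is a map, the compiler cannot prove that the F64 arm is unreachable, so the float handling stays in the generated code.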

I removed the Float variant from Unexpected (not even touching F32/F64 in Content!) and mapped the floating-point Content variants to Unexpected::Other, and that was enough for the contract to be generated properly. To make everything compile, I additionally needed to patch serde_json. That is because the patch affects all dependencies in the tree, so when the patched serde is used by serde_json (where f32 is fine, and which uses the Unexpected type internally), serde_json fails to build. But this was only for testing purposes.

What next?

So I know more or less what caused the problem, and now I need to figure out the solution. Three options come to mind:

  1. Fork serde for CosmWasm purposes, just like we did with serde-json
  2. Substitute serde with another serialization crate (in Wasm world, it is open discussion if serde is even the best choice in general)
  3. Get rid of #[serde(untagged)] enum

I don’t like the first option: a fork of such a big crate would be challenging to maintain. As for the second, while it may be helpful in general (if we decide some other crate fits better), it is a lot of work, and worse, it would impact the smart contracts API. The third option seems reasonable. Instead of using an enum with #[serde(untagged)], my idea is to use an enum with a custom serde::Deserialize implementation, generated by the same macro that generates all the messages. It would be tricky to do, but I can try to avoid any tough calls there.
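One possible shape of the third option is sketched below. This is only my guess at how such a generated implementation could decide which interface a message belongs to, not the final design: all names are hypothetical, and the serde plumbing is left out entirely. The assumption is that the macro knows the variant names of every interface at compile time, so it can route on the single top-level key of the incoming JSON object instead of buffering the value and trying each variant in turn.

```rust
/// Variant names per interface, as a macro could emit them (hypothetical).
pub const CW1_VARIANTS: &[&str] = &["execute"];
pub const WHITELIST_VARIANTS: &[&str] = &["freeze", "update_admins"];

/// Which interface message should deserialize the payload.
#[derive(Debug, PartialEq)]
pub enum Route {
    Cw1,
    Whitelist,
}

#[derive(Debug, PartialEq)]
pub enum DispatchError {
    UnknownVariant(String),
}

/// Picks the interface by the top-level key. No trial-and-error parsing and
/// no generic value buffer are involved, so there is no fallback conversion
/// that would have to handle floats.
pub fn route_by_key(key: &str) -> Result<Route, DispatchError> {
    if CW1_VARIANTS.iter().any(|variant| *variant == key) {
        Ok(Route::Cw1)
    } else if WHITELIST_VARIANTS.iter().any(|variant| *variant == key) {
        Ok(Route::Whitelist)
    } else {
        Err(DispatchError::UnknownVariant(key.to_string()))
    }
}
```

A real implementation would still need to read the key through serde's visitor API and then hand the rest of the input to the chosen message type; that part is the tricky bit and is not shown here.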

References

CW Plus repo in the state of this article
Rust optimizer for CosmWasm smart contracts
Cargo expand for expanding macros
Serde crate
Rustfilt for demangling Rust symbols
Twiggy for Wasm binaries analysis (not mentioned in write up but proved to be useful)

Find me on Github and LinkedIn.

Bartłomiej Kuras
CosmWasm

Developer and trainer at Confio, CosmWasm maintainer. Rust evangelist, enthusiast of sharing his experience in Software Development.