Async-Friendly Performance Counter

Introducing a new performance counter on the Internet Computer to easily optimize async canister code

Andriy Berestovskyy

Published in The Internet Computer Review, Dec 7, 2023

Background

The Internet Computer can host a full dapp: frontend, backend and data. Users can deploy their dapp as a canister (smart contract) on the Internet Computer. Each canister can store up to 96 GB of data and execute up to 20 billion WebAssembly instructions per update message.

Unlike other blockchains, the Internet Computer offers developers enormous amounts of resources. Such generous limits make it easy to ship code that works but wastes instructions, so writing efficient canister code is a real challenge. The more efficient a canister is, the fewer cycles it pays.

A performance counter lets a canister (smart contract) track the amount of work it has done. This count can serve as the main metric when optimizing the code.

The Internet Computer provides this System API call:

ic0.performance_counter : (counter_type : i32) -> i64

The argument specifies which performance counter to return. For a long time, only the type 0 counter was supported on the Internet Computer: the number of WebAssembly instructions the canister has executed since the beginning of the current message execution.

The type 0 counter works well, and many developers use it daily. But it is valid only for the current message execution, and it resets after each await point.

How could it be improved?

We are introducing a new call context performance counter (type 1) to track the executed WebAssembly instructions across await points.
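With two counter types in play, bare 0 and 1 literals are easy to mix up. The sketch below gives them names in Rust; `CounterType` and `raw` are hypothetical, not part of any SDK, and the System API itself only ever sees the raw integer.

```rust
/// Hypothetical names for the two counter types described in this post.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CounterType {
    /// Type 0: instructions executed since the start of the current
    /// message execution; resets after every await point.
    Instructions = 0,
    /// Type 1: instructions executed in the current call context;
    /// keeps growing across await points until the call is replied to.
    CallContextInstructions = 1,
}

impl CounterType {
    /// The raw `counter_type` argument for `ic0.performance_counter`.
    pub fn raw(self) -> i32 {
        self as i32
    }
}
```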

Comparing The Performance Counters

Consider this example function:

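use ic_cdk::api::{call::call, performance_counter};
use ic_cdk::{id, query};

#[query(composite = true)]
async fn example() -> (u64, u64) {
    do_some_work(); // some CPU-bound helper, not shown
    let _ = call::<_, ()>(id(), "nested_call", ()) // ❶
        .await;                                    // ❷
    do_some_work();
    let _ = call::<_, ()>(id(), "nested_call", ()) // ❶
        .await;                                    // ❷
    do_some_work();
    (performance_counter(0), performance_counter(1)) // ❸
}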

The function ❶ makes two nested calls, ❷ awaits them, and finally ❸ replies to the original call.

From the developer’s point of view, it’s just one function. From the user’s perspective, it’s just one call. But in fact, under the hood, there are three message executions:

The original call is executed up to the first await point.
The reply to the first nested call triggers the second execution.
Finally, the reply to the second nested call triggers the last execution, which replies to the original call with both performance counters.

A call context tracks the original call. The Internet Computer creates one for each call and keeps it until the call is either replied to or rejected.

The type 0 performance counter simply follows those three message executions. It starts from 0 and goes up to some value i. After the first await, it resets and goes up to j; after the second await, it resets again and goes up to k.

The new type 1 counter, on the other hand, lives in the call context, so it increases monotonically until the original call is replied to. In this example, it starts from 0 and goes up to i, then i+j, and finally i+j+k.
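The two behaviors can be reproduced in a toy model: one `CallContext` holds the type 1 counter, and each `MessageExecution` starts its own type 0 counter at zero. All names here are hypothetical, and this only mirrors the semantics described above; it is not how the replica implements them.

```rust
/// Toy call context: holds the type 1 counter for all its executions.
#[derive(Default)]
pub struct CallContext {
    call_context_instructions: u64,
}

/// One message execution, i.e. the code between two await points.
pub struct MessageExecution<'a> {
    ctx: &'a mut CallContext,
    /// Type 0 counter: this execution only, starts at 0.
    instructions: u64,
}

impl<'a> MessageExecution<'a> {
    pub fn start(ctx: &'a mut CallContext) -> Self {
        MessageExecution { ctx, instructions: 0 }
    }

    /// Pretend to execute `n` WebAssembly instructions.
    pub fn run(&mut self, n: u64) {
        self.instructions += n;
        self.ctx.call_context_instructions += n;
    }

    /// Model of `performance_counter(counter_type)` for types 0 and 1.
    pub fn performance_counter(&self, counter_type: u32) -> u64 {
        match counter_type {
            0 => self.instructions,
            1 => self.ctx.call_context_instructions,
            other => panic!("counter type {other} is not modeled"),
        }
    }
}

/// Runs one execution per entry of `work` (i, j, k, ...) in a single call
/// context and returns the (type 0, type 1) readings at the end of each.
pub fn simulate(work: &[u64]) -> Vec<(u64, u64)> {
    let mut ctx = CallContext::default();
    let mut readings = Vec::new();
    for &n in work {
        let mut exec = MessageExecution::start(&mut ctx);
        exec.run(n);
        readings.push((exec.performance_counter(0), exec.performance_counter(1)));
    }
    readings
}
```

With i = 3, j = 5 and k = 7, `simulate(&[3, 5, 7])` yields `(3, 3)`, `(5, 8)`, `(7, 15)`: type 0 restarts every time, while type 1 walks through i, i+j and i+j+k.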

Both counters see the same executed instructions; only the type 1 counter adds them up across await points. Could the old type 0 counter alone do the job?

To get the same i+j+k result with the type 0 counter, its value must be saved before each reset:

#[query(composite = true)]
async fn example() -> u64 {
    do_some_work();
    let c = call::<_, ()>(id(), "nested_call", ());
    let i = performance_counter(0); // ❶ save before the await resets it
    let _ = c.await;
    do_some_work();
    let c = call::<_, ()>(id(), "nested_call", ());
    let j = performance_counter(0); // ❷ save again before the next await
    let _ = c.await;
    do_some_work();
    let k = performance_counter(0); // ❸ final execution
    i + j + k
}

Even this simple example turns into spaghetti, as every single await point must be changed. Now imagine more complicated code, with many nested functions and library calls: properly instrumenting production asynchronous code this way is a real challenge!
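Moving the bookkeeping into a helper does not remove the burden. In the hypothetical `InstructionTally` below, a canister would call `tally.checkpoint(performance_counter(0))` right before every `.await` and once at the end; a single forgotten checkpoint silently undercounts.

```rust
/// Hypothetical accumulator for the manual type 0 approach above.
#[derive(Debug, Default)]
pub struct InstructionTally {
    total: u64,
    checkpoints: usize,
}

impl InstructionTally {
    /// Record a type 0 reading. Must be called before *every* await point,
    /// because the type 0 counter resets once the await completes.
    pub fn checkpoint(&mut self, type0_reading: u64) {
        self.total += type0_reading;
        self.checkpoints += 1;
    }

    /// Sum of all recorded readings, i.e. i + j + k if nothing was missed.
    pub fn total(&self) -> u64 {
        self.total
    }

    /// How many readings were recorded.
    pub fn checkpoints(&self) -> usize {
        self.checkpoints
    }
}
```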

The new call context counter must still be read explicitly, but only once, at the end of the function:

#[query(composite = true)]
async fn example() -> u64 {
    do_some_work();
    let _ = call::<_, ()>(id(), "nested_call", ()).await;
    do_some_work();
    let _ = call::<_, ()>(id(), "nested_call", ()).await;
    do_some_work();
    performance_counter(1) // ❶ i + j + k
}

In this case there is no need to instrument every await point in nested functions or libraries. There is just one place: right before the counter is returned. This is a huge step toward fully automating code instrumentation and profiling in the future.
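Because the type 1 counter only grows within a call context, the rough cost of any section, even one spanning await points, can be taken as the difference of two readings: `let before = performance_counter(1);` at the start and `performance_counter(1) - before` at the end. The hypothetical `measure` helper below shows the idea synchronously so it runs anywhere; `read_counter` is a stand-in for `performance_counter(1)`.

```rust
/// Hypothetical helper: cost of `section` as the difference between two
/// readings of a counter that never decreases, such as the type 1 counter.
pub fn measure<T>(read_counter: impl Fn() -> u64, section: impl FnOnce() -> T) -> (T, u64) {
    let before = read_counter();
    let result = section();
    let after = read_counter();
    // Fine for a monotonic counter. A type 0 reading taken after an await
    // could be smaller than `before`, which would underflow here.
    (result, after - before)
}
```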

Using Performance Counters

To quickly run the performance counters example (assuming the Internet Computer SDK is already installed):

$ git clone git@github.com:dfinity/examples.git
$ cd examples/rust/performance_counters
$ cargo update
$ dfx start --clean --background
$ dfx deploy
$ dfx canister call performance_counters example
(7_012_128 : nat64, 21_568_690 : nat64)

The example function returns two values: the old counter (type 0) and the new one (type 1). As the new call context counter includes all three message executions, it reports about three times as many WebAssembly instructions: roughly 21.6 million vs. just 7 million.
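The two printed numbers are enough to split the work between executions: type 0 is read in the final execution, so it equals k, while type 1 equals i + j + k. A small sketch of that arithmetic (`split_readings` is a hypothetical name; a fresh run may print slightly different numbers):

```rust
/// Splits the (type 0, type 1) readings returned by `example`.
/// Returns (k, i + j, type1 / type0).
pub fn split_readings(type0: u64, type1: u64) -> (u64, u64, f64) {
    let k = type0; // read in the last message execution
    let i_plus_j = type1 - type0; // the two earlier executions
    let ratio = type1 as f64 / type0 as f64;
    (k, i_plus_j, ratio)
}
```

For the output above, `split_readings(7_012_128, 21_568_690)` gives k = 7,012,128, i + j = 14,556,562 and a ratio of about 3.08, which is consistent with three executions doing roughly the same work.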

Please check the Performance Counters Example on GitHub for detailed instructions.

The new async-friendly performance counter is available everywhere:

Rust:       ic_cdk::api::performance_counter(1);
Motoko:     import IC "mo:base/ExperimentalInternetComputer";
            IC.performanceCounter(1);
TypeScript: import { ic } from 'azle';
            ic.performanceCounter(1);
Python:     from kybra import ic
            ic.performance_counter(1)

References

Backend Developers Guide: Optimizing Rust Canisters
Motoko Developers Guide: Optimizing Canisters

If you have questions or suggestions, or just want to meet Internet Computer developers and DFINITY engineers, join the forum at forum.dfinity.org.