The Next Step for DeAI: On-Chain Inference Enabling Face Recognition

DFINITY
The Internet Computer Review
5 min read · Jul 15, 2024

The replica version e4eeb3 that was approved by the community in Proposal 13094 completes the Cyclotron milestone from ICP’s roadmap.

The goal of this milestone is to enable on-chain inference of AI models with millions of parameters, which is the first step towards a more ambitious goal of on-chain training and inference of large AI models.

AI workloads are compute-intensive. Running inference on an AI model with millions of parameters involves billions of arithmetic operations, such as multiplications and additions. This means that in order to support on-chain inference, a blockchain needs the capacity to process billions of operations per second. The Cyclotron milestone increased the computing power of ICP by an order of magnitude (~10x), making it the only blockchain that has working examples of smart contracts performing face recognition fully on-chain, along with other use cases such as image classification and running GPT2 (by DecideAI).
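To see how millions of parameters turn into billions of operations, here is a rough back-of-the-envelope sketch. It assumes each weight contributes one multiplication and one addition per use and ignores biases and activations. The helper names and layer sizes are hypothetical and chosen only for illustration; they are not taken from the benchmarked models.

```rust
// Rough operation counts for two common layer types (biases ignored).
// All layer sizes used in `main` are hypothetical.

/// Returns (parameters, arithmetic operations) for a fully connected layer.
/// Every weight is used once: one multiplication and one addition.
fn dense_layer_cost(inputs: u64, outputs: u64) -> (u64, u64) {
    let params = inputs * outputs;
    (params, 2 * params)
}

/// Returns (parameters, arithmetic operations) for a square convolution.
/// Every weight is reused at each output position, so the operation
/// count grows with the size of the output feature map.
fn conv_layer_cost(
    kernel: u64,
    in_channels: u64,
    out_channels: u64,
    out_height: u64,
    out_width: u64,
) -> (u64, u64) {
    let params = kernel * kernel * in_channels * out_channels;
    let ops = 2 * params * out_height * out_width;
    (params, ops)
}

fn main() {
    let (params, ops) = dense_layer_cost(1024, 1000);
    println!("dense: {params} parameters, {ops} operations");

    let (params, ops) = conv_layer_cost(3, 64, 64, 56, 56);
    println!("conv:  {params} parameters, {ops} operations");
}
```

A single convolution with under 40 thousand weights already needs hundreds of millions of operations in this sketch, so a network with tens of such layers reaches billions of operations per inference.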

Facial Recognition Demo by Dominic Williams

The foundation for on-chain AI computation

A virtual machine is the part of a blockchain that is crucial for AI computation because it executes a smart contract’s code. The features and performance of the virtual machine directly affect how much AI computation a smart contract can perform. For example, the EVM is the virtual machine of Ethereum. It was tailored for DeFi smart contracts and lacks features such as floating-point arithmetic needed for AI computation. In contrast, ICP uses WebAssembly as its virtual machine. WebAssembly supports floating-point numbers and was designed from the ground up for near-native performance.

The idea of the Cyclotron milestone is to squeeze as much floating-point performance out of ICP’s virtual machine as possible.

Optimization #1: Deterministic floating-point operations

Most AI libraries and frameworks rely on floating-point arithmetic. In the context of ICP, floating-point operations have to be deterministic, which means they must produce the same result for the same input operands on every node. This determinism property is important because ICP executes the same code on multiple nodes and then runs its consensus algorithm to establish the correct result. If a floating-point operation is not deterministic, nodes might diverge, stopping the progress of the blockchain.

DFINITY engineers found a way to make deterministic floating-point operations faster in the WebAssembly virtual machine implementation called Wasmtime. This is a low-level compiler optimization that produces faster code. This optimization benefits not only ICP but also other platforms and blockchains that use Wasmtime.
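The article does not describe the optimization itself. One well-known source of nondeterminism in WebAssembly floating-point arithmetic is the bit pattern of NaN results, which can differ between machines. A minimal sketch of the general idea of canonicalizing NaNs follows, written in plain Rust with hypothetical helper names (`canonicalize`, `deterministic_div`). It is an illustration of the concept, not Wasmtime's implementation.

```rust
// Illustration only: every NaN is rewritten to one fixed bit pattern so
// that all nodes agree on the exact bits of a result.

/// Bit pattern of the canonical quiet NaN for 32-bit floats.
const CANONICAL_NAN_BITS: u32 = 0x7fc0_0000;

/// Replaces any NaN with the canonical NaN; other values pass through.
fn canonicalize(x: f32) -> f32 {
    if x.is_nan() {
        f32::from_bits(CANONICAL_NAN_BITS)
    } else {
        x
    }
}

/// Hypothetical deterministic division: divide, then canonicalize.
fn deterministic_div(a: f32, b: f32) -> f32 {
    canonicalize(a / b)
}

fn main() {
    // Two different NaN encodings that hardware could plausibly produce.
    let nan_a = f32::from_bits(0x7fc0_0001);
    let nan_b = f32::from_bits(0xffc0_0000);
    println!("raw bits:       {:#010x} vs {:#010x}", nan_a.to_bits(), nan_b.to_bits());
    println!(
        "canonical bits: {:#010x} vs {:#010x}",
        canonicalize(nan_a).to_bits(),
        canonicalize(nan_b).to_bits()
    );
    println!("0/0 bits:       {:#010x}", deterministic_div(0.0, 0.0).to_bits());
}
```

Applying such a check after every floating-point operation costs time, which is why the speed of deterministic floating-point code matters for AI workloads.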

Optimization #2: Single instruction, multiple data (SIMD)

SIMD is a technology supported by all modern CPUs. It allows the CPU to execute multiple arithmetic operations with a single instruction. For example, WebAssembly can perform four parallel floating-point additions with a single instruction, as shown in the diagram below.

WebAssembly SIMD can also work with integer numbers. For example, it can perform 16 parallel arithmetic operations on small 8-bit integers. Depending on the type of numbers and operations, performance may increase by 4x to 16x.
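The lane-wise behavior described above can be modeled in portable Rust. The sketch below uses plain arrays and hypothetical helper names (`add_f32x4_lanes`, `add_i8x16_lanes`) to show what one 128-bit SIMD addition computes. It models the semantics only and does not call real SIMD intrinsics.

```rust
// Scalar model of two 128-bit WebAssembly SIMD additions.
// A 128-bit vector holds either four f32 lanes or sixteen i8 lanes.

/// Models a four-lane f32 addition: each lane is added independently.
fn add_f32x4_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

/// Models a sixteen-lane i8 addition. Integer lanes wrap around on
/// overflow instead of trapping.
fn add_i8x16_lanes(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for lane in 0..16 {
        out[lane] = a[lane].wrapping_add(b[lane]);
    }
    out
}

fn main() {
    let floats = add_f32x4_lanes([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]);
    println!("f32x4 sum: {:?}", floats);

    let bytes = add_i8x16_lanes([100; 16], [27; 16]);
    println!("i8x16 sum: {:?}", bytes);
}
```

With real SIMD, each of these functions corresponds to a single instruction rather than 4 or 16 scalar ones. When compiling Rust to WebAssembly, the `simd128` target feature (for example via `-C target-feature=+simd128`) has to be enabled for the compiler to emit such instructions.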

Smart contracts running on ICP can now use deterministic SIMD instructions and benefit from parallel computation. Learn how to compile a smart contract with SIMD.

Optimization #3: SIMD support in AI inference engine

The final piece of the Cyclotron puzzle is adding WebAssembly SIMD support to AI libraries. DFINITY engineers contributed a WebAssembly SIMD implementation to the open-source Sonos Tract inference engine. The new code implements matrix multiplication and other numeric algorithms using SIMD instructions. Similar to the first optimization in Wasmtime, this contribution benefits not only ICP but also the wider developer community.
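To give a feel for how matrix multiplication maps onto SIMD lanes, here is a simplified sketch in portable Rust. It keeps four independent accumulators, one per lane, so that each loop step matches one four-lane multiply and one four-lane add. The function names and data layout (the second matrix is passed transposed) are assumptions made for this illustration; this is not the code contributed to Tract.

```rust
// Sketch of a SIMD-friendly matrix multiplication using a scalar model
// of four f32 lanes. Not the Tract implementation.

/// Dot product with four lane accumulators plus a scalar tail loop.
fn dot_4_lanes(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let split = a.len() - a.len() % 4;
    let mut acc = [0.0f32; 4];
    // Main loop: each step stands for one 4-lane multiply and add.
    for i in (0..split).step_by(4) {
        for lane in 0..4 {
            acc[lane] += a[i + lane] * b[i + lane];
        }
    }
    // Horizontal sum of the lanes, then the leftover elements.
    let mut sum = acc[0] + acc[1] + acc[2] + acc[3];
    for i in split..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Computes C = A * B where `a` is m x k (row-major) and `b_t` is the
/// transpose of B, stored as n x k (row-major), so that both operands of
/// every dot product are contiguous in memory.
fn matmul(a: &[f32], b_t: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b_t.len(), n * k);
    let mut c = vec![0.0f32; m * n];
    for row in 0..m {
        for col in 0..n {
            let a_row = &a[row * k..(row + 1) * k];
            let b_col = &b_t[col * k..(col + 1) * k];
            c[row * n + col] = dot_4_lanes(a_row, b_col);
        }
    }
    c
}

fn main() {
    // A is 2 x 3; B is 3 x 2 and is passed transposed.
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b_t = [7.0, 9.0, 11.0, 8.0, 10.0, 12.0];
    println!("{:?}", matmul(&a, &b_t, 2, 3, 2));
}
```

Production kernels typically go further, for example by tiling the matrices to reuse loaded values, but the lane-wise accumulation pattern is the core idea.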

Results

These optimizations combined speed up numeric microbenchmarks by 28x. In the end-to-end AI inference workloads, the observed improvement varies from 5x to 19x depending on the model, as shown in the chart below.

The source code of the smart contracts containing these AI models is available on GitHub, so anyone can reproduce and verify the results:

  • Image classification: this is a MobileNet model that classifies the input image and returns the most likely labels out of 1000 known labels. The number of Wasm instructions for running a single inference dropped from 24.7 billion to 3.7 billion.
  • Face detection: this is an Ultraface model that finds the bounding box of a face in the input image. The number of Wasm instructions for running a single inference dropped from 6.1 billion to 1.2 billion.
  • Face recognition: this is a model that computes a vector embedding of the input image of a face. The number of Wasm instructions for running a single inference dropped from 77 billion to 9 billion. The execution limit on the mainnet is 40 billion instructions, which means that previously face recognition would fail to run on the mainnet and could only run locally in a patched replica.
  • GPT2: this is a GPT2 model translated into a smart contract by DecideAI using their rust-connect-py-ai-to-ic framework. Details of the benchmark are described here.
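The instruction counts listed above can be checked against the mainnet limit with a few lines of arithmetic. The sketch below uses only the numbers quoted in the list; the type and function names are hypothetical, and GPT2 is left out because its instruction counts are not given here.

```rust
// Instruction counts per inference, as quoted in the list above.

/// Mainnet execution limit stated in the article.
const MAINNET_INSTRUCTION_LIMIT: u64 = 40_000_000_000;

struct Benchmark {
    name: &'static str,
    baseline: u64,
    cyclotron: u64,
}

const BENCHMARKS: [Benchmark; 3] = [
    Benchmark { name: "Image classification", baseline: 24_700_000_000, cyclotron: 3_700_000_000 },
    Benchmark { name: "Face detection", baseline: 6_100_000_000, cyclotron: 1_200_000_000 },
    Benchmark { name: "Face recognition", baseline: 77_000_000_000, cyclotron: 9_000_000_000 },
];

/// Factor by which the instruction count shrank.
fn reduction_factor(b: &Benchmark) -> f64 {
    b.baseline as f64 / b.cyclotron as f64
}

/// Whether a single inference fits within the mainnet limit.
fn fits_on_mainnet(instructions: u64) -> bool {
    instructions <= MAINNET_INSTRUCTION_LIMIT
}

fn main() {
    for b in BENCHMARKS.iter() {
        println!(
            "{}: {:.1}x fewer instructions, fit before: {}, fit after: {}",
            b.name,
            reduction_factor(b),
            fits_on_mainnet(b.baseline),
            fits_on_mainnet(b.cyclotron)
        );
    }
}
```

Of the three, only face recognition exceeded the limit at baseline, which matches the statement that it previously could not run on the mainnet.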

The benchmarks were run in dfx version 0.20.1 (Baseline) and version 0.22.0-beta.0 (Cyclotron).

Conclusion

The Cyclotron milestone brings the performance of AI compute on ICP to near-native CPU performance by optimizing floating-point operations and enabling WebAssembly SIMD instructions. It enables on-chain inference for AI models with millions of parameters, covering use cases such as image classification, face recognition, and GPT2.

This is a first step towards running large AI models fully on-chain to solve AI’s trust problem. The next AI milestone in ICP’s roadmap aims to scale beyond CPU limitations. To perform AI inference and training of large models on-chain, smart contracts need a way to run compute- and memory-intensive computations on specialized hardware such as GPUs. Stay tuned for the Gyrotron milestone.
