How to win in Web Framework Benchmarks

Two years ago I started my journey to explore Virtual DOM and wrote the kivi library. It uses key ideas from React but has a completely different implementation. The primary goal of this library was to show the best numbers in benchmarks. I wouldn’t recommend using kivi for anything except benchmarks, and I hope nobody is using it.

Almost all implementations that have the best results in benchmarks use the same ideas that were originally implemented in kivi and Bobril.

Virtual DOM is probably not the best possible way to win benchmarks, but it is the easiest way. And maybe because of Virtual DOM’s simplicity we still haven’t seen other architectures with good results in benchmarks. It is much easier to optimize a Virtual DOM for benchmarks than implementations with much more complicated architectures.

In this article I’ll try to explain the most interesting tricks and optimizations that different libraries use to get the best scores in benchmarks.


vdom-benchmark

Let’s start with one of the worst benchmarks. I wrote it when I started working on my own Virtual DOM implementation.

There are many test cases in this benchmark, but I’ll focus on the ones that show the most noticeable differences between libraries.

React, kivi, and Inferno results will be enough to show the difference. Almost all other libraries have results that are either similar to React or similar to kivi. It mostly depends on their architecture: either some kind of two-pass diff/patch with an immutable tree, or one-pass syncing with a mutable tree.

React in general doesn’t show any impressive results, but if we take into account the time the browser spends on style recalculation, reflow, painting, and compositing, instead of just JavaScript execution time, the difference becomes significantly smaller.

Let’s start with the render() test cases:

Inferno is significantly faster than all other implementations, but not because it somehow creates and inserts DOM nodes much faster than everyone else. It is because it uses an optimization that recycles DOM nodes. And it is not a simple object pool: when Inferno removes nodes that have a key property, it places them into a dictionary under that same key. When the benchmark performs the second iteration, Inferno finds an existing node with exactly the same key in the pool and inserts it back, so it doesn’t need to update anything, because all virtual nodes have the same values.
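The idea of a keyed recycling pool can be sketched in a few lines. This is a minimal illustration of the technique, not Inferno’s actual code, and the class and method names are mine:

```javascript
// Minimal sketch of a keyed recycling pool: removed keyed nodes go into a
// dictionary under their key, and on the next render a node with the same
// key is pulled back out and reinserted as-is.
class KeyedPool {
  constructor() {
    this.pool = new Map(); // key -> array of recycled nodes
  }

  // Called instead of destroying a keyed node.
  recycle(key, node) {
    let bucket = this.pool.get(key);
    if (bucket === undefined) {
      bucket = [];
      this.pool.set(key, bucket);
    }
    bucket.push(node);
  }

  // Called before creating a new node: returns a recycled node with the
  // same key, or null if there is none.
  acquire(key) {
    const bucket = this.pool.get(key);
    if (bucket === undefined || bucket.length === 0) {
      return null;
    }
    return bucket.pop();
  }
}
```

When the benchmark renders the same keyed rows a second time, every acquire() hit returns a node whose contents already match the virtual node, so no DOM updates are needed at all.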

Another use case I’d like to mention in vdom-benchmark is a simple move of a single DOM node:

kivi and Inferno show no difference in results between moveFromEndToStart and moveFromStartToEnd. But in React, moving nodes in one direction is much slower than in the other. That is because React uses a naive algorithm to find changes in children lists: in one case it performs one insertBefore operation to move the node, and in the other case it performs N-1 insertBefore operations.

All modern Virtual DOM implementations use a simple technique to handle such use cases:

A: [a b c d e f g]
B: [a b f d c g]

A is the old children list and B is the new children list. Each item has a unique key, and we need to transform list A into list B with the least number of operations.

A: -> [a b c d e f g] <-
B: -> [a b f d c g] <-

We can start by skipping the common prefix [a b] and the common suffix [g]:

A: -> [c d e f] <-
B: -> [f d c] <-

At this point we look at the opposite edges of the two lists and move a node to the opposite edge if it has the same key. Here we can move c to the end and f to the beginning.

A: -> [d e]
B: -> [d]

Now we can skip the common prefix d.

A: [e]
B: []

And remove e.

It is a really simple optimization, and in hindsight it seems like an obvious solution, but somehow no implementation had it until Boris Letocha implemented it in Bobril. Nowadays almost all modern Virtual DOM implementations use this technique to handle such use cases.
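The steps above can be sketched as a small function over arrays of keys. This is a simplified illustration of the prefix/suffix technique, not any library’s actual implementation; in particular, the fallback branch here just removes and reinserts nodes, where real implementations switch to a keyed lookup or a minimal-moves algorithm:

```javascript
// Sketch of prefix/suffix children syncing. Mutates the old key array `a`
// until it matches the new key array `b`, returning a log of operations.
function syncChildren(a, b) {
  const ops = [];
  let aStart = 0, bStart = 0;
  let aEnd = a.length - 1, bEnd = b.length - 1;

  while (aStart <= aEnd && bStart <= bEnd) {
    if (a[aStart] === b[bStart]) {        // skip common prefix
      aStart++; bStart++;
    } else if (a[aEnd] === b[bEnd]) {     // skip common suffix
      aEnd--; bEnd--;
    } else if (a[aStart] === b[bEnd]) {   // old head moved to the tail
      ops.push(["move", a[aStart], "to-end"]);
      const [n] = a.splice(aStart, 1);
      a.splice(aEnd, 0, n);               // insert just before the suffix
      aEnd--; bEnd--;
    } else if (a[aEnd] === b[bStart]) {   // old tail moved to the head
      ops.push(["move", a[aEnd], "to-start"]);
      const [n] = a.splice(aEnd, 1);
      a.splice(aStart, 0, n);             // insert just after the prefix
      aStart++; bStart++;
    } else {
      // Edges don't match: fall back. Real implementations do something
      // smarter here; for the sketch, drop the old head.
      ops.push(["remove", a[aStart]]);
      a.splice(aStart, 1);
      aEnd--;
    }
  }
  while (aStart <= aEnd) {                // leftover old nodes
    ops.push(["remove", a[aStart]]);
    a.splice(aStart, 1);
    aEnd--;
  }
  while (bStart <= bEnd) {                // leftover new nodes
    ops.push(["insert", b[bStart]]);
    a.splice(aStart, 0, b[bStart]);
    aStart++; bStart++;
  }
  return ops;
}
```

Running it on the example lists above produces three operations (two moves and one removal), matching the walkthrough.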


JS web frameworks benchmark

This benchmark is significantly better than vdom-benchmark, because its result isn’t just JavaScript execution time: it also measures style recalculation, reflow, paint, compositing, etc. But it is still a synthetic test with all the downsides of synthetic tests.

The test cases used in this benchmark were probably chosen because they are easy to implement efficiently in Vanilla JS, so the Vanilla JS implementation can be used as a baseline. But I guess the author of this benchmark was surprised when we started to add Virtual DOM implementations that were optimized to win benchmarks.

The Vanilla JS implementation couldn’t compete with our libraries in many test cases, so it was completely rewritten and now performs DOM operations the same way Virtual DOM implementations do.

For example, this is how it removes a single row:

// Copy each following row's data into the row above it, working bottom-up.
for (let i = this.rows.length - 2; i >= idx; i--) {
  const tr = this.rows[i];
  const data = this.store.data[i + 1];
  tr.data_id = data.id;
  tr.childNodes[0].innerText = data.id;
  tr.childNodes[1].childNodes[0].innerText = data.label;
  this.data[i] = this.store.data[i];
}
this.store.delete(this.data[idx].id);
this.data.splice(idx, 1);
// Remove the last <tr> instead of the one at position idx.
this.rows.pop().remove();

If you are a Vanilla JS master and think that you can always perform DOM manipulations far more efficiently than with some kind of abstraction like Virtual DOM, try to understand why this is faster than simply removing a single row.

I don’t want to write about libraries that have extremely bad results in this benchmark. I’ll focus on the leaders and how they get their results.

As we can see, in this benchmark it is much harder to improve rendering performance with recycling, because it doesn’t perform several iterations in one session.

All libraries are pretty fast in this test case, but there are still some key differences between implementations. Some use an old optimization technique, event delegation, and some don’t.

The one thing I’d like to mention is that React, Bobril, and vidom implement event delegation implicitly and hide all implementation details from their users. Other libraries, like kivi and Inferno, use event delegation explicitly in their benchmark implementations.

I implemented it explicitly with kivi just to get better results in this benchmark, and I think that this is a kind of “cheating”. Explicit event delegation in React-like libraries is an anti-pattern.

But it also wouldn’t be fair not to mention that implicit delegation has its own problems; this issue perfectly describes why it can cause scroll jank in mobile browsers.
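For readers unfamiliar with the technique: explicit event delegation means attaching one listener to a container instead of one per row, and dispatching by walking up from the event target. A minimal sketch (the helper name and the selector-matching predicate are mine; real code would typically use Element.closest()):

```javascript
// Attach a single listener on `root` that dispatches `eventType` events to
// `handler` whenever the event target, or one of its ancestors below
// `root`, satisfies the `match` predicate.
function delegate(root, eventType, match, handler) {
  root.addEventListener(eventType, (event) => {
    let node = event.target;
    // Walk up from the target until we reach the container.
    while (node !== null && node !== root) {
      if (match(node)) {
        handler(event, node);
        return;
      }
      node = node.parentNode;
    }
  });
}
```

With this, something like delegate(table, "click", (n) => n.tagName === "TR", onRowClick) handles clicks for a 10,000-row table with one listener, which is what the explicit-delegation benchmark implementations do by hand.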

This is the most interesting piece in this benchmark.

Leon Sorokin (the domvm author) found out that if we disable the track-by-key algorithm and use a naive algorithm to rearrange children lists, we can get amazing results in this test case, because it won’t trigger a long-running style recalculation. In other words: render the children list without key properties in React, disable track-by in Angular or Vue, or do some magic to get rid of tracking by object identity.

The reason for this strange behavior is the nth-of-type(odd) selector that is used for rows in this benchmark. Implementations that properly rearrange, replace, and remove nodes will be much slower than implementations that update existing nodes in place and always remove the last rows.

This “optimization” technique does the trick because the rows don’t have any complicated structure and updates are trivial.

Also, this benchmark doesn’t have any other test cases that move nodes. Swapping rows is the simplest transformation:

A: [a b c d e f]
B: [f b c d e a]

Here the sync algorithm needs to perform two updates: a -> f and f -> a. But imagine a much more common situation with a single move:

A: [a b c d e f]
B: [b c d e f a]

Now it will need to perform six updates, because we removed all keys and the syncing algorithm can no longer move the node with one insertBefore operation.
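The update counts are easy to verify with a rough simulation of keyless diffing, where children are matched purely by position (a simplification of what naive algorithms do; length differences would additionally cost insertions or removals):

```javascript
// Count in-place updates a keyless (positional) diff would perform:
// every position whose content differs gets its node rewritten.
function keylessUpdates(oldList, newList) {
  let updates = 0;
  const shared = Math.min(oldList.length, newList.length);
  for (let i = 0; i < shared; i++) {
    if (oldList[i] !== newList[i]) {
      updates++;
    }
  }
  return updates;
}

// Swapping the first and last rows touches only two positions:
keylessUpdates(["a", "b", "c", "d", "e", "f"],
               ["f", "b", "c", "d", "e", "a"]); // -> 2

// Moving one row from the front to the back touches every position:
keylessUpdates(["a", "b", "c", "d", "e", "f"],
               ["b", "c", "d", "e", "f", "a"]); // -> 6
```

A keyed algorithm handles the second case with a single insertBefore operation, which is exactly the difference this benchmark fails to measure.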

And even though this benchmark doesn’t keep any internal state in its rows, so removing all keys might seem ok, I think that this is completely wrong, and the only library that behaves correctly in such situations is React. It will immediately print a warning that key properties are missing. Missing keys is one of the most common errors when working with web UI libraries.

Here is what happens if you forget to add keys and then someone decides to add a CSS transition to one of the components used in these rows:

When an item is removed from a keyless list, the next item inherits the CSS transition state of the removed row instead of starting a new transition. That is because when you click on a row to remove it, you actually remove the last row and modify all the other rows in place, so it only looks like you removed the clicked row, and now all nodes have incorrect internal state.


uibench

Another benchmark that I created, with the same problems as all benchmarks. I just wanted to measure more test cases: component overhead, etc.

This benchmark has an option to measure either full render time (JavaScript, style recalculation, reflow, etc.) or just JavaScript execution time. Here I’ll use full render time in all results.

Again, a test case that creates a table and inserts it into the document. Nothing new here: Inferno uses recycling and event delegation, kivi uses event delegation, and Snabbdom just has an optimized implementation without any tricks. kivi and Inferno results would be similar to Snabbdom’s if we removed the benchmark-specific optimizations.

These test cases show who is still using naive algorithms to rearrange nodes when the prefix/suffix optimization doesn’t work. There are many ways to defeat the prefix/suffix optimization, for example deleting nodes at both ends, or inserting nodes at both ends.

There are several different worst cases because some libraries iterate nodes in one direction in their naive algorithms and other libraries iterate in the other. And kivi_worst_case can’t have fast results, because it requires reversing all nodes, which triggers the slowest path in kivi’s syncing algorithm.

kivi demonstrates good results in these test cases because when the prefix/suffix optimization fails, it uses an algorithm that actually finds the minimum number of DOM operations. Inferno copied this algorithm from kivi, so it has the same results.
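A common way to minimize moves in keyed children syncing (I believe this is the family of algorithm kivi uses, though the sketch below is mine, not kivi’s actual code) is based on the longest increasing subsequence: map each new child to its index in the old list, find the longest increasing subsequence of those indices, and move only the nodes that are not part of it.

```javascript
// Length of the longest increasing subsequence, patience-sorting style,
// in O(n log n). tails[i] holds the smallest possible tail value of an
// increasing subsequence of length i + 1.
function longestIncreasingSubsequenceLength(seq) {
  const tails = [];
  for (const x of seq) {
    let lo = 0, hi = tails.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (tails[mid] < x) lo = mid + 1; else hi = mid;
    }
    tails[lo] = x;
  }
  return tails.length;
}

// Minimum number of node moves needed to rearrange oldKeys into newKeys
// (ignoring insertions and removals of keys present in only one list).
function minimalMoves(oldKeys, newKeys) {
  const oldIndex = new Map(oldKeys.map((k, i) => [k, i]));
  // Positions of the new children in the old list.
  const positions = newKeys
    .filter((k) => oldIndex.has(k))
    .map((k) => oldIndex.get(k));
  // Nodes on the longest increasing subsequence stay put; the rest move.
  return positions.length - longestIncreasingSubsequenceLength(positions);
}
```

For the single-move example earlier, minimalMoves yields 1, while a fully reversed list yields N-1 moves, which is consistent with kivi_worst_case triggering the slowest path.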

I am not sure why Snabbdom has bad results in all these test cases; naive implementations should have bad results in no more than 3 out of 4 worst-case tests. Probably a bug in Snabbdom.


Repaint Rate Challenge

There are no special ways to get the best results in this benchmark, except for some obvious cheaters that use setImmediate, or send messages from Web Workers that invoke ping() while the main thread is completely blocked by an “amazing” Web Worker implementation.

Just don’t look at the repaint rate monitor at the bottom right; instead, open dev tools and enable the FPS monitor.


Benchmark games

All of this is just benchmark games, and nobody should take any numbers from these benchmarks seriously. They are useful for web framework authors and for those who deeply understand all the internals, tricks, and “cheats” used in them. Don’t use numbers from any web framework benchmark to make decisions when choosing a framework or library.