Software, complexity and cache invalidation

Published in

Wix Engineering

4 min readOct 16, 2018

Photo by Marc-Olivier Jodoin on Unsplash

The curse of derived state

TLDR — We’re open sourcing a new JavaScript compiler for derived state, that combines the best features of Mobx, Redux & Lodash. Moves most of the computationally intensive work to build time, and let developers focus on their core domain problems without sacrificing performance.

Sometimes the complexity of software development is a source of pure joy, like solving an intricate jigsaw puzzle. Finding creative ways to connect the different pieces together in a project, especially a large one, is very satisfying. But sometimes complexity can be downright annoying, incidental and painstaking. One of the largest sources for this type of complexity is derived state.

Almost all projects have core state; and in most cases various parts of your codebase require all sorts of computed pieces created from different parts of said state, but these projections aren’t your source of truth. They are always be recreateable from your ‘true state’, and maintaining them is tedious and not to mention horribly bug prone

There are only two hard things in Computer Science: cache invalidation and naming things — Phil Karlton.

As an industry we have found a few strategies for handling derived state without touching the icky manual cache invalidation stuff:

Derive everything every time you need it again or at least every time you change any part of the state — this works beautifully but incurs a huge runtime cost if your state & derivation are non-trivial. ReactJS is the poster child of this approach, the VDOM is your derived state and is recalculated every time a component is rendered
Model your state using immutable objects, or at least treating them as such (either by using the languages type system guarantees or using immutable data structures like Hash Arrays Mapped Tries — like Immutable-Js), and cache the results based on the object’s identities. Redux’s ReSelect is a good example for this approach
Use Reactive Programming, wrap your state fragments in getters/setters, wrap you derivation functions so they track which fragments they consumed and are re-evaluated when one of setters for one of the fragments they consumed is used. MobxJS is probably the most commonly used example for this approach in the JS ecosystem

All approaches have significant runtime&memory costs, and when your performance really matters, for example because you don’t want to miss a rendering frame and your budget is 16 milliseconds, they aren’t good enough, and none of them actually get you the tightest code possible as they aren’t incremental in nature if for example your derived state is the sum of an array changing an item in the array will recalculate the sum of the entire array again.

Alan Kay famously said “Perspective is worth 80 IQ points” and I have a pretty unique perspective. As the architect of Wix’s editor group, I am trying to wrangle a beast with 2M lines of JavaScript, 80 devs and non derived states which literally wouldn’t have fit into the working memory of any of my first 3 computers. We needed something better, an abstraction over derived state that was easy to write, super performant and scaled to the complexity of the problems we are tackling.

Just like no working developer tracks CPU register allocation manually nowadays, because we have compilers that handle that for us and do much better jobs at it than we would. Cache invalidation of derived state needs to be someone else’s problem. It shouldn’t fall on the shoulders of every developer, nor should use expensive abstractions like Immutability or Reactive Programming.

I started thinking about what it means to derive state. State derivation is a Directed Acyclic Graph; you compute stuff based on the state and then use the results of these computations and other bits of your core state to compute more stuff based on them and so on. I needed a way to model this graph in a way that would compute the derived state as quickly as possible and would have as little development effort as possible. Rust has a beautiful concept of Zero Cost Abstractions, stuff that helps you write code more elegantly and easily but disappears in compilation and doesn’t have any overhead, thus the concept of CARMI was born a Compiler for Automatic Reactive Modelling of Inference.

CARMI — is a declarative DSL(Domain Specific Language) and compiler which is fed two types of inputs:

Getters of derived state — expressed using a Lodash inspired syntax, calculate everything you want to know.
Setters — define the parts of your model that are writable

It then generates the source code of your state container function — run it on your initial state and get back an object with all your getters values fully resolved— run any of the setters and change the model, it will update the model while doing as little work as possible to make sure your getters are in sync.

Because the compiler knows in advance all the stuff that can be read/written from/to the model, it can do all sort of cool stuff that is nearly impossible to do automatically using other alternative approaches

All computation is incremental
Track conditional consumption of parts of the model only if used with zero overhead
Hoisting shared sub expressions, so they are only calculated once
Ignore dependencies if there are no setters that can cause the expression to invalidate

CARMI itself is an open source project, released under the Wix Incubator program. I’d be happy to know that it can help more people making the web faster and more reliable!

https://carmi.js.org

https://github.com/wix-incubator/carmi

Software, complexity and cache invalidation

The curse of derived state

Written by Avi Marcus