Immutable.js, persistent data structures and structural sharing

Why use Immutable.js instead of normal JavaScript object?

Treating your data as immutable brings many benefits. In fact, that’s a principle behind React: React elements are immutable. You might also be interested in learning about an Immutable App Architecture.

But what’s the benefit of using Immutable.js:

function toggleTodo (todos, id) {
return todos.update(id,
(todo) => todo.update('completed',
(completed) => !completed
)
)
}

…over using normal JavaScript objects (and treating them as immutable, optionally using helpers such as seamless-immutable), like this?:

function toggleTodo (todos, id) {
return Object.assign({ }, todos, {
[id]: Object.assign({ }, todos[id], {
completed: !todos[id].completed
})
})
}
// Using updeep
function toggleTodo (todos, id) {
return u({
[id]: {
completed: (completed) => !completed
}
}, todos)
}

A very large object…

Let’s say we have 100,000 tasks in our todo list:

var todos = {

t79444dae: { title: 'Task 50001', completed: false },
t7eaf70c3: { title: 'Task 50002', completed: false },
t2fd2ffa0: { title: 'Task 50003', completed: false },
t6321775c: { title: 'Task 50004', completed: false },
t2148bf88: { title: 'Task 50005', completed: false },
t9e37b9b6: { title: 'Task 50006', completed: false },
tb5b1b6ae: { title: 'Task 50007', completed: false },
tfe88b26d: { title: 'Task 50008', completed: false },

(100,000 items)
}

I just finished the 50005th thing on the list.

Now I want to mark it as complete.


With a normal JavaScript Object

var nextState = toggleTodo(todos, 't2148bf88')

This single operation took 134ms to run.

Why? Because when you use Object.assign, JavaScript shallowly copies each property from the sources over to the destination. One property at a time.

We have 100,000 todos, so that means 100,000 properties to copy.

That’s why it takes a long time.


Why does it have to be this way?

In JavaScript, objects are mutable by default.

When you clone an object, JavaScript has to copy every property over so that these two objects become totally separated.

100,000 properties are being (shallowly) copied over to the destination.

This lets you mutate any object afterwards without affecting the other one. Even though you plan to treat these objects as immutable, JavaScript doesn’t care.


Using Immutable.js

var todos = Immutable.fromJS({

t79444dae: { title: 'Task 50001', completed: false },
t7eaf70c3: { title: 'Task 50002', completed: false },
t2fd2ffa0: { title: 'Task 50003', completed: false },
t6321775c: { title: 'Task 50004', completed: false },
t2148bf88: { title: 'Task 50005', completed: false },
t9e37b9b6: { title: 'Task 50006', completed: false },
tb5b1b6ae: { title: 'Task 50007', completed: false },
tfe88b26d: { title: 'Task 50008', completed: false },

(100,000 items)
})

Having represented our data using an Immutable.Map, let’s update it.

var nextState = toggleTodo(todos, 't2148bf88')

This operation takes only 1.2 ms to run. More than 100x faster!

Why is it so fast?


Persistent data structures

Persistent data structures enforces a constraint that all operations will return a newer version of that data structure and keep the original structure intact, instead of updating the original structure in-place.

This implies that all persistent data structures are immutable.

Given this constraint, libraries that implement persistent data structures can perform more optimizations because they know that we won’t ever (and can’t) mutate our data.

Let’s look at one such optimization:


Optimization using tries

To make it easy to visualize, let us try a smaller example.

Imagine storing a key-value mapping like this:

We could store them all in a single JavaScript object:

const data = {
to: 7,
tea: 3,
ted: 4,
ten: 12,
A: 15,
i: 11,
in: 5,
inn: 9
}

But how about we create a trie instead? It’s a tree structure that… uh just look it up on Wikipedia. It looks like this:

Image source: Wikipedia

Basically, you can get to the value you need by following the paths in this graph from the root.

If you want data.in, start at the root, follow the path labelled i and n respectively. You should arrive at a node that contains 5.

Now, how about modifications?

Let’s say we want to change the value at the key tea from 3 to 14.

We can construct a new trie, reusing existing nodes as much as possible.

Green colors means newly-created nodes.

The old tree is still there, unchanged, in case you keep a reference to it.

The nodes in gray will be garbage-collected if we lose a reference to the old tree.

As you can see, we only need to reconstruct 4 new nodes to update this tree. Other nodes could be reused as-is.

This is called structural sharing.

This is how Immutable.js implements its Immutable.Map. It creates a tree where each node can have up to 32 branches.

Exploring an Immutable.Map containing 100,000 entries. This diagram is drawn from an actual Immutable.Map.

When we update a single item, only a few nodes need to be recreated.

Immutable.js employs crazy advanced techniques to keep this tree structure compact and performant by creating multiple types of nodes to account for various characteristics of subtrees.


Not always…

Please don’t take this article to mean “you should always use Immutable.js.” No, I’m just trying to highlight its benefits in this article and explain why it’s recommended a lot.

Data structure matters, but when I write software, I try to do the simplest thing first. I’d use plain arrays and objects, and then use Immutable.js when I need a speed boost, or when I have a very strong feeling that I will need it for sure (e.g. I anticipate my collection to hold 10,000 items). I almost never use Immutable.js with small objects or collections that I know will have just only a few items in it.


“But doesn’t that mean I may have to come back and change it later?”

Yes. And it’s fine! If all your data access goes through a single, well-defined module. For example,

// -- Todos.js --
export const empty = { }
export function add (todos, id, todo) {
return Object.assign({ }, todos, { [id]: todo })
}
export function getById (todos, id) {
return todos[id]
}

Assuming all application code always use this module to access the data in it, then you only need to update this file when you want to change the underlying data structure.

I call this an ‘entity module,’ which encapsulates the knowledge of representing an entity in a software system. It came from the Clean Architecture. I plan to write more about this concept in the future.


Don’t couple your application logic to data structures.

I’ve learned the hard way to not couple the application logic to data structures. This is because we never know how our data will be accessed in the future.

For example, our todo app is now managing 100,000 tasks. We switched to use Immutable.js and now everything is good and fast.

Suddenly, “a task may have assignees!” said the requirement. “And users should be able to see what tasks they have been assigned to as well.”

function findByAssigneeAsArray (todos, assigneeId) {
return todos.filter(
(task) => Task.containsAssignee(task, assigneeId)
).toArray()
}

And people started complaining that the app is slow. Profiling reveals that the above function is the culprit.

Searching for tasks using the above code requires a sequential search through 100,000 tasks. This can’t be fast.

To optimize for this use case, we need to change the underlying data structure by maintaining a reverse lookup table from assignees back to a list of task IDs. This made-up example came from my experience optimizing the app’s performance at Taskworld.

It would be very hard to make this kind of change if our reducer/selector/view code accessed the contents of these data structures directly.

So that’s it. If we want to iterate fast, we need to make sure it’s easy to make changes in our software. Keep code clean, write tests, set up CI and CD from the beginning.


Thanks for reading!

P.S. There’s some more discussion on Reddit.