Syncing State is Hard

Kobe Albright
Published in Valour App
5 min read · May 9, 2023

When designing Valour, time and planning went into most aspects of the app. Specific features such as voice and video were recognized as ‘big ticket’ items and it was understandable that these things would take time and effort to build.

You can find Valour here

At some point, it all just looks like a blur of numbers.

So why is determining whether a channel (and by extension, an entire community) has an unread message such an outlier? Why has this small piece of the puzzle grown to take up so much of my development time?

Users vs Connections

Things tend to become complex in proportion to the complexity of their inputs… or at least, that’s a general guideline. Here’s an example of where something seemingly simple can explode in complexity from a slight change in requirements.

If we need to track read states on a per-user basis, meaning that we need to just keep track of users, the solution is simple.

  • Track last update time for channels
  • Track the last time a user has looked at those channels
  • Send both to the client

What a clear and concise solution. When a user opens a channel, the last time viewed is updated. If that time is greater than the channel’s last update, then there are no unread messages. So where is this getting messy?
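The per-user approach above can be sketched in a few lines. This is only an illustration, not Valour's actual code: the `ReadStateTracker` class, its method names, and the in-memory dictionaries are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Tuple


@dataclass
class ReadStateTracker:
    """Hypothetical per-user read-state tracker (illustrative names only)."""

    # channel_id -> time of the channel's last update
    channel_last_update: Dict[int, datetime] = field(default_factory=dict)
    # (user_id, channel_id) -> last time that user viewed the channel
    user_last_viewed: Dict[Tuple[int, int], datetime] = field(default_factory=dict)

    def post_message(self, channel_id: int, at: datetime) -> None:
        self.channel_last_update[channel_id] = at

    def open_channel(self, user_id: int, channel_id: int, at: datetime) -> None:
        self.user_last_viewed[(user_id, channel_id)] = at

    def is_unread(self, user_id: int, channel_id: int) -> bool:
        last_update = self.channel_last_update.get(channel_id)
        if last_update is None:
            return False  # the channel has never been updated
        last_viewed = self.user_last_viewed.get((user_id, channel_id))
        if last_viewed is None:
            return True  # updated, but this user has never looked
        # Read only if the last view is strictly later than the last update.
        return last_viewed <= last_update


start = datetime(2023, 5, 9, 12, 0, 0)
tracker = ReadStateTracker()
tracker.post_message(channel_id=10, at=start)
tracker.open_channel(user_id=1, channel_id=10, at=start + timedelta(seconds=5))
caught_up = not tracker.is_unread(user_id=1, channel_id=10)
tracker.post_message(channel_id=10, at=start + timedelta(seconds=10))
unread_again = tracker.is_unread(user_id=1, channel_id=10)
```

Nothing here needs to be pushed anywhere: the client can simply ask for both timestamps whenever it wants to draw an unread marker.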

In today’s world, you aren’t just a user. You may have the same app open on both your computer and your phone. Now you’ve greatly increased the complexity of this problem! But how? Now we have to sync state updates across connections, where before we were able to lazily grab states when we needed them. You may be able to see a channel (and its state) in the app, and then open the channel on another device, which means that the first device should know to update the user’s last viewed time!

This means we need the server to notify all of the user’s active clients of any activity occurring on their other devices. This means we also need a way to track active connections, and tie them to specific users. In Valour, we use Redis to efficiently track these connections, although hard crashes can lead to ‘dead references’, an issue that will have to be resolved.
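As a rough sketch of what tying connections to users involves, here is a small in-memory registry. It is a stand-in for illustration only: Valour keeps this data in Redis, and the `ConnectionRegistry` class, its method names, and the event shape are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class ConnectionRegistry:
    """In-memory stand-in for connection tracking (the article uses Redis)."""

    def __init__(self) -> None:
        # user_id -> ids of that user's active connections
        self._connections_by_user: Dict[int, Set[str]] = defaultdict(set)
        # connection_id -> callback that delivers an event to that device
        self._senders: Dict[str, Callable[[dict], None]] = {}

    def connect(self, user_id: int, connection_id: str,
                send: Callable[[dict], None]) -> None:
        self._connections_by_user[user_id].add(connection_id)
        self._senders[connection_id] = send

    def disconnect(self, user_id: int, connection_id: str) -> None:
        # If this never runs (for example after a hard crash), the entry
        # lingers: that is the 'dead reference' problem mentioned above.
        self._connections_by_user[user_id].discard(connection_id)
        self._senders.pop(connection_id, None)

    def notify_other_devices(self, user_id: int, origin_connection_id: str,
                             event: dict) -> int:
        """Send event to every connection of the user except the origin."""
        notified = 0
        for connection_id in sorted(self._connections_by_user.get(user_id, set())):
            if connection_id == origin_connection_id:
                continue
            self._senders[connection_id](event)
            notified += 1
        return notified


desktop_inbox: list = []
phone_inbox: list = []
registry = ConnectionRegistry()
registry.connect(1, "desktop", desktop_inbox.append)
registry.connect(1, "phone", phone_inbox.append)

# The phone opens channel 10, so the desktop should hear about it.
viewed = {"type": "channel_viewed", "channel_id": 10}
notified_count = registry.notify_other_devices(1, "phone", viewed)
```

The registry has to be kept accurate on every connect and disconnect, which is exactly the bookkeeping the lazy per-user approach never needed.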

Oh Nodes

If you think this problem already sounds complex to solve, we have another layer of complexity to add. Valour is built on a node system (for a variety of reasons I will describe in a future post), where every client can connect to multiple nodes at once. To put it simply, this allows for some amazing optimizations and server configurations, and some nifty tricks when scaling. However, when we want to sync the state of a user between devices, things can get messy.

An incredibly simple node connection diagram — one user has three devices, one connected to node A, and two connected to node B

In the image above, it is easy to sync state across the devices on node B. The node simply fires off an event to the user’s SignalR group (SignalR is a system for realtime communication between clients and servers over the web), and every authorized device for that user on that node should receive it.

Node A, however, has no clue what’s going on. There are a few solutions to this, each with benefits and drawbacks. First, nodes could send HTTP requests to notify each other of these events. This is a very direct way to send the message, since it doesn’t hit any nodes other than the target. However, there is an issue.

A simple HTTP request from node B to node A. But there’s a problem.

Well, to be fair, there are multiple. Sending an HTTP request every single time there is a synced event can get expensive very quickly. Not to mention that there needs to be one request per existing client, so the number of requests for each event grows as O(n) with the number of clients.

The real issue is that we have made the assumption that nodes are aware of where other nodes exist, and what clients are connected to them. How does node B even know to send a request to A?

See, to pull this strategy off, we need to store connection states in a globally accessible form. This added layer costs both performance and scalability, and adds another failure point.

The complexity continues to grow.

But there is a solution to this pesky inter-node communication problem that takes a more laid-back approach, and happens to also scale easily. Imagine that we could actually move the HTTP request step inside the same tool we use to track the connections.

Enter Redis

Redis is almost magical in the world of software engineering. If you didn’t know, Redis has a powerful Pub/Sub system built right in that can be used alongside its in-memory caching. If we choose to store connection states in Redis, we can also use it to send events to specific nodes. Nifty, right?

One link to rule them all.

While we still have to double dip and hit Redis twice (once to ask for the nodes a user is a part of, and again to fire off the events that go to those nodes), we have managed to simplify the architecture and improve its efficiency. We could instead send a ‘user’ event on a per-user channel, and have each node subscribe to the channels of all users that are actively connected to it, but there are drawbacks. Foremost, this approach would require many channels, one per active user. And while Redis Pub/Sub is fast, creating potentially millions of channels will quickly cause issues, both speed-wise and concerning architectural limitations.
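The two-step flow described here (look up the user's nodes, then publish to each node's channel) can be sketched without a real Redis server. The `FakeBroker` class below is a toy in-memory stand-in rather than a Redis client, and the key and channel names (`user-nodes:…`, `node:…`) are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class FakeBroker:
    """Toy stand-in for a store offering sets plus publish/subscribe."""

    def __init__(self) -> None:
        self._sets: Dict[str, Set[str]] = defaultdict(set)
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def set_add(self, key: str, member: str) -> None:
        self._sets[key].add(member)

    def set_members(self, key: str) -> Set[str]:
        return set(self._sets.get(key, set()))

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> int:
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


def user_nodes_key(user_id: int) -> str:
    return f"user-nodes:{user_id}"  # hypothetical key layout


def node_channel(node_name: str) -> str:
    return f"node:{node_name}"  # one channel per node, not per user


def sync_user_event(broker: FakeBroker, user_id: int, event: dict) -> int:
    # Hit 1: ask which nodes currently hold a connection for this user.
    nodes = broker.set_members(user_nodes_key(user_id))
    # Hit 2: publish the event on each of those nodes' channels.
    for node_name in sorted(nodes):
        broker.publish(node_channel(node_name), {"user_id": user_id, **event})
    return len(nodes)


broker = FakeBroker()
inboxes: Dict[str, List[dict]] = {"A": [], "B": [], "C": []}
for name, inbox in inboxes.items():
    broker.subscribe(node_channel(name), inbox.append)

# User 7 has devices on nodes A and B, none on C.
broker.set_add(user_nodes_key(7), "A")
broker.set_add(user_nodes_key(7), "B")
nodes_reached = sync_user_event(broker, 7, {"type": "channel_viewed", "channel_id": 10})
```

With per-node channels, the number of subscriptions grows with the number of nodes rather than the number of active users, which is the trade-off weighed against the per-user alternative.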

A Step Forwards

It’s not perfect, and we will likely optimize it. But this is a good start in the direction of a distributed node system for a realtime application, and it’s going to do the best it can to keep Valour humming along.


Kobe Albright
Valour App

I’m a developer and YouTuber tired of the heavy-handed and controlling approach of Silicon Valley.