Developer Blog — Websockets

Published in VRChat · Mar 23, 2019

This blog post was written by our Server/DevOps lead, System.

here is a very boring, very difficult thing that System did to speed up a part of the application that you don’t think about very much without changing the forward-facing appearance of it at all

We made friend requests and invites faster.

The new Beta of VRChat includes a relatively new, very experimental build of a websocket-based communications layer to connect the client and the API servers!

What’s The API Even Do, Anyways?

There are three major parts to VRChat — the client, which handles all of the fun virtual reality stuff, the network layer, which handles the high-speed low-drag communication between users in the virtual realms, and the API, which is responsible for permanent information storage like “who our users are” and “where our users are” and “keeping tabs on a truly spectacular collection of petabytes of a̵v̵a̵t̵a̵r̵s̵ loose conglomerations of dynamic bones and particle animations.”

HTTP Polling And You, A Primer

For most of the history of VRChat, our client has polled our API.

To describe how polling works in layperson’s terms, it’s as if every client in the entire universe were to check for notifications every 20 seconds or so.

  • “Do you have anything for me yet?” “No.”
  • “Do you have anything for me yet?” “No.”
  • “Do you have anything for me yet?” “No.”
  • “Do you have anything for me yet?” “No.”
  • “Do you have anything for me yet?” “No.”
  • “Do you have anything for me yet?” “No.”
  • “Do you have anything for me yet?” “No.”
i crave star damage
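In code, the polling loop is about as simple, and as wasteful, as it sounds. A rough Python sketch (the real client is C#, and the endpoint here is a made-up stand-in):

```python
import time

POLL_INTERVAL = 20  # seconds; roughly the cadence described above

def fetch_notifications():
    """Stand-in for an HTTP GET against a notifications endpoint.
    In the polling world, almost every call comes back empty."""
    return []  # "Do you have anything for me yet?" "No."

def poll_forever():
    while True:
        for notification in fetch_notifications():
            print(notification)  # almost never reached
        time.sleep(POLL_INTERVAL)  # ...and ask again, forever
```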

Unsurprisingly, about 75% of all traffic to our API is needless poll spam — and for that reason, our regularly polled endpoints must be aggressively optimized. Even a tiny slow-down in one of our polled endpoints can cause a cascading failure that leaves the whole application feeling sluggish. All of that load still stresses our load balancers and our connected servers — and, worst of all, the load that polling causes limits how often we can check for updates, which is why our current turn-around time on friend requests and invites is in the range of 20 entire seconds.

You might ask why we built things that way in the first place — and the answer is that we needed it done quickly and there weren’t any easy alternatives.

there has to be a better way

WebSockets

WebSockets start out as ordinary HTTP requests, then upgrade the connection into a persistent, two-way communications channel between client and server.

Instead of polling, the websocket client can just sit back, relax, and wait for a message to come through. “You have a notification!” Nice!
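The difference is easiest to see with a blocking queue standing in for the open socket (a stdlib-only Python sketch; a real client would use an actual websocket library):

```python
import queue
import threading

inbox = queue.Queue()  # stands in for the open websocket connection

def server_push(message):
    """The server side: when something actually happens, push it down the wire."""
    inbox.put(message)

def client_wait():
    """The client side: block until a message arrives. No polling, no spam."""
    return inbox.get()  # sit back, relax, wait

# Simulate the server pushing a notification a moment later.
threading.Timer(0.1, server_push, args=("You have a notification!",)).start()
print(client_wait())  # prints "You have a notification!"
```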

A websocket is much, much more expensive on the server side than a single web request, but far less expensive than constant polling, so it seems like an obvious slam dunk for a problem like this one.

The only problem was that all of our API code in the client was organized around the concept of HTTP access, rather than a pipeline of updates.

So: coding! We built a websocket client into VRChat that manages the websocket connection, as well as all associated HTTP requests, on a series of background threads. During Unity’s update loop, it passes the results back to the main thread as immutable objects, using a concurrent data structure back-ported from a more modern version of C#.
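A rough shape of that hand-off, in Python for illustration (the real code is C#, with the back-ported ConcurrentQueue doing the job of SimpleQueue here):

```python
import queue
import threading

results = queue.SimpleQueue()  # thread-safe hand-off between threads

def background_worker():
    """Runs websocket/HTTP work off the main thread and enqueues
    finished, immutable results (a tuple here, standing in for real data)."""
    payload = ("notification", "friend request from someone")
    results.put(payload)

def main_thread_update():
    """Called once per frame (think Unity's Update): drain whatever the
    background threads have finished, without ever blocking the frame."""
    drained = []
    while True:
        try:
            drained.append(results.get_nowait())
        except queue.Empty:
            return drained

t = threading.Thread(target=background_worker)
t.start()
t.join()
print(main_thread_update())
```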

The API’s websocket layer is also a nice outpost for those of us on the API team to exert a little bit of control over the madness that is Client code.

stay outta client you mooks

Updated Load Balancing Stratz

Our servers are a very large cluster of single-CPU, low-RAM virtual machines. For this reason, our load balancing strategy for every single incoming request has long been either “random” (picking a server out of a hat) or “least_conn” (picking whichever server looks like it needs something to do) — strategies that are very good at spreading the load out amongst a group of different servers. The load balancer also helps with self-healing, as servers that don’t respond promptly are kicked out of the rotation until they can pick themselves up and dust themselves off a bit.

This doesn’t work with a socket connection, though — developing and maintaining a socket relationship with a server requires that the load balancer pass us through to the same server every single time. We need the “ip_hash” load balancing strategy, where a computer with a fixed IP gets the same server every time.

We don’t, however, want to give up the benefits of our random load balancing most of the time. We want both — original recipe load balancing for *most* requests, and IP-based load balancing for websocket connections.

The solution? Two different endpoints, one for API connections, one for websocket connections. They both load balance to the same set of servers, but with different load balancing strategies. Problem solved!
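For illustration, the three strategies can be sketched as selection functions (hypothetical server names; in reality this logic lives in the load balancer’s configuration, not in application code):

```python
import hashlib
import random

SERVERS = ["api-1", "api-2", "api-3", "api-4"]

def pick_random(servers=SERVERS):
    """'random': pull a server out of a hat."""
    return random.choice(servers)

def pick_least_conn(conn_counts):
    """'least_conn': whichever server looks like it needs something to do."""
    return min(conn_counts, key=conn_counts.get)

def pick_ip_hash(client_ip, servers=SERVERS):
    """'ip_hash': the same client IP always lands on the same server,
    which is exactly what a long-lived websocket connection needs."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

The key property is that `pick_ip_hash` is deterministic per client, while the other two deliberately are not.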

Well, there’s still one problem, which is “how do we handle cut-over during deploys” — but that sounds like a problem for future us. Maybe it’ll just work? Maybe we’ll have to go back to running API deploys at 2AM on Tuesdays until we tune the reconnection logic to cope with a server vanishing all of a sudden! Who knows!
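If that reconnection logic does need tuning, one plausible shape for it is exponential backoff with jitter (a sketch under assumptions, not the shipped code):

```python
import random

def backoff_delays(base=1.0, cap=30.0, attempts=6):
    """Delays between reconnect attempts: double each time, cap the maximum,
    and add jitter so a whole fleet of clients doesn't hammer the freshly
    deployed servers in lockstep."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(delay + random.uniform(0, delay * 0.5))
        delay = min(delay * 2, cap)
    return delays
```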

It’s Much, Much Faster.

We plan to use this new socket layer to power a lot of improved interactions with the API — but it’s a big change to a very fundamental layer of our application, so we wanted to start with something small and relatively simple.

You guessed it — friend requests and invites. It’s a small system, but high impact — our notifications system actually generates more load than worlds, avatars and users combined.

And our updated websocket code has now made it into open beta — it’s launching soon. We’re very excited.

What’s Going To Change?

but you didn’t do anything

Nothing! Nothing will change at all! Notifications should feel faster, and nothing else about the application should seem to have changed at all!

What’s Next For The Websocket Team?

down with this sort of thing

So many parts of our application would benefit from a thorough saucing with the new socket love, but one of our biggest frustrations in the system right now is Friends.

So, no one told us Friends was gonna be this way 👏👏👏👏

The system has to poll for each and every friend you have. Honestly, that’s such a bad idea that we’d have tackled it first, if it weren’t for the fact that friends are significantly more complex than friend requests.

Thanks for reading! We’re trying our hand at some more technically-minded posts like this to give our Community a look behind the curtain. If you enjoyed it and want more, please let us know on Twitter!
