VRChat Downtime Update

VRChat
Apr 17, 2019

Over the past week or so, we have encountered several infrastructure problems with more than one underlying cause. Although we are not completely out of the woods just yet, we believe it prudent to keep our Community in the know about what has been happening and what we’ve been doing to address the issues.

Problem 1: DDoS Attacks

For approximately two weeks, VRChat’s real-time networking partner has been experiencing intermittent DDoS attacks. These attacks appear to be targeted specifically at VRChat, and have been timed to coincide with our historical daily and weekly concurrent user peaks. If you were visiting VRChat this weekend, you may have experienced a room “lagging” to the point where all other users in the room froze, followed by disconnection and reloading to Home. That behavior indicates an attack.

We have mitigations in place and have been working closely with our provider to increase our protection. Our recent improvements have reduced the impact of these attacks. Although we can mitigate them fairly well right now, we continue to look into other ways to fortify the defenses around VRChat services.

As an aside, although DDoS attacks have increased in intensity over these past two weeks, VRChat services are essentially under a low-level constant attack at all times. Our mitigations prevent the majority of these attacks from impacting our service, but some occasionally break through.

Problem 2: Cloudflare

VRChat, like many other web services, uses Cloudflare to help mitigate several types of attacks. Cloudflare in particular provides DDoS mitigation and security services. It analyzes traffic to determine whether behavior is malicious, and can then “blacklist” IP addresses to prevent a bad actor from damaging the network as a whole.

VRChat recently implemented a new communication method for our API using WebSocket technology. Although this technology permits fast response times when it comes to things like Notifications, our implementation also resulted in a potential failure state that could flood servers with reconnection efforts.

Over the past week, we experienced such a failure case. Our WebSocket client was misconfigured to attempt to reconnect too often. When we rebooted some of our servers, we experienced a “thundering herd” problem where our clients would repeatedly try (and fail) to reconnect their sockets. Cloudflare misinterpreted this as a DDoS attack and started both null-routing our WebSocket traffic and flagging our customers as “suspicious agents”. When this happened, these users could no longer log into VRChat, and they would encounter ReCAPTCHA checks across Cloudflare-protected services.

We recognized the problem and the cause fairly quickly, and implemented a patch that would prevent the faulty reconnection behavior. We then contacted Cloudflare and informed them of the situation. They investigated the addresses flagged as a result of this error, and cleared those that were impacted.
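To illustrate the kind of reconnection behavior that avoids a thundering herd, here is a minimal Python sketch of exponential backoff with random jitter. This is a common general technique, not a description of the actual VRChat patch, which this post does not detail. The function names (`reconnect_delays`, `connect_with_backoff`) and the default values for `base`, `cap`, and `attempts` are hypothetical, chosen only for the example.

```python
import random
import time


def reconnect_delays(base=1.0, cap=60.0, attempts=8, rng=None):
    """Yield one wait time in seconds per reconnect attempt ("full jitter").

    The upper bound doubles with each attempt, up to `cap`, and the actual
    wait is drawn uniformly between 0 and that bound, so clients that were
    disconnected at the same moment do not all retry at the same moment.
    All defaults here are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


def connect_with_backoff(connect, sleep=time.sleep, rng=None, **limits):
    """Wait a jittered delay before each attempt, then call `connect()`.

    `connect` is a hypothetical caller-supplied function that returns a
    connection or raises ConnectionError. After the last attempt fails,
    the client gives up instead of retrying forever.
    """
    for delay in reconnect_delays(rng=rng, **limits):
        sleep(delay)
        try:
            return connect()
        except ConnectionError:
            continue
    raise ConnectionError("gave up after repeated reconnect failures")
```

With a scheme like this, a server reboot produces reconnect attempts that are spread out over time and become less frequent the longer the outage lasts, which looks much less like an attack to an upstream traffic filter.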

Although we believe the majority of this problem is behind us, we are receiving some reports that users are still seeing ReCAPTCHA checks across Cloudflare-protected websites. If you are still affected, please contact our Support team by creating a ticket that contains your public-facing IP address. You can find your public-facing IP address by using this DuckDuckGo query.

We continue to work with Cloudflare to help mitigate this issue and similar issues in the future. As it stands, you should no longer be falsely flagged while playing VRChat as long as you are running the latest version.

We realize that this issue has negatively affected a large number of users across various services, and we wish to apologize. We are actively working to solve this particular issue, and are implementing processes to detect such issues both before deployment and during active usage of VRChat.

Conclusion

Because these two issues occurred at the same time, each was harder to diagnose, which increased the time-to-solution for both. Although we believe we are through the bulk of the issues now, we may still experience some continued problems as we finish resolving them.

VRChat is still growing. Although we expect “growing pains” such as these to occur at times, our goal is to provide the highest quality of service we can.

VRChat services have had a bit of a rough time during the previous week. We thank our Community for their immense patience during this time, as well as the assistance many users provided by informing us of the issues they were having and providing us diagnostic information to track down problems.
