Alpha Live Streaming Issue Update
Hi everyone,
We’ve been having technical issues with the Alpha site over the last couple of weeks. We’ve been working hard to resolve them, and we want to give you an update on where everything stands.
We have put in some additional fixes for today’s broadcast of Critical Role (February 9, 2017), but there is a good chance we may encounter another streaming issue tonight. We ask that you come to Alpha and watch Critical Role there, because one of two things will happen:
- The stream will go flawlessly and we win!
- The stream will become inaccessible again, but we’ll get the necessary information to fix the problem. We know that tech issues aren’t fun for anyone, but we’ll be able to better diagnose the issue if we have as many people on Alpha as possible.
We appreciate your patience, and as a thank-you, we are giving everyone who has an active Alpha account 30 days free at the end of February. Please keep an eye on the Alpha newsletters in the coming weeks for further information about the comped month.
Thanks for the support — as always, if you have any questions or need help, please drop us a line at support@projectalpha.com.
PS — For those of you that are interested in the development of our site, we have also included a more technical description below from our tech team.
— — — — — — — — — — — — — —
Bidet!
Our software engineering team is constantly working to improve the experience for our users. Over the past few weeks, in rolling out services and features that should have improved performance for Alpha users, we’ve had the opposite effect.
Alpha runs on a custom software stack in Amazon Web Services, utilizing ReactJS, Node.js, Redis, and Couchbase, along with a few third-party services that help with our video delivery. A few weeks back we began implementing more caching to make site load times quicker as our active user count has grown. When we first rolled these changes out, we addressed a few sizing issues in the underlying service (AWS ElastiCache) after observing real-world usage. Things seemed okay until last Tuesday, when we started getting flooded with alerts of service degradation.
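For those who want a concrete picture of what “more caching” looks like, here is a simplified cache-aside sketch in TypeScript using ioredis against a Redis/ElastiCache endpoint. It’s illustrative only; the helper name `cached` and the example key are made up for this post, and this is not our production code.

```typescript
// Hypothetical cache-aside helper, assuming ioredis; not Alpha's actual code.
import Redis from "ioredis";

const redis = new Redis(); // connects to localhost:6379 by default

// Fetch a value through the cache: return the cached copy if present,
// otherwise load it from the origin and store it with a TTL.
async function cached<T>(
  key: string,
  ttlSeconds: number,
  loadFromOrigin: () => Promise<T>
): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) {
    return JSON.parse(hit) as T; // cache hit: skip the origin entirely
  }
  const value = await loadFromOrigin(); // cache miss: go to the source of truth
  await redis.set(key, JSON.stringify(value), "EX", ttlSeconds);
  return value;
}

// Usage sketch: cache a show schedule for five minutes.
// const schedule = await cached("schedule:critical-role", 300, () => fetchScheduleFromDb());
```

The idea is simply to serve repeat requests from memory instead of hitting the database every time, which is why getting the cache layer sized and behaving correctly matters so much for load times.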
We employ a service-oriented architecture, which breaks our service delivery down into several discrete services. When the issues began, they showed up in one of our client-services-layer apps. Many of you may have noticed that during outages some functionality, like chat, still appeared to work while other features didn’t work at all. Based on initial logging, we assessed that one of our third-party partners was causing a core service to die. We spent a few days on calls, collecting multiple tcpdump captures from production servers while trying to recreate the issue.
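To illustrate why a single partner shouldn’t be able to take a core service down with it, here’s a small TypeScript sketch of a timeout guard around a third-party call. The names (`withTimeout`, `PartnerTimeoutError`, `callPartner`, `fallbackPlaylist`) are hypothetical and not part of our actual codebase; this is a sketch of the pattern, not our fix.

```typescript
// Illustrative only: fail fast when a partner API hangs, instead of letting
// the hung call tie up one of our own core services.
class PartnerTimeoutError extends Error {}

async function withTimeout<T>(call: Promise<T>, ms: number): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new PartnerTimeoutError(`partner call timed out after ${ms}ms`)),
      ms
    );
  });
  try {
    // Whichever settles first wins: the real call or the timeout.
    return await Promise.race([call, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage sketch: degrade gracefully rather than letting the whole request die.
// const playlist = await withTimeout(callPartner(streamId), 2000)
//   .catch(() => fallbackPlaylist(streamId));
```

Guards like this are one reason parts of the site (such as chat) can keep working while another piece is failing: each discrete service can fall back or fail independently.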
While working through the issue, we also integrated some more in-depth application performance monitoring to grab as much data as we could if the event recurred. On Tuesday (2/7) during Talks Machina we observed the issue again, but this time we were able to capture stack traces that tied the problem back to the caching we had previously implemented. We’ve been working hard since Tuesday night to get as much of it fixed as possible for Critical Role on Thursday.
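To give a rough sense of the kind of extra instrumentation we mean, here’s a minimal sketch of capturing stack traces in a Node/Express service. It assumes an Express app and plain console logging; the actual APM product we integrated is not shown, and this snippet stands in for it only as an example of where traces get collected.

```typescript
// Minimal sketch of stack-trace capture in an Express app; illustrative only.
import express from "express";

const app = express();

// Error-handling middleware (four arguments) logs the full stack trace for
// any unhandled route error before sending a generic response.
app.use(
  (err: Error, _req: express.Request, res: express.Response, _next: express.NextFunction) => {
    console.error("route error:", err.stack);
    res.status(500).json({ error: "internal error" });
  }
);

// Catch failures that escape the request cycle entirely.
process.on("unhandledRejection", (reason) => {
  console.error("unhandled rejection:", reason);
});
process.on("uncaughtException", (err) => {
  console.error("uncaught exception:", err.stack);
});

app.listen(3000);
```

Having traces like these attached to each incident is what let us tie Tuesday’s failure back to the caching changes rather than continuing to chase the third-party theory.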
We’ll continue working on these issues until things are functioning correctly again. We’re more than happy to answer any other questions if you want to geek out more with us.
Alpha Tech Team