Incident Summary: 2017-03-16
Square Engineering

Thanks for sharing this. Can you explain why the Multipass reboot didn’t fix the problem?

As I understand it, a bad Roster push triggered an existing bug in Multipass, which in turn overwhelmed Redis. Shouldn’t a Multipass reboot have given Redis time to recover, thus working around the Multipass bug?

