CrowdStrike Falcon Incident Brings VMs DOWN! Is the Cloud Really Reliable?
Reflections on an incident in which a critical third-party component, CrowdStrike's Falcon agent, crashed Microsoft Windows VMs and brought down applications and operations.
Update: Check the end of this article for options to fix the issue on Microsoft Azure VMs and Amazon EC2 instances.
Friday, July 19th, 2024. It's 7:43 am ET, and my CIO calls me: "Have you seen the news? There is a major IT outage. Is it affecting us?"
Within minutes, I'm on a call with my team and peers, investigating the incident and assessing the scope of its impact on our operations.
My heart rate slows as, system by system and VM by VM, each resource involved is confirmed to be unaffected.
Even though we are not affected, I can't help recalling the Microsoft Azure AD outage of almost four years ago, so I decided to revisit some thoughts I shared in this post: