Global IT outage — BSOD and CrowdStrike
Last week we witnessed one of the largest IT outages on record. It hit the travel, banking, business, and health sectors worldwide and showed up as the BSOD (Blue Screen of Death) on Windows machines.
The (in)famous Blue Screen of Death (BSOD, also known as a fatal error or bug check) indicates that the system has reached a critical condition from which it cannot continue to operate normally and that it requires troubleshooting. Possible causes include hardware failures or the unexpected termination of a crucial process or thread.
The cause
System, program and application updates are a routine part of information security. They are continuously created, tested and pushed to endpoints. The massive IT outage was caused by one of them, CrowdStrike’s channel file update. The faulty update was distributed through the cloud during the night, causing Windows machines with the Falcon sensor installed to crash with a BSOD. That marked the starting point of thousands of machines going down and critical systems crashing.
Policy updates are triggered from the centralized console and affect the sensor’s version and its prevention and detection capabilities. The channel file update is different: it is part of Falcon’s behavioral protection mechanisms and influences the sensor’s logic. Channel configuration files are pushed to sensors frequently to stay ahead of newly discovered TTPs (Tactics, Techniques and Procedures).
In this case, the channel file C-00000291*.sys contained faulty logic and disrupted the systems. It affected Windows machines with the Falcon sensor that were online on Friday, July 19, 2024, between 04:09 UTC and 05:27 UTC. Systems that were offline during that window were not impacted, because CrowdStrike reverted the change by pulling the file from the cloud. Linux and macOS were not impacted.
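The time window above lends itself to a simple interval check. The following is a minimal sketch, not a CrowdStrike tool: the function name `overlaps_impact_window` and the idea of feeding it a host's online interval (for example from your own inventory or logs) are assumptions made for illustration. It only tests the time condition; whether the host runs Windows with the Falcon sensor has to be checked separately.

```python
from datetime import datetime, timezone

# Window stated in the text: Friday, July 19, 2024, 04:09 UTC to 05:27 UTC.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def overlaps_impact_window(online_from, online_until):
    """Return True if the host's online interval intersects the window.

    Both arguments must be timezone-aware datetimes (any time zone).
    The window boundaries are treated as inclusive, which is an
    assumption; the text does not say how the endpoints are handled.
    """
    if online_until < online_from:
        raise ValueError("online_until must not be earlier than online_from")
    # Two closed intervals overlap when each starts before the other ends.
    return online_from <= WINDOW_END and online_until >= WINDOW_START
```

A host that was online from 03:00 UTC to 04:30 UTC on July 19 overlaps the window, while one that first came online at 05:28 UTC does not.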
The fix
Identifying impacted hosts
- Windows hosts showing BSOD
- New granular status dashboard in the CrowdStrike console
Remediation
CrowdStrike provided a workaround: boot the affected machine into Safe Mode and remove the channel file matching C-00000291*.sys from C:\Windows\System32\drivers\CrowdStrike.
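The file-removal step can be sketched in a few lines of Python. This is an illustrative sketch only, not CrowdStrike's official remediation script: the helper names `find_channel_files` and `remove_channel_files` and the `dry_run` flag are hypothetical, and the sketch assumes a Python interpreter is available on the machine, which is often not the case in Safe Mode. The file pattern and directory come from the text above.

```python
from pathlib import Path

# Pattern of the faulty channel file, as named in the text.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def find_channel_files(directory):
    """Return the files in `directory` that match the channel file pattern."""
    return sorted(
        p for p in Path(directory).glob(CHANNEL_FILE_PATTERN) if p.is_file()
    )


def remove_channel_files(directory, dry_run=True):
    """Delete matching channel files and return the list of matches.

    With dry_run=True (the default) nothing is deleted, so the matches
    can be reviewed before any change is made.
    """
    matches = find_channel_files(directory)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

On an affected host the directory argument would be `r"C:\Windows\System32\drivers\CrowdStrike"`. Defaulting to a dry run is a deliberate choice here: deleting files from a driver directory is hard to undo, so listing the matches first is the safer habit.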
Although it was one of the most chaotic Fridays (and weekends) in the history of IT, most companies managed to recover their critical systems within the day. Full resolution is expected to take several more weeks and possibly months. Many questions have been raised, and many are demanding answers about the process itself, change management, proper testing, and the measures the company will take for future changes and updates. The catastrophic event has shown how fragile and vulnerable IT systems ultimately are, and many are wondering how to prevent anything similar from happening again.
CrowdStrike
CrowdStrike is one of the leading EDR/MDR/XDR solutions on the market, with thousands of clients, and is aimed mostly at organizations. Its lightweight agent provides detection, prevention, and remediation capabilities while constantly monitoring end machines and gathering data from them for further analysis in the cloud. The platform includes Endpoint Security, Cloud Security (CNAPP), Threat Intelligence and Hunting, Next-Gen SIEM, Workflow Automation, Exposure Management and Identity Protection.
More information on Windows crashes related to Falcon sensor: