
CrowdStrike Falcon Sensor Update Triggers Global BSOD Crisis

On July 19, 2024, a seemingly routine software update from cybersecurity firm CrowdStrike unleashed a cascade of disruptions across multiple industries worldwide. The update to CrowdStrike’s Falcon Sensor, intended to enhance security for mission-critical systems, instead caused Windows-based systems to crash with Blue Screens of Death (BSODs). The incident began in Australia and quickly spread globally, severely affecting sectors such as airlines, emergency services, financial institutions, and even news media.

Sky News, a British broadcaster, was unable to broadcast properly on the morning of the outage due to the CrowdStrike incident

Sequence of Events

The first reports of BSODs emerged from Australia, where systems in TV networks, emergency call centers, and financial institutions began crashing. As the start of the business day moved westward across time zones, similar reports surfaced from other regions, including India, South Africa, Thailand, and several European countries. Preparations for the Paris Olympics and numerous airlines, including American Airlines, United, Delta, and Frontier, faced significant operational challenges due to the widespread system failures.

In a thread within the official CrowdStrike subreddit, the moderators posted a statement detailing a manual workaround. The suggested steps involved booting affected systems into Safe Mode or the Windows Recovery Environment, navigating to a specific directory, and deleting a .sys file before rebooting. This labor-intensive solution, requiring hands-on access to each machine, exacerbated the disruption because it could not be deployed through a network push.

At 5:45 am Eastern time, CrowdStrike CEO George Kurtz addressed the issue on social media, confirming that the problem stemmed from a defect in a single content update for Windows hosts. He reassured customers that the issue had been identified and isolated, and that a fix had been deployed. Kurtz emphasized that this was not a security incident or cyberattack, but rather a technical defect in the update.


Widespread Impact

The repercussions of the faulty update have been extensive and multifaceted. Airlines have been among the hardest hit, with numerous flights grounded or delayed due to system failures. United Airlines, Delta, American Airlines, and Frontier experienced significant disruptions, with passengers facing long delays and cancellations. The aviation sector’s reliance on interconnected IT systems meant that the outage had a profound ripple effect, causing logistical chaos and operational bottlenecks.

Emergency services also reported major issues. In Alaska, 911 and non-emergency lines experienced outages, while similar problems were reported across other states and countries. Airports in major cities such as Amsterdam, Berlin, London, and Paris saw delays and long queues as check-in systems malfunctioned. Financial institutions in multiple countries faced operational disruptions as computers crashed, affecting banking services and financial transactions.

A view of the various blue screens of death at the Amsterdam Airport, via X/Twitter

Adding to the complexity, Microsoft experienced concurrent outages. Multiple Azure services went down due to a backend cluster management workflow issue, which blocked access between storage clusters and compute resources. This overlap in outages led to confusion regarding the root cause, with some attributing disruptions to Microsoft’s services and others to the CrowdStrike update.

Microsoft issued an advisory on the BSOD issue affecting virtual machines running Windows, suggesting multiple reboots and manual deletions of the problematic file. This highlighted the intertwined nature of modern IT infrastructures, where issues in one system can have far-reaching consequences across various services.


Analysis: Overreliance on a Single Vendor

The CrowdStrike incident brings into the public eye a significant vulnerability in modern IT practices: the overreliance on a single vendor for critical security updates. This dependency can lead to catastrophic outcomes when a failure occurs, as demonstrated by the widespread disruptions following the faulty update.

Key Issues Identified:

  1. Single Point of Failure: This incident demonstrated how a single update from one vendor can cascade into a global IT crisis. Many organizations, reliant on CrowdStrike for their security needs, were left vulnerable when the update caused system crashes. This single point of failure disrupted operations across diverse sectors, from aviation to emergency services.
  2. Lack of Redundancy and Diversification: Organizations affected by the outage lacked alternative solutions or redundant systems to mitigate the impact. The absence of diversified security measures meant that when CrowdStrike’s update failed, there were no immediate fallback options, leading to prolonged downtime and operational chaos.
  3. Complexity of Manual Interventions: The suggested manual workaround to fix the issue highlighted the challenges of relying on centralized updates. The labor-intensive process of booting systems into Safe Mode and manually deleting files was impractical at scale, especially for large organizations with thousands of affected machines.
  4. Dependency on Interconnected Systems: The concurrent outages at Microsoft illustrated the risks of interconnected IT ecosystems. The reliance on multiple vendors’ systems created a scenario where failures in one could amplify the impact of failures in another, complicating recovery efforts and prolonging disruptions.

How Does the Faulty Falcon Sensor Driver Cause a BSOD?

CrowdStrike Falcon requires installing a lightweight agent called “Falcon Sensor,” which includes services and, crucially, drivers that run in kernel mode to monitor system activity at a low level, a common practice among security software. When a regular application crashes, it can simply be reopened because it operates in user mode. Because Falcon Sensor operates in kernel mode, however, a fault in its code can trigger a kernel panic, resulting in the dreaded Blue Screen of Death (BSOD) on Windows. In this case, the faulty file, matching the pattern “C-00000291*.sys,” caused a kernel panic through an invalid memory read at address 0x9c, as indicated by the stack trace. Because device drivers load during boot, the crash recurs on every startup, forcing Windows into recovery mode. The manual fix is to boot into Safe Mode and delete all files starting with “C-00000291” from the C:\Windows\System32\drivers\CrowdStrike directory. While some systems may recover once CrowdStrike’s corrected update is received, many will require this manual intervention via Safe Mode.


How Does One Fix a BSOD Caused by the Update?

To fix the Blue Screen of Death (BSOD) and the “Recovery” loop caused by CrowdStrike, you can follow several methods.

Method 1

The first method involves using Safe Mode to delete the faulty file. Boot your computer into Safe Mode by selecting “See advanced repair options” on the Recovery screen, then navigating through “Troubleshoot” > “Advanced options” > “Startup Settings” and restarting your PC. After it restarts, press 4 or F4 to enter Safe Mode. Alternatively, you can press F8 repeatedly during startup to access Safe Mode. Once in Safe Mode, open Command Prompt (Admin) and navigate to the CrowdStrike directory by typing cd C:\Windows\System32\drivers\CrowdStrike. Use the command dir C-00000291*.sys to locate the faulty file and then delete it using del C-00000291*.sys.
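For convenience, the commands above can be collected into a short batch sequence. This is only a sketch of the published workaround: it must be run from an elevated Command Prompt while in Safe Mode, and the path assumes a default Windows installation on the C: drive.

```bat
rem Run from an elevated Command Prompt in Safe Mode.
cd C:\Windows\System32\drivers\CrowdStrike

rem List the faulty files to confirm they are present.
dir C-00000291*.sys

rem Delete every matching file, then reboot normally.
del C-00000291*.sys
```

After the deletion, a normal reboot should let Windows start without loading the faulty file.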

Method 2

Another method involves renaming the CrowdStrike folder. Boot into Safe Mode as described above, open Command Prompt, and navigate to the drivers directory using cd \windows\system32\drivers. Rename the CrowdStrike folder by typing ren CrowdStrike CrowdStrike_old. This allows the system to bypass the faulty driver during startup.
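The rename in Method 2 amounts to just two commands, again run from an elevated Command Prompt in Safe Mode (a sketch of the workaround described above):

```bat
rem Renaming the folder prevents the faulty driver from loading at startup.
cd \windows\system32\drivers
ren CrowdStrike CrowdStrike_old
```

Note that this sidelines the entire sensor, not just the faulty file, so it should be treated as a temporary measure until a corrected update can be applied.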

Method 3

A third method requires using the Registry Editor to block the CSAgent service. Boot into Safe Mode and open the Registry Editor by pressing Win+R, typing regedit, and pressing Enter. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CSAgent. Find the Start entry, double-click it, and change its value from 1 to 4, which disables the service. Save the changes, close the Registry Editor, and restart your computer. These steps should resolve the BSOD and recovery loop, allowing your system to boot normally.
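The same registry change can also be made from an elevated Command Prompt in Safe Mode using Windows’ built-in reg utility instead of the Registry Editor UI (a sketch; confirm the key path on your system before applying it):

```bat
rem Set the CSAgent service's Start value to 4 (Disabled) so it does not load at boot.
reg add "HKLM\SYSTEM\CurrentControlSet\Services\CSAgent" /v Start /t REG_DWORD /d 4 /f
```

As with Method 2, this disables the sensor entirely and should be reverted (Start set back to 1) once a fixed update is available.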


Conclusion

The CrowdStrike incident serves as a critical lesson in the importance of diversification and redundancy in IT security practices. Organizations must re-evaluate their reliance on single vendors and implement comprehensive strategies to mitigate risks, ensuring resilience in the face of unforeseen disruptions.


How Can Netizen Help?

Netizen ensures that security gets built in, not bolted on. We provide advanced solutions to protect critical IT infrastructure, such as our popular “CISO-as-a-Service,” wherein companies can leverage the expertise of executive-level cybersecurity professionals without bearing the cost of employing them full time.

We also offer compliance support, vulnerability assessments, penetration testing, and more security-related services for businesses of any size and type. 

Additionally, Netizen offers an automated and affordable assessment tool that continuously scans systems, websites, applications, and networks to uncover issues. Vulnerability data is then securely analyzed and presented through an easy-to-interpret dashboard to yield actionable risk and compliance information for audiences ranging from IT professionals to executive managers.

Netizen is an ISO 27001:2013 (Information Security Management), ISO 9001:2015, and CMMI V2.0 Level 3 certified company. We are a proud Service-Disabled Veteran-Owned Small Business that is recognized by the U.S. Department of Labor for hiring and retention of military veterans. 

Questions or concerns? Feel free to reach out to us any time –

https://www.netizen.net/contact


Copyright © Netizen Corporation. All Rights Reserved.