Analyzing the CrowdStrike Outage: A Technical Deep Dive and Windows Security Best Practices

Windows, a platform widely used by major businesses, prioritizes security and availability, particularly for high-availability deployments. To address these requirements, Windows offers a range of operating modes along with integrated security monitoring and detection capabilities.

This post examines the recent CrowdStrike outage, offering a technical overview of the root cause. We’ll also investigate the reasons behind security products’ use of kernel-mode drivers, along with the safety measures Windows provides for third-party applications. Furthermore, we’ll explore how customers and security vendors can better leverage the built-in security features of Windows to improve security and reliability. Lastly, we’ll shed light on Windows’ plans to enhance extensibility for future security products.

CrowdStrike recently released a preliminary post-incident review that analyzed their outage. According to their blog post, the root cause was a memory safety issue, specifically a read-out-of-bounds access violation within the CSagent driver. We used the Microsoft WinDbg kernel debugger and several freely available extensions to conduct our analysis.

Customers with crash dumps can reproduce our steps using these tools. Microsoft’s analysis of the Windows Error Reporting (WER) kernel crash dumps associated with the incident shows the following crash pattern, observed globally:

FAULTING_THREAD: ffffe402fe868040
READ_ADDRESS: ffff840500000074 Paged pool
MM_INTERNAL_CODE: 2
IMAGE_NAME: csagent.sys
MODULE_NAME: csagent
FAULTING_MODULE: fffff80671430000 csagent
PROCESS_NAME: System
TRAP_FRAME: ffff94058305ec20 -- (.trap 0xffff94058305ec20)

Digging deeper into the crash dump, we can reconstruct the stack frame at the time of the access violation to understand its origin more clearly. WER data provides only a compressed version of the state, so we cannot disassemble backwards to see a larger set of instructions before the crash. Even so, the available disassembly shows a check for NULL before the read from the address held in the R8 register:

6: kd> .trap 0xffff94058305ec20
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffff94058305f200 rbx=0000000000000000 rcx=0000000000000003
rdi=0000000000000000
rip=fffff806715114ed rsp=ffff94058305edb0
r11=0000000000000014 r12=0000000000000000
6: kd> !pte ffff840500000074
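The READ_ADDRESS in the crash, ffff840500000074, is not zero. A NULL check on it therefore passes, yet the read still faults. Whether that address is readable depends on the page-table entries that `!pte` walks. The minimal Python sketch below splits the address into the table indices used at each level of that walk. It assumes standard x64 4-level paging with 4 KB pages and 48-bit virtual addresses, and the function names are illustrative.

```python
# Sketch: decompose an x64 virtual address the way a 4-level page-table walk does.
# Assumes 4 KB pages and 48-bit virtual addresses (no 5-level paging).
PAGE_OFFSET_MASK = 0xFFF   # low 12 bits: offset within a 4 KB page
INDEX_MASK = 0x1FF         # each table level is indexed by 9 bits


def is_canonical(va: int) -> bool:
    """Bits 63..47 must all equal bit 47 for a 48-bit canonical address."""
    upper = va >> 47
    return upper == 0 or upper == (1 << 17) - 1


def passes_null_check(va: int) -> bool:
    """A NULL check only rejects address 0; it says nothing about whether va is mapped."""
    return va != 0


def decompose(va: int) -> dict:
    return {
        "pml4": (va >> 39) & INDEX_MASK,
        "pdpt": (va >> 30) & INDEX_MASK,
        "pd": (va >> 21) & INDEX_MASK,
        "pt": (va >> 12) & INDEX_MASK,
        "offset": va & PAGE_OFFSET_MASK,
    }


read_address = 0xFFFF840500000074  # READ_ADDRESS from the crash output above
print(passes_null_check(read_address), is_canonical(read_address))
print({level: hex(index) for level, index in decompose(read_address).items()})
```

An address can be non-NULL and canonical and still be unmapped or point outside the buffer the code intended to read. Only validating how the address was computed, such as a bounds check on an index, catches that class of error.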

Our analysis confirms CrowdStrike’s assessment of a read-out-of-bounds memory safety error within the CrowdStrike-developed CSagent.sys driver. The csagent.sys module is registered as a file system filter driver. Anti-malware agents commonly use this type of driver to receive notifications about file operations such as file creation or modification. Security products frequently use it to scan any new file saved to disk, for example a file downloaded through the browser. File system filters can also serve as a signal for security solutions that monitor the behavior of the system. CrowdStrike noted in their blog that part of their content update changed the sensor’s logic relating to data around named pipe creation. The file system filter driver API lets a driver receive a callback when named pipe activity (e.g., named pipe creation) occurs on the system, which could enable the detection of malicious behavior. The general function of the driver is consistent with the information shared by CrowdStrike.
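The callback pattern described above can be modeled in user mode. The sketch below is not WDK code. It is a hypothetical Python model in which a filter registers a pre-operation callback for each major function code.

`IRP_MJ_CREATE_NAMED_PIPE` is the real major function code a file system minifilter registers for to observe named pipe creation. The rest is illustrative:

- The `FilterManagerModel` class is hypothetical.
- The `ALLOW`/`BLOCK` verdicts stand in for a pre-operation callback either passing the request on or completing it with an error status.
- The flagged pipe name is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Real minifilter major function codes, used here only as dictionary keys.
IRP_MJ_CREATE = "IRP_MJ_CREATE"
IRP_MJ_CREATE_NAMED_PIPE = "IRP_MJ_CREATE_NAMED_PIPE"

ALLOW = "allow"  # stands in for letting the request continue
BLOCK = "block"  # stands in for completing the request with an error status


@dataclass(frozen=True)
class Operation:
    major_function: str
    name: str     # file or pipe name
    process: str  # requesting process image name


PreOpCallback = Callable[[Operation], str]


class FilterManagerModel:
    """Hypothetical model: dispatches each operation to callbacks registered for its major function."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[PreOpCallback]] = {}

    def register(self, major_function: str, callback: PreOpCallback) -> None:
        self._callbacks.setdefault(major_function, []).append(callback)

    def dispatch(self, op: Operation) -> str:
        for callback in self._callbacks.get(op.major_function, []):
            if callback(op) == BLOCK:
                return BLOCK  # first blocking verdict wins
        return ALLOW


# Hypothetical detection rule: pipe names an analyst has flagged as malicious.
FLAGGED_PIPE_NAMES = {"example_malicious_pipe"}
observed_pipes: List[str] = []


def on_named_pipe_create(op: Operation) -> str:
    observed_pipes.append(op.name)  # telemetry signal for behavior monitoring
    return BLOCK if op.name in FLAGGED_PIPE_NAMES else ALLOW


fm = FilterManagerModel()
fm.register(IRP_MJ_CREATE_NAMED_PIPE, on_named_pipe_create)
```

In a real driver, this callback runs in kernel mode. A bug in how it interprets detection content therefore affects the whole system rather than a single process.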

6: kd> !reg querykey \REGISTRY\MACHINE\system\ControlSet001\services\csagent

We can see that the control channel file version 291 specified in the CrowdStrike analysis is also present in the crash, indicating that the file was read. Determining how the file itself correlates to the access violation observed in the crash dump would require additional debugging of the driver using these tools, which is outside the scope of this blog post.

As shown above, CrowdStrike loads four driver modules. According to the timeline in CrowdStrike’s preliminary post-incident review, one of those modules frequently receives dynamic control and content updates. We can use the unique stack and attributes of this crash to identify the Windows crash reports generated by this specific CrowdStrike programming error.
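As a rough illustration of signature-based bucketing, the Python sketch below parses `KEY: value` fields from `!analyze`-style output like the excerpt shown earlier. It then counts the reports whose attributes match. The signature fields and bucket names are assumptions chosen from that excerpt. A production pipeline would also compare the faulting stack, which this sketch omits.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# One "KEY: value" pair per line, e.g. "IMAGE_NAME: csagent.sys".
FIELD_RE = re.compile(r"^([A-Z0-9_]+):\s*(.*?)\s*$")

# Hypothetical crash signature built from attributes in the excerpt above.
CSAGENT_SIGNATURE = {
    "MODULE_NAME": "csagent",
    "IMAGE_NAME": "csagent.sys",
    "PROCESS_NAME": "System",
}


def parse_fields(report: str) -> Dict[str, str]:
    fields: Dict[str, str] = {}
    for line in report.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def matches(fields: Dict[str, str], signature: Dict[str, str]) -> bool:
    return all(fields.get(key) == value for key, value in signature.items())


def bucket(reports: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for report in reports:
        label = "csagent" if matches(parse_fields(report), CSAGENT_SIGNATURE) else "other"
        counts[label] += 1
    return counts


sample = """\
READ_ADDRESS: ffff840500000074 Paged pool
IMAGE_NAME: csagent.sys
MODULE_NAME: csagent
PROCESS_NAME: System
"""
print(bucket([sample, "MODULE_NAME: somedriver\nPROCESS_NAME: System"]))
```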

It’s worth noting that the devices that generated crash reports are a subset of the impacted devices previously reported by Microsoft in our blog post. Crash reports are sampled and collected only from customers who choose to upload their crashes to Microsoft. Customers who enable crash dump sharing help both driver vendors and Microsoft identify and remediate quality issues and crashes.

CrowdStrike driver-associated crash dump reports over time

We make this information available to driver owners so they can assess their own reliability.

As this shows, a reliability problem such as this invalid memory access can lead to widespread availability issues when it is not paired with safe deployment practices.
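Safe deployment usually means exposing an update to progressively larger groups of devices and stopping when health signals degrade. The Python sketch below models that idea. The ring names, ring sizes, crash-rate threshold and `crash_rate_of` health callback are all hypothetical. A real pipeline would gather health data over time rather than from a single function call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Ring:
    name: str
    devices: int


def staged_rollout(
    rings: List[Ring],
    crash_rate_of: Callable[[Ring], float],
    max_crash_rate: float = 0.001,  # hypothetical health threshold (0.1%)
) -> Tuple[List[str], int, str]:
    """Deploy ring by ring; halt before the next ring if the current one is unhealthy.

    Returns the rings reached, the number of devices exposed, and a status message.
    """
    reached: List[str] = []
    exposed = 0
    for ring in rings:
        reached.append(ring.name)
        exposed += ring.devices
        rate = crash_rate_of(ring)  # observed crash fraction after deploying to this ring
        if rate > max_crash_rate:
            return reached, exposed, f"halted at {ring.name} (crash rate {rate:.1%})"
    return reached, exposed, "completed"


RINGS = [Ring("canary", 100), Ring("early", 10_000), Ring("broad", 1_000_000)]

# An update that crashes every device it reaches is stopped in the first ring.
print(staged_rollout(RINGS, lambda ring: 1.0))
# A healthy update reaches every ring.
print(staged_rollout(RINGS, lambda ring: 0.0))
```

The property that matters is the bound on blast radius. With the faulty update above, only the 100 canary devices are exposed, instead of all 1,010,100.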

Why Security Solutions Leverage Kernel Drivers

Security vendors, like CrowdStrike and Microsoft, frequently leverage kernel driver architecture for several reasons:

Visibility and Enforcement of Security Events: Kernel drivers provide system-wide visibility and can load early in the boot process to detect threats such as bootkits and rootkits. Microsoft also offers capabilities such as system event callbacks and filter drivers, which enable monitoring of events like file creation or modification. Kernel activity can trigger these callbacks, which can then block the activity.
Performance: Kernel drivers can potentially enhance performance, for example for high-throughput network activity analysis or data collection. However, Microsoft is partnering with the ecosystem to improve performance and to provide best practices for achieving parity outside of kernel mode.
Tamper Resistance: Loading into kernel mode increases tamper resistance, making it harder for malware or malicious insiders to disable security software. Windows provides the Early Launch Antimalware (ELAM) mechanism for this purpose, and CrowdStrike signs its CSboot driver as ELAM. However, these benefits must be weighed against the resilience costs of kernel drivers.

All code operating at the kernel level requires extensive validation, because it cannot fail and restart the way a normal user application can. Microsoft has invested in moving complex Windows core services from kernel mode to user mode. Security tools can take the same approach today to balance security and reliability.

Example security product architecture which balances security and reliability

Windows offers several user-mode protection approaches, such as Virtualization-based security (VBS) Enclaves and Protected Processes. Windows also provides Event Tracing for Windows (ETW) events and user-mode interfaces such as the Antimalware Scan Interface (AMSI) for event visibility. These mechanisms reduce the amount of kernel code needed, balancing security and robustness.
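The reliability argument for moving complex logic out of the kernel can be shown with a small Python sketch. It assumes a hypothetical design in which content parsing runs in a separate user-mode process. The parser below contains the same class of bug as an out-of-bounds read: it indexes past the end of a list. Because it runs in a child process, the failure terminates only that process. The supervising code sees a nonzero exit code and can fall back instead of crashing. In Python an out-of-range index raises an exception rather than reading arbitrary memory, so this models the isolation boundary, not the memory-safety bug itself.

```python
import subprocess
import sys
from typing import Optional

# Hypothetical parser that runs in its own process. It trusts an index supplied
# alongside the content, mirroring a parser that trusts its input too much.
PARSER_SOURCE = """
import sys
fields = sys.argv[1].split(",")
index = int(sys.argv[2])
print(fields[index])  # IndexError -> uncaught exception -> process exits with code 1
"""


def parse_isolated(content: str, index: int) -> Optional[str]:
    """Run the parser in a child process; return None if the child fails."""
    result = subprocess.run(
        [sys.executable, "-c", PARSER_SOURCE, content, str(index)],
        capture_output=True,
        text=True,
        timeout=30,
    )
    if result.returncode != 0:
        # Only the child died. The supervisor keeps running and can report the
        # failure or keep using its last known good content.
        return None
    return result.stdout.strip()


print(parse_isolated("alpha,beta,gamma", 1))
print(parse_isolated("alpha,beta,gamma", 5))
```

A kernel-mode driver has no equivalent boundary. As the crash dumps above show, the same kind of fault in kernel mode stops the whole system.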

How Windows Ensures Quality of Third-Party Security Products

Microsoft engages with third-party security vendors through the Microsoft Virus Initiative (MVI) to improve the robustness of security products on the platform. Microsoft also provides runtime protection, such as PatchGuard (Kernel Patch Protection), to prevent disruptive behavior from kernel drivers. In addition, drivers signed by the Microsoft Windows Hardware Quality Labs (WHQL) undergo tests to ensure security and reliability, and Microsoft publishes a list of the related resources and tools. WHQL-signed drivers are run through Microsoft’s checks before being approved for signing. If they are distributed via Windows Update (WU), they also go through Microsoft’s flighting and gradual rollout processes.

Can Customers Deploy Windows in a Higher Security Mode?

Windows can be locked down for increased security using integrated tools, and Microsoft continues to strengthen its security defaults. Windows 11 has many security features enabled by default:

Hardware Security Baseline
- TPM 2.0
- Secure Boot
- Virtualization-based security (VBS)
- Memory integrity (Hypervisor-protected Code Integrity (HVCI))
- Hardware-enforced stack protection
- Kernel Direct Memory Access (DMA) protection
- HW-based kernel protection (HLAT)
- Enhanced sign-in security (ESS) for built-in biometric sensors
Encryption
- BitLocker (commercial)
- Device Encryption (consumer)
Identity Management
- Credential Guard
- Entra primary refresh token (PRT) hardware protected
- MDM deployed SCEP certs hardware protected
- MDM enrollment certs hardware protected
- Local Security Authority (LSA) PPL prevents token/credential dumping
- Account lockout policy (for 10 failed sign-ins)
- Enhanced phishing protection with Microsoft Defender
- Microsoft Defender SmartScreen
- NPLogonNotification doesn’t include password
- WDigest SSO removed to reduce password disclosure
- AD Device Account protected by CredGuard*
- Multi-Factor Authentication (Passwordless)
- MSA and Entra users are led through Hello enablement by default
- MSA password automatically removed from Windows if never used
- Hello container VSM protected
- Peripheral biometric sensors blocked for ESS enabled devices
- Lock on leave integrated into Hello
Security Incident Reduction
- Common Log File Systems run from trusted source
- Move tool-tip APIs from kernel to user mode
- Modernize print stack by removing untrusted drivers
- DPAPI moved from 3DES to AES
- TLS 1.3 default with TLS 1.0/1.1 disabled by default
- NTLM-less*
OS lockdown
- Microsoft Vulnerable Driver Blocklist
- 3P driver security baseline enforced via WHC
- Smart App Control**

Windows has integrated features for self-defense:

Secure Boot: Prevents early-boot malware and rootkits.
Measured Boot: Provides TPM-based cryptographic measurements on boot-time properties.
Memory Integrity: Prevents the runtime generation of dynamic code in the kernel.
Vulnerable driver blocklist: On by default; complements the malicious driver blocklist.
Protected Local Security Authority: Protects credentials (on by default in Windows 11).
Hardware-based credential protection: On by default for enterprise versions.
Microsoft Defender Antivirus: Enabled by default with anti-malware capabilities.
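Measured Boot relies on the TPM's extend operation. Each boot component is hashed, and the hash is folded into a Platform Configuration Register (PCR) as PCR_new = H(PCR_old || digest). Software can only extend a PCR, never write it directly. Its final value therefore commits to every measurement and to their order.

The Python sketch below models a single SHA-256 PCR in software, and the component names are placeholders. On a real system, firmware and the boot manager make the measurements into specific PCRs, and an event log records each step.

```python
import hashlib
from typing import Iterable

PCR_BYTES = 32  # size of a SHA-256 PCR


def extend(pcr: bytes, event_data: bytes) -> bytes:
    """Model of a TPM 2.0 extend: PCR_new = SHA-256(PCR_old || SHA-256(event_data))."""
    digest = hashlib.sha256(event_data).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure(components: Iterable[bytes]) -> bytes:
    pcr = bytes(PCR_BYTES)  # this model starts the PCR at all zeros, as after a reset
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Placeholder component images; a real chain measures actual binaries and configuration.
expected = measure([b"firmware", b"boot manager", b"os loader", b"kernel"])
tampered = measure([b"firmware", b"boot manager", b"patched loader", b"kernel"])
reordered = measure([b"firmware", b"os loader", b"boot manager", b"kernel"])

print(expected.hex())
print(expected == tampered, expected == reordered)
```

A remote verifier that knows the expected value could detect either change with a single comparison. Attestation services such as Device Health Attestation build on this kind of measurement.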

Leveraging these security features reduces the attack surface and cost. Best practices include:

Using App Control for Business (formerly Windows Defender Application Control) to allow only trusted apps.
Using Memory integrity with a specific allow list policy to further protect the Windows kernel with Virtualization-based security (VBS).
Running as Standard User and elevating only as necessary.
Using Device Health Attestation (DHA).
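App Control for Business works from an allow list with a default of deny: code runs only if it matches a rule in the policy. The following Python sketch illustrates that idea with a flat SHA-256 hash of file contents.

This is a simplification. App Control hash rules use Authenticode-style hashes of the executable image rather than a hash of the raw file, and policies can also trust code by signer or path. The `allowed` set and sample bytes here are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import AbstractSet, Union


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_allowed(data: bytes, allowed: AbstractSet[str]) -> bool:
    """Default deny: anything whose hash is not explicitly listed is blocked."""
    return content_hash(data) in allowed


def is_file_allowed(path: Union[str, Path], allowed: AbstractSet[str]) -> bool:
    return is_allowed(Path(path).read_bytes(), allowed)


trusted_app = b"trusted application bytes"  # placeholder content
allowed = {content_hash(trusted_app)}       # hypothetical policy with one entry

print(is_allowed(trusted_app, allowed))            # listed hash
print(is_allowed(trusted_app + b"\x00", allowed))  # any modification changes the hash
print(is_allowed(b"unknown tool", allowed))        # not in the policy
```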

What’s Next?

Windows is committed to helping the anti-malware ecosystem modernize its approach and will:

Provide guidance and technologies to make it safer to perform updates.
Reduce the need for kernel drivers to access security data.
Offer enhanced isolation and anti-tampering capabilities.
Enable zero trust approaches.
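One way to make content updates safer is to validate each update against what the consuming code expects before activating it. The previous version is kept as a last known good fallback. This is a sketch under assumptions, not any vendor's design.

In the Python model below, `EXPECTED_FIELDS` stands for the number of fields the consumer will index. That constant, the `ContentUpdate` type and the rollback policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

EXPECTED_FIELDS = 3  # hypothetical: how many fields the consuming code reads from each update


@dataclass(frozen=True)
class ContentUpdate:
    version: int
    fields: Tuple[str, ...]


def validate(update: ContentUpdate) -> List[str]:
    """Return a list of problems; an empty list means the update may be activated."""
    problems = []
    if len(update.fields) != EXPECTED_FIELDS:
        problems.append(f"expected {EXPECTED_FIELDS} fields, got {len(update.fields)}")
    if any(not field for field in update.fields):
        problems.append("empty field")
    return problems


class ContentStore:
    def __init__(self, initial: ContentUpdate) -> None:
        if validate(initial):
            raise ValueError("initial content must be valid")
        self.active = initial
        self.last_known_good = initial

    def apply(self, update: ContentUpdate) -> List[str]:
        problems = validate(update)
        if not problems:
            self.last_known_good = self.active
            self.active = update
        return problems

    def rollback(self) -> None:
        """Return to the previous version, e.g. if health signals degrade after activation."""
        self.active = self.last_known_good


store = ContentStore(ContentUpdate(1, ("a", "b", "c")))
print(store.apply(ContentUpdate(2, ("a", "b"))), store.active.version)
print(store.apply(ContentUpdate(3, ("x", "y", "z"))), store.active.version)
store.rollback()
print(store.active.version)
```

Validation stops malformed content before it reaches the code that would index it. Rollback covers failures that validation did not anticipate.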

Windows continues to innovate in security, including through its commitment to the Rust programming language. Microsoft will continue to share guidance on security best practices.
