If you were part of the endless legions of IT workers furiously fixing Windows machines over the weekend thanks to the CrowdStrike outage, I salute your service, and if you were affected by the disruptions to flights, hospital services, banking and more, I commiserate. Most of us, however, remained unaffected: according to Microsoft, only 1% of Windows devices fell victim to the bug.
Still, that's 8.5 million devices causing turmoil worldwide, and as a result, Microsoft deployed hundreds of its engineers and experts to work with customers to restore their stricken services. Microsoft also engaged directly with CrowdStrike to work on a solution, and CrowdStrike published its own account of some of the technical issues that caused the event.
At the core of the fault was a configuration file contained in an update for CrowdStrike's Falcon platform, which triggered a logic error that in turn caused a BSOD loop on Windows systems running Falcon sensor software.
The update was designed to "target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks", but instead threw some very important infrastructure into a loop.
How we did this in the old days: When I was on Windows, this was the type of thing that greeted you every morning. Every. Single. Morning. You see, we all had a secondary "debug" PC, and each night we'd run NTStress on all of them, and all the lab machines. NTStress would…
The problem, in this case, is that this event was created by a CrowdStrike driver that passed WHQL testing but still possessed the capability to download and execute p-code that hadn't been signed by Microsoft. Essentially, a third-party driver at the heart of a system can still bring it down with a dodgy update, even if Microsoft's processes for its own updates have appropriate levels of testing and certification.
Well, it's all been a bit of a clusterfudge, hasn't it? Microsoft is unlikely to be happy that its name is once again in the headlines for server-related issues. As of now, the issue appears to have been fixed, at least, and perhaps some lessons have been learned for third-party updates in future.