Why CrowdStrike's Baffling BSOD Disaster Was Avoidable

4,106 views

Risky Business Media

A day ago

Risky Business host Patrick Gray talks to SentinelOne's Chris Krebs and Alex Stamos about CrowdStrike's baffling failure and what it means for the wider security industry, government regulation, and more. SentinelOne is a direct CrowdStrike competitor, but this is a wide-ranging chat about the can of worms the BSOD incident has opened.

Comments: 37
@Miglen · a month ago
I completely agree with Alex on this one. It’s unacceptable for a cybersecurity company to cause an outage or cyber incident that affects the availability of most of its customers and, in effect, the world. There are many lessons even for us to learn from this event, and there were better protocols and practices that would have prevented this incident but were not followed.
@mgjk · a month ago
First time seeing Patrick's face. I always imagined he looked like the EEVBlog guy.
@AssafLevyIL · a month ago
add 9000 pitch just kidding, love 'em both
@mikeconvertino5725 · a month ago
So, this opened with “let’s have competitors of CrowdStrike talk smack about CrowdStrike,” and then they did. These guys are great guys, but they have a vested (literally financial) interest in tearing them down. This shoots your credibility completely. I used to work there and ACTUALLY know how things were done there when I was there. These guys don’t. Note: I do not have any interests in either of these companies. How about interviewing people who are above reproach on this topic?
@grokitall · 29 days ago
You might have worked there, but you obviously don't know how they do it now. These guys came out with a set of conclusions about which steps could not have been performed for the release to get deployed, as any of the steps mentioned would have stopped the deployment. The recently released independent report on how it happened not only confirmed what they said, but went considerably further, finding additional standard practices they could have used to prevent this but did not. None of the steps identified by either the two interviewees or the report are novel or innovative processes; in fact, most have been standard practice for well over a decade, which raises the question of why they were being ignored.
@mikeconvertino5725 · 28 days ago
@grokitall Well, then it had to have been one of two things: (1) critical parts of the QA we used to do were stripped out in favor of some automated process, or (2) someone who thought they knew better bypassed one or more components. If it’s number 1, that is foolish. If it’s number 2, George wouldn’t hesitate to fire them.
@grokitall · 28 days ago
@mikeconvertino5725 What they have said happened is that they checked the signature file against a mock of the template file, then shipped it to everyone with no other testing, no canary releasing, and no monitoring. This means the signature file got checked, but the template file was never tested until the customer installed it. The template only collected 20 of the 21 parameters used in the signature file, which was never caught because every previous use of the last parameter had * in that field of the regex. So the template file shipped having never been tested and never been run against the core kernel driver, which did not catch the bug even though they expected it to. It also did not detect that the machine had failed to boot last time, so it kept trying to load the broken driver.
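The mismatch described above (a rule with 21 fields evaluated against only 20 supplied inputs, hidden because the last field was always a wildcard) can be illustrated with a small sketch. Everything here is hypothetical: `matches`, `lenient_check`, and `strict_check` are illustrative names, rules are modeled as lists of plain strings compared by equality rather than regexes, and none of it reflects CrowdStrike's actual code or file formats. It shows why a check that skips wildcard fields passes the old rule, while a check that compares field counts would flag it before any rule ever uses the 21st field.

```python
from collections.abc import Sequence

WILDCARD = "*"


def matches(criteria: Sequence[str], inputs: Sequence[str]) -> bool:
    """Hypothetical matcher: wildcard fields are skipped, any other field reads inputs[i]."""
    for i, criterion in enumerate(criteria):
        if criterion == WILDCARD:
            continue  # wildcard: inputs[i] is never read, so a missing input goes unnoticed
        if inputs[i] != criterion:  # raises IndexError when i >= len(inputs)
            return False
    return True


def lenient_check(criteria: Sequence[str], supplied_inputs: int) -> bool:
    """Only rejects non-wildcard fields past the supplied inputs, so a trailing '*' hides the gap."""
    return all(c == WILDCARD for c in criteria[supplied_inputs:])


def strict_check(criteria: Sequence[str], supplied_inputs: int) -> bool:
    """Rejects any rule whose field count differs from the number of inputs actually supplied."""
    return len(criteria) == supplied_inputs


SUPPLIED = 20                       # inputs the hypothetical sensor provides
old_rule = ["a"] * 20 + [WILDCARD]  # 21 fields, the last one always '*'
new_rule = ["a"] * 20 + ["b"]       # 21 fields, the last one finally used

inputs = ["a"] * SUPPLIED
print(matches(old_rule, inputs))          # True: the 21st field is never read
print(lenient_check(old_rule, SUPPLIED))  # True: mismatch hidden by the wildcard
print(strict_check(old_rule, SUPPLIED))   # False: mismatch caught before shipping
try:
    matches(new_rule, inputs)
except IndexError:
    print("out-of-bounds read on field 21")
```

In Python the out-of-bounds read surfaces as a catchable `IndexError`; in a kernel driver the equivalent read has no such safety net, which is why a count check at validation time matters.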
@markriffey8292 · a month ago
Would love to hear this discussion consider the responsibility of global enterprises to test before enabling the rollout of an automated update to millions of critical-path systems. Do these enterprise IT shops really not use any sort of tiered deployment? Did they really fail to perform the simplest "my brother-in-law is an IT guy" test on even a single air-gapped PC? Think: "Hey Joe, install this pending update, reboot, and tell me what happens." Simplistic questions, no doubt. Not apologizing for CrowdStrike or MSFT, but it feels like the responsibility for this goes beyond those two vendors.
@StephenMcGregor1986 · a month ago
This. It's laziness. Occam's razor.
@grokitall · 29 days ago
As pointed out in the interview, a lot of companies were running N-1 or N-2 deployments and were under the impression that this also applied to how the live updates were deployed. Most of them were shocked to find out that this update was deployed without any integration tests, with no internal test deployments, and with no canary releasing. But it was actually worse. The signature file got fairly minimal testing, the template file was replaced with a mock during testing and was therefore completely untested, and it was never deployed to a Windows instance to test the core kernel driver's compatibility with the new changes until it was installed on 8.5 million machines in critical infrastructure positions all at once, with no monitoring to even check whether they booted OK. In the long term CrowdStrike will lose a lot of customers over this, and every alternative will be asked a large number of harder questions to try to make sure it does not happen again.
@s1mph0ny · 17 days ago
I have to disagree with Chris on Microsoft's liability for this issue. Microsoft was being paid to certify the driver as part of its WHQL program, or nobody would have installed it. If they weren't able to discern what the kernel module was doing, they should have rejected it and told clownstrike to start over. I suspect that Microsoft either looked the other way because they assumed a formerly $70 billion market cap company wouldn't do something so stupid, or Microsoft is abusing the WHQL program as a form of rent seeking without doing the analysis that justifies its cost. There's an interesting conversation to be had around the dogfooding concept as well. Many businesses are aware that this is a problem, but their staff find "creative" ways around some of the more absurd protections and requirements, often by using Macs, because Apple can't be bothered to actually implement AD policies and/or their justified aggressive throttling of trash-tier AVs.
@squirrel1620 · a month ago
Testing on production... They went cowboy mode. You never go full cowboy mode when you have MILLIONS of endpoints.
@andythebritton · a month ago
Machines in our office only blue-screened once or twice; none of them required intervention to resolve the issue. Perhaps CrowdStrike did do basic smoke testing of the update but saw no issues.
@riskybizmedia · a month ago
They have already acknowledged that they didn't.
@drbrycedavis · a month ago
Jeez, why have the competition on the video? This is a "why you need to do business with SentinelOne" sales pitch.
@icantseethis · a month ago
This could happen to anybody who uses CrowdStrike.
@grokitall · 29 days ago
There are two different points being mixed up here. One is that CrowdStrike acted very unprofessionally, using quality assurance standards that were out of date when the company was founded, let alone now, and they need to do better. The other is that a buggy driver crashed the kernel, which then went into a boot loop requiring manual intervention. That part could happen to anyone, but the solution is to do actual testing and use modern release practices so that the bug does not escape and cause a problem for your customers. CrowdStrike is getting roasted for taking down the kernel by acting like cowboys; Microsoft is getting roasted for having weak recovery methods that made manual recovery necessary, despite multiple instances of boot loops over many years.
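The recovery half of this point, a machine that keeps loading the same bad content on every boot, is the kind of problem a persisted crash-loop counter addresses. The sketch below is hypothetical: `BootGuard`, the JSON state file, and the two-attempt threshold are assumptions for illustration, and it says nothing about how Windows or CrowdStrike's sensor actually handle boot failures. It records each attempt to load new content before loading it, so a boot that crashes leaves a trace the next boot can see.

```python
import json
import tempfile
from pathlib import Path


class BootGuard:
    """Hypothetical crash-loop guard whose state survives reboots in a small JSON file.

    Every boot that tries the new content bumps a counter *before* loading it, and
    reaching a healthy state resets the counter. If several boots in a row never
    get that far, the guard stops loading the new content and uses the last
    known-good version instead.
    """

    def __init__(self, state_file: Path, max_attempts: int = 2) -> None:
        self.state_file = state_file
        self.max_attempts = max_attempts

    def _pending(self) -> int:
        # Number of consecutive boots that loaded new content but never reported healthy.
        if not self.state_file.exists():
            return 0
        return json.loads(self.state_file.read_text())["pending"]

    def _write(self, pending: int) -> None:
        self.state_file.write_text(json.dumps({"pending": pending}))

    def choose_content(self, new: str, last_known_good: str) -> str:
        pending = self._pending()
        if pending >= self.max_attempts:
            return last_known_good  # earlier boots never finished, so stop retrying the new content
        self._write(pending + 1)    # record the attempt first, so a crash leaves evidence behind
        return new

    def mark_healthy(self) -> None:
        self._write(0)


# Simulate three boots that each crash before calling mark_healthy().
with tempfile.TemporaryDirectory() as state_dir:
    guard = BootGuard(Path(state_dir) / "boot_state.json")
    boots = [guard.choose_content("new.bin", "known_good.bin") for _ in range(3)]
print(boots)  # ['new.bin', 'new.bin', 'known_good.bin']
```

A fuller design would also quarantine the new content after falling back; in this sketch a later `mark_healthy()` call resets the counter, so the new content would be tried again.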
@heliozone · 25 days ago
Windows will never be a mature operating system because every release is made obsolete on purpose for financial gain.
@mhackling · a month ago
8 minutes in: an Airlock Digital and Silvio Cesare mention. It's a small world.
@kautzz · a month ago
The main takeaway for a lot of CEOs will be that they don’t need QA teams. We need to hold CrowdStrike accountable or live with even shittier products for the rest of our lives.
@DailyWisdom1 · a month ago
@kautzz I don't think a shitty product would ever have the chance to be on 8.5M Windows machines. Let me guess: you work at S1 and are joining your colleagues in chasing ambulances.
@kautzz · a month ago
@DailyWisdom1 So you liked Windows Vista, Flash, and Internet Explorer? I'm a product engineer who cares about quality control. I know what proper QA looks like, and it's not what CrowdStrike had in place.
@DailyWisdom1 · a month ago
@kautzz That doesn't justify calling the best product on the market, and the only one with a breach warranty, "shitty".
@Ironfranko · a month ago
"It's not true that this could happen to anyone." Until it does, that is. There's no 100% error-proof technology. I believe the update was tested and flagged as buggy by CrowdStrike, but some human error (which should not have been allowed) made it go through anyway. That said, the thing that absolutely surprised me is that they did not release the update to a small subset of endpoints and gather telemetry back before rolling it out to every other customer.
@grokitall · 29 days ago
Read the independent report. It confirms that little testing was done, and that what was done was designed to pass the tests and ensure release, which is what happened. To sum it up: they ran minimal tests against the signature file, which passed. They completely mocked out the template file, which therefore reached customers with no testing at all, and they shipped straight to all 8.5 million machines at once without a single deployment to a Windows test machine inside the company prior to release. That deployment would have caught both the insufficient-parameters bug in the template file and the kernel module's failure to cope with it gracefully, contrary to what they were telling their customers. Their release methodology also made a joke of both the Microsoft driver testing program and the customers' resiliency plans, subverting both processes by not acting as customers were led to expect. It's hardly surprising that both CrowdStrike and Microsoft are getting roasted over this.
@s1mph0ny · 17 days ago
With a little creativity you can abstract negligence into human error without end, but that's not a reasonable excuse in this case. This definition update shouldn't have even passed the automated rule-based test, since it was 40-some kilobytes of null data. It is interesting that risk management is one of the pillars of cybersecurity, and yet a company in this very industry didn't do any risk management on its own product. An argument can be made for one or two of their poor decisions, but put together it's just a huge clusterfuck. Doing no testing on actual hardware might make sense if you have a bulletproof client, or if your automated rule-checking system works, but they had a very weak, performance-focused client, and their automated checks don't work either.
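What this thread describes as missing, pushing an update to a small slice of endpoints and checking telemetry before everyone else gets it, is usually called a canary or staged rollout. A minimal sketch, assuming hypothetical `deploy` and `healthy` callbacks (standing in for a real install step and a real post-reboot health signal) and stage sizes given as cumulative percentages of the fleet; the stage sizes and the zero-failure threshold are arbitrary illustrative choices:

```python
from collections.abc import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stages_percent: Sequence[int],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> tuple[int, bool]:
    """Deploy to cumulative percentages of the fleet, halting when a stage looks unhealthy.

    Returns (hosts_deployed, completed). The last stage should be 100 to reach every host.
    """
    deployed = 0
    for percent in stages_percent:
        target = len(hosts) * percent // 100  # integer math avoids float rounding surprises
        batch = hosts[deployed:target]
        for host in batch:
            deploy(host)
        failures = sum(1 for host in batch if not healthy(host))
        deployed += len(batch)
        if batch and failures / len(batch) > max_failure_rate:
            return deployed, False  # halt: the remaining hosts never receive the update
    return deployed, True


fleet = [f"host-{i}" for i in range(1000)]
installed: list[str] = []
print(staged_rollout(fleet, [1, 10, 100], installed.append, lambda host: True))   # (1000, True)
print(staged_rollout(fleet, [1, 10, 100], lambda host: None, lambda host: False))  # (10, False)
```

With a 1% first stage, a bad update in this model reaches 10 of 1,000 hosts before the rollout stops, instead of all of them at once.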
@aarongrattafiori617 · a month ago
Loving the iSEC Partners shout out at the start!
@chrisclark9541 · a month ago
Booya!
@AdventuresinCyber · a month ago
Terribly biased episode. Shame from a podcast I hold in high esteem.
@jzk224 · a month ago
I agree, this sounds SolarWinds-y.
@DailyWisdom1 · a month ago
The whole video is what you emphasized it wouldn't be: ambulance chasing. The lowest type of marketing. Definitely not talking to S1 reps.
@sec_alex · a month ago
Happy to respond to any substantive criticism.
@grokitall · 29 days ago
No, it was not. The interview covered what has been standard practice even in web app deployment setups, how that practice avoids the problems seen at CrowdStrike, what the culture at CrowdStrike must look like for this to happen, and how CrowdStrike subverted both Microsoft's certification testing and customer resiliency planning. Of course CrowdStrike is going to end up looking bad because of this, but it was not a case of "don't use CrowdStrike, use us instead," as you present it. It was more a case of CrowdStrike acting like cowboys, with the consequences landing on the rest of us, including being asked a lot of questions that would have resulted in CrowdStrike not being used if it had answered them truthfully.
@harveypaxton1232 · a month ago
The bottom-line fault lies with companies' IT managers who don't have proper policies in place.
@grokitall · 29 days ago
The companies had resiliency plans in place, but CrowdStrike led customers to believe that N-1 and N-2 deployments were being done, when in fact that only applied to the core kernel module.
@harveypaxton1232 · 28 days ago
@grokitall The reason Southwest Airlines escaped is their policies.
@grokitall · 28 days ago
@harveypaxton1232 Yes, they use Windows 3.11, which has not been supported for over two decades. This kept them safe, but bragging about it afterward painted a big target on their back.