This is one of those times when if you wrote it in a movie, everyone would say "That's so unrealistic. No company could be THAT incompetent without going out of business."
@andrasfogarasi5014 2 months ago
Well, any business can demonstrate incompetence at least once before failing.
@pyrouscomments 2 months ago
We don't know that they won't, though.
@akam9919 2 months ago
...well, this is crowdstrike going out of business.
@RipVanFish09 2 months ago
I hope they go out of business because clearly they can’t be trusted. (Also, this has happened twice before, just on operating systems that the entire world doesn’t run on.) The amount of negligence is dumbfounding.
@2rx_bni 2 months ago
Real life is so much wackier than fiction.
@kevinletterer4171 2 months ago
The 10 dollar gift card is a trap, if you accept it then they can claim that they have already compensated you for any damages and you agreed to it.
@ryanquinn1257 2 months ago
My thoughts exactly. Had this happen MANY times from a brief summer into Xmas at Costco. They’d offer some compensation, usually some $15-50 gift cards, but if you had accepted those you weren’t part of the class action for those wrongdoings later. Same for insurance companies offering a lowball, get-money-now amount vs. waiting and making sure you’re all good physically after a crash.
@earchinmc1080 2 months ago
They did not offer 10 dollar gift cards to their customers (aka the people affected). They were handing them out to sellers of the software.
@sad_man_no_talent 2 months ago
but that's 10 dollar man u gotta understand
@klam77 2 months ago
shysters! What a slick move! OMG. S/W TORT NOW.
@SammmN 2 months ago
The $10 is just insulting. I was in the fire department when it happened and NOTHING was working. The firefighters had to use the radios instead of computers. My friend was working in the hospital and computers were down.
@autohmae 2 months ago
If they haven't signed the 'content', this is probably the worst failure of all, for a _security_ company, because they are creating a HUGE security issue.
@arcanernz 2 months ago
It’s not crowdstrike’s fault the update worked on Dan’s computer; the millions of other computers were just edge cases.
@Spiker985Studios 2 months ago
*Docker has entered the chat*
@desertdude540 2 months ago
The craziest thing is that the CEO of Clownstrike was CTO at McAfee when they pushed the update that thought svchost.exe was malware and nuked millions of Windows XP installations.
@PatNeedhamUSA 2 months ago
I find it kind of hilarious that a third or half of their PIR document is marketing material
So tldr: so many avoidable errors, that exploded in the most spectacular way
@cabanford 2 months ago
So this update bricked every machine it was pushed to... Screw the Rolling Updates - didn't they test this on even a "single" internal machine before their Friday push??? WTF. Managers heads (CEO)
@shimirel 2 months ago
Given their list had "local developer testing" we both know the answer ;-)
@coopercummings8370 2 months ago
It wasn't just a Friday push, it was a 10pm Friday push.
@cabanford 2 months ago
@@coopercummings8370 Sadly, it was 24 hours of Friday around the world (not just local dev time 🤣🤦)
@coopercummings8370 2 months ago
@@cabanford local Dev time absolutely matters in this case though. If they pushed their completely untested changes Friday morning they still would have had people in the office to notice they fucked up and roll it back before it blew up as badly as it did, but when they push it at 10pm that was probably the last guy in the office and he left immediately after pushing the patch.
@DuraanAli 2 months ago
I have a small software business, it's not even my full time job and I don't do updates unless it goes through unit tests, staging and behavior monitoring with applications like Sentry and Hotjar! It's so weird to see a huge company like that just pushing updates even if they're urgent.
@wrfsh 2 months ago
3:54 Hey Theo, you're wrong here about graphics drivers. WHQL is not there just to put you into the next windows update. It allows the driver to be signed, and the windows kernel won't allow you to load unsigned drivers (normally, unless you're in debug mode and don't have secure boot enabled). So even if you distribute your driver through your own channel, you still need to get it signed, and for some drivers that means going through WHQL. CS sidestepped it with the channel files because they don't actually load those files as drivers, as i think you correctly point out.
@stage6fan475 2 months ago
This company is named 'Clownstrike' for the rest of time.
@GreyDeathVaccine 2 months ago
Yep, until the Sun swells and swallows the Earth.
@theapexsurvivor9538 2 months ago
I mean, I'd almost argue that they earned cloudstrike, as it's basically a cloud software and it was more effective than a LOIC...
@luketurner314 2 months ago
Aviation has the Swiss cheese model; ClowdStrike had the NO cheese model.
@starnumber12046 2 months ago
As an avgeek, this comment is approved
@stevengill1736 2 months ago
My sentiments as well...
@kyle7023 2 months ago
"Disasters don't just happen, they're a chain of critical events"
@stevengill1736 2 months ago
The holes in the Swiss cheese line up. ;*[}
@Canleaf08 2 months ago
@@stevengill1736 Mentour Pilot vibes.
@robotorfeed6806 2 months ago
painfully relatable
@conceptrat 2 months ago
A little bird told me that the CI/CD process was taking too long and maxing out the processors. So they removed some parts of the process (tests?) and then pushed start again. And here we are. Partially.
@OhhCrapGuy 2 months ago
The tests are maxing out all the processors? That sounds like a *failing test* to me. bEtTeR TuRN oFf ThE uNIt TeSTs
@EwanMarshall 2 months ago
Ooh, tests fail, let's remove the tests... yeah, great idea... The thing is, the number of smaller similar incidents over the last year raises questions about their testing practices before their statements. I think gross negligence can easily be made out here.
@Elesario 2 months ago
Sounds like another company that isn't interested in investing development time into the test and relevant toolchains, because they want to focus on their core product.
@itsTyrion 2 months ago
source
@Z3rgatul 2 months ago
WHQL is not about Windows Update. It's about a digital signature from Microsoft. Windows doesn't allow loading non-signed drivers unless you've enabled debug mode, which was made specifically for driver development. A driver can be downloaded from anywhere, but only Microsoft-signed drivers can be loaded on normal Windows machines.
@Elesario 2 months ago
You can install non-WHQL certified drivers without necessarily being in debug mode. That's what a lot of driver update programs, like the ones likely installed with your graphics card, are doing. Theo does mention this during the video. If you try to manually install a non-WHQL driver you'll probably get a warning, but you can still instruct Windows to install it. Some drivers do get installed by the Windows Update software, and those ones are only ever WHQL certified.
@Z3rgatul 2 months ago
@@Elesario I just went to Device Manager -> Display Adapters -> my RTX GPU -> Properties -> Driver tab. Digital Signer: Microsoft bla-bla-bla. You can click on Driver Details and see that every single file in the list is signed by Microsoft. If you don't believe this information, you can manually check every individual sys/dll/exe file, and the Microsoft signature is there. You can't install non-signed drivers starting from Windows Vista. You have to disable driver signature verification on the system level and reboot the machine. There is no warning that can be bypassed.
@toastrecon 2 months ago
I don’t think that the PR team is necessarily inept, I’d bet more than a few $10 Uber Eats cards that the statement was wordsmithed for hours by engineers, lawyers, executives and whoever else felt like their neck was on the line. Endless revisions until it said everything and yet nothing.
@OhhCrapGuy 2 months ago
The ABCs of running a company like Crowdstrike:
Airline: We forgot to check if we put any fuel in the plane, or if the engines worked, or if they were attached, but at least there was a pilot in the plane before takeoff. And it's a good thing that Southwest pilot was on our flight.
Bar tending: We have an excellent bar, high quality glassware, exquisite lighting, a phenomenal sound system, and only the best point-of-sale system available. Let's open the doors, it's time for the grand opening. ... wait, we have to buy the booze? I thought whenever a bartender came in, they'd bring the booze with them! What do you mean "hire a bartender"?
Car manufacturing: Look, we made sure that all the wheels were firmly attached to the vehicle. Sure, one was bolted to the hood, another was attached to the middle of the axle, which caused the axle to snap in half, and the last one was put in place of the steering wheel, but we made DAMN sure that it had all 3 wheels attached this time. ...what do you mean "4 wheels"?
@halbeik 2 months ago
A smoke test would've caught this at any point in the pipeline
@MeriaDuck 2 months ago
Should have indeed.
@ManaVkum-w9b 2 months ago
One of the most comprehensive and detailed explanations of the shit that hit the fan... in an interesting and funny manner, with enough of a dash of frustration. This incident proves we are living on the brink... and a domino effect can happen ANY time.
@leexgx 2 months ago
20:20 rolled out the update on a Friday at 10pm pt (-7 or 8 gmt, 5am ish uk/eu) the best time to brick 9 million systems
@circuitousprime 2 months ago
To be fair it wasn't a Friday evening. It was Thursday at ~11pm CDT (Texas time), Friday ~5am BST (UK time)
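The local times in these two comments can be checked with a small conversion. This is only a sketch: the 04:09 UTC release time on 19 July 2024 is an assumption taken from CrowdStrike's public statement, not from this thread, and the summer-time offsets are hard-coded so no time zone database is needed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: release time as publicly stated by CrowdStrike (not from this thread).
release_utc = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)

# Fixed summer-time offsets, hard-coded for illustration.
ZONES = {
    "PDT (California)": timezone(timedelta(hours=-7)),
    "CDT (Texas)": timezone(timedelta(hours=-5)),
    "BST (UK)": timezone(timedelta(hours=1)),
}

def local_times(moment):
    """Return {zone label: 'Weekday HH:MM'} for an aware datetime."""
    return {
        label: moment.astimezone(tz).strftime("%A %H:%M")
        for label, tz in ZONES.items()
    }

for label, text in local_times(release_utc).items():
    print(f"{label}: {text}")
```

Under that assumption the push lands late on Thursday evening in both US zones and early on Friday morning in the UK, which matches the correction above.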
@TheJacrespo 2 months ago
I would say the exception here is a related tech company working decently according to minimal tech and engineering standards. The norm these days is a big managerial fat layer pushing the problems towards the bottom, where the technicians are burning out without any understanding from the managerial upper layer over the bolts and knots of the implementations.
@akam9919 2 months ago
I don't agree with the claim that "the crowdstrike update is not a content update". From what I've been told, the signatures are over-glorified config files. That being said, this is lower on the stack than I am familiar with. However, as I see it, this is like adding a card to a digital CCG, but all your cards are basically config files that are evaluated when played, performing special actions, etc. Also, let's be self-aware here, this is Windows and a security company. Do you really want to wait a gazillion hours on a weekend, while a cyber attack is going on, to get your driver approved for an emergency update? As far as I can tell, the problem is not that crowdstrike is making these content updates rather than drivers, the problem is they didn't check for a damn null. I would also say that while crowdstrike's default behavior of "crashing" when it can't read a file is dumb... it is not totally unreasonable to think it should be the default behavior. I can easily see someone thinking... "hmm... if someone tampered with some stuff to the point where I can't read this damn file, I probably shouldn't let us boot". I can also see this being part of some nested if statement, and bad logic causing problems down the line and causing this behavior. I can also see nobody telling the person who wrote the code that "hey, if some weird stuff can't be read, skip the driver. we don't want to be a DOS vector". That being said, I don't like how crowdstrike is handling this. At the very least, if you're going to be spitting in people's faces with a gift card that can't cover squat, have the damn balls to not give them anything. If you really want to be kind, not only refund/credit your customers and tell us that you are reviewing your internal processes, but fucking apologize! Own your mistake, and show us you will do better. 
Also, give all the un-thanked and called on a weekend IT people that were called away from their friends and family because of your mistake, a $50 uber eats gift card so they can get themselves a goddamn big mac.
@alexholker1309 2 months ago
The Digital CCG comparison is an apt one since a game like Hearthstone both has run into the same update latency problem (due to needing Google/Apple to sign any updates to their respective mobile clients) and has prompted the company to bypass that process for emergency fixes (implementing urgent nerfs and bans on the "umpire" server without waiting to get the client update approved).
@jbutler8585 2 months ago
Yeah this is horrifying to hear about. The fact that files aren't signed means they are a huge open target for malware. Nevermind that it would have prevented this disaster, it means the entire product and its design are so fundamentally flawed you can't be sure it's providing protection. It could be doing the exact opposite of what it's supposed to, holding the door open for attackers to do whatever they want, with no way of knowing that it's gone malicious because the kernel driver is untouched.
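As a rough illustration of the integrity check this comment is asking for, here is a minimal sketch that refuses to load a content file whose tag does not match. Everything in it is hypothetical: the key, the function names and the use of an HMAC are stand-ins chosen so the example runs with the standard library only. A real product would more likely use an asymmetric signature, and nothing here describes how CrowdStrike actually ships its files.

```python
import hashlib
import hmac

# Hypothetical shared key, for illustration only.
SECRET_KEY = b"demo-key-not-for-production"

def sign(content: bytes) -> bytes:
    """Compute an authentication tag over the file content."""
    return hmac.new(SECRET_KEY, content, hashlib.sha256).digest()

def load_if_authentic(content: bytes, tag: bytes) -> bytes:
    """Return the content only if its tag verifies; otherwise refuse to load."""
    if not hmac.compare_digest(sign(content), tag):
        raise ValueError("content file failed integrity check; refusing to load")
    return content
```

With a check of this kind, a file that was tampered with, or truncated in transit, is rejected before any parser sees it.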
@TehPwnerer 2 months ago
$10.00 gift cards? A slap in the face would be less insulting
@SforSamPlays 2 months ago
They had also rolled back on the gift cards apparently. Which somehow is worse
@Space_US 2 months ago
@@SforSamPlays how? did uber revoke them?
@SforSamPlays 2 months ago
@@Space_US it was mentioned at the very end (I didn’t get that far when leaving the comment) Either CrowdStrike went and invalidated any not used (probably due to people being rightfully upset) or Uber Eats did it cause they thought it was all fraudulent (since scammers use gift cards to transfer money and stuff)
@BroudbrunMusicMerge 2 months ago
@@Space_US Yes, sorta. Uber detected a possible fraud due to the large number of cards and deactivated them
@Space_US 2 months ago
@@BroudbrunMusicMerge lmao
@JohnLovell-FTW 2 months ago
If they are a publicly traded company, don't they have to pass compliance audits? OWASP? I wonder what their current score is :). There is no coming back from this as a company in my opinion. So, what did they actually install? This sounds like vaporware... do the sensors actually do anything?
@gamergirlandco 2 months ago
of course!!! they're very good at taking up disk space :)
@kanbekan 2 months ago
the sensor also good for farming up money from fortune 500
@autohmae 2 months ago
Let's just say even Uber thought they are a scam.
@zoeherriot 2 months ago
The issue is that at Ring 0 in the Windows kernel, a crash means a BSOD. That's intended behavior. But they clearly didn't validate that the file they loaded in that driver had valid data. Easy mistake to make, easy one to catch before you push it to the world. They had a similar issue with some Linux distros earlier in the year. The "system performance" checking may be related to that. But there is a pattern.
@MyBlogsTV 2 months ago
And this is why you don't push to prod on Friday
@nicejungle 2 months ago
With Linux, you can reboot with the previous kernel; it's just a selection menu at start. On Windows, you're screwed because it's a toy OS.
@zoeherriot 2 months ago
@@nicejungle um.. you weren't really any more screwed on Windows than you would have been with Linux. You would still need to physically go to the machine and change the configuration or, in the case of Windows, delete a file.
@nicejungle 2 months ago
@@zoeherriot One major difference:
* on Linux this can be done in one reboot without any technical knowledge
* on Windows, you have to use magic to know the faulty file to delete
@tonytsai6600 2 months ago
@@nicejungle Yes, Windows sucks, but if this incident had happened on Linux to this extent, it would still have been a disaster; IT would still need to fix these servers physically or remotely, and the whole world would already have been impacted. Windows sucks here because BitLocker hinders you from fixing the issue quickly, but BitLocker is also a good tool to protect your data. So as I understand it, there is no perfect OS when something unexpected happens, and that’s the reason why we need to buy insurance.
@teslainvestah5003 2 months ago
I haven't heard any stories of clownstrike deaths yet. I heard one about a woman who permanently lost her breast because she urgently needed a mastectomy, but the hospital couldn't perform the more advanced skin-sparing mastectomy they originally planned because it required blood transfusions that were to be ordered on a windows PC.
@alenasenie6928 2 months ago
From my perspective it is funny that this happened here before it happened with kernel anticheats, which can potentially cause the same thing but for users.
@zisaizic4759 2 months ago
it probably has happened, crowdstrike is just very ubiquitous. Delta airlines wouldn't use an anticheat.
@theswordslay3542 2 months ago
Kernel anticheats have already caused some problems, especially intrusive ones like Vanguard. Some people have already reported some of their other drivers shutting down; heck, for one of them it crashed their entire PC. The difference is, kernel anticheats are isolated to the gaming community, while CrowdStrike? They are spread from airplanes to hospital systems.
@SomeThingOrMaybeAnother 2 months ago
It has happened in kernel anticheats before. They just aren't as widespread.
@alexholker1309 2 months ago
It would be nice if this debacle made kernel-level anticheat too hot to handle.
@ElmerGLue 2 months ago
@@alexholker1309 it won’t. Gamers aren’t corporations, and without anticheats half of online gaming would turn into a wasteland, or devs would need to up the costs of server management, and that would mostly benefit larger game companies and raise the barrier of entry for small creators.
@RipVanFish09 2 months ago
Please do a video on that conversation with the math teacher.
@applepie9806 2 months ago
Seconded. He sounds brilliant; I would love to hear his opinion on all this.
@markm1514 2 months ago
The silver lining is that admins who have things inappropriately hosted on Windows are reevaluating their options.
@mattilindstrom 2 months ago
I wonder if ClownStrike felt the update was so urgent a rolling release was clearly out. My feeling is their process is just deeply defective: the testing failed, the roll-out failed, and their response to all of this has failed. If I had any loose money, I'd be shorting their already battered stock so hard.
@autohmae 2 months ago
The blog post seems to confirm there was no rolling release update process at all.
@starnumber12046 2 months ago
Cloudflare once did this, then they DoS'd themselves because it contained catastrophic backtracking.
@mattilindstrom 2 months ago
@@starnumber12046 I remember that, it's what one gets not understanding how regexes may have horrible scaling built into the way they work. The usual regex libraries are optimized to the hilt with the compilation to a FSM, a pure PEBCAK situation there.
@NormanLyon 2 months ago
I agree that CS messed up badly. But I have disagreements with your analysis.
1. What I understood from their RCA was that their validation test suite failed in such a manner as to not generate logs from the most recent update, but to instead use logs from previous updates. Yes, that's horrible, but it's a bug, not a lack of testing.
2. The fact that validation was only happening on send was a problem. Validation of received updates must occur for anything to be considered resilient. The driver should have refused the update and stayed on the previous version (sending an alert of failed update). Failing that, it needs to version itself for some level of rollback to a previous known good state (and send an alert of failed update).
3. I can understand refusing to boot on an invalid update. Many in the security world would call assuming you are protected while the protection is actually broken a worse state than knowingly having no protection. When dealing with corporate and government regulations, these concerns are real.
4. I agree that automatic update to "latest and most likely to crash" is a horrible stance. It's also the stance taken by many in the IT industry today as a means to combat never updating. Enterprise software should have a middle ground if it's ever going to be taken seriously as an enterprise solution. Canary testing and staged promotion lifecycles need to be taken seriously.
5. WHQL is also at fault here. Security vendors need to move at a quick pace, and as MS has a near-monopoly on OS for end-user systems, regulatory concerns require that any access MS security teams can take advantage of also be usable by non-MS entities. WHQL needs to be able to provide meaningful and quick value in cases such as security products. The fact that this driver has these faults yet is certified is proof that CS deserves to have their certification pulled, and must re-certify under a worthwhile process.
6. Common corporate implementations of BitLocker make this event a nightmare. Too many companies believe that they empower their employees to manage their own devices. Yet very few users were able to access the BitLocker passcodes for their systems. Each company that fell victim to this situation failed miserably at their disaster recovery tabletops and needs to fix this ASAP.
I wouldn't want to do business with CS now. Yet any enterprise-wide product of this nature is difficult to change. CS's competitors have their own problems. The general mindset I've seen from every vendor in the security-focused enterprise software market is that they are filled with unjustified bravado. This incident won't change that. I just hope it will make some things better, especially as the real issue doesn't completely belong to CS.
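Point 2 above (validate on receipt, refuse a bad update, stay on the last known good version and raise an alert) can be sketched in a few lines. The `Sensor` class, the `validate` function and the two-field schema are all hypothetical names made up for this illustration; this is not a description of how the real driver works.

```python
from dataclasses import dataclass

# Hypothetical schema: the fields an update must carry to be usable.
EXPECTED_FIELDS = ("version", "patterns")

def validate(candidate):
    """Return None if the update looks usable, else a short reason string."""
    if not isinstance(candidate, dict):
        return "not a mapping"
    for name in EXPECTED_FIELDS:
        if candidate.get(name) is None:  # the missing null check
            return f"missing field {name!r}"
    if not candidate["patterns"]:
        return "empty pattern list"
    return None

@dataclass
class Sensor:
    active_rules: dict  # last known good rules
    alerts: list        # messages that would be sent to operators

    def apply_update(self, candidate):
        """Swap in the update only if it validates; otherwise keep old rules."""
        problem = validate(candidate)
        if problem:
            self.alerts.append(f"update rejected: {problem}")
            return False
        self.active_rules = candidate
        return True
```

A rejected update leaves `active_rules` untouched, so the machine keeps running with its previous protection while the alert reports the failure.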
@melimsah 2 months ago
Seeing you progressively lose your mind throughout the video is a work of art.
@Hexanitrobenzene 2 months ago
Yeah, this video is nerdy stand up comedy show :)
@maciejtrybilo 2 months ago
Love how they surpassed the "works on my machine" memes by claiming that local developer testing is only going to be introduced now.
@GreyDeathVaccine 2 months ago
They don't know when to shut the fuck up for their own sake.
@lornova79 2 months ago
Crowdstrike developers use only macOS so local testing the Windows sensor would be complicated...
@itskdog 2 months ago
I realised only now why there's no apology: the lawyers told them not to, as apologies are taken as admissions of guilt in the courts and will guarantee a defeat, whereas by not apologising they have a chance, however slim, of minimising the damage.
@GaryGreene1977 2 months ago
If I were the CTO of a company that uses this product the first thing that would be done is calling my IT staff in and have them remove this tool from all systems in the company. This is pure incompetence
@hungrymusicwolf 2 months ago
15:46 "gracefully handle" - As gracefully as a nuke and twice as !quiet.
@acerreteq703 2 months ago
Thank you for this video analyzing the matter and for letting yourself feel so much pain in the process of producing it. But don't overdo it, we need you. 🖖
@kamertonaudiophileplayer847 2 months ago
FYI: Austin, TX is 2 hours ahead of California.
@Flameboar 2 months ago
The time was what used to be called GMT, not CDT.
@kamertonaudiophileplayer847 2 months ago
@@Flameboar It makes a lot of sense.
@emaayan 2 months ago
one of the comments on dave's video said that a customer actually DID configure staggered rollouts on their servers, but CS actually ignored it.
@Pekz00r 2 months ago
Several of the steps you suggest would delay the delivery of the update significantly, and in this case every minute's delay can be very critical. Even every second can be important when you're racing against a virus that is spreading rapidly. You need to reach the machines before the virus. But yeah, you are absolutely right that there are many things they could have done that would not delay the rollout much, but still prevent a bad rollout reliably. It is, as you said, insane that they are rolling out boot driver updates to millions of machines without more verification.
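One low-delay safeguard of the kind this comment alludes to is a staged rollout that halts as soon as an early wave fails. The sketch below is hypothetical throughout: `staged_rollout`, the wave fractions and the failure threshold are invented for illustration, and `install` stands in for whatever reports whether a host came back healthy after the update.

```python
def staged_rollout(hosts, install, waves=(0.01, 0.10, 1.0), max_failure_rate=0.02):
    """Install wave by wave; halt when a wave's failure rate exceeds the limit.

    `install(host)` is a caller-supplied function returning True on success.
    Returns (hosts_touched, halted).
    """
    touched = []
    done = 0
    for fraction in waves:
        # Cumulative number of hosts that should be covered after this wave.
        target = max(1, int(len(hosts) * fraction))
        wave = hosts[done:target]
        if not wave:
            continue
        failures = sum(1 for host in wave if not install(host))
        touched.extend(wave)
        done = target
        if failures / len(wave) > max_failure_rate:
            return touched, True  # stop before the next, larger wave
    return touched, False
```

With 1,000 hosts and an update that fails everywhere, only the first wave of 10 hosts is touched before the rollout stops. The cost is the time needed to observe one small wave, which is the trade-off against delivery speed raised above.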
@alexedelweiss3267 2 months ago
I just wonder what were they doing with all the money they receive, because it's a really expensive product... So expensive that it's unaffordable for most companies.
@drankthetranquil 2 months ago
This was a great video and I really appreciated the deep dive and discussion. This is almost just... unbelievable haha
@HarithBK 2 months ago
What I find shocking is that when all these companies purchased CrowdStrike's services, they didn't bother asking the sales team basic operational questions that wouldn't be secret. The IT people at these companies also need to be reamed for their incompetence in not doing basic checks on the software they order.
@GackFinder 2 months ago
Agreed. This responsibility largely falls on the CIOs and CTOs of the companies that bought into ClownStrike. But... I've been an IT consultant for some 20 years now, and I can tell you, CIOs and CTOs are in general extremely incompetent from a technical standpoint. They are usually in the positions they are in because they are good talkers, or because they are friends with the CEO.
@JohnDoe4321 2 months ago
02:31 - WHQL is NOT a "hard certification to get". For a "software driver" like anti-malware, all that really gets tested is that the driver correctly implements the PNP and power management state machines. That isn't easy to get 100% right if you try to write the code from scratch, which is why nobody does that. Everybody either does a copy/paste from one of the Microsoft sample drivers, or uses Microsoft's KMDF framework, which handles it for you. WHQL requires more in-depth testing for known device classes, such as storage, network, and video. These drivers have to pass class-specific tests, which more fully exercise the driver functionality.
@jfbeam 2 months ago
_"It is not code or a kernel driver."_ (fine print) But it _is_ turned into code by our kernel driver, poorly. _"Based on..."_ (an unending one sentence paragraph) Long winded weasel-words for "WE DIDN'T F...ING TEST IT." Something anyone with a single functional brain cell already knew. Had they loaded that "channel update" on a SINGLE machine, they would've seen it crash, and hopefully would not publish it. *This alone is sufficient reason to NEVER do business with Crowdstrike or anyone who's ever driven past their office.* The next paragraph is weasel-word for "we don't check for nulls". A common mistake in userspace, an inexcusable failure in kernel programming.
@2rx_bni 2 months ago
Glad to catch the beginning of this since I showed up to stream late and only caught part of it.
@frankroos1167 2 months ago
Erm....as a developer I am used to one thing: the least we do before releasing an update package is test the installation of the package, exactly as it will be released, in a safe environment. Obviously, that test is missing. Which is mind-boggling to me. In the almost 30 years of my career, I have not worked for any company that didn't have that test. And the companies I worked for are not all big companies. Some were as small as a company built around a team of 4 developers. It is so basic. And it was so everywhere I have been. I can't even imagine a company that doesn't have it. And now there is such a company that is this powerful? I am scared. Really scared. As far as I am concerned, CrowdStrike should go bust, just for having that test missing. I don't want them to improve. I just want them gone. If you make a mistake like this, you are bound to make more. Question for MS: why certify a driver that has these shortcomings? Null checks and verification of the unsigned files being loaded are needed and missing, and in most code that shouldn't be too hard to see.
@ytlongbeach 2 months ago
Theo, thanks for putting this out there. 99% of people would never understand what you laid out, without such an incredible run down on the CrowdStrike failures. This video should be required viewing for all large company executives and the entire US congress. kudos !!
@tma2001 2 months ago
except he was wrong on the all zeros channel file - that turned out to be a red herring but it was initially very confusing because not everyone had an all zeros file. CrowdStrike made this clear in their initial blog post (not the more detailed one in this video).
@mchisolm0 2 months ago
I did not realize the assumptions I made when learning about this...thanks for taking the time. The reality of getting out of the signing of code by moving the code to another file...crazy. Could Windows have caught that in the signature step? Could they have been like "Hey, it looks like you are running code in this channel file over there...but you didn't tell us what the content of the channel file was. You think we can sign off on that?"
@tablettablete186 2 months ago
24:05 That actually makes sense, otherwise the software wouldn't be able to monitor things during boot. Like this:
-> OS starts
-> Driver loads (but does not parse the rules)
-> Malware runs invisibly
-> Driver parses the rules and starts monitoring the machine
It's pretty common to run at boot and start monitoring as early as possible.
@wulf2121 2 months ago
There's one small misunderstanding in the beginning on your side: WHQL certification is not (just) about using Windows Update to push drivers. All the newest drivers that can be downloaded from Nvidia's website are WHQL certified before Nvidia makes them available as well. Basically any device driver running in the kernel needs to be certified, or Windows will complain upon installation. (In the Home version nowadays, installing such a driver is completely blocked, and only with a Pro license do you even have the option to override this. In the days of Windows XP there was still the option to ignore the warning from Windows and install the driver anyway.)
@ForLineage-dr5ju 2 months ago
The channel file is probably not being signed. That's why it tried to run a bunch of zeros. Unless somehow a bunch of zeroes got signed, the change was never caught with a diff...and no further testing was done. Just pushed out to everything, everywhere all at once. They could have tried it on a work laptop in the office. I bet they don't have the ability to even push to one targeted test machine. I bet their options are "Push to Airlines or push to hospitals for testing".
@shishsquared 2 months ago
The best part is its actually not a "binary file" if it's all zeroes.
@nibblrrr7124 2 months ago
you don't even have to open the unary file, just stat it to get the length and you have all the information encoded in it. GENIUS!
@the-answer-is-42 2 months ago
@@nibblrrr7124 It's the best optimization! No parsing of whatever is in the file is needed! Just make sure some of your zeros are more zero than other zeros and I promise, it will work perfectly!
@tma2001 2 months ago
That turned out to be a red herring and nothing to do with the crash, as CrowdStrike clarified in an initial post. Not everyone had an all zeros channel file - valid files have a magic byte signature of 0xaaaaaaa at the beginning. Disassembly of the actual kernel driver shows this is checked for. Nor does a channel file contain code. Experienced hackers have speculated that channel files are initially preallocated and only updated after the temporary downloaded file is parsed correctly, but of course the kernel driver crashed and left it in this state. This makes sense of all the facts. The actual null ptr dereference was due to a faulty field in the channel file that led to a memory allocation error from the non-paged memory pool; either the allocation was the incorrect size or it was misaligned. The BSOD exception is a non-paged access fault. The rest is history, with the lack of staged rollouts of this hotfix.
2 ай бұрын
@@tma2001 That makes so much sense! Does anyone have a copy of the original file? It would be great to see that instead of information based on misinformation from a Twitter post (the one about the file being all zeros that went viral).
@tma20012 ай бұрын
CrowdStrike published a Tech Analysis: 'Channel File May Contain Null Bytes' explanation at the time of the outage which I and everyone else missed which finally clears up that mystery! My intuition was correct but it is Windows itself rather than the CS driver that first erases data of disk sectors for a newly allocated file for security reasons. Writes don't occur until file is flushed by Cache Manager which doesn't happen if the driver crashes with a BSOD.
@wlockuz44672 ай бұрын
The incident was an accident, but the gift cards were planned, approved and handed out with an intent. This just goes to show that they care more about saving face than helping their customers. Their service costs upwards of $180 per device per month. If you take a very conservative example of a company where 100 devices need Crowdstrike, that's $18k a month. The $10 gift card means absolutely nothing in comparison, if anything it's an insult, a f*** you. If they really want to show that they care, they should waive their service charges for a few months for the affected customers. Anything less would be an insult. Last but not least, this is just wishful thinking, but this should be treated as a cyberattack with Crowdstrike brought to court and held accountable.
@daveh02 ай бұрын
It's absurd to say auto software updates is the cause. This is caused by local admin/root on thousands of machines. Allowing them to make changes everywhere at that OS level is the problem.
@dand44852 ай бұрын
I doubt the DLLs are signed. When I worked at Microsoft back in the day, when Nimda and Slammer happened and all of us were supposed to drop everything and beef up security... I opened a bug against Windows suggesting that, just as .NET has a manifest file for dependencies, it would be a great idea for native apps to have a type of manifest file, much as Theo kind of commented on briefly. The other thing, and why I would call this an outright bug: Windows typically has an option for WinoZe updates of "Auto", "Download but let me choose when to install..." Nothing, I'd argue, should ever have free access to simply update and brick my machine... The fact Crowdstrike simply updates without user intervention.... Bad, really bad...! Oh yeah, have to hold Microsoft accountable for getting rid of testers at Microsoft... And a $10 reimbursement for what? Wonder if for Delta they can only make one claim for all 5600 flights?
@jeffwells6412 ай бұрын
I'm pretty sure the REAL DLLS are signed. Pretty much all DLLs are signed these days, it takes 10 seconds to do. However, their deployment somehow turned all the nice, meaningful 1's and 0's to all 0's. There's no way this was anything but a deployment error. However, why the hell doesn't the driver check for valid input before running? It should at LEAST be checking that the file it's loading is the correct file, and hasn't been swapped with a malicious copy. That alone would have stopped this, because there's no way the embedded certificates validate when the entire file is null. But not even a hash check? The lack of a rolling rollout is the biggest failure, because that's supposed to be your last line of defense. If they had that, maybe a few thousand systems crash. Maybe even 50,000 crash. But millions don't crash. You don't take down whole industries for 5 days.
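For what "not even a hash check" could look like, here is a minimal sketch. It assumes the expected SHA-256 digest arrives through a separate, trusted channel (for example a signed manifest); `verify_content` is a hypothetical helper, and a bare hash only detects corruption, it is not a substitute for a real signature.

```python
import hashlib
import hmac


def verify_content(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if the payload matches the digest we were told to expect."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

A loader would call this before handing the file to the driver and keep the previous content if it returns False.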
@autohmae2 ай бұрын
Let's be very clear: this wasn't Microsoft's fault, their issue is allowing this driver to exist (a driver that loads logic from somewhere else). Everything else is ALL on Crowdstrike. Well, no.. maybe, if Microsoft allowed this driver to load external logic that isn't signed, that's also a failure on their part. Assuming the all zeros or similar is correct. I wonder if the signature checking code was the part that crashed, because the blog post says a lot of nothing. That would actually have been maybe the best outcome. And it still is a total failure on all counts from Clownstrike
@dand44852 ай бұрын
@@autohmae I would argue it is, by way of allowing a turd party into ring 0, or the low level kernel. No argument that the coding error is Crowdstrike's mess up. I'm looking more at the allowance that any company could bring down Windows as it did... If memory serves me, for a company to get a DLL to run in the kernel, though, it needs permissions from Windows, so the blame kind of goes right back to Windows...
@autohmae2 ай бұрын
@@dand4485 well, they have legitimate reasons for drivers to exist. You can't fully prevent drivers from causing problems. The kinds of drivers you usually see as boot-start drivers are storage drivers: the BIOS/firmware loads the OS from disk and gives the OS some ways to read some more data, and then the OS should be able to talk to the storage itself to get optimal performance, so Windows (like any OS) will not boot further if it can't read the disk it is stored on. So this function _needs_ to exist. Also, in kernel space there are not as many protections, and there can't be, for performance reasons. Actually MS tried to create a new API for virus scanners, but it turns out the EU blocked them, because the EU was worried that MS would prevent other companies from getting access to some parts of the system that they need to prevent viruses from getting access, or would give other vendors less performance than Microsoft's own virus scanner software. This is actually something MS has done before, not for virus scanning but for other software from MS. So it makes sense the EU blocked it.
@Deniil20002 ай бұрын
@@jeffwells641 if the file somehow turned into all-zero bytes, that means the signature is also gone. Which means that it shouldn't have been run in the first place, if proper signing was in place
@kamilkardel27922 ай бұрын
Giving control of deployment to the customer is actually a smart move because, as opposed to the vendor, the customer knows the exact roles of specific machines. With that control, you can make sure that a bad update doesn't take down all servers running a specific application or all user workstations in some department.
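A rough sketch of what customer-controlled staging could look like, assuming the customer groups hosts into ordered rings and supplies its own deploy and health-check callbacks. `Ring`, `staged_rollout` and the callbacks are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ring:
    name: str
    hosts: list[str]


def staged_rollout(
    rings: list[Ring],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> tuple[list[str], Optional[str]]:
    """Update ring by ring; stop as soon as a ring exceeds the failure budget.

    Returns the hosts that were touched and the name of the ring that
    halted the rollout (None if every ring passed).
    """
    updated: list[str] = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            deploy(host)
            updated.append(host)
            if not healthy(host):
                failures += 1
        if ring.hosts and failures / len(ring.hosts) > max_failure_rate:
            return updated, ring.name
    return updated, None
```

With a canary ring of a few non-critical machines first, a bad update stops there instead of reaching every server behind the same application.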
@tma20012 ай бұрын
Not everyone had an all zeros channel file which leads to more confusion. CrowdStrike if we believe them, made it clear that null bytes in this or any channel file were not at issue. This is lent weight by the fact that valid channel files have a magic byte signature which disassembly confirms is checked for. On a hacker forum some speculated the file was pre-allocated but not updated from the original download after the driver crashed. So all zero channel files are probably a red herring.
@gregutz42842 ай бұрын
I don't believe a lick of what CS is pushing.
@ambhaiji2 ай бұрын
Theo is Gus from Psych blasting at CrowdStrike(Shawn) for all the alternative choices he has in handling how to deal with the escaped prisoners on the boat.
@landmanland2 ай бұрын
With gracefully they mean “we let the driver boot without the file”.😊
@gabotron942 ай бұрын
File not found. Improvise? (y/n)
@debasishraychawdhuri2 ай бұрын
Imagine Tesla pushes an update that makes a sudden left turn at random times.
@kingofichigo2 ай бұрын
I wouldn't be surprised tbh
@EscapeMCP2 ай бұрын
@@kingofichigo Yeah, seems immeasurably possible
@MichaelSchuerig2 ай бұрын
I don't have any special insights into the situation, but my hunch is that the fault occurred in a place where nobody thought it could happen. Let's assume the "content update" is checked and perfectly valid. This update needs to be provided to clients by some service. That service could simply be buggy and misbehave for various reasons that results in corrupt data being transferred. Of course, this risk could be mitigated by validation on the clients and rolling releases.
@nu11pointerexception2 ай бұрын
CrowdStrike Blue Falcon is more fitting name after bricking millions of computers
@shotgeek2 ай бұрын
I was admitted to a local hospital the day before this. On Friday almost every hospital system (imaging, laboratory, even food service) was down.
@dondekeeper29432 ай бұрын
Crowdstrike is now officially a ransomware service provider 🤣
@JohnDoe43212 ай бұрын
06:35 - This is wrong. Drivers for actual hardware execute different code paths based on what they receive from the device. This isn't that different than Crowdstrike's driver executing different code based on a definition file. Any driver that doesn't fully validate all data can crash. Real world example: There was a network card (NIC) which had its own firmware. The vendor updated the firmware, and soon after that, reports started to come in about Windows driver crashes. Crashes were infrequent and intermittent -- it took a while to root cause. It turned out that the new firmware would sometimes send data to the driver that the driver didn't interpret properly. This wasn't a failure of the WHQL process. It was a bug, and Bugs Happen. WHQL is intended to reduce the number of driver bugs -- it never promised to eliminate 100% of them.
@Flameboar2 ай бұрын
There is an interview on KZbin of the Clownstrike CEO after the BSOD disaster. He was blinking at a very high rate and coughed as he tried to answer the interviewers' questions. This gave me the impression that he was fully aware of how badly CS had f'ed up.
@RichardHennigan2 ай бұрын
Even a non tech person would come up with the idea of rolling releases to make sure things were working. There MUST have been multiple people in the company pointing out this dangerous flaw.
@wlockuz44672 ай бұрын
It doesn't even have to be something as complex as rolling releases. Something this severe would be easily caught if they actually tried to run their production build.
@JonathanSwiftUK2 ай бұрын
The kernel mode driver did not crash the system, as such, it only faulted when the channel file with zeros was executed / attempted to be used. We fixed most by deleting the file remotely, no safe mode, no moving the drive to another machine and deleting the file, then moving the drive back, simple and quick, and when we deleted that file the system continued to boot. Job done. Just removing that fixed it. They probably have their own definitions language or pseudo-code, like WASM.
@jeremysollars59222 ай бұрын
Any sufficiently advanced incompetence, is indistinguishable from malice. I've been getting a lot of mileage from this quote recently xD
@Calphool2222 ай бұрын
@Theo: Here's another thing they could have done to prevent this: implement a simple smoke test into their CI/CD pipeline. Basically, no matter what you're releasing, when you want to deploy something, first it gets sent to a newly built Windows image running Crowdstrike Falcon from your CI/CD pipeline, and a reboot command is issued to Windows. When Windows boots and hands control to user land, you have a user-land component of Crowdstrike Falcon that runs and does an "all is well" phone home (using one of the half a dozen different ways you can do this in Windows). That way, when you push your release, if your CI/CD pipeline doesn't see the "all is well," you KNOW that whatever you just pushed BRICKED THE OS. Without a basic smoke test like that, they should literally PRESUME that they are bricking everyone's machine with every single release! WHAT THE HELL IS GOING ON WITH THEIR CI/CD PIPELINE DESIGN?! It apparently isn't built even the tiniest bit defensively. You ALWAYS design your test suite so that it presumes things are broken, and then slowly convinces itself that it was wrong. That's just basic quality assurance. AND THIS IS A SECURITY COMPANY! Their press release reads like some engineer was required to document exactly what happened as part of an interrogation from management, and then management, marketing, and public relations "lawyered the shit" out of what they were told, quite likely because they don't have a clue what the engineer was saying, which is really the tip of the cultural iceberg that the entire planet crashed into last week. CLEARLY Crowdstrike's marketing department is funded *dramatically* better than the engineering team working on Falcon.
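The smoke test described above can be sketched as a single pipeline gate. This assumes the pipeline already has some way to install the update on a fresh test VM, reboot it, and wait for an "all is well" heartbeat; the three callbacks and `release_allowed` are hypothetical stand-ins, not a real CI system's API.

```python
from typing import Callable


def release_allowed(
    install_update: Callable[[], None],
    reboot: Callable[[], None],
    wait_for_heartbeat: Callable[[float], bool],
    timeout_s: float = 300.0,
) -> bool:
    """Presume the release is broken until the rebooted VM phones home."""
    try:
        install_update()
        reboot()
        # Only an explicit True from the heartbeat check unblocks the release.
        return wait_for_heartbeat(timeout_s) is True
    except Exception:
        # Any failure on the way (VM never came back, install error) blocks it.
        return False
```

The default answer is "no": a timeout, an exception or a missing heartbeat all block the push, which is the "presume it is broken" stance from the comment.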
@forivall2 ай бұрын
My first job out of university was at Fortinet. Security companies are not the towering fortresses of bulletproof code, there's tons of sketchy shit everywhere
@CapsAdmin2 ай бұрын
Did they ship zeros or did their "content update" become zeros right before testing and shipping? Obviously the receiving end should also have some sort of validation other than "graceful" runtime exception handling.
@tma20012 ай бұрын
the all zeros channel file is a complete red herring and Theo keeps propagating it and coming to the wrong conclusion - the initial reports about this were rightly causing a lot of confusion. However not all customers had an all zeros channel file - CrowdStrike clarified this was not the issue and valid files have a magic byte signature at the start which is checked for as disassembly of the kernel driver has confirmed.
@itsTyrion2 ай бұрын
@@tma2001 let's remove that part then, most points still stand as-is or slightly altered.
@tma20012 ай бұрын
@@itsTyrion agreed - CrowdStrike actually tell us in this press release what the issue was and it's a shocking admission. One word in that press release gives the game away - _trust_ . The 2 previous named pipe template instances had worked in production as expected so were now trusted - they _assumed_ another slight variant of the same kind wouldn't cause any problems. And as a rapid response update is not subject to policy staging controls, the rest is history. Basically complacency led to their downfall.
@stevengill17362 ай бұрын
My question would be, can Crowdstrike even survive this incident? If I was an IT manager, other options would be of interest right now...
@hsvandrew2 ай бұрын
What makes me so mad about these failures (as a software engineer) is that unlike in other industries (i.e. construction), there will be no punishment for this negligence and misconduct. If the CTO was a construction engineer, jail time would be associated with this deliberate incompetence. Yet in IT, negligence is just allowed despite the financial impact or lives affected. It is about time this changed so that good people doing the right thing are rewarded, and those that aren't get punished. Crowdstrike's share price before this incident was so high, they effectively had 'unlimited money'. The fact they were able to operate like a part time developer is just insane. This is also a test for corporate America and the CTOs who signed up for this product without any due diligence. Will they actually remove this product & company from their systems or by lack of action promote this type of behaviour as acceptable?
@Datalata2 ай бұрын
That was majestic. We all need a vacation. Oh wait…..
@MikeU1282 ай бұрын
LOL, as I watched this, I was thinking "This is like the Swiss cheese model, but way, way worse!"... and then, at the end, you came to the exact same conclusion!
@phillipsusi17912 ай бұрын
A rolling release means you get the latest stuff right away all the time. That is exactly what they did. As opposed to a traditional release, which is tested thoroughly in development, allowed to settle with more testing, used by early adopters for a while, and THEN recommended for everyone else to use, but they all still have to choose to install the new release, when they feel like it.
@marknn32 ай бұрын
I wonder if CrowdStrike is using the new Microsoft Dev Drives file system to create the content update files. I had experienced (and reported to MS) files getting all zero content. It was related to Dev Drives + virus scanner (f-secure derivative). After uninstalling it and switching to Defender, no problems since.
@MacGuffin12 ай бұрын
This is very surface level look. You should see Enderman's recent video on Dave. This attack vector was well known and it was almost certainly an attack that was about money, just like the denial of it being an attack was too. I'm sick of telling everyone about the WSB post where the same guy made millions on this AND snowflake by shorting them hours before they were hacked. How can you red-team something that doesn't boot? not to mention hashing, manifests etc. Have a look at the cobalt module it was actually patching for, it would have required a lot of testing. As if they shipped this willingly on a friday night... Anyhoo they are still the best choice in this area, all these guys have issues
@protocol62 ай бұрын
The closest web analogy might be if your ad block vendor pushed a rule that caused your browser to crash on any page load. Normally, rules shouldn't be able to do that but the logic they control could trigger a bug in the extension or the browser itself. Something that might never occur without a very specially crafted rule. Hopefully ad block vendors do some kind of basic testing on their rulesets before pushing them out to everyone because an extension suddenly crashing every browser on the planet on launch could be more than a little annoying. ;P
@WooShell2 ай бұрын
The Uber issue is just the icing on the cake. It would have taken a single email or phone call to some Uber rep "hey, we're going to send out this voucher to all of our customers, it's going to be a lot.", but they didn't, so Uber saw the same voucher code being redeemed thousands of times in an hour and thought some company code had leaked to the internet and invalidated it.
@piccalillipit92112 ай бұрын
*SIMPLE FACT OF LIFE - NO ONE IS GOOD AT THEIR JOB* We like to think we are, we hope others are, but in reality, we are at best less incompetent than most others and we call that "GOOD" - we didn't evolve to do constant work like a machine and we are really BAD at it. The guy hand building a car in a small team start to finish HE is truly GOOD at his job, the guy coding a product he has no interest in for a giant company - NOT GOOD.
@DaxSudo2 ай бұрын
It’s like the rust neon crate. It’s not part of JavaScript or your react ecosystem or your node stuff but it is a node binary that exports functions. Oh, we’re just going to ship a bad binary that crashes everything. But even that’s a misnomer because the binary is configured to run in the node run time rather than be an obfuscated configuration file.
@JonathanSwiftUK2 ай бұрын
That's absurd, of course we need automatic updates, just phased within an organisation, and across the world, with much better quality control, and auto recovery - auto recovery would have fixed this. It is not the signed driver, it is the extra code imported from the channel file; those files likely contain pseudo-code. The signed driver in Windows is just an engine to run other code. This is a very good article and makes it much more clear.
@zaeranos2 ай бұрын
The article reads like a US judicial cover-our-bases response. 😡 To the casual public it may seem as if they are remorseful for what happened. CrowdStrike counts on the fact they may accept their compensation. 😤 However if they were sued in court by individuals or other companies, CrowdStrike can still claim they aren't responsible and they have no blame. That the end user or any of the middleman companies, such as hospitals, airports or even Microsoft, can be blamed for damages, due to their complicit neglect using the software and installing the update. Making it a hard, long and expensive legal battle for lawyers. 😢 This response is unfortunately not their PR failing, but their PR and legal department running on all their best cylinders. And I hate it 🤬
@Henoik2 ай бұрын
As a cyber security professional, what freaks me out about this is that if a threat actor were to reverse engineer the driver, they could basically inject instructions to the driver that turn the Falcon agent into a C2 client. Also, just about every EDR solution and antimalware solution does this. The "You need to restart your PC" screen you get after installing an antimalware solution? Yup, that's the antimalware solution letting you know Windows needs to start all the new drivers you just installed.
@silvioschurig7492 ай бұрын
You kind of ignore that this is running in kernel space. Trying to gracefully recover from a fatal kernel error is riskier than an immediate shutdown because it is not just some application process that failed. Basically all you know at that point is: your kernel is corrupt. Also a huge issue, and not just with this company: the terms of service. Crowdstrike basically puts a disclaimer in there that their software / services guarantee nothing, are not suited for production environments and they can't be held liable for anything they do or cause, directly or indirectly. This next statement is victim blaming, and terms of service like this seem to be standard for these 'security' companies ... But really: why does anyone sign license / service agreements with a company who cannot guarantee any level of function for their products? I mean if you hire some company like Securitas to handle the cash collection from your points of sale, would you sign terms of service stating "we may or may not come by and pick up your cash. If we do come by we may lose some or all of it prior to depositing it to your bank. You are happy to assume that risk."? But for software on your production systems that's ok?
@grokitall2 ай бұрын
you cannot recover from the initial crash, but you can recover from the boot loop which kept all the machines down. as to the disclaimer, leonard french covered it on youtube, and it is invalid in the case of negligence, which this clearly is.
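One way to picture boot-loop recovery is a counter of boots that were started but never confirmed. This is only a sketch under assumptions: `state` stands in for a small record that would have to be written to disk and flushed before the new content is loaded, and the names and threshold are hypothetical.

```python
# Hypothetical threshold: how many unconfirmed boots we tolerate.
MAX_UNCONFIRMED_BOOTS = 3


def on_boot_start(state: dict) -> str:
    """Run before loading content; decide which content set to use."""
    # In a real system this increment must be persisted BEFORE loading
    # anything, otherwise a crash would lose the count.
    state["unconfirmed_boots"] = state.get("unconfirmed_boots", 0) + 1
    if state["unconfirmed_boots"] > MAX_UNCONFIRMED_BOOTS:
        # Several boots in a row never reached the confirmation point:
        # fall back instead of crashing again.
        return "last_known_good"
    return "latest"


def on_boot_confirmed(state: dict) -> None:
    """Run once the system is up and stable; resets the counter."""
    state["unconfirmed_boots"] = 0
```

The first crash still happens, but after a few failed attempts the machine comes back up on the previous content instead of looping until someone deletes a file by hand.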
@CC212002 ай бұрын
I've been saying that I have personally permanently lost more time and data due to automatic updates than from all other malware combined, and that's considering that I often go to dodgy websites. Are you sure the $10 voucher thing is real? Because it sounds fake... as in I think we're beyond incompetence here, and that you'd have to be a true sociopath to make that offer, or maybe someone who's deliberately trying to tank the company. Can't believe I'm saying that but I'm starting to seriously think the response is more damaging than the crash itself.
@ward62382 ай бұрын
"Suggests a level of ineptitude that I am struggling to fathom." Yep...
@Wampa8422 ай бұрын
Crowdstrike (noun): a large-scale outage or damage of infrastructure caused by the combination of an external vendor's action and industry-wide reliance on said vendor. Example: "The latest crowdstrike has caused a massive disruption of emergency services." Crowdstrike (verb, simple past and past participle "crowdstruck"): to cause such an outage to happen. Example: "Our boss moved our email service off-site and now we can't access it because of the outage, I knew he'd be the one to crowdstrike us!"
@jordanjackson61512 ай бұрын
The ending said it all. Yikes!
@RemotHuman2 ай бұрын
they did the same thing with the driver that you advocate doing with react native apps (of course if a react native app crashes it doesn't break your whole device, and its probably not as important for security as crowdstrike, but also like you said quick updates are important for an antivirus thing like crowdstrike)
@AayuVanced2 ай бұрын
Totally disagree with your statement of it being "not a content update but a code update". Unlike how other software is deployed, usually "content" is signatures or behavior that the driver can use to identify potential malicious threats, and unlike a feature update to any software, there can be hundreds of these threat information updates daily. Security companies collaborate with each other and share their threat intelligence to help mitigate potential threats and attacks. Hence a lot of these "content updates" are dynamically created and updated onto a global repository, from where the endpoints keep updating themselves periodically. I agree with the fact that crowdstrike should have done better input validation on the content, but the use of the content itself isn't a "hacky way to update the driver". Think of it like configurations. You don't call the act of configuring a firewall via options "a hacky way to change firewall behavior".
@elmomertens2 ай бұрын
It's ironic that they are named CrowdStrike. It's like a movie/videogame villain supercorp being named KillPeople Corp
@sanderd172 ай бұрын
The driver is an interpreter or engine for the "content updates". It doesn't matter whether the content is seen as code or data, but the engine shouldn't crash on it. And it should be verified it can't crash on any malicious data. We live in the age of the web, where we get "content updates" from all kinds of random servers as webpages. These also combine data and code, but of a lot more complex nature. The engines, known as web browsers, have to interpret and execute this content. But when malicious content is able to crash the browser, or even worse, use the browser to hack into the system, it's the browser that will get the blame (rightfully so). The fact that this is possible for free web browsers, but that the driver of crowdstrike can't protect against an accidental file with all zeroes, is unbelievably bad.
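To make "the engine shouldn't crash on it" concrete, here is a toy rule evaluator that treats every part of a rule as untrusted. The rule shape (`field_index`, `equals`) and `evaluate_rule` are invented for illustration and say nothing about how the real driver represents its content.

```python
def evaluate_rule(rule: dict, inputs: list[str]) -> bool:
    """Return True if the rule matches; a malformed rule never matches."""
    index = rule.get("field_index")
    expected = rule.get("equals")
    # Type checks first: bool is excluded because it is a subclass of int.
    if not isinstance(index, int) or isinstance(index, bool):
        return False
    if not isinstance(expected, str):
        return False
    # Bounds check before indexing; negative indexes are rejected too.
    if not 0 <= index < len(inputs):
        return False
    return inputs[index] == expected
```

A rule that points at a field the engine was never given simply does not match, which is the browser-style behaviour the comment is asking for.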
@cleverlyblonde2 ай бұрын
They previously had an update that caused Linux hosts problems as well. Redhat put out an advisory on it for example. There are multiple discussions on it.
@2rx_bni2 ай бұрын
Heads up, Texas is actually TWO hours ahead of California so like...even worse D: I really do not understand how they managed to handle all of this so poorly but I think the rot at the top might be to blame. It feels like they're doing a sales pitch still instead of apologizing in that "explanation" post...
@temp502 ай бұрын
19:23 Omg indeed. We - at the company I'm working for - are not creating a software which could take down millions of machines worldwide but we are executing _more_ tests than they are just _planning_ to execute. :D