This is one of those times when if you wrote it in a movie, everyone would say "That's so unrealistic. No company could be THAT incompetent without going out of business."
@andrasfogarasi5014 5 months ago
Well, any business can demonstrate incompetence at least once before failing.
@pyrouscomments 5 months ago
We don't know that they won't, though.
@akam9919 5 months ago
...well, this is crowdstrike going out of business.
@RipVanFish09 5 months ago
I hope they go out of business because clearly they can’t be trusted. (Also, this has happened twice before, just on operating systems that the entire world doesn’t run on.) The amount of negligence is dumbfounding.
@2rx_bni 5 months ago
Real life is so much wackier than fiction.
@kevinletterer4171 5 months ago
The 10 dollar gift card is a trap: if you accept it, they can claim that they have already compensated you for any damages and that you agreed to it.
@ryanquinn1257 5 months ago
My thoughts exactly. Had this happen MANY times from a brief summer into Xmas at Costco. They’d offer some compensation, usually $15-50 gift cards, but if you accepted those you weren’t part of the class action for that wrongdoing later. Same for insurance companies offering a lowball "but you get money now" amount vs. waiting and making sure you’re all good physically after a crash.
@earchinmc1080 5 months ago
They did not offer 10 dollar gift cards to their customers (aka the people affected). They were handing them out to sellers of the software.
@sad_man_no_talent 5 months ago
but that's 10 dollar man u gotta understand
@klam77 5 months ago
shysters! What a slick move! OMG. S/W TORT NOW.
@SammmN 5 months ago
The $10 is just insulting. I was in the fire department when it happened and NOTHING was working. The firefighters had to use the radios instead of computers. My friend was working in the hospital and computers were down.
@arcanernz 5 months ago
It’s not CrowdStrike’s fault: the update worked on Dan’s computer; the millions of other computers were just edge cases.
@autohmae 5 months ago
If they haven't signed the 'content', this is probably the worst failure of all, for a _security_ company, because they are creating a HUGE security issue.
@desertdude540 5 months ago
The craziest thing is that the CEO of Clownstrike was CTO at McAfee when they pushed the update that thought svchost.exe was malware and nuked millions of Windows XP installations.
@PatNeedhamUSA 5 months ago
I find it kind of hilarious that a third or half of their PIR document is marketing material.
This company is named 'Clownstrike' for the rest of time.
@GreyDeathVaccine 5 months ago
Yep, until the Sun swells and swallows the Earth.
@theapexsurvivor9538 5 months ago
I mean, I'd almost argue that they earned cloudstrike, as it's basically a cloud software and it was more effective than a LOIC...
@ManaVkum-w9b 5 months ago
One of the most comprehensive and detailed explanations of the shit that hit the fan... in an interesting and funny manner, with just enough of a dash of frustration. This incident proves we are living on the brink... and a domino effect can happen ANY time.
@DuraanAli 5 months ago
I have a small software business, it's not even my full-time job, and I don't ship updates unless they go through unit tests, staging and behavior monitoring with applications like Sentry and Hotjar! It's so weird to see a huge company like that just pushing updates, even if they're urgent.
@hugazo 5 months ago
So tl;dr: so many avoidable errors that exploded in the most spectacular way.
@kyle7023 5 months ago
"Disasters don't just happen, they're a chain of critical events"
@stevengill1736 5 months ago
The holes in the Swiss cheese line up. ;*[}
@Canleaf08 5 months ago
@@stevengill1736 Mentour Pilot vibes.
@robotorfeed6806 5 months ago
painfully relatable
@luketurner314 5 months ago
Aviation has the Swiss cheese model. ClowdStrike had the NO cheese model.
@starnumber12046 5 months ago
As an avgeek, this comment is approved
@stevengill1736 5 months ago
My sentiments as well...
@halbeik 5 months ago
A smoke test would've caught this at any point in the pipeline.
@MeriaDuck 5 months ago
Should have indeed.
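For readers unfamiliar with the term, a smoke test is just "load the thing once somewhere disposable and see whether it falls over". A minimal sketch of gating a release on one follows; `fake_loader`, `release` and `smoke_test` are hypothetical names invented for illustration and say nothing about CrowdStrike's actual pipeline.

```python
from typing import Callable, List


def smoke_test(load_update: Callable[[bytes], None], update: bytes) -> bool:
    """Load the update once on a disposable host; True means it survived."""
    try:
        load_update(update)  # stand-in for "install on one real machine and boot it"
    except Exception as exc:
        print(f"smoke test failed: {exc!r}")
        return False
    return True


def release(update: bytes,
            load_update: Callable[[bytes], None],
            publish: Callable[[bytes], None]) -> bool:
    """Publish only if the smoke test passes; returns whether it shipped."""
    if not smoke_test(load_update, update):
        return False
    publish(update)
    return True


def fake_loader(update: bytes) -> None:
    """Hypothetical loader that chokes on empty content, like a crashing driver."""
    if not update:
        raise ValueError("empty update")


shipped: List[bytes] = []
release(b"", fake_loader, shipped.append)        # blocked by the smoke test
release(b"rule-1", fake_loader, shipped.append)  # passes and is published
print(shipped)
```

The point of the design is that `publish` is unreachable unless the same bytes have already been loaded once without an error.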
@cabanford 5 months ago
So this update bricked every machine it was pushed to... Screw the Rolling Updates - didn't they test this on even a "single" internal machine before their Friday push??? WTF. Managers' heads (CEO)
@shimirel 5 months ago
Given their list had "local developer testing" we both know the answer ;-)
@coopercummings8370 5 months ago
It wasn't just a Friday push, it was a 10pm Friday push.
@cabanford 5 months ago
@@coopercummings8370 Sadly, it was 24 hours of Friday around the world (not just local dev time 🤣🤦)
@coopercummings8370 5 months ago
@@cabanford Local dev time absolutely matters in this case, though. If they had pushed their completely untested changes Friday morning, they still would have had people in the office to notice they fucked up and roll it back before it blew up as badly as it did, but when they pushed it at 10pm, that was probably the last guy in the office, and he left immediately after pushing the patch.
@OhhCrapGuy 5 months ago
The ABCs of running a company like Crowdstrike:
Airline: We forgot to check if we put any fuel in the plane, or if the engines worked, or if they were attached, but at least there was a pilot in the plane before takeoff. And it's a good thing that Southwest pilot was on our flight.
Bar tending: We have an excellent bar, high quality glassware, exquisite lighting, a phenomenal sound system, and only the best point-of-sale system available. Let's open the doors, it's time for the grand opening. ... wait, we have to buy the booze? I thought whenever a bartender came in, they'd bring the booze with them! What do you mean "hire a bartender"?
Car manufacturing: Look, we made sure that all the wheels were firmly attached to the vehicle. Sure, one was bolted to the hood, another was attached to the middle of the axle, which caused the axle to snap in half, and the last one was put in place of the steering wheel, but we made DAMN sure that it had all 3 wheels attached this time. ...what do you mean "4 wheels"?
@conceptrat 5 months ago
A little bird told me that the CI/CD process was taking too long and maxing out the processors. So they removed some parts of the process (tests?) and then pushed start again. And here we are. Partially.
@OhhCrapGuy 5 months ago
The tests are maxing out all the processors? That sounds like a *failing test* to me. bEtTeR TuRN oFf ThE uNIt TeSTs
@EwanMarshall 5 months ago
Ooh, tests fail, let's remove the tests... yeah, great idea... The thing is, the number of smaller, similar incidents over the last year raises questions about their testing practices before their statements. I think gross negligence can easily be made out here.
@Elesario 5 months ago
Sounds like another company that isn't interested in investing development time into the test and relevant toolchains, because they want to focus on their core product.
@itsTyrion 5 months ago
source
@toastrecon 5 months ago
I don’t think that the PR team is necessarily inept; I’d bet more than a few $10 Uber Eats cards that the statement was wordsmithed for hours by engineers, lawyers, executives and whoever else felt like their neck was on the line. Endless revisions until it said everything and yet nothing.
@TehPwnerer 5 months ago
$10.00 gift cards? A slap in the face would be less insulting.
@SforSamPlays 5 months ago
They also rolled back the gift cards, apparently. Which somehow is worse.
@Space_US 5 months ago
@@SforSamPlays how? did uber revoke them?
@SforSamPlays 5 months ago
@@Space_US it was mentioned at the very end (I didn’t get that far when leaving the comment). Either CrowdStrike went and invalidated any not used (probably due to people being rightfully upset) or Uber Eats did it 'cause they thought it was all fraudulent (since scammers use gift cards to transfer money and stuff).
@BroudbrunMusicMerge 5 months ago
@@Space_US Yes, sorta. Uber detected a possible fraud due to the large number of cards and deactivated them.
@Space_US 5 months ago
@@BroudbrunMusicMerge lmao
@Z3rgatul 5 months ago
WHQL is not about Windows Update. It's about a digital signature from Microsoft. Windows doesn't allow loading unsigned drivers unless you enable debug mode, which was made specifically for driver development. A driver can be downloaded from anywhere, but only Microsoft-signed drivers can be loaded on normal Windows machines.
@Elesario 5 months ago
You can install non-WHQL certified drivers without necessarily being in debug mode. That's what a lot of driver update programs, like the ones likely installed with your graphics card, are doing. Theo does mention this during the video. If you try to manually install a non-WHQL driver you'll probably get a warning, but you can still instruct Windows to install it. Some drivers do get installed by the Windows Update software, and those ones are only ever WHQL certified.
@Z3rgatul 5 months ago
@@Elesario I just went to Device Manager -> Display Adapters -> my RTX GPU -> Properties -> Driver tab. Digital Signer: Microsoft bla-bla-bla. You can click on Driver Details and see that every single file in the list is signed by Microsoft. If you don't believe this information, you can manually check every individual sys/dll/exe file, and the Microsoft signature is there. You can't install unsigned drivers starting from Windows Vista. You have to disable driver signature verification at the system level and reboot the machine. There is no warning that can be bypassed.
@wrfsh 5 months ago
3:54 Hey Theo, you're wrong here about graphics drivers. WHQL is not there just to put you into the next Windows update. It allows the driver to be signed, and the Windows kernel won't allow you to load unsigned drivers (normally, unless you're in debug mode and don't have secure boot enabled). So even if you distribute your driver through your own channel, you still need to get it signed, and for some drivers that means going through WHQL. CS sidestepped it with the channel files because they don't actually load those files as drivers, as I think you correctly point out.
@leexgx 5 months ago
20:20 They rolled out the update on a Friday at 10pm PT (GMT-7 or -8, 5am-ish UK/EU), the best time to brick 9 million systems.
@circuitousprime 5 months ago
To be fair it wasn't a Friday evening. It was Thursday at ~11pm CDT (Texas time), Friday ~5am BST (UK time)
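The local-time arithmetic is easy to check. A small sketch, assuming the 04:09 UTC release time on Friday, 19 July 2024 from CrowdStrike's public statement, and using fixed summer-time offsets (an assumption that holds for July, so no time zone database is needed):

```python
from datetime import datetime, timedelta, timezone

# Release time as publicly stated by CrowdStrike: 04:09 UTC, 19 July 2024.
RELEASE_UTC = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)

# Fixed daylight-saving offsets, assumed valid for mid-July 2024.
ZONES = {
    "PDT (California)": timezone(timedelta(hours=-7)),
    "CDT (Austin, TX)": timezone(timedelta(hours=-5)),
    "BST (UK)": timezone(timedelta(hours=1)),
}


def local_times(moment: datetime) -> dict:
    """Map each zone label to the weekday and wall-clock time of `moment`."""
    return {name: moment.astimezone(tz).strftime("%A %H:%M")
            for name, tz in ZONES.items()}


for name, text in local_times(RELEASE_UTC).items():
    print(f"{name}: {text}")
```

Under those assumptions it comes out as Thursday evening on the US West Coast, late Thursday night in Texas, and early Friday morning in the UK, which matches the correction above.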
@markm1514 5 months ago
The silver lining is that admins who have things inappropriately hosted on Windows are reevaluating their options.
@akam9919 5 months ago
I don't agree with how things are being said that "the crowdstrike update is not a content update". From what I've been told, the signatures are overglorified config files. That being said, this is lower on the stack than I am familiar with. However, as I see it, this is like adding a card to a digital CCG, but all your cards are basically config files that are evaluated when played, performing special actions, etc. Also, let's be self-aware here, this is Windows and a security company. Do you really want to wait a gazillion hours for an emergency update on a weekend, while a cyberattack is going on, to get your driver approved? As far as I can tell, the problem is not that crowdstrike is making these content updates not drivers, the problem is they didn't check for a damn null. I would also say that while crowdstrike's default behavior of "crashing" when it can't read a file is dumb... I would also say it is not totally unreasonable to think it should be the default behavior. I can easily see someone thinking... "hmm... if someone tampered with some stuff to the point where I can't read this damn file, I probably shouldn't let us boot". I can also see this being part of some nested if statement, and bad logic causing problems down the line and causing this behavior. I can also see someone not telling the person who wrote the code that "hey, if some weird stuff can't be read, skip the driver. we don't want to be a DoS vector". That being said, I don't like how crowdstrike is handling this. At the very least, if you're going to be spitting on people's faces with a gift card that can't cover squat, have the damn balls to not give them anything. If you really want to be kind, not only refund/credit your customers and tell us that you are reviewing your internal processes, but fucking apologize! Own your mistake, and show us you will do better.
Also, give all the unthanked IT people who were called away from their friends and family on a weekend because of your mistake a $50 uber eats gift card so they can get themselves a goddamn big mac.
@alexholker1309 5 months ago
The Digital CCG comparison is an apt one since a game like Hearthstone both has run into the same update latency problem (due to needing Google/Apple to sign any updates to their respective mobile clients) and has prompted the company to bypass that process for emergency fixes (implementing urgent nerfs and bans on the "umpire" server without waiting to get the client update approved).
@jbutler8585 5 months ago
Yeah this is horrifying to hear about. The fact that files aren't signed means they are a huge open target for malware. Never mind that it would have prevented this disaster, it means the entire product and its design are so fundamentally flawed you can't be sure it's providing protection. It could be doing the exact opposite of what it's supposed to, holding the door open for attackers to do whatever they want, with no way of knowing that it's gone malicious because the kernel driver is untouched.
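To make "signed content" concrete, here is a rough sketch of an authenticity check before loading a file. It uses an HMAC from the Python standard library purely as a stand-in: a real vendor would more plausibly use an asymmetric signature so that endpoints hold only a public key. `DEMO_KEY` and the function names are invented for illustration.

```python
import hashlib
import hmac
from typing import Optional

# Demo key only (assumption for this sketch). HMAC needs a shared secret;
# a real deployment would use public-key signatures instead.
DEMO_KEY = b"not-a-real-key"


def sign(content: bytes, key: bytes = DEMO_KEY) -> bytes:
    """Produce an authentication tag over the content."""
    return hmac.new(key, content, hashlib.sha256).digest()


def verify(content: bytes, tag: bytes, key: bytes = DEMO_KEY) -> bool:
    """Constant-time comparison of the expected tag with the supplied one."""
    return hmac.compare_digest(sign(content, key), tag)


def load_if_authentic(content: bytes, tag: bytes) -> Optional[bytes]:
    """Return the content only if its tag checks out; otherwise refuse it."""
    if not verify(content, tag):
        return None
    return content


good = b"rule set v2"
tag = sign(good)
print(load_if_authentic(good, tag) is not None)             # authentic content loads
print(load_if_authentic(b"rule set v2 (evil)", tag) is None)  # tampered content is refused
```

Worth keeping in mind: a valid tag only shows who produced the file and that it was not altered. It says nothing about whether the content is well formed, so this check complements content validation rather than replacing it.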
@TheJacrespo 5 months ago
I would say the exception these days is a tech company that works decently by minimal tech and engineering standards. The norm is a fat managerial layer pushing the problems down to the bottom, where the technicians are burning out, with no understanding from the managerial layer above of the nuts and bolts of the implementations.
@teslainvestah5003 5 months ago
I haven't heard any stories of clownstrike deaths yet. I heard one about a woman who permanently lost her breast because she urgently needed a mastectomy, but the hospital couldn't perform the more advanced skin-sparing mastectomy they originally planned because it required blood transfusions that were to be ordered on a windows PC.
@JohnLovell-FTW 5 months ago
If they are a publicly traded company, don't they have to pass compliance audits? OWASP? I wonder what their current score is :). There is no coming back from this as a company, in my opinion. So, what did they actually install? This sounds like vaporware... do the sensors actually do anything?
@gamergirlandco 5 months ago
of course!!! they're very good at taking up disk space :)
@kanbekan 5 months ago
the sensor is also good for farming money from the Fortune 500
@autohmae 5 months ago
Let's just say even Uber thought they are a scam.
@RipVanFish09 5 months ago
Please do a video on that conversation with the math teacher.
@applepie9806 5 months ago
Seconded. He sounds brilliant; I would love to hear his opinion on all this.
@jfbeam 5 months ago
_"It is not code or a kernel driver."_ (fine print) But it _is_ turned into code by our kernel driver, poorly. _"Based on..."_ (an unending one-sentence paragraph) Long-winded weasel words for "WE DIDN'T F...ING TEST IT." Something anyone with a single functional brain cell already knew. Had they loaded that "channel update" on a SINGLE machine, they would've seen it crash, and hopefully would not have published it. *This alone is sufficient reason to NEVER do business with Crowdstrike or anyone who's ever driven past their office.* The next paragraph is weasel words for "we don't check for nulls". A common mistake in userspace, an inexcusable failure in kernel programming.
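As a loose illustration of "check for nulls", here is the defensive pattern in Python, where `None` plays the role of a null pointer. The `fields` table and both function names are hypothetical; the real driver is kernel-mode native code and its data layout is not described in this thread.

```python
from typing import Optional, Sequence


def read_field(fields: Optional[Sequence[str]], index: int) -> Optional[str]:
    """Return fields[index], or None if the table or the entry is missing."""
    if fields is None:                  # the "null pointer" case
        return None
    if not 0 <= index < len(fields):    # the out-of-range case
        return None
    return fields[index]


def evaluate_rule(fields: Optional[Sequence[str]], index: int) -> str:
    """Skip a rule whose input is missing instead of crashing on it."""
    value = read_field(fields, index)
    if value is None:
        return "skipped"
    return f"matched:{value}"


print(evaluate_rule(["a", "b"], 1))  # matched:b
print(evaluate_rule(["a", "b"], 5))  # skipped
print(evaluate_rule(None, 0))        # skipped
```

In user space a missed check usually costs one process. In a kernel driver the same miss takes the whole machine down, which is why the comment above calls it inexcusable there.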
@GaryGreene1977 5 months ago
If I were the CTO of a company that uses this product, the first thing I'd do is call my IT staff in and have them remove this tool from all systems in the company. This is pure incompetence.
@alenasenie6928 5 months ago
From my perspective it is funny that this happened here before it happened with kernel anticheats, which can potentially cause the same thing, but for end users.
@zisaizic4759 5 months ago
it probably has happened, crowdstrike is just very ubiquitous. Delta airlines wouldn't use an anticheat.
@theswordslay3542 5 months ago
Kernel anticheats have already caused some problems, especially intrusive ones like Vanguard. Some people have already reported some of their other drivers shutting down; heck, one of them crashed their entire PC because of it. The difference is, kernel anticheats are isolated to the gaming community, while crowdstrike? It's spread from airline to hospital systems.
@SomeThingOrMaybeAnother 5 months ago
It has happened in kernel anticheats before. They just aren't as widespread.
@alexholker1309 5 months ago
It would be nice if this debacle made kernel-level anticheat too hot to handle.
@ElmerGLue 5 months ago
@@alexholker1309 it won’t. Gamers aren’t corporations, and without anticheats half of online gaming would turn into a wasteland, or devs would need to up the costs of server management, and that would mostly benefit larger game companies and raise the barrier of entry for small creators.
@mattilindstrom 5 months ago
I wonder if ClownStrike felt the update was so urgent a rolling release was clearly out. My feeling is their process is just deeply defective: the testing failed, the roll-out failed, and their response to all of this has failed. If I had any loose money, I'd be shorting their already battered stock so hard.
@autohmae 5 months ago
The blog post seems to confirm there was no rolling release update process at all.
@starnumber12046 5 months ago
Cloudflare once did this, then they DoS'd themselves because it contained catastrophic backtracking.
@mattilindstrom 5 months ago
@@starnumber12046 I remember that, it's what one gets for not understanding how regexes may have horrible scaling built into the way they work. The usual regex libraries are optimized to the hilt with the compilation to a FSM, a pure PEBCAK situation there.
@HarithBK 5 months ago
What I find shocking is that when all these companies purchased CrowdStrike's services, they didn't bother asking the sales team basic operational questions that wouldn't be secret. The IT people at these companies also need to be reamed for their incompetence in not doing basic checks on the software they order.
@GackFinder 5 months ago
Agreed. This responsibility largely falls on the CIOs and CTOs of the companies that bought into ClownStrike. But... I've been an IT consultant for some 20 years now, and I can tell you, CIOs and CTOs are in general extremely incompetent from a technical standpoint. They are usually in the positions they are in because they are good talkers, or because they are friends with the CEO.
@itskdog 5 months ago
I realised only now why there's no apology - the lawyers told them not to, as apologies are taken as admissions of guilt in the courts and will guarantee a defeat, when by not apologising they have a chance, however slim, of minimising the damage.
@zoeherriot 5 months ago
The issue is that at Ring 0 in the Windows kernel, a crash means a BSOD. That's intended behavior. But they clearly didn't validate that the file they loaded had valid data in that driver. Easy mistake to make, easy one to catch before you push it to the world. They had a similar issue with some Linux distros earlier in the year. The "system performance" checking may be related to that. But, there is a pattern.
@MyBlogsTV 5 months ago
And this is why you don't push to prod on Friday.
@nicejungle 5 months ago
With Linux, you can reboot with the previous kernel, it's just a selection menu at start. On Windows, you're screwed because it's a toy OS.
@zoeherriot 5 months ago
@@nicejungle um.. you weren't really screwed any more on Windows than you would have been with Linux. You would still need to physically go to the machine and change the configuration, or in the case of Windows, delete a file.
@nicejungle 5 months ago
@@zoeherriot One major difference: on Linux this can be done in one reboot without any technical knowledge; on Windows, you have to use magic to know which faulty file to delete.
@tonytsai6600 5 months ago
@@nicejungle Yes, Windows sucks, but if this incident had happened on Linux to this extent, it still would have been a disaster; IT would still need to fix these servers physically or remotely, and the whole world would already have been impacted. Why Windows sucks here is that BitLocker hinders you from fixing the issue quickly, but BitLocker is also a good tool to protect your data. So my understanding is that there is no perfect OS when something unexpected happens, and that’s the reason we need to buy insurance.
@maciejtrybilo 5 months ago
Love how they surpassed the "works on my machine" memes by claiming that local developer testing is only going to be introduced now.
@GreyDeathVaccine 5 months ago
They don't know when to shut the fuck up for their own sake.
@lornova79 5 months ago
Crowdstrike developers use only macOS, so locally testing the Windows sensor would be complicated...
@melimsah 5 months ago
Seeing you progressively lose your mind throughout the video is a work of art.
@Hexanitrobenzene 5 months ago
Yeah, this video is a nerdy stand-up comedy show :)
@wlockuz4467 5 months ago
The incident was an accident, but the gift cards were planned, approved and handed out with an intent. This just goes to show that they care more about saving face than helping their customers. Their service costs upwards of $180 per device per month. If you take a very conservative example of a company where 100 devices need Crowdstrike, that's $18k a month. The $10 gift card means absolutely nothing in comparison, if anything it's an insult, a f*** you. If they really want to show that they care, they should waive their service charges for a few months for the affected customers. Anything less would be an insult. Last but not least, this is just wishful thinking, but this should be treated as a cyberattack with Crowdstrike brought to court and held accountable.
@CC21200 5 months ago
I've been saying that I have personally permanently lost more time and data due to automatic updates than from all other malware combined, and that's considering that I often go to dodgy websites. Are you sure the $10 voucher thing is real? Because it sounds fake... as in I think we're beyond incompetence here, and that you'd have to be a true sociopath to make that offer, or maybe someone who's deliberately trying to tank the company. Can't believe I'm saying that but I'm starting to seriously think the response is more damaging than the crash itself.
@daveh0 5 months ago
It's absurd to say auto software updates are the cause. This is caused by local admin/root on thousands of machines. Allowing them to make changes everywhere at that OS level is the problem.
@shishsquared 5 months ago
The best part is it's actually not a "binary file" if it's all zeroes.
@nibblrrr7124 5 months ago
you don't even have to open the unary file, just stat it to get the length and you have all the information encoded in it. GENIUS!
@the-answer-is-42 5 months ago
@@nibblrrr7124 It's the best optimization! No parsing of whatever is in the file is needed! Just make sure some of your zeros are more zero than other zeros and I promise, it will work perfectly!
@tma2001 5 months ago
That turned out to be a red herring and nothing to do with the crash, as CrowdStrike clarified in an initial post. Not everyone had an all zeros channel file - valid files have a magic byte signature of 0xaaaaaaaa at the beginning. Disassembly of the actual kernel driver shows this is checked for. Neither does a channel file contain code. Experienced hackers have speculated that channel files are initially preallocated and only updated after the temporary downloaded file is parsed correctly, but of course the kernel driver crashed and left it in this state. This makes sense of all the facts. The actual null ptr dereference was due to a faulty field in the channel file that led to a memory allocation error from the non-paged memory pool; either the allocation was the incorrect size or misaligned. The BSOD exception is a non-paged access fault. The rest is history, with the lack of staged rollouts of this hotfix.
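The header check described above is cheap to picture. A minimal sketch, assuming the magic signature is four `0xAA` bytes at offset zero (taken from this comment, not from any published format specification); `check_channel_file` and its return strings are invented names:

```python
# Assumed 4-byte signature at the start of a valid file, per the comment above.
MAGIC = b"\xaa\xaa\xaa\xaa"


def check_channel_file(data: bytes) -> str:
    """Classify a candidate file by its header before any parsing is attempted."""
    if not data:
        return "empty"
    if data.count(0) == len(data):   # every byte is zero: a placeholder, not content
        return "all-zero"
    if not data.startswith(MAGIC):
        return "bad-magic"
    return "ok"


print(check_channel_file(bytes(64)))          # all-zero
print(check_channel_file(b"hello"))           # bad-magic
print(check_channel_file(MAGIC + b"\x01"))    # ok
```

A check like this rejects the all-zero placeholder, which fits the point that those files were not what crashed machines. It cannot catch a bad field deeper inside a file whose header is fine, so it is only the first gate.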
5 months ago
@@tma2001 That makes so much sense! Does anyone have a copy of the original file? It would be great to see that instead of information based on misinformation from a Twitter post (the one about the file being all zeros that went viral).
@tma2001 5 months ago
CrowdStrike published a Tech Analysis: 'Channel File May Contain Null Bytes' explanation at the time of the outage, which I and everyone else missed, which finally clears up that mystery! My intuition was correct, but it is Windows itself rather than the CS driver that first erases data of disk sectors for a newly allocated file for security reasons. Writes don't occur until the file is flushed by the Cache Manager, which doesn't happen if the driver crashes with a BSOD.
@emaayan 5 months ago
one of the comments on dave's video said that a customer actually DID configure staggered rollouts on their servers, but CS actually ignored it.
@CapsAdmin 5 months ago
Did they ship zeros or did their "content update" become zeros right before testing and shipping? Obviously the receiving end should also have some sort of validation other than "graceful" runtime exception handling.
@tma2001 5 months ago
the all zeros channel file is a complete red herring and Theo keeps propagating it and coming to the wrong conclusion - the initial reports about this were rightly causing a lot of confusion. However not all customers had an all zeros channel file - CrowdStrike clarified this was not the issue and valid files have a magic byte signature at the start which is checked for, as disassembly of the kernel driver has confirmed.
@itsTyrion 5 months ago
@@tma2001 let's remove that part then, most points still stand as-is or slightly altered.
@tma2001 5 months ago
@@itsTyrion agreed - CrowdStrike actually tell us in this press release what the issue was and it's a shocking admission. One word in that press release gives the game away - _trust_ . The 2 previous named pipe template instances had worked in production as expected so were now trusted - they _assumed_ another slight variant of the same kind wouldn't cause any problems. And as a rapid response update it is not subject to policy staging controls; the rest is history. Basically complacency led to their downfall.
@junit1606 5 months ago
Aren't you the one who was saying that software testing is not needed, and it's better to let it fail on the user's side?
@GreyDeathVaccine 5 months ago
Different use cases. Please don't compare apples to oranges.
@junit1606 5 months ago
@@GreyDeathVaccine how does testing differ from testing? And how is letting the application fail for the user not what he is doing?
@kamertonaudiophileplayer847 5 months ago
FYI: Austin, TX is 2 hours ahead of California.
@Flameboar 5 months ago
The time was what used to be called GMT, not CDT.
@kamertonaudiophileplayer847 5 months ago
@@Flameboar It makes a lot of sense.
@NormanLyon 5 months ago
I agree with the fact that CS messed up badly. But I have disagreements with your analysis.
1. What I understood from their RCA was that their validation test suite failed in such a manner as to not generate logs from the most recent update, but to instead use logs from previous updates. Yes, that's horrible, but it's a bug, not a lack of testing.
2. The fact that validation was only happening on send was a problem. Validation of received updates must occur for anything to be considered resilient. The driver should have refused the update and stayed on the previous version (sending an alert of failed update). Failing that, it needs to version itself for some level of rollback to the previous known good (and send an alert of failed update).
3. I can understand the lack of booting on an invalid update. Many in the security world would consider assuming you are protected while protection is actually broken a worse state than knowingly having no protection. When dealing with corporate and government regulations, these concerns are real.
4. I agree that automatic update to "latest and most likely to crash" is a horrible stance. It's also the stance taken by many in the IT industry today as a means to combat never updating. Enterprise software should have a middle ground if it's ever going to be taken seriously as an enterprise solution. Canary testing and staged promotion lifecycles need to be taken seriously.
5. WHQL is also at fault here. Security vendors need to move at a quick pace, and as MS has a near-monopoly on OS for end-user systems, regulatory concerns require that whatever access MS security teams can take advantage of must also be usable by non-MS entities. WHQL needs to be able to provide meaningful and quick value in cases such as security products. The fact that this driver has these faults yet is certified is proof that CS deserves to have their certification pulled, and must re-certify under a worthwhile process.
6. Common corporate implementations of BitLocker make this event a nightmare. Too many companies believe that they empower their employees to manage their own devices. Yet very few users were able to access the BitLocker passcodes for their systems. Each company that fell victim to this situation failed miserably at their disaster recovery tabletops and needs to fix this ASAP.
I wouldn't want to do business with CS now. Yet any enterprise-wide product of this nature is difficult to change. CS's competitors have their own problems. The general mindset I've seen from every vendor in the security-focused enterprise software market is that they are filled with unjustified bravado. This incident won't change that. I just hope it will make some things better, especially as the real issue doesn't completely belong to CS.
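Point 2 above (validate on receive, refuse a bad update, stay on the last known good version, raise an alert) can be sketched in a few lines. `ContentStore` and the toy validator are hypothetical names for illustration, not a description of how the Falcon sensor is structured.

```python
from typing import Callable, List, Optional


class ContentStore:
    """Keeps the last content that validated, so a bad update can be refused."""

    def __init__(self, validate: Callable[[bytes], bool]) -> None:
        self._validate = validate
        self.active: Optional[bytes] = None   # last known good content
        self.alerts: List[str] = []           # stand-in for "send an alert"

    def apply(self, candidate: bytes) -> bool:
        """Activate the candidate only if it validates; otherwise keep the old one."""
        try:
            ok = self._validate(candidate)
        except Exception as exc:              # a crashing validator also means "reject"
            ok = False
            self.alerts.append(f"validator raised {exc!r}")
        if not ok:
            self.alerts.append("update rejected; keeping previous version")
            return False
        self.active = candidate
        return True


# Toy validator (assumption): real content must start with b"OK".
store = ContentStore(lambda data: data.startswith(b"OK"))
store.apply(b"OK rules v1")
store.apply(bytes(16))                        # malformed update is refused
print(store.active, store.alerts)
```

The design choice is that the receiving side never replaces working content with something it could not validate, whatever the sender believed about it.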
@RichardHennigan 5 months ago
Even a non tech person would come up with the idea of rolling releases to make sure things were working. There MUST have been multiple people in the company pointing out this dangerous flaw.
@wlockuz4467 5 months ago
It doesn't even have to be something as complex as rolling releases. Something this severe would be easily caught if they actually tried to run their production build.
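For anyone who has not seen one, a rolling (staged) release is little more than "deploy to a small slice, check health, then widen". A rough sketch with invented names and thresholds (`staged_rollout`, the 1% / 10% / 100% waves, a zero-failure budget), none of which come from the thread:

```python
from typing import Callable, List, Sequence


def staged_rollout(hosts: Sequence[str],
                   deploy: Callable[[str], bool],
                   stage_percents: Sequence[int] = (1, 10, 100),
                   max_failure_rate: float = 0.0) -> List[str]:
    """Deploy in growing waves and stop after the first unhealthy wave.

    Returns the hosts the update was attempted on (the blast radius).
    """
    touched: List[str] = []
    done = 0
    for percent in stage_percents:
        # Cumulative target for this stage, always at least one new host.
        target = min(len(hosts), max(done + 1, len(hosts) * percent // 100))
        wave = hosts[done:target]
        if not wave:
            continue
        failures = sum(0 if deploy(host) else 1 for host in wave)
        touched.extend(wave)
        done = target
        if failures / len(wave) > max_failure_rate:
            break                             # halt: do not widen a bad update
    return touched


fleet = [f"host-{i}" for i in range(1000)]
print(len(staged_rollout(fleet, lambda host: False)))  # bad update stops after the first wave
print(len(staged_rollout(fleet, lambda host: True)))   # healthy update reaches the whole fleet
```

With an update that fails everywhere, only the first wave is affected instead of the entire fleet, which is the whole argument for doing this even when the update is urgent.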
@marknn35 ай бұрын
I wonder if CrowdStrike is using the new Microsoft Dev Drives file system to create the content update files. I had experienced (are reported to MS) files getting all zero content. It was related to Dev Drives + virus scanner (f-secure derivative). After uninstalling it and switching to Defender, no problems since.
@dondekeeper29435 ай бұрын
Crowdstrike is now officially a ransomware service provider 🤣
@frankroos11675 ай бұрын
Erm....as a developer I am used to one thing: The least thing we have done before releasing an update package is test the installation of the package, as it will be released in a safe environment. Obviously, that test is missing. Which is mind-boggling to me. In the almost 30 years of my carreer, I have not worked for any company that didn't have that test. And the companies I worked for are not all big companies. Some as small as a company built around a team of 4 developers. It is so basic. And so everywhere I have been. I can't even imagine a company that doesn't have it. And now there is such a company that is this powerful? I am scared. Really scared. As far as I am concerned, CrowdStrike should go bust, just for having that test missing. I don't want them to improve. I just want them gone. If yo make a mistake like this, you are bound to make more. Question for MS: Why a certification for a driver that has these shortcomings? Null-checks and verifying unsigned files getting loaded. In most code it shouldn't be too hard to see these things are in need and missing.
@debasishraychawdhuri5 ай бұрын
Imagine Tesla pushes an update that makes a sudden left turn at random times.
@kingofichigo5 ай бұрын
I wouldn't be surprised tbh
@EscapeMCP5 ай бұрын
@@kingofichigo Yeah, seems immeasurably possible
@ytlongbeach5 ай бұрын
Theo, thanks for putting this out there. 99% of people would never understand what you laid out, without such an incredible run down on the CrowdStrike failures. This video should be required viewing for all large company executives and the entire US congress. kudos !!
@tma20015 ай бұрын
except he was wrong on the all zeros channel file - that turned out to be a red herring but it was initially very confusing because not everyone had an all zeros file. CrowdStrike made this clear in their initial blog post (not the more detailed one in this video).
@nu11pointerexception5 ай бұрын
CrowdStrike Blue Falcon is a more fitting name after bricking millions of computers
@drankthetranquil5 ай бұрын
This was a great video and I really appreciated the deep dive and discussion. This is almost just... unbelievable haha
@stevengill17365 ай бұрын
My question would be, can Crowdstrike even survive this incident? If I was an IT manager, other options would be of interest right now...
@piccalillipit92115 ай бұрын
*SIMPLE FACT OF LIFE - NO ONE IS GOOD AT THEIR JOB* We like to think we are, we hope others are, but in reality, we are at best less incompetent than most others and we call that "GOOD" - we didn't evolve to do constant work like a machine and we are really BAD at it. The guy hand building a car in a small team start to finish HE is truly GOOD at his job, the guy coding a product he has no interest in for a giant company - NOT GOOD.
@JohnDoe43215 ай бұрын
02:31 - WHQL is NOT a "hard certification to get". For a "software driver" like anti-malware, all that really gets tested is that the driver correctly implements the PNP and power management state machines. That isn't easy to get 100% right if you try to write the code from scratch, which is why nobody does that. Everybody either does a copy/paste from one of the Microsoft sample drivers, or uses Microsoft's KMDF framework, which handles it for you. WHQL requires more in-depth testing for known device classes, such as storage, network, and video. These drivers have to pass class-specific tests, which more fully exercise the driver functionality.
@animanaut5 ай бұрын
lol, so they introduce 'works on my machine' just now? where tf did they start to begin with?
@MiklosGalicz5 ай бұрын
Aren't external files in this case the same as virus definitions for defender and all of the other virus scanners out there?
@autohmae5 ай бұрын
Yes, similar, so similar in fact McAfee had a similar issue in 2010, when the clownstrike CEO was the CTO at McAfee (!)
@hungrymusicwolf5 ай бұрын
15:46 "gracefully handle" - As gracefully as a nuke and twice as !quiet.
@alexedelweiss32675 ай бұрын
I just wonder what were they doing with all the money they receive, because it's a really expensive product... So expensive that it's unaffordable for most companies.
@MichaelSchuerig5 ай бұрын
I don't have any special insights into the situation, but my hunch is that the fault occurred in a place where nobody thought it could happen. Let's assume the "content update" is checked and perfectly valid. This update needs to be provided to clients by some service. That service could simply be buggy and misbehave for various reasons that results in corrupt data being transferred. Of course, this risk could be mitigated by validation on the clients and rolling releases.
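To make the "rolling releases" idea a bit more concrete, here is a minimal sketch of a staged rollout that stops as soon as one host reports a failure. Everything in it is an assumption for illustration: `staged_rollout`, `deploy_and_check` and the stage fractions are hypothetical names and values, not anything CrowdStrike has described.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy_and_check: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),
) -> tuple[list[str], bool]:
    """Push an update to growing fractions of the fleet.

    deploy_and_check is a hypothetical callback that installs the update on
    one host and returns True if the host is still healthy afterwards.
    Returns the hosts that were touched and whether the rollout completed.
    """
    touched: list[str] = []
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated after this stage.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(touched):target]:
            touched.append(host)
            if not deploy_and_check(host):
                # Halt immediately: the rest of the fleet is never touched.
                return touched, False
    return touched, True


fleet = [f"host{i}" for i in range(200)]
bad_touched, bad_ok = staged_rollout(fleet, lambda host: False)
good_touched, good_ok = staged_rollout(fleet, lambda host: True)
```

With a broken update, only the first host in the first stage is affected instead of the whole fleet. The trade-off raised elsewhere in this thread still applies: each stage adds delay before every machine is protected.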
@wulf21215 ай бұрын
There's one small misunderstanding on your side in the beginning: WHQL certification is not (just) about using Windows Update to push drivers. All the newest drivers that can be downloaded from Nvidia's website are WHQL certified before Nvidia makes them available as well. Basically any device driver running in the kernel needs to be certified, or Windows will complain upon installation. (In the Home version it's nowadays completely blocked to install such a driver, and only with a Pro license do you even have the option to override this. In the days of Windows XP there was still the option to ignore the warning from Windows and install the driver anyway.)
@JohnDoe43215 ай бұрын
06:35 - This is wrong. Drivers for actual hardware execute different code paths based on what they receive from the device. This isn't that different than Crowdstrike's driver executing different code based on a definition file. Any driver that doesn't fully validate all data can crash. Real world example: There was a network card (NIC) which had its own firmware. The vendor updated the firmware, and soon after that, reports started to come in about Windows driver crashes. Crashes were infrequent and intermittent -- it took a while to root cause. It turned out that the new firmware would sometimes send data to the driver that the driver didn't interpret properly. This wasn't a failure of the WHQL process. It was a bug, and Bugs Happen. WHQL is intended to reduce the number of driver bugs -- it never promised to eliminate 100% of them.
@Pekz00r5 ай бұрын
Several of the steps you suggest would delay the delivery of the update significantly, and in this case every minute's delay can be very critical. Even every second can be important when you are racing against a virus that is spreading rapidly. You need to reach the machines before the virus. But yeah, you are absolutely right that there are many things they could have done that would not delay the rollout much, but still prevent a bad rollout reliably. It is, as you said, insane that they are rolling out boot driver updates to millions of machines without more verification.
@ForLineage-dr5ju5 ай бұрын
The channel file is probably not being signed. That's why it tried to run a bunch of zeros. Unless somehow a bunch of zeroes got signed, the change was never caught with a diff...and no further testing was done. Just pushed out to everything, everywhere all at once. They could have tried it on a work laptop in the office. I bet they don't have the ability to even push to one targeted test machine. I bet their options are "Push to Airlines or push to hospitals for testing".
@CodingAbroad5 ай бұрын
Rumour has it Microsoft is already looking to sack off CrowdStrike
@MacGuffin15 ай бұрын
This is a very surface-level look. You should see Enderman's recent video on Dave. This attack vector was well known and it was almost certainly an attack that was about money, just like the denial of it being an attack was too. I'm sick of telling everyone about the WSB post where the same guy made millions on this AND Snowflake by shorting them hours before they were hacked. How can you red-team something that doesn't boot? Not to mention hashing, manifests etc. Have a look at the cobalt module it was actually patching for; it would have required a lot of testing. As if they shipped this willingly on a Friday night... Anyhoo, they are still the best choice in this area; all these guys have issues
@Henoik5 ай бұрын
As a cyber security professional, what freaks me out about this is that if a threat actor were to reverse engineer the driver, they could basically inject instructions to the driver that turn the Falcon agent into a C2 client. Also, just about every EDR solution and antimalware solution does this. The "You need to restart your PC" screen you get after installing an antimalware solution? Yup, that's the antimalware solution letting you know Windows needs to start all the new drivers you just installed.
@2rx_bni5 ай бұрын
Glad to catch the beginning of this since I showed up to stream late and only caught part of it.
@tma20015 ай бұрын
Not everyone had an all zeros channel file, which leads to more confusion. CrowdStrike, if we believe them, made it clear that null bytes in this or any channel file were not at issue. This is lent weight by the fact that valid channel files have a magic byte signature, which disassembly confirms is checked for. On a hacker forum some speculated the file was pre-allocated but not updated from the original download after the driver crashed. So all zero channel files are probably a red herring.
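For anyone wondering what a "magic byte signature" check looks like, here is a rough sketch. The 4-byte value `CHNL` and the function name are made up for illustration; the real channel file format and its actual signature are not given in this thread.

```python
# Hypothetical 4-byte signature; the real channel file magic is not stated here.
MAGIC = b"CHNL"


def looks_like_channel_file(data: bytes) -> bool:
    """Accept only data that starts with the expected magic and has a body."""
    return len(data) > len(MAGIC) and data.startswith(MAGIC)


all_zeros = bytes(4096)           # what an all-zero file would look like
plausible = MAGIC + b"\x01\x02"   # starts with the expected signature
```

A loader that runs a check like this first would refuse an all-zero file before parsing it, which is why the presence of such a check makes the "it tried to execute zeros" theory less likely.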
@gregutz42845 ай бұрын
I don't believe a lick that CS is pushing.
@silvioschurig7495 ай бұрын
You kind of ignore that this is running in kernel space. Trying to gracefully recover from a fatal kernel error is riskier than an immediate shutdown because it is not just some application process that failed. Basically all you know at that point is: your kernel is corrupt. Also, there's a huge issue, not just with this company: the terms of service. Crowdstrike basically puts a disclaimer in there that their software / services guarantee nothing, are not suited for production environments, and that they can't be held liable for anything they do or cause, directly or indirectly. This next statement is victim blaming, and terms of service like this seem to be standard for these 'security' companies... But really: why does anyone sign license / service agreements with a company that cannot guarantee any level of function for its products? I mean, if you hire some company like Securitas to handle the cash collection from your points of sale, would you sign terms of service stating "we may or may not come by and pick up your cash. If we do come by we may lose some or all of it prior to depositing it to your bank. You are happy to assume that risk."? But for software on your production systems that's ok?
@grokitall5 ай бұрын
you cannot recover from the initial crash, you can recover from the boot loop which kept all the machines down. as to the disclaimer, leonard french covered it on youtube, and it is invalid in the case of negligence, which this clearly is.
@DarkerStarSword5 ай бұрын
The file was NOT full of zeroes - I've verified that was not the case on two affected machines. I suspect the people who saw a file full of zeroes experienced a crash before the data was flushed from buffers in RAM to disk.
@shotgeek5 ай бұрын
I was admitted to a local hospital the day before this. On Friday almost every hospital system (imaging, laboratory, even food service) was down.
@JonathanSwiftUK5 ай бұрын
The kernel mode driver did not crash the system, as such, it only faulted when the channel file with zeros was executed / attempted to be used. We fixed most by deleting the file remotely, no safe mode, no moving the drive to another machine and deleting the file, then moving the drive back, simple and quick, and when we deleted that file the system continued to boot. Job done. Just removing that fixed it. They probably have their own definitions language or pseudo-code, like WASM.
@kamilkardel27925 ай бұрын
Giving control of deployment to the customer is actually a smart move because, as opposed to the vendor, the customer knows the exact roles of specific machines. With that control, you can make sure that a bad update doesn't take down all servers running a specific application or all user workstations in some department.
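A small sketch of what that customer-side control could look like: split the machines of each role across update rings, so that no single ring contains every server of one application. The function name, the inventory and the ring count are all hypothetical.

```python
from collections import defaultdict


def assign_rings(machines: dict[str, str], ring_count: int = 2) -> dict[int, list[str]]:
    """Spread the machines of each role across rings, round-robin.

    machines maps a machine name to its role (for example "database").
    """
    by_role: dict[str, list[str]] = defaultdict(list)
    for name, role in sorted(machines.items()):
        by_role[role].append(name)

    rings: dict[int, list[str]] = {ring: [] for ring in range(ring_count)}
    for names in by_role.values():
        for index, name in enumerate(names):
            rings[index % ring_count].append(name)
    return rings


# Hypothetical inventory: two database servers and two web servers.
inventory = {"db1": "database", "db2": "database", "web1": "web", "web2": "web"}
rings = assign_rings(inventory)
```

Here ring 0 gets `db1` and `web1` and ring 1 gets `db2` and `web2`, so a bad update applied to ring 0 still leaves one database and one web server running.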
@landmanland5 ай бұрын
By "gracefully" they mean “we let the driver boot without the file”.😊
@gabotron945 ай бұрын
File not found. Improvise? (y/n)
@mchisolm05 ай бұрын
I did not realize the assumptions I made when learning about this...thanks for taking the time. The reality of getting out of the signing of code by moving the code to another file...crazy. Could Windows have caught that in the signature step? Could they have been like "Hey, it looks like you are running code in this channel file over there...but you didn't tell us what the content of the channel file was. You think we can sign off on that?"
@acerreteq7035 ай бұрын
Thank you for this video analyzing the matter and for putting yourself through so much pain in the process of producing it. But don't overdo it, we need you. 🖖
@Wampa8425 ай бұрын
Crowdstrike (noun): a large-scale outage or damage of infrastructure caused by the combination of an external vendor's action and industry-wide reliance on said vendor. Example: "The latest crowdstrike has caused a massive disruption of emergency services." Crowdstrike (verb, simple past and past participle "crowdstruck"): to cause such an outage to happen. Example: "Our boss moved our email service off-site and now we can't access it because of the outage, I knew he'd be the one to crowdstrike us!"
@dronicx79745 ай бұрын
I hope the literally unusable (because Uber Eats cancelled them thinking they were all fraud gift cards) $10 gift card is able to offset some of Delta's costs with their literal 2 years' worth of flight cancellations within a week
@hsvandrew5 ай бұрын
What makes me so mad about these failures (as a software engineer) is that unlike in other industries (e.g. construction), there will be no punishment for this negligence and misconduct. If the CTO was a construction engineer, jail time would be associated with this deliberate incompetence. Yet in IT, negligence is just allowed despite the financial impact or lives affected. It is about time this changed so that good people doing the right thing are rewarded, and those that aren't get punished. CrowdStrike's share price before this incident was so high, they effectively had 'unlimited money'. The fact they were able to operate like a part-time developer is just insane. This is also a test for corporate America and the CTOs who signed up for this product without any due diligence. Will they actually remove this product & company from their systems, or by lack of action promote this type of behaviour as acceptable?
@TheMrbrookster5 ай бұрын
Clearly they don't bother running the software, full stop. Imagine if these clowns were running CrowdStrike on their own servers: all their systems would have been down for a week before they could roll back the change.
@phillipsusi17915 ай бұрын
A rolling release means you get the latest stuff right away all the time. That is exactly what they did. As opposed to a traditional release, which is tested thoroughly in development, allowed to settle with more testing, used by early adopters for a while, and THEN recommended for everyone else to use, but they all still have to choose to install the new release, when they feel like it.
@Flameboar5 ай бұрын
There is an interview on YouTube of the Clownstrike CEO after the BSOD disaster. He was blinking at a very high rate and coughed as he tried to answer the interviewers' questions. This gave me the impression that he was fully aware of how badly CS had f'ed up.
@television92335 ай бұрын
This is unfathomable levels of disastrous failure in all levels of their pipeline. I would legitimately have more faith in them if they said that they were in fact hacked. I would never understand any competent engineer still using their services.
@Canleaf085 ай бұрын
It shows how brittle the computer infrastructure is. Been a dev since 2020. The Log4Shell / Log4j vulnerability from late 2021 feels very small.
@TheStevenWhiting4 ай бұрын
24:51 Maybe but not if the drive is encrypted with Bitlocker, then you wouldn't be able to read the drive to do that.
@ADHJkvsNgsMBbTQe5 ай бұрын
Once again, people seem surprised that the emperor isn’t actually wearing any clothes.
@tablettablete1865 ай бұрын
24:05 That actually makes sense, otherwise the software wouldn't be able to monitor things during boot. Like this: -> OS starts -> Driver loads (but does not parse the rules) -> Malware runs invisible -> Driver parses the rules and starts monitoring the machine. It's pretty common to run at boot and start monitoring as early as possible.
@jeremysollars59225 ай бұрын
Any sufficiently advanced incompetence is indistinguishable from malice. I've been getting a lot of mileage from this quote recently xD
@ambhaiji5 ай бұрын
Theo is Gus from Psych blasting at CrowdStrike(Shawn) for all the alternative choices he has in handling how to deal with the escaped prisoners on the boat.
@3ventic5 ай бұрын
4:02 by the time nvidia's drivers are out of beta, they're WHQL signed. Getting 3rd party driver updates via Windows Update is relatively new (2016) and nvidia seems to have preferred to bundle it with their own app along with all their other functionality (which first released 2013).
@matthewstott34935 ай бұрын
CrowdStrike has claimed that the content update was not all hex zeros. It was instructions and it had a bug, that crashed the kernel and due to Windows handling of NTFS files when that happens the file was replaced with zeros. It still managed to keep loading and crashing Windows at every boot.
@Wlerin75 ай бұрын
So... what I'm getting from this is that CrowdStrike is basically crowdsourced security?
@Skirakzalus5 ай бұрын
When first hearing about CrowdStrike it took me way too long to figure out that this was not the name of a deliberate cyber attack but a supposedly legitimate company.
@NankitaBR5 ай бұрын
That's why I *always* tick off the automatic updates in everything where I have the option to do it. I'm not a tech-savvy person, but even I know that if there is a broken update on something I use if I don't have automatic updates I can figure out I shouldn't update at that moment and wait for them to fix the issue.
@jordanjackson61515 ай бұрын
The ending said it all. Yikes!
@epicmap5 ай бұрын
17:42 can you imagine a validation of a file which would pass when the file is all zeros? It's hard for me to come up with at least one such validation.
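One possible answer, purely as a hypothetical and not a claim about what CrowdStrike's validator did: a naive additive checksum, where the last byte stores the sum of all preceding bytes modulo 256. An all-zero file passes it, because the sum of zeros is zero and the stored checksum byte is also zero.

```python
def naive_checksum_ok(data: bytes) -> bool:
    """Hypothetical check: last byte equals the sum of the other bytes mod 256."""
    if not data:
        return False
    payload, stored = data[:-1], data[-1]
    return sum(payload) % 256 == stored


def stricter_check_ok(data: bytes) -> bool:
    """Same checksum, but additionally reject content that is entirely zero."""
    return naive_checksum_ok(data) and any(data)


zeros = bytes(1024)            # 1024 zero bytes
valid = bytes([1, 2, 3, 6])    # payload 1, 2, 3 and checksum byte 6
```

`naive_checksum_ok(zeros)` is True while `stricter_check_ok(zeros)` is False, so it is not that hard to build a validation that waves zeros through; a check that only confirms internal consistency says nothing about whether the content is meaningful.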
@DaxSudo5 ай бұрын
It’s like the rust neon crate. It’s not part of JavaScript or your react ecosystem or your node stuff but it is a node binary that exports functions. Oh, we’re just going to ship a bad binary that crashes everything. But even that’s a misnomer because the binary is configured to run in the node run time rather than be an obfuscated configuration file.